Day 16 – 🎶 Deck The Halls With Perf Improvements 🎶

In the UK our lack of Thanksgiving leaves us with Christmas as a period of giving thanks and reflection up to the New Year. To that end I wanted to put together several bits and pieces I’ve been sat on for a while, around the state of Perl 6 performance, that highlight just how much effort is going into this. I’m not sure the wider programming community appreciates the pace and volume of effort that’s happening.

I’m not a core dev, but I have been a humble user of Perl 6 since just before the 2010 introduction of Rakudo*. Frequently the effort that’s already gone into Rakudo is overshadowed by the perceived effort yet to come. This is especially true of people taking a fresh look at Rakudo Perl 6, who might imagine a fly-by look is what next Christmas will be like. But Perl 6 has historically proven things always improve by next Christmas, for any Christmas you choose.

All the way back in Christmas 2014 I wrote an advent post about why I thought Perl 6 was great for doing Bioinformatics work. What was left out of that post, was why the implementation of Perl 6 on Rakudo was not at all ready for doing any serious Bioinformatics. The performance was really not there at all! My first attempts in Perl 6 (when the Parrot VM was in full force) left me with simple operations taking tens of minutes to execute, that I’d expect to be millisecond level perf. This is unfortunately anecdotal because I didn’t keep good track of timings then. But it was certainly not a great starting place.

However, fast forwarding to 2014 and MoarVM I felt comfortable writing the advent post, because I was cognisant of how much things had improved in my 4 years of being a user. But also that all development was on finishing the language definition and correct implementation back then. I am however a user that has been waiting for perf to get there. The time I think has mostly arrived. For this I have to give thanks to the tremendous daily hard work put in by all the core devs. It’s been incredible and motivating to watch it unfold. For me this Christmas is the goal Christmas, it’s arrived. 👏🏻🎊

I have been running and timing the tests for my BioInfo module that does some basic manipulations of biological sequence data for many years now. It does this in a really terrible way. Lots of mistakes in allocation and dropping of hashes in tight loops etc. But I’ve left this code alone -by now- for the better part of half a decade. Quietly benchmarking in private, and occasionally applauding efforts on the IRC channel when a quantum leap in perf was visible. Sub 10s was a big one! It happened suddenly from 30/40s. That jump came after I hinted on IRC a place my code was especially slow from profiling!

This is a bit of a long term view, if I zoom in on just this last year you can see that performance is still improving by integer factors if not large quantities of time.

Keep in mind that all of these profiles are not from released versions of the Rakudo compiler but HEAD from that day. So occasionally there is the odd performance regression as you can see above that usually isn’t left for a release.

So what’s going on? How are things getting better? There are several reasons. Many of the algorithmic choices and core built in functions in Perl 6 have been progressively and aggressively optimised at a source level (more later). But the MoarVM virtual machine backing Rakudo has also increased in its ability to optimise and JIT down to native code and inline specialised versions of code. This is in part thanks to the –profile option available with Rakudo Perl 6 since 2014 that provides all of this info.In the above plot of how MoarVM has treated the code frames of my compiled Perl 6 tests it should hopefully be clear that since this summer there are considerably more frames being JIT compiled, less interpreted, and almost all of the specialised frames (orange) end up as native JIT (green). If you want to know more about the recent work on the “spesh” MoarVM code specializer you can read about it on Jonathan Worthington’s 4-part posting on his blog. Baart Weigmans also has a blog outlining his work on the JIT compiler, and a nice talk presented recently about lots of new functionality that’s yet to land that should hopefully let many new developers pile on and help improve the JIT. So if that feels interesting as a challenge to you I recommend checking out the above links.

So that’s my benchmark and my goals, most of which revolve around data structure creation and parsing. However, what about other stuff like numeric work? Has that kept up too? Without anyone pushing, like I pushed my view of where things could be improved. The answer is yes!

Once upon a time, back in 2013 a gentleman by the name of Tim King took an interest in finding prime numbers in Perl 6. Tim was fairly upset with the performance he discovered. Rightly so. He started out with the following pretty code:

Find any prime via a junction on the definition of a prime, really a nice elegant solution! But Tim was aghast that Junctions were slow, with the above code taking him 11s to see the first 1000 primes. Today that super high level code takes 0.96s.

Being unhappy with how slow the junction based code was Tim went on to do the more standard iterative approaches. Tim vanished from online shortly after these posts. But he left a legacy that I continued. His code for the prime benchmarks and my adaptation with results through time can be found in this gist. Below is the punchline with another graph showing the average time taken for finding the first 1000 primes over 100 trials each. Vertical lines in 2015 is a higher standard deviation.

Again with a zoomed in view of more recently (with the latest data point worrying me a little that I screwed up somehow…)

The above convergence to a point, is the overhead of starting and stopping the Rakudo runtime and MoarVM. Finding primes isn’t the effort it once was, with it being marginally slower than just Rakudo starting. At least an order of magnitude faster regardless of how high level and elegant the code solution you choose.

Ok so we’ve seen MoarVM got some shiny new moving parts. But huge effort has been put in by developers like Liz, jnthn, Zoffix and more recently in the world of strings Samcv to improve what MoarVM and Rakudo are actually doing under the hood algorithmically.

Sidenote: I’m sure I am not doing most other devs justice at all, especially by ignoring JVM efforts in this post. I would recommend everyone goes and checks out the commit log to see just how many people are now involved in making Rakudo faster, better, stronger. Im sure they would like to see your thanks at the bottom of this article too!

So saving you a job of checking out the commit log I’ve done some mining of my own looking at commits since last Christmas related to perf gains. Things with N% or Nx faster. Like the following:

3c6277c77 Have .codes use nqp::codes op. 350% faster for short strings

ee4593601 Make Baggy (^) Baggy about 150x faster

Those two commits on their own would be an impressive boost to a programming project in the timescale of a years core development. But they are just two of hundreds of commits this year alone.

Below are some histograms of the numbers of commits and the % and x multiplier increase in performance they mentioned. You can grep the logs yourself with the code above. There are some even more exciting gains during 2016 worth checking out.

These really are the perf improvement commits for 2017 alone, with more landing almost daily. This doesn’t even include many of the I/O perf gains from Zoffix’ grant, as they were not always benchmarked before/after. 2016 is equally dense with some crazy >1000x improvements. Around ten commits alone this year with 40x improvement! This is really impressive to see. At least to me. I think its also not obvious to many on the project how much they’re accomplishing. Remember these are singular commits. Some are even compounded improvement over the year!

I will leave it here. But really thank you core devs, all of you. It’s been a great experience watching and waiting. But now it’s time for me to get on with some Perl 6 code in 2018! It’s finally Christmas.