I already mentioned Bisectable in one of the advent posts two years ago, but a lot has changed since then, so I think it’s time to give a brief history of the bisectable bot and its friends.
First of all, let’s define the problem that is being solved. Sometimes it happens that a commit introduces an unintended change in behavior (a bug). Usually we call that a regression, and in some cases the easiest way to figure out what went wrong and fix it is to first find which commit introduced the regression.
There are exactly 9000 commits between Rakudo 2015.12 and 2018.12, and even though it’s not over 9000, that’s still a lot.
Luckily, we don’t need to test all of the revisions. Assuming that the behavior wasn’t changing back and forth all the time, we can use binary search.
git bisect and binary search
Basically, given any commit range, we take a commit in the “middle” of the range and test it. If it’s “bad” (that is, it shows the “new”, now incorrect, behavior), then we can throw away the second half of our range, because we know that the change must have happened before that commit or exactly on that commit. Similarly, we throw away the first half if it is “good” (or “old”). So instead of testing all 9000 commits we only need to check about log₂ n revisions (≈13).
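The halving described above can be sketched in a few lines of Python. This is only an illustration, not how git implements it: the commits are stand-in integers, the `is_bad` predicate and the culprit index are made up, and the sketch assumes the first commit is good, the last is bad, and the behavior flips exactly once.

```python
def first_bad(commits, is_bad):
    """Return (index of the first bad commit, number of commits tested).

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    behavior changes exactly once somewhere in between.
    """
    good, bad = 0, len(commits) - 1  # commits[good] is good, commits[bad] is bad
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tested += 1
        if is_bad(commits[mid]):
            bad = mid    # the change happened at mid or earlier
        else:
            good = mid   # the change happened after mid
    return bad, tested


commits = list(range(9000))   # stand-ins for 9000 revisions
culprit = 6174                # hypothetical commit that introduced the bug
index, tested = first_bad(commits, lambda c: c >= culprit)
print(index, tested)
```

Depending on where the culprit sits, a range of this size needs 13 or 14 tests, which is where the ≈13 figure comes from.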
Git comes with the git bisect command, which implements the binary search logic for you. All you have to do is give it some starting points and then, for every commit it jumps to, tell it whether that commit is good or bad. If you do that enough times, it’ll tell you which commit is at fault.
That’s all good, but there are two problems with it.
Problem ❶: Skipping
Let’s imagine a situation where 2 + 2 used to return 4 (correct!), but now returns 42 (… also right, but not quite).
So you kick off the bisection process, git jumps between revisions, you test them. If it’s 4 then it’s good (or old), if it’s 42 then it is bad (or new). But then you stumble upon this behavior:
> 2 + 2
Merry Christmas!
… Now what? Clearly that specific revision is somewhat special. We can’t tell if our bug is present or not, we simply can’t know. Yes, it doesn’t print 4, but we are looking for a very specific bug, so it doesn’t classify as “new” behavior either. Of course, we can toss a coin and mark it randomly as old or new, and hope for a Christmas miracle… but that has a 50% chance (if we see only one of these) of diverting the binary search in the wrong direction.
For these cases git provides a special skip command.
If you are testing manually, then it is somewhat straightforward to handle these revisions (as long as you remember that you should skip them). However, because of problem ❷, a lot of people are tempted to use git bisect run, which automates the process with a script. It is possible to skip revisions using a script too (use exit code 125), but it is not that obvious how to figure out which revisions should be skipped.
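To make the exit code convention concrete, here is a rough sketch of a test script for git bisect run, written for the 2 + 2 example. It is an assumption-laden illustration: it presumes a freshly built ./perl6 binary sits in the current directory for each checked-out commit (the build step itself is left out), and the `classify` and `main` names are made up.

```python
import subprocess
import sys

GOOD, BAD, SKIP = 0, 1, 125  # exit codes understood by `git bisect run`


def classify(output):
    """Map the observed output of `2 + 2` to a git bisect exit code."""
    text = output.strip()
    if text == "4":
        return GOOD
    if text == "42":
        return BAD
    return SKIP  # anything else (e.g. "Merry Christmas!") tells us nothing


def main():
    # Assumption: ./perl6 was built for the currently checked-out commit.
    try:
        result = subprocess.run(["./perl6", "-e", "say 2 + 2"],
                                capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        return SKIP  # no usable binary on this commit
    return classify(result.stdout)

# When saved as a script for `git bisect run`, finish with: sys.exit(main())
```

The hard part remains deciding what counts as “tells us nothing”. Here it is easy only because the old and new outputs are known in advance.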
Problem ❷: Build times
Let’s take the optimistic figure of 13 to estimate the number of revisions that we are going to test. Remember that it doesn’t include commits that we will have to skip, and possibly other extra builds that we might want to test.
The amount of time it takes to build rakudo varies depending on the hardware, but let’s optimistically say that it takes us 2 minutes to build rakudo on a particular commit and test it.
13 × 2 = 26 (minutes)
That’s not very convenient, right? And if something goes wrong during the process… you start over, and then you wait.
Bisectable
In 2016, after seeing the pain of those who have to run git bisect manually (actually, mostly myself), I wondered:
<AlexDaniel> has anybody thought about building rakudo for every single commit, so that you can quickly run git bisect?
The cost-benefit analysis of the idea was promptly questioned:
<perlpilot> AlexDaniel: do you believe that bisects will be common in the future?
To which I provided a very detailed justification:
<AlexDaniel> perlpilot: yes
Three days later, the bot joined the channel. The reactions were quite interesting to see:
<moritz> woah
<tadzik> wow
<RabidGravy> OoOOOoooh
<llfourn> Cooooool
Little did we know back then. Even I had no idea how useful it would turn out to be. Fast forward 2 years:
<lizmat> with regards to size of commits: I try to keep them as small and contained as possible, to allow for easier bisecting
<lizmat> in that sense, bisectable6 has changed the way I code
<lizmat> also: bisectable6 has made me worry less about changes I commit
<lizmat> because it usually limits the places to look for fixing an issue so much, that they can be fixed within minutes rather than hours
<lizmat> or at least show the cause very quickly (so the short-time fix may mean a revert)
<AlexDaniel> \o/
But it wasn’t always perfect. About one hour after the introduction of the bot, it was used for its purpose:
<moritz> bisect: try { NaN.Rat == NaN; exit 0 }; exit 1
<bisectable> moritz: (2016-05-02) https://github.com/rakudo/rakudo/commit/949a7c7
However, because of an off-by-one, it returned the wrong commit. The actual commit was e2f1fa7, and 949a7c7 is its parent.
Honestly, the bot was very bad back then. For example, it fully relied on the exit code, so you couldn’t just throw 2 + 2 into it and expect it to check the output. Eventually, different modes were implemented, and nowadays the bot first checks the behavior on the starting points (e.g. 2015.12 and HEAD), and determines the best strategy to perform the bisection. For example, if the signal is different (e.g. a SEGV), then it bisects based on the signal. If the signal is the same but the exit code is different, then it uses the exit code. If neither can be used, it bisects using the output.
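The order of checks described above can be written down as a small decision function. This is a sketch of the idea rather than the bot’s actual code: the `Run` record, the `pick_strategy` name, and the final “nothing to bisect” case (for when the two starting points behave identically) are assumptions added for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    signal: int      # 0 if the process was not killed by a signal
    exit_code: int
    output: str


def pick_strategy(old: Run, new: Run) -> str:
    """Choose what to compare, given runs on the two starting points."""
    if old.signal != new.signal:
        return "signal"
    if old.exit_code != new.exit_code:
        return "exit code"
    if old.output != new.output:
        return "output"
    return "nothing to bisect"  # assumption: identical behavior on both ends


# 2 + 2 printed 4 on the old end and 42 on the new end, both exiting cleanly.
print(pick_strategy(Run(0, 0, "4\n"), Run(0, 0, "42\n")))
```

With the 2 + 2 example, both runs exit cleanly and only the printed text differs, so the output strategy is the one that gets picked.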
Keep in mind that bisectable checks for you whether the perl6 binary could be built at all. This means that in most cases you don’t need to add your own logic for skipping. Not only did it bring the bisection time down from tens of minutes to a few seconds, it also gives results that are more reliable.
Storage
Some time later the commit range was expanded to 2014.01 … HEAD, meaning all commits starting from the first ever Rakudo on Moar release. Currently it has over 17000 builds. It may sound like a lot, but with every rakudo installation taking just ≈28 MB, that’s not too much. Having a few TB of storage should get you going for a few years to come.
That being said, I don’t have that luxury on my server. It has a RAID of 120 GB SSDs, so the whole thing not only has to fit into that little amount of space, but it should also leave enough space for the rest of the system.
There was a lot of experimentation (one, two) involved in figuring out the best strategy to save space, but long story short, we can go as low as about half a megabyte per build! Of course, it is always a tradeoff between the compression ratio and decompression speed, but using modern compression algorithms (zstd, lrzip) everything is relatively easy.
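As a back-of-the-envelope check, the figures quoted above (about 17000 builds, ≈28 MB per uncompressed installation, about half a megabyte per compressed build) work out as follows. The numbers are rounded and use decimal units, so treat the result as an estimate only.

```python
builds = 17_000

uncompressed_gb = builds * 28 / 1000    # ≈28 MB per installation
compressed_gb = builds * 0.5 / 1000     # ≈0.5 MB per compressed build

print(f"uncompressed: {uncompressed_gb:.0f} GB")
print(f"compressed:   {compressed_gb:.1f} GB")
```

That is roughly 476 GB against 8.5 GB, which is the difference between not fitting on a 120 GB SSD array at all and leaving most of it free.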
More bots, more better
Shortly after Bisectable was released, people saw an opportunity for other tools. Want to run some code on a specific commit? Sure, here’s a bot for that. Want to download a prebuilt rakudo archive instead of wasting your own cpu time? Yes, there’s another bot. Want to graph some info about rakudo? Of course there’s a bot for that!
And it continued until we reached the total of 17 bots! Some argue that these bots should stop multiplying like that, and perhaps people are right. But I guess the point is that now it is extremely easy to build upon Whateverable to create more tools for developers, which is of course great.
OK, now what?
So bisectable can bisect across thousands of commits in no time. It consumes very little storage space, and it doesn’t require full understanding of the bisection process from the user. Now that the bisection is free and easy, can we do more?
Yes, Blin!
You may have heard about Toaster. Toaster is a tool that attempts to install every module in the ecosystem on two or more revisions. For example, let’s say that the last release was 2018.12 and the release manager is about to cut a rakudo release from master HEAD. You can then run toaster on 2018.12 and master, and it will show which modules used to install cleanly but no longer do.
That gives us the information that something is likely wrong in Rakudo, but doesn’t tell what exactly. Given that this post was mostly about Bisectable, you can probably guess where this is going.
Project Blin – Toasting Reinvented
Blin is a quality assurance tool for Rakudo releases. It is used to find regressions in rakudo, but unlike Toaster, not only does it tell which modules are no longer installable, it also bisects rakudo to find out which commit caused the issue. Of course, it is built around Whateverable, so that extra functionality doesn’t cost much (and doesn’t even require a lot of code). As a bonus, it generates nice graphs to visualize how the issue propagates from module dependencies (though that is not very common).
One important feature of Blin is that it tries to install every module just once. So if module B depends on module A, A will be tested and installed once, and then reused for the testing of B. Because this process is parallelized, you may wonder how it was implemented. Basically, it uses the underrated react/whenever feature:
# slightly simplified
react {
    for @modules -> $module {
        whenever Promise.allof($module.depends.keys».done) {
            start { process-module $module, … }
        }
    }
}
For every module (we have more than 1200 now) it creates its own whenever block which fires when its dependencies are satisfied. In my opinion, that’s the whole implementation of the main logic in Blin, everything else is just glue to get Whateverable and Zef working together to achieve what we need, + some output generation.
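For readers who don’t know react/whenever, the same “run each module once, as soon as its dependencies are done” idea can be approximated with Python’s asyncio. This is a loose analogue and not a translation of Blin: `run_all`, `fake_process` and the three-module dependency map are invented for the example, every dependency is assumed to appear as a key in the map, and a dependency cycle would simply hang.

```python
import asyncio


async def run_all(depends, process):
    """Process every module once, each only after its dependencies finish.

    `depends` maps a module name to the list of modules it depends on.
    """
    done = {name: asyncio.Event() for name in depends}

    async def handle(name):
        # Wait for every dependency, similar to `whenever Promise.allof(...)`.
        for dep in depends[name]:
            await done[dep].wait()
        await process(name)
        done[name].set()   # lets modules that depend on this one proceed

    await asyncio.gather(*(handle(name) for name in depends))


order = []


async def fake_process(name):
    # Hypothetical stand-in for installing and testing a module.
    await asyncio.sleep(0)
    order.append(name)


depends = {"B": ["A"], "A": [], "C": ["A", "B"]}
asyncio.run(run_all(depends, fake_process))
print(order)
```

Each module gets its own waiting task, just as each module gets its own whenever block, and modules with no unmet dependencies start right away.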
In some way, Blin didn’t change much in the way we do quality assurance for Rakudo. Toaster was already able to give us some basic info (albeit slower) so that we could start the investigation, and in the past I was known for shoving weird things (e.g. full modules with dependencies) into bisectable. It’s just that now it is much easier, and when The Day comes, I won’t be punished for robot abuse.
Future
Whateverable and Blin together have 243 open issues. Both projects work great and are very useful, but as we say, they are Less Than Awesome. Most issues are relatively easy and fun to work with, but they require time. If there’s anything you can help with or if you want to maintain these projects, please feel free to do so. And if you want to build your own tools based on Whateverable (which we probably need a lot!), see this hello world gist.
🎅🎄, 🥞