Day 11 — Perl 6 Core Hacking: It Slipped Through The QASTs

One of the great things about Perl 6 is how accessible to regular users the compiler is to hack on. Easy bugs require nothing but knowledge of Perl 6 itself, since a lot of the Rakudo compiler is written in Perl 6. Slightly tougher bugs require knowledge of NQP, and tougher still are bugs involving Grammar and Actions. Things progress further in difficulty from there, going as far as assembly hacking on VM level, but today, we’ll stick around in Rakudo land. We have a task at hand!

Santa is having some difficulties generating his Naughty-or-Nice list due to a bug in Perl 6. He traced it down to the use of the S/// substitution operator with :g modifier that for some reason returns an empty list instead of the original string if no matches were made:

say S:g/naughty// with 'only nice list';
# OUTPUT: ()

Time to dig in and fix this, if anyone is to get any presents this year!

The Bots To The Rescue

The first thing to do when fixing a bug is to find out when it first appeared. AlexDaniel++ and MasterDuke++ implemented several IRC bots that make this task extremely simple. They are available in #perl6 and #perl6-dev IRC channels, and you can play with them in #zofbot IRC channel, without annoying anyone. We’ll be using bisectable6 bot to find out when the S/// operator got broken:

<Zoffix> bisectable6, help
<bisectable6> Zoffix, Like this: bisectable6: old=2015.12 new=HEAD exit
    1 if (^∞).grep({ last })[5] // 0 == 4 # RT128181

Just give the bot a piece of code, optionally specifying the starting and ending commits, and it’ll bisect by either the exit code, or failing at that, by output.

<Zoffix> bisectable6, S:g/d// given 'abc'
<bisectable6> Zoffix, Bisecting by output (old=2016.10 new=524368c)
    because on both starting points the exit code is 0
<bisectable6> Zoffix, bisect log:
    https://gist.github .com/c2cf9c3a7b6d13a43c34f64b96090e31
<bisectable6> Zoffix, (2016-10-23)
    https://github  .com/rakudo/rakudo/commit/b7201a8f22338a906f2d8027a21387e8f5c77f41

The last link is the interesting bit, it tells us the S:g/// was working fine until that commit. The commit does seem related—it’s the refactor lizmat++ did to make .match 150%–1400% faster—but it’s quite big and it’s not obvious how it’s linked to the workings of the S/// operator. Let’s find out, shall we?

How Do You Spell That?

We can specify the --target command line argument to perl6 executable to ask it for the output of a particular stage of the program (run perl6 --statestats -e '' to see names of all stages). Let’s output the parse stage, to find out which tokens in the Grammar we should look into:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 --target=parse -e 'S:g/d//'
- statementlist: S:g/d//
  - statement: 1 matches
    - EXPR: S:g/d//
      - value: S:g/d//
        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g
          - sibble: /d//
            - right:
            - babble:
              - B:
            - left: d
              - termseq: d
                - termaltseq: d
                  - termconjseq: 1 matches
                    - termalt: 1 matches
                      - termconj: 1 matches
                        - termish: 1 matches
                          - noun: 1 matches
                            - atom: d

There are some general tokens in the output—such as statementlist, statement, EXPR, and value—we can just gloss over. They are about statements and we want stuff for the operator itself, so the interesting bit start with this:

        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g

Let’s pop open the Grammar in our text editor and locate a token called quote. It can also be a rule, regex or method, but tokens are most common. The first thing we can locate is this:

proto token quote { <...> }
token quote:sym<apos>  {
    :dba('single quotes') "'" ~ "'"    
    <nibble(self.quote_lang(%*LANG<Quote>, "'", "'", ['q']))>
}
token quote:sym<sapos> {
    :dba('curly single quotes') "‘" ~ "’"
    <nibble(self.quote_lang(%*LANG<Quote>, "", "", ['q']))>
}

The Grammar that parses Perl 6 isn’t much different from the grammar you’d use as a user of Perl 6, so most of it probably looks familiar to you. The quote token is a proto regex, so looking further at the output --target=parse gave us, we see we need :sym<S> variant of it.

Scrolling a bit through the quote‘s candidates, we finally come across :sym<s> that sets the <sym> capture to either s or S:

token quote:sym<s> {
    <sym=[Ss]> (s)**0..1
    :my %*RX;
    :my $*INTERPOLATE := 1;
    :my $*SUBST_LHS_BLOCK;
    :my $*SUBST_RHS_BLOCK;
    {
        %*RX<s> := 1 if $/[0]
    }
    <.qok($/)>
    <rx_adverbs>
    <sibble(%*RX<P5> ?? %*LANG<P5Regex> !! %*LANG<Regex>, %*LANG<Quote>, ['qq'])>
    [ <?{ $<sibble><infixish> }> || <.old_rx_mods>? ]

}

So this token handles both s/// and S/// operators and its body is unimpressive: it seems to be no more than than some set up work. With the name of the token in hand, we now know what to look for in the Actions: method quote:sym<s>.

While finding it in actions is easy… it’s quite a biggie, with 177 lines of code to its name. However, someone nice left us a comment that fits into our puzzle:

method quote:sym<s>($/) {
    # We are emulating Str.subst/subst-mutate here, by calling match,
    # assigning the result to a temporary variable etc.
    ...

Recall bisectable6‘s results? The commit it pointed out was work on the .match method and according to the comment for S/// operator, it uses .match to do its stuff. Let’s execute that method on builds before and after the commit bisectable6 found for us. There’s another handy bot to do that for us: commitable6.

Give it a commit SHA or one of the release tags along with code to run and it’ll give you output for that code on that commit:

<Zoffix> committable6, 2016.11 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: ()

<Zoffix> committable6, 2016.10 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

We ran the code on 2016.11 and 2016.10 releases and the output indicates there’s no difference… or is there? The problem with using say as a debugging tool is it often omits things we may find useful. A better alternative is the dd routine that is a Rakudo-specific utility dumper sub that’s not part of standard Perl 6 language. Give it some args and it’ll dump them out. Let’s give it a spin:

<Zoffix> committable6, 2016.11 dd 'abc'.match: :g, /d/
<comittable6> Zoffix, ¦«2016.11»: slip()

<Zoffix> committable6, 2016.10 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

Aha! Another puzzle piece! When :g adverb is in use, on failed matches .match used to return an empty list, but after lizmat++’s .match improvements, it started to return Empty, which is an empty Slip. Slips tend to flatten themselves out into the outer container, so perhaps that’s causing an issue in the S///‘s action method? Let’s take a closer look at it.

Slippety Slip

A bird’s-eye view of method quote:sym<s> action shows it does some setup work and then codegens a QAST (“Q” Abstract Syntax Tree). It’d be helpful to take a look at what it generates.

One method of doing so is using the same --target feature we’ve used to get the parse stage, except we’d use the ast stage. So the command would be this:

perl6 --target=ast -e 'S:g/d//'

If you actually run that, you’ll get a text wall of QAST, and it may be tough to spot which are the bits actually generated by the S/// operator. Luckily, there’s a better way! The QAST node objects have .dump method that dumps them the same style as what you see in --target=ast output. So checkout the compiler’s repo if you haven’t already done so, pop open src/Perl6/Actions.nqp file, go to the end of method quote:sym<s> and stick note($past.dump) in there to print the dump of the QAST generated for the S/// operator:

    ...
    );
    $past.annotate('is_S', $<sym> eq 'S');
    note($past.dump); # <----------------------- like that
    make WANTED($past, 's///');  # never carp about s/// in sink context
}

(Why is it called $past and not $qast? Historical reasons: QAST used to be PAST, for Parrot Abstract Syntax Tree).

Now, compile Rakudo:

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

And execute our buggy S/// match to make the line we added print out S///‘s QAST:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'S:g/d// given "abc"'
- QAST::Op(locallifetime)  :is_S<?> S:g/d//
  - QAST::Stmt
    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)
    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)
    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)
      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>
    - QAST::Stmt
    - QAST::Var(lexical $/)

There are docs for types of QAST you see here, or we can just wing it.

We callmethod match and bind the result to subst_result_1:

    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)

We call nqp::p6store (p6* ops are documented in Rakudo’s repo), giving it the result of infix:<,>($/) as container and the return of .match call as value:

    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)

We check if anything matched (for :g matches, we check for a Positional that has any .elems in it):

    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)

If we did have matches, call Str!APPLY-MATCHES:

      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)

If we didn’t have matches, call nqp::p6store, storing the $_ (this is our original string S/// works on) in the $/:

      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>

Since we know the commit bisectable6 found makes .match return an empty slip for failed matches, it’s that last bit of QAST that should look suspicious, since slips flatten themselves out. We’ll return to why we’re storing into an &infix:<,>($/) rather than into $/ directly, but first, let’s write the NQP equivalent of such a setup.

We have two variables: $/ with Empty and $_ with our original string. The QAST::Op node maps out to an nqp op with the same name, so our suspicious bit looks something like this:

use nqp;

$_ = 'abc';
$/ = Empty;
nqp::p6store( &infix:<,>($/), $_);

Yet another helpful bot, camelia, lets us run a piece of code straight from IRC. Just use trigger m: with some code. Let’s try it out:

<Zoffix> m: use nqp; $_ = 'abc'; $/ = Empty;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Slip $/ = slip$()␤»

<Zoffix> m: use nqp; $_ = 'abc'; $/ = List.new;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Str $/ = "abc"␤»

The results show that when $/ is an Empty, it ends up still being it after the p6store, while if $/ is an empty List, it happily takes a string. We finally connected the S/// operator with the commit that introduced the bug and found why it occurs (although, slips behaving like that may be a bug of its own). Let’s trace where that Empty in Str.match comes from and why it’s there.

What Sourcery Is This?

There’s another bot (it’s the future! people have lots of bots!), SourceBaby, that can give you a link to source code for a routine. It uses CoreHackers::Sourcery module under the hood and takes arguments to give to its sourcery routine. Trigger it with the s: trigger:

<Zoffix> s: 'abc', 'match', \(/d/, :g)
<SourceBaby> Zoffix, Sauce is at
    https://github.com/rakudo/rakudo/blob/164eb42/src/core/Str.pm#L946

We gave it an object to call a method on (a Str), a string with the method name (match), and a Capture with which arguments the method is to be called. In return, it gave a URL to the multi that handles those args:

multi method match(Regex:D $pattern, :global(:$g)!, *%_) {
    nqp::if(
      nqp::elems(nqp::getattr(%_,Map,'$!storage')),
      self!match-cursor(nqp::getlexcaller('$/'),
        $pattern($cursor-init(Cursor,self,:0c)), 'g', $g, %_),
      nqp::if(
        $g,
        self!match-list(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)),
          CURSOR-GLOBAL, POST-MATCH),
        self!match-one(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)))
      )
    )
}

No Empty here, but we can see that when $g is true, we call self!match-list. It’s a private method, so SourceBaby would not be able to help with it. Let’s find it by searching the same source file:

# Create list from the appropriate Sequence given the move
method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      Empty,
    ))
}

And there’s our Empty! The commit message doesn’t mention why we changed from an empty List to an Empty, there are no comments in the source code explaining it, so we’ll have to resort to the most technologically non-advanced debugging tool in our arsenal… asking people.

The Dev IRC Channel

If you have questions about core development, join #perl6-dev IRC channel on Freenode. In this case, we can ask lizmat++ if she remembers whether there was a reason for that Empty.

If the person you’re trying to reach isn’t currently online, you can use the messaging bot, using the .tell trigger, followed by the person’s nick, followed by message. When the bot sees the person talk, it will deliver the message.

<babydrop> .ask stmuk_ so is `zef` now the installer being
    shipped with R*? I notice our REPL message still
    references panda; wondering if that should read zef now
<yoleaux2> babydrop: I'll pass your message to stmuk_.

After the discussion about the Empty, there doesn’t appear to be any specific reason to return it in this case, so we’ll change it to return an empty List instead, just as its old behavior was, and that will also fix our bug. The new !match-list then looks like this:

method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      List.new,
    ))
}

Compile the compiler; this time we can just run make install, since everything is already configured and pre-built from the last time we compiled:

make install

Check the change did fix the bug:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'say S:g/naughty// with "only nice list"'
only nice list

And run the test suite:

TEST_JOBS=6 make spectest

The TEST_JOBS env var lets you run multiple test files at once and the optimal value to set it at is around 1.3 times the number of cores in your computer. If you have a very meaty box (or endless patience), you can run make stresstest instead, for a more thorough test run.

With the spectest passing all of it’s tests, we are ready to finish off our work.

Test It!

The test suite is located in t/spec and is automatically checked out from its repo when you run make spectest. You can also simply delete that directory and clone your own fork as t/spec instead.

Usually, it’s easy to locate the file where the test can go into by running tree -f | grep 'some search term'. We fixed an issue with substitutions, so let’s go for this:

zoffix@VirtualBox:~/CPANPRC/rakudo/t/spec$ tree -f | grep subst
│   ├── ./integration/substr-after-match-in-gather-in-for.t
├── ./S05-substitution
│   ├── ./S05-substitution/67222.t
│   ├── ./S05-substitution/match.t
│   ├── ./S05-substitution/subst.rakudo.moar
│   └── ./S05-substitution/subst.t
│   ├── ./S32-str/substr-eq.t
│   ├── ./S32-str/substr-rw.rakudo.moar
│   ├── ./S32-str/substr-rw.t
│   ├── ./S32-str/substr.t

The ./S05-substitution/subst.t file looks like a decent candidate, pop it open. Bump the plan at the top of file by the number of tests you’re adding, then add the test at the end of the file (or a more appropriate spot):

plan 185;

...

is-deeply (S:g/FAIL// with 'foo'), 'foo',
    'S:g/// returns original string on failure to match';

Run the test file, to ensure everything passes:

make t/spec/S05-substitution/subst.t

And commit! We’re done! Santa’s Naughty-or-Nice list shall work fine from now on.

The Final Mystery

Recall that &infix:<,>($/) thing that was causing the bug when $/ contained an Empty? So what is that all about?

If you don’t know something about Perl 6, just come to our #perl6 IRC channel and ask. This is what I did when I couldn’t understand the purpose of that infix thing and after a long discussion, finding old bug tickets, and testing old bugs… we came to the conclusion these are no longer needed here!

So along with a bug fix, we also cleaned up codegen. At least that’s in theory, perhaps by doing so we created another bug that will send us on yet another great hunting journey.

Conclusion

It’s easy to give a helping hand to the core developers of Perl 6 by fixing some of the bugs. Starting from easy things that require nothing more than knowledge of Perl 6, you can progressively learn more about the internals and fix tougher problems.

The perl6 compiler comes with a number of useful output methods like --target=ast and --target=parse that can aid in debugging. An army of IRC bots makes it easy to navigate source code both in space and time, by either giving you a link to an implementation or producing output of some particular commit.

Lastly, a very valuable resource we have available is the people of Perl 6, who can help you out. Whether you’re digging deep into the guts of the compiler, or just starting out with computer programming.

Join us. We have… bugs to fix!