Day 11 — Perl 6 Core Hacking: It Slipped Through The QASTs

One of the great things about Perl 6 is how accessible to regular users the compiler is to hack on. Easy bugs require nothing but knowledge of Perl 6 itself, since a lot of the Rakudo compiler is written in Perl 6. Slightly tougher bugs require knowledge of NQP, and tougher still are bugs involving Grammar and Actions. Things progress further in difficulty from there, going as far as assembly hacking on VM level, but today, we’ll stick around in Rakudo land. We have a task at hand!

Santa is having some difficulties generating his Naughty-or-Nice list due to a bug in Perl 6. He traced it down to the use of the S/// substitution operator with :g modifier that for some reason returns an empty list instead of the original string if no matches were made:

say S:g/naughty// with 'only nice list';
# OUTPUT: ()

Time to dig in and fix this, if anyone is to get any presents this year!

The Bots To The Rescue

The first thing to do when fixing a bug is to find out when it first appeared. AlexDaniel++ and MasterDuke++ implemented several IRC bots that make this task extremely simple. They are available in #perl6 and #perl6-dev IRC channels, and you can play with them in #zofbot IRC channel, without annoying anyone. We’ll be using bisectable6 bot to find out when the S/// operator got broken:

<Zoffix> bisectable6, help
<bisectable6> Zoffix, Like this: bisectable6: old=2015.12 new=HEAD exit
    1 if (^∞).grep({ last })[5] // 0 == 4 # RT128181

Just give the bot a piece of code, optionally specifying the starting and ending commits, and it’ll bisect by either the exit code, or failing at that, by output.

<Zoffix> bisectable6, S:g/d// given 'abc'
<bisectable6> Zoffix, Bisecting by output (old=2016.10 new=524368c)
    because on both starting points the exit code is 0
<bisectable6> Zoffix, bisect log:
    https://gist.github .com/c2cf9c3a7b6d13a43c34f64b96090e31
<bisectable6> Zoffix, (2016-10-23)
    https://github  .com/rakudo/rakudo/commit/b7201a8f22338a906f2d8027a21387e8f5c77f41

The last link is the interesting bit, it tells us the S:g/// was working fine until that commit. The commit does seem related—it’s the refactor lizmat++ did to make .match 150%–1400% faster—but it’s quite big and it’s not obvious how it’s linked to the workings of the S/// operator. Let’s find out, shall we?

How Do You Spell That?

We can specify the --target command line argument to perl6 executable to ask it for the output of a particular stage of the program (run perl6 --statestats -e '' to see names of all stages). Let’s output the parse stage, to find out which tokens in the Grammar we should look into:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 --target=parse -e 'S:g/d//'
- statementlist: S:g/d//
  - statement: 1 matches
    - EXPR: S:g/d//
      - value: S:g/d//
        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g
          - sibble: /d//
            - right:
            - babble:
              - B:
            - left: d
              - termseq: d
                - termaltseq: d
                  - termconjseq: 1 matches
                    - termalt: 1 matches
                      - termconj: 1 matches
                        - termish: 1 matches
                          - noun: 1 matches
                            - atom: d

There are some general tokens in the output—such as statementlist, statement, EXPR, and value—we can just gloss over. They are about statements and we want stuff for the operator itself, so the interesting bit start with this:

        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g

Let’s pop open the Grammar in our text editor and locate a token called quote. It can also be a rule, regex or method, but tokens are most common. The first thing we can locate is this:

proto token quote { <...> }
token quote:sym<apos>  {
    :dba('single quotes') "'" ~ "'"    
    <nibble(self.quote_lang(%*LANG<Quote>, "'", "'", ['q']))>
}
token quote:sym<sapos> {
    :dba('curly single quotes') "‘" ~ "’"
    <nibble(self.quote_lang(%*LANG<Quote>, "", "", ['q']))>
}

The Grammar that parses Perl 6 isn’t much different from the grammar you’d use as a user of Perl 6, so most of it probably looks familiar to you. The quote token is a proto regex, so looking further at the output --target=parse gave us, we see we need :sym<S> variant of it.

Scrolling a bit through the quote‘s candidates, we finally come across :sym<s> that sets the <sym> capture to either s or S:

token quote:sym<s> {
    <sym=[Ss]> (s)**0..1
    :my %*RX;
    :my $*INTERPOLATE := 1;
    :my $*SUBST_LHS_BLOCK;
    :my $*SUBST_RHS_BLOCK;
    {
        %*RX<s> := 1 if $/[0]
    }
    <.qok($/)>
    <rx_adverbs>
    <sibble(%*RX<P5> ?? %*LANG<P5Regex> !! %*LANG<Regex>, %*LANG<Quote>, ['qq'])>
    [ <?{ $<sibble><infixish> }> || <.old_rx_mods>? ]

}

So this token handles both s/// and S/// operators and its body is unimpressive: it seems to be no more than than some set up work. With the name of the token in hand, we now know what to look for in the Actions: method quote:sym<s>.

While finding it in actions is easy… it’s quite a biggie, with 177 lines of code to its name. However, someone nice left us a comment that fits into our puzzle:

method quote:sym<s>($/) {
    # We are emulating Str.subst/subst-mutate here, by calling match,
    # assigning the result to a temporary variable etc.
    ...

Recall bisectable6‘s results? The commit it pointed out was work on the .match method and according to the comment for S/// operator, it uses .match to do its stuff. Let’s execute that method on builds before and after the commit bisectable6 found for us. There’s another handy bot to do that for us: commitable6.

Give it a commit SHA or one of the release tags along with code to run and it’ll give you output for that code on that commit:

<Zoffix> committable6, 2016.11 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: ()

<Zoffix> committable6, 2016.10 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

We ran the code on 2016.11 and 2016.10 releases and the output indicates there’s no difference… or is there? The problem with using say as a debugging tool is it often omits things we may find useful. A better alternative is the dd routine that is a Rakudo-specific utility dumper sub that’s not part of standard Perl 6 language. Give it some args and it’ll dump them out. Let’s give it a spin:

<Zoffix> committable6, 2016.11 dd 'abc'.match: :g, /d/
<comittable6> Zoffix, ¦«2016.11»: slip()

<Zoffix> committable6, 2016.10 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

Aha! Another puzzle piece! When :g adverb is in use, on failed matches .match used to return an empty list, but after lizmat++’s .match improvements, it started to return Empty, which is an empty Slip. Slips tend to flatten themselves out into the outer container, so perhaps that’s causing an issue in the S///‘s action method? Let’s take a closer look at it.

Slippety Slip

A bird’s-eye view of method quote:sym<s> action shows it does some setup work and then codegens a QAST (“Q” Abstract Syntax Tree). It’d be helpful to take a look at what it generates.

One method of doing so is using the same --target feature we’ve used to get the parse stage, except we’d use the ast stage. So the command would be this:

perl6 --target=ast -e 'S:g/d//'

If you actually run that, you’ll get a text wall of QAST, and it may be tough to spot which are the bits actually generated by the S/// operator. Luckily, there’s a better way! The QAST node objects have .dump method that dumps them the same style as what you see in --target=ast output. So checkout the compiler’s repo if you haven’t already done so, pop open src/Perl6/Actions.nqp file, go to the end of method quote:sym<s> and stick note($past.dump) in there to print the dump of the QAST generated for the S/// operator:

    ...
    );
    $past.annotate('is_S', $<sym> eq 'S');
    note($past.dump); # <----------------------- like that
    make WANTED($past, 's///');  # never carp about s/// in sink context
}

(Why is it called $past and not $qast? Historical reasons: QAST used to be PAST, for Parrot Abstract Syntax Tree).

Now, compile Rakudo:

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

And execute our buggy S/// match to make the line we added print out S///‘s QAST:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'S:g/d// given "abc"'
- QAST::Op(locallifetime)  :is_S<?> S:g/d//
  - QAST::Stmt
    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)
    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)
    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)
      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>
    - QAST::Stmt
    - QAST::Var(lexical $/)

There are docs for types of QAST you see here, or we can just wing it.

We callmethod match and bind the result to subst_result_1:

    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)

We call nqp::p6store (p6* ops are documented in Rakudo’s repo), giving it the result of infix:<,>($/) as container and the return of .match call as value:

    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)

We check if anything matched (for :g matches, we check for a Positional that has any .elems in it):

    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)

If we did have matches, call Str!APPLY-MATCHES:

      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)

If we didn’t have matches, call nqp::p6store, storing the $_ (this is our original string S/// works on) in the $/:

      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>

Since we know the commit bisectable6 found makes .match return an empty slip for failed matches, it’s that last bit of QAST that should look suspicious, since slips flatten themselves out. We’ll return to why we’re storing into an &infix:<,>($/) rather than into $/ directly, but first, let’s write the NQP equivalent of such a setup.

We have two variables: $/ with Empty and $_ with our original string. The QAST::Op node maps out to an nqp op with the same name, so our suspicious bit looks something like this:

use nqp;

$_ = 'abc';
$/ = Empty;
nqp::p6store( &infix:<,>($/), $_);

Yet another helpful bot, camelia, lets us run a piece of code straight from IRC. Just use trigger m: with some code. Let’s try it out:

<Zoffix> m: use nqp; $_ = 'abc'; $/ = Empty;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Slip $/ = slip$()␤»

<Zoffix> m: use nqp; $_ = 'abc'; $/ = List.new;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Str $/ = "abc"␤»

The results show that when $/ is an Empty, it ends up still being it after the p6store, while if $/ is an empty List, it happily takes a string. We finally connected the S/// operator with the commit that introduced the bug and found why it occurs (although, slips behaving like that may be a bug of its own). Let’s trace where that Empty in Str.match comes from and why it’s there.

What Sourcery Is This?

There’s another bot (it’s the future! people have lots of bots!), SourceBaby, that can give you a link to source code for a routine. It uses CoreHackers::Sourcery module under the hood and takes arguments to give to its sourcery routine. Trigger it with the s: trigger:

<Zoffix> s: 'abc', 'match', \(/d/, :g)
<SourceBaby> Zoffix, Sauce is at
    https://github.com/rakudo/rakudo/blob/164eb42/src/core/Str.pm#L946

We gave it an object to call a method on (a Str), a string with the method name (match), and a Capture with which arguments the method is to be called. In return, it gave a URL to the multi that handles those args:

multi method match(Regex:D $pattern, :global(:$g)!, *%_) {
    nqp::if(
      nqp::elems(nqp::getattr(%_,Map,'$!storage')),
      self!match-cursor(nqp::getlexcaller('$/'),
        $pattern($cursor-init(Cursor,self,:0c)), 'g', $g, %_),
      nqp::if(
        $g,
        self!match-list(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)),
          CURSOR-GLOBAL, POST-MATCH),
        self!match-one(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)))
      )
    )
}

No Empty here, but we can see that when $g is true, we call self!match-list. It’s a private method, so SourceBaby would not be able to help with it. Let’s find it by searching the same source file:

# Create list from the appropriate Sequence given the move
method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      Empty,
    ))
}

And there’s our Empty! The commit message doesn’t mention why we changed from an empty List to an Empty, there are no comments in the source code explaining it, so we’ll have to resort to the most technologically non-advanced debugging tool in our arsenal… asking people.

The Dev IRC Channel

If you have questions about core development, join #perl6-dev IRC channel on Freenode. In this case, we can ask lizmat++ if she remembers whether there was a reason for that Empty.

If the person you’re trying to reach isn’t currently online, you can use the messaging bot, using the .tell trigger, followed by the person’s nick, followed by message. When the bot sees the person talk, it will deliver the message.

<babydrop> .ask stmuk_ so is `zef` now the installer being
    shipped with R*? I notice our REPL message still
    references panda; wondering if that should read zef now
<yoleaux2> babydrop: I'll pass your message to stmuk_.

After the discussion about the Empty, there doesn’t appear to be any specific reason to return it in this case, so we’ll change it to return an empty List instead, just as its old behavior was, and that will also fix our bug. The new !match-list then looks like this:

method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      List.new,
    ))
}

Compile the compiler; this time we can just run make install, since everything is already configured and pre-built from the last time we compiled:

make install

Check the change did fix the bug:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'say S:g/naughty// with "only nice list"'
only nice list

And run the test suite:

TEST_JOBS=6 make spectest

The TEST_JOBS env var lets you run multiple test files at once and the optimal value to set it at is around 1.3 times the number of cores in your computer. If you have a very meaty box (or endless patience), you can run make stresstest instead, for a more thorough test run.

With the spectest passing all of it’s tests, we are ready to finish off our work.

Test It!

The test suite is located in t/spec and is automatically checked out from its repo when you run make spectest. You can also simply delete that directory and clone your own fork as t/spec instead.

Usually, it’s easy to locate the file where the test can go into by running tree -f | grep 'some search term'. We fixed an issue with substitutions, so let’s go for this:

zoffix@VirtualBox:~/CPANPRC/rakudo/t/spec$ tree -f | grep subst
│   ├── ./integration/substr-after-match-in-gather-in-for.t
├── ./S05-substitution
│   ├── ./S05-substitution/67222.t
│   ├── ./S05-substitution/match.t
│   ├── ./S05-substitution/subst.rakudo.moar
│   └── ./S05-substitution/subst.t
│   ├── ./S32-str/substr-eq.t
│   ├── ./S32-str/substr-rw.rakudo.moar
│   ├── ./S32-str/substr-rw.t
│   ├── ./S32-str/substr.t

The ./S05-substitution/subst.t file looks like a decent candidate, pop it open. Bump the plan at the top of file by the number of tests you’re adding, then add the test at the end of the file (or a more appropriate spot):

plan 185;

...

is-deeply (S:g/FAIL// with 'foo'), 'foo',
    'S:g/// returns original string on failure to match';

Run the test file, to ensure everything passes:

make t/spec/S05-substitution/subst.t

And commit! We’re done! Santa’s Naughty-or-Nice list shall work fine from now on.

The Final Mystery

Recall that &infix:<,>($/) thing that was causing the bug when $/ contained an Empty? So what is that all about?

If you don’t know something about Perl 6, just come to our #perl6 IRC channel and ask. This is what I did when I couldn’t understand the purpose of that infix thing and after a long discussion, finding old bug tickets, and testing old bugs… we came to the conclusion these are no longer needed here!

So along with a bug fix, we also cleaned up codegen. At least that’s in theory, perhaps by doing so we created another bug that will send us on yet another great hunting journey.

Conclusion

It’s easy to give a helping hand to the core developers of Perl 6 by fixing some of the bugs. Starting from easy things that require nothing more than knowledge of Perl 6, you can progressively learn more about the internals and fix tougher problems.

The perl6 compiler comes with a number of useful output methods like --target=ast and --target=parse that can aid in debugging. An army of IRC bots makes it easy to navigate source code both in space and time, by either giving you a link to an implementation or producing output of some particular commit.

Lastly, a very valuable resource we have available is the people of Perl 6, who can help you out. Whether you’re digging deep into the guts of the compiler, or just starting out with computer programming.

Join us. We have… bugs to fix!

Day 10 — Let’s learn and try double-ended priority queue with Perl 6

Hello! My name is Itsuki Toyota. I’m a web developer in Japan.

In this post, let me introduce Algorithm::MinMaxHeap.

Introduction

Algorithm::MinMaxHeap is a Perl 6 implementation of double-ended priority queue (i.e. min-max heap).1

Fig.1 shows an example of this data structure:

minmaxheap

Fig.1

This data structure has an interesting property, that is, it has both max-heap and min-heap in a single tree.

More precisely, the min-max ordering is maintained in a single tree as the below description:

  • values stored at nodes on even levels (i.e. min levels) are smaller than or equal to the values stored at their descendants (if any)
  • values stored at nodes on odd levels (i.e. max levels) are greater than or equal to the values stored at their descendants (if any)

Algorithm::MinMaxHeap Internals

I’ll shortly explain what this double-ended priority queue does internally when some frequently used methods (e.g. insert, pop) are called on.2
They maintain the min-max ordering by the following operations:

insert($item)

1. Pushes the given item at the first available leaf position.
2. Checks whether it maintains the min-max ordering by traversing the nodes from the leaf to the root. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

pop-max

1. Extracts the maximum value node.
2. Fills the vacant position with the last node of the queue.
3. Checks whether it maintains the min-max ordering by traversing the nodes from the filled position to the leaf. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

pop-min

1. Extracts the minimum value node.
2. Fills the vacant position with the last node of the queue.
3. Checks whether it maintains the min-max ordering by traversing the nodes from the filled position to the leaf. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

Let’s try !

Then let me illustrate how to use Algorithm::MinMaxHeap.

Example 1

The first example is the simplest one.

Source:

use Algorithm::MinMaxHeap;

my $queue = Algorithm::MinMaxHeap[Int].new; # (#1)
my Int @items = (^10).pick(*);
@items.say; # [1 2 6 4 5 9 0 3 7 8]
$queue.insert($_) for @items; # (#2)
$queue.pop-max.say; # 9 (#3)

In this example, Algorithm::MinMaxHeap[Int].new creates a queue object, where all of its nodes are type Int. (#1)

$queue.insert($_) for @items inserts the numbers in the list. (#2)

$queue.pop-max pops a node which stores the maximum value. (#3)

Example 2

The second example defines user-defined type constraints using subset:

Source (chimera.p6):

use Algorithm::MinMaxHeap;

my subset ChimeraRat of Cool where Num|Rat; # (#1)
my $queue = Algorithm::MinMaxHeap[ChimeraRat].new;

$queue.insert(10e0); # ok
$queue.insert(1/10); # ok
$queue.insert(10); # die # (#2)

Output:

$ perl6 chimera.p6
Type check failed in assignment to @!nodes; expected ChimeraRat but got Int (10)
in method insert at /home/itoyota/.rakudobrew/moar-nom/install/share/perl6/site/sources/240192C19BBAACD01AB9686EE53F67BC530F8545 (Algorithm::MinMaxHeap) line 12
in block  at chimera.p6 line 8

In this example, subset ChimeraRat is a Cool, but only allows Num or Rat. (#1)

Therefore, when you insert 10 to the queue, it returns the error message and die.

It’s because 10 is a Int object. (#2)

Example 3

The third example inserts user-defined classes to the queue.

Source (class.p6):

use Algorithm::MinMaxHeap;

my class State {
 also does Algorithm::MinMaxHeap::Comparable[State]; # (#1) 
 has DateTime $.time;
 has $.payload;
 submethod BUILD(:$!time, :$!payload) { }
 method compare-to(State $s) { # (#2) 
   if $!time == $s.time {
     return Order::Same;
   }
   if $!time > $s.time {
     return Order::More;
   }
   if $!time < $s.time {
     return Order::Less;
   }
 }
}

my @items;

@items.push(State.new(time => DateTime.new(:year(1900), :month(6)),
 payload => "Rola"));
@items.push(State.new(time => DateTime.new(:year(-300), :month(3)),
 payload => "Taro"));
@items.push(State.new(time => DateTime.new(:year(1963), :month(12)),
 payload => "Hanako"));
@items.push(State.new(time => DateTime.new(:year(2020), :month(6)),
 payload => "Jack"));

my Algorithm::MinMaxHeap[Algorithm::MinMaxHeap::Comparable] $queue .= new;
$queue.insert($_) for @items;
$queue.pop-max.say until $queue.is-empty;

Output:

 $ perl6 class.p6
 State.new(time => DateTime.new(2020,6,1,0,0,0), payload => "Jack")
 State.new(time => DateTime.new(1963,12,1,0,0,0), payload => "Hanako")
 State.new(time => DateTime.new(1900,6,1,0,0,0), payload => "Rola")
 State.new(time => DateTime.new(-300,3,1,0,0,0), payload => "Taro")

In this example, the class State does the role Algorithm::MinMaxHeap::Comparable[State], where Algorithm::MinMaxHeap::Comparable[State] defines the stub method compare-to as:
method compare-to(State) returns Order:D { ... };
(#1)

Therefore, in this case, the State class must define a method compare-to so that it accepts a State object and returns an Order:D object. (#2)

Footnote

  • 1 Atkinson, Michael D., et al. “Min-max heaps and generalized priority queues.” Communications of the ACM 29.10 (1986): 996-1000.
  • 2 For more exact information, please read the above paper.

Thank you for reading my post !

Day 9 – A preview of the ‘hackable’ JIT compiler

Among programming languages Perl 6 has an unfortunate but historically well-deserved reputation for being slow, even compared to languages such as Perl 5, Ruby or Python. This is not really an accident, however. From the outset, perl6 was designed to be extensible by the programmer in many different ways, including defining custom operators, dynamic containers for variables, and a powerful meta-object protocol that allow you to modify the behaviour of objects and classes from regular Perl code. In the future, we might well add macros to that list. To this are added a rich set of control structures and phasers. Such features are useful, powerful and fun.

But such features also make it relatively difficult to execute perl6 efficiently, because they introduce overhead and polymorphism. Even before the language was released last year, work had started to improve the efficiency of Rakudo Perl 6. Because Rakudo is complex and layered, this works involves many aspects and many individuals, from core routine efficiency improvements to precompilation of modules. MoarVM also tries to specialize ‘hot’ code for specific object types at runtime, which reduces the polymorphism of that code, and allows simpler operations to be substituted for complex operations. For example, object attribute access can in many cases be reduced to a simple memory read instruction.

Since late summer 2014, MoarVM has contained a JIT compiler that I developed. At its core, JIT compilation reduces the overhead from interpretation. That is to say, rather than fetch new instructions from a bytecode stream and dispatch to execute them, a JIT compiler generates the code to execute these instructions in a series. Ideally, this allows the executed code to ‘flow’ directly through the program without interruption. In practice, there is still the machinery of the VM, for example for garbage collection, to take into account. Also, note that in many VM implementations, the JIT compiler and the type specialization code are combined, whereas in MoarVM they are distinct components.

Since summer 2015 (yes, that long) I’ve been working on an improved backend for that compiler, which was funded by the Perl foundation. One of the goals for the development of that backend (which I misleadingly call the ‘expression JIT compiler’) was to enable extension points for specialized object types and instructions. These extension points are designed to enable a greater range of type specialization and to improve code generation gradually. The best part, I think, is that they should require relatively little knowledge of compiler internals or assembly language. For this advent calendar post, I’d like to demonstrate those extension points.

First, consider the following innocent-looking perl6 code.

sub foo(@a) {
    @a[0] = @a.elems;
}

In case this (obviously critical) `foo` subroutine is invoked hundreds of times, MoarVM will start to optimize it, first by recording what @a really is. In most cases, that will be an instance of the Array class. If it always the same, and supposing the elems method is sufficiently small (which it is), that method call will be inlined into the foo routine. The assignment of @a[0] is somewhat more involved than it looks and what it would be like in a language like Java. In Perl 6, an array holds a sequence of containers, and it is the container to which the value is assigned. So the primitive sequence of operations in ‘foo’ after MoarVM has optimized it is similar to:

my $i := nqp::elems(@a);
my $c := nqp::atpos_o(@a, 0);
nqp::assign($c, $i);

The ‘nqp’ namespace signifies that these are primitive operations in the interpreter. However, they are still polymorphic, because arrays can be represented by multiple different stores. For example, arrays that are used by NativeCall foreign function interface use a different representation (for compatibility reasons) than the arrays natively used by the VM (which are called MVMArray internally). Thus, these operations are polymorphic, and MoarVM needs a dispatch table to find the correct implementation.

If the exact implementation type of @a is known, then this polymorphism is unnecessary. In principle, the type specializer could insert a specialized non-polymorphic instruction for each of those operations for the types found during the recording phase. That is not very practical though as there are many different types and each type would have to support many different operations.

However, with the expression compiler it will be relatively easy to add specialized code for a specific implementation. See for instance the following (somewhat simplified) implementations of the ‘elems’ and ‘atpos_o’ MoarVM instructions, which would work on MVMArray objects:

(template: elems (^getf $1 MVMArray body.elems))

(template: atpos_o 
  (load (index (^getf $1 MVMArray body.slots.o) $2 8)
        (&SIZEOF MVMObject*)))

Such ‘expression templates’ are textual representations of the templates used in the compiler to convert relatively high-level MoarVM bytecode to low-level code which is easier to compile and optimize. Although the code above might look complex, it is actually pretty straightforward:

  • Expressions are grouped by parentheses and can contain subexpression, exactly as in LISP.
  • The ‘template:’ keyword declares a template, and ‘elems’ declares that this template represents the ‘elems’ MoarVM instruction.
  • Numbers with a ‘$’ sigil represent the (input) operands to the MoarVM instruction. In case of elems, the 1st operand is the MVMArray object.
  • ‘load’ and ‘index’ are simple expression nodes, which represent a particular low-level operation.
  • ‘^getf’ is a template macro, which expands to a more complex expression of ‘load’ and ‘addr’ nodes which compute the address of the ‘body.elems’ field in the MVMArray structure and load a value from it.
  • &SIZEOF is an expression that is translated – prior to inclusion by MoarVM – into a C sizeof expression. With the ‘&’ syntax, any C macro can be referenced, and with that any compile-time expression may be evaluated.

Such templates can be added without any knowledge of assembler whatsoever, although some knowledge of C structure and VM internals are probably preferable. I hope that by making this simple enough I can convince others to share the task of JIT compiler development with me :-).

For those who are feeling more adventurous yet, it is also simple to hook into the data-driven code generator known as the tiler. The tiler picks the CPU instructions to implement the low-level code generated from expression templates. Each CPU instruction is represented by a ’tile’, which ‘covers’ part of the expression graph. The tiler tries to pick the optimal instruction so that the entire code graph is covered as cheaply as possible. As the x86 instruction set is very large, it is nearly impossible for me to write tiles for every possible instruction. Adding a tile is not very hard, though. Suppose for instance that we’d be writing a program to compute the XOR of all values in the array:

my $x = 0;
for @a -> $i {
    $x +^= $i;
}

In reality, there are still multiple barriers to implementing this as a tight loop, but suppose that we have taken them down. (We can, it is a matter of time and energy). Then you might find that the sequence of INDEX, LOAD and XOR operations were be inefficient, and you could optimize that into a single instruction. You can declare a tile for that instruction as follows:

(tile: xor_load_idx 
   (xor reg (load (index reg reg $stride) $size)) reg)
  • This uses the same syntax as the expression templates, which is convenient for parsing, but also makes the correspondence between templates and tiles clear
  • The ’tile:’ keyword now declares a tile, and ‘xor_load_idx’ the name of the tile implementation.
  • The words with a $sigil now specify constants. The ‘xor’, ‘load’, and ‘index’ words specify expression nodes (operations).
  • ‘reg’ specifies that this tile consumes values in registers (in the first expression) and yields a value in a register (as the last word in the expression).

The tile implementation would look as follows:

MVM_JIT_TILE_DECL(xor_load_idx) {
    MVMint8 out  = tile->values[0];
    MVMint8 base = tile->values[1];
    MVMint8 idx  = tile->values[2];
    /* this only works on 8-byte sized indexes (for now) */
    ASSERT(tile->args[0] == 8 && tile->args[1] == 8);
    | xor Rq(out), qword [Rq(base)+Rq(idx)*8];
}

The `MVM_JIT_TILE_DECL` macro declares the tile function with all required parameters. The ’tile’ parameter which is declared and passed automatically contains operational details such as the registers used (in ’tile->values’) and constant parameters (in ’tile->args’). The function of the assembly-code fragment that follows the ‘|’ symbol is to declare a piece of machine code that is to be assembled at runtime. For this we use dynasm with a few patches to support picking ‘dynamic’ registers on x86-64. The result of this addition would be that the aforementioned sequence of instructions would be compiled to a single one and your (hypothetical) program would run a few percent faster. Although small, such improvements can add up.

Unfortunately, all of this is not done yet. Although in a way the main body of the compiler exists, some essential pieces (especially the register allocator) need further development, and there are quite a few bugs and oversights that need to be fixed, too. So unfortunately you can’t play with this – yet. However, I do hope that I’ve given you a taste of what you can do when it is finished, which I hope to be soon. And with that, I wish you a happy holiday.

Day 8 — How to Make, Use, and Abuse Perl 6 Subsets

Ever reached for a Perl 6 type and found it good enough, but not perfect? Perhaps, you wanted an IntEven, StrPalindrome, or YourCustomClassWhereAttrFooIsBar. Never fear, Perl 6’s subsets are here!

What Are Subsets?

You can think of them as a refinement on some type and you can use them in most places where you’d use a type, such as in type constraints. To create one, use the subset keyword, along with a where keyword specifying your refinement:

    subset Even where * %% 2;
    say 1 ~~ Even; # False
    say 2 ~~ Even; # True

The WhateverStar * is the value being checked and the %% operator checks if that value is evenly divisible by 2. We can now use our subset on the right side of a smartmatch operator to check whether a value is an even number! Pretty awesome. What else can we do with it?

How about type-constraining a variable:

    my Even $x = 42; # all good

    $x = 43; # and this?
    # Type check failed in assignment to $x; expected Even but got Int (43)
    #   in block <unit> at script.p6 line 3

Or type-constraining input and output of a routine:

    sub takes-an-even-only (Even $x) { $x² }
    sub returns-an-even-only returns Even { $^x² }

    say takes-an-even-only   42; # 1764
    say returns-an-even-only 42; # 1764

    say takes-an-even-only   43;
    # Constraint type check failed for parameter '$x'
    #   in sub takes-an-even-only at script.p6 line 2
    #   in block <unit> at script.p6 line 8

    say returns-an-even-only 43;
    # Type check failed for return value; expected Even but got Int (1849)
    #   in sub returns-an-even-only at script.p6 line 3
    #   in block <unit> at script.p6 line 13

That’s all pretty sweet, but our Even accepts strings and other weird stuff:

    say '42.0000' ~~ Even; # True
    say class { method Real { 42 } }.new ~~ Even; # True

There’s a reason for that: we never specified what type we’re making a subset of, so by default, it used Any. Let’s fix that!

Getting Typical

If you want to create a subset based on some type, specify that type with the of keyword:

    subset IntEven of Int where * %% 2;

Now, before the where refinement even runs, the value we’re checking against must first pass the Int type constraint:

    say 42 ~~ IntEven; # True
    say 43 ~~ IntEven; # False
    say '42.0000' ~~ IntEven; # False
    say class { method Real { 42 } }.new ~~ IntEven; # False

We’re not limited to numerics! Let’s try a Str:

    subset StrPalindrome of Str where {
        .flip eq $_ given .comb(/\w+/).join.lc;
    }

    say ’Madam, I'm Adam.~~ StrPalindrome; # True
    say '1 on 1'  ~~ StrPalindrome; # False

We’re now using a more complex refinement in the where clause, using a code block. Just like the WhateverCode version with the *, the block receives the value to check as its argument, which it aliases to $_ topical variable. The code block tells us whether the argument is a palindrome, by returning a truthy or falsy value.

So how far can we go with the type constraints in our subsets?

Custom Made

We can type-constrain a subset using any class we have lying around! How about this one:

    class Awesome { has $.awesomeness-level }
    my $obj1 = Awesome.new: :10000awesomeness-level;
    my $obj2 = Awesome.new: :31337awesomeness-level;

We make a class called Awesome that has a public attribute called awesomeness-level. We also create two instances of that class, setting the awesomeness-level to 10000 in $obj1 and to 31337 in $obj1. So how about a subset that checks whether awesomeness-level is a prime number? It’s just a single line of code:

    subset AwesomePrime of Awesome where .awesomeness-level.is-prime;

    say $obj1 ~~ AwesomePrime; # False
    say $obj2 ~~ AwesomePrime; # True

The where block of the subset is “thunked,” which means the compiler takes an expression and turns it into a code block for us, so we don’t have to explicitly use a codeblock here, nor do we need a WhateverStar. The value being checked is in the $_ topical variable, which is what method calls use when you don’t specify what you’re calling them on. Thus, our subset expects a thing of type Awesome and then checks whether its awesomeness-level attribute is a prime number!

By using such subsets, you can create routines that only accept your custom objecs of a specific configuration. For example, in code of an IRC::Client bot, we can create a subset of IRC::Client::Message for messages that are received from bot admins, and then register events only for such messages:

    subset BotAdmin of IRC::Client::Message where .host eq conf<bot-admins>.any;

    multi method irc-to-me (BotAdmin $e where /:i ^ 'run' $<steps>=.+ $/ ) {
        ...
    }

The subset calls the subroutine that reads configuration and provides a list of bot admin hosts against which the host of the sender of the message is checked. We’re encapsulating the logic that checks we have an acceptable object to work with, and we’re able to call that logic even before we enter our method.

So if we can do that with subsets… is there anything we can’t do?

Time for Some Heavy Abuse!

Let’s do something crazy! A subroutine that fetches a link to a website and checks whether it contains a mention of Perl 6:

    use LWP::Simple;
    sub is-perl-site { LWP::Simple.get( $^website ).contains: 'Perl 6' }

There’s nothing crazy about that, you say? Then, how about we use that subroutine as a refiner in a subset where clause:

    subset PerlWebsite where &is-perl-site;

    say 'http://perl6.party' ~~ PerlWebsite; # True
    say 'http://lolcats.com' ~~ PerlWebsite; # False

In fact, we can make a routine that only accepts URLs to websites mentioning Perl 6:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 8
    #   in block <unit> at script.p6 line 15

But do you notice something off? The error message is rather poor… Let’s improve it!

What we know so far is the where clause takes some code to run and if that code’s result is falsy, the typecheck will fail. That means inside the where we can know whether or not the typecheck will fail before we even return from it. Let’s put that to use:

    sub is-perl-site {
        given LWP::Simple.get( $^website ).contains: 'Perl 6' {
            when :so  { True }
            when :not {
                say ’This ain't no website containing "Perl 6"!‘;
                False;
            }
        }
    }

    subset PerlWebsite where &is-perl-site;

In the routine that checks a website mentions Perl 6, in the case when it does :not contain a mention of Perl 6, we say a helpful message, indicating what exactly was wrong. Let’s run this:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    # This ain't no website containing "Perl 6"!
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 16
    #   in block <unit> at script.p6 line 23

Whoa! The message printed twice. What gives?

It’s actually expected that the refinement in subsets is an inexpensive and relatively simple operation… With that expectation in mind, the parameter binder—which doesn’t know how to generate errors—simply passes its stuff through the slower code path—which does—and it’s that slower code path that runs the where code the second time, triggering our message one more time.

So yes, doing overly complex stuff in subsets is abusive. However, you can throw an exception inside the where to avoid the repetition of the message:

    sub is-perl-site {
        LWP::Simple.get( $^website ).contains: 'Perl 6'
            or die ’This ain't no website containing "Perl 6"!‘;
    }

    ...

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    #   in sub is-perl-site at z.p6 line 4
    #   in any accepts_type at gen/moar/m-Metamodel.nqp line 3472
    #   in sub ain't-taking-non-perl-stuff at z.p6 line 11
    #   in block <unit> at z.p6 line 18

And don’t forget to check out Subset::Helper and Subset::Common modules.

What About a Light Spanking?

There is one type of abuse cheating with subsets that can get you out of a bind: fiddling with narrowness when it comes to resolution of multi candidates.

For example, let’s say you hate humanity and you wish to change the meaning of the infix + operator on two Ints to do subtraction instead of addition. You, of course, write this:

    multi sub infix:<+> (Int $a, Int $b) { $a$b }

But as you run a sample code, over the sound of your evil laughter…

    Ambiguous call to 'infix:<+>'; these signatures all match:
    :(Int:D \a, Int:D \b --> Int:D)
    :(Int $a, Int $b)
      in block <unit> at z.p6 line 4

… the complier errors out.

You see, core language already has an infix + operator that takes two Ints! When you add one of your own, you create an ambiguity. To resolve this issue, we need to somehow create an Int that the compiler thinks is narrower than an Int, but in reality isn’t. Sounds tough? Not an issue for subsets:

    subset NarrowInt of Int where {True};
    multi sub infix:<+> (NarrowInt $a, NarrowInt $b) { $a$b }

    say 42 + 2; # 40

It worked! We created a subset of Int, so we match all of Ints, just like we wanted. In the refinement, however, we specify a single code block that always returns True, making that refinement always succeed, and making our subset accept all the values a regular Int accepts, while being narrower than a regular Int as far as multi resolution goes.

If you’re wondering why we had to use an explicit block, it’s because the where smartmatches, and the smartmatch against a True produces a warning, because it’s always true, and while that is what we want here, in most code such a construct is a mistake.

But if you’re upset about writing one-too-many characters, here’s neat trick:

    multi sub infix:<+> (Int $a where {True}, Int $b where {True}) { $a - $b }
    say 42 + 2;

You don’t need to create an explicit subset, and can stick a where clause right onto the thing you’re working with to refine just that thing. The type constraint on it will function as the of ... of a subset.

You can also type constraint a variable with a subset and still add a where clause. Or create a subset of a subset of a subset and still add… well, we’re getting carried away.

When they stop calling…

Consider this piece of wonderful code:

    class Thingie {
        multi method stuff ($ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Just foos...

You have a class with a multi method. Along with it, you have a subclass of it with another multi method of the same name. Both have a where clause and when you call the method with input that can match either multi, the subclass’s multi gets called. But what do you do if you want to reverse that… you want the parent class’s multi to be called, if both multies matches the input.

The first solution is very simple. Just add a type constraint (we’ll use Str) in the parent class, while leaving it off in the child:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The presence of a type constraint on the method in the parent class makes it narrower than the one in the subclass, so even though the subclass’s method can also accept the input, it’s the parent class that gets to take care of it.

However, what if we wanted the same Str type constraint on both methods? The parent class we’ll leave as is: just a normal Str type constraint. In the kid, however, we’ll use a wider subset of Any (that’s the default if you don’t specify the of, remember?), but in its where clause we’ll smartmatch against Str, to ensure the subset accepts only Strs:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        subset WiderStr where { $_ ~~ Str };
        multi method stuff (WiderStr $ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The result is the opposite of a cheat we made in the previous section: instead of a subset that matches a type exactly, but is narrower than it, we now created a subset that matches a type exactly, but is wider than it, as far as multi candidate resolution goes. And yes, you can just merge the two where clauses instead of creating a subset, producing:

    multi method stuff ($ where { $_ ~~ Str and $_ ~~ /foos/ }) {
        say "Just foos...";
    }

It’ll work the same.

Conclusion

Subsets are a powerful feature that lets you specify refinements on existing core and custom types. You can smartmatch against a subset to perform a check on a value, or you can use subsets to type-contraint variables, parameters, and return values.

You can use the subset keyword to create a named subset, or you can attach a refinement onto a specific variable or parameter with a where clause. Subsets can also be used to effect alternative narrowness of a parameter, to affect multi candidate resolution order.

Subsets can also be abused to perform very complex operations, but… that’s probably a bad idea.

-Ofun

Day 7 — Set In Your Ways: Perl 6’s Setty and Baggy Types

There’s a relatively common pattern I see with people writing code that counts… say, DNA bases in a string:

my %counts;
%counts{$_}++ for 'AGTCAGTCAGTCTTTCCCAAAAT'.comb;
say %counts<A T G C>; # (7 7 3 6)

Make a Hash. For each thing you want to count, ++ that key in that Hash. So what’s the problem?

Perl 6 actually has specialized types that are more appropriate for this operation; for example, the Bag:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

Let’s talk about these types and all the fancy operators that come with them!

A Note on Unicode

I’ll be using fancy-pants Unicode versions of operators and symbols in this post, because they look purty. However, all of them have what we call “Texas” equivalents you can use instead.

Ready. Set. Go.

The simplest of these types is a Set. It will keep exactly one of each item, so if you have multiple objects that are the same, the duplicates will be discarded:

say set 1, 2, 2, "foo", "a", "a", "a", "a", "b";
# OUTPUT: set(a, foo, b, 1, 2)

As you can see, the result has only one a and only one 2. We can use the , U+2208 ELEMENT OF, set membership operator to check if an item is in a set:

my $mah-peeps = set <babydrop iBakeCake Zoffix viki>;
say 'Weeee \o/' if 'Zoffix'  $mah-peeps;
# OUTPUT: Weeee \o/

The set operators are coercive, so we don’t need to explicitly create a set; they’ll do it for us:

say 'Weeee \o/' if 'Zoffix'  <babydrop iBakeCake Zoffix viki>;
# OUTPUT: Weeee \o/

But pay attention when using allomorphs:

say 'Weeee \o/' if 42  <1 42 72>;
# No output

say 'Weeee \o/' if 42  +«<1 42 72>; # coerce allomorphs to Numeric
# OUTPUT: Weeee \o/

The angle brackets create allomorphs for numerics, so in the first case above, our set contains a bunch of IntStr objects, while the left hand side of the operator has a regular Int, and so the comparison fails. In the second case, we coerce allomorphs to their numeric component with a hyper operator and the test succeeds.

While testing membership is super exciting, we can do more with our sets! How about some intersections?

my $admins = set <Zoffix mst [Coke] lizmat>;
my $live-in-North-America = set <Zoffix [Coke] TimToady huggable>;

my $North-American-admins = $admins  $live-in-North-America;
say $North-American-admins;
# OUTPUT: set(Zoffix, [Coke])

We intersected two sets with the , U+2229 INTERSECTION, intersection operator and received a set that contains only the elements present in both original sets. You can chain these operations too, so membership will be checked in all of the provided sets in the chain:

say <Zoffix lizmat>  <huggable Zoffix>  <TimToady huggable Zoffix>;
# OUTPUT: set(Zoffix)

Another handy operator is the set difference operator, whose Unicode look I find somewhat annoying: No, it’s not a backslash (\), but a U+2216 SET MINUS character (luckily, you can use the much more obvious (-) Texas version).

The usefulness of the operator compensates its shoddy looks:

my @spammers = <spammety@sam.com  spam@in-a-can.com  yum@spam.com>;
my @senders  = <perl6@perl6.org   spammety@sam.com   good@guy.com>;

for keys @senders  @spammers -> $non-spammer {
    say "Message from $non-spammer";
}

# OUTPUT:
# Message from perl6@perl6.org
# Message from good@guy.com

We have two arrays: one contains a list of spammers’ addresses and another contains a list of senders. How to get a list of senders, without any spammers in it? Just use the (fine, fine, the (-)) operator!

We then use the for loop to iterate over the results, and as you can see from the output, all spammers were omitted… But why is keys there?

The reason is Setty and Mixy types are a lot like hashes, in a sense that they have keys and values for those keys. Set types always have True as values, and since we don’t care about iterating over Pair objects in our loop, we use the keys to get just the keys of the set: the email addresses.

However, hash-like semantics aren’t useless on Sets. For example, we can take a slice, and with :k adverb return just the elements that the set contains:

my $meows = set <
    Abyssinian  Aegean  Manx      Siamese  Siberian  Snowshoe
    Sokoke      Sphynx  Suphalak  Thai
>;

say $meows<Sphynx  Raas  Ragamuffin  Thai>:k;
# OUTPUT: (Sphynx Thai)

But what happens if we try to delete an item from a set?

say $meows<Siamese>:delete;
# Cannot call 'DELETE-KEY' on an immutable 'Set'
# in block <unit> at z.p6 line 6

We can’t! The Set type is immutable. However, just like Map type has a mutable version Hash, so does the Set type has a mutable version: the SetHash. There isn’t a cutesy helper sub to create one, so we’ll use the constructor instead:

my $s = SetHash.new: <a a a b c d>;
say $s;
$s<a d>:delete;
say $s;

# SetHash.new(a, c, b, d)
# SetHash.new(c, b)

Voilà! We successfully deleted a slice. So, what other goodies does Santa have in his… bag?

Gag ’em ‘n’ Bag ’em

Related to Sets is another type: a Bag, and yes, it’s also immutable, with the corresponding mutable type being BagHash. We already saw at the start of this article we can use a Bag to count stuff, and just like a Set, a Bag is hash-like, which is why we could view a slice of the four DNA amino acids:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

While a Set has all values set to True, a Bag‘s values are integer weights. If you put two things that are the same into a Bag there’ll be just one key for them, but the value will be 2:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

And of course, we can use our handy operators to combine bags! Here, we’ll be using , U+228E MULTISET UNION, operator, which looks a lot clearer in its Texas version: (+)

my $pancakes = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
my $omelette = bag 'egg', 'egg',  'egg', 'cup of milk';

my $shopping-bag = $pancakes  $omelette  <gum  chocolate>;
say $shopping-bag;
# bag(gum, cup of flour, egg(5), cup of milk(2), chocolate)

We used two of our Bags along with a 2-item list, which got correctly coerced for us, so we didn’t have to do anything.

A more impressive operator is , U+227C PRECEDES OR EQUAL TO, and it’s mirror , U+227D SUCCEEDS OR EQUAL TO, which tell whether a Baggy on the narrow side of the operator is a subset of the Baggy on the other side; meaning all the objects in the smaller Baggy are present in the larger one and their weights are at most as big.

Here’s a challenge: we have some materials and some stuff we want to build. Problem is, we don’t have enough materials to build all the stuff, so what we want to know is what combinations of stuff we can build. Let’s use some Bags!

my $materials = bag 'wood' xx 300, 'glass' xx 100, 'brick' xx 3000;
my @wanted =
    bag('wood' xx 200, 'glass' xx 50, 'brick' xx 3000) but 'house',
    bag('wood' xx 100, 'glass' xx 50)                  but 'shed',
    bag('wood' xx 50)                                  but 'dog-house';

say 'We can build...';
.put for @wanted.combinations.grep: { $materials  [] |$^stuff-we-want };

# OUTPUT:
# We can build...
#
# house
# shed
# dog-house
# house shed
# house dog-house
# shed dog-house

The $materials is a Bag with our materials. We used xx repetition operator to indicate quantities of each. Then we have a @wanted Array with three Bags in it: that’s the stuff we want to build. We’ve also used used the but operator on them to mix in names for them to override what those bags will .put out as at the end.

Now for the interesting part! We call .combinations on our list of stuff we want, and just as the name suggests, we get all the possible combinations of stuff we can build. Then, we .grep over the result, looking for any combination that takes at most all of the materials we have (that’s the operator). On it’s fatter end, we have our $materials Bag and on its narrower end, we have the operator that adds the bags of each combination of our stuff we want together, except we use it as a metaoperator [⊎], which is the same as putting that operator between each item of $^stuff-we-want. In case you it’s new to you: the $^ twigil on $^stuff-we-want creates an implicit signature on our .grep block and we can name that variable anything we want.

And there we have it! The output of the program shows we can build any combination of stuff, except the one that contains all three items. I guess we just can’t have it all…

…But wait! There’s more!

Mixing it Up

Let’s look back at our recipe code. There’s something not quite perfect about it:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

What if a recipe calls for half a cup of milk instead of a whole one? How do we represent a quarter of a teaspoon of salt, if Bags can only ever have integer weights?

The answer to that is the Mix type (with the corresponding mutable version, MixHash). Unlike a Bag, a Mix supports all Real weights, including negative weights. Thus, our recipe is best modeled with a Mix.

my $recipe = Mix.new-from-pairs:  'egg'          => 2, 'cup of milk' => ½,
                                  'cup of flour' => ¾, 'salt'        => ¼;
say $recipe;
# mix(salt(0.25), cup of flour(0.75), egg(2), cup of milk(0.5))

Be sure to quote your keys and don’t use colonpair form (:42a, or :a(42)), since those are treated as named arguments. There’s also a mix routine, but it doesn’t take weights and functions just like bag routine, except returning a Mix. And, of course, you can use a .Mix coercer on a hash or a list of pairs.

Less-Than-Awesome creation aside, let’s make something with mixes! Say, you’re an alchemist. You want to make a bunch of awesome potions and you need to know the total amount of ingredients you’ll need. However, you realize that some of the ingredients needed by some reactions are actually produced as a byproduct by other reactions you’re making. So, what’s the most efficient amount of stuff you’ll need? Mixes to the rescue!

my %potions =
    immortality  => (:oxium(6.98), :morphics(123.3),  :unobtainium(2)   ).Mix,
    invisibility => (:forma(9.85), :rubidium(56.3),   :unobtainium(−0.3)).Mix,
    strength     => (:forma(9.15), :rubidium(−30.3),  :kuva(0.3)        ).Mix,
    speed        => (:forma(1.35), :nano-spores(1.3), :kuva(1.3)        ).Mix;

say [] %potions.values;
# OUTPUT: mix(unobtainium(1.7), nano-spores(1.3), morphics(123.3),
#              forma(20.35), oxium(6.98), rubidium(26), kuva(1.6))

For convenience, we set up a Hash, with keys being names of potions and values being Mixes with quantities of ingredients. For reactions that produce one of the ingredients we seek, we’ve used negative weights, indicating the amount produced.

Then, we used the same set addition operator we saw earlier, in it’s meta form: [⊎]. We supply it the .values of our Hash that are our Mixes, and it happily adds up all of our ingredients, which we see in the output.

Look at unobtainium and rubidium: the set operator correctly accounted for the quantities produced by reactions where those ingredients have negative weights!

With immortality potion successfully mixed, all we need to do now is figure out what to do for the next few millennia… How about coding some Perl 6?

Day 6 – Perl 6 Books — the Time is Ripe

One question we occasionally get on the phenomenal #perl6 IRC channel is about Perl 6 books. It turns out, some folks don’t want to learn just from tutorials, blog posts and docs. Last year, we didn’t have any good answers to that question.

A few months ago, there seemed to be a flurry of such questions, and at the same time, rumours about new book projects started to spread.

If I remember correctly, the first rumour was when Laurent Rosenfeld contacted me in June, asking for feedback on a manuscript. He had translated a good 200 pages of the Think Python book to Perl 6. This is primarily a book teaching programming to beginners. Later I was happy to hear that a major publisher has accepted the manuscript. The manuscript is now finished, and under review. So I guess we’ll see e-book and print versions of this book in due time.

Then brian d foy opened a kickstarter to raise funds for a “Learning Perl 6” book. By the time this is published, the kickstarter project should still be open, so I warmly encourage you to support this. Having a book by a prominent member of the Perl 5 community, and author of several good Perl books would surely be a boon for the Perl 6 community.

Before the publication of brian’s project, I’ve been mulling about writing my own Perl 6 book. In the past, I’ve tried that once already, and failed. But I had more success with another book project of mine, so I decided to give it another shot. The working title is Perl 6 by example. Content for the book is published in form of blog posts, and later turned into book chapters.

Later I learned that Ken Youens-Clark had written a book on metagenomics that spends about 150 pages explaining Perl 6. And it’s free, you can download the PDF, epub and mobi versions right away! If you’ve followed the advent calendar, you might have read Ken’s earlier post on Bioinformatics

In summary, we have one book starring Perl 6 content, one in progress with the manuscript in stage “feature complete”, one in the kick-starting phase which you can support, and a fourth being written right now, with parts being pre-published in the form of blog posts.

If you want to keep up-to-date with news on Perl 6 books, you can sign up for the low volume Perl 6 book mailing list (less than one mail per month).

I hope that in next year’s advent calendar, we’ll see four reviews of Perl 6 books.

Day 5 – How to use the docs

…or why (nearly) everything is an object.

Spread out over all the pages of the official documentation of Perl 6, the class Signature is linked to more then 40 times. That shows not only that signatures are everywhere but also demonstrates a peculiarity of Perl 6 that will become more important as the language matures. No really, we are not done yet.

But first let’s have a look at some code.

multi sub trait_mod:<is> (Sub $s, :$foo!) is foo {
    say 'is foo called'
}
sub bar {}
&trait_mod:(&bar, :foo);

Here we define a trait that happens to be of type Sub which Rakudo (re)defines in Sub.pm. So there is actually a class for this build-in type. In fact, there are classes for anything besides classes and grammars. Having a Class for class would be a bit circular and prevent a slang to define its own object system. That you can do with Metamodel::ClassHOW, which is the Class behind class. But back to the trait.

A trait is somewhat macroish in nature as it is applied at compile time to some object its signature refers to. That allows the MMD to pick the candidate defined above when we use it with is foo. (There are a whopping 91 candidates of is defined in Rakudo.) Since it’s called at compile time it has to be called with something that exists at that time. All those classes for Sub, Label and friends provide a nice target for such a call. And yes, you can apply a trait to a label if you call it the right way in a BEGIN phaser. Combined with the power of compile time mixins and the the mighty .wrap you can go wild and improve the tool that is most dear to you — the Perl 6 compiler. If you observed the example carefully you likely spotted that the trait is applied to itself. If you really want to you could have infinite recursion at compile time.

Perl 6 is a dynamic dynamic language. The programs you write are dynamic and you can mend them and bend them to your will, both at compile- and runtime. And so you can do with the compiler. And that leads us to how to use the docs. Since language features are represented as a class, we link heavily to those classes to provide detailed discussions and you are encouraged to follow those links before you continue with the main paragraph the link refers from.

Macros will make good use of all those objects. With any luck next year will show us how.