Day 15 – Parsing Globs using Grammars

UPDATE 12/21/16: After some discussions here and at reddit, I realized I’d made a couple important mistakes. Sorry about that. I have corrected GlobAction and a bug in the glob helper subroutine. I also replaced the use of “AST” with “parse tree” below as well as that is more technically correct.

If you are at all familiar with Perl, you are likely to be familiar with regexes. In Perl 6, regexes have been extended to provide features and syntax allowing complex parsing tasks to be simplified and grouped together into grammars. To give a taste of this, let’s consider how to parse file globs. It’s simple, it’s well-defined, and I have some practice writing parsers for the task.

Introducing File Globs

In case you are not familiar with them, a file glob it a sort of pattern match for file names. For example, consider this glob:

*.txt

This will match any file ending in “.txt”. The asterisk (“*”) means “match as many characters as needed to make the match”. This matches “foo.txt” and “blah.blah.txt”, but not “z.html”.

Here’s another glob to consider:

.??*

This will match any file that starts with a period (“.”) followed by 2 or more characters. That means it will match a file named “.zshrc”, but not one named “..”, making it a popular heuristic for finding “hidden” files on a Unix file system.

Basic Grammar

What does a basic glob parsing grammar look like? Let me show a simple one and then I will explain the interesting bits:

grammar Glob {
    token TOP { + }
    token term {
        || 
        || 
    }
    token match {  |  }
    token match-any { '*' }
    token match-one { '?' }
    token char { . }
}

This is just about the simplest possible grammar for a glob that covers the two most popular forms. Let’s consider the parts.

A grammar is just a sort of class. Grammars may contain rules and tokens. Rules and tokens are like methods, but which are defined as a regex and are used to match the input to the grammar. Here we define a grammar named “Glob” to parse our glob.

Every grammar may define a special rule or token named “TOP”. This becomes the default regex to use when matching input with that grammar. Without it, any attempt to parse with the grammar must name the rule or token to begin with.

Aside from that each rule and token defines a regex to use when matching the input. For example, the match-any token here will match only an asterisk (“*”). Similarly, the char token will match any single character.

To be useful as a grammar, rules and tokens refer to one another using angle brackets (“<>”). Here, TOP contains one or more term matches, term matches either a match or a char (using || to try each match in the order given).

Whitespace Handling

Why do I only use token keywords and never make use of the rule? Because of whitespace. A rule inserts some implicit whitespace handling, which is helpful in keeping parsers readable. However, this handling is not helpful when parsing file globs.

Consider the following rule and token:

token { 'a' 'b'+ }
rule  { 'a' 'b'+ }

The token will match “abbbbb” and “ab” but not “a b” or “a bbbb”. The rule, however, will not match “abbbbb” or “ab” but will match “a b” and “a bbbb”. This is the whitespace handling at work. The rule above is implicitly similar to this token:

token { 'a'  'b'+  }

That ws is a predefined token available in all grammars. The built-in definition implies there is a word-break and zero or more spaces, newlines, or other whitespace.

Depending on what works better for you, you may redefine ws if you prefer something different:

token ws { "\n" } # whitespace is always newlines

Or you may just use tokens to avoid the implicit whitespace handling. This latter solution is what I believe makes the most sense for this grammar.

Parsing Input

Now that we have our basic grammar, how do we parse the input? Here’s a simple example:

my $ast = Glob.parse("*.?");
dd $ast;
# OUTPUT is an ugly, long AST
# Match $ast = Match.new(ast => Any, list => (), hash =>
# Map.new((:term([Match.new(ast => Any, list => (), hash =>
# Map.new((:match(Match.new(ast => Any, list => (), hash => Map.new(()), 
# orig => "*.?", to => 1, from => 0)))), orig => "*.?", to => 1, from => 
# 0), Match.new(ast => Any, list => (), hash => Map.new((:char(Match.new(
# ast => Any, list => (), hash => Map.new(()), orig => "*.?", to => 2, 
# from => 1)))), orig => "*.?", to => 2, from => 1), Match.new(ast => Any, 
# list => (), hash => Map.new((:match(Match.new(ast => Any, list => (), 
# hash => Map.new(()), orig => "*.?", to => 3, from => 2)))), orig => 
# "*.?", to => 3, from => 2)]))), orig => "*.?", to => 3, from => 0) 

As we get output, the parse succeeded. Given how accepting our grammar is, everything but an empty string ought to succeed. If a parser fails, the result of parse is an undefined Any.

If you’ve used regexes in Perl 6 at all, this ASTparse tree should look very familiar. Grammars are really nothing more than a little bit of extra structure around regexes. The ASTsparse trees they generate are just regex Match objects that may have additional matches nested within.

To do something useful with this ASTparse tree, you have some options. You could write a tree walker that will go through the ASTparse tree and do something interesting. However, Perl grammars provide a built-in mechanism to do that while in the middle of the parse called an action class. Let’s use an action class instead.

Action Classes

An action class is a regular class that can be paired to a grammar to perform actions based upon each rule or token matched. The methods in the action class will mirror the names of the rules and tokens in the grammar and be called for each.

Here is an action class that is able to translate the parsed glob into a matcher class:

class GlobAction {
    method TOP($/) { make GlobMatcher.new(terms => $».made) }
    method term($/) { make $/.values».made.first }
    method match($/) { make $/.values».made.first }
    method match-any($/) { make '*' }
    method match-one($/) { make '?' }
    method char($/) { make ~$/ }
}

Each method will receive a single argument which is the match object. If the argument list is not given, the implicit match variable $/ is used instead (here we make that explicit). The action class can perform any action in response to the match it wants. Often, this involves rewriting the return value of the match by using the make command. This sets the value returned by the .made method on the match (as you can see is used in term and match).

To put this action class into use, you pass the action class during parsing like so:

my $matcher = Glob.parse("*.txt", :actions(GlobAction.new)).made;
for  -> $f {
    say "$f matches" if $f ~~ $matcher;
}

Now all we need is the magic defined within GlobMatcher, which does the real work of testing whether a given file name matches the parsed glob.

Putting it all together

Here is an implementation of matching. It uses the ACCEPTS method to define a smartmatch operation that can be applied to any defined string.

class GlobMatcher {
    has @.terms;

    method ACCEPTS(Str:D $s) {
        my @backtracks = [ \([ $s.comb ], [ @!terms ]) ];
        BACKTRACK: while @backtracks.pop -> (@letters, @terms) {
            LETTER: while @letters.shift -> $c {
                last LETTER unless @terms;

                my $this-term = @terms.shift;
                my $next-term = @terms[0];

                given $this-term {
                    when '*' {
                        # Continue to the next term if we can
                        if @terms 
                        and $next-term ne '*' | '?' 
                        and $c eq $next-term {
                            push @backtracks, 
                                ([ @letters ], [ '*', |@terms ]);
                            redo;
                        }

                        # Match anything, and try again next round
                        unshift @terms, $this-term;
                    }

                    # We have a letter, so we match!
                    when '?' { }

                    # Only match exactly
                    default {
                        # If not an exact match, we fail; 
                        # try again if we can
                        next BACKTRACK if $c ne $this-term;
                    }
                }
            }

            # If we matched everything, we succeed
            return True unless @terms;

            # Otherwise, try the next backtrack, if any
        }

        # We ran out of back tracks, so we fail
        False;
    }
}

This algorithm is kind of ugly and might be prettier if I’d implemented it via a DFA or translated to regular expressions or something, but it works.

Typing out the code to parse is a bit tedious, so we’ll build ourselves a little helper to make it cleaner looking:

sub glob($glob, :$globlang = Glob) { 
    $globlang.parse($glob, :actions(GlobAction.new)).made; 
}

for  -> $f {
    next unless $f ~~ glob('*.txt');
    say "$f matches";
}

We could even use it in a nice grep:

.grep(glob('*.txt')).say;

Extending the Grammar

Now that we have a basic working grammar, let’s say we want to expand it to support a variant based on ANSI/SO SQL’s LIKE operator. It is very much like a glob, but uses percent (“%”) in place of asterisk (“*”) and underscore (“_”) in place of question mark (“?”). We could implement this as a brand new grammar, but let’s make our new grammar a sub-class instead, so we can avoid duplicating work.

grammar SQLGlob is Glob {
    token match-any { '%' }
    token match-one { '_' }
}

.grep(
    glob('%.txt', :globlang(SQLGlob))
).say;

That was too easy.

Expanding the Grammar

From here, we might also try to expand the grammar to handle character classes or curly-expansions:

[abc]*.{txt,html}

In BSD-like globs, the above will match any file starting with “a” or “b” or “c” that also ends with “.txt” or “.html”. In which case, we might consider using a proto to make it easy to expand the kinds of matches we allow. A full demonstration of this is beyond the scope of this post, but it could look something like this:

class Glob {
    ...

    token term {
        || 
        || 
        || 
    }
    proto token expansion { * }
    proto token match { * }

    ...
}

class BasicGlob is Glob {
    token match:any { '*' }
    token match:one { '?' }
}

class BSDGlob is BasicGlob {
    token expansion:list { '{' + % ',' '}' }
    token match:char-class { '['  ']' }
}

class SQLGlob is Glob {
    token match:any { '%' }
    token match:one { '_' }
}

By using a proto token or proto rule we gain a nice way of handling alternative matches within grammars. This allows us to create variety of grammar variants that don’t repeat themselves. It also gives us the flexibility to allow developers using your grammars to subclass and modify them to suit their custom needs without monkey patching.

A working example of something like this can be found in the source for IO::Glob. (Though, much of that having been written long enough ago that I’m not sure it is the best possible example of what to do, but it is an example.)

Grammars are a really nifty way of grouping a set of regexes together to build a complete parser as a unit. They are easy to read, provide tools that allow you to avoid repeating yourself, and I find them to be an enjoyable tool to play with and use in Perl 6.

Cheers.

Day 14 – Targetting MoarVM, the Wrong Way

MoarVM is a virtual machine specifically designed to be a backend for the NQP compiler toolchain in general and the Rakudo Perl 6 compiler in particular.

It is not restricted to running Perl 6, though, and if anyone wants to implement their own language on top of it, Jonathan has been kind enough to provide free course material that walks you through the process. In particular, the code examples for PHPish and Rubyish are worth a quick look to see how things are supposed to work.

However, where there’s a Right Way of doing things, there’s also a Wrong Way, and that’s what we’re gonna look at today!

Generating Bytecode

MoarVM bytecode is generated from MAST trees, defined in lib/MAST/Nodes.nqp of your MoarVM checkout. The file states:

# This file contains a set of nodes that are compiled into MoarVM
# bytecode. These nodes constitute the official high-level interface
# to the VM. At some point, the bytecode itself will be declared
# official also. Note that no text-based mapping to/from these nodes
# will ever be official, however.

This has historical reasons: Parrot, the VM that Rakudo used to target, had an unhealthy overreliance on its textual intermediate representation PIR. Personally, I think it is a good idea to have some semi-official text-based bytecode representation – you just shouldn’t use it as the exchange format between compilation stages.

That’s where doing things the Wrong Way come in: During the last two weeks, I’ve started writing an assembler targetting MAST and a compiler for a tiny low-level language targetting this assembly dialect, doing exactly what I just told you not to do.

Why did I? What I hope to accomplish eventually is providing a bootstrapped alternative to the NQP toolchain, and you have to start your bootstrapping process somewhere.

Currently, only a few bits and pieces have been implemented, but these bits and pieces are somewhat functional and you can do such useful things as echo input from stdin to stdout:

$ cat t/echo.tiny
fn main() {
    obj stdin = getstdin
    do {
        str line = readline stdin
        int len = chars line
        done unless len
        print line
        redo
    }
    exit 0
}

You can either run the code directly

$ ./moartl0 --run t/echo.tiny

compile it first

$ ./moartl0 --compile t/echo.tiny

$ moar t/echo.moarvm

or take a look at the generated assembly

$ ./moartl0 --dump t/echo.tiny
.hll tiny
.frame main
.label bra0_main
    .var obj v0_stdin
    getstdin $v0_stdin
.label bra1_do
    .var str v1_line
    readline_fh $v1_line $v0_stdin
    .var int v2_len
    chars $v2_len $v1_line
    unless_i $v2_len @ket1_do
    print $v1_line
    goto @bra1_do
.label ket1_do
    .var int i0
    const_i64 $i0 0
    exit $i0
.label ket0_main
# ok

There isn’t really anything fancy going on here: Text goes in, text goes out, we can explain that.

Note that the assembly language is not yet finalized, but so far I’ve opted for a minimalistic syntax that has VM instructions separated from its operands by whitespace and accompanied by assembler directives prefixed with a ..

Under the Hood

If you were to look at the source code of the compiler (as we probably should – this is supposed to be the Perl 6 advent calendar, after all), you might discover some useful idiom likes using a proto declaration

proto MAIN(|) {
    CATCH {
        ... # handle errors
        exit 1;
    }

    ... # preprocess @*ARGS
    {*}
}

to accompany our multi MAIN subs that define the command line interface.

However, you would also come across things that might not necessarily be considered best practice.

For one, the compiler is not reentrant: In general, we’re supposed to pass state along the call chain either in the form of arguments (the implicit self parameter of methods is a special case of that) or possibly as dynamic variables. When writing compilers specifically, the latter tend to be useful to implement recursive declarations like nested lexical scopes: a lexical frame of the target language will correspond to a dynamic frame of the parser. If you don’t care about reentrancy, though, you can just go with global variabes and use the temp prefix to achieve the same result.

For another, the compiler also doesn’t use grammars, but instead, the body of the line-based parsing loop is a single regex, essentially

# next-line keeps track of line numbering and trims the string
while ($_ := next-line) !=:= IterationEnd { /^[
    | ['#'|$]                       # ignore comments and empty lines
    | (:s ld (\w+)'()' '{'${ ... }) # start of load frame definition
    | (:s fn (\w+)${ ... })         # forward declaration of a function
    | ...                           # more statements
    || {bailout}
]/ }

The blocks { ... } represent the actions that have been embedded into the regex after $ anchors terminating each line.

That’s not really a style of programming I’d be comfortable advocating for in general – but Perl being Perl, There’s More Than One Way to Do It: For better or worse, Perl 6 gives programmers a lot of freedom to structure code how they see fit. As the stage 0 compiler is supposed to be supplanted anyway, I decided to have some fun instead of crafting a proper architecture.

In comparison, the assembler implemented in NQP is far more vanilla, with state held by an actions object.

But… Why?

The grinches among you may ask, What is this even doing here? Is this just someone’s personal side project that just happens to be written in Perl 6, of no greater use to the community at large?

Well, potentially, but not necessarily:

First, I do plan on writing a disassembler for MoarVM bytecode, and that may come in handy for bug hunting, testing or when looking for optimization opportunities.

Second, when running on MoarVM, Perl 6 code may load and interact with compilation units written in our tiny language or even hand-optimized VM assembly. The benefit over something like NativeCall is that we never leave the VM sandbox, and in contrast to foreign code that has to be treated as black boxes, the JIT compiler will be able to do its thing.

Third, an expansion of the MoarVM ecosystem might attract the attention of language enthusiasts beyond the Perl community, with at least the chance that MoarVM could deliver on what Parrot promised.

However, for now that’s all just idle speculation – remember, all there is right now is a two weeks old toy I came up with when looking for something to write about for this advent calendar. It’s a very real possibility that this project will die a quiet death before amounting to anything. But on the off chance it does not, it’s nice to have a hot cup of the preferred beverage of your choice and dream about a future where MoarVM rises as a butterfly-winged phoenix from the ashes of a dead parrot….

Day 13 – Audio Streaming done Completely Wrong

Starting out

I made the audio streaming source client Audio::Libshout about a year and a half ago, it works quite well: with the speed improvements in Rakudo I have been able to stream 320kb/s MP3 without a problem, but it always annoyed me that it was difficult to test properly and even to test it at all required an Icecast server. Even with an icecast server that I could stream to, it would be necessary to actually listen to the stream in order to determine whether the stream was actually functioning correctly.

This all somewhat came to a head earlier in the year when I discovered that even the somewhat rudimentary tests that I had been using for the asynchronous streaming support had failed to detect that the stream wasn’t being fed at all. What I really needed was a sort of dumb streaming server that could act in place of the real icecast and could be instrumented to determine whether a connection was being made and that the correct audio data was being received. How hard could it be? After all it was just a special kind of web server.

I should have known from experience that this was a slippery slope, but hey.

A little aside about Audio Streaming

An icecast streaming server is basically an HTTP server that feeds a continuous stream of encoded audio data to the client listeners who connect with an HTTP GET request, the data to be streamed is typically provided by a source client which will connect to the server, probably authenticate using HTTP Authentication, and start sending the data at a steady rate that is proportional to the bitrate of the stream. libshout connects with a custom request method of SOURCE which is inherited from its earlier shoutcast origins, though icecast itself understands PUT as well for the source client. Because it is the responsibility of the listening client to supply the decoded audio data to the soundcard at exactly the right rate and the encoded data contains the bitrate of the stream as transmitted from the source the timing demands on the server are not too rigorous: it just has to be consistent and fast enough that a buffer on the client can be kept sufficiently full to supply the audio data to the sound card. Icecast does a little more in detail to, for instance, adjust for clients that don’t seem to be reading fast enough and so forth, but in principle it’s all relatively simple.

Putting it together

As might be obvious by now, an audio streaming server differs from a typical HTTP server in that rather than serving some content from disk or generated by the program itself for example, it needs to share data received on one client connection with one or more other client connections. In the simplest C implementation one might have a set of shared buffers, one of which is being populated from the source connection at any given time, whilst the others are being consumed by the client connections, alternating on filling and depletion. Whether the implementation settles on a non-blocking or threaded source possibly the most critical part of the code will be the synchronisation between the source writer and the client readers to ensure that a buffer is not being read from and written to at the same time.

In Perl 6 of course you’d hope you didn’t need to worry about these kind of annoying details as there are well thought out concurrency features that abstract away most of the nasty details.

From one to many

Perhaps the simplest program that illustrates how easy this might be would be this standalone version of the old Unix service chargen :

use v6.c;

my $supplier = Supplier.new;

start {
    loop {
        for ( 33 ... 126 ).map( { Buf.new($_) }) -> $c {
            sleep 0.05;
            $supplier.emit($c);
        }
    }
}

my $sig = signal(SIGPIPE).tap( -> $v {
    $sig.close;
});

react {
    whenever IO::Socket::Async.listen('localhost', 3333) -> $conn {
        my $p = Promise.new;
        CATCH {
            when /'broken pipe'/ {
                if $p.status !~~ Kept {
                    $p.keep: "done";
                }
            }
        }

        my $write = $supplier.Supply.tap( -> $v {
            if $p.status ~~ Planned {
                $conn.write: $v;
            }
            else {
                $write.close;
            }
        });
    }
}

In this the Supplier is being fed asynchronously to stand in for the possible source client in our streaming server, and each client that connects on port 3333 will receive the same stream of characters – if you connect with two clients (telnet or netcat for instance,) you will see they are getting the same data at roughly the same time.

The Supplier provides a shared sequence of data of which there can be many consumers, so each connection provided by the IO::Socket::Async will be fed the data emitted to the Supplier starting at the point the client connected.

The CATCH here is to deal with the client disconnecting, as the first our code will know about this is when we try to write to the connection, we’re not expecting any input from the client we can check and besides the characters becoming available to write may happen sooner than the attempt to read may register the close, so, while it may seem like a bit of a hack, it’s the most reliable and simple way of doing this: protecting from further writes with a Promise. If, by way of experiment, you were to omit the CATCH you would find that the server would quit without warning the first time the first client disconnected.

I’ll gloss over the signal outside the react as that only seems necessary in the case where we didn’t get any input data on the first connection.

Making something nearly useful

The above example is almost all we need to make something that you might be able to use, all we need is for it to handle an HTTP connection from the clients and get a source of actual MP3 data into the Supply and we’re good. To handle the HTTP parts we’ll just use the handy HTTP::Server::Tiny which will conveniently take a Supply that will feed the output data, so in fact we end up with quite a tiny program:

use HTTP::Server::Tiny;

sub MAIN(Str $file) {

    my $supplier = Supplier.new;

    my $handle = $file.IO.open(:bin);

    my $control-promise = start {
        while !$handle.eof {
            my $c = $handle.read(1024);
            $supplier.emit($c);
        }
    }

    sub stream(%env) {
        return 200, [ Content-Type => 'audio/mpeg', Pragma => 'no-cache', icy-name => 'My First Streaming Server'], $supplier.Supply;
    }
    HTTP::Server::Tiny.new(port => 3333).run(&stream);
}

The HTTP::Server::Tiny will run the stream subroutine for every request and we need to do is return the status code, some headers, and a supply from which the output data will be read, the client connection will be closed when the Supply is done (that is when the done method is called on the source Supplier.) It couldn’t really be much more
simple.

Just start the program with the path to a file containing MP3 audio and then point your favourite streaming client at port 3333 on your localhost and you should get the stream, I say should as it makes no attempt to regulate the rate at which the audio data is fed to the client. But for constant bit-rate MP3 data and a client that will buffer as much as it
can get it works.

Of course a real streaming server would read the data frame by frame and adjust the rate according to the bit-rate of each frame. I actually have made (but not yet released,) Audio::Format::MP3::Frame to help do this, but it would be over-kill for this example.

Relaying a stream

Of course the original intent of this streaming server was to be able to test a streaming source client, so we are going to have to add another part that will recognise a source connection, read from that and relay it to the normal clients in a similar way to the above.

You’ll recall that the libshout client library will connect with a request method of SOURCE so we can adjust the file streaming example to identify the source connect and feed the Supplier with the data read from the connection:

use HTTP::Server::Tiny;

sub MAIN() {

    my $supplier = Supplier.new;

    my Bool $got-source = False;

    sub stream(%env) {
        if %env<REQUEST_METHOD> eq 'GET' {
            if $got-source {
                return 200, [ Content-Type => 'audio/mpeg', Pragma => 'no-cache', icy-name => 'My First Streaming Server'], $supplier.Supply;
            }
            else {
                return 404, [ Content-Type => 'text/plain'], "no stream connected";
            }
        }
        elsif %env<REQUEST_METHOD> eq 'SOURCE' {
            my $connection = %env<p6sgix.io>;

            my $finish-promise = Promise.new;

            $connection.Supply(:bin).tap(-> $v {
                $supplier.emit($v);
            }, done => -> { $finish-promise.keep: "done" });

            $got-source = True;
            return 200, [ Content-Type => 'audio/mpeg' ], supply { whenever $finish-promise { done; } };
        }
    }
    HTTP::Server::Tiny.new(port => 3333).run(&stream);
}

You’ll see immediately that nearly all of the action is happening in the request handler subroutine: the GET branch is almost unchanged from the previous example (except that it will bail with a 404 if the source isn’t connected,) the SOURCE branch replaces the reading of the file previously. HTTP::Server::Tiny makes reading the streamed data from the source client really easy as it provides the connected IO::Socket::Async to the handler in the p6sgix.io (which I understand was originally primarily to support the WebSocket module,) theSupply of which is tapped to feed the shared Supplier that
conveys the audio data to the clients. All else that is necessary is to return with a Supply that is not intended to actually provide any data but just to close when the client closes their connection.

Now all you have to do is run the script and feed it with some source client like the following:

use Audio::Libshout;

multi sub MAIN(Str $file, Int $port = 3333, Str $password = 'hackme', Str $mount = '/foo') {

    my $shout = Audio::Libshout.new(:$port, :$password, :$mount, format => Audio::Libshout::Format::MP3);
    $shout.open;
    my $fh = $file.IO.open(:bin);

    while not $fh.eof {
        my $buf = $fh.read(4096);
        say $buf.elems;
        $shout.send($buf);
        $shout.sync;
    }

    $fh.close;
    $shout.close;
}

(Providing the path to some MP3 file again,) then if you connect a streaming audio player you will be getting some audio again.

You might notice that the script can take a password and a mount which aren’t used in this case, this is because Audio::Libshout requires them and also because this is basically the script that I have been using to test streaming servers for the last year or so.

Surprisingly this tiny streaming server works quite well, in testing I found that it ran out of threads before it got too bogged down handling the streams with multiple clients, showing how relatively efficient and well thought out the Perl 6 asynchronous model is. And how simple it is to put together a program that would probably require a lot more
code in many other languages.

Where do we go from here

I’m pretty sure that you wouldn’t want to use this code to serve up a popular radio station, but it would definitely be sufficient for my original testing purposes with a little additional instrumentation.

Of course I couldn’t just stop there so I worked up the as yet unreleased Audio::StreamThing which uses the same basic design with shared supplies, but works more like Icecast in having multiple mounts for individual streams, provision for
authentication and better exception handling.

If you’d find it useful I might even release it.

Postscript

I’d just like to get a mention in for the fabulous DJ Mike Stern as I always use his recorded
sets for testing this kind of stuff for some reason.

Have fun and make more noise.

Day 12 – Avoiding User Namespace Pollution with Perl 6 Modules

A bit of trickery is needed

As Perl 5 module authors, we were able to allow selective exporting easily by use of the @EXPORT_OK array where we listed all the objects that the user of our module could import by name, thereby giving the user full control over possible namespace clashes. However, in spite of all the new goodies provided by Perl 6, that feature is not yet available to us Perl 6 module authors. (For more information see issue #127305 on https://rt.perl.org.)

But the good news is there is a way to get the same effect, albeit with a bit more effort. The trick is in using the full power of tags with the export trait available with every object in a module.

One might get a little glassy-eyed when reading the Perl 6 docs concerning module import and export, of which a portion of code is shown here:

unit module MyModule;
my package EXPORT::DEFAULT {
    our sub foo { ... }
}
my package EXPORT::other {
    our sub bar { ... }
}

where each object to be exported is enclosed in a named export category block. That method has serious limitations (in addition to requiring another set of child blocks):

  • the objects cannot appear in more than one grouping and
  • the objects cannot be assigned other tags via the is export adjective on individual objects.

Consequently, to emulate Perl 5’s @EXPORT_OK array, each object would have to be wrapped in a unique block and could not have multiple tags. But then the docs say that all that ugly package exporting code can be replaced by this much simpler code for each item to be exported:

sub foo is export { ... }
sub bar is export(:other) { ... }

And therein is our solution: We will use export tags to (1) allow the user to import only what he or she desires to import or (2) import everything. The module author can also be helpful by adding other tags which will allow importing sets of objects (but such sets are not always so easily determined in practice).

Note one restriction before we continue: only valid identifier names are allowed in the tags, so we can’t use sigils or other special characters as we could in Perl 5.

For an example we use a simple, but unusual, Perl 6 module with multiple objects to be exported (and all have the same name, probably not a good practice, but handy for demonstration of the disambiguation problem):

unit module Foo;
sub bar() is export { say "bar" }
constant $bar is export = 99;
constant %bar is export = (a => 1);
constant @bar is export = <a b>;

We can then have access to all of those objects by merely using the module in a program file:

use Foo;
bar;
say $bar;
say %bar;
say @bar;

Executing it yields:

bar
99
{a => 1}
[a b]

as expected. Now let’s modify the module by adding an export tag to the bar routine:

sub bar() is export(:bar) { say "bar" }

and then, when our program is executed, we get:

===SO6RRY!=== Error while compiling...
Undeclared routine:
    bar used at line 7. Did you mean 'VAR', 'bag'?

so we modify our use line in the program to:

use Foo :bar;

and execute the program again to get:

===SORRY!=== Error while compiling...
Variable '$bar' is not declared. Did you mean '&bar'?

It seems that as soon as we, the users, restrict the use statement with a tag, the non-tagged objects are not available! Now we, the authors, have two choices:

  • tag all objects with the same tag or
  • tag them with separate tags.

If we tag them the same, then all will be available—which is probably not a problem. However, that would defeat our goal of having unique tags.

If we name them with unique tags, we will need some way to distinguish them (remember, we can’t use their sigils), which leads us to a possible convention we show here in the final module:

unit module Foo;
sub bar() is export(:bar :SUB) { say "bar" }
constant $bar is export(:bar-s :CONST) = 99;
constant %bar is export(:bar-h :CONST) = (a => 1);
constant @bar is export(:bar-a :CONST) = <a b>;

Notice the suffixes on the non-subroutine objects (all constants) have their kind indicated by the ‘-X’ where the ‘X’ is ‘s’ for scalar, ‘h’ for hash, and ‘a’ for array. We have also added an additional tag which groups the objects into the general categories of subroutines or constants so the user could import, say, just the constants as a set, e.g, use Foo :CONST;.

Now we just change the use line to

use Foo :bar, :bar-s, :bar-h, :bar-a;

and get

bar
99
{a => 1}
[a b]

again.

Thus the good news is we can add a tag to the export trait that is the same as the name of the subroutine, but the sad news is we can’t use the appropriate sigil as in Perl 5 to disambiguate among objects with the same name. The solution to that is to use some convention as demonstrated above (akin to the ugly Hungarian notation in C) that will have the same effect.

Of course in this somewhat-contrived example, the user could have imported all the objects at once by using the special, automatically-defined tag `:ALL:

use Foo :ALL;

but the author has provided users of the module complete flexibility in its application for them.

Conclusion

We now have a way to protect the user’s namespace by requiring him or her to selectively import objects by “name” (or perhaps other subsets of objects) unless the user chooses to import everything. The only downsides I see are:

  • the extra effort required by the module author to explicitly tag all exportable objects and
  • the restriction on selecting an appropriate tag for different objects with the same name.

I love Perl 6, and the old Perl motto TIMTOWTDI still holds true as we’ve just seen!

Merry Christmas! // Happy Holidays! I hope you will enjoy the gift of Perl 6 throughout the coming year!


Note: Thanks to Moritz Lenz (IRC #perl6: user moritz) for constructive comments about the bad practice of exporting variables—hence the example now exports constants instead.

 

Day 11 — Perl 6 Core Hacking: It Slipped Through The QASTs

One of the great things about Perl 6 is how accessible to regular users the compiler is to hack on. Easy bugs require nothing but knowledge of Perl 6 itself, since a lot of the Rakudo compiler is written in Perl 6. Slightly tougher bugs require knowledge of NQP, and tougher still are bugs involving Grammar and Actions. Things progress further in difficulty from there, going as far as assembly hacking on VM level, but today, we’ll stick around in Rakudo land. We have a task at hand!

Santa is having some difficulties generating his Naughty-or-Nice list due to a bug in Perl 6. He traced it down to the use of the S/// substitution operator with :g modifier that for some reason returns an empty list instead of the original string if no matches were made:

say S:g/naughty// with 'only nice list';
# OUTPUT: ()

Time to dig in and fix this, if anyone is to get any presents this year!

The Bots To The Rescue

The first thing to do when fixing a bug is to find out when it first appeared. AlexDaniel++ and MasterDuke++ implemented several IRC bots that make this task extremely simple. They are available in #perl6 and #perl6-dev IRC channels, and you can play with them in #zofbot IRC channel, without annoying anyone. We’ll be using bisectable6 bot to find out when the S/// operator got broken:

<Zoffix> bisectable6, help
<bisectable6> Zoffix, Like this: bisectable6: old=2015.12 new=HEAD exit
    1 if (^∞).grep({ last })[5] // 0 == 4 # RT128181

Just give the bot a piece of code, optionally specifying the starting and ending commits, and it’ll bisect by either the exit code, or failing at that, by output.

<Zoffix> bisectable6, S:g/d// given 'abc'
<bisectable6> Zoffix, Bisecting by output (old=2016.10 new=524368c)
    because on both starting points the exit code is 0
<bisectable6> Zoffix, bisect log:
    https://gist.github .com/c2cf9c3a7b6d13a43c34f64b96090e31
<bisectable6> Zoffix, (2016-10-23)
    https://github  .com/rakudo/rakudo/commit/b7201a8f22338a906f2d8027a21387e8f5c77f41

The last link is the interesting bit, it tells us the S:g/// was working fine until that commit. The commit does seem related—it’s the refactor lizmat++ did to make .match 150%–1400% faster—but it’s quite big and it’s not obvious how it’s linked to the workings of the S/// operator. Let’s find out, shall we?

How Do You Spell That?

We can specify the --target command line argument to perl6 executable to ask it for the output of a particular stage of the program (run perl6 --statestats -e '' to see names of all stages). Let’s output the parse stage, to find out which tokens in the Grammar we should look into:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 --target=parse -e 'S:g/d//'
- statementlist: S:g/d//
  - statement: 1 matches
    - EXPR: S:g/d//
      - value: S:g/d//
        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g
          - sibble: /d//
            - right:
            - babble:
              - B:
            - left: d
              - termseq: d
                - termaltseq: d
                  - termconjseq: 1 matches
                    - termalt: 1 matches
                      - termconj: 1 matches
                        - termish: 1 matches
                          - noun: 1 matches
                            - atom: d

There are some general tokens in the output—such as statementlist, statement, EXPR, and value—we can just gloss over. They are about statements and we want stuff for the operator itself, so the interesting bit start with this:

        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g

Let’s pop open the Grammar in our text editor and locate a token called quote. It can also be a rule, regex or method, but tokens are most common. The first thing we can locate is this:

proto token quote { <...> }
token quote:sym<apos>  {
    :dba('single quotes') "'" ~ "'"    
    <nibble(self.quote_lang(%*LANG<Quote>, "'", "'", ['q']))>
}
token quote:sym<sapos> {
    :dba('curly single quotes') "‘" ~ "’"
    <nibble(self.quote_lang(%*LANG<Quote>, "", "", ['q']))>
}

The Grammar that parses Perl 6 isn’t much different from the grammar you’d use as a user of Perl 6, so most of it probably looks familiar to you. The quote token is a proto regex, so looking further at the output --target=parse gave us, we see we need :sym<S> variant of it.

Scrolling a bit through the quote‘s candidates, we finally come across :sym<s> that sets the <sym> capture to either s or S:

token quote:sym<s> {
    <sym=[Ss]> (s)**0..1
    :my %*RX;
    :my $*INTERPOLATE := 1;
    :my $*SUBST_LHS_BLOCK;
    :my $*SUBST_RHS_BLOCK;
    {
        %*RX<s> := 1 if $/[0]
    }
    <.qok($/)>
    <rx_adverbs>
    <sibble(%*RX<P5> ?? %*LANG<P5Regex> !! %*LANG<Regex>, %*LANG<Quote>, ['qq'])>
    [ <?{ $<sibble><infixish> }> || <.old_rx_mods>? ]

}

So this token handles both s/// and S/// operators and its body is unimpressive: it seems to be no more than than some set up work. With the name of the token in hand, we now know what to look for in the Actions: method quote:sym<s>.

While finding it in actions is easy… it’s quite a biggie, with 177 lines of code to its name. However, someone nice left us a comment that fits into our puzzle:

method quote:sym<s>($/) {
    # We are emulating Str.subst/subst-mutate here, by calling match,
    # assigning the result to a temporary variable etc.
    ...

Recall bisectable6‘s results? The commit it pointed out was work on the .match method and according to the comment for S/// operator, it uses .match to do its stuff. Let’s execute that method on builds before and after the commit bisectable6 found for us. There’s another handy bot to do that for us: commitable6.

Give it a commit SHA or one of the release tags along with code to run and it’ll give you output for that code on that commit:

<Zoffix> committable6, 2016.11 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: ()

<Zoffix> committable6, 2016.10 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

We ran the code on 2016.11 and 2016.10 releases and the output indicates there’s no difference… or is there? The problem with using say as a debugging tool is it often omits things we may find useful. A better alternative is the dd routine that is a Rakudo-specific utility dumper sub that’s not part of standard Perl 6 language. Give it some args and it’ll dump them out. Let’s give it a spin:

<Zoffix> committable6, 2016.11 dd 'abc'.match: :g, /d/
<comittable6> Zoffix, ¦«2016.11»: slip()

<Zoffix> committable6, 2016.10 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

Aha! Another puzzle piece! When :g adverb is in use, on failed matches .match used to return an empty list, but after lizmat++’s .match improvements, it started to return Empty, which is an empty Slip. Slips tend to flatten themselves out into the outer container, so perhaps that’s causing an issue in the S///‘s action method? Let’s take a closer look at it.

Slippety Slip

A bird’s-eye view of method quote:sym<s> action shows it does some setup work and then codegens a QAST (“Q” Abstract Syntax Tree). It’d be helpful to take a look at what it generates.

One method of doing so is using the same --target feature we’ve used to get the parse stage, except we’d use the ast stage. So the command would be this:

perl6 --target=ast -e 'S:g/d//'

If you actually run that, you’ll get a text wall of QAST, and it may be tough to spot which are the bits actually generated by the S/// operator. Luckily, there’s a better way! The QAST node objects have .dump method that dumps them the same style as what you see in --target=ast output. So checkout the compiler’s repo if you haven’t already done so, pop open src/Perl6/Actions.nqp file, go to the end of method quote:sym<s> and stick note($past.dump) in there to print the dump of the QAST generated for the S/// operator:

    ...
    );
    $past.annotate('is_S', $<sym> eq 'S');
    note($past.dump); # <----------------------- like that
    make WANTED($past, 's///');  # never carp about s/// in sink context
}

(Why is it called $past and not $qast? Historical reasons: QAST used to be PAST, for Parrot Abstract Syntax Tree).

Now, compile Rakudo:

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

And execute our buggy S/// match to make the line we added print out S///‘s QAST:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'S:g/d// given "abc"'
- QAST::Op(locallifetime)  :is_S<?> S:g/d//
  - QAST::Stmt
    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)
    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)
    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)
      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>
    - QAST::Stmt
    - QAST::Var(lexical $/)

There are docs for types of QAST you see here, or we can just wing it.

We callmethod match and bind the result to subst_result_1:

    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)

We call nqp::p6store (p6* ops are documented in Rakudo’s repo), giving it the result of infix:<,>($/) as container and the return of .match call as value:

    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)

We check if anything matched (for :g matches, we check for a Positional that has any .elems in it):

    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)

If we did have matches, call Str!APPLY-MATCHES:

      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)

If we didn’t have matches, call nqp::p6store, storing the $_ (this is our original string S/// works on) in the $/:

      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>

Since we know the commit bisectable6 found makes .match return an empty slip for failed matches, it’s that last bit of QAST that should look suspicious, since slips flatten themselves out. We’ll return to why we’re storing into an &infix:<,>($/) rather than into $/ directly, but first, let’s write the NQP equivalent of such a setup.

We have two variables: $/ with Empty and $_ with our original string. The QAST::Op node maps out to an nqp op with the same name, so our suspicious bit looks something like this:

use nqp;

$_ = 'abc';
$/ = Empty;
nqp::p6store( &infix:<,>($/), $_);

Yet another helpful bot, camelia, lets us run a piece of code straight from IRC. Just use trigger m: with some code. Let’s try it out:

<Zoffix> m: use nqp; $_ = 'abc'; $/ = Empty;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Slip $/ = slip$()␤»

<Zoffix> m: use nqp; $_ = 'abc'; $/ = List.new;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Str $/ = "abc"␤»

The results show that when $/ is an Empty, it ends up still being it after the p6store, while if $/ is an empty List, it happily takes a string. We finally connected the S/// operator with the commit that introduced the bug and found why it occurs (although, slips behaving like that may be a bug of its own). Let’s trace where that Empty in Str.match comes from and why it’s there.

What Sourcery Is This?

There’s another bot (it’s the future! people have lots of bots!), SourceBaby, that can give you a link to source code for a routine. It uses CoreHackers::Sourcery module under the hood and takes arguments to give to its sourcery routine. Trigger it with the s: trigger:

<Zoffix> s: 'abc', 'match', \(/d/, :g)
<SourceBaby> Zoffix, Sauce is at
    https://github.com/rakudo/rakudo/blob/164eb42/src/core/Str.pm#L946

We gave it an object to call a method on (a Str), a string with the method name (match), and a Capture with which arguments the method is to be called. In return, it gave a URL to the multi that handles those args:

multi method match(Regex:D $pattern, :global(:$g)!, *%_) {
    nqp::if(
      nqp::elems(nqp::getattr(%_,Map,'$!storage')),
      self!match-cursor(nqp::getlexcaller('$/'),
        $pattern($cursor-init(Cursor,self,:0c)), 'g', $g, %_),
      nqp::if(
        $g,
        self!match-list(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)),
          CURSOR-GLOBAL, POST-MATCH),
        self!match-one(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)))
      )
    )
}

No Empty here, but we can see that when $g is true, we call self!match-list. It’s a private method, so SourceBaby would not be able to help with it. Let’s find it by searching the same source file:

# Create list from the appropriate Sequence given the move
method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      Empty,
    ))
}

And there’s our Empty! The commit message doesn’t mention why we changed from an empty List to an Empty, there are no comments in the source code explaining it, so we’ll have to resort to the most technologically non-advanced debugging tool in our arsenal… asking people.

The Dev IRC Channel

If you have questions about core development, join #perl6-dev IRC channel on Freenode. In this case, we can ask lizmat++ if she remembers whether there was a reason for that Empty.

If the person you’re trying to reach isn’t currently online, you can use the messaging bot, using the .tell trigger, followed by the person’s nick, followed by message. When the bot sees the person talk, it will deliver the message.

<babydrop> .ask stmuk_ so is `zef` now the installer being
    shipped with R*? I notice our REPL message still
    references panda; wondering if that should read zef now
<yoleaux2> babydrop: I'll pass your message to stmuk_.

After the discussion about the Empty, there doesn’t appear to be any specific reason to return it in this case, so we’ll change it to return an empty List instead, just as its old behavior was, and that will also fix our bug. The new !match-list then looks like this:

method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      List.new,
    ))
}

Compile the compiler; this time we can just run make install, since everything is already configured and pre-built from the last time we compiled:

make install

Check the change did fix the bug:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'say S:g/naughty// with "only nice list"'
only nice list

And run the test suite:

TEST_JOBS=6 make spectest

The TEST_JOBS env var lets you run multiple test files at once and the optimal value to set it at is around 1.3 times the number of cores in your computer. If you have a very meaty box (or endless patience), you can run make stresstest instead, for a more thorough test run.

With the spectest passing all of it’s tests, we are ready to finish off our work.

Test It!

The test suite is located in t/spec and is automatically checked out from its repo when you run make spectest. You can also simply delete that directory and clone your own fork as t/spec instead.

Usually, it’s easy to locate the file where the test can go into by running tree -f | grep 'some search term'. We fixed an issue with substitutions, so let’s go for this:

zoffix@VirtualBox:~/CPANPRC/rakudo/t/spec$ tree -f | grep subst
│   ├── ./integration/substr-after-match-in-gather-in-for.t
├── ./S05-substitution
│   ├── ./S05-substitution/67222.t
│   ├── ./S05-substitution/match.t
│   ├── ./S05-substitution/subst.rakudo.moar
│   └── ./S05-substitution/subst.t
│   ├── ./S32-str/substr-eq.t
│   ├── ./S32-str/substr-rw.rakudo.moar
│   ├── ./S32-str/substr-rw.t
│   ├── ./S32-str/substr.t

The ./S05-substitution/subst.t file looks like a decent candidate, pop it open. Bump the plan at the top of file by the number of tests you’re adding, then add the test at the end of the file (or a more appropriate spot):

plan 185;

...

is-deeply (S:g/FAIL// with 'foo'), 'foo',
    'S:g/// returns original string on failure to match';

Run the test file, to ensure everything passes:

make t/spec/S05-substitution/subst.t

And commit! We’re done! Santa’s Naughty-or-Nice list shall work fine from now on.

The Final Mystery

Recall that &infix:<,>($/) thing that was causing the bug when $/ contained an Empty? So what is that all about?

If you don’t know something about Perl 6, just come to our #perl6 IRC channel and ask. This is what I did when I couldn’t understand the purpose of that infix thing and after a long discussion, finding old bug tickets, and testing old bugs… we came to the conclusion these are no longer needed here!

So along with a bug fix, we also cleaned up codegen. At least that’s in theory, perhaps by doing so we created another bug that will send us on yet another great hunting journey.

Conclusion

It’s easy to give a helping hand to the core developers of Perl 6 by fixing some of the bugs. Starting from easy things that require nothing more than knowledge of Perl 6, you can progressively learn more about the internals and fix tougher problems.

The perl6 compiler comes with a number of useful output methods like --target=ast and --target=parse that can aid in debugging. An army of IRC bots makes it easy to navigate source code both in space and time, by either giving you a link to an implementation or producing output of some particular commit.

Lastly, a very valuable resource we have available is the people of Perl 6, who can help you out. Whether you’re digging deep into the guts of the compiler, or just starting out with computer programming.

Join us. We have… bugs to fix!

Day 10 — Let’s learn and try double-ended priority queue with Perl 6

Hello! My name is Itsuki Toyota. I’m a web developer in Japan.

In this post, let me introduce Algorithm::MinMaxHeap.

Introduction

Algorithm::MinMaxHeap is a Perl 6 implementation of double-ended priority queue (i.e. min-max heap).1

Fig.1 shows an example of this data structure:

minmaxheap

Fig.1

This data structure has an interesting property, that is, it has both max-heap and min-heap in a single tree.

More precisely, the min-max ordering is maintained in a single tree as the below description:

  • values stored at nodes on even levels (i.e. min levels) are smaller than or equal to the values stored at their descendants (if any)
  • values stored at nodes on odd levels (i.e. max levels) are greater than or equal to the values stored at their descendants (if any)

Algorithm::MinMaxHeap Internals

I’ll shortly explain what this double-ended priority queue does internally when some frequently used methods (e.g. insert, pop) are called on.2
They maintain the min-max ordering by the following operations:

insert($item)

1. Pushes the given item at the first available leaf position.
2. Checks whether it maintains the min-max ordering by traversing the nodes from the leaf to the root. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

pop-max

1. Extracts the maximum value node.
2. Fills the vacant position with the last node of the queue.
3. Checks whether it maintains the min-max ordering by traversing the nodes from the filled position to the leaf. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

pop-min

1. Extracts the minimum value node.
2. Fills the vacant position with the last node of the queue.
3. Checks whether it maintains the min-max ordering by traversing the nodes from the filled position to the leaf. If it finds a violating node, it swaps this node with a proper node and continues the checking operation.

Let’s try !

Then let me illustrate how to use Algorithm::MinMaxHeap.

Example 1

The first example is the simplest one.

Source:

use Algorithm::MinMaxHeap;

my $queue = Algorithm::MinMaxHeap[Int].new; # (#1)
my Int @items = (^10).pick(*);
@items.say; # [1 2 6 4 5 9 0 3 7 8]
$queue.insert($_) for @items; # (#2)
$queue.pop-max.say; # 9 (#3)

In this example, Algorithm::MinMaxHeap[Int].new creates a queue object, where all of its nodes are type Int. (#1)

$queue.insert($_) for @items inserts the numbers in the list. (#2)

$queue.pop-max pops a node which stores the maximum value. (#3)

Example 2

The second example defines user-defined type constraints using subset:

Source (chimera.p6):

use Algorithm::MinMaxHeap;

my subset ChimeraRat of Cool where Num|Rat; # (#1)
my $queue = Algorithm::MinMaxHeap[ChimeraRat].new;

$queue.insert(10e0); # ok
$queue.insert(1/10); # ok
$queue.insert(10); # die # (#2)

Output:

$ perl6 chimera.p6
Type check failed in assignment to @!nodes; expected ChimeraRat but got Int (10)
in method insert at /home/itoyota/.rakudobrew/moar-nom/install/share/perl6/site/sources/240192C19BBAACD01AB9686EE53F67BC530F8545 (Algorithm::MinMaxHeap) line 12
in block  at chimera.p6 line 8

In this example, subset ChimeraRat is a Cool, but only allows Num or Rat. (#1)

Therefore, when you insert 10 to the queue, it returns the error message and die.

It’s because 10 is a Int object. (#2)

Example 3

The third example inserts user-defined classes to the queue.

Source (class.p6):

use Algorithm::MinMaxHeap;

my class State {
 also does Algorithm::MinMaxHeap::Comparable[State]; # (#1) 
 has DateTime $.time;
 has $.payload;
 submethod BUILD(:$!time, :$!payload) { }
 method compare-to(State $s) { # (#2) 
   if $!time == $s.time {
     return Order::Same;
   }
   if $!time > $s.time {
     return Order::More;
   }
   if $!time < $s.time {
     return Order::Less;
   }
 }
}

my @items;

@items.push(State.new(time => DateTime.new(:year(1900), :month(6)),
 payload => "Rola"));
@items.push(State.new(time => DateTime.new(:year(-300), :month(3)),
 payload => "Taro"));
@items.push(State.new(time => DateTime.new(:year(1963), :month(12)),
 payload => "Hanako"));
@items.push(State.new(time => DateTime.new(:year(2020), :month(6)),
 payload => "Jack"));

my Algorithm::MinMaxHeap[Algorithm::MinMaxHeap::Comparable] $queue .= new;
$queue.insert($_) for @items;
$queue.pop-max.say until $queue.is-empty;

Output:

 $ perl6 class.p6
 State.new(time => DateTime.new(2020,6,1,0,0,0), payload => "Jack")
 State.new(time => DateTime.new(1963,12,1,0,0,0), payload => "Hanako")
 State.new(time => DateTime.new(1900,6,1,0,0,0), payload => "Rola")
 State.new(time => DateTime.new(-300,3,1,0,0,0), payload => "Taro")

In this example, the class State does the role Algorithm::MinMaxHeap::Comparable[State], where Algorithm::MinMaxHeap::Comparable[State] defines the stub method compare-to as:
method compare-to(State) returns Order:D { ... };
(#1)

Therefore, in this case, the State class must define a method compare-to so that it accepts a State object and returns an Order:D object. (#2)

Footnote

  • 1 Atkinson, Michael D., et al. “Min-max heaps and generalized priority queues.” Communications of the ACM 29.10 (1986): 996-1000.
  • 2 For more exact information, please read the above paper.

Thank you for reading my post !

Day 9 – A preview of the ‘hackable’ JIT compiler

Among programming languages Perl 6 has an unfortunate but historically well-deserved reputation for being slow, even compared to languages such as Perl 5, Ruby or Python. This is not really an accident, however. From the outset, perl6 was designed to be extensible by the programmer in many different ways, including defining custom operators, dynamic containers for variables, and a powerful meta-object protocol that allow you to modify the behaviour of objects and classes from regular Perl code. In the future, we might well add macros to that list. To this are added a rich set of control structures and phasers. Such features are useful, powerful and fun.

But such features also make it relatively difficult to execute perl6 efficiently, because they introduce overhead and polymorphism. Even before the language was released last year, work had started to improve the efficiency of Rakudo Perl 6. Because Rakudo is complex and layered, this works involves many aspects and many individuals, from core routine efficiency improvements to precompilation of modules. MoarVM also tries to specialize ‘hot’ code for specific object types at runtime, which reduces the polymorphism of that code, and allows simpler operations to be substituted for complex operations. For example, object attribute access can in many cases be reduced to a simple memory read instruction.

Since late summer 2014, MoarVM has contained a JIT compiler that I developed. At its core, JIT compilation reduces the overhead from interpretation. That is to say, rather than fetch new instructions from a bytecode stream and dispatch to execute them, a JIT compiler generates the code to execute these instructions in a series. Ideally, this allows the executed code to ‘flow’ directly through the program without interruption. In practice, there is still the machinery of the VM, for example for garbage collection, to take into account. Also, note that in many VM implementations, the JIT compiler and the type specialization code are combined, whereas in MoarVM they are distinct components.

Since summer 2015 (yes, that long) I’ve been working on an improved backend for that compiler, which was funded by the Perl foundation. One of the goals for the development of that backend (which I misleadingly call the ‘expression JIT compiler’) was to enable extension points for specialized object types and instructions. These extension points are designed to enable a greater range of type specialization and to improve code generation gradually. The best part, I think, is that they should require relatively little knowledge of compiler internals or assembly language. For this advent calendar post, I’d like to demonstrate those extension points.

First, consider the following innocent-looking perl6 code.

sub foo(@a) {
    @a[0] = @a.elems;
}

In case this (obviously critical) `foo` subroutine is invoked hundreds of times, MoarVM will start to optimize it, first by recording what @a really is. In most cases, that will be an instance of the Array class. If it always the same, and supposing the elems method is sufficiently small (which it is), that method call will be inlined into the foo routine. The assignment of @a[0] is somewhat more involved than it looks and what it would be like in a language like Java. In Perl 6, an array holds a sequence of containers, and it is the container to which the value is assigned. So the primitive sequence of operations in ‘foo’ after MoarVM has optimized it is similar to:

my $i := nqp::elems(@a);
my $c := nqp::atpos_o(@a, 0);
nqp::assign($c, $i);

The ‘nqp’ namespace signifies that these are primitive operations in the interpreter. However, they are still polymorphic, because arrays can be represented by multiple different stores. For example, arrays that are used by NativeCall foreign function interface use a different representation (for compatibility reasons) than the arrays natively used by the VM (which are called MVMArray internally). Thus, these operations are polymorphic, and MoarVM needs a dispatch table to find the correct implementation.

If the exact implementation type of @a is known, then this polymorphism is unnecessary. In principle, the type specializer could insert a specialized non-polymorphic instruction for each of those operations for the types found during the recording phase. That is not very practical though as there are many different types and each type would have to support many different operations.

However, with the expression compiler it will be relatively easy to add specialized code for a specific implementation. See for instance the following (somewhat simplified) implementations of the ‘elems’ and ‘atpos_o’ MoarVM instructions, which would work on MVMArray objects:

(template: elems (^getf $1 MVMArray body.elems))

(template: atpos_o 
  (load (index (^getf $1 MVMArray body.slots.o) $2 8)
        (&SIZEOF MVMObject*)))

Such ‘expression templates’ are textual representations of the templates used in the compiler to convert relatively high-level MoarVM bytecode to low-level code which is easier to compile and optimize. Although the code above might look complex, it is actually pretty straightforward:

  • Expressions are grouped by parentheses and can contain subexpression, exactly as in LISP.
  • The ‘template:’ keyword declares a template, and ‘elems’ declares that this template represents the ‘elems’ MoarVM instruction.
  • Numbers with a ‘$’ sigil represent the (input) operands to the MoarVM instruction. In case of elems, the 1st operand is the MVMArray object.
  • ‘load’ and ‘index’ are simple expression nodes, which represent a particular low-level operation.
  • ‘^getf’ is a template macro, which expands to a more complex expression of ‘load’ and ‘addr’ nodes which compute the address of the ‘body.elems’ field in the MVMArray structure and load a value from it.
  • &SIZEOF is an expression that is translated – prior to inclusion by MoarVM – into a C sizeof expression. With the ‘&’ syntax, any C macro can be referenced, and with that any compile-time expression may be evaluated.

Such templates can be added without any knowledge of assembler whatsoever, although some knowledge of C structure and VM internals are probably preferable. I hope that by making this simple enough I can convince others to share the task of JIT compiler development with me :-).

For those who are feeling more adventurous yet, it is also simple to hook into the data-driven code generator known as the tiler. The tiler picks the CPU instructions to implement the low-level code generated from expression templates. Each CPU instruction is represented by a ’tile’, which ‘covers’ part of the expression graph. The tiler tries to pick the optimal instruction so that the entire code graph is covered as cheaply as possible. As the x86 instruction set is very large, it is nearly impossible for me to write tiles for every possible instruction. Adding a tile is not very hard, though. Suppose for instance that we’d be writing a program to compute the XOR of all values in the array:

my $x = 0;
for @a -> $i {
    $x +^= $i;
}

In reality, there are still multiple barriers to implementing this as a tight loop, but suppose that we have taken them down. (We can, it is a matter of time and energy). Then you might find that the sequence of INDEX, LOAD and XOR operations were be inefficient, and you could optimize that into a single instruction. You can declare a tile for that instruction as follows:

(tile: xor_load_idx 
   (xor reg (load (index reg reg $stride) $size)) reg)
  • This uses the same syntax as the expression templates, which is convenient for parsing, but also makes the correspondence between templates and tiles clear
  • The ’tile:’ keyword now declares a tile, and ‘xor_load_idx’ the name of the tile implementation.
  • The words with a $sigil now specify constants. The ‘xor’, ‘load’, and ‘index’ words specify expression nodes (operations).
  • ‘reg’ specifies that this tile consumes values in registers (in the first expression) and yields a value in a register (as the last word in the expression).

The tile implementation would look as follows:

MVM_JIT_TILE_DECL(xor_load_idx) {
    MVMint8 out  = tile->values[0];
    MVMint8 base = tile->values[1];
    MVMint8 idx  = tile->values[2];
    /* this only works on 8-byte sized indexes (for now) */
    ASSERT(tile->args[0] == 8 && tile->args[1] == 8);
    | xor Rq(out), qword [Rq(base)+Rq(idx)*8];
}

The `MVM_JIT_TILE_DECL` macro declares the tile function with all required parameters. The ’tile’ parameter which is declared and passed automatically contains operational details such as the registers used (in ’tile->values’) and constant parameters (in ’tile->args’). The function of the assembly-code fragment that follows the ‘|’ symbol is to declare a piece of machine code that is to be assembled at runtime. For this we use dynasm with a few patches to support picking ‘dynamic’ registers on x86-64. The result of this addition would be that the aforementioned sequence of instructions would be compiled to a single one and your (hypothetical) program would run a few percent faster. Although small, such improvements can add up.

Unfortunately, all of this is not done yet. Although in a way the main body of the compiler exists, some essential pieces (especially the register allocator) need further development, and there are quite a few bugs and oversights that need to be fixed, too. So unfortunately you can’t play with this – yet. However, I do hope that I’ve given you a taste of what you can do when it is finished, which I hope to be soon. And with that, I wish you a happy holiday.

Day 8 — How to Make, Use, and Abuse Perl 6 Subsets

Ever reached for a Perl 6 type and found it good enough, but not perfect? Perhaps, you wanted an IntEven, StrPalindrome, or YourCustomClassWhereAttrFooIsBar. Never fear, Perl 6’s subsets are here!

What Are Subsets?

You can think of them as a refinement on some type and you can use them in most places where you’d use a type, such as in type constraints. To create one, use the subset keyword, along with a where keyword specifying your refinement:

    subset Even where * %% 2;
    say 1 ~~ Even; # False
    say 2 ~~ Even; # True

The WhateverStar * is the value being checked and the %% operator checks if that value is evenly divisible by 2. We can now use our subset on the right side of a smartmatch operator to check whether a value is an even number! Pretty awesome. What else can we do with it?

How about type-constraining a variable:

    my Even $x = 42; # all good

    $x = 43; # and this?
    # Type check failed in assignment to $x; expected Even but got Int (43)
    #   in block <unit> at script.p6 line 3

Or type-constraining input and output of a routine:

    sub takes-an-even-only (Even $x) { $x² }
    sub returns-an-even-only returns Even { $^x² }

    say takes-an-even-only   42; # 1764
    say returns-an-even-only 42; # 1764

    say takes-an-even-only   43;
    # Constraint type check failed for parameter '$x'
    #   in sub takes-an-even-only at script.p6 line 2
    #   in block <unit> at script.p6 line 8

    say returns-an-even-only 43;
    # Type check failed for return value; expected Even but got Int (1849)
    #   in sub returns-an-even-only at script.p6 line 3
    #   in block <unit> at script.p6 line 13

That’s all pretty sweet, but our Even accepts strings and other weird stuff:

    say '42.0000' ~~ Even; # True
    say class { method Real { 42 } }.new ~~ Even; # True

There’s a reason for that: we never specified what type we’re making a subset of, so by default, it used Any. Let’s fix that!

Getting Typical

If you want to create a subset based on some type, specify that type with the of keyword:

    subset IntEven of Int where * %% 2;

Now, before the where refinement even runs, the value we’re checking against must first pass the Int type constraint:

    say 42 ~~ IntEven; # True
    say 43 ~~ IntEven; # False
    say '42.0000' ~~ IntEven; # False
    say class { method Real { 42 } }.new ~~ IntEven; # False

We’re not limited to numerics! Let’s try a Str:

    subset StrPalindrome of Str where {
        .flip eq $_ given .comb(/\w+/).join.lc;
    }

    say ’Madam, I'm Adam.~~ StrPalindrome; # True
    say '1 on 1'  ~~ StrPalindrome; # False

We’re now using a more complex refinement in the where clause, using a code block. Just like the WhateverCode version with the *, the block receives the value to check as its argument, which it aliases to $_ topical variable. The code block tells us whether the argument is a palindrome, by returning a truthy or falsy value.

So how far can we go with the type constraints in our subsets?

Custom Made

We can type-constrain a subset using any class we have lying around! How about this one:

    class Awesome { has $.awesomeness-level }
    my $obj1 = Awesome.new: :10000awesomeness-level;
    my $obj2 = Awesome.new: :31337awesomeness-level;

We make a class called Awesome that has a public attribute called awesomeness-level. We also create two instances of that class, setting the awesomeness-level to 10000 in $obj1 and to 31337 in $obj1. So how about a subset that checks whether awesomeness-level is a prime number? It’s just a single line of code:

    subset AwesomePrime of Awesome where .awesomeness-level.is-prime;

    say $obj1 ~~ AwesomePrime; # False
    say $obj2 ~~ AwesomePrime; # True

The where block of the subset is “thunked,” which means the compiler takes an expression and turns it into a code block for us, so we don’t have to explicitly use a codeblock here, nor do we need a WhateverStar. The value being checked is in the $_ topical variable, which is what method calls use when you don’t specify what you’re calling them on. Thus, our subset expects a thing of type Awesome and then checks whether its awesomeness-level attribute is a prime number!

By using such subsets, you can create routines that only accept your custom objecs of a specific configuration. For example, in code of an IRC::Client bot, we can create a subset of IRC::Client::Message for messages that are received from bot admins, and then register events only for such messages:

    subset BotAdmin of IRC::Client::Message where .host eq conf<bot-admins>.any;

    multi method irc-to-me (BotAdmin $e where /:i ^ 'run' $<steps>=.+ $/ ) {
        ...
    }

The subset calls the subroutine that reads configuration and provides a list of bot admin hosts against which the host of the sender of the message is checked. We’re encapsulating the logic that checks we have an acceptable object to work with, and we’re able to call that logic even before we enter our method.

So if we can do that with subsets… is there anything we can’t do?

Time for Some Heavy Abuse!

Let’s do something crazy! A subroutine that fetches a link to a website and checks whether it contains a mention of Perl 6:

    use LWP::Simple;
    sub is-perl-site { LWP::Simple.get( $^website ).contains: 'Perl 6' }

There’s nothing crazy about that, you say? Then, how about we use that subroutine as a refiner in a subset where clause:

    subset PerlWebsite where &is-perl-site;

    say 'http://perl6.party' ~~ PerlWebsite; # True
    say 'http://lolcats.com' ~~ PerlWebsite; # False

In fact, we can make a routine that only accepts URLs to websites mentioning Perl 6:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 8
    #   in block <unit> at script.p6 line 15

But do you notice something off? The error message is rather poor… Let’s improve it!

What we know so far is the where clause takes some code to run and if that code’s result is falsy, the typecheck will fail. That means inside the where we can know whether or not the typecheck will fail before we even return from it. Let’s put that to use:

    sub is-perl-site {
        given LWP::Simple.get( $^website ).contains: 'Perl 6' {
            when :so  { True }
            when :not {
                say ’This ain't no website containing "Perl 6"!‘;
                False;
            }
        }
    }

    subset PerlWebsite where &is-perl-site;

In the routine that checks a website mentions Perl 6, in the case when it does :not contain a mention of Perl 6, we say a helpful message, indicating what exactly was wrong. Let’s run this:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    # This ain't no website containing "Perl 6"!
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 16
    #   in block <unit> at script.p6 line 23

Whoa! The message printed twice. What gives?

It’s actually expected that the refinement in subsets is an inexpensive and relatively simple operation… With that expectation in mind, the parameter binder—which doesn’t know how to generate errors—simply passes its stuff through the slower code path—which does—and it’s that slower code path that runs the where code the second time, triggering our message one more time.

So yes, doing overly complex stuff in subsets is abusive. However, you can throw an exception inside the where to avoid the repetition of the message:

    sub is-perl-site {
        LWP::Simple.get( $^website ).contains: 'Perl 6'
            or die ’This ain't no website containing "Perl 6"!‘;
    }

    ...

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    #   in sub is-perl-site at z.p6 line 4
    #   in any accepts_type at gen/moar/m-Metamodel.nqp line 3472
    #   in sub ain't-taking-non-perl-stuff at z.p6 line 11
    #   in block <unit> at z.p6 line 18

And don’t forget to check out Subset::Helper and Subset::Common modules.

What About a Light Spanking?

There is one type of abuse cheating with subsets that can get you out of a bind: fiddling with narrowness when it comes to resolution of multi candidates.

For example, let’s say you hate humanity and you wish to change the meaning of the infix + operator on two Ints to do subtraction instead of addition. You, of course, write this:

    multi sub infix:<+> (Int $a, Int $b) { $a$b }

But as you run a sample code, over the sound of your evil laughter…

    Ambiguous call to 'infix:<+>'; these signatures all match:
    :(Int:D \a, Int:D \b --> Int:D)
    :(Int $a, Int $b)
      in block <unit> at z.p6 line 4

… the complier errors out.

You see, core language already has an infix + operator that takes two Ints! When you add one of your own, you create an ambiguity. To resolve this issue, we need to somehow create an Int that the compiler thinks is narrower than an Int, but in reality isn’t. Sounds tough? Not an issue for subsets:

    subset NarrowInt of Int where {True};
    multi sub infix:<+> (NarrowInt $a, NarrowInt $b) { $a$b }

    say 42 + 2; # 40

It worked! We created a subset of Int, so we match all of Ints, just like we wanted. In the refinement, however, we specify a single code block that always returns True, making that refinement always succeed, and making our subset accept all the values a regular Int accepts, while being narrower than a regular Int as far as multi resolution goes.

If you’re wondering why we had to use an explicit block, it’s because the where smartmatches, and the smartmatch against a True produces a warning, because it’s always true, and while that is what we want here, in most code such a construct is a mistake.

But if you’re upset about writing one-too-many characters, here’s neat trick:

    multi sub infix:<+> (Int $a where {True}, Int $b where {True}) { $a - $b }
    say 42 + 2;

You don’t need to create an explicit subset, and can stick a where clause right onto the thing you’re working with to refine just that thing. The type constraint on it will function as the of ... of a subset.

You can also type constraint a variable with a subset and still add a where clause. Or create a subset of a subset of a subset and still add… well, we’re getting carried away.

When they stop calling…

Consider this piece of wonderful code:

    class Thingie {
        multi method stuff ($ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Just foos...

You have a class with a multi method. Along with it, you have a subclass of it with another multi method of the same name. Both have a where clause and when you call the method with input that can match either multi, the subclass’s multi gets called. But what do you do if you want to reverse that… you want the parent class’s multi to be called, if both multies matches the input.

The first solution is very simple. Just add a type constraint (we’ll use Str) in the parent class, while leaving it off in the child:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The presence of a type constraint on the method in the parent class makes it narrower than the one in the subclass, so even though the subclass’s method can also accept the input, it’s the parent class that gets to take care of it.

However, what if we wanted the same Str type constraint on both methods? The parent class we’ll leave as is: just a normal Str type constraint. In the kid, however, we’ll use a wider subset of Any (that’s the default if you don’t specify the of, remember?), but in its where clause we’ll smartmatch against Str, to ensure the subset accepts only Strs:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        subset WiderStr where { $_ ~~ Str };
        multi method stuff (WiderStr $ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The result is the opposite of a cheat we made in the previous section: instead of a subset that matches a type exactly, but is narrower than it, we now created a subset that matches a type exactly, but is wider than it, as far as multi candidate resolution goes. And yes, you can just merge the two where clauses instead of creating a subset, producing:

    multi method stuff ($ where { $_ ~~ Str and $_ ~~ /foos/ }) {
        say "Just foos...";
    }

It’ll work the same.

Conclusion

Subsets are a powerful feature that lets you specify refinements on existing core and custom types. You can smartmatch against a subset to perform a check on a value, or you can use subsets to type-contraint variables, parameters, and return values.

You can use the subset keyword to create a named subset, or you can attach a refinement onto a specific variable or parameter with a where clause. Subsets can also be used to effect alternative narrowness of a parameter, to affect multi candidate resolution order.

Subsets can also be abused to perform very complex operations, but… that’s probably a bad idea.

-Ofun

Day 7 — Set In Your Ways: Perl 6’s Setty and Baggy Types

There’s a relatively common pattern I see with people writing code that counts… say, DNA bases in a string:

my %counts;
%counts{$_}++ for 'AGTCAGTCAGTCTTTCCCAAAAT'.comb;
say %counts<A T G C>; # (7 7 3 6)

Make a Hash. For each thing you want to count, ++ that key in that Hash. So what’s the problem?

Perl 6 actually has specialized types that are more appropriate for this operation; for example, the Bag:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

Let’s talk about these types and all the fancy operators that come with them!

A Note on Unicode

I’ll be using fancy-pants Unicode versions of operators and symbols in this post, because they look purty. However, all of them have what we call “Texas” equivalents you can use instead.

Ready. Set. Go.

The simplest of these types is a Set. It will keep exactly one of each item, so if you have multiple objects that are the same, the duplicates will be discarded:

say set 1, 2, 2, "foo", "a", "a", "a", "a", "b";
# OUTPUT: set(a, foo, b, 1, 2)

As you can see, the result has only one a and only one 2. We can use the , U+2208 ELEMENT OF, set membership operator to check if an item is in a set:

my $mah-peeps = set <babydrop iBakeCake Zoffix viki>;
say 'Weeee \o/' if 'Zoffix'  $mah-peeps;
# OUTPUT: Weeee \o/

The set operators are coercive, so we don’t need to explicitly create a set; they’ll do it for us:

say 'Weeee \o/' if 'Zoffix'  <babydrop iBakeCake Zoffix viki>;
# OUTPUT: Weeee \o/

But pay attention when using allomorphs:

say 'Weeee \o/' if 42  <1 42 72>;
# No output

say 'Weeee \o/' if 42  +«<1 42 72>; # coerce allomorphs to Numeric
# OUTPUT: Weeee \o/

The angle brackets create allomorphs for numerics, so in the first case above, our set contains a bunch of IntStr objects, while the left hand side of the operator has a regular Int, and so the comparison fails. In the second case, we coerce allomorphs to their numeric component with a hyper operator and the test succeeds.

While testing membership is super exciting, we can do more with our sets! How about some intersections?

my $admins = set <Zoffix mst [Coke] lizmat>;
my $live-in-North-America = set <Zoffix [Coke] TimToady huggable>;

my $North-American-admins = $admins  $live-in-North-America;
say $North-American-admins;
# OUTPUT: set(Zoffix, [Coke])

We intersected two sets with the , U+2229 INTERSECTION, intersection operator and received a set that contains only the elements present in both original sets. You can chain these operations too, so membership will be checked in all of the provided sets in the chain:

say <Zoffix lizmat>  <huggable Zoffix>  <TimToady huggable Zoffix>;
# OUTPUT: set(Zoffix)

Another handy operator is the set difference operator, whose Unicode look I find somewhat annoying: No, it’s not a backslash (\), but a U+2216 SET MINUS character (luckily, you can use the much more obvious (-) Texas version).

The usefulness of the operator compensates its shoddy looks:

my @spammers = <spammety@sam.com  spam@in-a-can.com  yum@spam.com>;
my @senders  = <perl6@perl6.org   spammety@sam.com   good@guy.com>;

for keys @senders  @spammers -> $non-spammer {
    say "Message from $non-spammer";
}

# OUTPUT:
# Message from perl6@perl6.org
# Message from good@guy.com

We have two arrays: one contains a list of spammers’ addresses and another contains a list of senders. How to get a list of senders, without any spammers in it? Just use the (fine, fine, the (-)) operator!

We then use the for loop to iterate over the results, and as you can see from the output, all spammers were omitted… But why is keys there?

The reason is Setty and Mixy types are a lot like hashes, in a sense that they have keys and values for those keys. Set types always have True as values, and since we don’t care about iterating over Pair objects in our loop, we use the keys to get just the keys of the set: the email addresses.

However, hash-like semantics aren’t useless on Sets. For example, we can take a slice, and with :k adverb return just the elements that the set contains:

my $meows = set <
    Abyssinian  Aegean  Manx      Siamese  Siberian  Snowshoe
    Sokoke      Sphynx  Suphalak  Thai
>;

say $meows<Sphynx  Raas  Ragamuffin  Thai>:k;
# OUTPUT: (Sphynx Thai)

But what happens if we try to delete an item from a set?

say $meows<Siamese>:delete;
# Cannot call 'DELETE-KEY' on an immutable 'Set'
# in block <unit> at z.p6 line 6

We can’t! The Set type is immutable. However, just like Map type has a mutable version Hash, so does the Set type has a mutable version: the SetHash. There isn’t a cutesy helper sub to create one, so we’ll use the constructor instead:

my $s = SetHash.new: <a a a b c d>;
say $s;
$s<a d>:delete;
say $s;

# SetHash.new(a, c, b, d)
# SetHash.new(c, b)

Voilà! We successfully deleted a slice. So, what other goodies does Santa have in his… bag?

Gag ’em ‘n’ Bag ’em

Related to Sets is another type: a Bag, and yes, it’s also immutable, with the corresponding mutable type being BagHash. We already saw at the start of this article we can use a Bag to count stuff, and just like a Set, a Bag is hash-like, which is why we could view a slice of the four DNA amino acids:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

While a Set has all values set to True, a Bag‘s values are integer weights. If you put two things that are the same into a Bag there’ll be just one key for them, but the value will be 2:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

And of course, we can use our handy operators to combine bags! Here, we’ll be using , U+228E MULTISET UNION, operator, which looks a lot clearer in its Texas version: (+)

my $pancakes = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
my $omelette = bag 'egg', 'egg',  'egg', 'cup of milk';

my $shopping-bag = $pancakes  $omelette  <gum  chocolate>;
say $shopping-bag;
# bag(gum, cup of flour, egg(5), cup of milk(2), chocolate)

We used two of our Bags along with a 2-item list, which got correctly coerced for us, so we didn’t have to do anything.

A more impressive operator is , U+227C PRECEDES OR EQUAL TO, and it’s mirror , U+227D SUCCEEDS OR EQUAL TO, which tell whether a Baggy on the narrow side of the operator is a subset of the Baggy on the other side; meaning all the objects in the smaller Baggy are present in the larger one and their weights are at most as big.

Here’s a challenge: we have some materials and some stuff we want to build. Problem is, we don’t have enough materials to build all the stuff, so what we want to know is what combinations of stuff we can build. Let’s use some Bags!

my $materials = bag 'wood' xx 300, 'glass' xx 100, 'brick' xx 3000;
my @wanted =
    bag('wood' xx 200, 'glass' xx 50, 'brick' xx 3000) but 'house',
    bag('wood' xx 100, 'glass' xx 50)                  but 'shed',
    bag('wood' xx 50)                                  but 'dog-house';

say 'We can build...';
.put for @wanted.combinations.grep: { $materials  [] |$^stuff-we-want };

# OUTPUT:
# We can build...
#
# house
# shed
# dog-house
# house shed
# house dog-house
# shed dog-house

The $materials is a Bag with our materials. We used xx repetition operator to indicate quantities of each. Then we have a @wanted Array with three Bags in it: that’s the stuff we want to build. We’ve also used used the but operator on them to mix in names for them to override what those bags will .put out as at the end.

Now for the interesting part! We call .combinations on our list of stuff we want, and just as the name suggests, we get all the possible combinations of stuff we can build. Then, we .grep over the result, looking for any combination that takes at most all of the materials we have (that’s the operator). On it’s fatter end, we have our $materials Bag and on its narrower end, we have the operator that adds the bags of each combination of our stuff we want together, except we use it as a metaoperator [⊎], which is the same as putting that operator between each item of $^stuff-we-want. In case you it’s new to you: the $^ twigil on $^stuff-we-want creates an implicit signature on our .grep block and we can name that variable anything we want.

And there we have it! The output of the program shows we can build any combination of stuff, except the one that contains all three items. I guess we just can’t have it all…

…But wait! There’s more!

Mixing it Up

Let’s look back at our recipe code. There’s something not quite perfect about it:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

What if a recipe calls for half a cup of milk instead of a whole one? How do we represent a quarter of a teaspoon of salt, if Bags can only ever have integer weights?

The answer to that is the Mix type (with the corresponding mutable version, MixHash). Unlike a Bag, a Mix supports all Real weights, including negative weights. Thus, our recipe is best modeled with a Mix.

my $recipe = Mix.new-from-pairs:  'egg'          => 2, 'cup of milk' => ½,
                                  'cup of flour' => ¾, 'salt'        => ¼;
say $recipe;
# mix(salt(0.25), cup of flour(0.75), egg(2), cup of milk(0.5))

Be sure to quote your keys and don’t use colonpair form (:42a, or :a(42)), since those are treated as named arguments. There’s also a mix routine, but it doesn’t take weights and functions just like bag routine, except returning a Mix. And, of course, you can use a .Mix coercer on a hash or a list of pairs.

Less-Than-Awesome creation aside, let’s make something with mixes! Say, you’re an alchemist. You want to make a bunch of awesome potions and you need to know the total amount of ingredients you’ll need. However, you realize that some of the ingredients needed by some reactions are actually produced as a byproduct by other reactions you’re making. So, what’s the most efficient amount of stuff you’ll need? Mixes to the rescue!

my %potions =
    immortality  => (:oxium(6.98), :morphics(123.3),  :unobtainium(2)   ).Mix,
    invisibility => (:forma(9.85), :rubidium(56.3),   :unobtainium(−0.3)).Mix,
    strength     => (:forma(9.15), :rubidium(−30.3),  :kuva(0.3)        ).Mix,
    speed        => (:forma(1.35), :nano-spores(1.3), :kuva(1.3)        ).Mix;

say [] %potions.values;
# OUTPUT: mix(unobtainium(1.7), nano-spores(1.3), morphics(123.3),
#              forma(20.35), oxium(6.98), rubidium(26), kuva(1.6))

For convenience, we set up a Hash, with keys being names of potions and values being Mixes with quantities of ingredients. For reactions that produce one of the ingredients we seek, we’ve used negative weights, indicating the amount produced.

Then, we used the same set addition operator we saw earlier, in it’s meta form: [⊎]. We supply it the .values of our Hash that are our Mixes, and it happily adds up all of our ingredients, which we see in the output.

Look at unobtainium and rubidium: the set operator correctly accounted for the quantities produced by reactions where those ingredients have negative weights!

With immortality potion successfully mixed, all we need to do now is figure out what to do for the next few millennia… How about coding some Perl 6?

Day 6 – Perl 6 Books — the Time is Ripe

One question we occasionally get on the phenomenal #perl6 IRC channel is about Perl 6 books. It turns out, some folks don’t want to learn just from tutorials, blog posts and docs. Last year, we didn’t have any good answers to that question.

A few months ago, there seemed to be a flurry of such questions, and at the same time, rumours about new book projects started to spread.

If I remember correctly, the first rumour was when Laurent Rosenfeld contacted me in June, asking for feedback on a manuscript. He had translated a good 200 pages of the Think Python book to Perl 6. This is primarily a book teaching programming to beginners. Later I was happy to hear that a major publisher has accepted the manuscript. The manuscript is now finished, and under review. So I guess we’ll see e-book and print versions of this book in due time.

Then brian d foy opened a kickstarter to raise funds for a “Learning Perl 6” book. By the time this is published, the kickstarter project should still be open, so I warmly encourage you to support this. Having a book by a prominent member of the Perl 5 community, and author of several good Perl books would surely be a boon for the Perl 6 community.

Before the publication of brian’s project, I’ve been mulling about writing my own Perl 6 book. In the past, I’ve tried that once already, and failed. But I had more success with another book project of mine, so I decided to give it another shot. The working title is Perl 6 by example. Content for the book is published in form of blog posts, and later turned into book chapters.

Later I learned that Ken Youens-Clark had written a book on metagenomics that spends about 150 pages explaining Perl 6. And it’s free, you can download the PDF, epub and mobi versions right away! If you’ve followed the advent calendar, you might have read Ken’s earlier post on Bioinformatics

In summary, we have one book starring Perl 6 content, one in progress with the manuscript in stage “feature complete”, one in the kick-starting phase which you can support, and a fourth being written right now, with parts being pre-published in the form of blog posts.

If you want to keep up-to-date with news on Perl 6 books, you can sign up for the low volume Perl 6 book mailing list (less than one mail per month).

I hope that in next year’s advent calendar, we’ll see four reviews of Perl 6 books.