Day 21 – Show me the data!

Over the years, I have enjoyed using the different data dumpers that Perl5 offers: from the basic Data::Dumper to modules that dump in hexadecimal or JSON, with colors, handling closures, with a GUI, as graphs via dot, and many others that fellow module developers have posted on CPAN (https://metacpan.org/search?q=data+dump&search_type=modules).

I always find things easier to understand when I can see the data and its relationships. The funkiest display belongs to ddd (https://www.gnu.org/software/ddd/), which I happen to fire up now and then just for fun (the example shows C data, but it works just as well with the Perl debugger).

[screenshot: ddd]

Many dumpers are geared towards data transformation and data transmission/storage. A few modules specialize in generating output for the end user to read; I have worked on systems that generated hundreds of thousands of lines of output, and at that scale it is close to impossible to read dumps generated by, say, Data::Dumper.

When I started using Perl6, I immediately felt the need to dump data structures (mainly because my noob code wasn’t doing what I expected it to do); this led me to port my Perl5 module (https://metacpan.org/pod/Data::TreeDumper, https://github.com/nkh/P6-Data-Dump-Tree) to Perl6. I am now also thinking about porting my HexDump module. I warmly recommend learning Perl6 by porting your modules (if you have any on CPAN): it’s fun, educative, and useful for the Perl6 community, and since your modules fill a need in a domain that you master, you are left free to concentrate on the Perl6 itself.

My Perl5 module was ripe for a re-write, and I wanted to see if and how it would be better written in Perl6. I was not disappointed.

Perl6 is a big language; it takes time to get the pieces right, and for a beginner it may seem daunting, even with years of experience. The secret is to take it easy, not give up, and listen. Porting a module is the perfect exercise: you can take it easy because you have already done it before, you’re not going to give up because you know you can do it, and you have time to listen to people who have more experience (they also need your work). The Perl6 community has been exemplary: helpful, patient, supportive, and always present. If you haven’t visited the #perl6 IRC channel yet, now is a good time.

.perl

Every object in Perl6 has a ‘perl’ method; it can be used to dump the object and the objects under it. The official documentation (https://docs.perl6.org/language/5to6-nutshell#Data%3A%3ADumper) provides a good example.
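For instance, a minimal example (the exact output formatting may vary between Rakudo versions):

my %h = a => 1, b => [1, 2, 3];
say %h.perl;    # {:a(1), :b($[1, 2, 3])}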

.gist

Every object also inherits a ‘gist’ method from Mu. The official documentation (https://docs.perl6.org/routine/gist#(Mu)_routine_gist) states: “Returns a string representation of the invocant, optimized for fast recognition by humans.”
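Continuing the example above (again, exact output may vary):

say %h.gist;    # {a => 1, b => [1 2 3]}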

dd, the micro dumper

It took me a while to discover this one; I saw it in a post on IRC. You know how it feels when you discover something simple after typing .perl and .gist a few hundred times, bahhh!

https://docs.perl6.org/routine/dd
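Note that dd is Rakudo-specific rather than part of the language specification. A quick sketch:

my @a = 1, 2, 3;
dd @a;    # Array @a = [1, 2, 3]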

The three dumpers above are built in. They are also the fastest way to dump data, but as welcome as their output is, I know that it is possible to present data in a more legible way.

Enter Data::Dump

You can find the module on https://modules.perl6.org/, where all the Perl6 modules are listed. Perl6 modules link to their repositories; the Data::Dump source is at https://github.com/tony-o/perl6-data-dump.

Data::Dump introduces color, depth limitation, and type-specific dumps. The code is a compact hundred lines and quite easy to understand. This module was quite helpful in a few cases that I had. It also dumps all the methods associated with objects. Unfortunately, it did fail on a few types of objects. Give it a try.
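Basic usage looks something like this (assuming Dump is the routine the module exports; check its README):

use Data::Dump;
my %h = a => 1, b => [1, 2, 3];
say Dump(%h);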

Data::Dump::Tree

Emboldened by the Perl6 community, the fact that I really needed a dumper for visualization, and the experience from my Perl5 module (mainly the things that I wanted done differently), I started working on the module. I had some difficulties at the beginning; I knew nothing about the details of Perl6, and even if there is a resemblance to Perl5, it’s another beast. But I love it: it’s advanced, clean, and well designed, and I am grateful for all the efforts that were invested in Perl6.

P6 vs P5 implementation

It’s less than half the size and does as much, which makes it clearer (as much as my newbie code can be considered clean). The old code was one monolithic module with a few long functions; the new code is better organised, and some functionality was split out into extra modules. It may sound like bit-rot (and it probably is, a little), but writing the new code in Perl6 made the changes possible: multi dispatch, traits, and other built-in mechanisms greatly facilitated the re-factoring.

What does it do that the other modules don’t?

I’ll only talk about a few points here and refer you to the documentation for all the details (https://raw.githubusercontent.com/nkh/P6-Data-Dump-Tree/master/lib/Data/Dump/Tree.pod); also have a look at the examples in the distribution.
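In its simplest form, usage looks something like this (ddt being the dumping routine; older versions of the module may have exported it under a different name, so check the documentation):

use Data::Dump::Tree;
my %h = a => 1, b => [1, 2, 3];
ddt %h;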

The main goal of Data::Dump::Tree is readability; that is achieved with filters, type-specific dumpers, colors, and dumper specialization via traits. In the examples directory, you can find JSON_parsed.pl, which parses 20 lines of JSON with JSON::Tiny (https://github.com/moritz/json). I’ll use it as an example below. The parsed data is dumped with .perl, .gist, Data::Dump, and Data::Dump::Tree.

.perl output (500 lines; unusable for any average human, though Gods can manage):

[screenshot]

.gist output (400 lines; quite readable, though the absence of color and the long lines limit readability a bit). Also note that it looks better here than on my terminal, which has problems handling Unicode properly:

[screenshot]

Data::Dump (4200 lines! Removing the methods would probably make it usable):

[screenshot]

The dump of the methods does not help:

[screenshot]

Data::Dump::Tree (100 lines; you are the judge of readability, as I am biased). Of course, Data::Dump::Tree is designed for this specific usage: first, it understands Match objects; second, it can display only the parts of the string that are matched, which greatly reduces the noise:
[screenshot]

Tweaking the output

The options are explained in the documentation, but here is a short list:
– defining type-specific dumpers
[screenshot]

– filtering to remove data or to add a representation for a data set; below, the data structure is dumped as-is and then filtered (with a filter that shows what it is doing).

Since filtering happens on the “header” and “footer”, it should be easy to make an HTML/DHTML plugin; although bcat (https://rtomayko.github.io/bcat/), when using ASCII glyphs, works fine.

[screenshot]
– setting the display colors
– changing the glyphs
– displaying address information or not
– using subscripts for indexes
– using ASCII, ANSI, or Unicode for the glyphs

Diffs

I tried to implement a diff display with the Perl5 module but failed miserably, as it needed architectural changes. The Perl6 version was much easier; in fact, it’s an add-on, a trait, that synchronizes two data dumps. This could be used in tests to show the differences between expected and received data.

[screenshot]
Of course, we can eliminate the extra glyphs and the data that is equivalent (I also changed the glyph type to ASCII):

[screenshot]

From here

Above anything else, I hope many authors will start writing Perl6 modules. I also hope to see other data-dumping modules. As for Data::Dump::Tree, as it gathers more users, I hope to get change requests, patches, and error reports.

Day 19 – Fixing Flow

Finding flow while coding is one of the joys of programming.

Encountering simple syntactic bugs, however, can sometimes interrupt flow. A single missing semicolon, for example, can result in a “WAT!?” followed by a “DOH!”

Perl 6 helps speed you around the code -> run -> fix cycle by identifying the cause and location of a bug and often suggesting a solution. Take this program:

say "hello"
say "world";

When you run it, Perl 6 will suggest what’s wrong, where the problem is, and what to do next …

===SORRY!=== Error while compiling /home/nige/hello-world.pl
Two terms in a row across lines (missing semicolon or comma?)
at /home/nige/hello-world.pl:6
------> say "hello"⏏

That helps to keep things flowing.

Normally, at this point, it’s off to your $EDITOR to manually add the semicolon and the cycle repeats.

What if Perl 6 could suggest a fix and apply it for you?

Introducing perl6fix

Here is the beginning of a command-line utility called perl6fix that takes the hint from Perl 6 and tries to speed up the code -> run -> fix cycle by applying the fix for you.

[screenshot: perl6fix fixing the hello-world program]

Let’s look at the code.

It needs a handle on bug descriptions found in Perl 6 output.

class Bug {
   has Int        $.line-number;
   has SourceFile $.source-file;
   has Str        $.description;
   has Str        $.pre-context;
}

And a way to describe fixes:

class Fix {

    has $.prompt;
    has $.pattern;
    has $.action;

    method apply ($bug) {
        $!action($bug);
    }
    method applies-to ($bug) {
        return True without $.pattern;
        return $bug.description ~~ $.pattern;
    }
}

And a way to update the source file:

class SourceFile is IO::Path {

    has @!content-lines = self.IO.slurp.lines;

    method append-to-first-matching-line ($append-char, $text-to-match) {
        # .first(..., :k) returns an index, and index 0 is falsy,
        # so test for definedness rather than truth
        my $first-matching-index = @!content-lines.first(rx/$text-to-match/, :k);
        return without $first-matching-index;
        @!content-lines[$first-matching-index] ~= $append-char;
        self.save;
    }
    method swap-characters-on-line ($from-chars, $to-chars, $line-number) {
        @!content-lines[$line-number - 1].subst-mutate(/$from-chars/, $to-chars);
        self.save;
    }
    method save {
        self.IO.spurt(@!content-lines.join("\n"));
    }
}

Here are just some of the fixes I encountered while writing this program:

my @fixes = (
    Fix.new(
        prompt  => 'add semicolon',
        pattern => rx/'missing semicolon or comma?'/,
        action  => sub ($bug) {
            $bug.source-file.append-to-first-matching-line(';', $bug.pre-context);
        }
    ),
    Fix.new(
         prompt  => 'add comma',
         pattern => rx/'missing semicolon or comma?'/,
         action  => sub ($bug) {
             $bug.source-file.append-to-first-matching-line(',', $bug.pre-context);
         }
    ),
    Fix.new(
         prompt  => 'convert . to ~',
         pattern => rx/'Unsupported use of . to concatenate strings; in Perl 6 please use ~'/,
         action  => sub ($bug) {
              $bug.source-file.swap-characters-on-line('.', '~', $bug.line-number);
         }
    ),
    Fix.new(
         prompt  => 'convert qr to rx',
         pattern => rx/'Unsupported use of qr for regex quoting; in Perl 6 please use rx'/,
         action  => sub ($bug) {
              $bug.source-file.swap-characters-on-line('qr', 'rx', $bug.line-number);
         }
    ),
    # ADD YOUR OWN FIXES HERE
);

There are many more potential fixes (I’m just getting started).

The perl6fix script wraps perl6, captures STDERR (if there is any), and then looks for a bug report in the output:

sub find-bug ($perl6-command) {

    return unless my $error-output = capture-stderr($perl6-command);

    # show the error
    note($error-output); 

    # re-run the command again - this time grabbing a JSON version of the bug
    # set RAKUDO_EXCEPTIONS_HANDLER env var to JSON 
    return unless my $error-as-json = capture-stderr('RAKUDO_EXCEPTIONS_HANDLER=JSON ' ~ $perl6-command);
    return unless my $bug-description = from-json($error-as-json);

    # just handle these exception types to start with 
    for 'X::Syntax::Confused', 'X::Obsolete' -> $bug-type {
        next unless my $bug = $bug-description{$bug-type}; 
        return Bug.new(
            description => $bug<message>,
            source-file => SourceFile.new($bug<filename>),
            line-number => $bug<line>,
            pre-context => $bug<pre>
        );
    }
}
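The capture-stderr routine is not shown in this excerpt; a minimal sketch of what it might look like (assuming the command string is trusted enough to hand to the shell):

sub capture-stderr (Str $command) returns Str {
    # run the command via the shell, capturing only STDERR
    my $proc = shell $command, :err;
    $proc.err.slurp(:close);
}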

The next step is to see if there are any fixes for this type of bug:

sub fix-bugs ($perl6-command-line) {

    my $bug = find-bug($perl6-command-line);
 
    unless $bug {
        say 'No bugs found to fix.'; 
        exit;
    }

    # find a potential list of fixes for this type of bug
    my @found-fixes = @fixes.grep(*.applies-to($bug));

    say $bug.description;
    say $bug.source-file.path ~ ' ' ~ $bug.line-number;
    say 'Suggested fixes found: ';

    my $fix-count = 0;

    for @found-fixes -> $fix {
        $fix-count++;
        my $option = ($fix-count == 1)
                   ?? "*[1] " ~ $fix.prompt
                   !! " [$fix-count] " ~ $fix.prompt;
        say ' ' ~ $option;
    }

    my $answer = prompt('apply fix [1]? ') || 1;
    my $fix = @found-fixes[$answer - 1];

    $fix.apply($bug);

    # look for more bugs! until we're done
    fix-bugs($perl6-command-line);
}

With the help of a shell alias, you can even run it to fix the previous perl6 command, like so:

[screenshot]

Just add an alias to your bash or zsh profile. For example:

alias fix='/home/nige/perl6/perl6fix prev $(fc -ln -1)'

Now it’s your turn

As you can see this is just a start.

You’re welcome to take the full script and evolve your own set of automatic Perl 6 fixes.

You could even adapt the script to apply fixes for Perl 5 or other languages. What about a version using grammars? Or a macro-powered version integrated directly into Perl 6?

Well maybe that last one is something to look forward to next Christmas!

Hope you have a happy Christmas and a flowing, Perl 6 powered 2017!

Day 18 – Asynchronous Workflow with Tinky

State Machines

The ideas of “state” and the “state machine” are so ubiquitous in software programming that we tend to use them all the time without even thinking about them very much: when we start modelling some physical process or business procedure, it is almost inevitable that we end up with a fixed number of states for some object or system and a fixed set of allowable transitions from one state to another.

I quite like the much-quoted description of a state machine used in the Erlang documentation:

If we are in state S and the event E occurs, we should perform the actions A and make a transition to the state S’.

Once we string a number of these together, we can consider them to form a workflow. Whole books, definition languages, and software systems have been built around this stuff.

Managing State

Often we find ourselves implementing state management in an application on an ad-hoc basis, defining the constraints on state transitions in code, often in a separate part of the application, and relying on the other parts of the application to do the right thing. I’m sure I’m not alone in having seen code in the wild that purports to implement a state machine but has proved to be entirely non-deterministic once new states or actions were added to the system. Or systems where a new action has to be performed on some object when it enters a new state, but the state can be entered in different ways in different parts of the application, so new code has to be implemented in a number of different places, with the inevitable consequence that one gets missed and it doesn’t work as defined.

Using a single source of state management with a consistent definition within an application can alleviate these kinds of problems and can actually make the design of the application simpler and clearer than it might otherwise have been.

Thus I was inspired to write Tinky, which is basically a state management system that allows you to create workflows in an application.

Tinky allows you to compose a number of states and the transitions between them into an application workflow, which can make the transitions between the states available as a set of Supplies: providing the required constraints on the allowed transitions while allowing you to implement the actions on those transitions in the place or manner appropriate to your application.

Simple Workflow

Perhaps the canonical example used for similar software is that of the bug tracking software, so let’s start with the simplest possible example with three states and two transitions between them:

use Tinky;

my $state-new  = Tinky::State.new(name => "new");
my $state-open = Tinky::State.new(name => "open");
my $state-done = Tinky::State.new(name => "done");

my @states = ( $state-new, $state-open, $state-done);

my $new-open   = Tinky::Transition.new(name => "open", from => $state-new, to => $state-open);
my $open-done  = Tinky::Transition.new(name => "done", from => $state-open, to => $state-done);

my @transitions = ( $new-open, $open-done);


my $workflow = Tinky::Workflow.new(name => "support", :@states, :@transitions, initial-state => $state-new);

This defines our three states, “new”, “open” and “done”, and two transitions between them: from “new” to “open” and from “open” to “done”. Together they define a “workflow” in which the state must pass through “open” before becoming “done”.

Obviously this doesn’t do very much without an object that can take part in the workflow, so Tinky provides a role, Tinky::Object, that can be applied to a class whose state you want to manage:

class Ticket does Tinky::Object {
    has Str $.ticket-number = (^100000).pick.fmt("%08d");
}

my $ticket = Ticket.new;

$ticket.apply-workflow($workflow);

say $ticket.state.name;          # new
$ticket.next-states>>.name.say;  # (open)

$ticket.state = $state-open;
say $ticket.state.name;          # open

The Tinky::Object role provides the accessors state and next-states on the object, the latter returning a list of the possible states that the object can be transitioned to (in this example there is only one, but there could be as many as your workflow definition allows). You’ll notice that the state of the object defaults to new, which is the state provided as initial-state to the Tinky::Workflow constructor.

The assignment to state is constrained by the workflow definition, so if in the above you were to do:

$ticket.state = $state-done;

This would result in the exception “No Transition for 'new' to 'done'”, and the state of the object would not be changed.

As a convenience, the workflow object defines a role which provides methods named for the transitions, and which is applied to the object when apply-workflow is called; thus the setting of the state in the above could be written as:

$ticket.open

However this feature has an additional subtlety (which I unashamedly stole from a JavaScript library): if there are two transitions with the same name, it will still create a single method, which will use the current state of the object to select which transition to apply. Typically you might do this where the to state is the same; so, for example, we can add a new state ‘rejected’ which can be entered from both ‘new’ and ‘open’:

use Tinky;

my $state-new      = Tinky::State.new(name => "new");
my $state-open     = Tinky::State.new(name => "open");
my $state-done     = Tinky::State.new(name => "done");
my $state-rejected = Tinky::State.new(name => "rejected");


my @states = ( $state-new, $state-open, $state-done, $state-rejected);

my $new-open   = Tinky::Transition.new(name => "open", from => $state-new, to => $state-open);
my $new-rejected   = Tinky::Transition.new(name => "reject", from => $state-new, to => $state-rejected);
my $open-done  = Tinky::Transition.new(name => "done", from => $state-open, to => $state-done);
my $open-rejected  = Tinky::Transition.new(name => "reject", from => $state-open, to => $state-rejected);

my @transitions = ( $new-open,$new-rejected, $open-done, $open-rejected);


my $workflow = Tinky::Workflow.new(name => "support", :@states, :@transitions, initial-state => $state-new);

class Ticket does Tinky::Object {
    has Str $.ticket-number = (^100000).pick.fmt("%08d");
}

my $ticket-one = Ticket.new;
$ticket-one.apply-workflow($workflow);
$ticket-one.next-states>>.name.say;
$ticket-one.reject;
say $ticket-one.state.name;

my $ticket-two = Ticket.new;
$ticket-two.apply-workflow($workflow);
$ticket-two.open;
$ticket-two.next-states>>.name.say;
$ticket-two.reject;
say $ticket-two.state.name;

You are not strictly limited to having the similarly named transitions enter the same state, but they must have different from states (otherwise the generated method wouldn’t know which transition to apply). Obviously, if the method is called on an object which is not in a state for which there are any transitions, an exception will be thrown.
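For instance, with the workflow above, this hypothetical misuse dies because there is no transition named ‘done’ leaving the initial state:

my $bad-ticket = Ticket.new;
$bad-ticket.apply-workflow($workflow);
$bad-ticket.done;   # throws: no 'done' transition from state 'new'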

So what about this asynchronous thing

All of this might be somewhat useful if we are merely concerned with constraining the sequence of states an object might pass through, but typically we want to perform some action upon transition from one state to another (and this is explicitly stated in the definition above). In our ticketing example, for instance, we might want to send a notification, recalculate resource scheduling, or make a branch in a version control system.

Tinky provides for state transition actions by means of a set of Supplies on the states and transitions, to which the object that underwent the transition is emitted. These “events” are emitted on the state that is being left, the state that is being entered, and the actual transition that was performed. The supplies are conveniently aggregated at the workflow level.

So if, in the example above, we wanted to log every state transition of a ticket and additionally send a message when the ticket enters the “open” state, we can simply tap the appropriate Supply to perform these actions:

use Tinky;

my $state-new      = Tinky::State.new(name => "new");
my $state-open     = Tinky::State.new(name => "open");
my $state-done     = Tinky::State.new(name => "done");
my $state-rejected = Tinky::State.new(name => "rejected");


my @states = ( $state-new, $state-open, $state-done, $state-rejected);

my $new-open   = Tinky::Transition.new(name => "open", from => $state-new, to => $state-open);
my $new-rejected   = Tinky::Transition.new(name => "reject", from => $state-new, to => $state-rejected);
my $open-done  = Tinky::Transition.new(name => "done", from => $state-open, to => $state-done);
my $open-rejected  = Tinky::Transition.new(name => "reject", from => $state-open, to => $state-rejected);

my @transitions = ( $new-open,$new-rejected, $open-done, $open-rejected);


my $workflow = Tinky::Workflow.new(name => "support", :@states, :@transitions, initial-state => $state-new);

# Make the required actions

$workflow.transition-supply.tap(-> ($trans, $object) { say "Ticket '{ $object.ticket-number }' went from '{ $trans.from.name }' to '{ $trans.to.name }'" });
$state-open.enter-supply.tap(-> $object { say "Ticket '{ $object.ticket-number }' is opened, sending email" });

class Ticket does Tinky::Object {
    has Str $.ticket-number = (^100000).pick.fmt("%08d");
}

my $ticket-one = Ticket.new;
$ticket-one.apply-workflow($workflow);
$ticket-one.next-states>>.name.say;
$ticket-one.reject;
say $ticket-one.state.name;

my $ticket-two = Ticket.new;
$ticket-two.apply-workflow($workflow);
$ticket-two.open;
$ticket-two.next-states>>.name.say;
$ticket-two.reject;
say $ticket-two.state.name;

This will give output something like:

[open rejected]
Ticket '00015475' went from 'new' to 'rejected'
rejected
Ticket '00053735' is opened, sending email
Ticket '00053735' went from 'new' to 'open'
[done rejected]
Ticket '00053735' went from 'open' to 'rejected'
rejected

The beauty of this kind of arrangement, for me at least, is that the actions can be defined at the most appropriate place in the code rather than all in one place, and can also be added and removed at run time if required. It also works nicely with other sources of asynchronous events in Perl 6, such as timers, signals, or file system notifications.

Defining a Machine

Defining a large set of states and transitions in code like the above could prove somewhat tiresome and error-prone, so you could choose to build it from some configuration file or from a database. For convenience, though, I have recently released Tinky::JSON, which allows you to define all of your states and transitions in a single JSON document.

The above example would then become something like:

use Tinky;
use Tinky::JSON;

my $json = q:to/JSON/;
{
    "states" : [ "new", "open", "done", "rejected" ],
    "transitions" : [
        {
            "name" : "open",
            "from" : "new",
            "to"   : "open"
        },
        {
            "name" : "done",
            "from" : "open",
            "to"   : "done"
        },
        {
            "name" : "reject",
            "from" : "new",
            "to"   : "rejected"
        },
        {
            "name" : "reject",
            "from" : "open",
            "to"   : "rejected"
        }
    ],
    "initial-state" : "new"
}
JSON

my $workflow = Tinky::JSON::Workflow.from-json($json);

$workflow.transition-supply.tap(-> ($trans, $object) { say "Ticket '{ $object.ticket-number }' went from '{ $trans.from.name }' to '{ $trans.to.name }'" });
$workflow.enter-supply("open").tap(-> $object { say "Ticket '{ $object.ticket-number }' is opened, sending email" });

class Ticket does Tinky::Object {
    has Str $.ticket-number = (^100000).pick.fmt("%08d");
}

my $ticket-one = Ticket.new;
$ticket-one.apply-workflow($workflow);
$ticket-one.next-states>>.name.say;
$ticket-one.reject;
say $ticket-one.state.name;

my $ticket-two = Ticket.new;
$ticket-two.apply-workflow($workflow);
$ticket-two.open;
$ticket-two.next-states>>.name.say;
$ticket-two.reject;
say $ticket-two.state.name;

As well as providing the means of constructing the workflow object from a JSON description, it adds methods for accessing the states and transitions and their respective supplies by name, rather than needing the objects themselves to hand, which may be more convenient in your application. I’m still working out how to provide the definition of actions in a similarly convenient declarative way.

It would probably be easy to make something similar that can obtain the definition from an XML file (probably using XML::Class), so let me know if you might find that useful.

Making something useful

My prime driver for making Tinky in the first place was a still-in-progress online radio management application, which could potentially have several different workflows for different types of objects: the media for streaming may need to be uploaded, possibly encoded to a streamable format, have silence detection performed, and have its metadata normalised and so forth before it is usable in a show; the shows themselves need to have either media added or a live streaming source configured, then be scheduled at the appropriate time and possibly also recorded (with the recording fed back into the media workflow). All of this might be a little too complex for a short example, but an example that ships with Tinky::JSON is inspired by the media portion of this and was actually made in response to something someone was asking about on IRC a while ago.

The basic idea is that a process waits for WAV files to appear in some directory and then copies them to another directory where they are encoded (in this case to FLAC). The nice thing about using the workflow model for this is that the code is kept quite compact and clear, since failure conditions can be handled locally to the action for that step in the process, so deeply nested conditions or early returns are avoided; and because it all happens asynchronously, it makes the best use of the processor time.

So the workflow is described in JSON as:

{
    "states" : [ "new", "ready", "copied", "done", "failed", "rejected" ],
    "transitions" : [
        {
            "name" : "ready",
            "from" : "new",
            "to"   : "ready"
        },
        {
            "name" : "reject",
            "from" : "new",
            "to"   : "rejected"
        },
        {
            "name" : "copied",
            "from" : "ready",
            "to"   : "copied"
        },
        {
            "name" : "fail",
            "from" : "ready",
            "to"   : "failed"
        },
        {
            "name" : "done",
            "from" : "copied",
            "to"   : "done"
        },
        {
            "name" : "fail",
            "from" : "copied",
            "to"   : "failed"
        }
    ],
    "initial-state" : "new"
}

This defines our six states and the transitions between them. The “rejected” state is entered if the file has been seen before (from state “new”), and the “failed” state may be entered if there was a problem with either the copying or the encoding.

The program expects this to be in a file called “encoder.json” in the same directory as the program itself.

This example uses the ‘flac’ encoder but you could alter this to something else if you want.

use Tinky;
use Tinky::JSON;
use File::Which;


class ProcessFile does Tinky::Object {
    has Str $.path      is required;
    has Str $.out-dir   is required;
    has Str $.new-path;
    has Str $.flac-file;
    has     @.errors;
    method new-path() returns Str {
        $!new-path //= $!out-dir.IO.child($!path.IO.basename).Str;
    }
    method flac-file() returns Str {
        $!flac-file //= self.new-path.subst(/\.wav$/, '.flac');
        $!flac-file;
    }
}


multi sub MAIN($dir, Str :$out-dir = '/tmp/flac') {
    my ProcessFile @process-files;

    my $json = $*PROGRAM.parent.child('encoder.json').slurp;
    my $workflow = Tinky::JSON::Workflow.from-json($json);

    my $flac = which('flac') or die "no flac encoder";
    my $cp   = which('cp');

    my $watch-supply = IO::Notification.watch-path($dir).grep({ $_.path ~~ /\.wav$/ }).unique(as => { $_.path }, expires => 5);

    say "Watching '$dir'";

    react {
        whenever $watch-supply -> $change {
            my $pf = ProcessFile.new(path => $change.path, :$out-dir);
            say "Processing '{ $pf.path }'";
            $pf.apply-workflow($workflow);
        }
        whenever $workflow.applied-supply() -> $pf {
            if @process-files.grep({ $_.path eq $pf.path }) {
                $*ERR.say: "** Already processing '", $pf.path, "' **";
                $pf.reject;
            }
            else {
                @process-files.append: $pf;
                $pf.ready;
            }
        }
        whenever $workflow.enter-supply('ready') -> $pf {
            my $copy = Proc::Async.new($cp, $pf.path, $pf.new-path, :r);
            whenever $copy.stderr -> $error {
                $pf.errors.append: $error.chomp;
            }
            whenever $copy.start -> $proc {
                if $proc.exitcode {
                    $pf.fail;
                }
                else {
                    $pf.copied;
                }
            }
        }
        whenever $workflow.enter-supply('copied') -> $pf {
            my $encode = Proc::Async.new($flac,'-s',$pf.new-path, :r);
            whenever $encode.stderr -> $error {
                $pf.errors.append: $error.chomp;
            }
            whenever $encode.start -> $proc {
                if $proc.exitcode {
                    $pf.fail;
                }
                else {
                    $pf.done;
                }
            }
        }
        whenever $workflow.enter-supply('done') -> $pf {
            say "File '{ $pf.path }' has been processed to '{ $pf.flac-file }'";
        }
        whenever $workflow.enter-supply('failed') -> $pf {
            say "Processing of file '{ $pf.path }' failed with '{ $pf.errors }'";
        }
        whenever $workflow.transition-supply -> ($trans, $pf ) {
            $*ERR.say("File '{ $pf.path }' went from '{ $trans.from.name }' to '{ $trans.to.name }'");
        }
    }
}

If you start this with an argument of the directory where you want to pick up the files from, it will wait until new files appear, then create a new ProcessFile object and apply the workflow to it. Every object to which the workflow is applied is sent to the applied-supply, which is tapped to check whether the file has already been processed: if it has (and this can happen because the directory watch may emit more than one event for the creation of a file), the object is moved to state ‘rejected’ and no further processing happens; otherwise it is moved to state ‘ready’, whereupon it is copied and, if the copy succeeds, encoded.

Additional states (and transitions to enter them) could easily be added to, for instance, store the details of the encoded file in a database, or even start playing it; new actions could also be added for existing states by adding additional “whenever” blocks. As it stands this will block forever waiting for new files; however, it could be integrated into a larger program by starting it in a new thread, for instance.

The program and the JSON file are in the examples directory for Tinky::JSON, please feel free to grab and tinker with them.

Not quite all

Tinky has a fair bit more functionality than I have space to describe here: there are facilities for run-time validation of transition application, and additional supplies that are emitted to at various stages of the workflow lifecycle. Hopefully your interest is sufficiently piqued that you might look at the documentation.

I am considering adding a cookbook-style document for the module, covering some common patterns that might arise in programs that use it. If you have any ideas or questions, please feel free to drop me a note.

Finally, I chose a deliberately un-descriptive name for the module, as I didn’t want to claim that this would be the last word in the problem space. There are probably many more ways that a state-managed workflow could be implemented nicely in Perl 6. I would be equally delighted if you totally disagree with my approach and release your own design as I would be if you decide to use Tinky.

Tinky Winky is the purple Teletubby with a red bag.

Day 17 – Testing in virtual time

Over the last month, most of my work time has been spent building a proof of concept for a project that I’ll serve as architect for next year. When doing software design, I find spikes (time-boxed explorations of problems) and rapid prototyping really useful ways to gain knowledge of new problem spaces that I will need to work in. Finding myself with a month to do this before the “real” start of the project has been highly valuable. Thanks to being under NDA, I can’t say much about the problem domain itself. I can, however, say that it involves a reasonable amount of concurrency: juggling different tasks that overlap in time.

Perl’s “whipuptitude” – the ability to quickly put something together – is fairly well known. Figuring that Perl 6’s various built-in concurrency constructs would allow me to whip up concurrent things rapidly, I decided to build the proof of concept in Perl 6. I’m happy to report that the bet paid off pretty well, and by now the proof of concept has covered all of the areas I hoped to explore – and some further ones that turned out to matter but that weren’t obvious at the start.

To me, building rapid prototypes explicitly does not mean writing crappy code. For sure, simplifications and assumptions of things not critical to the problem space are important. But my prototype code was both well tested and well structured. Why? Because part of rapid prototyping is being able to evolve the prototype quickly. That means being able to refactor rapidly. Decent quality, well-structured, well-tested code is important to that. In the end, I had ~2,500 lines of code covered by ~3,500 lines of tests.

So, I’ve spent a lot of time testing concurrent code. That went pretty well, and I was able to make good use of Test::Mock in order to mock components that return a Promise or a Supply. The fact that Perl 6 has, from the initial language release, had ways to express asynchronous values (Promise) or asynchronous streams of values (Supply) is in itself a win for testing. Concurrent APIs expressed via these standard data structures are easy to fake, since you can put anything you want behind a Promise or a Supply.

My work project didn’t involve a huge amount of dealing with time, but in the odd place it did, and I realized that testing this code effectively would be a challenge. That gave me the idea of writing about testing time-based code for this year’s Perl 6 advent, which in turn gave me the final nudge I needed to write a module that’s been on my todo list all year. Using it, testing things involving time can be a lot more pleasant.

Today’s example: a failover mechanism

Timeouts are one of the most obvious and familiar places that time comes up in fairly “everyday” code. To that end, let’s build a simple failover mechanism. It should be used as follows:

my $failover-source = failover($source-a, $source-b, $timeout);

Where:

  • $source-a is a Supply
  • $source-b is a Supply
  • $timeout is a timeout in seconds (any Real number)
  • The result, assigned to $failover-source, is also a Supply

And it should function as follows:

  • The Supply passed as $source-a is immediately tapped (which means it is requested to do whatever is needed to start producing values)
  • If it produces its first value within $timeout seconds, then we simply emit every value it produces to the result Supply and ignore $source-b
  • Otherwise, after $timeout seconds, we also tap $source-b
  • Whichever source then produces a value first is the one that we “latch” on to; any results from the other should be discarded

Consider, for example, that $source-a and $source-b are supplies that, when tapped, will send the same query to two different servers, which will stream back results over time. Normally we expect the first result within a couple of seconds. However, if the server queried by $source-a is overloaded or has other issues, then we’d like to try using the other one, $source-b, to see if it can produce results faster. It’s a race, but where A gets a head start.

Stubbing stuff in

So, in a Failover.pm6, let’s stub in the failover sub as follows:

sub failover(Supply $source-a, Supply $source-b, Real $timeout --> Supply) is export {
    supply {
    }
}

A t/failover.t then starts off as:

use Failover;
use Test;

# Tests will go here

done-testing;

And we’re ready to dig in to the fun stuff.

The first test

The simplest possible case for failover is when $source-a produces its first value in time. In this case, $source-b should be ignored totally. Here’s a test case for that:

subtest 'When first Supply produces a value in time, second not used', {
    my $test-source-a = supply {
        whenever Promise.in(1) {
            emit 'a 1';
        }
        whenever Promise.in(3) {
            emit 'a 2';
        }
    }
    my $test-source-b = supply {
        die "Should never be used";
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 2);
    my $output = $failover-supply.Channel;

    is $output.receive, 'a 1', 'Received first value from source A';
    is $output.receive, 'a 2', 'Received second value from source A';
}

Here, we set up $test-source-a as a Supply that, when tapped, will emit a 1 after 1 second, and a 2 after 3 seconds. If $test-source-b is ever tapped it will die. We expect that if this wrongly happens, it will be at the 2 second mark, which is why a 2 is set to be emitted after 3 seconds. We then obtain a Channel from the resulting $failover-supply, which we can use to pull values from at will and check we got the right things. (On coercing a Supply to a Channel, the Supply is tapped, starting the flow of values, and each result value is fed into the Channel. Both completion and errors are also conveyed.)
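As a quick illustration of that coercion, independent of the failover code:

my $s = supply { emit 'hi'; emit 'bye' };
my $c = $s.Channel;    # taps $s, feeding its values into a Channel
say $c.receive;        # hi
say $c.receive;        # bye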

Making it pass

There are a couple of ways that we might make this test pass. The absolute easiest one would be:

sub failover(Supply $source-a, Supply $source-b, Real $timeout) is export {
    return $source-a;
}

Which feels like cheating, but in TDD the code that passes the first test case almost always does. (It sometimes feels pointless to write said tests. More than once, they’ve ended up saving me when – while making a hard thing work – I ended up breaking the trivial thing.)

An equivalent, more forward-looking solution would be:

sub failover(Supply $source-a, Supply $source-b, Real $timeout) is export {
    supply {
        whenever $source-a {
            .emit;
        }
    }
}

Which is the identity operator on a Supply (just spit out everything you get). For those not familiar with supplies, it’s worth noting that this supply block does 3 useful things for you for free:

  • Passes along errors from $source-a
  • Passes along completion from $source-a
  • Closes the tap on $source-a – thus freeing up resources – if the tap on the supply we’re defining here is closed

Subscription management and error management are two common places for errors in asynchronous code; the supply/whenever syntax tries to do the Right Thing for you on both fronts. It’s more than just a bit of tinsel on the Christmas callback.

When the timeout…times out

So, time for a more interesting test case. This one covers the case where $source-a fails to produce a value by the timeout. Then, $source-b produces a value within 1 second of being tapped, meaning its value should be relayed. We also want to ensure that even if $test-source-a were to emit a value a little later on, we’d disregard it. Here’s the test:

subtest 'When timeout reached, second Supply is used instead if it produces value first', {
    my $test-source-a = supply {
        whenever Promise.in(4) {
            emit 'a 1';
        }
    }
    my $test-source-b = supply {
        whenever Promise.in(1) { # start time + 2 (timeout) + 1
            emit 'b 1';
        }
        whenever Promise.in(3) { # start time + 2 (timeout) + 3
            emit 'b 2';
        }
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 2);
    my $output = $failover-supply.Channel;

    is $output.receive, 'b 1', 'Received first value from source B';
    is $output.receive, 'b 2', 'Received second value from source B';
}

We expect ‘a 1’ to be ignored, because we chose $source-b. So, how can we make this pass? Here goes:

sub failover(Supply $source-a, Supply $source-b, Real $timeout --> Supply) is export {
    supply {
        my $emitted-value = False;

        whenever $source-a {
            $emitted-value = True;
            .emit;
        }

        whenever Promise.in($timeout) {
            unless $emitted-value {
                whenever $source-b {
                    .emit;
                }
            }
        }
    }
}

Will this pass the test? Both subtests?

Think about it…

Well, no, it won’t. Why? Because it doesn’t do anything about disregarding $source-a after it has started spitting out values from $source-b. It needs to commit to one or the other. Didn’t spot that? Good job we have tests! So, here’s a more complete solution that makes both subtests pass:

sub failover(Supply $source-a, Supply $source-b, Real $timeout --> Supply) is export {
    supply {
        my enum Committed <None A B>;
        my $committed = None;

        whenever $source-a -> $value {
            given $committed {
                when None {
                    $committed = A;
                    emit $value;
                }
                when A {
                    emit $value;
                }
            }
        }

        whenever Promise.in($timeout) {
            if $committed == None {
                whenever $source-b -> $value {
                    $committed = B;
                    emit $value;
                }
            }
        }
    }
}

So tired of waiting

You’d think I’d be happy with this progress. Two passing test cases. Surely the end is in sight! Alas, development is getting…tedious. Yes, after just two test cases. Why? Here’s why:

$ time perl6-m -Ilib t/failover-bad.t
    ok 1 - Received first value from source A
    ok 2 - Received second value from source A
    1..2
ok 1 - When first Supply produces a value in time, second not used
    ok 1 - Received first value from source B
    ok 2 - Received second value from source B
    1..2
ok 2 - When timeout reached, second Supply is used instead if it produces value first
1..2

real    0m8.694s
user    0m0.600s
sys     0m0.072s

Every time I run my tests I’m waiting around 9 seconds now. And when I add more tests? Even longer! Now imagine I was going to write a whole suite of these failover and timeout routines, as a nice module. Or I was testing timeouts in a sizable app and would have dozens, even hundreds, of such tests.

Ouch.

Maybe, though, I could just make the timeouts smaller. Yes, that’ll do it! Here is how the second test looks now, for example:

subtest 'When timeout reached, second Supply is used instead if it produces value first', {
    my $test-source-a = supply {
        whenever Promise.in(0.04) {
            emit 'a 1';
        }
    }
    my $test-source-b = supply {
        whenever Promise.in(0.01) { # start time + 0.02 (timeout) + 0.01
            emit 'b 1';
        }
        whenever Promise.in(0.03) { # start time + 0.02 (timeout) + 0.03
            emit 'b 2';
        }
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 0.02);
    my $output = $failover-supply.Channel;

    is $output.receive, 'b 1', 'Received first value from source B';
    is $output.receive, 'b 2', 'Received second value from source B';
}

You want it faster? Divide by 100! Job done.

Of course, anybody who has actually done this knows precisely what comes next. The first 3 times I ran my tests after this change, all was well. But guess what happened on the fourth time?

ok 1 - When first Supply produces a value in time, second not used
    not ok 1 - Received first value from source B

    # Failed test 'Received first value from source B'
    # at t/failover-short.t line 41
    # expected: 'b 1'
    #      got: 'a 1'
    not ok 2 - Received second value from source B

    # Failed test 'Received second value from source B'
    # at t/failover-short.t line 42
    # expected: 'b 2'
    #      got: 'b 1'
    1..2
    # Looks like you failed 2 tests of 2
not ok 2 - When timeout reached, second Supply is used instead if it produces value first

Mysteriously…it failed. Why? Bad luck. My computer is a busy little machine. It can’t just give my test programs all the CPU all of the time. It needs to decode that music I’m listening to, check if I need to install the 10th set of security updates so far this month, and cope with my web browser wanting to do stuff because somebody tweeted something or emailed me. And so, once in a while, just after the clock hits 0.01 seconds and a thread grabs the whenever block to work on, that thread will be dragged off the CPU. Then, before it can get back on again, the one set to run at 0.04 seconds gets to go, and spits out its value first.

Sufficiently large times mean slow tests. Smaller values mean unreliable tests. Heck, suspend the computer in the middle of running the test suite and even a couple of seconds is too short for reliable tests.

Stop! Virtual time!

This is why I wrote Test::Scheduler. It’s an implementation of the Perl 6 Scheduler role that virtualizes time. Let’s go back to our test code and see if we can do better. First, I’ll import the module:

use Test::Scheduler;

Here’s the first test, modified to use Test::Scheduler:

subtest 'When first Supply produces a value in time, second not used', {
    my $*SCHEDULER = Test::Scheduler.new;

    my $test-source-a = supply {
        whenever Promise.in(1) {
            emit 'a 1';
        }
        whenever Promise.in(3) {
            emit 'a 2';
        }
    }
    my $test-source-b = supply {
        die "Should never be used";
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 2);
    my $output = $failover-supply.Channel;

    $*SCHEDULER.advance-by(3);
    is $output.receive, 'a 1', 'Received first value from source A';
    is $output.receive, 'a 2', 'Received second value from source A';
}

Perhaps the most striking thing is how much hasn’t changed. The changes amount to two additions:

  1. The creation of a Test::Scheduler instance and the assignment to the $*SCHEDULER variable. This dynamic variable is used to specify the current scheduler to use, and overriding it allows us to swap in a different one, much like you can declare a $*OUT to do stuff like capturing I/O.
  2. A line to advance the test scheduler by 3 seconds prior to the two assertions.
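To see the virtual clock in isolation, here is a minimal illustration of the idea (not part of the test suite above):

my $*SCHEDULER = Test::Scheduler.new;
my $p = Promise.in(60);       # would normally take a minute of real time
$*SCHEDULER.advance-by(60);   # advance virtual time instead
await $p;
say 'kept without waiting';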

The changes for the second test are very similar:

subtest 'When timeout reached, second Supply is used instead if it produces value first', {
    my $*SCHEDULER = Test::Scheduler.new;

    my $test-source-a = supply {
        whenever Promise.in(4) {
            emit 'a 1';
        }
    }
    my $test-source-b = supply {
        whenever Promise.in(1) { # start time + 2 (timeout) + 1
            emit 'b 1';
        }
        whenever Promise.in(3) { # start time + 2 (timeout) + 3
            emit 'b 2';
        }
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 2);
    my $output = $failover-supply.Channel;

    $*SCHEDULER.advance-by(6);
    is $output.receive, 'b 1', 'Received first value from source B';
    is $output.receive, 'b 2', 'Received second value from source B';
}

And what difference does this make to the runtime of my tests? Here we go:

$ time perl6-m -Ilib t/failover-good.t
    ok 1 - Received first value from source A
    ok 2 - Received second value from source A
    1..2
ok 1 - When first Supply produces a value in time, second not used
    ok 1 - Received first value from source B
    ok 2 - Received second value from source B
    1..2
ok 2 - When timeout reached, second Supply is used instead if it produces value first
1..2

real    0m0.679s
user    0m0.628s
sys     0m0.060s

From 9 seconds to sub-second – and much of that will be fixed overhead rather than the time running the tests.

One more test

Let’s deal with the last of the requirements, just to round off the test writing and get to a more complete solution to the original problem. The remaining test we need is for the case where the timeout is reached, and we tap $source-b. Then, before it can produce a value, $source-a emits its first value. Therefore, we should latch on to $source-a.

subtest 'When timeout reached, and second Supply tapped, first value still wins', {
    my $*SCHEDULER = Test::Scheduler.new;

    my $test-source-a = supply {
        whenever Promise.in(3) {
            emit 'a 1';
        }
        whenever Promise.in(4) {
            emit 'a 2';
        }
    }
    my $test-source-b = supply {
        whenever Promise.in(2) { # start time + 2 (timeout) + 2
            emit 'b 1';
        }
    }
    my $failover-supply = failover($test-source-a, $test-source-b, 2);
    my $output = $failover-supply.Channel;

    $*SCHEDULER.advance-by(4);
    is $output.receive, 'a 1', 'Received first value from source A';
    is $output.receive, 'a 2', 'Received second value from source A';
}

This fails, because the latch logic wasn’t included inside of the whenever block that subscribes to $source-b. Here’s the easy fix for that:

sub failover(Supply $source-a, Supply $source-b, Real $timeout --> Supply) is export {
    supply {
        my enum Committed <None A B>;
        my $committed = None;

        whenever $source-a -> $value {
            given $committed {
                when None {
                    $committed = A;
                    emit $value;
                }
                when A {
                    emit $value;
                }
            }
        }

        whenever Promise.in($timeout) {
            if $committed == None {
                whenever $source-b -> $value {
                    given $committed {
                        when None {
                            $committed = B;
                            emit $value;
                        }
                        when B {
                            emit $value;
                        }
                    }
                }
            }
        }
    }
}

The easy thing is just a little bit repetitive, however. It would be nice to factor out the commonality into a sub. Here goes:

sub failover(Supply $source-a, Supply $source-b, Real $timeout --> Supply) is export {
    supply {
        my enum Committed <None A B>;
        my $committed = None;

        sub latch($onto) {
            given $committed {
                when None {
                    $committed = $onto;
                    True
                }
                when $onto {
                    True
                }
            }
        }

        whenever $source-a -> $value {
            emit $value if latch(A);
        }

        whenever Promise.in($timeout) {
            if $committed == None {
                whenever $source-b -> $value {
                    emit $value if latch(B);
                }
            }
        }
    }
}

And in under a second, the tests can now assure us that this was indeed a successful refactor. Note that this does not yet cancel a discarded request, perhaps saving duplicate work. I’ll leave that as an exercise for the reader.

Safety and realism

One thing you might wonder about here is whether this code is really thread safe. The default Perl 6 scheduler will schedule code across a bunch of threads. What if $source-a and $source-b emit their first value almost simultaneously?

The answer is that supply (and react) blocks promise Actor-like semantics, processing just one message at a time. So, if we’re inside of the whenever block for $source-a, and $source-b emits a message on another thread, then it will be queued up for processing afterwards.

One interesting question that follows on from this is whether the test scheduler somehow serializes everything onto the test thread in order to do its job. The answer is that no, it doesn’t do that. It wraps the default ThreadPoolScheduler and always delegates to it to actually run code. This means that, just as with the real scheduler, the code will run across multiple threads and on the thread pool. This helps to avoid a couple of problems. Firstly, it means that testing code that relies on having real threads (by doing stuff that really blocks a thread) is possible. Secondly, it means that Test::Scheduler is less likely to hide real data race bugs that may exist in the code under test.

Of course, it’s important to keep in mind that virtual time is still very much a simulation of real time. It doesn’t account for the fact that running code takes time; virtual time stands still while code runs. At the same time, it goes to some amount of effort to get the right sequencing when a time-based event triggered in virtual time leads to additional time-based events being scheduled. For example, imagine we schedule E1 in 2s and E2 in 4s, and then advance the virtual time by 4s. If the triggering of E1 schedules E3 in 1s (so, 3s relative to the start point), we need to have it happen before E2. To have this work means trying to identify when all the consequences of E1 have been shaken out before proceeding (which is easier said than done). Doing this will, however, prevent some possible overlaps that could take place in real time.

In summary…

Unit tests for code involving time can easily end up being slow and/or unreliable. However, if we can virtualize time, it’s possible to write tests that are both fast and reliable – as good unit tests should be. The Test::Scheduler module provides a way to do this in Perl 6. At the same time, virtual time is not a simulation of the real thing. The usual rules apply: a good unit test suite will get you far, but don’t forget to have some integration tests too!

Day 15 – Parsing Globs using Grammars

UPDATE 12/21/16: After some discussions here and at reddit, I realized I’d made a couple important mistakes. Sorry about that. I have corrected GlobAction and a bug in the glob helper subroutine. I also replaced the use of “AST” with “parse tree” below as well as that is more technically correct.

If you are at all familiar with Perl, you are likely to be familiar with regexes. In Perl 6, regexes have been extended to provide features and syntax allowing complex parsing tasks to be simplified and grouped together into grammars. To give a taste of this, let’s consider how to parse file globs. It’s simple, it’s well-defined, and I have some practice writing parsers for the task.

Introducing File Globs

In case you are not familiar with them, a file glob is a sort of pattern match for file names. For example, consider this glob:

*.txt

This will match any file ending in “.txt”. The asterisk (“*”) means “match as many characters as needed to make the match”. This matches “foo.txt” and “blah.blah.txt”, but not “z.html”.

Here’s another glob to consider:

.??*

This will match any file that starts with a period (“.”) followed by 2 or more characters. That means it will match a file named “.zshrc”, but not one named “..”, making it a popular heuristic for finding “hidden” files on a Unix file system.

Basic Grammar

What does a basic glob parsing grammar look like? Let me show a simple one and then I will explain the interesting bits:

grammar Glob {
    token TOP { <term>+ }
    token term {
        || <match>
        || <char>
    }
    token match { <match-any> | <match-one> }
    token match-any { '*' }
    token match-one { '?' }
    token char { . }
}

This is just about the simplest possible grammar for a glob that covers the two most popular forms. Let’s consider the parts.

A grammar is just a sort of class. Grammars may contain rules and tokens. Rules and tokens are like methods, but are defined as regexes and are used to match the input to the grammar. Here we define a grammar named "Glob" to parse our glob.

Every grammar may define a special rule or token named “TOP”. This becomes the default regex to use when matching input with that grammar. Without it, any attempt to parse with the grammar must name the rule or token to begin with.
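For example, with the grammar above you could start the match at a different token by passing the :rule named argument to parse:

say Glob.parse('*', :rule<match-any>);    # matches, printing 「*」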

Aside from that each rule and token defines a regex to use when matching the input. For example, the match-any token here will match only an asterisk (“*”). Similarly, the char token will match any single character.

To be useful as a grammar, rules and tokens refer to one another using angle brackets ("<>"). Here, TOP contains one or more term matches, and term matches either a match or a char (using || to try each alternative in the order given).

Whitespace Handling

Why do I only use the token keyword and never make use of rule? Because of whitespace. A rule inserts some implicit whitespace handling, which is helpful in keeping parsers readable. However, this handling is not helpful when parsing file globs.

Consider the following token and rule (both named ab here for illustration):

token ab { 'a' 'b'+ }
rule  ab { 'a' 'b'+ }

The token will match “abbbbb” and “ab” but not “a b” or “a bbbb”. The rule, however, will not match “abbbbb” or “ab” but will match “a b” and “a bbbb”. This is the whitespace handling at work. The rule above is implicitly similar to this token:

token ab { 'a' <.ws> 'b'+ <.ws> }

That ws is a predefined token available in all grammars. The built-in definition implies there is a word-break and zero or more spaces, newlines, or other whitespace.

Depending on what works better for you, you may redefine ws if you prefer something different:

token ws { "\n" } # whitespace is always newlines

Or you may just use tokens to avoid the implicit whitespace handling. This latter solution is what I believe makes the most sense for this grammar.
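You can check the difference for yourself with a couple of throwaway grammars (the names here are mine):

grammar AB-Token { token TOP { 'a' 'b'+ } }
grammar AB-Rule  { rule  TOP { 'a' 'b'+ } }

say so AB-Token.parse('abbb');     # True
say so AB-Token.parse('a bbb');    # False
say so AB-Rule.parse('abbb');      # False
say so AB-Rule.parse('a bbb');     # True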

Parsing Input

Now that we have our basic grammar, how do we parse the input? Here’s a simple example:

my $ast = Glob.parse("*.?");
dd $ast;
# OUTPUT is an ugly, long parse tree
# Match $ast = Match.new(ast => Any, list => (), hash =>
# Map.new((:term([Match.new(ast => Any, list => (), hash =>
# Map.new((:match(Match.new(ast => Any, list => (), hash => Map.new(()), 
# orig => "*.?", to => 1, from => 0)))), orig => "*.?", to => 1, from => 
# 0), Match.new(ast => Any, list => (), hash => Map.new((:char(Match.new(
# ast => Any, list => (), hash => Map.new(()), orig => "*.?", to => 2, 
# from => 1)))), orig => "*.?", to => 2, from => 1), Match.new(ast => Any, 
# list => (), hash => Map.new((:match(Match.new(ast => Any, list => (), 
# hash => Map.new(()), orig => "*.?", to => 3, from => 2)))), orig => 
# "*.?", to => 3, from => 2)]))), orig => "*.?", to => 3, from => 0) 

Since we get output, the parse succeeded. Given how accepting our grammar is, everything but an empty string ought to succeed. If a parse fails, the result of parse is an undefined Any.

If you’ve used regexes in Perl 6 at all, this parse tree should look very familiar. Grammars are really nothing more than a little bit of extra structure around regexes. The parse trees they generate are just regex Match objects that may have additional matches nested within.

To do something useful with this parse tree, you have some options. You could write a tree walker that will go through the parse tree and do something interesting. However, Perl 6 grammars provide a built-in mechanism, called an action class, to do that work in the middle of the parse. Let’s use an action class instead.

Action Classes

An action class is a regular class that can be paired with a grammar to perform actions based upon each rule or token matched. The methods in the action class mirror the names of the rules and tokens in the grammar and are called for each match.

Here is an action class that is able to translate the parsed glob into a matcher class:

class GlobAction {
    method TOP($/) { make GlobMatcher.new(terms => $<term>».made) }
    method term($/) { make $/.values».made.first }
    method match($/) { make $/.values».made.first }
    method match-any($/) { make '*' }
    method match-one($/) { make '?' }
    method char($/) { make ~$/ }
}

Each method will receive a single argument, which is the match object. If the argument list is not given, the implicit match variable $/ is used instead (here we make that explicit). The action class can perform any action it wants in response to the match. Often, this involves rewriting the return value of the match by using the make command. This sets the value returned by the .made method on the match (as you can see used in term and match).
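As a minimal illustration of make and .made on their own, away from the glob grammar (the names here are invented):

grammar Digits { token TOP { \d+ } }
class Doubler  { method TOP($/) { make 2 * $/ } }

say Digits.parse('21', :actions(Doubler.new)).made;    # 42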

To put this action class into use, you pass the action class during parsing like so:

my $matcher = Glob.parse("*.txt", :actions(GlobAction.new)).made;
for <foo.txt blah.blah.txt z.html> -> $f {
    say "$f matches" if $f ~~ $matcher;
}

Now all we need is the magic defined within GlobMatcher, which does the real work of testing whether a given file name matches the parsed glob.

Putting it all together

Here is an implementation of matching. It uses the ACCEPTS method to define a smartmatch operation that can be applied to any defined string.

class GlobMatcher {
    has @.terms;

    method ACCEPTS(Str:D $s) {
        my @backtracks = [ \([ $s.comb ], [ @!terms ]) ];
        BACKTRACK: while @backtracks.pop -> (@letters, @terms) {
            LETTER: while @letters.shift -> $c {
                last LETTER unless @terms;

                my $this-term = @terms.shift;
                my $next-term = @terms[0];

                given $this-term {
                    when '*' {
                        # Continue to the next term if we can
                        if @terms 
                        and $next-term ne '*' | '?' 
                        and $c eq $next-term {
                            push @backtracks, 
                                ([ @letters ], [ '*', |@terms ]);
                            redo;
                        }

                        # Match anything, and try again next round
                        unshift @terms, $this-term;
                    }

                    # We have a letter, so we match!
                    when '?' { }

                    # Only match exactly
                    default {
                        # If not an exact match, we fail; 
                        # try again if we can
                        next BACKTRACK if $c ne $this-term;
                    }
                }
            }

            # If we matched everything, we succeed
            return True unless @terms;

            # Otherwise, try the next backtrack, if any
        }

        # We ran out of back tracks, so we fail
        False;
    }
}
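With that in place, smartmatching behaves as you would hope; the terms list below is what GlobAction produces for "*.txt":

my $m = GlobMatcher.new(terms => ['*', '.', 't', 'x', 't']);
say 'foo.txt' ~~ $m;    # True
say 'z.html' ~~ $m;     # False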

This algorithm is kind of ugly and might be prettier if I’d implemented it via a DFA or translated to regular expressions or something, but it works.
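For instance, here is a rough sketch of the translate-to-regular-expressions idea; terms-to-regex is a hypothetical helper of mine, not part of the code above:

sub terms-to-regex(@terms) {
    my $source = @terms.map({
        when '*' { '.*' }
        when '?' { '.' }
        default  { .subst(/\W/, { "\\$/" }, :g) }    # escape literal metacharacters
    }).join;
    regex { ^ <$source> $ }
}

say so 'foo.txt' ~~ terms-to-regex(['*', '.', 't', 'x', 't']);    # True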

Typing out the code to parse is a bit tedious, so we’ll build ourselves a little helper to make it cleaner looking:

sub glob($glob, :$globlang = Glob) { 
    $globlang.parse($glob, :actions(GlobAction.new)).made; 
}

for <foo.txt blah.blah.txt z.html> -> $f {
    next unless $f ~~ glob('*.txt');
    say "$f matches";
}

We could even use it in a nice grep:

<foo.txt blah.blah.txt z.html>.grep(glob('*.txt')).say;

Extending the Grammar

Now that we have a basic working grammar, let’s say we want to expand it to support a variant based on ANSI/ISO SQL’s LIKE operator. It is very much like a glob, but uses percent (“%”) in place of asterisk (“*”) and underscore (“_”) in place of question mark (“?”). We could implement this as a brand new grammar, but let’s make our new grammar a sub-class instead, so we can avoid duplicating work.

grammar SQLGlob is Glob {
    token match-any { '%' }
    token match-one { '_' }
}

<foo.txt blah.blah.txt z.html>.grep(
    glob('%.txt', :globlang(SQLGlob))
).say;

That was too easy.

Expanding the Grammar

From here, we might also try to expand the grammar to handle character classes or curly-expansions:

[abc]*.{txt,html}

In BSD-like globs, the above will match any file starting with “a” or “b” or “c” that also ends with “.txt” or “.html”. Here we might consider using a proto to make it easy to expand the kinds of matches we allow. A full demonstration of this is beyond the scope of this post, but it could look something like this:

grammar Glob {
    ...

    token term {
        || <expansion>
        || <match>
        || <char>
    }
    proto token expansion { * }
    proto token match { * }

    ...
}

grammar BasicGlob is Glob {
    token match:any { '*' }
    token match:one { '?' }
}

grammar BSDGlob is BasicGlob {
    token expansion:list { '{' <term>+ % ',' '}' }
    token match:char-class { '[' <-[ \] ]>+ ']' }
}

grammar SQLGlob is Glob {
    token match:any { '%' }
    token match:one { '_' }
}

By using a proto token or proto rule we gain a nice way of handling alternative matches within grammars. This allows us to create a variety of grammar variants that don’t repeat themselves. It also gives us the flexibility to allow developers using your grammars to subclass and modify them to suit their custom needs without monkey patching.
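For instance, a third-party developer could bolt a new kind of match onto the grammar in a subclass of their own; this particular extension is invented for illustration, and an action method of the same name would be needed to make it do anything useful:

grammar TildeGlob is BSDGlob {
    token match:tilde { '~' }    # hypothetical: recognize '~' as its own match type
}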

A working example of something like this can be found in the source for IO::Glob. (Though much of that was written long enough ago that I’m not sure it is the best possible example of what to do, but it is an example.)

Grammars are a really nifty way of grouping a set of regexes together to build a complete parser as a unit. They are easy to read, provide tools that allow you to avoid repeating yourself, and I find them to be an enjoyable tool to play with and use in Perl 6.

Cheers.

Day 13 – Audio Streaming done Completely Wrong

Starting out

I made the audio streaming source client Audio::Libshout about a year and a half ago, and it works quite well: with the speed improvements in Rakudo I have been able to stream 320kb/s MP3 without a problem. But it always annoyed me that it was difficult to test properly, and even to test it at all required an Icecast server. Even with an Icecast server that I could stream to, it would be necessary to actually listen to the stream in order to determine whether it was functioning correctly.

This all somewhat came to a head earlier in the year when I discovered that even the somewhat rudimentary tests that I had been using for the asynchronous streaming support had failed to detect that the stream wasn’t being fed at all. What I really needed was a sort of dumb streaming server that could act in place of the real icecast and could be instrumented to determine whether a connection was being made and whether the correct audio data was being received. How hard could it be? After all, it was just a special kind of web server.

I should have known from experience that this was a slippery slope, but hey.

A little aside about Audio Streaming

An icecast streaming server is basically an HTTP server that feeds a continuous stream of encoded audio data to the client listeners, who connect with an HTTP GET request. The data to be streamed is typically provided by a source client, which will connect to the server, probably authenticate using HTTP Authentication, and start sending the data at a steady rate that is proportional to the bitrate of the stream. libshout connects with a custom request method of SOURCE, which is inherited from its earlier shoutcast origins, though icecast itself understands PUT as well for the source client. Because it is the responsibility of the listening client to supply the decoded audio data to the soundcard at exactly the right rate, and the encoded data contains the bitrate of the stream as transmitted from the source, the timing demands on the server are not too rigorous: it just has to be consistent and fast enough that a buffer on the client can be kept sufficiently full to supply the audio data to the sound card. Icecast does a little more in detail to, for instance, adjust for clients that don’t seem to be reading fast enough and so forth, but in principle it’s all relatively simple.

Putting it together

As might be obvious by now, an audio streaming server differs from a typical HTTP server in that, rather than serving some content from disk or generated by the program itself, it needs to share data received on one client connection with one or more other client connections. In the simplest C implementation one might have a set of shared buffers, one of which is being populated from the source connection at any given time, whilst the others are being consumed by the client connections, alternating on filling and depletion. Whether the implementation settles on a non-blocking or a threaded design, possibly the most critical part of the code will be the synchronisation between the source writer and the client readers, to ensure that a buffer is not being read from and written to at the same time.

In Perl 6 of course you’d hope you didn’t need to worry about these kind of annoying details as there are well thought out concurrency features that abstract away most of the nasty details.

From one to many

Perhaps the simplest program that illustrates how easy this might be would be this standalone version of the old Unix service chargen:

use v6.c;

my $supplier = Supplier.new;

start {
    loop {
        for ( 33 ... 126 ).map( { Buf.new($_) }) -> $c {
            sleep 0.05;
            $supplier.emit($c);
        }
    }
}

my $sig = signal(SIGPIPE).tap( -> $v {
    $sig.close;
});

react {
    whenever IO::Socket::Async.listen('localhost', 3333) -> $conn {
        my $p = Promise.new;
        CATCH {
            when /'broken pipe'/ {
                if $p.status !~~ Kept {
                    $p.keep: "done";
                }
            }
        }

        my $write = $supplier.Supply.tap( -> $v {
            if $p.status ~~ Planned {
                $conn.write: $v;
            }
            else {
                $write.close;
            }
        });
    }
}

In this, the Supplier is being fed asynchronously to stand in for the possible source client in our streaming server, and each client that connects on port 3333 will receive the same stream of characters – if you connect with two clients (telnet or netcat for instance) you will see they are getting the same data at roughly the same time.

The Supplier provides a shared sequence of data of which there can be many consumers, so each connection provided by the IO::Socket::Async will be fed the data emitted to the Supplier starting at the point the client connected.
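This live behaviour is easy to demonstrate in isolation:

my $supplier = Supplier.new;
my $supply   = $supplier.Supply;

$supply.tap(-> $v { say "first tap: $v" });
$supplier.emit('a');                          # seen only by the first tap

$supply.tap(-> $v { say "second tap: $v" });
$supplier.emit('b');                          # seen by both taps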

The CATCH here is to deal with the client disconnecting, as the first indication our code gets of this is when we try to write to the connection. We’re not expecting any input from the client that we could check and, besides, the characters becoming available to write may happen sooner than an attempt to read would register the close. So, while it may seem like a bit of a hack, protecting against further writes with a Promise is the most reliable and simple way of doing this. If, by way of experiment, you were to omit the CATCH, you would find that the server would quit without warning the first time the first client disconnected.

I’ll gloss over the signal outside the react as that only seems necessary in the case where we didn’t get any input data on the first connection.

Making something nearly useful

The above example is almost all we need to make something that you might be able to use: it just needs to handle an HTTP connection from the clients and to get a source of actual MP3 data into the Supply. To handle the HTTP parts we’ll just use the handy HTTP::Server::Tiny, which will conveniently take a Supply that will feed the output data, so in fact we end up with quite a tiny program:

use HTTP::Server::Tiny;

sub MAIN(Str $file) {

    my $supplier = Supplier.new;

    my $handle = $file.IO.open(:bin);

    my $control-promise = start {
        while !$handle.eof {
            my $c = $handle.read(1024);
            $supplier.emit($c);
        }
    }

    sub stream(%env) {
        return 200, [ Content-Type => 'audio/mpeg', Pragma => 'no-cache', icy-name => 'My First Streaming Server'], $supplier.Supply;
    }
    HTTP::Server::Tiny.new(port => 3333).run(&stream);
}

The HTTP::Server::Tiny will run the stream subroutine for every request, and all we need to do is return the status code, some headers, and a supply from which the output data will be read; the client connection will be closed when the Supply is done (that is, when the done method is called on the source Supplier.) It couldn’t really be much more simple.

Just start the program with the path to a file containing MP3 audio and then point your favourite streaming client at port 3333 on your localhost and you should get the stream. I say “should” as it makes no attempt to regulate the rate at which the audio data is fed to the client, but for constant bit-rate MP3 data and a client that will buffer as much as it can get, it works.

Of course a real streaming server would read the data frame by frame and adjust the rate according to the bit-rate of each frame. I actually have made (but not yet released) Audio::Format::MP3::Frame to help do this, but it would be overkill for this example.
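If you did want a crude form of pacing without parsing frames, one approach (a sketch only, assuming a constant bit-rate stream and the $handle and $supplier from the example above) is to sleep in proportion to the bytes emitted:

# crude pacing for a constant 320kb/s stream: emit a chunk, then
# sleep for roughly the time that much audio takes to play
my $bytes-per-second = 320_000 div 8;
until $handle.eof {
    my $chunk = $handle.read(8192);
    $supplier.emit($chunk);
    sleep $chunk.elems / $bytes-per-second;
}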

Relaying a stream

Of course the original intent of this streaming server was to be able to test a streaming source client, so we are going to have to add another part that will recognise a source connection, read from that and relay it to the normal clients in a similar way to the above.

You’ll recall that the libshout client library will connect with a request method of SOURCE, so we can adjust the file streaming example to identify the source connection and feed the Supplier with the data read from that connection:

use HTTP::Server::Tiny;

sub MAIN() {

    my $supplier = Supplier.new;

    my Bool $got-source = False;

    sub stream(%env) {
        if %env<REQUEST_METHOD> eq 'GET' {
            if $got-source {
                return 200, [ Content-Type => 'audio/mpeg', Pragma => 'no-cache', icy-name => 'My First Streaming Server'], $supplier.Supply;
            }
            else {
                return 404, [ Content-Type => 'text/plain'], "no stream connected";
            }
        }
        elsif %env<REQUEST_METHOD> eq 'SOURCE' {
            my $connection = %env<p6sgix.io>;

            my $finish-promise = Promise.new;

            $connection.Supply(:bin).tap(-> $v {
                $supplier.emit($v);
            }, done => -> { $finish-promise.keep: "done" });

            $got-source = True;
            return 200, [ Content-Type => 'audio/mpeg' ], supply { whenever $finish-promise { done; } };
        }
    }
    HTTP::Server::Tiny.new(port => 3333).run(&stream);
}

You’ll see immediately that nearly all of the action is happening in the request handler subroutine: the GET branch is almost unchanged from the previous example (except that it will bail with a 404 if the source isn’t connected), and the SOURCE branch replaces the reading of the file we had previously. HTTP::Server::Tiny makes reading the streamed data from the source client really easy, as it provides the connected IO::Socket::Async to the handler in p6sgix.io (which I understand was originally primarily to support the WebSocket module), the Supply of which is tapped to feed the shared Supplier that conveys the audio data to the clients. All else that is necessary is to return a Supply that is not intended to actually provide any data, but just to close when the client closes their connection.

Now all you have to do is run the script and feed it with some source client like the following:

use Audio::Libshout;

multi sub MAIN(Str $file, Int $port = 3333, Str $password = 'hackme', Str $mount = '/foo') {

    my $shout = Audio::Libshout.new(:$port, :$password, :$mount, format => Audio::Libshout::Format::MP3);
    $shout.open;
    my $fh = $file.IO.open(:bin);

    while not $fh.eof {
        my $buf = $fh.read(4096);
        say $buf.elems;
        $shout.send($buf);
        $shout.sync;
    }

    $fh.close;
    $shout.close;
}

(Providing the path to some MP3 file again.) If you then connect a streaming audio player, you will be getting some audio again.

You might notice that the script can take a password and a mount which aren’t used in this case; this is because Audio::Libshout requires them, and also because this is basically the script that I have been using to test streaming servers for the last year or so.

Surprisingly, this tiny streaming server works quite well. In testing I found that it ran out of threads before it got too bogged down handling the streams with multiple clients, showing how relatively efficient and well thought out the Perl 6 asynchronous model is, and how simple it is to put together a program that would probably require a lot more code in many other languages.

Where do we go from here

I’m pretty sure that you wouldn’t want to use this code to serve up a popular radio station, but it would definitely be sufficient for my original testing purposes with a little additional instrumentation.

Of course I couldn’t just stop there, so I worked up the as yet unreleased Audio::StreamThing, which uses the same basic design with shared supplies, but works more like Icecast in having multiple mounts for individual streams, provision for authentication and better exception handling.

If you’d find it useful I might even release it.

Postscript

I’d just like to get a mention in for the fabulous DJ Mike Stern as I always use his recorded sets for testing this kind of stuff for some reason.

Have fun and make more noise.

Day 11 — Perl 6 Core Hacking: It Slipped Through The QASTs

One of the great things about Perl 6 is how accessible to regular users the compiler is to hack on. Easy bugs require nothing but knowledge of Perl 6 itself, since a lot of the Rakudo compiler is written in Perl 6. Slightly tougher bugs require knowledge of NQP, and tougher still are bugs involving Grammar and Actions. Things progress further in difficulty from there, going as far as assembly hacking on VM level, but today, we’ll stick around in Rakudo land. We have a task at hand!

Santa is having some difficulties generating his Naughty-or-Nice list due to a bug in Perl 6. He traced it down to the use of the S/// substitution operator with :g modifier that for some reason returns an empty list instead of the original string if no matches were made:

say S:g/naughty// with 'only nice list';
# OUTPUT: ()

Time to dig in and fix this, if anyone is to get any presents this year!

The Bots To The Rescue

The first thing to do when fixing a bug is to find out when it first appeared. AlexDaniel++ and MasterDuke++ implemented several IRC bots that make this task extremely simple. They are available in #perl6 and #perl6-dev IRC channels, and you can play with them in #zofbot IRC channel, without annoying anyone. We’ll be using bisectable6 bot to find out when the S/// operator got broken:

<Zoffix> bisectable6, help
<bisectable6> Zoffix, Like this: bisectable6: old=2015.12 new=HEAD exit
    1 if (^∞).grep({ last })[5] // 0 == 4 # RT128181

Just give the bot a piece of code, optionally specifying the starting and ending commits, and it’ll bisect by either the exit code, or failing at that, by output.

<Zoffix> bisectable6, S:g/d// given 'abc'
<bisectable6> Zoffix, Bisecting by output (old=2016.10 new=524368c)
    because on both starting points the exit code is 0
<bisectable6> Zoffix, bisect log:
    https://gist.github.com/c2cf9c3a7b6d13a43c34f64b96090e31
<bisectable6> Zoffix, (2016-10-23)
    https://github.com/rakudo/rakudo/commit/b7201a8f22338a906f2d8027a21387e8f5c77f41

The last link is the interesting bit: it tells us the S:g/// was working fine until that commit. The commit does seem related—it’s the refactor lizmat++ did to make .match 150%–1400% faster—but it’s quite big, and it’s not obvious how it’s linked to the workings of the S/// operator. Let’s find out, shall we?

How Do You Spell That?

We can specify the --target command line argument to the perl6 executable to ask it for the output of a particular stage of the program (run perl6 --stagestats -e '' to see the names of all stages). Let’s output the parse stage, to find out which tokens in the Grammar we should look into:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 --target=parse -e 'S:g/d//'
- statementlist: S:g/d//
  - statement: 1 matches
    - EXPR: S:g/d//
      - value: S:g/d//
        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g
          - sibble: /d//
            - right:
            - babble:
              - B:
            - left: d
              - termseq: d
                - termaltseq: d
                  - termconjseq: 1 matches
                    - termalt: 1 matches
                      - termconj: 1 matches
                        - termish: 1 matches
                          - noun: 1 matches
                            - atom: d

There are some general tokens in the output—such as statementlist, statement, EXPR, and value—that we can just gloss over. They are about statements and we want stuff for the operator itself, so the interesting bits start with this:

        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g

Let’s pop open the Grammar in our text editor and locate a token called quote. It can also be a rule, regex or method, but tokens are most common. The first thing we can locate is this:

proto token quote { <...> }
token quote:sym<apos>  {
    :dba('single quotes') "'" ~ "'"    
    <nibble(self.quote_lang(%*LANG<Quote>, "'", "'", ['q']))>
}
token quote:sym<sapos> {
    :dba('curly single quotes') "‘" ~ "’"
    <nibble(self.quote_lang(%*LANG<Quote>, "‘", "’", ['q']))>
}

The Grammar that parses Perl 6 isn’t much different from the grammar you’d use as a user of Perl 6, so most of it probably looks familiar to you. The quote token is a proto regex, so looking further at the output --target=parse gave us, we see we need the :sym<S> variant of it.

Scrolling a bit through the quote‘s candidates, we finally come across :sym<s>, which sets the <sym> capture to either s or S:

token quote:sym<s> {
    <sym=[Ss]> (s)**0..1
    :my %*RX;
    :my $*INTERPOLATE := 1;
    :my $*SUBST_LHS_BLOCK;
    :my $*SUBST_RHS_BLOCK;
    {
        %*RX<s> := 1 if $/[0]
    }
    <.qok($/)>
    <rx_adverbs>
    <sibble(%*RX<P5> ?? %*LANG<P5Regex> !! %*LANG<Regex>, %*LANG<Quote>, ['qq'])>
    [ <?{ $<sibble><infixish> }> || <.old_rx_mods>? ]

}

So this token handles both the s/// and S/// operators, and its body is unimpressive: it seems to be no more than some setup work. With the name of the token in hand, we now know what to look for in the Actions: method quote:sym<s>.

While finding it in actions is easy… it’s quite a biggie, with 177 lines of code to its name. However, someone nice left us a comment that fits into our puzzle:

method quote:sym<s>($/) {
    # We are emulating Str.subst/subst-mutate here, by calling match,
    # assigning the result to a temporary variable etc.
    ...

Recall bisectable6‘s results? The commit it pointed out was work on the .match method, and according to the comment for the S/// operator, it uses .match to do its stuff. Let’s execute that method on builds before and after the commit bisectable6 found for us. There’s another handy bot to do that for us: committable6.

Give it a commit SHA or one of the release tags along with code to run and it’ll give you output for that code on that commit:

<Zoffix> committable6, 2016.11 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: ()

<Zoffix> committable6, 2016.10 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

We ran the code on the 2016.11 and 2016.10 releases and the output indicates there’s no difference… or is there? The problem with using say as a debugging tool is that it often omits things we may find useful. A better alternative is the dd routine, a Rakudo-specific utility dumper sub that’s not part of the standard Perl 6 language. Give it some args and it’ll dump them out. Let’s give it a spin:

<Zoffix> committable6, 2016.11 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: slip()

<Zoffix> committable6, 2016.10 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

Aha! Another puzzle piece! When the :g adverb is in use, .match used to return an empty list on failed matches, but after lizmat++’s .match improvements, it started to return Empty, which is an empty Slip. Slips tend to flatten themselves out into the outer container, so perhaps that’s causing an issue in the S/// operator’s action method? Let’s take a closer look at it.
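The flattening behaviour is easy to demonstrate on its own:

my @a = 1, Empty, 2;
say @a.elems;    # 2 - the empty Slip flattened away to nothing
my @b = 1, (), 2;
say @b.elems;    # 3 - the empty List is kept as an element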

Slippety Slip

A bird’s-eye view of the method quote:sym<s> action shows that it does some setup work and then codegens a QAST (“Q” Abstract Syntax Tree). It’d be helpful to take a look at what it generates.

One method of doing so is using the same --target feature we’ve used to get the parse stage, except we’d use the ast stage. So the command would be this:

perl6 --target=ast -e 'S:g/d//'

If you actually run that, you’ll get a text wall of QAST, and it may be tough to spot which are the bits actually generated by the S/// operator. Luckily, there’s a better way! The QAST node objects have a .dump method that dumps them in the same style as what you see in --target=ast output. So check out the compiler’s repo if you haven’t already done so, pop open the src/Perl6/Actions.nqp file, go to the end of method quote:sym<s>, and stick note($past.dump) in there to print the dump of the QAST generated for the S/// operator:

    ...
    );
    $past.annotate('is_S', $<sym> eq 'S');
    note($past.dump); # <----------------------- like that
    make WANTED($past, 's///');  # never carp about s/// in sink context
}

(Why is it called $past and not $qast? Historical reasons: QAST used to be PAST, for Parrot Abstract Syntax Tree).

Now, compile Rakudo:

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

And execute our buggy S/// match to make the line we added print out S///‘s QAST:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'S:g/d// given "abc"'
- QAST::Op(locallifetime)  :is_S<?> S:g/d//
  - QAST::Stmt
    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)
    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)
    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)
      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>
    - QAST::Stmt
    - QAST::Var(lexical $/)

There are docs for types of QAST you see here, or we can just wing it.

We callmethod match and bind the result to subst_result_1:

    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)

We call nqp::p6store (p6* ops are documented in Rakudo’s repo), giving it the result of infix:<,>($/) as the container and the return of the .match call as the value:

    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)

We check if anything matched (for :g matches, we check for a Positional that has any .elems in it):

    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)

If we did have matches, call Str!APPLY-MATCHES:

      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)

If we didn’t have matches, call nqp::p6store, storing the $_ (this is our original string S/// works on) in the $/:

      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>

Since we know the commit bisectable6 found makes .match return an empty slip for failed matches, it’s that last bit of QAST that should look suspicious, since slips flatten themselves out. We’ll return to why we’re storing into an &infix:<,>($/) rather than into $/ directly, but first, let’s write the NQP equivalent of such a setup.

We have two variables: $/ with Empty and $_ with our original string. The QAST::Op node maps out to an nqp op with the same name, so our suspicious bit looks something like this:

use nqp;

$_ = 'abc';
$/ = Empty;
nqp::p6store( &infix:<,>($/), $_);

Yet another helpful bot, camelia, lets us run a piece of code straight from IRC. Just use the m: trigger with some code. Let’s try it out:

<Zoffix> m: use nqp; $_ = 'abc'; $/ = Empty;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Slip $/ = slip$()␤»

<Zoffix> m: use nqp; $_ = 'abc'; $/ = List.new;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Str $/ = "abc"␤»

The results show that when $/ is an Empty, it ends up still being Empty after the p6store, while if $/ is an empty List, it happily takes a string. We finally connected the S/// operator with the commit that introduced the bug and found why it occurs (although slips behaving like that may be a bug of its own). Let’s trace where that Empty in Str.match comes from and why it’s there.

What Sourcery Is This?

There’s another bot (it’s the future! people have lots of bots!), SourceBaby, that can give you a link to the source code for a routine. It uses the CoreHackers::Sourcery module under the hood and takes arguments to give to its sourcery routine. Trigger it with the s: trigger:

<Zoffix> s: 'abc', 'match', \(/d/, :g)
<SourceBaby> Zoffix, Sauce is at
    https://github.com/rakudo/rakudo/blob/164eb42/src/core/Str.pm#L946

We gave it an object to call a method on (a Str), a string with the method name (match), and a Capture with the arguments the method is to be called with. In return, it gave a URL to the multi that handles those args:

multi method match(Regex:D $pattern, :global(:$g)!, *%_) {
    nqp::if(
      nqp::elems(nqp::getattr(%_,Map,'$!storage')),
      self!match-cursor(nqp::getlexcaller('$/'),
        $pattern($cursor-init(Cursor,self,:0c)), 'g', $g, %_),
      nqp::if(
        $g,
        self!match-list(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)),
          CURSOR-GLOBAL, POST-MATCH),
        self!match-one(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)))
      )
    )
}

No Empty here, but we can see that when $g is true, we call self!match-list. It’s a private method, so SourceBaby would not be able to help with it. Let’s find it by searching the same source file:

# Create list from the appropriate Sequence given the move
method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      Empty,
    ))
}

And there’s our Empty! The commit message doesn’t mention why we changed from an empty List to an Empty, and there are no comments in the source code explaining it, so we’ll have to resort to the most technologically non-advanced debugging tool in our arsenal… asking people.

The Dev IRC Channel

If you have questions about core development, join #perl6-dev IRC channel on Freenode. In this case, we can ask lizmat++ if she remembers whether there was a reason for that Empty.

If the person you’re trying to reach isn’t currently online, you can use the messaging bot with the .tell trigger, followed by the person’s nick, followed by the message. When the bot sees the person talk, it will deliver the message.

<babydrop> .ask stmuk_ so is `zef` now the installer being
    shipped with R*? I notice our REPL message still
    references panda; wondering if that should read zef now
<yoleaux2> babydrop: I'll pass your message to stmuk_.

After the discussion about the Empty, there doesn’t appear to be any specific reason to return it in this case, so we’ll change it to return an empty List instead, restoring its old behavior, which will also fix our bug. The new !match-list then looks like this:

method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      List.new,
    ))
}

Compile the compiler; this time we can just run make install, since everything is already configured and pre-built from the last time we compiled:

make install

Check the change did fix the bug:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'say S:g/naughty// with "only nice list"'
only nice list

And run the test suite:

TEST_JOBS=6 make spectest

The TEST_JOBS env var lets you run multiple test files at once and the optimal value to set it at is around 1.3 times the number of cores in your computer. If you have a very meaty box (or endless patience), you can run make stresstest instead, for a more thorough test run.

With the spectest passing all of its tests, we are ready to finish off our work.

Test It!

The test suite is located in t/spec and is automatically checked out from its repo when you run make spectest. You can also simply delete that directory and clone your own fork as t/spec instead.

Usually, it’s easy to locate the file the test can go into by running tree -f | grep 'some search term'. We fixed an issue with substitutions, so let’s go for this:

zoffix@VirtualBox:~/CPANPRC/rakudo/t/spec$ tree -f | grep subst
│   ├── ./integration/substr-after-match-in-gather-in-for.t
├── ./S05-substitution
│   ├── ./S05-substitution/67222.t
│   ├── ./S05-substitution/match.t
│   ├── ./S05-substitution/subst.rakudo.moar
│   └── ./S05-substitution/subst.t
│   ├── ./S32-str/substr-eq.t
│   ├── ./S32-str/substr-rw.rakudo.moar
│   ├── ./S32-str/substr-rw.t
│   ├── ./S32-str/substr.t

The ./S05-substitution/subst.t file looks like a decent candidate, so pop it open. Bump the plan at the top of the file by the number of tests you’re adding, then add the test at the end of the file (or a more appropriate spot):

plan 185;

...

is-deeply (S:g/FAIL// with 'foo'), 'foo',
    'S:g/// returns original string on failure to match';

Run the test file, to ensure everything passes:

make t/spec/S05-substitution/subst.t

And commit! We’re done! Santa’s Naughty-or-Nice list shall work fine from now on.

The Final Mystery

Recall that &infix:<,>($/) thing that was causing the bug when $/ contained an Empty? So what is that all about?

If you don’t know something about Perl 6, just come to our #perl6 IRC channel and ask. This is what I did when I couldn’t understand the purpose of that infix thing and after a long discussion, finding old bug tickets, and testing old bugs… we came to the conclusion these are no longer needed here!

So along with a bug fix, we also cleaned up codegen. At least that’s the theory; perhaps by doing so we created another bug that will send us on yet another great hunting journey.

Conclusion

It’s easy to give a helping hand to the core developers of Perl 6 by fixing some of the bugs. Starting from easy things that require nothing more than knowledge of Perl 6, you can progressively learn more about the internals and fix tougher problems.

The perl6 compiler comes with a number of useful output targets like --target=ast and --target=parse that can aid in debugging. An army of IRC bots makes it easy to navigate source code both in space and time, by either giving you a link to an implementation or producing the output of your code on some particular commit.

Lastly, a very valuable resource we have available is the people of Perl 6, who can help you out, whether you’re digging deep into the guts of the compiler or just starting out with computer programming.

Join us. We have… bugs to fix!