Day 4 – Parsing with Grammars (Book Extract)

The following is an extract from the book Parsing with Perl 6 Regexes and Grammars: A Recursive Descent into Parsing, by Moritz Lenz, published 2017 by Apress Media, LLC. Reprinted with permission.

This book is being published right now. At least the ebook version should become available for purchase this month, and the print version can be pre-ordered from Amazon. It should ship in January 2018 at the latest, but with a bit of luck, it’ll be available by Christmas.

Below you can find chapter 9, Parsing with Grammars. The previous chapters discuss the building blocks of regexes in great detail, how regexes interact with Perl 6 code, match objects, regex mechanics, common regex techniques, and reusing and composing regexes. You can acquire some of this background by reading the official documentation on regexes.

Later chapters cover action classes and objects, how to report high-quality parse errors, Unicode support, and finally three case studies.

And now, enjoy!


Grammars are the proverbial Swiss-army chain saw[1] for parsing.

In this chapter, we will explore them in more detail. Most importantly, we will discuss how to harness their power.

Understanding Grammars

Grammars implement a top-down approach to parsing. The entry point, usually the regex TOP, knows about the coarse-grained structure and calls further regexes that descend into the gory details. Recursion can be involved too. For example, if you parse a mathematical expression, a term can be an arbitrary expression inside a pair of parentheses.

This is a top-down structure, or more precisely a recursive descent parser. If no backtracking is involved, we call it a predictive parser, because at each position in the string, we know exactly what we’re looking for — we can predict what the next token is going to be (even if we can only predict that it might be one of a set of alternatives).

The resulting match tree corresponds in structure exactly to the call structure of regexes in the grammar. Let’s consider parsing a mathematical expression that only includes the operators *, +, and parentheses for grouping:

grammar MathExpression {
    token TOP    { <sum> }
    rule sum     { <product>+ % '+' }
    rule product { <term>+ % '*' }
    rule term    { <number> | <group> }
    rule group   { '(' <sum> ')' }
    token number { \d+ }
}

say MathExpression.parse('2 + 4 * 5 * (1 + 3)');

From the grammar itself you can already see the potential for recursion: sum calls product, which calls term, which calls group, which calls sum again. This allows parsing of nested expressions of arbitrary depth.

Parsing the example above produces the following match object:

⌜2 + 4 * 5 * (1 + 3)⌟
 sum => ⌜2 + 4 * 5 * (1 + 3)⌟
  product => ⌜2 ⌟
   term => ⌜2 ⌟
    number => ⌜2⌟
  product => ⌜4 * 5 * (1 + 3)⌟
   term => ⌜4 ⌟
    number => ⌜4⌟
   term => ⌜5 ⌟
    number => ⌜5⌟
   term => ⌜(1 + 3)⌟
    group => ⌜(1 + 3)⌟
     sum => ⌜1 + 3⌟
      product => ⌜1 ⌟
       term => ⌜1 ⌟
        number => ⌜1⌟
      product => ⌜3⌟
       term => ⌜3⌟
        number => ⌜3⌟

If you want to know how a particular number was parsed, you can follow the path backwards by looking for lines above the current line that are indented less; for instance, the number 1 was parsed by token number, called from term, called from product, and so on.

We can verify this by raising an exception from token number:

    token number {
        (\d+)
        { die "how did I get here?" if $0 eq '1' }
    }

This indeed shows the call chain in the backtrace, with the most immediate context at the top:

how did I get here?
  in regex number at bt.p6 line 9
  in regex term at bt.p6 line 5
  in regex product at bt.p6 line 4
  in regex sum at bt.p6 line 3
  in regex group at bt.p6 line 6
  in regex term at bt.p6 line 5
  in regex product at bt.p6 line 4
  in regex sum at bt.p6 line 3
  in regex TOP at bt.p6 line 2
  in block <unit> at bt.p6 line 13

This grammar only uses tokens and rules, so there is no backtracking involved, and the grammar is a predictive parser. This is fairly typical. Many grammars work fine without backtracking, or with backtracking in just a few places.

Recursive Descent Parsing and Precedence

The MathExpression grammar has two rules which are structurally identical:

    rule sum { <product>+ % '+' }
    rule product { <term>+ % '*' }

Instead, we could have written

rule  expression { <term>+ % <operator> }
token operator   {  '*' | '+' }

or even used the proto token construct discussed in the previous chapter to parse different operators. The reason I chose the first, more repetitive, approach is that it makes the match structure correspond to the precedence of the operators * and +.

When evaluating the mathematical expression 1 + 2 * 5, mathematicians and most programming languages evaluate the 2 * 5 first, because the * operator has tighter precedence than +. The result is then substituted back into the expression, leading to 1 + 10, and finally 11 as the result.

When parsing such expressions with the first version of the grammar, the structure of the parse tree expresses this grouping: it has — as the top level — a single sum, with the operands being 1 and 2 * 5.
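We can observe this grouping directly by inspecting the match object (a small illustration; the stringified submatches include the whitespace that the rules consumed):

my $match = MathExpression.parse('1 + 2 * 5');
say $match<sum><product>.elems;    # 2: the two operands of the sum
say ~$match<sum><product>[1];      # 2 * 5, grouped as a single product

The tighter-binding multiplication ends up as one product submatch, nested below the sum, exactly mirroring the precedence.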

This comes at a cost: we need a separate rule and name for each precedence level, and the nesting of the resulting match object has at least one level per precedence level. Furthermore, adding more precedence levels later on is not trivial, and very hard to do in a generic way. If you are not willing to accept these costs, you can instead use the flat model with a single token for parsing all operators. If you then need the structure in a way that reflects precedence, you can write code that transforms the list into a tree. This is commonly called an operator precedence parser.

Left Recursion and Other Traps

To avoid infinite recursion, you have to take care that each possible recursion cycle advances the cursor position by at least one character. In the MathExpression grammar, the only possible recursion cycle is sum → product → term → group → sum, and group can only match if it consumes an initial open parenthesis, (.

If a recursion cycle can occur without consuming a character, it is called left recursion, and it needs special language support that Perl 6 does not offer. A case in point is

token a { <a>? 'a' }

which could match the same input as the regex a+, but instead loops infinitely without progressing.
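One way to eliminate the left recursion is to consume a character before recursing, so that each recursive call starts further along in the string (a sketch of one possible rewrite):

token a { 'a' <a>? }    # consumes 'a' first, so the recursion always advances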

A common technique to avoid left recursion is to have a structure where you can order regexes from generic (here sum) to specific (number). You only have to be careful and check for consumed characters when a regex deviates from that order (e.g. group calling sum).

Another potential source of infinite loops is when you quantify a regex that can match the empty string. This can happen when parsing a language that actually allows something to be empty. For instance, in UNIX shells, you can assign variables by potentially leaving the right-hand side empty:

VAR1=value
VAR2=

When writing a grammar for UNIX shell commands, it might be tempting to write a token string { \w* } that would potentially match an empty string. In a situation that allows for more than one string literal, <string>+ can then hang, because the effective regex, [\w*]+, tries to match a zero-width string infinitely many times.

Once you are aware of the problem, the solution is pretty simple: change the token to not allow an empty string (token string { \w+ }), and explicitly take care of situations where an empty string is allowed:

    token assignment {
        <variable> '=' <string>?
    }

Starting Simple

Even though a grammar works from the top down, developing a grammar works best from the bottom up. It is often not obvious from the start what the overall structure of a grammar will be, but you usually have a good idea about the terminal tokens: those that match text directly without calling other subrules.

In the earlier example of parsing mathematical expressions, you might not have known from the start how to arrange the rules that parse sums and products, but it’s likely that you knew you had to parse a number at some point, so you can start by writing:

grammar MathExpression {
    token number { \d+ }
}

This is not much, but it’s also not very complicated, and it’s a good way to get over the writer’s block that programmers sometimes face when challenged with a new problem area. Of course, as soon as you have a token, you can start to write some tests:

grammar MathExpression {
    token number { \d+ }
}

multi sub MAIN(Bool :$test!) {
    use Test;
    plan 2;
    ok MathExpression.parse('1234', :rule<number>),
        '<number> parses 1234';
    nok MathExpression.parse('1+4', :rule<number>),
        '<number> does not parse 1+4';
}

Now you can start to build your way up to more complex expressions:

grammar MathExpression {
    token number { \d+ }
    rule product { <number>+ % '*' }
}

multi sub MAIN(Bool :$test!) {
    use Test;
    plan 5;
    ok MathExpression.parse('1234', :rule<number>),
        '<number> parses 1234';
    nok MathExpression.parse('1+4', :rule<number>),
        '<number> does not parse 1+4';

    ok MathExpression.parse('1234', :rule<product>),
        '<product> can parse a simple number';
    ok MathExpression.parse('1*3*4', :rule<product>),
        '<product> can parse three terms';
    ok MathExpression.parse('1 * 3', :rule<product>),
        '<product> and whitespace';
}

It is worthwhile to include whitespace early on in the tests. The example above looks innocent enough, but the last test actually fails: there is no rule that matches the whitespace between the 1 and the *. Adding a space in the regex between the <number> and the + quantifier makes the tests pass again, because the whitespace inserts an implicit <.ws> call.
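The fixed rule looks like this; the space between <number> and the quantifier turns into an implicit <.ws> call inside each repetition, and the space after '*' handles whitespace following the operator:

rule product { <number> + % '*' }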

Such subtleties are easy to catch if you start really simple and catch them as soon as possible. If instead you give in to the temptation of writing down a whole grammar from top to bottom, you can spend many hours debugging why some seemingly simple thing such as an extra space makes the parse fail.

Assembling Complete Grammars

Once you have written the basic tokens for lexical analysis, you can progress to combining them. Typically the tokens do not parse whitespace at the borders of their matches, so the rules that combine them do that.

In the MathExpression example in the previous section, rule product directly called number, even though we now know that the final version uses an intermediate step, rule term, which can also parse an expression in parentheses. Introducing this extra step does not invalidate the tests we have written for product, because the strings it matched in the early version still match. Introducing more layers happens naturally when you start with a grammar that handles a subset of the language, which you later expand.

Debugging Grammars

There are two failure modes for a regex or a grammar: it can match when it’s not supposed to match (a false positive), or it can fail to match when it’s supposed to match (a false negative). Typically, false positives are easier to understand, because you can inspect the resulting match object and see which regexes matched which part of the string.

There is a handy tool for debugging false negatives: the Grammar::Tracer module. If you load the module in a file containing a grammar, running the grammar produces diagnostic information that can help you find out where a match went wrong.

Note that this is only a diagnostic tool for developers; if you want to give end users better error messages, please read Chapter 11 for improvement suggestions.

You need to install the Perl 6 module Grammar::Debugger, which also contains Grammar::Tracer. If you use the moritzlenz/perl6-regex-alpine docker image, this is already done for you. If you installed Perl 6 via another method, you need to run

zef install Grammar::Debugger

on the command line. If zef is not yet installed, follow the installation instructions on the zef GitHub page.

Let’s look at the Perl 6 module Config::INI by Tadeusz Sośnierz. It contains the following grammar (slightly reformatted here):

grammar INI {
    token TOP {
        ^ <.eol>* <toplevel>? <sections>* <.eol>* $
    }
    token toplevel { <keyval>* }
    token sections { <header> <keyval>* }
    token header   { ^^ \h* '[' ~ ']' $<text>=<-[ \] \n ]>+
                     \h* <.eol>+ }
    token keyval   { ^^ \h* <key> \h* '=' \h* <value>? \h*
                     <.eol>+ }
    regex key      { <![#\[]> <-[;=]>+ }
    regex value    { [ <![#;]> \N ]+ }
    token eol      { [ <[#;]> \N* ]? \n }
}

Suppose we want to understand why it does not parse the following piece of input text:

a = b
[foo]
c: d

So, before the grammar, we insert the line

use Grammar::Tracer;

and after it, add a small piece of code that calls the .parse method of that grammar:

INI.parse(q:to/EOF/);
a = b
[foo]
c: d
EOF

This produces a sizable, but fairly informative piece of output.

Each entry consists of a name of a regex, like TOP or eol (for "end of line"), followed by the indented output of the regexes it calls. After each regex comes a line containing an asterisk (*) and either MATCH followed by the string segment that the regex matched, or FAIL if the regex failed.

Let’s look at the output piece by piece, even if it comes out in one chunk:

TOP
|  eol
|  * FAIL
|  toplevel
|  |  keyval
|  |  |  key
|  |  |  * MATCH "a "
|  |  |  value
|  |  |  * MATCH "b"
|  |  |  eol
|  |  |  * MATCH "\n"
|  |  |  eol
|  |  |  * FAIL
|  |  * MATCH "a = b\n"
|  |  keyval
|  |  |  key
|  |  |  * FAIL
|  |  * FAIL
|  * MATCH "a = b\n"

This tells us that TOP called eol, which failed to match. Since the call to eol is quantified with *, this does not cause the match of TOP to fail. TOP then calls toplevel, which calls keyval; inside it, key matches the text "a " and value matches "b". The eol regex then matches the newline character and fails on a second attempt (since there are no two newline characters in a row). This causes the first keyval token to match successfully. A second call to keyval fails quickly, in its call to key. Then the match of token toplevel concludes successfully, consuming the string "a = b\n".

So far this all looks as expected. Now let’s take a look at the second chunk of output:

|  sections
|  |  header
|  |  |  eol
|  |  |  * MATCH "\n"
|  |  |  eol
|  |  |  * FAIL
|  |  * MATCH "[foo]\n"
|  |  keyval
|  |  |  key
|  |  |  * MATCH "c: d\n"
|  |  * FAIL
|  * MATCH "[foo]\n"

TOP next calls sections, wherein token header successfully matches the string "[foo]\n". Then, keyval calls key, which matches the whole line "c: d\n". Wait, that’s not right, is it? We might have expected key to only match the c. I certainly wouldn’t have expected it to match a newline character at the end. The lack of an equals sign in the input causes the regex engine to never even call regex value. But since keyval is again quantified with the star * quantifier, the match of the calling regex sections succeeds in matching just the header "[foo]\n".

The last part of the Grammar::Tracer output follows:

|  sections
|  |  header
|  |  * FAIL
|  * FAIL
|  eol
|  * FAIL
* FAIL

It’s FAILs from here on. The second call to sections again tries to parse a header, but its next input is still "c: d\n", so it fails. As does the end-of-string anchor $ in token TOP, failing the overall match in method parse.

So we have learned that regex key matched the whole line c: d\n, but since no equals sign (=) follows it, token keyval cannot parse this line. Since no other regex (notably not header) matches it, this is where the match fails.

As you can see from this example run, Grammar::Tracer enables us to pinpoint where a parse failure happens, even though we had to look carefully through its output to locate it. When you run it in a terminal, you automatically get colored output, with FAIL having a red and MATCH a green background, and token names standing out in bold white (instead of the usual gray) output. This makes it easier to scan from the bottom (where a failed match usually leaves a trail of red FAILs) up to the trailing successful matches, and then look in the vicinity of the border between matches and failures.

Since debugging imposes a significant mental burden, and the output from Grammar::Tracer tends to grow quickly, it is generally advisable to reduce the failing input to a minimal specimen. In the case described above, we could have removed the first line of the input string and saved ten lines of Grammar::Tracer output to look through.

Parsing Whitespace and Comments

As mentioned earlier, the idiomatic way to parse insignificant whitespace is by calling <.ws>, typically implicitly by using whitespace in a rule. The default ws implementation, <!ww>\s*, works well for many languages, but it has its limits.

In a surprising number of file formats and computer languages, there is significant whitespace that <.ws> would just gobble up. These include INI files (where a newline typically indicates a new key/value pair), Python and YAML (where indentation is used for grouping), CSV (where a newline signals a new record), and Makefiles (where indentation is required to be a tab character).

In these cases, it is best practice to override ws in your own grammar to match only insignificant whitespace. Let’s take a look at a second, minimalistic INI parser, independently developed from the one described in the previous section:

grammar INIFile {
    token TOP { <section>* }
    token section {
        <header>
        <keyvalue>*
    }
    rule header {
        '['  <-[ \] \n ]>+ ']' <.eol>
    }
    rule keyvalue {
        ^^
        $<key>=[\w+]
        <[:=]>
        $<value>=[<-[\n;#]>*]
        <.eol>
    }
    token ws { <!ww> \h* }
    token eol {
        \n [\h*\n]*
    }
}

This parses simple INI configuration files like this:

[db]
driver: mysql
host: db01.example.com
port: 122
username: us123
password: s3kr1t
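A quick check that the grammar accepts this kind of input (a small sketch; here the configuration is read from a string rather than a file):

my $config = q:to/EOF/;
[db]
driver: mysql
host: db01.example.com
EOF
say so INIFile.parse($config);    # True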

Note how this grammar uses two separate paths for parsing whitespace: a custom ws token that only matches horizontal whitespace (blanks and tabs), and a separate token eol that matches (significant) line breaks. The eol token also gobbles up subsequent lines consisting only of whitespace.

If a language supports comments, and you don’t want them to appear in your parse tree, you can parse them either in your ws token, or in eol (or your equivalent thereof). Which one it is depends on where comments are allowed. In INI files, they are only allowed after a key-value pair or in a line on their own, so eol would be the fitting place. In contrast, SQL allows comments in every place where whitespace is allowed, so it is natural to parse them in ws:

# comment parsing for SQL:
token ws { <!ww> \s* [ '--' \N* \n ]* }

# comment parsing for INI files:
token eol { [ [ <[#;]> \N* ]? \n ]+ }

Keeping State

Some of the more interesting data formats and languages require the parser to store things (at least temporarily) to be able to correctly parse them. A case in point is the C programming language, and others inspired by its syntax (such as C++ and Java). Such languages allow variable declarations of the form type variable = initial_value, like this:

int x = 42;

This is valid syntax, but only if the first word is a type name. In contrast, this would be invalid, because x is not a type:

int x = 42;
x y = 23;

From these examples it is pretty clear that the parser must have a record of all the types it knows. Since users can also declare types in their code files, the parser must be able to update this record.

Many languages also require that symbols (variables, types, and functions) be declared before they are referenced. This too requires the grammar to keep track of what has been declared and what hasn’t. This record of what has been declared (and what is a type or not, and possibly other meta information) is called a symbol table.

Instead of parsing the full C programming language, let’s consider a minimalist language that just allows assignments of lists of numbers, and variables to variables:

a = 1
b = 2
c = a, 5, b

If we don’t impose declaration rules, it’s pretty easy to write a grammar:

grammar VariableLists {
    token TOP        { <statement>* }
    rule  statement  { <identifier> '=' <termlist> \n }
    rule  termlist   { <term> * % ',' }
    token term       { <identifier> | <number> }
    token number     { \d+ }
    token identifier { <:alpha> \w* }
    token ws         { <!ww> \h* }
}

Now we demand that variables can only be used after they’ve been assigned to, so that the following input would be invalid, because b is not declared in the second line, where it’s used:

a = 1
c = a, 5, b
b = 2

To maintain a symbol table, we need three new elements: a declaration of the symbol table, some code that adds a variable name to the symbol table when the assignment has been parsed, and finally a check whether a variable has been declared at the time we come across it in a term list:

grammar VariableLists {
    token TOP {
        :my %*SYMBOLS;
        <statement>*
    }
    token ws { <!ww> \h* }
    rule statement {
        <identifier>
        { %*SYMBOLS{ $<identifier> } = True }
        '=' <termlist>
        \n
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ %*SYMBOLS{ $<identifier> } }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}
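With these changes in place, input that uses a variable before assigning to it is rejected, while well-ordered input still parses (a small demonstration):

say so VariableLists.parse("a = 1\nc = a, 5, b\nb = 2\n");   # False: b is not yet declared
say so VariableLists.parse("a = 1\nb = 2\nc = a, 5, b\n");   # True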

In the token TOP, :my %*SYMBOLS; declares a variable. Declarations in regexes start with a colon (:), and end with a semicolon (;). In between they look like normal declarations in Perl 6. The % sigil signals that the variable is a hash — a mapping of string keys to values. The * makes it a dynamic variable — a variable that is not limited to the current scope but also visible to code (or regexes, which are also code) that is called from the current scope. Since this is an unusually large scope, it is customary to give such variables names in CAPITAL LETTERS.

The second part, adding a symbol to the symbol table, happens in the rule statement:

    rule statement {
        <identifier>
        { %*SYMBOLS{ $<identifier> } = True }
        '=' <termlist>
        \n
    }

Inside the curly braces is regular (non-regex) Perl 6 code, so we can use it to manipulate the hash %*SYMBOLS. The expression $<identifier> accesses the capture for the variable name[2]. Thus, if this rule parses a variable a, this statement sets %*SYMBOLS{ 'a' } = True.

The placement of the code block is relevant. Putting it before the call to termlist means that the variable is already known when the term list is parsed, so it accepts input like a = 2, a. If we call termlist first, this kind of input is rejected.

Speaking of rejection, this part happens in token variable. term now calls the new token variable (previously it called identifier directly), and variable validates that the symbol has been declared before:

    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ %*SYMBOLS{ $<identifier> } }>
    }

You might remember from earlier examples that <?{ ... }> executes a piece of Perl 6 code and fails the parse if it returns a false value. If $<identifier> is not in %*SYMBOLS, this is exactly what happens. Here the non-backtracking nature of tokens is important: if the variable being parsed is abc, and a variable a is in %*SYMBOLS, backtracking would try shorter matches for <identifier> until it hits a, and then succeed[3].

Since %*SYMBOLS is declared in token TOP, you have to duplicate this declaration when you try to call rules other than TOP from outside the grammar. Without a declaration such as my %*SYMBOLS;, a call like

VariableLists.parse('abc', rule => 'variable');

dies with

Dynamic variable %*SYMBOLS not found
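Declaring the dynamic variable in the calling scope avoids this; for testing you can also pre-populate it (a small sketch):

my %*SYMBOLS = abc => True;
say so VariableLists.parse('abc', rule => 'variable');   # True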

Implementing Lexical Scoping with Dynamic Variables

Many programming languages have the concept of a lexical scope. A scope is the area in a program where a symbol is visible. We call a scope lexical if the scope is determined solely by the structure of the text (and not, say, runtime features of the program).

Scopes can typically be nested. A variable declared in one scope is visible in this scope, and in all inner, nested scopes (unless an inner scope declares a variable of the same name, in which case the inner declaration hides the outer).

Coming back to the toy language of lists and assignments, we can introduce a pair of curly braces to denote a new scope, so this is valid:

a = 1
b = 2
{
    c = a, 5, b
}

but the next example is invalid, because it declares b only in an inner scope, and so it is not visible in the outer scope:

a = 1
{
    b = 2
}
c = a, 5, b

To implement these rules in a grammar, we can make use of an important observation: dynamic scoping in a grammar corresponds to lexical scoping in text it parses. If we have a regex block that parses both the delimiters of a scope and the things inside that scope, its dynamic scope is confined to all of the regexes it calls (directly and indirectly), and that is also the extent of the lexical scope it matches in the input text.

Let’s take a look at how we can implement dynamic scoping:

grammar VariableLists {
    token TOP {
        :my %*SYMBOLS;
        <statement>*
    }
    token ws { <!ww> \h* }
    token statement {
        | <declaration>
        | <block>
    }
    rule declaration {
        <identifier>
        { %*SYMBOLS{ $<identifier> } = True; }
        '=' <termlist>
        \n
    }
    rule block {
        :my %*SYMBOLS = CALLERS::<%*SYMBOLS>;
        '{' \n*
            <statement>*
        '}' \n*
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ %*SYMBOLS{ $<identifier> } }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}
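We can check both the valid and the invalid example against this grammar (heredocs are used here for readability):

my $valid = q:to/EOF/;
a = 1
b = 2
{
    c = a, 5, b
}
EOF
say so VariableLists.parse($valid);     # True: the block sees a and b

my $invalid = q:to/EOF/;
a = 1
{
    b = 2
}
c = a, 5, b
EOF
say so VariableLists.parse($invalid);   # False: b is not visible outside the block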

There are a few changes to the previous version of this grammar: the rule statement has been renamed to declaration and the new rule statement parses either a declaration or a block.

All the interesting bits happen in the block rule. The line :my %*SYMBOLS = CALLERS::<%*SYMBOLS>; declares a new dynamic variable %*SYMBOLS and initializes it with the previous value of that variable. CALLERS::<%*SYMBOLS> looks through the caller, and the caller’s caller, and so on for a variable %*SYMBOLS, and thus looks up the value corresponding to the outer scope. The initialization creates a copy of the hash, such that changes to one copy do not affect the other copies.

Let’s take a look at what happens when this grammar parses the following input:

a = 1
b = 2
{
    c = a, 5, b
}

After the first two lines, %*SYMBOLS has the value {a => True, b => True}. When rule block parses the opening curly bracket on the third line, it creates a copy of %*SYMBOLS. The declaration of c on the fourth line inserts the pair c => True into the copy of %*SYMBOLS. After rule block parses the closing curly brace on the last line, it exits successfully, and the copy of %*SYMBOLS goes out of scope. This leaves us with the earlier version of %*SYMBOLS (with only the keys a and b), which then goes out of scope when TOP exits.

Scoping Through Explicit Symbol Tables

Using dynamic variables for managing symbol tables usually works pretty well, but there are some edge cases where a more explicit approach works better. Such edge cases include those where there are so many symbols that copying becomes prohibitively expensive, or where more than the top-most scope must be inspected, or when copying the symbol table is impractical for other reasons.

Consequently, you can write a class for your symbol table (which in the simplest case uses an array as a stack of scopes) and explicitly call methods on it when entering and leaving scopes, when declaring a variable, and for checking whether a variable is known in a scope:

class SymbolTable {
    has @!scopes = {}, ;
    method enter-scope() {
        @!scopes.push({})
    }
    method leave-scope() {
        @!scopes.pop();
    }
    method declare($variable) {
        @!scopes[*-1]{$variable} = True
    }
    method check-declared($variable) {
        for @!scopes.reverse -> %scope {
            return True if %scope{$variable};
        }
        return False;
    }
}

grammar VariableLists {
    token TOP {
        :my $*ST = SymbolTable.new();
        <statement>*
    }
    token ws { <!ww> \h* }
    token statement {
        | <declaration>
        | <block>
    }
    rule declaration {
        <identifier>
        { $*ST.declare( $<identifier> ) }
        '=' <termlist>
        \n
    }
    rule block {
        '{' \n*
            { $*ST.enter-scope() }
            <statement>*
            { $*ST.leave-scope() }
        '}' \n*
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ $*ST.check-declared($<identifier>) }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}

The class SymbolTable has the private array attribute @!scopes which is initialized with a list containing a single, empty hash {}. Entering a scope means pushing an empty hash on top of this array, and when leaving the scope it is removed again through the pop method call. A variable declaration adds its name to the top-most hash, @!scopes[*-1].

Checking for the presence of a variable must consider not just the top-most hash, because variables are inherited by inner scopes. Here we go through all the scopes in reverse order, from inner-most to outer-most. The order of traversal is not relevant for a simple Boolean check, but if you need to look up information associated with the variable, it is important to adhere to this order so that you reference the correct one.

Token TOP creates a new object of class SymbolTable, declaration calls the declare method, and token variable calls method check-declared. The rule block calls enter-scope before parsing the statement list, and leave-scope afterwards. This works, but only if the statement list can be parsed successfully; if not, rule block fails before it manages to call leave-scope.

Perl 6 has a safety feature for such situations: if you prefix a statement with LEAVE, Perl 6 calls it for you at routine exit, in all circumstances where this is possible (even if an exception is thrown). Since the LEAVE phaser only works in regular code and not in regexes, we need to wrap the regex in a method:

    method block {
        $*ST.enter-scope();
        LEAVE $*ST.leave-scope();
        self.block_wrapped();
    }
    rule block_wrapped {
        '{' \n*
            <statement>*
        '}' \n*
    }

Now we have the same robustness as the approach with dynamic variables, and more flexibility to add extra code to the symbol table, at the cost of more code and increased effort.

Summary

Perl 6 grammars are a declarative way to write recursive descent parsers. Without backtracking, they are predictive; at each point, we know what list of tokens to expect.

The recursive nature of grammars comes with the risk of left recursion, a situation where a recursive path does not consume any characters, and so leads to an infinite loop.

In spite of the top-down nature of grammars, writing them typically happens from the bottom up: starting with lexical analysis, and then moving up to parsing larger structures.

Complex languages require additional state for successful and precise parsing. We have seen how you can use dynamic variables to hold state in grammars, how their scope corresponds to lexical scoping in the input, and how symbol tables can be written and integrated into grammars.


  [1] Like a Swiss-army knife, but with much more power.

  [2] At this point it is crucial that identifier does not parse its surrounding whitespace. Hence the principle that tokens do not care about whitespace, and the rules that call those tokens parse the whitespace.

  [3] In this case, this would be harmless, because no other rule could match the rest of the variable, leading to a parse error nonetheless. But in more complicated cases, this kind of unintended backtracking can lead to errors that are very puzzling for the maintainer of the grammar.

Day 3 – LetterOps with Perl6

Scale

“Scale! Scale is everything!”

Elves scattered in all directions when the booming voice of Santa reached them.

“This operation is prepared for, what, thirty-four children? And now we have zillions of them! And adults are sending letters too!”

Buzzius the elf stepped forward and blurted “But now we have computers!”, darting back again to his elvish pursuits.

“What good are they? Pray tell me, what can I do if I still have to read every single letter?”

Diodius the elf briefly raised his head from his hiding place and said “Tell the children to send a letter in a text file”.

Santa stopped yelling and scratched his well-bearded chin. “I can do that”. Early adopters among the children sent letters just like this one:

Dear Santa: I have been a good boy so I want you to bring me a collection of scythes and an ocean liner with a captain and a purser and a time travel machine and instructions to operate it and I know I haven't been so good at times but that is why I'm asking the time machine so that I can make it good and well and also find out what happened on July 13th which I completely forgot.

“I can do that”, Santa repeated to himself. He would have to extract a list of gifts out of that single-line mess; for instance, by splitting it on the word and.

And, of course, he would use Perl 6, which, being able to use nearly any Unicode character as a variable name (even runic ones, as in our $ᚣ = True), was his favorite language. In a single line you can split the letter into chunks, obtaining something like this:

[ "Dear Santa: I have been a good boy so I want you to bring me a collection of scythes", "an ocean liner with a captain", "a purser", "a time travel machine", "instructions to operate it", "I know I haven't been so good at times but that is why I'm asking the time machine so that I can make it good", "well", "also find out what happened on July 13th which I completely forgot.\n" ]

The /\s* «and» \s*/ regex took out the ands and also trimmed the surrounding spaces, creating a set of sentences. And these sentences might or might not contain something the customer wanted Santa to bring. Which made Santa start roaring again. “Scale and structure! We need to scale and we need structure!”
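That one-liner can be sketched as follows (the letter text is abbreviated here; « and » are word-boundary anchors, so words like “android” are not split):

```raku
# Split the letter on the word "and", trimming surrounding whitespace.
my $letter = "bring me a collection of scythes and an ocean liner and a mug";
my @chunks = $letter.split(/\s* «and» \s*/);
.say for @chunks;
# OUTPUT:
# bring me a collection of scythes
# an ocean liner
# a mug
```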

Markdown to the rescue

Marcius pitched in. “Everybody knows Markdown. It’s text, with a few marks thrown in for structure.”

Oakius, who was working towards his promotion to Elf, Second Class, said, “Use the elvish-est language, Elm. You know, it’s elf but for a letter”.

“I can do that”, said Santa. Elves loved his can-do approach. So he installed the whole thing and wrote this little program.

Santa was quiet for about 30 seconds. And then his roaring could be heard again.

“Never, you hear me? Never do I want to hear again about this spawn of the Easter Bunny or other evil creatures”.

Those elves nearest the screen observed lots of red, but not nice red, and nothing resembling working code. So they gave Rudolph the (nice) Red Nose Reindeer a note, which he dutifully carried pricked in one of his smaller antlers.

“Should we go back to Perl6 then?”

Processing Markdown with Perl6

Santa found Text::Markdown, which he promptly installed with:

zef install Text::Markdown

It had Text, it had Markdown, it promised to process it; that was all he needed. So he got word to his customer base that Markdown was going to be needed this year if you wanted this guy to go down your chimney with a burlap bag of nice stuff in it.

Once again, early adopters answered with this:

# Dear Santa

I have been a good boy so I want you to bring me a collection of
scythes and an ocean liner with a captain and a purser and a time
travel machine and instructions to operate it and I know I haven't
been so good at times but that is why I'm asking the time machine so
that I can make it good and well and also find out what happened on
July 13th which I completely forgot.

Well, it is Markdown, is it not? It’s properly addressed and all. “Properly addressing a letter is important”, Santa said aloud, in a not-quite-yell that only startled Rudolph, who was the only one hanging around. “It gives structure. Let us check whether letters have this”.

“Wow!” said Santa. And then, “Wow”. Just a few lines of code: one to read and understand the structure of the document, another to check whether at least one element is a heading. It will say True if that is the case. And it was true.
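A sketch of those few lines, assuming the Text::Markdown module's parse-markdown routine returns an object whose .document.items lists the parsed elements, with headings as Text::Markdown::Heading objects (the exact API may differ between module versions):

```raku
use Text::Markdown;

my $letter = q:to/END/;
# Dear Santa

I have been a good boy so I want you to bring me a collection of
scythes and an ocean liner.
END

my $md = parse-markdown($letter);

# True if at least one parsed element is a heading.
say so $md.document.items.grep(Text::Markdown::Heading);
```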

Santa was happy for a tiny while. He scratched the scruff of the neck of Rudolph, who was kind of surprised by this. Then he stopped doing it. Rudolph looked up and backed his hind legs just this tiny bit, feeling unhappiness.

More structure is needed.

Santa had found this letter:

# Dude

## Welll...

I have been a naughty person


## Requests

Well...

Proper addressing and everything, but he could not waste his time on persons that had not been good. Scale. And resources. Resources should be spent only on good persons, not on bad persons. Bad persons are bad, and that’s that. So he went back to coding, Rudolph slipped away looking for lichen candy or whatever, and he produced this:

Santa was kind of proud of the trick that extracted the paragraphs after the second heading, as well as the fact that he had been able to put to good use the Thorn letter (Þ), which he loved. He also loved functional programming, having cut his teeth on Lisp. So he created this flip-switch that is initially false but flips on when the element it is dealing with is a heading whose level is two. He was also happy that he could do this kind of thing with the structure layered on top of the text by the marks.

Besides, he could check whether the word “good” was present in any of the paragraphs between that heading (Behavior) and the next. And any is so cool: it is enough that one of the paragraphs mentions good. The last line will first produce an array of Boolean values, and will eventually say True if just one of them includes good, False otherwise. Good for culling the good from the bad.
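The flip-switch can be sketched like this; the .level accessor on headings and the stringification of paragraphs are assumptions about Text::Markdown's item classes, and the variable Þ is of course the author's beloved Thorn:

```raku
use Text::Markdown;

my $letter = q:to/END/;
# Dear Santa

## Behavior

I have been a good boy

## Requests

Well...
END

my $md = parse-markdown($letter);

# Flip-switch: False until we meet a level-two heading, True afterwards.
my $Þ = False;
my @after = $md.document.items.grep: {
    $Þ = True if $_ ~~ Text::Markdown::Heading && .level == 2;
    $Þ;
};

# True if any paragraph after that heading mentions "good".
say so any @after.grep(Text::Markdown::Paragraph).map(*.Str.contains('good'));
```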

Santa was happ-y. -Ier. But still.

The toys are the important thing here.

So what he actually wanted was a list of the toys. After requesting, once again, a change of letter format, which he could do because he was Santa and everyone wanted his free stuff for Christmas, he started to receive letters with this structure:

# Dear Santa

## Behavior

I have been a good boy 


## Requests

And this is what I want

 - scythes 
 - an ocean liner with a captain and a purser
 - a time travel machine and instructions to operate it 

What they lack in spontaneity, they make up for in structure. And structure is good. You can get a list of requests thus:
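Something along these lines — a sketch only, since the grep on the heading text and the shape of list items depend on the exact Text::Markdown classes:

```raku
use Text::Markdown;

my $letter = q:to/END/;
# Dear Santa

## Requests

And this is what I want

 - scythes
 - an ocean liner with a captain and a purser
END

# Keep only what follows the "Requests" heading, keep the list elements,
# then flatten the list items down to trimmed strings.
my $seen = False;
my @requests = parse-markdown($letter).document.items
    .grep({ $seen ||= $_ ~~ Text::Markdown::Heading
                && .Str.contains('Requests'); $seen })
    .grep(Text::Markdown::List)
    .map(*.items).flat
    .map(*.Str.trim);

.say for @requests;
```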

That is really an unsaintly chain of list-processing expressions. And the sentence before this one has a list of “list” mentions that is almost as bad. But let us see what is going on there.

First thing in the list, we take only what comes after the Requests heading, using regular expressions and stuff. We could have probably pared it down to a transformation to Str but we would have lost the structure. And structure is important, Santa is never tired of repeating that. Next we extract only those elements that are actually a list, taking out all the fluff.

And it so happens that there is such a thing as too much structure. The list has elements that have elements that have elements in them.

That, or Text::Markdown could do with a big makeover. Which is what the author of this post is putting on his particular wish list.

Not there yet

But almost. We have the list, and now Santa finds things like time travel machines and Mondays and things like that. He cannot order Mondays in the elf factory. He would have to read every single list of things. But no worries. That can be taken care of, too:

Simply enough, this program goes over the saved list of items in the wish list, and checks for product-ness. Is it a product? It goes. Are you asking for last Friday evening, which you completely missed? It does not, and don’t you dare to waste Santa’s time, boy.

The gist of the thing is in the Wikidata query, which uses the brand-new Wikidata::API module. This module just sends stuff to the Wikidata API and returns it as an object. Believe it or not, that is what the SPARQL query does: it inserts the item name into the query, makes the query, and returns true if the number of returned elements is not zero. Productness at your fingertips! In a few lines of code! Now he could just chain all the stuff together and obtain, from a letter containing this:

 - Morning sickness
 - Scythe
 - Mug

Just the two of them that you can actually order from your local, downtown, mom-and-pop shop, which is where Santa secretly buys all the stuff, because he buys in bulk and gets a pretty good deal.
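A hedged sketch of that product-ness check: it assumes Wikidata::API exposes a query routine that sends a SPARQL query and returns the decoded JSON response, and it uses Q2424752 (Wikidata's “product” entity) as the target class — both are assumptions about code the post describes but does not show:

```raku
use Wikidata::API;

# Returns True if Wikidata knows the item as (a subclass of) a product.
sub is-product(Str $item --> Bool) {
    my $sparql = qq:to/END/;
    SELECT ?thing WHERE \{
        ?thing rdfs:label "$item"\@en.
        ?thing wdt:P31/wdt:P279* wd:Q2424752.
    \} LIMIT 1
    END
    my $result = query($sparql);
    so $result<results><bindings>.elems;
}

say is-product('Scythe');
say is-product('Morning sickness');
```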

Santa smiled, and a loud cheer erupted from the crowd of elves, reindeer, and a couple of puffins that were there for no good reason. They then settled down to…

Wrap up

Santa and Perl 6 are a good match, simply because they both arrive at Christmas time. Santa finds you can do lots of useful things with it, by itself or by using one of the fine modules that have become available lately.

The author of this post, however, will include in his letter to Santa a request for help carrying on with the two modules used here, which he maintains and which need more experienced coders to test, extend, and maybe rewrite from scratch. But he is happy to see that mundane and slightly divine things like processing letters to Santa can be done straight away using Perl6. And you should do it too.

Code and samples for this post are available from GitHub, as is this text. Help and suggestions are very much welcome.

Day 2 – Perl 6: Sigils, Variables, and Containers

Having a rudimentary understanding of containers is vital for enjoyable programming in Perl 6. They're ubiquitous and not only do they affect the kind of variables you get, they also dictate how Lists and Maps behave when iterated.

Today, we'll learn what containers are and how to work with them, but first, I'd like you to temporarily forget everything you might know or suspect about Perl 6's sigils and variables, especially if you're coming from Perl 5's background. Everything.

Show Me The Money

In Perl 6, a variable is prefixed with a $ sigil and is given a value with a binding operator (:=). Like so:

my $foo := 42;
say "The value is $foo"; # OUTPUT: «The value is 42␤»

If you've followed my suggestion to forget everything you know, it won't shock you to learn the same applies to List and Hash types:

my $ordered-things := <foo bar ber>;
my $named-things := %(:42foo, :bar<ber>);
say "$named-things<foo> bottles of $ordered-things[2] on the wall";
# OUTPUT: «42 bottles of ber on the wall␤»
.say for $ordered-things; # OUTPUT: «foo␤bar␤ber␤»
.say for $named-things; # OUTPUT: «bar => ber␤foo => 42␤»

Knowing just this, you can write a great variety of programs, so if you ever start to feel like there's just too much to learn, remember you don't have to learn everything at once.

We Wish You a Merry Listmas

Let's try doing more things with our variables. It's not uncommon to want to change a value in a list. How well do we fare with what we have so far?

my $list := (1, 2, 3);
$list[0] := 100;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

Although we can bind to variables, if we attempt to bind to some value, we get an error, regardless of whether the value comes from a List or just, say, a literal:

1 := 100;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

This is how Lists manage to be immutable. However, 'Tis The Season and wishes do come true, so let's wish for a mutable List!

What we need to get a hold of is a Scalar object because the binding operator can work with it. As the name suggests, a Scalar holds one thing. You can't instantiate a Scalar via the .new method, but we can get them by just declaring some lexical variables; don't need to bother giving them names:

my $list := (my $, my $, my $);
$list[0] := 100;
say $list; # OUTPUT: «(100 (Any) (Any))␤»

The (Any) in the output are the default values of the containers (on that, a bit later). Above, it seems we managed to bind a value to a list's element after List's creation, did we not? Indeed we did, but…

my $list := (my $, my $, my $);
$list[0] := 100;
$list[0] := 200;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

The binding operation replaces the Scalar container with a new value (100), so if we try to bind again, we're back to square one, trying to bind to a value instead of a container again.

We need a better tool for the job.

That's Your Assignment

The binding operator has a cousin: the assignment operator (=). Instead of replacing our Scalar containers with a binding operator, we'll use the assignment operator to assign, or "store", our values in the containers:

my $list := (my $ = 1, my $ = 2, my $ = 3);
$list[0] = 100;
$list[0] = 200;
say $list;
# OUTPUT: «(200 2 3)␤»

Now, we can assign our original values right from the start, as well as replace them with other values whenever we want to. We can even get funky and put different type constraints on each of the containers:

my $list := (my Int $ = 1, my Str $ = '2', my Rat $ = 3.0);
$list[0] = 100; # OK!
$list[1] = 42; # Typecheck failure!
# OUTPUT: «Type check failed in assignment;
# expected Str but got Int (42) […] »

That's somewhat indulgent, but there is one thing that could use a type constraint: the $list variable. We'll constrain it to the Positional role to ensure it can only hold Positional types, like List and Array:

my Positional $list := (my $ = 1, my $ = '2', my $ = 3.0);

Don't know about you, but that looks awfully verbose to me. Luckily, Perl 6 has syntax to simplify it!

Position@lly

First, let's get rid of the explicit type constraint on the variable. In Perl 6, you can use @ instead of $ as a sigil to say that you want the variable to be type-constrained with role Positional:

my @list := 42;
# OUTPUT: «Type check failed in binding;
# expected Positional but got Int (42) […] »

Second, instead of parentheses to hold our List, we'll use square brackets. This tells the compiler to create an Array instead of a List. Arrays are mutable and they stick each of their elements into a Scalar container automatically, just like we did manually in the previous section:

my @list := [1, '2', 3.0];
@list[0] = 100;
@list[0] = 200;
say @list;
# OUTPUT: «[200 2 3]␤»

Our code became a lot shorter, but we can toss out a couple more characters. Just like assigning, instead of binding, to a $-sigiled variable gives you a Scalar container for free, you can assign to @-sigiled variable to get a free Array. If we switch to assignment, we can get rid of the square brackets altogether:

my @list = 1, '2', 3.0;

Nice and concise.

Similar ideas are behind %- and &-sigiled variables. The % sigil implies a type-constraint on the Associative role and offers the same shortcuts for assignment (giving you a Hash), creating Scalar containers for the values. The &-sigiled variables type-constrain on role Callable, and assignment behaves similarly to $ sigils, giving a free Scalar container whose value you can modify:

my %hash = :42foo, :bar<ber>;
say %hash; # OUTPUT: «{bar => ber, foo => 42}␤»
my &reversay = sub { $^text.flip.say }
reversay '6 lreP ♥ I'; # OUTPUT: «I ♥ Perl 6␤»
# store a different Callable in the same variable
&reversay = *.uc.say; # a WhateverCode object
reversay 'I ♥ Perl 6'; # OUTPUT: «I ♥ PERL 6␤»

The One and Only

Earlier we learned that assignment to $-sigiled variables gives you a free Scalar container. Since scalars, as the name suggests, contain just one thing… what exactly happens if you put a List into a Scalar? After all, the Universe remains unimploded when you try to do that:

my $listish = (1, 2, 3);
say $listish; # OUTPUT: «(1 2 3)␤»

Such behaviour may make it seem that Scalar is a misnomer, but it does actually treat the entire list as a single thing. We can observe the difference in a couple of ways. Let's compare a List bound to a $-sigiled variable (so no Scalar is involved) with one that is assigned into a $-sigiled variable (automatic Scalar container):

# Binding:
my $list := (1, 2, 3);
say $list.perl;
say "Item: $_" for $list;
# OUTPUT:
# (1, 2, 3)
# Item: 1
# Item: 2
# Item: 3
# Assignment:
my $listish = (1, 2, 3);
say $listish.perl;
say "Item: $_" for $listish;
# OUTPUT:
# $(1, 2, 3)
# Item: 1 2 3

The .perl method gave us an extra insight and showed the second List with a $ before it, to indicate it's containerized in a Scalar. More importantly, when we iterated over our Lists with the for loop, the second List resulted in just a single iteration: the entire List as one item! The Scalar lives up to its name.

This behaviour isn't merely of academic interest. Recall that Arrays (and Hashes) create Scalar containers for their values. This means that if we nest things, even if we select an individual list or hash stored inside the Array (or Hash) and try to iterate over it, it'd be treated as just a single item:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say "List Item: $_" for @stuff[0];
say "Hash Item: $_" for @stuff[1];
# OUTPUT:
# List Item: 1 2 3
# Hash Item: bar 70
# foo 42

The same reasoning—that lists and hashes in Scalar containers are a single item—applies when you try to flatten an Array's elements or pass them as an argument to a slurpy parameter:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say flat @stuff;
# OUTPUT: «((1 2 3) {bar => 70, foo => 42})␤»
-> *@args { @args.say }(@stuff)
# OUTPUT: «[(1 2 3) {bar => 70, foo => 42}]␤»

It's this behaviour that can drive Perl 6 beginners up the wall, especially those who come from auto-flattening languages, such as Perl 5. However, now that we know why this behaviour is observed, we can change it!

Decont

If the Scalar container is the culprit, all we need to do is remove it. We need to de-containerize our list and hash, or "decont" for short. In your Perl 6 travels, you'll find several ways to accomplish that, but one way that's designed precisely for that is the decont methodop (<>):

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say "Item: $_" for @stuff[0]<>;
say "Item: $_" for @stuff[1]<>;
# OUTPUT:
# Item: 1
# Item: 2
# Item: 3
# Item: bar 70
# Item: foo 42

It's easy to remember: it looks like a squished box (a trampled container). After retrieving our containerized items by indexing into the Array, we appended the decont and removed the contents from their Scalar containers, causing our loop to iterate over each item in them individually.

If you wish to decont every element of an Array in one go, simply use the hyper operator (», or >> if you prefer ASCII) along with the decont:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say flat @stuff»<>;
# OUTPUT: «(1 2 3 bar => 70 foo => 42)␤»
-> *@args { @args.say }(@stuff»<>)
# OUTPUT: «[1 2 3 bar => 70 foo => 42]␤»

With the containers removed, our list and hash flattened just like we wanted. And of course, we could have avoided the Array and bound our original List to the variable instead. Since Lists don't put their elements into containers, there's nothing to decont:

my @stuff := (1, 2, 3), %(:42foo, :70bar);
say flat @stuff;
# OUTPUT: «(1 2 3 bar => 70 foo => 42)␤»
-> *@args { @args.say }(@stuff)
# OUTPUT: «[1 2 3 bar => 70 foo => 42]␤»

Don't Let It Slip Away

While we're here, it's worth noting that many people use the slip operator (|) when they want to do the decont (we're not talking about using it when passing arguments to Callables):

my @stuff = (1, 2, 3), (4, 5);
say "Item: $_" for |@stuff[0];
# OUTPUT:
# Item: 1
# Item: 2
# Item: 3

Although it gets the job done as far as deconting goes, it can introduce subtle bugs that could be very difficult to track down. Try to spot one here, in a program that iterates over an infinite list of non-negative integers and prints those that are prime:

my $primes = ^∞ .grep: *.is-prime;
say "$_ is a prime number" for |$primes;

Give up? This program leaks memory… very slowly. Even though we're iterating over an infinite list of items, that by itself is not an issue, because the .grep method returns a Seq object that doesn't keep already-iterated items around, so memory usage never grows there.

The problematic part is our | slip operator. It converts our Seq into a Slip, which is a type of List and keeps around all of the values we have already consumed. Here's a modified version of the program that grows faster, if you want to see that growth in htop:

# CAREFUL! Don't consume all of your resources!
my $primes = ^∞ .map: *.self;
Nil for |$primes;

Let's try it again, but this time using the decont method op:

my $primes = ^∞ .map: *.self;
Nil for $primes<>;

The memory usage is stable now and the program can sit there and iterate until the end of times. Of course, since we know it's the Scalar container that causes containerization and we wish to avoid it here, we can simply bind the Seq to the variable instead:

my $primes := ^∞ .map: *.self;
Nil for $primes;

I Want Less

If you detest sigils, Perl 6 got something you can smile about: sigil-less variables. Just prefix the name with a backslash during declaration, to indicate you don't want no stinkin' sigils:

my \Δ = 42;
say Δ²; # OUTPUT: «1764␤»

You don't get any free Scalars with such variables, so during declaration it makes no difference whether you bind or assign to them. They behave similarly to how a $-sigiled variable with a bound value behaves, including the ability to bind Scalars to make the variable mutable:

my \Δ = my $ = 42;
Δ = 11;
say Δ²; # OUTPUT: «121␤»

A more common place where you might see such variables is as parameters of routines; there, they mean you want the is raw trait applied to the parameter. The same meaning exists for the + positional slurpy parameter (no backslash is needed there), where having is raw means you won't get the unwanted Scalar containers that the slurpy would otherwise create, since it is an Array, as implied by the @ sigil:

sub sigiled ($x is raw, +@y) {
    $x = 100;
    say flat @y
}
sub sigil-less (\x, +y) {
    x = 200;
    say flat y
}
my $x = 42;
sigiled $x, (1, 2), (3, 4); # OUTPUT: «((1 2) (3 4))␤»
say $x; # OUTPUT: «100␤»
sigil-less $x, (1, 2), (3, 4); # OUTPUT: «(1 2 3 4)␤»
say $x; # OUTPUT: «200␤»

Defaulting on Default Defaults

One awesome feature offered by containers is default values. You may have heard that in Perl 6, Nil signals the absence of a value and is not a value in itself. Container defaults are where that comes into play:

my $x is default(42);
say $x; # OUTPUT: «42␤»
$x = 10;
say $x; # OUTPUT: «10␤»
$x = Nil;
say $x; # OUTPUT: «42␤»

A container's default value is given to it using the is default trait. Its argument is evaluated at compile time and the resultant value is used whenever the container lacks a value. Since Nil's job is to signal just that, assigning a Nil into a container will result in the container containing its default value, not a Nil.

Defaults can be given to Array and Hash containers just the same and if you wish your containers to contain a Nil literally, when no value is present, just specify Nil as a default:

my @a is default<meow> = 1, 2, 3;
say @a[0, 2, 42]; # OUTPUT: «(1 3 meow)␤»
@a[0]:delete;
say @a[0]; # OUTPUT: «meow␤»
my %h is default(Nil) = :bar<ber>;
say %h<bar foos>; # OUTPUT: «(ber Nil)␤»
%h<bar>:delete;
say %h<bar> # OUTPUT: «Nil␤»

The container's default has a default default: the explicit type constraint that's present on the container:

say my Int $y; # OUTPUT: «(Int)␤»
say my Mu $z; # OUTPUT: «(Mu)␤»
say my Int $i where *.is-prime; # OUTPUT: «(<anon>)␤»
$i.new; # OUTPUT: (exception) «You cannot create […]»

If no explicit type constraint is present, the default default is an Any type object:

say my $x; # OUTPUT: «(Any)␤»
say $x = Nil; # OUTPUT: «(Any)␤»

Note that the default values you may use in routine signatures for optional parameters are not container defaults: passing Nil as an argument, or assigning it into a parameter, will not substitute the default from the signature.
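A quick illustration of the difference (a minimal sketch): the signature default kicks in only when the argument is omitted entirely, while a container's is default trait also handles an assigned Nil:

```raku
sub sig-default ($x = 42) { say $x }
sig-default;      # OUTPUT: «42␤»  — argument omitted, default used
sig-default Nil;  # OUTPUT: «Nil␤» — Nil is passed through as-is

my $container is default(42) = 10;
$container = Nil;
say $container;   # OUTPUT: «42␤»  — Nil triggers the container default
```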

Customizing

If the standard behaviour of containers doesn't suit your needs, you can make your own container, using the Proxy type:

my $collector := do {
    my @stuff;
    Proxy.new: :STORE{ @stuff.push: @_[1] },
               :FETCH{ @stuff.join: "|" }
}
$collector = 42;
$collector = 'meows';
say $collector; # OUTPUT: «42|meows␤»
$collector = 'foos';
say $collector; # OUTPUT: «42|meows|foos␤»

The interface is somewhat clunky, but it gets the job done. We create the Proxy object using method .new that takes two required named arguments: STORE and FETCH, each taking a Callable.

The FETCH Callable gets called whenever a value is read from the container, which can happen more times than is immediately apparent: in the code above, the FETCH Callable is called 10 times as the container percolates through dispatch and routines of the two say calls. The Callable is called with a single positional argument: the Proxy object itself.

The STORE Callable gets called whenever a value is stored into our container, for example, with an assignment operator (=). The first positional argument to the Callable is the Proxy object itself, and the second argument is the value that was given to be stored.

We'd like STORE and FETCH Callables to share the @stuff variable, so we use the do statement prefix with a code block to contain it all nicely inside.

We bind our Proxy to a variable and the rest is just normal variable usage. The output shows the altered behaviour our custom container provides.

Proxies are also handy as a return value from methods to provide extra behaviour with mutable attributes. For example, here's an attribute that from the outside appears to be just a normal mutable attribute, but actually coerces its value from an Any type to an Int:

class Foo {
    has $!foo;
    method foo {
        Proxy.new: :STORE(-> $, Int() $!foo { $!foo }),
                   :FETCH{ $!foo }
    }
}
my $o = Foo.new;
$o.foo = ' 42.1e0 ';
say $o.foo; # OUTPUT: «42␤»

Quite sweet! And if you want a Proxy with a better interface with a few more features under its belt, check out the Proxee module.

That's All, Folks

That about covers it all. The remaining beasts you'll see in the land of Perl 6 are "twigils": variables with TWO symbols before the name, but as far as containers go, they behave the same as the variables we've covered. The second symbol simply indicates additional information, such as whether the variable is an implied positional or named parameter…

sub test { say "$^implied @:parameters[]" }
test 'meow', :parameters<says the cat>;
# OUTPUT: «meow says the cat␤»

…or whether the variable is a private or public attribute:

with class Foo {
    has $!foo = 42;
    has @.bar = 100;
    method what's-foo { $!foo }
}.new {
    say .bar;       # OUTPUT: «[100]␤»
    say .what's-foo # OUTPUT: «42␤»
}

That's a journey for another day, however.

Conclusion

Perl 6 has a rich system of variables and containers that differs vastly from Perl 5. It's important to understand the way it works, as it affects the way iteration and flattening of lists and hashes behaves.

Assignment to variables offers valuable shortcuts, such as providing Scalar, Array, or Hash containers, depending on the sigil. Binding to variables allows you to bypass such shortcuts, if you so require.

Sigil-less variables exist in Perl 6 and they have similar behaviour to how $-sigiled variables with binding work. When used as parameters, these variables behave like is raw trait was applied to them.

Lastly, containers can have default values and it's possible to create your own custom containers that can either be bound to a variable or returned from a routine.

Happy Holidays!

Day 1 – The Grinch of Perl 6: A Practical Guide to Ruining Christmas

Look at them! All smiling and happy. Coworkers, friends, and close family members. All enjoying programming in Perl 6 version 6.c "Christmas". Great concurrency primitives, core grammars, and a fantastic object model. It sickens me!

But wait a second… wait just a second. I got an idea. An awful idea. I got a wonderful, awful idea! We can ruin their "Christmas". All we need is a few tricks up our sleeves. Muahuahahaha!!


Welcome to the 2017th Perl 6 Advent Calendar! Each day, from today until Christmas, we'll have an awesome blog post about Perl 6 lined up for you.

Today, we'll show our naughty side and purposefully do naughty things. Sure, these have good uses, but being naughty is a lot more fun. Let's begin!

But True does False

Have you heard of the but operator? A fun little thing:

say True but False ?? 'Tis true' !! 'Tis false';
# OUTPUT: «Tis false␤»
my $n = 42 but 'forty two';
say $n; # OUTPUT: «forty two␤»
say $n + 7; # OUTPUT: «49␤»

It's an infix operator that first clones the object on the left hand side and then mixes in a role provided on the right hand side into the clone:

my $n = 42 but role Evener {
    method is-even { self %% 2 }
}
say $n.is-even; # OUTPUT: «True␤»
say $n.^name; # OUTPUT: «Int+{Evener}␤»

Those aren't roles in the first two examples above. The but operator has a handy shortcut: if the thing on the right isn't a role, it creates one for you! The role will have a single method, named after the .^name of the object on the right hand side, and the method will simply return the given object. Thus, this…

put True but 'some boolean'; # OUTPUT: «some boolean␤»

…is equivalent to:

put True but role {
    method ::(BEGIN 'some boolean'.^name) {
        'some boolean'
    }
} # OUTPUT: «some boolean␤»

The .^name on our string returns Str, since it's a Str object:

say 'some boolean'.^name; # OUTPUT: «Str␤»

And so the role provides a method named Str, which put calls on non-Str objects to obtain a stringy value to output, causing our boolean to have an altered stringy representation.

As an example, string '0' is True in Rakudo Perl 6 but is False in Pumpkin Perl 5. Using the but operator, we can alter a string to behave like Perl 5's version:

role Perl5Str {
    method Bool {
        nextsame unless self eq '0';
        False
    }
}
sub perlify { $^v but Perl5Str };
say so perlify 'meows'; # OUTPUT: «True␤»
say so perlify '0'; # OUTPUT: «False␤»
say so perlify ''; # OUTPUT: «False␤»

The role provides the .Bool method that the so routine calls. Inside the method, we re-dispatch to the original .Bool method using nextsame routine unless the string is a '0', in which case we simply return False.

The but operator has a brother: an infix does operator. It behaves very similarly, except it does not clone. (N.B.: the shortcut for automatically making roles from non-roles is available in does only on bleeding edge Rakudo, version 2017.11-1-g47ebc4a and up)

my $o = class { method stuff { 'original' } }.new;
say $o.stuff; # OUTPUT: «original␤»
$o does role { method stuff { 'modded' } };
say $o.stuff; # OUTPUT: «modded␤»

Some of the things in a program are globally accessible and in some implementations (e.g. Rakudo), certain constants are cached. This means we can get quite naughty in a separate part of a program and those Christmas celebrators won't even know what hit 'em!

How about, we override what the prompt routine reads? They like Christmas? We'll give them some Christmas trees:

$*IN does role { method get { "🎄 {callsame} 🎄" } }
my $name = prompt "Enter your name: ";
say "You entered your name as: $name";
# OUTPUT
# Enter your name: (typed by user:) Zoffix Znet
# You entered your name as: 🎄 Zoffix Znet 🎄

That override will work even if we stick it into a module. We can also kick it up a notch and mess with enums and cached constants, though this naughtiness likely won't be able to cross the module boundary, due to implementation-specific caching and cache invalidation:

True does False;
say 42 ?? "tis true" !! "tis false";
# OUTPUT: «tis true␤»

So far, that didn't quite have the wanted impact, but let's try coercing our number to a Bool:

True does False;
say 42.Bool ?? "tis true" !! "tis false";
# OUTPUT: «tis false␤»

There we go! And now, for the final Grinch-worthy touch, we'll mess with the numerical results of computations on numbers. Rakudo caches Int constants. The infix + operator also uses the internal-ish .Bridge method when computing with numerics of different types. So, let's override the .Bridge on our constant to return something funky:

BEGIN 42 does role { method Bridge { 12e0 } }
say 42 + 15; # OUTPUT: «57␤»
say 42 + 15e0; # OUTPUT: «27␤»

That's proper evil, sure to ruin any Christmas, but we're only getting started…

Wrapping It Up

What kind of Christmas would it be without wrapped presents?! Oh, presents we shall have, and the .wrap method provided by Perl 6's Routine type will let us wrap 'em up, oh so good.

use soft;
sub foo { say 'in foo' }
&foo.wrap: -> | {
    say 'in the wrap';
    callsame;
    say 'back in the wrap';
}
foo;
# OUTPUT:
# in the wrap
# in foo
# back in the wrap

We enable the use soft pragma to prevent unwanted inlining of routines that would otherwise interfere with our wrap. Then we refer to the routine we want to wrap as a noun, using its & sigil, and call the .wrap method, which takes a Callable.

The given Callable's signature must be compatible with the one on the wrapped routine (or its proto if it's a multi); otherwise we'd not be able to both dispatch to the routine correctly and call the wrapper with the args. In the example above, we simply use an anonymous Capture (|) to accept all possible arguments.

Inside the Callable we have two say calls and make use of callsame routine to call the next available dispatch candidate, which happens to be our original routine. This comes in handy, since were we to attempt to call foo by its name inside the wrapper, we'd start the dispatch over from scratch, resulting in an infinite dispatch loop.
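As a minimal sketch of that dispatch behaviour, a wrapper can also post-process whatever callsame returns from the original routine (the greet sub here is made up for illustration):

```raku
use soft; # keep the routine wrappable by preventing inlining
sub greet { 'hello' }
&greet.wrap: sub (|) {
    # callsame runs the next dispatch candidate (the original greet)
    # and hands us its return value to play with
    uc callsame
};
say greet; # OUTPUT: «HELLO␤»
```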

Since methods are Routines, we can wrap them as well. We can get a hold of the Method object using the .^lookup meta method:

IO::Handle.^lookup('print').wrap: my method (|c) {
    my &wrapee = nextcallee;
    wrapee self, "🎄 Ho-ho-ho! 🎄\n";
    wrapee self, |c
};
print "Hello, World!\n";
# OUTPUT:
# 🎄 Ho-ho-ho! 🎄
# Hello, World!

Here, we grab the .print method from IO::Handle type and .wrap it. We wish to make use of self inside the method, so we're wrapping using a standalone method (my method …) instead of a block or a subroutine. The reason we want to have self is to be able to call the very method we're wrapping to print our Christmassy message. Because our method is detached, the callwith and related routines will need self fed to them along with the rest of the args, to ensure we continue dispatch to the right object.

Inside the wrap, we use the nextcallee routine to obtain the original method. If it's a multi, we'll get the proto, not the specific candidate that best matches the original arguments, so the candidate ordering inside a wrap is slightly different from that of traditional routines. We grab the nextcallee into a variable, because we want to call it more than once, and each call shifts the routine off the dispatch stack. In the first call, we print our Christmassy message; in the second, we merely slip in our Capture (|c) of the original args, performing the call as it was originally meant to happen.

Thanks to the .wrap, we can alter or even completely redefine behaviour of subroutines and methods, which is sure to be jolly fun when your friends try to use them. Ho-ho-ho!

Invisibility Cloak

The tricks we've played so far are wonderfully terrible, but they're just too obvious and too… visible. Since Perl 6 has superb Unicode support, I think we should search the mass of Unicode characters for some fun mischief. In particular, we're looking for invisible characters that are NOT whitespace. Just one is sufficient for our purpose, but these four are fairly invisible on my computer:

[⁠] U+2060 WORD JOINER [Cf]
[⁡] U+2061 FUNCTION APPLICATION [Cf]
[⁢] U+2062 INVISIBLE TIMES [Cf]
[⁣] U+2063 INVISIBLE SEPARATOR [Cf]

Perl 6 supports custom terms and operators that can consist of any characters, except whitespace. For example, here's my patented Shrug Operator:

sub infix:<¯\(°_o)/¯> {
    ($^a, $^b).pick
}
say 'Coke' ¯\(°_o)/¯ 'Pepsi';
# OUTPUT: «Pepsi␤»

And here's a term, made out of non-identifier characters (we could've used the actual characters in the definition as well):

sub term:«\c[family: woman woman boy boy]» {
    '♫ We— are— ♪ faaaamillyyy ♬'
}
say 👩‍👩‍👦‍👦;
# OUTPUT: «♫ We— are— ♪ faaaamillyyy ♬␤»

With our invisible, non-whitespace characters in hand, we can make invisible operators and terms!

sub infix:«\c[INVISIBLE TIMES]» { $^a × $^b }
my \r = 42;
say "Area of the circle is " ~ π⁢r²;
# OUTPUT: «Area of the circle is 5541.76944093239␤»

Let's make a Jolly module that will export some invisible terms and operators. We'll then sprinkle them into our Christmassy friends' code:

unit module Jolly;
sub term:«\c[INVISIBLE TIMES]» is export { 42 }
sub infix:«\c[INVISIBLE TIMES]» is export {
    $^a × $^b
}
sub prefix:«\c[INVISIBLE SEPARATOR]» (|)
    is looser(&[,]) is export
{
    say "Ho-ho-ho!";
}

We've used the same character for the term and the infix operator. That's fine, as Perl 6 has fairly strict expectations about terms being followed by operators and vice versa, so it will know whether we meant the term or the infix operator. Here's the resultant Grinch code, along with the output it produces:

say 42⁢⁢;
# OUTPUT:
# 1764
# Ho-ho-ho!

That'll sure be fun to debug! Here's a list of characters in that line of code, for you to see where we've used our invisible goodies:

.say for '⁣say 42⁢⁢;'.uninames;
# OUTPUT:
# INVISIBLE SEPARATOR
# LATIN SMALL LETTER S
# LATIN SMALL LETTER A
# LATIN SMALL LETTER Y
# SPACE
# DIGIT FOUR
# DIGIT TWO
# INVISIBLE TIMES
# INVISIBLE TIMES
# SEMICOLON

Ho-Ho-Ho

Productivity at Christmas time drops to a standstill. People have the Holidays and the New Year on their minds. It wouldn't surprise me to see a whole bunch of TODO comments in all the code. But what if we were able to detect and complain about them? There's nothing more Grinch-like than aborting program compilation whenever someone is feeling lazy!

Perl 6 has Slangs. It's an experimental feature that currently does not have an officially supported interface; however, for our purpose, it'll do just fine.

Using Slangs, it's possible to lexically mutate Perl 6's grammar and introduce language features and behaviour, just like a Perl 6 core developer would:

BEGIN $*LANG.refine_slang: 'MAIN',
    role SomeExtraGrammar {
        token term:sym<meow> {
            'This is not a syntax error'
        }
    },
    role SomeExtraActions {
        method EXPR (Mu $/) {
            say "Parsed expression: " ~ $/;
            nextsame
        }
    }

This is not a syntax error;
say 'hehe'
# OUTPUT:
# Parsed expression: This is not a syntax error
# Parsed expression: 'hehe'
# Parsed expression: say 'hehe'
# hehe

The "experimental" part of the Slangs feature largely lies in having to rely on the structure of core Grammar and core Actions; currently there's no official guarantee those will remain unchanged, which makes Slangs fragile.

For our naughty, Grinchy trick, we'll be modifying the behaviour of comments. If we read the core code to trace what calls the comment token, we'll find it's actually part of the redefined ws token, which, as you may know from everyday Perl 6 grammars, is responsible for whitespace matching in, among other things, grammar rules.

This complicates the matter slightly, as ws is such a cornerstone token that, along with comp_unit, statementlist, and statement, it can't be modified from the mainline (code outside routines and blocks). The reason is that the Slang is loaded only after the mainline has already begun being parsed with the stock versions of these tokens. The tokens inside the statement token can be changed even from the mainline, because the statement token reblesses the grammar, but ws does not get such luxury.

Since we're starting to tread far into the deep end… enough talk! Let's code:

BEGIN $*LANG.refine_slang: 'MAIN', role {
    token comment:sym<todo> {
        '#' \s* 'TODO' ':'? \s+ <( \N*
        { die "Ho-ho-ho! I think you were"
            ~ " meant to finish " ~ $/ }
    }
}
sub business-stuff {
    # TODO: business stuff
}
# OUTPUT:
# ===SORRY!===
# Ho-ho-ho! I think you were meant to finish business stuff

We use the BEGIN phaser to make the Slang modification happen at compile time, since we're trying to affect how further compilation is performed.

We added a new candidate, comment:sym<todo>, to the comment proto token of the core Perl 6 grammar. It matches content similar to what a regular comment would match, except it also looks for the TODO our Christmassy friends decided to leave around. The \N* atom captures whatever string the user typed after the TODO, and the <( match capture marker tells the compiler to exclude the previously matched stuff in the token from the captured text inside the Match object stored in the $/ variable.

At the end of the token, we simply use a code block to die with a message that tells the user to finish up their TODO. Quite crafty!
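The effect of the <( marker is easier to see in a standalone regex: everything matched before it is still required for the match, but excluded from the captured text (the price-tag string here is made up for illustration):

```raku
# Without <(, the Match would contain 'price: 42'; with it, just the digits
my $m = 'price: 42' ~~ / 'price: ' <( \d+ /;
say ~$m; # OUTPUT: «42␤»
```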

Since we'd rather the user not notice our jolly tricks, let's stick the Slang into a module that's to be loaded by the target code. We'll just make a slight tweak to the original code:

# File: ./Jolly.pm6
sub EXPORT {
    $*LANG.refine_slang: 'MAIN', role {
        token comment:sym<todo> {
            '#' \s* 'TODO' ':'? \s+ <( \N*
            { die "Ho-ho-ho! I think you were"
                ~ " meant to finish " ~ $/ }
        }
    }
    Map.new
}

# File: ./script.p6
use lib <.>;
use Jolly;
sub business-stuff {
    # TODO: business stuff
}
# OUTPUT:
# ===SORRY!===
# Ho-ho-ho! I think you were meant to finish business stuff

We want the slang to run at the compilation time of the script, not the module, so we removed the BEGIN phaser and instead stuck the code inside sub EXPORT, which will run when the module is used during the script's compilation. The Map.new is just how I prefer to write {} in an EXPORT sub, to indicate we do not wish to export any symbols. In our script, we now merely have to use the module and the Slang gets activated. Awesome!

Conclusion

Today, we started off the 2017 Perl 6 Advent Calendar by being naughty Grinches and messing with users' programs. We mutated objects using but and does operators. Wrapped methods and subroutines with our custom routines that implemented extra features. Made invisible terms and operators. And even mutated the language itself to do our bidding.

Over the next 23 days, we'll see more Perl 6 Advent articles, so be sure to check back. And maybe, by the end of it all, our Grinchy hearts will grow three sizes…

-Ofun

Day 24 – Make It Snow

Hello again, fellow sixers! Today I’d like to take the opportunity to highlight a little module of mine that has grown up in some significant ways this year. It’s called Terminal::Print and I’m suspecting you might already have a hint of what it can do just from the name. I’ve learned a lot from writing this module and I hope to share a few of my takeaways.

Concurrency is hard

Earlier in the year I decided to finally try to tackle multi-threading in Terminal::Print and… succeeded more or less, but rather miserably. I wrapped the access to the underlying grid (a two-dimensional array of Cell objects) in a react block and had change-cell and print-cell emit their respective actions on a Supply. The react block then handled these actions. Rather slowly, unfortunately.

Yet, there was hope. After jnthn++ fixed a constructor bug in OO::Monitors I was able to remove all the crufty hand-crafted handling code and instead ensure that method calls to the Terminal::Print::Grid object would only run in a single thread at any given time. (This is the class which holds the two-dimensional array mentioned before and was likewise the site of my react block experiment).

Here below are the necessary changes:

- unit class Terminal::Print::Grid;
+ use OO::Monitors;
+ unit monitor Terminal::Print::Grid;

This shift not only improved the readability and robustness of the code, it was significantly faster! Win! To me this is really an amazing dynamic of Perl 6. jnthn’s brilliant, twisted mind can write a meta-programming module that makes it dead simple for me to add concurrency guarantees at a specific layer of my library. My library in turn makes it dead simple to print from multiple threads at once on the screen! It’s whipuptitude enhancers all the way down!

That said, our example today will not be writing from multiple threads. For some example code that utilizes async, I point you to examples/async.p6 and examples/matrix-ish.p6.

Widget Hero

Terminal::Print is really my first open source library in the sense that it is the first time that I have started my own module from scratch with the specific goal of filling a gap in a given language’s ecosystem. It is also the first time that I am not the sole contributor! I would be remiss not to mention japhb++ in this post, who has contributed a great deal in a relatively short amount of time.

In addition to all the performance related work and the introduction of a span-oriented printing mechanism, japhb’s work on widgets especially deserves its own post! For now let’s just say that it has been a real pleasure to see the codebase grow and extend even as I have been too distracted to do much good. My takeaway here is a new personal milestone in my participation in libre/open software (my first core contributor!) that reinforces all the positive dynamics it can have on a code base.

Oh, and I’ll just leave this here as a teaser of what the widget work has in store for us:

rpg-ui-p6

You can check it out in real-time and read the code at examples/rpg-ui.p6.

Snow on the Golf Course

Now you are probably wondering, where is the darn snow! Well, here we go! The full code with syntax highlighting is available in examples/snowfall.p6. I will be going through it step by step below.

use Terminal::Print;

class Coord {
    has Int $.x is rw where * <= T.columns = 0;
    has Int $.y is rw where * <= T.rows = 0;
}

Here we import Terminal::Print. The library takes the position that when you import it somewhere, you are planning to print to the screen. To this end we export an instantiated Terminal::Print object into the importer’s lexical scope as T. This allows me to immediately start clarifying the x and y boundaries of our coordinate system based on run-time values derived from the current terminal window.

class Snowflake {
    has $.flake = ('❆','❅','❄').roll;
    has $.pos = Coord.new;
}

sub create-flake {
    state @cols = ^T.columns .pick(*); # shuffled
    if +@cols > 0 {
        my $rand-x = @cols.pop;
        my $start-pos = Coord.new: x => $rand-x;
        return Snowflake.new: pos => $start-pos;
    } else {
        @cols = ^T.columns .pick(*);
        return create-flake;
    }
}

Here we create an extremely simple Snowflake class. What is nice here is that we can leverage the default value of the $.flake attribute, so that it is always random at construction time.

Then in create-flake we make sure that every x coordinate is eventually used as a starting point for the snowfall. Whenever create-flake gets called, we pop a random x coordinate out of the @cols state variable. The state variable enables this cool approach because we can manually refill @cols with a new randomized set of the available x coordinates once it is depleted.
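That deplete-and-refill idiom can be boiled down to a small sketch (the next-pick sub is made up for illustration):

```raku
sub next-pick {
    state @pool;                           # persists between calls
    @pool = <a b c>.pick(*) unless @pool;  # refill, reshuffled, once depleted
    @pool.pop
}
# Over six calls, each of a, b, c comes up exactly twice, in random order
say next-pick for ^6;
```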

draw( -> $promise {

start {
    my @flakes = create-flake() xx T.columns;
    my @falling;
    
    Promise.at(now + 33).then: { $promise.keep };
    loop {
        # how fast is the snowfall?
        sleep 0.1; 
    
        if (+@flakes) {
            # how heavy is the snowfall?
            my $limit = @flakes > 2 ?? 2            
                                    !! +@flakes;
            # can include 0, but then *cannot* exclude $limit!
            @falling.push: |(@flakes.pop xx (0..$limit).roll);  
        } else {
            @flakes = create-flake() xx T.columns;
        }
    
        for @falling.kv -> $idx, $flake {
            with $flake.pos.y -> $y {
                if $y > 0 {
                    T.print-cell: $flake.pos.x, ($flake.pos.y - 1), ' ';
                }

                if $y < T.rows {
                    T.print-cell: $flake.pos.x, $flake.pos.y, $flake.flake;            
                }

                try {
                    $flake.pos.y += 1;
                    CATCH {
                        # the flake has fallen all the way
                        # remove it and carry on!
                        @falling.splice($idx,1,());
                        .resume;
                    }
                }
            }
        }
    }
}

});

Let’s unpack that a bit, shall we?

So the first thing to explain is draw. This is a handy helper routine that is also imported into the current lexical scope. It takes as its single argument a block which accepts a Promise. The block should include a start block so that keeping the argument promise works as expected. The implementation of draw is shockingly simple.

So draw is really just short-hand for making sure the screen is set up and torn down properly. It leverages promises as (I’m told) a “cond-var”, which according to the Promises spec might be an abuse of promises. I’m not very fussed about it, to be honest, since it suits my needs quite well.

This approach also makes it quite easy to create a “time limit” for the snowfall by scheduling a promise to be kept at now + 33, thirty-three seconds from when the loop starts. Then we keep the promise and draw shuts down the screen for us. This makes “escape” logic for your screensavers quite easy to implement (note that SIGINT also restores your screen properly; the most basic exit strategy works as expected, too :) ).

The rest is pretty straightforward, though I’d point to the try block as a slightly clever (but not dangerously so) combination of where constraints on Coord‘s attributes and Perl 6’s resumable exceptions.
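The resumable-exception part can be demonstrated in isolation: after a CATCH handler calls .resume, execution continues right after the statement that threw (the flake wording here is just for flavour):

```raku
CATCH { default { say "caught: {.message}"; .resume } }
for 1..3 {
    die "flake 2 fell off the screen" if $_ == 2;
    say "flake $_ is falling";
}
# OUTPUT:
# flake 1 is falling
# caught: flake 2 fell off the screen
# flake 2 is falling
# flake 3 is falling
```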

Make it snow!

And so, sixers, I bid you farewell for today with a little unconditional love from ab5tract’s curious little corner of the universe. Cheers!

snowfall-p6

Day 24 – One Year On

This time of year invites one to look back on things that have been, things that are and things that will be.

Have Been

I was reminded of things that have been when I got my new notebook a few weeks ago. Looking for a good first sticker to put on it, I came across an old ActiveState sticker:

If you don’t know Perl
you don’t know Dick

A sticker from 2000! It struck me that that sticker was as old as Perl 6. Only very few people now remember that a guy called Dick Hardt was actually the CEO of ActiveState at the time. So even though the pun may be lost on most due to the mists of time, the premise still rings true to me: that Perl is more about a state of mind than about versions. There will always be another version of Perl. Those who don’t know Perl are doomed to re-implement it, poorly. Which, to me, is why so many ideas were borrowed from Perl. And still are!

Are

Where are we now? Is it the moment we know, We know, we know? I don’t think we are at twenty thousand people using Perl 6 just yet. But we’re keeping our fingers crossed. Just in case.

We are now 12 compiler releases after the initial Christmas release of Perl 6. In this year, many, many areas of Rakudo Perl 6 and MoarVM have dramatically improved in speed and stability. Our canary-in-the-coalmine test has dropped from around 14 seconds a year ago to around 5.5 seconds today. A complete spectest run is now about 3 minutes, whereas it was about 4.5 minutes at the beginning of the year, while about 4000 tests were added (from about 50K to 54K). And we now have 757 modules in the Perl 6 ecosystem (aka temporary CPAN for Perl 6 modules), with a few more added every week.

The #perl6 IRC channel has been too busy for me to follow consistently. But if you have a question related to Perl 6 and you want a quick answer, the #perl6 channel is the place to be. You don’t even have to install an IRC client: you can also use a browser to chat, or just follow “live” what is being said.

There are also quite a few useful bots on that channel: they can, for example, run a piece of Perl 6 code for you, or find out at which commit the behaviour of a specific piece of code changed. These are very helpful for the developers of Perl 6, who usually also hang out on the #perl6-dev IRC channel. Which could be you! In the past year, at least one contributor was added to the CREDITS every month!

Will Be

The coming year will see at least three Perl 6 books being published. First one will be Think Perl 6 – How To Think Like A Computer Scientist by Laurent Rosenfeld. It is an introduction to programming using Perl 6. But even for those of you with programming experience, it will be a good book to start learning Perl 6. And I should know, because I’ve already read it :-)

Second one will be Learning Perl 6 by veteran Perl developer and writer brian d foy. It will have the advantage of being written by a seasoned writer going through the newbie experience that most people will have when coming from Perl 5.

The third one will be Perl 6 By Example by Moritz Lenz, which will, as the title already gives away, introduce Perl 6 topics by example.

There’ll be at least two (larger) Perl Conferences apart from many smaller Perl workshops: The Perl Conference NA on 18-23 June, and The Perl Conference in Amsterdam on 9-11 August. Where you will meet all sorts of nice people!

And for the rest? Expect a faster, leaner, Perl 6 and MoarVM compiler release on the 3rd Saturday every month. And an update of weekly events in the Perl 6 Weekly on every Monday evening/Tuesday morning (depending on where you live).

Day 23 – Everything is either wrong or less than awesome

Have you ever spent your precious time on submitting a bug report for some project, only to get a response that you’re an idiot and you should f⊄∞÷ off?

Right! Well, perhaps consider spending your time on Perl 6 to see that not every free/open-source project is like this.

In the Perl 6 community, there is a very interesting attitude towards bug reports. Is it something that was defined explicitly early on? Or did it just grow organically? That remains a Christmas mystery. But the thing is, if it weren’t for that, I wouldn’t have been willing to submit all the bugs that I submitted over the last year (more than 100). You made me like this.

Every time someone submits a bug report, Perl 6 hackers always try to see if there is something that can be done better. Yes, sometimes the bug report is invalid. But even if it is, is there any way to improve the situation? Perhaps a warning could be thrown? Well, if so, then we treat the behavior as LTA (Less Than Awesome), and therefore the bug report is actually valid! We just have to tweak it a little bit, meaning that the ticket will now be repurposed to improve or add the error message, not change the behavior of Perl 6.

The concept of LTA behavior is probably one of the key things that keeps us from rejecting features that may seem to do too little good for the amount of effort required to implement them, but in the end become game changers. Another closely related concept that comes to mind is “Torment the implementors on behalf of the users”.

OK, but what if this behavior is well-defined and is actually valid? In this case, it is still probably our fault. Why did the user get into this situation? Maybe the documentation is not good enough? Very often that is the issue, and we acknowledge that. So in a case of a problem with the documentation, we will usually ask you to submit a bug report for the documentation, but very often we will do it ourselves.

Alright, but what if the documentation for this particular case is in place? Well, maybe the thing is not easily searchable? That could be the reason why the user didn’t find it in the first place. Or maybe we lack some links? Maybe the places that should link to this important bit of information are not doing so? In other words, perhaps there are still ways to improve the docs!

But if not, then yes, we will have to write some tests for this particular case (if there are no tests yet) and reject the ticket. This happens sometimes.

The last bit, even if obvious to some, is still worth mentioning. We do not mark tickets resolved without tests. One reason is that we want roast (which is the Perl 6 specification test suite) to be as full as possible. The other reason is that we don’t want regressions to happen (thanks, captain obvious!). As the first version of Perl 6 was released one year ago, we are no longer making any changes that would affect the behavior of your code. However, occasional regressions do happen, but we have found an easy way to deal with those!

If you are not on the #perl6 channel very often, you might not know that we have a couple of interesting bots. One of them is Bisectable. In short, Bisectable performs a more user-friendly version of git bisect, but instead of building Rakudo on each commit, it has done so before you even asked: it has over 5500 Rakudo builds, one for every commit made in the last year and a half. This turns the time to run git bisect from minutes to about 10 seconds (yes, 10 seconds is Less Than Awesome! We are working on speeding it up!). And there are other bots that help us inspect the progress. The most recent one is Statisfiable; here is one of the graphs it can produce.

So if you pop up on #perl6 with a problem that seems to be a regression, we will be able to find the cause in seconds. Fixing the issue will usually take a bit more than that though, but when the problem is significant, it will usually happen in a day or two. Sorry for breaking your code in attempts to make it faster, we will do better next time!

But as you are reading this, perhaps you may be interested in seeing some bug reports? I thought I’d go through the list of bugs of the last year to show how horribly broken things were, just to motivate the reader to go hunting for bugs. The bad news (oops, good news, I mean) is that the number of “horrible” bugs is decreasing a bit too fast. Thanks to many Rakudo hackers, things are getting more stable at a very rapid pace.

Anyway, there are still some interesting things I was able to dig up:

  • RT #128804 – this is one of the examples where we attempt to print something better than “syntax error”, but have a problem in the error message itself. This was fixed, and now the error message says Cannot convert string to number: malformed base-35 number in 'li⏏zmat' (indicated by ⏏). Can you spot why this error message is Less Than Awesome?

  • RT #128421 – sometimes we are just wrong for no good reason. Makes you wonder how many other bugs like this are hiding somewhere. Can you find one?

That being said, my favorite bug of all times is RT #127473. Three symbols in the source code causing it to go into an infinite loop printing stuff about QAST nodes. That’s a rather unique issue, don’t you think?

I hope this post gave you a little insight on how we approach bugs, especially if you are not hanging around on #perl6 very often. Is our approach less than awesome? Do you have some ideas for other bots that could help us work with bugs? Leave it in the comments, we would like to know!