Day 8 — Make your Perl 6 grammar compact

Welcome to Day 8 of this year’s Perl 6 Advent Calendar!

Grammars are among the many things that make Perl 6 a great programming language. I would not even try to predict the result of a poll to choose between grammars, Unicode support, concurrency features, hyper-operators, the set syntax, or the Whatever star. Google found its own list of the best Perl 6 features published on the Internet.

Anyway, today we’ll be talking about Perl 6 grammars, and I will share a few tricks that I use to make the grammars more compact.

1. Split the actions

Suppose you are writing a grammar to parse Perl’s variable declaration. You would expect it to match the following statements:

my $s; my @a;

Both of them declare a variable, so we can make a generic rule to parse either case. Below, the complete program is shown:

grammar G {
    rule TOP {
        <variable-declaration>* %% ';'
    }

    rule variable-declaration {
        | <scalar-declaration>
        | <array-declaration>
    }

    rule scalar-declaration {
        'my' '$' <variable-name>
    }

    rule array-declaration {
        'my' '@' <variable-name>
    }

    token variable-name {
        \w+
    }
}

class A {
    has %!var;

    method TOP($/) {
        dd %!var;
    }

    method variable-declaration($/) {
        if $<scalar-declaration> {
            %!var{$<scalar-declaration><variable-name>} = 0;
        }
        elsif $<array-declaration> {
            %!var{$<array-declaration><variable-name>} = [];
        }
    }
}

G.parse('my $s; my @a;', :actions(A.new));

Let me not explain every bit of this program; if you are interested, there’s an 80-minute video from one of the recent Amsterdam.pm meetings.

The object of interest now is the rule variable-declaration and its corresponding action.

The rule contains two alternatives: either a scalar or an array is declared. The action also selects between the alternatives, and it does that with an if/elsif block. Perl 6 allows you to omit the parentheses around the Boolean condition, but the whole construction is still quite big. Consider, for example, that if you add hash declarations, you will need to add another elsif branch.

It would be much clearer to have separate actions for each subbranch:

method scalar-declaration($/) {
    %!var{$<variable-name>} = 0;
}

method array-declaration($/) {
    %!var{$<variable-name>} = [];
}

Now, the body of each method contains a single line of code, and you can immediately see what it is doing. Not to mention that the code has become less error-prone.

Before we move on to the next trick, here’s another optimisation that you may want to implement: the my keyword is present in both declarations, so use non-capturing brackets and move the common string out of them:

rule variable-declaration {
    'my' [
        | <scalar-declaration>
        | <array-declaration>
    ]
}

rule scalar-declaration {
    '$' <variable-name>
}

rule array-declaration {
    '@' <variable-name>
}
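
To see the pieces of this section working together, here is a minimal sketch that combines the split actions with the factored-out my keyword. Two parts are assumptions added for illustration and are not in the original program: the hash-declaration rule, which only shows that a new kind of declaration costs one rule and one one-line action, and the use of make in TOP, which returns the collected variables instead of dumping them with dd.

```perl6
grammar G {
    rule TOP {
        <variable-declaration>* %% ';'
    }

    rule variable-declaration {
        'my' [
            | <scalar-declaration>
            | <array-declaration>
            | <hash-declaration>    # hypothetical third branch
        ]
    }

    rule scalar-declaration { '$' <variable-name> }
    rule array-declaration  { '@' <variable-name> }
    rule hash-declaration   { '%' <variable-name> }

    token variable-name { \w+ }
}

class A {
    has %!var;

    # Assumption: hand the variables back to the caller via make/made.
    method TOP($/) { make %!var }

    # One short action per kind of declaration.
    method scalar-declaration($/) { %!var{$<variable-name>} = 0 }
    method array-declaration($/)  { %!var{$<variable-name>} = [] }
    method hash-declaration($/)   { %!var{$<variable-name>} = {} }
}

my %vars = G.parse('my $s; my @a; my %h;', :actions(A.new)).made;
say %vars.keys.sort.join(', ');
```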

2. Use multi-methods

Let us improve the grammar to allow assignments in the target language:

my $s; my @a; $s = 3; $a[1] = 4;

Notice that the assignment is done in the Perl 5 style, with a dollar sigil for array elements. Given that, the assignment can be parsed with a single rule beginning with a dollar:

grammar G {
    rule TOP {
        [
            | <variable-declaration>
            | <assignment>
        ]
        * %% ';'
    }

    # . . .

    rule assignment {
        '$' <variable-name> <index>? '=' <value>
    }

    rule index {
        '[' <value> ']'
    }

    token value {
        \d+
    }
}

So, the assignment action must deduce what kind of assignment it is handling at the moment.

Again, you can use our old friend, the if-else block, in the action. Depending on the presence of index, you decide whether this is a simple scalar or an element of an array:

method assignment($/) {
    if $<index> {
        %!var{$<variable-name>}[$<index><value>] = +$<value>;
    }
    else {
        %!var{$<variable-name>} = +$<value>;
    }
}

This code can also be easily simplified, but this time using multi-methods:

multi method assignment($/ where !$<index>) {
    %!var{$<variable-name>} = +$<value>;
}

multi method assignment($/ where $<index>) {
    %!var{$<variable-name>}[$<index><value>] = +$<value>;
}

The where clause lets Perl 6 make the decision of which method candidate is more suitable in the given situation.

Notice also how the <value> key is used twice in the second multi-method. Each occurrence of <value> refers to a different part of the target code: one is the index value, the other is the right-hand side value.
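
The same dispatch mechanism can be tried outside a grammar. The sketch below is only a stand-in: a plain hash plays the role of the Match object, the sub name kind-of-assignment is hypothetical, and the where clauses use the block form with $_ instead of referring to the parameter by name. It shows nothing more than how two complementary where clauses select one candidate.

```perl6
# A plain hash stands in for the Match object (an assumption for illustration).
multi sub kind-of-assignment(%m where { !$_<index>.defined }) {
    'scalar ' ~ %m<variable-name>
}

multi sub kind-of-assignment(%m where { $_<index>.defined }) {
    'element ' ~ %m<index> ~ ' of array ' ~ %m<variable-name>
}

say kind-of-assignment({ variable-name => 's' });
say kind-of-assignment({ variable-name => 'a', index => 1 });
```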

3. Let Perl do the job

Sometimes, Perl can do the job for us, especially if you want to implement something that Perl is familiar with. For example, let us allow different types of numbers in the assignment:

my $a; my $b; $a = 3; $b = -3.14;

It is relatively easy to introduce floating-point numbers to the grammar:

token value {
    | '-'? \d+
    | '-'? \d+ '.' \d+
}

If you would like to add other types of numbers, refer to my article at perl.com. For now, we can limit the grammar to the above two options, as this is enough to demonstrate the trick.

If you run the code with the change, you might be surprised that you get the desired result. Both variables receive their values:

Hash %!var = {:a(3), :b(-3.14)}

In both cases, the same action was triggered:

multi method assignment($/ where !$<index>) {
    %!var{$<variable-name>} = +$<value>;
}

On the right-hand side of the assignment we see +$<value>, which is a type cast from the Match object to a number. The grammar puts either 3 or -3.14 inside $<value>, both as strings. The + unary operator makes an attempt to convert the strings to numbers. Both strings are valid numbers, so Perl 6 won’t complain.

It would be much more difficult to write your own code to convert a string to a number, taking into account all the different forms of numeric values. To get an idea of what other formats Perl 6 is aware of, look at the definition of the numish token in the Perl 6 grammar:

token numish {
    [
    | 'NaN' >>
    | <integer>
    | <dec_number>
    | <rad_number>
    | <rat_number>
    | <complex_number>
    | 'Inf' >>
    | $<uinf>='∞'
    | <unum=:No+:Nl>
    ]
}

If you allow any of the above types in your own grammar, Perl will be able to convert them for you.
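
As a quick check of that claim, the following sketch applies the unary + to a few strings without any grammar involved. The sample strings are my own choice and cover only some of the forms listed above.

```perl6
# Strings in several numeric notations; the unary + converts each of them.
my @texts = '3', '-3.14', '1e3', '0x1F';

for @texts -> $text {
    my $number = +$text;
    # .^name reports the type Perl 6 picked for the converted value.
    say "$text is converted to {$number.^name}";
}
```

The integer and hexadecimal strings become Int values, the decimal fraction becomes a Rat, and the string with an exponent becomes a Num.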

4. Use multi-rules and multi-tokens

It is not only methods that can be multi-things. Rules and tokens of a grammar are also methods, and you can also create multiple variants of them.

Let us update our grammar to allow arithmetic expressions on the right side of assignments:

my $a; $a = 6 + 5 * (4 - 3);

The new problem here is to parse an expression and take care of the operator precedence and parentheses. You can describe any expression in the following way:

  1. An expression is a sequence of terms separated by + or -.
  2. Any term in the previous rule is a sequence of items separated by * or /.
  3. Anything within parentheses is another expression, so go to rule 1.

With that in mind, you end up with the following grammar changes:

grammar G {
    # . . .

    rule assignment {
        '$' <variable-name> <index>? '=' <expression>
    }

    multi token op(1) {
        '+' | '-'
    }

    multi token op(2) {
        '*' | '/'
    }

    rule expression {
        <expr(1)>
    }

    multi rule expr($n) {
        <expr($n + 1)>+ %% <op($n)>
    }

    multi rule expr(3) {
        | <value>
        | '(' <expression> ')'
    }

    # . . .
}

Here, both rules and tokens are multi-methods that take a single integer value reflecting the depth of the expression. The same happens to operators: on the first level, you expect + and -; on the second level, * and /.

Do not forget that multi-methods (as well as multi-subs) in Perl 6 can be dispatched based on constants. That’s why it is possible, for example, to have signatures such as the one you see in multi token op(2).

The expr($n) rule is defined recursively via expr($n + 1). The recursion stops when $n reaches 3, and Perl 6 chooses the last candidate multi rule expr(3).
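
The way the recursion ends on a constant can be reproduced with ordinary multi-subs. In this sketch, describe-level is a hypothetical helper that mirrors the shape of expr($n) and expr(3); it does no parsing and only traces which candidate is chosen at each depth.

```perl6
# Generic candidate: describe this level, then descend one level deeper.
multi sub describe-level($n) {
    "level $n, then " ~ describe-level($n + 1)
}

# Constant candidate: chosen when the argument is exactly 3, which ends the recursion.
multi sub describe-level(3) {
    'a value or a parenthesised expression'
}

say describe-level(1);
# level 1, then level 2, then a value or a parenthesised expression
```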

Let me be lazy and use the previous advice to let Perl evaluate the expression:

multi method assignment($/ where !$<index>) {
    use MONKEY-SEE-NO-EVAL;
    %!var{$<variable-name>} = EVAL($<expression>);
}

In general, I would suggest using EVAL only during the magical Christmas time. For the rest of the year, please compute the expression yourself, using the abstract syntax tree and the pair of methods make and made to keep partial results. See an example here, for instance.
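
To give a feeling for make and made without EVAL, here is a deliberately reduced sketch. It is not the expression grammar from this post: the Sum grammar and the SumActions class are hypothetical and handle only additions of integers, which is enough to show how each node stores a partial result and how the parent collects it.

```perl6
grammar Sum {
    # Values separated by '+', with optional whitespace around the sign.
    token TOP   { <value>+ % [ \s* '+' \s* ] }
    token value { \d+ }
}

class SumActions {
    # Each value attaches its own number to its Match object ...
    method value($/) { make +$/ }

    # ... and TOP combines the partial results stored by the values.
    method TOP($/)   { make $<value>.map(*.made).sum }
}

my $result = Sum.parse('6 + 5 + 4', :actions(SumActions.new)).made;
say $result;   # 15
```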

I would also suggest some extra reading to better understand how to use the multi and proto keywords:

  1. The proto keyword in Perl 6
  2. More on the proto keyword in Perl 6

And at this point, I will leave you to continue the journey through the awesome Perl 6 grammars on your own. You can find the complete examples of today’s post on GitHub. I wish you pleasant reading of the rest of this and other Perl Advent Calendars!
