Day 21, Grammars and Actions

by

In today’s post, we’re going to cover several topics, but primarily grammars. I’d like to use an example inspired by some Perl 5 code I wrote for work recently.

So we have a bunch of text that we want to process and deal with. Perl’s supposed to be great at that, right? To be precise, let’s talk about the following text, describing some questions and their answers:

pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice

To parse this in Perl 6, I'll start by defining a Grammar. A Grammar is a special type of namespace for holding regular expressions. We'll also define several named expressions to split up our parsing a bit.

grammar Question::Grammar {
    token TOP {
        \n*
        <question>+
    }
    token question {
        <header>
        <answer>+
    }
    token header {
        ^^ $<type>=['pickone'|'pickmany'] ':' \s+ $<text>=[\N*] \n
    }
    token answer {
        ^^ \s+ $<correct>=['ac'|'ai'] ':' \s+ $<text>=[\N*] \n
    }
}

First, a brief overview of what's going on here, from a standpoint that assumes you're familiar with regular expressions at least a little. By default in Perl 6 grammars, whitespace is ignored and matches occur over the entire string. It's like the /x and /s Perl 5 modifiers are turned on. TOP is the regex that's called if we try to match against the entire grammar as a whole.

'token' is one of three identifiers used to declare a regex, including 'regex', 'token', and 'rule'. 'regex' is the plain, unmodified version, and the second two just enable some additional options. 'token' disables backtracking, and 'rule' both disables backtracking and causes whitespace in the regex to match literal whitespace in the matched text. We won't use 'rule's here.

The <foo> syntax is what's used to call another named regex. '^^' is used to match the beginning of a line, as opposed to lone '^', which matches the beginning of the entire matched text. The square brackets, [], are non-capturing grouping, like (?: ) in Perl 5 regular expressions.

The = syntax is used to assign the RHS into the name specified on the LHS. You'll see what I mean later when we use the result of this regex.

Let's see what we get if we try matching against that grammar and printing the result we get back:

my $text = Q {
pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice
};
my $match = Question::Grammar.parse($text);
say $match.perl;

Try running that yourself if you like. It produces 232 lines of output, which is a bit too much to include here. Let's pull out just one part, the questions.

# Print the question
for $match<question>.flat -> $q {
    say $q<header><text>;
}
We need to use .flat, because $match<question> is an array held in a scalar container.

As a reminder, postfix <> is the auto-quoting named lookup syntax. That's equivalent to the following, but a little easier to type, and also easier to read:

# Print the question
for $match{'question'}.flat -> $q {
    say $q{'header'}{'text'};
}

So we can see that a match object contains named items as hash values, and repetitions are stored as an array. If we had any positional captures, made with parens just like Perl 5: (), they would have been accessed through the positional interface, with postfix [], like an array.

The next step is to make some classes and then populate them from the match object. First some class definitions:

class Question::Answer {
    has $.text is rw;
    has Bool $.correct is rw;
}
class Question {
    has $.text is rw;
    has $.type is rw;
    has Question::Answer @.answers is rw;
}

Building Question objects out of the match object isn't that bad, but it's still not pretty:

my @questions = $match<question>.map: {
    Question.new(
        text    => ~$_<header><text>,
        type    => ~$_<header><type>,
        answers => $_<answer>.map: {
            Question::Answer.new(
                text => ~$_<text>,
                correct => ~$_<correct> eq 'ac',
            )
        },
    );
};

Remembering that any repetition in the regex is reflected as an array in the match object, we map over the <question> attribute, building a Question object for each. Each <question> match has an array of <answer> matches, so we map over those too, building a list of Question::Answer objects for each one. We are stringifying the values to have strings in our array, rather than a bunch of Match objects.

As you can guess, this approach doesn't scale up very well. A much nicer way to do it is to build the objects as we go. The method used to do this is to pass an object as the :action argument to the .parse() method on the Grammar. The parsing engine will then call methods on that object with the same name as the regexes being parsed, with the evaluated match object for the rule passed as an argument. If the method calls 'make()' during execution, the argument to 'make()' is set as the '.ast' (for "Abstract Syntax Tree") attribute of the match object.

Okay, that's a fairly abstract description. Let's see some real code. We need to make a class with methods named the same as those three regexes:

class Question::Actions {
    method TOP($/) {
        make $<question>».ast;
    }
    method question($/) {
        make Question.new(
            text => ~$<header><text>,
            type => ~$<header><type>,
            answers => $<answer>».ast,
        );
    }
    method answer($/) {
        make Question::Answer.new(
            correct => ~$<correct> eq 'ac',
            text => ~$<text>,
        );
    }
}

$/ is the traditional name for match objects, and it's as special as $_, in that there's special syntax that accesses its attributes. Named and Positional access without a variable ($<foo> and $[1]) are translated into access to $/ ($/<foo> and $/[1]). It's only a one-character difference, but it saves some visual noise, and helps it fill a semantic space similar to $1, $2, $3, etc. in Perl 5.

In the 'TOP' method, we just use a hyperoperator method call to make a list of the .ast attributes of each item in $<question>. Again, whenever we call 'make' in an action method, we're setting something as the '.ast' attribute of the match object that gets returned, so this is just fetching whatever we 'make' in the 'question' method.

In the 'question' method, we construct a new Question object, populating its attributes from the match object, and specifically set its 'answers' attribute as the list of objects we produce in each call to the 'answer' regex from the current parse of 'question'.

In the 'answer' method, we do the same thing, setting the 'correct' attribute to the result of a comparison, so that it satisfies the 'Bool' type constraint on the attribute.

So, again, to use this in a parse, we instantiate this new class and pass the object as the :action parameter to the '.parse' method of the grammar, and then we fetch the constructed object from the '.ast' attribute of the match object it returns:

my $actions = Question::Actions.new();
my @questions = Question::Grammar.parse($text, :actions($actions)).ast.flat;
We need .flat for the same reason as before.

Now we can inspect the created objects to see that everything went according to plan:

for @questions -> $q {
    say $q.text;
    for $q.answers.kv -> $i, $a {
        say "    $i) {$a.text}";
    }
}

To finish this post off, let's add a method to Question to ask the question, fetch an answer, and grade the question.

Let's start by printing out a representation of the question, its answers, and a prompt:

    method ask {
        my %hints = (
            pickmany => "Choose all that are true",
            pickone => "Choose the one item that is correct",
        );
        say "\n{%hints{$.type}}\n";
        say $.text;
        for @.answers.kv -> $i, $a {
            say "$i) {$a.text}";
        }
        print "> ";

Next, let's fetch a line from STDIN and pull out the digits.

        my $line = $*IN.get();
        my @answers = $line.comb(/<digit>+/)>>.Int.sort;

'comb' is kind of the opposite of 'split', in that we specify what we want to keep instead of what we want to discard. The advantage here is that we don't have to choose a delimiter. The user can enter "1 2 3", "1,2,3", or even "1, 2, and 3". We then use a hyperoperator method call to generate an array of Integers from the array of Matches, and then sort it.

Next, let's generate a corresponding array of all of the correct answer indexes, and then compare them to determine correctness of the response. This isn't the only way to do it, merely the first that occurred to me. :)

        my @correct = @.answers.kv.map({ $^value.correct ?? $^key !! () });
        if @correct ~~ @answers {
            say "Yay, you got it right!";
            return 1;
        }
        else {
            say "Oops... you got it wrong. :(";
            return 0;
        }
    }

Let's call it on each question and collect the results by mapping over our new method:

my @results = @questions.map(*.ask);
say "\nFinal score: " ~ [+] @results;

You'll get results like this:

[sweeks@kupo ~]$ perl6 /tmp/questions.pl 

Choose all that are true, separated by spaces

Which items are food?
0) Rice
1) Orange
2) Mushroom
3) Shoes
> 0 1 2
Yay, you got it right!

Choose the one item that is correct

Which item is a color?
0) Orange
1) Shoes
2) Mushroom
3) Rice
> 1
Oops... you got it wrong. :(

Final score: 1

With everything put together, here's the full program we've written:


class Question::Answer {
    has $.text is rw;
    has Bool $.correct is rw;
}
class Question {
    has $.text is rw;
    has $.type is rw;
    has Question::Answer @.answers is rw;
    method ask {
        my %hints = (
            pickmany => "Choose all that are true",
            pickone => "Choose the one item that is correct",
        );
        say "\n{%hints{$.type}}\n";
        say $.text;
        for @.answers.kv -> $i, $a {
            say "$i) {$a.text}";
        }
        print "> ";
        my $line = $*IN.get();
        my @answers = $line.comb(/<digit>+/)>>.Int.sort;
        my @correct = @.answers.kv.map({ $^value.correct ?? $^key !! () });
        if @correct ~~ @answers {
            say "Yay, you got it right!";
            return 1;
        } else {
            say "Oops... you got it wrong. :(";
            return 0;
        }
    }
}

grammar Question::Grammar {
    token TOP {
        \n*
        <question>+
    }
    token question {
        <header>
        <answer>+
    }
    token header {
        ^^ $<type>=['pickone'|'pickmany'] ':' \s+ $<text>=[\N*] \n
    }
    token answer {
        ^^ \s+ $<correct>=['ac'|'ai'] ':' \s+ $<text>=[\N*] \n
    }
}

class Question::Actions {
    method TOP($/) {
        make $<question>».ast;
    }
    method question($/) {
        make Question.new(
            text => ~$<header><text>,
            type => ~$<header><type>,
            answers => $<answer>».ast,
        );
    }
    method answer($/) {
        make Question::Answer.new(
            correct => ~$<correct> eq 'ac',
            text => ~$<text>,
        );
    }
}

my $text = Q {
pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice
};

my $actions = Question::Actions.new();
my @questions = Question::Grammar.parse($text, :actions($actions)).ast.flat;
my @results = @questions.map(*.ask);

say "\nFinal score: " ~ [+] @results;

About these ads

7 Responses to “Day 21, Grammars and Actions”

  1. illviljan Says:

    Really interesting stuff and I understand that it is difficult to really subdivide it into smaller fragments. Still, this is the first Perl6 Advent Calendar gift I felt that I have to digest under the course of several days.

    There is quite a lot of stuff to grasp here (but no wonder, the Regexes part is one of the pieces that got a major overhaul in Perl6 compared to perl5).

    Keep up the good work!

  2. mberends Says:

    Tene++ for a producing a very nice example and post. The explanation is very clear and beautifully worded too, imho. Well done!

  3. Amber Spencer Says:

    Thanks for the nice explanation, helped me a lot:D

  4. Daniel Brockman Says:

    I think this is my favorite gift so far. It’s a bit long, but the writing is really good so the length isn’t really a problem. The examples are beautiful and manage to tie together a lot of different parts of the language quite nicely. Bravo!

  5. Reader Says:

    :action has been renamed to :actions in the Grammar.parse method

  6. Reader Says:

    Using rakudo-star-2011.04:

    :action has been renamed to :actions in the Grammar.parse method

  7. Chase Says:

    Hello, Neat post. There is an issue together with your website in
    web explorer, may test this? IE nonetheless is the market leader and a big component of other people will omit your magnificent writing due to this problem.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

Join 37 other followers

%d bloggers like this: