Author Archive

Day 21, Grammars and Actions

December 21, 2009

In today’s post, we’re going to cover several topics, but primarily grammars. I’d like to use an example inspired by some Perl 5 code I wrote for work recently.

So we have a bunch of text that we want to process and deal with. Perl’s supposed to be great at that, right? To be precise, let’s talk about the following text, describing some questions and their answers:

pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice

To parse this in Perl 6, I’ll start by defining a Grammar. A Grammar is a special type of namespace for holding regular expressions. We’ll also define several named expressions to split up our parsing a bit.

grammar Question::Grammar {
    token TOP {
        \n*
        <question>+
    }
    token question {
        <header>
        <answer>+
    }
    token header {
        ^^ $<type>=['pickone'|'pickmany'] ':' \s+ $<text>=[\N*] \n
    }
    token answer {
        ^^ \s+ $<correct>=['ac'|'ai'] ':' \s+ $<text>=[\N*] \n
    }
}

First, a brief overview of what’s going on here, from a standpoint that assumes you’re familiar with regular expressions at least a little. By default in Perl 6 grammars, whitespace is ignored and matches occur over the entire string. It’s like the /x and /s Perl 5 modifiers are turned on. TOP is the regex that’s called if we try to match against the entire grammar as a whole.

‘token’ is one of three identifiers used to declare a regex, including ‘regex’, ‘token’, and ‘rule’. ‘regex’ is the plain, unmodified version, and the second two just enable some additional options. ‘token’ disables backtracking, and ‘rule’ both disables backtracking and causes whitespace in the regex to match literal whitespace in the matched text. We won’t use ‘rule’s here.

The <foo> syntax is what’s used to call another named regex. ‘^^’ is used to match the beginning of a line, as opposed to lone ‘^’, which matches the beginning of the entire matched text. The square brackets, [], are non-capturing grouping, like (?: ) in Perl 5 regular expressions.

The = syntax is used to assign the RHS into the name specified on the LHS. You’ll see what I mean later when we use the result of this regex.

Let’s see what we get if we try matching against that grammar and printing the result we get back:

my $text = Q {
pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice
};
my $match = Question::Grammar.parse($text);
say $match.perl;

Try running that yourself if you like. It produces 232 lines of output, which is a bit too much to include here. Let’s pull out just one part, the questions.

# Print the question
for $match<question>.flat -> $q {
    say $q<header><text>;
}
We need to use .flat, because $match<question> is an array held in a scalar container.

As a reminder, postfix <> is the auto-quoting named lookup syntax. That’s equivalent to the following, but a little easier to type, and also easier to read:

# Print the question
for $match{'question'}.flat -> $q {
    say $q{'header'}{'text'};
}

So we can see that a match object contains named items as hash values, and repetitions are stored as an array. If we had any positional captures, made with parens just like Perl 5: (), they would have been accessed through the positional interface, with postfix [], like an array.

The next step is to make some classes and then populate them from the match object. First some class definitions:

class Question::Answer {
    has $.text is rw;
    has Bool $.correct is rw;
}
class Question {
    has $.text is rw;
    has $.type is rw;
    has Question::Answer @.answers is rw;
}

Building Question objects out of the match object isn’t that bad, but it’s still not pretty:

my @questions = $match<question>.map: {
    Question.new(
        text    => ~$_<header><text>,
        type    => ~$_<header><type>,
        answers => $_<answer>.map: {
            Question::Answer.new(
                text => ~$_<text>,
                correct => ~$_<correct> eq 'ac',
            )
        },
    );
};

Remembering that any repetition in the regex is reflected as an array in the match object, we map over the <question> attribute, building a Question object for each. Each <question> match has an array of <answer> matches, so we map over those too, building a list of Question::Answer objects for each one. We are stringifying the values to have strings in our array, rather than a bunch of Match objects.

As you can guess, this approach doesn’t scale up very well. A much nicer way to do it is to build the objects as we go. The method used to do this is to pass an object as the :action argument to the .parse() method on the Grammar. The parsing engine will then call methods on that object with the same name as the regexes being parsed, with the evaluated match object for the rule passed as an argument. If the method calls ‘make()’ during execution, the argument to ‘make()’ is set as the ‘.ast’ (for “Abstract Syntax Tree”) attribute of the match object.

Okay, that’s a fairly abstract description. Let’s see some real code. We need to make a class with methods named the same as those three regexes:

class Question::Actions {
    method TOP($/) {
        make $<question>».ast;
    }
    method question($/) {
        make Question.new(
            text => ~$<header><text>,
            type => ~$<header><type>,
            answers => $<answer>».ast,
        );
    }
    method answer($/) {
        make Question::Answer.new(
            correct => ~$<correct> eq 'ac',
            text => ~$<text>,
        );
    }
}

$/ is the traditional name for match objects, and it’s as special as $_, in that there’s special syntax that accesses its attributes. Named and Positional access without a variable ($<foo> and $[1]) are translated into access to $/ ($/<foo> and $/[1]). It’s only a one-character difference, but it saves some visual noise, and helps it fill a semantic space similar to $1, $2, $3, etc. in Perl 5.

In the ‘TOP’ method, we just use a hyperoperator method call to make a list of the .ast attributes of each item in $<question>. Again, whenever we call ‘make’ in an action method, we’re setting something as the ‘.ast’ attribute of the match object that gets returned, so this is just fetching whatever we ‘make’ in the ‘question’ method.

In the ‘question’ method, we construct a new Question object, populating its attributes from the match object, and specifically set its ‘answers’ attribute as the list of objects we produce in each call to the ‘answer’ regex from the current parse of ‘question’.

In the ‘answer’ method, we do the same thing, setting the ‘correct’ attribute to the result of a comparison, so that it satisfies the ‘Bool’ type constraint on the attribute.

So, again, to use this in a parse, we instantiate this new class and pass the object as the :action parameter to the ‘.parse’ method of the grammar, and then we fetch the constructed object from the ‘.ast’ attribute of the match object it returns:

my $actions = Question::Actions.new();
my @questions = Question::Grammar.parse($text, :actions($actions)).ast.flat;
We need .flat for the same reason as before.

Now we can inspect the created objects to see that everything went according to plan:

for @questions -> $q {
    say $q.text;
    for $q.answers.kv -> $i, $a {
        say "    $i) {$a.text}";
    }
}

To finish this post off, let’s add a method to Question to ask the question, fetch an answer, and grade the question.

Let’s start by printing out a representation of the question, its answers, and a prompt:

    method ask {
        my %hints = (
            pickmany => "Choose all that are true",
            pickone => "Choose the one item that is correct",
        );
        say "\n{%hints{$.type}}\n";
        say $.text;
        for @.answers.kv -> $i, $a {
            say "$i) {$a.text}";
        }
        print "> ";

Next, let’s fetch a line from STDIN and pull out the digits.

        my $line = $*IN.get();
        my @answers = $line.comb(/<digit>+/)>>.Int.sort;

‘comb’ is kind of the opposite of ‘split’, in that we specify what we want to keep instead of what we want to discard. The advantage here is that we don’t have to choose a delimiter. The user can enter “1 2 3″, “1,2,3″, or even “1, 2, and 3″. We then use a hyperoperator method call to generate an array of Integers from the array of Matches, and then sort it.

Next, let’s generate a corresponding array of all of the correct answer indexes, and then compare them to determine correctness of the response. This isn’t the only way to do it, merely the first that occurred to me. :)

        my @correct = @.answers.kv.map({ $^value.correct ?? $^key !! () });
        if @correct ~~ @answers {
            say "Yay, you got it right!";
            return 1;
        }
        else {
            say "Oops... you got it wrong. :(";
            return 0;
        }
    }

Let’s call it on each question and collect the results by mapping over our new method:

my @results = @questions.map(*.ask);
say "\nFinal score: " ~ [+] @results;

You’ll get results like this:

[sweeks@kupo ~]$ perl6 /tmp/questions.pl 

Choose all that are true, separated by spaces

Which items are food?
0) Rice
1) Orange
2) Mushroom
3) Shoes
> 0 1 2
Yay, you got it right!

Choose the one item that is correct

Which item is a color?
0) Orange
1) Shoes
2) Mushroom
3) Rice
> 1
Oops... you got it wrong. :(

Final score: 1

With everything put together, here’s the full program we’ve written:


class Question::Answer {
    has $.text is rw;
    has Bool $.correct is rw;
}
class Question {
    has $.text is rw;
    has $.type is rw;
    has Question::Answer @.answers is rw;
    method ask {
        my %hints = (
            pickmany => "Choose all that are true",
            pickone => "Choose the one item that is correct",
        );
        say "\n{%hints{$.type}}\n";
        say $.text;
        for @.answers.kv -> $i, $a {
            say "$i) {$a.text}";
        }
        print "> ";
        my $line = $*IN.get();
        my @answers = $line.comb(/<digit>+/)>>.Int.sort;
        my @correct = @.answers.kv.map({ $^value.correct ?? $^key !! () });
        if @correct ~~ @answers {
            say "Yay, you got it right!";
            return 1;
        } else {
            say "Oops... you got it wrong. :(";
            return 0;
        }
    }
}

grammar Question::Grammar {
    token TOP {
        \n*
        <question>+
    }
    token question {
        <header>
        <answer>+
    }
    token header {
        ^^ $<type>=['pickone'|'pickmany'] ':' \s+ $<text>=[\N*] \n
    }
    token answer {
        ^^ \s+ $<correct>=['ac'|'ai'] ':' \s+ $<text>=[\N*] \n
    }
}

class Question::Actions {
    method TOP($/) {
        make $<question>».ast;
    }
    method question($/) {
        make Question.new(
            text => ~$<header><text>,
            type => ~$<header><type>,
            answers => $<answer>».ast,
        );
    }
    method answer($/) {
        make Question::Answer.new(
            correct => ~$<correct> eq 'ac',
            text => ~$<text>,
        );
    }
}

my $text = Q {
pickmany: Which items are food?
    ac: Rice
    ac: Orange
    ac: Mushroom
    ai: Shoes
pickone: Which item is a color?
    ac: Orange
    ai: Shoes
    ai: Mushroom
    ai: Rice
};

my $actions = Question::Actions.new();
my @questions = Question::Grammar.parse($text, :actions($actions)).ast.flat;
my @results = @questions.map(*.ask);

say "\nFinal score: " ~ [+] @results;

Day 12: Modules and Exporting

December 12, 2009

Today I’d like to talk about a fairly fundamental subject: libraries.

To write a library in Perl 6, we use the “module” keyword:

module Fancy::Utilities {
    sub lolgreet($who) {
        say "O HAI " ~ uc $who;
    }
}

Put that in Fancy/Utilities.pm somewhere in $PERL6LIB and we can use it like the following:

use Fancy::Utilities;
Fancy::Utilities::lolgreet('Tene');

That’s hardly ideal.  Just like in Perl 5, we can indicate that some symbols from the module should be made available in the lexical scope of the code loading the module.  We’ve got a rather different syntax for it, though:

# Utilities.pm
module Fancy::Utilities {
  sub lolgreet($who) is export {
    say "O HAI " ~ uc $who;
  }
}

# foo.pl
use Fancy::Utilities;
lolgreet('Jnthn');

If you don’t specify further, symbols marked “is export” are exported by default.  We can also choose to label symbols as being exported as part of a different named group:

module Fancy::Utilities {
 sub lolgreet($who) is export(:lolcat, :greet) {
  say "O HAI " ~ uc $who;
 }
 sub nicegreet($who) is export(:greet, :DEFAULT) {
  say "Good morning, $who!"; # Always morning?
 }
 sub shortgreet is export(:greet) {
  say "Hi!";
 }
 sub lolrequest($item) is export(:lolcat) {
  say "I CAN HAZ A {uc $item}?";
 }
}

Those tags can be referenced in the code loading this module to choose which symbols to import:

use Fancy::Utilities; # Just get the DEFAULTs
use Fancy::Utilities :greet, :lolcat;
use Fancy::Utilities :ALL; # Everything marked is export

Multi subs are export by default, so you only need to label them if you want to change that.

multi sub greet(Str $who) { say "Good morning, $who!" }
multi sub greet() { say "Hi!" }
multi sub greet(Lolcat $who) { say "O HAI " ~ $who.name }

Classes are just a specialization of modules, so you can export things from them as well.  In addition, you can export a method to make it available as a multi sub.  For example, the setting exports the “close” method from the IO class so you can call “close($fh);”

class IO {
    ...
    method close() is export {
        ...
    }
    ...
}

Perl 6 does support importing symbols by name from a library, but Rakudo does not yet implement it.


Follow

Get every new post delivered to your Inbox.

Join 36 other followers