Day 8 – Grammars generating grammars

Day 8 – Grammars generating grammars

By now you have probably gotten used to the prefix “meta” appearing here and there in the Perl 6 world. Metaclasses, metaobjects, metaoperators, the mythical Meta-Object Protocol. Sounds like nothing scary, all good and familiar and you’ve seen it all, eh? Nothing further from the truth! Today, on the Perl 6 Advent Calendar we’re going full meta. We’re going to have grammars that parse grammars then generate grammars that are going to use to parse grammars. I’ll let this sink in for a moment while you scroll down for the next paragraph.

Grammars are hands-down one of the killer features of Perl 6. Taking Perl’s no less than legendary ability to process text and taking it to the next level. Regular expressions, to many people’s disappointment are just what they say they are: regular (well, regular regular expressions do. Perl regexes are a bit of a different beast, but that’s a material for another story). They parse regular, as opposed to, say, context-free languages, to the neverending disappointment of all the people who would just love to parse XML with them, like in that everyone’s favourite Stackoverflow answer. Say no more to silly theoretical limitations of language theory though, since now we have grammars which are all we ever missed in regexes: readability, composability and, of course, ability to parse even Perl 6 itself — and if that doesn’t sell its absolute power, I don’t know what does.

Writing parsers for predefined grammars (say, in Bachus-Naur Form) was always a bit of a dull job, almost as exciting as copypasting stuff. If you ever sat down and wrote a parser from scratch (perhaps while going through the excellent Let’s Build a Compiler book), you probably recognize the pattern all too well: take a single rule from your grammar, write a subroutine for it, have it call (perhaps recursively) other subroutines similarly defined for other grammar rules, rinse, repeat. Well say no more to that now that we have Perl 6 grammars! In this wonderful new world we need not write subroutines for every token to get the work done. Now we write a grammar class, where we put tokens, rules and regexes for every symbol in our grammar, and inside them we write regexes (or code) that refer to (perhaps recursively) other symbols in our Perl 6 grammar. Now, if you ever went through both of those alternatives, you will definitely realize how massive of a convenience the grammars are in Perl 6. Instead of painstakingly cranking out repetitive and error-prone code we have a wonderful, declarative way to specify our language, with an impressive collection of utility to get most of our common, boring work out of the way.

But what if we already have a grammar, specified perhaps in the previously mentioned BNF? What we do then is carefully retype the existing grammar (parsing it in our head, actually) into our new, shiny Perl 6 grammar that represents the exact same thing, but has the clear advantage of actually being executable code. A fair deal, you could say. For most people, no big deal at all. We are not most people. We are programmers. We have the resources. The will. To make these grammars count! Our job revolves around building things that will do the work for us so we don’t have to. To take the well-defined specifications and turn them into tools that work for us, or for themselves. Tools that take the repetitive and automatable parts of our work the way. Why should building parsers be any different? Why, I’m glad you asked.

The wonderful thing about the Perl 6 grammars is that they are no more magical than any other element of the language. Just as classes are first class citizen that we can introspect, augment and build programatically, so are grammars. In fact, you can look at the source code of the compiler itself and notice that grammars are nothing else than a specialized kind of classes. They follow the same rules as classes do, allowing us to create them on the fly, add tokens to them on the fly and eventually finalize them in order to have a proper, instantiatable class object. So now that we can parse BNF grammars (since they’re just ordinary text) and create Perl 6 grammars from code, let’s put those pieces together and write us something that will save us the effort of converting a BNF grammar to Perl 6 grammar manually.

The grammar for a BNF grammar

grammar Grammar::BNF {
    token TOP { \s* <rule>+ \s* }

    token rule {
        <opt-ws> '<' <rule-name> '>' <opt-ws> '::=' <opt-ws> <expression> <line-end>
    }

    token expression {
        <list-of-terms> +% [\s* '|' <opt-ws>]
    }

    token term {
        <literal> | '<' <rule-name> '>'
    }

    token list-of-terms { <term> +% <opt-ws> }

    token rule-name { <-[>]>+ }

    token opt-ws { \h* }

    token line-end { [ <opt-ws> \n ]+ }

    token literal { '"' <-["]>* '"' | "'" <-[']>* "'" }

    ...
}

The interesting stuff happens in the three, almost-topmost tokens. rule is the core building block of a BNF grammar: a <symbol> ::= <expression> pair, followed by a new line. The entire grammar is no more than a list of those. Each expression is a list of terms, or possibly and alternative of them. Each term is either a literal, or a symbol name surrounded by angle brackets. Easy enough! That covers the parsing part. Let’s look at the generating itself. We do have a built-in mechanism of “doing stuff for each token in the grammar”, in the form of Actions, so let’s go ahead and use that:

my class Actions {
    has $.name = 'BNFGrammar';
    method TOP($/) {
        my $grmr := Metamodel::GrammarHOW.new_type(:$.name);
        $grmr.^add_method('TOP',
            EVAL 'token { <' ~ $<rule>[0].ast.key ~ '> }');
        for $<rule>.map(*.ast) -> $rule {
            $grmr.^add_method($rule.key, $rule.value);
        }
        $grmr.^compose;
        make $grmr;
    }

    method expression($/) {
        make EVAL 'token { ' ~ ~$/ ~ ' }';
    }

    method rule($/) {
        make ~$<rule-name> => $<expression>.ast;
    }
}

The TOP method is definitely the most magical and scary, so let’s tackle that first to make the rest of the stuff look trivial in comparison. Basically, three things happen there:

1. We create a new grammar, as a new Perl 6 type
2. We add tokens to it using the `^add_method` method
3. We finalize the grammar using the `^compose` method

While Perl 6 specifies that the token named TOP is where the parsing starts, in BNF the first rule is always the starting point. To adapt one to the other, we craft a phony TOP token which just calls the first rule specfied in the BNF grammar. Unavoidably, the scary and discouraging EVAL catches our attention, as if it said “horrible stuff happens here!” It’s not entirely wrong when it says that, but since we have no other way of programmatically constructing the individual regexes (that I know of), we’ll have to accept this little discomfort in the name of the Greater Good, and look a little bit closer at what we’re actually EVALing in a moment.

After TOP we proceed to add the rest of the BNF rules to our grammar, this time preserving their original names, then ^compose() the whole thing and finally make it the result of the parsing: a readymade parser class.

In the expression method we glue the parsed BNF elements together in order to produce valid Perl 6 code. This turns out to be pretty easy, since both separate symbols with whitespace, alternate them with the pipe character and sorround symbol names with angle brackets. So for a rule that looks like this:

<foo> ::= 'bar' | <baz>

The Perl 6 code that we EVAL turns becomes:

token { 'bar' | <baz> }

Since we already validated in the grammar part of our code that the BNF we parse is correct, there’s nothing stopping us from literally pasting the entire expression into our code and wrap it in a token { }, so let’s go ahead and do just that.

Last but not least, for each BNF rule we parse we produce a nice Pair, so our TOP method has an easier times processing each of them.

Seems like we’re about done here, but just for the users’ convenience, let’s write a nice method that takes a BNF grammar and produces a ready to use type object for us. As we remember, grammars are just classes, so there’s nothing stopping us from just adding it straight to our grammar:

grammar Grammar::BNF {
    ...

    method generate($source, :$name = 'BNFGrammar') {
        my $actions = Actions.new(:$name);
        my $ret = self.new.parse($source, :$actions).ast;
        return $ret.WHAT;
    }
}

Looks good from here! Before you start copypasting all this into your own projects, remember that Grammar::BNF is a Perl 6 module available in your Perl 6 Module Ecosystem, installable with your favourite module manager. It also ships with some goodies like a Perl 6 slang, allowing you to write BNF grammars straight in your Perl 6 code (as opposed to including them as strings), or parsing the more powerful (and perhaps more widespread) ABNF grammars as well.

If you indeed took the time for the beginning of this post to sink in, you may remember that I promised that we’re going to have grammars (a-one) that parse grammars (a-two) then generate grammars (a-three) that are going to use to parse grammars (a-four). So far we’ve seen the BNF::Grammar grammar (that’s our a-one), that parses a BNF grammar (that’s our a-two), generates a Perl 6 grammar in a form of a type object (that’s a-three) and… that’s it. We’re still missing the last part, using this whole thing to parse grammars. We’ve only gone 75%-meta, and that is just not good enough in this day and age. Why stop now? Why not take a BNF grammar of a BNF grammar, parse that with a Perl 6 grammar and use the resulting Perl 6 BNF grammar to parse our original BNF grammar of a BNF grammar? Wouldn’t that be sweet? Sure it will! That is, however, left as an exercise for you, my dear readers. After all, how fair would it be if I had all the fun while you just sit there and watch? Would you like it if you opened the little paper window in your advent calendar only to find a note saying. „There was a chocolate here for you. I ate it. It was truly delivious?” Me neither! There’s a BNF grammar for BNF both on Wikipedia and in Grammar::BNF’s very own test suite; the latter even includes a little breadcrumb that can help you with your adventure. I eagerly await thy results, and as always, thank you all for reading and I wish you a wonderful advent!

One thought on “Day 8 – Grammars generating grammars

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s