Day 24: The Perl 6 standard grammar

We’ve now reached the end of this year’s advent series. What will be the gift in our last box? The door opens to reveal…the Perl 6 grammar.

At first it might seem odd to cite a grammar as a significant component of a language. Obviously the syntax matters a lot to people writing programs in that language, but once a syntax has been designed, we simply use grammars to describe the syntax and build parsers, right?

Not in Perl 6, where language syntax is a dynamic thing — modifiable to accommodate new keywords and syntax not anticipated in the original design. Or, perhaps more accurately, Perl 6 explicitly anticipates and supports modules and applications changing the language’s syntax for their specific needs. Defining custom operators is just one example of a place where we change the language syntax itself, but Perl 6 also allows the dynamic addition of macros, new statement types, new sigils, and the like.

Thus a Perl 6 grammar and parser must not only parse the standard Perl 6 syntax but also handle program-defined custom syntax. Language modifications must also be scoped, so that defining a new operator in one module doesn’t inadvertently change how another module is interpreted.
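As a small illustration of scoped syntax changes, here is a sketch of a custom operator that exists only inside one block. The operator name `avg` and the variable `$mean` are made up for this example, and the code assumes a reasonably recent Rakudo.

```raku
my $mean;
{
    # Declaring this sub extends the grammar with a new infix operator,
    # but only for the rest of the enclosing block (lexical scope).
    sub infix:<avg>($a, $b) { ($a + $b) / 2 }

    $mean = 4 avg 8;
}
# Out here the parser no longer knows about "avg"; writing `4 avg 8`
# at this point would be rejected at compile time.
say $mean;    # 6
```

The point of the sketch is that the parser itself changes for the duration of the block, and then changes back, without any other code noticing.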

This is what the Perl 6 standard grammar achieves, and much of the effort that has gone into the Perl 6 specification for regexes and grammars (Synopsis 5) has been just to make this sort of thing possible. I personally believe this is one of the key features that will enable Perl 6 to remain a viable language far into the future. (On the other hand, when I first read the designs for Perl 6 in detail, I had serious doubts as to whether this could in fact be achieved. It’s nice to see that it wasn’t an impossible dream.)

The expectation is that parsers for Perl 6 will themselves be written in Perl 6, and there are several examples already available. The “standard” or “reference” grammar and parser is STD.pm; Larry has been using this to refine the Perl 6 language specification and explore the impacts of various language constructs on the writing of Perl 6 programs.

Some parts of STD.pm are still evolving in response to implementation concerns; thus Rakudo Perl maintains its own version of the language grammar that works for its environment. Many of the ideas first explored by Rakudo find their way back into the standard grammar. This is by design: our expectation is that the various grammar implementations will continue to converge over the course of the next year.

The key feature that jumps out from looking at the Perl 6 grammar is the use of protoregexes. A protoregex allows multiple regexes to be combined into a single “category”. In a more traditional grammar, we might write:

    rule statement {
        | <if_statement>
        | <while_statement>
        | <for_statement>
        | <expr>
    }
    rule if_statement    { 'if' <expr> <statement> }
    rule while_statement { 'while' <expr> <statement> }
    rule for_statement   { 'for' '(' <expr> ';' <expr> ';' <expr> ')' <statement> }

With a protoregex, we’d write it as follows:

    proto token statement { <...> }
    rule statement:sym<if>    { 'if' <expr> <statement> }
    rule statement:sym<while> { 'while' <expr> <statement> }
    rule statement:sym<for>
        { 'for' '(' <expr> ';' <expr> ';' <expr> ')' <statement> }
    rule statement:sym<expr>  { <expr> }

We’re still saying that a <statement> matches any of the listed statement constructs, but the protoregex version is much easier to extend. In the non-protoregex version above, adding a new statement construct (such as “repeat..until”) would require rewriting the “rule statement” declaration in its entirety to include the new statement construct. But with a protoregex, we can simply declare an additional rule:

    rule statement:sym<repeat> { 'repeat' <statement> 'until' <expr> }

This newly declared rule is automatically added as one of the candidates to the <statement> protoregex. All of this works for derived languages as well:

    grammar MyNewGrammar is BaseGrammar {
        rule statement:sym<repeat> { 'repeat' <statement> 'until' <expr> }
    }

Thus MyNewGrammar parses everything that BaseGrammar does, and additionally recognizes the repeat..until statement construct.
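To make this concrete, here is a small self-contained sketch that can be run as-is. It makes a few assumptions that are not part of the fragments above: a `TOP` rule, a toy `expr` token that only matches integer literals, and the `{*}` spelling of the proto body that current Rakudo releases use in place of `<...>`.

```raku
grammar BaseGrammar {
    rule TOP { <statement> }

    # The proto declares the "statement" category; each :sym<...> rule
    # below is one candidate in that category.
    proto rule statement {*}
    rule statement:sym<if>    { 'if' <expr> <statement> }
    rule statement:sym<while> { 'while' <expr> <statement> }
    rule statement:sym<expr>  { <expr> }

    # Toy expression for this sketch only: an integer literal.
    token expr { \d+ }
}

grammar MyNewGrammar is BaseGrammar {
    # One extra candidate; nothing in BaseGrammar has to be rewritten.
    rule statement:sym<repeat> { 'repeat' <statement> 'until' <expr> }
}

my $source = 'repeat 1 until 0';
say so BaseGrammar.parse($source);     # False
say so MyNewGrammar.parse($source);    # True
say so MyNewGrammar.parse('if 1 2');   # True
```

The derived grammar inherits every existing candidate and contributes one more, so the base grammar keeps rejecting `repeat` while the derived one accepts it.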

The ability to dynamically replace the existing grammar with a new one that has different parse semantics is at the heart of Perl 6’s operator overloading, macro handling, and other syntax-modifying features. Compared with source filters, this provides a much more nuanced way to declare new language constructs.

Another significant component of the standard grammar is its devotion to providing useful diagnostics when it encounters an error. Instead of simply saying “an error occurred here”, it suggests what might have been intended and points out places where it thinks the programmer may have been confused. It also does significant work to catch constructs that have changed between Perl 5 and Perl 6, to assist people with migration. For example, if someone writes an “unless” statement with an “else” block, the parser responds with

    unless does not take "else" in Perl 6; please rewrite using "if"

Or, if a program appears to contain the question-mark-colon (?:) ternary operator, the parser says

    Unsupported use of "?:"; in Perl 6 please use "??!!"
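A grammar can produce this kind of diagnostic by recognizing the obsolete construct itself and then refusing it with a message. The following is only a sketch of that idea, not the rule STD.pm actually uses: the grammar name `TinyLang`, the toy `expr` and `block` tokens, and the `panic` method are all invented for this example.

```raku
grammar TinyLang {
    rule TOP { <statement> }

    # After a complete "unless" statement, either no "else" follows,
    # or we stop parsing with a targeted explanation.
    rule statement {
        'unless' <expr> <block>
        [ <!before 'else'> || <.panic('unless does not take "else" in Perl 6; please rewrite using "if"')> ]
    }

    token expr  { \d+ }
    token block { '{' \s* '}' }

    # Hypothetical helper: abort the parse by throwing the message.
    method panic($message) { die $message }
}

say so TinyLang.parse('unless 0 { }');    # True

try TinyLang.parse('unless 0 { } else { }');
my $diagnostic = $!.message;
say $diagnostic;
```

Because the check lives inside the grammar, the message is raised at exactly the spot where the parser sees the stray `else`, rather than as a generic syntax error further along.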

In late October of this year, Rakudo started a significant refactor in a new branch (called “ng”) that makes use of protoregexes and many other features of the STD.pm grammar. We still have a short way to go before this new branch can become the official released version of Rakudo, but we expect that to happen in the January 2010 release. Already this conversion has enabled us to finally add long-awaited features to Rakudo, including dynamic generation of metaoperators, lazy list handling, lexical context handling, and the like.

With Rakudo’s conversion to following STD.pm for its grammar, we’re very much on track for the Rakudo Star release in April 2010. While we expect that Rakudo Star won’t be a complete implementation of Perl 6, it will be sufficiently advanced and usable for a wide variety of applications. We’ve been quickly resolving the critical items listed in the Rakudo Star ROADMAP, and over the next couple of months will be focusing on improved error reporting (like STD.pm) and distribution / packaging issues.

…and this concludes the Perl 6 Advent series for December 2009. We hope that you’ve enjoyed reading the articles at least as much as we’ve enjoyed writing them, and we appreciate the many comments that people have made about the posts. We also hope to have conveyed our sense that many useful parts of Perl 6 are available now for experimentation, and that we’re well on the way to making them available in 2010 for a wider variety of applications. Indeed, we have high hopes and expectations for the entire Perl family in 2010 — it promises to be an exciting time for us all.

Happy holidays, and best wishes for the new year.
