The language formerly known as Perl 6 is now called Raku, so this year’s advent calendar can be found at rakuadventcalendar.wordpress.com.
Enjoy, and please update your bookmarks, RSS feeds and so on!
We can quibble all day about whether 2017 was the year of the Linux desktop, but there can be little doubt that it was the year of the Perl 6 book.
December 2016 brought us the ebook launch of Perl 6 at a Glance by Andrew Shitov, and in 2017 the print version came out. It is an introduction to Perl 6 that targets programmers already familiar with another language.
It is the first of a generation of “modern” Perl 6 books. There weren’t many Perl 6 books before it; the most notable was “Perl 6 and Parrot Essentials”, which was written when Perl 6 was very much a language in flux. The December 2015 release of the Perl 6 language version v6.c (and the accompanying Rakudo Perl 6 compiler) finally offers enough stability to make Perl 6 books work.
The next book released in 2017 was “Think Perl 6: How to Think Like a Computer Scientist”. It is a Perl 6 adaptation of Allen Downey’s great book Think Python: How to Think Like a Computer Scientist, lovingly ported to Perl 6 by Laurent Rosenfeld. It is available in print from O’Reilly, and freely available as an ebook from Green Tea Press. It is also available under an Open Source license in its source form (LaTeX) on GitHub.
“Think Perl 6” is an introduction to programming and computer science that happens to use Perl 6 as its primary tool. It targets the absolute beginner, and goes into a lot of detail on basic concepts such as branches, loops, variables, expressions, functions, recursion and so on.
A book I hadn’t had on my radar until it was available for purchase on Amazon was Learning to program with Perl 6: First Steps: Getting into programming without leaving the command line by JJ Merelo. You can buy it on Amazon pretty cheaply, or check it out on GitHub, where you can find a musical as bonus material.
It mostly targets beginners, and also discusses some things related to programming, like the use of GitHub, some shell features, and SSH. It is a light-hearted introduction into computing and Perl 6.
Perl 6 Fundamentals started its life as “Perl 6 by Example”, written by Moritz Lenz, aka yours truly. (Yes, authors write about themselves in the third person. That “About the Author” section in each book? Written by the author. In third person. Weird). When Apress acquired the book, it was renamed to Perl 6 Fundamentals: A Primer with Examples, Projects, and Case Studies. It is available from everywhere that you can buy books. At least I hope so :-)
Each chapter focuses on one (at least somewhat) practical example, and uses that as an excuse to talk about various Perl 6 features, including concurrency, functional programming, grammars, and calling Python libraries through Inline::Python. (You can read the chapter about Inline::Python over at perltricks.com.) It targets programmers with previous experience, though not necessarily Perl 6 (or Perl 5) experience.
Larry Wall has kindly written a foreword for the book.
Andrew Shitov’s second Perl 6 book, Perl 6 Deep Dive, is, as the name suggests, more comprehensive and a, well, deeper dive than “Perl 6 at a Glance”, though somewhat similar in style. With more than 350 pages, it seems to have the largest coverage of Perl 6 features of any book so far.
Guess who’s released a third Perl 6 book within one year? That’s right, Andrew Shitov again. Somebody give that man a medal! Using Perl 6 is a collection of 100 programming challenges/problems and their solutions in Perl 6. This is what Andrew wrote about it:
About a year ago, I decided to write a book about using Perl 6. Later, the plans changed and I published “Perl 6 at a Glance”, after which I wanted to write “Migrating to Perl 6” but instead wrote “Perl 6 Deep Dive” for Packt Publishing. Here and there, I was giving trainings on Perl 5, Python, and JavaScript, and was always suffering from finding a good list of easy tasks that a newcomer can use to train their skills in the language. Finally, I made up the list and solved it in Perl 6. This is the content of “Using Perl 6” — a book with solutions to 100 programming challenges, from simple to intermediate, with explanations of the Perl 6 constructions used.
Since his fourth book, Migrating to Perl 6, will be released in 2018, it doesn’t get its own section. Take that, Andy! :-) This is the “Perl 6 for Perl 5 programmers” book that people (Perl 5 people, mostly) have been asking for on IRC and some other media.
And of course I won’t mention Andrew’s kickstarter for a cookbook-style project, because that will further skew the stats. Ooops, I just did. Hmm. Well, go support the man!
After writing a general Perl 6 book, I wanted to focus on a narrower topic. A non-representative poll on twitter confirmed my suspicion that regexes and grammars would be the best niche, and so Parsing with Perl 6 Regexes and Grammars: A Recursive Descent into Parsing was born.
It requires basic programming knowledge to read, but no prior exposure to regexes or Perl. It goes from the building blocks of regexes to fully-featured parsers, including Abstract Syntax Tree generation and error reporting. A discussion of three very different example parsers concludes the nearly 200 pages, which could have also been titled “far more than you ever wanted to know about parsing with Perl 6”.
Right now, the ebook version is available for purchase, and I hope that the print version will be ready by Christmas. (And I’m talking about Christmas 2017, to be sure :)
I’d be remiss if I didn’t point out two more books that aren’t available yet, but might become available in the coming months or years.
brian d foy works on Learning Perl 6. In his last update, he shares that the first draft of the book is written, with a plan for things that need rewriting.
Gabor Szabo crowd-funded a book on Web Application Development in Perl 6 using the Bailador framework. The earlier chapters are mostly fleshed out, and the later chapters mostly exist as skeletons. Gabor expects it to be finished in 2018.
The flood of Perl 6 books has made it hard for newcomers to decide which book to read, so I created the https://perl6book.com/ website that has one-line summaries, and a flow-chart for deciding which book to buy.
Even though I had input from other Perl 6 authors, it certainly reflects my biases. But, you can help to improve it!
With 7 Perl 6 books published by three major publishers in 2017, it’s been a fantastic year. I am also very happy with the diversity of the books, their target audience and styles. I hope you are too!
A final plea: If you have read any of these books, please give the author some feedback. They put incredible amounts of work into those, and feedback helps the author’s learning process and motivation. And if you liked a book, maybe even give it 5 stars on Amazon and write a line or two about why you liked it.
The following is an extract from the book Parsing with Perl 6 Regexes and Grammars: A Recursive Descent into Parsing, by Moritz Lenz, published 2017 by Apress Media, LLC. Reprinted with permission.
This book is being published right now. At least the ebook version should become available for purchase this month, and the print version can be pre-ordered from Amazon. It should ship in January 2018 at the latest, but with a bit of luck, it’ll be available by Christmas.
Below you can find chapter 9, Parsing with Grammars. The previous chapters discuss the building blocks of regexes in great detail, how regexes interact with Perl 6 code, match objects, regex mechanics, common regex techniques, and reusing and composing regexes. You can acquire some of this background by reading the official documentation on regexes.
Later chapters cover action classes and objects, how to report high-quality parse errors, Unicode support, and finally three case studies.
And now, enjoy!
Grammars are the proverbial Swiss-army chain saw for parsing.
In this chapter, we will explore them in more detail. Most importantly, we will discuss how to harness their power.
Grammars implement a top-down approach to parsing. The entry point, usually the regex TOP, knows about the coarse-grained structure and calls further regexes that descend into the gory details. Recursion can be involved too. For example, if you parse a mathematical expression, a term can be an arbitrary expression inside a pair of parentheses.
This is a top-down structure, or more precisely a recursive descent parser. If no backtracking is involved, we call it a predictive parser, because at each position in the string, we know exactly what we’re looking for — we can predict what the next token is going to be (even if we can only predict that it might be one of a set of alternatives).
The resulting match tree corresponds in structure exactly to the call structure of regexes in the grammar. Let’s consider parsing a mathematical expression that only includes the operators *, +, and parentheses for grouping:
grammar MathExpression {
    token TOP { <sum> }
    rule sum { <product>+ % '+' }
    rule product { <term>+ % '*' }
    rule term { <number> | <group> }
    rule group { '(' <sum> ')' }
    token number { \d+ }
}

say MathExpression.parse('2 + 4 * 5 * (1 + 3)');
From the grammar itself you can already see the potential for recursion: sum calls product, which calls term, which calls group, which calls sum again. This allows parsing of nested expressions of arbitrary depth.
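We can watch this recursion at work by feeding the grammar expressions with deeply nested parentheses (a small sketch of mine, not from the book’s text):

```perl6
# Each layer of parentheses takes one more trip around the
# sum → product → term → group → sum cycle.
say so MathExpression.parse('((((42))))');
say so MathExpression.parse('(1 + (2 * (3 + 4)))');
```

Both calls should succeed, no matter how deep the nesting goes, because each recursive step consumes at least one character.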
Parsing the example above produces the following match object:
⌜2 + 4 * 5 * (1 + 3)⌟
 sum => ⌜2 + 4 * 5 * (1 + 3)⌟
  product => ⌜2 ⌟
   term => ⌜2 ⌟
    number => ⌜2⌟
  product => ⌜4 * 5 * (1 + 3)⌟
   term => ⌜4 ⌟
    number => ⌜4⌟
   term => ⌜5 ⌟
    number => ⌜5⌟
   term => ⌜(1 + 3)⌟
    group => ⌜(1 + 3)⌟
     sum => ⌜1 + 3⌟
      product => ⌜1 ⌟
       term => ⌜1 ⌟
        number => ⌜1⌟
      product => ⌜3⌟
       term => ⌜3⌟
        number => ⌜3⌟
If you want to know how a particular number was parsed, you can follow the path backwards by looking for lines above the current line that are indented less; for instance, the number 1 was parsed by token number, called from term, called from product, and so on.

We can verify this by raising an exception from token number:
token number {
    (\d+)
    { die "how did I get here?" if $0 eq '1' }
}
This indeed shows the call chain in the backtrace, with the most immediate context at the top:
how did I get here?
  in regex number at bt.p6 line 9
  in regex term at bt.p6 line 5
  in regex product at bt.p6 line 4
  in regex sum at bt.p6 line 3
  in regex group at bt.p6 line 6
  in regex term at bt.p6 line 5
  in regex product at bt.p6 line 4
  in regex sum at bt.p6 line 3
  in regex TOP at bt.p6 line 2
  in block <unit> at bt.p6 line 13
This grammar only uses tokens and rules, so there is no backtracking involved, and the grammar is a predictive parser. This is fairly typical. Many grammars work fine without backtracking, or with backtracking in just a few places.
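The difference between backtracking regexes and non-backtracking tokens can be seen outside of grammars too; the :r (short for :ratchet) adverb gives an ordinary regex the same non-backtracking semantics as a token (a small illustration of my own, not from the book):

```perl6
say so 'ab' ~~ /    a* ab /;   # backtracking: a* gives the 'a' back, so 'ab' can match
say so 'ab' ~~ / :r a* ab /;   # ratchet: a* keeps the 'a', and 'ab' cannot match
```

In a predictive parser, the second behavior is what you want: each token commits to what it consumed, and the grammar never revisits those decisions.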
The MathExpression grammar has two rules that are structurally identical:
rule sum { <product>+ % '+' }
rule product { <term>+ % '*' }
Instead, we could have written
rule expression { <term>+ % <operator> }
token operator { '*' | '+' }
or even used the proto token construct discussed in the previous chapter to parse different operators. The reason I chose the first, more repetitive, approach is that it makes the match structure correspond to the precedence of the operators * and +.
When evaluating the mathematical expression 1 + 2 * 5, mathematicians and most programming languages evaluate the 2 * 5 first, because the * operator has tighter precedence than +. The result is then substituted back into the expression, leading to 1 + 10, and finally 11 as the result.
When parsing such expressions with the first version of the grammar, the structure of the parse tree expresses this grouping: it has, as the top level, a single sum, with the operands being 1 and 2 * 5.
This comes at a cost: we need a separate rule and name for each precedence level, and the nesting of the resulting match object has at least one level per precedence level. Furthermore, adding more precedence levels later on is not trivial, and very hard to do in a generic way. If you are not willing to accept these costs, you can instead use the flat model with a single token for parsing all operators. If you then need the structure in a way that reflects precedence, you can write code that transforms the list into a tree. This is commonly called an operator precedence parser.
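To make the flat-model idea concrete, here is a sketch of mine (the sub name is invented, not from the book) that evaluates a flat list of numbers and operators while honoring the tighter precedence of *:

```perl6
# Evaluate a flat alternation of numbers and operators,
# e.g. (1, '+', 2, '*', 5), where '*' binds tighter than '+'.
sub evaluate(@flat) {
    my $total   = 0;
    my $product = @flat[0];
    for @flat[1..*].rotor(2) -> ($op, $number) {
        if $op eq '*' { $product *= $number }
        else          { $total += $product; $product = $number }
    }
    return $total + $product;
}

say evaluate( (1, '+', 2, '*', 5) );   # 11
```

A full operator precedence parser generalizes this accumulator trick to arbitrary precedence tables, but the single-pass version already shows how the tree structure can be recovered (or here, evaluated) after a flat parse.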
To avoid infinite recursion, you have to take care that each possible recursion cycle advances the cursor position by at least one character. In the MathExpression grammar, the only possible recursion cycle is sum → product → term → group → sum, and group can only match if it consumes an initial open parenthesis, (.
If recursion does not consume a character, it is called left recursion and needs special language support that Perl 6 does not offer. A case in point is
token a { <a>? 'a' }
which could match the same input as the regex a+, but instead loops infinitely without progressing.
A common technique to avoid left recursion is to have a structure where you can order regexes from generic (here sum) to specific (number). You only have to be careful and check for consumed characters when a regex deviates from that order (e.g. group calling sum).
Another potential source of infinite loops is when you quantify a regex that can match the empty string. This can happen when parsing a language that actually allows something to be empty. For instance, in UNIX shells, you can assign variables by potentially leaving the right-hand side empty:
VAR1=value
VAR2=
When writing a grammar for UNIX shell commands, it might be tempting to write a token string { \w* } that would potentially match an empty string. In a situation that allows for more than one string literal, <string>+ can then hang, because the effective regex, [\w*]+, tries to match a zero-width string infinitely many times.
Once you are aware of the problem, the solution is pretty simple: change the token to not allow an empty string (token string { \w+ }), and explicitly take care of situations where an empty string is allowed:
token assignment { <variable> '=' <string>? }
Even though a grammar works from the top down, developing a grammar works best from the bottom up. It is often not obvious from the start what the overall structure of a grammar will be, but you usually have a good idea about the terminal tokens: those that match text directly without calling other subrules.
In the earlier example of parsing mathematical expressions, you might not have known from the start how to arrange the rules that parse sums and products, but it’s likely that you knew you had to parse a number at some point, so you can start by writing:
grammar MathExpression {
    token number { \d+ }
}
This is not much, but it’s also not very complicated, and it’s a good way to get over the writer’s block that programmers sometimes face when challenged with a new problem area. Of course, as soon as you have a token, you can start to write some tests:
grammar MathExpression {
    token number { \d+ }
}

multi sub MAIN(Bool :$test!) {
    use Test;
    plan 2;
    ok MathExpression.parse('1234', :rule<number>),
        '<number> parses 1234';
    nok MathExpression.parse('1+4', :rule<number>),
        '<number> does not parse 1+4';
}
Now you can start to build your way up to more complex expressions:
grammar MathExpression {
    token number { \d+ }
    rule product { <number>+ % '*' }
}

multi sub MAIN(Bool :$test!) {
    use Test;
    plan 5;
    ok MathExpression.parse('1234', :rule<number>),
        '<number> parses 1234';
    nok MathExpression.parse('1+4', :rule<number>),
        '<number> does not parse 1+4';
    ok MathExpression.parse('1234', :rule<product>),
        '<product> can parse a simple number';
    ok MathExpression.parse('1*3*4', :rule<product>),
        '<product> can parse three terms';
    ok MathExpression.parse('1 * 3', :rule<product>),
        '<product> and whitespace';
}
It is worth including whitespace in the tests early on. The example above looks innocent enough, but the last test actually fails: there is no rule that matches the whitespace between the 1 and the *. Adding a space in the regex between the <number> and the + quantifier makes the tests pass again, because the whitespace inserts an implicit <.ws> call.
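For reference, the fix is literally a single blank character; the version of the rule that passes the whitespace test reads:

```perl6
# The space before '+' inserts an implicit <.ws> call
# after each <number>, so '1 * 3' now parses.
rule product { <number> + % '*' }
```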
Such subtleties are easy to catch if you start really simple and catch them as soon as possible. If instead you give in to the temptation of writing down a whole grammar from top to bottom, you can spend many hours debugging why some seemingly simple thing such as an extra space makes the parse fail.
Once you have written the basic tokens for lexical analysis, you can progress to combining them. Typically the tokens do not parse whitespace at the borders of their matches, so the rules that combine them do that.
In the MathExpression example in the previous section, rule product directly called number, even though we now know that the final version uses an intermediate step, rule term, which can also parse an expression in parentheses. Introducing this extra step does not invalidate the tests we have written for product, because the strings it matched in the early version still match. Introducing more layers happens naturally when you start with a grammar that handles a subset of the language, which you later expand.
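Concretely, the extra layer replaces the direct call to number with the term and group rules from the final grammar shown at the start of the chapter:

```perl6
# product no longer calls number directly; term adds
# the alternative of a parenthesized sub-expression.
rule product { <term>+ % '*' }
rule term    { <number> | <group> }
rule group   { '(' <sum> ')' }
```

The tests written against :rule<product> continue to pass, because every string the old product matched is still matched by the new one.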
There are two failure modes for a regex or a grammar: it can match when it’s not supposed to match (a false positive), or it can fail to match when it’s supposed to match (a false negative). Typically, false positives are easier to understand, because you can inspect the resulting match object and see which regexes matched which part of the string.
There is a handy tool for debugging false negatives: the Grammar::Tracer module. If you load the module in a file containing a grammar, running the grammar produces diagnostic information that can help you find out where a match went wrong.
Note that this is only a diagnostic tool for developers; if you want to give end users better error messages, please read Chapter 11 for improvement suggestions.
You need to install the Perl 6 module Grammar::Debugger, which also contains Grammar::Tracer. If you use the moritzlenz/perl6-regex-alpine docker image, this is already done for you. If you installed Perl 6 via another method, you need to run

zef install Grammar::Debugger

on the command line. If zef is not yet installed, follow the installation instructions on the zef GitHub page.
Let’s look at the Perl 6 module Config::INI by Tadeusz Sośnierz. It contains the following grammar (slightly reformatted here):
grammar INI {
    token TOP {
        ^ <.eol>* <toplevel>? <sections>* <.eol>* $
    }
    token toplevel { <keyval>* }
    token sections { <header> <keyval>* }
    token header {
        ^^ \h* '[' ~ ']' $<text>=<-[ \] \n ]>+ \h* <.eol>+
    }
    token keyval {
        ^^ \h* <key> \h* '=' \h* <value>? \h* <.eol>+
    }
    regex key   { <![#\[]> <-[;=]>+ }
    regex value { [ <![#;]> \N ]+ }
    token eol   { [ <[#;]> \N* ]? \n }
}
Suppose we want to understand why it does not parse the following piece of input text:
a = b
[foo]
c: d
So, before the grammar, we insert the line
use Grammar::Tracer;
and after it, add a small piece of code that calls the .parse method of that grammar:
INI.parse(q:to/EOF/);
a = b
[foo]
c: d
EOF
This produces a sizable, but fairly informative piece of output.
Each entry consists of the name of a regex, like TOP or eol (for "end of line"), followed by the indented output of the regexes it calls. After each regex comes a line containing an asterisk (*) and either MATCH followed by the string segment that the regex matched, or FAIL if the regex failed.
Let’s look at the output piece by piece, even if it comes out in one chunk:
TOP
|  eol
|  * FAIL
|  toplevel
|  |  keyval
|  |  |  key
|  |  |  * MATCH "a "
|  |  |  value
|  |  |  * MATCH "b"
|  |  |  eol
|  |  |  * MATCH "\n"
|  |  |  eol
|  |  |  * FAIL
|  |  * MATCH "a = b\n"
|  |  keyval
|  |  |  key
|  |  |  * FAIL
|  |  * FAIL
|  * MATCH "a = b\n"
This tells us that TOP called eol, which failed to match. Since the call to eol is quantified with *, this does not cause the match of TOP to fail. TOP then calls toplevel, whose keyval token calls key, which matches the text "a ", and value, which matches "b". The eol regex then matches the newline character, and fails on the second attempt (since there are no two newline characters in a row). This causes the initial keyval token to match successfully. A second call to keyval fails pretty quickly (in the call to key). Then, the match of token toplevel proceeds successfully, consuming the string "a = b\n".
So far this all looks as expected. Now let’s take a look at the second chunk of output:
|  sections
|  |  header
|  |  |  eol
|  |  |  * MATCH "\n"
|  |  |  eol
|  |  |  * FAIL
|  |  * MATCH "[foo]\n"
|  |  keyval
|  |  |  key
|  |  |  * MATCH "c: d\n"
|  |  * FAIL
|  * MATCH "[foo]\n"
TOP next calls sections, wherein token header successfully matches the string "[foo]\n". Then, keyval calls key, which matches the whole line "c: d\n". Wait, that’s not right, is it? We might have expected key to only match the c. I certainly wouldn’t have expected it to match a newline character at the end. The lack of an equals sign in the input causes the regex engine to never even call regex value. But since keyval is again quantified with the star * quantifier, the match of the calling regex sections succeeds in matching just the header "[foo]\n".
The last part of the Grammar::Tracer output follows:
|  sections
|  |  header
|  |  * FAIL
|  * FAIL
|  eol
|  * FAIL
* FAIL
It’s FAILs from here on. The second call to sections again tries to parse a header, but its next input is still "c: d\n", so it fails. As does the end-of-string anchor $ in token TOP, failing the overall match in method parse.
So we have learned that regex key matched the whole line c: d\n, but since no equals sign (=) follows it, token keyval cannot parse this line. Since no other regex (notably not header) matches it, this is where the match fails.
As you can see from this example run, Grammar::Tracer enables us to pinpoint where a parse failure happens, even though we had to look carefully through its output to locate it. When you run it in a terminal, you automatically get colored output, with FAIL having a red and MATCH a green background, and token names standing out in bold white (instead of the usual gray) output. This makes it easier to scan from the bottom (where a failed match usually leaves a trail of red FAILs) up to the trailing successful matches, and then look in the vicinity of the border between matches and failures.
Since debugging imposes a significant mental burden, and the output from Grammar::Tracer tends to grow quickly, it is generally advisable to reduce the failing input to a minimal specimen. In the case described above, we could have removed the first line of the input string and saved ten lines of Grammar::Tracer output to look through.
As said before, the idiomatic way to parse insignificant whitespace is by calling <.ws>, typically implicitly by using whitespace in a rule. The default ws implementation, <!ww>\s*, works well for many languages, but it has its limits.
In a surprising number of file formats and computer languages, there is significant whitespace that <.ws> would just gobble up. These include INI files (where a newline typically indicates a new key/value pair), Python and YAML (where indentation is used for grouping), CSV (where a newline signals a new record), and Makefiles (where indentation is required to be with a tabulator character).
In these cases, it is best practice to override ws in your own grammar to match only insignificant whitespace. Let’s take a look at a second, minimalistic INI parser, independently developed from the one described in the previous section:
grammar INIFile {
    token TOP { <section>* }
    token section { <header> <keyvalue>* }
    rule header {
        '[' <-[ \] \n ]>+ ']' <.eol>
    }
    rule keyvalue {
        ^^ $<key>=[\w+]
        <[:=]>
        $<value>=[<-[\n;#]>*]
        <.eol>
    }
    token ws { <!ww> \h* }
    token eol { \n [\h*\n]* }
}
This parses simple INI configuration files like this:
[db]
driver: mysql
host: db01.example.com
port: 122
username: us123
password: s3kr1t
Note how this grammar uses two paths for parsing whitespace: a custom ws token that only matches horizontal whitespace (blanks and tabs), and a separate token eol that matches (significant) line breaks. The eol token also gobbles up further lines consisting only of whitespace.
If a language supports comments, and you don’t want them to appear in your parse tree, you can parse them either in your ws token, or in eol (or your equivalent thereof). Which one it is depends on where comments are allowed. In INI files, they are only allowed after a key-value pair or on a line of their own, so eol would be the fitting place. In contrast, SQL allows comments in every place where whitespace is allowed, so it is natural to parse them in ws:
# comment parsing for SQL:
token ws { <!ww> \s* [ '--' \N* \n ]* }

# comment parsing for INI files:
token eol { [ [ <[#;]> \N* ]? \n ]+ }
Some of the more interesting data formats and languages require the parser to store things (at least temporarily) to be able to correctly parse them. A case in point is the C programming language, and others inspired by its syntax (such as C++ and Java). Such languages allow variable declarations of the form type variable = initial_value, like this:
int x = 42;
This is valid syntax, but only if the first word is a type name. In contrast, this would be invalid, because x is not a type:
int x = 42;
x y = 23;
From these examples it is pretty clear that the parser must have a record of all the types it knows. Since users can also declare types in their code files, the parser must be able to update this record.
Many languages also require that symbols (variables, types, and functions) be declared before they are referenced. This too requires the grammar to keep track of what has been declared and what hasn’t. This record of what has been declared (and what is a type or not, and possibly other meta information) is called a symbol table.
Instead of parsing the full C programming language, let’s consider a minimalist language that just allows assignments of lists of numbers, and variables to variables:
a = 1
b = 2
c = a, 5, b
If we don’t impose declaration rules, it’s pretty easy to write a grammar:
grammar VariableLists {
    token TOP { <statement>* }
    rule statement { <identifier> '=' <termlist> \n }
    rule termlist { <term> * % ',' }
    token term { <identifier> | <number> }
    token number { \d+ }
    token identifier { <:alpha> \w* }
    token ws { <!ww> \h* }
}
Now we demand that variables can only be used after they’ve been assigned to, so that the following input would be invalid, because b is not declared in the second line, where it’s used:
a = 1
c = a, 5, b
b = 2
To maintain a symbol table, we need three new elements: a declaration of the symbol table, some code that adds a variable name to the symbol table when the assignment has been parsed, and finally a check whether a variable has been declared at the time we come across it in a term list:
grammar VariableLists {
    token TOP {
        :my %*SYMBOLS;
        <statement>*
    }
    token ws { <!ww> \h* }
    rule statement {
        <identifier>
        { %*SYMBOLS{ $<identifier> } = True }
        '=' <termlist> \n
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ %*SYMBOLS{ $<identifier> } }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}
In the token TOP, :my %*SYMBOLS; declares a variable. Declarations in regexes start with a colon (:) and end with a semicolon (;). In between, they look like normal declarations in Perl 6. The % sigil signals that the variable is a hash, a mapping of string keys to values. The * makes it a dynamic variable, one that is not limited to the current scope but also visible to code (or regexes, which are also code) called from the current scope. Since this is an unusually large scope, it is customary to choose a variable name in CAPITAL LETTERS.
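Dynamic variables are not limited to grammars; a small standalone sketch (with invented sub and variable names, not from the book) shows the lookup through the caller chain:

```perl6
sub callee() {
    # %*CONFIG is not declared here; the dynamic lookup
    # finds it in the scope of whoever called us.
    say %*CONFIG<mode>;
}

sub caller-sub() {
    my %*CONFIG = mode => 'demo';
    callee();           # prints "demo"
}

caller-sub();
```

This is exactly the mechanism that lets the statement and variable regexes see the %*SYMBOLS hash declared in TOP.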
The second part, adding a symbol to the symbol table, happens in the rule statement:
rule statement {
    <identifier>
    { %*SYMBOLS{ $<identifier> } = True }
    '=' <termlist> \n
}
Inside the curly braces is regular (non-regex) Perl 6 code, so we can use it to manipulate the hash %*SYMBOLS. The expression $<identifier> accesses the capture for the variable name. Thus, if this rule parses a variable a, this statement sets %*SYMBOLS{ 'a' } = True.
The placement of the code block is relevant. Putting it before the call to termlist means that the variable is already known when the term list is parsed, so it accepts input like a = 2, a. If we call termlist first, this kind of input is rejected.
Speaking of rejection, this part happens in token variable. term now calls the new token variable (previously it called identifier directly), and variable validates that the symbol has been declared before:
token term { <variable> | <number> }
token variable {
    <identifier>
    <?{ %*SYMBOLS{ $<identifier> } }>
}
You might remember from earlier examples that <?{ ... }> executes a piece of Perl 6 code, and fails the parse if it returns a false value. If $<identifier> is not in %*SYMBOLS, this is exactly what happens. At this time the non-backtracking nature of tokens is important. If the variable being parsed is abc, and a variable a is in %*SYMBOLS, backtracking would try shorter matches for <identifier> until it hits a, and then succeeds.
Since %*SYMBOLS is declared in token TOP, you have to duplicate this declaration when you try to call rules other than TOP from outside the grammar. Without a declaration such as my %*SYMBOLS;, a call like
VariableLists.parse('abc', rule => 'variable');
dies with
Dynamic variable %*SYMBOLS not found
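A sketch of the corresponding workaround: declare (and, if needed, populate) the dynamic variable in the scope that makes the call, so the regex can find it through the caller chain:

```perl6
# Declaring %*SYMBOLS here satisfies the dynamic lookup
# inside token variable when we bypass TOP.
my %*SYMBOLS;
%*SYMBOLS<abc> = True;
say so VariableLists.parse('abc', rule => 'variable');
```

Pre-seeding the hash with abc lets the <?{ ... }> assertion in token variable succeed; with an empty hash, the parse would fail instead of dying.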
Many programming languages have the concept of a lexical scope. A scope is the area in a program where a symbol is visible. We call a scope lexical if the scope is determined solely by the structure of the text (and not, say, runtime features of the program).
Scopes can typically be nested. A variable declared in one scope is visible in this scope, and in all inner, nested scopes (unless an inner scope declares a variable of the same name, in which case the inner declaration hides the outer).
Coming back to the toy language of lists and assignments, we can introduce a pair of curly braces to denote a new scope, so this is valid:
a = 1
b = 2
{
    c = a, 5, b
}
but the next example is invalid, because it declares b only in an inner scope, and so it is not visible in the outer scope:
a = 1
{
    b = 2
}
c = a, 5, b
To implement these rules in a grammar, we can make use of an important observation: dynamic scoping in a grammar corresponds to lexical scoping in the text it parses. If we have a regex block that parses both the delimiters of a scope and the things inside that scope, its dynamic scope is confined to all of the regexes it calls (directly and indirectly), and that is also the extent of the lexical scope it matches in the input text.
Let’s take a look at how we can implement dynamic scoping:
grammar VariableLists {
    token TOP {
        :my %*SYMBOLS;
        <statement>*
    }
    token ws { <!ww> \h* }
    token statement {
        | <declaration>
        | <block>
    }
    rule declaration {
        <identifier>
        { %*SYMBOLS{ $<identifier> } = True; }
        '=' <termlist> \n
    }
    rule block {
        :my %*SYMBOLS = CALLERS::<%*SYMBOLS>;
        '{' \n*
            <statement>*
        '}' \n*
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ %*SYMBOLS{ $<identifier> } }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}
There are a few changes to the previous version of this grammar: the rule statement has been renamed to declaration, and the new rule statement parses either a declaration or a block.
All the interesting bits happen in the block rule. The line :my %*SYMBOLS = CALLERS::<%*SYMBOLS>; declares a new dynamic variable %*SYMBOLS and initializes it with the previous value of that variable. CALLERS::<%*SYMBOLS> looks through the caller, and the caller’s caller, and so on, for a variable %*SYMBOLS, and thus looks up the value corresponding to the outer scope. The initialization creates a copy of the hash, such that changes to one copy do not affect the other copies.
Let’s take a look at what happens when this grammar parses the following input:
a = 1
b = 2
{
    c = a, 5, b
}
After the first two lines, %*SYMBOLS has the value {a => True, b => True}. When rule block parses the opening curly bracket on the third line, it creates a copy of %*SYMBOLS. The declaration of c on the fourth line inserts the pair c => True into that copy. After rule block parses the closing curly brace on the last line, it exits successfully, and the copy of %*SYMBOLS goes out of scope. This leaves us with the earlier version of %*SYMBOLS (with only the keys a and b), which then goes out of scope when TOP exits.
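The bookkeeping just described — copy the symbol table when a scope opens, let the copy expire when the scope closes — is not grammar-specific. Here is a minimal Python sketch of the same idea; the function name and the toy encoding ('decl:x' declares x, 'use:x' uses it, a nested list is a block) are invented for this illustration:

```python
def check_scopes(statements, symbols=None):
    # 'decl:x' declares x, 'use:x' requires a declared x, and a nested
    # list opens a new scope. Each scope works on a *copy* of its
    # parent's symbol table, so inner declarations expire with the scope.
    symbols = dict(symbols or {})        # copy on scope entry
    for st in statements:
        if isinstance(st, list):
            check_scopes(st, symbols)    # inner scope gets its own copy
        elif st.startswith('decl:'):
            symbols[st[5:]] = True
        elif st.startswith('use:'):
            if st[4:] not in symbols:
                raise NameError("undeclared variable " + st[4:])

# a = 1; b = 2; { c = a, 5, b }  -- valid
check_scopes(['decl:a', 'decl:b', ['decl:c', 'use:a', 'use:b']])

# a = 1; { b = 2 }; c = a, 5, b  -- invalid: b expired with its scope
try:
    check_scopes(['decl:a', ['decl:b'], 'use:a', 'use:b'])
except NameError as e:
    print(e)   # undeclared variable b
```

The dict copy on entry plays the role of :my %*SYMBOLS = CALLERS::<%*SYMBOLS>;, and function-call nesting plays the role of the grammar's dynamic scope.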
Using dynamic variables for managing symbol tables usually works pretty well, but there are some edge cases where a more explicit approach works better. Such edge cases include those where there are so many symbols that copying becomes prohibitively expensive, or where more than the top-most scope must be inspected, or when copying the symbol table is impractical for other reasons.
In such cases, you can write a class for your symbol table (which in the simplest case uses an array as a stack of scopes) and explicitly call methods on it when entering and leaving scopes, when declaring a variable, and when checking whether a variable is known in a scope:
class SymbolTable {
    has @!scopes = {}, ;
    method enter-scope() {
        @!scopes.push({})
    }
    method leave-scope() {
        @!scopes.pop();
    }
    method declare($variable) {
        @!scopes[*-1]{$variable} = True
    }
    method check-declared($variable) {
        for @!scopes.reverse -> %scope {
            return True if %scope{$variable};
        }
        return False;
    }
}

grammar VariableLists {
    token TOP {
        :my $*ST = SymbolTable.new();
        <statement>*
    }
    token ws { <!ww> \h* }
    token statement {
        | <declaration>
        | <block>
    }
    rule declaration {
        <identifier>
        { $*ST.declare( $<identifier> ) }
        '=' <termlist> \n
    }
    rule block {
        '{' \n*
        { $*ST.enter-scope() }
        <statement>*
        { $*ST.leave-scope() }
        '}' \n*
    }
    rule termlist { <term> * % ',' }
    token term { <variable> | <number> }
    token variable {
        <identifier>
        <?{ $*ST.check-declared($<identifier>) }>
    }
    token number { \d+ }
    token identifier { <:alpha> \w* }
}
The class SymbolTable has the private array attribute @!scopes, which is initialized with a list containing a single, empty hash {}. Entering a scope means pushing an empty hash onto this array, and leaving the scope removes it again through the pop method call. A variable declaration adds its name to the top-most hash, @!scopes[*-1].
Checking for the presence of a variable must not only consider the top-most hash, because variables are inherited by inner scopes. Here we go through all the scopes in reverse order, from inner-most to outer-most. The order of traversal is not relevant for a simple Boolean check, but if you need to look up information associated with the variable, it is important to adhere to this order so that you reference the correct one.
Token TOP creates a new object of class SymbolTable, declaration calls the declare method, and token variable calls method check-declared. The rule block calls enter-scope before parsing the statement list, and leave-scope afterwards. This works, but only if the statement list can be parsed successfully; if not, rule block fails before it manages to call leave-scope.
Perl 6 has a safety feature for such situations: if you prefix a statement with LEAVE, Perl 6 calls it for you on routine exit, in all circumstances where this is possible (even if an exception is thrown). Since the LEAVE phaser only works in regular code, not in regexes, we need to wrap the regex in a method:
method block {
    $*ST.enter-scope();
    LEAVE $*ST.leave-scope();
    self.block_wrapped();
}

rule block_wrapped {
    '{' \n*
    <statement>*
    '}' \n*
}
Now we have the same robustness as the approach with dynamic variables, and more flexibility to add extra code to the symbol table, at the cost of more code and increased effort.
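If the LEAVE phaser is unfamiliar, its guarantee is the same one that try/finally gives in other languages. A rough Python sketch of the pattern (the class and function names are invented for illustration, not part of the grammar above):

```python
class SymbolTable:
    # Toy stack-of-scopes symbol table.
    def __init__(self):
        self.scopes = [{}]
    def enter_scope(self):
        self.scopes.append({})
    def leave_scope(self):
        self.scopes.pop()

def parse_block(table, parse_statements):
    # Enter a scope, run the parser, and guarantee the scope is left
    # again even when parsing fails -- the try/finally plays the role
    # of the LEAVE phaser.
    table.enter_scope()
    try:
        return parse_statements()
    finally:
        table.leave_scope()

def failing_parser():
    raise ValueError("bad input")

st = SymbolTable()
try:
    parse_block(st, failing_parser)
except ValueError:
    pass
print(len(st.scopes))  # 1 -- the scope was popped despite the exception
```

Without the finally clause (or the LEAVE phaser), a parse failure would leave a stale scope on the stack and corrupt every later lookup.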
Perl 6 grammars are a declarative way to write recursive descent parsers. Without backtracking, they are predictive; at each point, we know what list of tokens to expect.
The recursive nature of grammars comes with the risk of left recursion, a situation where a recursive path does not consume any characters, and so leads to an infinite loop.
In spite of the top-down nature of grammars, writing them typically happens from the bottom up: starting with lexical analysis, and then moving up to parsing larger structures.
Complex languages require additional state for successful and precise parsing. We have seen how you can use dynamic variables to hold state in grammars, how their scope corresponds to lexical scoping in the input, and how symbol tables can be written and integrated into grammars.
Like a Swiss-army knife, but with much more power.↩
At this point it is crucial that identifier does not parse its surrounding whitespace. Hence the principle that tokens do not care about whitespace, and that the rules which call those tokens parse the whitespace.↩
In this case, this would be harmless, because no other rule could match the rest of the variable, leading to a parse error nonetheless. But in more complicated cases, this kind of unintended backtracking can lead to errors that are very puzzling for the maintainer of the grammar.↩
One question we occasionally get on the phenomenal #perl6 IRC channel is about Perl 6 books. It turns out, some folks don’t want to learn just from tutorials, blog posts and docs. Last year, we didn’t have any good answers to that question.
A few months ago, there seemed to be a flurry of such questions, and at the same time, rumours about new book projects started to spread.
If I remember correctly, the first rumour was when Laurent Rosenfeld contacted me in June, asking for feedback on a manuscript. He had translated a good 200 pages of the Think Python book to Perl 6. This is primarily a book teaching programming to beginners. Later I was happy to hear that a major publisher has accepted the manuscript. The manuscript is now finished, and under review. So I guess we’ll see e-book and print versions of this book in due time.
Then brian d foy opened a kickstarter to raise funds for a “Learning Perl 6” book. By the time this is published, the kickstarter project should still be open, so I warmly encourage you to support this. Having a book by a prominent member of the Perl 5 community, and author of several good Perl books would surely be a boon for the Perl 6 community.
Before the publication of brian’s project, I’ve been mulling about writing my own Perl 6 book. In the past, I’ve tried that once already, and failed. But I had more success with another book project of mine, so I decided to give it another shot. The working title is Perl 6 by example. Content for the book is published in form of blog posts, and later turned into book chapters.
Later I learned that Ken Youens-Clark had written a book on metagenomics that spends about 150 pages explaining Perl 6. And it's free: you can download the PDF, epub and mobi versions right away! If you've followed the advent calendar, you might have read Ken's earlier post on Bioinformatics.
In summary, we have one book featuring Perl 6 content already available, one in progress with its manuscript "feature complete", one in the kick-starting phase which you can support, and a fourth being written right now, with parts pre-published in the form of blog posts.
If you want to keep up-to-date with news on Perl 6 books, you can sign up for the low volume Perl 6 book mailing list (less than one mail per month).
I hope that in next year’s advent calendar, we’ll see four reviews of Perl 6 books.
Please fasten your seat belt for your annual dive into Perl 6.
As has been customary the last six years, we start with a brief overview of the state of Perl 6.
Last year's State of Perl 6 anticipated an upcoming production release of Perl 6; that is still scheduled for this Christmas. Last year's summary also identified major areas of work to be done for this release.
The 2015.05 release of Rakudo introduced NFG or "Normal Form Grapheme", which means that strings in Perl 6 are no longer based on Unicode codepoints, but on grapheme clusters. Grapheme clusters can consist of base characters, like the latin lower-case c, and combining characters, like the "combining cedilla" character. Together they form a single visual character, "c with cedilla", ç. This character also exists as a single codepoint, but other combinations of base character and combining character don't. Yet with this new feature, Perl 6 still treats the cluster of a base character plus one (or more) combining characters as a single character, so regex matches and substr won't tear them apart.
In September, Rakudo shipped with the GLR or Great List Refactoring in place. This mostly means that the rules for using and accessing nested data structures are now much simpler and more consistent. Under the hood we also have a sane and powerful iterator model, and a new type Seq for lazy value streams that don't necessarily memorize old values on iteration.
Finally, the September release introduced native, shaped arrays (or NSA). This allows you to write
my int32 @matrix[4;5]
which allocates a contiguous block of 20 32-bit values, but is still usable as a two-dimensional matrix in Perl 6 land. This paves the road towards memory-efficient linear algebra (and other applications, of course).
But, development didn't stop there. The upcoming December release of Rakudo brings us automatic precompilation of modules and installation repositories.
The compiler is not the only thing progressing. Where the Perl 6 Modules page showed around 270 modules a year ago (end of November 2014), we are now at about 460 modules. I'm also happy to report that there are now two module installers, panda and zef.
We also now have decent documentation coverage, at least on built-in types; other areas such as tutorials and material on language features are still a bit sparse. Other documentation efforts such as Learn Perl 6 in Y minutes and perl6intro.com have sprung up, and enjoy popularity.
I'm sure there has been more Perl 6 activity that would be worth reporting, but the excitement about the upcoming release makes a calm retrospective a bit harder than usual.
Stay tuned for more, and have fun!
In the emerging tradition of writing about sort on the 23rd, let me elaborate on a particularly silly kind of sorting algorithm: sleepsort.
The idea is that to sort a list of numbers, you spawn a separate process (or thread), sleep for as long as the number specifies, and then print the result. That way the numbers are printed in ascending order.
Rakudo on the MoarVM backend has threads, so let's put them to good use:
use v6;
my @unsorted = (1..10).pick(5);
say "Unsorted: @unsorted[]";
await @unsorted.map: -> $t {
start {
sleep $t;
say $t;
}
};
If you run it, you should get output like this:
$ ./perl6-m sleepsort-01.p6
Unsorted: 1 7 5 8 3
1
3
5
7
8
How did we get there? (1..10).pick(5) randomly picks (without replacement) five numbers from 1 to 10. start { ... } starts a thread and returns a Promise object. We do that for all the numbers, and await blocks until all promises passed to it are fulfilled (or broken, but that won't happen here). In other words: until all the newly spawned threads have finished.
Now let's make it web scale! Well, sort (sic) of.
First, an obvious improvement is not to sleep for the whole second. Let's just scale the time with a constant factor: sleep $t / 10, and we're about ten times as fast!
But there's another scaling problem: if you want to sort hundreds of thousands of numbers with sleep sort (you'd do that, right?), our program would spawn as many threads, wasting lots of memory. Also, sleep maps pretty much directly to the underlying libc call, and thus blocks the thread. We can do better:
use v6;
my @unsorted = (1..10).pick(5);
await @unsorted.map: -> $t {
Promise.in($t / 10 ).then({ say $t });
};
Promise.in($s) creates a Promise that will be kept in $s seconds. .then creates a chained promise whose block is executed when the first one is kept.
This also removes the need to spawn processes explicitly. The standard scheduler takes care of that. We're basically web scale!
Now it's also time to address an issue of correctness. The sleepsort algorithm is a kind of divide-and-conquer approach, but there's no joining of the results: no single point in the program can access the entire sorted list. To fix that, we use a Channel, a thread-safe queue:
use v6;
my @unsorted = (1..10).pick(5);
my $channel = Channel.new;
await @unsorted.map: -> $t {
Promise.in($t / 10 ).then({ $channel.send($t) });
};
$channel.close;
say $channel.list;
This prints the sorted list all from the main thread.
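For readers more at home outside Perl 6, the same pattern — workers push into a thread-safe queue, the main thread waits for them and then drains it — can be sketched in Python; the name sleepsort and the factor parameter are mine, and the timing caveats discussed below apply just as much:

```python
import threading
import time
import queue

def sleepsort(values, factor=0.1):
    # One thread per value: sleep value*factor seconds, then push the
    # value onto a thread-safe queue (standing in for a Channel).
    q = queue.Queue()
    def worker(v):
        time.sleep(v * factor)
        q.put(v)
    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:   # like await: block until all workers finish
        t.join()
    return [q.get() for _ in values]   # drain in arrival order

print(sleepsort([3, 1, 2]))
```

As in the Perl 6 version, the result order depends on the scheduler honoring the sleep times, so this is a joke algorithm, not something to rely on.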
So now it can be encapsulated in a nice subroutine.
sub sleepsort(*@values, :$factor) {
my $channel = Channel.new;
await @values.map: -> $t {
Promise.in($t * $factor ).then({ $channel.send($t) });
};
$channel.close;
$channel.list;
}
Another issue of correctness remains: both sleep and Promise.in only specify a minimal time to sleep; longer, implementation-dependent sleep times are possible. If $factor is too small, the promises might be executed out of the desired order.
So let's find the minimal factor with which it still works:
my $previous = Inf;
for 0.1, */1.5 ... * -> $factor {
say "Trying $factor";
my $success = [<=] sleepsort(:$factor, @unsorted);
say $success ?? 'Success' !! 'Failed';
unless $success {
say "Last good factor: ", $previous;
last;
}
$previous = $factor;
}
On my machine, this produces the following output:
Trying 0.1
Success
Trying 0.066667
Success
Trying 0.044444
Success
Trying 0.029630
Success
Trying 0.019753
Success
Trying 0.013169
Success
Trying 0.008779
Success
Trying 0.005853
Failed
Last good factor: 0.008779
So a :$factor of around 0.01 or 0.009 seems to work on my setup.
Your output/mileage may vary.
Over and out, sleeping until Christmas Eve.
Perl 6 is designed with extensibility in mind. And when you want to extend something, you often need to know as much as possible about the environment.
Today we’ll look at an aspect of finding out about the environment: introspecting symbol tables.
A symbol is something you refer to by name, so it could be a package, class, variable, routine, constant etc.
A symbol table is a data structure in which symbols can be looked up. Perl 6 has three main types of symbol tables, differentiated by scope: lexical symbols live in the lexical pad, package symbols live in a Stash, and methods in a method table.
Enough theory, time for action:
$ perl6 -e 'my $x = 42; say MY::.keys'
$x $! $/ $_ GLOBALish EXPORT $?PACKAGE ::?PACKAGE $=pod !UNIT_MARKER
MY is a pseudo package representing the current scope, and appending two colons gives us its symbol table. Which in turn roughly behaves like a Hash, so we can use a method like keys to find all symbols in that table, or look up a string there:
$ perl6 -e 'my $x = 42; say MY::<$x>'
42
A complicated way to say $x, but you can use that to look up symbols by name:
$ perl6 -e 'my $x = 42; my $var = q[$x]; say MY::{$var}'
42
Or if you don’t care if it comes from the current lexical scope, just from somewhere:
$ perl6 -e 'my $x = 42; my $var = q[$x]; say ::($var)'
42
No need to tell you that in general, this is a bad idea; just use it when you have exceptional circumstances.
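The idea of a by-name lookup into the current scope's table is not unique to Perl 6. Python's globals() offers a loose analogue — much less structured than Perl 6's pseudo packages, and only for module-level symbols, but close enough to illustrate the mechanics:

```python
x = 42

# direct lookup by literal name, like MY::<$x>
print(globals()['x'])          # 42

# indirect lookup through a variable, like ::($var)
name = 'x'
print(globals()[name])         # 42

# enumerate visible symbols, like MY::.keys
print('x' in globals())        # True
```

And as in Perl 6, indirect lookups are best kept for exceptional circumstances.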
But wait, what were all those other symbols from the first line? $!, $/ and $_ are the three "special" variables for errors, matches and the current topic; the rest is stuff that doesn't need to be in lexical scope, but the compiler puts it there because it makes things easier.
Outer lexicals are accessible through the OUTER:: pseudo package:
$ perl6 -e 'my $x = 1; { my $x = 2; say OUTER::<$x>; say $x }'
1
2
Instead of OUTER::<$x>, you can also say $OUTER::x. In general, you can move the sigil to the front and omit the angle brackets if the symbol name is a literal.
Package declarations and explicitly our-scoped symbols generally go into a stash, with the top-most being GLOBAL::
class A {
class B { }
}
say GLOBAL::<A>; # (A)
say A; # (A)
say A::.keys; # B
Not only does the double colon give you the symbol table; you can also get it with .WHO (which also works for non-literals). So a long way to write A::B is
say A::B === GLOBAL::<A>.WHO<B>; # True
Finally, methods live in classes or roles, and you can use the meta object to introspect them; those generally respect inheritance and role composition, so they are a bit more complicated than stashes.
my $m = Str.^can('split')[0];
say $m.^name; # Method
say $m('a1B', /\d/).perl; # ("a", "B").list
Here .^can(name) returns a list of methods on that object with the given name. It's a list because it includes methods from superclasses, so there can be more than one. To get a list of all available methods, you can use .^methods and .^methods(:all):
$ perl6-m -e 'say Str.^methods(:all).grep: {.name ~~ /^m/ }'
match match map min max minmax
.^methods(:all) includes methods from all superclasses; without :all, methods from the classes Cool, Any and Mu are excluded.
This concludes our short tour through the land of the symbol table. There is more to see (for example the EXPORT package, and various pseudo packages), and the daring can go to the design documents or play with Rakudo's implementation, which is fairly robust.
Welcome to the 6th annual edition of the Perl 6 advent calendar, code name 2014.
In 2014, MoarVM became the de facto standard backend for the Rakudo Perl 6 compiler. Parrot and the JVM are still supported as backends, but MoarVM provides much lower memory usage, faster startup time, and is significantly faster than Parrot at runtime.
Rakudo on MoarVM also has concurrency support, including reactive programming.
Much work has been done to improve performance, some in MoarVM, some in Rakudo and NQP. Maybe most notable is the JIT compiler for MoarVM that came out of the Google Summer of Code project by Bart Wiegmans.
Another Google Summer of Code project by Filip Sergot brought us HTTP::UserAgent, a versatile HTTP client library with SSL/TLS support.
During the Austrian Perl Workshop in fall 2014, many of the Perl 6 and Rakudo core contributors met, and identified three roadblocks for a "final" Perl 6.0 release: GLR, NFG and NSA.
Here GLR stands for the Grand List Refactoring, a plan to make the list-y types more transparent, list iteration faster, and more obvious when a list will flatten.
NFG is the Normal Form Grapheme, a plan for implementing grapheme-based string indexing.
Finally, our NSA has nothing to do with surveillance. Natively Shaped Arrays are a flexible feature for declaring typed, potentially multi-dimensional arrays, potentially with pre-defined dimensions. It will make memory-efficient matrix storage and operations possible, for example.
With these three blockers defined, Larry Wall submitted a talk to FOSDEM called Get ready to party!, predicting that 2015 will be the year that Perl 6 will get a production release.
Sorry, this was meant to become a summary of the current state of Perl 6, and it derailed into an outlook. To allow me to keep the "state" in the title, let me just tell you that Rakudo on the MoarVM is quite fun to work with. It's fast enough for small (and sometimes even mid-sized) tasks, module installation works fairly reliably, and the range of available modules has also increased.
Also I feel that any attempt to summarize the progress of this awesome community is bound to be very incomplete; I hope that my fellow Perl 6 hackers will fill in some details in the upcoming 23 posts.
Have a nice pre-Christmas time, and enjoy the show!
Most languages or libraries that provide a generic sort routine allow you to specify a comparator, that is a callback that tells the sort routine how two given elements compare. Perl is no exception.
For example in Perl 5, which defaults to lexicographic ordering, you can request numeric sorting like this:
use v5;
my @sorted = sort { $a <=> $b } @values;
Perl 6 offers a similar option:
use v6;
my @sorted = sort { $^a <=> $^b }, @values;
The main difference is that the arguments are not passed through the global variables $a and $b, but rather as arguments to the comparator. The comparator can be anything callable, that is a named or anonymous sub, or a block. The { $^a <=> $^b } syntax is not special to sort; I have just used placeholder variables to show the similarity with Perl 5. Other ways to write the same thing are:
my @sorted = sort -> $a, $b { $a <=> $b }, @values;
my @sorted = sort * <=> *, @values;
my @sorted = sort &infix:«<=>», @values;
The first one is just another syntax for writing blocks, * <=> * uses the Whatever star to automatically curry its arguments, and the final one directly refers to the routine that implements the <=> "space ship" operator (which does numeric comparison).
But Perl strives not only to make hard things possible, but also to make simple things easy. Which is why Perl 6 offers more convenience. Looking at sorting code, one can often find that the comparator duplicates code. Here are two common examples:
# sort words by a sort order defined in a hash:
my %rank = a => 5, b => 2, c => 10, d => 3;
say sort { %rank{$^a} <=> %rank{$^b} }, 'a'..'d';
# ^^^^^^^^^^ ^^^^^^^^^^ code duplication
# sort case-insensitively
say sort { $^a.lc cmp $^b.lc }, @words;
# ^^^^^^ ^^^^^^ code duplication
Since we love convenience and hate code duplication, Perl 6 offers a shorter solution:
# sort words by a sort order defined in a hash:
say sort { %rank{$_} }, 'a'..'d';
# sort case-insensitively
say sort { .lc }, @words;
sort is smart enough to recognize that the code object now only takes a single argument, and uses it to map each element of the input list to a new value, which it then sorts with the normal cmp semantics. But it returns the original elements in the new order, not the transformed values. This is similar to the Schwartzian Transform, but very convenient since it's built in.
So the code block now acts as a transformer, not a comparator.
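If you know Python, this transformer style is exactly what key functions do there: sorted() likewise transforms each element once and returns the original elements in the new order. A quick illustrative sketch of the two duplication examples above:

```python
# Sort words by a rank defined in a dict: the callback is a transformer
# (a "key function"), not a comparator.
rank = {'a': 5, 'b': 2, 'c': 10, 'd': 3}
print(sorted('abcd', key=lambda w: rank[w]))         # ['b', 'd', 'a', 'c']

# Case-insensitive sort: each element is transformed once, but the
# original strings come back in the new order.
print(sorted(['Foo', 'bar', 'Baz'], key=str.lower))  # ['bar', 'Baz', 'Foo']
```

One difference worth noting: Python's sorted always wants a one-argument key function, whereas Perl 6's sort inspects the signature and accepts either a transformer or a two-argument comparator.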
Note that in Perl 6, cmp is smart enough to compare strings with string semantics and numbers with number semantics, so producing numbers in the transformation code generally does what you want. This implies that if you want to sort numerically, you can do that by forcing the elements into numeric context:
my @sorted-numerically = sort +*, @list;
And if you want to sort in reverse numeric order, simply use -* instead.
The unary sort is very convenient, so you might wonder why the Perl 5 folks haven't adopted it yet. The answer is that the sort routine needs to find out whether the callback takes one or two arguments, which relies on subroutine (or block) signatures, something not (yet?) present in Perl 5. Moreover, the "smart" cmp operator, which compares numbers numerically and strings lexicographically, requires a type system that Perl 5 doesn't have.
I strongly encourage you to try it out. But be warned: Once you get used to it, you'll miss it whenever you work in a language or with a library that lacks this feature.
There are a lot of events that are scheduled on particular days of the week each month, for example the regular Windows Patch Day on the second Tuesday of each month, or, in Perl 6 land, the Rakudo Perl 6 compiler release, which is scheduled for two days after the Parrot release day, which again is scheduled for the third Tuesday of the month.
So let's write something that calculates those dates.
The specification format I have chosen looks like 3rd tue + 2
for the Rakudo release date, that is, two days after the 3rd Tuesday of each month (note that this isn't always the same as the 3rd Thursday).
Parsing it isn't hard with a simple grammar:
grammar DateSpec::Grammar {
rule TOP {
[<count><.quant>?]?
<day-of-week>
[<sign>? <offset=count>]?
}
token count { \d+ }
token quant { st | nd | rd | th }
token day-of-week { :i
[ mon | tue | wed | thu | fri | sat | sun ]
}
token sign { '+' | '-' }
}
As you can see, everything except the day of the week is optional, so sun would simply be the first Sunday of the month, and 2 sun - 1 the Saturday before the second Sunday of the month.
Now it's time to actually turn this specification into a data structure that does something useful. And for that, a class wouldn't be a bad choice:
my %dow = (mon => 1, tue => 2, wed => 3, thu => 4,
fri => 5, sat => 6, sun => 7);
class DateSpec {
has $.day-of-week;
has $.count;
has $.offset;
multi method new(Str $s) {
my $m = DateSpec::Grammar.parse($s);
die "Invalid date specification '$s'\n" unless $m;
self.bless(
:day-of-week(%dow{lc $m<day-of-week>}),
:count($m<count> ?? +$m<count>[0] !! 1),
:offset( ($m<sign> eq '-' ?? -1 !! 1)
* ($m<offset> ?? +$m<offset> !! 0)),
);
}
We only need three pieces of data from those date specification strings: the day of the week, whether the 1st, 2nd, 3rd etc. is wanted (here named $.count), and the offset. Extracting them is a wee bit fiddly, mostly because so many pieces of the grammar are optional, and because the grammar allows a space between the sign and the offset, which means we can't use the Perl 6 string-to-number conversion directly.
There is a cleaner but longer method of extracting the relevant data using an actions class.
The closing } is missing because the class doesn't do anything useful yet; that will be added now. The most basic operation is to find the specified date in a given month. Since Perl 6 has no built-in type for months, we use a Date object whose .day is one, that is, a Date object for the first day of the month.
method based-on(Date $d is copy where { .day == 1}) {
++$d until $d.day-of-week == $.day-of-week;
$d += 7 * ($.count - 1) + $.offset;
return $d;
}
The algorithm is quite simple: proceed to the next date (++$d) until the day of week matches, then advance as many weeks as needed, plus as many days as needed for the offset. Date objects support addition and subtraction of integers, which are interpreted as numbers of days to add or subtract. Handy, and exactly what we need here. (The API is blatantly copied from the Date::Simple Perl 5 module.)
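The based-on algorithm is compact enough to restate in another language. Here is a rough Python equivalent (spec_date is an invented name, and weekday numbering follows Python's Monday=0 convention rather than Perl 6's Monday=1):

```python
from datetime import date, timedelta

def spec_date(year, month, weekday, count=1, offset=0):
    # Date of the count-th given weekday of the month, shifted by
    # offset days. Python counts Monday=0 .. Sunday=6.
    d = date(year, month, 1)
    while d.weekday() != weekday:
        d += timedelta(days=1)          # like ++$d until it matches
    return d + timedelta(days=7 * (count - 1) + offset)

# "3rd tue + 2" (the Rakudo release rule) for December 2013:
print(spec_date(2013, 12, 1, count=3, offset=2))  # 2013-12-19
```

Just like the Date arithmetic in Perl 6, timedelta addition interprets plain day counts, which keeps the whole computation to three lines.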
Another handy convenience method to implement is next, which returns the next date matching the specification, on or after a reference date.
method next(Date $d = Date.today) {
my $month-start = $d.truncated-to(month);
my $candidate = $.based-on($month-start);
if $candidate ge $d {
return $candidate;
}
else {
return $.based-on($month-start + $month-start.days-in-month);
}
}
}
Again there's no rocket science involved: try the date based on the month of $d, and if that's before $d, try again, but with the next month as base.
Time to close the class :-).
So, when is the next Rakudo release? And the next Rakudo release after Christmas?
my $spec = DateSpec.new('3rd Tue + 2');
say $spec.next;
say $spec.next(Date.new(2013, 12, 25));
Output:
2013-12-19
2014-01-23
The code works fine on Rakudo with both the Parrot and the JVM backend.
Happy recurring hollidates!