Day 12 – I just felt a disturbance in the switch

So I said “I’m going to advent post about the spec change to given/when earlier this year, OK?”

And more than one person said “What spec change?”

And I said, “Exactly.”

We speak quite proudly about “whirlpool development” in Perl 6-land. Many forces push on a particular feature and force it to converge to an ideal point: specification, implementation, bugs, corner cases, actual real-world usage…

Perl 6 is not alone in this. By my count, the people behind the HTML specification have now realized at least twice that if you try to lead by specifying, you will find to your dismay that users do something different than you expected them to, and that in the resulting competition between reality and specification, reality wins out by virtue of being real.

I guess my point is: if you have a specification and a user community, often the right thing is to specify stuff that eventually ends up in implementations that the user community benefits from. But information can also come flowing back from actual real-world usage, and in some of those cases, it’s good to go adapting the spec.

Alright. What has happened to given/when since (say) some clown described them in an advent post six years ago?

To answer that question, first let me say that everything in that post is still true, and still works. (And not because I went back and changed it. I promise!)

There are two small changes, and they are both of the kind that enable new behavior, not restrict existing behavior.

First small change: we already knew (see the old post) that the switching behavior of when works not just in a given block, but in any “topicalizer” block such as a for loop or subroutine that takes $_ as a parameter.

given $answer {
    when "Atlantis" { say "that is CORRECT" }
    default { say "BZZZZZZZZZT!" }
}

for 1..100 {
    when * %% 15 { say "Fizzbuzz" }
    when * %% 3 { say "Fizz" }
    when * %% 5 { say "Buzz" }
    default { say $_ }
}

sub expand($_) {
    when "R" { say "Red Mars" }
    when "G" { say "Green Mars" }
    when "B" { say "Blue Mars" }
    default { die "Unknown contraction '$_'" }
}

But even subroutines that don’t take $_ as a parameter get their own lexical $_ to modify. So the rule is actually less about special topicalizer blocks and more about “is there something nice in $_ right now that I might want to switch on?”. We can even set $_ ourselves if we want.

sub seek-the-answer() {
    $_ = (^100).pick;
    when 42 { say "The answer!" }
    default { say "A number" }
}

In other words, we already knew that when (and default) were pretty separate from given. But this shows that the rift is bigger than previously thought. The switch statement logic is all in the when statements. In this light, given is just a handy topicalizer block for when we temporarily want to set $_ to something.

Second small change: you can nest when statements!

I didn’t see that one coming. I’m pretty sure I’ve never used it in the wild. But apparently people have! And yes, I can see it being very useful sometimes.

when * > 2 {
    when 4 { say 'four!' }
    default { say 'huge' }
}
default {
    say 'little'
}

You might remember that a when block has an implicit succeed statement at the end which makes the surrounding topicalizer block exit. (Meaning you don’t have to remember to break out of the switch manually, because it’s done for you by default.) If you want to override the succeed statement and continue past the when block, then you write an explicit proceed at the end of the when block. Fall-through, as it were, is opt-in.

given $verse-number {
    when * >= 12 { say "Twelve drummers drumming"; proceed }
    when * >= 11 { say "Eleven pipers piping"; proceed }
    # ...
    when * >= 5 { say "FIIIIIIVE GOLDEN RINGS!"; proceed }
    when * >= 4 { say "Four calling birds"; proceed }
    when * >= 3 { say "Three French hens"; proceed }
    when * >= 2 {
        say "Two turtle doves";
        say "and a partridge in a pear tree";
    }
    say "a partridge in a pear tree";
}

All that is still true, save for the word “topicalizer”, since we now realize that when blocks show up basically anywhere. The new rule is this: when makes the surrounding block exit. If you’re in a nested-when-type situation, “the surrounding block” is taken to be the innermost block that isn’t a when block. (So, usually a given block or similar.)
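To make the new rule concrete, here is a minimal sketch that records which branches actually run. The sub name trace-switch and the @trace array are made up for this illustration; the only thing being shown is that a matching inner when exits the whole given block, not just the outer when.

```raku
# Records which branches run, so the early exit is visible.
sub trace-switch(Int $n) {
    my @trace;
    given $n {
        when * > 2 {
            when 4  { @trace.push: 'four!' }
            default { @trace.push: 'huge' }
        }
        # Only reached when the outer `when` did not match.
        @trace.push: 'fell through';
        default { @trace.push: 'little' }
    }
    return @trace.join(', ');
}

say trace-switch(4);   # four!
say trace-switch(10);  # huge
say trace-switch(1);   # fell through, little
```

Note that 'fell through' never shows up for 4 or 10: the inner when (or default) leaves the given block entirely.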

It’s nice to see spec changes happening due to how things are being used in practice. This particular spec change came about because jnthn tried to implement a patch in Rakudo to detect topicalizer blocks more strictly, and he came across userland cases such as the above two, and decided (correctly) to align the spec with those cases instead of the other way around.

The given/when spec has been very stable, and so it’s rare to see a change happen in that part of the synopses. I find the change to be an improvement, and even something of a simplification. I’m glad Perl 6 is the type of language that adapts the spec to the users instead of vice versa.

Just felt like sharing this. Enjoy your Advent.

Day 11 – The Source will be with you, always

Reportings from a Learnathon

This past weekend I had the pleasure of hosting a Perl 6 learnathon with a friend who contacted me specifically to have a night of absorbing this new version of Perl. I thought it might be interesting to share some of what we learned during the process. I will begin by explaining the single line of code which ended up sparking most of our questions, and then show you some examples of where our evening took us.

Pick Whatever, rotor by five

As we opened our laptops, I was excited to show the newest example I had pushed to my Terminal::Print project. It’s taken quite some time to achieve, but asynchronous printing is now working with this module. It’s not fast, yet, but my long-sought multi-threaded “Matrix-ish” example is working. Each column is being printed from an individual thread. This approach of spinning off a bunch of new threads and then throwing them away is not efficient, but as a proof of concept I find it very exciting.


This line contains a few things that inspired questions. The first is the precise meaning of pick(*), which here means that we want to produce a randomized version of @columns. You can think of the Whatever here as meaning “as many as possible”. It triggers a different multi method code path which knows to use the invoking object’s own array size as the length of the list to pick.

The second part to explain was rotor. This is another one of those Perl 6 English-isms that at first felt quite strange but quickly became a favorite as I began visualizing a huge turbine rotor-ing all of my lists into whatever shape I desire whenever I use it. In this case, I want to assemble a list of 5-element arrays from a random version of @columns. By default, rotor will only give you back fully formed 5-element arrays, but by passing :partial we trigger the multi-method form that will also include a shorter array if the length of @columns is not divisible by 5. (‘Divisibility’ is easily queried in Perl 6, by the way. We phrase it as $n %% 5.)

Put another way: rotor is a list-of-lists generator, taking one list and chopping it up into chunks according to your specifications. My friend mentioned that this resembles a question that he asks in interviews, inviting an investigation into the underlying implementation.
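A small sketch of both calls on a stand-in list may help. The contents of @columns here are an assumption (just the numbers 1 to 12), chosen so that the length is not divisible by 5.

```raku
my @columns = 1..12;

# rotor(5) keeps only complete chunks; :partial also keeps the leftover.
my @full    = @columns.rotor(5);
my @partial = @columns.rotor(5, :partial);

say @full.elems;     # 2
say @partial.elems;  # 3
say @partial[2];     # (11 12)

# pick(*) returns every element exactly once, in random order.
my @shuffled = @columns.pick(*);
say @shuffled.sort.join(',') eq @columns.join(',');  # True
```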

I’ve always considered Rakudo’s Perl 6-implemented-in-Perl 6 approach as a secret weapon that often goes overlooked in assessment of Perl 6’s viability. Even with the current reliance on NQP, there is a great deal of Perl 6 which is implemented in the language itself. To this end, I could immediately open src/core/ in an editor and continue explaining the language by reading the implementation of the language. Not just explaining the implementation, which can be accomplished by the right combination of language/implementation language and polyglot coverage. I mean explaining the language by looking at how it is used to implement itself, by following different threads from questions that arise as a result of looking at that code.

A word of caution

Now, I don’t mean to imply that one’s initial exposure to core can’t be a shocking experience. You are looking at Perl 6 code that is written according to constraints that don’t exist in perl6 which arise from not being fully bootstrapped and performance concerns. These are expressed in core by NQP and relative placement in the compilation process, on the one hand, and in prototyping and hoop jumping, on the other.

In other words: you are not expected to write code like this, and core code is not an accurate representation of what Perl 6 code looks like in practice. It is 100% really Perl 6 code, though, and if you look at NQP stuff as funky library calls, everything you are seeing is possible in your own code. You just don’t normally need to.

Down the rabbit hole

From src/core/, here is the code for rotor:


And the code for pick:


These are not the actual implementations, mind you. Those live elsewhere. But already these signatures inspire some questions.

What is this |c thing in the signature?

This represents a Capture, which is an object representing the arguments passed into the method call. In this case it is being used such that no unpacking/containerization occurs. Rather we simply accept what we have been given and pass them as a single parameter to self.list.rotor without looking at them.

In the proto signature for pick we see that there is no name for the Capture, but rather a bare ‘|‘ which tells the method dispatcher that there can be zero or more arguments. Put another way: there are no mandatory arguments that apply to all the pick candidates.
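As a rough illustration of the forwarding pattern (not the core code itself), here is a sub that accepts a Capture and passes it along untouched. The name chunk-alphabet is hypothetical.

```raku
# Hypothetical wrapper: accepts any arguments and hands them on unexamined.
sub chunk-alphabet(|c) {
    ('a'..'g').rotor(|c)
}

say chunk-alphabet(3);            # ((a b c) (d e f))
say chunk-alphabet(3, :partial);  # ((a b c) (d e f) (g))
```

The wrapper never needs to know that :partial exists; whatever rotor accepts, it accepts.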

What is this proto thing?

The answer is that it is a clarification that you can apply to your multi methods that constrains them to a specific “shape”.  It is commonly used in conjunction with a Capture, and in fact we see this in our pick example.

As the docs put it, a proto is ‘a way to formally declare commonalities between multi candidates’. The prototype for pick specifies that there are no mandatory arguments, but there might be. This is basically the default behavior for multi method dispatch, but here it allows us to specify the is nodal trait, which provides further information to the dispatcher relating to list-y behavior. Also, due to bootstrapping issues, all multi methods built into Rakudo need an explicit proto.

In practice we do not need either of these for our code. But they are there when you need them. One example of a trait that you might use regularly is the pure trait: when a given input has a guaranteed output, you can mark the routine as pure, which allows the compiler to, for example, evaluate calls with constant arguments once at compile time instead of repeating the calculation.
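Outside of core, a proto with a bare | looks like the sketch below. The sub name describe and its three candidates are invented for the example; the point is only that the proto admits any argument list and the multi candidates decide what is actually accepted.

```raku
# The proto accepts anything; {*} hands control to the multi dispatcher.
proto sub describe(|) {*}

multi sub describe()               { 'nothing' }
multi sub describe(Int $n)         { "the integer $n" }
multi sub describe(Str $s, :$loud) { $loud ?? $s.uc !! $s }

say describe();             # nothing
say describe(42);           # the integer 42
say describe('hi', :loud);  # HI
```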

Midnight Golf

As promised, here are a few code examples from the learnathon.


This is using anonymous state variables, which are available to subroutines and methods but not to other Callables like blocks.  My friend shared my marvel at the power and flexibility of the Perl 6 parser to be able to handle statements like $--*5, when every character besides the 5 in that statement has a different meaning according to context. Meanwhile Perl 6 gives you defaults for your subroutine parameters by using the assignment operator.

Note that each bare $ in a subroutine creates/addresses a new, individual state variable. Some people will hate this, as they hate much that I appreciate about the language. These anonymous state variables are for situations where a name doesn’t matter, such as golfing on the command line. They can be confusing to get a full grasp of, though.
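Here is a minimal sketch of that behavior. These three subs are not the ones from our evening; the names are made up to show that a bare $ keeps its value between calls and that two bare $ in one sub are two separate variables.

```raku
sub ticket()       { $++ }         # one anonymous state variable
sub countdown()    { $-- * 5 }     # postfix decrement, then multiply
sub two-counters() { ($++, $++) }  # two independent state variables

say ticket() xx 3;     # (0 1 2)
say countdown() xx 3;  # (0 -5 -10)
say two-counters();    # (0 0)
say two-counters();    # (1 1)
```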

Here is another example we generated while exploring (and failing many times) to grasp the nuances of these dynamics.


Gone is the anonymous state variable. This is by necessity, because you can only refer to an anonymous state variable once in a subroutine. We’ve switched $a to be optional. The parens around the state declaration are necessary because trying to declare a variable in terms of itself is undefined behavior and Perl 6 knows that.

The same thing, expressed in slight variations but with the same meaning:


The bottom example shows that the assignment to zero in our initial more-state is actually unnecessary. The middle shows creating a variable and binding it to a state var, which is a rougher way to get the same behavior as openly declaring a state variable. The top example shows what might be considered the ‘politely verbose’ option.

Concluding thoughts

I was hoping to share more from the evening with you, but it’s been a lot of words already and we’ve only scratched the surface of the first example we examined! Instead, I recommend that you spend some time with src/core in an editor,  the perl6 repl in a terminal, and #perl6 in an IRC client (or web browser) and just explore.

This language is deep. Opening src/core is like diving into the deep end of the ocean from the top of a Star Destroyer. The reward for your exploration of the language, however, is an increasing mastery of a programming language that is designed according to a human-centric philosophy and which approaches a level of expressivity that, some would argue, is unparalleled in the world of computing.


Day 10 – Perl 6 Pod

Whenever you write code, you also write documentation, right? Wait, don’t answer that! Let me just pretend that you said “yes”.

As with Perl 5, Perl 6 has a built-in documentation format called Pod. In Perl 5, POD was short for “Plain Ol’ Documentation”, but in Perl 6 we just call it Pod and there’s no acronym. One of the great things about Pod in Perl 6 is that the compiler parses Pod for you. The resulting structure is essentially a tree (really, an Array of trees).

Let’s take a look at a few examples …

use v6;

=begin pod

=head1 NAME

example1.p6 - It's a script to show you how Pod works in Perl 6!

=end pod

sub MAIN {
    say $=pod.WHAT;
    say $=pod[0].WHAT;
}

When I run this script I get:

$ perl6 ./bin/example1.p6 

The $=pod variable is a reference to the parsed Pod from the current file.

From the code above we can see that the $=pod variable contains an Array. Each element of that array is a Pod node of some sort, and those nodes in turn will usually contain more Pod nodes, and so on. Essentially, Pod is parsed into a tree that you can recursively descend and examine.

The $=pod variable is an array because you can have several distinct chunks of Pod in a single file …

use v6;

=begin pod

=head1 NAME

example2.p6 - It's a script to show you how Pod works in Perl 6!

=end pod

sub MAIN {
    say $=pod.elems;
    say $=pod[0].WHAT;
    say $=pod[1].WHAT;
}

=begin pod


This is where I'd put some more stuff about the script if it did anything.

=end pod

When we run this script we get:

$ perl6 ./bin/example2.p6 

So there are now two elements in $=pod, one for each chunk of Pod delimited by our =begin pod/=end pod pairs.

Pod Syntax in Perl 6

Up until this point I’ve shown you some Pod without actually explaining what it is. Let’s take a look at some valid Pod constructs in Perl 6 so you can get a better sense of how it looks, both when you write it and when you’re writing code to process it.

Perl 6 Pod is purely semantic. In other words, you describe what something is, rather than how to present it. It’s left up to code that turns Pod into something else (HTML, Markdown, man pages) to decide how to present any given construct.

Perl 6 Pod has quite a bit more to it than we can cover here, so see Synopsis 26 for all the details. We’ll cover some of the highlights and then look more closely at how to process Pod programmatically.


Pod supports several different ways to declare a block. You can use =begin and =end markers …

=begin head1
Heading Goes Here
=end head1

Or you can use a =for marker …

=for head1
Heading Goes Here

These block styles allow you to include configuration information:

=begin head3 :include-in-toc
This will show up in TOC
=end head3

=for head4 :!include-in-toc
Not in the TOC

With a =for block, only the lines immediately following the =for are part of that block. Once there is an empty line, anything that follows is separate.

You can use an “abbreviated” block …

=head1 Heading Goes Here

… but abbreviated blocks don’t allow for configuration information. Like =for blocks, abbreviated blocks end at the first empty line.

There are many types of blocks, including table, code, I/O blocks (input and output), lists, and more.


Text that isn’t explicitly part of a block is considered to be a paragraph block …

use v6;

=begin pod

=begin head1


=end head1

This is a paragraph block.

=end pod

sub MAIN {
    say $=pod[0].contents[1].WHAT;
}

The output will be (Para), because the second element of contents is a plain paragraph.


Lists are much simpler in Perl 6 than they were in Perl 5 …

=begin pod

=item1 The first item
=item1 The second item
=item2 This is a sub-item of the second item
=item1 The third item

=end pod

Of course, you can also use the =for and =begin/=end form with list items too.

Formatting Codes

There are a variety of formatting codes available …

See L<> for details.

Some times you need to mark some text as C<code>,
B<being the basis of the sentence>,
or I<important>.

And More

There are a lot of new things with Perl 6 Pod, and if you want to learn more I encourage you to read Synopsis 26 for all of the gory details.

Processing Pod with Perl 6

As I said before, the Perl 6 compiler parses Pod in your program and makes it available to you via various variables. We already saw the use of $=pod, which contains all of the Pod in the current file. That pod is an array of Pod::Block objects. In future versions of Rakudo, you will also be able to get at specific types of Pod blocks by name with variables such as $=head1 or $=TITLE, but this is not yet implemented.

Let’s write a simple program to show us an outline of the Pod in a given file. For example, given some Pod like this …

=head1 NAME




=head1 METHODS







… we want to print this output …

  $ (METHOD)
  $ (METHOD)

Here’s the code to do just that …

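use v6;

sub MAIN {
    for $=pod.kv -> $i, $block {
        say "Block $i";
        recurse-blocks($block);
        print "\n";
    }
}

sub recurse-blocks (Pod::Block $block) {
    given $block {
        when Pod::Block::Named {
            # The outer =begin pod wrapper: descend into its children.
            if $block.name eq 'pod' {
                for $block.contents.values -> $b {
                    recurse-blocks($b);
                }
            }
            # Any other named block: print its text and its name.
            else {
                my $output = $block.contents[0].contents[0]
                    ~ " ({$block.name})";
                depth-say( 2, $output );
            }
        }
        when Pod::Heading {
            depth-say( $block.level,
                       $block.contents[0].contents[0] );
        }
    }
}

# Indent by two spaces per level, then print.
sub depth-say (Int $depth, Str $thing) {
    print '  ' x $depth;
    say $thing;
}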


This code takes some shortcuts wherever it uses $block.contents[0].contents[0] by making the very big assumption that a block contains one element which is a single Pod::Block::Para. It then assumes that the Para block contains one element which is plain text.

In practice, blocks may contain arbitrarily nested blocks because of things like formatting codes, but this gives you an idea of how you might walk through the Pod tree.
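One way to drop that assumption is to recurse through contents until you hit plain strings. The following is only a sketch: pod-text is a hypothetical helper, and the nodes are built by hand here (standing in for what $=pod would give you) so the example is self-contained.

```raku
# Collect the plain text under any Pod node, however deeply it is nested.
sub pod-text($node) {
    given $node {
        when Pod::Block { $node.contents.map(&pod-text).join }
        when Positional { $node.map(&pod-text).join }
        default         { ~$node }
    }
}

# Hand-built nodes standing in for what $=pod would contain.
my $para = Pod::Block::Para.new(
    contents => [
        'Some ',
        Pod::FormattingCode.new(type => 'B', contents => ['bold']),
        ' text',
    ],
);
my $heading = Pod::Heading.new(level => 1, contents => [$para]);

say pod-text($heading);  # Some bold text
```

With a helper like this, a heading that contains a formatting code still yields its full text instead of only the first fragment.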

Of course, because Perl 6 is still Perl, there’s a module for that! I’ve sketched out a preliminary module called Pod::TreeWalker that will do the tree walking for you, generating events as it finds different nodes. You provide a listener object that is called for each event.

GitHub Repo

All of the code shown in these examples is on GitHub. I’ve also written an additional script that uses Pod::TreeWalker to implement the outlining I demonstrate above.

Day 9 – Perl 6 and the wolf pack

This year I came back to Perl 6 once again and found out that this Christmas is THE Christmas.

With Larry Wall’s announcement and the imminent release of a 1.0 version of Rakudo, I got really excited and right away started a couple of side projects using it.

And just when I was thinking that Perl 6 couldn’t get any cooler, I found Jonathan Worthington’s YAPC::Asia talk: «Parallelism, Concurrency, and Asynchrony in Perl 6».

Perl 6 parallelism really impressed me, so I implemented an adaptation of the Grey Wolf Optimizer created by Mirjalili et al. in 2014.

This algorithm (by Wikipedia’s definition) is a metaheuristic that mimics the social and hunting behavior of wolf packs to search for “good enough” solutions in a wide range of problems.

The pseudo-code here:


And from my implementation the interesting parallel bits:


Initially I used a loop block here (I like them more), but that left all the wolves with a random value instead of keeping the passed $wolf_number value.

After a couple of hours and a lot of questions on the #Perl6 IRC channel (wonderful, patient people, by the way) I found the problem: the $wolf_number variable disappears at the end of the loop block, leaving fitness_libsvm with nothing.

After that it is just await and voilà! Process the results.
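The general shape of that pattern can be sketched as follows. This is not my actual implementation: fitness is a hypothetical stand-in for an expensive fitness function, and evaluate-pack is a name invented for the example. The pointy block gives each start block its own $wolf_number to close over.

```raku
# Hypothetical stand-in for an expensive fitness function.
sub fitness(Int $wolf_number) { $wolf_number ** 2 }

sub evaluate-pack(Int $pack-size) {
    # Each iteration gets a fresh $wolf_number for its start block to capture.
    my @promises = (1..$pack-size).map: -> $wolf_number {
        start { fitness($wolf_number) }
    };
    # await blocks until every promise is kept and returns the results in order.
    return await @promises;
}

say evaluate-pack(4).join(' ');  # 1 4 9 16
```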


Never ever have I seen simpler, more elegant code for parallelism than what Perl 6 produces… I just love it!

Postdata: The full implementation is available at

Day 8 – Grammars generating grammars

By now you have probably gotten used to the prefix “meta” appearing here and there in the Perl 6 world. Metaclasses, metaobjects, metaoperators, the mythical Meta-Object Protocol. Sounds like nothing scary, all good and familiar and you’ve seen it all, eh? Nothing could be further from the truth! Today, on the Perl 6 Advent Calendar, we’re going full meta. We’re going to have grammars that parse grammars then generate grammars that we are going to use to parse grammars. I’ll let this sink in for a moment while you scroll down for the next paragraph.

Grammars are hands-down one of the killer features of Perl 6, taking Perl’s no less than legendary ability to process text to the next level. Regular expressions, to many people’s disappointment, are just what they say they are: regular (well, regular regular expressions are. Perl regexes are a bit of a different beast, but that’s material for another story). They parse regular, as opposed to, say, context-free languages, to the never-ending disappointment of all the people who would just love to parse XML with them, like in everyone’s favourite Stack Overflow answer. Say no more to silly theoretical limitations of language theory though, since now we have grammars, which are all we ever missed in regexes: readability, composability and, of course, the ability to parse even Perl 6 itself. If that doesn’t sell its absolute power, I don’t know what does.

Writing parsers for predefined grammars (say, in Backus-Naur Form) was always a bit of a dull job, almost as exciting as copypasting stuff. If you ever sat down and wrote a parser from scratch (perhaps while going through the excellent Let’s Build a Compiler book), you probably recognize the pattern all too well: take a single rule from your grammar, write a subroutine for it, have it call (perhaps recursively) other subroutines similarly defined for other grammar rules, rinse, repeat. Well, say no more to that now that we have Perl 6 grammars! In this wonderful new world we need not write subroutines for every token to get the work done. Now we write a grammar class, where we put tokens, rules and regexes for every symbol in our grammar, and inside them we write regexes (or code) that refer to (perhaps recursively) other symbols in our Perl 6 grammar. Now, if you ever went through both of those alternatives, you will definitely realize how massive a convenience grammars are in Perl 6. Instead of painstakingly cranking out repetitive and error-prone code we have a wonderful, declarative way to specify our language, with an impressive collection of utilities to get most of our common, boring work out of the way.

But what if we already have a grammar, specified perhaps in the previously mentioned BNF? What we do then is carefully retype the existing grammar (parsing it in our head, actually) into our new, shiny Perl 6 grammar that represents the exact same thing, but has the clear advantage of actually being executable code. A fair deal, you could say. For most people, no big deal at all. We are not most people. We are programmers. We have the resources. The will. To make these grammars count! Our job revolves around building things that will do the work for us so we don’t have to. To take the well-defined specifications and turn them into tools that work for us, or for themselves. Tools that take the repetitive and automatable parts of our work out of the way. Why should building parsers be any different? Why, I’m glad you asked.

The wonderful thing about Perl 6 grammars is that they are no more magical than any other element of the language. Just as classes are first-class citizens that we can introspect, augment and build programmatically, so are grammars. In fact, you can look at the source code of the compiler itself and notice that grammars are nothing more than a specialized kind of class. They follow the same rules as classes do, allowing us to create them on the fly, add tokens to them on the fly and eventually finalize them in order to have a proper, instantiatable class object. So now that we can parse BNF grammars (since they’re just ordinary text) and create Perl 6 grammars from code, let’s put those pieces together and write us something that will save us the effort of converting a BNF grammar to a Perl 6 grammar manually.

The grammar for a BNF grammar

grammar Grammar::BNF {
    token TOP { \s* <rule>+ \s* }

    token rule {
        <opt-ws> '<' <rule-name> '>' <opt-ws> '::=' <opt-ws> <expression> <line-end>
    }

    token expression {
        <list-of-terms> +% [\s* '|' <opt-ws>]
    }

    token term {
        <literal> | '<' <rule-name> '>'
    }

    token list-of-terms { <term> +% <opt-ws> }

    token rule-name { <-[>]>+ }

    token opt-ws { \h* }

    token line-end { [ <opt-ws> \n ]+ }

    token literal { '"' <-["]>* '"' | "'" <-[']>* "'" }
}


The interesting stuff happens in the three almost-topmost tokens. rule is the core building block of a BNF grammar: a <symbol> ::= <expression> pair, followed by a new line. The entire grammar is no more than a list of those. Each expression is a list of terms, or possibly an alternation of such lists. Each term is either a literal, or a symbol name surrounded by angle brackets. Easy enough! That covers the parsing part. Let’s look at the generating itself. We do have a built-in mechanism for “doing stuff for each token in the grammar”, in the form of Actions, so let’s go ahead and use that:

my class Actions {
    has $.name = 'BNFGrammar';
    method TOP($/) {
        # Create a new grammar type with the requested name.
        my $grmr := Metamodel::GrammarHOW.new_type(:$.name);
        # A phony TOP that simply calls the first BNF rule.
        $grmr.^add_method('TOP',
            EVAL 'token { <' ~ $<rule>[0].ast.key ~ '> }');
        for $<rule>.map(*.ast) -> $rule {
            $grmr.^add_method($rule.key, $rule.value);
        }
        $grmr.^compose;
        make $grmr;
    }

    method expression($/) {
        make EVAL 'token { ' ~ ~$/ ~ ' }';
    }

    method rule($/) {
        make ~$<rule-name> => $<expression>.ast;
    }
}

The TOP method is definitely the most magical and scary, so let’s tackle that first to make the rest of the stuff look trivial in comparison. Basically, three things happen there:

1. We create a new grammar, as a new Perl 6 type
2. We add tokens to it using the `^add_method` method
3. We finalize the grammar using the `^compose` method

While Perl 6 specifies that the token named TOP is where the parsing starts, in BNF the first rule is always the starting point. To adapt one to the other, we craft a phony TOP token which just calls the first rule specified in the BNF grammar. Unavoidably, the scary and discouraging EVAL catches our attention, as if it said “horrible stuff happens here!” It’s not entirely wrong when it says that, but since we have no other way of programmatically constructing the individual regexes (that I know of), we’ll have to accept this little discomfort in the name of the Greater Good, and look a little bit closer at what we’re actually EVALing in a moment.

After TOP we proceed to add the rest of the BNF rules to our grammar, this time preserving their original names, then ^compose() the whole thing and finally make it the result of the parsing: a readymade parser class.
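Stripped of the BNF parsing, the three steps can be sketched on their own like this. This is an illustration under assumptions, not the module's code: the grammar name Greeting and the token who are invented, and the tokens are written as literal anonymous tokens, whereas the Actions class above has to EVAL them because their source is only known at runtime.

```raku
# 1. Create a new grammar type at runtime.
my $greeting := Metamodel::GrammarHOW.new_type(:name('Greeting'));

# 2. Add tokens to it; <who> is resolved as a method on the grammar.
$greeting.^add_method('TOP', token { 'hello' \s+ <who> });
$greeting.^add_method('who', token { \w+ });

# 3. Finalize the type so it can be used for parsing.
$greeting.^compose;

my $match = $greeting.parse('hello world');
say ~$match<who>;                        # world
say $greeting.parse('goodbye').defined;  # False
```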

In the expression method we glue the parsed BNF elements together in order to produce valid Perl 6 code. This turns out to be pretty easy, since both separate symbols with whitespace, alternate them with the pipe character and surround symbol names with angle brackets. So for a rule that looks like this:

<foo> ::= 'bar' | <baz>

The Perl 6 code that we EVAL becomes:

token { 'bar' | <baz> }

Since we already validated in the grammar part of our code that the BNF we parse is correct, there’s nothing stopping us from literally pasting the entire expression into our code and wrap it in a token { }, so let’s go ahead and do just that.

Last but not least, for each BNF rule we parse we produce a nice Pair, so our TOP method has an easier time processing each of them.

Seems like we’re about done here, but just for the users’ convenience, let’s write a nice method that takes a BNF grammar and produces a ready to use type object for us. As we remember, grammars are just classes, so there’s nothing stopping us from just adding it straight to our grammar:

grammar Grammar::BNF {
    # ... the tokens shown earlier ...

    method generate($source, :$name = 'BNFGrammar') {
        my $actions = Actions.new(:$name);
        my $ret = self.parse($source, :$actions).ast;
        return $ret.WHAT;
    }
}

Looks good from here! Before you start copypasting all this into your own projects, remember that Grammar::BNF is a Perl 6 module available in your Perl 6 Module Ecosystem, installable with your favourite module manager. It also ships with some goodies like a Perl 6 slang, allowing you to write BNF grammars straight in your Perl 6 code (as opposed to including them as strings), or parsing the more powerful (and perhaps more widespread) ABNF grammars as well.

If you indeed took the time for the beginning of this post to sink in, you may remember that I promised that we’re going to have grammars (a-one) that parse grammars (a-two) then generate grammars (a-three) that we are going to use to parse grammars (a-four). So far we’ve seen the Grammar::BNF grammar (that’s our a-one), that parses a BNF grammar (that’s our a-two), generates a Perl 6 grammar in the form of a type object (that’s a-three) and… that’s it. We’re still missing the last part, using this whole thing to parse grammars. We’ve only gone 75%-meta, and that is just not good enough in this day and age. Why stop now? Why not take a BNF grammar of a BNF grammar, parse that with a Perl 6 grammar and use the resulting Perl 6 BNF grammar to parse our original BNF grammar of a BNF grammar? Wouldn’t that be sweet? Sure it would! That is, however, left as an exercise for you, my dear readers. After all, how fair would it be if I had all the fun while you just sit there and watch? Would you like it if you opened the little paper window in your advent calendar only to find a note saying, “There was a chocolate here for you. I ate it. It was truly delicious”? Me neither! There’s a BNF grammar for BNF both on Wikipedia and in Grammar::BNF’s very own test suite; the latter even includes a little breadcrumb that can help you with your adventure. I eagerly await thy results, and as always, thank you all for reading and I wish you a wonderful advent!

Day 7 — Unicode, Perl 6, and You

Quick (rhetorical) question: how many of you either try your best to ignore Unicode, or groan at the thought of having to deal with it again?

It’s fair, after all, considering Unicode is big. Really big. (You may think it’s a long walk down the ASCII table, but that’s peanuts compared to Unicode.) It certainly doesn’t help that many languages, particularly older ones, don’t help you, the average programmer, work with it all that well. Either they don’t deal with encoding standards at all, meaning some familiarity is mandatory, or they claim to support Unicode but really just balk once you get past the BMP (the codepoints that can fit in a 16-bit number).

Perl 6, as you might guess, does handle Unicode well. It’s actually necessary to go about this day in a twofold manner: half of the story is how to process Unicode text, and half is how to use Unicode syntax. Let’s start with the one more likely to be of concern when actually programming, that of…

How do I Handle Unicode Text?

No matter your level of experience in handling Unicode (or anything involving different encodings), you’ll be pleased to learn that in Perl 6, it goes just about the way you’d expect.

Perl 6’s strings are interesting in that they by default work on the notion of graphemes — a collection of codepoints that look like a distinct thing; what you’d call a “character” if you didn’t know better. Not every distinct “character” you could come up with has its own codepoint in the standard, so usually handling visual elements naturally can be quite painful.

However, Perl 6 does this work for you, keeping track of these collections of codepoints internally, so that you just have to think in terms of what you would see the characters as. If you’ve ever had to dance around with substring operations to make sure you didn’t split between a letter and a diacritic, this will be your happiest day in programming.

As an example, here’s a Devanagari syllable in a string. The .codes method returns the number of codepoints in the string, while .chars returns the number of characters (aka graphemes):

say "नि".codes;    # returns  2
say "नि".chars;    # returns  1

Even though there isn’t a singular assigned codepoint for this syllable, Perl 6 still treats it as one character, suiting any purpose that doesn’t involve messing with the text at a lower level.

“That’s cool, but does it matter much to me, a simple English-speaking programmer who’s never had to deal with other languages or scripts?”, I can imagine some of you thinking. And the answer is yes, because regardless of your background, there is most definitely one grapheme you’ve encountered before:

say "\r\n".chars;    # returns 1

Yep, the Windows end-of-line sequence is explicitly counted by Unicode’s “extended grapheme cluster” definition as one grapheme.

And of course it’s not just looks; that’s how operations on strings work:

say "नि\r\n".substr(1,1).perl    # returns "\r\n"
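To see the same behaviour with a Latin-script example, here is a small sketch. The string is made up for illustration: it uses "x" followed by U+0301 COMBINING ACUTE ACCENT, a pair for which (as far as I know) Unicode has no precomposed codepoint, so the two codepoints stay separate while still counting as one grapheme.

```perl6
# "a", then "x" + COMBINING ACUTE ACCENT, then "b"
my $s = "ax\x[301]b";

say $s.codes;                # 4 codepoints
say $s.chars;                # 3 graphemes
say $s.comb.elems;           # .comb splits by grapheme, so 3 pieces
say $s.substr(1, 1).codes;   # the middle grapheme carries 2 codepoints

# Reversing works by grapheme too: the accent stays attached to the x
my $flipped = $s.flip;
say $flipped.chars;          # still 3
say $flipped.substr(1, 1) eq $s.substr(1, 1);   # True
```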

Of course, that’s all just for the default Str type. If you don’t want to work at a grapheme level, then you have several other string types to choose from: If you’re interested in working within a particular normalization, there are the self-explanatory types NFC, NFD, NFKC, and NFKD. If you just want to work with codepoints and not bother with normalization, there’s the Uni string type (which may be most appropriate in cases where you don’t want the NFC normalization that comes with normal Str, and want to keep text as-is). And if you want to work at the binary level, well, there’s always the Blob family of types :) .
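A rough sketch of how these types relate, using "é" as the test subject. The variable names are mine, and the conversions shown (.NFC, .NFD, Uni.new, .encode) are only one way of moving between the levels:

```perl6
my $str = "\x[E9]";               # é as one precomposed codepoint

say $str.NFC.list;                # (233)
say $str.NFD.list;                # (101 769): e + combining acute
say $str.NFD.codes;               # 2
say $str.NFD.Str.chars;           # back in Str land: 1 grapheme

my $uni = Uni.new(0x65, 0x301);   # codepoints kept exactly as given
say $uni.list;                    # (101 769)
say $uni.Str.codes;               # converting to Str applies NFC: 1

say $str.encode('utf-8').bytes;   # 2 bytes at the Blob level
```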

We also have several methods that let you examine the various bits of Unicode info associated with characters:

say "a".uniname;                 # get name of first Unicode character in string
say "\r\nhello!".ord;            # get number of first codepoint
                                 # (*not* grapheme) in string
say "\r\nhello!".ords;           # get numbers of all codepoints
say "0".uniprop("Numeric_Type"); # get associated property

And so on :) . Note that the ord/ords part shows you that you’ll really never get the internal numbers used to keep track of graphemes. When ord sees a grapheme cluster, it just returns the codepoint number for the first codepoint of that cluster.
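For the curious, here is what a few of those calls give back, as a small sketch with the values I would expect written alongside:

```perl6
my $crlf = "\r\n";

say $crlf.chars;                  # 1 (one grapheme)
say $crlf.ord;                    # 13 (just the CR, the first codepoint)
say $crlf.ords;                   # (13 10)

say "a".uniname;                  # LATIN SMALL LETTER A
say "\c[SNOWMAN]".ord.base(16);   # 2603
say 0x2603.chr;                   # ☃
```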

Not Just Strings

Of course, our Unicode support wouldn’t be complete without regex support! Of particular note is the ability to match based on properties, so for example

/ <:Alpha>+ /

will match multiple alphabetic characters (<alpha> will do almost the same thing, just with the addition of matching underscore), and

/ '0x' <:Nv(0..9) + :Hex_Digit>+ | '0b' <:Nv(0..1)>+ /

is a regex that lets you match against either hexadecimal numbers or binary ones, in a Unicode-friendly way. And if you wanted to write the Unicode standard’s “extended grapheme cluster” pattern in regexes (the same pattern we use to determine grapheme handling mentioned earlier):

grammar EGC {
    token Hangul-Syllable {
        || <:GCB<L>>* <:GCB<V>>+ <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LV>> <:GCB<V>>* <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LVT>> <:GCB<T>>*
        || <:GCB<L>>+
        || <:GCB<T>>+
    }

    token TOP {
        || <:GCB<CR>> <:GCB<LF>>
        || <:GCB<PP>>*
           [
           || <:GCB<RI>>
           || <.Hangul-Syllable>
           || <!:GCB<Control>>
           ]
           [
           || <:Grapheme_Extend>
           || <:GCB<Spacing_Mark>>
           ]*
        || .
    }
}

A bit wordy, but just imagine how much more painful that would be without built-in Unicode support in your regexes!
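If you want to try property matching on something smaller first, here is a minimal sketch. The sample text is made up, and I’ve stuck to general-category properties (L, Lu, Nd) plus the Hex_Digit property from the example above:

```perl6
my $text = "Ünïcödé 42 ४२";

# Runs of letters and runs of decimal digits, in any script
my @words  = $text.comb(/ <:L>+ /);
my @digits = $text.comb(/ <:Nd>+ /);
say @words.elems;     # 1 (Ünïcödé)
say @digits.elems;    # 2 (the ASCII 42 and the Devanagari ४२)

# Properties combine with + just like character classes
my @tokens = $text.comb(/ <:L + :Nd>+ /);
say @tokens.elems;    # 3

# Uppercase is a property as well, so Greek works out of the box
my $is-upper = ?("Σ" ~~ / ^ <:Lu> $ /);
say $is-upper;        # True

my $hex = ~("0xBEEF is tasty" ~~ / '0x' <:Hex_Digit>+ /);
say $hex;             # 0xBEEF
```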

And aside from all the programming-related stuff, there’s also…

Using Unicode to Write Perl 6

In part of our tireless support of Unicode, we also parse your source code with the same regex engine you just saw demonstrated above (though the Perl 6 parser doesn’t need to bother with Unicode properties nearly that often). This means we’re able to support syntax using Unicode in Perl 6, and have been taking advantage of it for a long time now. Observe:

say 0 ∈ «42 -5 1».map(&log ∘ &abs);
say 0.1e0 + 0.2e0 ≅ 0.3e0;
say 「There is no \escape in here!」

Just a small sampling of the Unicode built into Perl 6 by default, featuring interpolating quote-words lists, setops, function composition, and approximate equality. Oh, and the delimiters for the most basic level of string quoting.

Don’t worry though, standard Perl 6 does not demand that you be able to type Unicode. If you can’t, there are so-called “Texas” variants:

say 0 (elem) <<42 -5 1>>.map(&log o &abs);
say 0.1e0 + 0.2e0 =~= 0.3e0;
say Q[[[There is no \escape in here!]]]
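To convince yourself that the two spellings really are the same operators, a quick side-by-side sketch (the subs double and inc and the variable names are invented for this example):

```perl6
sub double($n) { $n * 2 }
sub inc($n)    { $n + 1 }

# Function composition: ∘ and its Texas form o
my &fancy = &double ∘ &inc;
my &texas = &double o &inc;
say fancy(3) == texas(3);                       # True, both give 8

# Set membership: ∈ and (elem)
my $same-elem = (2 ∈ (1, 2, 3)) == (2 (elem) (1, 2, 3));
say $same-elem;                                 # True

# Approximate equality: ≅ and =~=
my $approx = (0.1e0 + 0.2e0 ≅ 0.3e0) && (0.1e0 + 0.2e0 =~= 0.3e0);
say $approx;                                    # True
say 0.1e0 + 0.2e0 == 0.3e0;                     # False, hence the operator

# Interpolating quote-words: « » and << >>
my $name = "advent";
my $same-words = «$name calendar».join(" ") eq <<$name calendar>>.join(" ");
say $same-words;                                # True

# No-escape quoting: 「 」 and Q[ ]
my $same-quote = 「a\nb」 eq Q[a\nb];
say $same-quote;                                # True
say 「a\nb」.chars;                              # 4, the backslash is literal
```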

This is fine of course, but if it’s feasible for you to set up Unicode support, I heartily recommend it. Here’s a short list of various ways to do it:

  • Get an awesome text editor — The more featureful text editors (such as emacs or vim, to name a couple) will have functionality in place to insert arbitrary characters. Go look it up in your editor’s documentation, and consider petitioning if it doesn’t support Unicode entry :) .
  • Use your OS’s hex input — Some systems, such as Windows or applications using GTK, support key shortcuts to let you type the hexadecimal codepoint numbers for characters. You’ll have to memorize codepoints, but chances are you’d get used to it eventually.
  • Set up your keyboard’s third/fourth/etc. levels — If your system supports it, you can enable third/fourth level modifiers and so on for your keyboard to access those levels (if you don’t know what those are, your ‘Shift’ key counts as a second-level modifier, and the characters it lets you type are considered on the second level, as an example). Depending on the amount of time and/or patience you have you could even customize those extra levels.
  • (X11) Set up your Compose key — This is the method I myself use, and it involves setting up a key to use as the “Compose key” or “Multi key”, and use of a file in ~/.XCompose (or some other place, as long as you configure it) to set up key combos. The Compose key works by letting you type any configured sequence of keys after pressing the Compose key, which will insert the character(s) of your choice.
    • Which key you sacrifice of course depends on which keys you don’t make use of; it could be the caps lock, or one of those extra Shift/Alt/Ctrl keys. It can even be that useless Menu key, which you probably just remembered was on your keyboard :P .
    • An absolutely wonderful starting .XCompose can be found in this GitHub repository. You’ll still want to add combinations to this for some Perl 6 operators, and perhaps do other tinkering with it¹, but it’s still quite a lot better than having to start from scratch :) .

In Conclusion

This of course isn’t an exhaustive coverage of all that Perl 6 has to offer Unicode, but the underlying takeaway is that Perl 6 makes handling Unicode much nicer than other languages do (at least out of the box).

Bonus! Partly in the spirit of Christmastime, and partly in the spirit of “I love this, and what better time to share it?”, allow me to present for your historical interest Perl 6’s legendary “snowman comet” bug:

say "abc" ~~ m☃.(.).☄  # this used to work. Really.

Basically this old old old old bug that (sadly) doesn’t exist anymore was about the regex part of the parser messing up a bit and interpreting ☃☄ as just as valid a pair of brackets as () or ⦃⦄.

Is there a relevant lesson in this bug? Nope. Is it only vaguely connected to a winter blog post on Unicode? You bet. It’s just that it’s thanks to Unicode support that we were able to get that kind of bug way back in 2009, and it’s Unicode support (among other things) that would let someone re-implement this as a slang or something ☺ .

So go forth confident in your newfound ability to handle international text with much greater ease than you’re perhaps used to, and spend more time building ☃☃☃☃ the rest of this month.

Have the appropriate amount of fun! ❄

¹Psst! Use the Texas variants for your compose combos if you’re stuck on coming up with them, e.g. <Multi_key> <equal> <asciitilde> <equal> for ≅.

Day 6 – On Opening Files and Contributing to Open Source Projects

Why do people contribute to open source projects? Some do it for fun, some for fame and some for fortune by actually getting paid to do such work. However, probably the most important factor is scratching your own itch: You want something done, and make it happen.

That was the position I found myself in earlier this year: I wanted to use Perl 6 to read a binary file and patch it in-place. According to the documentation, I should have been able to do so, as Rakudo claimed to support opening files in the modes read-only :r, write-only :w, append :a and read-write :rw, and I assumed the last one would do the trick.

Not so: The code silently adjusted the flag :rw to :w, and I was left hanging.

After a bit of bikeshedding, a design emerged that I was happy with, and as it was merged upstream a month and a half later, others seemed to agree. But while I added an extensive commit message to lay out the new system in its glory, slacker that I am, it was left for others to perform the tedious tasks of writing tests and documentation.

As you might expect, that hope never materialized, and the extended open modes I introduced became something of an Easter egg: Undocumented, untested and only discoverable by reading the commit log or the code itself. I may very well be the only person who has actually made use of them.

Fast-forward to December: Here we are, Christmas draws near, and with it the first proper release of Perl 6. As a sometimes-good on-and-off-again citizen of the Perl 6 community, I decided to put on my big boy pants and do the Right Thing: Some tests have now been written (which makes the new open modes an official part of the 6.c language release), and the documentation will be updated soon-ish to conform to what is actually implemented instead of what had been planned to be implemented eventually.

Having said all that, let us now take a look at what the fuss is all about:

# not the actual signature as &open delegates to
sub open(
    IO() $path,
    :$mode, :$create, :$append, :$truncate, :$exclusive,
    :$r, :$w, :$x, :$a, :$rw, :$rx, :$ra, :$update,
    :$bin, :$enc = 'utf8',
    :$nl-in = ["\x0A", "\r\n"], :$nl-out = "\n", :$chomp = True
--> IO::Handle)

Isn’t it a thing of beauty? Perhaps not, but it’s also not as scary as it looks:

The only required argument is the $path, which must be something we can call .IO on. The common example would be a string that holds the name of a file you want to access.

One design goal behind the additional open modes was to support all the features of fopen(3) as specified by the C11 standard library. To keep things sane, inspiration was drawn from open(2) and the flags specified by POSIX.

The first row of named arguments lists the POSIX-inspired ones. Here, :$mode may take the values 'ro', 'wo' and 'rw'; the rest are boolean flags. If I did my job well, you will rarely need to use them – instead, the single- and double-letter variants listed on the next line should suffice.

Read-only mode :r is the default. The three write-only modes are :w, which truncates the file if it already exists (and is the only shorthand mode that does so implicitly), exclusive mode :x, which fails if the file already exists (and thus can be used to implement a poor man’s locking mechanism), and append mode :a, which adds content at the end of a file.

If you want to read and write to a file at the same time (as I did), you may combine the flag :r with any of :w, :x or :a. For convenience, this may also be spelled :rw, :rx, :ra. Note that the effect of providing :r,:w (or its alias :rw) is not a combination of the effects of :r and :w – existing files will not be truncated as one might expect, which is a deliberate departure from the pattern.

Aside from :r, all shorthand modes will implicitly create the file if it doesn’t already exist. In contrast, the fopen(3) mode "r+" (known to Perl 5 programmers as "+<") does not. In Perl 6, this mode is now known as :update, corresponding to the low-level :mode<rw>. In contrast, :rw maps to :mode<rw>,:create, which does not have a direct equivalent in either C11 or Perl 5.
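Here is a small sketch that walks through a few of these modes on a scratch file. The file name is made up for the example, and the final step is roughly the patch-in-place use case that started all this; treat it as an illustration of the modes described above rather than a reference.

```perl6
# Hypothetical scratch file in the system's temp directory
my $path = $*TMPDIR.child("advent-open-modes-{$*PID}.txt");

my $fh = open $path, :w;            # write-only: creates, truncates
$fh.print("hello\n");
$fh.close;

$fh = open $path, :a;               # append: adds at the end
$fh.print("world\n");
$fh.close;

# Exclusive mode refuses to touch a file that already exists
my $refused = !(try open $path, :x).defined;
say $refused;                       # True

# :update is read-write without creating or truncating, so we
# can overwrite a single byte at the start of the file
$fh = open $path, :update, :bin;
$fh.write("J".encode('ascii'));
$fh.close;

my $content = $path.slurp;
say $content;                       # Jello, then world

$path.unlink;
```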

The :bin flag and :$enc argument control whether the file should be opened in binary mode or text mode with given encoding. By default, files are assumed to be text files encoded as UTF-8. This is different from what the documentation tells you (there has never been autodetection of Unicode encodings as far as I’m aware), and binary files also do not return buffers instead of strings when processing the file line-by-line. I’ve raised that issue with The Man (translation: talked on IRC about it) and a pull request has been sent. We’ll have to wait and see what develops on that front in the remaining weeks leading up to the party.

Finally, the last line of named arguments lists those that control line-based access in general and the behaviour of the .get and .lines methods in particular. The :$nl-in argument allows you to provide a list of strings that should be considered line separators; the :$nl-out argument controls what gets written to disk when you request a newline, e.g. when using the methods .say, .put or .print-nl. If the final argument :$chomp is set to True (which is the default), line separators will be discarded instead of being included at the end of the strings returned by .get or .lines. As a side note, if you find chomp => False too much of a burden to type, Perl 6 supports the shorthand notation :!chomp.
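A short sketch of those three arguments in action, again on a made-up scratch file, with a semicolon standing in as the line separator:

```perl6
# Hypothetical scratch file for this example
my $path = $*TMPDIR.child("advent-lines-{$*PID}.txt");
spurt $path, "one;two;three";

# Custom separator, default chomping
my $fh = open $path, nl-in => ";";
my @chomped = $fh.lines;
$fh.close;

# Same separator, but keep it at the end of each line
$fh = open $path, nl-in => ";", :!chomp;
my @raw = $fh.lines;
$fh.close;

say @chomped.perl;    # ["one", "two", "three"]
say @raw.perl;        # ["one;", "two;", "three"]

# nl-out decides what .say appends when writing
$fh = open $path, :w, nl-out => ";";
$fh.say("four");
$fh.close;
my $written = $path.slurp;
say $written;         # four;

$path.unlink;
```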

For those of you still with me: Congratulations, you now know how to open files in Perl 6, and all that remains for me to say to you is this:

Have a happy St. Nicholas Day and a fun time playing with Rakudo!

PS: As to what you can do with a file handle once it has been opened, I’ll leave that as an exercise to the reader. But note that while Perl wants to make hard things possible, it also tries to keep easy things easy – which means that you can use interfaces like slurp, spurt and lines (or their method forms .IO.slurp, .IO.spurt, .IO.lines) without having to manually open (and close!) anything.
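For completeness, a tiny sketch of that easy path, with no explicit open or close anywhere (the scratch file name is once again invented):

```perl6
# Hypothetical scratch file for this example
my $path = $*TMPDIR.child("advent-easy-io-{$*PID}.txt");

spurt $path, "first\nsecond\n";      # write the whole file in one go
spurt $path, "third\n", :append;     # add to the end

my $all   = slurp $path;             # read everything as one string
my @lines = $path.lines;             # or line by line, already chomped

say $all.chars;                      # 19
say @lines.elems;                    # 3
say @lines[2];                       # third

$path.unlink;
```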