Day 15 – 2015 The Year of The Great List Refactor

The Great List Refactor (GLR) was probably one of the most tricky and problematic changes in terms of design and implementation that Perl 6 has undergone recently.

It’s also fairly hard to explain! But hopefully some historical background might help with this. There was much GLR discussion at the 2014 Austrian Perl Workshop which was written up by Patrick Michaud on his blog.

The problems the GLR was intended to address were those of performance and consistency in the way lists and related types operated. It was also clear that changing such basic data types would be painful.

Traditionally Perl 5 flattens such that

% perl -dE 1 -MData::Dumper

[...snip...]

DB<1> @foos=((1,2),3)

DB<2> say Dumper \@foos
$VAR1 = [
         1,
         2,
         3
        ];

and initially much Perl 6 behaviour resembled this but by the end of last year there had been increased use of non-flattening behaviour which preserves the original data structure.

However this was not consistent with many edge cases and there was use of a data type Parcel which was heavily used internally in Rakudo but by then regarded as a Bad Idea – mainly for reasons of performance.

A draft twin of the design document covering lists “Synopsis 7” was imported by Patrick in June 2015 and became the official S07 the following month.

August was a very busy month for the GLR.  Jonathan Worthington experimented with a single gist based GLR code fragment which then became a separate GLR git branch under the Rakudo repository.  Many people worked on this branch with Stefan Seifert (nine) being particularly productive.  For a while the IRC channel had two bots camelia and GLRelia (the latter tracking the GLR branch) so people could quickly try and compare the behaviour of code under the old and new systems.  There were a huge number of changes and software such as panda had to be patched to remain functional.

The Parcel data type became List (which was immutable and used round brackets) and Array was a mutable subclass of List which used square brackets.  There was no longer implicit list flattening for arrays.  Simple arrays could be flattened with the .flat method.

Pre GLR

> my @array = 1,(2,3),4

1 2 3 4

> @array.elems

4

Post GLR

my @array = 1,(2,3),4

[1 (2 3) 4]

> @array.elems

3

> my @array = (1,(2,3),4).flat

[1 2 3 4]

It was also possible to “Slip” a list into an array

> my @a = 1, (2, 3).Slip, 4

[1 2 3 4]

Also

> my @a = 1, |(2, 3), 4
[1 2 3 4]

Sequences (to be consumed once only) were introduced

> my $grep = (1..4).grep(*>2)
(3 4)
> $grep.^name
Seq

and the .cache method can be used to prevent consumption of the Seq

The Single Argument Rule

The arguments passed to iterators such as the for loop obey “the single arg rule” which is that they are treated as a single arg even if they appear multiple arguments at first glance. Generally this makes for behave as the programmer would expect apart from one Gotcha example with a trailing comma.

my @a=1,2,3;

my ($i,$j);
for (@a) {
    $i++;
}

for (@a,) { # this is actually a single element list!
    $j++;
}

say :$i.gist; # => 3
say :$j.gist; # => 1

S07 was then rewritten by Jonathan Worthington in September 2015. The net result was by then S07 had under gone many changes eg. Parcel was removed, reintroduced and finally removed again!

https://github.com/perl6/specs/blob/master/S07-lists.pod

For such a radical change to Perl 6 the GLR went remarkably smoothly within Rakudo itself.  But of course much existing Perl 6 outside of Rakudo in the ecosystem and other places had to be rewritten in order to continue to work. An example is "perl6-examples" which seem to use lists and arrays particularly heavily.

For many months the GLR had been the Bugbear of Perl 6. A bugbear which has now happily departed.

Day 14 – A nice supplies: syntactic relief for working with asynchronous data

Asynchronous data is all around us. What is it? It’s data that arrives whenever it pleases, as opposed to when we ask for it. Some common examples are:

  • GUI events
  • Data arriving over an async socket
  • Messages arriving from a message queue
  • Ticks of a timer
  • File change notifications
  • Signals

These contrast with synchronous data, which we get when we ask for it. For example, when we query a database and iterate the resulting rows, or iterate through file system entries in a directory, we block until the data we wish to have is available.

Our programming languages tend to be pretty good about synchronous data, giving us a range of syntactic constructs for producing and processing it. For example, in Perl 6 we have for loops for consuming streams of synchronously produced values, and gather blocks for producing them – perhaps doing so by in turn using for loops to consume other synchronous data sources. Inside for and gather, we’re free to use state (kept in variables), and use our comfortable range of conditionals (if, when, etc.) That is to say, we’re free to act like normal, down to earth, imperative programmers in dealing with our synchronous data. Sure, sometimes something just looks far nicer using the functional style, and we map, grep, zip, and reduce our way to the solution. But some problems are just unnatural – at least for most of us, me included – to see in that way.

In Perl 6, we have a common interface for talking about things that produce values over time – that is, synchronous data. These things do the Iterable role, and work in terms of Iterator objects. You never really see an Iterator in day to day Perl 6, and instead see the Seq type. These Seqs are the things gather blocks produce, and for loops can consume.

In the last years, we introduced supplies to Perl 6. Supplies are our common interface for talking about asynchronous data. These have, from the beginning, been rather well received. With supplies, you can grep UI events, map file system notifications, and zip whatever you fancied with ticks of a timer. And that’s just great when it’s easy to think about your problem in functional terms. But what about the rest of the time?

In the last year, we’ve also added supply/react blocks, along with the whenever asynchronous looping construct. In this post, we’ll take a look at how they might be used.

STOMP!

As I wrote the list of sources of asynchronous data at the start of this post, I realized that there was only one of them that I’d not yet done in Perl 6: working with message queues. Hmm. Two hours until midnight. Advent post needs completing before I sleep. Challenge accepted!

10 minutes later, and I’ve RabbitMQ happily installed and am reading the STOMP specification. STOMP is a simple text-oriented message protocol – that is, a text-based way to deal with message queues. Within an hour, I’d got a working STOMP client handling the absolute basics of sending messages to a queue and subscribing to a queue to get messages. So, let’s take a look at how I did it.

First, I sketched in a Stomp::Client class. I figured it would need the host and port we want to connect to, the connection credentials (yes, yes, this is not the height of security engineering), and something to hold the socket representing the connection to the server.

class Stomp::Client {
    has Str $.host is required;
    has Int $.port is required;
    has Str $.login = 'guest';
    has Str $.password = 'guest';
    has $!connection;
}

So, let’s get connected. The connect method will contain a start block, because we want it to function asynchronously. This means the method will return a Promise that the caller can await. To connect to a stomp server, you establish a TCP connection and send a CONNECT frame:

method connect() {
    start {
        my $conn = await IO::Socket::Async.connect($!host, $!port);
        await $conn.print: qq:to/FRAME/;
            CONNECT
            accept-version:1.2
            login:$!login
            passcode:$!password
            
            \0
            FRAME
        True
    }
}

But…then what? Well, then you wait for the server to either return a CONNECTED frame or an ERROR frame. Hm, seems we’ve some work to do. First is to translate the BNF from the STOMP specification into a Perl 6 grammar:

my grammar StompMessage {
    token TOP {
        <command> \n
        [<header> \n]*
        \n
        <body>
        \n*
    }
    token command {
        < CONNECTED MESSAGE RECEIPT ERROR >
    }
    token header {
        <header-name> ":" <header-value>
    }
    token header-name {
        <-[:\r\n]>+
    }
    token header-value {
        <-[:\r\n]>*
    }
    token body {
        <-[\x0]>* )> \x0
    }
}

Note it’s lexically scoped inside of our STOMP::Client class. So is the following little message data structure:

my class Message {
    has $.command;
    has %.headers;
    has $.body;
}

An IO::Socket::Async connection exposes incoming data as a Supply. There are a few things to consider:

  • A message may be spread over multiple packets
  • Two messages may be in one packet
  • The next packet may arrive while we’re processing the current one, and since Perl 6 delivers I/O notifications on the thread pool – like various other languages – we might worry about data races

In fact, the third point is a non-problem: Perl 6 promises that, unless you go out of your way to avoid it, supplies are serial, and will only have you processing one message at a time on a given message pipeline. Supplies are a tool for taming concurrency rather than introducing it.

Now, if I was writing this synchronously, and I wanted to expose a Seq of the incoming messages, I’d use a gather block, keep myself a buffer, recv each incoming chunk of data from the socket, and try to parse a new message at the start of the buffer. Whenever I got a message I’d take it. What about with supplies? Here’s how it looks:

method !process-messages($incoming) {
    supply {
        my $buffer = '';
        whenever $incoming -> $data {
            $buffer ~= $data;
            while StompMessage.subparse($buffer) -> $/ {
                $buffer .= substr($/.chars);
                if $<command> eq 'ERROR' {
                    die ~$<body>;
                }
                else {
                    emit Message.new(
                        command => ~$<command>,
                        headers => $<header>
                            .map({ ~.<header-name> => ~.<header-value> })
                            .hash,
                        body => ~$<body>
                    );
                }
            }
        }
    }
}

Taking it from the top, $incoming will be the Supply of data arriving from the socket, decoded to strings. We wrap the code in this method up in a supply block, since we’d like to return a new Supply. We declare $buffer, to hold data we’ve yet to parse. In this case, our thread safety is fairly trivial: $incoming will deliver a message at a time. But what if we were going to bring together data from many supplies? We’d still be fine: a supply block promises that, per tapping of the Supply, only one thread will ever be allowed in the code inside of the supply block at a time.

The whenever construct is a asynchronous loop. It taps the supply we specify, and the loop body will run whenever a value is emitted by that supply. In our case, that is every time we get data from the socket. We concatenate the incoming $data onto the $buffer. We then try to parse a message at the start of the buffer (subparse means that we don’t mind if we don’t reach the end of the buffer, unlike parse which will complain if it doesn’t completely parse the incoming string). When we do parse data, we lop it off the start of the buffer.

For each message we parse, we first see if it’s an ERROR frame; if so, we simply die with the message. A die inside a whenever block will propagate the error to whatever tapped the supply, so errors will not be lost. Otherwise, we’ve some kind of more interesting message; we turn the parsed data into a Message instance and emit it.

A supply block gives an “on-demand” supply. Each tap performed on the block will run the supply block, which in turn will tap each of the whenevers. That’s the right thing in most cases, but for my Stomp::Client I’d like to have various bits of code tap the same messages, to look for interesting things. One of these bits of code will look for a single CONNECTED message. So, I’ll add a $!incoming attribute to my class:

has $!incoming;

And then, after establishing the connection to the message queue server, do this:

$!incoming = self!process-messages($conn.Supply).share;

That is, I share the result of processing messages out amongst everything that is interested. And interested things will tap the $!incoming supply and filter out what they care about. For example, here’s how I create a Promise that will be kept when we receive a CONNECTED frame:

my $connected = $!incoming
    .grep(*.command eq 'CONNECTED')
    .head(1)
    .Promise;

Here is the connect method in full:

method connect() {
    start {
        my $conn = await IO::Socket::Async.connect($!host, $!port);
        $!incoming = self!process-messages($conn.Supply).share;
        
        my $connected = $!incoming
            .grep(*.command eq 'CONNECTED')
            .head(1)
            .Promise;
        await $conn.print: qq:to/FRAME/;
            CONNECT
            accept-version:1.2
            login:$!login
            passcode:$!password
            
            \0
            FRAME
        await $connected;
        $!connection = $conn;
        
        True
    }
}

Sending messages is easy:

method send($topic, $body) {
    self!ensure-connected;
    $!connection.print: qq:to/FRAME/;
        SEND
        destination:/queue/$topic
        content-type:text/plain
 
        $body\0
        FRAME
}

And subscription is implemented using another supply block, and a bit of simple filtering on the incoming message:

method subscribe($topic) {
    self!ensure-connected;
    state $next-id = 0;
    supply {
        my $id = $next-id++;
        
        $!connection.print: qq:to/FRAME/;
            SUBSCRIBE
            destination:/queue/$topic
            id:$id
 
            \0
            FRAME
        
        whenever $!incoming {
            if .command eq 'MESSAGE' && .headers<subscription> == $id {
                emit .body;
            }
        }
    }
}

And with that, we have a working STOMP client. Let’s try it:

my $queue = Stomp::Client.new(host => 'localhost', port => 61613);
await $queue.connect();
await $queue.send("test", "hello world");
react {
    whenever $queue.subscribe('test') {
        .say;
    }
}

Here, for the first time, we encounter the react block. What’s that? Well, if you imagine that supply blocks are a little like gather – that is, used to create something that can produce values – a react block is for the places you might normally use a for loop when processing synchronous data. However, in the asynchronous world you’re usually interested in a range of things that might happen. A react block runs, sets up all the whenevers, and then blocks until either all of the supplies tapped by whenever blocks are done, or the “done” control operator is used somewhere inside of react.

A little example

Suppose that we wanted to build a simple monitoring system. Workers “sign on”, and must send us a ping every 10 seconds. If they fail to do so, then a monitor will report this. Workers can sign off when finished. Here is the worker process:

sub MAIN($name) {
    my $queue = Stomp::Client.new(host => 'localhost', port => 61613);
    await $queue.connect();
    $queue.send('signon', $name);
    react {
        whenever Supply.interval(10, 5) {
            if <y y n>.pick eq 'y' {
                $queue.send('ping', $name);
            }
        }
    
        whenever signal(SIGINT) {
            $queue.send('signoff', $name);
            done;
        }
    }
}

We start it from the command line with a name. Every 10 seconds it sends a ping – or at least, two thirds of the time it will. And if we hit Ctrl+C the worker signs off.

Next, here’s the monitor:

my $queue = Stomp::Client.new(host => 'localhost', port => 61613);
await $queue.connect();
react {
    my %last-active;
 
    whenever $queue.subscribe('signon') -> $name {
        %last-active{$name} = now;
    }
  
    whenever $queue.subscribe('ping') -> $name {
        %last-active{$name} = now;
    }
  
    whenever $queue.subscribe('signoff') -> $name {
        %last-active{$name}:delete;
    }
 
    whenever Supply.interval(10) {
        for %last-active.kv -> $name, $last-ping {
            if now - $last-ping > 11 {
                say "No ping from $name";
            }
        }
    }
}

Note that the react block, like supply block, enforces that only a single thread at a time may be operating across the various whenever blocks, so the %last-active hash is safe. We have three subscriptions to queues, and every ten seconds look for missing pings. We give an extra second’s grace on those.

Notice how, by exposing the message queue in terms of supplies, we are able to effortlessly compose this asynchronous data with both signals and time. This is where the real power of a uniform interface to asynchronous data comes in. And, with a little syntactic support, we are no longer required to put our functional hat on to wield that power.

Day 13 – A new trait for old methods

use v6;

El_Che expressed the desire on IRC to have Perl 6 watch his fingers when typing named arguments. It is indeed quite easy to send a wrong name the right way. Luckily we can help him.

First we need some exception to throw. Perl 6 kindly creates a constructor for us that deals with filling all attributes with values, so we can provide details when we throw the exception.

class X::Parameter::ExtraNamed is Exception {
    has $.extra-parameters;
    has $.classname;
    has $.method-name;
    method message () {
        "The method $.method-name of $.classname is offended by the named parameter(s): $.extra-parameters";
    }
}

We want to modify a method (.new in this case) so we need a trait. A trait is a modification of the Perl 6 grammar and a subroutine. It’s name comes from the required named argument. The thing it operates on is stored in the first positional. In our case that’s a method.

multi sub trait_mod:<is>(Method $m, :$strict!){

We want to check named arguments against the list of arguments that are defined in the methods signature. We have to store them somewhere.

    my @named-params;

When we provide some error message we may want to name the class the method is member of.

    my $invocant-type;
    my $method-name = $m.name;

The signature of a method can provide us with a list of parameters.

    for $m.signature.params -> $p {

But first the classname.

        $invocant-type = $p.type if $p.invocant;

We are only interested in named arguments.

        next unless $p.named;

Each named argument starts with a sigil ($ in this case). That would get in our way, so we strip it from the method name and store the rest for safekeeping.

        @named-params.push: $p.name.substr(1);
    }

After we got all the information we need, we come to the business end of our trait. We wrap the method in a new method. We only want to peek into it’s argument list so we take a capture. Conveniently a capture provided us with a method to get a list of all named arguments it contains.

    $m.wrap: method (|args) {

We first check with the subset operator (<=) if there are any named arguments we don’t like and if so, we use the set difference operator (-) to name only those. The exception we defined earlier takes care of the rest.

        X::Parameter::ExtraNamed.new(
            method-name => $method-name,
            classname => $invocant-type.perl, 
            extra-parameters => args.hash.keys (-) @named-params
        ).throw unless args.hash.keys (<=) @named-params;

If all named arguments are fine, we can call the wrapped method and forward all arguments kept in the capture.

        callwith(self, |args);
    }
}

Let’s test it.

class Foo {
    has $.valid;

Perl 6 will deal with the positional part, our trait is strict will learn about :$valid.

    method new (Int $p1, :$valid) is strict { self.bless(valid => $valid) }
    method say () { say 'Foo' }
}

New instance is new and got 2 extra named arguments the method doesn’t know of.

Foo.new(42, valid => True, :invalid, :also-invalid).say;

OUTPUT:

<<The method new of Foo is offended by the named parameter(s): invalid also-invalid>>

Traits are a nice way to generalise behaviour of methods and subroutines. Combined with exceptionally good introspection Perl 6 can be mended and bended to fit into any Christmas package. Merry 1.0 and a happy new Perl 6 to you all!

Day 12 – I just felt a disturbance in the switch

So I said “I’m going to advent post about the spec change to given/when earlier this year, OK?”

And more than one person said “What spec change?”

And I said, “Exactly.”


We speak quite proudly about “whirlpool development” in Perl 6-land. Many forces push on a particular feature and force it to converge to an ideal point: specification, implementation, bugs, corner cases, actual real-world usage…

Perl 6 is not alone in this. By my count, the people behind the HTML specification have now realized at least twice that if you try to lead by specifying, you will find to your dismay that users do something different than you expected them to, and that in the resulting competition between reality and specification, reality wins out by virtue of being real.

I guess my point is: if you have a specification and a user community, often the right thing is to specify stuff that eventually ends up in implementations that the user community benefits from. But information can also come flowing back from actual real-world usage, and in some of those cases, it’s good to go adapting the spec.


Alright. What has happened to given/when since (say) some clown described them in an advent post six years ago?

To answer that question, first let me say that everything in that post is still true, and still works. (And not because I went back and changed it. I promise!)

There are two small changes, and they are both of the kind that enable new behavior, not restrict existing behavior.

First small change: we already knew (see the old post) that the switching behavior of when works not just in a given block, but in any “topicalizer” block such as a for loop or subroutine that takes $_ as a parameter.

given $answer {
    when "Atlantis" { say "that is CORRECT" }
    default { say "BZZZZZZZZZT!" }
}
for 1..100 {
    when * %% 15 { say "Fizzbuzz" }
    when * %% 3 { say "Fizz" }
    when * %% 5 { say "Buzz" }
    default { say $_ }
}
sub expand($_) {
    when "R" { say "Red Mars" }
    when "G" { say "Green Mars" }
    when "B" { say "Blue Mars" }
    default { die "Unknown contraction '$_'" }
}

But even subroutines that don’t take $_ as a parameter get their own lexical $_ to modify. So the rule is actually less about special topicalizer blocks and more about “is there something nice in $_ right now that I might want to switch on?”. We can even set $_ ourselves if we want.

sub seek-the-answer() {
    $_ = (^100).pick;
    when 42 { say "The answer!" }
    default { say "A number" }
}

In other words, we already knew that when (and default) were pretty separate from given. But this shows that the rift is bigger than previously thought. The switch statement logic is all in the when statements. In this light, given is just a handy topicalizer block for when we temporarily want to set $_ to something.

Second small change: you can nest when statements!

I didn’t see that one coming. I’m pretty sure I’ve never used it in the wild. But apparently people have! And yes, I can see it being very useful sometimes.

when * > 2 {
    when 4 { say 'four!' }
    default { say 'huge' }
}
default {
    say 'little'
}

You might remember that a when block has an implicit succeed statement at the end which makes the surrounding topicalizer block exit. (Meaning you don’t have to remember to break out of the switch manually, because it’s done for you by default.) If you want to override the succeed statement and continue past the when block, then you write an explicit proceed at the end of the when block. Fall-through, as it were, is opt-in.

given $verse-number {
    when * >= 12 { say "Twelve drummers drumming"; proceed }
    when * >= 11 { say "Eleven pipers piping"; proceed }
    # ...
    when * >= 5 { say "FIIIIIIVE GOLDEN RINGS!"; proceed }
    when * >= 4 { say "Four calling birds"; proceed }
    when * >= 3 { say "Three French hens"; proceed }
    when * >= 2 {
        say "Two turtle doves";
        say "and a partridge in a pear tree";
    }
    say "a partridge in a pear tree";
}

All that is still true, save for the word “topicalizer”, since we now realize that when blocks show up basically anywhere. The new rule is this: when makes the surrounding block exit. If you’re in a nested-when-type situation, “the surrounding block” is taken to be the innermost block that isn’t a when block. (So, usually a given block or similar.)


It’s nice to see spec changes happening due to how things are being used in practice. This particular spec change came about because jnthn tried to implement a patch in Rakudo to detect topicalizer blocks more strictly, and he came across userland cases such as the above two, and decided (correctly) to align the spec with those cases instead of the other way around.

The given/when spec has been very stable, and so it’s rare to see a change happen in that part of the synopses. I find the change to be an improvement, and even something of a simplification. I’m glad Perl 6 is the type of language that adapts the spec to the users instead of vice versa.

Just felt like sharing this. Enjoy your Advent.

Day 11 – The Source will be with you, always

Reportings from a Learnathon

This past weekend I had the pleasure of hosting a Perl 6 learnathon with a friend who contacted me specifically to have a night of absorbing this new version of Perl. I thought it might be interesting to share some of what we learned during the process. I will begin with by explaining the single line of code which ended up and then show you some examples of where our evening took us.

Pick Whatever, rotor by five

As we opened our laptops, I was excited to show the newest example I had pushed to my Terminal::Print project. It’s taken quite some time to achieve, but asynchronous printing is now working with this module. It’s not fast, yet, but my long saught multi-threaded “Matrix-ish” example is working. Each column is being printed from an individual thread. This approach of spinning off a bunch of new threads and then throwing them away is not efficient, but as a proof of concept I find it very exciting.

pick-rotor

This line contains a few things that inspired questions. The first is the precise meaning of pick(*), which here means that we want to produce a randomized version of @columns. You can think of the Whatever here as a meaning “as many as possible”. It triggers a different multi method code path which knows to use the invoking objects’s own array size as the length of the list to pick.

The second part to explain was rotor. This is another one of those Perl 6 English-isms that at first felt quite strange but quickly became a favorite as I began visualizing a huge turbine rotor-ing all of my lists into whatever shape I desire whenever I use it. In this case, I want to assemble a list of 5-element arrays from a random version of @columns. By default, rotor will only give you back fully formed 5-element arrays, but by passing the :partial we trigger the multi-method form that will include a non 5-element array if the length of @columns is not divisible by 5 (‘Divisibility’ is easily queried Perl 6, by the way. We phrase it as $n %% 5.)

Put another way: rotor is a list-of-lists generator, taking one list and chopping it up into chunks according to your specifications. My friend mentioned that this resembles a question that he asks in interviews, inviting an investigation into the underlying implementation.

I’ve always considered Rakudo’s Perl 6-implemented-in-Perl 6 approach as a secret weapon that often goes overlooked in assessment of Perl 6’s viability. Even with the current reliance on NQP, there is a great deal of Perl 6 which is implemented in the language itself. To this end, I could immediately open src/core/Any.pm in an editor and continue explaining the language by reading the implementation of the language. Not just explaining the implementation, which can be accomplished by the right combination of language/implementation language and polyglot coverage. I mean explaining the language by looking at how it is used to implement itself, by following different threads from questions that arise as a result of looking at that code.

A word of caution

Now, I don’t mean to imply that one’s initial exposure to core can’t be a shocking experience. You are looking at Perl 6 code that is written according to constraints that don’t exist in perl6 which arise from not being fully bootstrapped and performance concerns. These are expressed in core by NQP and relative placement in the compilation process, on the one hand, and in prototyping and hoop jumping, on the other.

In other words: you are not expected to write code like this and core code does not represent an accurate representation of what Perl 6 code looks like in practice. It is 100% really Perl 6 code, though,  and if you look at NQP stuff as funky library calls, everything you are seeing is possible in your own code. You just don’t normally need to.

Down the rabbit hole

From src/core/Any.pm, here is the code for rotor:

any-rotor.png

And the code for pick:

pick-any

These are not the actual implementations, mind you. Those live in List.pm. But already these signatures inspire some questions

What is this |c thing in the signature?

This represents a Capture, which is an object representing the arguments passed into the method call. In this case it is being used such that no unpacking/containerization occurs. Rather we simply accept what we have been given and pass them as a single parameter to self.list.rotor without looking at them.

In the proto signature for pick we see that there is no name for the Capture, but rather a bare ‘|‘ which tells the method dispatcher that there can be zero or more arguments. Put another way: there are no mandatory arguments that apply to all the pick candidates.

What is this proto thing?

The answer is that it is a clarification that you can apply to your multi methods that constrains them to a specific “shape”.  It is commonly used in conjunction with a Capture, and in fact we see this in our pick example.

As the docs put it, a proto is ‘a way to formally declare commonalities between multi candidates’. The prototype for pick specifies that there are no mandatory arguments, but there might be. This is basically the default behavior for multi method dispatch, but here it allows us to specify the is nodal trait, which provides further information to the dispatcher relating to the list-y. Also due to bootstrapping issues, all multi methods built into Rakudo need an explicit proto.In practice we do not need either of these for our code. But they are there when you need them. One example of a trait that you might use regularly is the pure trait: when a given input has a guaranteed output, you can specify that it is pure and the dispatcher can return results for cached arguments without repeated calculations.

Midnight Golf

As promised, here are a few code examples from the learnathon.

stateful.pngstateful-results.png

This is using anonymous state variables, which are available to subroutines and methods but not to other Callables like blocks.  My friend shared my marvel at the power and flexibility of the Perl 6 parser to be able to handle statements like $--*5, when every character besides the 5 in that statement has a different meaning according to context. Meanwhile Perl 6 gives you defaults for your subroutine parameters by using the assignment operator.

Note that each bare $ in a subroutine creates/addresses a new, individual state variable. Some people will hate this, as they hate much that I appreciate about the language. These anonymous state variables are for situations where a name doesn’t matter, such as golfing on the command line. They can be confusing to get a full grasp of, though.

Here is another example we generated while exploring (and failing many times) to grasp the nuances of these dynamics.

more-state.pngmore-state-results.png

Gone is the anonymous state variable. This is by necessity, because you can only refer to an anonymous state variable once in a subroutine. We’ve switched $a to be optional. The parens around the state declaration are necessary because trying to declare a variable in terms of itself is undefined behavior and Perl 6 knows that.

The same thing, expressed in slight variations but with the same meaning:

more-more-state.png

The bottom example shows that the assignment to zero in our initial more-state is actually unnecessary. The middle shows creating a variable and binding it to a state var, which is a rougher way to get the same behavior as openly declaring a state variable. The top example shows what might be considered the ‘politely verbose’ option.

Concluding thoughts

I was hoping to share more from the evening with you, but it’s been a lot of words already and we’ve only scratched the surface of the first example we examined! Instead, I recommend that you spend some time with src/core in an editor,  the perl6 repl in a terminal, and #perl6 in an IRC client (or web browser) and just explore.

This language is deep. Opening src/core is like diving into the deep end of the ocean from the top of a Star Destroyer. The reward for your exploration of the language, however, is an increasing mastery of a programming language that is designed according to a human-centric philosophy and which approaches a level of expressivity that, some would argue, is unparalleled in the world of computing.

 

Day 10 – Perl 6 Pod

Whenever you write code, you also write documentation, right? Wait, don’t answer that! Let me just pretend that you said “yes”.

As with Perl 5, Perl 6 has a built-in documentation format called Pod. In Perl 5, POD was short for “Plain Ol’ Documentation”, but in Perl 6 we just call it Pod and there’s no acronym. One of the great things about Pod in Perl 6 is that the compiler parses Pod for you. The resulting structure is essentially a tree (really, an Array of trees).

Let’s take a look at a few examples …

use v6;

=begin pod

=head1 NAME

example1.p6 - It's a script to show you how Pod works in Perl 6!

=end pod

sub MAIN {
    say $=pod.WHAT;
    say $=pod[0].WHAT;
}

When I run this script I get:

$ perl6 ./bin/example1.p6 
(Array)
(Named)

The $=pod variable is a reference to the parsed Pod from the current file.

From the code above we can see that the $=pod variable contains an Array. Each element of that array is a Pod node of some sort, and those nodes in turn will usually contain more Pod nodes, and so on. Essentially, Pod is parsed into a tree that you can recursively descend and examine.

The $=pod variable is an array is because you can have several distinct chunks of Pod in a single file …

use v6;

=begin pod

=head1 NAME

example2.p6 - It's a script to show you how Pod works in Perl 6!

=end pod

sub MAIN {
    say $=pod.elems;
    say $=pod[0].WHAT;
    say $=pod[1].WHAT;
}

=begin pod

=head1 DESCRIPTION

This is where I'd put some more stuff about the script if it did anything.

=end pod

When we run this script we get:

$ perl6 ./bin/example2.p6 
2
(Named)
(Named)

So there are now two elements in $=pod, one for each chunk of Pod delimited by our =begin pod/=end pod pairs.

Pod Syntax in Perl 6

Up until this point I’ve shown you some Pod without actually explaining what it is. Let’s take a look at some valid Pod constructs in Perl 6 so you can get a better sense of how it looks, both when you write it and when you’re writing code to process it.

Perl 6 Pod is purely semantic. In other words, you describe what something is, rather than how to present it. It’s left up to code that turns Pod into something else (HTML, Markdown, man pages) to decide how to present any given construct.

Perl 6 Pod has quite a bit more to it than we can cover here, so see Synopsis 26 for all the details. We’ll cover some of the highlights and then look more closely at how to process Pod programmatically.

Blocks

Pod supports several different ways to declare a block. You can use =begin and =end markers …

=begin head1
Heading Goes Here
=end head1

Or you can use a =for marker …

=for head1
Heading Goes Here

These block styles allow you to include configuration information:

=begin head3 :include-in-toc
This will show up in TOC
=end head3

=for head4 :!include-in-toc
Not in the TOC

With a =for block, only the lines immediately following the =for is part of that block. Once there is an empty line, anything that follows is separate.

You can use an “abbreviated” block …

=head1 Heading Goes Here

… but abbreviated blocks don’t allow for configuration information. Like =for blocks, abbreviated blocks end at the first empty line.

There are many types of blocks, including table, code, I/O blocks (input and output), lists, and more.

Paragraphs

Text that isn’t explicitly part of a block is considered to be a paragraph block …

use v6;

=begin pod

=begin head1

NAME

=end head1

This is a paragraph block.

=end pod

sub MAIN {
    say $=pod[0].contents[1].WHAT;
}

The output will be (Para), because the second element of contents is a plain paragraph.

Lists

Lists are much simpler in Perl 6 than they were in Perl 5 …

=begin pod

=item1 The first item
=item1 The second item
=item2 This is a sub-item of the second item
=item1 The third item

Of course, you can also use the =for and =begin/=end form with list items too.

Formatting Codes

There are a variety of formatting codes available …

See L<http://design.perl6.org/S26.html> for details.

Some times you need to mark some text as C<code>,
B<being the basis of the sentence>,
or I<important>.

And More

There are a lot of new things with Perl 6 Pod, and if you want to learn more I encourage you to read Synopsis 26 for all of the gory details.

Processing Pod with Perl 6

As I said before, the Perl 6 compiler parses Pod in your program and makes it available to you via various variables. We already saw the use of $=pod, which contains all of the Pod in the current file. That pod is an array of Pod::Block objects. In future versions of Rakudo, you will also be able to get at specific types of Pod blocks by name with variables such as $=head1 or $=TITLE, but this is not yet implemented.

Let’s write a simple program to show us an outline of the Pod in a given file. For example, given some Pod like this …

=head1 NAME

...

=head1 DESCRIPTION

...

=head1 METHODS

...

=METHOD $obj.foo

...

=METHOD $obj.bar

...

 

… we want to print this output …

NAME
DESCRIPTION
METHODS
  $obj.foo (METHOD)
  $obj.bar (METHOD)

Here’s the code to do just that …

use v6;

sub MAIN {
    for $=pod.kv -> $i, $block {
        say "Block $i";
        recurse-blocks($block);
        print "\n";
    }
}

sub recurse-blocks (Pod::Block $block) {
    given $block {
        when Pod::Block::Named {
            if $block.name eq 'pod' {
                for $block.contents.values -> $b {
                    recurse-blocks($b);
                }
            }
            else {
                my $output = $block.contents[0].contents[0]
                    ~ " ({$block.name})";
                depth-say( 2, $output );
            }
        }
                
        when Pod::Heading {
            depth-say( $block.level,
                       $block.contents[0].contents[0] );
        }
    }
}

sub depth-say (Int $depth, Str $thing) {
    print '  ' x $depth;
    say $thing;
}

 

This code takes some shortcuts wherever it uses $block.contents[0].contents[0] by making the very big assumption that a block contains one element which is a single Pod::Block::Para. It then assumes that the Para block contains one element which is plain text.

In practice, blocks may contain arbitrarily nested blocks because of things like formatting codes, but this gives you an idea of how you might walk through the Pod tree.

Of course, because Perl 6 is still Perl, there’s a module for that! I’ve sketched out a preliminary module called Pod::TreeWalker that will do the tree walking for you, generating events as it finds different nodes. You provide a listener object that is called for each event.

GitHub Repo

All of the code shown in these examples is on GitHub. I’ve also written an additional script that uses Pod::TreeWalker to implement the outlining I demonstrate above.

Day 9 – Perl 6 and the wolf pack

This year I rejoin with Perl6 once again and found out that this Christmas is THE Christmas.

With the announcement of Larry Wall and the imminent liberation of a 1.0 version of Rakudo I got really excited and right away started a couple of side projects using it.

And just when I was thinking that Perl6 couldn’t get any cooler I found the Jonathan Worthington, YAPC::Asia conference: «Parallelism, Concurrency, and Asynchrony in Perl 6».

Perl6 parallelism realy impressed me so implemented an  adaptation of the Grey wolf optimizer created by Mirjalili et all in 2014.

This algorithm (by wikipedia’s definition) is a meta heuristic that mimics the social and hunting behavior of the wolf packs to search for “good enough” solutions in a wide range of problems.

The pseudo-code here:

pseudocodigo-gwo

And from my implementation the interesting parallel bits:

uno

Initially I used a loop block here (I like them more), but that caused that all the wolves end up with a random value instead of keeping the passed $wolf_number value.

After a couple hours and a lot of questions on the #Perl6 IRC channel (wonderful, patient people by the way) I found the problem: the $wolf_number variable disappear with the end of the loop block leaving the fitness_libsvm with nothing.

After that is just await and vòila! process the results.

dos

Never ever have I seen a simpler more elegant code for parallelism than the one produced by Perl6… I just love it!

Postdata: The full implementation is available at https://github.com/Sufrostico/perl6-gwo

Day 8 – Grammars generating grammars

By now you have probably gotten used to the prefix “meta” appearing here and there in the Perl 6 world. Metaclasses, metaobjects, metaoperators, the mythical Meta-Object Protocol. Sounds like nothing scary, all good and familiar and you’ve seen it all, eh? Nothing further from the truth! Today, on the Perl 6 Advent Calendar we’re going full meta. We’re going to have grammars that parse grammars then generate grammars that are going to use to parse grammars. I’ll let this sink in for a moment while you scroll down for the next paragraph.

Grammars are hands-down one of the killer features of Perl 6. Taking Perl’s no less than legendary ability to process text and taking it to the next level. Regular expressions, to many people’s disappointment are just what they say they are: regular (well, regular regular expressions do. Perl regexes are a bit of a different beast, but that’s a material for another story). They parse regular, as opposed to, say, context-free languages, to the neverending disappointment of all the people who would just love to parse XML with them, like in that everyone’s favourite Stackoverflow answer. Say no more to silly theoretical limitations of language theory though, since now we have grammars which are all we ever missed in regexes: readability, composability and, of course, ability to parse even Perl 6 itself — and if that doesn’t sell its absolute power, I don’t know what does.

Writing parsers for predefined grammars (say, in Bachus-Naur Form) was always a bit of a dull job, almost as exciting as copypasting stuff. If you ever sat down and wrote a parser from scratch (perhaps while going through the excellent Let’s Build a Compiler book), you probably recognize the pattern all too well: take a single rule from your grammar, write a subroutine for it, have it call (perhaps recursively) other subroutines similarly defined for other grammar rules, rinse, repeat. Well say no more to that now that we have Perl 6 grammars! In this wonderful new world we need not write subroutines for every token to get the work done. Now we write a grammar class, where we put tokens, rules and regexes for every symbol in our grammar, and inside them we write regexes (or code) that refer to (perhaps recursively) other symbols in our Perl 6 grammar. Now, if you ever went through both of those alternatives, you will definitely realize how massive of a convenience the grammars are in Perl 6. Instead of painstakingly cranking out repetitive and error-prone code we have a wonderful, declarative way to specify our language, with an impressive collection of utility to get most of our common, boring work out of the way.

But what if we already have a grammar, specified perhaps in the previously mentioned BNF? What we do then is carefully retype the existing grammar (parsing it in our head, actually) into our new, shiny Perl 6 grammar that represents the exact same thing, but has the clear advantage of actually being executable code. A fair deal, you could say. For most people, no big deal at all. We are not most people. We are programmers. We have the resources. The will. To make these grammars count! Our job revolves around building things that will do the work for us so we don’t have to. To take the well-defined specifications and turn them into tools that work for us, or for themselves. Tools that take the repetitive and automatable parts of our work the way. Why should building parsers be any different? Why, I’m glad you asked.

The wonderful thing about the Perl 6 grammars is that they are no more magical than any other element of the language. Just as classes are first class citizen that we can introspect, augment and build programatically, so are grammars. In fact, you can look at the source code of the compiler itself and notice that grammars are nothing else than a specialized kind of classes. They follow the same rules as classes do, allowing us to create them on the fly, add tokens to them on the fly and eventually finalize them in order to have a proper, instantiatable class object. So now that we can parse BNF grammars (since they’re just ordinary text) and create Perl 6 grammars from code, let’s put those pieces together and write us something that will save us the effort of converting a BNF grammar to Perl 6 grammar manually.

The grammar for a BNF grammar

grammar Grammar::BNF {
    token TOP { \s* <rule>+ \s* }

    token rule {
        <opt-ws> '<' <rule-name> '>' <opt-ws> '::=' <opt-ws> <expression> <line-end>
    }

    token expression {
        <list-of-terms> +% [\s* '|' <opt-ws>]
    }

    token term {
        <literal> | '<' <rule-name> '>'
    }

    token list-of-terms { <term> +% <opt-ws> }

    token rule-name { <-[>]>+ }

    token opt-ws { \h* }

    token line-end { [ <opt-ws> \n ]+ }

    token literal { '"' <-["]>* '"' | "'" <-[']>* "'" }

    ...
}

The interesting stuff happens in the three, almost-topmost tokens. rule is the core building block of a BNF grammar: a <symbol> ::= <expression> pair, followed by a new line. The entire grammar is no more than a list of those. Each expression is a list of terms, or possibly and alternative of them. Each term is either a literal, or a symbol name surrounded by angle brackets. Easy enough! That covers the parsing part. Let’s look at the generating itself. We do have a built-in mechanism of “doing stuff for each token in the grammar”, in the form of Actions, so let’s go ahead and use that:

my class Actions {
    has $.name = 'BNFGrammar';
    method TOP($/) {
        my $grmr := Metamodel::GrammarHOW.new_type(:$.name);
        $grmr.^add_method('TOP',
            EVAL 'token { <' ~ $<rule>[0].ast.key ~ '> }');
        for $<rule>.map(*.ast) -> $rule {
            $grmr.^add_method($rule.key, $rule.value);
        }
        $grmr.^compose;
        make $grmr;
    }

    method expression($/) {
        make EVAL 'token { ' ~ ~$/ ~ ' }';
    }

    method rule($/) {
        make ~$<rule-name> => $<expression>.ast;
    }
}

The TOP method is definitely the most magical and scary, so let’s tackle that first to make the rest of the stuff look trivial in comparison. Basically, three things happen there:

1. We create a new grammar, as a new Perl 6 type
2. We add tokens to it using the `^add_method` method
3. We finalize the grammar using the `^compose` method

While Perl 6 specifies that the token named TOP is where the parsing starts, in BNF the first rule is always the starting point. To adapt one to the other, we craft a phony TOP token which just calls the first rule specfied in the BNF grammar. Unavoidably, the scary and discouraging EVAL catches our attention, as if it said “horrible stuff happens here!” It’s not entirely wrong when it says that, but since we have no other way of programmatically constructing the individual regexes (that I know of), we’ll have to accept this little discomfort in the name of the Greater Good, and look a little bit closer at what we’re actually EVALing in a moment.

After TOP we proceed to add the rest of the BNF rules to our grammar, this time preserving their original names, then ^compose() the whole thing and finally make it the result of the parsing: a readymade parser class.

In the expression method we glue the parsed BNF elements together in order to produce valid Perl 6 code. This turns out to be pretty easy, since both separate symbols with whitespace, alternate them with the pipe character and sorround symbol names with angle brackets. So for a rule that looks like this:

<foo> ::= 'bar' | <baz>

The Perl 6 code that we EVAL turns becomes:

token { 'bar' | <baz> }

Since we already validated in the grammar part of our code that the BNF we parse is correct, there’s nothing stopping us from literally pasting the entire expression into our code and wrap it in a token { }, so let’s go ahead and do just that.

Last but not least, for each BNF rule we parse we produce a nice Pair, so our TOP method has an easier times processing each of them.

Seems like we’re about done here, but just for the users’ convenience, let’s write a nice method that takes a BNF grammar and produces a ready to use type object for us. As we remember, grammars are just classes, so there’s nothing stopping us from just adding it straight to our grammar:

grammar Grammar::BNF {
    ...

    method generate($source, :$name = 'BNFGrammar') {
        my $actions = Actions.new(:$name);
        my $ret = self.new.parse($source, :$actions).ast;
        return $ret.WHAT;
    }
}

Looks good from here! Before you start copypasting all this into your own projects, remember that Grammar::BNF is a Perl 6 module available in your Perl 6 Module Ecosystem, installable with your favourite module manager. It also ships with some goodies like a Perl 6 slang, allowing you to write BNF grammars straight in your Perl 6 code (as opposed to including them as strings), or parsing the more powerful (and perhaps more widespread) ABNF grammars as well.

If you indeed took the time for the beginning of this post to sink in, you may remember that I promised that we’re going to have grammars (a-one) that parse grammars (a-two) then generate grammars (a-three) that are going to use to parse grammars (a-four). So far we’ve seen the BNF::Grammar grammar (that’s our a-one), that parses a BNF grammar (that’s our a-two), generates a Perl 6 grammar in a form of a type object (that’s a-three) and… that’s it. We’re still missing the last part, using this whole thing to parse grammars. We’ve only gone 75%-meta, and that is just not good enough in this day and age. Why stop now? Why not take a BNF grammar of a BNF grammar, parse that with a Perl 6 grammar and use the resulting Perl 6 BNF grammar to parse our original BNF grammar of a BNF grammar? Wouldn’t that be sweet? Sure it will! That is, however, left as an exercise for you, my dear readers. After all, how fair would it be if I had all the fun while you just sit there and watch? Would you like it if you opened the little paper window in your advent calendar only to find a note saying. „There was a chocolate here for you. I ate it. It was truly delivious?” Me neither! There’s a BNF grammar for BNF both on Wikipedia and in Grammar::BNF’s very own test suite; the latter even includes a little breadcrumb that can help you with your adventure. I eagerly await thy results, and as always, thank you all for reading and I wish you a wonderful advent!

Day 7 — Unicode, Perl 6, and You

Quick (rhetorical) question: how many of you either try your best to ignore Unicode, or groan at the thought of having to deal with it again?

It’s fair, after all, considering Unicode is big. Really big. (You may think it’s a long walk down the ASCII table, but that’s peanuts compared to space Unicode.) It certainly doesn’t help that many languages, particularly older ones, don’t help you, the average programmer, work with it all that well. Either they don’t deal with encoding standards at all, meaning some familiarity is mandatory, or certain other languages claim to support it but really just balk once you get past the BMP (the codepoints that can fit in a 16-bit number).

Perl 6, as you might guess, does handle Unicode well. It’s actually necessary to go about this day in a twofold manner: half of the story is how to process Unicode text, and half is how to use Unicode syntax. Let’s start with the one more likely to be of concern when actually programming, that of…

How do I Handle Unicode Text?

No matter your level of experience in handling Unicode (or anything involving different encodings), you’ll be pleased to learn that in Perl 6, it goes just about the way you’d expect.

Perl 6’s strings are interesting in that they by default work on the notion of graphemes — a collection of codepoints that look like a distinct thing; what you’d call a “character” if you didn’t know better. Not every distinct “character” you could come up with has its own codepoint in the standard, so usually handling visual elements naturally can be quite painful.

However, Perl 6 does this work for you, keeping track of these collections of codepoints internally, so that you just have to think in terms of what you would see the characters as. If you’ve ever had to dance around with substring operations to make sure you didn’t split between a letter and a diacritic, this will be your happiest day in programming.

As an example, here’s a devanagari syllable in a string. The .codes method returns the number of codepoints in the string, while .chars returns the number of characters (aka graphemes):

say "नि".codes;    # returns  2
say "नि".chars;    # returns  1

Even though there isn’t a singular assigned codepoint for this syllable, Perl 6 still treats it as one character, suiting any purpose that doesn’t involve messing with the text at a lower level.

That’s cool, but does it matter much to me, a simple English-speaking programmer who’s never had to deal with other languages or scripts?, I can imagine some of you thinking. And the answer is yes, because regardless of your background, there is most definitely one grapheme you’ve encountered before:

say "\r\n".chars;    # returns 1

Yep, the Windows end-of-line sequence is explicitly counted by Unicode’s “extended grapheme cluster” definition as one grapheme.

And of course it’s not just looks, that’s how operations on strings work:

say "नि\r\n".substr(1,1).perl    # returns "\r\n"

Of course, that’s all just for the default Str type. If you don’t want to work at a grapheme level, then you have several other string types to choose from: If you’re interested in working within a particular normalization, there’s the self-explanatory types of NFC, NFD, NFKC, and NFKD. If you just want to work with codepoints and not bother with normalization, there’s the Uni string type (which may be most appropriate in cases where you don’t want the NFC normalization that comes with normal Str, and keep text as-is). And if you want to work at the binary level, well, there’s always the Blob family of types :) .

We also have several methods that let you examine the various bits of Unicode info associated with characters:

say "a".uniname;                # get name of first Unicode character in string.
say "\r\nhello!".ord            # get number of first codepoint
                                # (*not* grapheme) in string
say "\r\nhello!".ords           # get numbers of all codepoints
say "0".uniprop("Numeric_Type") # get associated property

And so on :) . Note that the ord/ords part shows you that you’ll really never get the internal numbers used to keep track of graphemes. When ord sees a grapheme cluster, it just returns the codepoint number for the first codepoint of that cluster.

Not Just Strings

Of course, our Unicode support wouldn’t be complete without regex support! Of particular note is the ability to match based on properties, so for example

/ <:Alpha>+ /

will match multiple alphabetic characters (<alpha> will do almost the same thing, just with the addition of matching underscore), and

/ '0x' <:Nv(0..9) + :Hex_Digit>+ | '0b' <:Nv(0..1)>+ /

is a regex that lets you match against either hexadecimal numbers or binary ones, in a Unicode-friendly way. And if you wanted to write the Unicode standard’s “extended grapheme cluster” pattern in regexes (the same pattern we use to determine grapheme handling mentioned earlier):

grammar EGC {
    token Hangul-Syllable {
        || <:GCB<L>>* <:GCB<V>>+ <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LV>> <:GCB<V>>* <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LVT>> <:GCB<T>>*
        || <:GCB<L>>+
        || <:GCB<T>>+
    }

    token TOP {
        || <:GCB<CR>> <:GCB<LF>>
        || <:GCB<PP>>*
           [
           || <:GCB<RI>>
           || <.Hangul-Syllable>
           || <!:GCB<Control>>
           ]
           [
           || <:Grapheme_Extend>
           || <:GCB<Spacing_Mark>>
           ]*
        || .
    }
}

A bit wordy, but just imagine how much more painful that would be without built-in Unicode support in your regexes!

And aside from all the programming-related stuff, there’s also…

Using Unicode to Write Perl 6

In part of our tireless support of Unicode, we also parse your source code with the same regex engine you just saw demonstrated above (though the Perl 6 parser doesn’t need to bother with Unicode properties nearly that often). This means we’re able to support syntax using Unicode in Perl 6, and have been taking advantage of it for a long time now. Observe:

say 0 ∈ «42 -5 1».map(&log ∘ &abs);
say 0.1e0 + 0.2e0 ≅ 0.3e0;
say 「There is no \escape in here!」

Just a small sampling of the Unicode built-in to Perl 6 by default. Featuring interpolating quote-words lists, setops, function composition, and approximate equality. Oh, and the delimiters for the most basic level of string quoting.

Don’t worry though, standard Perl 6 does not demand that you be able to type Unicode. If you can’t, there are so-called “Texas” variants:

say 0 (elem) <<42 -5 1>>.map(&log o &abs);
say 0.1e0 + 0.2e0 =~= 0.3e0;
say Q[[[There is no \escape in here!]]]

This is fine of course, but if it’s feasible for you to set up Unicode support, I heartily recommend it. Here’s a short list on various ways to do it:

  • Get an awesome text editor — The more featureful text editors (such as emacs or vim, to name a couple) will have functionality in place to insert arbitrary characters. Go look it up in your editor’s documentation, and consider petitioning if it doesn’t support Unicode entry :) .
  • Use your OS’s hex input — Some systems, such as Windows or applications using GTK, support key shortcuts to let you type the hexadecimal codepoint numbers for characters. You’ll have to memorize codepoints, but chances are you’d get used to it eventually.
  • Set up your keyboard’s third/fourth/etc. levels — If your system supports it, you can enable third/fourth level modifiers and so on for your keyboard to access those levels (if you don’t know what those are, your ‘Shift’ key counts as a second-level modifier, and the characters it lets you type are considered on the second level, as an example). Depending on the amount of time and/or patience you have you could even customize those extra levels.
  • (X11) Set up your Compose key — This is the method I myself use, and it involves setting up a key to use as the “Compose key” or “Multi key”, and use of a file in ~/.XCompose (or some other place, as long as you configure it) to set up key combos. The Compose key works by letting you type any configured sequence of keys after pressing the Compose key, which will insert the character(s) of your choice.
    • Which key you sacrifice of course depends on which keys you don’t make use of; it could be the caps lock, or one of those extra Shift/Alt/Ctrl keys. It can even be that useless Menu key, which you probably just remembered was on your keyboard :P .
    • An absolutely wonderful starting .XCompose can be found in this github repository. You’ll still want to add combinations to this for some Perl 6, and perhaps do other tinkering with it¹, but it’s still quite a lot better than having to start from scratch :) .

In Conclusion

This of course isn’t an exhaustive coverage of all that Perl 6 has to offer Unicode, but the underlying takeaway is that Perl 6 makes handling Unicode much nicer than other languages do (at least out of the box).

Bonus! Partly in the spirit of Christmastime, and partly in the spirit of “I love this, and what better time to share it?”, allow me to present for your historical interest Perl 6’s legendary “snowman comet” bug:

say "abc" ~~ m☃.(.).☄  # this used to work. Really.

Basically this old old old old bug that (sadly) doesn’t exist anymore was about the regex part of the parser messing up a bit and interpreting ☃☄ as just as valid a pair of brackets as () or ⦃⦄.

Is there a relevant lesson in this bug? Nope. Is it only vaguely connected to a winter blog post on Unicode? You bet. It’s just that it’s thanks to Unicode support we were able to get that kind of bug way back in 2009, and it’s thanks to Unicode support (among other things) that would let someone re-implement this as a slang or something ☺ .

So go forth confident in your newfound ability to handle international text with much greater ease than you’re perhaps used to, and spend more time building ☃☃☃☃ the rest of this month.

Have the appropriate amount of fun! ❄

¹Psst! Use the texas variants for your compose combos if you’re stuck on coming up with them, e.g. <Multi_key> <equal> <asciitilde> <equal> for

Day 6 – On Opening Files and Contributing to Open Source Projects

Why do people contribute to open source projects? Some do it for fun, some for fame and some for fortune by actually getting paid to do such work. However, probably the most important factor is scratching your own itch: You want something done, and make it happen.

That was the position I found myself in earlier this year: I wanted to use Perl 6 to read a binary file and patch it in-place. According to the documentation, I should have been able to do so, as Rakudo claimed to support opening files in the modes read-only :r, write-only :w, append :a and read-write :rw, and I assumed the last one would do the trick.

Not so: The code silently adjusted the flag :rw to :w, and I was left hanging.

After a bit of bikeshedding, a design emerged that I was happy with, and as it was merged upstream a month and a half later, others seemed to agree. But while I added an extensive commit message to lay out the new system in its glory, slacker that I am, it was left for others to perfom the tedious tasks of writing tests and documentation.

As you might expect, that hope never materialized, and the extended open modes I introduced became something of an Easter egg: Undocumented, untested and only discoverable by reading the commit log or the code itself. I may very well be the only person who has actually made use of them.

Fast-forward to December: Here we are, Christmas draws near, and with it the first proper release of Perl 6. As a sometimes-good on-and-off-again citizen of the Perl 6 community, I decided to put on my big boy pants and do the Right Thing: Some tests have now been written (which makes the new open modes an official part of the 6.c language release), and the documentation will be updated soon-ish to conform to what is actually implemented instead of what had been planned to be implemented eventually.

Having said all that, let us now take a look at what the fuss is all about:

# not the actual signature as &open delegates to IO::Handle.open
sub open(
    IO() $path,
    :$mode, :$create, :$append, :$truncate, :$exclusive,
    :$r, :$w, :$x, :$a, :$rw, :$rx, :$ra, :$update,
    :$bin, :$enc = 'utf8',
    :$nl-in = ["\x0A", "\r\n"], :$nl-out = "\n", :$chomp = True
--> IO::Handle)

Isn’t it a thing of beauty? Perhaps not, but it’s also not as scary as it looks:

The only required argument is the $path, which must be something we can call .IO on. The common example would be a string that holds the name of a file you want to access.

One design goal behind the additional open modes was to support all the features of fopen(3) as specified by the C 11 standard library. To keep things sane, inspiration was drawn from open(2) and the flags specified by POSIX.

The first row of named arguments lists the POSIX-inspired ones. Here, :$mode may take the values 'ro', 'wo' and 'rw', the rest are boolean flags. If I did my job well, you will rarely need to use them – instead, the single- and double-letter variants listed on the next line should suffice.

Read-only mode :r is the default. The three write-only modes are :w, which truncates the file if it aready exists (and is the only shorthand mode that does so implicitly), exclusive mode :x, which fails if the file already exists (and thus can be used to implement a poor man’s locking mechanism), and append mode :a, which adds content at the end of a file.

If you want to read and write to a file at the same time (as I did), you may combine the flag :r with any of :w, :x or :a. For convenience, this may also be spelled :rw, :rx, :ra. Note that the effect of providing :r,:w (or its alias :rw) is not a combination of the effects of :r and :w – existing files will not be truncated as one might expect, which is a deliberate departure from the pattern.

Aside from :r, all shorthand modes will implicitly create the file if it doesn’t already exist. In contrast, the fopen(3) mode "r+" (known to Perl 5 programmers as "+<") does not. In Perl 6, this mode is now known as :update, corresponding to the low-level :mode<rw>. In contrast, :rw maps to :mode<rw>,:create, which does not have a direct equivalent in either C 11 or Perl 5.

The :bin flag and :$enc argument control whether the file should be opened in binary mode or text mode with given encoding. By default, files are assumed to be text files encoded as UTF-8. This is different from what the documentation tells you (there has never been autodetection of Unicode encodings as far as I’m aware), and binary files also do not return buffers instead of strings when processing the file line-by-line. I’ve raised that issue with The Man (translation: talked on IRC about it) and a pull request has been sent. We’ll have to wait and see what develops on that front in the remaining weeks leading up to the party.

Finally, the last line of named arguments lists those that control line-based access in general and the behaviour of the .get and .lines methods in particular. The :$nl-in argument allows you to provide a list of strings that should be considered line separators, the :$nl-out argument controls what gets written to disk when you request a newline, eg when using the methods .say, .put or .print-nl. If the final argument :$chomp is set to True (which is the default), line separators will be discarded instead of being included at the end of the strings returned by .get or .lines. As a side note, if you find chomp => False too much of a burden to type, Perl 6 supports the shorthand notation :!chomp.

For those of you still with me: Congratulations, you now know how to open files in Perl 6, and all that remains for me to say to you is this:

Have a happy St.Nicholas Day and a fun time playing with Rakudo!

PS: As to what you can do with a file handle once it has been opened, I’ll leave that as an exercise to the reader. But note that while Perl wants to make hard things possible, it also tries to keep easy things easy – which means that you can use interfaces like slurp, spurt and lines (or their method forms .IO.slurp, .IO.spurt, .IO.lines) without having to manually open (and close!) anything.