Day 04 — Heredocs, Theredocs, Everywheredocs docs

December 4, 2013 by

So let’s say you’ve got a bit of documentation to print out, a help statement perhaps. You could use an ordinary string, but it always looks like something you really shouldn’t be doing.

    sub USAGE {
        say "foobar Usage:
    ./foobar <args> <file>

    Options:

    ...
    ";
    }

Perl 6 has a much better idea for you, fortunately: heredocs! They work a bit differently from Perl 5, and are now invoked using the adverb :heredoc on quoting constructs:

    say q:heredoc/END/;
    Hello world!
    END

When you use :heredoc, the contents of the string are no longer the final contents; they become the string that signifies the end of a heredoc. q"END" results in the string "END", q:heredoc"END" results in everything before the next END to appear on its own line.

You will have also noticed that heredocs only start on the next possible line for them to start, not immediately after the construct closes. That semicolon after the construct never gets picked up as part of a heredoc, don’t worry :) .

The :heredoc adverb is nice, but it seems a bit long, doesn’t it? Luckily it has a short form, :to, which is much more commonly used. So that’s what we’ll be using through the rest of the post.

    say q:to"FIN";
    Hello again.
    FIN

You can use any sort of string for the delimiter, so long as there’s no leading whitespace in it. A null delimiter (q:to//) is fine too, it just means you end the heredoc with two newlines, effectively a blank line.

And yes, delimiters need to be on their own line. This heredoc never ends:

    say q:to"noend";
    HELLO WORLD noend

A note about indentation: look at this heredoc

    say q:to[finished];
      Hello there
        everybody
    finished

Which of those three heredoc lines decides how much whitespace is removed from the beginning of each line (and thus sets the base level of indentation)? It’s the line with the end delimiter, “finished” in the last example. Lines with more indentation than the delimiter will appear indented by however much extra space they use, and lines with less indentation will be as indented as the delimiter, with a warning about the issue.

(Tabs are considered to be 8 spaces long, unless you change $?TABSTOP. This usually doesn’t matter unless you mix spaces and tabs for indentation anyway though.)

It doesn’t matter how much the delimiter indentation is, all that matters is indentation relative to the delimiter. So these are all the same:

    say q:to/END/;
    HELLO
      WORLD
    END
    say q:to/END/;
        HELLO
          WORLD
        END
    say q:to/END/;
                   HELLO
                     WORLD
                   END

One other thing to note is that what quoting construct you use will affect how the heredoc contents are parsed, so

    say q:to/EOF/;
    $dlrs dollars and {$cnts} cents.
    EOF

Interpolates nothing,

    say q:to:c/EOF/;
    $dlrs dollars and {$cnts} cents.
    EOF

Interpolates just {$cnts} (the :c adverb allows for interpolation of just closures), and

    say qq:to/EOF/;
    $dlrs dollars and {$cnts} cents.
    EOF

Interpolates both $dlrs and {$cnts}.

Here’s the coolest part of heredocs: using more than one at once! It’s easy too, just use more than one heredoc quoting construct on the line!

    say q:to/end1/, qq:to/end2/, Q:to/end3/;
    This is q.\\Only some backslashes work though\t.
    $sigils don't interpolate either.
    end1
    This is qq. I can $interpolate-sigils as well as \\ and \t.
    Neat, yes?
    end2
    This is Q. I can do \\ no \t such $things.
    end3

Which, assuming you’ve defined $interpolate-sigils to hold the string "INTERPOLATE SIGILS", prints out

    This is q.\Only some backslashes work though\t.
    $sigils don't interpolate either.
    This is qq. I can INTERPOLATE SIGILS as well as \ and   .
    Neat, yes?
    This is Q. I can do \\ no \t such $things.

After every end delimiter, the next heredoc to look for its contents starts.

Of course, indentation of different heredocs will help whenever you have to stack a bunch of them like this.

    say qq:to/ONE/, qq:to/TWO/, qq:to/THREE/, qq:to/ONE/;
    The first one.
    ONE
        The second one.
        TWO
    The third one.
    THREE
        The fourth one.
        ONE

Which outputs:

    The first one.
    The second one.
    The third one.
    The fourth one.

(And yes, you don’t have to come up with a unique end delimiter every time. That could have been four q:to/EOF/ statements and it’d still work.)

One final note you should be aware of when it comes to heredocs. Like the rest of Perl 6 (barring a couple of small exceptions), heredocs are read using one-pass parsing (this means your Perl 6 interpreter won’t re-read or skip ahead to better understand the code you wrote). For heredocs this means Perl 6 will just wait for a newline to start reading heredoc data, instead of looking ahead to try and find the heredoc.

As long as the heredoc contents and the statement that introduces the heredoc are part of the same compilation unit, everything’s fine. In addition to what you’ve seen so far, you can even do stuff like this:

    sub all-info { return q:to/END/ }
    This is a lot of important information,
    and it is carefully formatted.
    END

(If you didn’t put the brace on the same line, it would be part of the heredoc, and then you’d need another brace on a line after END.)

However, things like BEGIN blocks start compiling before normal code, so trying that last one with BEGIN block fails:

    BEGIN { say q:to/END/ }
    This is only the BEGINning.
    END

You have to put the heredoc inside the BEGIN block, with the quoting construct, in order to place them in the same compilation unit.

    BEGIN {
        say q:to/END/;
        This is only the BEGINning.
        END
    }

That’s it for heredocs! When should you use them? I would say whenever you need to type a literal newline (by hitting Enter) into the string. Help output from the USAGE sub is probably the most common case. The one at the beginning could easily (and more readably) be written as

    sub USAGE {
        say q:to"EOHELP";
            foobar Usage:
            ./foobar <args> <file>

            Options:

            ...
            EOHELP
    }

Day 03 – Rakudo Perl 6 on the JVM

December 3, 2013 by

There have been a number of exciting developments for Perl 6 during 2013. In this post, we’ll take a look at one of them in some detail: running Perl 6 on the JVM (Java Virtual Machine).

Why the JVM?

There are many reasons for a language to have an implementation that targets the JVM. Here are some that drove us to bring Perl 6 to this platform.

  • The JVM is a stable, widely deployed,  trusted-in-the-enterprise platform. There are places where they don’t mind which language you write it, but they do care that it can run on the JVM.
  • The JVM has been very well optimized over the years. It most certainly isn’t fast to get started – but for long running things it typically performs well.
  • These days, the JVM is most certainly not just for Java. In fact, the commitment to run other languages – including those very different to Java – is serious. For example, there’s now a yearly JVM Language Summit, and the invokedynamic instruction and infrastructure was added in JDK7, and being improved in JDK8. Since Perl 6 is a gradually typed language, a VM that can play host to both static and dynamic languages is a good fit. Furthermore, every other major dynamic language is on the JVM. So, why not Perl too?
  • The JVM has widely used, well exercised support for concurrent, parallel and asynchronous programming. A wide range of primitives are available. Given that before this year, the Perl 6 story in these areas was also rather weak so far as implementation went, being on the JVM would provide an opportunity for fast prototyping and exploration, to help drive things forward.

But…another Perl 6 implementation?!

Implementing Perl 6 is a large undertaking – as those of us who got sucked into the process along the way have discovered. Many languages have got to the JVM by having a JVM-specific implementation of the language: JRuby, Jython, Nashorn, etc. For Perl 6, we’ve taken a different path.

The Rakudo Perl 6 compiler may only have targeted Parrot for much of its life, but those designing it have had VM portability in mind for a good while. Furthermore, the basic architecture has always been to have strongly isolated compilation stages, communicating by well-defined data structures. This put Rakudo in a good place to gain a JVM backend – at least, in theory. Over the course of the last year, what we hoped would work out well in theory has played out very nicely in practice.

The vast majority of the Rakudo codebase is not in any way VM-specific. Better still, the bits that need to change most often and that undergo most active development are almost always VM-independent. Many developers working on Rakudo test their changes against a single backend, and it’s relatively uncommon to find breakage on the other backend as a result. That said, we have automated daily spectest runs to catch any regressions.

Status

First, let’s consider the specification test suite. You might think at this point I’d mention how close Rakudo on JVM is to passing the number of specification tests that Rakudo on Parrot does. In fact, we need to do it the other way around these days: Rakudo on Parrot passes 99.64% of the spectests that Rakudo on the JVM does. “Huh,” you might think. “How’d the JVM backend come out ahead?” The answer is relatively simple: the JVM backend runs a bunch of concurrency tests that we don’t run on the Parrot backend. There actually are a small number of tests (tens rather than hundreds) that only pass on Rakudo on Parrot, largely due to “interesting” edge-case behaviors that have yet to be hunted down. However, these days the vast majority of programs run unmodified on both.

In the wider ecosystem, things are not quite so polished yet. Panda, the module installer, runs on Rakudo on the JVM. However, a number of modules depend on the NativeCall library, for calling into native code. The NativeCall porting effort is very much underway; last time I looked at it I could do basic things, like calling simple Win32 APIs. But it’s not all the way there yet. This is, however, really the last major missing piece. To say this time last year, we couldn’t run Perl 6 on the JVM at all, we’ve come a very long way.

Is it faster?

Well, it depends. For quick one liners and short-running scripts? No, startup will kill you. For something long running? Yes, usually it’s faster, and sometimes it’s significantly faster (perhaps five times or even forty times). And that’s before we’ve really done a great deal of optimization work on the JVM backend; the focus thus far has largely been “make it work”.

Can I call Java libraries?

Yes, but… :-) We do have some basic interop support in place already. Here’s an example:

use java::util::zip::CRC32:from<java>;

my $crc = CRC32.new();
for 'Hello, Java'.encode('utf-8') {
    $crc.'method/update/(B)V'($_);
}
say $crc.getValue();

It doesn’t look all that bad until you hit the method call in the loop. What’s that funny method/update/(B)V thing about? In Java you can statically overload methods. When there’s no overloading, we quite happily give you a short name. When there’s multiple, for now you need to use the JVM’s method descriptors to indicate the desired one. We’ll improve that, and many other aspects of interop, over the coming months. In summary, it’s often quite possible to call code from Java libraries today, it’s just not pleasant yet.

The future

Much has been done, yet of course there’s still plenty to do. Once NativeCall support is in shape, we’ll be able to add the JVM as an option to the Rakudo Star distribution release (for now, it’s only available in compiler releases – or fresh from Git, of course). Beyond that, the main areas of focus will be convergence, Java interop and performance. Given that this year took us from zero JVM support to Rakudo on JVM being the implementation passing the most spectests, it’s exciting to think where we’ll be in another year from now.

Day 02 – The humble type object

December 2, 2013 by

For some inscrutable reason, we have defined a Dog class in today’s post.

class Dog {
    has $.name;
}

Don’t ask me why — maybe we’re writing software for a kennel? Maybe we’re writing software for dogs? “Teach your dog how to type!” Clever dogs can do up to 10 words a minute, with surprisingly few typos.

Anyway. Having a Dog class gives us the dubious pleasure of being able to create dogs out of thin air and passing them to functions. No surprise there.

sub check(Dog $d) {
    say "Yup, that's a dog for sure.";
}

my Dog $dog .= new(:name);
check($dog);     # Yup, that's a dog for sure.

But where there might be some surprise — if you haven’t gotten used to the idea of Perl 6′s type objects yet — is that you can also do this:

check(Dog);      # Yup, that's a dog for sure.

What did we just do there?

We didn’t pass a dog to the function, we passed Dog to the function. And the function accepts it and says it’s a dog, too. So, Dog is a dog. Huh.

What’s going on here is that in Perl 6 when we declare a class Dog like we just did, the word Dog ends up having two meanings:

  • The class Dog that we declared.
  • The type object Dog, kind of a patron saint for all Dog objects ever instantiated.

Cramming two concepts into one word like this seems like a recipe for failure. But it turns out it works surprisingly well.

Before we look at the reasons for using type objects, let’s find out a bit more about what they are.

say Dog;          # (Dog)
say Dog.name;     # ERROR: Cannot look up attributes in a type object
say ?Dog;         # False
say defined Dog;  # False

So, in summary, the Dog type object identifies itself as (Dog), it refuses to have its attribute inspected, it boolifies to False, and it’s not defined. Contrast this with an instance of Dog:

say $dog;         # Dog.new(name => "Fido")
say $dog.name;    # Fido
say ?$dog;        # True
say defined $dog; # True

An instance is everthing the type object isn’t: it knows how to output itself, it will happily tell you its name, and it’s both True and defined. Nice.

(Being undefined is only almost a surefire way to identify the type object. Someone could have gone through the trouble of making their instance object undefined. As they say in the industry, you don’t have to be a type object to be undefined… but it helps!)

And now, as promised, the Top Five Reasons Type Objects Work Surprisingly Well:

  1. Classes actually make sense as objects in the program. There’s this idea that classes have to haughtily refuse to play among the rest of the values in a program, that they have to somehow be like gods looking down on the instances from a Parthenon of class-hood. Perl 5 kind of has it like that. But both Ruby and Python show that classes can behave as more or less normal objects. In Perl 6, Dog is also what you get if you do $dog.WHAT.
  2. It fits quite well with the whole smartmatching thing. So what $dog ~~ Dog actually means is something like “hey, Dog type object, does this $fido look anything like you?”. The type object doesn’t just sit there, it does useful things like smartmatching.
  3. Another thing: the whole reason that line, my Dog $dog .= new(:name);, works as it does is because we end up calling .new on the type object. So here’s what that line does in slow motion. It desugars to a declaration and an assignment. The declaration is my Dog $dog; and so, because the $dog variable needs to start out with some undefined value, it starts out with Dog, the type object. The assignment then is simply $dog.=new, which is short for $dog = $dog.new. Conveniently, because the type object Dog is an object of the type Dog, it has a .new method (inherited from Mu in this case) that knows how to construct dogs.
  4. A little detail from that last point, which actually turns out to be a rather big deal: Perl 6 doesn’t really have undef like Perl 5 does. It turned out that undef wasn’t a really good fit with a type system; undef gets in everywhere, and doesn’t really have a type at all. (A bit like Java’s null which is known to have caused people no end of suffering.) So what Perl 6 has instead are these typed undefined values, namely — you guessed it — the type objects. If you declare a variable my Int $i, then $i will start out as undefined, that is, containing the type object Int.
  5. Not only do you sometimes want to call .new on the type object, sometimes you have other methods which don’t require an instance. (These kinds of methods are sometimes known as static methods in some languages, and class methods in other languages. Some languages have both of these, and they’re different, just to be confusing.) Again, the type object comes to the rescue here, sort of acts like an instance so that you can call your method on it, and then once again fades into the background. For example, if the class Dog had a method bark { say "woof" } then the Dog type object would be able to bark just as well as actual dog instances. (But the type object still refuses to tell you its .name, or any of its attributes.)

So that’s type objects for you. They’re sitting in a convenient semantic spot halfway between the class and its instances, sometimes representing one end of the spectrum, sometimes the other.

One thing before we part ways today. It doesn’t happen often, but sometimes you do want to be able to tell type objects and real instances apart, for example when accepting parameters in a function:

multi sniff(Dog:U $dog) {
    say "a type object, of course"
}
multi sniff(Dog:D $dog) {
    say "definitely a real dog instance"
}

sniff Dog;    # a type object, of course
sniff $dog;   # definitely a real dog instance

Here, :U stands for “undefined” and :D for “defined”. (And that, dear friends, is how we got a smiley into the design of Perl 6. Program language designers, take heed.) As I mentioned parenthetically before, it’s actually possible to be an undefined object without being a type object. For these special occasions, we have :T, but this modifier isn’t implemented in Rakudo as of this writing. (Though moritz++ informs me that, in Rakudo, :U currently has the semantics that :T should have.)

Let’s just end this post with maybe the corniest one-liner ever to see the light of day in #perl6:

$ perl6 -e 'say (my @a = "Hip " x 2), @a.^name, "!"'
Hip Hip Array!

Day 01 – The State of Perl 6 in 2013

December 1, 2013 by

Welcome to the 2013 Perl 6 advent calendar!

In Perl 6 land, 2013 will be remembered as the year that brought proper concurrency support.

But I'm getting ahead of myself.

There is also sad news. Niecza, the Perl 6 compiler on the CLR (.NET/Mono) platform, and the Perl 6 compiler with the best runtime characteristics, had its last release in March. Since then there were a few maintenance patches and new built-in types and routines, but little in terms of actual compiler features.

A little later, Rakudo gained support to run on the Java Virtual machine. There are still some bits missing, mostly notably support for the native call interface, but all in all it works quite well, passes more than 99.9% of the tests that Rakudo on Parrot passes, and has two key advantages: it is much faster at run time, and has proper concurrency/parallelism support.

Jonathan Worthington prototyped and implemented it, and later specified it in S17, which again led to lots of improvements. Stay tuned for more advent calendar posts on the JVM and concurrency/parallelism topics.

Another big news this year was the revelation of MoarVM, a virtual machine designed to run Perl 6. With the JVM's high startup time and Parrot being mostly unmaintained and having lots of unsolved problems, there is a niche to be filled. NQP, the "Not Quite Perl" Perl 6 compiler used to bootstrap Rakudo already runs on MoarVM; Rakudo support for MoarVM is on its way, and progressing well so far.

There was also lots of progress in terms of built-in types likes Set and Bag, and IO::Path for handling path and directory objects.

As a developer and early adopter, I find Perl 6 to be pleasant to work with. In 2013 it has gotten easier to use, due to better error reporting and improved IO.

Perl 6 Advent Calendar 2013: Table of Contents

November 30, 2013 by

This post serves as a table of contents for the 2013 Perl 6 advent calendar. Links to new posts will appear here during the course of this month.

See also: table of contents for 2012, 2011, 2010, 2009.

Day 24 – An Advent Calendar

December 24, 2012 by

Recently I was unpacking some boxes of books and came across a book entitled "BASIC Computer Programs for the Home" by Charles D. Sternberg. Apparently my father had purchased this book in the early 1980s and had given it to me. In any case, my name was scrawled in the front cover in the manner an adolescent me would have done.

Mostly this book is filled with simple BASIC programs that manage simple databases of various things: recipes, household budget, address book, music collections, book collections, school grades, etc. But the program that caught my eye and made me think of the Perl 6 Advent Calendar was one for printing a calendar starting at a particular month.

Now, the version in this book is a little simple in that it asks for the starting month, year, the day of the week that the first month starts on, and how many months to print. I wanted something a little more like the Unix utility cal(1) program. Luckily, Perl 6 has date handling classes as part of the specification and both major implemenations, Rakudo and Niecza, have actual implementations of these which should make creating the calendar quite easy.

For reference, the output of the Unix cal(1) utility looks like this:

       December 2012
    Su Mo Tu We Th Fr Sa
                       1
     2  3  4  5  6  7  8
     9 10 11 12 13 14 15
    16 17 18 19 20 21 22
    23 24 25 26 27 28 29
    30 31                 

It also has some options to change the output in various ways, but I just want to focus on reproducing the above basic output.

I'll need a list of month names and weekday abbreviations:

    constant @months = <January February March April May June July
                        August September October November December>;
    constant @days = <Su Mo Tu We Th Fr Sa>;

And it looks like the month and year are centered above the days of the week. Generating a calendar for May shows this to be the case, so I'll need a routine that centers text:

    sub center(Str $text, Int $width) {
        my $prefix = ' ' x ($width - $text.chars) div 2;
        my $suffix = ' ' x $width - $text.chars - $prefix.chars;
        return $prefix ~ $text ~ $suffix;
    }

Now, the mainline code needs two things: a month and a year. From this it should be able to generate an appropriate calendar. But, we should have a reasonable default for these values I think. Today's month and year seem reasonable to me:

    sub MAIN(:$year = Date.today.year, :$month = Date.today.month) {

but if it's not today's month and year, then it's some arbitrary month and year we need info about. To do this we construct a new Date object from the month and year given.

        my $dt = Date.new(:year($year), :month($month), :day(1) );

Looking at the calendar generated for December, it seems like we may actually output up to 6 rows of numbers since the month can start and end on a partial week. In order to implement this, I think I'll need some "slots" for each day. Each slot will either be empty or will contain the day of the month. The number of empty slots at the beginning of the month correspond to the day of the week that the first of the month occurs on. If the first is on Sunday, there will be 0 empty slots, if the first is on a Monday there will be 1 empty slot, if the first is on a Tuesday, there will be 2 empty slots, etc. This is remarkably similar to the number we get when we interrogate a Date object for the day of the week. The only wrinkle is that it returns 7 for Sunday when we actually need a 0. That's easily remedied with a modulus operator however:

        my $dt = Date.new(:year($year), :month($month), :day(1) );
        my $ss = $dt.day-of-week % 7;
        my @slots = ''.fmt("%2s") xx $ss;

That gives us the empty slots at the beginning, but what about the ones that actually contain the days of the month? Easy enough, we'll just generate a number for each day of the month using the Date object we created earlier.

        my $days-in-month = $dt.days-in-month;
        for $ss ..^ $ss + $days-in-month {
            @slots[$_] = $dt.day.fmt("%2d");
            $dt++
        }

Now we've got an array with appropriate values in the appropriate positions, all that's left is to actually output the calendar. Using the header line for our weekdays as a metric for the width of the calendar, and the routine we created for centering text, we can output the header portion of the calendar:

        my $weekdays = @days.fmt("%2s").join: " ";
        say center(@months[$month-1] ~ " " ~ $year, $weekdays.chars);
        say $weekdays;

Then we iterate over each slot and output the appropriate values. If we've reached the end of the week or the end of the month, we output a newline:

        for @slots.kv -> $k, $v {
            print "$v ";
            print "\n" if ($k+1) %% 7 or $v == $days-in-month;
        }

Putting it all together, here is the final program:

    #!/usr/bin/env perl6

    constant @months = <January February March April May June July
                        August September October November December>;
    constant @days = <Su Mo Tu We Th Fr Sa>;


    sub center(Str $text, Int $width) {
        my $prefix = ' ' x ($width - $text.chars) div 2;
        my $suffix = ' ' x $width - $text.chars - $prefix.chars;
        return $prefix ~ $text ~ $suffix;
    }

    sub MAIN(:$year = Date.today.year, :$month = Date.today.month) {
        my $dt = Date.new(:year($year), :month($month), :day(1) );
        my $ss = $dt.day-of-week % 7;
        my @slots = ''.fmt("%2s") xx $ss;

        my $days-in-month = $dt.days-in-month;
        for $ss ..^ $ss + $days-in-month {
            @slots[$_] = $dt.day.fmt("%2d");
            $dt++
        }

        my $weekdays = @days.fmt("%2s").join: " ";
        say center(@months[$month-1] ~ " " ~ $year, $weekdays.chars);
        say $weekdays;
        for @slots.kv -> $k, $v {
            print "$v ";
            print "\n" if ($k+1) %% 7 or $v == $days-in-month;
        }
    }

Normally, cal(1) will highlight today's date on the calendar. That's a feature I left out of my calendar implementation but it could easily be added with Term::ANSIColor. Also, there's a little bit of coupling between the data being generated in the slots and the output processing (the slots are all formatted to be 2 characters wide in anticipation of the output). There are some other improvements that could be done, but for a first cut at a calendar in Perl 6, I'm happy. :-)

Day 23 – Macros

December 23, 2012 by

Syntactic macros. The Lisp gods of yore provided humanity with this invention, essentially making Lisp a programmable programming language. Lisp adherents often look at the rest of the programming world with pity, seeing them fighting to invent wheels that were wrought and polished back in the sixties when giants walked the Earth and people wrote code in all-caps.

And the Lisp adherents see that the rest of us haven’t even gotten to the best part yet, the part with syntactic macros. We’re starting to get the hang of automatic memory management, continuations, and useful first-class functions. But macros are still absent from this picture.

In part, this is because in order to have proper syntactic macros, you basically have to look like Lisp. You know, with the parentheses and all. Lisp ends up having almost no syntax at all, making every program a very close representation of a syntax tree. Which really helps when you have macros starting to manipulate those same trees. Other languages, not really wanting to look like Lisp, find it difficult-to-impossible to pull off the same trick.

The Perl languages love the difficult-to-impossible. Perl programmers publish half a dozen difficult-to-impossible solutions to CPAN before breakfast. And, because Perl 6 is awesome and syntactic macros are awesome, Perl 6 has syntactic macros.

It is known, Khaleesi.

What are macros?

For reasons even I don’t fully understand, I’ve put myself in charge of implementing syntactic macros in Rakudo. Implementing macros means understanding them. Understanding them means my brain melts regularly. Unless it fries. It’s about 50-50.

I have this habit where I come into the #perl6 channel, and exclaiming “macros are just X!” for various values of X. Here are some samples:

  • Macros are just syntax tree manipulators.
  • Macros are just “little compilers”.
  • Macros are just a kind of templates.
  • Macros are just routines that do code substitution.
  • Macros allow you to safely hand values back and forth between the compile-time world and the runtime world.

But the definition that I finally found that I like best of all comes from scalamacros.org:

Macros are functions that are called by the compiler during compilation. Within these functions the programmer has access to compiler APIs. For example, it is possible to generate, analyze and typecheck code.

While we only cover the “generate” part of it yet in Perl 6, there’s every expectation we’ll be getting to the “analyze and typecheck” parts as well.

Some examples, please?

Coming right up.

macro checkpoint {
  my $i = ++(state $n);
  quasi { say "CHECKPOINT $i"; }
}

checkpoint;
for ^5 { checkpoint; }
checkpoint;

The quasi block is Perl 6′s way of saying “a piece of code, coming right up!”. You just put your code in the quasi block, and return it from the macro routine.

This code inserts “checkpoints” in our code, like little debugging messages. There’s only three checkpoints in the code, so the output we’ll get looks like this:

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 3

Note that the “code insertion” happens at compile time. That’s why we get five copies of the CHECKPOINT 2 line, because it’s the same checkpoint running five times. If we had had a subroutine instead:

sub checkpoint {
  my $i = ++(state $n);
  say "CHECKPOINT $i";
}

Then the program would print 7 distinct checkpoints.

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 3
CHECKPOINT 4
CHECKPOINT 5
CHECKPOINT 6
CHECKPOINT 7

As a more practical example, let’s say you have logging output in your program, but you want to be able to switch it off completely. The problem with an ordinary logging subroutine is that with something like:

LOG "The answer is { time-consuming-computation() }";

The time-consuming-computation() will run and take a lot of time even if LOG subsequently finds that logging was turned off. (That’s just how argument evaluation works in a non-lazy language.)

A macro fixes this:

constant LOGGING = True;

macro LOG($message) {
  if LOGGING {
    quasi { say {{{$message}}} };
  }
}

Here we see a new feature: the {{{ }}} triple-block. (Syntax is likely to change in the near future, see below.) It’s our way to mix template code in the quasi block with code coming in from other places. Doing say $message; would have been wrong, because $message is a syntax tree of the message to be logged. We need to inject that syntax tree right into the quasi, and we do that with a triple-block.

The macro conditionally generates logging code in your program. If the constant LOGGING is switched on, the appropriate logging code will replace each LOG macro invocation. If LOGGING is off, each macro invocation will be replaced by literally nothing.

Experience shows that running no code at all is very efficient.

What are syntactic macros?

A lot of things are called “macros” in this world. In programming languages, there are two big categories:

  • Textual macros. They substitute code on the level of the source code text. C’s macros, or Perl 5′s source filters, are examples.
  • Syntactic macros. They substitute code on the level of the source code syntax tree. Lisp macros are an example.

Textual macros are very powerful, but they represent the kind of power that is just as likely to shoot half your leg off as it is to get you to your destination. Using them requires great care, of the same kind needed for a janitor gig at Jurassic Park.

The problem is that textual macros don’t compose all that well. Bring in more than one of them to work on the same bit of source code, and… all bets are off. This puts severe limits on modularity. Textual macros, being what they are, leak internal details all over the place. This is the big lesson from Perl 5′s source filters, as far as I understand.

Syntactic macros compose wonderfully. The compiler is already a pipeline handing off syntax trees between various processing steps, and syntactic macros are simply more such steps. It’s as if you and the compiler were two children, with the compiler going “Hey, you want to play in my sandbox? Jump right in. Here’s a shovel. We’ve got work to do.” A macro is a shovel.

And syntactic macros allow us to be hygienic, meaning that code in the macro and code outside of the macro don’t step on each other’s toes. In practice, this is done by carefully keeping track of the macros context and the mainline’s context, and making sure wires don’t cross. This is necessary for safe and large-scale composition. Textual macros don’t give us this option at all.

Future

Both of the examples in this post work already in Rakudo. But it might also be useful to know where we’re heading with macros in the next year or so. The list is in the approximate order I expect to tackle things.

  • Un-hygiene. While hygienic macros are the sane and preferable default, sometimes you want to step on the toes of the mainline code. There should be an opt-out, and escape hatch. This is next up.
  • Introspection. In order to analyze and typecheck code, not just generate it, we need to be able to take syntax trees coming in as macro arguments, and look inside of them. There are no tools for that yet, and there’s no spec to guide us here. But I’m fairly sure people will want this. The trick is to come up with something that doesn’t tie us down to one compiler’s internal syntax-tree format. Both for the sake of compiler interoperability and future compatibility.
  • Deferred declarations. The sandbox analogy isn’t so frivolous, really. If you declare a class inside a quasi block, that declaration is limited (“sandboxed”) to within that quasi block. Then, when the code is injected somewhere in the mainline because of a macro invocation, it should actually run. Fortunately, as it happens, the Rakudo internals are factored in such a way that this will be fairly straightforward to implement.
  • Better syntax. The triple-block syntax is probably going away in favor of something better. The problem isn’t the syntax so much as the fact that it currently only works for terms. We want it to work for basically all syntactic categories. A solid proposal for this is yet to materialize, though.

With each of these steps, I expect us to find new and fruitful uses of macros. Knowing my fellow Perl 6 developers, we’ll probably find some uses that will shock and disgust us all, too.

In conclusion

Perl 6 is awesome because it puts you, the programmer, in the driver seat. Macros are simply more of that.

Implementing macros makes your brain melt. However, using them is relatively straightforward.

Day 22 – Parsing an IPv4 address

December 22, 2012 by

Guest post by Herbert Breunung (lichtkind).

Perl 5 brought regexes to mainstream programming and set a standard, one that is felt as relevant even in Redmond. Perl 6, of course, steps up the game by adding many new features to the regex camp, including easy-to-build grammars for your own complex parsers. But without getting too complex, you can get a lot of joy out of Perl 6′s rx (that’s how Perl 6 spells Perl 5′s qr operator, that enables you to save a Regex in a variable).

Because the Perl 6 regex syntax is less littered with exceptional cases, Larry Wall also likes to joke that he put the “regular” back into “regular expression”.

Some of the changes are:

  • most special variables are gone,
  • non-capturing groups and other grouping syntax is easier to type,
  • no more single/multi line modes,
  • x mode became default, making whitespace non-significant by default.

In summary, regexes are more regular than in Perl 5, confirming Larry’s joke. They try a bit harder to make your life easier when you need to match text. Under the hood, regexes have blossomed out into a complete sub-language within the bigger Perl 6 language. A language with its own parsing rules.

But don’t fret; not everything has changed. Some things remain the same:

/\d+/

This regex still matches one or more consecutive digits.

Similarly, if you want to capture the digits, you can do this, just like you’re used to:

/(\d+)/

You’ll find the matched digits in $0, not $1 as in Perl 5. All the special variables $0, $1, $2 are really syntactic sugar for indexing the match variable ($/[0], $/[1], $/[2]). Because indices start at 0, it makes sense for the first matched group to be $0. In Perl 5, $0 contains the name of the script or program, but this has been renamed into $*EXECUTABLE_NAME in Perl 6.

Should you be interested in getting all of the captured groups of a regex match, you can use @(), which is syntactic sugar for @($/).

The object in the $/ variable holds lots of useful information about the last match. For example, $/.from will give you the starting string position of the match.

But $0 will get us far enough for this post. We use it to extract individual features from a string.

Sometimes we want to extract a whole bunch of similar things at once. Then we can use the :g (or :global) modifier on the regex:

$_ = '1 23 456 78.9';
say .Str for m:g/(\d+)/; # 1 23 456 78 9

Note that the :g — as opposed to prior regex implementations — sits up front, right at the start of the regex. Not at the end. That way, when you read the regex from left to right, you will know from the start how the regex is doing its matching. No more end-heavy regex expressions.

Matching “all things that look like this” is so useful, that there’s even a dedicated method for that, .comb:

$str.comb(/\d+/);

If you’re familiar with .split, you can think of .comb as its cheerful cousin, matching all the things that .split discards.

Let’s tackle the matching of an IPv4 address. Coming from a Perl 5 angle, we expect to have to do something like this:

/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/

This won’t do in Perl 6, though. First of all, the {} blocks are real blocks in a Perl 6 regex; they contain Perl 6 code. Second, because Perl 6 has lots of error handling to catch p5isms, like this, you’ll get an error saying “Unsupported use of {N,M} as general quantifier; in Perl 6 please use ** N..M (or ** N..*)”.

So let’s do that. To match between one and three digits in a Perl 6 regex, we should type:

/\d ** 1..3/

Note how the regex sublanguage re-uses parts from the main Perl 6 language. ** can be seen as a kind of exponentiation (if we squint), in that we’re taking \d “to the between-first-and-third power”. And the range notation 1..3 exists both outside and within regexes.

Using our new knowledge about the repetition quantifier, we end up with something like this:

/(\d**1..3) \. (\d**1..3) \. (\d**1..3) \. (\d**1..3)/

That’s still kinda clunky. We might end up wishing that we could use the repetition operator again, but those literal dots in between prevent us from doing that. If only we could specify repetition a given number of times and a divider.

In Perl 6 regexes, you can.

/ (\d ** 1..3) ** 4 % '.' /

The % operator here is a quantifier modifier, so it can only follow on a quantifier like * or + or **. The choice of % for this function is relatively new in Perl 6, and you may prefer to read it as “modulo”, just like in the main language. That is, “match four groups of digits, modulo literal dots in between”. Or you could think of the dots in between as the “remainder”, the separators that are left after you’ve parsed the actual elements.

Oh, and you might’ve noticed that \. changed to '.' on the way. We can use either; they mean exactly the same. In Perl 5, there isn’t a simple rule saying which symbols have a magic meaning and which ones simply signify themselves. In Perl 6, it’s easy: word characters (alphanumerics and the underscore) always signify themselves. Everything else has to be escaped or quoted to get its literal meaning.

Putting it all together, here’s how we would extract IPv4 addresses out of a string:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";

say .Str for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: 127.0.0.1 173.194.32.32

Or, we could use .comb:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
my @ip4addrs = .comb(/ (\d ** 1..3) ** 4 % '.' /);

If we’re interested in individual integers, we can get those too:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
say .list>>.Str.perl for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: ("127", "0", "0", "1") ("173", "194", "32", "32")

If you want to know more, read the S05, or watch me battling with my slide deck and the English language in this presentation about regexes.

Day 21 – Collatz Variations

December 21, 2012 by

The Collatz sequence is one of those interesting “simple” math problems that I’ve run into a number of times. Most recently a blog post on programming it in Racket showed up on Hacker News. As happens so often, I instantly wanted to implement it in Perl 6.

sub collatz-sequence(Int $start) { 
    $start, { when * %% 2 { $_ / 2 }; when * !%% 2 { 3 * $_ + 1 }; } ... 1;
}

sub MAIN(Int $min, Int $max) {
    say [max] ($min..$max).map({ +collatz-sequence($_) });        
}

This is a very straightforward implementation of the Racket post’s max-cycle-length-range as a stand-alone p6 script. collatz-sequence generates the sequence using the p6 sequence operator. Start with the given number. If it is divisible by two, do so: when * %% 2 { $_ / 2 }. If it is not, multiply by three and add 1: when * !%% 2 { 3 * $_ + 1 }. Repeat this until the sequence reaches 1.

MAIN(Int $min, Int $max) sets up our main function to take two integers. Many times I don’t bother with argument types in p6, but this provides a nice feedback for users:

> perl6 collatz.pl blue red
Usage:
  collatz.pl <min> <max> 

The core of it just maps the numbers from $min to $max (inclusive) to the length of the sequence (+collatz-sequence) and then says the max of the resulting list ([max]).

Personally I’m a big fan of using the sequence operator for tasks like this; it directly represents the algorithm constructing the Collatz sequence in a simple and elegant fashion. On the other hand, you should be able to memoize the recursive version for a speed increase. Maybe that would give it an edge over the sequence operator version?

Well, I was wildly wrong about that.

sub collatz-length($start) {
    given $start {
        when 1       { 1 }
        when * !%% 2 { 1 + collatz-length(3 * $_ + 1) } 
        when * %% 2  { 1 + collatz-length($_ / 2) } 
    }
}

sub MAIN($min, $max) {
    say [max] ($min..$max).map({ collatz-length($_) });        
}

This recursive version, which makes no attempt whatsoever to be efficient, is actually better than twice as fast as the sequence operator version. In retrospect, this makes perfect sense: I was worried about the recursive version making a function call for every iteration, but the sequence version has to make two, one to calculate the next iteration and the other to check and see if the ending condition has been reached.

Well, once I’d gotten this far, I thought I’d better do things correctly. I wrote two framing scripts, one for timing all the available scripts, the other for testing them to make sure they work!

my @numbers = 1..200, 10000..10200;

sub MAIN(Str $perl6, *@scripts) {
    my %results;
    for @scripts -> $script {
        my $start = now;
        qqx/$perl6 $script { @numbers }/;
        my $end = now;

        %results{$script} = $end - $start;
    }

    for %results.pairs.sort(*.value) -> (:key($script), :value($time)) {
        say "$script: $time seconds";
    }
}

This script takes as an argument a string that can be used to call a Perl 6 executable and a list of scripts to run. It runs the scripts using the specified executable, and times them using p6′s now function. It then sorts the results into order and prints them. (A similar script I won’t post here tests each of them to make sure they are returning correct results.)

In the new framework, the Collatz script has changed a bit. Instead of taking a min and a max value and finding the longest Collatz sequence generated by a number in that range, it takes a series of numbers and generates and reports the length of the sequence for each of them. Here’s the sequence operator script in its full new version:

sub collatz-length(Int $start) { 
    +($start, { when * %% 2 { $_ / 2 }; when * !%% 2 { 3 * $_ + 1 }; } ... 1);
}

sub MAIN(*@numbers) {
    for @numbers -> $n {
        say "$n: " ~ collatz-length($n.Int);
    }
}

For the rest of the scripts I will skip the MAIN sub, which is exactly the same in each of them.

Framework established, I redid the recursive version starting from the new sequence operator code.

sub collatz-length(Int $n) {
    given $n {
        when 1       { 1 }
        when * %% 2  { 1 + collatz-length($_ div 2) }
        when * !%% 2 { 1 + collatz-length(3 * $_ + 1) }
    } 
}

The sharp-eyed will notice this version is different from the first recursive version above in two significant ways. This time I made the argument Int $n, which instantly turned up a bit of a bug in all implementations thus far: because I used $_ / 2, most of the numbers in the sequence were actually rationals, not integers! This shouldn’t change the results, but is probably less efficient than using Ints. Thus the second difference about, it now uses $_ div 2 to divide by 2. This version remains a great improvement over the sequence operator version, running in 4.7 seconds instead of 13.3. Changing when * !%% 2 to a simple default shaves another .3 seconds off the running time.

Once I started wondering how much time was getting eaten up by the when statements, rewriting that bit using the ternary operator was an obvious choice.

sub collatz-length(Int $start) { 
    +($start, { $_ %% 2 ?? $_ div 2 !! 3 * $_ + 1 } ... 1);
}

Timing results: Basic sequence 13.4 seconds. Sequence with div 11.5 seconds. Sequence with div and ternary 9.7 seconds.

That made me wonder what kind of performance I could get from a handcoded loop.

sub collatz-length(Int $n is copy) {
    my $length = 1;
    while $n != 1 {
        $n = $n %% 2 ?? $n div 2 !! 3 * $n + 1;
        $length++;
    }
    $length;
}

That’s by far the least elegant of these, I think, but it gets great performance: 3 seconds.

Switching back to the recursive approach, how about using the ternary operator there?

sub collatz-length(Int $n) {
    return 1 if $n == 1;
    1 + ($n %% 2 ?? collatz-length($n div 2) !! collatz-length(3 * $n + 1));
}

This one just edges out the handcoded loop, 2.9 seconds.

Can we do better than that? How about memoization? is cached is supposed to be part of Perl 6; neither implementation has it yet, but last year’s Advent calendar has a Rakudo implementation that still works. Using the last version changed to sub collatz-length(Int $n) is cached { works nicely, but takes 3.4 seconds to execute. Apparently the overhead of caching slows it down a bit. Interestingly, the non-ternary recursive version does speed up with is cached, from 4.4 seconds to 3.6 seconds.

Okay, instead of using a generic memoization, how about hand-coding one?

sub collatz-length(Int $n) {
    return 1 if $n == 1;
    state %lengths;
    return %lengths{$n} if %lengths{$n}:exists;
    %lengths{$n} = 1 + ($n %% 2 ?? collatz-length($n div 2) !! collatz-length(3 * $n + 1));
}

Bingo! 2.7 seconds.

I’m sure there are lots of other interesting approaches for solving this problem, and encourage people to send them in. In the meantime, here’s my summary of results so far:

Script Rakudo Niecza
bin/collatz-recursive-ternary-hand-cached.pl 2.5 1.7
bin/collatz-recursive-ternary.pl 3 1.7
bin/collatz-loop.pl 3.1 1.7
bin/collatz-recursive-ternary-cached.pl 3.2 N/A
bin/collatz-recursive-default-cached.pl 3.5 N/A
bin/collatz-recursive-default.pl 4.4 1.8
bin/collatz-recursive.pl 4.9 1.9
bin/collatz-sequence-ternary.pl 9.9 3.3
bin/collatz-sequence-div.pl 11.6 3.5
bin/collatz-sequence.pl 13.5 3.8

The table was generated from timing-table-generator.pl.

Day 20 – Dynamic variables and DSL-y things

December 20, 2012 by

Today, let’s talk about DSLs.

Post from the past: a motivating example

Two years ago I wrote a blog post about Nim, a game played with piles of stones. I just put in ASCII diagrams of the actual Nim stone piles, telling myself that if I had time, I would put in fancy SVG diagrams, generated with Perl 6.

Naturally, I didn’t have time. My self-imposed deadline ran out, and I published the post with simple ASCII diagrams.

But time is ever-regenerative, and there for people who want it. So, let’s generate some fancy SVG diagrams with Perl 6.

Have bit array, want SVG

What do we need, exactly? Well, a subroutine that takes an array of piles as input and generates an SVG file would be a really good start.

Let’s take the last “image” in the post as an example:

3      OO O
4 OOOO
5 OOOO    O

For the moment, let’s ignore the numbers at the left margin; they’re just counting stones. We summarize the piles themselves as a kind of bitmap, which also forms the input to the function:

my @piles =
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

nim-svg(@piles);

At this point, we need only create the nim-svg function itself, and make it render SVG from this bitmap. Since I’ve long since tired of outputting SVG by hand, I use the SVG module, which comes bundled with Rakudo Star.

use SVG;

sub nim-svg(@piles) {
    my $width = max map *.elems, @piles;
    my $height = @piles.elems;

    my @elements = gather for @piles.kv -> $row, @pile {
        for @pile.kv -> $column, $is_filled {
            if $is_filled {
                take 'circle' => [
                    :cx($column + 0.5),
                    :cy($row + 0.5),
                    :r(0.4)
                ];
            }
        }
    }
    
    say SVG.serialize('svg' => [ :$width, :$height, @elements ]);
}

I think you can follow the logic in there. The subroutine simply iterates over the bitmap, turning 1s into circles with appropriate coordinates.

That’s it?

Well, this will indeed generate an SVG image for us, with the stones correctly placed. But let’s look again at the input that helped create this image:

    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

Clearly, though we can discern the stones and gaps in there if we squint in a bit-aware programmer’s fashion, the input isn’t… visually attractive. (The zeroes even look like stones, even though they’re gaps!)

We can do better

Instead of using a bit array, let’s start from the desired SVG image and try to make the input look like that.

So, this is what I would prefer to write instead of a bitmask:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

That’s better. That looks more like my original ASCII diagram, while still being syntactic Perl 6 code.

Making a DSL

Wikipedia talks about a DSL as a language “dedicated to a particular problem domain”. Well, the above way of specifying the input would be a DSL dedicated to solving the draw-SVG-images-of-Nim-positions domain. (Admittedly a fairly narrow domain. But I’m mostly out to show the potential of DSLs in Perl 6, not to change the world with this particular DSL.)

Now that we have the desired end state, how do we connect the wires and make the above work? Clearly we need to declare three subroutines: nim, _, o. (Yes, you can name a subroutine _, no sweat.)

sub nim(&block) {
    my @*piles;
    my @*current-pile;

    &block();
    finish-last-pile();
    
    nim-svg(@*piles);
}

sub _(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 0, @rest;
    return @*current-pile;
}

sub o(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 1, @rest;
    return @*current-pile;
}

Ok… explain, please?

A couple of things are going on here.

  • The two variables @*piles and @*current-pile are dynamic variables which means that they are visible not just in the current lexical scope, but also in all subroutines called before the current scope has finished. Notably, the two subroutines _ and o.
  • The two subroutines _ and o take an optional parameter. On each row, the rightmost _ or o acts as a silent “start of pile” marker, taking the time to do a bit of bookkeeping with the piles, storing away the last pile and starting on a new one.
  • Each row in the DSL-y input basically forms a chain of subroutine calls. We take this into account by both incrementally building the @*current-pile array at each step, all the while returning it as (possible) input for the next subroutine call in the chain.

And that’s it. Oh yeah, we need the bookkeeping routine finish-last-pile, too:

sub finish-last-pile() {
    if @*current-pile {
        push @*piles, [@*current-pile];
    }
    @*current-pile = ();
}

So, it works?

Now, the whole thing works. We can turn this DSL-y input:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

…into this SVG output:

<svg
  xmlns="http://www.w3.org/2000/svg"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="9" height="3">

  <circle cx="8.5" cy="0.5" r="0.4" />
  <circle cx="0.5" cy="1.5" r="0.4" />
  <circle cx="1.5" cy="1.5" r="0.4" />
  <circle cx="2.5" cy="1.5" r="0.4" />
  <circle cx="3.5" cy="1.5" r="0.4" />
  <circle cx="5.5" cy="1.5" r="0.4" />
  <circle cx="6.5" cy="1.5" r="0.4" />
  <circle cx="8.5" cy="1.5" r="0.4" />
  <circle cx="0.5" cy="2.5" r="0.4" />
  <circle cx="1.5" cy="2.5" r="0.4" />
  <circle cx="2.5" cy="2.5" r="0.4" />
  <circle cx="3.5" cy="2.5" r="0.4" />
  <circle cx="8.5" cy="2.5" r="0.4" />
</svg>

Yay!

Summary

The principles I used in this post are fairly easy to generalize. Start from your desired DSL, and create the subroutines to make it happen. Have dynamic variables handle the communication between separate subroutines.

DSLs are nice because they allow us to shape the code we’re writing around the problem we’re solving. Using relatively little “adapter code”, we’re left to focus on describing and solving problems in a natural way, making the programming language rise to our needs instead of lowering ourselves down to its needs.


Follow

Get every new post delivered to your Inbox.

Join 37 other followers