Day 3 – cap your junctions

“With great power comes great responsibility.” These days I muse on how this applies not just to movie superheroes, but to programming language features as well.

It happens all over the place. Database connection? Awesome. Well, until you become the victim of an SQL injection or a SELECT N+1 problem. Regular expressions? Also awesome… until someone uses it to parse HTML and gets a visitation from elder non-beings from beyond the fabric of reality. Ouch.

Of course, we still like our powerful features. But we need to learn to contain them, to harness them. The awesome should work for us and the purpose we have set it to. It should not work for attackers, elder non-beings, or Murphy. We need to make sure the awesome doesn’t leak beyond its designated boundaries.

In 2009, Matthew Walton blogged about junctions. His explanation of what makes junctions exciting is great, and I agree with it. Today I would simply like to add the following point: junctions are so awesome that they should be contained.

Let’s demonstrate the problem real quickly.

> my @list = 3..7
3 4 5 6 7
> my $answer = all(@list) > 0;

Quick, what value ends up in $answer? Well, all the values in @list are
indeed greater than 0, so…

all(True, True, True, True, True)

Aw, dang. Not again.

Just to be clear, this is by spec, and correctly implemented and everything. But unless you ask specifically for Perl 6 to collapse the junction, it just keeps on going. Keeps on going, like a juggernaut, through walls and oceans, propagating through your program whether you want it to or not.

What you have to do then is to collapse the junction, putting a cap on all that awesome. The junction is useful as long as you’re doing the junctive computation itself (in this case all(@list) > 0), but immediately after that, we’ll want to collapse back into the normal, sane, non-junctive world:

> ?$answer      # symbolic prefix
True
> so $answer    # listop prefix
True
> $answer.Bool  # coerce method
True

Moritz points out that the whole “collapsing” metaphor comes from quantum physics, where something collapses as it’s measured. The analogy holds up; after we’re done junctioning around with our “quantum” superposition, we want to get a single boolean value back out. We do that by “measuring” — boolifying — the junction.

I will hasten to add that the most common use of junctions doesn’t suffer from this problem, namely junctions in if statements:

if all(@list) > 0 {
...
}

The if statement itself provides a natural cap to the junction. It does its own boolification of it under the hood, but more importantly, the junction value doesn’t stick around. We only see the effects of the boolean value of the junction.

No, the real risk comes when you’re a library writer, and you provide routines that happen to compute things using junctions:

sub all-positive(@list) {
    all(@list) > 0;     # DANGER
}

That’s where you want to remember the old tired adage. With great power… comes… right, that’s right. So you cap your junction before you let it run out into the wild and do untold structural damage to people, buildings, cattle, and those cute fluffy dogs that would otherwise suffer Puppy Death By Junction.

sub all-positive(@list) {
    so all(@list) > 0;
}

Yes. That’s better.

If you declare your return type, you can get a runtime error for leaving out the `so`.

sub all-positive(@list --> Bool) {
    all(@list) > 0;
}

say all-positive([1, 2, 3]); # Type check failed for return value;
                             # expected 'Bool' but got 'Junction'

Maybe that’s an argument for declaring return types for your routines. And maybe some day we’ll catch that one at compile-time, too.

Cap your junctions as early as possible.
— masak’s Rule of Responsible Junction Use

The longest I can remember legitimately keeping junctions around in an uncollapsed state was when storing junctions of regexes in a hash as part of this password-analyzing script. Even that’s not an exception to the above rule, though — note that the first thing that happens to the junctions in the subsequent code is that they become the rhs in a smartmatch statement, and then get capped by an if statement. So even here, they don’t escape out into the wild.

Code responsibly. Cap your junctions.

Day 22 – A catalogue of operator types

Perl 6 has a very healthy relationship to operators. "An operator is just a funnily-named subroutine", goes the slogan in the community.

In practice, it means you can do this:

sub postfix:<!>($N) {
    [*] 2..$N;
}
say 6!;    # 720

Yeah, that’s the prototypical factorial example. I promise I won’t do it any more in this post. I swear the only reason we don’t have factorial as a standard operator in the language, is so that we can impress people by defining it.

Anyway, so a postfix operator ! is really just a "funnily-named" subroutine postfix:<!>. Similarly, we have prefix and infix operators, like prefix:<-> and infix:<*>. They’re all just subroutines. I wrote about that before, so I’m not going to hammer on that point.

Operators have different precedence (like infix:<*> binds tighter than infix:<+>)…

$x + $y * $z   # compiler sees $x + ($y * $z)
$x * $y + $z   # compiler sees ($x * $y) + $z

…and different associativity (like infix:</> associates to the left but infix:<=> associates to the right).

$x / $y / $z   # compiler sees ($x / $y) / $z
$x = $y = $z   # compiler sees $x = ($y = $z)

But I wrote about that before too, at quite some length, so I’m not going to revisit that topic.

No, today I just want to talk about the operator categories in general. I think Perl 6 does a nice job of describing the operator types themselves. I don’t see that in many other languages.

Here are all the different types:

 type            position           syntax
======          ==========         ========

prefix          before a term        !X
infix           between two terms    X ! Y
postfix         after a term         X!

circumfix       around               [X]
postcircumfix   after & around       X[Y]

A lot of other languages give you the ability to define your own operators. Many of them provide approaches which are hackish afterthoughts, and only allow you to override existing operators. Some other languages do approach the problem head-on, but end up simplifying the language to decrease the complexity of defining new operators.

Perl 6 approaches, and solves, the problem, head-on. You get all of the above operators, you can refine or override old ones, and you can talk within the language about things like precedence and associativity.

All that is rather nice. But, as usual, Perl 6 goes one step further and starts classifying metaoperators, too.

What’s a metaoperator? That’s our name for when you can extend an operator in a certain way. For example, many languages allow some kind of not-equal operator like infix:<!=>. Perl 6 has another one for string non-equality: infix:<ne>. And then maybe you want to do smart non-matching: infix:<!~~>. And so on — pretty soon you catch on to the pattern: you may, at one point or another, want to invert the result of any boolean matcher. So that’s what Perl 6 provides.

Here are a few examples:

normal op     negated op
=========     ==========
eq            !eq     ( synonym of ne )
~~            !~~
<             !<      ( synonym of >= )
before        !before

Because this particular metaoperator attaches itself before an infix operator, it gets the name infix_prefix_meta_operator:<!>. I was going to say "and yes, you can add your own user-defined metaoperators too", but currently no implementation supports that. Maybe sometime in the future.

There are many other categories of metaoperators. For example the "multiplication reducer" [*] that we used in the factorial example at the top (which takes a list of numbers and multiplies them all together) is really an infix:<*> surrounded by the metaop prefix_circumfix_meta_operator:sym<[ ]>. (It’s "prefix" because it goes before the list of numbers.)

Luckily, we don’t have meta-meta-ops. Already thinking about the metaops is quite a challenge sometimes. But what I like about Perl 6 and the approach it takes to grammar and operators is this: someone did think about it, long and hard. The resulting system has a pleasing simplicity to it.

Perl 6 is very well-suited for building parsers for languages. As one of its neatest tricks, it takes this expressive power and directs it towards the task of defining itself. As a Perl 6 user, you’re given not just an excellent toolbox, but a well-organized workshop to adapt and extend to fit your needs.

Day 15 – Numbers and ways of writing them

Consider the humble integer.

my $number-of-dwarfs = 7;
say $number-of-dwarfs.WHAT;   # (Int)

Integers are great. (Well, many of them are.) Computers basically run on integers. Also, God made the integers, and everything else is basically cheap counterfeit knock-offs, smuggled into the Platonic realm when no-one’s looking. If they’re so important, we’d expect to have lots of good ways of writing them in any respectable programming language.

Do we have lots of ways of writing integers in Perl 6? We do. (Does that make Perl 6 respectable? No, because it’s a necessary but not sufficient condition. Duh.)

One thing we might want to do, for example, is write our numbers in other number bases. The need for this crops up now and then, for example if you’re stranded on Binarium where everyone has a total of two fingers:

my $this-is-what-we-humans-call-ten = 0b1010;   # 'b' for 'binary', baby
my $and-your-ten-is-our-two = 0b10;

Or you may find yourself on the multi-faceted world of Hexa X, whose inhabitants (due to an evolutionary arms race involving the act of tickling) sport 16 fingers each:

my $open-your-mouth-and-say-ten = 0xA;    # 'x' is for, um, 'heXadecimal'
my $really-sixteen = 0x10;

If you’re really unlucky, you will find yourself stuck in a file permissions factory, with no other means to signal the outer world than by using base 8:

my $halp-i'm-stuck-in-this-weird-unix-factory = 0o644;

(Fans of other C-based languages may notice the deviation from tradition here: for your own sanity, we no longer write the above number as 0644 — doing so rewards you with a stern (turn-offable) warning. Maybe octal numbers were once important enough to merit a prefix of just 0, but they ain’t no more.)

Of course, just these bases are not enough. Sometimes you will need to count with a number of fingers that’s not any of 2, 8, or 16. For those special occasions, we have another nice syntax for you:

say :3<120>;       # 15 (== 1 * 3**2 + 2 * 3**1 + 0 * 3**0)
say :28<aaaargh>;  # 4997394433
say :5($n);        # let me parse that in base 5 for you

Yes, that’s the dear old pair syntax, here hijacked for base conversions from a certain base. You will recognize this special syntax by the fact that it uses colonpairs (:3<120>), not the "fat arrow" (3 => 120), and that the key is an integer. For the curious, the integer goes up to 36 (at which point we’ve reached ‘z’ in the alphabet, and there’s no natural way to make up more symbols for "digits").

I once used :4($_) to translate DNA into proteins. Perl 6 — fulfilling your bioinformatics dreams!

Oh, and sometimes you want to convert into a certain base, essentially taking a good-old-integer and making it understandable for a being with a certain amount of fingers. We’ve got you covered there, too!

say 0xCAFEBABE;            # 3405691582
say 3405691582.base(16);   # CAFEBABE

Moving on. Perl 6 also has rationals.

my $tenth = 1/10;
say $tenth.WHAT;   # (Rat)

Usually computers (and programming languages) are pretty bad at representing numbers such as a tenth, because most of them are stranded on the planet Binarium. Not so Perl 6; it stores your rational numbers exactly, most of the time.

say (0, 1/10 ... 1);            # 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
say (1/3 + 1/6) * 2;            # 1
say 1/10 + 1/10 + 1/10 == 0.3;  # True (\o/)

The rule here is, if you give some number as a decimal (0.5) or as a ratio (1/2), it will be represented as a Rat. After that, Perl 6 does its darnedest to strike a balance between representing numbers without losing precision, and not losing too much performance in tight loops.

But sometimes you do reach for those lossy, precision-dropping, efficient numbers:

my $pi = 3.14e0;
my $earth-mass = 5.97e24;  # kg
say $earth-mass.WHAT;      # (Num)

You get floating-point numbers by writing things in the scientific notation (3.14e0, where the e0 means "times ten to the power of 0"). You can also get these numbers by doing some lossy calculation, such as exponentiation.

Do keep in mind that these are not exact, and will get you into trouble if you treat them as exact:

say (0, 1e-1 ... 1);             # misses the 1, goes on forever
say (0, 1e-1 ... * >= 1);        # 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 (oops)
say 1e0/10 + 1/10 + 1/10 == 0.3; # False (awww!)

So, don’t do that. Don’t treat them as exact, that is.

You’re supposed to be able to write lists of numbers really neatly with the good old Perl 6 quote-words syntax:

my @numbers = <1 2.0 3e0>;
say .WHAT for @numbers;          # (IntStr) (RatStr) (NumStr)

But this, unfortunately, is not implemented in Rakudo yet. (It is implemented in Niecza.)

If you’re into Mandelbrot fractals or Electrical Engineering, Perl 6 has complex numbers for you, too:

say i * i == -1;            # True
say e ** (i * pi) + 1;      # 0+1.22464679914735e-16i

That last one is really close to 0, but the exponentiation throws us into the imprecise realm of floating-point numbers, and we lose a tiny bit of precision. (But what’s a tenth of a quadrillionth between friends?)

my $googol = EVAL( "1" ~ "0" x 100 );
say $googol;                            # 1000…0 (exactly 100 zeros)
say $googol.WHAT;                       # (Int)

Finally, if you have a taste for the infinite, Perl 6 allows handing, passing, and storage of Inf as a constant, and it compares predictably against numbers:

say Inf;              # Inf
say Inf.WHAT;         # (Num)
say Inf > $googol;    # True
my $max = -Inf;
$max max= 5;          # (max= means "store only if greater")
say $max;             # 5

Perl 6 meets your future requirements by giving you alien number bases, integers and rationals with excellent precision, floating point and complex numbers, and infinities. This is what has been added — now go forth and multiply.

Day 09 – Hashes and pairs

Hashes are nice. They can work as a kind of “poor man’s objects” when creating a class seems like just too much ceremony for the occasion.

my $employee = {
    name => 'Fred',
    age => 51,
    skills => <sweeping accounting barking>,
};

Perl (both 5 and 6) takes hashes seriously. So seriously, in fact, that there’s a dedicated sigil for them, and by using it you can omit the braces in the above code:

my %employee =
    name => 'Fred',
    age => 51,
    skills => <sweeping accounting barking>,
;

Note that Perl 6, just as Perl 5, allows you to use barewords for hash keys. People coming from Perl 5 seem to expect that we’ve dropped this feature — I don’t know why, but I suspect that as much as they like the ability, they also feel that it’s secretly “dirty” or “wrong” somehow, and thus they just assume that hash keys need to be quotes in Perl 6. Don’t worry, fivers, you can omit the quotes there without any feelings of guilt!

Another nice thing is that final comma. Yes, Perl allows a comma even after the last hash entry. This makes rearranging lines later a lot less of a hair-pulling experience, because all lines have a final comma. So a big win for maintainability, and not a lot of extra bookkeeping for the Perl 6 parser.

Hashes make great “configuration objects”, too. You want to pass some options into a routine somewhere, but the options (for reasons of future compatibility, perhaps) need to be an open set.

my %options =
    rpm => 440,
    duration => 60,
;
$centrifuge.start(%options);

Actually, we have two options with that last line. Either we pass in the whole hash like that, and the method in the centrifuge class will need to look like this:

method start(%options) {
    # probably need to start by unpacking options here
    # ...
}

Or we decide to “gut” the hash as we pass it in, effectively turning it into a bunch of named arguments:

$centrifuge.start( |%options );  # means :rpm(440), :duration(60)

In which case the method declaration will have to look like this instead:

method start(:$rpm!, :$duration!) {
    # ...
}

(In this case, we probably want to put in those exclamation marks, to make those named parameters obligatory. Unless we’re fine with providing some of them with a default, such as :$duration = 120.)

The “gut” operator prefix:<|> is really called “flattening” or “interpolation”. I really like how, in Perl 6, arrays flatten into positional parameters, and hashes flatten into named parameters. Decades after the fact, it gives some extra rationale to making arrays and hashes special enough to have their own sigils.

my @args = "Would you like fries with that?", 15, 5;
say substr(|@args);    # fries

my %details = :year(1969), :month(7), :day(16),
              :hour(20), :minute(17);
my $moonlanding = DateTime.new( |%details );

This brings us neatly into my next point: hash entries in Perl 6 really try to look like named parameters. They aren’t, of course, they’re just keys and values in a hash. But they try really hard to look the same. So much so that we even have *two* syntaxes for writing a hash entry. We have the fat-arrow syntax:

my %opts = blackberries => 42;

And we have the named argument syntax:

my %opts = :blackberries(42);

They each have their pros and cons. One of the nice things about the latter syntax is that it mixes nicely with variables, and (in case your variables are fortunately named) eliminates a bit of redundancy:

my $blackberries = 42;
my %opts = :$blackberries;   # means the same as :blackberries($blackberries)

We can’t do that in the fat-arrow syntax, not without repeating the word blackberries. And no-one likes to do that.

So hash entries (a key plus a value) really become more of a thing in Perl 6 than they ever were in Perl 5. In Perl 5 they’re a bit of a visual hack, since the fat-arrow is just a synonym for the comma, and hashes are initialized through lists.

In Perl 6, hash entries are syntactically pulled into visual pills through the :blackberries(42) syntax, and even more so through the :$blackberries syntax. Not only that, but we’re passing hashes into routines entry by entry, making the entries stand out a bit more.

In the end, we give up and realize that we care a bunch about those hash entries as units, so we give them a name: Pair. A hash is an (unordered) bunch of Pair objects.

So when you’re saying this:

say %employee.elems;

And the answer comes back “3”… that’s the number of Pair objects in the hash that were counted.

But in the end, Pair objects even turn out to have a sort of independent existence, graduating from their role as hash constituents. For, example, you can treat them as cons pairs and simulate Lisp lists with them:

my $lisp-list = 1 => 2 => 3 => Nil;  # it's nice that infix:<< => >> is right-associative

And then, as a final trick, let’s dynamically extend the Pair class to recognize arbitrary cadr-like method calls. (Note that .^add_fallback is not in the spec and currently Rakudo-only.)

Pair.^add_fallback(
    -> $, $name { $name ~~ /^c<[ad]>+r$/ },  # should we handle this? yes, if /^c<[ad]>+r$/
    -> $, $name {                            # if it turned out to be our job, this is what we do
        -> $p {
            $name ~~ /^c(<[ad]>*)(<[ad]>)r$/;        # split out last 'a' or 'd'
            my $r = $1 eq 'a' ?? $p.key !! $p.value; # choose key or value
            $0 ne '' ?? $r."c{$0}r"() !! $r;               # maybe recurse
        } 
    }
);

$lisp-list.caddr.say;    # 3

Whee!

Day 02 – The humble type object

For some inscrutable reason, we have defined a Dog class in today’s post.

class Dog {
    has $.name;
}

Don’t ask me why — maybe we’re writing software for a kennel? Maybe we’re writing software for dogs? “Teach your dog how to type!” Clever dogs can do up to 10 words a minute, with surprisingly few typos.

Anyway. Having a Dog class gives us the dubious pleasure of being able to create dogs out of thin air and passing them to functions. No surprise there.

sub check(Dog $d) {
    say "Yup, that's a dog for sure.";
}

my Dog $dog .= new(:name);
check($dog);     # Yup, that's a dog for sure.

But where there might be some surprise — if you haven’t gotten used to the idea of Perl 6’s type objects yet — is that you can also do this:

check(Dog);      # Yup, that's a dog for sure.

What did we just do there?

We didn’t pass a dog to the function, we passed Dog to the function. And the function accepts it and says it’s a dog, too. So, Dog is a dog. Huh.

What’s going on here is that in Perl 6 when we declare a class Dog like we just did, the word Dog ends up having two meanings:

  • The class Dog that we declared.
  • The type object Dog, kind of a patron saint for all Dog objects ever instantiated.

Cramming two concepts into one word like this seems like a recipe for failure. But it turns out it works surprisingly well.

Before we look at the reasons for using type objects, let’s find out a bit more about what they are.

say Dog;          # (Dog)
say Dog.name;     # ERROR: Cannot look up attributes in a type object
say ?Dog;         # False
say defined Dog;  # False

So, in summary, the Dog type object identifies itself as (Dog), it refuses to have its attribute inspected, it boolifies to False, and it’s not defined. Contrast this with an instance of Dog:

say $dog;         # Dog.new(name => "Fido")
say $dog.name;    # Fido
say ?$dog;        # True
say defined $dog; # True

An instance is everthing the type object isn’t: it knows how to output itself, it will happily tell you its name, and it’s both True and defined. Nice.

(Being undefined is only almost a surefire way to identify the type object. Someone could have gone through the trouble of making their instance object undefined. As they say in the industry, you don’t have to be a type object to be undefined… but it helps!)

And now, as promised, the Top Five Reasons Type Objects Work Surprisingly Well:

  1. Classes actually make sense as objects in the program. There’s this idea that classes have to haughtily refuse to play among the rest of the values in a program, that they have to somehow be like gods looking down on the instances from a Parthenon of class-hood. Perl 5 kind of has it like that. But both Ruby and Python show that classes can behave as more or less normal objects. In Perl 6, Dog is also what you get if you do $dog.WHAT.
  2. It fits quite well with the whole smartmatching thing. So what $dog ~~ Dog actually means is something like “hey, Dog type object, does this $fido look anything like you?”. The type object doesn’t just sit there, it does useful things like smartmatching.
  3. Another thing: the whole reason that line, my Dog $dog .= new(:name);, works as it does is because we end up calling .new on the type object. So here’s what that line does in slow motion. It desugars to a declaration and an assignment. The declaration is my Dog $dog; and so, because the $dog variable needs to start out with some undefined value, it starts out with Dog, the type object. The assignment then is simply $dog.=new, which is short for $dog = $dog.new. Conveniently, because the type object Dog is an object of the type Dog, it has a .new method (inherited from Mu in this case) that knows how to construct dogs.
  4. A little detail from that last point, which actually turns out to be a rather big deal: Perl 6 doesn’t really have undef like Perl 5 does. It turned out that undef wasn’t a really good fit with a type system; undef gets in everywhere, and doesn’t really have a type at all. (A bit like Java’s null which is known to have caused people no end of suffering.) So what Perl 6 has instead are these typed undefined values, namely — you guessed it — the type objects. If you declare a variable my Int $i, then $i will start out as undefined, that is, containing the type object Int.
  5. Not only do you sometimes want to call .new on the type object, sometimes you have other methods which don’t require an instance. (These kinds of methods are sometimes known as static methods in some languages, and class methods in other languages. Some languages have both of these, and they’re different, just to be confusing.) Again, the type object comes to the rescue here, sort of acts like an instance so that you can call your method on it, and then once again fades into the background. For example, if the class Dog had a method bark { say "woof" } then the Dog type object would be able to bark just as well as actual dog instances. (But the type object still refuses to tell you its .name, or any of its attributes.)

So that’s type objects for you. They’re sitting in a convenient semantic spot halfway between the class and its instances, sometimes representing one end of the spectrum, sometimes the other.

One thing before we part ways today. It doesn’t happen often, but sometimes you do want to be able to tell type objects and real instances apart, for example when accepting parameters in a function:

multi sniff(Dog:U $dog) {
    say "a type object, of course"
}
multi sniff(Dog:D $dog) {
    say "definitely a real dog instance"
}

sniff Dog;    # a type object, of course
sniff $dog;   # definitely a real dog instance

Here, :U stands for “undefined” and :D for “defined”. (And that, dear friends, is how we got a smiley into the design of Perl 6. Program language designers, take heed.) As I mentioned parenthetically before, it’s actually possible to be an undefined object without being a type object. For these special occasions, we have :T, but this modifier isn’t implemented in Rakudo as of this writing. (Though moritz++ informs me that, in Rakudo, :U currently has the semantics that :T should have.)

Let’s just end this post with maybe the corniest one-liner ever to see the light of day in #perl6:

$ perl6 -e 'say (my @a = "Hip " x 2), @a.^name, "!"'
Hip Hip Array!

Day 23 – Macros

Syntactic macros. The Lisp gods of yore provided humanity with this invention, essentially making Lisp a programmable programming language. Lisp adherents often look at the rest of the programming world with pity, seeing them fighting to invent wheels that were wrought and polished back in the sixties when giants walked the Earth and people wrote code in all-caps.

And the Lisp adherents see that the rest of us haven’t even gotten to the best part yet, the part with syntactic macros. We’re starting to get the hang of automatic memory management, continuations, and useful first-class functions. But macros are still absent from this picture.

In part, this is because in order to have proper syntactic macros, you basically have to look like Lisp. You know, with the parentheses and all. Lisp ends up having almost no syntax at all, making every program a very close representation of a syntax tree. Which really helps when you have macros starting to manipulate those same trees. Other languages, not really wanting to look like Lisp, find it difficult-to-impossible to pull off the same trick.

The Perl languages love the difficult-to-impossible. Perl programmers publish half a dozen difficult-to-impossible solutions to CPAN before breakfast. And, because Perl 6 is awesome and syntactic macros are awesome, Perl 6 has syntactic macros.

It is known, Khaleesi.

What are macros?

For reasons even I don’t fully understand, I’ve put myself in charge of implementing syntactic macros in Rakudo. Implementing macros means understanding them. Understanding them means my brain melts regularly. Unless it fries. It’s about 50-50.

I have this habit where I come into the #perl6 channel, and exclaiming “macros are just X!” for various values of X. Here are some samples:

  • Macros are just syntax tree manipulators.
  • Macros are just “little compilers”.
  • Macros are just a kind of templates.
  • Macros are just routines that do code substitution.
  • Macros allow you to safely hand values back and forth between the compile-time world and the runtime world.

But the definition that I finally found that I like best of all comes from scalamacros.org:

Macros are functions that are called by the compiler during compilation. Within these functions the programmer has access to compiler APIs. For example, it is possible to generate, analyze and typecheck code.

While we only cover the “generate” part of it yet in Perl 6, there’s every expectation we’ll be getting to the “analyze and typecheck” parts as well.

Some examples, please?

Coming right up.

macro checkpoint {
  my $i = ++(state $n);
  quasi { say "CHECKPOINT $i"; }
}

checkpoint;
for ^5 { checkpoint; }
checkpoint;

The quasi block is Perl 6’s way of saying “a piece of code, coming right up!”. You just put your code in the quasi block, and return it from the macro routine.

This code inserts “checkpoints” in our code, like little debugging messages. There’s only three checkpoints in the code, so the output we’ll get looks like this:

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 3

Note that the “code insertion” happens at compile time. That’s why we get five copies of the CHECKPOINT 2 line, because it’s the same checkpoint running five times. If we had had a subroutine instead:

sub checkpoint {
  my $i = ++(state $n);
  say "CHECKPOINT $i";
}

Then the program would print 7 distinct checkpoints.

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 3
CHECKPOINT 4
CHECKPOINT 5
CHECKPOINT 6
CHECKPOINT 7

As a more practical example, let’s say you have logging output in your program, but you want to be able to switch it off completely. The problem with an ordinary logging subroutine is that with something like:

LOG "The answer is { time-consuming-computation() }";

The time-consuming-computation() will run and take a lot of time even if LOG subsequently finds that logging was turned off. (That’s just how argument evaluation works in a non-lazy language.)

A macro fixes this:

constant LOGGING = True;

macro LOG($message) {
  if LOGGING {
    quasi { say {{{$message}}} };
  }
}

Here we see a new feature: the {{{ }}} triple-block. (Syntax is likely to change in the near future, see below.) It’s our way to mix template code in the quasi block with code coming in from other places. Doing say $message; would have been wrong, because $message is a syntax tree of the message to be logged. We need to inject that syntax tree right into the quasi, and we do that with a triple-block.

The macro conditionally generates logging code in your program. If the constant LOGGING is switched on, the appropriate logging code will replace each LOG macro invocation. If LOGGING is off, each macro invocation will be replaced by literally nothing.

Experience shows that running no code at all is very efficient.

What are syntactic macros?

A lot of things are called “macros” in this world. In programming languages, there are two big categories:

  • Textual macros. They substitute code on the level of the source code text. C’s macros, or Perl 5’s source filters, are examples.
  • Syntactic macros. They substitute code on the level of the source code syntax tree. Lisp macros are an example.

Textual macros are very powerful, but they represent the kind of power that is just as likely to shoot half your leg off as it is to get you to your destination. Using them requires great care, of the same kind needed for a janitor gig at Jurassic Park.

The problem is that textual macros don’t compose all that well. Bring in more than one of them to work on the same bit of source code, and… all bets are off. This puts severe limits on modularity. Textual macros, being what they are, leak internal details all over the place. This is the big lesson from Perl 5’s source filters, as far as I understand.

Syntactic macros compose wonderfully. The compiler is already a pipeline handing off syntax trees between various processing steps, and syntactic macros are simply more such steps. It’s as if you and the compiler were two children, with the compiler going “Hey, you want to play in my sandbox? Jump right in. Here’s a shovel. We’ve got work to do.” A macro is a shovel.

And syntactic macros allow us to be hygienic, meaning that code in the macro and code outside of the macro don’t step on each other’s toes. In practice, this is done by carefully keeping track of the macros context and the mainline’s context, and making sure wires don’t cross. This is necessary for safe and large-scale composition. Textual macros don’t give us this option at all.

Future

Both of the examples in this post work already in Rakudo. But it might also be useful to know where we’re heading with macros in the next year or so. The list is in the approximate order I expect to tackle things.

  • Un-hygiene. While hygienic macros are the sane and preferable default, sometimes you want to step on the toes of the mainline code. There should be an opt-out, and escape hatch. This is next up.
  • Introspection. In order to analyze and typecheck code, not just generate it, we need to be able to take syntax trees coming in as macro arguments, and look inside of them. There are no tools for that yet, and there’s no spec to guide us here. But I’m fairly sure people will want this. The trick is to come up with something that doesn’t tie us down to one compiler’s internal syntax-tree format. Both for the sake of compiler interoperability and future compatibility.
  • Deferred declarations. The sandbox analogy isn’t so frivolous, really. If you declare a class inside a quasi block, that declaration is limited (“sandboxed”) to within that quasi block. Then, when the code is injected somewhere in the mainline because of a macro invocation, it should actually run. Fortunately, as it happens, the Rakudo internals are factored in such a way that this will be fairly straightforward to implement.
  • Better syntax. The triple-block syntax is probably going away in favor of something better. The problem isn’t the syntax so much as the fact that it currently only works for terms. We want it to work for basically all syntactic categories. A solid proposal for this is yet to materialize, though.

With each of these steps, I expect us to find new and fruitful uses of macros. Knowing my fellow Perl 6 developers, we’ll probably find some uses that will shock and disgust us all, too.

In conclusion

Perl 6 is awesome because it puts you, the programmer, in the driver seat. Macros are simply more of that.

Implementing macros makes your brain melt. However, using them is relatively straightforward.

Day 22 – Parsing an IPv4 address

Guest post by Herbert Breunung (lichtkind).

Perl 5 brought regexes to mainstream programming and set a standard, one that is felt as relevant even in Redmond. Perl 6, of course, steps up the game by adding many new features to the regex camp, including easy-to-build grammars for your own complex parsers. But without getting too complex, you can get a lot of joy out of Perl 6’s rx (that’s how Perl 6 spells Perl 5’s qr operator, that enables you to save a Regex in a variable).

Because the Perl 6 regex syntax is less littered with exceptional cases, Larry Wall also likes to joke that he put the “regular” back into “regular expression”.

Some of the changes are:

  • most special variables are gone,
  • non-capturing groups and other grouping syntax is easier to type,
  • no more single/multi line modes,
  • x mode became default, making whitespace non-significant by default.

In summary, regexes are more regular than in Perl 5, confirming Larry’s joke. They try a bit harder to make your life easier when you need to match text. Under the hood, regexes have blossomed out into a complete sub-language within the bigger Perl 6 language. A language with its own parsing rules.

But don’t fret; not everything has changed. Some things remain the same:

/\d+/

This regex still matches one or more consecutive digits.

Similarly, if you want to capture the digits, you can do this, just like you’re used to:

/(\d+)/

You’ll find the matched digits in $0, not $1 as in Perl 5. All the special variables $0, $1, $2 are really syntactic sugar for indexing the match variable ($/[0], $/[1], $/[2]). Because indices start at 0, it makes sense for the first matched group to be $0. In Perl 5, $0 contains the name of the script or program, but this has been renamed into $*EXECUTABLE_NAME in Perl 6.

Should you be interested in getting all of the captured groups of a regex match, you can use @(), which is syntactic sugar for @($/).

The object in the $/ variable holds lots of useful information about the last match. For example, $/.from will give you the starting string position of the match.

But $0 will get us far enough for this post. We use it to extract individual features from a string.

Sometimes we want to extract a whole bunch of similar things at once. Then we can use the :g (or :global) modifier on the regex:

$_ = '1 23 456 78.9';
say .Str for m:g/(\d+)/; # 1 23 456 78 9

Note that the :g — as opposed to prior regex implementations — sits up front, right at the start of the regex. Not at the end. That way, when you read the regex from left to right, you will know from the start how the regex is doing its matching. No more end-heavy regex expressions.

Matching “all things that look like this” is so useful, that there’s even a dedicated method for that, .comb:

$str.comb(/\d+/);

If you’re familiar with .split, you can think of .comb as its cheerful cousin, matching all the things that .split discards.

Let’s tackle the matching of an IPv4 address. Coming from a Perl 5 angle, we expect to have to do something like this:

/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/

This won’t do in Perl 6, though. First of all, the {} blocks are real blocks in a Perl 6 regex; they contain Perl 6 code. Second, because Perl 6 has lots of error handling to catch p5isms, like this, you’ll get an error saying “Unsupported use of {N,M} as general quantifier; in Perl 6 please use ** N..M (or ** N..*)”.

So let’s do that. To match between one and three digits in a Perl 6 regex, we should type:

/\d ** 1..3/

Note how the regex sublanguage re-uses parts from the main Perl 6 language. ** can be seen as a kind of exponentiation (if we squint), in that we’re taking \d “to the between-first-and-third power”. And the range notation 1..3 exists both outside and within regexes.

Using our new knowledge about the repetition quantifier, we end up with something like this:

/(\d**1..3) \. (\d**1..3) \. (\d**1..3) \. (\d**1..3)/

That’s still kinda clunky. We might end up wishing that we could use the repetition operator again, but those literal dots in between prevent us from doing that. If only we could specify repetition a given number of times and a divider.

In Perl 6 regexes, you can.

/ (\d ** 1..3) ** 4 % '.' /

The % operator here is a quantifier modifier, so it can only follow on a quantifier like * or + or **. The choice of % for this function is relatively new in Perl 6, and you may prefer to read it as “modulo”, just like in the main language. That is, “match four groups of digits, modulo literal dots in between”. Or you could think of the dots in between as the “remainder”, the separators that are left after you’ve parsed the actual elements.

Oh, and you might’ve noticed that \. changed to '.' on the way. We can use either; they mean exactly the same. In Perl 5, there isn’t a simple rule saying which symbols have a magic meaning and which ones simply signify themselves. In Perl 6, it’s easy: word characters (alphanumerics and the underscore) always signify themselves. Everything else has to be escaped or quoted to get its literal meaning.

Putting it all together, here’s how we would extract IPv4 addresses out of a string:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";

say .Str for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: 127.0.0.1 173.194.32.32

Or, we could use .comb:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
my @ip4addrs = .comb(/ (\d ** 1..3) ** 4 % '.' /);

If we’re interested in individual integers, we can get those too:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
say .list>>.Str.perl for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: ("127", "0", "0", "1") ("173", "194", "32", "32")

If you want to know more, read the S05, or watch me battling with my slide deck and the English language in this presentation about regexes.