Author Archive

Day 22 – A catalogue of operator types

December 22, 2013

Perl 6 has a very healthy relationship to operators. "An operator is just a funnily-named subroutine", goes the slogan in the community.

In practice, it means you can do this:

sub postfix:<!>($N) {
    [*] 2..$N;
}
say 6!;    # 720

Yeah, that’s the prototypical factorial example. I promise I won’t do it any more in this post. I swear the only reason we don’t have factorial as a standard operator in the language, is so that we can impress people by defining it.

Anyway, so a postfix operator ! is really just a "funnily-named" subroutine postfix:<!>. Similarly, we have prefix and infix operators, like prefix:<-> and infix:<*>. They’re all just subroutines. I wrote about that before, so I’m not going to hammer on that point.

Operators have different precedence (like infix:<*> binds tighter than infix:<+>)…

$x + $y * $z   # compiler sees $x + ($y * $z)
$x * $y + $z   # compiler sees ($x * $y) + $z

…and different associativity (like infix:</> associates to the left but infix:<=> associates to the right).

$x / $y / $z   # compiler sees ($x / $y) / $z
$x = $y = $z   # compiler sees $x = ($y = $z)

But I wrote about that before too, at quite some length, so I’m not going to revisit that topic.

No, today I just want to talk about the operator categories in general. I think Perl 6 does a nice job of describing the operator types themselves. I don’t see that in many other languages.

Here are all the different types:

 type            position           syntax
======          ==========         ========

prefix          before a term        !X
infix           between two terms    X ! Y
postfix         after a term         X!

circumfix       around               [X]
postcircumfix   after & around       X[Y]

A lot of other languages give you the ability to define your own operators. Many of them provide approaches which are hackish afterthoughts, and only allow you to override existing operators. Some other languages do approach the problem head-on, but end up simplifying the language to decrease the complexity of defining new operators.

Perl 6 approaches, and solves, the problem, head-on. You get all of the above operators, you can refine or override old ones, and you can talk within the language about things like precedence and associativity.

All that is rather nice. But, as usual, Perl 6 goes one step further and starts classifying metaoperators, too.

What’s a metaoperator? That’s our name for when you can extend an operator in a certain way. For example, many languages allow some kind of not-equal operator like infix:<!=>. Perl 6 has another one for string non-equality: infix:<ne>. And then maybe you want to do smart non-matching: infix:<!~~>. And so on — pretty soon you catch on to the pattern: you may, at one point or another, want to invert the result of any boolean matcher. So that’s what Perl 6 provides.

Here are a few examples:

normal op     negated op
=========     ==========
eq            !eq     ( synonym of ne )
~~            !~~
<             !<      ( synonym of >= )
before        !before

Because this particular metaoperator attaches itself before an infix operator, it gets the name infix_prefix_meta_operator:<!>. I was going to say "and yes, you can add your own user-defined metaoperators too", but currently no implementation supports that. Maybe sometime in the future.

There are many other categories of metaoperators. For example the "multiplication reducer" [*] that we used in the factorial example at the top (which takes a list of numbers and multiplies them all together) is really an infix:<*> surrounded by the metaop prefix_circumfix_meta_operator:sym<[ ]>. (It’s "prefix" because it goes before the list of numbers.)

Luckily, we don’t have meta-meta-ops. Already thinking about the metaops is quite a challenge sometimes. But what I like about Perl 6 and the approach it takes to grammar and operators is this: someone did think about it, long and hard. The resulting system has a pleasing simplicity to it.

Perl 6 is very well-suited for building parsers for languages. As one of its neatest tricks, it takes this expressive power and directs it towards the task of defining itself. As a Perl 6 user, you’re given not just an excellent toolbox, but a well-organized workshop to adapt and extend to fit your needs.

Day 15 – Numbers and ways of writing them

December 15, 2013

Consider the humble integer.

my $number-of-dwarfs = 7;
say $number-of-dwarfs.WHAT;   # (Int)

Integers are great. (Well, many of them are.) Computers basically run on integers. Also, God made the integers, and everything else is basically cheap counterfeit knock-offs, smuggled into the Platonic realm when no-one’s looking. If they’re so important, we’d expect to have lots of good ways of writing them in any respectable programming language.

Do we have lots of ways of writing integers in Perl 6? We do. (Does that make Perl 6 respectable? No, because it’s a necessary but not sufficient condition. Duh.)

One thing we might want to do, for example, is write our numbers in other number bases. The need for this crops up now and then, for example if you’re stranded on Binarium where everyone has a total of two fingers:

my $this-is-what-we-humans-call-ten = 0b1010;   # 'b' for 'binary', baby
my $and-your-ten-is-our-two = 0b10;

Or you may find yourself on the multi-faceted world of Hexa X, whose inhabitants (due to an evolutionary arms race involving the act of tickling) sport 16 fingers each:

my $open-your-mouth-and-say-ten = 0xA;    # 'x' is for, um, 'heXadecimal'
my $really-sixteen = 0x10;

If you’re really unlucky, you will find yourself stuck in a file permissions factory, with no other means to signal the outer world than by using base 8:

my $halp-i'm-stuck-in-this-weird-unix-factory = 0o644;

(Fans of other C-based languages may notice the deviation from tradition here: for your own sanity, we no longer write the above number as 0644 — doing so rewards you with a stern (turn-offable) warning. Maybe octal numbers were once important enough to merit a prefix of just 0, but they ain’t no more.)

Of course, just these bases are not enough. Sometimes you will need to count with a number of fingers that’s not any of 2, 8, or 16. For those special occasions, we have another nice syntax for you:

say :3<120>;       # 15 (== 1 * 3**2 + 2 * 3**1 + 0 * 3**0)
say :28<aaaargh>;  # 4997394433
say :5($n);        # let me parse that in base 5 for you

Yes, that’s the dear old pair syntax, here hijacked for base conversions from a certain base. You will recognize this special syntax by the fact that it uses colonpairs (:3<120>), not the "fat arrow" (3 => 120), and that the key is an integer. For the curious, the integer goes up to 36 (at which point we’ve reached ‘z’ in the alphabet, and there’s no natural way to make up more symbols for "digits").

I once used :4($_) to translate DNA into proteins. Perl 6 — fulfilling your bioinformatics dreams!

Oh, and sometimes you want to convert into a certain base, essentially taking a good-old-integer and making it understandable for a being with a certain amount of fingers. We’ve got you covered there, too!

say 0xCAFEBABE;            # 3405691582
say 3405691582.base(16);   # CAFEBABE

Moving on. Perl 6 also has rationals.

my $tenth = 1/10;
say $tenth.WHAT;   # (Rat)

Usually computers (and programming languages) are pretty bad at representing numbers such as a tenth, because most of them are stranded on the planet Binarium. Not so Perl 6; it stores your rational numbers exactly, most of the time.

say (0, 1/10 ... 1);            # 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
say (1/3 + 1/6) * 2;            # 1
say 1/10 + 1/10 + 1/10 == 0.3;  # True (\o/)

The rule here is, if you give some number as a decimal (0.5) or as a ratio (1/2), it will be represented as a Rat. After that, Perl 6 does its darnedest to strike a balance between representing numbers without losing precision, and not losing too much performance in tight loops.

But sometimes you do reach for those lossy, precision-dropping, efficient numbers:

my $pi = 3.14e0;
my $earth-mass = 5.97e24;  # kg
say $earth-mass.WHAT;      # (Num)

You get floating-point numbers by writing things in the scientific notation (3.14e0, where the e0 means "times ten to the power of 0"). You can also get these numbers by doing some lossy calculation, such as exponentiation.

Do keep in mind that these are not exact, and will get you into trouble if you treat them as exact:

say (0, 1e-1 ... 1);             # misses the 1, goes on forever
say (0, 1e-1 ... * >= 1);        # 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.1 (oops)
say 1e0/10 + 1/10 + 1/10 == 0.3; # False (awww!)

So, don’t do that. Don’t treat them as exact, that is.

You’re supposed to be able to write lists of numbers really neatly with the good old Perl 6 quote-words syntax:

my @numbers = <1 2.0 3e0>;
say .WHAT for @numbers;          # (IntStr) (RatStr) (NumStr)

But this, unfortunately, is not implemented in Rakudo yet. (It is implemented in Niecza.)

If you’re into Mandelbrot fractals or Electrical Engineering, Perl 6 has complex numbers for you, too:

say i * i == -1;            # True
say e ** (i * pi) + 1;      # 0+1.22464679914735e-16i

That last one is really close to 0, but the exponentiation throws us into the imprecise realm of floating-point numbers, and we lose a tiny bit of precision. (But what’s a tenth of a quadrillionth between friends?)

my $googol = EVAL( "1" ~ "0" x 100 );
say $googol;                            # 1000…0 (exactly 100 zeros)
say $googol.WHAT;                       # (Int)

Finally, if you have a taste for the infinite, Perl 6 allows handing, passing, and storage of Inf as a constant, and it compares predictably against numbers:

say Inf;              # Inf
say Inf.WHAT;         # (Num)
say Inf > $googol;    # True
my $max = -Inf;
$max max= 5;          # (max= means "store only if greater")
say $max;             # 5

Perl 6 meets your future requirements by giving you alien number bases, integers and rationals with excellent precision, floating point and complex numbers, and infinities. This is what has been added — now go forth and multiply.

Day 09 – Hashes and pairs

December 9, 2013

Hashes are nice. They can work as a kind of “poor man’s objects” when creating a class seems like just too much ceremony for the occasion.

my $employee = {
    name => 'Fred',
    age => 51,
    skills => <sweeping accounting barking>,
};

Perl (both 5 and 6) takes hashes seriously. So seriously, in fact, that there’s a dedicated sigil for them, and by using it you can omit the braces in the above code:

my %employee =
    name => 'Fred',
    age => 51,
    skills => <sweeping accounting barking>,
;

Note that Perl 6, just as Perl 5, allows you to use barewords for hash keys. People coming from Perl 5 seem to expect that we’ve dropped this feature — I don’t know why, but I suspect that as much as they like the ability, they also feel that it’s secretly “dirty” or “wrong” somehow, and thus they just assume that hash keys need to be quotes in Perl 6. Don’t worry, fivers, you can omit the quotes there without any feelings of guilt!

Another nice thing is that final comma. Yes, Perl allows a comma even after the last hash entry. This makes rearranging lines later a lot less of a hair-pulling experience, because all lines have a final comma. So a big win for maintainability, and not a lot of extra bookkeeping for the Perl 6 parser.

Hashes make great “configuration objects”, too. You want to pass some options into a routine somewhere, but the options (for reasons of future compatibility, perhaps) need to be an open set.

my %options =
    rpm => 440,
    duration => 60,
;
$centrifuge.start(%options);

Actually, we have two options with that last line. Either we pass in the whole hash like that, and the method in the centrifuge class will need to look like this:

method start(%options) {
    # probably need to start by unpacking options here
    # ...
}

Or we decide to “gut” the hash as we pass it in, effectively turning it into a bunch of named arguments:

$centrifuge.start( |%options );  # means :rpm(440), :duration(60)

In which case the method declaration will have to look like this instead:

method start(:$rpm!, :$duration!) {
    # ...
}

(In this case, we probably want to put in those exclamation marks, to make those named parameters obligatory. Unless we’re fine with providing some of them with a default, such as :$duration = 120.)

The “gut” operator prefix:<|> is really called “flattening” or “interpolation”. I really like how, in Perl 6, arrays flatten into positional parameters, and hashes flatten into named parameters. Decades after the fact, it gives some extra rationale to making arrays and hashes special enough to have their own sigils.

my @args = "Would you like fries with that?", 15, 5;
say substr(|@args);    # fries

my %details = :year(1969), :month(7), :day(16),
              :hour(20), :minute(17);
my $moonlanding = DateTime.new( |%details );

This brings us neatly into my next point: hash entries in Perl 6 really try to look like named parameters. They aren’t, of course, they’re just keys and values in a hash. But they try really hard to look the same. So much so that we even have *two* syntaxes for writing a hash entry. We have the fat-arrow syntax:

my %opts = blackberries => 42;

And we have the named argument syntax:

my %opts = :blackberries(42);

They each have their pros and cons. One of the nice things about the latter syntax is that it mixes nicely with variables, and (in case your variables are fortunately named) eliminates a bit of redundancy:

my $blackberries = 42;
my %opts = :$blackberries;   # means the same as :blackberries($blackberries)

We can’t do that in the fat-arrow syntax, not without repeating the word blackberries. And no-one likes to do that.

So hash entries (a key plus a value) really become more of a thing in Perl 6 than they ever were in Perl 5. In Perl 5 they’re a bit of a visual hack, since the fat-arrow is just a synonym for the comma, and hashes are initialized through lists.

In Perl 6, hash entries are syntactically pulled into visual pills through the :blackberries(42) syntax, and even more so through the :$blackberries syntax. Not only that, but we’re passing hashes into routines entry by entry, making the entries stand out a bit more.

In the end, we give up and realize that we care a bunch about those hash entries as units, so we give them a name: Pair. A hash is an (unordered) bunch of Pair objects.

So when you’re saying this:

say %employee.elems;

And the answer comes back “3”… that’s the number of Pair objects in the hash that were counted.

But in the end, Pair objects even turn out to have a sort of independent existence, graduating from their role as hash constituents. For, example, you can treat them as cons pairs and simulate Lisp lists with them:

my $lisp-list = 1 => 2 => 3 => Nil;  # it's nice that infix:<< => >> is right-associative

And then, as a final trick, let’s dynamically extend the Pair class to recognize arbitrary cadr-like method calls. (Note that .^add_fallback is not in the spec and currently Rakudo-only.)

Pair.^add_fallback(
    -> $, $name { $name ~~ /^c<[ad]>+r$/ },  # should we handle this? yes, if /^c<[ad]>+r$/
    -> $, $name {                            # if it turned out to be our job, this is what we do
        -> $p {
            $name ~~ /^c(<[ad]>*)(<[ad]>)r$/;        # split out last 'a' or 'd'
            my $r = $1 eq 'a' ?? $p.key !! $p.value; # choose key or value
            $0 ne '' ?? $r."c{$0}r"() !! $r;               # maybe recurse
        } 
    }
);

$lisp-list.caddr.say;    # 3

Whee!

Day 02 – The humble type object

December 2, 2013

For some inscrutable reason, we have defined a Dog class in today’s post.

class Dog {
    has $.name;
}

Don’t ask me why — maybe we’re writing software for a kennel? Maybe we’re writing software for dogs? “Teach your dog how to type!” Clever dogs can do up to 10 words a minute, with surprisingly few typos.

Anyway. Having a Dog class gives us the dubious pleasure of being able to create dogs out of thin air and passing them to functions. No surprise there.

sub check(Dog $d) {
    say "Yup, that's a dog for sure.";
}

my Dog $dog .= new(:name);
check($dog);     # Yup, that's a dog for sure.

But where there might be some surprise — if you haven’t gotten used to the idea of Perl 6’s type objects yet — is that you can also do this:

check(Dog);      # Yup, that's a dog for sure.

What did we just do there?

We didn’t pass a dog to the function, we passed Dog to the function. And the function accepts it and says it’s a dog, too. So, Dog is a dog. Huh.

What’s going on here is that in Perl 6 when we declare a class Dog like we just did, the word Dog ends up having two meanings:

  • The class Dog that we declared.
  • The type object Dog, kind of a patron saint for all Dog objects ever instantiated.

Cramming two concepts into one word like this seems like a recipe for failure. But it turns out it works surprisingly well.

Before we look at the reasons for using type objects, let’s find out a bit more about what they are.

say Dog;          # (Dog)
say Dog.name;     # ERROR: Cannot look up attributes in a type object
say ?Dog;         # False
say defined Dog;  # False

So, in summary, the Dog type object identifies itself as (Dog), it refuses to have its attribute inspected, it boolifies to False, and it’s not defined. Contrast this with an instance of Dog:

say $dog;         # Dog.new(name => "Fido")
say $dog.name;    # Fido
say ?$dog;        # True
say defined $dog; # True

An instance is everthing the type object isn’t: it knows how to output itself, it will happily tell you its name, and it’s both True and defined. Nice.

(Being undefined is only almost a surefire way to identify the type object. Someone could have gone through the trouble of making their instance object undefined. As they say in the industry, you don’t have to be a type object to be undefined… but it helps!)

And now, as promised, the Top Five Reasons Type Objects Work Surprisingly Well:

  1. Classes actually make sense as objects in the program. There’s this idea that classes have to haughtily refuse to play among the rest of the values in a program, that they have to somehow be like gods looking down on the instances from a Parthenon of class-hood. Perl 5 kind of has it like that. But both Ruby and Python show that classes can behave as more or less normal objects. In Perl 6, Dog is also what you get if you do $dog.WHAT.
  2. It fits quite well with the whole smartmatching thing. So what $dog ~~ Dog actually means is something like “hey, Dog type object, does this $fido look anything like you?”. The type object doesn’t just sit there, it does useful things like smartmatching.
  3. Another thing: the whole reason that line, my Dog $dog .= new(:name);, works as it does is because we end up calling .new on the type object. So here’s what that line does in slow motion. It desugars to a declaration and an assignment. The declaration is my Dog $dog; and so, because the $dog variable needs to start out with some undefined value, it starts out with Dog, the type object. The assignment then is simply $dog.=new, which is short for $dog = $dog.new. Conveniently, because the type object Dog is an object of the type Dog, it has a .new method (inherited from Mu in this case) that knows how to construct dogs.
  4. A little detail from that last point, which actually turns out to be a rather big deal: Perl 6 doesn’t really have undef like Perl 5 does. It turned out that undef wasn’t a really good fit with a type system; undef gets in everywhere, and doesn’t really have a type at all. (A bit like Java’s null which is known to have caused people no end of suffering.) So what Perl 6 has instead are these typed undefined values, namely — you guessed it — the type objects. If you declare a variable my Int $i, then $i will start out as undefined, that is, containing the type object Int.
  5. Not only do you sometimes want to call .new on the type object, sometimes you have other methods which don’t require an instance. (These kinds of methods are sometimes known as static methods in some languages, and class methods in other languages. Some languages have both of these, and they’re different, just to be confusing.) Again, the type object comes to the rescue here, sort of acts like an instance so that you can call your method on it, and then once again fades into the background. For example, if the class Dog had a method bark { say "woof" } then the Dog type object would be able to bark just as well as actual dog instances. (But the type object still refuses to tell you its .name, or any of its attributes.)

So that’s type objects for you. They’re sitting in a convenient semantic spot halfway between the class and its instances, sometimes representing one end of the spectrum, sometimes the other.

One thing before we part ways today. It doesn’t happen often, but sometimes you do want to be able to tell type objects and real instances apart, for example when accepting parameters in a function:

multi sniff(Dog:U $dog) {
    say "a type object, of course"
}
multi sniff(Dog:D $dog) {
    say "definitely a real dog instance"
}

sniff Dog;    # a type object, of course
sniff $dog;   # definitely a real dog instance

Here, :U stands for “undefined” and :D for “defined”. (And that, dear friends, is how we got a smiley into the design of Perl 6. Program language designers, take heed.) As I mentioned parenthetically before, it’s actually possible to be an undefined object without being a type object. For these special occasions, we have :T, but this modifier isn’t implemented in Rakudo as of this writing. (Though moritz++ informs me that, in Rakudo, :U currently has the semantics that :T should have.)

Let’s just end this post with maybe the corniest one-liner ever to see the light of day in #perl6:

$ perl6 -e 'say (my @a = "Hip " x 2), @a.^name, "!"'
Hip Hip Array!

Day 23 – Macros

December 23, 2012

Syntactic macros. The Lisp gods of yore provided humanity with this invention, essentially making Lisp a programmable programming language. Lisp adherents often look at the rest of the programming world with pity, seeing them fighting to invent wheels that were wrought and polished back in the sixties when giants walked the Earth and people wrote code in all-caps.

And the Lisp adherents see that the rest of us haven’t even gotten to the best part yet, the part with syntactic macros. We’re starting to get the hang of automatic memory management, continuations, and useful first-class functions. But macros are still absent from this picture.

In part, this is because in order to have proper syntactic macros, you basically have to look like Lisp. You know, with the parentheses and all. Lisp ends up having almost no syntax at all, making every program a very close representation of a syntax tree. Which really helps when you have macros starting to manipulate those same trees. Other languages, not really wanting to look like Lisp, find it difficult-to-impossible to pull off the same trick.

The Perl languages love the difficult-to-impossible. Perl programmers publish half a dozen difficult-to-impossible solutions to CPAN before breakfast. And, because Perl 6 is awesome and syntactic macros are awesome, Perl 6 has syntactic macros.

It is known, Khaleesi.

What are macros?

For reasons even I don’t fully understand, I’ve put myself in charge of implementing syntactic macros in Rakudo. Implementing macros means understanding them. Understanding them means my brain melts regularly. Unless it fries. It’s about 50-50.

I have this habit where I come into the #perl6 channel, and exclaiming “macros are just X!” for various values of X. Here are some samples:

  • Macros are just syntax tree manipulators.
  • Macros are just “little compilers”.
  • Macros are just a kind of templates.
  • Macros are just routines that do code substitution.
  • Macros allow you to safely hand values back and forth between the compile-time world and the runtime world.

But the definition that I finally found that I like best of all comes from scalamacros.org:

Macros are functions that are called by the compiler during compilation. Within these functions the programmer has access to compiler APIs. For example, it is possible to generate, analyze and typecheck code.

While we only cover the “generate” part of it yet in Perl 6, there’s every expectation we’ll be getting to the “analyze and typecheck” parts as well.

Some examples, please?

Coming right up.

macro checkpoint {
  my $i = ++(state $n);
  quasi { say "CHECKPOINT $i"; }
}

checkpoint;
for ^5 { checkpoint; }
checkpoint;

The quasi block is Perl 6’s way of saying “a piece of code, coming right up!”. You just put your code in the quasi block, and return it from the macro routine.

This code inserts “checkpoints” in our code, like little debugging messages. There’s only three checkpoints in the code, so the output we’ll get looks like this:

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 3

Note that the “code insertion” happens at compile time. That’s why we get five copies of the CHECKPOINT 2 line, because it’s the same checkpoint running five times. If we had had a subroutine instead:

sub checkpoint {
  my $i = ++(state $n);
  say "CHECKPOINT $i";
}

Then the program would print 7 distinct checkpoints.

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 3
CHECKPOINT 4
CHECKPOINT 5
CHECKPOINT 6
CHECKPOINT 7

As a more practical example, let’s say you have logging output in your program, but you want to be able to switch it off completely. The problem with an ordinary logging subroutine is that with something like:

LOG "The answer is { time-consuming-computation() }";

The time-consuming-computation() will run and take a lot of time even if LOG subsequently finds that logging was turned off. (That’s just how argument evaluation works in a non-lazy language.)

A macro fixes this:

constant LOGGING = True;

macro LOG($message) {
  if LOGGING {
    quasi { say {{{$message}}} };
  }
}

Here we see a new feature: the {{{ }}} triple-block. (Syntax is likely to change in the near future, see below.) It’s our way to mix template code in the quasi block with code coming in from other places. Doing say $message; would have been wrong, because $message is a syntax tree of the message to be logged. We need to inject that syntax tree right into the quasi, and we do that with a triple-block.

The macro conditionally generates logging code in your program. If the constant LOGGING is switched on, the appropriate logging code will replace each LOG macro invocation. If LOGGING is off, each macro invocation will be replaced by literally nothing.

Experience shows that running no code at all is very efficient.

What are syntactic macros?

A lot of things are called “macros” in this world. In programming languages, there are two big categories:

  • Textual macros. They substitute code on the level of the source code text. C’s macros, or Perl 5’s source filters, are examples.
  • Syntactic macros. They substitute code on the level of the source code syntax tree. Lisp macros are an example.

Textual macros are very powerful, but they represent the kind of power that is just as likely to shoot half your leg off as it is to get you to your destination. Using them requires great care, of the same kind needed for a janitor gig at Jurassic Park.

The problem is that textual macros don’t compose all that well. Bring in more than one of them to work on the same bit of source code, and… all bets are off. This puts severe limits on modularity. Textual macros, being what they are, leak internal details all over the place. This is the big lesson from Perl 5’s source filters, as far as I understand.

Syntactic macros compose wonderfully. The compiler is already a pipeline handing off syntax trees between various processing steps, and syntactic macros are simply more such steps. It’s as if you and the compiler were two children, with the compiler going “Hey, you want to play in my sandbox? Jump right in. Here’s a shovel. We’ve got work to do.” A macro is a shovel.

And syntactic macros allow us to be hygienic, meaning that code in the macro and code outside of the macro don’t step on each other’s toes. In practice, this is done by carefully keeping track of the macros context and the mainline’s context, and making sure wires don’t cross. This is necessary for safe and large-scale composition. Textual macros don’t give us this option at all.

Future

Both of the examples in this post work already in Rakudo. But it might also be useful to know where we’re heading with macros in the next year or so. The list is in the approximate order I expect to tackle things.

  • Un-hygiene. While hygienic macros are the sane and preferable default, sometimes you want to step on the toes of the mainline code. There should be an opt-out, and escape hatch. This is next up.
  • Introspection. In order to analyze and typecheck code, not just generate it, we need to be able to take syntax trees coming in as macro arguments, and look inside of them. There are no tools for that yet, and there’s no spec to guide us here. But I’m fairly sure people will want this. The trick is to come up with something that doesn’t tie us down to one compiler’s internal syntax-tree format. Both for the sake of compiler interoperability and future compatibility.
  • Deferred declarations. The sandbox analogy isn’t so frivolous, really. If you declare a class inside a quasi block, that declaration is limited (“sandboxed”) to within that quasi block. Then, when the code is injected somewhere in the mainline because of a macro invocation, it should actually run. Fortunately, as it happens, the Rakudo internals are factored in such a way that this will be fairly straightforward to implement.
  • Better syntax. The triple-block syntax is probably going away in favor of something better. The problem isn’t the syntax so much as the fact that it currently only works for terms. We want it to work for basically all syntactic categories. A solid proposal for this is yet to materialize, though.

With each of these steps, I expect us to find new and fruitful uses of macros. Knowing my fellow Perl 6 developers, we’ll probably find some uses that will shock and disgust us all, too.

In conclusion

Perl 6 is awesome because it puts you, the programmer, in the driver seat. Macros are simply more of that.

Implementing macros makes your brain melt. However, using them is relatively straightforward.

Day 22 – Parsing an IPv4 address

December 22, 2012

Guest post by Herbert Breunung (lichtkind).

Perl 5 brought regexes to mainstream programming and set a standard, one that is felt as relevant even in Redmond. Perl 6, of course, steps up the game by adding many new features to the regex camp, including easy-to-build grammars for your own complex parsers. But without getting too complex, you can get a lot of joy out of Perl 6’s rx (that’s how Perl 6 spells Perl 5’s qr operator, that enables you to save a Regex in a variable).

Because the Perl 6 regex syntax is less littered with exceptional cases, Larry Wall also likes to joke that he put the “regular” back into “regular expression”.

Some of the changes are:

  • most special variables are gone,
  • non-capturing groups and other grouping syntax is easier to type,
  • no more single/multi line modes,
  • x mode became default, making whitespace non-significant by default.

In summary, regexes are more regular than in Perl 5, confirming Larry’s joke. They try a bit harder to make your life easier when you need to match text. Under the hood, regexes have blossomed out into a complete sub-language within the bigger Perl 6 language. A language with its own parsing rules.

But don’t fret; not everything has changed. Some things remain the same:

/\d+/

This regex still matches one or more consecutive digits.

Similarly, if you want to capture the digits, you can do this, just like you’re used to:

/(\d+)/

You’ll find the matched digits in $0, not $1 as in Perl 5. All the special variables $0, $1, $2 are really syntactic sugar for indexing the match variable ($/[0], $/[1], $/[2]). Because indices start at 0, it makes sense for the first matched group to be $0. In Perl 5, $0 contains the name of the script or program, but this has been renamed into $*EXECUTABLE_NAME in Perl 6.

Should you be interested in getting all of the captured groups of a regex match, you can use @(), which is syntactic sugar for @($/).

The object in the $/ variable holds lots of useful information about the last match. For example, $/.from will give you the starting string position of the match.

But $0 will get us far enough for this post. We use it to extract individual features from a string.

Sometimes we want to extract a whole bunch of similar things at once. Then we can use the :g (or :global) modifier on the regex:

$_ = '1 23 456 78.9';
say .Str for m:g/(\d+)/; # 1 23 456 78 9

Note that the :g — as opposed to prior regex implementations — sits up front, right at the start of the regex. Not at the end. That way, when you read the regex from left to right, you will know from the start how the regex is doing its matching. No more end-heavy regex expressions.

Matching “all things that look like this” is so useful, that there’s even a dedicated method for that, .comb:

$str.comb(/\d+/);

If you’re familiar with .split, you can think of .comb as its cheerful cousin, matching all the things that .split discards.

Let’s tackle the matching of an IPv4 address. Coming from a Perl 5 angle, we expect to have to do something like this:

/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/

This won’t do in Perl 6, though. First of all, the {} blocks are real blocks in a Perl 6 regex; they contain Perl 6 code. Second, because Perl 6 has lots of error handling to catch p5isms, like this, you’ll get an error saying “Unsupported use of {N,M} as general quantifier; in Perl 6 please use ** N..M (or ** N..*)”.

So let’s do that. To match between one and three digits in a Perl 6 regex, we should type:

/\d ** 1..3/

Note how the regex sublanguage re-uses parts from the main Perl 6 language. ** can be seen as a kind of exponentiation (if we squint), in that we’re taking \d “to the between-first-and-third power”. And the range notation 1..3 exists both outside and within regexes.

Using our new knowledge about the repetition quantifier, we end up with something like this:

/(\d**1..3) \. (\d**1..3) \. (\d**1..3) \. (\d**1..3)/

That’s still kinda clunky. We might end up wishing that we could use the repetition operator again, but those literal dots in between prevent us from doing that. If only we could specify repetition a given number of times and a divider.

In Perl 6 regexes, you can.

/ (\d ** 1..3) ** 4 % '.' /

The % operator here is a quantifier modifier, so it can only follow on a quantifier like * or + or **. The choice of % for this function is relatively new in Perl 6, and you may prefer to read it as “modulo”, just like in the main language. That is, “match four groups of digits, modulo literal dots in between”. Or you could think of the dots in between as the “remainder”, the separators that are left after you’ve parsed the actual elements.

Oh, and you might’ve noticed that \. changed to '.' on the way. We can use either; they mean exactly the same. In Perl 5, there isn’t a simple rule saying which symbols have a magic meaning and which ones simply signify themselves. In Perl 6, it’s easy: word characters (alphanumerics and the underscore) always signify themselves. Everything else has to be escaped or quoted to get its literal meaning.

Putting it all together, here’s how we would extract IPv4 addresses out of a string:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";

say .Str for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: 127.0.0.1 173.194.32.32

Or, we could use .comb:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
my @ip4addrs = .comb(/ (\d ** 1..3) ** 4 % '.' /);

If we’re interested in individual integers, we can get those too:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
say .list>>.Str.perl for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: ("127", "0", "0", "1") ("173", "194", "32", "32")

If you want to know more, read the S05, or watch me battling with my slide deck and the English language in this presentation about regexes.

Day 20 – Dynamic variables and DSL-y things

December 20, 2012

Today, let’s talk about DSLs.

Post from the past: a motivating example

Two years ago I wrote a blog post about Nim, a game played with piles of stones. I just put in ASCII diagrams of the actual Nim stone piles, telling myself that if I had time, I would put in fancy SVG diagrams, generated with Perl 6.

Naturally, I didn’t have time. My self-imposed deadline ran out, and I published the post with simple ASCII diagrams.

But time is ever-regenerative, and there for people who want it. So, let’s generate some fancy SVG diagrams with Perl 6.

Have bit array, want SVG

What do we need, exactly? Well, a subroutine that takes an array of piles as input and generates an SVG file would be a really good start.

Let’s take the last “image” in the post as an example:

3      OO O
4 OOOO
5 OOOO    O

For the moment, let’s ignore the numbers at the left margin; they’re just counting stones. We summarize the piles themselves as a kind of bitmap, which also forms the input to the function:

my @piles =
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

nim-svg(@piles);

At this point, we need only create the nim-svg function itself, and make it render SVG from this bitmap. Since I’ve long since tired of outputting SVG by hand, I use the SVG module, which comes bundled with Rakudo Star.

use SVG;

sub nim-svg(@piles) {
    my $width = max map *.elems, @piles;
    my $height = @piles.elems;

    my @elements = gather for @piles.kv -> $row, @pile {
        for @pile.kv -> $column, $is_filled {
            if $is_filled {
                take 'circle' => [
                    :cx($column + 0.5),
                    :cy($row + 0.5),
                    :r(0.4)
                ];
            }
        }
    }
    
    say SVG.serialize('svg' => [ :$width, :$height, @elements ]);
}

I think you can follow the logic in there. The subroutine simply iterates over the bitmap, turning 1s into circles with appropriate coordinates.

That’s it?

Well, this will indeed generate an SVG image for us, with the stones correctly placed. But let’s look again at the input that helped create this image:

    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

Clearly, though we can discern the stones and gaps in there if we squint in a bit-aware programmer’s fashion, the input isn’t… visually attractive. (The zeroes even look like stones, even though they’re gaps!)

We can do better

Instead of using a bit array, let’s start from the desired SVG image and try to make the input look like that.

So, this is what I would prefer to write instead of a bitmask:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

That’s better. That looks more like my original ASCII diagram, while still being syntactic Perl 6 code.

Making a DSL

Wikipedia talks about a DSL as a language “dedicated to a particular problem domain”. Well, the above way of specifying the input would be a DSL dedicated to solving the draw-SVG-images-of-Nim-positions domain. (Admittedly a fairly narrow domain. But I’m mostly out to show the potential of DSLs in Perl 6, not to change the world with this particular DSL.)

Now that we have the desired end state, how do we connect the wires and make the above work? Clearly we need to declare three subroutines: nim, _, o. (Yes, you can name a subroutine _, no sweat.)

sub nim(&block) {
    my @*piles;
    my @*current-pile;

    &block();
    finish-last-pile();
    
    nim-svg(@*piles);
}

sub _(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 0, @rest;
    return @*current-pile;
}

sub o(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 1, @rest;
    return @*current-pile;
}

Ok… explain, please?

A couple of things are going on here.

  • The two variables @*piles and @*current-pile are dynamic variables which means that they are visible not just in the current lexical scope, but also in all subroutines called before the current scope has finished. Notably, the two subroutines _ and o.
  • The two subroutines _ and o take an optional parameter. On each row, the rightmost _ or o acts as a silent “start of pile” marker, taking the time to do a bit of bookkeeping with the piles, storing away the last pile and starting on a new one.
  • Each row in the DSL-y input basically forms a chain of subroutine calls. We take this into account by both incrementally building the @*current-pile array at each step, all the while returning it as (possible) input for the next subroutine call in the chain.

And that’s it. Oh yeah, we need the bookkeeping routine finish-last-pile, too:

sub finish-last-pile() {
    if @*current-pile {
        push @*piles, [@*current-pile];
    }
    @*current-pile = ();
}

So, it works?

Now, the whole thing works. We can turn this DSL-y input:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

…into this SVG output:

<svg
  xmlns="http://www.w3.org/2000/svg"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="9" height="3">

  <circle cx="8.5" cy="0.5" r="0.4" />
  <circle cx="0.5" cy="1.5" r="0.4" />
  <circle cx="1.5" cy="1.5" r="0.4" />
  <circle cx="2.5" cy="1.5" r="0.4" />
  <circle cx="3.5" cy="1.5" r="0.4" />
  <circle cx="5.5" cy="1.5" r="0.4" />
  <circle cx="6.5" cy="1.5" r="0.4" />
  <circle cx="8.5" cy="1.5" r="0.4" />
  <circle cx="0.5" cy="2.5" r="0.4" />
  <circle cx="1.5" cy="2.5" r="0.4" />
  <circle cx="2.5" cy="2.5" r="0.4" />
  <circle cx="3.5" cy="2.5" r="0.4" />
  <circle cx="8.5" cy="2.5" r="0.4" />
</svg>

Yay!

Summary

The principles I used in this post are fairly easy to generalize. Start from your desired DSL, and create the subroutines to make it happen. Have dynamic variables handle the communication between separate subroutines.

DSLs are nice because they allow us to shape the code we’re writing around the problem we’re solving. Using relatively little “adapter code”, we’re left to focus on describing and solving problems in a natural way, making the programming language rise to our needs instead of lowering ourselves down to its needs.

Day 16 – Operator precedence

December 16, 2012

All the precedence men

As I was taking a walk today, I realized one of the reasons why I like Perl. Five as well as six. I often hear praise such as “Perl fits the way I think”. And I have that feeling too sometimes.

If I were the president (or prime minister, as I’m Swedish), and had a bunch of advisers, maybe some of them would be yes-men, trying to give me advice that they think I will want to hear, instead of advice that would be useful to me. Some languages are like that, presenting us with an incomplete subset of the necessary tools. The Perl languages, if they were advisers, wouldn’t be yes-men. They’d give me an accurate view of the world, even if that view would be a bit messy and hairy sometimes.

Which, I guess, is why Perl five and six are so often used in handling messy data and turning it into something useful.

To give a few specific examples:

  • Perl 5 takes quotes and quoting very seriously. Not just strings but lists of strings, too. (See the qw keyword.) Perl 6 does the same, but takes quoting further. See see the recent post on quoting.
  • jnthn shows in yesterday’s advent post that Perl 6 takes compiler phases seriously, and allows us to bundle together code that belongs together conceptually but not temporally. We need to do this because the world is gnarly and running a program happens in phases.
  • Grammars in Perl 6 are not just powerful, but in some sense honest, too. They don’t oversimplify the task for the programmer, because then they would also limit the expressibility. Even though grammars are complicated and intricate, they should be, because they describe a process (parsing) that is complicated and intricate.

Operators

Perl is known for its many operators. Some would describe it as an “operator-oriented” language. Where many other language will try to guess how you want your operators to behave on your values, or perhaps demand that you pre-declare all your types so that there’ll be no doubt, Perl 6 carries much of the typing information in its operators:

my $a = 5;
my $b = 6;

say $a + $b;      # 11 (numeric addition)
say $a * $b;      # 30 (numeric multiplication)

say $a ~ $b;      # "56" (string concatenation)
say $a x $b;      # "555555" (string repetition)

say $a || $b;     # 5 (boolean disjunction)
say $a && $b;     # 6 (boolean conjunction)

Other languages will want to bunch together some of these for us, using the + operator for both numeric addition and string concatenation, for example. Not so Perl. You’re meant to choose yourself, because the choice matters. In return, Perl will care a little less about the types of the operands, and just deliver the appropriate result for you.

“The appropriate result” is most often a number if you used a numeric operator, and a string if you used a string operator. But sometimes it’s more subtle than that. Note that the boolean operators above actually preserved the numbers 5 and 6 for us, even though internally it treated them both as true values. In C, if we do the same, C will unhelpfully “flatten” these results down to the value 1, its spelling of the value true. Perl knows that truthiness comes in many flavors, and retains the particular flavor for you.

Operator precedence

“All operators are equal, but some operators are more equal than others.” It is when we combine operators that we realize that the operators have different “tightness”.

say 2 * 3 + 1;      # 7, because (2 * 3) + 1
say 1 + 2 * 3;      # 7, because 1 + (2 * 3), not 9

We can always be 100% explicit and surround enough of our operations with parentheses… but when we don’t, the operators seem to order themselves in some order, which is not just simple left-to-right evaluation. This ordering between operators is what we refer to as “precedence”.

No doubt you were taught in math class in school that multiplications should be evaluated before additions in the way we see above. It’s as if factors group together closer than terms do. The fact that this difference in precedence is useful is backed up by centuries of algebra notation. Most programming languages, Perl 6 included, incorporates this into the language.

By the way, this difference in precedence is found between other pairs of operators, even outside the realm of mathematics:

      Additive (loose)    Multiplicative (tight)
      ================    ======================
number      +                       *
string      ~                       x
bool        ||                      &&

It turns out that they make as much sense for other types as they do for numbers. And group theory bears this out: these other operators can be seen as a kind of addition and multiplication, if we squint.

Operator precedence parser

Deep in the bowels of the Perl 6 parser sits a smaller parser which is very good at parsing expressions. The bigger parser which parses your Perl 6 program is a really good recursive-descent parser. It works great for creating syntax trees out of the larger program structure. It works less well on the level of expressions. Essentially, what trips up a recursive-descent parser is that it always has to create AST nodes for all the possible precedence levels, whether they’re present or not.

So this smaller parser is an operator-table parser. It knows what to do with each type of operator (prefix, infix, postfix…), and kind of weaves all the terms and operators into a syntax tree. Only the precedence levels actually used show up in the tree.

The optable parser works by comparing each new operator to the top operator on a stack of operators. So when it sees an expression like this:

$x ** 2 + 3 * $x - 5

it will first compare ** against + and decide that the former is tighter, and thus $x ** 2 should be put together into a small tree. Later, it compares + against *, and decides to turn 3 * $x into a small tree. It goes on like this, eventually ending up with this tree structure:

infix:<->
 +-- infix:<+>
      +-- infix:<**>
      |    +-- term:<$x>
      |    +-- term:<2>
      +-- infix:<*>
           +-- term:<3>
           +-- term:<$x>

Because leaf nodes are evaluated first and the root node last, this tree structure determines the order of evaluation for the expression. The order ends up being the same as if the expression had these parentheses:

(($x ** 2) + (3 * $x)) - 5

Which, again, is what we’ve learned to expect.

Associativity

Another factor also governs how these invisible parentheses are to be distributed: operator associativity. It’s the concern of how the operator combines with multiple copies of itself, or other sufficiently similar operators on the same precedence level.

Some examples serve to explain the difference:

$x = $y = $z;     # becomes $x = ($y = $z)
$x / $y / $z;     # becomes ($x / $y) / $z

In both of these cases, we may look at the way the parentheses are doled out, and say “well, of course”. Of course we must first assign to $y and only then to $x. And of course we first divide by $y and only then by $z. So operators naturally have different associativity.

The optable parser compares not just the precedence of two operators but also, when needed, their associativity. And it puts the parentheses in the right place, just as above.

User-defined operators

Now we come back to Perl not being a yes-man, and working hard to give you the appropriate tools for the job.

Perl 6 allows you to define operators. See my post from last year on the details of how. But it also allows you to specify precedence and associativity of each new operator.

As you specify a new operator, a new Perl 6 parser is automatically constructed for you behind the scenes, which contains your new operator. In this sense, the optable parser is open and extensible. And Perl 6 gives you exactly the same tools for talking about precedence and associativity as the compiler itself uses internally.

Perl treats you like a grown-up, and expects you to make good decisions based on a thorough understanding of the problem space. I like that.

Day 9 – Longest Token Matching

December 9, 2012

Perl 6 regular expressions prefer to match the longest alternative when possible.

say "food and drink" ~~ / foo | food /;   # food

This is in contrast to Perl 5, which would prefer the first alternative above, and produce the match “foo”.

You can still get the first-alternative behavior if you want; it’s tucked away in the slightly longer alternation operator ||:

say "food and drink" ~~ / foo || food /;  # foo

…And that’s it! That’s Longest Token Matching. ☺ Short post.

“Huh, wait!” I hear you exclaim, in a desperate attempt to make the daily Perl 6 Advent goodness last a bit longer. “Why is Longest Token Matching such a big deal? Who would ever be so obsessed with long tokens?”

I’m glad you asked. As it turns out, Longest Token Matching (or LTM for short) plays very well with our intuition about how things should be parsed. If you’re creating a language, you want people to be able to declare a variable forest_density without the mention of this variable clashing with the syntax of for loops. LTM will see to that.

I like “strange consistencies” — when distal parts of a language design turn out to have commonalities that make the language feel more uniform. There is that kind of consistency here, between classes and grammars. Perl 6 basically exploits that consistency to the max. Let me briefly map out what I mean.

We’re all used to writing classes at this point. From a birds-eye view, they look like this:

class {
    method
    method
    method
}

Grammars have a suspiciously similar structure:

grammar {
    rule
    rule
    rule
}

(The keywords are actually regex, token and rule, but when we talk about them as a group, we just call them “rules”.)

We’re also used to being able to derive classes into subclasses (class B is A), and add or override methods in a way which produces a nice mix of old and new behavior. Perl 6 provides multi methods which even allow you to add new methods of the same name, and the old ones won’t be overridden, they’ll just all try to match alongside the new methods. The dispatch is handled by a (usually autogenerated) proto method that dispatches to all eligible candidates.

What does all this have to do with grammars and rules? Well, it turns out that first off, you can derive new grammars from old ones. It works the same as deriving classes. (In fact, under the hood it’s exactly the same mechanism. Grammars are classes with a different metaclass object.) New rules will override old rules just like you’d expect with methods.

S05 has a cute example with parsing of letters, and deriving the grammar to parse formal letters:

    grammar Letter {
         rule text     { <greet> $<body>=<line>+? <close> }
         rule greet    { [Hi|Hey|Yo] $<to>=\S+? ',' }
         rule close    { Later dude ',' $<from>=.+ }
         token line    { \N* \n}
     }

     grammar FormalLetter is Letter {
         rule greet { Dear $<to>=\S+? ',' }
         rule close { Yours sincerely ',' $<from>=.+ }
     }

The derived FormalLetter overrides greet and close, but not line.

But what about all the goodness with multi methods? Could we define some kind of “proto rule” that would allow us to have several rules in a grammar with the same name but different bodies? For example, we might want to parse a language with a rule term, but there are many different terms: strings, numbers… and maybe the numbers can be decimal or binary or octal or hexadecimal…

Perl 6 grammars can contain a proto rule, and then you can define and redefine a rule with the same name as many times as you want. And now we’re back full circle with the / foo | food / alternation from the start of the article. All those rules you write with the same name compile down to one big alternation. Not only that — rules which call other rules, some of them possibly proto rules, all of that will be “flattened” out into one big LTM alternation. In practice that means that all the possible things a term can be are tried out all at once, on equal footing. Neither alternative wins because you happened to define it before the others. An alternative wins because it is the longest.

The strange consistency resides in the fact that in the call-a-method side of things, the most specific method wins, and “most specific” has to with signature narrowness. The better the types in the signature describe the arguments coming in, the more specific the method.

In the parse-with-a-rule side of things, the most specific rule wins, but here “most specific” has to do with parse success. The better the rule can describe what comes next in the text, the more specific the rule.

And that’s strangely consistent, because on the surface methods and rules look like quite different beasts.

We really believe we have something going with this whole principle of deriving a grammar and getting a new language. LTM is right at the center of that because it allows new rules and old to intermix in a fair and predictable way. It’s a kind of meritocracy: rules win not based on whether they’re young or old, but based on whether they are able to parse the text well.

In fact, the Perl 6 compiler itself works this way. It parses your program using a Perl 6 grammar, and that grammar is derivable… whenever you declare a new operator in your program, a new grammar is derived for you. The parsing of your operator is added as a new rule in the new grammar, and the new grammar is given the task of parsing the rest of your program. Your new operator will win against similar but shorter ones, and lose against similar but longer ones.

Day 2 – Anonymous functions for great good

December 2, 2012

Perl 6 has great support for functions. It packs function signatures full with awesome, and lets you have your cake and eat it a couple of times over with all the ways you can specify a function. You can specify parameter types, optional parameters, named parameters, and even those cool where clauses. If I didn’t know better, I’d suspect Perl 6 was compensating for some predecessor’s rather rudimentary handling of parameters. (*cough* @_ *cough*)

Among all these other things, Perl 6 also allows you to define functions without naming them.

sub { say "lol, I'm so anonymous!" }

How is this useful? If you can’t name the function, you can’t call it, right? Wrong.

You can store the function in a variable. Or return it from another function. Or pass it to another function. In fact, when you don’t name your function, the focus becomes much more what code you’re going to run later. Like an executable “to do”-list.

Of course, Perl 5 has anonymous functions, too. With exactly the same syntax, even. In fact, all the big languages do anonymous functions, according to this list of languages on Wikipedia. Except, it seems, the historically significant languages C and Pascal. And the more modern but lumbering Java. “Planned for Java 8″. Haha, Java, catch up! Even C++ has them now.

How important are anonymous functions? Very. In the 1930s, Alan Turing showed how all computer processes could be simulated using just a pre-programmed machine that looks like a tape recorder, reading and writing values on a really long tape. (The Turing Machine.) Meanwhile, across the Atlantic, Alonzo Church showed how all computer processes could be simulated using just anonymous functions, no tape recorder required. (Lambda calculus.) It’s all quite elegant.

Later languages like Lisp and Scheme lean heavily on anonymous functions as a key component in the language. And lately a scrappy language called JavaScript, which also leans heavily on anonymous functions, has taken over the world while we were all busy surfing the web.

But let’s talk possibilities here. What can anonymous functions do for us? And how would it look in Perl 6?

Well, take sorting as a famous example. You could imagine Perl 6 having a sort_lexicographically function and a sort_numerically function. But it doesn’t. It has a sort function. When you want it to sort in a certain way, you just pass an anonymous function to it.

my @sorted_words = @words.sort({ ~$_ });
my @sorted_numbers = @numbers.sort({ +$_ });

(Technically, those are blocks, not functions. But the difference isn’t significant if you’re not planning to return anywhere inside.)

And of course it goes further than just those two sort orders. You could sort by shoe size, or maximum ground speed, or decreasing likelihood of spontaneous combustion. All because you can pass in any logic as an argument. Object-oriented people are very proud of this pattern, and call it “dependency injection”.

Come to think of it, map and grep and reduce all depend on this kind of function-passing. We sometimes refer to passing functions to functions as “higher order programming”, as if it was only something people with special privileges should be doing. But in fact it’s a very useful and broadly applicable technique.

The above examples all run the anonymous functions as part of their own execution. But there’s no need to restrict ourselves to this. We can create functions, return them, and then run them later:

sub make_surprise_for($name) {
    return sub { say "Sur-priiise, $name!" };
}

my $reveal_surprise = make_surprise_for("Finn");    # nothing happens, yet
# ...wait for it...
# ...wait...
# ...waaaaaaait...
$reveal_surprise();        # "Sur-priiise, Finn!"

The function in $reveal_surprise remembers the value of $name even though the original function passing it in has exited long ago. That’s pretty nice. This effect is referred to as the anonymous function closing over the variable $name. But there’s no need to get technical — the long and short of it is “it’s awesome”.

And in fact, it feels quite natural if we just look at anonymous functions alongside other staple storage mechanisms such as arrays and hashes. All of these can be stored in variables, passed as arguments or returned from functions. An anonymous array allows you to store a sequence of things for later. An anonymous hash allows you to store mappings/translations of things for later. An anonymous function allows you to store calculations or behavior for later.

Later this month, I’ll go through how to exploit dynamic scoping in Perl 6 to create nice DSL-y interfaces. We’ll see how anonymous functions come into play there as well.


Follow

Get every new post delivered to your Inbox.

Join 37 other followers