Author Archive

Day 23 – Macros

December 23, 2012

Syntactic macros. The Lisp gods of yore provided humanity with this invention, essentially making Lisp a programmable programming language. Lisp adherents often look at the rest of the programming world with pity, seeing them fighting to invent wheels that were wrought and polished back in the sixties when giants walked the Earth and people wrote code in all-caps.

And the Lisp adherents see that the rest of us haven’t even gotten to the best part yet, the part with syntactic macros. We’re starting to get the hang of automatic memory management, continuations, and useful first-class functions. But macros are still absent from this picture.

In part, this is because in order to have proper syntactic macros, you basically have to look like Lisp. You know, with the parentheses and all. Lisp ends up having almost no syntax at all, making every program a very close representation of a syntax tree. Which really helps when you have macros starting to manipulate those same trees. Other languages, not really wanting to look like Lisp, find it difficult-to-impossible to pull off the same trick.

The Perl languages love the difficult-to-impossible. Perl programmers publish half a dozen difficult-to-impossible solutions to CPAN before breakfast. And, because Perl 6 is awesome and syntactic macros are awesome, Perl 6 has syntactic macros.

It is known, Khaleesi.

What are macros?

For reasons even I don’t fully understand, I’ve put myself in charge of implementing syntactic macros in Rakudo. Implementing macros means understanding them. Understanding them means my brain melts regularly. Unless it fries. It’s about 50-50.

I have this habit where I come into the #perl6 channel, and exclaiming “macros are just X!” for various values of X. Here are some samples:

  • Macros are just syntax tree manipulators.
  • Macros are just “little compilers”.
  • Macros are just a kind of templates.
  • Macros are just routines that do code substitution.
  • Macros allow you to safely hand values back and forth between the compile-time world and the runtime world.

But the definition that I finally found that I like best of all comes from scalamacros.org:

Macros are functions that are called by the compiler during compilation. Within these functions the programmer has access to compiler APIs. For example, it is possible to generate, analyze and typecheck code.

While we only cover the “generate” part of it yet in Perl 6, there’s every expectation we’ll be getting to the “analyze and typecheck” parts as well.

Some examples, please?

Coming right up.

macro checkpoint {
  my $i = ++(state $n);
  quasi { say "CHECKPOINT $i"; }
}

checkpoint;
for ^5 { checkpoint; }
checkpoint;

The quasi block is Perl 6′s way of saying “a piece of code, coming right up!”. You just put your code in the quasi block, and return it from the macro routine.

This code inserts “checkpoints” in our code, like little debugging messages. There’s only three checkpoints in the code, so the output we’ll get looks like this:

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 2
CHECKPOINT 3

Note that the “code insertion” happens at compile time. That’s why we get five copies of the CHECKPOINT 2 line, because it’s the same checkpoint running five times. If we had had a subroutine instead:

sub checkpoint {
  my $i = ++(state $n);
  say "CHECKPOINT $i";
}

Then the program would print 7 distinct checkpoints.

CHECKPOINT 1
CHECKPOINT 2
CHECKPOINT 3
CHECKPOINT 4
CHECKPOINT 5
CHECKPOINT 6
CHECKPOINT 7

As a more practical example, let’s say you have logging output in your program, but you want to be able to switch it off completely. The problem with an ordinary logging subroutine is that with something like:

LOG "The answer is { time-consuming-computation() }";

The time-consuming-computation() will run and take a lot of time even if LOG subsequently finds that logging was turned off. (That’s just how argument evaluation works in a non-lazy language.)

A macro fixes this:

constant LOGGING = True;

macro LOG($message) {
  if LOGGING {
    quasi { say {{{$message}}} };
  }
}

Here we see a new feature: the {{{ }}} triple-block. (Syntax is likely to change in the near future, see below.) It’s our way to mix template code in the quasi block with code coming in from other places. Doing say $message; would have been wrong, because $message is a syntax tree of the message to be logged. We need to inject that syntax tree right into the quasi, and we do that with a triple-block.

The macro conditionally generates logging code in your program. If the constant LOGGING is switched on, the appropriate logging code will replace each LOG macro invocation. If LOGGING is off, each macro invocation will be replaced by literally nothing.

Experience shows that running no code at all is very efficient.

What are syntactic macros?

A lot of things are called “macros” in this world. In programming languages, there are two big categories:

  • Textual macros. They substitute code on the level of the source code text. C’s macros, or Perl 5′s source filters, are examples.
  • Syntactic macros. They substitute code on the level of the source code syntax tree. Lisp macros are an example.

Textual macros are very powerful, but they represent the kind of power that is just as likely to shoot half your leg off as it is to get you to your destination. Using them requires great care, of the same kind needed for a janitor gig at Jurassic Park.

The problem is that textual macros don’t compose all that well. Bring in more than one of them to work on the same bit of source code, and… all bets are off. This puts severe limits on modularity. Textual macros, being what they are, leak internal details all over the place. This is the big lesson from Perl 5′s source filters, as far as I understand.

Syntactic macros compose wonderfully. The compiler is already a pipeline handing off syntax trees between various processing steps, and syntactic macros are simply more such steps. It’s as if you and the compiler were two children, with the compiler going “Hey, you want to play in my sandbox? Jump right in. Here’s a shovel. We’ve got work to do.” A macro is a shovel.

And syntactic macros allow us to be hygienic, meaning that code in the macro and code outside of the macro don’t step on each other’s toes. In practice, this is done by carefully keeping track of the macros context and the mainline’s context, and making sure wires don’t cross. This is necessary for safe and large-scale composition. Textual macros don’t give us this option at all.

Future

Both of the examples in this post work already in Rakudo. But it might also be useful to know where we’re heading with macros in the next year or so. The list is in the approximate order I expect to tackle things.

  • Un-hygiene. While hygienic macros are the sane and preferable default, sometimes you want to step on the toes of the mainline code. There should be an opt-out, and escape hatch. This is next up.
  • Introspection. In order to analyze and typecheck code, not just generate it, we need to be able to take syntax trees coming in as macro arguments, and look inside of them. There are no tools for that yet, and there’s no spec to guide us here. But I’m fairly sure people will want this. The trick is to come up with something that doesn’t tie us down to one compiler’s internal syntax-tree format. Both for the sake of compiler interoperability and future compatibility.
  • Deferred declarations. The sandbox analogy isn’t so frivolous, really. If you declare a class inside a quasi block, that declaration is limited (“sandboxed”) to within that quasi block. Then, when the code is injected somewhere in the mainline because of a macro invocation, it should actually run. Fortunately, as it happens, the Rakudo internals are factored in such a way that this will be fairly straightforward to implement.
  • Better syntax. The triple-block syntax is probably going away in favor of something better. The problem isn’t the syntax so much as the fact that it currently only works for terms. We want it to work for basically all syntactic categories. A solid proposal for this is yet to materialize, though.

With each of these steps, I expect us to find new and fruitful uses of macros. Knowing my fellow Perl 6 developers, we’ll probably find some uses that will shock and disgust us all, too.

In conclusion

Perl 6 is awesome because it puts you, the programmer, in the driver seat. Macros are simply more of that.

Implementing macros makes your brain melt. However, using them is relatively straightforward.

Day 22 – Parsing an IPv4 address

December 22, 2012

Guest post by Herbert Breunung (lichtkind).

Perl 5 brought regexes to mainstream programming and set a standard, one that is felt as relevant even in Redmond. Perl 6, of course, steps up the game by adding many new features to the regex camp, including easy-to-build grammars for your own complex parsers. But without getting too complex, you can get a lot of joy out of Perl 6′s rx (that’s how Perl 6 spells Perl 5′s qr operator, that enables you to save a Regex in a variable).

Because the Perl 6 regex syntax is less littered with exceptional cases, Larry Wall also likes to joke that he put the “regular” back into “regular expression”.

Some of the changes are:

  • most special variables are gone,
  • non-capturing groups and other grouping syntax is easier to type,
  • no more single/multi line modes,
  • x mode became default, making whitespace non-significant by default.

In summary, regexes are more regular than in Perl 5, confirming Larry’s joke. They try a bit harder to make your life easier when you need to match text. Under the hood, regexes have blossomed out into a complete sub-language within the bigger Perl 6 language. A language with its own parsing rules.

But don’t fret; not everything has changed. Some things remain the same:

/\d+/

This regex still matches one or more consecutive digits.

Similarly, if you want to capture the digits, you can do this, just like you’re used to:

/(\d+)/

You’ll find the matched digits in $0, not $1 as in Perl 5. All the special variables $0, $1, $2 are really syntactic sugar for indexing the match variable ($/[0], $/[1], $/[2]). Because indices start at 0, it makes sense for the first matched group to be $0. In Perl 5, $0 contains the name of the script or program, but this has been renamed into $*EXECUTABLE_NAME in Perl 6.

Should you be interested in getting all of the captured groups of a regex match, you can use @(), which is syntactic sugar for @($/).

The object in the $/ variable holds lots of useful information about the last match. For example, $/.from will give you the starting string position of the match.

But $0 will get us far enough for this post. We use it to extract individual features from a string.

Sometimes we want to extract a whole bunch of similar things at once. Then we can use the :g (or :global) modifier on the regex:

$_ = '1 23 456 78.9';
say .Str for m:g/(\d+)/; # 1 23 456 78 9

Note that the :g — as opposed to prior regex implementations — sits up front, right at the start of the regex. Not at the end. That way, when you read the regex from left to right, you will know from the start how the regex is doing its matching. No more end-heavy regex expressions.

Matching “all things that look like this” is so useful, that there’s even a dedicated method for that, .comb:

$str.comb(/\d+/);

If you’re familiar with .split, you can think of .comb as its cheerful cousin, matching all the things that .split discards.

Let’s tackle the matching of an IPv4 address. Coming from a Perl 5 angle, we expect to have to do something like this:

/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})/

This won’t do in Perl 6, though. First of all, the {} blocks are real blocks in a Perl 6 regex; they contain Perl 6 code. Second, because Perl 6 has lots of error handling to catch p5isms, like this, you’ll get an error saying “Unsupported use of {N,M} as general quantifier; in Perl 6 please use ** N..M (or ** N..*)”.

So let’s do that. To match between one and three digits in a Perl 6 regex, we should type:

/\d ** 1..3/

Note how the regex sublanguage re-uses parts from the main Perl 6 language. ** can be seen as a kind of exponentiation (if we squint), in that we’re taking \d “to the between-first-and-third power”. And the range notation 1..3 exists both outside and within regexes.

Using our new knowledge about the repetition quantifier, we end up with something like this:

/(\d**1..3) \. (\d**1..3) \. (\d**1..3) \. (\d**1..3)/

That’s still kinda clunky. We might end up wishing that we could use the repetition operator again, but those literal dots in between prevent us from doing that. If only we could specify repetition a given number of times and a divider.

In Perl 6 regexes, you can.

/ (\d ** 1..3) ** 4 % '.' /

The % operator here is a quantifier modifier, so it can only follow on a quantifier like * or + or **. The choice of % for this function is relatively new in Perl 6, and you may prefer to read it as “modulo”, just like in the main language. That is, “match four groups of digits, modulo literal dots in between”. Or you could think of the dots in between as the “remainder”, the separators that are left after you’ve parsed the actual elements.

Oh, and you might’ve noticed that \. changed to '.' on the way. We can use either; they mean exactly the same. In Perl 5, there isn’t a simple rule saying which symbols have a magic meaning and which ones simply signify themselves. In Perl 6, it’s easy: word characters (alphanumerics and the underscore) always signify themselves. Everything else has to be escaped or quoted to get its literal meaning.

Putting it all together, here’s how we would extract IPv4 addresses out of a string:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";

say .Str for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: 127.0.0.1 173.194.32.32

Or, we could use .comb:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
my @ip4addrs = .comb(/ (\d ** 1..3) ** 4 % '.' /);

If we’re interested in individual integers, we can get those too:

$_ = "Go 127.0.0.1, I said! He went to 173.194.32.32.";
say .list>>.Str.perl for m:g/ (\d ** 1..3) ** 4 % '.' /;
# output: ("127", "0", "0", "1") ("173", "194", "32", "32")

If you want to know more, read the S05, or watch me battling with my slide deck and the English language in this presentation about regexes.

Day 20 – Dynamic variables and DSL-y things

December 20, 2012

Today, let’s talk about DSLs.

Post from the past: a motivating example

Two years ago I wrote a blog post about Nim, a game played with piles of stones. I just put in ASCII diagrams of the actual Nim stone piles, telling myself that if I had time, I would put in fancy SVG diagrams, generated with Perl 6.

Naturally, I didn’t have time. My self-imposed deadline ran out, and I published the post with simple ASCII diagrams.

But time is ever-regenerative, and there for people who want it. So, let’s generate some fancy SVG diagrams with Perl 6.

Have bit array, want SVG

What do we need, exactly? Well, a subroutine that takes an array of piles as input and generates an SVG file would be a really good start.

Let’s take the last “image” in the post as an example:

3      OO O
4 OOOO
5 OOOO    O

For the moment, let’s ignore the numbers at the left margin; they’re just counting stones. We summarize the piles themselves as a kind of bitmap, which also forms the input to the function:

my @piles =
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

nim-svg(@piles);

At this point, we need only create the nim-svg function itself, and make it render SVG from this bitmap. Since I’ve long since tired of outputting SVG by hand, I use the SVG module, which comes bundled with Rakudo Star.

use SVG;

sub nim-svg(@piles) {
    my $width = max map *.elems, @piles;
    my $height = @piles.elems;

    my @elements = gather for @piles.kv -> $row, @pile {
        for @pile.kv -> $column, $is_filled {
            if $is_filled {
                take 'circle' => [
                    :cx($column + 0.5),
                    :cy($row + 0.5),
                    :r(0.4)
                ];
            }
        }
    }
    
    say SVG.serialize('svg' => [ :$width, :$height, @elements ]);
}

I think you can follow the logic in there. The subroutine simply iterates over the bitmap, turning 1s into circles with appropriate coordinates.

That’s it?

Well, this will indeed generate an SVG image for us, with the stones correctly placed. But let’s look again at the input that helped create this image:

    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

Clearly, though we can discern the stones and gaps in there if we squint in a bit-aware programmer’s fashion, the input isn’t… visually attractive. (The zeroes even look like stones, even though they’re gaps!)

We can do better

Instead of using a bit array, let’s start from the desired SVG image and try to make the input look like that.

So, this is what I would prefer to write instead of a bitmask:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

That’s better. That looks more like my original ASCII diagram, while still being syntactic Perl 6 code.

Making a DSL

Wikipedia talks about a DSL as a language “dedicated to a particular problem domain”. Well, the above way of specifying the input would be a DSL dedicated to solving the draw-SVG-images-of-Nim-positions domain. (Admittedly a fairly narrow domain. But I’m mostly out to show the potential of DSLs in Perl 6, not to change the world with this particular DSL.)

Now that we have the desired end state, how do we connect the wires and make the above work? Clearly we need to declare three subroutines: nim, _, o. (Yes, you can name a subroutine _, no sweat.)

sub nim(&block) {
    my @*piles;
    my @*current-pile;

    &block();
    finish-last-pile();
    
    nim-svg(@*piles);
}

sub _(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 0, @rest;
    return @*current-pile;
}

sub o(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 1, @rest;
    return @*current-pile;
}

Ok… explain, please?

A couple of things are going on here.

  • The two variables @*piles and @*current-pile are dynamic variables which means that they are visible not just in the current lexical scope, but also in all subroutines called before the current scope has finished. Notably, the two subroutines _ and o.
  • The two subroutines _ and o take an optional parameter. On each row, the rightmost _ or o acts as a silent “start of pile” marker, taking the time to do a bit of bookkeeping with the piles, storing away the last pile and starting on a new one.
  • Each row in the DSL-y input basically forms a chain of subroutine calls. We take this into account by both incrementally building the @*current-pile array at each step, all the while returning it as (possible) input for the next subroutine call in the chain.

And that’s it. Oh yeah, we need the bookkeeping routine finish-last-pile, too:

sub finish-last-pile() {
    if @*current-pile {
        push @*piles, [@*current-pile];
    }
    @*current-pile = ();
}

So, it works?

Now, the whole thing works. We can turn this DSL-y input:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

…into this SVG output:

<svg
  xmlns="http://www.w3.org/2000/svg"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="9" height="3">

  <circle cx="8.5" cy="0.5" r="0.4" />
  <circle cx="0.5" cy="1.5" r="0.4" />
  <circle cx="1.5" cy="1.5" r="0.4" />
  <circle cx="2.5" cy="1.5" r="0.4" />
  <circle cx="3.5" cy="1.5" r="0.4" />
  <circle cx="5.5" cy="1.5" r="0.4" />
  <circle cx="6.5" cy="1.5" r="0.4" />
  <circle cx="8.5" cy="1.5" r="0.4" />
  <circle cx="0.5" cy="2.5" r="0.4" />
  <circle cx="1.5" cy="2.5" r="0.4" />
  <circle cx="2.5" cy="2.5" r="0.4" />
  <circle cx="3.5" cy="2.5" r="0.4" />
  <circle cx="8.5" cy="2.5" r="0.4" />
</svg>

Yay!

Summary

The principles I used in this post are fairly easy to generalize. Start from your desired DSL, and create the subroutines to make it happen. Have dynamic variables handle the communication between separate subroutines.

DSLs are nice because they allow us to shape the code we’re writing around the problem we’re solving. Using relatively little “adapter code”, we’re left to focus on describing and solving problems in a natural way, making the programming language rise to our needs instead of lowering ourselves down to its needs.

Day 16 – Operator precedence

December 16, 2012

All the precedence men

As I was taking a walk today, I realized one of the reasons why I like Perl. Five as well as six. I often hear praise such as “Perl fits the way I think”. And I have that feeling too sometimes.

If I were the president (or prime minister, as I’m Swedish), and had a bunch of advisers, maybe some of them would be yes-men, trying to give me advice that they think I will want to hear, instead of advice that would be useful to me. Some languages are like that, presenting us with an incomplete subset of the necessary tools. The Perl languages, if they were advisers, wouldn’t be yes-men. They’d give me an accurate view of the world, even if that view would be a bit messy and hairy sometimes.

Which, I guess, is why Perl five and six are so often used in handling messy data and turning it into something useful.

To give a few specific examples:

  • Perl 5 takes quotes and quoting very seriously. Not just strings but lists of strings, too. (See the qw keyword.) Perl 6 does the same, but takes quoting further. See see the recent post on quoting.
  • jnthn shows in yesterday’s advent post that Perl 6 takes compiler phases seriously, and allows us to bundle together code that belongs together conceptually but not temporally. We need to do this because the world is gnarly and running a program happens in phases.
  • Grammars in Perl 6 are not just powerful, but in some sense honest, too. They don’t oversimplify the task for the programmer, because then they would also limit the expressibility. Even though grammars are complicated and intricate, they should be, because they describe a process (parsing) that is complicated and intricate.

Operators

Perl is known for its many operators. Some would describe it as an “operator-oriented” language. Where many other language will try to guess how you want your operators to behave on your values, or perhaps demand that you pre-declare all your types so that there’ll be no doubt, Perl 6 carries much of the typing information in its operators:

my $a = 5;
my $b = 6;

say $a + $b;      # 11 (numeric addition)
say $a * $b;      # 30 (numeric multiplication)

say $a ~ $b;      # "56" (string concatenation)
say $a x $b;      # "555555" (string repetition)

say $a || $b;     # 5 (boolean disjunction)
say $a && $b;     # 6 (boolean conjunction)

Other languages will want to bunch together some of these for us, using the + operator for both numeric addition and string concatenation, for example. Not so Perl. You’re meant to choose yourself, because the choice matters. In return, Perl will care a little less about the types of the operands, and just deliver the appropriate result for you.

“The appropriate result” is most often a number if you used a numeric operator, and a string if you used a string operator. But sometimes it’s more subtle than that. Note that the boolean operators above actually preserved the numbers 5 and 6 for us, even though internally it treated them both as true values. In C, if we do the same, C will unhelpfully “flatten” these results down to the value 1, its spelling of the value true. Perl knows that truthiness comes in many flavors, and retains the particular flavor for you.

Operator precedence

“All operators are equal, but some operators are more equal than others.” It is when we combine operators that we realize that the operators have different “tightness”.

say 2 * 3 + 1;      # 7, because (2 * 3) + 1
say 1 + 2 * 3;      # 7, because 1 + (2 * 3), not 9

We can always be 100% explicit and surround enough of our operations with parentheses… but when we don’t, the operators seem to order themselves in some order, which is not just simple left-to-right evaluation. This ordering between operators is what we refer to as “precedence”.

No doubt you were taught in math class in school that multiplications should be evaluated before additions in the way we see above. It’s as if factors group together closer than terms do. The fact that this difference in precedence is useful is backed up by centuries of algebra notation. Most programming languages, Perl 6 included, incorporates this into the language.

By the way, this difference in precedence is found between other pairs of operators, even outside the realm of mathematics:

      Additive (loose)    Multiplicative (tight)
      ================    ======================
number      +                       *
string      ~                       x
bool        ||                      &&

It turns out that they make as much sense for other types as they do for numbers. And group theory bears this out: these other operators can be seen as a kind of addition and multiplication, if we squint.

Operator precedence parser

Deep in the bowels of the Perl 6 parser sits a smaller parser which is very good at parsing expressions. The bigger parser which parses your Perl 6 program is a really good recursive-descent parser. It works great for creating syntax trees out of the larger program structure. It works less well on the level of expressions. Essentially, what trips up a recursive-descent parser is that it always has to create AST nodes for all the possible precedence levels, whether they’re present or not.

So this smaller parser is an operator-table parser. It knows what to do with each type of operator (prefix, infix, postfix…), and kind of weaves all the terms and operators into a syntax tree. Only the precedence levels actually used show up in the tree.

The optable parser works by comparing each new operator to the top operator on a stack of operators. So when it sees an expression like this:

$x ** 2 + 3 * $x - 5

it will first compare ** against + and decide that the former is tighter, and thus $x ** 2 should be put together into a small tree. Later, it compares + against *, and decides to turn 3 * $x into a small tree. It goes on like this, eventually ending up with this tree structure:

infix:<->
 +-- infix:<+>
      +-- infix:<**>
      |    +-- term:<$x>
      |    +-- term:<2>
      +-- infix:<*>
           +-- term:<3>
           +-- term:<$x>

Because leaf nodes are evaluated first and the root node last, this tree structure determines the order of evaluation for the expression. The order ends up being the same as if the expression had these parentheses:

(($x ** 2) + (3 * $x)) - 5

Which, again, is what we’ve learned to expect.

Associativity

Another factor also governs how these invisible parentheses are to be distributed: operator associativity. It’s the concern of how the operator combines with multiple copies of itself, or other sufficiently similar operators on the same precedence level.

Some examples serve to explain the difference:

$x = $y = $z;     # becomes $x = ($y = $z)
$x / $y / $z;     # becomes ($x / $y) / $z

In both of these cases, we may look at the way the parentheses are doled out, and say “well, of course”. Of course we must first assign to $y and only then to $x. And of course we first divide by $y and only then by $z. So operators naturally have different associativity.

The optable parser compares not just the precedence of two operators but also, when needed, their associativity. And it puts the parentheses in the right place, just as above.

User-defined operators

Now we come back to Perl not being a yes-man, and working hard to give you the appropriate tools for the job.

Perl 6 allows you to define operators. See my post from last year on the details of how. But it also allows you to specify precedence and associativity of each new operator.

As you specify a new operator, a new Perl 6 parser is automatically constructed for you behind the scenes, which contains your new operator. In this sense, the optable parser is open and extensible. And Perl 6 gives you exactly the same tools for talking about precedence and associativity as the compiler itself uses internally.

Perl treats you like a grown-up, and expects you to make good decisions based on a thorough understanding of the problem space. I like that.

Day 9 – Longest Token Matching

December 9, 2012

Perl 6 regular expressions prefer to match the longest alternative when possible.

say "food and drink" ~~ / foo | food /;   # food

This is in contrast to Perl 5, which would prefer the first alternative above, and produce the match “foo”.

You can still get the first-alternative behavior if you want; it’s tucked away in the slightly longer alternation operator ||:

say "food and drink" ~~ / foo || food /;  # foo

…And that’s it! That’s Longest Token Matching. ☺ Short post.

“Huh, wait!” I hear you exclaim, in a desperate attempt to make the daily Perl 6 Advent goodness last a bit longer. “Why is Longest Token Matching such a big deal? Who would ever be so obsessed with long tokens?”

I’m glad you asked. As it turns out, Longest Token Matching (or LTM for short) plays very well with our intuition about how things should be parsed. If you’re creating a language, you want people to be able to declare a variable forest_density without the mention of this variable clashing with the syntax of for loops. LTM will see to that.

I like “strange consistencies” — when distal parts of a language design turn out to have commonalities that make the language feel more uniform. There is that kind of consistency here, between classes and grammars. Perl 6 basically exploits that consistency to the max. Let me briefly map out what I mean.

We’re all used to writing classes at this point. From a birds-eye view, they look like this:

class {
    method
    method
    method
}

Grammars have a suspiciously similar structure:

grammar {
    rule
    rule
    rule
}

(The keywords are actually regex, token and rule, but when we talk about them as a group, we just call them “rules”.)

We’re also used to being able to derive classes into subclasses (class B is A), and add or override methods in a way which produces a nice mix of old and new behavior. Perl 6 provides multi methods which even allow you to add new methods of the same name, and the old ones won’t be overridden, they’ll just all try to match alongside the new methods. The dispatch is handled by a (usually autogenerated) proto method that dispatches to all eligible candidates.

What does all this have to do with grammars and rules? Well, it turns out that first off, you can derive new grammars from old ones. It works the same as deriving classes. (In fact, under the hood it’s exactly the same mechanism. Grammars are classes with a different metaclass object.) New rules will override old rules just like you’d expect with methods.

S05 has a cute example with parsing of letters, and deriving the grammar to parse formal letters:

     grammar Letter {
         rule text     {    }
         rule greet { [Hi|Hey|Yo] $=(\S+?) , $$}
         rule body     { +? }   # note: backtracks forwards via +?
         rule close { Later dude, $=(.+) }
     }

     grammar FormalLetter is Letter {
         rule greet { Dear $=(\S+?) , $$}
         rule close { Yours sincerely, $=(.+) }
     }

The derived FormalLetter overrides greet and close, but not body.

But what about all the goodness with multi methods? Could we define some kind of “proto rule” that would allow us to have several rules in a grammar with the same name but different bodies? For example, we might want to parse a language with a rule term, but there are many different terms: strings, numbers… and maybe the numbers can be decimal or binary or octal or hexadecimal…

Perl 6 grammars can contain a proto rule, and then you can define and redefine a rule with the same name as many times as you want. And now we’re back full circle with the / foo | food / alternation from the start of the article. All those rules you write with the same name compile down to one big alternation. Not only that — rules which call other rules, some of them possibly proto rules, all of that will be “flattened” out into one big LTM alternation. In practice that means that all the possible things a term can be are tried out all at once, on equal footing. Neither alternative wins because you happened to define it before the others. An alternative wins because it is the longest.

The strange consistency resides in the fact that in the call-a-method side of things, the most specific method wins, and “most specific” has to with signature narrowness. The better the types in the signature describe the arguments coming in, the more specific the method.

In the parse-with-a-rule side of things, the most specific rule wins, but here “most specific” has to do with parse success. The better the rule can describe what comes next in the text, the more specific the rule.

And that’s strangely consistent, because on the surface methods and rules look like quite different beasts.

We really believe we have something going with this whole principle of deriving a grammar and getting a new language. LTM is right at the center of that because it allows new rules and old to intermix in a fair and predictable way. It’s a kind of meritocracy: rules win not based on whether they’re young or old, but based on whether they are able to parse the text well.

In fact, the Perl 6 compiler itself works this way. It parses your program using a Perl 6 grammar, and that grammar is derivable… whenever you declare a new operator in your program, a new grammar is derived for you. The parsing of your operator is added as a new rule in the new grammar, and the new grammar is given the task of parsing the rest of your program. Your new operator will win against similar but shorter ones, and lose against similar but longer ones.

Day 2 – Anonymous functions for great good

December 2, 2012

Perl 6 has great support for functions. It packs function signatures full with awesome, and lets you have your cake and eat it a couple of times over with all the ways you can specify a function. You can specify parameter types, optional parameters, named parameters, and even those cool where clauses. If I didn’t know better, I’d suspect Perl 6 was compensating for some predecessor’s rather rudimentary handling of parameters. (*cough* @_ *cough*)

Among all these other things, Perl 6 also allows you to define functions without naming them.

sub { say "lol, I'm so anonymous!" }

How is this useful? If you can’t name the function, you can’t call it, right? Wrong.

You can store the function in a variable. Or return it from another function. Or pass it to another function. In fact, when you don’t name your function, the focus becomes much more what code you’re going to run later. Like an executable “to do”-list.

Of course, Perl 5 has anonymous functions, too. With exactly the same syntax, even. In fact, all the big languages do anonymous functions, according to this list of languages on Wikipedia. Except, it seems, the historically significant languages C and Pascal. And the more modern but lumbering Java. “Planned for Java 8″. Haha, Java, catch up! Even C++ has them now.

How important are anonymous functions? Very. In the 1930s, Alan Turing showed how all computer processes could be simulated using just a pre-programmed machine that looks like a tape recorder, reading and writing values on a really long tape. (The Turing Machine.) Meanwhile, across the Atlantic, Alonzo Church showed how all computer processes could be simulated using just anonymous functions, no tape recorder required. (Lambda calculus.) It’s all quite elegant.

Later languages like Lisp and Scheme lean heavily on anonymous functions as a key component in the language. And lately a scrappy language called JavaScript, which also leans heavily on anonymous functions, has taken over the world while we were all busy surfing the web.

But let’s talk possibilities here. What can anonymous functions do for us? And how would it look in Perl 6?

Well, take sorting as a famous example. You could imagine Perl 6 having a sort_lexicographically function and a sort_numerically function. But it doesn’t. It has a sort function. When you want it to sort in a certain way, you just pass an anonymous function to it.

my @sorted_words = @words.sort({ ~$_ });
my @sorted_numbers = @numbers.sort({ +$_ });

(Technically, those are blocks, not functions. But the difference isn’t significant if you’re not planning to return anywhere inside.)

And of course it goes further than just those two sort orders. You could sort by shoe size, or maximum ground speed, or decreasing likelihood of spontaneous combustion. All because you can pass in any logic as an argument. Object-oriented people are very proud of this pattern, and call it “dependency injection”.

Come to think of it, map and grep and reduce all depend on this kind of function-passing. We sometimes refer to passing functions to functions as “higher order programming”, as if it was only something people with special privileges should be doing. But in fact it’s a very useful and broadly applicable technique.

The above examples all run the anonymous functions as part of their own execution. But there’s no need to restrict ourselves to this. We can create functions, return them, and then run them later:

sub make_surprise_for($name) {
    return sub { say "Sur-priiise, $name!" };
}

my $reveal_surprise = make_surprise_for("Finn");    # nothing happens, yet
# ...wait for it...
# ...wait...
# ...waaaaaaait...
$reveal_surprise();        # "Sur-priiise, Finn!"

The function in $reveal_surprise remembers the value of $name even though the original function passing it in has exited long ago. That’s pretty nice. This effect is referred to as the anonymous function closing over the variable $name. But there’s no need to get technical — the long and short of it is “it’s awesome”.

And in fact, it feels quite natural if we just look at anonymous functions alongside other staple storage mechanisms such as arrays and hashes. All of these can be stored in variables, passed as arguments or returned from functions. An anonymous array allows you to store a sequence of things for later. An anonymous hash allows you to store mappings/translations of things for later. An anonymous function allows you to store calculations or behavior for later.

Later this month, I’ll go through how to exploit dynamic scoping in Perl 6 to create nice DSL-y interfaces. We’ll see how anonymous functions come into play there as well.

Day 25 – Merry Christmas!

December 25, 2011

The kind elves who spend the rest of the year working in Santa’s shop to bring you more of Perl 6 each year would like to wish you a very warm and fuzzy Christmas vacation. December is always a special time for us, because we get to interact with you all through the interface of the advent calendar. We think that’s wonderful.

Be sure to check out this year’s Perl 6 coding contest, where you can win €100 worth of books!

Merry Christmas!

Day 22 – Operator overloading, revisited

December 22, 2011

Today’s post is a follow-up. Exactly two years ago, Matthew Walton wrote on this blog about overloading operators:

You can exercise further control over the operator’s parsing by adding traits to the definition, such as tighterequiv and looser, which let you specify the operator’s precedence in relationship to operators which have already been defined. Unfortunately, at the time of writing this is not supported in Rakudo so we will not consider it further today.

Rakudo is still lagging in precedence support (though at this point there are no blockers that I know about to simply going ahead and implementing it). But there’s a new implementation on the block, one that didn’t exist two years ago: Niecza.

Let’s try out operator precedence in Niecza.

$ niecza -e 'sub infix:<mean>($a, $b) { ($a + $b) / 2 }; say 10 mean 4 * 5'
15

Per default, an operator gets the same precedence as infix<+>. This is per spec. (How do we know it got the same precedence as infix<+> above? Well, we know it’s not tighter than multiplication, otherwise we’d have gotten the result 35.)

That’s all well and good, but what if we want to make our mean little operator evaluate tighter than multiplication? Nothing could be simpler:

$ niecza -e 'sub infix:<mean>($a, $b) is tighter(&infix:<*>) { ($a + $b) / 2 }; say 10 mean 4 * 5'
35

See what we did there? is tighter is a trait that we apply to the operator definition. The trait accepts an argument, in this case the language-provided multiplication operator. It all reads quite well, too: “infix mean is tighter [than] infix multiplication”.

Note the explicit use of intuitive naming for the precedence levels. Rather than the inherently confusing terms “higher/lower”, Perl 6 talks about “tighter/looser”, as in “multiplication binds tighter than addition”. Easier to think about precedence that way.

Internally, the precedence levels are stored not as numbers but as strings. Each original precedence level gets a letter of the alphabet and an equals sign (=). Subsequent added precendence levels append either a less-than sign (<) or a greater-than sign (>) to an existing precedence level representation. Using this system, we never “run out” of levels between existing ones (as we could if we were using integers, for example), and tighter levels always come lexigographically before looser ones. Language designers, take heed.

A few last passing notes about operators in Perl 6, while we’re on the subject:

  • In Perl 6, operators are subroutines. They just happen to have funny names, like prefix:<-> or postfix:<++> or infix:<?? !!>. This actually takes a lot of the hand-wavey magic out of defining them. The traits that we’ve seen applied to operators are really subroutine traits… these just happen to be relevant to operator definitions.
  • As a consequence, just like subroutines, operators are lexically scoped by default. Lexical scoping is something we like in Perl 6; it keeps popping up in unexpected places as a solid, sound design principle in the language. In practice, this means that if you declare an operator within a given scope, the operator will be visible and usable within that scope. You’re modifying the parser, but you’re doing it locally, within some block or other. (Or within the whole file, of course.)
  • Likewise, if you want to export your operators, you just use the same exporting mechanism used with subroutines. See how this unification between operators and subroutines keeps making sense? (In Perl 6-land, we say “operators are just funny-looking subroutines”.)
  • Multiple dispatch in operators works just as with ordinary subroutines. Great if you want to dispatch your operators on different types. As with all other routines in the core library in Perl 6, all operators are declared multi to be able to co-exist peacefully with module extensions to the language.
  • Operators can be macros, too. This is not an exceptions to the rule that operators are subroutines, because in Perl 6, macros are subroutines. In other words, if you want some syntactic sugar to execute at parse time (which is what a macro does), you can dress it up either as a normal-looking sub, or as an operator.

That’s it for today. Now, go forth and multiply, or even define your own operator that’s either tighter or looser than multiplication.

Day 19 – Abstraction and why it’s good

December 19, 2011

Some people are a bit afraid of the word “abstract”, because they’ve heard math teachers say it, and also, abstract art freaks them out. But abstraction is a fine and useful thing, and not so complicated. As programmers, we use it every day in different forms. The term is from Latin and means “to withdraw from” or “to pull away from”, and what we’re pulling away from is the specifics so we can focus on the big picture. That’s often mighty useful.

Here are a few examples:

Variables

If your computer only knew how to handle one specific number at a time, it’d be an abacus. Pretty early on, the programmer guild figured out it made a lot of sense to talk about the memory address of a value, and let that address contain whatever it pleased. They abstracted away from the value, and thus made the program more general.

As time passed, addresses were replaced by names, mostly as a convenience. Some people found it a good idea to give their variables descriptive names, as opposed to things like $grbldf.

Subroutines

Code re-use. We hear so much about it in the OO circles, but it holds equally well for subroutines. You write your code once, and then call it from all over the place. Convenient.

But, as I point out in an announcement pretending to be a computer science professor from an alternate timeline, there’s also the secondary benefit of giving your chunk of code a good mnemonic name, because that act in a sense improves the programming language itself. You’re giving yourself new verbs to program with.

This is especially true in Perl 6, because subroutines are lexically scoped (as opposed to Perl 5) and thus you can easily put a subroutine inside another routine. I use it when writing a Connect 4 game, for example.

Packages and modules

In Perl, packages don’t do much. They pull things together and keep them there. In a sense, what they abstract away is a set of subroutines from the rest of the world.

Perl 5 delivers its whole OO functionality through packages and a bit of dispatch magic on the side. It’s quite a feat, actually, but sometimes a bit too minimal. Moose fixes many of those early issues by providing a full-featured object system. Perl 6 lets packages go back to just being collections of subroutines, but provides a few dedicated abstractions for OO, a kind of built-in Moose. Which brings us to…

Classes

Object-orientation means a lot of different things to different people. To some, it’s the notion of an object, a piece of memory with a given set of operations and a given set of states. In a sense, we’re again in the business of extending the language like with did with subroutines. But this time we’re building new nouns rather than new verbs. One moment the language doesn’t know about a Customer object type; the next, it does.

To others, object-orientation means keeping the operations public and the states private. They refer to this division as encapsulation, because the object is like a little capsule, protecting your data from the big bad world. This is also a kind of abstraction, built on the idea that the rest of the world shouldn’t need to care about the internals of your objects, because some day you may want to refactor them. Don’t talk to the brain, talk to the hand; do your thing through the published operations of the object.

Roles

But class-based OO with inheritance will take you only so far. In the past 10 years or so, people have become increasingly aware of the limitations of inheritance-based class hierarchies. Often there are concerns which cut completely across a conventional inheritance hierarchy.

This is where roles come in; they allow you to apply behaviors in little cute packages here and there, without being tied up by a tree-like structure. In a post about roles I explore how this helps write better programs. But really the best example nowadays is probably the Rakudo compiler and its extensive use of roles; jnthn has been writing about that in an earlier advent post.

If classes abstract away complete sets of behaviors, roles abstract away partial sets of behaviors, or responsibilities.

You can even do so at runtime, using mixins, which are roles that you add to an object as the program executes. Objects changing type during runtime sounds magic almost to the point of recklessness; but it’s all done in a very straightforward manner using anonymous subclasses.

Metaobjects

Sometimes you want extra control over how the object system itself works. The object system in Perl 6, through one of those neat bite-your-own-tail tricks, is written using itself, and is completely modifiable in terms of itself. Basically, a bunch of the complexity has been removed by not having a separate hidden, unreachable system to handle the intricacies of the object system. Instead, there’s a visible API for interacting with the object system.

And, when we feel like it, we can invent new and exotic varieties of object systems. Or just tweak the existing one to our fancy.

Macros

On the way up the abstraction ladder, we’ve abstracted away bigger and bigger chunks of code: values, code, routines, behaviors, responsibilities and object systems. Now we reach the top, and there we find macros. Ah, macros, these magical, inscrutable beasts. What do macros abstract away?

Code.

Well, that’s rather disappointing, isn’t it? Didn’t we already abstract away code with subroutines? Yes, we did. But it turns out there’s so much code in a program that sometimes, it needs to be abstracted away on several levels!

Subroutines abstract away code that can then run in several different ways. You call the routine with other values, and it behaves differently. Macros abstract away code that can then be compiled in several different ways. You write a macro with other values, and it gets compiled into different code, which can then in turn run differently.

Essentially, macros give you a hook into the compiler to help you shape and guide what code it emits during the compilation itself. In a sense, you’re abstracting certain parts of the compilation process, the parsing and the syntax manipulation and the code generation. Again, you’re shaping the language — but this time not inventing new nouns or verbs, but whole ways of expressing yourself.

Macros come in two broad types: textual (a la C) and syntax tree (a la Lisp). The textual ones have a number of known issues stemming from the fact that they’re essentially a big imprecise search-and-replace on your code. The syntax tree ones are hailed as the best thing about Lisp, because it allows Lisp programs to grow and adapt to the needs of the programmer, by inventing new ways of expressing yourself.

Perl 6, being Perl 6, specifies both textual macros and syntax tree macros. I’m currently working on a grant to bring syntax macros to Rakudo Perl 6. There’s a branch where I’m hammering out the syntax and semantics of macros. It’s fun work, and made much more feasible by the past year’s improvements to Rakudo itself.

In conclusion

As an application grows and becomes more complex, it needs more rungs of the abstraction ladder to rest on. It needs more levels of abstraction with which to draw away the specifics and focus on the generalities.

Perl 6 is a new Perl, distinct from Perl 5. Its most distinguishing trait is perhaps that it has more rungs on the abstraction ladder to help you write code that’s more to the point. I like that.

Merry Christmas!

December 25, 2010

The people who brought you this year’s Advent Calendar had a blast doing so — it’s exciting to get to present new and old Perl 6 features to new and old readers. Thanks everyone! And Merry Christmas!

Day 1 – Reaching the Stars
Day 2 – Interacting with the command line with MAIN subs
Day 3 – File operations
Day 4 – The Sequence Operators
Day 5 – Why Perl syntax does what you want
Day 6 – The X and Z metaoperators
Day 7 – Lexical variables
Day 8 – Different Names of Different Things
Day 9 – The module ecosystem
Day 10 – Feed operators
Day 11 – Markov Sequence
Day 12 – Smart matching
Day 13 – The Perl6 Community
Day 14 – nextsame and its cousins
Day 15 – Calling native libraries from Perl 6
Day 16: Time in Perl6
Day 17 – Rosetta Code
Day 18 – ABC Module
Day 19 – False truth
Day 20 – The Perl 6 Synopses
Day 21 – transliteration and beyond
Day 22 – The Meta-Object Protocol
Day 23 – It’s some .sort of wonderful.
Day 24 – Yule the Ancient Troll-tide Carol
Day 25 – Merry Christmas!


Follow

Get every new post delivered to your Inbox.