Author Archive

Day 20 – Dynamic variables and DSL-y things

December 20, 2012

Today, let’s talk about DSLs.

Post from the past: a motivating example

Two years ago I wrote a blog post about Nim, a game played with piles of stones. I just put in ASCII diagrams of the actual Nim stone piles, telling myself that if I had time, I would put in fancy SVG diagrams, generated with Perl 6.

Naturally, I didn’t have time. My self-imposed deadline ran out, and I published the post with simple ASCII diagrams.

But time is ever-regenerative, and there for people who want it. So, let’s generate some fancy SVG diagrams with Perl 6.

Have bit array, want SVG

What do we need, exactly? Well, a subroutine that takes an array of piles as input and generates an SVG file would be a really good start.

Let’s take the last “image” in the post as an example:

3      OO O
4 OOOO
5 OOOO    O

For the moment, let’s ignore the numbers at the left margin; they’re just counting stones. We summarize the piles themselves as a kind of bitmap, which also forms the input to the function:

my @piles =
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

nim-svg(@piles);

At this point, we need only create the nim-svg function itself, and make it render SVG from this bitmap. Since I’ve long since tired of outputting SVG by hand, I use the SVG module, which comes bundled with Rakudo Star.

use SVG;

sub nim-svg(@piles) {
    my $width = max map *.elems, @piles;
    my $height = @piles.elems;

    my @elements = gather for @piles.kv -> $row, @pile {
        for @pile.kv -> $column, $is_filled {
            if $is_filled {
                take 'circle' => [
                    :cx($column + 0.5),
                    :cy($row + 0.5),
                    :r(0.4)
                ];
            }
        }
    }
    
    say SVG.serialize('svg' => [ :$width, :$height, @elements ]);
}

I think you can follow the logic in there. The subroutine simply iterates over the bitmap, turning 1s into circles with appropriate coordinates.

That’s it?

Well, this will indeed generate an SVG image for us, with the stones correctly placed. But let’s look again at the input that helped create this image:

    [0, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0, 1];

Clearly, though we can discern the stones and gaps in there if we squint in a bit-aware programmer’s fashion, the input isn’t… visually attractive. (The zeroes even look like stones, even though they’re gaps!)

We can do better

Instead of using a bit array, let’s start from the desired SVG image and try to make the input look like that.

So, this is what I would prefer to write instead of a bitmask:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

That’s better. That looks more like my original ASCII diagram, while still being syntactic Perl 6 code.

Making a DSL

Wikipedia talks about a DSL as a language “dedicated to a particular problem domain”. Well, the above way of specifying the input would be a DSL dedicated to solving the draw-SVG-images-of-Nim-positions domain. (Admittedly a fairly narrow domain. But I’m mostly out to show the potential of DSLs in Perl 6, not to change the world with this particular DSL.)

Now that we have the desired end state, how do we connect the wires and make the above work? Clearly we need to declare three subroutines: nim, _, o. (Yes, you can name a subroutine _, no sweat.)

sub nim(&block) {
    my @*piles;
    my @*current-pile;

    &block();
    finish-last-pile();
    
    nim-svg(@*piles);
}

sub _(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 0, @rest;
    return @*current-pile;
}

sub o(@rest?) {
    unless @rest {
        finish-last-pile();
    }
    @*current-pile = 1, @rest;
    return @*current-pile;
}

Ok… explain, please?

A couple of things are going on here.

  • The two variables @*piles and @*current-pile are dynamic variables, which means that they are visible not just in the current lexical scope, but also in every subroutine called before the current scope has finished. Notably, that includes the two subroutines _ and o.
  • The two subroutines _ and o take an optional parameter. On each row, the rightmost _ or o acts as a silent “start of pile” marker, taking the time to do a bit of bookkeeping with the piles, storing away the last pile and starting on a new one.
  • Each row in the DSL-y input basically forms a chain of subroutine calls. We take this into account by incrementally building the @*current-pile array at each step, and returning it as (possible) input for the next subroutine call in the chain.
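To see the dynamic-variable part in isolation, here is a minimal sketch, separate from the Nim code. The names build-pile, add-stone and @*stones are made up for illustration, and it assumes a current Rakudo.

```raku
# Hypothetical names, for illustration only.
sub add-stone() {
    # @*stones is not declared here; it is found in the caller's scope at run time.
    @*stones.push('o');
}

sub build-pile() {
    my @*stones;            # dynamic variable, visible to everything called from here
    add-stone() for ^3;     # three calls, each pushing onto the same array
    return @*stones.elems;
}

say build-pile();           # 3
```

Calling add-stone outside of build-pile would fail, because no @*stones exists anywhere in its chain of callers.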

And that’s it. Oh yeah, we need the bookkeeping routine finish-last-pile, too:

sub finish-last-pile() {
    if @*current-pile {
        push @*piles, [@*current-pile];
    }
    @*current-pile = ();
}

So, it works?

Now, the whole thing works. We can turn this DSL-y input:

nim {
  _ _ _ _ _ _ _ _ o;
  o o o o _ o o _ o;
  o o o o _ _ _ _ o;
}

…into this SVG output:

<svg
  xmlns="http://www.w3.org/2000/svg"
  xmlns:svg="http://www.w3.org/2000/svg"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  width="9" height="3">

  <circle cx="8.5" cy="0.5" r="0.4" />
  <circle cx="0.5" cy="1.5" r="0.4" />
  <circle cx="1.5" cy="1.5" r="0.4" />
  <circle cx="2.5" cy="1.5" r="0.4" />
  <circle cx="3.5" cy="1.5" r="0.4" />
  <circle cx="5.5" cy="1.5" r="0.4" />
  <circle cx="6.5" cy="1.5" r="0.4" />
  <circle cx="8.5" cy="1.5" r="0.4" />
  <circle cx="0.5" cy="2.5" r="0.4" />
  <circle cx="1.5" cy="2.5" r="0.4" />
  <circle cx="2.5" cy="2.5" r="0.4" />
  <circle cx="3.5" cy="2.5" r="0.4" />
  <circle cx="8.5" cy="2.5" r="0.4" />
</svg>

Yay!

Summary

The principles I used in this post are fairly easy to generalize. Start from your desired DSL, and create the subroutines to make it happen. Have dynamic variables handle the communication between separate subroutines.

DSLs are nice because they allow us to shape the code we’re writing around the problem we’re solving. Using relatively little “adapter code”, we’re left to focus on describing and solving problems in a natural way, making the programming language rise to our needs instead of lowering ourselves down to its needs.

Day 16 – Operator precedence

December 16, 2012

All the precedence men

As I was taking a walk today, I realized one of the reasons why I like Perl. Five as well as six. I often hear praise such as “Perl fits the way I think”. And I have that feeling too sometimes.

If I were the president (or prime minister, as I’m Swedish), and had a bunch of advisers, maybe some of them would be yes-men, trying to give me advice that they think I will want to hear, instead of advice that would be useful to me. Some languages are like that, presenting us with an incomplete subset of the necessary tools. The Perl languages, if they were advisers, wouldn’t be yes-men. They’d give me an accurate view of the world, even if that view would be a bit messy and hairy sometimes.

Which, I guess, is why Perl five and six are so often used in handling messy data and turning it into something useful.

To give a few specific examples:

  • Perl 5 takes quotes and quoting very seriously. Not just strings but lists of strings, too. (See the qw keyword.) Perl 6 does the same, but takes quoting further. See the recent post on quoting.
  • jnthn shows in yesterday’s advent post that Perl 6 takes compiler phases seriously, and allows us to bundle together code that belongs together conceptually but not temporally. We need to do this because the world is gnarly and running a program happens in phases.
  • Grammars in Perl 6 are not just powerful, but in some sense honest, too. They don’t oversimplify the task for the programmer, because then they would also limit the expressibility. Even though grammars are complicated and intricate, they should be, because they describe a process (parsing) that is complicated and intricate.

Operators

Perl is known for its many operators. Some would describe it as an “operator-oriented” language. Where many other languages will try to guess how you want your operators to behave on your values, or perhaps demand that you pre-declare all your types so that there’ll be no doubt, Perl 6 carries much of the typing information in its operators:

my $a = 5;
my $b = 6;

say $a + $b;      # 11 (numeric addition)
say $a * $b;      # 30 (numeric multiplication)

say $a ~ $b;      # "56" (string concatenation)
say $a x $b;      # "555555" (string repetition)

say $a || $b;     # 5 (boolean disjunction)
say $a && $b;     # 6 (boolean conjunction)

Other languages will want to bunch together some of these for us, using the + operator for both numeric addition and string concatenation, for example. Not so Perl. You’re meant to choose yourself, because the choice matters. In return, Perl will care a little less about the types of the operands, and just deliver the appropriate result for you.

“The appropriate result” is most often a number if you used a numeric operator, and a string if you used a string operator. But sometimes it’s more subtle than that. Note that the boolean operators above actually preserved the numbers 5 and 6 for us, even though internally it treated them both as true values. In C, if we do the same, C will unhelpfully “flatten” these results down to the value 1, its spelling of the value true. Perl knows that truthiness comes in many flavors, and retains the particular flavor for you.

Operator precedence

“All operators are equal, but some operators are more equal than others.” It is when we combine operators that we realize that the operators have different “tightness”.

say 2 * 3 + 1;      # 7, because (2 * 3) + 1
say 1 + 2 * 3;      # 7, because 1 + (2 * 3), not 9

We can always be 100% explicit and surround enough of our operations with parentheses… but when we don’t, the operators seem to order themselves in some order, which is not just simple left-to-right evaluation. This ordering between operators is what we refer to as “precedence”.

No doubt you were taught in math class in school that multiplications should be evaluated before additions in the way we see above. It’s as if factors group together closer than terms do. The fact that this difference in precedence is useful is backed up by centuries of algebra notation. Most programming languages, Perl 6 included, incorporate this into the language.

By the way, this difference in precedence is found between other pairs of operators, even outside the realm of mathematics:

      Additive (loose)    Multiplicative (tight)
      ================    ======================
number      +                       *
string      ~                       x
bool        ||                      &&

It turns out that they make as much sense for other types as they do for numbers. And group theory bears this out: these other operators can be seen as a kind of addition and multiplication, if we squint.
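A quick sketch of the string and boolean rows of that table, assuming a current Rakudo; the variable names are only for illustration.

```raku
my $string = 'a' ~ 'b' x 3;    # x binds tighter than ~, so this is 'a' ~ ('b' x 3)
my $bool   = 0 || 1 && 2;      # && binds tighter than ||, so this is 0 || (1 && 2)

say $string;    # abbb
say $bool;      # 2
```

If the operators had been evaluated strictly left to right, the first line would have produced "ababab" instead.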

Operator precedence parser

Deep in the bowels of the Perl 6 parser sits a smaller parser which is very good at parsing expressions. The bigger parser which parses your Perl 6 program is a really good recursive-descent parser. It works great for creating syntax trees out of the larger program structure. It works less well on the level of expressions. Essentially, what trips up a recursive-descent parser is that it always has to create AST nodes for all the possible precedence levels, whether they’re present or not.

So this smaller parser is an operator-table parser. It knows what to do with each type of operator (prefix, infix, postfix…), and kind of weaves all the terms and operators into a syntax tree. Only the precedence levels actually used show up in the tree.

The optable parser works by comparing each new operator to the top operator on a stack of operators. So when it sees an expression like this:

$x ** 2 + 3 * $x - 5

it will first compare ** against + and decide that the former is tighter, and thus $x ** 2 should be put together into a small tree. Later, it compares + against *, and decides to turn 3 * $x into a small tree. It goes on like this, eventually ending up with this tree structure:

infix:<->
 +-- infix:<+>
 |    +-- infix:<**>
 |    |    +-- term:<$x>
 |    |    +-- term:<2>
 |    +-- infix:<*>
 |         +-- term:<3>
 |         +-- term:<$x>
 +-- term:<5>

Because leaf nodes are evaluated first and the root node last, this tree structure determines the order of evaluation for the expression. The order ends up being the same as if the expression had these parentheses:

(($x ** 2) + (3 * $x)) - 5

Which, again, is what we’ve learned to expect.

Associativity

Another factor also governs how these invisible parentheses are to be distributed: operator associativity. It’s the concern of how the operator combines with multiple copies of itself, or other sufficiently similar operators on the same precedence level.

Some examples serve to explain the difference:

$x = $y = $z;     # becomes $x = ($y = $z)
$x / $y / $z;     # becomes ($x / $y) / $z

In both of these cases, we may look at the way the parentheses are doled out, and say “well, of course”. Of course we must first assign to $y and only then to $x. And of course we first divide by $y and only then by $z. So operators naturally have different associativity.

The optable parser compares not just the precedence of two operators but also, when needed, their associativity. And it puts the parentheses in the right place, just as above.

User-defined operators

Now we come back to Perl not being a yes-man, and working hard to give you the appropriate tools for the job.

Perl 6 allows you to define operators. See my post from last year on the details of how. But it also allows you to specify precedence and associativity of each new operator.

As you specify a new operator, a new Perl 6 parser is automatically constructed for you behind the scenes, which contains your new operator. In this sense, the optable parser is open and extensible. And Perl 6 gives you exactly the same tools for talking about precedence and associativity as the compiler itself uses internally.
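As a small sketch of what that looks like in practice, here is a made-up operator pow (not a built-in) declared with the assoc trait. This assumes a current Rakudo; by default the new operator gets the precedence of addition.

```raku
# 'pow' is a hypothetical operator, for illustration only.
sub infix:<pow>($base, $exp) is assoc<right> { $base ** $exp }

my $result = 2 pow 3 pow 2;    # right-associative: 2 pow (3 pow 2)
say $result;                   # 512
```

Without the trait, the operator would be left-associative and the same expression would mean (2 pow 3) pow 2, which is 64.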

Perl treats you like a grown-up, and expects you to make good decisions based on a thorough understanding of the problem space. I like that.

Day 9 – Longest Token Matching

December 9, 2012

Perl 6 regular expressions prefer to match the longest alternative when possible.

say "food and drink" ~~ / foo | food /;   # food

This is in contrast to Perl 5, which would prefer the first alternative above, and produce the match “foo”.

You can still get the first-alternative behavior if you want; it’s tucked away in the slightly longer alternation operator ||:

say "food and drink" ~~ / foo || food /;  # foo

…And that’s it! That’s Longest Token Matching. ☺ Short post.

“Huh, wait!” I hear you exclaim, in a desperate attempt to make the daily Perl 6 Advent goodness last a bit longer. “Why is Longest Token Matching such a big deal? Who would ever be so obsessed with long tokens?”

I’m glad you asked. As it turns out, Longest Token Matching (or LTM for short) plays very well with our intuition about how things should be parsed. If you’re creating a language, you want people to be able to declare a variable forest_density without the mention of this variable clashing with the syntax of for loops. LTM will see to that.
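Here is the forest_density situation as a minimal sketch. The two alternatives are stand-ins for a real keyword rule and a real identifier rule, and the variable names are made up.

```raku
my $text = 'forest_density = 3';

# | picks the longest match, || picks the first alternative that matches.
my $ltm   = $text ~~ / ^ [ 'for' |  <[a..z_]>+ ] /;
my $first = $text ~~ / ^ [ 'for' || <[a..z_]>+ ] /;

say ~$ltm;      # forest_density
say ~$first;    # for
```

With ||, the keyword would have swallowed the first three letters of the variable name, which is exactly the clash we wanted to avoid.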

I like “strange consistencies” — when distal parts of a language design turn out to have commonalities that make the language feel more uniform. There is that kind of consistency here, between classes and grammars. Perl 6 basically exploits that consistency to the max. Let me briefly map out what I mean.

We’re all used to writing classes at this point. From a bird’s-eye view, they look like this:

class {
    method
    method
    method
}

Grammars have a suspiciously similar structure:

grammar {
    rule
    rule
    rule
}

(The keywords are actually regex, token and rule, but when we talk about them as a group, we just call them “rules”.)

We’re also used to being able to derive classes into subclasses (class B is A), and add or override methods in a way which produces a nice mix of old and new behavior. Perl 6 provides multi methods which even allow you to add new methods of the same name, and the old ones won’t be overridden, they’ll just all try to match alongside the new methods. The dispatch is handled by a (usually autogenerated) proto method that dispatches to all eligible candidates.

What does all this have to do with grammars and rules? Well, it turns out that first off, you can derive new grammars from old ones. It works the same as deriving classes. (In fact, under the hood it’s exactly the same mechanism. Grammars are classes with a different metaclass object.) New rules will override old rules just like you’d expect with methods.

S05 has a cute example with parsing of letters, and deriving the grammar to parse formal letters:

    grammar Letter {
        rule text     { <greet> $<body>=<line>+? <close> }
        rule greet    { [Hi|Hey|Yo] $<to>=\S+? ',' }
        rule close    { Later dude ',' $<from>=.+ }
        token line    { \N* \n }
    }

    grammar FormalLetter is Letter {
        rule greet { Dear $<to>=\S+? ',' }
        rule close { Yours sincerely ',' $<from>=.+ }
    }

The derived FormalLetter overrides greet and close, but not line.

But what about all the goodness with multi methods? Could we define some kind of “proto rule” that would allow us to have several rules in a grammar with the same name but different bodies? For example, we might want to parse a language with a rule term, but there are many different terms: strings, numbers… and maybe the numbers can be decimal or binary or octal or hexadecimal…

Perl 6 grammars can contain a proto rule, and then you can define and redefine a rule with the same name as many times as you want. And now we’re back full circle with the / foo | food / alternation from the start of the article. All those rules you write with the same name compile down to one big alternation. Not only that — rules which call other rules, some of them possibly proto rules, all of that will be “flattened” out into one big LTM alternation. In practice that means that all the possible things a term can be are tried out all at once, on equal footing. Neither alternative wins because you happened to define it before the others. An alternative wins because it is the longest.
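A minimal sketch of a proto rule with two candidates, assuming a current Rakudo. The grammar Terms and its two term variants are invented for this example.

```raku
grammar Terms {
    token TOP { <term> }

    proto token term {*}
    # Declared first, but that gives it no advantage.
    token term:sym<decimal> { \d+ }
    token term:sym<hex>     { '0x' <[0..9 a..f A..F]>+ }
}

say ~Terms.parse('0x1F')<term>;    # 0x1F
say ~Terms.parse('42')<term>;      # 42
```

On the input 0x1F the decimal candidate could match the leading 0, but the hex candidate matches four characters, so it wins.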

The strange consistency resides in the fact that in the call-a-method side of things, the most specific method wins, and “most specific” has to do with signature narrowness. The better the types in the signature describe the arguments coming in, the more specific the method.
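On the dispatch side, that narrowness can be sketched with two multi candidates. The sub describe is a made-up name, and plain multi subs stand in for multi methods here.

```raku
multi sub describe(Any $x) { 'something' }     # matches everything
multi sub describe(Int $x) { 'an integer' }    # narrower, so preferred for Ints

say describe(42);         # an integer
say describe('forty');    # something
```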

In the parse-with-a-rule side of things, the most specific rule wins, but here “most specific” has to do with parse success. The better the rule can describe what comes next in the text, the more specific the rule.

And that’s strangely consistent, because on the surface methods and rules look like quite different beasts.

We really believe we have something going with this whole principle of deriving a grammar and getting a new language. LTM is right at the center of that because it allows new rules and old to intermix in a fair and predictable way. It’s a kind of meritocracy: rules win not based on whether they’re young or old, but based on whether they are able to parse the text well.

In fact, the Perl 6 compiler itself works this way. It parses your program using a Perl 6 grammar, and that grammar is derivable… whenever you declare a new operator in your program, a new grammar is derived for you. The parsing of your operator is added as a new rule in the new grammar, and the new grammar is given the task of parsing the rest of your program. Your new operator will win against similar but shorter ones, and lose against similar but longer ones.

Day 2 – Anonymous functions for great good

December 2, 2012

Perl 6 has great support for functions. It packs function signatures full with awesome, and lets you have your cake and eat it a couple of times over with all the ways you can specify a function. You can specify parameter types, optional parameters, named parameters, and even those cool where clauses. If I didn’t know better, I’d suspect Perl 6 was compensating for some predecessor’s rather rudimentary handling of parameters. (*cough* @_ *cough*)

Among all these other things, Perl 6 also allows you to define functions without naming them.

sub { say "lol, I'm so anonymous!" }

How is this useful? If you can’t name the function, you can’t call it, right? Wrong.

You can store the function in a variable. Or return it from another function. Or pass it to another function. In fact, when you don’t name your function, the focus becomes much more what code you’re going to run later. Like an executable “to do”-list.

Of course, Perl 5 has anonymous functions, too. With exactly the same syntax, even. In fact, all the big languages do anonymous functions, according to this list of languages on Wikipedia. Except, it seems, the historically significant languages C and Pascal. And the more modern but lumbering Java. “Planned for Java 8”. Haha, Java, catch up! Even C++ has them now.

How important are anonymous functions? Very. In the 1930s, Alan Turing showed how all computer processes could be simulated using just a pre-programmed machine that looks like a tape recorder, reading and writing values on a really long tape. (The Turing Machine.) Meanwhile, across the Atlantic, Alonzo Church showed how all computer processes could be simulated using just anonymous functions, no tape recorder required. (Lambda calculus.) It’s all quite elegant.

Later languages like Lisp and Scheme lean heavily on anonymous functions as a key component in the language. And lately a scrappy language called JavaScript, which also leans heavily on anonymous functions, has taken over the world while we were all busy surfing the web.

But let’s talk possibilities here. What can anonymous functions do for us? And how would it look in Perl 6?

Well, take sorting as a famous example. You could imagine Perl 6 having a sort_lexicographically function and a sort_numerically function. But it doesn’t. It has a sort function. When you want it to sort in a certain way, you just pass an anonymous function to it.

my @sorted_words = @words.sort({ ~$_ });
my @sorted_numbers = @numbers.sort({ +$_ });

(Technically, those are blocks, not functions. But the difference isn’t significant if you’re not planning to return anywhere inside.)

And of course it goes further than just those two sort orders. You could sort by shoe size, or maximum ground speed, or decreasing likelihood of spontaneous combustion. All because you can pass in any logic as an argument. Object-oriented people are very proud of this pattern, and call it “dependency injection”.

Come to think of it, map and grep and reduce all depend on this kind of function-passing. We sometimes refer to passing functions to functions as “higher order programming”, as if it was only something people with special privileges should be doing. But in fact it’s a very useful and broadly applicable technique.
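A short sketch of those three, each taking a block as its argument. The data and variable names are made up.

```raku
my @numbers = 3, 10, 7, 24;

my @even    = @numbers.grep({ $_ %% 2 });          # keep what the block accepts
my @doubled = @numbers.map({ $_ * 2 });            # transform each element
my $total   = @numbers.reduce({ $^a + $^b });      # combine elements pairwise

say @even;       # [10 24]
say @doubled;    # [6 20 14 48]
say $total;      # 44
```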

The above examples all run the anonymous functions as part of their own execution. But there’s no need to restrict ourselves to this. We can create functions, return them, and then run them later:

sub make_surprise_for($name) {
    return sub { say "Sur-priiise, $name!" };
}

my $reveal_surprise = make_surprise_for("Finn");    # nothing happens, yet
# ...wait for it...
# ...wait...
# ...waaaaaaait...
$reveal_surprise();        # "Sur-priiise, Finn!"

The function in $reveal_surprise remembers the value of $name even though the original function passing it in has exited long ago. That’s pretty nice. This effect is referred to as the anonymous function closing over the variable $name. But there’s no need to get technical — the long and short of it is “it’s awesome”.

And in fact, it feels quite natural if we just look at anonymous functions alongside other staple storage mechanisms such as arrays and hashes. All of these can be stored in variables, passed as arguments or returned from functions. An anonymous array allows you to store a sequence of things for later. An anonymous hash allows you to store mappings/translations of things for later. An anonymous function allows you to store calculations or behavior for later.

Later this month, I’ll go through how to exploit dynamic scoping in Perl 6 to create nice DSL-y interfaces. We’ll see how anonymous functions come into play there as well.

Day 25 – Merry Christmas!

December 25, 2011

The kind elves who spend the rest of the year working in Santa’s shop to bring you more of Perl 6 each year would like to wish you a very warm and fuzzy Christmas vacation. December is always a special time for us, because we get to interact with you all through the interface of the advent calendar. We think that’s wonderful.

Be sure to check out this year’s Perl 6 coding contest, where you can win €100 worth of books!

Merry Christmas!

Day 22 – Operator overloading, revisited

December 22, 2011

Today’s post is a follow-up. Exactly two years ago, Matthew Walton wrote on this blog about overloading operators:

You can exercise further control over the operator’s parsing by adding traits to the definition, such as tighter, equiv and looser, which let you specify the operator’s precedence in relationship to operators which have already been defined. Unfortunately, at the time of writing this is not supported in Rakudo so we will not consider it further today.

Rakudo is still lagging in precedence support (though at this point there are no blockers that I know about to simply going ahead and implementing it). But there’s a new implementation on the block, one that didn’t exist two years ago: Niecza.

Let’s try out operator precedence in Niecza.

$ niecza -e 'sub infix:<mean>($a, $b) { ($a + $b) / 2 }; say 10 mean 4 * 5'
15

Per default, an operator gets the same precedence as infix:<+>. This is per spec. (How do we know it got the same precedence as infix:<+> above? Well, we know it’s not tighter than multiplication, otherwise we’d have gotten the result 35.)

That’s all well and good, but what if we want to make our mean little operator evaluate tighter than multiplication? Nothing could be simpler:

$ niecza -e 'sub infix:<mean>($a, $b) is tighter(&infix:<*>) { ($a + $b) / 2 }; say 10 mean 4 * 5'
35

See what we did there? is tighter is a trait that we apply to the operator definition. The trait accepts an argument, in this case the language-provided multiplication operator. It all reads quite well, too: “infix mean is tighter [than] infix multiplication”.

Note the explicit use of intuitive naming for the precedence levels. Rather than the inherently confusing terms “higher/lower”, Perl 6 talks about “tighter/looser”, as in “multiplication binds tighter than addition”. Easier to think about precedence that way.

Internally, the precedence levels are stored not as numbers but as strings. Each original precedence level gets a letter of the alphabet and an equals sign (=). Subsequent added precedence levels append either a less-than sign (<) or a greater-than sign (>) to an existing precedence level representation. Using this system, we never “run out” of levels between existing ones (as we could if we were using integers, for example), and tighter levels always come lexicographically before looser ones. Language designers, take heed.

A few last passing notes about operators in Perl 6, while we’re on the subject:

  • In Perl 6, operators are subroutines. They just happen to have funny names, like prefix:<-> or postfix:<++> or infix:<?? !!>. This actually takes a lot of the hand-wavey magic out of defining them. The traits that we’ve seen applied to operators are really subroutine traits… these just happen to be relevant to operator definitions.
  • As a consequence, just like subroutines, operators are lexically scoped by default. Lexical scoping is something we like in Perl 6; it keeps popping up in unexpected places as a solid, sound design principle in the language. In practice, this means that if you declare an operator within a given scope, the operator will be visible and usable within that scope. You’re modifying the parser, but you’re doing it locally, within some block or other. (Or within the whole file, of course.)
  • Likewise, if you want to export your operators, you just use the same exporting mechanism used with subroutines. See how this unification between operators and subroutines keeps making sense? (In Perl 6-land, we say “operators are just funny-looking subroutines”.)
  • Multiple dispatch in operators works just as with ordinary subroutines. Great if you want to dispatch your operators on different types. As with all other routines in the core library in Perl 6, all operators are declared multi to be able to co-exist peacefully with module extensions to the language.
  • Operators can be macros, too. This is not an exception to the rule that operators are subroutines, because in Perl 6, macros are subroutines. In other words, if you want some syntactic sugar to execute at parse time (which is what a macro does), you can dress it up either as a normal-looking sub, or as an operator.
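The lexical scoping of operators can be sketched like this, assuming a current Rakudo. The operator plus is made up for the example.

```raku
my $sum = do {
    # Hypothetical operator, visible only inside this block.
    sub infix:<plus>($a, $b) { $a + $b }
    2 plus 3;
};

say $sum;    # 5
```

Writing 2 plus 3 after the closing brace would not compile, because out there the parser has never heard of the operator.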

That’s it for today. Now, go forth and multiply, or even define your own operator that’s either tighter or looser than multiplication.

Day 19 – Abstraction and why it’s good

December 19, 2011

Some people are a bit afraid of the word “abstract”, because they’ve heard math teachers say it, and also, abstract art freaks them out. But abstraction is a fine and useful thing, and not so complicated. As programmers, we use it every day in different forms. The term is from Latin and means “to withdraw from” or “to pull away from”, and what we’re pulling away from is the specifics so we can focus on the big picture. That’s often mighty useful.

Here are a few examples:

Variables

If your computer only knew how to handle one specific number at a time, it’d be an abacus. Pretty early on, the programmer guild figured out it made a lot of sense to talk about the memory address of a value, and let that address contain whatever it pleased. They abstracted away from the value, and thus made the program more general.

As time passed, addresses were replaced by names, mostly as a convenience. Some people found it a good idea to give their variables descriptive names, as opposed to things like $grbldf.

Subroutines

Code re-use. We hear so much about it in the OO circles, but it holds equally well for subroutines. You write your code once, and then call it from all over the place. Convenient.

But, as I point out in an announcement pretending to be a computer science professor from an alternate timeline, there’s also the secondary benefit of giving your chunk of code a good mnemonic name, because that act in a sense improves the programming language itself. You’re giving yourself new verbs to program with.

This is especially true in Perl 6, because subroutines are lexically scoped (as opposed to Perl 5) and thus you can easily put a subroutine inside another routine. I use it when writing a Connect 4 game, for example.
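As a tiny sketch of a subroutine inside another routine (the names average and total are invented here, and this is not the Connect 4 code):

```raku
sub average(@values) {
    # total is lexically scoped: it exists only inside average.
    sub total(@v) { [+] @v }

    total(@values) / @values.elems;
}

say average([2, 4, 9]);    # 5
```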

Packages and modules

In Perl, packages don’t do much. They pull things together and keep them there. In a sense, what they abstract away is a set of subroutines from the rest of the world.

Perl 5 delivers its whole OO functionality through packages and a bit of dispatch magic on the side. It’s quite a feat, actually, but sometimes a bit too minimal. Moose fixes many of those early issues by providing a full-featured object system. Perl 6 lets packages go back to just being collections of subroutines, but provides a few dedicated abstractions for OO, a kind of built-in Moose. Which brings us to…

Classes

Object-orientation means a lot of different things to different people. To some, it’s the notion of an object, a piece of memory with a given set of operations and a given set of states. In a sense, we’re again in the business of extending the language like we did with subroutines. But this time we’re building new nouns rather than new verbs. One moment the language doesn’t know about a Customer object type; the next, it does.

To others, object-orientation means keeping the operations public and the states private. They refer to this division as encapsulation, because the object is like a little capsule, protecting your data from the big bad world. This is also a kind of abstraction, built on the idea that the rest of the world shouldn’t need to care about the internals of your objects, because some day you may want to refactor them. Don’t talk to the brain, talk to the hand; do your thing through the published operations of the object.
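Here is a small sketch of that division, assuming a Customer class with a balance that should stay private. The attribute and method names are invented for the example.

```perl6
# Hypothetical Customer class: operations are public, state is private.
class Customer {
    has $.name;            # public attribute, gets a read-only accessor
    has $!balance = 0;     # private attribute, no accessor is generated

    method deposit($amount)   { $!balance += $amount; self }
    method can-afford($price) { $!balance >= $price }
}

my $customer = Customer.new(name => 'Alice');
$customer.deposit(50);

say $customer.can-afford(30);   # True
say $customer.can-afford(80);   # False
```

The rest of the program can ask whether the customer can afford something, but it never sees how the balance is stored, so the storage can be refactored later.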

Roles

But class-based OO with inheritance will take you only so far. In the past 10 years or so, people have become increasingly aware of the limitations of inheritance-based class hierarchies. Often there are concerns which cut completely across a conventional inheritance hierarchy.

This is where roles come in; they allow you to apply behaviors in little cute packages here and there, without being tied up by a tree-like structure. In a post about roles I explore how this helps write better programs. But really the best example nowadays is probably the Rakudo compiler and its extensive use of roles; jnthn has been writing about that in an earlier advent post.

If classes abstract away complete sets of behaviors, roles abstract away partial sets of behaviors, or responsibilities.

You can even do so at runtime, using mixins, which are roles that you add to an object as the program executes. Objects changing type during runtime sounds magic almost to the point of recklessness; but it’s all done in a very straightforward manner using anonymous subclasses.
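A minimal sketch of both uses follows: composing a role into a class at compile time, and mixing the same role into a single object at runtime. Greeter, Person and Robot are hypothetical names chosen for the example.

```perl6
# Hypothetical role: one small, reusable behavior.
role Greeter {
    method greet { "Hello from " ~ self.name }
}

# Compile-time composition with "does" in the class declaration.
class Person does Greeter {
    has $.name;
}

# A class that knows nothing about Greeter.
class Robot {
    has $.name;
}

say Person.new(name => 'Ada').greet;   # Hello from Ada

# Runtime mixin: only this one object gains the role.
my $robot = Robot.new(name => 'R2');
$robot does Greeter;

say $robot.greet;             # Hello from R2
say $robot ~~ Robot;          # True, it is still a Robot
say $robot ~~ Greeter;        # True, and now also a Greeter
say $robot.WHAT === Robot;    # False, its type is now an anonymous subclass
```

Other Robot objects are unaffected; only $robot has been given the extra behavior.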

Metaobjects

Sometimes you want extra control over how the object system itself works. The object system in Perl 6, through one of those neat bite-your-own-tail tricks, is written using itself, and is completely modifiable in terms of itself. Basically, a bunch of the complexity has been removed by not having a separate hidden, unreachable system to handle the intricacies of the object system. Instead, there’s a visible API for interacting with the object system.

And, when we feel like it, we can invent new and exotic varieties of object systems. Or just tweak the existing one to our fancy.
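To give a taste of that visible API, here is a small sketch that only introspects a class rather than modifying it. The Stone class is made up for the example; the .^ syntax forwards a call to the object's metaobject.

```perl6
# Hypothetical class to poke at.
class Stone {
    method weigh { 1 }
}

my $stone = Stone.new;

say $stone.^name;                # Stone
say so $stone.^can('weigh');     # True
say so $stone.^can('fly');       # False

# .^name is shorthand for asking the metaobject (HOW) directly.
say $stone.HOW.name($stone);     # Stone
```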

Macros

On the way up the abstraction ladder, we’ve abstracted away bigger and bigger chunks of code: values, code, routines, behaviors, responsibilities and object systems. Now we reach the top, and there we find macros. Ah, macros, these magical, inscrutable beasts. What do macros abstract away?

Code.

Well, that’s rather disappointing, isn’t it? Didn’t we already abstract away code with subroutines? Yes, we did. But it turns out there’s so much code in a program that sometimes, it needs to be abstracted away on several levels!

Subroutines abstract away code that can then run in several different ways. You call the routine with other values, and it behaves differently. Macros abstract away code that can then be compiled in several different ways. You call a macro with other arguments, and it gets compiled into different code, which can then in turn run differently.

Essentially, macros give you a hook into the compiler to help you shape and guide what code it emits during the compilation itself. In a sense, you’re abstracting certain parts of the compilation process, the parsing and the syntax manipulation and the code generation. Again, you’re shaping the language — but this time not inventing new nouns or verbs, but whole ways of expressing yourself.

Macros come in two broad types: textual (a la C) and syntax tree (a la Lisp). The textual ones have a number of known issues stemming from the fact that they’re essentially a big imprecise search-and-replace on your code. The syntax tree ones are hailed as the best thing about Lisp, because they allow Lisp programs to grow and adapt to the needs of the programmer, who can invent new ways of expressing things.

Perl 6, being Perl 6, specifies both textual macros and syntax tree macros. I’m currently working on a grant to bring syntax macros to Rakudo Perl 6. There’s a branch where I’m hammering out the syntax and semantics of macros. It’s fun work, and made much more feasible by the past year’s improvements to Rakudo itself.

In conclusion

As an application grows and becomes more complex, it needs more rungs of the abstraction ladder to rest on. It needs more levels of abstraction with which to draw away the specifics and focus on the generalities.

Perl 6 is a new Perl, distinct from Perl 5. Its most distinguishing trait is perhaps that it has more rungs on the abstraction ladder to help you write code that’s more to the point. I like that.

Merry Christmas!

December 25, 2010

The people who brought you this year’s Advent Calendar had a blast doing so — it’s exciting to get to present new and old Perl 6 features to new and old readers. Thanks everyone! And Merry Christmas!

Day 1 – Reaching the Stars
Day 2 – Interacting with the command line with MAIN subs
Day 3 – File operations
Day 4 – The Sequence Operators
Day 5 – Why Perl syntax does what you want
Day 6 – The X and Z metaoperators
Day 7 – Lexical variables
Day 8 – Different Names of Different Things
Day 9 – The module ecosystem
Day 10 – Feed operators
Day 11 – Markov Sequence
Day 12 – Smart matching
Day 13 – The Perl6 Community
Day 14 – nextsame and its cousins
Day 15 – Calling native libraries from Perl 6
Day 16: Time in Perl6
Day 17 – Rosetta Code
Day 18 – ABC Module
Day 19 – False truth
Day 20 – The Perl 6 Synopses
Day 21 – transliteration and beyond
Day 22 – The Meta-Object Protocol
Day 23 – It’s some .sort of wonderful.
Day 24 – Yule the Ancient Troll-tide Carol
Day 25 – Merry Christmas!

Day 21 – transliteration and beyond

December 21, 2010

Transliteration sounds like it has Latin roots and means a changing of letters. And that’s what the Str.trans method does.

say "GATTACA".trans( "TCAG" => "0123" );  # prints "3200212\n"

Perl 5 people (and Unix shell folk) immediately recognize this as tr/TCAG/0123/, but here’s a quick explanation for the rest of you out there: for every instance of T we find in the string, we replace it by 0, we replace every instance of C by 1, and so on. The two strings TCAG and 0123 supply the alphabets to be translated from and to, respectively.

This can be used for any number of time-saving ends. Here, for example, is a simple subroutine that “encrypts” a text with ROT-13:

sub rot13($text) { $text.trans( "A..Za..z" => "N..ZA..Mn..za..m" ) }

When .trans sees those .. ranges, it expands them internally (so "n..z" really means "nopqrstuvwxyz"). Thus, the ultimate effect of the rot13 sub is to map certain parts of the ASCII alphabet to certain other parts.
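As a quick check of that mapping, here is the same sub applied to a sample string of my choosing. Since ROT-13 shifts by half the alphabet, applying it twice should give back the original.

```perl6
# Same definition as above, repeated so the snippet runs on its own.
sub rot13($text) { $text.trans( "A..Za..z" => "N..ZA..Mn..za..m" ) }

say rot13("Hello, World!");          # Uryyb, Jbeyq!
say rot13(rot13("Hello, World!"));   # Hello, World!
```

Characters outside the two ranges, like the comma and the exclamation mark, pass through untouched.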

In Perl 5, the two dots (..) are a dash (-), but we’ve tried in Perl 6 to have those two dots stand for the concept “range”; in the main language, in regexes, and here in transliterations.

Note also that the .trans method is non-mutating; it doesn’t change $text, but just returns a new value. This is also a general theme in Perl 6; in the core language we prefer to offer the side-effect-free variants of methods. You can easily get the mutating behavior by doing .=trans:

$kabbala.=trans("A..Ia..i" => "1..91..9");

(And that goes not only for .trans, but for all methods. It’s a silent encouragement to you as a programmer to write your libraries with non-mutating methods, making the world a happier, more composable place.)

But Perl 6 wouldn’t be Perl 6 if .trans didn’t also contain a hidden weapon which takes the Perl 5 tr/// and just completely blows it out of the water. Here’s what it also does:

Let’s say we want to escape some HTML, that is, replace things according to this table:

    & => &amp;
    < => &lt;
    > => &gt;

(By the way, I hope that if you ever need to escape HTML, there will be a library routine ready to do it for you. But the general principle is important; and in the few instances when you do need to do something like this, it’s good to know the tools are there, built into the language.)

This is nothing that a few well-placed regexes can’t handle. So what’s the big deal? Well, a naive in-place per-match replacement of the above three characters might be unlucky enough to get stuck in an infinite loop. (& => &amp; => &amp;amp; => ...) So you need to do various sordid trickery to avoid that.

But that’s not even the fundamental problem, which is that you end up stitching together pieces of strings, rather than thinking of the problem in a more high-level manner. Generally, we wouldn’t want a solution that depends on the order of the substitutions. That would also affect something like this:

    foo         => bar
    foolishness => folly

If the former substitution is attempted first each time, there won’t ever be an occasion to perform the latter one, which is probably not what was intended. Generally, we want to try and match the longer substrings before shorter ones.

So, it seems we want a longest-token substitution matcher that avoids infinite cycles due to accidental re-substitution.

That’s what .trans in Perl 6 provides. That’s its hidden weapon: sending in a pair of arrays rather than strings. For the HTML escaping, all we need to do is this:

my $escaped = $html.trans(
    [ '&',     '<',    '>'    ] =>
    [ '&amp;', '&lt;', '&gt;' ]
);

…and the non-trivial problems of replacing things in the right order and avoiding cyclical replacement are taken care of for us.
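Here is a small sketch that exercises both properties on sample strings of my own. The first call checks that an already-inserted &amp; is not escaped again; the second feeds in the foo/foolishness table with the shorter key listed first.

```perl6
# Sample input chosen for illustration.
my $html = 'if (a < b && b > c)';

say $html.trans(
    [ '&',     '<',    '>'    ] =>
    [ '&amp;', '&lt;', '&gt;' ]
);
# if (a &lt; b &amp;&amp; b &gt; c)

# The longer key wins even though 'foo' comes first in the list.
say "foo and foolishness".trans(
    [ 'foo', 'foolishness' ] =>
    [ 'bar', 'folly'       ]
);
# bar and folly
```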

Day 19 – False truth

December 19, 2010

Today’s advent gift teaches us how to use mixins for nefarious and confusing purposes. In fact, this feature will probably appear partly insane, but it turns out to be quite useful. Enter the but operator:

my $value = 42 but role { method Bool  { False } };
say $value;    # 42
say ?$value;   # False

So you see, we overload the .Bool method on our $value. It doesn’t affect other integers in the program, not even other 42s in the program, just this one. Normally, for Ints, the .Bool method (and therefore the prefix:<?> operator) returns whether the number is non-zero, but here we make it always return False.

In fact, there’s a shorter way to write this for enum values, of which False is one.

my $value = 42 but False;

Since False is a value of the Bool type, it will automatically overload the .Bool method, which by convention is a kind of conversion method in Perl 6. Values of other types will of course overload their corresponding conversion method.
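A minimal sketch of the same shorthand going the other way, and with a string instead of a boolean. The variable names and values are made up for the example; the string case assumes that mixing in a Str value overloads the .Str method, in line with the convention just described.

```perl6
# A zero that is nevertheless true.
my $zero = 0 but True;
say ?$zero;        # True
say $zero + 1;     # 1, it still behaves as the number 0

# Mixing in a Str value overloads the .Str conversion method instead.
my $answer = 42 but "forty-two";
say $answer.Str;   # forty-two
say $answer + 0;   # 42
```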

Here’s the part that turns out to be quite useful: in Perl 5, when you put a &system call in an if statement wanting to check for success, you have to remember to negate the result of the call, since in the shell an exit status of zero (and only zero) means success:

if ( system($cmd) == 0 ) {  # alternatively, !system($cmd) 
    # ...
}

But in Perl 6, the corresponding &run routine returns the above kind of overloaded integers; these boolify to True if and only if the return value is zero, which is the opposite of the default Int behavior, and just what we need.

if run($cmd) {  # we don't negate
    # ...
}

Oh, and here’s the part that appears insane. :-) We can overload the .Bool method of boolean values!

my $value = True but False;
say $value;    # True
say ?$value;   # False

Yes, Perl 6 allows you to shoot yourself in the foot in this particular way. Though I don’t see why anyone would want to do this except for obfuscatory purposes, I’m kinda glad Perl 6 has the presence of mind to keep track of the subtleties of that type of overloading. I know I almost don’t. :-)

