Day 23 – macros and secret agents

Let’s talk about 007.

$ perl6 bin/007 -e='say("OH HAI")'
OH HAI

007 is a small language, implemented in Perl 6. Its reason for existing is macros, a topic that has become quite dear to me.

I want to talk a little bit about macros, and what's happened with them in 2015. But first let's just get out of the way any doubts about 007 being a real language. Here, have a Christmas tree.

for [1, 2, 3, 4, 5, 2, 2] -> n {
    my indent = " " x (5 - n);
    my tree = "#" x (2 * n - 1);
    say(indent ~ tree);
}

Which gives this output:

    #
   ###
  #####
 #######
#########
   ###
   ###

If you want to describe 007 real quick, you could say it has a syntax very much like Perl 6, but it borrows things from Python (and JavaScript). For example, there are no sigils. (Awww!) There's no $_, so if you're interested in the elements you're iterating over, you need to declare a loop variable like we did. There's no flattening, no list context, and no conflation between a single item and a list-of-one. (Yay!)

007 does have objects:

my agent = {
    name: "James Bond",
    number: 007
};

say(agent.name);             # James Bond
say(agent.has("number"));    # 1

And, as it happens, all the types of statements, expressions, and other program elements are also expressible in the object system.

my code = quasi {
    say("secret agent!")
};

say(code.statementlist.statements[0]
        .expr.argumentlist.arguments[0]
        .value);             # secret agent!

The quasi there "freezes" the program code into objects, incidentally the exact representation that the 007 compiler deals with. We can then dig into that structure, passing down through a block, a statement, a (function invocation) expression, and finally a string value.

Being able to take this object-oriented view of the program structure is very powerful. It opens up for all kinds of nice manipulations. Perhaps the most cogent thing I've written about such manipulations is a gist about three types of macros. But the way ahead is still a bit uncertain, and 007 is there exactly to scout out that way.

I recently blogged about 007 over at strangelyconsistent, telling a little bit about what's happened with the language in the past year. Here, today, I want to tell about some things that are more directly relevant to Perl 6 macros.

But first...

What are all those Q types, then?

Giving a list of the Q types is still dangerous business, since the list itself is still in mild flux. But with that in mind, let's see what we have.

There's a Q type for each type of statement. Q::Statement::If is a representative example. There's also while loops, for loops, declarations of my variables, constants, subs, macros, etc. Return statements.

A common type of statement is Q::Statement::Expr, which covers anything from a function call to doing some arithmetic. A Q::Statement::Expr is simply a statement which contains a single expression.

So what qualifies as an expression? All the operators: prefix, infix, postfix. Various literal forms: ints, strings. Bigger types of terms like arrays and objects. Identifiers — that is, something like a variable, whether it's being declared or being used. Quasi terms, like we saw above.

Some constructs have some "extra" moving parts that are neither statements nor expressions. For example, blocks have a parameter list with parameters — the parameter list is a Q type, and a parameter is a Q type. Function calls have argument lists. Subs and macros have trait lists. Quasis have "unquotes", a kind of parametric hole in the quasi where a dynamic Qtree can be inserted.

And that's it. For a detailed current list, see Q.pm.

First insight: the magic is in the identifiers

Way back when we started to think about macros in Perl 6, we came to realize that Qtrees are never quite free of their context. You always keep doing variable lookups, calling functions, etc. Because this happens in two vastly different environments (the macro's environment, and the macro user's environment), quite a bit of effort goes to keeping these lookups straight and not producing surprising or unsafe code. This concern is usually referred to as "hygiene", but all we really need to know is that the macro environment is supposed to be able to contain any variable bindings and still not mess up the macro user's environment... and vice versa.

macro moo() {
    my x = "inside macro";
    
    return quasi { say(x) };
}

my x = "outside macro";
moo();    # inside macro

The prevailing solution to this (and the one that I started to code into Rakudo as part of my macros grant) was to achieve hygiene by the clever use of scopes. See this github issue comment for a compelling example of how code blocks would be injected in such a way that the variable lookup would just happen to find the right declaration, even if that means jumping all over the program.

With the recent work of 007, it's evident that the "clever use of scopes" solution won't work all the way. I'm glad I discovered this now, in a toy language implementation, and not after investing lots more time writing up a real solution of this in Rakudo.

The fundamental problem is this: injecting blocks is all good and well in either a statement or an expression. Both of these situations can be made to work. But we also saw in the previous section that there are "extra" Q types which fall outside of the statement/expression hegemony. We don't always think about those, but they are just as important. And they sometimes contain identifiers, which would be unhygienic if there wasn't a solution to cover them. Example: a trait like is looser(infix:<+>). The hygiene consists of infix:<+> being guaranteed to mean what it means in the defining macro environment, not what it means in the using mainline environment.

The new solution is deceptively simple, and will hopefully take us all the way: equip identifiers with a knowledge of what block they were defined in. If we exclude all macro magic, all identifiers will always start their lookup from "here", the position in the code that the runtime is in. (Note that this is already strong enough to account for things like closures, because "here" is exactly the context in which the closure was defined.)

The magic thing about macros is that you will inject a lot of code, and all the identifiers in that code will remember where they were born: in a quasi in a macro somewhere. That's the context in which they will do the lookup, not the "here" of the mainline code. Et voilà, hygiene.

We still allow identifiers to be synthetically created (using object construction syntax) without such a context. This makes them "detached", and the idea is that this will provide a little bit of a safety vent for when people explicitly want to opt out of hygiene. The Common Lisp and Scheme people seem to agree that there are plenty of such cases.

Second insight: there are two ways to create a program

So we know that Qtrees are formed by the parser going over a program, analyzing it, and spitting out Q objects that hang together in neat ways. We know that this process is always "safe", because we trust the parser and the parser is nice.

But in 007 where Q objects are in some sense regular objects, and they can be plugged together in all kinds of way by the user, anything could happen. The user is not necessarily nice — as programmers we take this as a self-evident truth. Someone could try to cram a return statement into a trait! Or drop a parameter list into a constant declaration. Many other horrors could be imagined.

So there have to be rules. Kind of like a HTML DOM tree is not allowed to look any which way it wants. We haven't set down those rules in stone yet in 007, but I suspect we will, and I suspect it will be informative.

(The challenging thing is when we try to combine this idea of restriction with the idea of extending the language. Someone wants to introduce optional parameters. Ok, let's say there's a process for doing that. Now the first obstacle is the rule that says that a parameter consists of just an identifier and nothing else. Once you've negotiated with that rule and convinced it to loosen up, you still have various downstream tools such as linters and code formatters you need to talk to. The problem is not unsolvable as such, just... the right kind of "interesting".)

But it doesn't end with just which nesting relationships — there's a certain sense in which a completely synthetically constructed Qtree isn't yet part of the program as such. For example, it contains variable declarations, but those variable haven't been declared yet. It contains macro calls, but those macros haven't been applied yet. And so on. All of those little things need to happen as the synthetic Qtree is introduced into the bigger program. We currently call this process checking, but the name may change.

Parsing and checking are like two sides of the same coin. Parsing is, in the end, just a way to convince the compiler that a piece of program text is actually a 007 program. Checking is just a way to convince the compiler that a Qtree is actually a 007 program (fragment). We've grown increasingly confident in the idea of checking, partly because we're using it quite heavily in our test suite.

But why stop there? Having two code paths is no fun, and in a sense parsing is just a special case of checking! Or, more exactly, parsing is a very optimized form of checking, where we make use of the textual medium having certain strengths. Why are identifiers limited to the character set of alphanumerics and underscores (and, in Perl 6, hyphens and apostrophes)? Mostly because that's a way for the parser to recognize an identifier. There's nothing at all that prevents us from creating a synthetic Q::Identifier with the name , and there's nothing at all that prevents the checker from processing that correctly and generating a correct program.

It seems to me that the work forwards is to explore the exact relationship between text and Qtrees, between parsing and checking, and to set up sensible rules within which people can write useful macros.

This is exciting work, and will eventually culminate in very nice macros in Perl 6. Expect to hear more about 007 in 2016. See you on the other side!

Day 22 – Perl6 and CPAN

Here’s the short version.

Not yet; but efforts are underway. Stay tuned! News at http://pl6anet.org. And, as always, hit up #perl6 on freenode if you want to talk about it.

Please continue below if you’ve any interest in an overly detailed advent post.

First, I’d like to point out that the imminent Christmas Release is largely not concerned with the topic of this post. I’ll leave it to someone more qualified to qualify the release in more precise terms but suffice it to say its about the Perl6 language and at least one implementation. That does not include non-core ecosystem concerns such as: packaging, distribution, searching, installing, testing services, linters, etc… In the Perl5 world these things are collectively known as “CPAN” and are a huge part of what makes Perl5 useful to many.

The second item I’d like to bring to your attention is that we’ve had an ecosystem solution for quite some time now. Its basically a collection of repos hosted at github (https://github.com/perl6/ecosystem/blob/master/META.list), which can be searched (modules.perl6.org) and installed (panda or zef).
If you want to publish or use Perl6 modules now then this is the way to go. Its probably worth noting that versioning support is not fully implemented yet.

And now onto the future! What should a Perl6 ecosystem be? This is still an open question and an active area of experimentation. Should it be based on shiny things like github and travis? This could probably be made to work by adding versioning support and a few other things to the existing ecosystem. But then there’s the issue of being dependent on entities we don’t control. Should it be built from scratch? I’m aware of one example of that: http://cpan6.org/. Or should it be based on, or even built on top of, Perl5’s CPAN?

I decided a while back I wanted to explore the last of those options.
See http://jdv79.blogspot.com/2015/10/perl6-and-cpan.html and
http://jdv79.blogspot.com/2015/10/perl6-and-cpan-metacpan-status-as-of.html. Those two posts cover it pretty well actually.

Since then all I’ve done is fix Perl6 related bugs in MetaCPAN. Things like syntax highlighting, Pod6 rendering, and of course searching more or less work. One of the better working dists is
http://hack.p6c.org:5001/release/Data-Selector since that’s the one I did the most testing with. And here’s the full listing: http://hack.p6c.org:5001/author/JDV.

Also, just a few days ago, Ranguard decided to help move us forward a bit by basically uploading the ecosystem onto CPAN under the PSIXDISTS user. Unfortunately, or fortunately depending on how you look at it, this led to the discovery of a bug in PAUSE which is not yet fixed (https://github.com/andk/pause/issues/194). It’s probably best to wait until that’s fixed before uploading any Perl6 dists to CPAN.

In summary, we only have the beginnings of PAUSE and MetaCPAN support. Once we get the installers working we’ll have a useful Perl6 CPAN that we can begin to play with.  Or throw out – who knows:) Until then use the ecosystem at http://modules.perl6.org/ for all your Perl6 module needs.

UPDATE: I just noticed the aforementioned PAUSE bug may be fixed as of about an hour ago (~ 1am cet here). Andk++

Day 21 – NativeCall-backs and Beyond C

One of my favorite features in Perl 6 is the NativeCall interface, because it allows gluing virtually any native library into it relatively easily. There have even been efforts to interface with other scripting languages so that you can use their libraries as well.

There have already been a pair of advent posts on NativeCall already, one about the basics in 2010 and one about objectiness in 2011. So this one won’t repeat itself in that regard, and instead be about Native Callbacks and C++ libraries.

Callbacks

While C isn’t quite as good as Perl at passing around functions as data, it does let you pass around pointers to functions to use them as callbacks. It’s used extensively when dealing with event-like stuff, such as signals using signal(2).

In the NativeCall docs, there’s a short quip about callbacks. But they can’t be that easy, can they?

Let’s take the Expat XML library as an example, which we want to use to parse this riveting XML document:

<calendar>
    <advent day="21">
        <topic title="NativeCall Bits and Pieces"/>
    </advent>
</calendar>

The Expat XML parser takes callbacks that are called whenever it finds and opening or closing XML tag. You tell it which callbacks to use with the following function:

XML_SetElementHandler(XML_Parser parser,
                      void (*start)(void *userdata, char *name, char **attrs),
                      void (*end)(void* userdata, char *name));

It associates the given parser with two function pointers to the start and end tag handlers. Turning this into a Perl 6 NativeCall subroutine is straight-forward:

use NativeCall;

sub XML_SetElementHandler(OpaquePointer $parser,
                          &start (OpaquePointer, Str, CArray[Str]),
                          &end   (OpaquePointer, Str))
    is native('expat') { ... }

As you can see, the function pointers turn into arguments with the & sigil, followed by their signature. The space between the name and the signature is required, but you’ll get an awesome error message if you forget.

Now we’ll just define the callbacks to use, they’ll just print an indented tree of opening and closing tag names. We aren’t required to put types and names in the signature, just like in most of Perl 6, so we’ll just leave them out where we can:

my $depth = 0;

sub start-element($, $elem, $)
{
    say "open $elem".indent($depth * 4);
    ++$depth;
}

sub end-element($, $elem)
{
    --$depth;
    say "close $elem".indent($depth * 4);
}

Just wire it up with some regular NativeCallery:

sub XML_ParserCreate(Str --> OpaquePointer)               is native('expat') { ... }
sub XML_ParserFree(OpaquePointer)                         is native('expat') { ... }
sub XML_Parse(OpaquePointer, Buf, int32, int32 --> int32) is native('expat') { ... }

my $xml = q:to/XML/;
    <calendar>
        <advent day="21">
            <topic title="NativeCall Bits and Pieces"/>
        </advent>
    </calendar>
    XML

my $parser = XML_ParserCreate('UTF-8');
XML_SetElementHandler($parser, &start-element, &end-element);

my $buf = $xml.encode('UTF-8');
XML_Parse($parser, $buf, $buf.elems, 1);

XML_ParserFree($parser);

And magically, Expat will call our Perl 6 subroutines that will print the expected output:

open calendar
    open advent
        open topic
        close topic
    close advent
close calendar

So callbacks are pretty easy in the end. You can see a more involved example involving pretty-printing XML here.

C++

Trying to call into a C++ library isn’t as straight-forward as using C, even if you aren’t dealing with objects or anything fancy. Take this simple library we’ll call cpptest, which can holler a string to stdout:

#include <iostream>

void holler(const char* str)
{
    std::cout << str << "!\n";
}

When you try to unsuspectingly call this function with NativeCall:

sub holler(Str) is native('cpptest') { ... }
holler('Hello World');

You get a nasty error message like Cannot locate symbol 'holler' in native library 'cpptest.so'! Why can’t Perl see the function right in front of its face?

Well, C++ allows you to create multiple functions with the same name, but different parameters, kinda like multi in Perl 6. You can’t actually have identical names in a native library though, so the compiler instead mangles the function names into something that includes the argument and return types. Since I compiled the library with g++ -g, I can get the symbols back out of it:

$ nm cpptest.so | grep holler
0000000000000890 T _Z6hollerPKc

So somehow _Z6hollerPKc stands for “a function called holler that takes a const char* and returns void. Alright, so if we now tell NativeCall to use that weird gobbledegook as the function name instead:

sub holler(Str) is native('cpptest') is symbol('_Z6hollerPKc') { ... }

It works, and we get C++ hollering out Hello World!, as expected… if the libary was compiled with g++. The name mangling isn’t standardized in any way, and different compilers do produce different names. In Visual C++ for example, the name would be something like ?holler@@ZAX?BPDXZ instead.

The proper solution is to wrap your function like so:

extern "C"
{
    void holler(const char* str)
    {
        std::cout << str << "!\n";
    }
}

This will export the function name like C would as a non-multi function, which is standardized for all compilers. Now the original Perl 6 program above works correctly and hollers without needing strange symbol names.

You still can’t directly call into classes or objects like this, which you probably would want to do when you’re thinking about NativeCalling into C++, but wrapping the methods works just fine:

#include <vector>

extern "C"
{
    std::vector<int>* intvec_new() { return new std::vector<int>(); }
    void intvec_free(std::vector<int>* vec) { delete v; }
    // etc. pp.
}

There’s a more involved example again.

Some C++ libraries already provide a C wrapper like that, but in other cases you’ll have to write your own. Check out LibraryMake, which can help you compile native code in your Perl 6 modules. There’s also FFI::Platypus::Lang::CPP for Perl 5, which lets you do calls to C++ in a more direct fashion.

Update on 2015-12-22: as tleich points out in the comments, there is an is mangled attribute for mangling C++ function names. So you might be able to call the pure C++ function after all and have NativeCall mangle it for you like your compiler would do – if your compiler is g++ or Microsoft Visual C++:

sub holler(Str) is native('cpptest') is mangled { ... }
holler('Hello World');

It doesn’t seem to be working for me though and fails with a don't know how to mangle symbol error. I’ll amend this post again if I can get it running.

Update on 2015-12-23: the NativeCall API has changed (thanks to jczeus for pointing it out) and now automatically adds a lib prefix to library names. The code changed from is native('libexpat') to is native('expat'). It will also complain that a version should be added to the library name, but I don’t want to weld this code to an exact version of the used libraries.

Perl 6.christmas – have an appropriate amount of fun!

A decade ago Audrey Tang started the Pugs project to implement Perl 6 in Haskell and declared that Perl 6 was optimized for fun: -Ofun.

Game designers know what -Ofun looks like – it’s when players find themselves engrossed in their game, time flies and they experience flow. The art of great game design is balancing the player’s increasing ability with the challenge of the game. Sometimes it’s a bumpy ride!

flow

Flow can happen when programming too: when you’re free from distraction, and the challenge you’re facing is not too easy, not too hard and you’re making positive progress.

Audrey’s invitation to new Perl 6 contributors to have ‘the appropriate amount of fun’ encouraged them to step forward and pick their place on this challenge/ability curve.

The challenge of designing and implementing Perl 6 has presented some scary technical dragons! Fortunately some of the most able programmers stepped forward to hack them down to size.

Over the past decade, it has been a joy to regularly read the #perl6 IRC logs and watch this process unfold. Despite the naysayers, #perl6 has remained a positive, productive place – the propensity for -Ofun is part of the DNA of Perl 6.

camelia-flowing

Camelia, the Perl 6 butterfly logo, helps sum this up. Camelia includes a reference to a ‘camel’ in her name, however, unlike a camel she doesn’t smell or spit. Flow requires a positive mindset. Camelia is a licence to have -Ofun – not the frivolous, Christmas cracker type, but the deeply productive, enjoyable and enduring type.

The Christmas release is a turning point in the whirlpool of Perl 6 development. The -Ofun will now freely flow outwards to new programmers.

Perl 6, with its emphasis on whipupitude and expressivity, makes it ideally suited to finding flow while programming.

If you haven’t unwrapped Perl 6 yet, this Christmas is the perfect time. You can download Perl 6 here and remember to have an appropriate amount of fun!

Day 19 – Introspection

Perl 6 is an Object-Oriented Programming Language. Well, really, it’s multi-paradigm; but one of the options (that’s used throughout its core) is definitily Object Orientation.

However, as the (current) perl6.org tells us, Perl 6 supports “generics, roles and multiple dispatch”, which are some very nice features, and already covered in other advent calendar posts.

But the one we’re going to take a look at today is the MOP. “MOP” stands for “Meta-Object Protocol”. It means that, instead of objects, classes, etc defining the language; they’re actually a part you can change (or just look at) from the user’s side of things.

Indeed, in Perl 6, you can add methods to a type, remove some, wrap methods, augment classes with more capabilities (OO::Actors and OO::Monitors are two such examples), or you could totally redefine it (and, say, use a Ruby-like object system. Such an example here).

But today, instead, we’re gonna look at the first part: Introspection. Looking at a type after it’s been built, getting to know it, and use these informations.

The module we’re going to build together is based on a need for the Sixcheck module (a QuickCheck-like module): generate some random data for a type, then feed that data to the function we’re testing, and check some post-condition.

So, we write our first version:

my %special-cases{Mu} =
  (Int) => -> { (1..50).pick },
  (Str) => -> { ('a'..'z').pick(50).join('') },
;
sub generate-data(Mu:U \t) {
  %special-cases{t} ?? %special-cases{t}() !! t.new;
}
generate-data(Int);

okay, that’s the first version we wrote. We note a few things:

  • We specify the key type for %special-cases. That’s because, by default, the type is Str. Obviously, we don’t want our types to be stringified. What we actually do is specify they’re going to be subtypes of “Mu” (which is at the top of the types “food chain”)
  • We put parentheses around Int and Str, to avoid stringification.
  • We use the :U in the function’s argument’s type. That means the value has to be undefined. Type objects (as in Int, Str, etc) are undefined, so it serves us well (a different unknown value you’ve probably seen is Nil).
  • Type objects really are… Objects, like any other object (more on that later).
    That’s why we can call .new on them, like here. It’s the same as directly calling Int.new, for example (that’s useful for consistency, and autovivification).
  • We provide a fallback for Int and Str, because calling Int.new and Str.new (0 and “”) would not give us any randomness in the data we create
  • Perl 6 automatically returns the last expression in a function, so we didn’t have to put a return there.

That’s our code to generate the data, fair and square. But we’re gonna need to generate much more than those simple examples.

The least we need to support is classes with properties: we want to look at the list of attributes, generate data for their type, and feed them to the constructor (the only classes we support right now are those with a parameterless constructor).

We need to be able to look inside a class. What we’re going to reach for, in Perl 6 terms, is the Meta-Object Protocol (or MOP for short). First off, let’s define a sample data class for our, huh, blog (the author is sorry for such a lack of imagination).

class Article {
  has Str $.title;
  has Str $.content;
  has Int $.view-count;
}
# we can manually create instances this way:
Article.new(title => "Perl 6 Advent, Day 19",
            content => "Magic!",
            view-count => 0);

But we don’t want to create the article by hand. We want to pass the class Article to our generate-data function, and get an Article back (with random data inside). Let’s go back to our REPL…

> say Article.^attributes;
(Str $!title Str $!content Int $!view-count)
> say Article.^attributes[0].WHAT;
(Attribute)

If you’ve clicked on the MOP link, you shouldn’t be surprised that we get a 3-elements array. If you’re still surprised by the syntax, .^ is the meta-method call. What it means is that a.^b translates to a.HOW.b(a).

If we want to know what’s available to us, we could just ask (removing the anonymous ones):

> Attribute.^methods.grep(*.name ne '<anon>')
(compose apply_handles get_value set_value container readonly package inlined WHY set_why Str gist)
> Attribute.^attributes
Method 'gist' not found for invocant of class 'BOOTSTRAPATTR'

Whoops… Seems like this is a bit too meta. Thankfully, we can use a very nice property of Rakudo: a lot of it is written in Perl 6! To know what’s available to us, we can just look up the source:

#     has str $!name;
...
#     has Mu $!type;

We got the name for the key, and the type to generate the value. Let’s see…

> say Article.^attributes.map(*.name)
($!title $!content $!view-count)
> say Article.^attributes.map(*.type)
((Str) (Str) (Int))

Yep! Seems correct! (If you’re wondering why we get $! (private) twigils back, it’s because $. only means a getter method will be generated. The attribute itself is still private, and accessible in the class).

Now, the only thing we need to build is a loop…

my %args;
for Article.^attributes -> $attr {
 %args{$attr.name.substr(2)} = generate-data($attr.type);
}
say %args.perl;

This is an example of what could be printed:

{:content("muenglhaxrvykfdjzopqbtwisc"), :title("rfpjndgohmasuwkyzebixqtvcl"), :view-count(45)}

You get different results each time you run your code (I don’t think it’ll produce an article worth reading, however…).

Only thing left to do is pass them to Article‘s constructor:

say Article.new(|%args);

(the prefix pipe | allows us to pass %args as named arguments, instead of a single positional argument). Again, you should get something like this printed:

Article.new(title => "kyvphxqmejtuicrbsnfoldgzaw", content => "jqbtcyovxlngpwikdszfmeuahr", view-count => 26)

Yay! We managed to create an Article instance “blindly”, without knowing nothing special about Article at all. Our code can be used to generate data for any constructor that expects its class attributes to be passed. Done!

Before I go, I need to remind you of one things: with great power comes a lot of “WTF”. If you’re only taking a peek like we do here, and gathering information, it should be all fine. But don’t go around and change the internals of Rakudo’s classes, because no one knows what’d happen then :-). As always, have a moderate amount to fun introspecting.

PS: Left as an exercise to the reader is the recursive implementation, move the loop to generate-data, so that we could add a User $.author attribute to Article, and get this one constructed as well. Good luck!

Day 18 – Sized, Typed, Shaped

If you’ve used Perl 5, you probably know PDL (Perl Data Language), the library to manipulate highly efficient data structures.

In Perl 6, we do not (yet) have full-fledged support for PDL (or a PDL-like module), but we do have support for specific features: sized types, shaped variables (and typed arrays).

The first feature, sized types, are low-level types that are composed of a generic low-level type name, and the number of bytes they use. Such an example is int1, int64 (aka int on 64-bit machines), buf8 (a “normal” byte buffer). If you’re interesting in reading the spec, it’s all in S09.

The second feature, shaped variables (still in S09), allow us to specify the fixed size of an array:

my @dwarves[7];

This means the only available indices are 0 to 6 (inclusive). This is an example from the Perl 6’s REPL (Read-Eval-Print-Loop. The input lines start with “>”):

> my @dwarves[7];
[(Any) (Any) (Any) (Any) (Any) (Any) (Any)]
> @dwarves[0] = 'Sleepy';
Sleepy
> @dwarves[7];
Index 7 for dimension 1 out of range (must be 0..6)

The last line of input mentions “dimension 1”. Indeed, Perl 6’s shaped variables can be multi-dimensional (see S09):

> my @ints[4;2];
[[(Any) (Any)] [(Any) (Any)] [(Any) (Any)] [(Any) (Any)]]
> @ints[4;3]
Index 3 for dimension 2 out of range (must be 0..1)
> @ints[4;1]
Index 4 for dimension 1 out of range (must be 0..3)
> @ints[0;0] = 1;
1
> @ints
[[1 (Any)] [(Any) (Any)] [(Any) (Any)] [(Any) (Any)]]

The first output shows us that Perl 6 filled our array with Any at first: this means the whole array is “empty”, and that it can store anything (since Any is a supertype of IntStr, etc).

We might not want that: our array should contain integers (Int). Instead, we can declare a typed array:

> my Int @a[8] = 0..7;
[0 1 2 3 4 5 6 7]
> @a[8] = 8;
Index 8 for dimension 1 out of range (must be 0..7)
> @a[0].WHAT.say # print the type
(Int)

This is all very useful in itself, but there’s a special property we can benefit from, by using everything presented in this post together: contiguous storage!

Natively-typed shaped arrays in Perl 6 are specced to take contiguous memory, meaning we get a very good memory layout, and use little memory

> my int @a[2;4] = 1..4, 5..8;
[[1 2 3 4] [5 6 7 8]]
> @a[0;0]++;
2
> @a.WHAT.say
(array[int])

Because we know how much memory each element takes, and how many elements we have, we can very easily calculate the array upfront, once and for all – or even calculate the size necessary to store the elements: we know that int, on our 64-bit computers (that’s int64), we need 2*4*8 bytes for our 2*4 array.

Wrapping up; that’s 64 bytes total (…almost), and we get to still write Perl 6! We can use the operations we know and love (even meta/hyper-operators) and al – they’re still arrays – with our cool performance boost. Amazing!

(note about the size: we need to add some overhead for the runtime; on MoarVM, that’s something like 16 bytes for GC header, 16 bytes for the dimensions list. That’s a grand total of 16+16+64=96 bytes. pretty cool!)

Day 17 – A loop that bears repeating

I don’t believe anyone has ever Advent blogged about this topic. But it’s getting hard to tell, what with almost six years of old posts. 😁

This story starts in Perl 5, where we implement a countdown with a simple while loop:

$ perl -wE'my $c = 5; while ($c--) { say $c }'
4
3
2
1
0

And of course, if we want to exit early from the loop, we use the trusty last:

$ perl -wE'my $c = 5; while ($c--) { say $c; last if $c == 2 }'
4
3
2

So far, so good. But notice that $c is decremented before each run of the loop body. Under some circumstances it makes more sense to do it afterwards. In Perl 5, we’d use do ... while:

$ perl -wle'my $c = 5; do { print $c; } while $c--'
5
4
3
2
1
0

Great. Now we just need to combine this with the early exit:

$ perl -wle'my $c = 5; do { print $c; last if $c == 2 } while $c--'
5
4
3
2
Can't "last" outside a loop block at -e line 1.

Eh? Oops.

Experienced Perl people will know what’s going on here. The documentation, in this case perldoc perlsyn, explains it clearly:

The “while” and “until” modifiers have the usual “”while” loop” semantics (conditional evaluated first), except when applied to a “do”-BLOCK […], in which case the block executes once before the conditional is evaluated.

…and then…

Note also that the loop control statements described later will NOT work in this construct, because modifiers don’t take loop labels. Sorry.

In other words, the do ... while construct is a little bit of a cheat in Perl 5, made to work as we expect, but really just a do block with reindeer horns glued on top of its head. The cheat is exposed when we try to next and last from the do block.

perldoc perlsyn goes on to explain how you can mitigate this with more blocks and loop labels, but in a way, the damage is already done. We’ve lost a little bit of that natural belief in goodness, and Santa, and predictable language semantics.

It goes without saying that this is a situation up with which Perl 6 will not put. Let’s see what Perl 6 provides us with.

Instead of painstakingly lining up differences, let’s just first replay our Perl 5 session with the corresponding Perl 6:

$ perl6 -e'my $c = 5; while $c-- { say $c }'
4
3
2
1
0
$ perl6 -e'my $c = 5; while $c-- { say $c; last if $c == 2 }'
4
3
2
$ perl6 -e'my $c = 5; repeat { say $c } while $c--'
5
4
3
2
1
0
$ perl6 -e'my $c = 5; repeat { say $c; last if $c == 2 } while $c--'
5
4
3
2

We’ve gotten rid of a few parentheses, as is usually the case when changing to Perl 6. But the biggest difference is that do has now been replaced by another keyword repeat.

In Perl 5, when we saw the do keyword and a block, we didn’t know if there would come a while or until statement modifier after the block. In Perl 6, when we see the repeat, that’s a promise that this is a repeat ... while or repeat ... until loop. So it’s a real loop in Perl 6, not a fake.

Oh, and the last statement works! As it does in real loops.


The story might have ended there, but Perl 6 has a little bit more of a gift to give with this type of loop.

Let’s say we’re waiting for an input that has to have exactly five letters.

sub valid($s) {
    $s ~~ /^ <:Letter> ** 5 $/;
}

This is the type of thing that we’d tend to express with a repeat-style loop, because we have to read the input first, and then find out if we’re done or not:

my $input;
repeat {
    $input = prompt "Input five letters: ";
} until valid($input);

The $input variable has to be declared separately outside of the loop block, because we’re testing it in the until condition outside of the loop block.

Even though we now know to expect a while or until after the block, having that information there makes the end-weight of the loop a bit problematic: we’re delaying some of the most pertinent information until last.

(A similar problem happens in Perl 5 regexes, where a lot of modifiers can show up after the final /, changing the meaning of the whole regex body. And of course, even in spoken and written language, you might have a sentence which goes on and on and just won’t stop, even though it should have long ago, and finally when you think you’ve got the hang of it, it surprises you unduly by ending with altogether the wrong hamster.)

For this reason, Perl 6 allows the following variant of the above repeat loop:

my $input;
repeat until valid($input) {
    $input = prompt "Input five letters: ";
}

That looks nicer!

I hasten to underline that even after moving the condition to the top of the loop like this, the code still has the previous behavior — that is, the condition valid($input) is still evaluated after each iteration of the loop body. (That is, after all, the difference between a repeat until loop and just an until loop.) In other words, we get to place the condition where it’s more prominent and harder to miss, but we retain the expected eval-condition-afterwards semantics.

As a final nice bonus, we can now inline the declaration of $input into the loop condition itself.

repeat until valid(my $input) {
    $input = prompt "Input five letters: ";
}

This clearly shows the difference between order of elaboration (in which variables are declared before they are used) and order of execution (in which statements and expressions evaluate in the order they damn well please).

That’s repeat, folks. Solves a slew of problems, and lets us write nice, idiomatic code.