Day 23 – macros and secret agents

Let’s talk about 007.

$ perl6 bin/007 -e='say("OH HAI")'

007 is a small language, implemented in Perl 6. Its reason for existing is macros, a topic that has become quite dear to me.

I want to talk a little bit about macros, and what's happened with them in 2015. But first let's just get out of the way any doubts about 007 being a real language. Here, have a Christmas tree.

for [1, 2, 3, 4, 5, 2, 2] -> n {
    my indent = " " x (5 - n);
    my tree = "#" x (2 * n - 1);
    say(indent ~ tree);
}

Which gives this output:

    #
   ###
  #####
 #######
#########
   ###
   ###

If you want to describe 007 real quick, you could say it has a syntax very much like Perl 6, but it borrows things from Python (and JavaScript). For example, there are no sigils. (Awww!) There's no $_, so if you're interested in the elements you're iterating over, you need to declare a loop variable like we did. There's no flattening, no list context, and no conflation between a single item and a list-of-one. (Yay!)

007 does have objects:

my agent = {
    name: "James Bond",
    number: 007
};

say(agent.name);             # James Bond
say(agent.has("number"));    # 1

And, as it happens, all the types of statements, expressions, and other program elements are also expressible in the object system.

my code = quasi {
    say("secret agent!")
};

say(code. … .value);         # secret agent!

The quasi there "freezes" the program code into objects, incidentally the exact representation that the 007 compiler deals with. We can then dig into that structure, passing down through a block, a statement, a (function invocation) expression, and finally a string value.
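To make that "digging" concrete, here is a minimal sketch in Python rather than 007. The class and attribute names (QBlock, statements, expr, arguments, value) are hypothetical stand-ins chosen for illustration; the real Q types in 007 may be named and shaped differently.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical stand-ins for 007's Q types; real names may differ.

@dataclass
class QLiteralStr:
    value: str

@dataclass
class QIdentifier:
    name: str

@dataclass
class QCall:                      # a function invocation expression
    callee: QIdentifier
    arguments: List[object] = field(default_factory=list)

@dataclass
class QStatementExpr:             # a statement holding a single expression
    expr: object

@dataclass
class QBlock:
    statements: List[object] = field(default_factory=list)

# Roughly what `quasi { say("secret agent!") }` could freeze into.
code = QBlock([
    QStatementExpr(QCall(QIdentifier("say"), [QLiteralStr("secret agent!")]))
])

# block -> statement -> call expression -> string literal -> its value
secret = code.statements[0].expr.arguments[0].value
print(secret)
```

The point is only that program code becomes ordinary data, reachable with ordinary attribute and index access.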

Being able to take this object-oriented view of the program structure is very powerful. It opens the door to all kinds of nice manipulations. Perhaps the most cogent thing I've written about such manipulations is a gist about three types of macros. But the way ahead is still a bit uncertain, and 007 is there exactly to scout out that way.

I recently blogged about 007 over at strangelyconsistent, telling a little bit about what's happened with the language in the past year. Here, today, I want to tell about some things that are more directly relevant to Perl 6 macros.

But first...

What are all those Q types, then?

Giving a list of the Q types is still dangerous business, since the list itself is still in mild flux. But with that in mind, let's see what we have.

There's a Q type for each type of statement. Q::Statement::If is a representative example. There are also while loops, for loops, return statements, and declarations of my variables, constants, subs, macros, etc.

A common type of statement is Q::Statement::Expr, which covers anything from a function call to doing some arithmetic. A Q::Statement::Expr is simply a statement which contains a single expression.

So what qualifies as an expression? All the operators: prefix, infix, postfix. Various literal forms: ints, strings. Bigger types of terms like arrays and objects. Identifiers — that is, something like a variable, whether it's being declared or being used. Quasi terms, like we saw above.

Some constructs have some "extra" moving parts that are neither statements nor expressions. For example, blocks have a parameter list with parameters — the parameter list is a Q type, and a parameter is a Q type. Function calls have argument lists. Subs and macros have trait lists. Quasis have "unquotes", a kind of parametric hole in the quasi where a dynamic Qtree can be inserted.
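The "parametric hole" idea can be sketched like this, again in Python and not in 007. Everything here is an assumption for illustration: the node classes Str, Call and Unquote, the `hole` label, and the `fill` helper are made up, and 007's actual unquote mechanism may work quite differently.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Str:
    value: str

@dataclass(frozen=True)
class Call:
    name: str
    args: Tuple[object, ...]

@dataclass(frozen=True)
class Unquote:
    hole: str   # hypothetical label identifying the hole

def fill(node, bindings):
    """Return a copy of `node` with every Unquote replaced by its bound Qtree."""
    if isinstance(node, Unquote):
        return bindings[node.hole]
    if isinstance(node, Call):
        return Call(node.name, tuple(fill(arg, bindings) for arg in node.args))
    return node   # leaves such as Str are returned unchanged

# A quasi-like template with one hole, and a dynamic Qtree to plug into it.
template = Call("say", (Unquote("msg"),))
filled = fill(template, {"msg": Str("OH HAI")})
print(filled)
```

The template itself is never mutated, so the same quasi can be filled in many times with different Qtrees.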

And that's it. For a detailed current list, see

First insight: the magic is in the identifiers

Way back when we started to think about macros in Perl 6, we came to realize that Qtrees are never quite free of their context. You always keep doing variable lookups, calling functions, etc. Because this happens in two vastly different environments (the macro's environment, and the macro user's environment), quite a bit of effort goes to keeping these lookups straight and not producing surprising or unsafe code. This concern is usually referred to as "hygiene", but all we really need to know is that the macro environment is supposed to be able to contain any variable bindings and still not mess up the macro user's environment... and vice versa.

macro moo() {
    my x = "inside macro";
    return quasi { say(x) };
}

my x = "outside macro";
moo();    # inside macro

The prevailing solution to this (and the one that I started to code into Rakudo as part of my macros grant) was to achieve hygiene by the clever use of scopes. See this github issue comment for a compelling example of how code blocks would be injected in such a way that the variable lookup would just happen to find the right declaration, even if that means jumping all over the program.

With the recent work on 007, it's evident that the "clever use of scopes" solution won't work all the way. I'm glad I discovered this now, in a toy language implementation, and not after investing lots more time writing up a real solution for this in Rakudo.

The fundamental problem is this: injecting blocks is all well and good in either a statement or an expression. Both of these situations can be made to work. But we also saw in the previous section that there are "extra" Q types which fall outside of the statement/expression hegemony. We don't always think about those, but they are just as important. And they sometimes contain identifiers, which would be unhygienic if there weren't a solution to cover them. Example: a trait like is looser(infix:<+>). The hygiene consists of infix:<+> being guaranteed to mean what it means in the defining macro environment, not what it means in the using mainline environment.

The new solution is deceptively simple, and will hopefully take us all the way: equip identifiers with a knowledge of what block they were defined in. If we exclude all macro magic, all identifiers will always start their lookup from "here", the position in the code that the runtime is in. (Note that this is already strong enough to account for things like closures, because "here" is exactly the context in which the closure was defined.)

The magic thing about macros is that you will inject a lot of code, and all the identifiers in that code will remember where they were born: in a quasi in a macro somewhere. That's the context in which they will do the lookup, not the "here" of the mainline code. Et voilà, hygiene.

We still allow identifiers to be synthetically created (using object construction syntax) without such a context. This makes them "detached", and the idea is that this will provide a little bit of a safety vent for when people explicitly want to opt out of hygiene. The Common Lisp and Scheme people seem to agree that there are plenty of such cases.
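Here's a small Python model of that lookup rule, as a sketch under my own assumptions and not how 007 is actually implemented. The Frame and Identifier classes and the `resolve` method are hypothetical names. An identifier that remembers its birth frame starts its lookup there; a detached one falls back to "here".

```python
class Frame:
    """One block's variable bindings, plus a link to the enclosing block."""
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        frame = self
        while frame is not None:
            if name in frame.vars:
                return frame.vars[name]
            frame = frame.parent
        raise NameError(name)

class Identifier:
    def __init__(self, name, frame=None):
        self.name = name
        self.frame = frame   # None means "detached"

    def resolve(self, here):
        # Start from the birth frame if there is one, otherwise from "here".
        start = self.frame if self.frame is not None else here
        return start.lookup(self.name)

mainline = Frame()
mainline.vars["x"] = "outside macro"

macro_body = Frame(parent=mainline)
macro_body.vars["x"] = "inside macro"

hygienic = Identifier("x", frame=macro_body)  # born in the macro's quasi
detached = Identifier("x")                    # built synthetically, no frame

print(hygienic.resolve(here=mainline))  # inside macro
print(detached.resolve(here=mainline))  # outside macro
```

Both identifiers are evaluated in the mainline, but only the detached one sees the mainline's x.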

Second insight: there are two ways to create a program

So we know that Qtrees are formed by the parser going over a program, analyzing it, and spitting out Q objects that hang together in neat ways. We know that this process is always "safe", because we trust the parser and the parser is nice.

But in 007, where Q objects are in some sense regular objects that the user can plug together in all kinds of ways, anything could happen. The user is not necessarily nice; as programmers we take this as a self-evident truth. Someone could try to cram a return statement into a trait! Or drop a parameter list into a constant declaration. Many other horrors could be imagined.

So there have to be rules. Kind of like how an HTML DOM tree is not allowed to look any which way it wants. We haven't set down those rules in stone yet in 007, but I suspect we will, and I suspect it will be informative.
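As a rough idea of what such rules could look like, here is a Python sketch. The rule table is entirely made up for illustration (007 has not fixed its rules), and nodes are modelled as plain `(kind, children)` tuples rather than real Q objects.

```python
# Hypothetical nesting rules: which node kinds may appear directly inside which.
ALLOWED_CHILDREN = {
    "Block": {"StatementExpr", "StatementReturn", "StatementConstant"},
    "StatementExpr": {"Call", "Identifier", "LiteralStr"},
    "StatementReturn": {"Call", "Identifier", "LiteralStr"},
    "StatementConstant": {"Identifier", "LiteralStr"},
    "Trait": {"Identifier"},
    "Call": {"Call", "Identifier", "LiteralStr"},
    "Identifier": set(),
    "LiteralStr": set(),
}

def violations(node):
    """Collect every parent/child pairing in the tree that the rules forbid."""
    kind, children = node
    allowed = ALLOWED_CHILDREN.get(kind, set())
    problems = []
    for child in children:
        if child[0] not in allowed:
            problems.append(f"{child[0]} not allowed inside {kind}")
        problems.extend(violations(child))
    return problems

# say("...") inside a block: fine.
good = ("Block", [("StatementExpr", [("Call", [("Identifier", []),
                                               ("LiteralStr", [])])])])

# A return statement crammed into a trait: not fine.
bad = ("Trait", [("StatementReturn", [("Identifier", [])])])

print(violations(good))
print(violations(bad))
```

A table like this is easy to extend, which is exactly where the tension with language extension shows up: someone has to own the table.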

(The challenging thing is when we try to combine this idea of restriction with the idea of extending the language. Someone wants to introduce optional parameters. Ok, let's say there's a process for doing that. Now the first obstacle is the rule that says that a parameter consists of just an identifier and nothing else. Once you've negotiated with that rule and convinced it to loosen up, you still have various downstream tools such as linters and code formatters you need to talk to. The problem is not unsolvable as such, just... the right kind of "interesting".)

But it doesn't end with which nesting relationships are allowed. There's a certain sense in which a completely synthetically constructed Qtree isn't yet part of the program as such. For example, it contains variable declarations, but those variables haven't been declared yet. It contains macro calls, but those macros haven't been applied yet. And so on. All of those little things need to happen as the synthetic Qtree is introduced into the bigger program. We currently call this process checking, but the name may change.

Parsing and checking are like two sides of the same coin. Parsing is, in the end, just a way to convince the compiler that a piece of program text is actually a 007 program. Checking is just a way to convince the compiler that a Qtree is actually a 007 program (fragment). We've grown increasingly confident in the idea of checking, partly because we're using it quite heavily in our test suite.
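One small slice of checking, the bookkeeping of variable declarations, might look something like this in Python. This is a sketch under my own assumptions: the `check` function, the CheckError class, and the tuple-based statement encoding are all hypothetical and much simpler than what 007 does.

```python
class CheckError(Exception):
    pass

def check(statements, declared=None):
    """Walk statements in order, registering declarations and rejecting
    any use of a name that has not been declared yet."""
    declared = set(declared or ())   # copy, so inner scopes don't leak outward
    for stmt in statements:
        kind = stmt[0]
        if kind == "my":             # ("my", name): declare a variable
            declared.add(stmt[1])
        elif kind == "use":          # ("use", name): refer to a variable
            if stmt[1] not in declared:
                raise CheckError(f"variable '{stmt[1]}' is not declared")
        elif kind == "block":        # ("block", [statements]): nested scope
            check(stmt[1], declared)
        else:
            raise CheckError(f"unknown statement kind '{kind}'")
    return declared

# A synthetic tree that is fine once checked...
ok = [("my", "x"), ("block", [("use", "x"), ("my", "y")]), ("use", "x")]
print(sorted(check(ok)))

# ...and one that uses a variable before declaring it.
try:
    check([("use", "x"), ("my", "x")])
except CheckError as err:
    print("rejected:", err)
```

A parser does this same bookkeeping as it reads text; the checker just does it on a tree that never was text.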

But why stop there? Having two code paths is no fun, and in a sense parsing is just a special case of checking! Or, more exactly, parsing is a very optimized form of checking, where we make use of the textual medium having certain strengths. Why are identifiers limited to the character set of alphanumerics and underscores (and, in Perl 6, hyphens and apostrophes)? Mostly because that's a way for the parser to recognize an identifier. There's nothing at all that prevents us from creating a synthetic Q::Identifier with a name outside that character set, and there's nothing at all that prevents the checker from processing that correctly and generating a correct program.
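The asymmetry between the two routes is easy to show in a few lines of Python. This is only an illustration with invented names (`parse_identifier`, `evaluate`, and a dict standing in for a Q::Identifier); the regular expression is a simplified guess at an identifier rule, not 007's grammar.

```python
import re

# Simplified assumption about what a tokenizer accepts as an identifier.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def parse_identifier(text):
    """The textual route: only names the tokenizer can recognize get through."""
    if not IDENT.fullmatch(text):
        raise SyntaxError(f"not an identifier: {text!r}")
    return {"type": "Identifier", "name": text}

def evaluate(identifier, env):
    """The runtime only needs the name to be a key in the environment."""
    return env[identifier["name"]]

# Built synthetically, bypassing the parser entirely.
odd = {"type": "Identifier", "name": "two words"}
env = {"two words": 42}

print(evaluate(odd, env))

try:
    parse_identifier("two words")
except SyntaxError as err:
    print("parser says:", err)
```

The restriction lives in the text-to-tree step, not in the tree or the runtime.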

It seems to me that the way forward is to explore the exact relationship between text and Qtrees, between parsing and checking, and to set up sensible rules within which people can write useful macros.

This is exciting work, and will eventually culminate in very nice macros in Perl 6. Expect to hear more about 007 in 2016. See you on the other side!