Day 14 – Targeting MoarVM, the Wrong Way

MoarVM is a virtual machine specifically designed to be a backend for the NQP compiler toolchain in general and the Rakudo Perl 6 compiler in particular.

It is not restricted to running Perl 6, though, and if anyone wants to implement their own language on top of it, Jonathan has been kind enough to provide free course material that walks you through the process. In particular, the code examples for PHPish and Rubyish are worth a quick look to see how things are supposed to work.

However, where there’s a Right Way of doing things, there’s also a Wrong Way, and that’s what we’re gonna look at today!

Generating Bytecode

MoarVM bytecode is generated from MAST trees, defined in lib/MAST/Nodes.nqp of your MoarVM checkout. The file states:

# This file contains a set of nodes that are compiled into MoarVM
# bytecode. These nodes constitute the official high-level interface
# to the VM. At some point, the bytecode itself will be declared
# official also. Note that no text-based mapping to/from these nodes
# will ever be official, however.

This has historical reasons: Parrot, the VM that Rakudo used to target, had an unhealthy overreliance on its textual intermediate representation PIR. Personally, I think it is a good idea to have some semi-official text-based bytecode representation – you just shouldn’t use it as the exchange format between compilation stages.

That’s where doing things the Wrong Way comes in: During the last two weeks, I’ve started writing an assembler targeting MAST and a compiler for a tiny low-level language targeting this assembly dialect, doing exactly what I just told you not to do.

Why did I? What I hope to accomplish eventually is providing a bootstrapped alternative to the NQP toolchain, and you have to start your bootstrapping process somewhere.

Currently, only a few bits and pieces have been implemented, but these bits and pieces are somewhat functional and you can do such useful things as echo input from stdin to stdout:

$ cat t/echo.tiny
fn main() {
    obj stdin = getstdin
    do {
        str line = readline stdin
        int len = chars line
        done unless len
        print line
        redo
    }
    exit 0
}

You can either run the code directly

$ ./moartl0 --run t/echo.tiny

compile it first

$ ./moartl0 --compile t/echo.tiny

$ moar t/echo.moarvm

or take a look at the generated assembly

$ ./moartl0 --dump t/echo.tiny
.hll tiny
.frame main
.label bra0_main
    .var obj v0_stdin
    getstdin $v0_stdin
.label bra1_do
    .var str v1_line
    readline_fh $v1_line $v0_stdin
    .var int v2_len
    chars $v2_len $v1_line
    unless_i $v2_len @ket1_do
    print $v1_line
    goto @bra1_do
.label ket1_do
    .var int i0
    const_i64 $i0 0
    exit $i0
.label ket0_main
# ok

There isn’t really anything fancy going on here: Text goes in, text goes out, we can explain that.

Note that the assembly language is not yet finalized, but so far I’ve opted for a minimalistic syntax in which VM instructions are separated from their operands by whitespace and accompanied by assembler directives prefixed with a dot (.).
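To make that shape concrete, here is a minimal sketch in Perl 6 of how a single line of such a dialect could be split into its parts. It is not taken from the actual assembler (which is written in NQP); the sub name parse-line and the keys of the returned hash are made up for illustration, and treating # as a comment marker is an assumption based on the dump above.

```raku
# Hypothetical helper, not part of the real assembler.
sub parse-line(Str $line) {
    my @tokens = $line.words;                    # split on whitespace
    return %( kind => 'blank' )   unless @tokens;
    return %( kind => 'comment' ) if @tokens[0].starts-with('#');

    my $head = @tokens.shift;
    # A leading dot marks an assembler directive, anything else is a VM op
    my $kind = $head.starts-with('.') ?? 'directive' !! 'op';
    my $name = $kind eq 'directive' ?? $head.substr(1) !! $head;
    return %( :$kind, :$name, args => @tokens );
}

for '.var obj v0_stdin', '    chars $v2_len $v1_line' -> $line {
    my %p = parse-line($line);
    say "%p<kind> %p<name>: %p<args>";
}
# directive var: obj v0_stdin
# op chars: $v2_len $v1_line
```

A real assembler would additionally have to resolve the $ and @ sigils on operands to registers and labels, which this sketch leaves as plain strings.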

Under the Hood

If you were to look at the source code of the compiler (as we probably should – this is supposed to be the Perl 6 advent calendar, after all), you might discover some useful idioms like using a proto declaration

proto MAIN(|) {
    CATCH {
        ... # handle errors
        exit 1;
    }

    ... # preprocess @*ARGS
    {*}
}

to accompany our multi MAIN subs that define the command line interface.

However, you would also come across things that might not necessarily be considered best practice.

For one, the compiler is not reentrant: In general, we’re supposed to pass state along the call chain either in the form of arguments (the implicit self parameter of methods is a special case of that) or possibly as dynamic variables. When writing compilers specifically, the latter tend to be useful to implement recursive declarations like nested lexical scopes: a lexical frame of the target language will correspond to a dynamic frame of the parser. If you don’t care about reentrancy, though, you can just go with global variables and use the temp prefix to achieve the same result.
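As a rough illustration of the global-plus-temp approach, the following sketch tracks a nested scope path in a single global. The names $scope and in-scope are invented for this example and do not come from the compiler’s source.

```raku
# Hypothetical example of scoping state with a global and temp.
my $scope = 'global';                   # plain global holding the current scope

sub in-scope(Str $name, &body) {
    temp $scope = "$scope/$name";       # old value is restored when this sub returns
    body();
}

in-scope 'main', {
    say $scope;                         # global/main
    in-scope 'do', { say $scope };      # global/main/do
    say $scope;                         # global/main
};
say $scope;                             # global
```

The price is the one mentioned above: because everything shares the one global, two compilations cannot safely be interleaved, which is exactly what passing arguments or using dynamic variables would avoid.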

For another, the compiler also doesn’t use grammars, but instead, the body of the line-based parsing loop is a single regex, essentially

# next-line keeps track of line numbering and trims the string
while ($_ := next-line) !=:= IterationEnd { /^[
    | ['#'|$]                       # ignore comments and empty lines
    | (:s ld (\w+)'()' '{'${ ... }) # start of load frame definition
    | (:s fn (\w+)${ ... })         # forward declaration of a function
    | ...                           # more statements
    || {bailout}
]/ }

The blocks { ... } represent the actions that have been embedded into the regex after $ anchors terminating each line.
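For readers who have not seen this style before, here is a cut-down, self-contained sketch of the same technique. The sub classify and the strings it returns are made up for illustration; the real compiler performs actual work in the embedded blocks and calls bailout in the fallback branch.

```raku
# Hypothetical two-statement version of the single-regex parser.
sub classify(Str $line) {
    my $result;
    $line.trim ~~ /^[
        | ['#' | $]                  { $result = 'skip' }      # comment or empty line
        | [:s fn (\w+)$              { $result = "fn $0" }]    # forward declaration
        | [:s ld (\w+)'()' '{'$      { $result = "ld $0" }]    # start of load frame
        || { $result = 'error' }                               # nothing else matched
    ]/;
    $result;
}

say classify($_) for 'fn main', '# a comment', 'ld init() {', 'bogus!';
# fn main
# skip
# ld init
# error
```

Because each block sits after the $ anchor, it only runs once the whole line has matched that alternative, so a partially matching line never triggers the wrong action.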

That’s not really a style of programming I’d be comfortable advocating for in general – but Perl being Perl, There’s More Than One Way to Do It: For better or worse, Perl 6 gives programmers a lot of freedom to structure code how they see fit. As the stage 0 compiler is supposed to be supplanted anyway, I decided to have some fun instead of crafting a proper architecture.

In comparison, the assembler implemented in NQP is far more vanilla, with state held by an actions object.

But… Why?

The grinches among you may ask, What is this even doing here? Is this just someone’s personal side project that just happens to be written in Perl 6, of no greater use to the community at large?

Well, potentially, but not necessarily:

First, I do plan on writing a disassembler for MoarVM bytecode, and that may come in handy for bug hunting, testing or when looking for optimization opportunities.

Second, when running on MoarVM, Perl 6 code may load and interact with compilation units written in our tiny language or even hand-optimized VM assembly. The benefit over something like NativeCall is that we never leave the VM sandbox, and in contrast to foreign code that has to be treated as black boxes, the JIT compiler will be able to do its thing.

Third, an expansion of the MoarVM ecosystem might attract the attention of language enthusiasts beyond the Perl community, with at least the chance that MoarVM could deliver on what Parrot promised.

However, for now that’s all just idle speculation – remember, all there is right now is a two-week-old toy I came up with when looking for something to write about for this advent calendar. It’s a very real possibility that this project will die a quiet death before amounting to anything. But on the off chance it does not, it’s nice to have a hot cup of your preferred beverage and dream about a future where MoarVM rises as a butterfly-winged phoenix from the ashes of a dead parrot….
