Day 14 – Targeting MoarVM, the Wrong Way

MoarVM is a virtual machine specifically designed to be a backend for the NQP compiler toolchain in general and the Rakudo Perl 6 compiler in particular.

It is not restricted to running Perl 6, though, and if anyone wants to implement their own language on top of it, Jonathan has been kind enough to provide free course material that walks you through the process. In particular, the code examples for PHPish and Rubyish are worth a quick look to see how things are supposed to work.

However, where there’s a Right Way of doing things, there’s also a Wrong Way, and that’s what we’re gonna look at today!

Generating Bytecode

MoarVM bytecode is generated from MAST trees, defined in lib/MAST/Nodes.nqp of your MoarVM checkout. The file states:

# This file contains a set of nodes that are compiled into MoarVM
# bytecode. These nodes constitute the official high-level interface
# to the VM. At some point, the bytecode itself will be declared
# official also. Note that no text-based mapping to/from these nodes
# will ever be official, however.

This has historical reasons: Parrot, the VM that Rakudo used to target, had an unhealthy overreliance on its textual intermediate representation PIR. Personally, I think it is a good idea to have some semi-official text-based bytecode representation – you just shouldn’t use it as the exchange format between compilation stages.

That’s where doing things the Wrong Way comes in: During the last two weeks, I’ve started writing an assembler targeting MAST and a compiler for a tiny low-level language targeting this assembly dialect, doing exactly what I just told you not to do.

Why did I? What I hope to accomplish eventually is providing a bootstrapped alternative to the NQP toolchain, and you have to start your bootstrapping process somewhere.

Currently, only a few bits and pieces have been implemented, but these bits and pieces are somewhat functional and you can do such useful things as echo input from stdin to stdout:

$ cat t/echo.tiny
fn main() {
    obj stdin = getstdin
    do {
        str line = readline stdin
        int len = chars line
        done unless len
        print line
        redo
    }
    exit 0
}

You can either run the code directly

$ ./moartl0 --run t/echo.tiny

compile it first

$ ./moartl0 --compile t/echo.tiny

$ moar t/echo.moarvm

or take a look at the generated assembly

$ ./moartl0 --dump t/echo.tiny
.hll tiny
.frame main
.label bra0_main
    .var obj v0_stdin
    getstdin $v0_stdin
.label bra1_do
    .var str v1_line
    readline_fh $v1_line $v0_stdin
    .var int v2_len
    chars $v2_len $v1_line
    unless_i $v2_len @ket1_do
    print $v1_line
    goto @bra1_do
.label ket1_do
    .var int i0
    const_i64 $i0 0
    exit $i0
.label ket0_main
# ok

There isn’t really anything fancy going on here: Text goes in, text goes out, we can explain that.

Note that the assembly language is not yet finalized, but so far I’ve opted for a minimalistic syntax that has VM instructions separated from their operands by whitespace and accompanied by assembler directives prefixed with a dot (.).

Under the Hood

If you were to look at the source code of the compiler (as we probably should – this is supposed to be the Perl 6 advent calendar, after all), you might discover some useful idioms, like using a proto declaration

proto MAIN(|) {
    CATCH {
        ... # handle errors
        exit 1;
    }

    ... # preprocess @*ARGS
    {*}
}

to accompany our multi MAIN subs that define the command line interface.
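To see the dispatch at work without a command line, here is a minimal sketch of the same shape. The sub is called run-command (a made-up name) only so that it can be invoked directly; the candidates 'greet' and 'fail' are hypothetical as well, and instead of exiting, the CATCH block records the error message.

```raku
my @errors;   # collects error messages instead of exiting (demo only)

proto run-command(|) {
    CATCH {
        default {
            # a real MAIN would print the message and exit 1 here
            @errors.push: .message;
        }
    }
    {*}   # dispatch to whichever multi candidate matches the arguments
}

# hypothetical sub-commands, selected by a literal first argument
multi run-command('greet', Str $name) { "Hello, $name!" }
multi run-command('fail')             { die 'something went wrong' }

say run-command('greet', 'Camelia');
run-command('fail');      # the exception is caught inside the proto
say @errors;
```

Because the candidates run inside the dynamic scope of the proto, a single CATCH block covers all of them.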

However, you would also come across things that might not necessarily be considered best practice.

For one, the compiler is not reentrant: In general, we’re supposed to pass state along the call chain either in the form of arguments (the implicit self parameter of methods is a special case of that) or possibly as dynamic variables. When writing compilers specifically, the latter tend to be useful to implement recursive declarations like nested lexical scopes: a lexical frame of the target language will correspond to a dynamic frame of the parser. If you don’t care about reentrancy, though, you can just go with global variables and use the temp prefix to achieve the same result.
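As a rough illustration of the temp approach (not code from the actual compiler), the following sketch tracks a nesting depth in a plain file-scoped variable. The names $depth, @log and enter-scope are invented for this example.

```raku
my $depth = 0;   # file-scoped "global" state: simple, but not reentrant
my @log;

sub enter-scope(Str $name, &body) {
    (temp $depth)++;                    # bumped here, restored when the sub returns
    @log.push: $name ~ '=' ~ $depth;
    body();                             # nested scopes see the bumped value
}

enter-scope 'outer', -> {
    enter-scope 'inner',   -> { };
    enter-scope 'sibling', -> { };
};
@log.push: 'top=' ~ $depth;             # back at the outermost level

say @log.join(', ');
```

Each call sees the depth of its enclosing call plus one, and the old value comes back automatically on scope exit, which is the behaviour a dynamic variable would otherwise provide.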

For another, the compiler also doesn’t use grammars, but instead, the body of the line-based parsing loop is a single regex, essentially

# next-line keeps track of line numbering and trims the string
while ($_ := next-line) !=:= IterationEnd { /^[
    | ['#'|$]                       # ignore comments and empty lines
    | (:s ld (\w+)'()' '{'${ ... }) # start of load frame definition
    | (:s fn (\w+)${ ... })         # forward declaration of a function
    | ...                           # more statements
    || {bailout}
]/ }

The blocks { ... } represent the actions that have been embedded into the regex after $ anchors terminating each line.
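A cut-down version that actually runs might look as follows. This is only a sketch under simplifying assumptions: the statement syntax 'fn NAME', the sub parse-line and the array @functions are made up for the example, and the error branch simply dies instead of calling bailout.

```raku
my @functions;   # names collected from the hypothetical 'fn NAME' statements

sub parse-line(Str $line) {
    $line.trim ~~ /^[
        | ['#' | $]                                   # ignore comments and empty lines
        | 'fn' \s+ (\w+) $ { @functions.push: ~$0 }   # action runs once the line has matched
        || { die "cannot parse: $line" }              # nothing matched: give up
    ]/;
}

parse-line $_ for '# a comment', '', 'fn main', '  fn helper  ';
say @functions;
```

The embedded block sits after the $ anchor, so the action only fires when the whole line fits the statement.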

That’s not really a style of programming I’d be comfortable advocating for in general – but Perl being Perl, There’s More Than One Way to Do It: For better or worse, Perl 6 gives programmers a lot of freedom to structure code how they see fit. As the stage 0 compiler is supposed to be supplanted anyway, I decided to have some fun instead of crafting a proper architecture.

In comparison, the assembler implemented in NQP is far more vanilla, with state held by an actions object.

But… Why?

The grinches among you may ask, What is this even doing here? Is this just someone’s personal side project that just happens to be written in Perl 6, of no greater use to the community at large?

Well, potentially, but not necessarily:

First, I do plan on writing a disassembler for MoarVM bytecode, and that may come in handy for bug hunting, testing or when looking for optimization opportunities.

Second, when running on MoarVM, Perl 6 code may load and interact with compilation units written in our tiny language or even hand-optimized VM assembly. The benefit over something like NativeCall is that we never leave the VM sandbox, and in contrast to foreign code that has to be treated as black boxes, the JIT compiler will be able to do its thing.

Third, an expansion of the MoarVM ecosystem might attract the attention of language enthusiasts beyond the Perl community, with at least the chance that MoarVM could deliver on what Parrot promised.

However, for now that’s all just idle speculation – remember, all there is right now is a two-week-old toy I came up with when looking for something to write about for this advent calendar. It’s a very real possibility that this project will die a quiet death before amounting to anything. But on the off chance it does not, it’s nice to have a hot cup of your preferred beverage and dream about a future where MoarVM rises as a butterfly-winged phoenix from the ashes of a dead parrot….

Day 6 – On Opening Files and Contributing to Open Source Projects

Why do people contribute to open source projects? Some do it for fun, some for fame and some for fortune by actually getting paid to do such work. However, probably the most important factor is scratching your own itch: You want something done, and make it happen.

That was the position I found myself in earlier this year: I wanted to use Perl 6 to read a binary file and patch it in-place. According to the documentation, I should have been able to do so, as Rakudo claimed to support opening files in the modes read-only :r, write-only :w, append :a and read-write :rw, and I assumed the last one would do the trick.

Not so: The code silently adjusted the flag :rw to :w, and I was left hanging.

After a bit of bikeshedding, a design emerged that I was happy with, and as it was merged upstream a month and a half later, others seemed to agree. But while I added an extensive commit message to lay out the new system in its glory, slacker that I am, it was left for others to perform the tedious tasks of writing tests and documentation.

As you might expect, that hope never materialized, and the extended open modes I introduced became something of an Easter egg: Undocumented, untested and only discoverable by reading the commit log or the code itself. I may very well be the only person who has actually made use of them.

Fast-forward to December: Here we are, Christmas draws near, and with it the first proper release of Perl 6. As a sometimes-good on-and-off-again citizen of the Perl 6 community, I decided to put on my big boy pants and do the Right Thing: Some tests have now been written (which makes the new open modes an official part of the 6.c language release), and the documentation will be updated soon-ish to conform to what is actually implemented instead of what had been planned to be implemented eventually.

Having said all that, let us now take a look at what the fuss is all about:

# not the actual signature as &open delegates to IO::Handle.open
sub open(
    IO() $path,
    :$mode, :$create, :$append, :$truncate, :$exclusive,
    :$r, :$w, :$x, :$a, :$rw, :$rx, :$ra, :$update,
    :$bin, :$enc = 'utf8',
    :$nl-in = ["\x0A", "\r\n"], :$nl-out = "\n", :$chomp = True
--> IO::Handle)

Isn’t it a thing of beauty? Perhaps not, but it’s also not as scary as it looks:

The only required argument is the $path, which must be something we can call .IO on. The common example would be a string that holds the name of a file you want to access.

One design goal behind the additional open modes was to support all the features of fopen(3) as specified by the C11 standard library. To keep things sane, inspiration was drawn from open(2) and the flags specified by POSIX.

The first row of named arguments lists the POSIX-inspired ones. Here, :$mode may take the values 'ro', 'wo' and 'rw', the rest are boolean flags. If I did my job well, you will rarely need to use them – instead, the single- and double-letter variants listed on the next line should suffice.

Read-only mode :r is the default. The three write-only modes are :w, which truncates the file if it already exists (and is the only shorthand mode that does so implicitly), exclusive mode :x, which fails if the file already exists (and thus can be used to implement a poor man’s locking mechanism), and append mode :a, which adds content at the end of a file.

If you want to read and write to a file at the same time (as I did), you may combine the flag :r with any of :w, :x or :a. For convenience, this may also be spelled :rw, :rx, :ra. Note that the effect of providing :r,:w (or its alias :rw) is not simply the combination of the effects of :r and :w: although :w on its own truncates, existing files will not be truncated in read-write mode, which is a deliberate departure from the pattern.

Aside from :r, all shorthand modes will implicitly create the file if it doesn’t already exist. In contrast, the fopen(3) mode "r+" (known to Perl 5 programmers as "+<") does not. In Perl 6, this mode is now known as :update, corresponding to the low-level :mode<rw>. The shorthand :rw, on the other hand, maps to :mode<rw>,:create, which does not have a direct equivalent in either C11 or Perl 5.
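To make the shorthand modes a bit more tangible, here is a small sketch that exercises :rw, :a and :x on a scratch file. The file name is made up, and the comments describe the behaviour as laid out above rather than anything I would promise for every Rakudo version.

```raku
my $path = 'open-modes-demo.txt';   # hypothetical scratch file in the current directory

spurt $path, "hello world\n";

# :rw (alias for :r, :w) opens for reading and writing without truncating,
# so the start of the file can be patched in place
my $fh = open $path, :rw;
$fh.print: 'J';
$fh.close;

# :a adds content at the end of the file
$fh = open $path, :a;
$fh.print: "one more line\n";
$fh.close;

# :x refuses to open a file that already exists
my $exclusive = open $path, :x;
say $exclusive.defined ?? 'opened' !! 'refused: file already exists';

print slurp $path;
unlink $path;
```

The check on $exclusive relies on open signalling the problem through its return value instead of silently succeeding.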

The :bin flag and :$enc argument control whether the file should be opened in binary mode or text mode with given encoding. By default, files are assumed to be text files encoded as UTF-8. This is different from what the documentation tells you: there has never been autodetection of Unicode encodings as far as I’m aware, and binary files do not return buffers instead of strings when processed line-by-line either. I’ve raised that issue with The Man (translation: talked on IRC about it) and a pull request has been sent. We’ll have to wait and see what develops on that front in the remaining weeks leading up to the party.

Finally, the last line of named arguments lists those that control line-based access in general and the behaviour of the .get and .lines methods in particular. The :$nl-in argument allows you to provide a list of strings that should be considered line separators, the :$nl-out argument controls what gets written to disk when you request a newline, e.g. when using the methods .say, .put or .print-nl. If the final argument :$chomp is set to True (which is the default), line separators will be discarded instead of being included at the end of the strings returned by .get or .lines. As a side note, if you find chomp => False too much of a burden to type, Perl 6 supports the shorthand notation :!chomp.
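The following sketch shows the line-related arguments in action. The file name and the choice of ';' as a separator are arbitrary picks for the example.

```raku
my $path = 'chomp-demo.txt';   # hypothetical scratch file in the current directory

spurt $path, "first\nsecond\n";

# default: line separators are stripped from the returned strings
my $fh = open $path;
my @chomped = $fh.lines;
$fh.close;

# :!chomp keeps the separator at the end of each line
$fh = open $path, :!chomp;
my @raw = $fh.lines;
$fh.close;

say @chomped.map(*.chars);   # lengths without the trailing newline
say @raw.map(*.chars);       # lengths including the trailing newline

# a custom separator: treat ';' as the end of a line
spurt $path, 'a;b;c';
$fh = open $path, nl-in => [';'];
my @fields = $fh.lines;
$fh.close;
say @fields;

unlink $path;
```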

For those of you still with me: Congratulations, you now know how to open files in Perl 6, and all that remains for me to say to you is this:

Have a happy St. Nicholas Day and a fun time playing with Rakudo!

PS: As to what you can do with a file handle once it has been opened, I’ll leave that as an exercise to the reader. But note that while Perl wants to make hard things possible, it also tries to keep easy things easy – which means that you can use interfaces like slurp, spurt and lines (or their method forms .IO.slurp, .IO.spurt, .IO.lines) without having to manually open (and close!) anything.