Day 24 – Seeing Wrong Right

December 24, 2014 by

There are all kinds of things I could’ve written about here, but none of them felt quite right. So I decided to write a few wrongs instead.

This year I had the privilege of experiencing a triple retinal detachment. Fortunately, they did two surgeries to correct this. Unfortunately, despite the surgeries, I ended up with a blind spot in half my macula and a paralyzed iris, stuck halfway open. Fortunately, this was all in my “bad” eye, so if I look out the right eye, meaning the wrong eye, things look a little wrong, but if I look out the right eye, meaning the left eye, things still look fine. It’s all very confusing, unless you’re the one looking out the eyes in question, in which case everything is completely clear, except for the fuzzy bits. From the outside, I just look a little crazy, but that’s nothing new.

Similarly, people inside the Perl 6 community have a different viewpoint from people outside the community. We often fret about this “echo chamber” effect, but that’s a problem that will become self-correcting as Perl 6 comes into more widespread use. But for now, it does induce a few distortions in outsiders’ perceptions of our community.

One thing that has never been quite clear to folks outside the community is how our design process works. To the people inside the community, it’s clear that the process works quite well, and that’s why Perl 6 just keeps getting better all the time. Every month we make a new release that is more performant and closer to the ideal of Perl 6 as expressed in the design docs and spec tests, even as those design docs and spec tests continue to converge on an implementable and optimizable design. The whirlpool methodology works. Eventually.

But this whirlpool that we find so strangely attractive depends on a kind of feedback loop that many projects do not achieve, and I think that part of the problem is that these projects get trapped by an inadequate view of the nature of wrongness. They see wrong wrong, as it were. In contrast, we try to see wrong right. Or at least righter.

And that’s really the first step, learning not to ask either/or questions. Most design decisions are not binary, because most ideas are not completely right or wrong. Instead, their degree of rightness is kinda truthy. Which makes them righty or wrongy, I guess. So the first step to figuring out right and wrong is to see how they are mixed together in various contexts. And maybe we have an unfair advantage here. This was a relatively easy principle for the Perl 6 community to learn, because as the original RFCs pointed out, Perl 5 is obviously a mixture of righty and wrongy ideas. That’s why we decided to do Perl 6, after all.

Now, Santa Claus knows that most kids are really a mixture of naughty and nicey. So the next step is to zoom out from ideas or features and also allow the people in your project to be a mixture of wrongy and righty, naughty and nicey. Several decades ago people used to talk a lot about “egoless” programming, but that’s not really what we’re doing. We all have our egos here, but somehow we’ve managed (mostly) to attach those egos to higher community ideals rather than to individual ideas or implementations. This frees people to experiment.

The social contract, frequently expressed as “forgiveness > permission”, really means that you have permission to try things out that might turn out right, as long as the rest of us have permission to back them out again if they turn out to be not-so-right. This operates on different scales, from small bug patches to complete attempts at implementation. We don’t promise to use a particular idea or module or implementation, but we do promise to learn from it, and to learn why it is suboptimal if we don’t use it.

It seems our modern western culture always has to find someone to blame, and even in our community we don’t promise not to blame anyone if something goes haywire. On the other hand, in our community, blame is merely a reason for considering someone human, and forgiving them. It’s not a mechanism for determining pecking order.

So we argue a lot on our IRC channel, but many of us take great delight in arguing both sides of any issue. Eventually we either reach a consensus, or decide we don’t know enough yet, so we’d better argue again later when we know more. Occasionally I have to invoke rule #1, “Larry is always right”—but not as often as you might think. Note that we also have rule #2, “If Larry changes his mind, he’s still right”—which is about letting Larry make mistakes, because Larry knows his brain is a mixture of wrongy and righty too.

So in a sense, this entire Perl 6 process is just a large, complicated experiment that we expect to have mixed success with, because we know our own mental limitations. And we hope this also gives us some empathy with our end users.

Once you’ve decided that mistakes are part of the creative process, you can pretty much view the entire evolution of Perl as a dialectic succession of mistakes and corrections. The original Perl design was a mistake. Letting Perl get typecast as a “CGI scripting language” in order to prototype the Web was a mistake. Letting the community try to redesign Perl was a mistake, hence most of the RFCs were mistakes. Or at least many of the proposed solutions in the RFCs were mistakes, and there was no coherence at all to ideas for fixing Perl 5.

The various attempts at implementation can be viewed as unsuccessful experiments. The early implementors prohibited me from working on the implementation. This may have been a mistake. Nowadays the implementors allow me to work on the implementations, which is certainly a mistake. :)

Another way things go wrong in a design is to pick the wrong principle for guidance. Most of the design principles we derived from thinking about the RFCs can be easily be abused when taken to extremes. And we’ve got lots of principles to pick from.

Just for the fun of it, this last year I finally went back through the 361 original RFCs and reread them to see how we did. (Maybe that’s why my retina detached…) What struck me was how many of Perl 6’s current design principles trace directly back to one or more of the RFCs. Of course, many of these principles came from elsewhere, but we decided to apply them in the new design. After I’d annotated my list of RFCs, I tried to give a talk on them, but there were just too many principles to talk about. Here is a list of principle we’ve used in the design, in no particular order:

☞ Don’t just do the first thing that occurs to you.
☞ YAGNI. (You Ain’t Gonna Need It)
☞ Self-clocking code produces better syntax error messages.
☞ Band-aids won’t regrow your limbs.
☞ Torment the implementors on behalf of the users.
☞ One-pass parsing is darn near mandatory for an extensible, braided language.
☞ Kill two birds with one stone.
☞ Avoid lists of exceptions the user has to memorize.
☞ Think ahead. (You think that’s cute this year…)
☞ Unifications can simplify things, sometimes at little cost.
☞ There’s no reason for an error message to be LTA (Less Than Awesome).
☞ Watch out for XY problems.
☞ There are not enough sigils in ASCII to classify everything. Choose wisely.
☞ Choose the right default, the one that enhances readability when omitted.
☞ Larry is not omniscient.
☞ Stick with Perl’s operator-rich philosophy.
☞ Discourage unrelated overloading of existing operators; define new ops instead.
☞ Don’t buy that special tool you see on TV.
☞ DIHWIDT. (Doctor, it hurts when I do this…)
☞ Hang things on the right peg—and if there isn’t one, install one.
☞ If you’re gonna generalize, do it harder, but not too hard.
☞ Pragmas are like vice grips: they’re the wrong tool for any job, but they’ll do any job.
☞ Languages with Do-What-I-Mean autodeclarations too often Don’t-What-I-Mean.
☞ People really get attached to their preferred methods of sloppy thinking.
☞ Question authority, especially your own.
☞ Avoid accidental genericity. Intentional genericity is okay though.
☞ Late binding sometimes causes your program to be late.
☞ Major features should fix lots of things.
☞ Grammatical categories can create a system of related precedence droppers.
☞ If you’re going to reduce the power of a construct, maybe remove it instead.
☞ Little languages (such as regexes) don’t have to violate one-pass parsing.
☞ Replace weak features with stong ones that degenerate to the weak case.
☞ Plan to know more someday.
☞ Don’t multiply entities without reason.
☞ Sometimes it’s worth making a feature pervasive, such as topicization.
☞ Don’t reinvent OO poorly.
☞ “Everything is an object” doesn’t mean it’s all method calls.
☞ Operational definitions are often weaker than declarative definitions.
☞ Even declarative definitions are implemented by operations at compile time.
☞ There are always tradeoffs.
☞ Good ideas go nowhere without a champion.
☞ Don’t punish the innocent with the guilty.
☞ Math just is. Don’t make people declare it.
☞ The identity is the most important part of the API; the right method on the wrong object is always wrong.
☞ People are like sheep, and will wander out any open gate.
☞ Save your money for power tools.
☞ No battle plan survives first contact with the enemy, or the second, or the third…
☞ The waterbed theory of complexity means if you push here, it goes up over there, unless there’s a leak.
☞ Generalizing is often cheaper than specializing.
☞ Bad features are sometimes not intrinsically bad, but just in the wrong context.
☞ Introspection is nearly always a design smell.
☞ You can’t be both eager and lazy simultaneously. Duh.
☞ Conserve your brackets. ASCII has so few of them.
☞ Encourage Unicode operators, yet provide ASCII workarounds.
☞ Sometimes it’s worthwhile to provide more than one way to do it; sometimes one way is enough.
☞ Fundamental unifications don’t have to hurt. If they do, maybe you’re persecuting the user.
☞ Easy things should be easy, and hard things should be possible (aka “Huffman coding”).
☞ The degenerate case should make sense.
☞ Remove artificial discontinuities (such as special-casing negative subscripts).
☞ Sometimes it’s better to leave where you are and approach the design from a different direction.
☞ Don’t plan to confuse things without a really good reason.
☞ Allow last-ditch escape hatches such as BEGIN and EVAL, but try to take away all their use cases.
☞ Premature failure (always throwing exceptions) is often a kind of premature optimization.
☞ Sometimes you have to retarget your shallow principles to a deeper level.
☞ Perl is not intended so much as a first language as it is a last language.
☞ All generalizations are false, including this one.

Arguably it was also a mistake to adopt so many principles. :)

Despite all these mistakes, a bunch of stubbornly optimistic people keep working on Perl 6. And somehow, after all the years of mistakes, we’ve managed to come out with something rather nice.

And now we’re talking about releasing Perl 6.0.0 “officially” sometime in 2015. After all these years of refusing to name a date, maybe this is a big mistake. On the other hand, maybe NOT announcing would be a mistake.

Either way, it doesn’t really matter, because there’s no stopping Perl 6 now. After a few final tweaks, sometime in 2015 we’ll finally be at a point where we can just decide when to tag the next release as 6.0.0.beta. And maybe we’ll keep it in beta until 2015-12-25 or so.

Of course, I could be wrong.

But what is certainly right is this: someday the Whos will start singing, all the Grinches of the world will be surprised, and perhaps a few of them will decide to be not quite so wrong. And they will be welcomed to our song, because we really do believe in forgiveness.

It’s beginning to look a lot like Christmas.

Day 23 – Webscale sorting of the sleepy kind

December 23, 2014 by

In the emerging tradition of writing about sort on the 23rd, let me elaborate on a particularly silly kind of sorting algorithm: sleepsort.

The idea is that to sort a list of numbers, you spawn a separate process (or thread), sleep for as long as the number specifies, and then print the result. That way the numbers are printed in ascending order.

Rakudo on the MoarVM backend has threads, so let's put them to good use:

 use v6;
 
 my @unsorted = (1..10).pick(5);
 
 say "Unsorted: @unsorted[]";
 
 await @unsorted.map: -> $t {
     start {
         sleep $t;
         say $t;
     }
 };

If you run it, you should get output like this:

 $ ./perl6-m sleepsort-01.p6
 Unsorted: 1 7 5 8 3
 1
 3
 5
 7
 8

How did we get there? (1..10).pick(5) randomly picks (without replacing) five numbers from 1 to 10. start { ... } starts a thread and returns a Promise object. We do that for all the numbers, and await blocks until all promises that have passed to it are fullfilled (or broken, but that won't happen here). Or in other words: until all the newly spawned threads have finished.

Now let's make it web scale! Well, sort (sic) of.

First an obvious improvement is to not sleep for the whole second. Let's just scale it with a constant factor: sleep $t / 10, and we're about ten times as fast!

But there's another scaling problem: If you want to sort hundreds of thousands of numbers with sleep sort (you'd do that, right?), our program would spawn as many threads. So lots of memory wasted. Also sleep maps pretty much directly to the underlying libc call, and thus blocks the thread. We can do better:

 use v6;
 
 my @unsorted = (1..10).pick(5);
 
 await @unsorted.map: -> $t {
     Promise.in($t / 10 ).then({ say $t });
 };

Promise.in($s) creates a Promise that will be kept in $s seconds. .then creates a chained promise whose block is executed when the first one is kept.

This also removes the need to spawn processes explicitly. The standard scheduler takes care of that. We're basically web scale!

Now it's also time to address an issue of correctness. The sleepsort algorithm is a kind of divide-and-conquer approach, but there's no joining of the results. There is no single point in the program that can access the entire the sorted list. To fix that, we use a Channel, a thread-safe queue:

 use v6;
 my @unsorted = (1..10).pick(5);
 
 my $channel = Channel.new;
 
 await @unsorted.map: -> $t {
     Promise.in($t / 10 ).then({ $channel.send($t) });
 };
 $channel.close;
 
 say $channel.list;

This prints the sorted list all from the main thread.

So now it can be encapsulated in a nice subroutine.

 sub sleepsort(*@values, :$factor) {
     my $channel = Channel.new;
 
     await @unsorted.map: -> $t {
         Promise.in($t * $factor ).then({ $channel.send($t) });
     };
     $channel.close;
 
     $channel.list;
 }

Another issue of correctness remains: Both sleep and Promise.in only specify a minimal time to be slept; implementation-dependent, longer sleep times are possible. If $factor is too small, the promises might executed out of the desired order.

So let's find the minimal factor with which it still works:

 my $previous = Inf;
 for 0.1, */1.5 ... * -> $factor {
     say "Trying $factor";
     my $success = [<=] sleepsort(:$factor, @unsorted);
     say $success ?? 'Success' !! 'Failed';
     unless $success {
         say "Last good factor: ", $previous;
         last;
     }
     $previous = $factor;
 }

On my machine, this produces the following output:

 Trying 0.1
 Success
 Trying 0.066667
 Success
 Trying 0.044444
 Success
 Trying 0.029630
 Success
 Trying 0.019753
 Success
 Trying 0.013169
 Success
 Trying 0.008779
 Success
 Trying 0.005853
 Failed
 Last good factor: 0.008779

So a :$factor of around 0.01 or 0.09 seems to work on my setup.

Your output/mileage may vary.

Over and out, sleeping until Christmas Eve.

Day 22 – The Cool subset of MAIN.

December 22, 2014 by

In the Unix environment, many scripts take arguments and options from the command line. With Perl 6 it’s very easy to accept those. At least, that’s what Our 2010 advent post on MAIN says.

But today, we’re interested in going further, and we’re going to try to gain more expressiveness from it.
The first Perl 6 feature we’re going to look at here is subsets. Subsets have a very broad range of applications, but they’re even better when used in conjunction with MAIN.

Subsets are a declarative way to specify conditions on your type. If you wanted to translate “A QuiteSmallInt is an Int with a value lesser than 10”, you can write

subset QuiteSmallInt where * < 10;

In case you don’t remember from our 2010 advent post about smart matching or our 2011 advent post about multi-method dispatch, here’s a quick reminder of how you can use subsets with smart matching, multi-method dispatch or typed variables.

say 9 ~~ QuiteSmallInt; # True
say 11 ~~ QuiteSmallInt; # False

multi sub small-enough(QuiteSmallInt) { say "Yes!" }
multi sub small-enough($) { say "No, too big."; }

small-enough(9); # Prints "Yes!"
small-enough(11); # Prints "No, too big."

# You can also use it to type variables
my QuiteSmallInt $n = 9; # Works!
my QuiteSmallInt $n = 11; # Errors out with "Type check failed in assignment to '$n'; ..."

Alright; now that we got this out of the way, what does this buy us?
Well, as we just demonstrated, we can dispatch to different subroutines using those “where” conditions.
That goes for MAIN as well.

Say we wanted to take a filepath as the first argument of a MAIN. We’ll also write two “companions” that’ll give a descriptive error message in case we pass bad arguments.

# a File is a string containing an existing filename
subset File of Str where *.IO.e; # .IO returns an IO::Path object, and .e is for "exists"
subset NonFile of Str where !*.IO.e;

multi sub MAIN(File $move-from, NonFile $move-to) {
  rename $move-from, $move-to;
  say "Moved!";
}
multi sub MAIN($, File $) {
  say "The destination already exists";
}
multi sub MAIN(NonFile $, $) {
  say "The source file doesn't exist";
}

And now, if we try to use it (after saving it as “main.p6”)…

$ touch file1.txt
$ perl6 ./main.p6 file1.txt destination.txt
Moved!
$ perl6 ./main.p6 non-existant.p6 non-existant.txt
The source file doesn't exist!
$ perl6 ./main.p6 destination.txt destination.txt
The destination already exists

Alright, looks good. There’s a little catch, however: the –help will display 3 usages, when there’s only one.

$ perl6 ./main.p6 --help
Usage:
tmp.p6 <move-from> <move-to>
tmp.p6 <Any> (File)
tmp.p6 (NonFile) <Any>

Whoops! We need to find a way to hide them (if you’re wondering, the “<Any>” is because the arguments both are unnamed and untyped).
Luckily enough, Perl6 provides a trait for that: hidden_from_USAGE, added by moritz++ for this advent post (USAGE is the –help message).

Let’s add those to our MAINs.

multi sub MAIN(File $move-from, NonFile $move-to) {
  rename $move-from, $move-to;
  say "Moved!";
}
multi sub MAIN($, File $) is hidden_from_USAGE {
  say "The destination already exists";
}
multi sub MAIN(NonFile $, $) is hidden_from_USAGE {
  say "The source file doesn't exist";
}

And now, the USAGE is correct:

$ perl6 ./main.p6 --help
Usage:
tmp.p6 <move-from> <move-to>

Okay, now, we just need to add a description.

The second Perl6 feature we’re going to use is Pod.
Pod is an easy-to-use markup language, which you can embed to document your code.

#= Rename a file
multi sub MAIN(File $move-from, NonFile $move-to) {
  rename $move-from, $move-to;
  say "Moved!";
}

# You can write it in multiple lines, though in USAGE they'll be collapsed
# (joined by a space)

#={Rename
a
file}
multi sub MAIN(File $move-from, NonFile $move-to) {
  rename $move-from, $move-to;
  say "Moved!";
}

Both versions above will print the same –help:

$ perl6 ./main.p6 --help
Usage:
tmp.p6 <move-from> <move-to> -- Rename a file

And now you have everything you need to create a nice CLI interface :-).

Day 21 – Community smoke testing

December 21, 2014 by

So after all we are good programmers who know their stuff, right? I mean, I know that I should better test the code that I write, and I tend to do that for every change I do. At work I often just run a particular script that tests the new feature or the absence of a bug, but eventually these test codes are added to a test suite at the end of the day. But even when you are a better programmer than me, and you write tests beforehand, there is one problem left:

Assumptions

We assume quite a lot when we write or test our programs. We know how the program is meant to work and that is our problem, which is so hard to escape from. We assume that someone will enter a valid date in a date field, we expect positive numbers when it comes to quantities. And we usually test our programs on a given set of platforms, often just the one PC or Mac we’re sitting at.

One important rule about test suites is that you should try to forget about what Certainly Should Work™. No, test built-ins, because these can change over time. Also query the infrastructure if other tests rely on specific things. Test the environment as detailed as you can, because if something fails early, you save time debugging the problem. This is even much more worth if you get reports from systems you don’t have access to. Debugging a very high level and late problem in a test suite on a strange or just foreign platform can be frustrating.

So in case we want to minimize the assumptions we talked about… What’s the opposite? – It is knowledge made out of experience, which itself is made of input. That’s the approach we will take. I come back to that a little bit later.

Back to what we can do for you. We cannot directly do much about your expectations and assumptions when it comes to your test suite, sadly. We don’t have a framework that pushes garbage input data to your scripts, but we can test your program in different ways. And we can hopefully provide feedback to recent changes to your code base very quickly, but that will also only work out when there are testers that test your distribution often.

Personally I code using the most bleeding edge rakudo compiler with bleeding edge of the MoarVM backend, on an Ubuntu linux box. That is a quite specific scenario. And there are a lot of different setups out there, and even when your code is not platform specific, one library, a dependency of a dependency surely is.

Generate a lot of input, to be turned into experience and knowledge

How are we going to achieve that? Of course we have to intercept the build and test stages that happen when a random user installs our distributions. That user can help us getting reports be doing:

# on unixes:
PANDA_SUBMIT_TESTREPORTS=1 panda install Foo

# on windows
set PANDA_SUBMIT_TESTREPORTS=1
panda install Foo

On can also set and export this environment variable permanently, so that every attempt of installing a distribution will be reported and the dist author is able to care about upcoming problems.

The great benefit of this way of gathering information in contrast to smoke testing the ecosystem on a central box is of course that we get reports from a wide variety of operating systems, compiler versions, locales and dependencies like other Perl 6 distributions but also C libraries.

A test report made by panda will look somewhat like this:

Sample test report

Though the last section contains the entire report in fact. You’ll see information about the operating system, the kernel, the backend used by rakudo, flags how that backend was built, and you also see some information about the tested distribution. I think it would be worthwile to extend that information so that certain meaningful environment variables are included as well, I am thinking of vars like MVM_JIT_DISABLE that have an impact on how the test programs are executed.

Once we received reports like this we build stats to highlight how well the distribution is doing. One of them is a matrix that shows the pass rate across compiler version, operating system and backend.

Platform/version matrix

When you’d send a report for a platform that is not listed there yet (like GNU/Hurd), it would automatically expand the list when the matrix gets regenerated. As of today that is done every five minutes. The colored bars represent the three backends, MoarVM is the topmost, JVM in the middle and Parrot at the bottom. The color coding is:

  • green – The tests passed, that also means that the test stage has to exit with code 0.
  • orange – That usually means that we did not run any tests, but besides that everything ran cleanly.
  • red – Something went wrong. Either stage build or stage test.

You see that the list of compiler versions can grow pretty quickly, and I think the best solution is to collapse the development releases into a single line, and make it expand on click. On this matrix all shown releases are dev releases, recognizable by the trailing partial sha1.

So now imagine these stats are about your distribution. You see red flashing lights that should put you in action and you will quickly take a look at all negative test results. Now, some of the reports might reveal pretty quickly what is wrong. Maybe you’ve forgotten to add a dependeny to the META file shipped with the distribution. But there might be other failing tests where you don’t have a clue what went actually wrong, or where to start poking.

The good thing is that testers.perl6.org is not a tool that only works in one direction. Sure, the user runs your test and the result travels in your direction. But remember that you are the master of the tests in question. If you are in doubt about the cause of a bug, then extend the test suite, and wait for more reports to arrive that provide your recently added diagnostics.

That’s where the loop closes to the things I said earlier. Wipe everything from your mind that you *think* that applies to the box the tests failed. Test everything that is involved in the failing functionality. And if you have to test the built ins of the Perl 6 compiler, do it. If you are pedantic and suspicious enough, you’ll get a great test suite that offers many informations also for upcoming test failures.

Sidenote: For the Perl 5 community it is quite usual that there are volunteers that run tests for distributions every day around the clock. I hope that we’ll have such awesome volunteers too in near future. So if you have boxes that run 24/7, have a weird architecture or a rare operating system, please run tests to aid the other developers in the right direction.

Forecast

I have plenty of ideas how this tool can be improved. A connection to issue trackers like GitHub Issues is one of my favourite which is also listed here: TODO of testers.perl6.org
If you are interested in helping out, the repository of testers.perl6.org is here and I’d be pleased to accept pull request or hand out commit bits.

Day 20 – Helping Santa with Roles

December 20, 2014 by

On Day 14 of the Advent calendar I stopped lurking and started learning Perl 6 by converting a Perl 5 Timer object into its Perl 6 equivalent. Day 14 ends with a ‘challenge’ to convert the Perl 6 Timer class into a role.

Perl 6 roles encapsulate functionality for ready reuse by classes and objects.

The Timer class is a good candidate for ‘role-ification’ – it’s simple and is potentially useful to all sorts of objects.

Santa wants to go Lean this year, don’t worry he’s still rotund – he just wants to apply Lean principles to the process of present giving. The Timer role will help identify bottlenecks so Santa Inc. can improve.

To turn the Timer class into a role we just swap class for role (i.e., s/class/role/):

role Timer {

    has $!start-time;   # private scalar storing the starting time
    has @!event-log;    # private list of events
                        # dashes-in-identifiers save keystrokes too

    method record ($event_description) {
        # initialise the start-time if this is the first time through
        $!start-time //= now;

        @!event-log.push({
            timestamp   => now - $!start-time, 
            description => $event_description
        });
    }

    method report {
        my @report_lines;

        for @!event-log -> %event {
            @report_lines.push("[%event<timestamp>.fmt("%.3f")] %event<description>");
        }
        return @report_lines.join("\n");
    }
}

So that’s the role declared. Now I need a class or object to compose it into – something that does the role. How about a Timer for Santa’s Helpers?

use Timer;

class SantasHelper does Timer {
    has @!presents = qw<train lego doll> ;

    method wrap-present($child's_name) {
        return @!presents.pick ~ " for " ~ $child's_name;            
    } 
}

A quick check in the REPL shows:

> use SantasHelper;
> my $bruce = SantasHelper.new;    # this helper has an Australian accent
> $bruce.^methods
record report wrap-present

Pointing to the Higher Order Workings (HOW) of the $bruce object via .^ shows the Timer role’s methods (record, report) are now combined with the methods for SantasHelper (wrap-present). Let’s start wrapping presents:

use SantasHelper;

my $helper = SantasHelper.new;

my @children = qw<Hugo Honey Willow>;

# using {@children} to interpolate the list
$helper.record("started wrapping presents for {@children}"); 

for @children -> $child {
    my $present = $helper.wrap-present($child);
    $helper.record("wrapped $present");
}
say $helper.report; 

The report shows:

[0.006] started wrapping presents for Hugo Honey Willow
[0.013] wrapped train for Hugo
[0.016] wrapped doll for Honey
[0.020] wrapped doll for Willow

Nice.

But now there’s a problem! Santa’s Lean approach has exposed a bottleneck. To wrap all the presents in time Santa will need to roll up his sleeves and start wrapping too. We need to refactor the wrap-present method into its own role so Santa can wrap-presents too:

role WrappingPresents {
    has @!presents = qw<doll train lego> ;

    method wrap-present($child's_name) {
        return @!presents.pick ~ " for " ~ $child's_name;            
    } 
}

It turns out we can dynamically add a role to an existing object with does:

> use Santa
> my $santa = Santa.new
Santa.new()
> use WrappingPresents
> $santa does WrappingPresents
Santa+{WrappingPresents}.new(name => Any)
> $santa.wrap-present('Emma')
lego for Emma
> $santa.wrap-present('Claire')
lego for Claire

Santa Inc is now back on track.

It’s been fun playing with Perl 6 roles. Coding in Perl 6 feels malleable and expressive – and I’ve only just started!

jnthn++ has more details about parameterised roles and what happens when you compose multiple roles with the same method signature.

For now I’ll leave the last word to Perl 6’s Santa:

> $santa.greeting();
> Happy Christmas!

Day 19 – Snow white and the seven conditionals

December 19, 2014 by

Perl 6 is a big language. It’s rather unapologetic about it. The vast, frightening, beautiful (and slightly dated) periodic table of operators hammers this point home. The argument for having a plethora of operators is to cover a lot of semantic ground, to let you freely express yourself with More Than One Way To Do It. The argument against is fear: “aaargh, so many!”

But ’tis the season, and love conquers fear. Today I went searching for conditionals in the language; ways to say “if p then q” in Perl 6. All this in order to take you on a guided tour of conditionals.

Here’s what we are looking for: a language construct with an antecedent (“the ‘if’ part”) and a consequent (“the ‘then’ part”). (They may also have an optional ‘else’ part, but under my arbitrary search criteria, I don’t care.)

I went looking, and I found seven of them. Now for the promised tour.

if

An oldie but a goodie.

if prompt("Should I exult? ") eq "yes" {
    say "yay!";
}

Unlike in Perl 5, you don't have to put parentheses around the antecedent expression. (Cue Python programmers rolling their eyes, unimpressed.) There are elsif and else clauses that you can add if you need it.

I won't count unless separately, as all it does is reverse the logical value of the antecedent.

when

The when statement also qualifies. It's a conditional statement, but with a preference to matching against the topic variable, $_.

given prompt("What is your age? ") {
    when * < 15 { do-not-let-into-bar() }
    when * < 25 { ask-for-ID() }
    when * > 80 { suggest-bingo-around-the-corner() }
}

The semantic "bonus" with this statement is that the end of a when block implicitly triggers a succeed (known as break to traditionalists), which will leave the surrounding contextualizer ($_-binding) block, if any. Or, in plain language, you don't have to put in explicit breaks in your Perl 6 switch statements.

The when statement doesn't have an opposite version. I have sometimes campaigned for whenn't, but with surprisingly (or rather, predictably) little uptake. Ah well; there will be modules.

&&

"Now hold your horses!", I hear you say. "That ain't no conditional!" And you would be right. But, according to the criteria of the search, it is, you see. It has an antecedent and a consequent. The consequent only gets evaluated if the antecedent is truthy.

if @stuff && @stuff[0] eq "interesting" {
    # ...
}

This behavior is called "short-circuiting", and Perl 6 is only one of a long list of languages that work this way.

I won't count ||, because it's the dual of &&. It also short-circuits, but it bails early on truthy values instead of falsy ones.

and

If we count &&, surely we should count its low-precedence cousin, and.

open("file.txt")
    and say "Yup, it opened";

Some people use && and and interchangeable, or prefer only one of them. I'm a little bit more restrictive, and tend to use && inside of expressions, and and only between statements.

I won't count or, again because it's the dual of and.

not ?&

A non-entrant in our tour which nontheless deserves an honorable mention is the ?& operator. It's a "boolifying &&".

But besides always returning a boolean value (rather than the rightmost truthy value, as && and and do), it also doesn't short-circuit. Which makes it useless as a conditional.

sub f1 { say "first"; False }
sub f2 { say "second" }

f1() && f2();   # says "first"
f1() ?& f2();   # says "first", "second"

Therefore, there are two reasons I'm not counting ?|.

not & either

We have another honorable mention. In Perl 5, the & operator deals in bitmasks (just as in a lot of C-derived languages), but in Perl 6 this operator has been stolen for a higher purpose: junctions.

f1() & f2();    # says "first", "second" in some order

The old bitmask operator (now spelled +&, by the way) doesn't short-circuit, and never did in any language I'm aware of. This new junctional operator also doesn't short-circuit. In fact, the semantics it conveys is one of quantum computer "simultaneous evaluation". Actual threading semantics is not guaranteed, currently does not happen in Rakudo — and may not be worth the overhead for most small calculations anyway — but the message is clear: hie the hence, short-circuiting.

//

The most questionable conditional on my list, this one nevertheless made the cut. It's a version of ||, but instead of testing for truthiness, it tests for undefinedness. Other than that, it's very conditional.

my $name = %names{$id} // "<no name given>";

This operator is notable for sharing a first place (with say) as "most liked feature side-ported to Perl 5". (Just don't mention smartmatch to a Perl 5 person.)

The dual of //, for evaluating a consequent only if the antecedent is a defined value, is spelled ⅋⅋, and of course, it does not exist and never did. *bright neuralizer flash*

?? !!

Loved and known by many names — ternary operator, trinary operator, conditional operator — this operator defies grammatical categories but is clearly a conditional, too. It even has an else part.

my $opponent = $player == White ?? Black !! White;

Almost every language out there spells this operator as ? : and it takes a while to get used to the new spelling in Perl 6. (The reason was that both ? and : were far too short and valuable to waste on this operator, so it got "demoted" to a longer spelling.)

If you ask me, ?? !! is an improvement once you see past the historical baggage: all of the other pure boolean operators already identify themselves by doubling a symbolic character like that. It blends in better.

You didn't hear it from me, but if anyone ever feels like creating an unless/else version of this operator, you could do a lot worse than choosing ¿¿ ¡¡ for the job. Ah, Unicode.

andthen

Finally, a late entry, andthen also works as a conditional. Just when you thought Perl 6 had you semantically covered with && and and for ordinary boolean values, and // for undefined values...

...it opens up a third door. The andthen operator only evaluates its right-hand statement if the left-hand statement succeeded.

shut-down-reactor()
    andthen don-hazmat-suit()
    andthen enter-chamber()
;

Succeeding means returning something defined and not failing along the way. Structually, the andthen operator has been compared to Haskell's monadic bind: success of earlier statements guards execution of later ones. Though it's not implemented yet in Rakudo, you're even supposed to be able to pass the successful value from the left-hand statement to the right-hand one. (It ends up in $_.)

I won't count orelse, even though it also short-circuits. Though it's worth noting that the right-hand side of orelse ends up with the $! value from the left-hand side.

Signoff

Thank you for taking the tour. As you can see, there really is More Than One Way to write a condition in Perl 6. Some of them will turn up more often in everyday code, and some less often. I use and only rarely, and I've yet to use an andthen in real code. The others all crop up fairly regularly, and all in their own favorite contexts.

Some languages give you a sparse diet of the bare necessities, claiming that it's good for you to have fewer choices. Perl is not one of those languages. Instead, you get lots of choice, lots of freedom, and it's up to you to do great things with it. Of course, we also try to make sure that the constructs are there make sense, are in good taste, and feel consistent with the rest of the language. The end result is a pleasant language with plenty of rope to shoot yourself in the foot with.

Merry conditional Christmas!

Day 18 – MoarVM Internals for the Brave (and Curious)

December 18, 2014 by

Hi there! I’m the guy who developed the MoarVM x64 JIT compiler during
this year’s GSoC. Jonathan has already given a detailed (and clear)
overview of MoarVMs optimization subsystems, of which the JIT compiler
is but a small step. Today, I’m going to try and show you how you can
see this in action. But first, a sample program:

    use v6;
    
    sub fibonacci (int $n --> int) {
        my int $x = 1;
        my int $y = 1;
        while $n > 0 {
            $x = $x + $y;
            $y = $x - $y;
            $n = $n - 1;
        }
        return $y;
    }
    
    for ^40 -> int $x {
        say fibonacci $x;
    }


Careful readers will spot the native integer definitions sprinkled
throughout the code. You can write perl6 code as if it is C,
too. (After all, that’s just one more way to do it). I’m not entirely
sure about other backends, but on MoarVM these really work – no object
is ever allocated for them.

Rakudo perl 6 can do more than just run this code, it can also show
you how it understands it. I’ve saved the above program as ‘fib.p6’,
and you can get a (rather verbose) textual representation of the
syntax tree by using the special target command line option:

perl6-m --target=mast fib.p6

However, this is just a static analysis. What we’re really interested
in is what happens during runtime. MoarVM will show you all the gory
details if you specify a filename with the MVM_SPESH_LOG environment
variable, like so.

export MVM_SPESH_LOG=spesh-log.txt; perl6-m fib.p6

From the outside it looks as if nothing special has
happened. As they say, looks are deceiving, because MoarVM has
faithfully produced a text file that details the way your code has
been changed. It is larger even than the first analysis, but it also
contains more information. Let’s look at the our function:

    Inserting logging for specialization of 'fibonacci' 
(cuid: cuid_2_1418763286.32937)
    
    Before:
    Spesh of 'fibonacci' (cuid: cuid_2_1418763286.32937, file: fib.p6:3)

    BB 0 (0x4f77118):
      Instructions:
        no_op 
      Successors: 1, 6, 5, 7, 9
      Predeccessors: 
      Dominance children: 1, 5, 6, 7, 9, 10

    BB 1 (0x4f77188):
      Instructions:
        checkarity liti16(1), liti16(1)
        param_rp_i r2(1), liti16(0)
        bindlex lex(idx=5,outers=0,$n), r2(1)
        paramnamesused 
        const_i64_16 r0(1), liti16(0)
        bindlex lex(idx=1,outers=0,$x), r0(1)

    ....


This little snippet is not without jargon, so I’ll try to explain. (I
did warn you at the top). The ‘cuid’ is the compilation unit
identifier, and it serves to identify the source of any
function. Sometimes compilation units correspond to files, and
sometimes they don’t (like in a REPL loop). The indented blocks denote
Basic Blocks. A Basic Block is a sequence of instructions that is
uninterrupted by a (conditional) jump. They are important because
within a basic block it is always obvious where all the values are
coming from.

Further along in our spesh log, we can see how spesh has transformed
the first block:

    ...
      BB 1 (0x4f77188):
        Instructions:
          sp_getarg_i r2(1), liti16(0)
          bindlex lex(idx=5,outers=0,$n), r2(1)
          const_i64_16 r0(1), liti16(0)
     ...


This is the same start of the function as before. What has happened is
that the polymorphic instruction param_rp_i to the much simpler
instruction sp_getarg_i. As Jonathan explained, we can get away with
this because we know exactly how this function is called, which is
just at line 15 of our little script.

While the spesh log is certainly interesting, it is no good for light
reading. Which is why I wanted to show off a really cool tool that
Timo Paulssen (timotimo) made a while ago – a way to transform a piece
of the spesh log (the ‘spesh graph’ that represents the code of a
function) into a visual form (with a little help of
graphviz). Unfortunately, I couldn’t really get it to work. This
demonstrates something important about all the tools that I’m showing
today – they’ve all been designed not for language users but VM
developers, and they may all be broken this time next year.

[I was able to run the tool and get this piece of output. – timotimo]

Let’s wrap it up. If you’re interested in what kind of assembly code
the JIT will produce for you, there is also a way to get to that. Run
your script again as follows:

export MVM_JIT_BYTECODE_DIR=. # or /tmp, or some other directory
perl6-m fib.p6

If you’ve followed these instructions directly, what you’ll now see is
your working directory littered with binary files representing different
frames (functions). Every such file will contain the raw binary data
generated by the JIT compiler. Inspecting these files is easy enough
(as long as you do know x64 assembly):

objdump -D -b binary -m i386:x86-64 -M intel `ls *.fibonacci.bin`

If you’re on a mac, that’s gobjdump rather than objdump. And if you
prefer AT&T syntax over intel syntax, just drop the ‘-M intel’ part.
Looking at the output of this bytecode, you might also see the rather
wastefull way your code is compiled. After all, I’ve written this
function specifically for simplicity. Yet I count no fewer than 215
‘mov’ instructions, 18 conditional move instructions, and 16 function
calls. As much as MoarVM and perl6 have achieved this year, there is
still a lot left to do. And with that, I wish you a hacky christmas
:-)

Day 17 – Three strange loops

December 17, 2014 by

For a long time, I’ve been fascinated by strange loops.

hands drawing each other

The concept “strange loop” revels in the circular definition. Something is the way it is because it’s defined in terms of itself. I think fascination for this concept is related to fascination with the infinite. Chicken-egg conundrums. Grandfather paradoxes. Catch-22s. (“Catches-22”?) Newcomb’s paradox. Gödel’s incompleteness theorems. Something about causal feedback loops tickle our brains.

The concept has taken root not just in art or in brainteasers. When we talk of “bootstrapping compilers” or “metacircular evaluators”, we mean exactly this type of activity. To make a compiler bootstrapping means to make it able to compile itself by (re-)writing it in the target language. An evaluator is metacircular if it implements features not in terms of a helper subsystem, but in terms of themselves. And people do this not just because it’s a cool trick, but because it opens up possibilities. Arbitrary limits in programming stop us from doing what we want, and the limit between “host system” and “guest system” is arbitrary.

In other words, if you’re seriously interested in extensibility, then you’re also likely interested in strange loops. That’s why you see these ideas floated around in the Lisp communities a lot. And that’s why compilers sometimes take the bootstrapping route. (Another happy example is JavaScript, where some language extension proposals happens through their metacircular interpreter, aptly named Narcissus.)

Perl 6 has three strange loops. Grammar, OO, and program elements. It’s in various stages of closing those.

Grammar

Status: not quite closed yet

Perl 6 grammars are famously powerful and allows the programmer to specify entire languages. It stands to reason that a Perl 6 implementation would want to eat its own dogfood and use a Perl 6 grammar to specify the syntax of the language itself. And behold, Rakudo and Niecza do exactly that. (Pugs doesn’t, but among its future versions is the milestone of being self-hosting.)

Having this bootstrapping in place gives us a high confidence in our grammar engine. We know that it’s powerful enough to parse all the Perl 6 we pass it every day. This is also what allows us to write most of Rakudo in Perl 6, making it easier for others to contribute.

I say “not quite closed yet” because the access from user code into the Perl 6 grammar has some ways to go still.

Benefits: tooling, slangs, macros

Something like PPI should be trivial once we close the loop completely. The compiler already has access to all that information, and we just need to get it out into user space. Slangs can either extend the existing Perl 6 grammar, or start from scratch and provide a completely different syntax. In either case, good communication between old and new grammars is required for e.g. variables to be visible across languages. Macros need to be able to effortlessly go from the textual form of something to AST form.

OO

Status: closed

The system underlying all objects in the Perl 6 runtime is called 6model. An object was created from a class, a class was created from a “class metaobject”. After just a few more steps you end up with something called a “knowhow” — this is the end of the line, because by all appearances the knowhow created itself.

6model is our object-oriented implementation of object orientation. It’s very nice and it allows us to extend OO when we need it.

Benefits: make your own OO rules

You will find most examples of the kind of OO extension that’s posssible through jnthn’s modules. Grammar::Debugger extends what it means to call a grammar rule. Test::Mock similarly keep track of how and when methods are called. The recent modules OO::Monitors and OO::Actors both provide new class metaobjects, basically giving us new rules for OO. They also go one step further and provide definitional keywords for monitors and actors, respectively.

Program elements

Status: just discovered

Macros can generate code, but in some cases they also analyze or extend code. That’s the idea anyway. What’s been stopping us so far in realizing this in Perl 6 is that there hasn’t been a standard way to talk about Perl 6 program elements. What do I mean by “program element” anyway? Let’s answer that by example.

Say you have a class definition in your program. Maybe you have a fancy editor, with refactoring capability. The editor is certainly aware of the class definition and can traverse/manipulate it according to the rules of the language. In order for it to do that, it needs to be able to represent the class definition as an object. That object is a program element. It’s different from the class metaobject; the former talks about its relation to the program text, and the latter talks about its relation to the OO system.

<PerlJam> masak: "program elements" reads like
          "DOM for Perl" to me.
<masak> yep.

Macros are headed in the same way as such a refactoring editor. By handling program elements, you can analyze and extend user code at compile time inside macros. The API of the program elements give the user the ability to extend the Perl 6 compiler itself from library code and user code.

The work on this has only just started. My progress report so far is this: Perl 6 has a lot of node types. :) Having a rich language that doesn't take the Lisp route of almost-no-syntax also means that the API of the program elements becomes quite extensive. But it will also open up some exciting doors.

Benefits: macros, slangs

Parting words

Part of the reason why I like Perl 6 is that it has these strange loops. Little by little, year by year, Perl 6 lifts itself up by the bootstraps. There's still some work left to close some of these loops. I've been hopefully waiting for a long time to be able to do Perl 6 parsing from user code. And I'm eager to provide more of the program element loop so that we can write Perl 6 that writes Perl 6 from inside of our macros.

Mostly I'm excited to see people come up with things that none of us have thought of yet, but that are made possible, even easy, by embedding these strange loops into the language.

Have a loopy Christmas.

Day 16 – Quoting on Steroids

December 16, 2014 by

A few days ago, there was a blog post about String Interpolation. Most of the examples there used a simple double quoted string.

my $a = 42;
say "a = $a"; # a = 42

The possibilities of the double quoted string are quite powerful already. But in the end, they’re just a special case of a much more generic and malleable quoting construct, named Q.

Q’s Basic Features

In its most generic form, Q just copies the string without any changes or interpretation:

my $a = 42;
say Q/foo $a \n/; # foo $a \n

You can add adverbs to Q[…], to change the way the resulting string will be formatted. For instance, if you want to have interpolation of scalars, you can add :s. If you want interpretation of backslashes like \n, you can add :b. And you can combine them:

my $a = 42;
say Q:s/foo $a\n/;   # foo 42\n
say Q:b/foo $a\n/;   # foo $a␤
say Q:s:b/foo $a\n/; # foo 42␤

(If you wonder what a is, it is a U+2424 SYMBOL FOR NEWLINE [So] (␤) and should be show up in your browser as a character containing an N and an L as a visible representation of a new line character)

In fact, the list of adverbs of basic features is:

    short       long            what does it do
    =====       ====            ===============
    :q          :single         Interpolate \\, \q and \' (or whatever)
    :s          :scalar         Interpolate $ vars
    :a          :array          Interpolate @ vars
    :h          :hash           Interpolate % vars
    :f          :function       Interpolate & calls
    :c          :closure        Interpolate {...} expressions
    :b          :backslash      Interpolate \n, \t, etc. (implies :q at least)

The :q (:single) basically gives you single quoted string semantics. The other adverbs, give you typically the functionality that you would expect from double quoted strings. If you really want to be verbose on your double quoted strings, you can write them like this:

my $a = 42;
say Q :scalar :array :hash :function :closure :backslash /foo $a\n/; # foo 42␤

Of course, you can also specify the short versions of the adverbs, and not separate them by whitespace. So, if you want to be less verbose:

my $a = 42;
say Q:s:a:h:f:c:b/foo $a\n/; # foo 42␤

As with any adverbs (which are just named parameters, really), the order does not matter:

my $a = 42;
say Q:f:s:b:a:c:h/foo $a\n/; # foo 42␤

Actually, the story about the order of the named parameters is a little bit more complicated than this. But for this set of adverbs, it does not matter in which order they are specified.

.oO( is that a brother of Johann Sebastian? )

But seriously, that is still a mouthful. So an even shorter shortcut is provided: :qq

    short       long            what does it do
    =====       ====            ===============
    :qq         :double         Interpolate with :s, :a, :h, :f, :c, :b

So, you can:

my $a = 42;
say Q:double/foo $a\n/; # foo 42␤
say Q:qq/foo $a\n/; # foo 42␤

All that for simply doing a double quoted string with interpolation? Well, because people are using double quoted strings a lot, the simple remains the quickest way of interpolating values into a string. However, underneath that all, it’s really Q:qq, which in turn is really Q:s:a:h:f:c:b.

What about a double quoted string that has double quotes in it? For those cases, the Q:qq form is available, but that is still quite verbose. Synopsis 2 therefore specifies:

In fact, all quote-like forms derive from Q with adverbs:
    q//         Q:q//
    qq//        Q:qq//

Which means we can shorten the Q:qq in that last example to qq (and have double quotes in the double quoted string without any problems):

my $a = 42;
say qq/foo "$a"\n/; # foo "42"␤

Both q// and qq// also support (the same) adverbs. This initially seems the most useful with q//, for instance in combination with :s, which would (also) interpolate scalars:

my $a = 42;
say q:s/foo "$a"\n/; # foo "42"\n

However, adverbs (just as named parameters) are just a shortcut for a Pair: :s is really s => True. And :!s is really just s => False. Can we also apply this to quoting constructs? The answer is: yes, you can!

say qq:!s:!c/foo "$x{$y}"\n/; # foo "$x{$y}"␤

Even though we specified qq//, the scalar is not interpolated, because of the :!s adverb. And the scope is not interpolated, because of the :!c. This can for instance be handy when building strings to be EVALled. So, if you want all quoting features except one or more, you can easily de-activate that feature by negating those adverbs.

Some of Q’s Advanced Features

Quoting features do not stop here. This is a list of some of the other features that already work in Rakudo Perl 6:

    short       long            what does it do
    =====       ====            ===============
    :x          :exec           Execute as command and return results
    :w          :words          Split result on words (no quote protection)
    :ww         :quotewords     Split result on words (with quote protection)
    :to         :heredoc        Parse result as heredoc terminator

Here are some examples. Note that we don’t bother specifying these features as attributes, because they’re basically the first additional attribute, so there is a shortcut version for them. For example, qq:x// can be shortened to qqx//. Whether you could consider that more readable or not, I leave up to the reader. There is, after all, more than one way to do it.

Interpolate and execute as an external program:

my $w = 'World';
say qqx/echo Hello $w/; # Hello World

Interpolate as single quoted words (please look at what happens to the single quotes):

.say for qw/ foo bar 'first second' /;
==
foo
bar
'first
second'

Interpolate as single quoted words with quote protection. This will make sure that balanced quotes will be treated as one entity (and note again what happened to the single quotes).

.say for qww/ foo bar 'first second' /;
==
foo
bar
first second

Interpolate variables into a heredoc:

my $a = 'world';
say qqto/FOO/;
  Hello $a
  FOO
==
Hello world␤

The text is exdented automatically for the same number of indents as the target has.

Conclusion

Perl 6 has a very powerful basic quoting construct in Q[…], from which all other quoting constructs are derived. They allow mix and matching of features in various short and more verbose ways. There are still some adverbs unimplemented, but the ones that are mentioned here, should Just Work™.

Finally, the Synopsis also indicates that there will be a way out of this alphabet soup:

If you want to abbreviate further, just define a macro:

    macro qx { 'qq:x ' }          # equivalent to P5's qx//
    macro qTO { 'qq:x:w:to ' }    # qq:x:w:to//
    macro quote:<❰ ❱> ($text) { quasi { {{{$text}}}.quoteharder } }

We can only hope that someone will find enough quality tuits soon to implement this.

Day 15 – Bioinformatics and the joy of Perl 6

December 15, 2014 by

On the 15th day of Christmas Perl6 said let there be life-sciences, and so there was.

A lot of coverage and testament is made to Perl’s role in developing dynamic content for the early web. As well as its slow decline. However, there is an equally important story that goes alongside the tale of Perl as the language of CGI. That is the story of Perl as the scientific glue that saved the human genome project, there is even a published journal article in Perl’s honour.

It’s not an accident that Perl did well and continues to do well in the field of Bioinformatics. Few languages at the time provided first class semantics that matched the problem domain such as: regular expression literals, good string handling, trivial data structure creation, sensible automatic type conversion and XS for native code wrapping when speed was important. That Perl replaced a lot of sed+awk use cases for one liners is also a welcome feature for a busy bioinformatician. Perl persists today as a de facto scripting language due to the large community effort known as BioPerl. For those not in the know the API of BioPerl inspired a whole host of work in most other languages such as Biopython, BioJava, BioRuby etc. To date none are quite as feature complete as BioPerl though each has its unique additional features. Biopython is getting there from a lot of work done through GSoC action in recent years. I also use ete2 from Python on a regular basis and recommend people check that out for phylogenetics work. However, this post is about the future and I think Perl 6 has a lot to offer everyone working in bioinformatics.

XS the Perl5 system of integrating native code into a Perl module has done fantastically well in providing scientists with the ability to integrate high performance C code with the simplicity of Perl’s syntax. But XS in the Perl6 world has been replaced with NativeCall. This library makes it incredibly simple to integrate C code with a Perl6 class or subroutine. I’m not going to cover this here though, the repo has plenty of docs and examples and two previous advent articles demonstrate use cases.

Instead everything on show here are core features available from writing ‘perl6’ at a command prompt. You don’t need any external dependencies so any system with vanilla Rakudo Perl6 can do everything you’re about to see.

The Grammar of life

Hopefully most people have at least heard about DNA, RNA and perhaps even proteins. All of these are what are referred to in the biz as biopolymers, or translated into programming parlance “strings of life”. Essentially each molecule is a big chain of smaller building blocks you can denote in a textual form with a letter. In DNA we have the alphabet GCTA , in RNA we never see a T instead we see a U in its place. Proteins have their own alphabet that is unrelated to DNA and RNA between 21 and 23 characters long, the last two are very quirky and relatively rare.

So in this section I’m going to outline the hip way to parse sequence files in Perl6. This is how sequencing data usually comes into a bioinformatician. A giant text file! I’ll be using our own class to hold the sequence and a well defined grammar for the well loved (not) FASTA format. So to start here is the Grammar as if it fell from heaven as a shooting star this wintery night:

grammar FASTA::Grammar {
    token TOP { <record>+ }

    token record { ">"<id><comment>?"\n"<sequence> }

    token id { <-[\ \n]>+ }

    token comment { " "<-[\n]>+ }

    proto rule sequence {*}
          rule sequence:sym<dna> { <[ACGTRYKMSWBDHVNX\-\n]>+ }
          rule sequence:sym<rna> { <[ACGURYKMSWBDHVNX\-\n]>+ }
          rule sequence:sym<aa> { <[A..Z\*\-\n]>+ }
}

Cool we have a super formal grammar defined for FASTA and it even looks like we are doing something special for different types of sequence! I guess I had better explain a little, keep in mind that Grammar is just a huge extension to regular expressions which you are hopefully familiar with.

First up we have token and rule. The difference is that token is like a hungry hippo, it consumes the input sequence and is quite greedy. So if a token matches at all it consumes the string and then lets parsing continue. If there is a choice of tokens the longest wins, which can be thought of as the winner in a game of hungry hippos!
A rule is a bit more tricksome, it has backtracking and can regurgitate and give up. Specifically we have a set of rules called sequence with the more abstract one defined with the keyword proto. The reason we have three rules is for each type of biopolymer, they are ordered by the most to least specific. Unfortunately DNA and RNA can both look like protein. So our Grammar is limited in its ability to detect the difference, but so would a human that’s how rubbish this major data format is!
With rule unlike token if the start of a sequence matches DNA but the rest of it looks like protein we back out of matching DNA because the grammar is smart enough to realise it can get a longer match from the other rule, but has to spew up the letters already consumed first. If we had used token we might have had a really short DNA token then a protein one etc. This might all sound a bit complicated but I’ve made a trace using Grammar::Debugger of what happens matching a test.fa file in this Gist.
If you do look at the trace you will see the final important fact about Grammars. We need a TOP token or rule, this is where we start the thought process and parsing of data. So a FASTA file is made of more than one record, that’s our TOP. A record is made of an id, a comment and a sequence and so the logic flows. The Perl6 regular expression syntax is way out of scope but Masak has a nice intro.

Now lets get this Grammar rolling along doing something. If we want to use a sequence in our code we need a basic type to represent it. Below is probably the simplest thing you would want:

class Seq {
    has Str $.id = "";
    has Str $.comment = "";
    has Str $.sequence = "";
}

It’s not the most inspired or interesting class in the world but it will do!

Now how do we hook up our Seq class to our FASTA Grammar to produce some nice objects from our FASTA file? The answer is we need to take action! But not the political kind, just create an Actions class for the Grammar to use when parsing. By default we could call .parse on our grammar above and we would get all the text chunked up into nicely named labels as a giant tree of Match variables. Some of the time that’s what you want. Action classes have methods that match the names of tokens and rules in a grammar. Each method is called at the corresponding point in the Grammar’s parse tree and given the chance to make something more interesting, replacing all the crummy Match variables in that part of the output. So what do we want:

class FASTA::Actions {
    method TOP($/) {
        make $<record>>>.ast;
    }
    method record($/) {
        make Seq.new(
            id => ~$<id>,
            comment => ($<comment>) ?? (~$<comment>).trim !! '',
            sequence => $<sequence>.subst("\n","", :g)
        );
    }
}

So the logic above is whenever you parse out the whole of a record we want to make a new Seq object and shove it in the resulting match tree for later. You can see that we do some trimming and cleaning up of white space in the action before creating a clean Seq object. Finally the TOP action basically just says that the final product of .parse should be all the record matches, which we just transformed into Seq objects, so it’s literally a list of Seq! Nice! Lets give it a whirl on that test file I mentioned:

Input

>hello
GCTATATAAGC
>world prot
TATAKEKEKELKL

Code

my $grammar = FASTA::Grammar.new;
for $grammar.parsefile("test.fa", actions => FASTA::Actions).ast -> $seq {
    say $seq.id;
}

Output

hello
world

Ultimately we didn’t need the added complexity of matching a sequence as DNA/RNA/protein but it was kind of cool, and at a later date we might want to do something different and have fancier Seq objects. I’ll leave exploration of this as a task to the interested reader :P

A Bag full of sequence

So I’m concerned that our Seq class is kind of crap still. It might as well just be a hash of values at this point. Sure we get a type which is nice, but it just doesn’t have anything cool about it. I think a nice addition is if we made our class work a little better with the rest of the Perl6 family of operations, letting more of the languages magic get at our data. The results of this simple change might actually impress you!

class Seq {
    has Str $.id = "";
    has Str $.comment = "";
    has Str $.sequence = "";

    multi method Numeric (Seq:D: --> Numeric) {
        $.sequence.codes;
    }

    multi method Bag (Seq:D: --> Bag) {
        bag $.sequence.comb;
    }

    multi method Str (Seq:D: --> Str) {
        $.sequence;
    }

    multi method gist (Seq:D: --> Str) {
        ">$.id $.comment\n$.sequence";
    }
}

Above I’ve introduced some snazzy new methods specific to our class. Anyone familiar with toString from Java should get what both the Str and gist methods are upto. They are well observed methods in Perl6 that deal with interpreting an object as a string. But it’s probably not clear at all why you would want two of them? Well simply put the Str method is there for the pure string interpretation of our object. If I use operators that expect Str this would be called to automagic our Seq into a string. The gist is kind of a neat one, if we use the higher level say function this calls the .gist on our object and prints that since it’s already assumed you want something a bit more human readable. Even more neat is if you are using the command line REPL the .gist is immediately called on an object that resulted from an expression. So when we create a new Seq in the Perl6 REPL or just give the variable name on its own we get FASTA ready to copy and paste! Take a look at the REPL dump below and see if you can work out where we are implicitly using $seq as a Str rather than Seq:

> my $seq = Seq.new(id=>'hello',comment=>'world',sequence=>'GCTACCTATAAAGC');
>hello world
GCTACCTATAAAGC
> say $seq
>hello world
GCTACCTATAAAGC
> "junk" ~ $seq
junkGCTACCTATAAAGC
> my $big-seq = Seq.new(id=>'whole',comment=>'new world',sequence=> $seq x 3);
>whole new world
GCTACCTATAAAGCGCTACCTATAAAGCGCTACCTATAAAGC
> print $seq
GCTACCTATAAAGC>

But wait! There’s more. For the low low price of just two more methods we get numeric wrangling and statistical modelling! The Numeric method does what it says on the tin, and we defined this as the length of the sequence which feels reasonable as a numerical representation.

> $seq + $big-seq
56
> ($seq + $big-seq) / $seq
4

Bag creates a Bag type object from our sequence. Basically the Bag we create is an immutable counting dictionary of all the letters in the sequence. So we now have the first step in some frequency analysis, letters and a count of how often they occurred. But what is even more awesome about Bags in Perl6 are the pick and roll methods. These allow you to sample the keys of the Bag based on their frequency! So if you wanted to create some dummy sequences to do statistical tests with it’s easy as pie! Roll keeps the letters in the bag as you sample whereas pick removes them and then alters the distribution of the next pick. Below I give an example where we .pick strings that are as long the input sequence but will be randomly different but with the same distribution of characters. This sort of dummy sequence is useful for calculating the statistical significance of our sequence.

> $seq.Bag
bag(G(2), C(4), T(3), A(5))
> $seq.Bag.pick($seq).join
CAGATCGTAAATCC
> $seq.Bag.pick($seq).join
GAATCCCGTACATA

Look at that we even use $seq as the length to pick to! Perhaps not the most obvious or readable thing in the world, but for playing on the REPL it’s a nice feature and not utterly unintuitive to people used to context in Perl.

A Range of possibilities

A common task in bioinformatics is dealing with what is commonly referred to as “annotation” type data. This can be anything from specifying the location of a gene in a whole genome, to specifying a patterned site within a protein sequence. In BioPerl there is a specific and special Bio::Range class for dealing with the sort of data that represents a segment on a number line with a start and an end position. In Perl6 this is built right into the language with most of the features any bioinformatician would be happy with. Plus they can be wrangled to Sets and use those operators. So creating a Range couldn’t be easier and more obvious, you’ve all done it before!

> my $range = 1..10;
> $range.WHAT.say;
(Range)

This is the exact same syntax and object used if you wanted to create a for loop over a given range both in Perl5 and Perl6.

To show what we can do with this basic type in Perl6 I’ve taken some data about the gene p53 which is a well known one for the development of many cancers. It’s all real data and you can see a coloured block diagram of the protien annotation data here. You think of each block as being single Range object you get some idea of what I’m harping on about. So lets start out with some DNA and some protein features in code but remember we could just makes these from a file just as easily:

#Genomic chromosome coordinates of the p53 gene in human
my $p53_gene = 7_565_097 .. 7_590_863;

#A single nucleotide polymorphism AKA a point mutation
my $snp = 7_575_996;

#Some annotated protein domains
my %domains = p53-TF => 97..287, p53-Tetramer => 319..357;

#Predicted interaction site
my $binding_site = 322..355;

#A known protein Phosphorylation site
my $phospho_site = 99;

So good so far, we have some one off integer values some ranges and a cool hash of ranges! The bonus Perl6 features come from operators that accept a Range as their operands. For example most importantly set operations work as you might expect:

#Work out if the SNP is within the gene
say "MUTATION!!!" if $p53_gene (cont) $snp;

#Find out which domains contain the phosphorylation site
my %phosphorylated = %domains.grep({ .value (cont) $phospho_site })

#Find out which domains are in the binding site and what percentage is binding
my %binding = %domains.map({
                            $_.key => 100 / $_.value.elems * ($_.value (&) $binding_site)
                      })
                      .grep(*.value > 0.0);

Hopefully a lot of the above is almost self explanatory once I introduce the set operators for contains (cont) or ∋ and intersection (&) or ∩. I chose the ASCII variants, mostly because I don’t have a ∩ key on my keyboard and I wouldn’t want my code to be trashed if someone used a non Unicode editor! The contains returns a Bool result of if the right operand contains the left and the intersection operator returns the set of numbers that intersect both of the Ranges. The final hash wrangling is quite nice, it keeps the keys of the %domains hash but replaces the values with the percentage coverage with our binding site and only gives a result if there was any coverage between the two Ranges. Hopefully almost any bioinformatician can see how nice these sort of primitive types and operations are to have in the core of a language.

A Curio

If any of the above has you at least a little curious if not excited to see what Perl6 can do within Bioinformatics then you can check out more at either Chris Fields official BioPerl6 repo or my own playpen. I’ll leave you with one final code nugget to chew on. I found this by accident just writing a method to wrangle DNA sequence to RNA sequence in Perl6. Something you might like to add to our Seq class. This is approximately the session I had at the REPL:

> my $dna = ‘GATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAA’;
> my $rna = $dna ~~ tr/GCAT/GCAU/;
> $rna.WHAT.say
(StrDistance)

WAT?! So I was originally expecting to just have the translated string. As an aside you should use the method $dna.trans(‘T’=>’U’) if you do want that. Instead I got back this odd object StrDistance… What could it be I thought, so I did some basic inspection to take a look:

> $rna.perl
StrDistance.new(before => "GATG…TAAAA", after => "GAUG…UAAAA")
> $rna.^methods
Bool Numeric Int <anon> <anon>
> $rna.^attributes
Str $!before Str $!after Int $!distance

So in the .perl we see that .before and .after are the two strings the DNA and the newly translated RNA. But there is also this other guff like .Numeric and .Int. Turns out its something awesome!

> say +$rna;
13

Lucky number 13… or the Hamming distance between the two strings! Anyone from the world of Bioinformatics will instantly see this as quite a handy feature. Perl6 wasn’t created or invented for Bioinformatics, but it has so many accidentally great features laying around relatively uncommented on.

Sorry for the blowhard post but I thought it might be nice to put these ideas all in one place. Have a happy Christmas and New Year!


Follow

Get every new post delivered to your Inbox.

Join 49 other followers