Day 11 – Markov Sequence

On Day 4, I teased with mention of non-numeric sequences. Today I’d like to explore one such sequence, based on the common idea of using Markov chains on text. We will determine the next letter in the sequence randomly based on the previous two letters. The distribution follows the patterns contained in a model text, so that the result approximates the language of the model.

use v6;
use List::Utils;

my $model-text = $*;
$model-text .=subst(/<[_']>/, "", :global);
$model-text .=subst(/<-alpha>+/, " ", :global);

my %next-step;
for sliding-window($model-text.comb, 3) -> $a, $b, $c {
    %next-step{$a ~ $b}{$c}++;

my $first = $model-text.substr(0, 1);
my $second = $model-text.substr(1, 1);
my @chain := $first, $second, -> $a, $b { %next-step{$a ~ $b}.roll.key } ... *;
say @chain.munch(80);

After the initial use statements, the code divides neatly into three sections. The first section inputs the model text and gets rid of the non-alphabetic characters. Line 4 uses slurp to read standard input ($*IN) into a single string, and lc makes it all lowercase. The first subst (line 5) removes all underscores and apostrophes from the text. The second (line 6) replaces each string of non-alphabetic characters with a single space.

The second section combines the sliding-window sub from List::Utils with a bit of the good old Perl hash magic. (You can get List::Utils using neutro.)

$model-text.comb splits the text into individual characters.

sliding-windows iterates through a list, giving you the next N (3, in this case) elements in the list starting with each element in the list. (That is, you get the 1st, 2nd, and 3rd elements, then the 2nd, 3rd, and 4th elements, then the 3rd, 4th, and 5th elements, etc.) In this case we use it to get every set of three consecutive characters in the text.

In that loop, we construct a hash table of hash tables. The keys to the outer are the first two of those three consecutive characters. The keys to the inner are the third consecutive character, and its value is how many times that character follows the first two. So, for instance, if I feed the lyrics to the Aqualung album into this program, then %next-step{"qu"} looks like this:

{"a" => 5, "e" => 2}

That is to say, if you have “q” and “u”, they are followed by “a” five times (presumably all in the name Aqualung) and “e” twice (“requests” and “question”).

The third section of the code, then uses the knowledge we’ve just accumulated to build the sequence. First we get the first and second characters of the model space, just so we can start with a pair of letters we know for sure will have a letter coming after them. Then we construct a sequence which starts with those two, and uses -> $a, $b { %next-step{$a ~ $b}.roll } as a generator. The generator uses the previous two characters in the sequence to look up the appropriate frequency hash for the next character. The roll method randomly returns one of the keys of that hash, weighted by the value. (In the “qu” example above, you could think of this as rolling a seven-sided die, five of whose faces say “a” and two “e”.) If there is no known character which follows the previous two (for instance, if the last two characters in the model text are a pair unique in the text and you reach them both in order), then an undefined value is returned, which stops the sequence. We get the first 80 characters from the sequence using the munch method, chosen because it is well-behaved if the sequence terminates early.

Running the script on the lyrics to Aqualung produces sequences like
“t carealven thead you he sing i withe and upon a put saves pinsest to laboonfeet” and “t steall gets sill a creat ren ther he crokin whymn the gook sh an arlieves grac”. (Why does it start with “t “? My Aqualung lyrics file starts with some ASCII-art apparently attempting to imitate the original liner notes, and when we removed the non-alphabetic characters it boils down to “t “.)

Note that nowhere here does this script make assumptions about the character set it is working with. Anything that Perl 6 recognizes as alphabetic can be processed this way. If you feed it the standard “Land der Berge” file that p6eval uses as stdin, you’ll get strings like “laß in ber bist brüften las schören zeites öst froher land der äckerzeichöne lan”. (Fingers crossed that WordPress and your browser handle the non-ASCII characters correctly!)

One word of warning: As I was finishing this, the #perl6 channel raised the question of what Hash.roll should actually do. Right now in Rakudo it has the functionality of the (not yet implemented) KeyBag.roll method. Once it is implemented, KeyBag could be substituted if Hash.roll ends up spec’d differently.

There is a simple alternative which works today, however. If you change to %next-step{$a ~ $b}{$c}++ to %next-step.push($a ~ $b, $c), %next-step will be constructed as a Hash of Arrays. Each array will list all the characters $c which appear after $a and $b, with each distinct character repeated the number of times it appears in the file. This naturally acts as a weighting for .roll to use in the sequence generator.

I need to give a big thank you to Moritz Lenz, who was a huge help cleaning up and simplifying this script.

Day 4 – The Sequence Operators

Last year, there was a brief tease of the sequence operator (tweaked slightly to be correct after a year’s worth of changes to the spec):

my $even-numbers  := 0, 2 ... *;    # arithmetic seq
my $odd-numbers   := 1, 3 ... *;
my $powers-of-two := 1, 2, 4 ... *; # geometric seq

This now works in Rakudo:

> my $powers-of-two := 1, 2, 4 ... *; 1;
> $powers-of-two[^10]
1 2 4 8 16 32 64 128 256 512

(Note: All the code examples in this post have been run in Rakudo’s REPL, which you can reach by running the perl6 executable with no command line arguments. Lines that start with > I what I typed; the other lines are Rakudo’s response, which is generally the value of the last expression in the line. Because the variable $powers-of-two is an infinite lazy list, I’ve added 1; at the end of the line, so the REPL prints that instead of going into an infinite loop.)

We need to trim the infinite list so that Rakudo doesn’t spend an infinitely long time calculating it. In this case, I used [^10], which is a quick way of saying “Give me the first ten elements.” (Note that when you bind a lazy list to an array variable like this, values which have been calculated are remembered; it’s a quick form of memoization.)

The sequence operator ... is a very powerful tool for generating lazy lists. The above examples just start to hint at what it can do. Given one number, it just starts counting up from that number (unless the terminal end of the sequence is a lower number, in which case it counts down). Given two numbers to start a sequence, it will treat it as an arithmetic sequence, adding the difference between those first two numbers to the last number generated to generate the next one. Given three numbers, it checks to see if they represent the start of an arithmetic or a geometric sequence, and will continue it.

Of course, many interesting sequences are neither arithmetic nor geometric, in which case you need to explicitly provide the sub to generate the next number in the sequence:

> my $Fibonacci := 0, 1, -> $a, $b { $a + $b } ... *; 1;
> $Fibonacci[^10]
0 1 1 2 3 5 8 13 21 34

The -> $a, $b { $a + $b } there is a pointy block (ie a lambda function) which takes two arguments and returns their sum. The sequence operator figures out how many arguments the block takes, and passes the needed arguments from the end of the sequence so far to generate the next number in the sequence. And so on, forever.

Or not forever. So far all these examples have had the Whatever star on the right hand side, which means “There is no terminating condition.” If you instead have a number there, the list will terminate when that number is exactly reached.

> 1, 1.1 ... 2
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2
> 1, 1.1 ... 2.01
... Rakudo spins its wheels, because this is an infinite list ...
> (1, 1.1 ... 2.01)[^14]
1 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2 2.1 2.2 2.3

The first one of those terminates naturally, but the second one missed the terminator and kept right on going. The result is an infinite list, so I limited it to the first 14 elements so that we could see what it was doing.

Those of you with backgrounds doing floating point math are probably sputtering about the dangers of assuming that adding .1 repeatedly will add up to exactly 2. In Perl 6, that’s not quite such an issue because it will use Rat (ie fractional) math where possible. But the general point is still very solid. If I want to find all the Fibonacci numbers below 10000, needing to know exactly the number to stop on is a big hassle. Luckily, just as you can use a block to specify how to generate the next element in a sequence, you can also use one to test to see whether the sequence should end yet:

> 0, 1, -> $a, $b { $a + $b } ... -> $a { $a > 10000 };
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946

The pointy block -> $a { $a > 10000 } creates a block which takes one argument, and returns true when that argument is greater than 10000; just the test we want.

Except we were looking for all the Fibonacci less than 10000. We generated that plus the first Fibonacci number greater than 10000. When passed a block as a termination test, the sequence operator returns all its elements until that block returns true, then it returns that last element and stops. But there is alternative form of the sequence operator that will do the trick:

> 0, 1, -> $a, $b { $a + $b } ...^ -> $a { $a > 10000 };
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765

Switching from ... to ...^ means the resulting list does not include the first element for which the termination test returned true.

Two side notes on this. This is actually a long-winded way of specifying these sequences in Perl 6. I don’t have space to explain Whatever Closures here, but this post from last year talks about them. Using them, you can rewrite that last sequence as

> 0, 1, * + * ...^ * > 10000;
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 676

It’s up to you whether or not you think this is clearer; there’s more than one way to do it.

Also, the left-hand-side of the sequence operator can be any list, even lazy ones. This means you can easily use a terminating block to get a limited portion of an existing lazy list:

> my $Fibonacci := 0, 1, * + * ... *; 1;
> $Fibonacci ...^ * > 10000
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765
> $Fibonacci[30]

(I stuck the last check there just to demonstrate that $Fibonacci still goes on past 10000.)

This only begins to scratch the surface of what sequences can do. For more information, see “List infix precedence” in the spec, and scroll down to the sequence operator. (Though note that it is still not completely implemented! It is an extremely complex operator.)

One particular twist I’d like to leave you with: the sequence operator is not constrained to working with numeric values. If you explicitly specify your own generator, you can make a sequence out of any type at all. But I’d like to leave that for a future Advent present…

Day 17: Making Snowmen

I started out planning this day to be about complex numbers in Perl 6. But after I thought about it a bit, I decided that the complex number implementation is so straightforward explaining it would make a pretty boring gift. Instead, let’s explore the Mandelbrot set, which will let us do a bit of complex math, look at pretty pictures, and hint at some advanced features of Perl 6, too.

Without further ado, here’s the first version of the script:

use v6;

my $height = @*ARGS[0] // 31;
my $width = $height;
my $max_iterations = 50;

my $upper-right = -2 + (5/4)i;
my $lower-left = 1/2 - (5/4)i;

sub mandel(Complex $c) {
    my $z = 0i;
    for ^$max_iterations {
        $z = $z * $z + $c;
        return 1 if ($z.abs > 2);
    return 0;

sub subdivide($low, $high, $count) {
    (^$count).map({ $low + ($_ / ($count - 1)) * ($high - $low) });

say "P1";
say "$width $height";

for subdivide($, $, $height) -> $re {
    my @line = subdivide($re + ($, $re + 0i, ($width + 1) / 2).map({ mandel($_) });
    my $middle = @line.pop;
    (@line, $middle, @line.reverse).join(' ').say;

So, lines 3-5 set up the pixel size of the graphic we will create. @*ARGS is the new name of the command line argument array. The // operator is the new “defined” operator; it returns its first argument if that argument is defined, its second otherwise. In other words, line 3 sets $height to be the first argument on the command line, or 31 if no such argument was set. $width is set equal to $height — the code is set up to generate a square graphic right now, but the variables are set apart for ease of future hacking. $max_iterations sets how many times the core Mandelbrot loop will iterate before it concludes a point is in the set. (Because we’re relying on the symmetry of the Mandelbrot set, $width must be odd.)

Lines 7-8 set the boundaries of our image on the complex plane. Introducing the imaginary component of a number is as simple as giving a number (or numeric expression) followed by i; this creates a number of the Complex type. Complex math works pretty much the way you would expect it to, for example, (as we see here) adding a Complex to an Int or a Rat yields another Complex.

Lines 10-17, then, are the core Mandelbrot function. To quickly explain, a complex number c is in the Mandrelbrot set if the equation z = z * z + c (with initial z of 0) stays bounded as we iterate the equation. This function implements exactly that in Perl 6. We set up a loop to iterate $max_iterations times. It is known that once |z| grows bigger than 2 it will not stay bounded, so we use $z.abs > 2 to check for that condition. If it is true, we leave the loop early, returning 1 from the function to indicate the corresponding pixel should be black. If the loop finishes the number of iterations without exceeding those bounds, we return 0 for the color white.

Lines 19-21 are a simple helper function to return a list of a simple arithmetic progression from $low to $high with $count elements. Note that $low and $high have no specified type, so any type (or even pair of types) that the basic arithmetic operators will work on will work here. (In this script, we use it first for Num, and then for Complex.)

Lines 23-24 print the header for the header for a PBM file.

Lines 26-30 print the actual image data. $ is the real part of the complex number $upper-right, and $ is the imaginary part. The loop iterates over the real part of the range. Inside the loop, we subdivide again along the imaginary part to generate a list of the complex values we are interested in examining for one half of this row of the image. We then run that list through the mandel function using map, generating a list of 0s and 1s for half of the row, including the midpoint.

We do it this way because the Mandelbrot set is symmetric about the imaginary axis. So we then pop that midpoint, and make a new list which is the old list (minus the midpoint), the midpoint, and the list (minus the midpoint) reversed. We then feed that to join to make a string for the entire line, and finally say to print it out.

Note that doing it this way rotates the Mandelbrot set 90 degrees from the way it is normally displayed, giving us a lovely snowman shape:
White on black Mandelbrot set

With the current Rakudo, this is quite slow, and prone to crash randomly depending on the size of the image you are generating. However, imagine for a minute that happy future day when Rakudo is not only snappy, but handles automatic hyperoperator threading as well. At that point, it will be easy to make a parallel version of this script by changing the map call to a hyperoperator.

There’s only one tricky bit: there’s no way to have a hyperoperator call a normal sub. They only call class methods and operators. So, as a first approximation which works in current Rakudo, we can “augment” the Complex class to have a .mandel.


augment class Complex {
    method mandel() {
        my $z = 0i;
        for ^$max_iterations {
            $z = $z * $z + self;
            return 1 if ($z.abs > 2);
        return 0;

for subdivide($, $, $height) -> $re {
    my @line = subdivide($re + ($, $re + 0i, ($width + 1) / 2)>>.mandel;
    my $middle = @line.pop;
    (@line, $middle, @line.reverse).join(' ').say;

The only difference to mandel is it is now a method, and the role of the former $c argument is taken by self. Then instead of map({mandel($_)}) we use the hyperoperator.

As I said, this version works now. But personally, I’m a little uncomfortable augmenting an existing class like that; it feels dirty to my old C++ instincts. We can avoid it by turning mandel into an operator:

sub postfix:<☃>(Complex $c) {
    my $z = 0i;
    for ^$max_iterations {
        $z = $z * $z + $c;
        return 1 if ($z.abs > 2);
    return 0;

for subdivide($, $, $height) -> $re {
    my @line = subdivide($re + ($, $re + 0i, ($width + 1) / 2)>>☃;
    my $middle = @line.pop;
    (@line, $middle, @line.reverse).join(' ').say;

This takes advantage of Perl 6’s Unicode ability to have a little fun, defining the operator using the snowman symbol. This ☃ operator works fine in Rakudo today, but alas the >>☃ hyperoperator does not work yet.

Thanks to Moritz and TimToady for suggests and help with this code. Two versions of the script (one full color) are up at github, if you’d like to play around with them.

Update (4/18/2010): I’ve ported the color version of the script at github to the latest version of Rakudo. It’s quite slow, and uses phenomenal amounts of memory, but unlike the previous version it is rock-solid stable. Here’s a 1001×1001 full color Mandelbrot set that took it 14 hours and 6.4 GB of memory to compute.

Day 14: Going to the Rats

As I hinted at back in the in the Day 1 post, Perl 6 has rational numbers. They are created in the most straightforward fashion, by dividing an integer with another integer. But it can be a bit hard to see that there is anything unusual about the result:

> say (3/7).WHAT
> say 3/7

When you convert a Rat to a Str (for example, to “say” it), it converts to a decimal representation. This is based on the principle of least surprise: people generally expect 1/4 to equal 0.25. But the precision of the Rat is exact, rather than the approximation you’d get from a floating point number like a Num:

> say (3/7).Num + (2/7).Num + (2/7).Num - 1;
> say 3/7 + 2/7 + 2/7 - 1

The most straightforward way to see what is going on inside the Rat is to use the .perl method. .perl is a standard Perl 6 method which returns a human-readable string which, when eval’d, recreates the original object as closely as is possible:

> say (3/7).perl

You can also pick at the components of the Rat:

> say (3/7).numerator
> say (3/7).denominator
> say (3/7).nude.perl
[3, 7]

All the standard numeric operators and operations work on Rats. The basic arithmetic operators will generate a result which is also a Rat if that is possible; the rest will generate Nums:

> my $a = 1/60000 + 1/60000; say $a.WHAT; say $a; say $a.perl
> my $a = 1/60000 + 1/60001; say $a.WHAT; say $a; say $a.perl
> my $a = cos(1/60000); say $a.WHAT; say $a; say $a.perl

(Note that the 1/60000 + 1/60000 didn’t work in the last official release of Rakudo, but is fixed in the Rakudo github repository.)

There also is a nifty method on Num which creates a Rat within a given tolerance of the Num (default is 1e-6):

> say 3.14.Rat.perl
> say pi.Rat.perl
> say pi.Rat(1e-10).perl

One interesting development which has not made it into the main Rakudo build yet is decimal numbers in the source are now spec’d to be Rats. Luckily this is implemented in the ng branch, so it is possible to demo how it will work once it is in mainstream Rakudo:

> say 1.75.WHAT
> say 1.75.perl
> say 1.752.perl

One last thing: in Rakudo, the Rat class is entirely implemented in Perl 6. The source code is thus a pretty good example of how to implement a numeric class in Perl 6.

Day 6: Going Into Hyperspace

pmichaud introduced Perl 6’s hyper operators yesterday. I’d like to explore these powerful meta operators further.

First, for simplicity I’m going to code a helper function, lsay, to easily get nice-looking output of lists. The sub is created using our so you can use it on the REPL.

our sub lsay(@a) { @a.perl.say }

Then we can start looking at hyperoperator examples. For this post I’m going to use >> and << instead of » and «, mostly because they are easier on my eyes. (I’m afraid I may need to get glasses.) » and « are generally considered the true form of the operator, but the longer ASCII version will work as well.

First, the most basic: adding two lists of the same length:

> lsay (1, 2, 3, 4) <<+>> (3, 1, 3, 1)
[4, 3, 6, 5]
> lsay (1, 2, 3, 4) >>+<< (3, 1, 3, 1)
[4, 3, 6, 5]

If the lengths of the arrays are the same, there’s no difference between the two forms. But if the length is different:

> lsay (1, 2, 3, 4) <<+>> (3, 1)
[4, 3, 4, 5]
> lsay (1, 2, 3, 4) >>+<< (3, 1)
Sorry, right side is too short and not dwimmy.

The rule is that whatever is pointed to by the pointy end of the hyperoperator can be extended if it is shorter than the other end; it is extended by repeating the last element of that list. Whatever is at the blunt end of the hyperoperator cannot be extended. All combinations are allowed, so you can specify that only the left side can be extended (<<+<<), only the right side (>>+>>), both sides can be extended (<<+>>), or neither side can be extended (>>+<<). Single scalars extend as well:

> lsay (1, 2, 3, 4) >>+>> 2
[3, 4, 5, 6]
> lsay 3 <<+<< (1, 2, 3, 4)
[4, 5, 6, 7]

So that’s the basics of using hyperoperator with an infix operator. You can also use them with prefix and postfix operators:

> lsay ~<<(1, 2, 3, 4)
["1", "2", "3", "4"]
> my @a= (1, 2, 3, 4); @a>>++; lsay @a;
[2, 3, 4, 5]

You can also:

> lsay (0, pi/4, pi/2, pi, 2*pi)>>.sin
[0, 0.707106781186547, 1, 1.22464679914735e-16, -2.44929359829471e-16]
> lsay (-1, 0, 3, 42)>>.Str
["-1", "0", "3", "42"]

That is to say >>. works to call a method on every member of the list.

However much you are tempted to write @array>>.say, don’t do it. It may work in the current version of Rakudo, but by using the hyper operator you are promising the operation is parallelizable, and the order of the operations on the list(s) is not fixed. The hope is that future versions of Perl 6 will automatically run these operations in parallel.

Other quick notes: The hyperoperators don’t just work with the built-in set of operators. They will work with any new operator you define as well. (That works now in Rakudo, mostly.) They will work with the in-place operators, e.g. @a >>/=>> 2 to divide an entire array by 2. (This does not work in current Rakudo.) They will work with multi-dimensional lists, with trees, and with hashes; see S03 Hyper operators. (As far as I know, these do not yet work in Rakudo either.)

I don’t know too many examples yet of source code using hyperoperators extensively, though LastOfTheCarelessMen’s Vector class is a good if straightforward start — it implements an N-dimensional vector class without a single explicit loop.

Day 1: Getting Rakudo

There are many different partial implementations of Perl 6; at the moment, the most complete is Rakudo. There are a number of different ways to get it up and running, but if you have any interest in tracking or helping with the development, the best approach is to download directly from the source repositories and and build your own copy.

To do so, you will need Subversion (svn), git, Perl 5.8 or newer, a C compiler, and a make utility. On standard Linux-like systems (including OS X), you can build it like this:

$ git clone git://
$ cd rakudo
$ perl --gen-parrot
$ make
$ make test
$ make install

Subversion is needed to run –gen-parrot — it actually uses svn to get the appropriate version of parrot and then compile it.  You will need to use the appropriate make command for your system.  (It’s generally nmake with MS Visual C++, I believe.)

For current Rakudo, make install doesn’t install into your path, it merely preps the system so that you can run the perl6 executable (created in the rakudo directory) from any other directory on your system. Once you have it going, you can easily play around with Perl 6 by executing perl6 with no arguments, putting you into a REPL environment where you can type commands and see what they do. This can be incredibly handy for trying to figure out how Perl 6 works. For instance,

$ ./perl6
> say "Hello world!";
Hello world!
> say (10/7).WHAT
> say [+] (1..999).grep( { $_ % 3 == 0 || $_ % 5 == 0 } );

The lines starting with $ and > are what you type; the other lines are Rakudo’s response. The first example is a simple say statement. The second creates a rational number and asks WHAT its type is (Rat). The third takes the list of numbers from 1 to 999, filters out the ones not divisible by 3 or 5, sums them, and then says the result. (This is Project Euler Problem #1, thanks to draegtun for reminding me of it.) We’ll try to explain how these things work in a future post.

One last thing. If you have any difficulties getting Rakudo working, the #perl6 channel on is usually very helpful.

Perl 6 Advent Calendar

In the grand tradition of previous Perl Advent Calendars, we are going to post something we like about Perl 6 every day in December until Christmas.  This post will be updated daily to link to each post as a sort of table of contents, or you can follow this blog in the normal way you follow blogs.

Day 1: Getting Rakudo
Day 2: The Beauty Of Formatting
Day 3: Static Types and Multi Subs
Day 4: Testing
Day 5: Metaoperators
Day 6: Going Into Hyperspace
Day 7: Looping For Fun and Profit
Day 8: .comb Your Constraints
Day 9: Having Beautiful Arguments And Parameters
Day 10: A Regex Story
Day 11: Classes, Attributes, Methods, and More
Day 12: Modules and Exporting
Day 13: Junctions
Day 14: Going to the Rats
Day 15: .pick Your Game
Day 16: We Call It “The Old Switcheroo”
Day 17: Making Snowmen
Day 18: Roles
Day 19: Whatever
Day 20: Little Big Things
Day 21: Grammars and Actions
Day 22: Operator Overloading
Day 23: Lazy Fruits From the Gather of Eden
Day 24: The Perl 6 Standard Grammar
Merry Christmas!