Day 11 – Markov Sequence

Day 11 – Markov Sequence

On Day 4, I teased with mention of non-numeric sequences. Today I’d like to explore one such sequence, based on the common idea of using Markov chains on text. We will determine the next letter in the sequence randomly based on the previous two letters. The distribution follows the patterns contained in a model text, so that the result approximates the language of the model.

use v6;
use List::Utils;

my $model-text = $*;
$model-text .=subst(/<[_']>/, "", :global);
$model-text .=subst(/<-alpha>+/, " ", :global);

my %next-step;
for sliding-window($model-text.comb, 3) -> $a, $b, $c {
    %next-step{$a ~ $b}{$c}++;

my $first = $model-text.substr(0, 1);
my $second = $model-text.substr(1, 1);
my @chain := $first, $second, -> $a, $b { %next-step{$a ~ $b}.roll.key } ... *;
say @chain.munch(80);

After the initial use statements, the code divides neatly into three sections. The first section inputs the model text and gets rid of the non-alphabetic characters. Line 4 uses slurp to read standard input ($*IN) into a single string, and lc makes it all lowercase. The first subst (line 5) removes all underscores and apostrophes from the text. The second (line 6) replaces each string of non-alphabetic characters with a single space.

The second section combines the sliding-window sub from List::Utils with a bit of the good old Perl hash magic. (You can get List::Utils using neutro.)

$model-text.comb splits the text into individual characters.

sliding-windows iterates through a list, giving you the next N (3, in this case) elements in the list starting with each element in the list. (That is, you get the 1st, 2nd, and 3rd elements, then the 2nd, 3rd, and 4th elements, then the 3rd, 4th, and 5th elements, etc.) In this case we use it to get every set of three consecutive characters in the text.

In that loop, we construct a hash table of hash tables. The keys to the outer are the first two of those three consecutive characters. The keys to the inner are the third consecutive character, and its value is how many times that character follows the first two. So, for instance, if I feed the lyrics to the Aqualung album into this program, then %next-step{"qu"} looks like this:

{"a" => 5, "e" => 2}

That is to say, if you have “q” and “u”, they are followed by “a” five times (presumably all in the name Aqualung) and “e” twice (“requests” and “question”).

The third section of the code, then uses the knowledge we’ve just accumulated to build the sequence. First we get the first and second characters of the model space, just so we can start with a pair of letters we know for sure will have a letter coming after them. Then we construct a sequence which starts with those two, and uses -> $a, $b { %next-step{$a ~ $b}.roll } as a generator. The generator uses the previous two characters in the sequence to look up the appropriate frequency hash for the next character. The roll method randomly returns one of the keys of that hash, weighted by the value. (In the “qu” example above, you could think of this as rolling a seven-sided die, five of whose faces say “a” and two “e”.) If there is no known character which follows the previous two (for instance, if the last two characters in the model text are a pair unique in the text and you reach them both in order), then an undefined value is returned, which stops the sequence. We get the first 80 characters from the sequence using the munch method, chosen because it is well-behaved if the sequence terminates early.

Running the script on the lyrics to Aqualung produces sequences like
“t carealven thead you he sing i withe and upon a put saves pinsest to laboonfeet” and “t steall gets sill a creat ren ther he crokin whymn the gook sh an arlieves grac”. (Why does it start with “t “? My Aqualung lyrics file starts with some ASCII-art apparently attempting to imitate the original liner notes, and when we removed the non-alphabetic characters it boils down to “t “.)

Note that nowhere here does this script make assumptions about the character set it is working with. Anything that Perl 6 recognizes as alphabetic can be processed this way. If you feed it the standard “Land der Berge” file that p6eval uses as stdin, you’ll get strings like “laß in ber bist brüften las schören zeites öst froher land der äckerzeichöne lan”. (Fingers crossed that WordPress and your browser handle the non-ASCII characters correctly!)

One word of warning: As I was finishing this, the #perl6 channel raised the question of what Hash.roll should actually do. Right now in Rakudo it has the functionality of the (not yet implemented) KeyBag.roll method. Once it is implemented, KeyBag could be substituted if Hash.roll ends up spec’d differently.

There is a simple alternative which works today, however. If you change to %next-step{$a ~ $b}{$c}++ to %next-step.push($a ~ $b, $c), %next-step will be constructed as a Hash of Arrays. Each array will list all the characters $c which appear after $a and $b, with each distinct character repeated the number of times it appears in the file. This naturally acts as a weighting for .roll to use in the sequence generator.

I need to give a big thank you to Moritz Lenz, who was a huge help cleaning up and simplifying this script.

Day 10 – Feed operators

Day 10 – Feed operators

Anyone who has programmed in Perl 5 for a while has probably run across or written code similar to this:

    my @new = sort { ... } map { ... } grep { ... } @original;

In this construction, the data flows from the @original array which feeds into the grep, and that, in turn, feeds into the map, and then into the sort, and then finally, the result is assigned to the @new array. Because they each take a list as their final parameter, simply by juxtposition, the data feeds leftward from one operation to the next.

Perl 6, on the other hand, makes this idea of data flowing from one operation to another explicit by introducing the feed operator. The above Perl 5 code could be written like this in Perl 6:

    my @new <== sort { ... } <== map { ... } <== grep { ... } <== @original;

Note that TMTOWTDI is alive and well in Perl 6. You could have also written much the same as in Perl 5:

    my @new = sort { ... }, map { ... }, grep { ... }, @original;

The only difference would be the addition of commas.

So, what do we gain from these feed operators? Normally, when reading code, you read from left to right. In the original Perl 5 code you would read from left to right until you realize that you’re dealing with constructions where the direction of flow is right to left, then you jump to the end and follow the processing in a right-to-left manner. In Perl 6 there is now a prominent syntactic marker that clues you in to the leftward flowing nature of the data.

Still, the right-to-left nature of this code is somewhat troublesome. It may not seem so onerous if all of the code fits on one line as above. But imagine if the blocks associated with grep, map, and sort were little longer. Finding the end of the statement could be a bit annoying.

Luckily, Perl 6 has another feed operator that allows you to write the same code in a left-to-right fashion:

    @original ==> grep { ... } ==> map { ... } ==> sort { ... }  ==> my @new;

This works exactly the same as before only the direction of flow has been changed.

Here are a couple of examples of real, working Perl 6 code using the feed operators:

    my @random-nums = (1..100).pick(*);
    my @odds-squared <== sort() <== map { $_ ** 2 } <== grep { $_ % 2 } <== @random-nums;
    say ~@odds-squared;
    my @rakudo-people = <scott patrick carl moritz jonathan jerry stephen>;
    @rakudo-people ==> grep { /at/ } ==> map { .ucfirst } ==> my @who-it's-at;
    say ~@who-it's-at;
Day 9 – The module ecosystem

Day 9 – The module ecosystem

The Perl 6 module database on is certainly not CPAN yet, but there are still a number of things worth using, or at least knowing about. There’s no "standard" module installer for Perl 6, like there’s a cpan shell for Perl 5, but the most commonly used and most often working is neutro. It’s a simple script fetching, building, and installing modules from the ecosystem, resolving dependencies and checking if the tests are passing: not much more we need. Let’s see how to install something interesting with it, say JSON::Tiny, a JSON parser.

First, we need to get neutro. We will assume you use git to obtain it; notice that git is also obligatory to download modules (all of them live on github currently).

    git clone git://
    cd neutro
    PERL6LIB=tmplib bin/neutro .

That will download neutro and bootstrap it using the supplied libs. What we end up is the module installer itself, and the File::Tools and Module::Tools distributions. Make sure ~/.perl6/bin is in your PATH enviroment variable, so you will be able to run neutro without specifying its exact location. You are now able to install modules as simply as with cpanminus:

    neutro json
    neutro perl6-Term-ANSIColor
    neutro perl6-lwp-simple

You may notice module names are not similar to what you may be used to from Perl 5. They’re not standarized, they’re just the names of a git repos they live in. To make sure what you are looking for, consult the

    neutro update # fetch the fresh list of modules
    neutro list

Modules will be installed to ~/.perl6/lib, which is in the default search path of Rakudo, so you don’t really need to set PERL6LIB yourself:

    perl6 -e 'use Term::ANSIColor; say colored("Hello blue world!", "blue")'

You probably just can’t wait to write your first module and make it available for the whole world. There’s no CPAN where you can send your packages; the usual workflow is creating a repository on Github and adding it to the projects.list file in the ecosystem. You don’t need to have a direct access to the repo to get your module published. You can either send a pull request for your forked ecosystem repo, send a patch, or just ask some of the commiters or people on the #perl6 channel of Freenode.

How to write a module, what tools to use? If you’re interested, look into the guide.

Happy hacking!

Day 8 – Different Names of Different Things

Day 8 – Different Names of Different Things

(By masak and moritz)

Newcomers to the Perl programming language, version 5, often complain that
they can’t reverse strings. There’s a built-in reverse function, but it doesn’t seem to work at first glance:

    $ perl -E "say reverse 'hello'"

When such programmers ask more experienced Perl programmers, the solution is quickly found: reverse has actually two different operation modes. In list context it reverses lists, in scalar context it reverses strings.

    $ perl -E "say scalar reverse 'hello'"

Sadly this an exception from Perl’s usual context model. For most operators or functions the operator determines the context, and the data is interpreted in that context. For example + and * work on numbers, and . works on strings (concatenation). So a symbol (or function name, in the case of uc) stands for an operation, and provides context. Not so reverse.

In Perl 6, we try to learn from previous mistakes, and get rid of historical inconsistencies. This is why list reversal, string reversal and hash inversion have been split up into separate built-ins:

    # string reversal, sorry, flipping:
    $ perl6 -e 'say flip "hello"'
    # reversing lists
    # perl6 -e 'say join ", ", reverse <ab cd ef>'
    ef, cd, ab
    # hash inversion
    perl6 -e 'my %capitals = France => "Paris", UK => "London";
              say %capitals.invert.perl'
    ("Paris" => "France", "London" => "UK")

Hash inversion differs from the other two operations in that the result is of a different type than the input. Since hash values need not be unique, inverting a hash and returning the result as a hash would either change the structure (by grouping the keys of non-unique values together somehow), or lose information (by having one value override the other ones).

Instead hash inversion returns a list of pairs, and the user can decide which mode of operation they want for turning the result back into a hash, if that’s necessary at all.

Here’s how you do it if you want the hash inversion operation to be non-destructive:

    my %inverse;
    %inverse.push( %original.invert );

As pairs with already-seen keys get pushed onto the %inverse hash, the original value isn’t overriden but instead "promoted" to an array. Here’s a small demonstration:

    my %h;
    %h.push('foo' => 1);    # foo => 1
    %h.push('foo' => 2);    # foo => [1, 2]
    %h.push('foo' => 3);    # foo => [1, 2, 3]

All three (flip/reverse/invert) coerce their arguments to the expected type (if possible). For example if you pass a list to flip, it is coerced into a string and then the string is rever^W flipped.

Day 7 – Lexical variables

Day 7 – Lexical variables

Programming is tough going. Well, stringing together lines of code isn’t that difficult, and prototyping an idea can be pleasant and easy. But as the size of the program scales up, and the maintenance time lengthens, things tend to get tricky. Eventually, if we’re unlucky, we’re overcome by the complexity — not necessarily the complexity of the problem we started out solving, but the complexity of the program itself. We get gray hairs from debugging, or we’re simply at a loss for how to extend the program to do what we want.

So we turn to the history of programming, seeking advice as to how to combat complexity. And the answer sits there, clear as day: limit extent. If you’re architecting programs with hundreds or thousands of types of components, you’ll want those components to interact through only a very small set of surfaces… otherwise, you’ll simply lose control. Otherwise, combinatorics will defeat you.

We see this principle at every single level of programming, simply because it’s such a primary thing: Separation of Concerns, Do One Thing And Do It Well, BCNF, monads, routines, classes, roles, modules, packages. All of them urge us or guide us to limit the extent of things, so we don’t lose to combinatorics. Perhaps the simplest example of this is the lexical variable.

    my $var;
    # $var is visible in here
# $var is not visible out here

Yeah — that is today’s “cool feature”. :-) Here’s what makes it interesting:

Perl got this one wrong from version 1 and onwards. The default variable scope in Perl 5 is the “package variable”, a kind of global variable. Define something inside a block; still see it outside.

$ perl -v
This is perl 5, version 12, subversion 1 (v5.12.1)

$ perl -E '{ $var = 42 }; say $var'

$ perl -wE '{ my $var= 42 }; say $var'
Name "main::var" used only once: possible typo at -e line 1.
Use of uninitialized value $var in say at -e line 1.

In Perl 6, the lexical variable is the default. You won’t get past compilation if you try to pull off the above trick in Rakudo:

$ perl6 -e '{ $var = 42 }; say $var'    # gotta initialize with 'my'
Symbol '$var' not predeclared in <anonymous>

$ perl6 -e '{ my $var = 42 }; say $var' # still won't work! not visible outside
Symbol '$var' not predeclared in <anonymous>

You might say “okay, this is great for catching a typo now and then”. Yes, sure, but the big advantage is that this keeps you honest about variable scoping. And that helps you manage complexity.

Now let me just rush to the defense of Perl 5 by saying a variety of things at the same time. Perl 5 does try to steer you in the right way by having you use strict and use warnings by reflex; Perl 5 is bound by its promise of backwards compatibility, which is very good and noble; Perl 1 certainly was not about writing large applications and managing the resulting complexity; and global variables do make a lot of sense in a one-line script.

Perl 6 has an inherent focus to help you start small, and then help you put in more strictures and architectural underpinnings as your application scales up. In the case of variables, this means that in scripts and modules, lexical variables (à la strict) are the default, but in those -e one-liners the default is package variables. (Rakudo doesn’t implement this distinction yet, and one has to use lexical variables even at the command line. After it’s been implemented in Rakudo, I’d expect the perl6 invocations above to get past compilation, and produce outputs similar to the perl invocations.)

Moving along. At this point you might consider all that’s worth saying about lexical variables to have been said — not so. You see, the result of designing things right is that surprising and awesome bonuses keep falling out. Consider this subroutine:

sub counter($start_value) {
    my $count = $start_value;
    return { $count++ };

What’s returned at return { $count++ } is a block of code. So each time we call counter, what we get back is a little disconnected piece of code that can be called, as many times as we want.

Now look what happens when we create two such pieces of code and play around with them:

my $c1 = counter(5);
say $c1();           # 5
say $c1();           # 6

my $c2 = counter(42);
say $c2();           # 42
say $c1();           # 7
say $c2();           # 43

See that? The vital observation here is that $c1 and $c2 are acting entirely independently of each other. Both keep their own state, in the form of the $count variable, and although this might look like the same variable to us, to the two invocations of counter it looks like two different storage locations — because each time we enter a block of code, we start out afresh. The little block of code returned from some run of counter retains a relation to that particular storage location (it “closes over” the storage location, protecting it from the grasp of the Grim Garbage Collector; thus this kind of block is called a closure.)

If closures look a lot like lightweight objects to you, congratulations; they are. The principle behind closures, regulating the way values are accessed, is the same as the principle behind encapsulation and information hiding in OO. It’s all about limiting the extent of things, so that they can wreak as little havoc as possible when things turn ugly.

You can do nifty things like closures with lexical variables. You can’t with package variables. Lexical variables are cooler. QED.

Day 6 – The X and Z metaoperators

Day 6 – The X and Z metaoperators

One of the new ideas in Perl 6 is the metaoperator, an operator which combines with an ordinary operator in order to change its behaviour. There are several of them, but in this post we will concentrate on just two of them: X and Z.

The X operator is one you might already have seen in its ordinary role as the infix cross operator. It combines lists together, one element from each, in every combination:

> say ((1, 2) X ('a', 'b')).perl
((1, "a"), (1, "b"), (2, "a"), (2, "b"))

However, this infix:<X> operator is actually just a shorthand for the X metaoperator applied to the list concatenation operator infix:<,>. Indeed, you’re perfectly at liberty to write

> say ((1, 2) X, (10, 11)).perl
((1, 10), (1, 11), (2, 10), (2, 11))

If you like. So what happens if you apply X to a different infix operator? How about infix:<+>

> say ((1, 2) X+ (10, 11)).perl
(11, 12, 12, 13)

What’s it done? Instead of creating a list of all the elements picked for each combination, the operator applies the addition infix to them, and the result is not a list but a single number, the sum of all the elements in that combination.

This works for any infix operator you care to use. How about string concatenation, infix:<~>?

> say ((1, 2) X~ (10, 11)).perl
("110", "111", "210", "211")

Or perhaps the numeric equality operator infix:<==>?

> say ((1, 2) X== (1, 1)).perl
(Bool::True, Bool::True, Bool::False, Bool::False)

But this post is also meant to be about the Z metaoperator. We expect you may have already figured out what it does, if you’ve encountered the infix:<Z> operator before, which is of course just a shortcut for Z,. If a Haskell programmer understands infix:<Z> as being like the zip function, then the Z metaoperator is like zipWith.

> say ((1, 2) Z, (3, 4)).perl
((1, 3), (2, 4))
> say ((1, 2) Z+ (3, 4)).perl
(4, 6)
> say ((1, 2) Z== (1, 1)).perl
(Bool::True, Bool::False)

Z, then, operates on each element of each list in turn, working on the first elements together, then the second, then the third for however many there are. It stops when it reaches the end of a list regardless of which side that list is on.

Z is also lazy, so you can apply it to two infinite lists and it will only generate as many results as you need. X can only handle an infinite list on the left, otherwise it would never manage to get anywhere at all.

At the time of writing, Rakudo appears to suffer from a bug where infix:<Z> and infix:<Z,> are not identical: the former produces a flattened result list. S03 shows that the behaviour of the latter is correct.

These metaoperators, then, become powerful tools for performing operations encompassing the individual elements of multiple lists, whether those elements are associated in some way based on their indexes as with Z, or whether you just want to examine all possible combinations with X.

Got a list of keys and values and you want to make a hash? Easy!

my %hash = @keys Z=> @values;

Or perhaps you want to iterate over two lists in parallel?

for @a Z @b -> $a, $b { ... }

Or three?

for @a Z @b Z @c -> $a, $b, $c { ... }

Or maybe you want to find out all the possible totals you could get from rolling three ten-sided dice:

my @d10 = 1 ... 10;
my @scores = (@d10 X+ @d10) X+ @d10;

If you want to see some real-world use of these metaoperators, in Moritz Lenz’s Sudoku solver.

Day 5 – Why Perl syntax does what you want

Day 5 – Why Perl syntax does what you want

Opening the fifth door of our advent calendar, we don’t find a recipe of how to do something cool with Perl 6 – rather an explanation of how some of the intuitiveness of the language works.

As an example, consider these two lines of code:

    say 6 / 3;
    say 'Price: 15 Euro' ~~ /\d+/;

They print out 2 and 15, respectively. For a Perl programmer this is not surprising. But look closer: the forward slash / serves two very different purposes, the numerical division in the first line, and delimits a regex in the second line.

How can Perl know when a / means what? It certainly doesn’t look at the text after the slash to decide, because a regex can look just like normal code.

The answer is that Perl keeps track of what it expects. Most important are two things it expects: terms and operators.

A term can be literal like 23 or "a string". After parser finds such a literal, there can either be the end of a statement (indicated by a semicolon), or an operator like +, * or /. After an operator, the parser expects a term again.

And that’s already the answer: When the parser expects a term, a slash is recognized as the start of a regex. When it expects an operator, it counts as a numerical division operator.

This has far reaching consequences. Subroutines can be called without parenthesis, and after a subroutine name an argument list is expected, which starts with a term. On the other hand type names are followed by operators, so at parse time all type names must be known.

On the upside, many characters can be reused for two different syntaxes in a very convenient way.