Day 2 – Perl 6: Sigils, Variables, and Containers

Having a rudimentary understanding of containers is vital for enjoyable programming in Perl 6. They're ubiquitous and not only do they affect the kind of variables you get, they also dictate how Lists and Maps behave when iterated.

Today, we'll learn what containers are and how to work with them, but first, I'd like you to temporarily forget everything you might know or suspect about Perl 6's sigils and variables, especially if you're coming from Perl 5's background. Everything.

Show Me The Money

In Perl 6, a variable is prefixed with a $ sigil and is given a value with a binding operator (:=). Like so:

my $foo := 42;
say "The value is $foo"; # OUTPUT: «The value is 42␤»

If you've followed my suggestion to forget everything you know, it won't shock you to learn the same applies to List and Hash types:

my $ordered-things := <foo bar ber>;
my $named-things := %(:42foo, :bar<ber>);
say "$named-things<foo> bottles of $ordered-things[2] on the wall";
# OUTPUT: «42 bottles of ber on the wall␤»
.say for $ordered-things; # OUTPUT: «foo␤bar␤ber␤»
.say for $named-things; # OUTPUT: «bar => ber␤foo => 42␤»

Knowing just this, you can write a great variety of programs, so if you ever start to feel like there's just too much to learn, remember you don't have to learn everything at once.

We Wish You a Merry Listmas

Let's try doing more things with our variables. It's not uncommon to want to change a value in a list. How well do we fare with what we have so far?

my $list := (1, 2, 3);
$list[0] := 100;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

Although we can bind to variables, if we attempt to bind to some value, we get an error, regardless of whether the value comes from a List or just, say, a literal:

1 := 100;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

This is how Lists manage to be immutable. However, 'Tis The Season and wishes do come true, so let's wish for a mutable List!

What we need to get a hold of is a Scalar object because the binding operator can work with it. As the name suggests, a Scalar holds one thing. You can't instantiate a Scalar via the .new method, but we can get them by just declaring some lexical variables; don't need to bother giving them names:

my $list := (my $, my $, my $);
$list[0] := 100;
say $list; # OUTPUT: «(100 (Any) (Any))␤»

The (Any) in the output are the default values of the containers (on that, a bit later). Above, it seems we managed to bind a value to a list's element after List's creation, did we not? Indeed we did, but…

my $list := (my $, my $, my $);
$list[0] := 100;
$list[0] := 200;
# OUTPUT: «Cannot use bind operator with this left-hand side […] »

The binding operation replaces the Scalar container with a new value (100), so if we try to bind again, we're back to square one, trying to bind to a value instead of a container again.

We need a better tool for the job.

That's Your Assignment

The binding operator has a cousin: the assignment operator (=). Instead of replacing our Scalar containers with a binding operator, we'll use the assignment operator to assign, or "store", our values in the containers:

my $list := (my $ = 1, my $ = 2, my $ = 3);
$list[0] = 100;
$list[0] = 200;
say $list;
# OUTPUT: «(200 2 3)␤»

Now, we can assign our original values right from the start, as well as replace them with other values whenever we want to. We can even get funky and put different type constraints on each of the containers:

my $list := (my Int $ = 1, my Str $ = '2', my Rat $ = 3.0);
$list[0] = 100; # OK!
$list[1] = 42; # Typecheck failure!
# OUTPUT: «Type check failed in assignment;
# expected Str but got Int (42) […] »

That's somewhat indulgent, but there is one thing that could use a type constraint: the $list variable. We'll constrain it to the Positional role to ensure it can only hold Positional types, like List and Array:

my Positional $list := (my $ = 1, my $ = '2', my $ = 3.0);

Don't know about you, but that looks awfully verbose to me. Luckily, Perl 6 has syntax to simplify it!

Position@lly

First, let's get rid of the explicit type constraint on the variable. In Perl 6, you can use @ instead of $ as a sigil to say that you want the variable to be type-constrained with role Positional:

my @list := 42;
# OUTPUT: «Type check failed in binding;
# expected Positional but got Int (42) […] »

Second, instead of parentheses to hold our List, we'll use square brackets. This tells the compiler to create an Array instead of a List. Arrays are mutable and they stick each of their elements into a Scalar container automatically, just like we did manually in the previous section:

my @list := [1, '2', 3.0];
@list[0] = 100;
@list[0] = 200;
say @list;
# OUTPUT: «[200 2 3]␤»

Our code became a lot shorter, but we can toss out a couple more characters. Just like assigning, instead of binding, to a $-sigiled variable gives you a Scalar container for free, you can assign to @-sigiled variable to get a free Array. If we switch to assignment, we can get rid of the square brackets altogether:

my @list = 1, '2', 3.0;

Nice and concise.

Similar ideas are behind %– and &-sigiled variables. The % sigil implies a type-constraint on Associative role and offers the same shortcuts for assignment (giving you a Hash) and creates Scalar containers for the values. The &-sigiled variables type-constrain on role Callable and assignment behaves similar to $ sigils, giving a free Scalar container whose value you can modify:

my %hash = :42foo, :bar<ber>;
say %hash; # OUTPUT: «{bar => ber, foo => 42}␤»
my &reversay = sub { $^text.flip.say }
reversay '6 lreP ♥ I'; # OUTPUT: «I ♥ Perl 6␤»
# store a different Callable in the same variable
&reversay = *.uc.say; # a WhateverCode object
reversay 'I ♥ Perl 6'; # OUTPUT: «I ♥ PERL 6␤»

The One and Only

Earlier we learned that assignment to $-sigiled variables gives you a free Scalar container. Since scalars, as the name suggests, contain just one thing… what exactly happens if you put a List into a Scalar? After all, the Universe remains unimploded when you try to do that:

my $listish = (1, 2, 3);
say $listish; # OUTPUT: «(1 2 3)␤»

Such behaviour may make it seem that Scalar is a misnomer, but it does actually treat the entire list as a single thing. We can observe the difference in a couple of ways. Let's compare a List bound to a $-sigiled variable (so no Scalar is involved) with one that is assigned into a $-sigiled variable (automatic Scalar container):

# Binding:
my $list := (1, 2, 3);
say $list.perl;
say "Item: $_" for $list;
# OUTPUT:
# (1, 2, 3)
# Item: 1
# Item: 2
# Item: 3
# Assignment:
my $listish = (1, 2, 3);
say $listish.perl;
say "Item: $_" for $listish;
# OUTPUT:
# $(1, 2, 3)
# Item: 1 2 3

The .perl method gave us an extra insight and showed the second List with a $ before it, to indicate it's containerized in a Scalar. More importantly, when we iterated over our Lists with the for loop, the second List resulted in just a single iteration: the entire List as one item! The Scalar lives up to its name.

This behaviour isn't merely of academic interest. Recall that Arrays (and Hashes) create Scalar containers for their values. This means that if we nest things, even if we select an individual list or hash stored inside the Array (or Hash) and try to iterate over it, it'd be treated as just a single item:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say "List Item: $_" for @stuff[0];
say "Hash Item: $_" for @stuff[1];
# OUTPUT:
# List Item: 1 2 3
# Hash Item: bar 70
# foo 42

The same reasoning—that lists and hashes in Scalar containers are a single item—applies when you try to flatten an Array's elements or pass them as an argument to a slurpy parameter:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say flat @stuff;
# OUTPUT: «((1 2 3) {bar => 70, foo => 42})␤»
-> *@args { @args.say }(@stuff)
# OUTPUT: «[(1 2 3) {bar => 70, foo => 42}]␤»

It's this behaviour that can drive Perl 6 beginners up the wall, especially those who come from auto-flattening languages, such as Perl 5. However, now that we know why this behaviour is observed, we can change it!

Decont

If the Scalar container is the culprit, all we need to do is remove it. We need to de-containerize our list and hash, or "decont" for short. In your Perl 6 travels, you'll find several ways to accomplish that, but one way that's designed precisely for that is the decont methodop (<>):

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say "Item: $_" for @stuff[0]<>;
say "Item: $_" for @stuff[1]<>;
# OUTPUT:
# Item: 1
# Item: 2
# Item: 3
# Item: bar 70
# Item: foo 42

It's easy to remember: it looks like a squished box (a trampled container). After retrieving our containerized items by indexing into the Array, we appended the decont and removed the contents from their Scalar containers, causing our loop to iterate over each item in them individually.

If you wish to decont every element of an Array in one go, simply use the hyper operator (», or >> if you prefer ASCII) along with the decont:

my @stuff = (1, 2, 3), %(:42foo, :70bar);
say flat @stuff»<>;
# OUTPUT: «(1 2 3 bar => 70 foo => 42)␤»
-> *@args { @args.say }(@stuff»<>)
# OUTPUT: «[1 2 3 bar => 70 foo => 42]␤»

With the containers removed, our list and hash flattened just like we wanted. And of course, we could have avoided the Array and bound our original List to the variable instead. Since Lists don't put their elements into containers, there's nothing to decont:

my @stuff := (1, 2, 3), %(:42foo, :70bar);
say flat @stuff;
# OUTPUT: «(1 2 3 bar => 70 foo => 42)␤»
-> *@args { @args.say }(@stuff)
# OUTPUT: «[1 2 3 bar => 70 foo => 42]␤»

Don't Let It Slip Away

While we're here, it's worth noting that many people use the slip operator (|), when they want to do the decont (we're not talking about using it when passing arguments to Callables):

my @stuff = (1, 2, 3), (4, 5);
say "Item: $_" for |@stuff[0];
# OUTPUT:
# Item: 1
# Item: 2
# Item: 3

Although it gets the job done as far as deconting goes, it can introduce subtle bugs that could be very difficult to track down. Try to spot one here, in a program that iterates over an infinite list of non-negative integers and prints those that are prime:

my $primes = ^.grep: *.is-prime;
say "$_ is a prime number" for |$primes;

Give up? This program leaks memory… very slowly. Even though, we're iterating over an infinite list of items, that's not an issue because .grep method returns a Seq object that doesn't keep already-iterated items around and so memory usage never grows there.

The problematic part is our | slip operator. It converts our Seq into a Slip, which is a type of a List and keeps around all of the values we already consumed. Here's a modified version of the program that grows faster, if you wanted to see that growth in htop:

# CAREFUL! Don't consume all of your resources!
my $primes = ^.map: *.self;
Nil for |$primes;

Let's try it again, but this time using the decont method op:

my $primes = ^.map: *.self;
Nil for $primes<>;

The memory usage is stable now and the program can sit there and iterate until the end of times. Of course, since we know it's the Scalar container that causes containerization and we wish to avoid it here, we can simply bind the Seq to the variable instead:

my $primes := ^.map: *.self;
Nil for $primes;

I Want Less

If you detest sigils, Perl 6 got something you can smile about: sigil-less variables. Just prefix the name with a backslash during declaration, to indicate you don't want no stinkin' sigils:

my= 42;
say Δ²; # OUTPUT: «1764␤»

You don't get any free Scalars with such variables and so, during declaration, it makes no difference between binding or assignment to them. They behave similar to how binding a value to a $-sigiled variable behaves, including the ability to bind Scalars and make the variable mutable:

my= my $ = 42;
Δ = 11;
say Δ²; # OUTPUT: «121␤»

A more common place where you might see such variables is as parameters of routines, here, these mean you want is raw trait applied to the parameter. The meaning exists for the + positional slurpy parameter as well (no backslash is needed), where having it is raw means you won't get unwanted Scalar containers due to the slurpy being an Array as it has the @ the sigil:

sub sigiled ($x is raw, +@y) {
$x = 100;
say flat @y
}
sub sigil-less (\x, +y) {
x = 200;
say flat y
}
my $x = 42;
sigiled $x, (1, 2), (3, 4); # OUTPUT: «((1 2) (3 4))␤»
say $x; # OUTPUT: «100␤»
sigil-less $x, (1, 2), (3, 4); # OUTPUT: «(1 2 3 4)␤»
say $x; # OUTPUT: «200␤»

Defaulting on Default Defaults

One awesome feature offered by containers is default values. You may have heard that in Perl 6 Nil signals the absence of a value and not a value in itself. Container defaults is where it comes into play:

my $x is default(42);
say $x; # OUTPUT: «42␤»
$x = 10;
say $x; # OUTPUT: «10␤»
$x = Nil;
say $x; # OUTPUT: «42␤»

A container's default value is given to it using the is default trait. Its argument is evaluated at compile time and the resultant value is used whenever the container lacks a value. Since Nil's job is to signal just that, assigning a Nil into a container will result in the container containing its default value, not a Nil.

Defaults can be given to Array and Hash containers just the same and if you wish your containers to contain a Nil literally, when no value is present, just specify Nil as a default:

my @a is default<meow> = 1, 2, 3;
say @a[0, 2, 42]; # OUTPUT: «(1 3 meow)␤»
@a[0]:delete;
say @a[0]; # OUTPUT: «meow␤»
my %h is default(Nil) = :bar<ber>;
say %h<bar foos>; # OUTPUT: «(ber Nil)␤»
%h<bar>:delete;
say %h<bar> # OUTPUT: «Nil␤»

The container's default has a default default: the explicit type constraint that's present on the container:

say my Int $y; # OUTPUT: «(Int)␤»
say my Mu $z; # OUTPUT: «(Mu)␤»
say my Int $i where *.is-prime; # OUTPUT: «(<anon>)␤»
$i.new; # OUTPUT: (exception) «You cannot create […]»

If no explicit type constraint is present, the default default is an Any type object:

say my $x; # OUTPUT: «(Any)␤»
say $x = Nil; # OUTPUT: «(Any)␤»

Note that the default values you may use in routine signatures for optional parameters are not the container defaults and assigning Nil to subroutine arguments or into parameters will not utilize the defaults from the signature.

Customizing

If the standard behaviour of containers doesn't suit your needs, you can make your own container, using the Proxy type:

my $collector := do {
my @stuff;
Proxy.new: :STORE{ @stuff.push: @_[1] },
:FETCH{ @stuff.join: "|" }
}
$collector = 42;
$collector = 'meows';
say $collector; # OUTPUT: «42|meows␤»
$collector = 'foos';
say $collector; # OUTPUT: «42|meows|foos␤»

The interface is somewhat clunky, but it gets the job done. We create the Proxy object using method .new that takes two required named arguments: STORE and FETCH, each taking a Callable.

The FETCH Callable gets called whenever a value is read from the container, which can happen more times than is immediately apparent: in the code above, the FETCH Callable is called 10 times as the container percolates through dispatch and routines of the two say calls. The Callable is called with a single positional argument: the Proxy object itself.

The STORE Callable gets called whenever a value is stored into our container, for example, with an assignment operator (=). The first positional argument to the Callable is the Proxy object itself, and the second argument is the value that was given to be stored.

We'd like STORE and FETCH Callables to share the @stuff variable, so we use the do statement prefix with a code block to contain it all nicely inside.

We bind our Proxy to a variable and the rest is just normal variable usage. The output shows the altered behaviour our custom container provides.

Proxies are also handy as a return value from methods to provide extra behaviour with mutable attributes. For example, here's an attribute that from the outside appears to be just a normal mutable attribute, but actually coerces its value from an Any type to an Int

class Foo {
has $!foo;
method foo {
Proxy.new: :STORE(-> $, Int() $!foo { $!foo }),
:FETCH{ $!foo }
}
}
my $o = Foo.new;
$o.foo = ' 42.1e0 ';
say $o.foo; # OUTPUT: «42␤»

Quite sweet! And if you want a Proxy with a better interface with a few more features under its belt, check out the Proxee module.

That's All, Folks

That about covers it all. The remaining beasts you'll see in the land of Perl 6 are "twigils": variables with TWO symbols before the name, but as far as containers go, they behave the same as the variables we've covered. The second symbol simply indicates additional information, such as whether the variable is an implied positional or named parameter…

sub test { say "$^implied @:parameters[]" }
test 'meow', :parameters<says the cat>;
# OUTPUT: «meow says the cat␤»

…or whether the variable is a private or public attribute:

with class Foo {
has $!foo = 42;
has @.bar = 100;
method what's-foo { $!foo }
}.new {
say .bar; # OUTPUT: «[100]␤»
say .what's-foo # OUTPUT: «42␤»
}

That's a journey for another day, however.

Conclusion

Perl 6 has a rich system of variables and containers that differs vastly from Perl 5. It's important to understand the way it works, as it affects the way iteration and flattening of lists and hashes behaves.

Assignment to variables offers valuable shortcuts, such as providing Scalar, Array, or Hash containers, depending on the sigil. Binding to variables allows you to bypass such shortcuts, if you so require.

Sigil-less variables exist in Perl 6 and they have similar behaviour to how $-sigiled variables with binding work. When used as parameters, these variables behave like is raw trait was applied to them.

Lastly, containers can have default values and it's possible to create your own custom containers that can either be bound to a variable or returned from a routine.

Happy Holidays!

Day 1 – The Grinch of Perl 6: A Practical Guide to Ruining Christmas

Look at them! All smiling and happy. Coworkers, friends, and close family members. All enjoying programming in Perl 6 version 6.c "Christmas". Great concurrency primitives, core grammars, and a fantastic object model. It sickens me!

But wait a second… wait just a second. I got an idea. An awful idea. I got a wonderful, awful idea! We can ruin their "Christmas". All we need is a few tricks up our sleeves. Muahuahahaha!!


Welcome to the 2017th Perl 6 Advent Calendar! Each day, from today until Christmas, we'll have an awesome blog post about Perl 6 lined up for you.

Today, we'll show our naughty side and purposefully do naughty things. Sure, these have good uses, but being naughty is a lot more fun. Let's begin!

But True does False

Have you heard of the but operator? A fun little thing:

say True but False ?? 'Tis true' !! 'Tis false';
# OUTPUT: «Tis false␤»
my $n = 42 but 'forty two';
say $n; # OUTPUT: «forty two␤»
say $n + 7; # OUTPUT: «49␤»

It's an infix operator that first clones the object on the left hand side and then mixes in a role provided on the right hand side into the clone:

my $n = 42 but role Evener {
method is-even { self %% 2 }
}
say $n.is-even; # OUTPUT: «True␤»
say $n.^name; # OUTPUT: «Int+{Evener}␤»

Those aren't roles in the first two examples above. The but operator has a handy shortcut: if the thing on the right isn't a role, it creates one for you! The role will have a single method, named after the .^name of the object on the right hand side, and the method will simply return the given object. Thus, this…

put True but 'some boolean'; # OUTPUT: «some boolean␤»

…is equivalent to:

put True but role {
method ::(BEGIN 'some boolean'.^name) {
'some boolean'
}
} # OUTPUT: «some boolean␤»

The .^name on our string returns Str, since it's a Str object:

say 'some boolean'.^name; # OUTPUT: «Str␤»

And so the role provides a method named Str, which put calls on non-Str objects to obtain a stringy value to output, causing our boolean to have an altered stringy representation.

As an example, string '0' is True in Rakudo Perl 6 but is False in Pumpkin Perl 5. Using the but operator, we can alter a string to behave like Perl 5's version:

role Perl5Str {
method Bool {
nextsame unless self eq '0';
False
}
}
sub perlify { $^v but Perl5Str };
say so perlify 'meows'; # OUTPUT: «True␤»
say so perlify '0'; # OUTPUT: «False␤»
say so perlify ''; # OUTPUT: «False␤»

The role provides the .Bool method that the so routine calls. Inside the method, we re-dispatch to the original .Bool method using nextsame routine unless the string is a '0', in which case we simply return False.

The but operator has a brother: an infix does operator. It behaves very similarly, except it does not clone. (N.B.: the shortcut for automatically making roles from non-roles is available in does only on bleeding edge Rakudo, version 2017.11-1-g47ebc4a and up)

my $o = class { method stuff { 'original' } }.new;
say $o.stuff; # OUTPUT: «original␤»
$o does role { method stuff { 'modded' } };
say $o.stuff; # OUTPUT: «modded␤»

Some of the things in a program are globally accessible and in some implementations (e.g. Rakudo), certain constants are cached. This means we can get quite naughty in a separate part of a program and those Christmas celebrators won't even know what hit 'em!

How about, we override what the prompt routine reads? They like Christmas? We'll give them some Christmas trees:

$*IN does role { method get { "🎄 {callsame} 🎄" } }
my $name = prompt "Enter your name: ";
say "You entered your name as: $name";
# OUTPUT
# Enter your name: (typed by user:) Zoffix Znet
# You entered your name as: 🎄 Zoffix Znet 🎄

That override will work even if we stick it into a module. We can also kick it up a notch and mess with enums and cached constants, though this naughtiness likely won't be able to cross the module boundary and other implementation-specific cache invalidation:

True does False;
say 42 ?? "tis true" !! "tis false";
# OUTPUT: «tis true␤»

So far, that didn't quite have the wanted impact, but let's try coercing our number to a Bool:

True does False;
say 42.Bool ?? "tis true" !! "tis false";
# OUTPUT: «tis false␤»

There we go! And now, for the final Grinch-worthy touch, we'll mess with numerical results of computations on numbers. Rakudo caches Int constants. Infix + operator also uses the internal-ish-ish .Bridge method when computing with numerics of different types. So, let's override the .Bridge on our constant to return something funky:

BEGIN 42 does role { method Bridge { 12e0 } }
say 42 + 15; # OUTPUT: «57␤»
say 42 + 15e0; # OUTPUT: «27␤»

That's proper evil, sure to ruin any Christmas, but we're only getting started…

Wrapping It Up

What kind of Christmas would it be without wrapped presents?! Oh, for presents we shall have and Perl 6's .wrap method provided by Routine type will let us wrap 'em up, oh so good.

use soft;
sub foo { say 'in foo' }
&foo.wrap: -> | {
say 'in the wrap';
callsame;
say 'back in the wrap';
}
foo;
# OUTPUT:
# in the wrap
# in foo
# back in the wrap

We enable use soft pragma to prevent unwanted inlining of routines that would otherwise interfere with our wrap. Then, we use a routine we want to wrap as a noun by using it with its & sigil and call the .wrap method that takes a Callable.

The given Callable's signature must be compatible with the one on the wrapped routine (or its proto if it's a multi); otherwise we'd not be able to both dispatch to the routine correctly and call the wrapper with the args. In the example above, we simply use an anonymous Capture (|) to accept all possible arguments.

Inside the Callable we have two say calls and make use of callsame routine to call the next available dispatch candidate, which happens to be our original routine. This comes in handy, since were we to attempt to call foo by its name inside the wrapper, we'd start the dispatch over from scratch, resulting in an infinite dispatch loop.

Since methods are Routines, we can wrap them as well. We can get a hold of the Method object using the .^lookup meta method:

IO::Handle.^lookup('print').wrap: my method (|c) {
my &wrapee = nextcallee;
wrapee self, "🎄 Ho-ho-ho! 🎄\n";
wrapee self, |c
};
print "Hello, World!\n";
# OUTPUT:
# 🎄 Ho-ho-ho! 🎄
# Hello, World!

Here, we grab the .print method from IO::Handle type and .wrap it. We wish to make use of self inside the method, so we're wrapping using a standalone method (my method …) instead of a block or a subroutine. The reason we want to have self is to be able to call the very method we're wrapping to print our Christmassy message. Because our method is detached, the callwith and related routines will need self fed to them along with the rest of the args, to ensure we continue dispatch to the right object.

Inside the wrap, we use the nextcallee routine to obtain the original method.If it's a multi, we'll get the proto, not a specific candidate that best matches the original arguments, so the next candidate ordering is slightly different inside the wrap, compared to traditional routines. We grab the nextcallee in to a variable, because we want to call it more than once and calling it shifts the routine off the dispatch stack. In the first call, we print our Christmassy message and in the second call, we merely slip our Capture (|c) of original args, performing the call like it were originally meant to happen.

Thanks to the .wrap, we can alter or even completely redefine behaviour of subroutines and methods, which is sure to be jolly fun when your friends try to use them. Ho-ho-ho!

Invisibility Cloak

The tricks we've played so far are wonderfully terrible, but they're just too obvious and too… visible. Since Perl 6 has superb Unicode support, I think we should search the mass of Unicode characters for some fun mischief. In particular, we're looking for invisible characters that are NOT whitespace. Just one is sufficient for our purpose, but these four are fairly invisible on my computer:

[⁠] U+2060 WORD JOINER [Cf]
[⁡] U+2061 FUNCTION APPLICATION [Cf]
[⁢] U+2062 INVISIBLE TIMES [Cf]
[⁣] U+2063 INVISIBLE SEPARATOR [Cf]

Perl 6 supports custom terms and operators that can consist of any characters, except whitespace. For example, here's my patented Shrug Operator:

sub infix:<¯\(°_o)/¯> {
($^a, $^b).pick
}
say 'Coke' ¯\(°_o)/¯ 'Pepsi';
# OUTPUT: «Pepsi␤»

And here's a term, made out of non-identifier characters (we could've used the actual characters in the definition as well):

sub term:«\c[family: woman woman boy boy]» {
'♫ We— are— ♪ faaaamillyyy ♬'
}
say 👩‍👩‍👦‍👦;
# OUTPUT: «♫ We— are— ♪ faaaamillyyy ♬»

With our invisible, non-whitespace characters in hand, we can make invisible operators and terms!

sub infix:«\c[INVISIBLE TIMES]» { $^a × $^b }
my \r = 42;
say "Area of the circle is " ~ π⁢r²;
# OUTPUT: «Area of the circle is 5541.76944093239␤»

Let's make a Jolly module that will export some invisible terms and operators. We'll then sprinkle them into our Christmassy friends' code:

unit module Jolly;
sub term:«\c[INVISIBLE TIMES]» is export { 42 }
sub infix:«\c[INVISIBLE TIMES]» is export {
$^a × $^b
}
sub prefix:«\c[INVISIBLE SEPARATOR]» (|)
is looser(&[,]) is export
{
say "Ho-ho-ho!";
}

We've used the same character for the term and the infix operator. That's fine, as Perl 6 has fairly strict expectation of terms being followed by operators and vice versa, so it'll know when we meant to use the term or when to use the infix operator. Here's the resultant Grinch code, along with the output it produces:

say 42⁢⁢;
# OUTPUT:
# 1764
# Ho-ho-ho!

That'll sure be fun to debug! Here's a list of characters in that line of code, for you to see where we've used our invisible goodies:

.say for '⁣say 42⁢⁢;'.uninames;
# OUTPUT:
# INVISIBLE SEPARATOR
# LATIN SMALL LETTER S
# LATIN SMALL LETTER A
# LATIN SMALL LETTER Y
# SPACE
# DIGIT FOUR
# DIGIT TWO
# INVISIBLE TIMES
# INVISIBLE TIMES
# SEMICOLON

Ho-Ho-Ho

Productivity at Christmas time drops to a standstill. People have the Holidays and the New Year on their minds. Wouldn't surprise me to see a whole bunch of TODO comments in all the codes. But what if we were able to detect and complain about them? There's nothing more Grinch-like than aborting program compilation whenever someone is feeling lazy!

Perl 6 has Slangs. It's an experimental feature that currently does not have an officially supported interface, however, for our purpose, it'll do just fine.

Using Slangs, it's possible to lexically mutate Perl 6's grammar and introduce language features and behaviour, just like a Perl 6 core developer would:

BEGIN $*LANG.refine_slang: 'MAIN',
role SomeExtraGrammar {
token term:sym<meow> {
'This is not a syntax error'
}
},
role SomeExtraActions {
method EXPR (Mu $/) {
say "Parsed expression: " ~ $/;
nextsame
}
}
This is not a syntax error;
say 'hehe'
# OUTPUT:
# Parsed expression: This is not a syntax error
# Parsed expression: 'hehe'
# Parsed expression: say 'hehe'
# hehe

The "experimental" part of the Slangs feature largely lies in having to rely on the structure of core Grammar and core Actions; currently there's no official guarantee those will remain unchanged, which makes Slangs fragile.

For our naughty, Grinchy trick, we'll be modifying behaviour of comments and if we read the code to trace what calls the comment token, we'll find it's actually part of the redefined ws token, which, as you may know from everyday Perl 6 grammars, is responsible for whitespace matching in, among other things, grammar rules.

This complicates the matter slightly, as ws is such a cornerstone token that, along with comp_unit, statementlist, and statement, it can't be modified in the mainline (code outside routines and blocks). The reason is the Slang is loaded after the mainline is already being parsed using the stock version of these tokens. The tokens inside statement token can be changed even in the mainline, because statement token reblesses the grammar, but ws does not get such luxury.

Since we're starting to tread far into the deep end… enough talk! Let's code:

BEGIN $*LANG.refine_slang: 'MAIN', role {
token comment:sym<todo> {
'#' \s* 'TODO' ':'? \s+ <( \N*
{ die "Ho-ho-ho! I think you were"
~ " meant to finish " ~ $/ }
}
}
sub business-stuff {
# TODO: business stuff
}
# OUTPUT:
# ===SORRY!===
# Ho-ho-ho! I think you were meant to finish business stuff

We use the BEGIN phaser to make the Slang modification happen at compile time, since we're trying to affect how further compilation is performed.

We added a new proto token comment:sym<todo> to core Perl 6 grammar that matches content similar to what a regular comment would match, except it also looks for the TODO our Christmassy friends decided to leave around. The \N* atom captures whatever string the user typed after the TODO and the <( match capture marker tells the compiler to exclude the previously matched stuff in the token from the captured text inside the Match object stored in the $/ variable.

At the end of the token, we simply use a code block to die with a message that tells the user to finish up their TODO. Quite crafty!

Since we'd rather the user not notice our jolly tricks, let's stick the Slang into a module that's to be loaded by the target code. We'll just make a slight tweak to the original code:

# File: ./Jolly.pm6
sub EXPORT {
$*LANG.refine_slang: 'MAIN', role {
token comment:sym<todo> {
'#' \s* 'TODO' ':'? \s+ <( \N*
{ die "Ho-ho-ho! I think you were"
~ " meant to finish " ~ $/ }
}
}
Map.new
}
# File: ./script.p6
use lib <.>;
use Jolly;
sub business-stuff {
# TODO: business stuff
}
# OUTPUT:
# ===SORRY!===
# Ho-ho-ho! I think you were meant to finish business stuff

We want the slang to run at the compilation time of the script, not the module, so we removed the BEGIN phaser and instead stuck the code to be inside sub EXPORT, which will run when the module is used during script's compilation. The Map.new is just how I prefer to write {} in EXPORT sub, to indicate we do not wish to export any symbols. In our script, we now merely have to use the module and the Slang gets activated. Awesome!

Conclusion

Today, we started off the 2017 Perl 6 Advent Calendar by being naughty Grinches and messing with users' programs. We mutated objects using but and does operators. Wrapped methods and subroutines with our custom routines that implemented extra features. Made invisible terms and operators. And even mutated the language itself to do our bidding.

Over the next 23 days, we'll see more Perl 6 Advent articles, so be sure to check back. And maybe, by the end of it all, our Grinchy hearts will grow three sizes…

-Ofun

Day 11 — Perl 6 Core Hacking: It Slipped Through The QASTs

One of the great things about Perl 6 is how accessible to regular users the compiler is to hack on. Easy bugs require nothing but knowledge of Perl 6 itself, since a lot of the Rakudo compiler is written in Perl 6. Slightly tougher bugs require knowledge of NQP, and tougher still are bugs involving Grammar and Actions. Things progress further in difficulty from there, going as far as assembly hacking on VM level, but today, we’ll stick around in Rakudo land. We have a task at hand!

Santa is having some difficulties generating his Naughty-or-Nice list due to a bug in Perl 6. He traced it down to the use of the S/// substitution operator with :g modifier that for some reason returns an empty list instead of the original string if no matches were made:

say S:g/naughty// with 'only nice list';
# OUTPUT: ()

Time to dig in and fix this, if anyone is to get any presents this year!

The Bots To The Rescue

The first thing to do when fixing a bug is to find out when it first appeared. AlexDaniel++ and MasterDuke++ implemented several IRC bots that make this task extremely simple. They are available in #perl6 and #perl6-dev IRC channels, and you can play with them in #zofbot IRC channel, without annoying anyone. We’ll be using bisectable6 bot to find out when the S/// operator got broken:

<Zoffix> bisectable6, help
<bisectable6> Zoffix, Like this: bisectable6: old=2015.12 new=HEAD exit
    1 if (^∞).grep({ last })[5] // 0 == 4 # RT128181

Just give the bot a piece of code, optionally specifying the starting and ending commits, and it’ll bisect by either the exit code, or failing at that, by output.

<Zoffix> bisectable6, S:g/d// given 'abc'
<bisectable6> Zoffix, Bisecting by output (old=2016.10 new=524368c)
    because on both starting points the exit code is 0
<bisectable6> Zoffix, bisect log:
    https://gist.github .com/c2cf9c3a7b6d13a43c34f64b96090e31
<bisectable6> Zoffix, (2016-10-23)
    https://github  .com/rakudo/rakudo/commit/b7201a8f22338a906f2d8027a21387e8f5c77f41

The last link is the interesting bit, it tells us the S:g/// was working fine until that commit. The commit does seem related—it’s the refactor lizmat++ did to make .match 150%–1400% faster—but it’s quite big and it’s not obvious how it’s linked to the workings of the S/// operator. Let’s find out, shall we?

How Do You Spell That?

We can specify the --target command line argument to perl6 executable to ask it for the output of a particular stage of the program (run perl6 --statestats -e '' to see names of all stages). Let’s output the parse stage, to find out which tokens in the Grammar we should look into:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 --target=parse -e 'S:g/d//'
- statementlist: S:g/d//
  - statement: 1 matches
    - EXPR: S:g/d//
      - value: S:g/d//
        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g
          - sibble: /d//
            - right:
            - babble:
              - B:
            - left: d
              - termseq: d
                - termaltseq: d
                  - termconjseq: 1 matches
                    - termalt: 1 matches
                      - termconj: 1 matches
                        - termish: 1 matches
                          - noun: 1 matches
                            - atom: d

There are some general tokens in the output—such as statementlist, statement, EXPR, and value—we can just gloss over. They are about statements and we want stuff for the operator itself, so the interesting bit start with this:

        - quote: S:g/d//
          - sym: S
          - rx_adverbs: :g
            - quotepair: 1 matches
              - identifier: g

Let’s pop open the Grammar in our text editor and locate a token called quote. It can also be a rule, regex or method, but tokens are most common. The first thing we can locate is this:

proto token quote { <...> }
token quote:sym<apos>  {
    :dba('single quotes') "'" ~ "'"    
    <nibble(self.quote_lang(%*LANG<Quote>, "'", "'", ['q']))>
}
token quote:sym<sapos> {
    :dba('curly single quotes') "‘" ~ "’"
    <nibble(self.quote_lang(%*LANG<Quote>, "", "", ['q']))>
}

The Grammar that parses Perl 6 isn’t much different from the grammar you’d use as a user of Perl 6, so most of it probably looks familiar to you. The quote token is a proto regex, so looking further at the output --target=parse gave us, we see we need :sym<S> variant of it.

Scrolling a bit through the quote‘s candidates, we finally come across :sym<s> that sets the <sym> capture to either s or S:

token quote:sym<s> {
    <sym=[Ss]> (s)**0..1
    :my %*RX;
    :my $*INTERPOLATE := 1;
    :my $*SUBST_LHS_BLOCK;
    :my $*SUBST_RHS_BLOCK;
    {
        %*RX<s> := 1 if $/[0]
    }
    <.qok($/)>
    <rx_adverbs>
    <sibble(%*RX<P5> ?? %*LANG<P5Regex> !! %*LANG<Regex>, %*LANG<Quote>, ['qq'])>
    [ <?{ $<sibble><infixish> }> || <.old_rx_mods>? ]

}

So this token handles both s/// and S/// operators and its body is unimpressive: it seems to be no more than than some set up work. With the name of the token in hand, we now know what to look for in the Actions: method quote:sym<s>.

While finding it in actions is easy… it’s quite a biggie, with 177 lines of code to its name. However, someone nice left us a comment that fits into our puzzle:

method quote:sym<s>($/) {
    # We are emulating Str.subst/subst-mutate here, by calling match,
    # assigning the result to a temporary variable etc.
    ...

Recall bisectable6‘s results? The commit it pointed out was work on the .match method and according to the comment for S/// operator, it uses .match to do its stuff. Let’s execute that method on builds before and after the commit bisectable6 found for us. There’s another handy bot to do that for us: commitable6.

Give it a commit SHA or one of the release tags along with code to run and it’ll give you output for that code on that commit:

<Zoffix> committable6, 2016.11 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.11»: ()

<Zoffix> committable6, 2016.10 say 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

We ran the code on 2016.11 and 2016.10 releases and the output indicates there’s no difference… or is there? The problem with using say as a debugging tool is it often omits things we may find useful. A better alternative is the dd routine that is a Rakudo-specific utility dumper sub that’s not part of standard Perl 6 language. Give it some args and it’ll dump them out. Let’s give it a spin:

<Zoffix> committable6, 2016.11 dd 'abc'.match: :g, /d/
<comittable6> Zoffix, ¦«2016.11»: slip()

<Zoffix> committable6, 2016.10 dd 'abc'.match: :g, /d/
<committable6> Zoffix, ¦«2016.10»: ()

Aha! Another puzzle piece! When :g adverb is in use, on failed matches .match used to return an empty list, but after lizmat++’s .match improvements, it started to return Empty, which is an empty Slip. Slips tend to flatten themselves out into the outer container, so perhaps that’s causing an issue in the S///‘s action method? Let’s take a closer look at it.

Slippety Slip

A bird’s-eye view of method quote:sym<s> action shows it does some setup work and then codegens a QAST (“Q” Abstract Syntax Tree). It’d be helpful to take a look at what it generates.

One method of doing so is using the same --target feature we’ve used to get the parse stage, except we’d use the ast stage. So the command would be this:

perl6 --target=ast -e 'S:g/d//'

If you actually run that, you’ll get a text wall of QAST, and it may be tough to spot which are the bits actually generated by the S/// operator. Luckily, there’s a better way! The QAST node objects have .dump method that dumps them the same style as what you see in --target=ast output. So checkout the compiler’s repo if you haven’t already done so, pop open src/Perl6/Actions.nqp file, go to the end of method quote:sym<s> and stick note($past.dump) in there to print the dump of the QAST generated for the S/// operator:

    ...
    );
    $past.annotate('is_S', $<sym> eq 'S');
    note($past.dump); # <----------------------- like that
    make WANTED($past, 's///');  # never carp about s/// in sink context
}

(Why is it called $past and not $qast? Historical reasons: QAST used to be PAST, for Parrot Abstract Syntax Tree).

Now, compile Rakudo:

perl Configure.pl --gen-moar --gen-nqp --backends=moar
make
make test
make install

And execute our buggy S/// match to make the line we added print out S///‘s QAST:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'S:g/d// given "abc"'
- QAST::Op(locallifetime)  :is_S<?> S:g/d//
  - QAST::Stmt
    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)
    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)
    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)
      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>
    - QAST::Stmt
    - QAST::Var(lexical $/)

There are docs for types of QAST you see here, or we can just wing it.

We callmethod match and bind the result to subst_result_1:

    - QAST::Var(local subst_result_1 :decl(var))
    - QAST::Op(bind)
      - QAST::Var(local subst_result_1)
      - QAST::Op(callmethod match)  S:g/d//
        - QAST::Var(lexical $_) <wanted>
        - QAST::WVal(Regex)  :code_object<?> :past_block<?>
        - QAST::IVal+{QAST::SpecialArg}(1 :named<g>)

We call nqp::p6store (p6* ops are documented in Rakudo’s repo), giving it the result of infix:<,>($/) as container and the return of .match call as value:

    - QAST::Op(p6store)
      - QAST::Op(call &infix:<,>)
        - QAST::Var(lexical $/)
      - QAST::Var(local subst_result_1)

We check if anything matched (for :g matches, we check for a Positional that has any .elems in it):

    - QAST::Op(if)
      - QAST::Op(unless)
        - QAST::Op(istype)
          - QAST::Var(local subst_result_1)
          - QAST::WVal(Match)
        - QAST::Op(if)
          - QAST::Op(istype)
            - QAST::Var(local subst_result_1)
            - QAST::WVal(Positional)
          - QAST::Op(callmethod elems)
            - QAST::Var(local subst_result_1)

If we did have matches, call Str!APPLY-MATCHES:

      - QAST::Op(call &infix:<=>)
        - QAST::Var(lexical $/) <wanted>
        - QAST::Op(callmethod dispatch:<!>)
          - QAST::Op(callmethod Str)
            - QAST::Var(lexical $_) <wanted>
          - QAST::SVal(APPLY-MATCHES)
          - QAST::WVal(Str)
          - QAST::Var(local subst_result_1)
          - QAST::Op(p6capturelex)  :code_object<?> :past_block<?>
            - QAST::Op(callmethod clone)
              - QAST::WVal(Code)  :code_object<?> :past_block<?>
          - QAST::Var(lexical $/)
          - QAST::IVal(1)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)
          - QAST::IVal(0)

If we didn’t have matches, call nqp::p6store, storing the $_ (this is our original string S/// works on) in the $/:

      - QAST::Op(p6store)
        - QAST::Op(call &infix:<,>)
          - QAST::Var(lexical $/)
        - QAST::Var(lexical $_) <wanted>

Since we know the commit bisectable6 found makes .match return an empty slip for failed matches, it’s that last bit of QAST that should look suspicious, since slips flatten themselves out. We’ll return to why we’re storing into an &infix:<,>($/) rather than into $/ directly, but first, let’s write the NQP equivalent of such a setup.

We have two variables: $/ with Empty and $_ with our original string. The QAST::Op node maps out to an nqp op with the same name, so our suspicious bit looks something like this:

use nqp;

$_ = 'abc';
$/ = Empty;
nqp::p6store( &infix:<,>($/), $_);

Yet another helpful bot, camelia, lets us run a piece of code straight from IRC. Just use trigger m: with some code. Let’s try it out:

<Zoffix> m: use nqp; $_ = 'abc'; $/ = Empty;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Slip $/ = slip$()␤»

<Zoffix> m: use nqp; $_ = 'abc'; $/ = List.new;
    nqp::p6store( &infix:<,>($/), $_); dd $/;
<camelia> rakudo-moar ea2884: OUTPUT«Str $/ = "abc"␤»

The results show that when $/ is an Empty, it ends up still being it after the p6store, while if $/ is an empty List, it happily takes a string. We finally connected the S/// operator with the commit that introduced the bug and found why it occurs (although, slips behaving like that may be a bug of its own). Let’s trace where that Empty in Str.match comes from and why it’s there.

What Sourcery Is This?

There’s another bot (it’s the future! people have lots of bots!), SourceBaby, that can give you a link to source code for a routine. It uses CoreHackers::Sourcery module under the hood and takes arguments to give to its sourcery routine. Trigger it with the s: trigger:

<Zoffix> s: 'abc', 'match', \(/d/, :g)
<SourceBaby> Zoffix, Sauce is at
    https://github.com/rakudo/rakudo/blob/164eb42/src/core/Str.pm#L946

We gave it an object to call a method on (a Str), a string with the method name (match), and a Capture with which arguments the method is to be called. In return, it gave a URL to the multi that handles those args:

multi method match(Regex:D $pattern, :global(:$g)!, *%_) {
    nqp::if(
      nqp::elems(nqp::getattr(%_,Map,'$!storage')),
      self!match-cursor(nqp::getlexcaller('$/'),
        $pattern($cursor-init(Cursor,self,:0c)), 'g', $g, %_),
      nqp::if(
        $g,
        self!match-list(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)),
          CURSOR-GLOBAL, POST-MATCH),
        self!match-one(nqp::getlexcaller('$/'),
          $pattern($cursor-init(Cursor,self,:0c)))
      )
    )
}

No Empty here, but we can see that when $g is true, we call self!match-list. It’s a private method, so SourceBaby would not be able to help with it. Let’s find it by searching the same source file:

# Create list from the appropriate Sequence given the move
method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      Empty,
    ))
}

And there’s our Empty! The commit message doesn’t mention why we changed from an empty List to an Empty, there are no comments in the source code explaining it, so we’ll have to resort to the most technologically non-advanced debugging tool in our arsenal… asking people.

The Dev IRC Channel

If you have questions about core development, join #perl6-dev IRC channel on Freenode. In this case, we can ask lizmat++ if she remembers whether there was a reason for that Empty.

If the person you’re trying to reach isn’t currently online, you can use the messaging bot, using the .tell trigger, followed by the person’s nick, followed by message. When the bot sees the person talk, it will deliver the message.

<babydrop> .ask stmuk_ so is `zef` now the installer being
    shipped with R*? I notice our REPL message still
    references panda; wondering if that should read zef now
<yoleaux2> babydrop: I'll pass your message to stmuk_.

After the discussion about the Empty, there doesn’t appear to be any specific reason to return it in this case, so we’ll change it to return an empty List instead, just as its old behavior was, and that will also fix our bug. The new !match-list then looks like this:

method !match-list(\slash, \cursor, \move, \post) {
    nqp::decont(slash = nqp::if(
      nqp::isge_i(nqp::getattr_i(cursor,Cursor,'$!pos'),0),
      Seq.new(POST-ITERATOR.new(cursor, move, post)).list,
      List.new,
    ))
}

Compile the compiler; this time we can just run make install, since everything is already configured and pre-built from the last time we compiled:

make install

Check the change did fix the bug:

zoffix@VirtualBox:~/CPANPRC/rakudo$ ./perl6 -e 'say S:g/naughty// with "only nice list"'
only nice list

And run the test suite:

TEST_JOBS=6 make spectest

The TEST_JOBS env var lets you run multiple test files at once and the optimal value to set it at is around 1.3 times the number of cores in your computer. If you have a very meaty box (or endless patience), you can run make stresstest instead, for a more thorough test run.

With the spectest passing all of it’s tests, we are ready to finish off our work.

Test It!

The test suite is located in t/spec and is automatically checked out from its repo when you run make spectest. You can also simply delete that directory and clone your own fork as t/spec instead.

Usually, it’s easy to locate the file where the test can go into by running tree -f | grep 'some search term'. We fixed an issue with substitutions, so let’s go for this:

zoffix@VirtualBox:~/CPANPRC/rakudo/t/spec$ tree -f | grep subst
│   ├── ./integration/substr-after-match-in-gather-in-for.t
├── ./S05-substitution
│   ├── ./S05-substitution/67222.t
│   ├── ./S05-substitution/match.t
│   ├── ./S05-substitution/subst.rakudo.moar
│   └── ./S05-substitution/subst.t
│   ├── ./S32-str/substr-eq.t
│   ├── ./S32-str/substr-rw.rakudo.moar
│   ├── ./S32-str/substr-rw.t
│   ├── ./S32-str/substr.t

The ./S05-substitution/subst.t file looks like a decent candidate, pop it open. Bump the plan at the top of file by the number of tests you’re adding, then add the test at the end of the file (or a more appropriate spot):

plan 185;

...

is-deeply (S:g/FAIL// with 'foo'), 'foo',
    'S:g/// returns original string on failure to match';

Run the test file, to ensure everything passes:

make t/spec/S05-substitution/subst.t

And commit! We’re done! Santa’s Naughty-or-Nice list shall work fine from now on.

The Final Mystery

Recall that &infix:<,>($/) thing that was causing the bug when $/ contained an Empty? So what is that all about?

If you don’t know something about Perl 6, just come to our #perl6 IRC channel and ask. This is what I did when I couldn’t understand the purpose of that infix thing and after a long discussion, finding old bug tickets, and testing old bugs… we came to the conclusion these are no longer needed here!

So along with a bug fix, we also cleaned up codegen. At least that’s in theory, perhaps by doing so we created another bug that will send us on yet another great hunting journey.

Conclusion

It’s easy to give a helping hand to the core developers of Perl 6 by fixing some of the bugs. Starting from easy things that require nothing more than knowledge of Perl 6, you can progressively learn more about the internals and fix tougher problems.

The perl6 compiler comes with a number of useful output methods like --target=ast and --target=parse that can aid in debugging. An army of IRC bots makes it easy to navigate source code both in space and time, by either giving you a link to an implementation or producing output of some particular commit.

Lastly, a very valuable resource we have available is the people of Perl 6, who can help you out. Whether you’re digging deep into the guts of the compiler, or just starting out with computer programming.

Join us. We have… bugs to fix!

Day 8 — How to Make, Use, and Abuse Perl 6 Subsets

Ever reached for a Perl 6 type and found it good enough, but not perfect? Perhaps, you wanted an IntEven, StrPalindrome, or YourCustomClassWhereAttrFooIsBar. Never fear, Perl 6’s subsets are here!

What Are Subsets?

You can think of them as a refinement on some type and you can use them in most places where you’d use a type, such as in type constraints. To create one, use the subset keyword, along with a where keyword specifying your refinement:

    subset Even where * %% 2;
    say 1 ~~ Even; # False
    say 2 ~~ Even; # True

The WhateverStar * is the value being checked and the %% operator checks if that value is evenly divisible by 2. We can now use our subset on the right side of a smartmatch operator to check whether a value is an even number! Pretty awesome. What else can we do with it?

How about type-constraining a variable:

    my Even $x = 42; # all good

    $x = 43; # and this?
    # Type check failed in assignment to $x; expected Even but got Int (43)
    #   in block <unit> at script.p6 line 3

Or type-constraining input and output of a routine:

    sub takes-an-even-only (Even $x) { $x² }
    sub returns-an-even-only returns Even { $^x² }

    say takes-an-even-only   42; # 1764
    say returns-an-even-only 42; # 1764

    say takes-an-even-only   43;
    # Constraint type check failed for parameter '$x'
    #   in sub takes-an-even-only at script.p6 line 2
    #   in block <unit> at script.p6 line 8

    say returns-an-even-only 43;
    # Type check failed for return value; expected Even but got Int (1849)
    #   in sub returns-an-even-only at script.p6 line 3
    #   in block <unit> at script.p6 line 13

That’s all pretty sweet, but our Even accepts strings and other weird stuff:

    say '42.0000' ~~ Even; # True
    say class { method Real { 42 } }.new ~~ Even; # True

There’s a reason for that: we never specified what type we’re making a subset of, so by default, it used Any. Let’s fix that!

Getting Typical

If you want to create a subset based on some type, specify that type with the of keyword:

    subset IntEven of Int where * %% 2;

Now, before the where refinement even runs, the value we’re checking against must first pass the Int type constraint:

    say 42 ~~ IntEven; # True
    say 43 ~~ IntEven; # False
    say '42.0000' ~~ IntEven; # False
    say class { method Real { 42 } }.new ~~ IntEven; # False

We’re not limited to numerics! Let’s try a Str:

    subset StrPalindrome of Str where {
        .flip eq $_ given .comb(/\w+/).join.lc;
    }

    say ’Madam, I'm Adam.~~ StrPalindrome; # True
    say '1 on 1'  ~~ StrPalindrome; # False

We’re now using a more complex refinement in the where clause, using a code block. Just like the WhateverCode version with the *, the block receives the value to check as its argument, which it aliases to $_ topical variable. The code block tells us whether the argument is a palindrome, by returning a truthy or falsy value.

So how far can we go with the type constraints in our subsets?

Custom Made

We can type-constrain a subset using any class we have lying around! How about this one:

    class Awesome { has $.awesomeness-level }
    my $obj1 = Awesome.new: :10000awesomeness-level;
    my $obj2 = Awesome.new: :31337awesomeness-level;

We make a class called Awesome that has a public attribute called awesomeness-level. We also create two instances of that class, setting the awesomeness-level to 10000 in $obj1 and to 31337 in $obj1. So how about a subset that checks whether awesomeness-level is a prime number? It’s just a single line of code:

    subset AwesomePrime of Awesome where .awesomeness-level.is-prime;

    say $obj1 ~~ AwesomePrime; # False
    say $obj2 ~~ AwesomePrime; # True

The where block of the subset is “thunked,” which means the compiler takes an expression and turns it into a code block for us, so we don’t have to explicitly use a codeblock here, nor do we need a WhateverStar. The value being checked is in the $_ topical variable, which is what method calls use when you don’t specify what you’re calling them on. Thus, our subset expects a thing of type Awesome and then checks whether its awesomeness-level attribute is a prime number!

By using such subsets, you can create routines that only accept your custom objecs of a specific configuration. For example, in code of an IRC::Client bot, we can create a subset of IRC::Client::Message for messages that are received from bot admins, and then register events only for such messages:

    subset BotAdmin of IRC::Client::Message where .host eq conf<bot-admins>.any;

    multi method irc-to-me (BotAdmin $e where /:i ^ 'run' $<steps>=.+ $/ ) {
        ...
    }

The subset calls the subroutine that reads configuration and provides a list of bot admin hosts against which the host of the sender of the message is checked. We’re encapsulating the logic that checks we have an acceptable object to work with, and we’re able to call that logic even before we enter our method.

So if we can do that with subsets… is there anything we can’t do?

Time for Some Heavy Abuse!

Let’s do something crazy! A subroutine that fetches a link to a website and checks whether it contains a mention of Perl 6:

    use LWP::Simple;
    sub is-perl-site { LWP::Simple.get( $^website ).contains: 'Perl 6' }

There’s nothing crazy about that, you say? Then, how about we use that subroutine as a refiner in a subset where clause:

    subset PerlWebsite where &is-perl-site;

    say 'http://perl6.party' ~~ PerlWebsite; # True
    say 'http://lolcats.com' ~~ PerlWebsite; # False

In fact, we can make a routine that only accepts URLs to websites mentioning Perl 6:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 8
    #   in block <unit> at script.p6 line 15

But do you notice something off? The error message is rather poor… Let’s improve it!

What we know so far is the where clause takes some code to run and if that code’s result is falsy, the typecheck will fail. That means inside the where we can know whether or not the typecheck will fail before we even return from it. Let’s put that to use:

    sub is-perl-site {
        given LWP::Simple.get( $^website ).contains: 'Perl 6' {
            when :so  { True }
            when :not {
                say ’This ain't no website containing "Perl 6"!‘;
                False;
            }
        }
    }

    subset PerlWebsite where &is-perl-site;

In the routine that checks a website mentions Perl 6, in the case when it does :not contain a mention of Perl 6, we say a helpful message, indicating what exactly was wrong. Let’s run this:

    sub ain't-taking-non-perl-stuff (PerlWebsite $url) {
        say "Why! I can already tell $url is awesome!";
    }

    ain't-taking-non-perl-stuff 'http://perl6.party';
    # Why! I can already tell http://perl6.party is awesome!

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    # This ain't no website containing "Perl 6"!
    # Constraint type check failed for parameter '$url'
    #   in sub ain't-taking-non-perl-stuff at script.p6 line 16
    #   in block <unit> at script.p6 line 23

Whoa! The message printed twice. What gives?

It’s actually expected that the refinement in subsets is an inexpensive and relatively simple operation… With that expectation in mind, the parameter binder—which doesn’t know how to generate errors—simply passes its stuff through the slower code path—which does—and it’s that slower code path that runs the where code the second time, triggering our message one more time.

So yes, doing overly complex stuff in subsets is abusive. However, you can throw an exception inside the where to avoid the repetition of the message:

    sub is-perl-site {
        LWP::Simple.get( $^website ).contains: 'Perl 6'
            or die ’This ain't no website containing "Perl 6"!‘;
    }

    ...

    ain't-taking-non-perl-stuff 'http://lolcats.com';
    # This ain't no website containing "Perl 6"!
    #   in sub is-perl-site at z.p6 line 4
    #   in any accepts_type at gen/moar/m-Metamodel.nqp line 3472
    #   in sub ain't-taking-non-perl-stuff at z.p6 line 11
    #   in block <unit> at z.p6 line 18

And don’t forget to check out Subset::Helper and Subset::Common modules.

What About a Light Spanking?

There is one type of abuse cheating with subsets that can get you out of a bind: fiddling with narrowness when it comes to resolution of multi candidates.

For example, let’s say you hate humanity and you wish to change the meaning of the infix + operator on two Ints to do subtraction instead of addition. You, of course, write this:

    multi sub infix:<+> (Int $a, Int $b) { $a$b }

But as you run a sample code, over the sound of your evil laughter…

    Ambiguous call to 'infix:<+>'; these signatures all match:
    :(Int:D \a, Int:D \b --> Int:D)
    :(Int $a, Int $b)
      in block <unit> at z.p6 line 4

… the complier errors out.

You see, core language already has an infix + operator that takes two Ints! When you add one of your own, you create an ambiguity. To resolve this issue, we need to somehow create an Int that the compiler thinks is narrower than an Int, but in reality isn’t. Sounds tough? Not an issue for subsets:

    subset NarrowInt of Int where {True};
    multi sub infix:<+> (NarrowInt $a, NarrowInt $b) { $a$b }

    say 42 + 2; # 40

It worked! We created a subset of Int, so we match all of Ints, just like we wanted. In the refinement, however, we specify a single code block that always returns True, making that refinement always succeed, and making our subset accept all the values a regular Int accepts, while being narrower than a regular Int as far as multi resolution goes.

If you’re wondering why we had to use an explicit block, it’s because the where smartmatches, and the smartmatch against a True produces a warning, because it’s always true, and while that is what we want here, in most code such a construct is a mistake.

But if you’re upset about writing one-too-many characters, here’s neat trick:

    multi sub infix:<+> (Int $a where {True}, Int $b where {True}) { $a - $b }
    say 42 + 2;

You don’t need to create an explicit subset, and can stick a where clause right onto the thing you’re working with to refine just that thing. The type constraint on it will function as the of ... of a subset.

You can also type constraint a variable with a subset and still add a where clause. Or create a subset of a subset of a subset and still add… well, we’re getting carried away.

When they stop calling…

Consider this piece of wonderful code:

    class Thingie {
        multi method stuff ($ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Just foos...

You have a class with a multi method. Along with it, you have a subclass of it with another multi method of the same name. Both have a where clause and when you call the method with input that can match either multi, the subclass’s multi gets called. But what do you do if you want to reverse that… you want the parent class’s multi to be called, if both multies matches the input.

The first solution is very simple. Just add a type constraint (we’ll use Str) in the parent class, while leaving it off in the child:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        multi method stuff ($ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The presence of a type constraint on the method in the parent class makes it narrower than the one in the subclass, so even though the subclass’s method can also accept the input, it’s the parent class that gets to take care of it.

However, what if we wanted the same Str type constraint on both methods? The parent class we’ll leave as is: just a normal Str type constraint. In the kid, however, we’ll use a wider subset of Any (that’s the default if you don’t specify the of, remember?), but in its where clause we’ll smartmatch against Str, to ensure the subset accepts only Strs:

    class Thingie {
        multi method stuff (Str $ where /meows/) { say "Meow meow!"; }
    }

    class SubThingie is Thingie {
        subset WiderStr where { $_ ~~ Str };
        multi method stuff (WiderStr $ where /foos/) { say "Just foos..."; }
    }

    SubThingie.new.stuff: 'meows and foos'; # Meow meow!

The result is the opposite of a cheat we made in the previous section: instead of a subset that matches a type exactly, but is narrower than it, we now created a subset that matches a type exactly, but is wider than it, as far as multi candidate resolution goes. And yes, you can just merge the two where clauses instead of creating a subset, producing:

    multi method stuff ($ where { $_ ~~ Str and $_ ~~ /foos/ }) {
        say "Just foos...";
    }

It’ll work the same.

Conclusion

Subsets are a powerful feature that lets you specify refinements on existing core and custom types. You can smartmatch against a subset to perform a check on a value, or you can use subsets to type-contraint variables, parameters, and return values.

You can use the subset keyword to create a named subset, or you can attach a refinement onto a specific variable or parameter with a where clause. Subsets can also be used to effect alternative narrowness of a parameter, to affect multi candidate resolution order.

Subsets can also be abused to perform very complex operations, but… that’s probably a bad idea.

-Ofun

Day 7 — Set In Your Ways: Perl 6’s Setty and Baggy Types

There’s a relatively common pattern I see with people writing code that counts… say, DNA bases in a string:

my %counts;
%counts{$_}++ for 'AGTCAGTCAGTCTTTCCCAAAAT'.comb;
say %counts<A T G C>; # (7 7 3 6)

Make a Hash. For each thing you want to count, ++ that key in that Hash. So what’s the problem?

Perl 6 actually has specialized types that are more appropriate for this operation; for example, the Bag:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

Let’s talk about these types and all the fancy operators that come with them!

A Note on Unicode

I’ll be using fancy-pants Unicode versions of operators and symbols in this post, because they look purty. However, all of them have what we call “Texas” equivalents you can use instead.

Ready. Set. Go.

The simplest of these types is a Set. It will keep exactly one of each item, so if you have multiple objects that are the same, the duplicates will be discarded:

say set 1, 2, 2, "foo", "a", "a", "a", "a", "b";
# OUTPUT: set(a, foo, b, 1, 2)

As you can see, the result has only one a and only one 2. We can use the , U+2208 ELEMENT OF, set membership operator to check if an item is in a set:

my $mah-peeps = set <babydrop iBakeCake Zoffix viki>;
say 'Weeee \o/' if 'Zoffix'  $mah-peeps;
# OUTPUT: Weeee \o/

The set operators are coercive, so we don’t need to explicitly create a set; they’ll do it for us:

say 'Weeee \o/' if 'Zoffix'  <babydrop iBakeCake Zoffix viki>;
# OUTPUT: Weeee \o/

But pay attention when using allomorphs:

say 'Weeee \o/' if 42  <1 42 72>;
# No output

say 'Weeee \o/' if 42  +«<1 42 72>; # coerce allomorphs to Numeric
# OUTPUT: Weeee \o/

The angle brackets create allomorphs for numerics, so in the first case above, our set contains a bunch of IntStr objects, while the left hand side of the operator has a regular Int, and so the comparison fails. In the second case, we coerce allomorphs to their numeric component with a hyper operator and the test succeeds.

While testing membership is super exciting, we can do more with our sets! How about some intersections?

my $admins = set <Zoffix mst [Coke] lizmat>;
my $live-in-North-America = set <Zoffix [Coke] TimToady huggable>;

my $North-American-admins = $admins  $live-in-North-America;
say $North-American-admins;
# OUTPUT: set(Zoffix, [Coke])

We intersected two sets with the , U+2229 INTERSECTION, intersection operator and received a set that contains only the elements present in both original sets. You can chain these operations too, so membership will be checked in all of the provided sets in the chain:

say <Zoffix lizmat>  <huggable Zoffix>  <TimToady huggable Zoffix>;
# OUTPUT: set(Zoffix)

Another handy operator is the set difference operator, whose Unicode look I find somewhat annoying: No, it’s not a backslash (\), but a U+2216 SET MINUS character (luckily, you can use the much more obvious (-) Texas version).

The usefulness of the operator compensates its shoddy looks:

my @spammers = <spammety@sam.com  spam@in-a-can.com  yum@spam.com>;
my @senders  = <perl6@perl6.org   spammety@sam.com   good@guy.com>;

for keys @senders  @spammers -> $non-spammer {
    say "Message from $non-spammer";
}

# OUTPUT:
# Message from perl6@perl6.org
# Message from good@guy.com

We have two arrays: one contains a list of spammers’ addresses and another contains a list of senders. How to get a list of senders, without any spammers in it? Just use the (fine, fine, the (-)) operator!

We then use the for loop to iterate over the results, and as you can see from the output, all spammers were omitted… But why is keys there?

The reason is Setty and Mixy types are a lot like hashes, in a sense that they have keys and values for those keys. Set types always have True as values, and since we don’t care about iterating over Pair objects in our loop, we use the keys to get just the keys of the set: the email addresses.

However, hash-like semantics aren’t useless on Sets. For example, we can take a slice, and with :k adverb return just the elements that the set contains:

my $meows = set <
    Abyssinian  Aegean  Manx      Siamese  Siberian  Snowshoe
    Sokoke      Sphynx  Suphalak  Thai
>;

say $meows<Sphynx  Raas  Ragamuffin  Thai>:k;
# OUTPUT: (Sphynx Thai)

But what happens if we try to delete an item from a set?

say $meows<Siamese>:delete;
# Cannot call 'DELETE-KEY' on an immutable 'Set'
# in block <unit> at z.p6 line 6

We can’t! The Set type is immutable. However, just like Map type has a mutable version Hash, so does the Set type has a mutable version: the SetHash. There isn’t a cutesy helper sub to create one, so we’ll use the constructor instead:

my $s = SetHash.new: <a a a b c d>;
say $s;
$s<a d>:delete;
say $s;

# SetHash.new(a, c, b, d)
# SetHash.new(c, b)

Voilà! We successfully deleted a slice. So, what other goodies does Santa have in his… bag?

Gag ’em ‘n’ Bag ’em

Related to Sets is another type: a Bag, and yes, it’s also immutable, with the corresponding mutable type being BagHash. We already saw at the start of this article we can use a Bag to count stuff, and just like a Set, a Bag is hash-like, which is why we could view a slice of the four DNA amino acids:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

While a Set has all values set to True, a Bag‘s values are integer weights. If you put two things that are the same into a Bag there’ll be just one key for them, but the value will be 2:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

And of course, we can use our handy operators to combine bags! Here, we’ll be using , U+228E MULTISET UNION, operator, which looks a lot clearer in its Texas version: (+)

my $pancakes = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
my $omelette = bag 'egg', 'egg',  'egg', 'cup of milk';

my $shopping-bag = $pancakes  $omelette  <gum  chocolate>;
say $shopping-bag;
# bag(gum, cup of flour, egg(5), cup of milk(2), chocolate)

We used two of our Bags along with a 2-item list, which got correctly coerced for us, so we didn’t have to do anything.

A more impressive operator is , U+227C PRECEDES OR EQUAL TO, and it’s mirror , U+227D SUCCEEDS OR EQUAL TO, which tell whether a Baggy on the narrow side of the operator is a subset of the Baggy on the other side; meaning all the objects in the smaller Baggy are present in the larger one and their weights are at most as big.

Here’s a challenge: we have some materials and some stuff we want to build. Problem is, we don’t have enough materials to build all the stuff, so what we want to know is what combinations of stuff we can build. Let’s use some Bags!

my $materials = bag 'wood' xx 300, 'glass' xx 100, 'brick' xx 3000;
my @wanted =
    bag('wood' xx 200, 'glass' xx 50, 'brick' xx 3000) but 'house',
    bag('wood' xx 100, 'glass' xx 50)                  but 'shed',
    bag('wood' xx 50)                                  but 'dog-house';

say 'We can build...';
.put for @wanted.combinations.grep: { $materials  [] |$^stuff-we-want };

# OUTPUT:
# We can build...
#
# house
# shed
# dog-house
# house shed
# house dog-house
# shed dog-house

The $materials is a Bag with our materials. We used xx repetition operator to indicate quantities of each. Then we have a @wanted Array with three Bags in it: that’s the stuff we want to build. We’ve also used used the but operator on them to mix in names for them to override what those bags will .put out as at the end.

Now for the interesting part! We call .combinations on our list of stuff we want, and just as the name suggests, we get all the possible combinations of stuff we can build. Then, we .grep over the result, looking for any combination that takes at most all of the materials we have (that’s the operator). On it’s fatter end, we have our $materials Bag and on its narrower end, we have the operator that adds the bags of each combination of our stuff we want together, except we use it as a metaoperator [⊎], which is the same as putting that operator between each item of $^stuff-we-want. In case you it’s new to you: the $^ twigil on $^stuff-we-want creates an implicit signature on our .grep block and we can name that variable anything we want.

And there we have it! The output of the program shows we can build any combination of stuff, except the one that contains all three items. I guess we just can’t have it all…

…But wait! There’s more!

Mixing it Up

Let’s look back at our recipe code. There’s something not quite perfect about it:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

What if a recipe calls for half a cup of milk instead of a whole one? How do we represent a quarter of a teaspoon of salt, if Bags can only ever have integer weights?

The answer to that is the Mix type (with the corresponding mutable version, MixHash). Unlike a Bag, a Mix supports all Real weights, including negative weights. Thus, our recipe is best modeled with a Mix.

my $recipe = Mix.new-from-pairs:  'egg'          => 2, 'cup of milk' => ½,
                                  'cup of flour' => ¾, 'salt'        => ¼;
say $recipe;
# mix(salt(0.25), cup of flour(0.75), egg(2), cup of milk(0.5))

Be sure to quote your keys and don’t use colonpair form (:42a, or :a(42)), since those are treated as named arguments. There’s also a mix routine, but it doesn’t take weights and functions just like bag routine, except returning a Mix. And, of course, you can use a .Mix coercer on a hash or a list of pairs.

Less-Than-Awesome creation aside, let’s make something with mixes! Say, you’re an alchemist. You want to make a bunch of awesome potions and you need to know the total amount of ingredients you’ll need. However, you realize that some of the ingredients needed by some reactions are actually produced as a byproduct by other reactions you’re making. So, what’s the most efficient amount of stuff you’ll need? Mixes to the rescue!

my %potions =
    immortality  => (:oxium(6.98), :morphics(123.3),  :unobtainium(2)   ).Mix,
    invisibility => (:forma(9.85), :rubidium(56.3),   :unobtainium(−0.3)).Mix,
    strength     => (:forma(9.15), :rubidium(−30.3),  :kuva(0.3)        ).Mix,
    speed        => (:forma(1.35), :nano-spores(1.3), :kuva(1.3)        ).Mix;

say [] %potions.values;
# OUTPUT: mix(unobtainium(1.7), nano-spores(1.3), morphics(123.3),
#              forma(20.35), oxium(6.98), rubidium(26), kuva(1.6))

For convenience, we set up a Hash, with keys being names of potions and values being Mixes with quantities of ingredients. For reactions that produce one of the ingredients we seek, we’ve used negative weights, indicating the amount produced.

Then, we used the same set addition operator we saw earlier, in it’s meta form: [⊎]. We supply it the .values of our Hash that are our Mixes, and it happily adds up all of our ingredients, which we see in the output.

Look at unobtainium and rubidium: the set operator correctly accounted for the quantities produced by reactions where those ingredients have negative weights!

With immortality potion successfully mixed, all we need to do now is figure out what to do for the next few millennia… How about coding some Perl 6?

Day 4 – Going Raw with Rogue Robots

DISCLAIMER: accessing or spying on networks without permission to do so is illegal in many jurisdictions. The author does not condone or encourage anyone to break laws. And should this article inspire you to become a cyber-crimefighter and you get caught and killed… well, that’s not a bad way to go.

Agent, we have a mission! The bad guys seem to have set up a server where they are discussing their secrets. We can’t risk being caught and exposed, so you’ll have to design an automated robot to do the job. Here’s the task:

  1. Recon (snoop on the network, to learn the protocol)
  2. Infiltrate (connect to the server)
  3. Put on a disguise (respond to events / use the Perl 6 ecosystem)
  4. Send regular reports to the agency (timed events)

1) Recon (snoop on the network, to learn the protocol)

The bad guys are using an IRC server for communication. Unfortunately, our Lab did not have the time to do the research, so we’ll have to go raw. You’ll need any IRC Client and something that can snoop on the network traffic. We have preliminary results using XChat and WireShark, see if you can replicate them.

Fire up WireShark and enable listening on your network device, on my machine it’s named eth2 (and I had to start Wireshark as root, to get permissions to capture).  Go to Capture -> Interfaces and click Start button for the appropriate interface:

 

Using your IRC client, now connect to the IRC server the bad guys are using—which is irc.freenode.net on port 6667—and join a test channel, say, #perl6-recon. Once that is done, click the Stop Running Live Capture button in Wireshark.

We’re done collecting our data, Agent. Let’s take a look at what we got. Type tcp.port == 6667 in the filter field:

We want to figure out how to make our robot do what we’ve just done: connect to the server and join a channel. Sort the captured data by time and look for what the client is sending to the server. We’ll want to send the same thing:

Ignoring other chatter, it seems we should be successful if we send the following data to the server:

NICK Perl6NotABot
USER Perl6NotABot Perl6NotABot irc.freenode.net :Not a bot
JOIN #perl6-recon

Let’s do just that!

2) Infiltrate (connect to the server)

Fire up your favourite code editor and let’s write some Perl 6. It’s time to infiltrate the system!

 1   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 2   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 3       given .result {
 4           .print(qq:to/END/
 5               NICK $nick
 6               USER $nick $nick irc.freenode.net :Not a bot
 7               JOIN $channel
 8               END
 9           );
10           react { whenever .Supply { .say } }
11       }
12   });

Try this code out on your computer. You should see a whole bunch of output from the server. Let’s break down what the code does:

On line 1 we simply store the name of the spy bot and the channel we’re joining into variables. Line 2 is more interesting: the IO::Socket::Async.connect('irc.freenode.net', 6667) bit creates an asynchronous socket that attempts to connect irc.freenode.net server on port 6667. That returns us a Promise and since we really, really want that socket, we await that promise’s completion right away. When that happens, it means we have a connected socket; we’re moved along to the .then that is given a code block as an argument, which gets executed. Let’s take a closer look at that block (note: if you’re getting errors with line 10, your Rakudo is likely too old; upgrade or use .chars-supply instead of .Supply):

 1   {
 2       given .result {
 3           .print(qq:to/END/
 4               NICK $nick
 5               USER $nick $nick irc.freenode.net :Not a bot
 6               JOIN $channel
 7               END
 8           );
 9           react {
10               whenever .Supply {
11                   .say
12               }
13           }
14       }
15   }

Line 2 is a given block with .result as the given. It’s a bare method call, which means it’s called on the $_ topical variable, which in this case is our socket Promise, thus the given block is operating on the result of that promise, which is our connected socket. Inside the given block, on line 3, we have a .print method executed, again on the $_, which now is our async socket. The qq:to/END/ ... END bit is a HEREDOC—a multi-line chunk of text—that all gets sent to the server. And that bit should look familiar: it’s the same stuff we snooped from the network when connecting using a regular IRC client. We’ve used our nickname on the USER line a couple of times for it to serve us as both user name and anything else the server needs.

On line 9 we have a react block that, unsurprisingly, reacts to events. We’re interested in when some stuff heads our way from the socket, which is why we ask to do stuff whenever we have .Supply. At the moment we simply ask it to print that stuff on screen with the .say method called on the topical variable—this is all the output from the server you saw on screen if you ran this program—but let’s bring out bigger guns and do something more fun, shall we?

3) Put on a disguise (respond to events)

Agent, our spy bot needs to act as if it were a human! We can’t have it sit silently—the bad guys will know right away something is up. Since, for safety, we can’t respond to all queries ourselves, our robot needs to be smart enough to do it on its own. It seems a mammoth task to implement in such a short a time, but luckily, I have a contact who can assist us. They developed a super secret weapon called Text::Markov. Head over to http://modules.perl6.org/ and see if you can locate that weapon. Got it? For the record, if you ever need quick assess to docs and specs, just use the /repo/ part of URL along with the name, for example: http://modules.perl6.org/repo/Text::Markov

Now, install Text::Markov. You should be able to do so by running panda install Text::Markov command. This module will allow our spy bot to respond to any bad guys who attempt to talk to it. Responding means watching for something, so fire up your spy bot again and try talking in the channel its in. Then look at what the server is sending to the bot:

:Baddie!~Bad@localhost PRIVMSG #evil :I have a great plan to do evil stuff!
:Baddie!~bad@localhost PRIVMSG #evil :P6NotABot, hey, who are you?

We’ll guestimate that to send a message, we need to start our line with a colon, send our nick, followed by an exclamation sign, followed by user name, at sign, our hostname, word PRIVMSG, channel name, and the message we want to send prefixed by another colon. And anything said in the channel follows the same format. First, let’s try watching for lines containing PRIVMSG from the server and parse out the actual text said, which we’ll send right back. Here’s our code:

 1   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 2   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 3       my $sock = .result;
 4       $sock.print(qq:to/END/
 5           NICK $nick
 6           USER $nick $nick irc.freenode.net :Really not a bot
 7           JOIN $channel
 8           END
 9       );
10   
11       react {
12           whenever $sock.Supply {
13               .say;
14   
15               /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' (.+)/
16                   and $sock.print(
17                       ":$nick!~$nick@localhost PRIVMSG $channel :You said $0"
18                   );
19           }
20       }
21   });

First, note how we got rid of the given block and are simply storing the connected socket in the $sock variable—this will let us access it more easily later in the code. In the whenever block, along with printing all the data the server is sending us in the terminal (line 13), we’re also doing a regex match that looks for things that look like stuff said in our channel. The (.+) portion captures what was said and we parrot it back into the socket. Since Perl short-circuits conditionals, simply using and on line 16 will cause the $sock.print code to execute only when the regex matches. Try this code out and talk in the channel. The bot should respond to you.

Now, simply parroting back what the bad guys are saying will get our spy-bot spotted and kicked out fast. We need to be smarter, and this is where Text::Markov comes in. Looking at its documentation at http://modules.perl6.org/repo/Text::Markov, we see we need to feed it lines with .feed method and we can get it to produce output via .read method. The plan is this then: we’ll feed the Markov chain all the text messages that occur in the channel and make the bot respond to the channel only when someone addresses it by mentioning its name. The code becomes this:

 1   use Text::Markov;
 2   
 3   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 4   
 5   my $mc = Text::Markov.new;
 6   /\S/ and $mc.feed($_) for 'story.txt'.IO.lines;
 7   
 8   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 9       my $sock = .result;
10       $sock.print(qq:to/END/
11           NICK $nick
12           USER $nick $nick irc.freenode.net :Really not a bot
13           JOIN $channel
14           END
15       );
16   
17       react {
18           whenever $sock.Supply {
19               .say;
20               if /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' $<said>=(.+)/ {
21                   $mc.feed( ~$<said> );
22                   $<said> ~~ /$nick/ and $sock.print(
23                       ":$nick!~$nick@localhost PRIVMSG $channel "
24                       ~ ":{$mc.read.substr(0, 200)}\n"
25                   );
26               }
27           }
28       }
29   });

Let’s break this down. On line 1 we’re useing the Text::Markov module to include its functionality in our code. On line 5, we added a new variable $mc and store the Text::Markov object in it that we obtain by calling .new method on Text::Markov class. Now, normally the bare Text::Markov will take a bit to “learn” new text and until it does so, it’ll do a lot of repeats. To prevent that, I saved a short detective story into a text file called story.txt and on line 6 I’m reading all lines from that file and .feeding the Markov chain all lines that aren’t blank. Much of the following code is the same as before; let’s jump straight to line 20.

Notice the slight change in the regex: I’ve used $<said>=(.+) instead of bare (.+), so that we could have a meaningful name for the captured stuff instead of the cryptic $0. On line 21, I’m feeding the match into the Markov chain (the ~ before the variable forces it into a string). Then on line 22, I have another regex that checks whether the text that was said contains the nick of the bot. If the regex matches, our program proceeds to $sock.print portion of line 22 and outputs the message generated by the Text::Markov module. Line 23 has the prefix the server expects that we’ve been using. On line 24, the ~ is the string concatenation operator. Inside that string, however, notice how we’re actually executing some Perl 6 code! It’s the curly braces { } that allow us to do so. I’m getting a line of text via .read method on our Markov object, and then I’m shortening it to at most 200 characters with .substr method call, since if it’s too long, the IRC server will kick our bot out.

Try this code out (remember to create a file called story.txt and fill it with some text). Try addressing the bot by mentioning its nickname. It should produce some interesting text. You can also try commenting out line 6 and trying to address the bot then. Notice how without having fed the Markov chain some content, the results it produces are uninspiring.

4) Send regular reports to the agency (timed events)

Responding to users on the network is great and all, but we have a job to do, Agent. As a proof of concept, we’ll simply regularly append a time-stamped string into a file, to notify the agency that the bot is still alive and well. Let’s take a look at the code for that:

 1   use Text::Markov;
 2   
 3   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 4   
 5   my $mc = Text::Markov.new;
 6   /\S/ and $mc.feed($_) for 'story.txt'.IO.lines;
 7   
 8   await IO::Socket::Async.connect('localhost', 6667).then({
 9       my $sock = .result;
10       $sock.print(qq:to/END/
11           NICK $nick
12           USER $nick $nick irc.freenode.net :Really not a bot
13           JOIN $channel
14           END
15       );
16   
17       Supply.interval( 5 ).tap({
18           spurt 'report.txt', "[{DateTime.now}] Still alive!\n", :append;
19       });
20   
21       react {
22           whenever $sock.Supply {
23               .say;
24               if /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' $<said>=(.+)/ {
25                   $mc.feed( ~$<said> );
26                   ~$<said> ~~ /$nick/ and $sock.print(
27                       ":$nick!~$nick@localhost PRIVMSG $channel "
28                       ~ ":{$mc.read.substr(0, 200)}\n"
29                   );
30               }
31           }
32       }
33   });

If you’re not seeing much difference, it’s because there isn’t! Lines 17–19 is all we added. We’re .tapping a Supply that emits an event every five seconds. The code block we give to .tap uses spurt in :append mode to append a string to file named report.txt. The string it spurts uses DateTime type’s method .now to obtain the time stamp. And there you have it—doing stuff every five seconds!

Conclusion

You’ve now seen how easy it is to do event loops in Perl 6, connect to a network resource, read from and write to files, as well as use code libraries developed by third parties. In just 33 lines of liberally-written code, we have something that connects to an IRC server and respond to specific messages, while doing work in intervals as well.

There’s more evil in the world, Agent! Be sure to read all the documentation referenced throughout this blog post. See if you can improve your robot.

Together with the power of Perl 6… We’ll save the world.