Day 7 — Unicode, Perl 6, and You

Quick (rhetorical) question: how many of you either try your best to ignore Unicode, or groan at the thought of having to deal with it again?

It’s fair, after all, considering Unicode is big. Really big. (You may think it’s a long walk down the ASCII table, but that’s peanuts compared to Unicode.) It certainly doesn’t help that many languages, particularly older ones, make it hard for you, the average programmer, to work with it. Some don’t deal with encoding standards at all, meaning some familiarity is mandatory; others claim to support Unicode but really just balk once you get past the BMP (the codepoints that can fit in a 16-bit number).

Perl 6, as you might guess, does handle Unicode well. It’s actually necessary to go about this day in a twofold manner: half of the story is how to process Unicode text, and half is how to use Unicode syntax. Let’s start with the one more likely to be of concern when actually programming, that of…

How do I Handle Unicode Text?

No matter your level of experience in handling Unicode (or anything involving different encodings), you’ll be pleased to learn that in Perl 6, it goes just about the way you’d expect.

Perl 6’s strings are interesting in that they by default work on the notion of graphemes — a collection of codepoints that look like a distinct thing; what you’d call a “character” if you didn’t know better. Not every distinct “character” you could come up with has its own codepoint in the standard, so usually handling visual elements naturally can be quite painful.

However, Perl 6 does this work for you, keeping track of these collections of codepoints internally, so that you just have to think in terms of what you would see the characters as. If you’ve ever had to dance around with substring operations to make sure you didn’t split between a letter and a diacritic, this will be your happiest day in programming.

As an example, here’s a Devanagari syllable in a string. The .codes method returns the number of codepoints in the string, while .chars returns the number of characters (aka graphemes):

say "नि".codes;    # returns  2
say "नि".chars;    # returns  1

Even though there isn’t a singular assigned codepoint for this syllable, Perl 6 still treats it as one character, suiting any purpose that doesn’t involve messing with the text at a lower level.

“That’s cool, but does it matter much to me, a simple English-speaking programmer who’s never had to deal with other languages or scripts?” I can imagine some of you thinking. The answer is yes, because regardless of your background, there is most definitely one grapheme you’ve encountered before:

say "\r\n".chars;    # returns 1

Yep, the Windows end-of-line sequence is explicitly counted by Unicode’s “extended grapheme cluster” definition as one grapheme.

And of course it’s not just looks, that’s how operations on strings work:

say "नि\r\n".substr(1,1).perl    # returns "\r\n"

Of course, that’s all just for the default Str type. If you don’t want to work at a grapheme level, then you have several other string types to choose from: If you’re interested in working within a particular normalization, there are the self-explanatory types of NFC, NFD, NFKC, and NFKD. If you just want to work with codepoints and not bother with normalization, there’s the Uni string type (which may be most appropriate in cases where you don’t want the NFC normalization that comes with normal Str, and want to keep text as-is). And if you want to work at the binary level, well, there’s always the Blob family of types :) .
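To make the difference between those string types a bit more concrete, here is a minimal sketch. The sample string (an "e" followed by a combining acute accent) is my own choice for illustration, and the counts in the comments assume the NFC behaviour of Str described above:

```raku
# 'e' followed by U+0301 COMBINING ACUTE ACCENT: two codepoints as typed.
my $str = "e\x[301]";

say $str.chars;                  # 1: one grapheme
say $str.codes;                  # 1: Str is kept in NFC, so the pair became U+00E9
say $str.NFD.codes;              # 2: decomposed into base letter plus accent
say $str.NFC.codes;              # 1
say $str.encode('utf8').bytes;   # 2: U+00E9 takes two bytes in UTF-8

# Uni keeps the codepoints exactly as given, with no normalization applied.
my $raw = Uni.new(0x65, 0x301);
say $raw.codes;                  # 2
say $raw.Str.codes;              # 1: converting to Str normalizes to NFC
```

The same text therefore has a different "length" depending on which level you ask at, which is why it pays to pick the type that matches the question.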

We also have several methods that let you examine the various bits of Unicode info associated with characters:

say "a".uniname;                 # get name of first Unicode character in string
say "\r\nhello!".ord;            # get number of first codepoint
                                 # (*not* grapheme) in string
say "\r\nhello!".ords;           # get numbers of all codepoints
say "0".uniprop("Numeric_Type"); # get associated property

And so on :) . Note that the ord/ords part shows you that you’ll really never get the internal numbers used to keep track of graphemes. When ord sees a grapheme cluster, it just returns the codepoint number for the first codepoint of that cluster.

Not Just Strings

Of course, our Unicode support wouldn’t be complete without regex support! Of particular note is the ability to match based on properties, so for example

/ <:Alpha>+ /

will match multiple alphabetic characters (<alpha> will do almost the same thing, just with the addition of matching underscore), and

/ '0x' <:Nv(0..9) + :Hex_Digit>+ | '0b' <:Nv(0..1)>+ /

is a regex that lets you match against either hexadecimal numbers or binary ones, in a Unicode-friendly way. And if you wanted to write the Unicode standard’s “extended grapheme cluster” pattern in regexes (the same pattern we use to determine grapheme handling mentioned earlier):

grammar EGC {
    token Hangul-Syllable {
        || <:GCB<L>>* <:GCB<V>>+ <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LV>> <:GCB<V>>* <:GCB<T>>*
        || <:GCB<L>>* <:GCB<LVT>> <:GCB<T>>*
        || <:GCB<L>>+
        || <:GCB<T>>+
    }

    token TOP {
        || <:GCB<CR>> <:GCB<LF>>
        || <:GCB<PP>>*
           [
           || <:GCB<RI>>
           || <.Hangul-Syllable>
           || <!:GCB<Control>>
           ]
           [
           || <:Grapheme_Extend>
           || <:GCB<Spacing_Mark>>
           ]*
        || .
    }
}

A bit wordy, but just imagine how much more painful that would be without built-in Unicode support in your regexes!
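If you’d like to see property matching at work on actual strings, here is a small sketch. The sample words are arbitrary picks of mine, and I’m using only the <:Alpha>, <alpha>, <:Nd> and <:Script<…>> forms:

```raku
# Letters from any script satisfy <:Alpha>, not just ASCII ones.
say so 'naïve'      ~~ / ^ <:Alpha>+ $ /;            # True
say so 'Ελληνικά'   ~~ / ^ <:Alpha>+ $ /;            # True

# <:Alpha> rejects the underscore, while <alpha> accepts it.
say so 'snake_case' ~~ / ^ <:Alpha>+ $ /;            # False
say so 'snake_case' ~~ / ^ <alpha>+ $ /;             # True

# Arabic-Indic digits are decimal digits (Nd), but not in the ASCII range.
say so '٤٢'         ~~ / ^ <:Nd>+ $ /;               # True
say so '٤٢'         ~~ / ^ <[0..9]>+ $ /;            # False

# Properties can take a value, just like the GCB ones in the grammar above.
say so 'Ελληνικά'   ~~ / ^ <:Script<Greek>>+ $ /;    # True
```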

And aside from all the programming-related stuff, there’s also…

Using Unicode to Write Perl 6

As part of our tireless support of Unicode, we also parse your source code with the same regex engine you just saw demonstrated above (though the Perl 6 parser doesn’t need to bother with Unicode properties nearly that often). This means we’re able to support syntax using Unicode in Perl 6, and have been taking advantage of it for a long time now. Observe:

say 0 ∈ «42 -5 1».map(&log ∘ &abs);
say 0.1e0 + 0.2e0 ≅ 0.3e0;
say 「There is no \escape in here!」

Just a small sampling of the Unicode built into Perl 6 by default. Featuring interpolating quote-words lists, setops, function composition, and approximate equality. Oh, and the delimiters for the most basic level of string quoting.

Don’t worry though, standard Perl 6 does not demand that you be able to type Unicode. If you can’t, there are so-called “Texas” variants:

say 0 (elem) <<42 -5 1>>.map(&log o &abs);
say 0.1e0 + 0.2e0 =~= 0.3e0;
say Q[[[There is no \escape in here!]]]

This is fine of course, but if it’s feasible for you to set up Unicode support, I heartily recommend it. Here’s a short list on various ways to do it:

  • Get an awesome text editor — The more featureful text editors (such as emacs or vim, to name a couple) will have functionality in place to insert arbitrary characters. Go look it up in your editor’s documentation, and consider petitioning if it doesn’t support Unicode entry :) .
  • Use your OS’s hex input — Some systems, such as Windows or applications using GTK, support key shortcuts to let you type the hexadecimal codepoint numbers for characters. You’ll have to memorize codepoints, but chances are you’d get used to it eventually.
  • Set up your keyboard’s third/fourth/etc. levels — If your system supports it, you can enable third/fourth level modifiers and so on for your keyboard to access those levels (if you don’t know what those are, your ‘Shift’ key counts as a second-level modifier, and the characters it lets you type are considered on the second level, as an example). Depending on the amount of time and/or patience you have you could even customize those extra levels.
  • (X11) Set up your Compose key — This is the method I myself use, and it involves setting up a key to use as the “Compose key” or “Multi key”, and use of a file in ~/.XCompose (or some other place, as long as you configure it) to set up key combos. The Compose key works by letting you type any configured sequence of keys after pressing the Compose key, which will insert the character(s) of your choice.
    • Which key you sacrifice of course depends on which keys you don’t make use of; it could be the caps lock, or one of those extra Shift/Alt/Ctrl keys. It can even be that useless Menu key, which you probably just remembered was on your keyboard :P .
    • An absolutely wonderful starting .XCompose can be found in this GitHub repository. You’ll still want to add combinations to this for some Perl 6, and perhaps do other tinkering with it¹, but it’s still quite a lot better than having to start from scratch :) .

In Conclusion

This of course isn’t an exhaustive coverage of all that Perl 6 has to offer Unicode, but the underlying takeaway is that Perl 6 makes handling Unicode much nicer than other languages do (at least out of the box).

Bonus! Partly in the spirit of Christmastime, and partly in the spirit of “I love this, and what better time to share it?”, allow me to present for your historical interest Perl 6’s legendary “snowman comet” bug:

say "abc" ~~ m☃.(.).☄  # this used to work. Really.

Basically this old old old old bug that (sadly) doesn’t exist anymore was about the regex part of the parser messing up a bit and interpreting ☃☄ as just as valid a pair of brackets as () or ⦃⦄.

Is there a relevant lesson in this bug? Nope. Is it only vaguely connected to a winter blog post on Unicode? You bet. It’s just that it’s thanks to Unicode support that we were able to get that kind of bug way back in 2009, and it’s Unicode support (among other things) that would let someone re-implement this as a slang or something ☺ .

So go forth confident in your newfound ability to handle international text with much greater ease than you’re perhaps used to, and spend more time building ☃☃☃☃ the rest of this month.

Have the appropriate amount of fun! ❄

¹Psst! Use the Texas variants for your compose combos if you’re stuck on coming up with them, e.g. <Multi_key> <equal> <asciitilde> <equal> for ≅.

Day 6 – On Opening Files and Contributing to Open Source Projects

Why do people contribute to open source projects? Some do it for fun, some for fame and some for fortune by actually getting paid to do such work. However, probably the most important factor is scratching your own itch: You want something done, and make it happen.

That was the position I found myself in earlier this year: I wanted to use Perl 6 to read a binary file and patch it in-place. According to the documentation, I should have been able to do so, as Rakudo claimed to support opening files in the modes read-only :r, write-only :w, append :a and read-write :rw, and I assumed the last one would do the trick.

Not so: The code silently adjusted the flag :rw to :w, and I was left hanging.

After a bit of bikeshedding, a design emerged that I was happy with, and as it was merged upstream a month and a half later, others seemed to agree. But while I added an extensive commit message to lay out the new system in its glory, slacker that I am, it was left for others to perform the tedious tasks of writing tests and documentation.

As you might expect, that hope never materialized, and the extended open modes I introduced became something of an Easter egg: Undocumented, untested and only discoverable by reading the commit log or the code itself. I may very well be the only person who has actually made use of them.

Fast-forward to December: Here we are, Christmas draws near, and with it the first proper release of Perl 6. As a sometimes-good on-and-off-again citizen of the Perl 6 community, I decided to put on my big boy pants and do the Right Thing: Some tests have now been written (which makes the new open modes an official part of the 6.c language release), and the documentation will be updated soon-ish to conform to what is actually implemented instead of what had been planned to be implemented eventually.

Having said all that, let us now take a look at what the fuss is all about:

# not the actual signature as &open delegates to IO::Handle.open
sub open(
    IO() $path,
    :$mode, :$create, :$append, :$truncate, :$exclusive,
    :$r, :$w, :$x, :$a, :$rw, :$rx, :$ra, :$update,
    :$bin, :$enc = 'utf8',
    :$nl-in = ["\x0A", "\r\n"], :$nl-out = "\n", :$chomp = True
--> IO::Handle)

Isn’t it a thing of beauty? Perhaps not, but it’s also not as scary as it looks:

The only required argument is the $path, which must be something we can call .IO on. The common example would be a string that holds the name of a file you want to access.

One design goal behind the additional open modes was to support all the features of fopen(3) as specified by the C11 standard library. To keep things sane, inspiration was drawn from open(2) and the flags specified by POSIX.

The first row of named arguments lists the POSIX-inspired ones. Here, :$mode may take the values 'ro', 'wo' and 'rw', the rest are boolean flags. If I did my job well, you will rarely need to use them – instead, the single- and double-letter variants listed on the next line should suffice.

Read-only mode :r is the default. The three write-only modes are :w, which truncates the file if it already exists (and is the only shorthand mode that does so implicitly), exclusive mode :x, which fails if the file already exists (and thus can be used to implement a poor man’s locking mechanism), and append mode :a, which adds content at the end of a file.

If you want to read and write to a file at the same time (as I did), you may combine the flag :r with any of :w, :x or :a. For convenience, this may also be spelled :rw, :rx, :ra. Note that the effect of providing :r,:w (or its alias :rw) is not a combination of the effects of :r and :w – existing files will not be truncated as one might expect, which is a deliberate departure from the pattern.

Aside from :r, all shorthand modes will implicitly create the file if it doesn’t already exist. In contrast, the fopen(3) mode "r+" (known to Perl 5 programmers as "+<") does not. In Perl 6, this mode is now known as :update, corresponding to the low-level :mode<rw>. In contrast, :rw maps to :mode<rw>,:create, which does not have a direct equivalent in either C11 or Perl 5.
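Here is roughly how the original itch (patching a binary file in place) can be scratched with these modes. Treat it as a sketch: the patch-byte helper and the temporary file name are made up for this example, and error handling is left out.

```raku
# Hypothetical helper: overwrite one byte of an existing file in place.
sub patch-byte(IO() $path, Int $offset, Int $byte) {
    # :update gives read-write access without creating or truncating the file.
    my $fh = open $path, :update, :bin;
    $fh.seek($offset, SeekFromBeginning);
    $fh.write: Blob.new($byte);
    $fh.close;
}

my $demo = $*TMPDIR.add("open-modes-demo-" ~ $*PID ~ ".bin");

# :w creates the file, or truncates it if it is already there.
my $out = open $demo, :w, :bin;
$out.write: Blob.new(0x00, 0x01, 0x02, 0x03);
$out.close;

patch-byte($demo, 2, 0xFF);
say $demo.slurp(:bin).list;    # (0 1 255 3)

# :x insists on creating the file itself, so it fails here: the file exists.
my $exclusive = try open $demo, :x;
say $exclusive.defined ?? 'opened' !! 'refused, file already exists';

unlink $demo;
```

Note that the patch step would also work with :rw; the difference only shows when the file is missing, in which case :rw creates it and :update fails.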

The :bin flag and :$enc argument control whether the file should be opened in binary mode or text mode with given encoding. By default, files are assumed to be text files encoded as UTF-8. This is different from what the documentation tells you (there has never been autodetection of Unicode encodings as far as I’m aware), and binary files also do not return buffers instead of strings when processing the file line-by-line. I’ve raised that issue with The Man (translation: talked on IRC about it) and a pull request has been sent. We’ll have to wait and see what develops on that front in the remaining weeks leading up to the party.

Finally, the last line of named arguments lists those that control line-based access in general and the behaviour of the .get and .lines methods in particular. The :$nl-in argument allows you to provide a list of strings that should be considered line separators; the :$nl-out argument controls what gets written to disk when you request a newline, e.g. when using the methods .say, .put or .print-nl. If the final argument :$chomp is set to True (which is the default), line separators will be discarded instead of being included at the end of the strings returned by .get or .lines. As a side note, if you find chomp => False too much of a burden to type, Perl 6 supports the shorthand notation :!chomp.
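A quick sketch of those three arguments in action. The semicolon separator and the file name are arbitrary choices of mine, picked only to make the effect visible:

```raku
my $records = $*TMPDIR.add("line-demo-" ~ $*PID ~ ".txt");
spurt $records, "alpha;beta;gamma";

# Treat ';' as the line separator when reading.
my $fh = open $records, :nl-in(';');
say $fh.lines.list;               # (alpha beta gamma)
$fh.close;

# Same separator, but keep it at the end of the returned strings.
$fh = open $records, :nl-in(';'), :!chomp;
say $fh.lines.list;               # (alpha; beta; gamma)
$fh.close;

# :nl-out decides what .say appends after the text.
$fh = open $records, :w, :nl-out("\r\n");
$fh.say: 'done';
$fh.close;
say $records.slurp(:bin).list.tail(2);   # (13 10), i.e. CR LF

unlink $records;
```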

For those of you still with me: Congratulations, you now know how to open files in Perl 6, and all that remains for me to say to you is this:

Have a happy St. Nicholas Day and a fun time playing with Rakudo!

PS: As to what you can do with a file handle once it has been opened, I’ll leave that as an exercise to the reader. But note that while Perl wants to make hard things possible, it also tries to keep easy things easy – which means that you can use interfaces like slurp, spurt and lines (or their method forms .IO.slurp, .IO.spurt, .IO.lines) without having to manually open (and close!) anything.
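For completeness, the easy path looks something like this (the file name is made up, and the counts in the comments follow from the three short lines written here):

```raku
my $notes = $*TMPDIR.add("notes-" ~ $*PID ~ ".txt");

spurt $notes, "first\nsecond\n";      # create or overwrite the file
spurt $notes, "third\n", :append;     # add to the end instead

say $notes.lines.join(', ');          # first, second, third
say slurp($notes).chars;              # 19

unlink $notes;
```

No handle is opened or closed by hand anywhere in that snippet.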

Day 5 – Identifiers have hyphens in them

One day on the #perl6 channel, back in 2009, I stumbled into a conversation where Larry said “it didn’t break any spectests, and that convinced me” or something like that. Maybe it broke a couple of spectests, but they apparently needed breaking anyway.

The change in question was adding hyphens and apostrophes to identifiers. In most languages, these are valid identifier names:

my $foo;
my $please_do;

But Perl 6 also allows these:

my $foo-with-hyphens;
my $please-don't;

As usual, I was conservative, and slow to pick up these changes. I didn’t like what it did to my vim highlighting of variables. Whatever.

These days I kind of like the hyphens in identifiers. Mostly I just decide on a project-by-project basis whether I want to use hyphens, but I notice myself deciding to use them more and more often. They just look nicer, somehow. More balanced.

Damian Conway, on the Perl 6 Language mailing list, tried to institute the convention that hyphens and underscores should be used interchangeably — hyphens where you’d use hyphens in a sentence, and underscore where you’d use a space between words. I haven’t seen anyone pick up on that practice. I suspect it is because many of us are not native speakers, but rather speakers of some nebulous Goodenuf English, and we would hesitate between what’s two words and what’s a hyphen-separated compound. Actually, I’m pretty sure native speakers hesitate too sometimes.

Anyway, there’s an obvious parser conflict which you may have spotted: the hyphen is already used for infix minus. Perl 6 disambiguates this by saying that you’re allowed to use a hyphen (or an apostrophe) if it’s then followed by an alphabetic character. Thanks to sigils, this is enough to make it clear what you mean.

my $foo-2;     # variable minus 2 
my $foo-bar;   # a single variable;
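To watch the disambiguation rule at work with actual values, here is a small sketch (the variable names and numbers are arbitrary):

```raku
my $foo         = 10;
my $foo-bar     = 3;       # one variable whose name contains a hyphen
my $don't-panic = 'ok';    # apostrophes follow the same rule

say $foo-bar;              # 3
say $foo-2;                # 8, parsed as $foo minus 2
say $foo - $foo-bar;       # 7
say $don't-panic;          # ok
```

Since a digit is not an alphabetic character, $foo-2 can only mean subtraction, while $foo-bar can only be a single name.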

Now I want to say two things at once. The first thing is that the apostrophe is an old vestigial thing in Perl 5 identifiers, too. It means the same as the package separator ::. Which is how Acme::Don't can exist, for example. (Look at the actual package name of the module.)

The second thing is that Lisp people and Haskell people seem particularly saddened that because of this rule, you’re not allowed to put an apostrophe at the end of an identifier.

my $foo';     # not allowed

Ah, well. There will be slangs. I’m surprised there isn’t already a slang for apostrophes at the end of identifiers. ☺

Day 4 – Going Raw with Rogue Robots

DISCLAIMER: accessing or spying on networks without permission to do so is illegal in many jurisdictions. The author does not condone or encourage anyone to break laws. And should this article inspire you to become a cyber-crimefighter and you get caught and killed… well, that’s not a bad way to go.

Agent, we have a mission! The bad guys seem to have set up a server where they are discussing their secrets. We can’t risk being caught and exposed, so you’ll have to design an automated robot to do the job. Here’s the task:

  1. Recon (snoop on the network, to learn the protocol)
  2. Infiltrate (connect to the server)
  3. Put on a disguise (respond to events / use the Perl 6 ecosystem)
  4. Send regular reports to the agency (timed events)

1) Recon (snoop on the network, to learn the protocol)

The bad guys are using an IRC server for communication. Unfortunately, our Lab did not have the time to do the research, so we’ll have to go raw. You’ll need any IRC Client and something that can snoop on the network traffic. We have preliminary results using XChat and Wireshark; see if you can replicate them.

Fire up Wireshark and enable listening on your network device. On my machine it’s named eth2 (and I had to start Wireshark as root to get permissions to capture). Go to Capture -> Interfaces and click the Start button for the appropriate interface:

 

Using your IRC client, now connect to the IRC server the bad guys are using—which is irc.freenode.net on port 6667—and join a test channel, say, #perl6-recon. Once that is done, click the Stop Running Live Capture button in Wireshark.

We’re done collecting our data, Agent. Let’s take a look at what we got. Type tcp.port == 6667 in the filter field:

We want to figure out how to make our robot do what we’ve just done: connect to the server and join a channel. Sort the captured data by time and look for what the client is sending to the server. We’ll want to send the same thing:

Ignoring other chatter, it seems we should be successful if we send the following data to the server:

NICK Perl6NotABot
USER Perl6NotABot Perl6NotABot irc.freenode.net :Not a bot
JOIN #perl6-recon

Let’s do just that!

2) Infiltrate (connect to the server)

Fire up your favourite code editor and let’s write some Perl 6. It’s time to infiltrate the system!

 1   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 2   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 3       given .result {
 4           .print(qq:to/END/
 5               NICK $nick
 6               USER $nick $nick irc.freenode.net :Not a bot
 7               JOIN $channel
 8               END
 9           );
10           react { whenever .Supply { .say } }
11       }
12   });

Try this code out on your computer. You should see a whole bunch of output from the server. Let’s break down what the code does:

On line 1 we simply store the name of the spy bot and the channel we’re joining into variables. Line 2 is more interesting: the IO::Socket::Async.connect('irc.freenode.net', 6667) bit creates an asynchronous socket that attempts to connect to the irc.freenode.net server on port 6667. That returns us a Promise and since we really, really want that socket, we await that promise’s completion right away. When that happens, it means we have a connected socket; we’re moved along to the .then that is given a code block as an argument, which gets executed. Let’s take a closer look at that block (note: if you’re getting errors with line 10, your Rakudo is likely too old; upgrade or use .chars-supply instead of .Supply):

 1   {
 2       given .result {
 3           .print(qq:to/END/
 4               NICK $nick
 5               USER $nick $nick irc.freenode.net :Not a bot
 6               JOIN $channel
 7               END
 8           );
 9           react {
10               whenever .Supply {
11                   .say
12               }
13           }
14       }
15   }

Line 2 is a given block with .result as the given. It’s a bare method call, which means it’s called on the $_ topical variable, which in this case is our socket Promise, thus the given block is operating on the result of that promise, which is our connected socket. Inside the given block, on line 3, we have a .print method executed, again on the $_, which now is our async socket. The qq:to/END/ ... END bit is a HEREDOC—a multi-line chunk of text—that all gets sent to the server. And that bit should look familiar: it’s the same stuff we snooped from the network when connecting using a regular IRC client. We’ve used our nickname on the USER line a couple of times for it to serve us as both user name and anything else the server needs.
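If you’d like to poke at those two pieces without touching the network, here is a small offline sketch. The start block merely stands in for the connection attempt (an assumption for the demo; no socket is involved), and the heredoc is assigned to a variable instead of being printed to a socket:

```raku
my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';

# A Promise that stands in for the connection attempt.
my $connecting = start { 'a connected socket (stand-in)' };

# The block given to .then receives the original Promise as $_,
# and .result unwraps the value that Promise was kept with.
my $report = await $connecting.then({ 'got ' ~ .result });
say $report;                      # got a connected socket (stand-in)

# The heredoc interpolates variables; the indentation of the END
# terminator is stripped from every line of the text.
my $handshake = qq:to/END/;
    NICK $nick
    USER $nick $nick irc.freenode.net :Not a bot
    JOIN $channel
    END

print $handshake;
say $handshake.lines.elems;       # 3
```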

On line 9 we have a react block that, unsurprisingly, reacts to events. We’re interested in when some stuff heads our way from the socket, which is why we ask to do stuff whenever we have .Supply. At the moment we simply ask it to print that stuff on screen with the .say method called on the topical variable—this is all the output from the server you saw on screen if you ran this program—but let’s bring out bigger guns and do something more fun, shall we?

3) Put on a disguise (respond to events)

Agent, our spy bot needs to act as if it were a human! We can’t have it sit silently: the bad guys will know right away something is up. Since, for safety, we can’t respond to all queries ourselves, our robot needs to be smart enough to do it on its own. It seems a mammoth task to implement in such a short time, but luckily, I have a contact who can assist us. They developed a super secret weapon called Text::Markov. Head over to http://modules.perl6.org/ and see if you can locate that weapon. Got it? For the record, if you ever need quick access to docs and specs, just use the /repo/ part of URL along with the name, for example: http://modules.perl6.org/repo/Text::Markov

Now, install Text::Markov. You should be able to do so by running the panda install Text::Markov command. This module will allow our spy bot to respond to any bad guys who attempt to talk to it. Responding means watching for something, so fire up your spy bot again and try talking in the channel it’s in. Then look at what the server is sending to the bot:

:Baddie!~Bad@localhost PRIVMSG #evil :I have a great plan to do evil stuff!
:Baddie!~bad@localhost PRIVMSG #evil :P6NotABot, hey, who are you?

We’ll guesstimate that to send a message, we need to start our line with a colon, send our nick, followed by an exclamation mark, followed by the user name, an at sign, our hostname, the word PRIVMSG, the channel name, and the message we want to send prefixed by another colon. Anything said in the channel follows the same format. First, let’s try watching for lines containing PRIVMSG from the server and parse out the actual text said, which we’ll send right back. Here’s our code:
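Before wiring anything to the socket, the guessed format can be checked offline against the two sample lines above. This is only a sketch: parse-privmsg is a hypothetical helper of mine, not part of any module, and it knows nothing about IRC beyond the format we just guessed.

```raku
# Hypothetical helper: split a server line into the parts described above.
sub parse-privmsg(Str $line) {
    $line ~~ /
        ^ ':' $<nick>=[ <-[!]>+ ] '!' $<user>=[ <-[@]>+ ] '@' $<host>=[ \S+ ]
        ' PRIVMSG ' $<target>=[ \S+ ] ' :' $<text>=[ \N+ ]
    / or return Nil;

    return %(
        nick   => ~$<nick>,
        target => ~$<target>,
        text   => ~$<text>,
    );
}

my $msg = parse-privmsg(
    ':Baddie!~bad@localhost PRIVMSG #evil :P6NotABot, hey, who are you?'
);
say $msg<nick>;      # Baddie
say $msg<target>;    # #evil
say $msg<text>;      # P6NotABot, hey, who are you?

# Anything that isn't a PRIVMSG line yields Nil.
say parse-privmsg('PING :irc.example.org').defined;   # False
```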

 1   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 2   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 3       my $sock = .result;
 4       $sock.print(qq:to/END/
 5           NICK $nick
 6           USER $nick $nick irc.freenode.net :Really not a bot
 7           JOIN $channel
 8           END
 9       );
10   
11       react {
12           whenever $sock.Supply {
13               .say;
14   
15               /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' (.+)/
16                   and $sock.print(
17                       ":$nick!~$nick@localhost PRIVMSG $channel :You said $0"
18                   );
19           }
20       }
21   });

First, note how we got rid of the given block and are simply storing the connected socket in the $sock variable—this will let us access it more easily later in the code. In the whenever block, along with printing all the data the server is sending us in the terminal (line 13), we’re also doing a regex match that looks for things that look like stuff said in our channel. The (.+) portion captures what was said and we parrot it back into the socket. Since Perl short-circuits conditionals, simply using and on line 16 will cause the $sock.print code to execute only when the regex matches. Try this code out and talk in the channel. The bot should respond to you.

Now, simply parroting back what the bad guys are saying will get our spy-bot spotted and kicked out fast. We need to be smarter, and this is where Text::Markov comes in. Looking at its documentation at http://modules.perl6.org/repo/Text::Markov, we see we need to feed it lines with .feed method and we can get it to produce output via .read method. The plan is this then: we’ll feed the Markov chain all the text messages that occur in the channel and make the bot respond to the channel only when someone addresses it by mentioning its name. The code becomes this:

 1   use Text::Markov;
 2   
 3   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 4   
 5   my $mc = Text::Markov.new;
 6   /\S/ and $mc.feed($_) for 'story.txt'.IO.lines;
 7   
 8   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 9       my $sock = .result;
10       $sock.print(qq:to/END/
11           NICK $nick
12           USER $nick $nick irc.freenode.net :Really not a bot
13           JOIN $channel
14           END
15       );
16   
17       react {
18           whenever $sock.Supply {
19               .say;
20               if /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' $<said>=(.+)/ {
21                   $mc.feed( ~$<said> );
22                   $<said> ~~ /$nick/ and $sock.print(
23                       ":$nick!~$nick@localhost PRIVMSG $channel "
24                       ~ ":{$mc.read.substr(0, 200)}\n"
25                   );
26               }
27           }
28       }
29   });

Let’s break this down. On line 1 we’re useing the Text::Markov module to include its functionality in our code. On line 5, we added a new variable $mc and store the Text::Markov object in it that we obtain by calling .new method on Text::Markov class. Now, normally the bare Text::Markov will take a bit to “learn” new text and until it does so, it’ll do a lot of repeats. To prevent that, I saved a short detective story into a text file called story.txt and on line 6 I’m reading all lines from that file and .feeding the Markov chain all lines that aren’t blank. Much of the following code is the same as before; let’s jump straight to line 20.

Notice the slight change in the regex: I’ve used $<said>=(.+) instead of bare (.+), so that we could have a meaningful name for the captured stuff instead of the cryptic $0. On line 21, I’m feeding the match into the Markov chain (the ~ before the variable forces it into a string). Then on line 22, I have another regex that checks whether the text that was said contains the nick of the bot. If the regex matches, our program proceeds to $sock.print portion of line 22 and outputs the message generated by the Text::Markov module. Line 23 has the prefix the server expects that we’ve been using. On line 24, the ~ is the string concatenation operator. Inside that string, however, notice how we’re actually executing some Perl 6 code! It’s the curly braces { } that allow us to do so. I’m getting a line of text via .read method on our Markov object, and then I’m shortening it to at most 200 characters with .substr method call, since if it’s too long, the IRC server will kick our bot out.

Try this code out (remember to create a file called story.txt and fill it with some text). Try addressing the bot by mentioning its nickname. It should produce some interesting text. You can also try commenting out line 6 and trying to address the bot then. Notice how without having fed the Markov chain some content, the results it produces are uninspiring.
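If you’re curious why feeding the chain matters so much, here is a toy word-level Markov chain. To be clear, this is my own minimal sketch and not how Text::Markov is implemented; learn and babble are hypothetical names, and the three "story" lines are made up.

```raku
# Record, for every word, which words have been seen right after it.
sub learn(%next, Str $line) {
    my @words = $line.words;
    for @words.rotor(2 => -1) -> ($a, $b) {
        %next{$a}.push: $b;
    }
}

# Walk the table from a starting word, picking a random follower each step.
sub babble(%next, Str $start, Int $max-words = 12) {
    my @out = $start;
    while @out.elems < $max-words {
        my $followers = %next{@out.tail} // last;   # stop at a dead end
        @out.push: $followers.pick;
    }
    @out.join(' ');
}

my %next;
learn(%next, $_) for
    'the butler opened the door',
    'the door creaked in the dark',
    'the detective watched the butler';

say babble(%next, 'the');     # random output, different on each run
say babble(%next, 'unseen');  # unseen: nothing learned, nothing to add
```

With an empty table every starting word is a dead end, which is the toy version of the uninspiring results you get with line 6 commented out.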

4) Send regular reports to the agency (timed events)

Responding to users on the network is great and all, but we have a job to do, Agent. As a proof of concept, we’ll simply regularly append a time-stamped string into a file, to notify the agency that the bot is still alive and well. Let’s take a look at the code for that:

 1   use Text::Markov;
 2   
 3   my ( $nick, $channel ) = 'P6NotABot', '#perl6-recon';
 4   
 5   my $mc = Text::Markov.new;
 6   /\S/ and $mc.feed($_) for 'story.txt'.IO.lines;
 7   
 8   await IO::Socket::Async.connect('irc.freenode.net', 6667).then({
 9       my $sock = .result;
10       $sock.print(qq:to/END/
11           NICK $nick
12           USER $nick $nick irc.freenode.net :Really not a bot
13           JOIN $channel
14           END
15       );
16   
17       Supply.interval( 5 ).tap({
18           spurt 'report.txt', "[{DateTime.now}] Still alive!\n", :append;
19       });
20   
21       react {
22           whenever $sock.Supply {
23               .say;
24               if /^':' <-[:]>+ 'PRIVMSG ' $channel ' :' $<said>=(.+)/ {
25                   $mc.feed( ~$<said> );
26                   ~$<said> ~~ /$nick/ and $sock.print(
27                       ":$nick!~$nick@localhost PRIVMSG $channel "
28                       ~ ":{$mc.read.substr(0, 200)}\n"
29                   );
30               }
31           }
32       }
33   });

If you’re not seeing much difference, it’s because there isn’t! Lines 17–19 are all we added. We’re .tapping a Supply that emits an event every five seconds. The code block we give to .tap uses spurt in :append mode to append a string to a file named report.txt. The string it spurts uses the DateTime type’s .now method to obtain the time stamp. And there you have it: doing stuff every five seconds!
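To get a feel for Supply.interval on its own, here is a tiny standalone sketch. It uses a much shorter interval and a whenever block instead of .tap, purely so that the demo finishes quickly and can stop itself:

```raku
my @ticks;

react {
    # Supply.interval emits 0, 1, 2, ... starting right away.
    whenever Supply.interval(0.1) -> $n {
        @ticks.push: $n;
        done if $n == 2;    # leave the react block after the third tick
    }
}

say @ticks;    # [0 1 2]
```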

Conclusion

You’ve now seen how easy it is to do event loops in Perl 6, connect to a network resource, read from and write to files, as well as use code libraries developed by third parties. In just 33 lines of liberally-written code, we have something that connects to an IRC server and responds to specific messages, while doing work at intervals as well.

There’s more evil in the world, Agent! Be sure to read all the documentation referenced throughout this blog post. See if you can improve your robot.

Together with the power of Perl 6… We’ll save the world.

 

Day 3 – Atom Editor support

If there is one thing in abundance in the world of programming, it’s text editors. It seems like at least once a month a new one will be trending on HN or Reddit. While many are just pet projects or slow to add features, GitHub’s Atom Editor has rapidly gained a strong feature set, risen far above the pack, and now competes with Vim, Emacs, and especially Sublime Text. Though many see Atom as a text editor for web developers, it actually has packages available for pretty much any language (the Atom team has even built a program to convert TextMate packages to Atom packages, which made it easy to seed their ecosystem).

Atom’s Perl 6 support

Out-of-the-box, Atom supports Perl 6 via the language-perl package included with the base install. However, the highlighter has many shortcomings and doesn’t illustrate (pun intended) the breadth of features that Perl 6 provides. So I endeavored to write a syntax highlighter that would be much more vibrant and let the language truly shine! And here is a glimpse of language-perl6fe:

Day 2 – 2 bind or !2 bind

Is a question one may want to answer while programming in Perl 6. Let me explain with the help of a Proxy and a script.

11 sub BOS() {
12 	my Str $c-value; # closure variable
13 	return Proxy.new(
14 		FETCH => method () { $c-value },
15 		STORE => method ($new-value) { 
16 			die $?LINE ~ ' Semper invicta!' if $new-value.tc ~~ any <Supermutant Ghoul Synth>; 
17 			$c-value = $new-value;
18 		}
19 	);
20 }
21 
22 # Let's prefix &say with the line number it's called from
23 my sub say(**@args is raw) { 
24 	print callframe(1).line; 
25 	print ' ' ~ .Str ~ $*OUT.nl-out for @args
26 }
27 
28 my Str $who-container = BOS;
29 my Str \who-raw = BOS;
30 my Str $who-bound := BOS;

Now we have three variables, of which one is a built-in container. All of them refer to a ‘magic’ variable. Via the methods given to Proxy.new, we can control what values are allowed to be stored. As you have likely guessed already, our magic variable doesn’t like Supermutants and other inhuman creatures. There is a catch. The variable that is still an ordinary container won’t do what we expect.

43 $who-container = 'supermutant';
44 say $who-container;
45 
46 try {
47 
48 	who-raw = 'ghoul';
49 	say who-raw;
50 
51 	CATCH {
52 		when X::AdHoc { warn $_.Str }; # line 52
53 	}
54 }
55 
56 who-raw = 'Cait';
57 say who-raw;
58 
59 try {
60 
61 	$who-bound = 'synth';
62 	say $who-bound;
63 
64 	CATCH {
65 		when X::AdHoc { warn $_.Str }; # line 65
66 	}
67 }
68 
69 $who-bound = 'Dogmeat';
70 say $who-bound;

By handling X::AdHoc we can turn a fatal die into a harmless warn. For $who-container we don’t need to do that. Let’s see how they differ.

79 say $who-container.VAR.^name; # Scalar
80 say who-raw.VAR.^name; # Proxy
81 say $who-bound.VAR.^name; # Proxy

With VAR we get access to the type object of the container we introduce with my or our. Binding allows us to set the type of the container, which is not the same thing as the type of the value stored in the container. Let’s have a look at the type checks.

92 say so BOS() ~~ Proxy; # False
93 say so BOS() ~~ Str; # True
94 
95 say so BOS().VAR ~~ Proxy; # True
96 say so BOS().VAR ~~ Str; # False

The type of the returned value is the type of $c-value. The container type is Proxy. We can use that to decide at runtime if we need to bind.

105 my Str $foo;
106 $foo = BOS if BOS().VAR ~~ Scalar;
107 $foo := BOS if BOS().VAR ~~ Proxy;
108 say $foo.VAR.^name; # Proxy

There is no way for a function to force binding on its returned value, so we have to be careful with returned values. Luckily, Perl 6 provides us with enough introspection that we can handle it at runtime.
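To see the difference between assigning and binding without any Proxy magic, here is a small sketch using plain Scalar containers. The variable names are made up for illustration.

```perl6
# Assignment copies a value into a new container;
# binding makes a second name for the very same container.
my $original = 'Cait';
my $assigned = $original;    # a fresh Scalar holding a copy of the value
my $bound   := $original;    # another name for $original's container

$original = 'Dogmeat';
say $assigned;               # Cait     (the copy is unaffected)
say $bound;                  # Dogmeat  (same container, same value)

$bound = 'Piper';
say $original;               # Piper    (storing through either name works)
```

The same thing happens with the value returned from BOS: assignment copies the fetched value into an ordinary Scalar, while binding keeps the Proxy itself.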

A bound Proxy can help us to probe Rakudo a little.

120 my $c-zeta;
121 my $counter;
122 my $zeta := Proxy.new(FETCH => { $counter++; $c-zeta }, STORE => -> $, $new { $c-zeta = $new } );
123 
124 $zeta = 'how often will it be fetched?';
125 say $zeta;
126 say $counter; # 5

On Rakudo Beta 1 FETCH is called 5 times just to .say the value. Let’s see how far our probe will reach.

135 f($zeta, $zeta, $zeta);
136 
137 sub f($c, $c-rw is rw, \r){
138 	say $c.VAR.^name; # Scalar
139 	say $c-rw.VAR.^name; # Proxy
140 	say r.VAR.^name; # Proxy
141 }

The default for sub and method parameters is ro, which does what it says on the tin. Both is rw and sigilless parameters use binding.
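What that means for the caller can be sketched with ordinary variables. The sub names below are hypothetical, and is copy is thrown in for comparison even though the example above doesn’t use it.

```perl6
# How each kind of parameter treats the caller's variable.
sub set-ro($x)           { $x = 'changed' }       # read-only: the assignment dies at run time
sub set-rw($x is rw)     { $x = 'changed' }       # bound to the caller's container
sub set-raw(\x)          { x  = 'changed' }       # sigilless: bound as well
sub set-copy($x is copy) { $x = 'changed'; $x }   # private copy, caller untouched

my $ro   = 'original';
my $rw   = 'original';
my $raw  = 'original';
my $copy = 'original';

try set-ro($ro);    # the exception is swallowed by try
set-rw($rw);
set-raw($raw);
my $returned = set-copy($copy);

say $ro;     # original
say $rw;     # changed
say $raw;    # changed
say $copy;   # original
```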

150 constant lover = BOS();
151 
152 say lover.VAR.^name; # Proxy
153 
154 lover = 'Piper';
155 say lover; # Piper

If we declare a constant, Perl 6 will also force a binding. Allowing sigillessness may give it away. Proxy isn’t all that constant though. So we can turn a constant value into a variable. You just can’t trust those dynamic languages.

166 constant fibonacci = 0, 1, *+* ... *;
167 
168 say fibonacci[^10]; # (0 1 1 2 3 5 8 13 21 34)
169 say fibonacci.VAR.^name; # Seq

But then a sequence isn’t really constant. Its values are calculated at runtime, for as many values as we ask for.

178 try {
179 	fibonacci[11] = 0;
180 	CATCH { when X::Assignment::RO { say .Str } }
181 }

If we want to take advantage of type checks, we have to make sure a call to FETCH returns a default value of the right type. That’s easy to do with a type capture. If we omit the type in STORE, we can still cheat to our heart’s content. If we don’t want to cheat, we could use ::T in STORE’s signature.

193 sub typed(::T $type = Any){ # you may not want to default to Any, here we do
194 	my T $c-typed-value;
195 	my $c-typed-value-type-string = $c-typed-value.WHAT.perl;
196 	return Proxy.new(
197 		FETCH => method () { $c-typed-value },
198 		STORE => method ($new-value) { 
199 			$c-typed-value = $new-value ~~ Str 
200 				?? ( $new-value.comb>>.ord.sum / $new-value.chars )."$c-typed-value-type-string"()
201 				!! $new-value;
202 		}
203 	);
204 }
205 
206 my Int $typed-container := typed(Int);
207 say $typed-container = 11;
208 $typed-container = 'FOO';
209 say $typed-container;

If we do fancy container magic, we have to implement readonlyness by hand.

217 my $constant-variable := Proxy.new( 
218 	FETCH => { q{Merry 1.0!} }, 
219 	STORE => -> $, $ { die X::Assignment::RO.new(typename => 'none really') } # line 219 
220 );
221 
222 say $constant-variable;
223 
224 $constant-variable = 'The Grudge stole Christmas!'; # this will die in line 219 
225 say $constant-variable;

Binding is a powerful tool to expose the dynamic nature of Perl 6 and allows us to take advantage of that nature. Whether you bind or not, I wish you a Merry 1.0 and a happy new year!

Day 1 – The State of Perl 6 in 2015

Please fasten your seat belt for your annual dive into Perl 6.

As has been customary the last six years, we start with a brief overview of the state of Perl 6.

Last year's State of Perl 6 anticipated an upcoming production release of Perl 6. That is still scheduled for this Christmas. Last year's summary also identified major areas of work to-be-done for this release.

The 2015.05 release of Rakudo introduced NFG or "Normal Form Grapheme", which means that strings in Perl 6 are not based on Unicode codepoints anymore, and instead on grapheme clusters. Grapheme clusters can consist of base characters, like the Latin lower-case c, and combining characters, like the "combining cedilla" character. Together, they form a single, visual character "c with cedilla", ç. This character also exists as a single codepoint, but other combinations of base character and combining character don't. Yet with this new feature, Perl 6 still treats the cluster of base character plus one (or more) combining characters as a single character, so regex matches and substr won't tear them apart.
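A small sketch of what that looks like in practice. The strings are made up for the demo: x followed by a combining acute accent has no precomposed codepoint, while c followed by a combining cedilla does.

```perl6
# Graphemes versus codepoints under NFG.
my $s = "x\x[0301]yz";            # x + COMBINING ACUTE ACCENT, then y and z

say $s.chars;                     # 3  (graphemes)
say $s.codes;                     # 4  (codepoints)

# substr works on graphemes, so the accent stays with its base letter.
my $first = $s.substr(0, 1);
say $first.codes;                 # 2

# Where a precomposed codepoint exists, both spellings are the same string.
my $cedilla = "c\x[0327]";        # c + COMBINING CEDILLA
say $cedilla eq "\x[E7]";         # True
say $cedilla.chars;               # 1
```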

In September, Rakudo shipped with the GLR or Great List Refactoring in place. This mostly means that the rules for using and accessing nested data structures are now much simpler and more consistent. Under the hood we also have a sane and powerful iterator model, and a new type Seq for lazy value streams that don't necessarily memorize old values on iteration.

Finally, the September release introduced native, shaped arrays (or NSA). This allows you to write

    my int32 @matrix[4;5]

which allocates a contiguous block of 20 32-bit values, but is still usable as a two-dimensional matrix in Perl 6 land. This paves the road towards memory efficient linear algebra (and other applications, of course).
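Using such an array might look like the sketch below. The variable name and the stored value are arbitrary; the semicolon inside the subscript separates the dimensions.

```perl6
# A shaped native array: 4 rows by 5 columns of 32-bit integers.
my int32 @matrix[4;5];

@matrix[1;2] = 42;                 # row 1, column 2

say @matrix.shape;                 # (4 5)
say @matrix[1;2];                  # 42
say @matrix[0;0];                  # 0  (native ints start out as zero)

# The shape is enforced: an index outside of it throws an exception.
try { @matrix[4;0] = 1 };
my $out-of-range = $!.defined;
say "Out-of-range store rejected" if $out-of-range;
```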

But, development didn't stop there. The upcoming December release of Rakudo brings us automatic precompilation of modules and installation repositories.

Not only the compiler progresses. Where the Perl 6 Modules page showed around 270 modules a year ago (end of November 2014), we are now at about 460 modules. I'm also happy to report that there are two module installers now, panda and zef.

We also now have decent documentation coverage, at least on built-in types; other areas such as tutorials and material on language features are still a bit sparse. Other documentation efforts such as Learn Perl 6 in Y minutes and perl6intro.com have sprung up, and enjoy popularity.

I'm sure there has been more Perl 6 activity that would be worth reporting, but the excitement for the upcoming release makes a calm retrospective a bit harder than usual.

Stay tuned for more, and have fun!