Day 9 – Longest Token Matching

December 9, 2012 by

Perl 6 regular expressions prefer to match the longest alternative when possible.

say "food and drink" ~~ / foo | food /;   # food

This is in contrast to Perl 5, which would prefer the first alternative above, and produce the match “foo”.

You can still get the first-alternative behavior if you want; it’s tucked away in the slightly longer alternation operator ||:

say "food and drink" ~~ / foo || food /;  # foo

…And that’s it! That’s Longest Token Matching. ☺ Short post.

“Huh, wait!” I hear you exclaim, in a desperate attempt to make the daily Perl 6 Advent goodness last a bit longer. “Why is Longest Token Matching such a big deal? Who would ever be so obsessed with long tokens?”

I’m glad you asked. As it turns out, Longest Token Matching (or LTM for short) plays very well with our intuition about how things should be parsed. If you’re creating a language, you want people to be able to declare a variable forest_density without the mention of this variable clashing with the syntax of for loops. LTM will see to that.

I like “strange consistencies” — when distal parts of a language design turn out to have commonalities that make the language feel more uniform. There is that kind of consistency here, between classes and grammars. Perl 6 basically exploits that consistency to the max. Let me briefly map out what I mean.

We’re all used to writing classes at this point. From a birds-eye view, they look like this:

class {
    method
    method
    method
}

Grammars have a suspiciously similar structure:

grammar {
    rule
    rule
    rule
}

(The keywords are actually regex, token and rule, but when we talk about them as a group, we just call them “rules”.)

We’re also used to being able to derive classes into subclasses (class B is A), and add or override methods in a way which produces a nice mix of old and new behavior. Perl 6 provides multi methods which even allow you to add new methods of the same name, and the old ones won’t be overridden, they’ll just all try to match alongside the new methods. The dispatch is handled by a (usually autogenerated) proto method that dispatches to all eligible candidates.

What does all this have to do with grammars and rules? Well, it turns out that first off, you can derive new grammars from old ones. It works the same as deriving classes. (In fact, under the hood it’s exactly the same mechanism. Grammars are classes with a different metaclass object.) New rules will override old rules just like you’d expect with methods.

S05 has a cute example with parsing of letters, and deriving the grammar to parse formal letters:

    grammar Letter {
         rule text     { <greet> $<body>=<line>+? <close> }
         rule greet    { [Hi|Hey|Yo] $<to>=\S+? ',' }
         rule close    { Later dude ',' $<from>=.+ }
         token line    { \N* \n}
     }

     grammar FormalLetter is Letter {
         rule greet { Dear $<to>=\S+? ',' }
         rule close { Yours sincerely ',' $<from>=.+ }
     }

The derived FormalLetter overrides greet and close, but not line.

But what about all the goodness with multi methods? Could we define some kind of “proto rule” that would allow us to have several rules in a grammar with the same name but different bodies? For example, we might want to parse a language with a rule term, but there are many different terms: strings, numbers… and maybe the numbers can be decimal or binary or octal or hexadecimal…

Perl 6 grammars can contain a proto rule, and then you can define and redefine a rule with the same name as many times as you want. And now we’re back full circle with the / foo | food / alternation from the start of the article. All those rules you write with the same name compile down to one big alternation. Not only that — rules which call other rules, some of them possibly proto rules, all of that will be “flattened” out into one big LTM alternation. In practice that means that all the possible things a term can be are tried out all at once, on equal footing. Neither alternative wins because you happened to define it before the others. An alternative wins because it is the longest.

The strange consistency resides in the fact that in the call-a-method side of things, the most specific method wins, and “most specific” has to with signature narrowness. The better the types in the signature describe the arguments coming in, the more specific the method.

In the parse-with-a-rule side of things, the most specific rule wins, but here “most specific” has to do with parse success. The better the rule can describe what comes next in the text, the more specific the rule.

And that’s strangely consistent, because on the surface methods and rules look like quite different beasts.

We really believe we have something going with this whole principle of deriving a grammar and getting a new language. LTM is right at the center of that because it allows new rules and old to intermix in a fair and predictable way. It’s a kind of meritocracy: rules win not based on whether they’re young or old, but based on whether they are able to parse the text well.

In fact, the Perl 6 compiler itself works this way. It parses your program using a Perl 6 grammar, and that grammar is derivable… whenever you declare a new operator in your program, a new grammar is derived for you. The parsing of your operator is added as a new rule in the new grammar, and the new grammar is given the task of parsing the rest of your program. Your new operator will win against similar but shorter ones, and lose against similar but longer ones.

Day 8 – Panda package manager

December 8, 2012 by

Perl 6 is not just the language. While without modules it can do more than Perl 5, modules can make life easier. About two years ago neutro was discussed on this blog. I’m not going to talk about it, as it’s deprecated today.

Today, the standard way of installing modules is the panda utility. If you’re using Rakudo Star, you should have it already installed (try the panda command in console to check it). After running it and waiting a few seconds, you should see help for the panda utility.

$ panda
Usage:
  panda [--notests] [--nodeps] install [ ...] -- Install the specified modules
  panda [--installed] [--verbose] list -- List all available modules
  panda update -- Update the module database
  panda info [ ...] -- Display information about specified modules
  panda search  -- Search the name/description

As you can see, it doesn’t have many options (it’s actually similar to RubyGems or cpanminus in its simplicity). You can see the current list of modules at Perl 6 Modules page. Let’s say you would want to parse an INI file. First, you can find module for it using the search command.

$ panda search INI
JSON::Tiny               *          A minimal JSON (de)serializer
Config::INI              *          .ini file parser and writer module for
                                    Perl 6
MiniDBI                  *          a subset of Perl 5 DBI ported to Perl 6
                                    to use while experts build the Real Deal
Class::Utils             0.1.0      Small utilities to help with defining
                                    classes

Config::INI is module you want. Other modules were found because my query wasn’t specific enough and found “ini” in other words (minimal, MiniDBI and defining). Config::INI isn’t part of Rakudo Star, so you have to install it.

Panda installs modules globally when you writing access to the installation directory, locally otherwise. Because of that you can use panda even when Perl 6 is installed globally without installing modules like local::lib, like you have to in Perl 5.

$ panda install Config::INI
==> Fetching Config::INI
==> Building Config::INI
Compiling lib/Config/INI.pm
Compiling lib/Config/INI/Writer.pm
==> Testing Config::INI
t/00-load.t .... ok
t/01-parser.t .. ok
t/02-writer.t .. ok
All tests successful.
Files=3, Tests=55, 3 wallclock secs ( 0.04 usr 0.00 sys + 2.38 cusr 0.14 csys = 2.56 CPU)
Result: PASS
==> Installing Config::INI
==> Succesfully installed Config::INI

After the module has been installed, you can update it as easily – by installing it. Currently panda cannot automatically upgrade modules, but after a module has been updated (you can watch repositories on GitHub to know when it happens – every module is available on GitHub), you can easily upgrade it by reinstalling the module.

When a module was installed, you can check if it works by trying to use it. This is a sample script that can be used to convert INI file into a Perl 6 data structure.

#!/usr/bin/env perl6
use Config::INI;
multi sub MAIN($file) {
    say '# your INI file as seen by Perl 6';
    say Config::INI::parse_file($file).perl;
}

Day 7 – MIME::Base64 – On encoded strings

December 7, 2012 by

parrot MIME::Base64 FixedIntegerArray: index out of bounds!

Ronaldxs created the following parrot ticket #813 4 months ago:

“Was playing with p6 MIME::Base64 and utf8 sampler page when I came across this. It seems that the parrot MIME Base64 library can’t handle some UTF-8 characters as demonstrated below.”

.sub go :main
    load_bytecode 'MIME/Base64.pbc'

    .local pmc enc_sub
    enc_sub = get_global [ "MIME"; "Base64" ], 'encode_base64'

    .local string result_encode
    result_encode = enc_sub(utf8:"\x{203e}")

    say result_encode
.end

FixedIntegerArray: index out of bounds!
current instr.: 'parrot;MIME;Base64;encode_base64'
pc 163 (runtime/parrot/library/MIME/Base64.pir:147)

called from Sub 'go' pc 11 (die_utf8_base64.pir:8)

This was interesting, because parrot strings store the encoding information in the string. The user does not need to store the string encoding information somewhere else as in perl5, nor have to do educated guesses about the encoding. parrot supports ascii, latin1, binary, utf-8, ucs-2, utf-16 and ucs-4 string encodings natively.
So we thought we the hell cannot parrot handle simple utf-8 encoded strings?

As it turned out, the parrot implementation of MIME::Base64, which can be shared to all languages which use parrot as VM, stored the character codepoints for each character as array of integers. On multibyte encodings such as UTF-8 this leads to different data held in memory than a normal multibyte string which is encoded as the byte buffer and the additional encoding information.

Internal string representations

For example an overview of different internal string representations for the utf-8 string “\x{203e}”:

perl5 strings:

len=3, utf-8 flag, "\342\200\276" buf=[e2 80 be]

parrot strings:

len=1, bufused=3, encoding=utf-8, buf=[e2 80 be]

The Unicode tables:

U+203E	‾	e2 80 be	OVERLINE

gdb perl5

Let’s check it out:

$ gdb --args perl -e'print "\x{203e}"'
(gdb) start
(gdb) b Perl_pp_print
(gdb) c
(gdb) n 

.. until if (!do_print(*MARK, fp))

(gdb) p **MARK
$1 = {sv_any = 0x404280, sv_refcnt = 1, sv_flags = 671106052, sv_u = {
      svu_pv = 0x426dd0 "‾", svu_iv = 4353488, svu_uv = 4353488, 
      svu_rv = 0x426dd0, svu_array = 0x426dd0, svu_hash = 0x426dd0, 
      svu_gp = 0x426dd0, svu_fp = 0x426dd0}, ...}

(gdb) p Perl_sv_dump(*MARK)
ALLOCATED at -e:1 for stringify (parent 0x0); serial 301
SV = PV(0x404280) at 0x4239a8
  REFCNT = 1
  FLAGS = (POK,READONLY,pPOK,UTF8)
  PV = 0x426dd0 "\342\200\276" [UTF8 "\x{203e}"]
  CUR = 3
  LEN = 16
$2 = void

(gdb) x/3x 0x426dd0
0x426dd0:	0xe2	0x80	0xbe

We see that perl5 does store the utf-8 flag, but not the length of the string, the utf8 length (=1), only the length of the buffer (=3).
Any other multi-byte encoded string, such as UCS-2 is stored differently. We suppose as utf-8.

We are already in the debugger, so let’s try the different cmdline argument.

(gdb) run -e'use Encode; print encode("UCS-2", "\x{203e}")'
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y
Breakpoint 2, Perl_pp_print () at pp_hot.c:712
712	    dVAR; dSP; dMARK; dORIGMARK;

(gdb) p **MARK

$3 = {sv_any = 0x404b30, sv_refcnt = 1, sv_flags = 541700, sv_u = {
    svu_pv = 0x563a50 " >", svu_iv = 5651024, svu_uv = 5651024, 
    svu_rv = 0x563a50, svu_array = 0x563a50, svu_hash = 0x563a50, svu_gp = 0x563a50, 
    svu_fp = 0x563a50}, ...}

(gdb) p Perl_sv_dump(*MARK)
ALLOCATED at -e:1 by return (parent 0x0); serial 9579
SV = PV(0x404b30) at 0x556fb8
  REFCNT = 1
  FLAGS = (TEMP,POK,pPOK)
  PV = 0x563a50 " >"
  CUR = 2
  LEN = 16
$4 = void

(gdb) x/2x 0x563a50
0x563a50:	0x20	0x3e

But we don’t see the UTF8 flag in encode(“UCS-2″, “\x{203e}”), just the simple ascii string ” >”, which is the UCS-2 representation of [20 3e].
Because ” >” is perfectly representable as non-utf8 ASCII string.
UCS-2 is much much nicer than UTF-8, is has a fixed size, it is readable, Windows uses it, but it cannot represent all Unicode characters.

Encode::Unicode contains this nice cheatsheet:

Quick Reference
                Decodes from ord(N)           Encodes chr(N) to...
       octet/char BOM S.P d800-dfff  ord > 0xffff     \x{1abcd} ==
  ---------------+-----------------+------------------------------
  UCS-2BE       2   N   N  is bogus                  Not Available
  UCS-2LE       2   N   N     bogus                  Not Available
  UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
  UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
  UTF-16LE    2/4   N   Y       S.P           S.P    0x2ad8,0xcddf
  UTF-32        4   Y   -  is bogus         As is            BE/LE
  UTF-32BE      4   N   -     bogus         As is       0x0001abcd
  UTF-32LE      4   N   -     bogus         As is       0xcdab0100
  UTF-8       1-4   -   -     bogus   >= 4 octets   \xf0\x9a\af\8d
  ---------------+-----------------+------------------------------

gdb parrot

Back to parrot:

If you debug parrot with gdb you get a gdb pretty-printer thanks to Nolan Lum, which displays the string and encoding information automatically.
In perl5 you have to call Perl_sv_dump with or without the my_perl as first argument, if threaded or not. With a threaded perl, e.g. on Windows you’d need to call p Perl_sv_dump(my_perl, *MARK).
In parrot you just ask for the value and the formatting is done with a gdb pretty-printer plugin.
The string length is called strlen (of the encoded string), the buffer size is called bufused.

Even in a backtrace the string arguments are displayed abbrevated like this:

#3  0x00007ffff7c29fc4 in utf8_iter_get_and_advance (interp=0x412050, str="utf8:� [1/2]", 
    i=0x7fffffffdd00) at src/string/encoding/utf8.c:551
#4  0x00007ffff7a440f6 in Parrot_str_escape_truncate (interp=0x412050, src="utf8:� [1/2]",
    limit=20) at src/string/api.c:2492
#5  0x00007ffff7b02fb3 in trace_op_dump (interp=0x412050, code_start=0x63a1c0, pc=0x63b688)
    at src/runcore/trace.c:450

[1/2] means strlen=1 bufused=2
Each non-ascii or non latin-1 encoded string is printed with the encoding prefix.
Internally the encoding is of course a index or pointer in the table of supported encodings.

You can set a breakpoint to utf8_iter_get_and_advance and watch the strings.

(gdb) r t/library/mime_base64u.t
Breakpoint 1, utf8_iter_get_and_advance (interp=0x412050, str="utf8:\\x{00c7} [8/8]", 
                i=0x7fffffffcd40) at src/string/encoding/utf8.c:544
(gdb) p str
$1 = "utf8:\\x{00c7} [8/8]"
(gdb) p str->bufused 
$3 = 8
(gdb) p str->strlen
$4 = 8
(gdb) p str->strstart
$5 = 0x5102d7 "\\x{00c7}"

This is escaped. Let’s advance to a more interesting utf8 string in this test, i.e. until str=”utf8:Ā [1/2]“
You get the members of a struct with tab-completion, i.e. press <TAB> after p str->

(gdb) p str->
_buflen    _bufstart  bufused    encoding   flags      hashval    strlen     strstart
(gdb) p str->strlen
$9 = 8

(gdb) dis 1
(gdb) b utf8_iter_get_and_advance if str->strlen == 1
(gdb) c
Breakpoint 2, utf8_iter_get_and_advance (interp=0x412050, str="utf8:Ā [1/2]", 
                i=0x7fffffffcd10) at src/string/encoding/utf8.c:544
544	    ASSERT_ARGS(utf8_iter_get_and_advance)

(gdb) p str->strlen
$10 = 1
(gdb) p str->strstart
$11 = 0x7ffff7faeb58 "Ā"
(gdb) x/2x str->strstart
0x7ffff7faeb58:	0xc4	0x80
(gdb) p str->encoding
$12 = (const struct _str_vtable *) 0x7ffff7d882e0
(gdb) p *str->encoding

$13 = {num = 3, name = 0x7ffff7ce333f "utf8", name_str = "utf8", bytes_per_unit = 1,
  max_bytes_per_codepoint = 4, to_encoding = 0x7ffff7c292b0 <utf8_to_encoding>, chr =
  0x7ffff7c275c0 <unicode_chr>, equal = 0x7ffff7c252e0 <encoding_equal>, compare =
  0x7ffff7c254e0 <encoding_compare>, index = 0x7ffff7c25690 <encoding_index>, rindex
  = 0x7ffff7c257a0 <encoding_rindex>, hash = 0x7ffff7c25a20 <encoding_hash>, scan =
  0x7ffff7c29380 <utf8_scan>, partial_scan = 0x7ffff7c29460 <utf8_partial_scan>, ord
  = 0x7ffff7c297e0 <utf8_ord>, substr = 0x7ffff7c25de0 <encoding_substr>, is_cclass =
  0x7ffff7c26000 <encoding_is_cclass>, find_cclass =
  0x7ffff7c260e0 <encoding_find_cclass>, find_not_cclass =
  0x7ffff7c26220 <encoding_find_not_cclass>, get_graphemes =
  0x7ffff7c263d0 <encoding_get_graphemes>, compose =
  0x7ffff7c27680 <unicode_compose>, decompose = 0x7ffff7c26450 <encoding_decompose>,
  upcase = 0x7ffff7c27b20 <unicode_upcase>, downcase =
  0x7ffff7c27be0 <unicode_downcase>, titlecase = 0x7ffff7c27ca0 <unicode_titlecase>,
  upcase_first = 0x7ffff7c27d60 <unicode_upcase_first>, downcase_first =
  0x7ffff7c27dc0 <unicode_downcase_first>, titlecase_first =
  0x7ffff7c27e20 <unicode_titlecase_first>, iter_get =
  0x7ffff7c29c40 <utf8_iter_get>, iter_skip = 0x7ffff7c29d60 <utf8_iter_skip>,
  iter_get_and_advance = 0x7ffff7c29eb0 <utf8_iter_get_and_advance>,
  iter_set_and_advance = 0x7ffff7c29fd0 <utf8_iter_set_and_advance>}

encode_base64(str)

$ perl -MMIME::Base64 -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64($s)'
Wide character in subroutine entry at -e line 1.

Oops, I’m clearly a unicode perl5 newbie. Does my term not understand utf-8?

$ echo $TERM
xterm

No, it should. encode_base64 does not understand unicode.
perldoc MIME::Base64
“The base64 encoding is only defined for single-byte characters. Use the Encode module to select the byte encoding you want.”

Oh my! But it is just perl5. It just works on byte buffers, not on strings.
perl5 strings can be utf8 and non-utf8. Why on earth an utf8 encoded string is disallowed and only byte buffers of unknown encodings are allowed goes beyond my understanding, but what can you do. Nothing. base64 is a binary only protocol, based on byte buffers. So we decode it manually to byte buffers. The Encode API for decoding is called encode.

$ perl -MMIME::Base64 -MEncode -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64(encode('utf8',$s))'
Wide character in printf at -e line 1.
0x20e3	=> 4oOj

This is now the term warning I know. We need -C

$ perldoc perluniintro

$ perl -C -MMIME::Base64 -MEncode -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64(encode('utf8',$s))'
0x20e3	=> 4oOj

Over to rakudo/perl6 and parrot:

$ cat >m.pir << EOP
.sub main :main
    load_bytecode 'MIME/Base64.pbc'
    $P1 = get_global [ "MIME"; "Base64" ], 'encode_base64'
    $S1 = utf8:"\x{203e}"
    $S2 = $P1(s1)
    say $S1
    say $S2
.end
EOP

$ parrot m.pir
FixedIntegerArray: index out of bounds!
current instr.: 'parrot;MIME;Base64;encode_base64'
                pc 163 (runtime/parrot/library/MIME/Base64.pir:147)

The perl6 test, using the parrot library, from https://github.com/ronaldxs/perl6-Enc-MIME-Base64/

$ git clone git://github.com/ronaldxs/perl6-Enc-MIME-Base64.git
Cloning into 'perl6-Enc-MIME-Base64'...

$ PERL6LIB=perl6-Enc-MIME-Base64/lib perl6 <<EOP
use Enc::MIME::Base64;
say encode_base64_str("\x203e");
EOP

> use Enc::MIME::Base64;
Nil
> say encode_base64_str("\x203e");
FixedIntegerArray: index out of bounds!
...

The pure perl6 workaround:

$ PERL6LIB=perl6-Enc-MIME-Base64/lib perl6 <<EOP
use PP::Enc::MIME::Base64;
say encode_base64_str("\x203e");
EOP

> use PP::Enc::MIME::Base64;
Nil
> say encode_base64_str("\x203e");
4oC+

Wait. perl6 creates a different enoding than perl5?
What about coreutils base64 command.

$ echo -n "‾" > m.raw
$ od -x m.raw
0000000 80e2 00be
0000003
$ ls -al m.raw
-rw-r--r-- 1 rurban rurban 3 Dec  6 10:23 m.raw
$ base64 m.raw
4oC+

[80e2 00be] is the little-endian version of [e2 80 be], 3 bytes, flipped.
Ok, at least base64 agrees with perl6, and I must have made some encoding mistake with perl5.

Back to debugging our parrot problem:

parrot unlike perl6 has no debugger yet. So we have to use gdb, and we need to know in which function the error occured. We use the parrot -t trace flag, which is like the perl5 debugging -Dt flag, but it is always enabled, even in optimized builds.

$ parrot --help
...
    -t --trace [flags] 
    --help-debug
...
$ parrot --help-debug
...
--trace -t [Flags] ...
    0001    opcodes
    0002    find_method
    0004    function calls

$ parrot -t7 m.pir
...
009f band I9, I2, 63         I9=0 I2=0 
00a3 set I10, P0[I5]         I10=0 P0=FixedIntegerArray=PMC(0xff7638) I5=[2063]
016c get_results PC2 (1), P2 PC2=FixedIntegerArray=PMC(0xedd178) P2=PMCNULL
016f finalize P2             P2=Exception=PMC(0x16ed498)
0171 pop_eh
lots of error handling
...
0248 callmethodcc P0, "print" P0=FileHandle=PMC(0xedcca0) 
FixedIntegerArray: index out of bounds!

We finally see the problem, which matches the run-time error.

00a3 set I10, P0[I5]         I10=0 P0=FixedIntegerArray=PMC(0xff7638) I5=[2063]

We want to set I10 to the I5=2063’th element in the FixedIntegerArray P0, and the array is not big enough.

After several hours of analyzing I came to the conclusion that the parrot library MIME::Base64 was wrong by using ord of every character in the string. It should use a bytebuffer instead.
Which was fixed with commit 3a48e6. ord can return integers > 255, but base64 can only handle chars < 255.

The fixed parrot library was now correct:

$ parrot m.pir
‾
4oC+

But then the tests started failing. I spent several weeks trying to understand why the parrot testsuite was wrong with the mime_base64 tests, the testdata came from perl5. I came up with different implementation hacks which would match the testsuite, but finally had to bite the bullet, changing the tests to match the implementation.

And I had to special case the tests for big-endian, as base64 is endian agnostic. You cannot decode a base64 encoded powerpc file on an intel machine, when you use multi-byte characters. And utf-8 is even more multi-byte than ucs-2. I had to accept the fact the big-endian will return a different encoding. Before the results were the same. The tests were written to return the same encoding on little and big-endian.

Summary

The first reason why I wrote this blog post was to show how to debug into crazy problems like this, when you are not sure if the core implementation, the library, the spec or the tests are wrong. It turned out, that the library and the tests were wrong.
You saw how easily you could use gdb to debug into such problems, as soon as you find out a proper breakpoint.

The internal string representations looked like this:

MIME::Base64 internally:

len=1, encoding=utf-8, buf=[3e20]

and inside the parrot imcc compiler the SREG

len=8, buf="utf-8:\"\x{203e}\""

parrot is a register based runtime, and a SREG is the string representation of the register value. Unfortunately a SREG cannot hold the encoding info yet, so we prefix the encoding in the string, and unquote it back. This is not the reason why parrot is still slower than the perl5 VM. I benchmarked it. parrot still uses too much sprintf’s internally and the encoding quote/unquoting counts only for a 4th of the time of the sprintf gyrations.
And parrot function calls are awfully slow and de-optimized.

The second reason is to explain the new decode_base64() API, which only parrot – and therefore all parrot based languages like rakudo – now have got.

decode_base64(str, ?:encoding)

“Decode a base64 string by calling the decode_base64() function.
This function takes as first argument the string to decode, as optional second argument the encoding string for the decoded data.
It returns the decoded data.

Any character not part of the 65-character base64 subset is silently ignored.
Characters occurring after a ‘=’ padding character are never decoded.”

So decode_base64 got now a second optional encoding argument. The src string for encode_base64 can be any encoding and is automatically decoded to a bytebuffer. You can easily encode an image or unicode string without any trouble, and for the decoder you can define the wanted encoding beforehand. The result can be the encoding binary or utf-8 or any encoding you prefer, no need for additional decoding of the result. The default encoding of the decoded string is either ascii, latin-1 or utf-8. parrot will upgrade the encoding automatically.

You can compare the new examples of pir against the perl5 version:

parrot:

.sub main :main
    load_bytecode 'MIME/Base64.pbc'

    .local pmc enc_sub
    enc_sub = get_global [ "MIME"; "Base64" ], 'encode_base64'

    .local string result_encode
    # GH 814
    result_encode = enc_sub(utf8:"\x{a2}")
    say   "encode:   utf8:\"\\x{a2}\""
    say   "expected: wqI="
    print "result:   "
    say result_encode

    # GH 813
    result_encode = enc_sub(utf8:"\x{203e}")
    say   "encode:   utf8:\"\\x{203e}\""
    say   "expected: 4oC+"
    print "result:   "
    say result_encode

.end

perl5:

use MIME::Base64 qw(encode_base64 decode_base64);
use Encode qw(encode);

my $encoded = encode_base64(encode("UTF-8", "\x{a2}"));
print  "encode:   utf-8:\"\\x{a2}\"  - ", encode("UTF-8", "\x{a2}"), "\n";
print  "expected: wqI=\n";
print  "result:   $encoded\n";
print  "decode:   ",decode_base64("wqI="),"\n\n"; # 302 242

my $encoded = encode_base64(encode("UTF-8", "\x{203e}"));
print  "encode:   utf-8:\"\\x{203e}\"  -> ",encode("UTF-8", "\x{203e}"),"\n";
print  "expected: 4oC+\n";
print  "result:   $encoded\n"; # 342 200 276
print  "decode:   ",decode_base64("4oC+"),"\n";

for ([qq(a2)],[qq(c2a2)],[qw(203e)],[qw(3e 20)],[qw(1000)],[qw(00c7)],[qw(00ff 0000)]){
    $s = pack "H*",@{$_};
    printf "0x%s\t=> %s", join("",@{$_}), encode_base64($s);
}

perl6:

use Enc::MIME::Base64;
say encode_base64_str("\xa2");
say encode_base64_str("\x203e");

Day 6 – Lexical Imports

December 6, 2012 by

Perl 6 is built on lexical scopes. Variables, subroutines, constants and even types are looked up lexically first, and subroutines are only looked up in lexical scopes.

So it is only fitting that importing symbols from modules is also done into lexical scopes. I often write code such as

    use v6;

    # the main functionality of the script
    sub deduplicate(Str $s) {
        my %seen;
        $s.comb.grep({!%seen{ .lc }++}).join;
    }

    # normal call
    multi MAIN($phrase) {
        say deduplicate($phrase)
    }

    # if you call the script with --test, it runs its unit tests
    multi MAIN(Bool :$test!) {
        # imports &plan, &is etc. only into the lexical scope
        use Test;
        plan 2;
        is deduplicate('just some words'),
            'just omewrd', 'basic deduplication';
        is deduplicate('Abcabd'),
            'Abcd', 'case insensitivity';
    }

This script removes all but the first occurrence of each character given on the command line:

    $ perl6 deduplicate 'Duplicate character removal'
    Duplicate hrmov

But if you call it with the --test option, it runs its own unit tests:

    $ perl6 deduplicate --test
    1..2
    ok 1 - basic deduplication
    ok 2 - case insensitivity

Since the testing functions are only necessary in a part of the program — in a lexical scope, to be more precise –, the use statement is inside that scope, and limits the visibility of the imported symbols to this scope. So if you try to use the is function outside the routine in which Test is used, you get a compile-time error.

Why, you might ask? From the programmer's perspective, it reduces risk of (possibly unintended and unnoticed) name clashes the same way that lexical variables are safer than global variables.

From the point of view of language design, the combination of lexical importing, runtime-immutable lexical scopes and lexical-only lookup of subroutines allows resolving subroutine names at compile time, which again allows neat stuff like detecting calls to undeclared functions, compile-time type checking of arguments, and other nice optimizations.

But subroutines are only the tip of the iceberg. Perl 6 has a very flexible syntax, which you can modify with custom operators and macros. Those too can be exported, and imported into lexical scopes. Which means that language modifications are also lexically by default. So you can safely load any language-modifying extension, without running into danger that a library you use can't cope with it — the library doesn't even see the language modification.

So ultimately, lexical importing is another facet of encapsulation.

Day 5 – A Perl 6 Debugger

December 5, 2012 by

There’s much more to the developer experience of a language than its design, features and implementations. While the language and its implementations are perhaps the thing developers will spend most time with, the overall experience will also involve interaction with the community, reading documentation, using modules and employing various development tools. Thus, it’s important that Perl 6 make progress on these fronts too. Over the past year, we’ve taken some good steps forward in these areas; there’s now doc.perl6.org, the module ecosystem has grown, and the module installation tooling has improved. Another big step forward with regards to tooling – and the topic of this post – is that an interactive Perl 6 debugger is now available.

Running With The Debugger

The debugger has been included with the last few Rakudo * releases. If you have one of those, you’re all set. Just run perl6-debug instead of perl6. It takes the same set of options, so if your normal invocation involves, for example, using the -I flag to set the include path for modules, it’ll Just Work Like Usual. Of course, what happens next is entirely different. The debugger will show you each module it is loading, followed by placing you at the first interesting statement of the program, highlighted in yellow.

dbg0

Note how it takes care to put you on the first line that actually does something, skipping the my statement above it (it’s getting increasingly smart about this).

The Basics

Hitting enter allows you to single-step through the program. At any point, you can look at variables, call methods on variables, or even evaluate expressions.

dbg1

If you want to move statement by statement, but never descend into a function call or method call, type an s, followed by enter. To step out of the current sub (that is, run until it returns then break in its caller), use so to step out. To run the program until it hits an exception, just use r. Even at the point you get an exception, you can still access variables to try and dissect what went wrong.

dbg2

One final variant, rt, will run until an exception is thrown, but handled. You’ll break at the point of the throw. This means you’re not disadvantaged in the debugger if you took care to handle exceptions well in your program; you can still break when they are thrown and use the debugger to help understand why. :-)

Breakpoints

Sometimes, you know exactly where the juicy stuff happens in your program that you wish to debug. If only you could just run until you got there. Turns out you can – that’s what breakpoints are for. We can add one, use r to run, and it will stop where we placed the breakpoint.

dbg3

Note that you don’t have to type out the full name of the file you want to put the breakpoint in; any unambiguous substring of the name of a file that is loaded will be sufficient.

I won’t cover them here, but there are also tracepoints, which instead of breaking will log the value of an expression each time a certain place in the program is hit. Later, you can display the log. It’s like adding print statements, but without the print statement going in your code, removing the risk of them accidentally making it into a commit (‘cus we’ve all done that one, right? :-))

Regex and Grammar Debugging

When the debugger detects you are in a regex or grammar, it offers a little extra help. As well as allowing you to single-step your way through the regex, atom by atom, it also displays the match text, indicating what has been matched so far.

dbg4

Here, you can see that the pattern already successfully matched SELECT, and is now looking for a literal * or will try to call the field list rule. If in a regex, which may backtrack, the match position  jumps backwards when backtracking happens, so you can understand the backtracking behavior of the pattern.

Yes, Perl 5 Regexes Too!

Rakudo has some support for the :P5 adverb on regexes, which allows use of the Perl 5 regex syntax. Here the debugger is used in REPL mode (where you enter an expression, then can immediately debug it) to explore the difference between alternations in Perl 5 and Perl 6 (in Perl 5 they go left to right, in Perl 6 they have longest token matching semantics, such that it tries the thing that will match most characters first).

dbg5

The debugger in REPL mode is great for exploring and understanding how things will execute (and as such can serve as a learning or teaching aid). Another use is for debugging modules without having to write a test script; just write a use statement in the debugger or you can even supply the module using the -M command line flag and the debugger will load it!

What About Funky Stuff, Like Macros?

That is, macros, BEGIN time, eval, and those other things where your Perl 6 program does the time warp again, doing a bit of runtime at compile time or compiling some more stuff at runtime. The debugger is built for it. If a macro is applied, the debugger will place you in it. Notice below how we’re still in the process of loading the second file, and did not get to the third yet – we really are debugging at BEGIN time!

dbg6

Any lines of code that are stripped out by the macro are simply never hit at runtime. And what about statements in quasi blocks? The debugger will take you there, so you not only know what macro was applied, but can dig into exactly what it does too.

dbg7

Just as bits of runtime happening at compile time work out fine, any code that gets eval‘d at runtime is also compiled with debug hooks, meaning that you can step straight into it and debug the evaluated code.

Written in Perl 6 and NQP!

You might think that writing a debugger must involve all kinds of low-level hackery. In fact, that’s not the case. The debug hooks mechanism is written in NQP, and the command line user interface is written in Perl 6. This is significant from a couple of angles. The first is the fact that we can write something like this without breaking the encapsulation of the compiler, but instead just by subclassing the Grammar, Actions and Compiler objects and twiddling with the AST. In fact, the debugger was built without any changes being required to Rakudo as it already existed. This provides important feedback on our compiler architecture – this time, very positive feedback. Things are extensible in the ways they were designed to be. The second is that writing so much of it in Perl 6 is a healthy bit of dogfooding – using the product in order to build further products. My hope is that, since most of what people would want to change is actually written in the Perl 6 part, it will feel quite hackable by the community at large.

And What Of Future Plans?

Various features are still to come: conditional breakpoints, dumping tracepoint output to a file, showing the path taken through a grammar to get to the current point, and various bits of configurability. The command line interface is nice, but of course having some extra options would be even nicer. I’m interested in a web-based interface, but also in integration with tools like Padre. There’s some work afoot on a common protocol for these things, which could make such integration possible without having to re-invent too many wheels. In the meantime, having an interactive debugger which is aware of and works well with a wide range of Perl 6 language features is a solid step forward. Happy debugging, and feature ideas (or patches ;-)) are welcome; here’s the GitHub repo!

Day 4 – Having Fun with Rakudo and Project Euler

December 4, 2012 by

Rakudo, the leading Perl6 implementation, is not perfect, and performance is a particularly sore subject. However, the pioneer does not ask ‘Is it fast?’, but rather ‘Is it fast enough?’, or perhaps even ‘How can I help to make it faster?’.

To convince you that Rakudo can indeed be fast enough, we’ll take a shot at a bunch of Project Euler problems. Many of those involve brute-force numerics, and that’s something Rakudo isn’t particularly good at right now. However, that’s not necessarily a show stopper: The less performant the language, the more ingenious the programmer needs to be, and that’s where the fun comes in.

All code has been tested with Rakudo 2012.11.

We’ll start with something simple:

Problem 2

By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.

The solution is beautifully straight-forward:

    say [+] grep * %% 2, (1, 2, *+* ...^ * > 4_000_000);

Runtime: 0.4s

Note how using operators can lead to code that’s both compact and readable (opinions may vary, of course). We used

  • whatever stars * to create lambda functions
  • the sequence operator (in its variant that excludes the right endpoint) ...^ to build up the Fibonacci sequence
  • the divisible-by operator %% to grep the even terms
  • reduction by plus [+] to sum them.

However, no one forces you to go crazy with operators – there’s nothing wrong with vanilla imperative code:

Problem 3

What is the largest prime factor of the number 600,851,475,143?

An imperative solution looks like this:

    sub largest-prime-factor($n is copy) {
        for 2, 3, *+2 ... * {
            while $n %% $_ {
                $n div= $_;
                return $_ if $_ > $n;
            }
        }
    }

    say largest-prime-factor(600_851_475_143);

Runtime: 2.6s

Note the is copy trait, which is necessary as Perl6 binds arguments read-only by default, and that integer division div is used instead of numeric division /.

Nothing fancy going on here, so we’ll move along to

Problem 53

How many, not necessarily distinct, values of nCr, for 1 ≤ n ≤ 100, are greater than one-million?

We’ll use the feed operator ==> to factor the algorithm into separate steps:

    [1], -> @p { [0, @p Z+ @p, 0] } ... * # generate Pascal's triangle
    ==> (*[0..100])()                     # get rows up to n = 100
    ==> map *.list                        # flatten rows into a single list
    ==> grep * > 1_000_000                # filter elements exceeding 1e6
    ==> elems()                           # count elements
    ==> say;                              # output result

Runtime: 5.2s

Note the use of the Z meta-operator to zip the lists 0, @p and @p, 0 with +.

The one-liner generating Pascal’s triangle has been stolen from Rosetta Code, another great resource for anyone interested in Perl6 snippets and exercises.

Let’s do something clever now:

Problem 9

There exists exactly one Pythagorean triplet for which a + b + c = 1000. Find the product abc.

Using brute force will work (solution courtesy of Polettix), but it won’t be fast (~11s on my machine). Therefore, we’ll use a bit of algebra to make the problem more managable:

Let (a, b, c) be a Pythagorean triplet

    a < b < c
    a² + b² = c²

For N = a + b + c it follows

    b = N·(N - 2a) / 2·(N - a)
    c = N·(N - 2a) / 2·(N - a) + a²/(N - a)

which automatically meets b < c.

The condition a < b gives the constraint

    a < (1 - 1/√2)·N

We arrive at

    sub triplets(\N) {
        for 1..Int((1 - sqrt(0.5)) * N) -> \a {
            my \u = N * (N - 2 * a);
            my \v = 2 * (N - a);

            # check if b = u/v is an integer
            # if so, we've found a triplet
            if u %% v {
                my \b = u div v;
                my \c = N - a - b;
                take $(a, b, c);
            }
        }
    }

    say [*] .list for gather triplets(1000);

Runtime: 0.5s

Note the declaration of sigilless variables \N, \a, …, how $(…) is used to return the triplet as a single item and .list – a shorthand for $_.list – to restore listy-ness.

The sub &triplets acts as a generator and uses &take to yield the results. The corresponding &gather is used to delimit the (dynamic) scope of the generator, and it could as well be put into &triplets, which would end up returning a lazy list.

We can also rewrite the algorithm into dataflow-driven style using feed operators:

    constant N = 1000;

    1..Int((1 - sqrt(0.5)) * N)
    ==> map -> \a { [ a, N * (N - 2 * a), 2 * (N - a) ] } \
    ==> grep -> [ \a, \u, \v ] { u %% v } \
    ==> map -> [ \a, \u, \v ] {
        my \b = u div v;
        my \c = N - a - b;
        a * b * c
    }
    ==> say;

Runtime: 0.5s

Note how we use destructuring signature binding -> […] to unpack the arrays that get passed around.

There’s no practical benefit to use this particular style right now: In fact, it can easily hurt performance, and we’ll see an example for that later.

It is a great way to write down purely functional algorithms, though, which in principle would allow a sufficiently advanced optimizer to go wild (think of auto-vectorization and -threading). However, Rakudo has not yet reached that level of sophistication.

But what to do if we’re not smart enough to find a clever solution?

Problem 47

Find the first four consecutive integers to have four distinct prime factors. What is the first of these numbers?

This is a problem where I failed to come up with anything better than brute force:

    constant $N = 4;

    my $i = 0;
    for 2..* {
        $i = factors($_) == $N ?? $i + 1 !! 0;
        if $i == $N {
            say $_ - $N + 1;
            last;
        }
    }

Here, &factors returns the number of prime factors. A naive implementations looks like this:

    sub factors($n is copy) {
        my $i = 0;
        for 2, 3, *+2 ...^ * > $n {
            if $n %% $_ {
                ++$i;
                repeat while $n %% $_ {
                    $n div= $_
                }
            }
        }
        return $i;
    }

Runtime: unknown (33s for N=3)

Note the use of repeat while … {…}, the new way to spell do {…} while(…);.

We can improve this by adding a bit of caching:

    BEGIN my %cache = 1 => 0;

    multi factors($n where %cache) { %cache{$n} }
    multi factors($n) {
        for 2, 3, *+2 ...^ * > sqrt($n) {
            if $n %% $_ {
                my $r = $n;
                $r div= $_ while $r %% $_;
                return %cache{$n} = 1 + factors($r);
            }
        }
        return %cache{$n} = 1;
    }

Runtime: unknown (3.5s for N=3)

Note the use of BEGIN to initialize the cache first, regardless of the placement of the statement within the source file, and multi to enable multiple dispatch for &factors. The where clause allows dynamic dispatch based on argument value.

Even with caching, we’re still unable to answer the original question in a reasonable amount of time. So what do we do now? We cheat and use Zavolaj – Rakudo’s version of NativeCall – to implement the factorization in C.

It turns out that’s still not good enough, so we refactor the remaining Perl code and add some native type annotations:

    use NativeCall;

    sub factors(int $n) returns int is native('./prob047-gerdr') { * }

    my int $N = 4;

    my int $n = 2;
    my int $i = 0;

    while $i != $N {
        $i = factors($n) == $N ?? $i + 1 !! 0;
        $n = $n + 1;
    }

    say $n - $N;

Runtime: 1m2s (0.8s for N=3)

For comparison, when implementing the algorithm completely in C, the runtime drops to under 0.1s, so Rakudo won’t win any speed contests just yet.

As an encore, three ways to do one thing:

Problem 29

How many distinct terms are in the sequence generated by ab for 2 ≤ a ≤ 100 and 2 ≤ b ≤ 100?

A beautiful but slow solution to the problem can be used to verify that the other solutions work correctly:

    say +(2..100 X=> 2..100).classify({ .key ** .value });

Runtime: 11s

Note the use of X=> to construct the cartesian product with the pair constructor => to prevent flattening.

Because Rakudo supports big integer semantics, there’s no loss of precision when computing large numbers like 100100.

However, we do not actually care about the power’s value, but can use base and exponent to uniquely identify the power. We need to take care as bases can themselves be powers of already seen values:

    constant A = 100;
    constant B = 100;

    my (%powers, %count);

    # find bases which are powers of a preceeding root base
    # store decomposition into base and exponent relative to root
    for 2..Int(sqrt A) -> \a {
        next if a ~~ %powers;
        %powers{a, a**2, a**3 ...^ * > A} = a X=> 1..*;
    }

    # count duplicates
    for %powers.values -> \p {
        for 2..B -> \e {
            # raise to power \e
            # classify by root and relative exponent
            ++%count{p.key => p.value * e}
        }
    }

    # add +%count as one of the duplicates needs to be kept
    say (A - 1) * (B - 1) + %count - [+] %count.values;

Runtime: 0.9s

Note that the sequence operator ...^ infers geometric sequences if at least three elements are provided and that list assignment %powers{…} = … works with an infinite right-hand side.

Again, we can do the same thing in a dataflow-driven, purely-functional fashion:

    sub cross(@a, @b) { @a X @b }
    sub dups(@a) { @a - @a.uniq }

    constant A = 100;
    constant B = 100;

    2..Int(sqrt A)
    ==> map -> \a { (a, a**2, a**3 ...^ * > A) Z=> (a X 1..*).tree } \
    ==> reverse()
    ==> hash()
    ==> values()
    ==> cross(2..B)
    ==> map -> \n, [\r, \e] { (r) => e * n } \
    ==> dups()
    ==> ((A - 1) * (B - 1) - *)()
    ==> say();

Runtime: 1.5s

Note how we use &tree to prevent flattening. We could have gone with X=> instead of X as before, but it would make destructuring via -> \n, [\r, \e] more complicated.

As expected, this solution doesn’t perform as well as the imperative one. I’ll leave it as an exercise to the reader to figure out how it works exactly ;)

That’s it

Feel free to add your own solutions to the Perl6 examples repository under euler/.

If you’re interested in bioinformatics, you should take a look at Rosalind as well, which also has its own (currently only sparsely populated) examples directory rosalind/.

Last but not least, some solutions for the Computer Language Benchmarks Game – also known as the Debian language shootout – can be found under shootout/.

You can contribute by sending pull requests, or better yet, join #perl6 on the Freenode IRC network and ask for a commit bit.

Have the appropriate amount of fun!

Day 3 – Whatever the layout manager is

December 3, 2012 by

Introduction

This article aims to demonstrate how Whatever — one of the many interesting Perl 6 curiosities — could be useful to easily implement and use complex things like a layout manager. In a couple of words, a layout manager is the part of a graphical interface in charge of the spatial arrangement of objects like windows or widgets. For the sake of simplicity, the layout manager implemented in this article will comply with the following three rules:

  • there are only two kinds of widgets: terminal or container, the latter can contain other widgets of either kind;
  • a widget cannot be overlapped, except for containers which fully contain their sub-widgets; and
  • only the height can be adjusted, this size can be either static, dynamic, or intentionally left unspecified.

Usage

From the user's point-of-view, this layout manager aims to be as easy to use as possible. For example, it shouldn't be hard to specify such typical interface below, inspired from a text-based program. In this example, the interface and body widgets are containers, all the others are terminals:

interface (X lines)
+----> +------------------------------------------+
|      | menu bar (1 line)                        |  body (remaining space)
|      +------------------------------------------+ <----+
|      | subpart 1 (1/3 of the remaining space)   |      |
|      |                                          |      |
|      |                                          |      |
|      +------------------------------------------+      |
|      | subpart 2 (remaining space)              |      |
|      |                                          |      |
|      |                                          |      |
|      |                                          |      |
|      |                                          |      |
|      |                                          |      |
|      +------------------------------------------+ <----+
|      | status bar (1 line)                      |
+----> +------------------------------------------+

The user don't know what the remaining space is in advance because such an interface is arbitrary resizable. As a consequence it should be specified as a non-predefined value; this is where * — the Whatever object — comes in handy. This object is interesting for two reasons:

  • from the user's point-of-view, the definition of non-static sizes is as simple as: * / 3 for the subpart 1 (dynamic) and just * for the subpart 2 (unspecified); and
  • from the developer's point-of-view, Perl 6 transforms automatically things like $size = * / 3 into a closure: x → x / 3. Then, it could be called like a regular function: $size($x).

That way, the previous GUI can be transliterated into the following lines of code:

my $interface =
    Widget.new(name => 'interface', size => $x, sub-widgets => (
        Widget.new(name => 'menu bar', size => 1),
        Widget.new(name => 'main part', size => *, sub-widgets => (
            Widget.new(name => 'subpart 1', size => * / 3),
            Widget.new(name => 'subpart 2', size => *))),
        Widget.new(name => 'status bar', size => 1)));

Implementation

The drawing of terminal widgets is straightforward since most of the work is done by containers. Those are in charge to compute the remaining space as well as to uniformly distribute widgets that have an unspecified size:

class Widget {
    has $.name;
    has $.size is rw;
    has Widget @.sub-widgets;

    method compute-layout($remaining-space? is copy, $unspecified-size? is copy) {
        $remaining-space //= $!size;

        if @!sub-widgets == 0 {  # Terminal
            my $computed-size = do given $!size {
                when Real     { $_                  };
                when Callable { .($remaining-space) };
                when Whatever { $unspecified-size   };
            }

            self.draw($computed-size);
        }
        else {  # Container
            my @static-sizes   =  grep Real,     @!sub-widgets».size;
            my @dynamic-sizes  =  grep Callable, @!sub-widgets».size;
            my $nb-unspecified = +grep Whatever, @!sub-widgets».size;

            $remaining-space -= [+] @static-sizes;

            $unspecified-size = ([-] $remaining-space, @dynamic-sizes».($remaining-space))
                                 / $nb-unspecified;

            .compute-layout($remaining-space, $unspecified-size) for @!sub-widgets;
        }
    }

    method draw(Real $size is copy) {
        "+{'-' x 25}+".say;
        "$!name ($size lines)".fmt("| %-23s |").say;
        "|{' ' x 25}|".say while --$size > 0;
    }
}

Here, any Callable object can be used to specify a dynamic size, as far as it takes the computed remaining space as argument. That means it is possible to specify more sophisticated dynamic size by passing a code Block. For example, { max(5, $^x / 3) } ensures the widget has a proportional size that can't decrease below 5.

Conclusion

It's time to check if this trivial layout manager works correctly both in Rakudo and Niecza, the two most advanced implementations of Perl 6. The following test is rather simple, it creates and draws an interface, then resize it and draws it again:

my $interface =
    Widget.new(name => 'interface', size => 11, sub-widgets => (
        Widget.new(name => 'menu bar', size => 1),
        Widget.new(name => 'main part', size => *, sub-widgets => (
            Widget.new(name => 'subpart 1', size => * / 3),
            Widget.new(name => 'subpart 2', size => *))),
        Widget.new(name => 'status bar', size => 1)));

$interface.compute-layout;  # Draw
$interface.size += 3;       # Resize
$interface.compute-layout;  # Redraw

The results before and after resizing are respectively displayed below. They are close enough from the initial mockup, n'est-ce pas?

+-------------------------+            +-------------------------+
| menu bar (1 lines)      |            | menu bar (1 lines)      |
+-------------------------+            +-------------------------+
| subpart 1 (3 lines)     |            | subpart 1 (4 lines)     |
|                         |            |                         |
|                         |            |                         |
+-------------------------+            |                         |
| subpart 2 (6 lines)     |            +-------------------------+
|                         |            | subpart 2 (8 lines)     |
|                         |            |                         |
|                         |            |                         |
|                         |            |                         |
|                         |            |                         |
+-------------------------+            |                         |
| status bar (1 lines)    |            |                         |
                                       |                         |
                                       +-------------------------+
                                       | status bar (1 lines)    |

Finally, the implementation of such a flexible program is really simple in Perl 6: everything is already there, in the core language. Obviously, this trivial layout manager isn't ready for prime-time since a lot of things are missing: sanity checks, multiple dimensions, … but those are left as exercises to you, the reader ;) For any questions or comments, feel free to meet Perl 6 fellows on IRC (#perl6 on freenode).

Bonus

As seen previously, $!size can be Whatever, but it can't be whatever you want. For example, a negative Real or a string are not correct values. Once again Perl 6 provides a simple yet powerful feature: constrained types. In a couple of words this permits to define new types from a set of constraints:

subset PosReal of Real where * >= 0;
subset Size where {   .does(PosReal)
                   or .does(Callable) and .signature ~~ :(PosReal --> PosReal)
                   or .does(Whatever) };

has Size $.size is rw;

Day 2 – Anonymous functions for great good

December 2, 2012 by

Perl 6 has great support for functions. It packs function signatures full with awesome, and lets you have your cake and eat it a couple of times over with all the ways you can specify a function. You can specify parameter types, optional parameters, named parameters, and even those cool where clauses. If I didn’t know better, I’d suspect Perl 6 was compensating for some predecessor’s rather rudimentary handling of parameters. (*cough* @_ *cough*)

Among all these other things, Perl 6 also allows you to define functions without naming them.

sub { say "lol, I'm so anonymous!" }

How is this useful? If you can’t name the function, you can’t call it, right? Wrong.

You can store the function in a variable. Or return it from another function. Or pass it to another function. In fact, when you don’t name your function, the focus becomes much more what code you’re going to run later. Like an executable “to do”-list.

Of course, Perl 5 has anonymous functions, too. With exactly the same syntax, even. In fact, all the big languages do anonymous functions, according to this list of languages on Wikipedia. Except, it seems, the historically significant languages C and Pascal. And the more modern but lumbering Java. “Planned for Java 8″. Haha, Java, catch up! Even C++ has them now.

How important are anonymous functions? Very. In the 1930s, Alan Turing showed how all computer processes could be simulated using just a pre-programmed machine that looks like a tape recorder, reading and writing values on a really long tape. (The Turing Machine.) Meanwhile, across the Atlantic, Alonzo Church showed how all computer processes could be simulated using just anonymous functions, no tape recorder required. (Lambda calculus.) It’s all quite elegant.

Later languages like Lisp and Scheme lean heavily on anonymous functions as a key component in the language. And lately a scrappy language called JavaScript, which also leans heavily on anonymous functions, has taken over the world while we were all busy surfing the web.

But let’s talk possibilities here. What can anonymous functions do for us? And how would it look in Perl 6?

Well, take sorting as a famous example. You could imagine Perl 6 having a sort_lexicographically function and a sort_numerically function. But it doesn’t. It has a sort function. When you want it to sort in a certain way, you just pass an anonymous function to it.

my @sorted_words = @words.sort({ ~$_ });
my @sorted_numbers = @numbers.sort({ +$_ });

(Technically, those are blocks, not functions. But the difference isn’t significant if you’re not planning to return anywhere inside.)

And of course it goes further than just those two sort orders. You could sort by shoe size, or maximum ground speed, or decreasing likelihood of spontaneous combustion. All because you can pass in any logic as an argument. Object-oriented people are very proud of this pattern, and call it “dependency injection”.

Come to think of it, map and grep and reduce all depend on this kind of function-passing. We sometimes refer to passing functions to functions as “higher order programming”, as if it was only something people with special privileges should be doing. But in fact it’s a very useful and broadly applicable technique.

The above examples all run the anonymous functions as part of their own execution. But there’s no need to restrict ourselves to this. We can create functions, return them, and then run them later:

sub make_surprise_for($name) {
    return sub { say "Sur-priiise, $name!" };
}

my $reveal_surprise = make_surprise_for("Finn");    # nothing happens, yet
# ...wait for it...
# ...wait...
# ...waaaaaaait...
$reveal_surprise();        # "Sur-priiise, Finn!"

The function in $reveal_surprise remembers the value of $name even though the original function passing it in has exited long ago. That’s pretty nice. This effect is referred to as the anonymous function closing over the variable $name. But there’s no need to get technical — the long and short of it is “it’s awesome”.

And in fact, it feels quite natural if we just look at anonymous functions alongside other staple storage mechanisms such as arrays and hashes. All of these can be stored in variables, passed as arguments or returned from functions. An anonymous array allows you to store a sequence of things for later. An anonymous hash allows you to store mappings/translations of things for later. An anonymous function allows you to store calculations or behavior for later.

Later this month, I’ll go through how to exploit dynamic scoping in Perl 6 to create nice DSL-y interfaces. We’ll see how anonymous functions come into play there as well.

Perl 6 Advent Calendar 2012: Table of Contents

December 1, 2012 by

This post serves as a table of contents for the 2012 Perl 6 advent calendar. Links to new posts will appear here during the course of this month.

See also: table of contents for 2011, 2010, 2009.

Day 1 – State of Perl 6 in 2012

December 1, 2012 by

Welcome to another edition of your annual Perl 6 advent calendar.

As is tradition on the first of December, you can read a short overview over what has changed in the past year, and where we are standing now.

The list of major changes to the specification is pretty short. The IO subsystem has undergone a rewrite, and now much better reflects the realities in implementations, and actually has a measure of common sense applied. S32::Exceptions has gone through lots of changes (mostly extensions), and now there is a decent core of exception classes in Perl 6.

Both Rakudo and Niecza, the two major Perl 6 compilers, have matured a great deal. Contrary to last year, chances are pretty good that if your program works on one of the compilers, it also works on the other. Niecza also temporarily overtook Rakudo on the count of passing tests.

Niecza had a revamp of the roles implementation, has gained constant folding, awesome Unicode support in regexes, list comprehensions and a no strict; mode. To name just a few of the major changes.

Rakudo now supports heredocs, all phasers (special blocks like BEGIN, END, FIRST, …), longest-token matching in regexes, typed exceptions, much nicer backtraces and operator adverbs. And it now has a debugger, which is shipped with the Rakudo Star distribution.

The module ecosystem has grown a lot, and there is much more documentation for Perl 6 than a year ago.

So, after all these changes, where are we now?

Reports from production uses of Perl 6 are slowly starting to trickle in, and these days if your Perl 6 code has bugs, the chances are much higher that your code is to blame than the compilers. Perl 6 has never been this much fun to use. It surely has been a good and productive year for Perl 6, and we’re sure that this last month will continue the tradition. Have fun!


Follow

Get every new post delivered to your Inbox.

Join 37 other followers