Buffers and Binary IO

Buffers and Binary IO

Perl 5 is known to have very good Unicode support (starting from version 5.8, the later the better), but people still complain that it is hard to use. The most important reason for that is that the programmer needs to keep track of which strings have been decoded, and which are meant to be treated as binary strings. And there is no way to reliably introspect variables to find out if they are binary or text strings.

In Perl 6, this problem has been addressed by introducing separate types. Str holds text strings. String literals in Perl 6 are of type Str. Binary data is stored in Buf objects. There is no way to confuse the two. Converting back and forth is done with the encode and decode methods.

    my $buf = Buf.new(0x6d, 0xc3, 0xb8, 0xc3, 0xbe, 0x0a);

    my $str = $buf.decode('UTF-8');
    print $str;

Both of those output operations have the same effect, and print møþ to the standard output stream, followed by a newline. Buf.new(...) takes a list of integers between 0 and 255, which are the byte values from which the new byte buffer is constructed. $*OUT.write($buf) writes the $buf buffer to standard output.

$buf.decode('UTF-8') decodes the buffer, and returns a Str object (or dies if the buffer doesn’t consistute valid UTF-8). The reverse operation is $Buf.encode($encoding). A Str can simply be printed with print.

Of course print also needs to convert the string to a binary representation somewhere in the process. There is a default encoding for this and other operations, and it is UTF-8. The Perl 6 specification allows the user to change the default, but no compiler implements that yet.

For reading, you can use the .read($no-of-bytes) methods to read a Buf, and .get for reading a line as a Str.

The read and write methods are also present on sockets, not just on the ordinary file and stream handles.

One of the particularly nasty things you can accidentally do in Perl 5 is
concatenating text and binary strings, or combine them in another way (like with join or string interpolation). The result of such an operation is a string that happens to be broken, but only if the binary string contains any bytes above 127 — which can be a nightmare to debug.

In Perl 6, you get Cannot use a Buf as a string when you try that, avoiding that trap.

The existing Perl 6 compilers do not yet provide the same level of Unicode support as Perl 5 does, but the bits that are there are much harder to misuse.

2 thoughts on “Buffers and Binary IO

  1. With Rakudo (VERSION contains ‘2011.04’) I get the following error in line 1 of your sample code:

    Default constructor only takes named arguments
    in method new at src/gen/CORE.setting:506

    Looks like the ‘Buf’ constructor has a different calling convention. Is there documentation available for Buf?

    1. Sorry, I should have mentioned that, you need the latest development version of nom to run these examples, many properties of Buf have changed since April.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s