Perl 5 is known to have very good Unicode support (starting from version 5.8, the later the better), but people still complain that it is hard to use. The main reason is that the programmer has to keep track of which strings have been decoded and which are meant to be treated as binary strings. There is also no way to reliably introspect a variable to find out whether it holds a binary or a text string.
In Perl 6, this problem has been addressed by introducing separate types. Str holds text strings, and string literals in Perl 6 are of type Str. Binary data is stored in Buf objects. There is no way to confuse the two. Converting back and forth is done with the encode and decode methods:
    my $buf = Buf.new(0x6d, 0xc3, 0xb8, 0xc3, 0xbe, 0x0a);
    $*OUT.write($buf);
    my $str = $buf.decode('UTF-8');
    print $str;
Both of those output operations have the same effect: they print møþ to the standard output stream, followed by a newline.
Buf.new(...) takes a list of integers between 0 and 255, which are the byte values from which the new byte buffer is constructed. $*OUT.write($buf) writes the $buf buffer to standard output.
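$buf.decode('UTF-8') decodes the buffer and returns a Str object (or dies if the buffer doesn't constitute valid UTF-8). The reverse operation is $str.encode($encoding). A Str can simply be printed with print.
Printing also has to convert the string to a binary representation at some point. The default encoding for this and other conversions is UTF-8. The Perl 6 specification allows the user to change the default, but no compiler implements that yet.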
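The round trip between text and bytes can be sketched as follows. This is a minimal illustration that assumes a reasonably current Rakudo compiler; the variable names are made up for the example, and older compilers may behave differently.

```perl6
# Start from a text string and turn it into bytes explicitly.
my $str   = "møþ\n";
my $bytes = $str.encode('UTF-8');    # a byte buffer, not a Str

# The two objects count different things: characters versus bytes.
say $str.chars;                      # 4 characters
say $bytes.elems;                    # 6 bytes, since ø and þ take two bytes each in UTF-8

# Decoding the bytes gives back the original text.
say $bytes.decode('UTF-8') eq $str;  # True
```

The character count and the byte count differ as soon as the text contains anything outside ASCII, which is exactly the distinction the two types keep visible.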
For reading, you can use the .read($no-of-bytes) method to read a Buf, and .get for reading a line as a Str.
The read and write methods are also present on sockets, not just on the ordinary file and stream handles.
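A small sketch of both reading styles on an ordinary file handle is shown here. It assumes a current Rakudo compiler, and the scratch file name buf-demo.txt is hypothetical, chosen only for this illustration.

```perl6
# Hypothetical scratch file, created only for this illustration.
my $path = $*TMPDIR.add('buf-demo.txt');
spurt $path, "møþ\nsecond line\n";

# Text reading: .get returns one line as a Str, without the trailing newline.
my $text-fh = open $path, :r;
my $line    = $text-fh.get;
$text-fh.close;

# Binary reading: .read returns a Buf holding the requested number of bytes.
my $bin-fh = open $path, :bin;
my $buf    = $bin-fh.read(6);
$bin-fh.close;

say $line.chars;                           # 3 characters in the first line
say $buf.elems;                            # 6 bytes, including the newline
say $buf.decode('UTF-8').chomp eq $line;   # True: same content, seen two ways

unlink $path;
```

The caller chooses the representation at the point of reading, so there is never a string whose status as text or binary has to be remembered separately.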
One of the particularly nasty things you can accidentally do in Perl 5 is to concatenate text and binary strings, or to combine them in another way (for example with join or string interpolation). The result of such an operation is a string that is broken, but only if the binary string contains bytes above 127, which can be a nightmare to debug.
In Perl 6, you get the error "Cannot use a Buf as a string" when you try that, which avoids the trap.
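The following sketch shows the refusal and the two explicit ways around it. It assumes a current Rakudo compiler; the exact wording of the error message may vary between compiler versions, and the variable names are illustrative only.

```perl6
my $buf = Buf.new(0x6d, 0xc3, 0xb8, 0xc3, 0xbe);   # the UTF-8 bytes of "møþ"
my $str = "text: ";

# Mixing the two directly is refused instead of silently producing garbage.
my $mixed = try { $str ~ $buf };
say $mixed.defined ?? "concatenated" !! "refused: " ~ $!.message;

# Deciding explicitly which side to convert works in either direction.
my $as-text  = $str ~ $buf.decode('UTF-8');    # text joined with text
my $as-bytes = $str.encode('UTF-8') ~ $buf;    # bytes joined with bytes
say $as-text;
say $as-bytes.elems;
```

Because the conversion has to be written out, the place where bytes become text (or the other way around) is always visible in the code.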
The existing Perl 6 compilers do not yet provide the same level of Unicode support as Perl 5 does, but the bits that are there are much harder to misuse.