Day 11 – Parrot threads

Editors note: I, rurban, does know almost nothing about threads. Any errors are probably mine. I just tested them, fixed some deadlocks, added the numcpu code and merged the threads branch to master.

Parrot now supports fast and lightweight OS threads, based on Nat Tucks’s initial GSoC work together with Andrew “whiteknight” Whitworth on green threads and finally Stefan Seifert’s extension to true parallel OS threads as hybrid threads.
See and


Parrot always supported “threads”, i.e. concurrency models over the last years, but we identified various problems with the particular designs and were continuously improving them. In our case without changing the API too much, as the pdd25 concurrency spec is pretty high level, describing the various models parrot should support, and also pretty low-level in describing the two PMC’s which export the threads API, the Task and the Scheduler classes.

Being born at a time when Perl 6 still looked much more similar to Perl 5 than it does nowadays, Parrot’s threading support initially was very close to Perl’s ithreads model. Previous attempts to change this into the more conventional model of data shared by default or implementing new technologies like STM “Software Transactional Memory” failed. For example Parrot has never supported running multiple threads and having garbage collection at the same time.

In the year 2005 development of faster Central Processing Units (CPUs) shifted from increased speed of a single core to adding more cores. Modern processors contain up to 12 cores with even mobile phones having up to four. To utilize a modern CPU’s power, code needs to be run in parallel. In UNIX (and thus Perl) tradition, this is accomplished using multiple processes being a good solution for many use cases. For many others like auto threading of hyper operators in Perl 6, the cost of process setup and communication would be prohibitively high except for very large data sets.

During the years of back and forth and failed attempts of adding threading support to Parrot, the Perl 6 specification evolved to a point where the largest parts of the language were covered and its features were implemented in the compilers. The lack of concurrency primitives in Parrot however prevents any progress in the area of concurrency support.


Green threads were used to simplify the implementation of a nearly lock free multithreading implementation. 

Parrot supports now native Win32 threads and POSIX threads. Win32 alarm, sleep and premption is unified with POSIX, it is handled on a common timer thread.

Parrot creates at startup a thread pool of --numthreads threads, which defaults to the number of available CPU cores. Activating a new thread at runtime causes no run-time penalties, until the number of cores is utilized. When a user starts a new task, the scheduler first looks for an idle thread. If one can be found, the task is scheduled on the thread’s interpreter. If more tasks are started than the maximum number of threads, the tasks are distributed evenly among the running interpreters. This is effectively an implementation of the N:M threading model.

Green threads

Our GSOC student Nan “Chandor” Tuck worked in summer 2010 on green threads.

What I have working now is a pre-emptively scheduled green threads system for Parrot that allows programs to be written in a concurrent style. Individual green threads can do basic blocking file input without stopping other threads from running. These logical threads are accessed using the Task API that I described a couple weeks ago. This functionality makes Parrot similarly powerful at threading as the standard version of Ruby or Python: Threads can do pretty much everything except run at the same time.

What was missing from this green threads branch was true parallel execution in OS threads, one global_interpreter structure that is shared and protected by locks or other concurrent access rules and many local_interpreters that run simultaneously in separate OS threads.

OS threads

From Fall 2011 to Summer 2012 Stefan “nine” Seifert implemented true OS threads on top of green threads to finally allow true parallel execution of Tasks, to implement blocking IO, and to give perl6 some more advantages over perl5.

The lightweight “green” threads are used as messages in a system where reading shared variables is allowed but only the one owner thread may write to it. That’s why we call it hybrid threads.

Why is multithreading support so difficult to implement?

Low level programming languages like C provide only the bare necessities, leaving the responsibility for preventing data corruption and synchronization entirely to the user. A high-level language like Perl 6 on the other hand provides complex and compound data types, handles garbage collection and a very dynamic object system. Even seemingly simple things like a method call can become very complex. In a statically typed programming language the definition of a class is immutable. Thus, calling a method on an object contains just the steps of determining the object’s class, fetching the required method from this class and calling it. Calling the same method again can then even omit the first two steps since their results cannot change.

In a dynamic language, the object may change its class at runtime. The inheritance hierarchy of the class may be changed by adding or removing parent classes. Methods may be added to or removed from classes (or objects) at runtime and even the way how to find a method of a class may change. So a simple method call results in the following steps:

    ·  determining the class of the object,
    ·  determining the method resolution method of the class,
    ·  finding the actual method to call,
    ·  calling the method.

These steps have to be repeated for every method call, because the results may change any time. In a threaded environment, a thread running in parallel may change the underlying data and meta data in between those sequences and even between those steps. As a consequence, this meta data has to be protected from corruption introducing the need for locks in a performance critical area.

Many interpreters for dynamic languages like Python or Ruby handle this problem by using a global interpreter lock (GIL) to effectively serialize all operations. This is a proven and reliable way but leaves much of the hardware’s potential unused.


In Java, the user is responsible for preventing concurrency issues. The language provides synchronization primitives like mutexes, but the interpreter (the Java Virtual Machine, JVM) does not protect the consistency of the provided data structures. The class library provides the user with high-level data structures explicitly designed for multithreaded scenarios.

Java version 1.1 used green threads to support multithreaded execution of Java programs. Green threads are threads simulated by the virtual machine (VM) but unable to use more than one CPU core for processing. Version 1.2 introduced native Operating System (OS) threading support which since has become the standard way to do multithreading in Java.


The CPython implementation of the Python runtime uses a Global Interpreter Lock (GIL) to protect its internal consistency. This is a single lock taken whenever the interpreter executes Python bytecode. Because of this lock, only one thread can execute bytecode at any time so all built-in types and the object model are implicitly type safe. The drawback is that Python code cannot benefit from having multiple CPU cores available. However I/O operations and calls to external libraries are executed without holding the GIL, so in applications with multiple I/O bounded threads, there may still be a performance benefit from using multithreading.

To run Python code in parallel, multiple processes have to be used. The multiprocessing module provides support for spawning processes exposed through an API similar to the threading module. Since processes may not directly access other processes’ memory, the multiprocessing module provides several means of communication between processes: Queues, Pipes and shared memory support.


Much of Parrot’s previous threading related code has been removed to clean up the code and improve performance. Since the existing threading support was known to be unreliable and seriously flawed, this was no trade off. The final parts were removed by the merging of the kill_threads branch on September, 21st 2011.

In 2010, Nat Tuck began working on a green_threads branch during his Google Summer of Code internship. The feature got prototyped using pure PIR and then implemented in Parrot’s core. He got it to work in simple cases and started to work on OS thread support but the internship ended before the code was ready to be merged into the master branch. The code lay dormant until the work on hybrid threads in the threads branch started in 2011.

In Parrot, green threads are called Tasks. Each task is assigned a fixed amount of execution time. After this time is up a timer callback sets a flag which is checked at execution of every branch operation. Since the interpreter’s state is well defined at this point, its internal consistency is guaranteed. The same holds for the GC. Since task preemption is only done while executing user-level code, the GC can do its work undisturbed and without the need for measures like locking. Since user-level code is allowed to disable the scheduler, it can be guaranteed to run undisturbed through critical sections.

The scheduler is implemented as a PMC type. This allows the user to subclass this PMC thus allowing fine-grained control over the scheduling policy. Features, a user could add this way would be for example giving different priorities to tasks or implementing the possibility to suspend and resume a task.

Shared data

Cross-thread writes to shared variables may endanger the internal consistency of the interpreter. Traditionally, the solution to this problem is the use of locks of varying granularity. Fine-grained locking allows code to run in parallel but taking and releasing locks costs performance. It not only increases the instruction count and memory accesses but it also forces the CPU cores to coordinate and thus communicate. Even a seemingly simple operation like an atomic increment can take two orders of magnitude longer than a normal increment. While the gain through being able to utilize multiple CPU cores may offset this cost, it is still impacting the common case of having only a single thread running.

Too coarse locking on the other hand would reduce scalability and the performance gains through parallel execution by having threads wait for extended periods for locks to become available. In the extreme case of having a global interpreter lock it would effectively serialize all computations costing much of the benefits of using threads in the first place.

The other problem with locking is the possibility of introducing deadlocks. For example, two functions F1 and F2 both use two resources A and B protected by locks. If F1 first locks A and then tries to lock B while F2 has already locked B and is now trying to lock A, the program would come to a halt. Both functions would be left waiting for the other to unlock the resource which will never happen. With fine-grained locking, the possibilities for such bugs grow quickly. At the same time, it is easy to miss a case where a lock would be appropriate leading to difficult to diagnose corruption bugs.

The solution for these problems implemented in hybrid threads is to sidestep them altogether by disallowing write access to shared variables. The programmer (or in most cases the compiler) is obliged to declare a list of all shared variables before a newly created task is started. The interpreter would then create proxy objects for these variables which the task can use to access the data. These proxies contain references to the original objects. They use these references to forward all reading vtable functions to the originals. Write access on the other hand would lead to a runtime error.

In other words, all data is owned by the thread creating it and only the owner may write to it. Other threads have only read access.

For threads to be able to communicate with their creators and other threads, they still need to write to shared variables. This is where green threads come into play. Since green threads are light weight, it is feasible for a thread to create a task just for updating a variable. This task is scheduled on the interpreter owning this variable. To reduce latency, the task is flagged to run immediately. The data-owning interpreter will preempt the currently running task and process the new write task. Put another way, the data-owning interpreter is told what to write to its variables, so other threads don’t have to.


Proxies are the arbiters between threads. They are the only means for a thread to access another thread’s data and are implemented by the Proxy PMC type.

Proxy has default implementations for all functions, writing functions raise a cant_do_write_method runtime exception.  If a method returns a PMC from the target’s interp, another proxy object has to be created and wrapped around it so it can be safely returned to the caller.


The Sub PMC represents executable subroutines. A Sub does not only contain the code to execute but also the context in which to execute the code such as visible globals and namespaces.  If a proxy to such a Sub were created and invoke called on it, the code would access this context directly since it belongs to the same interp as the proxied Sub itself.  Thus, an operation like get_global fetches a global from an unproxied namespace and an unproxied global is be put into the target register.  Since this is happening while running invoke on the original Sub, Proxy cannot intercept the call and create a Proxy for the result.

This is the reason why Parrot_thread_create_proxy does not create a Proxy for a Sub but uses Parrot_thread_create_local_sub to create a copy on the thread’s interp with proxies for all PMC attributes.

Writing to shared variables

To write to shared variables, a thread creates a task and schedules it on the data owning interpreter. An example task looks like this:

    .sub write_to_variable
         .param pmc variable
         variable = 1

This is a subroutine with just one parameter. The variable passed as this parameter is the one the task should write to. In this case the constant value 1 would be written to the variable. In PIR, an assignment to a PMC gets translated to a method call. In this case, the set_integer_native is called changing the variable’s value. Since PMCs are passed by reference, it is the original variable which gets written to.

Code to create the task looks like:

    1    write_task = new ['Task']
    2    setattribute write_task, 'code', write_to_variable
    3    setattribute write_task, 'data', shared_variable
    4    interp.'schedule_proxied'(write_task, shared_variable)

Line 1 creates a new task object. The example subroutine is used for the task’s code attribute. shared_variable is used for data. At this point, shared_variable is actually the proxy object created for the shared integer PMC. The interpreter object contains a schedule_proxied method which is used to schedule the write_task on the thread owning the original variable.

schedule_proxied uses Parrot_thread_create_local_task which in this case detects that the data given as parameter for the task’s code is actually a proxy already and unwraps the proxied object. Parrot_cx_schedule_immediate is then used to make the data owning interpreter execute the task as soon as possible.

To protect a critical section, preemption can be disabled so the critical section runs uninterrupted:

    1 .sub swap_variables
    2     .param pmc a, b
    3     .local temp
    4     disable_preemption
    5     temp = a
    6     a = b
    7     b = temp
    8     enable_preemption
    9 .end


Using tasks to write to shared variables makes such actions inherently asynchronous. This is not always what is needed by the implemented algorithm. For example, when the shared variable is a lock, processing should continue as soon as it’s acquired. The wait operation is used to wait for a task’s completion. The waiting task is added to the waited for task’s waiters list and preempted immediately. When a task finishes, all the tasks in the waiters list are scheduled again for execution. Since for each task a local copy is created on the target thread, the running task not only checks its own waiters list but also its partner’s.

If a task on the main thread was waiting for a task on another thread to finish and no other tasks are in the scheduler’s queue on the main thread, the main thread exits if no alarms are pending. To prevent this unintended exit, all tasks are added to the scheduler’s foreign_tasks list when they are scheduled on other threads. To end the program with other threads still running, an explicit exit operation has to be used.


Preliminary benchmarks have shown Parrot’s performance to be within an order of magnitude of that of an optimized implementation in Perl 5.

Since Parrot does not yet offer the user any synchronization primitives, locks had to be implemented using a shared variable which is written to only by the main thread. Replacing this primitive method with a native semaphore implementation would probably reduce runtime to a small fraction.

Runtime comparison for matrix multiplication

                singlethreaded  computation      multithreaded   computation
    1. run          28.522 s       19.530 s        17.543 s          8.478 s
    2. run          28.427 s       19.463 s        17.320 s          8.283 s
    3. run          28.200 s       19.235 s        17.489 s          8.473 s
    average         28.383 s       19.409 s        17.451 s          8.411 s

This test implements matrix multiplication using four threads. For simplicity the second matrix only has one column. The program is written in the Winxed programming language. Winxed is a low-level language with Javascript like syntax and the possibility to include sections of PIR code verbatim making it possible to try experimental opcodes while writing more readable and concise code than with PIR alone. The complete source code is available in examples/threads/matrix_part.winxed

The program consists of the parts initialization, computation and verification. Computation is parallelized using four tasks each calculating one fourth of the result vector. Runtime is compared to a simple singlethreaded implementation. Run times were measured using the time command and are recorded in the above table.

As can be seen, the multithreaded implementation gives an average speedup of 2.31 for the computation and 1.61 in total.

Runtime comparison for Mandelbrot set calculation

                 singlethreaded  1 thread    2 threads   4 threads    8 threads
    1. run           89.931 s    89.978 s    45.813 s     24.028 s     17.445 s
    2. run           89.707 s    89.871 s    45.906 s     24.048 s     17.695 s
    3. run           90.318 s    89.839 s    45.951 s     24.049 s     17.573 s
    average          89.985 s    89.896 s    45.890 s     24.042 s     17.571 s
    speedup           1.000        1.001       1.959       3.739        5.116

The complete source code is available in examples/pir/mandel.pir

Calculating an image of the Mandelbrot set is a common benchmark for multithreading implementations since calculations of points are independent of each other and are thus easily parallelizable. A simple implementation of the escape time algorithm written in Winxed has been used to determine scalability properties of the threading implementation. The image is split into lines which are calculated alternatedly by a configured number of tasks. Run times were measured using the time command on an Intel Core i7 3770K processor with 16 GiB RAM running openSUSE 12.1 and are recorded in the Mandelbrot table. As can be seen, the implementation scales nearly linearly up to four threads reflecting the CPU’s four physical cores. Using eight threads, the speedup is only 1.368 compared to four threads but this seems to be more a limitation of the hardware than the implementation.


On IRC and on the mailing list some detailed questions have been asked.

See here:

Day 7 – MIME::Base64 – On encoded strings

parrot MIME::Base64 FixedIntegerArray: index out of bounds!

Ronaldxs created the following parrot ticket #813 4 months ago:

“Was playing with p6 MIME::Base64 and utf8 sampler page when I came across this. It seems that the parrot MIME Base64 library can’t handle some UTF-8 characters as demonstrated below.”

.sub go :main
    load_bytecode 'MIME/Base64.pbc'

    .local pmc enc_sub
    enc_sub = get_global [ "MIME"; "Base64" ], 'encode_base64'

    .local string result_encode
    result_encode = enc_sub(utf8:"\x{203e}")

    say result_encode

FixedIntegerArray: index out of bounds!
current instr.: 'parrot;MIME;Base64;encode_base64'
pc 163 (runtime/parrot/library/MIME/Base64.pir:147)

called from Sub 'go' pc 11 (die_utf8_base64.pir:8)

This was interesting, because parrot strings store the encoding information in the string. The user does not need to store the string encoding information somewhere else as in perl5, nor have to do educated guesses about the encoding. parrot supports ascii, latin1, binary, utf-8, ucs-2, utf-16 and ucs-4 string encodings natively.
So we thought we the hell cannot parrot handle simple utf-8 encoded strings?

As it turned out, the parrot implementation of MIME::Base64, which can be shared to all languages which use parrot as VM, stored the character codepoints for each character as array of integers. On multibyte encodings such as UTF-8 this leads to different data held in memory than a normal multibyte string which is encoded as the byte buffer and the additional encoding information.

Internal string representations

For example an overview of different internal string representations for the utf-8 string “\x{203e}”:

perl5 strings:

len=3, utf-8 flag, "\342\200\276" buf=[e2 80 be]

parrot strings:

len=1, bufused=3, encoding=utf-8, buf=[e2 80 be]

The Unicode tables:

U+203E	‾	e2 80 be	OVERLINE

gdb perl5

Let’s check it out:

$ gdb --args perl -e'print "\x{203e}"'
(gdb) start
(gdb) b Perl_pp_print
(gdb) c
(gdb) n 

.. until if (!do_print(*MARK, fp))

(gdb) p **MARK
$1 = {sv_any = 0x404280, sv_refcnt = 1, sv_flags = 671106052, sv_u = {
      svu_pv = 0x426dd0 "‾", svu_iv = 4353488, svu_uv = 4353488, 
      svu_rv = 0x426dd0, svu_array = 0x426dd0, svu_hash = 0x426dd0, 
      svu_gp = 0x426dd0, svu_fp = 0x426dd0}, ...}

(gdb) p Perl_sv_dump(*MARK)
ALLOCATED at -e:1 for stringify (parent 0x0); serial 301
SV = PV(0x404280) at 0x4239a8
  REFCNT = 1
  PV = 0x426dd0 "\342\200\276" [UTF8 "\x{203e}"]
  CUR = 3
  LEN = 16
$2 = void

(gdb) x/3x 0x426dd0
0x426dd0:	0xe2	0x80	0xbe

We see that perl5 does store the utf-8 flag, but not the length of the string, the utf8 length (=1), only the length of the buffer (=3).
Any other multi-byte encoded string, such as UCS-2 is stored differently. We suppose as utf-8.

We are already in the debugger, so let’s try the different cmdline argument.

(gdb) run -e'use Encode; print encode("UCS-2", "\x{203e}")'
  The program being debugged has been started already.
  Start it from the beginning? (y or n) y
Breakpoint 2, Perl_pp_print () at pp_hot.c:712
712	    dVAR; dSP; dMARK; dORIGMARK;

(gdb) p **MARK

$3 = {sv_any = 0x404b30, sv_refcnt = 1, sv_flags = 541700, sv_u = {
    svu_pv = 0x563a50 " >", svu_iv = 5651024, svu_uv = 5651024, 
    svu_rv = 0x563a50, svu_array = 0x563a50, svu_hash = 0x563a50, svu_gp = 0x563a50, 
    svu_fp = 0x563a50}, ...}

(gdb) p Perl_sv_dump(*MARK)
ALLOCATED at -e:1 by return (parent 0x0); serial 9579
SV = PV(0x404b30) at 0x556fb8
  REFCNT = 1
  PV = 0x563a50 " >"
  CUR = 2
  LEN = 16
$4 = void

(gdb) x/2x 0x563a50
0x563a50:	0x20	0x3e

But we don’t see the UTF8 flag in encode(“UCS-2”, “\x{203e}”), just the simple ascii string ” >”, which is the UCS-2 representation of [20 3e].
Because ” >” is perfectly representable as non-utf8 ASCII string.
UCS-2 is much much nicer than UTF-8, is has a fixed size, it is readable, Windows uses it, but it cannot represent all Unicode characters.

Encode::Unicode contains this nice cheatsheet:

Quick Reference
                Decodes from ord(N)           Encodes chr(N) to...
       octet/char BOM S.P d800-dfff  ord > 0xffff     \x{1abcd} ==
  UCS-2BE       2   N   N  is bogus                  Not Available
  UCS-2LE       2   N   N     bogus                  Not Available
  UTF-16      2/4   Y   Y  is   S.P           S.P            BE/LE
  UTF-16BE    2/4   N   Y       S.P           S.P    0xd82a,0xdfcd
  UTF-16LE    2/4   N   Y       S.P           S.P    0x2ad8,0xcddf
  UTF-32        4   Y   -  is bogus         As is            BE/LE
  UTF-32BE      4   N   -     bogus         As is       0x0001abcd
  UTF-32LE      4   N   -     bogus         As is       0xcdab0100
  UTF-8       1-4   -   -     bogus   >= 4 octets   \xf0\x9a\af\8d

gdb parrot

Back to parrot:

If you debug parrot with gdb you get a gdb pretty-printer thanks to Nolan Lum, which displays the string and encoding information automatically.
In perl5 you have to call Perl_sv_dump with or without the my_perl as first argument, if threaded or not. With a threaded perl, e.g. on Windows you’d need to call p Perl_sv_dump(my_perl, *MARK).
In parrot you just ask for the value and the formatting is done with a gdb pretty-printer plugin.
The string length is called strlen (of the encoded string), the buffer size is called bufused.

Even in a backtrace the string arguments are displayed abbrevated like this:

#3  0x00007ffff7c29fc4 in utf8_iter_get_and_advance (interp=0x412050, str="utf8:� [1/2]", 
    i=0x7fffffffdd00) at src/string/encoding/utf8.c:551
#4  0x00007ffff7a440f6 in Parrot_str_escape_truncate (interp=0x412050, src="utf8:� [1/2]",
    limit=20) at src/string/api.c:2492
#5  0x00007ffff7b02fb3 in trace_op_dump (interp=0x412050, code_start=0x63a1c0, pc=0x63b688)
    at src/runcore/trace.c:450

[1/2] means strlen=1 bufused=2
Each non-ascii or non latin-1 encoded string is printed with the encoding prefix.
Internally the encoding is of course a index or pointer in the table of supported encodings.

You can set a breakpoint to utf8_iter_get_and_advance and watch the strings.

(gdb) r t/library/mime_base64u.t
Breakpoint 1, utf8_iter_get_and_advance (interp=0x412050, str="utf8:\\x{00c7} [8/8]", 
                i=0x7fffffffcd40) at src/string/encoding/utf8.c:544
(gdb) p str
$1 = "utf8:\\x{00c7} [8/8]"
(gdb) p str->bufused 
$3 = 8
(gdb) p str->strlen
$4 = 8
(gdb) p str->strstart
$5 = 0x5102d7 "\\x{00c7}"

This is escaped. Let’s advance to a more interesting utf8 string in this test, i.e. until str=”utf8:Ā [1/2]”
You get the members of a struct with tab-completion, i.e. press <TAB> after p str->

(gdb) p str->
_buflen    _bufstart  bufused    encoding   flags      hashval    strlen     strstart
(gdb) p str->strlen
$9 = 8

(gdb) dis 1
(gdb) b utf8_iter_get_and_advance if str->strlen == 1
(gdb) c
Breakpoint 2, utf8_iter_get_and_advance (interp=0x412050, str="utf8:Ā [1/2]", 
                i=0x7fffffffcd10) at src/string/encoding/utf8.c:544
544	    ASSERT_ARGS(utf8_iter_get_and_advance)

(gdb) p str->strlen
$10 = 1
(gdb) p str->strstart
$11 = 0x7ffff7faeb58 "Ā"
(gdb) x/2x str->strstart
0x7ffff7faeb58:	0xc4	0x80
(gdb) p str->encoding
$12 = (const struct _str_vtable *) 0x7ffff7d882e0
(gdb) p *str->encoding

$13 = {num = 3, name = 0x7ffff7ce333f "utf8", name_str = "utf8", bytes_per_unit = 1,
  max_bytes_per_codepoint = 4, to_encoding = 0x7ffff7c292b0 <utf8_to_encoding>, chr =
  0x7ffff7c275c0 <unicode_chr>, equal = 0x7ffff7c252e0 <encoding_equal>, compare =
  0x7ffff7c254e0 <encoding_compare>, index = 0x7ffff7c25690 <encoding_index>, rindex
  = 0x7ffff7c257a0 <encoding_rindex>, hash = 0x7ffff7c25a20 <encoding_hash>, scan =
  0x7ffff7c29380 <utf8_scan>, partial_scan = 0x7ffff7c29460 <utf8_partial_scan>, ord
  = 0x7ffff7c297e0 <utf8_ord>, substr = 0x7ffff7c25de0 <encoding_substr>, is_cclass =
  0x7ffff7c26000 <encoding_is_cclass>, find_cclass =
  0x7ffff7c260e0 <encoding_find_cclass>, find_not_cclass =
  0x7ffff7c26220 <encoding_find_not_cclass>, get_graphemes =
  0x7ffff7c263d0 <encoding_get_graphemes>, compose =
  0x7ffff7c27680 <unicode_compose>, decompose = 0x7ffff7c26450 <encoding_decompose>,
  upcase = 0x7ffff7c27b20 <unicode_upcase>, downcase =
  0x7ffff7c27be0 <unicode_downcase>, titlecase = 0x7ffff7c27ca0 <unicode_titlecase>,
  upcase_first = 0x7ffff7c27d60 <unicode_upcase_first>, downcase_first =
  0x7ffff7c27dc0 <unicode_downcase_first>, titlecase_first =
  0x7ffff7c27e20 <unicode_titlecase_first>, iter_get =
  0x7ffff7c29c40 <utf8_iter_get>, iter_skip = 0x7ffff7c29d60 <utf8_iter_skip>,
  iter_get_and_advance = 0x7ffff7c29eb0 <utf8_iter_get_and_advance>,
  iter_set_and_advance = 0x7ffff7c29fd0 <utf8_iter_set_and_advance>}


$ perl -MMIME::Base64 -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64($s)'
Wide character in subroutine entry at -e line 1.

Oops, I’m clearly a unicode perl5 newbie. Does my term not understand utf-8?

$ echo $TERM

No, it should. encode_base64 does not understand unicode.
perldoc MIME::Base64
“The base64 encoding is only defined for single-byte characters. Use the Encode module to select the byte encoding you want.”

Oh my! But it is just perl5. It just works on byte buffers, not on strings.
perl5 strings can be utf8 and non-utf8. Why on earth an utf8 encoded string is disallowed and only byte buffers of unknown encodings are allowed goes beyond my understanding, but what can you do. Nothing. base64 is a binary only protocol, based on byte buffers. So we decode it manually to byte buffers. The Encode API for decoding is called encode.

$ perl -MMIME::Base64 -MEncode -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64(encode('utf8',$s))'
Wide character in printf at -e line 1.
0x20e3	=> 4oOj

This is now the term warning I know. We need -C

$ perldoc perluniintro

$ perl -C -MMIME::Base64 -MEncode -lE'$x="20e3";$s="\x{20e3}";
  printf "0x%s\t%s=> %s",$x,$s,encode_base64(encode('utf8',$s))'
0x20e3	=> 4oOj

Over to rakudo/perl6 and parrot:

$ cat >m.pir << EOP
.sub main :main
    load_bytecode 'MIME/Base64.pbc'
    $P1 = get_global [ "MIME"; "Base64" ], 'encode_base64'
    $S1 = utf8:"\x{203e}"
    $S2 = $P1(s1)
    say $S1
    say $S2

$ parrot m.pir
FixedIntegerArray: index out of bounds!
current instr.: 'parrot;MIME;Base64;encode_base64'
                pc 163 (runtime/parrot/library/MIME/Base64.pir:147)

The perl6 test, using the parrot library, from

$ git clone git://
Cloning into 'perl6-Enc-MIME-Base64'...

$ PERL6LIB=perl6-Enc-MIME-Base64/lib perl6 <<EOP
use Enc::MIME::Base64;
say encode_base64_str("\x203e");

> use Enc::MIME::Base64;
> say encode_base64_str("\x203e");
FixedIntegerArray: index out of bounds!

The pure perl6 workaround:

$ PERL6LIB=perl6-Enc-MIME-Base64/lib perl6 <<EOP
use PP::Enc::MIME::Base64;
say encode_base64_str("\x203e");

> use PP::Enc::MIME::Base64;
> say encode_base64_str("\x203e");

Wait. perl6 creates a different enoding than perl5?
What about coreutils base64 command.

$ echo -n "‾" > m.raw
$ od -x m.raw
0000000 80e2 00be
$ ls -al m.raw
-rw-r--r-- 1 rurban rurban 3 Dec  6 10:23 m.raw
$ base64 m.raw

[80e2 00be] is the little-endian version of [e2 80 be], 3 bytes, flipped.
Ok, at least base64 agrees with perl6, and I must have made some encoding mistake with perl5.

Back to debugging our parrot problem:

parrot unlike perl6 has no debugger yet. So we have to use gdb, and we need to know in which function the error occured. We use the parrot -t trace flag, which is like the perl5 debugging -Dt flag, but it is always enabled, even in optimized builds.

$ parrot --help
    -t --trace [flags] 
$ parrot --help-debug
--trace -t [Flags] ...
    0001    opcodes
    0002    find_method
    0004    function calls

$ parrot -t7 m.pir
009f band I9, I2, 63         I9=0 I2=0 
00a3 set I10, P0[I5]         I10=0 P0=FixedIntegerArray=PMC(0xff7638) I5=[2063]
016c get_results PC2 (1), P2 PC2=FixedIntegerArray=PMC(0xedd178) P2=PMCNULL
016f finalize P2             P2=Exception=PMC(0x16ed498)
0171 pop_eh
lots of error handling
0248 callmethodcc P0, "print" P0=FileHandle=PMC(0xedcca0) 
FixedIntegerArray: index out of bounds!

We finally see the problem, which matches the run-time error.

00a3 set I10, P0[I5]         I10=0 P0=FixedIntegerArray=PMC(0xff7638) I5=[2063]

We want to set I10 to the I5=2063’th element in the FixedIntegerArray P0, and the array is not big enough.

After several hours of analyzing I came to the conclusion that the parrot library MIME::Base64 was wrong by using ord of every character in the string. It should use a bytebuffer instead.
Which was fixed with commit 3a48e6. ord can return integers > 255, but base64 can only handle chars < 255.

The fixed parrot library was now correct:

$ parrot m.pir

But then the tests started failing. I spent several weeks trying to understand why the parrot testsuite was wrong with the mime_base64 tests, the testdata came from perl5. I came up with different implementation hacks which would match the testsuite, but finally had to bite the bullet, changing the tests to match the implementation.

And I had to special case the tests for big-endian, as base64 is endian agnostic. You cannot decode a base64 encoded powerpc file on an intel machine, when you use multi-byte characters. And utf-8 is even more multi-byte than ucs-2. I had to accept the fact the big-endian will return a different encoding. Before the results were the same. The tests were written to return the same encoding on little and big-endian.


The first reason why I wrote this blog post was to show how to debug into crazy problems like this, when you are not sure if the core implementation, the library, the spec or the tests are wrong. It turned out, that the library and the tests were wrong.
You saw how easily you could use gdb to debug into such problems, as soon as you find out a proper breakpoint.

The internal string representations looked like this:

MIME::Base64 internally:

len=1, encoding=utf-8, buf=[3e20]

and inside the parrot imcc compiler the SREG

len=8, buf="utf-8:\"\x{203e}\""

parrot is a register based runtime, and a SREG is the string representation of the register value. Unfortunately a SREG cannot hold the encoding info yet, so we prefix the encoding in the string, and unquote it back. This is not the reason why parrot is still slower than the perl5 VM. I benchmarked it. parrot still uses too much sprintf’s internally and the encoding quote/unquoting counts only for a 4th of the time of the sprintf gyrations.
And parrot function calls are awfully slow and de-optimized.

The second reason is to explain the new decode_base64() API, which only parrot – and therefore all parrot based languages like rakudo – now have got.

decode_base64(str, ?:encoding)

“Decode a base64 string by calling the decode_base64() function.
This function takes as first argument the string to decode, as optional second argument the encoding string for the decoded data.
It returns the decoded data.

Any character not part of the 65-character base64 subset is silently ignored.
Characters occurring after a ‘=’ padding character are never decoded.”

So decode_base64 got now a second optional encoding argument. The src string for encode_base64 can be any encoding and is automatically decoded to a bytebuffer. You can easily encode an image or unicode string without any trouble, and for the decoder you can define the wanted encoding beforehand. The result can be the encoding binary or utf-8 or any encoding you prefer, no need for additional decoding of the result. The default encoding of the decoded string is either ascii, latin-1 or utf-8. parrot will upgrade the encoding automatically.

You can compare the new examples of pir against the perl5 version:


.sub main :main
    load_bytecode 'MIME/Base64.pbc'

    .local pmc enc_sub
    enc_sub = get_global [ "MIME"; "Base64" ], 'encode_base64'

    .local string result_encode
    # GH 814
    result_encode = enc_sub(utf8:"\x{a2}")
    say   "encode:   utf8:\"\\x{a2}\""
    say   "expected: wqI="
    print "result:   "
    say result_encode

    # GH 813
    result_encode = enc_sub(utf8:"\x{203e}")
    say   "encode:   utf8:\"\\x{203e}\""
    say   "expected: 4oC+"
    print "result:   "
    say result_encode



use MIME::Base64 qw(encode_base64 decode_base64);
use Encode qw(encode);

my $encoded = encode_base64(encode("UTF-8", "\x{a2}"));
print  "encode:   utf-8:\"\\x{a2}\"  - ", encode("UTF-8", "\x{a2}"), "\n";
print  "expected: wqI=\n";
print  "result:   $encoded\n";
print  "decode:   ",decode_base64("wqI="),"\n\n"; # 302 242

my $encoded = encode_base64(encode("UTF-8", "\x{203e}"));
print  "encode:   utf-8:\"\\x{203e}\"  -> ",encode("UTF-8", "\x{203e}"),"\n";
print  "expected: 4oC+\n";
print  "result:   $encoded\n"; # 342 200 276
print  "decode:   ",decode_base64("4oC+"),"\n";

for ([qq(a2)],[qq(c2a2)],[qw(203e)],[qw(3e 20)],[qw(1000)],[qw(00c7)],[qw(00ff 0000)]){
    $s = pack "H*",@{$_};
    printf "0x%s\t=> %s", join("",@{$_}), encode_base64($s);


use Enc::MIME::Base64;
say encode_base64_str("\xa2");
say encode_base64_str("\x203e");