Day 6 – On Opening Files and Contributing to Open Source Projects

Why do people contribute to open source projects? Some do it for fun, some for fame and some for fortune by actually getting paid to do such work. However, probably the most important factor is scratching your own itch: You want something done, and make it happen.

That was the position I found myself in earlier this year: I wanted to use Perl 6 to read a binary file and patch it in-place. According to the documentation, I should have been able to do so, as Rakudo claimed to support opening files in the modes read-only :r, write-only :w, append :a and read-write :rw, and I assumed the last one would do the trick.

Not so: The code silently adjusted the flag :rw to :w, and I was left hanging.

After a bit of bikeshedding, a design emerged that I was happy with, and as it was merged upstream a month and a half later, others seemed to agree. But while I added an extensive commit message to lay out the new system in its glory, slacker that I am, it was left for others to perform the tedious tasks of writing tests and documentation.

As you might expect, that hope never materialized, and the extended open modes I introduced became something of an Easter egg: Undocumented, untested and only discoverable by reading the commit log or the code itself. I may very well be the only person who has actually made use of them.

Fast-forward to December: Here we are, Christmas draws near, and with it the first proper release of Perl 6. As a sometimes-good on-and-off-again citizen of the Perl 6 community, I decided to put on my big boy pants and do the Right Thing: Some tests have now been written (which makes the new open modes an official part of the 6.c language release), and the documentation will be updated soon-ish to conform to what is actually implemented instead of what had been planned to be implemented eventually.

Having said all that, let us now take a look at what the fuss is all about:

# not the actual signature as &open delegates to
sub open(
    IO() $path,
    :$mode, :$create, :$append, :$truncate, :$exclusive,
    :$r, :$w, :$x, :$a, :$rw, :$rx, :$ra, :$update,
    :$bin, :$enc = 'utf8',
    :$nl-in = ["\x0A", "\r\n"], :$nl-out = "\n", :$chomp = True
--> IO::Handle)

Isn’t it a thing of beauty? Perhaps not, but it’s also not as scary as it looks:

The only required argument is the $path, which must be something we can call .IO on. The common example would be a string that holds the name of a file you want to access.

One design goal behind the additional open modes was to support all the features of fopen(3) as specified by the C11 standard library. To keep things sane, inspiration was drawn from open(2) and the flags specified by POSIX.

The first row of named arguments lists the POSIX-inspired ones. Here, :$mode may take the values 'ro', 'wo' and 'rw', the rest are boolean flags. If I did my job well, you will rarely need to use them – instead, the single- and double-letter variants listed on the next line should suffice.
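To make the long-form flags a little more concrete, here is a minimal sketch. It assumes a scratch file in $*TMPDIR (the file name is made up for illustration) and that the flag combinations shown behave like the shorthands :x and :a, which is my reading of the design rather than something spelled out above.

```perl6
# hypothetical scratch file; any writable path would do
my $path = $*TMPDIR.child('advent-posix-flags.txt');
$path.unlink if $path.e;

# write-only, create the file, fail if it already exists
# (assumed to behave like the shorthand :x)
my $fh = open($path, :mode<wo>, :create, :exclusive);
$fh.print("hello\n");
$fh.close;

# write-only, create if necessary, add to the end
# (assumed to behave like the shorthand :a)
$fh = open($path, :mode<wo>, :create, :append);
$fh.print("world\n");
$fh.close;

my $content = $path.slurp;
print $content;
$path.unlink;
```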

Read-only mode :r is the default. The three write-only modes are :w, which truncates the file if it already exists (and is the only shorthand mode that does so implicitly), exclusive mode :x, which fails if the file already exists (and thus can be used to implement a poor man’s locking mechanism), and append mode :a, which adds content at the end of a file.
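The three write-only shorthands can be tried out side by side. The following is a sketch only: the scratch file name is hypothetical, and the failed exclusive open is wrapped in try so that it yields an undefined value instead of an exception.

```perl6
# hypothetical scratch file
my $path = $*TMPDIR.child('advent-write-modes.txt');
$path.unlink if $path.e;

# :x creates the file, but only if it does not exist yet
my $fh = open($path, :x);
$fh.print("first\n");
$fh.close;

# a second exclusive open fails, because the file is there now
my $second = try open($path, :x);
my $second-ok = $second.defined;
say $second-ok;                       # False

# :a keeps the existing content and adds to the end
$fh = open($path, :a);
$fh.print("second\n");
$fh.close;
my $after-append = $path.slurp;

# :w throws the existing content away
$fh = open($path, :w);
$fh.print("third\n");
$fh.close;
my $after-write = $path.slurp;

print $after-append;                  # first, then second
print $after-write;                   # only third is left
$path.unlink;
```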

If you want to read and write to a file at the same time (as I did), you may combine the flag :r with any of :w, :x or :a. For convenience, this may also be spelled :rw, :rx, :ra. Note that the effect of providing :r,:w (or its alias :rw) is not a combination of the effects of :r and :w: although :w on its own truncates, existing files will not be truncated in read-write mode. This is a deliberate departure from the pattern.
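The use case that started all this, patching a binary file in place, might look roughly like the sketch below. The file name, its four bytes of content and the patched offset are all invented for the example.

```perl6
# hypothetical scratch file holding four bytes
my $path = $*TMPDIR.child('advent-patch.bin');
$path.spurt(Blob.new(0x00, 0x01, 0x02, 0x03));

# read-write, binary: existing content is kept
my $fh = open($path, :rw, :bin);
$fh.seek(2, SeekFromBeginning);       # move to the third byte
$fh.write(Blob.new(0xFF));            # overwrite just that byte
$fh.close;

my $patched = $path.slurp(:bin);
say $patched.elems;                   # 4
say $patched[2] == 0xFF;              # True
$path.unlink;
```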

Aside from :r, all shorthand modes will implicitly create the file if it doesn’t already exist. The fopen(3) mode "r+" (known to Perl 5 programmers as "+<") does not. In Perl 6, this mode is now known as :update, corresponding to the low-level :mode<rw>. In contrast, :rw maps to :mode<rw>,:create, which does not have a direct equivalent in either C11 or Perl 5.
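The difference between :update and :rw only shows when the file is missing. A small sketch, again with a made-up file name and with try turning the failed open into an undefined value:

```perl6
# hypothetical path that must not exist at the start
my $path = $*TMPDIR.child('advent-update-vs-rw.txt');
$path.unlink if $path.e;

# :update refuses to create the file
my $fh = try open($path, :update);
my $update-opened  = $fh.defined;
my $update-created = $path.e;
say $update-opened;                   # False
say $update-created;                  # False

# :rw creates it on demand
$fh = open($path, :rw);
$fh.close;
my $rw-created = $path.e;
say $rw-created;                      # True
$path.unlink;
```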

The :bin flag and :$enc argument control whether the file should be opened in binary mode or text mode with given encoding. By default, files are assumed to be text files encoded as UTF-8. This is different from what the documentation tells you (there has never been autodetection of Unicode encodings as far as I’m aware), and binary files also do not return buffers instead of strings when processing the file line-by-line. I’ve raised that issue with The Man (translation: talked on IRC about it) and a pull request has been sent. We’ll have to wait and see what develops on that front in the remaining weeks leading up to the party.
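As an illustration of the two ways to look at the same file, the sketch below writes a short Latin-1 encoded line and reads it back once as text and once as raw bytes. The file name and the sample string are assumptions made for the example.

```perl6
# hypothetical scratch file with one Latin-1 encoded line
my $path = $*TMPDIR.child('advent-encoding.txt');
$path.spurt("naïve\n", :enc<latin1>);

# text mode with an explicit encoding
my $fh = open($path, :enc<latin1>);
my $line = $fh.get;
$fh.close;
say $line;                            # naïve

# binary mode: no decoding, we get the bytes
$fh = open($path, :bin);
my $bytes = $fh.read(10);             # reads at most 10 bytes
$fh.close;
say $bytes.elems;                     # 6, one byte per character in Latin-1
$path.unlink;
```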

Finally, the last line of named arguments lists those that control line-based access in general and the behaviour of the .get and .lines methods in particular. The :$nl-in argument allows you to provide a list of strings that should be considered line separators. The :$nl-out argument controls what gets written to disk when you request a newline, e.g. when using the methods .say, .put or .print-nl. If the final argument :$chomp is set to True (which is the default), line separators will be discarded instead of being included at the end of the strings returned by .get or .lines. As a side note, if you find chomp => False too much of a burden to type, Perl 6 supports the shorthand notation :!chomp.
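The line-handling arguments are easiest to understand by playing with them. In this sketch the semicolon separator, the file name and the sample data are invented for the purpose of the example.

```perl6
# hypothetical scratch file using ';' as its line separator
my $path = $*TMPDIR.child('advent-lines.txt');
$path.spurt("a;b;c;");

# custom input separator, default chomping
my $fh = open($path, :nl-in(';'));
my $chomped = $fh.lines.join(',');
$fh.close;
say $chomped;                         # a,b,c

# same separator, but keep it on each line
$fh = open($path, :nl-in(';'), :!chomp);
my $unchomped = $fh.lines.join(',');
$fh.close;
say $unchomped;                       # a;,b;,c;

# custom output newline: .say appends "\r\n" instead of "\n"
$fh = open($path, :w, :nl-out("\r\n"));
$fh.say("x");
$fh.close;
my $written = $path.slurp(:bin).elems;
say $written;                         # 3 bytes: x, CR, LF
$path.unlink;
```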

For those of you still with me: Congratulations, you now know how to open files in Perl 6, and all that remains for me to say to you is this:

Have a happy St. Nicholas Day and a fun time playing with Rakudo!

PS: As to what you can do with a file handle once it has been opened, I’ll leave that as an exercise to the reader. But note that while Perl wants to make hard things possible, it also tries to keep easy things easy – which means that you can use interfaces like slurp, spurt and lines (or their method forms .IO.slurp, .IO.spurt, .IO.lines) without having to manually open (and close!) anything.
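For completeness, a short sketch of those convenience interfaces in their method forms. The file name and its content are made up, and no handle is opened or closed by hand.

```perl6
# hypothetical scratch file
my $path = $*TMPDIR.child('advent-easy.txt');

$path.spurt("one\ntwo\n");            # write the whole file in one go
my $text  = $path.slurp;              # read it back as a single string
my @lines = $path.lines;              # or as a list of chomped lines

say $text.chars;                      # 8
say @lines.elems;                     # 2
say @lines[1];                        # two
$path.unlink;
```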