Day 1 – Porting Vigilance, integrating Perl 6 with standard tools

Greetings everyone! Today we’ll take an infrastructural script and port it from Perl 5 to Perl 6. This article is based on a pair of posts by James Clark, which you can find here:

This script is used to create and verify MD5 sums. These are 128-bit values that can be used to verify data integrity. While MD5 is known to be insecure against malicious actors, since collisions can be constructed deliberately, it is still useful for detecting accidental on-disk corruption.

The Perl 6 ecosystem is growing and contains a variety of tools that are either ported from the Perl 5 CPAN or are replacements for CPAN modules. I’ll walk through a few aspects of the original script and my port, and show why I made some specific changes. Hopefully this will encourage you to go out and port your own little scripts.

Shebang and imports

The Perl 5 version uses some basic necessities and a few utilities for working with Unicode and making the command line output nicer:

#!/usr/bin/perl -CSDA

use strict;
use warnings;
use utf8;
use Encode qw/encode_utf8 decode_utf8/;
use Getopt::Long;
use Digest::MD5;
use Term::ANSIColor;
use Term::ProgressBar;
use File::Find;
use File::Basename;
use Data::Dumper;

Perl 6 already has warnings and strictures enabled by default and has built-in Unicode support, so we can leave those off. The functionality of Data::Dumper is built in as well (the .perl method, and dd in Rakudo), and the language comes with very useful IO functionality. Adding all that together we can get away with a very lean head:

#!/usr/bin/env perl6
use v6;

use Digest::MD5;
use Terminal::ANSIColor;
use Terminal::Spinners;
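
As a small illustration of what the built-ins replace, here is a minimal sketch. The %plan hash and its contents are made up for the example and are not part of the script. The .perl method returns a string of Perl 6 source code that recreates the data structure, and dd (a Rakudo-specific routine) writes a similar dump to STDERR:

```perl6
use v6;

# Hypothetical data, only here to have something to dump.
my %plan = filename => 'notes.txt', md5 => 'd41d8cd98f00b204e9800998ecf8427e';

# .perl gives back Perl 6 source code for the structure,
# which is roughly what Data::Dumper's Dumper() does in Perl 5.
say %plan.perl;

# dd is a Rakudo debugging aid that dumps its arguments to STDERR.
dd %plan;
```

Note that hash keys are not printed in any particular order.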

Command line options

Perl 5 has a number of great modules for handling command line arguments. In our original script we used Getopt::Long:

# Define our command-line arguments.
my %opts = ( 'blocksize' => 16384 );
GetOptions(\%opts, "verify=s", "create=s", "update=s", "files", "blocksize=s", "help!");

In Perl 6 we can define command line options straight in the signatures of our MAIN subs. We use multiple dispatch to steer the execution of the script based on the arguments passed:

multi MAIN (Str :$create, *@files where { so @files }) { ... }

multi MAIN (Str :$update, *@files) { ... }

multi MAIN (Str :$verify, *@files) { ... }

multi MAIN (*@files where { so @files }) { ... }

This also means we don’t have to define a help option/sub because we can document our MAIN subs, thus:

#| Verify the MD5 sums in a file that conforms to md5sum output:
#|   
multi MAIN (Str :$verify, *@files) { ... }

You might have noticed that the Perl 6 version doesn’t define a blocksize option; I’ll come back to that.
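
To see how documented MAIN candidates behave, here is a minimal, self-contained sketch. The bodies are placeholders that only print what they would do, and the file name vigilance-demo.p6 used below is hypothetical. Unlike the signatures above, the sketch marks the named parameters as required with a trailing !, an assumption made here so that each candidate can only match when its option is actually given:

```perl6
#!/usr/bin/env perl6
use v6;

#| Create a checksum file for the given files
multi MAIN (Str :$create!, *@files where { so @files }) {
    # Placeholder body: the real script would calculate and save the sums.
    say "Would write checksums for { @files.join(', ') } to $create";
}

#| Verify the checksums listed in a checksum file
multi MAIN (Str :$verify!) {
    # Placeholder body: the real script would load the file and compare sums.
    say "Would verify the checksums in $verify";
}
```

Calling it as perl6 vigilance-demo.p6 --create=sums.md5 a.txt b.txt dispatches to the first candidate. Running it with --help, or with arguments that match no candidate, prints a usage message generated from the signatures, with each #| comment shown next to its candidate.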

IO: reading and writing files

We store the checksums in a file where each line is formatted like the output of the md5sum program from the GNU coreutils: 32 hexadecimal digits, two spaces, and the filename.

The Perl 5 version does some basic IO and uses regexes to parse each line. Because whitespace is significant in these regexes, the two spaces can be matched literally, which helps keep each regex fairly terse:

sub load_md5sum_file
{
	my ($filename) = @_;
	my @plan;
	
	open(my $fh, '<:utf8', $filename) or die "Couldn't open '$filename' : $!\n";
	my $linenum = 0;
	while (my $line = <$fh>) {
		chomp $line;
		$linenum++;
		if ($line =~ /^(?<md5>\p{ASCII_Hex_Digit}{32})  (?<filename>.*)$/) {
			# Checksum and filename compatible with md5sum output.
			push @plan, create_plan_for_filename($+{filename}, $+{md5});
			
		} elsif ($line =~ /^(?<md5>\p{ASCII_Hex_Digit}{32}) (?<filename>.*)$/) {
			# Checksum and filename compatible with md5sum's manpage but not valid for the actual program.
			# We'll use it, but complain.
			print STDERR colored("Warning: ", 'bold red'), colored("md5sum entry '", 'red'), $line, colored("' on line $linenum of file $filename is using only one space, not two - this doesn't match the output of the actual md5sum program!.", 'red'), "\n";
			push @plan, create_plan_for_filename($+{filename}, $+{md5});
			
		} elsif ($line =~ /^\s*$/) {
			# Blank line, ignore.
			
		} else {
			# No idea. Best not to keep quiet, it could be a malformed checksum line and we don't want to just quietly skip the file if so.
			print STDERR colored("Warning: ", 'bold red'), colored("Unrecognised md5sum entry '", 'red'), $line, colored("' on line $linenum of file $filename.", 'red'), "\n";
			push @plan, { error => "Unrecognised md5sum entry" };
		}
	}
	close($fh) or die "Couldn't close '$filename' : $!\n";
	
	return @plan;
}

Perl 6 allows us to check in the signature that the name we pass belongs to an existing file. Furthermore, we replace the regexes with a grammar that we can reuse in other places in the script if needed:

grammar MD5SUM {
	token TOP        { <md5> <spacer> <filehandle> }
	token md5        { <xdigit> ** 32 }
	token spacer     { \s+ }
	token filehandle { .* }
}

sub load-md5sum-file (Str $filehandle where { $filehandle.IO.f }) {
	my MD5Plan @plans;

	PARSE: for $filehandle.IO.lines(:close) -> $line {
		next PARSE if !$line; # We don't get worked up over blank lines.

		my $match = MD5SUM.parse($line);

		if (!$match) {
			say $*ERR: colored("Couldn't parse $line", $ERROR_COLOUR);
			next PARSE;
		}

		if (!$match<filehandle>.IO.f) {
			say $*ERR: colored("{ $match<filehandle> } isn't an existing file.", $ERROR_COLOUR);
			next PARSE;
		}

		if ($match<spacer>.chars == 2) {
			@plans.push(MD5Plan.new($match<filehandle>.Str, $match<md5>.Str));
		}
		else {
			say $*ERR: colored("'$line' does not match the output of md5sum: wrong number of spaces.", $WARNING_COLOUR);
			@plans.push(MD5Plan.new($match<filehandle>.Str, $match<md5>.Str));
		}
	}

	return @plans;
}
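
To make the grammar's captures concrete, here is a small sketch that runs it against a single line. The grammar is copied from above. The sample line is made up: the checksum is the MD5 sum of empty input, and empty.txt is a hypothetical file name that doesn't need to exist for parsing:

```perl6
use v6;

grammar MD5SUM {
    token TOP        { <md5> <spacer> <filehandle> }
    token md5        { <xdigit> ** 32 }
    token spacer     { \s+ }
    token filehandle { .* }
}

my $line  = 'd41d8cd98f00b204e9800998ecf8427e  empty.txt';
my $match = MD5SUM.parse($line);

# Each named token becomes a key in the match object.
say $match<md5>.Str;         # d41d8cd98f00b204e9800998ecf8427e
say $match<spacer>.chars;    # 2
say $match<filehandle>.Str;  # empty.txt

# A line that doesn't start with 32 hex digits fails to parse.
say so MD5SUM.parse('not a checksum line');  # False
```

Because spacer is its own token, the caller can accept any amount of whitespace and still check afterwards whether it was exactly two spaces, which is what load-md5sum-file does.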

Writing out data is pretty similar:

sub save_md5sum_file
{
	my ($filename, @plan) = @_;
	
	my $fh;
	unless (open($fh, '>:utf8', $filename)) {
		...
	}
	foreach my $plan_entry (@plan) {
		next unless $plan_entry->{correct_md5} && $plan_entry->{filename};
		print $fh "$plan_entry->{correct_md5}  $plan_entry->{filename}\n";
	}
	close($fh) or die "Couldn't close '$filename' : $!\n";
}

Worth noting is that Perl 6 writes files as UTF-8 by default, so we don’t need to specify an encoding layer:

sub save-md5sum-file (Str $filehandle, @plans) {
	my $io = $filehandle.IO.open: :w;

	WRITE: for @plans -> $plan {
		next WRITE unless $plan.computed-md5 && $plan.filehandle;

		$io.say("{ $plan.computed-md5 }  { $plan.filehandle }");
	}

	$io.close;
}
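
The same open, say, close pattern can be tried in isolation. This is a minimal sketch that writes two made-up checksum lines to a scratch file and reads them back. The file name vigilance-demo.md5 and both entries are hypothetical, and the second file name contains a non-ASCII character to show that no explicit encoding is needed:

```perl6
use v6;

# Hypothetical scratch file in the system's temporary directory.
my $path = $*TMPDIR.add('vigilance-demo.md5');

# Write two lines in md5sum format: checksum, two spaces, file name.
my $io = $path.open: :w;
$io.say('d41d8cd98f00b204e9800998ecf8427e  empty.txt');
$io.say('0cc175b9c0f1b6a831c399e269772661  añejo.txt');
$io.close;

# Read the file back line by line; UTF-8 is the default both ways.
my @lines = $path.lines;
.say for @lines;

# Clean up the scratch file.
$path.unlink;
```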

Getting the MD5 sums

The Perl 5 version of Digest::MD5 uses a fair bit of XS to be very performant. Its object-oriented interface includes an add method, which lets us feed the file to the digest one chunk at a time. This allows us to use Term::ProgressBar to show progress while the user is waiting:

sub run_md5_file
{
	my ($plan_entry, $progress_fn) = @_;
	
	# We use the OO interface to Digest::MD5 so we can feed it data a chunk at a time.
	my $md5 = Digest::MD5->new();
	my $current_bytes_read = 0;
	my $buffer;
	$plan_entry->{start_time} = time();
	$plan_entry->{elapsed_time} = 0;
	$plan_entry->{elapsed_bytes} = 0;
	
	# 3 argument form of open() allows us to specify 'raw' directly instead of using binmode and is a bit more modern.
	open(my $fh, '<:raw', $plan_entry->{filename}) or die "Couldn't open file $plan_entry->{filename}, $!\n";
	
	# Read the file in chunks and feed into md5.
	while ($current_bytes_read = read($fh, $buffer, $opts{blocksize})) {
		$md5->add($buffer);
		$plan_entry->{elapsed_bytes} += $current_bytes_read;
		$plan_entry->{elapsed_time} = time() - $plan_entry->{start_time};
		&$progress_fn($plan_entry->{elapsed_bytes});
	}
	# The loop will exit as soon as read() returns 0 or undef. 0 is normal EOF, undef indicates an error.
	die "Error while reading $plan_entry->{filename}, $!\n" if ( ! defined $current_bytes_read);
	
	close($fh) or die "Couldn't close file $plan_entry->{filename}, $!\n";
	
	# We made it out of the file alive. Store the md5 we computed. Note that this resets the Digest::MD5 object.
	$plan_entry->{computed_md5} = $md5->hexdigest();
}

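The Perl 6 version of Digest::MD5 is written in pure Perl 6 and lacks the add functionality, so the whole file has to be digested in one go. That is why there is no blocksize option, and why I use a spinner instead of a progress bar. We also need to decode the data with a single-byte encoding (ISO-8859-1) to avoid the errors we get when reading binary data as Unicode: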

sub calc-md5-sum (MD5Plan $plan) {
	my $md5 = Digest::MD5.new;

	# We need some space for the spinner to take up. I like 'bounce', so I
	# need 6 spaces for the spinner + an extra one to separate it from the
	# filehandle.
	print "Calculating MD5 sum for { $plan.filehandle }       ";

	my Buf $buffer = $plan.filehandle.IO.slurp(:close, :bin);

	my $decoded = $buffer.decode('iso-8859-1');

	my $spinner = Spinner.new(type => 'bounce');

	my $promise = Promise.start({
		$md5.md5_hex($decoded)
	});

	until $promise.status {
		$spinner.next;
	}

	say ''; # Add a new line after the spinner.

	$plan.computed-md5 = $promise.result;
}
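
The reason for decoding as ISO-8859-1 can be shown with a few bytes. This sketch uses a made-up four-byte buffer rather than a real file. Arbitrary binary data is usually not valid UTF-8, so decoding it as UTF-8 throws, while ISO-8859-1 maps every byte to the code point with the same number:

```perl6
use v6;

# Hypothetical binary data; 0x80 and 0xFF are not valid UTF-8 on their own.
my $buffer = Buf.new(0x00, 0x7F, 0x80, 0xFF);

# Decoding as UTF-8 throws, so try gives us back an undefined value.
say (try $buffer.decode('utf8')).defined;  # False

# ISO-8859-1 accepts every byte and keeps its numeric value.
my $decoded = $buffer.decode('iso-8859-1');
say $decoded.ords;                         # (0 127 128 255)
```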

Closing thoughts

I am not using the Perl 6 version as-is on my systems because of the low performance of Digest::MD5; instead I replace it with calls to md5sum. Other possibilities would be to use Inline::Perl5 with the Perl 5 version of Digest::MD5, or to use the amazing Perl 6 native calling interface (NativeCall) to call a C implementation. I hope this article has inspired you to port some of your own Perl 5 scripts to Perl 6, or at least has given you some tips for command line interactions.
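
Calling out to md5sum could look roughly like the sketch below. It is not the code I run, just an illustration under a few assumptions: the md5sum program from GNU coreutils is on the PATH, and md5sum-of is a hypothetical helper name that does not appear in the script above.

```perl6
use v6;

# Hypothetical helper: ask the external md5sum program for a file's checksum.
sub md5sum-of (Str $filename --> Str) {
    # '--' stops option parsing, so names starting with '-' are safe.
    my $proc   = run 'md5sum', '--', $filename, :out;
    my $output = $proc.out.slurp(:close);

    die "md5sum failed for $filename" unless $proc.exitcode == 0;

    # Take the first run of 32 hex digits from the output line.
    $output ~~ / <xdigit> ** 32 /
        or die "Unexpected md5sum output: $output";
    return ~$/;
}
```

A call such as say md5sum-of('some-file.txt') then prints the checksum as 32 hexadecimal digits.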

Thanks

Thanks to everyone in the Perl 6 Facebook group for being so eager to help, and specifically to Daniel Green and Jason Doege for the fast response. Also thanks to Tom Browder and sena_kun for helping to fix up the formatting.

9 thoughts on “Day 1 – Porting Vigilance, integrating Perl 6 with standard tools”

  1. Nice. Thinking about a NativeCall implementation, which is the most common library that provides this functionality? I’m guessing OpenSSL.

    1. Depending on how fancy you want to be, you could do more with it (I could do file validation, for example), but I wanted to keep the article somewhat succinct.

  2. I want to improve our Perl 6 Advent instructions for authors. Could you provide me a link to the final markdown source file you used for the article? I would also appreciate getting any feedback from you on any problems you may have had with the entire process and any suggestions you have for improving the process. Thanks!

  3. Dude, thanks! I was meaning to try a fresh attempt at a Perl 6 version since _forever_ but have been busy with work that keeps me in a Perl 5 headspace. Really surprised to see my old blogpost referenced here in the advent calendar. You’ve done great work explaining the porting process too!
