Day 21 – Collatz Variations

by

The Collatz sequence is one of those interesting “simple” math problems that I’ve run into a number of times. Most recently a blog post on programming it in Racket showed up on Hacker News. As happens so often, I instantly wanted to implement it in Perl 6.

sub collatz-sequence(Int $start) { 
    $start, { when * %% 2 { $_ / 2 }; when * !%% 2 { 3 * $_ + 1 }; } ... 1;
}

sub MAIN(Int $min, Int $max) {
    say [max] ($min..$max).map({ +collatz-sequence($_) });        
}

This is a very straightforward implementation of the Racket post’s max-cycle-length-range as a stand-alone p6 script. collatz-sequence generates the sequence using the p6 sequence operator. Start with the given number. If it is divisible by two, do so: when * %% 2 { $_ / 2 }. If it is not, multiply by three and add 1: when * !%% 2 { 3 * $_ + 1 }. Repeat this until the sequence reaches 1.

MAIN(Int $min, Int $max) sets up our main function to take two integers. Many times I don’t bother with argument types in p6, but this provides a nice feedback for users:

> perl6 collatz.pl blue red
Usage:
  collatz.pl <min> <max> 

The core of it just maps the numbers from $min to $max (inclusive) to the length of the sequence (+collatz-sequence) and then says the max of the resulting list ([max]).

Personally I’m a big fan of using the sequence operator for tasks like this; it directly represents the algorithm constructing the Collatz sequence in a simple and elegant fashion. On the other hand, you should be able to memoize the recursive version for a speed increase. Maybe that would give it an edge over the sequence operator version?

Well, I was wildly wrong about that.

sub collatz-length($start) {
    given $start {
        when 1       { 1 }
        when * !%% 2 { 1 + collatz-length(3 * $_ + 1) } 
        when * %% 2  { 1 + collatz-length($_ / 2) } 
    }
}

sub MAIN($min, $max) {
    say [max] ($min..$max).map({ collatz-length($_) });        
}

This recursive version, which makes no attempt whatsoever to be efficient, is actually better than twice as fast as the sequence operator version. In retrospect, this makes perfect sense: I was worried about the recursive version making a function call for every iteration, but the sequence version has to make two, one to calculate the next iteration and the other to check and see if the ending condition has been reached.

Well, once I’d gotten this far, I thought I’d better do things correctly. I wrote two framing scripts, one for timing all the available scripts, the other for testing them to make sure they work!

my @numbers = 1..200, 10000..10200;

sub MAIN(Str $perl6, *@scripts) {
    my %results;
    for @scripts -> $script {
        my $start = now;
        qqx/$perl6 $script { @numbers }/;
        my $end = now;

        %results{$script} = $end - $start;
    }

    for %results.pairs.sort(*.value) -> (:key($script), :value($time)) {
        say "$script: $time seconds";
    }
}

This script takes as an argument a string that can be used to call a Perl 6 executable and a list of scripts to run. It runs the scripts using the specified executable, and times them using p6’s now function. It then sorts the results into order and prints them. (A similar script I won’t post here tests each of them to make sure they are returning correct results.)

In the new framework, the Collatz script has changed a bit. Instead of taking a min and a max value and finding the longest Collatz sequence generated by a number in that range, it takes a series of numbers and generates and reports the length of the sequence for each of them. Here’s the sequence operator script in its full new version:

sub collatz-length(Int $start) { 
    +($start, { when * %% 2 { $_ / 2 }; when * !%% 2 { 3 * $_ + 1 }; } ... 1);
}

sub MAIN(*@numbers) {
    for @numbers -> $n {
        say "$n: " ~ collatz-length($n.Int);
    }
}

For the rest of the scripts I will skip the MAIN sub, which is exactly the same in each of them.

Framework established, I redid the recursive version starting from the new sequence operator code.

sub collatz-length(Int $n) {
    given $n {
        when 1       { 1 }
        when * %% 2  { 1 + collatz-length($_ div 2) }
        when * !%% 2 { 1 + collatz-length(3 * $_ + 1) }
    } 
}

The sharp-eyed will notice this version is different from the first recursive version above in two significant ways. This time I made the argument Int $n, which instantly turned up a bit of a bug in all implementations thus far: because I used $_ / 2, most of the numbers in the sequence were actually rationals, not integers! This shouldn’t change the results, but is probably less efficient than using Ints. Thus the second difference about, it now uses $_ div 2 to divide by 2. This version remains a great improvement over the sequence operator version, running in 4.7 seconds instead of 13.3. Changing when * !%% 2 to a simple default shaves another .3 seconds off the running time.

Once I started wondering how much time was getting eaten up by the when statements, rewriting that bit using the ternary operator was an obvious choice.

sub collatz-length(Int $start) { 
    +($start, { $_ %% 2 ?? $_ div 2 !! 3 * $_ + 1 } ... 1);
}

Timing results: Basic sequence 13.4 seconds. Sequence with div 11.5 seconds. Sequence with div and ternary 9.7 seconds.

That made me wonder what kind of performance I could get from a handcoded loop.

sub collatz-length(Int $n is copy) {
    my $length = 1;
    while $n != 1 {
        $n = $n %% 2 ?? $n div 2 !! 3 * $n + 1;
        $length++;
    }
    $length;
}

That’s by far the least elegant of these, I think, but it gets great performance: 3 seconds.

Switching back to the recursive approach, how about using the ternary operator there?

sub collatz-length(Int $n) {
    return 1 if $n == 1;
    1 + ($n %% 2 ?? collatz-length($n div 2) !! collatz-length(3 * $n + 1));
}

This one just edges out the handcoded loop, 2.9 seconds.

Can we do better than that? How about memoization? is cached is supposed to be part of Perl 6; neither implementation has it yet, but last year’s Advent calendar has a Rakudo implementation that still works. Using the last version changed to sub collatz-length(Int $n) is cached { works nicely, but takes 3.4 seconds to execute. Apparently the overhead of caching slows it down a bit. Interestingly, the non-ternary recursive version does speed up with is cached, from 4.4 seconds to 3.6 seconds.

Okay, instead of using a generic memoization, how about hand-coding one?

sub collatz-length(Int $n) {
    return 1 if $n == 1;
    state %lengths;
    return %lengths{$n} if %lengths{$n}:exists;
    %lengths{$n} = 1 + ($n %% 2 ?? collatz-length($n div 2) !! collatz-length(3 * $n + 1));
}

Bingo! 2.7 seconds.

I’m sure there are lots of other interesting approaches for solving this problem, and encourage people to send them in. In the meantime, here’s my summary of results so far:

Script Rakudo Niecza
bin/collatz-recursive-ternary-hand-cached.pl 2.5 1.7
bin/collatz-recursive-ternary.pl 3 1.7
bin/collatz-loop.pl 3.1 1.7
bin/collatz-recursive-ternary-cached.pl 3.2 N/A
bin/collatz-recursive-default-cached.pl 3.5 N/A
bin/collatz-recursive-default.pl 4.4 1.8
bin/collatz-recursive.pl 4.9 1.9
bin/collatz-sequence-ternary.pl 9.9 3.3
bin/collatz-sequence-div.pl 11.6 3.5
bin/collatz-sequence.pl 13.5 3.8

The table was generated from timing-table-generator.pl.

About these ads

12 Responses to “Day 21 – Collatz Variations”

  1. gerdr Says:

    Here’s a cleaned-up version of the manual caching one with an additional performance gain:

    sub collatz-length(Int $n) {
        state %cache = 1 => 1;
        %cache{$n} //= 1 + collatz-length($n %% 2 ?? $n div 2 !! 3 * $n + 1);
    }
    
    • colomon Says:

      Nice! I don’t see a timing improvement in my testing rig, but it’s right there at the top, and it’s certainly more elegant than my hand-cached version. Do you mind if I add collatz-gerdr.pl to the github repo?

  2. colomon Says:

    New timing table, with workload increased and gerdr’s version added:

    Script Rakudo Niecza
    bin/collatz-gerdr.pl 3.5 1.7
    bin/collatz-recursive-ternary-hand-cached.pl 3.5 1.7
    bin/collatz-recursive-ternary.pl 4.1 1.7
    bin/collatz-loop.pl 4.3 1.7
    bin/collatz-recursive-ternary-cached.pl 4.4 N/A
    bin/collatz-recursive-default-cached.pl 4.9 N/A
    bin/collatz-recursive-default.pl 6.5 1.9
    bin/collatz-recursive.pl 7.2 2
    bin/collatz-sequence-ternary.pl 15.1 4.3
    bin/collatz-sequence-div.pl 18.3 4.4
    bin/collatz-sequence.pl 21.3 4.8
  3. Kaz Says:

    Timing just the collatz-length function, with @numbers = 1..800, 10000..10200, I’m getting roughly 31% better performance by using bitwise operators (rather than a divisibility test and a division).

    %seq{$n} //= 1 + collatz-length($n +& 1 ?? 3 * $n + 1 !! $n +> 1);

  4. colomon Says:

    Kaz is correct — the various hand-cached versions are so fast that the time to calculate the answers is completely swamped by the overhead of setting up the tests. (Proof: I just added collatz-dummy which never calculates a Collatz sequence length at all, it just returns 1 for everything. It’s only a tiny marginal improvement over the best versions which actually do work.)

    Conclusion: I need a better benchmarking framework for this. Must ponder.

  5. pat Says:

    Why didn’t you use multi-subs? the recursive solution looks like a cry for multiple dispatch to me. So here’s my approach (didn’t time it, because i don’t expect any speed gain).

    subset Odd of Int where {$_ !%% 2};
    subset Even of Int where {$_ %% 2};

    multi collatz-length (1 ) { 1 }
    multi collatz-length (Even $n) { 1 + collatz-length($n div 2) }
    multi collatz-length (Odd $n) { 1 + collatz-length(3 * $n + 1) }

    sub MAIN(*@numbers) {
    for @numbers -> $n {
    say “$n: ” ~ collatz-length($n.Int);
    }
    }

  6. Kaz Says:

    Combining some recursion steps by using the property that (Odd * 3 + 1) ~~ Even:

    sub collatz-length(Int $n) {
    state %seq = { 1 => 1 };
    %seq{$n} //= $n +& 1 ??
    2 + collatz-length((3 * $n + 1) +> 1) !!
    1 + collatz-length($n +> 1);
    }

    • Kaz Says:

      Same thing, but with sparser memoization for better performance:

      sub collatz-length(Int $n) {
      state %seq = { 1 => 1 };
      if $n +& 1 {
      %seq{$n} //= 2 + collatz-length((3 * $n + 1) +> 1);
      } else {
      collatz-length($n +> 1) + 1;
      }
      }

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

Join 44 other followers

%d bloggers like this: