# Day 21 – Collatz Variations

The Collatz sequence is one of those interesting “simple” math problems that I’ve run into a number of times. Most recently a blog post on programming it in Racket showed up on Hacker News. As happens so often, I instantly wanted to implement it in Perl 6.

```sub collatz-sequence(Int \$start) {
\$start, { when * %% 2 { \$_ / 2 }; when * !%% 2 { 3 * \$_ + 1 }; } ... 1;
}

sub MAIN(Int \$min, Int \$max) {
say [max] (\$min..\$max).map({ +collatz-sequence(\$_) });
}
```

This is a very straightforward implementation of the Racket post’s `max-cycle-length-range` as a stand-alone p6 script. `collatz-sequence` generates the sequence using the p6 sequence operator. Start with the given number. If it is divisible by two, do so: `when * %% 2 { \$_ / 2 }`. If it is not, multiply by three and add 1: `when * !%% 2 { 3 * \$_ + 1 }`. Repeat this until the sequence reaches 1.

`MAIN(Int \$min, Int \$max)` sets up our main function to take two integers. Many times I don’t bother with argument types in p6, but this provides a nice feedback for users:

```> perl6 collatz.pl blue red
Usage:
collatz.pl <min> <max>
```

The core of it just maps the numbers from `\$min` to `\$max` (inclusive) to the length of the sequence (`+collatz-sequence`) and then says the max of the resulting list (`[max]`).

Personally I’m a big fan of using the sequence operator for tasks like this; it directly represents the algorithm constructing the Collatz sequence in a simple and elegant fashion. On the other hand, you should be able to memoize the recursive version for a speed increase. Maybe that would give it an edge over the sequence operator version?

Well, I was wildly wrong about that.

```sub collatz-length(\$start) {
given \$start {
when 1       { 1 }
when * !%% 2 { 1 + collatz-length(3 * \$_ + 1) }
when * %% 2  { 1 + collatz-length(\$_ / 2) }
}
}

sub MAIN(\$min, \$max) {
say [max] (\$min..\$max).map({ collatz-length(\$_) });
}
```

This recursive version, which makes no attempt whatsoever to be efficient, is actually better than twice as fast as the sequence operator version. In retrospect, this makes perfect sense: I was worried about the recursive version making a function call for every iteration, but the sequence version has to make two, one to calculate the next iteration and the other to check and see if the ending condition has been reached.

Well, once I’d gotten this far, I thought I’d better do things correctly. I wrote two framing scripts, one for timing all the available scripts, the other for testing them to make sure they work!

```my @numbers = 1..200, 10000..10200;

sub MAIN(Str \$perl6, *@scripts) {
my %results;
for @scripts -> \$script {
my \$start = now;
qqx/\$perl6 \$script { @numbers }/;
my \$end = now;

%results{\$script} = \$end - \$start;
}

for %results.pairs.sort(*.value) -> (:key(\$script), :value(\$time)) {
say "\$script: \$time seconds";
}
}
```

This script takes as an argument a string that can be used to call a Perl 6 executable and a list of scripts to run. It runs the scripts using the specified executable, and times them using p6’s `now` function. It then sorts the results into order and prints them. (A similar script I won’t post here tests each of them to make sure they are returning correct results.)

In the new framework, the Collatz script has changed a bit. Instead of taking a min and a max value and finding the longest Collatz sequence generated by a number in that range, it takes a series of numbers and generates and reports the length of the sequence for each of them. Here’s the sequence operator script in its full new version:

```sub collatz-length(Int \$start) {
+(\$start, { when * %% 2 { \$_ / 2 }; when * !%% 2 { 3 * \$_ + 1 }; } ... 1);
}

sub MAIN(*@numbers) {
for @numbers -> \$n {
say "\$n: " ~ collatz-length(\$n.Int);
}
}
```

For the rest of the scripts I will skip the `MAIN` sub, which is exactly the same in each of them.

Framework established, I redid the recursive version starting from the new sequence operator code.

```sub collatz-length(Int \$n) {
given \$n {
when 1       { 1 }
when * %% 2  { 1 + collatz-length(\$_ div 2) }
when * !%% 2 { 1 + collatz-length(3 * \$_ + 1) }
}
}
```

The sharp-eyed will notice this version is different from the first recursive version above in two significant ways. This time I made the argument `Int \$n`, which instantly turned up a bit of a bug in all implementations thus far: because I used `\$_ / 2`, most of the numbers in the sequence were actually rationals, not integers! This shouldn’t change the results, but is probably less efficient than using `Int`s. Thus the second difference about, it now uses `\$_ div 2` to divide by 2. This version remains a great improvement over the sequence operator version, running in 4.7 seconds instead of 13.3. Changing ` when * !%% 2` to a simple `default` shaves another .3 seconds off the running time.

Once I started wondering how much time was getting eaten up by the `when` statements, rewriting that bit using the ternary operator was an obvious choice.

```sub collatz-length(Int \$start) {
+(\$start, { \$_ %% 2 ?? \$_ div 2 !! 3 * \$_ + 1 } ... 1);
}
```

Timing results: Basic sequence 13.4 seconds. Sequence with `div` 11.5 seconds. Sequence with `div` and ternary 9.7 seconds.

That made me wonder what kind of performance I could get from a handcoded loop.

```sub collatz-length(Int \$n is copy) {
my \$length = 1;
while \$n != 1 {
\$n = \$n %% 2 ?? \$n div 2 !! 3 * \$n + 1;
\$length++;
}
\$length;
}
```

That’s by far the least elegant of these, I think, but it gets great performance: 3 seconds.

Switching back to the recursive approach, how about using the ternary operator there?

```sub collatz-length(Int \$n) {
return 1 if \$n == 1;
1 + (\$n %% 2 ?? collatz-length(\$n div 2) !! collatz-length(3 * \$n + 1));
}
```

This one just edges out the handcoded loop, 2.9 seconds.

Can we do better than that? How about memoization? `is cached` is supposed to be part of Perl 6; neither implementation has it yet, but last year’s Advent calendar has a Rakudo implementation that still works. Using the last version changed to `sub collatz-length(Int \$n) is cached {` works nicely, but takes 3.4 seconds to execute. Apparently the overhead of caching slows it down a bit. Interestingly, the non-ternary recursive version does speed up with `is cached`, from 4.4 seconds to 3.6 seconds.

Okay, instead of using a generic memoization, how about hand-coding one?

```sub collatz-length(Int \$n) {
return 1 if \$n == 1;
state %lengths;
return %lengths{\$n} if %lengths{\$n}:exists;
%lengths{\$n} = 1 + (\$n %% 2 ?? collatz-length(\$n div 2) !! collatz-length(3 * \$n + 1));
}
```

Bingo! 2.7 seconds.

I’m sure there are lots of other interesting approaches for solving this problem, and encourage people to send them in. In the meantime, here’s my summary of results so far:

 Script Rakudo Niecza bin/collatz-recursive-ternary-hand-cached.pl 2.5 1.7 bin/collatz-recursive-ternary.pl 3 1.7 bin/collatz-loop.pl 3.1 1.7 bin/collatz-recursive-ternary-cached.pl 3.2 N/A bin/collatz-recursive-default-cached.pl 3.5 N/A bin/collatz-recursive-default.pl 4.4 1.8 bin/collatz-recursive.pl 4.9 1.9 bin/collatz-sequence-ternary.pl 9.9 3.3 bin/collatz-sequence-div.pl 11.6 3.5 bin/collatz-sequence.pl 13.5 3.8

The table was generated from timing-table-generator.pl.

## 12 thoughts on “Day 21 – Collatz Variations”

1. gerdr says:

Here’s a cleaned-up version of the manual caching one with an additional performance gain:

``````sub collatz-length(Int \$n) {
state %cache = 1 => 1;
%cache{\$n} //= 1 + collatz-length(\$n %% 2 ?? \$n div 2 !! 3 * \$n + 1);
}
``````
1. Nice! I don’t see a timing improvement in my testing rig, but it’s right there at the top, and it’s certainly more elegant than my hand-cached version. Do you mind if I add collatz-gerdr.pl to the github repo?

1. gerdr says:

Sure, feel free to add it to the repository.

2. New timing table, with workload increased and gerdr’s version added:

 Script Rakudo Niecza bin/collatz-gerdr.pl 3.5 1.7 bin/collatz-recursive-ternary-hand-cached.pl 3.5 1.7 bin/collatz-recursive-ternary.pl 4.1 1.7 bin/collatz-loop.pl 4.3 1.7 bin/collatz-recursive-ternary-cached.pl 4.4 N/A bin/collatz-recursive-default-cached.pl 4.9 N/A bin/collatz-recursive-default.pl 6.5 1.9 bin/collatz-recursive.pl 7.2 2 bin/collatz-sequence-ternary.pl 15.1 4.3 bin/collatz-sequence-div.pl 18.3 4.4 bin/collatz-sequence.pl 21.3 4.8
3. Kaz says:

Timing just the collatz-length function, with @numbers = 1..800, 10000..10200, I’m getting roughly 31% better performance by using bitwise operators (rather than a divisibility test and a division).

%seq{\$n} //= 1 + collatz-length(\$n +& 1 ?? 3 * \$n + 1 !! \$n +> 1);

1. Kaz says:

(that’s starting from gerdr’s 2-line function)

2. Hmm… this is definitely the best result I’ve seen so far, but I’m not seeing anything like 31% improvement.

 Script Rakudo Niecza bin/collatz-kaz.pl 3.2 1.6 bin/collatz-gerdr.pl 3.3 1.6 bin/collatz-recursive-ternary-hand-cached.pl 3.3 1.6 bin/collatz-recursive-ternary.pl 4 1.7 bin/collatz-loop.pl 4.2 1.7 bin/collatz-recursive-ternary-cached.pl 4.3 N/A bin/collatz-recursive-default-cached.pl 4.7 N/A bin/collatz-recursive-default.pl 6.3 1.9 bin/collatz-recursive.pl 6.9 1.9 bin/collatz-sequence-ternary.pl 14.9 4.3 bin/collatz-sequence-div.pl 17.8 4.3 bin/collatz-sequence.pl 21 4.8
1. Kaz says:

The 31% measurement was with a MAIN like this (this is actually slightly tighter than the MAIN that had the 31% result, so the proportional differences will be a bit higher with this):

sub MAIN() {
my \$start = now;
for @numbers -> \$number {
collatz-length(\$number);
}
my \$time = now – \$start;

say “\$time”;
}

I only wrote it to support testing a single function at a time, but it measures the runtime of the actual function execution, so I’ve found it useful for comparing implementations with better granularity and less dependence on system load.

4. Kaz is correct — the various hand-cached versions are so fast that the time to calculate the answers is completely swamped by the overhead of setting up the tests. (Proof: I just added collatz-dummy which never calculates a Collatz sequence length at all, it just returns 1 for everything. It’s only a tiny marginal improvement over the best versions which actually do work.)

Conclusion: I need a better benchmarking framework for this. Must ponder.

5. pat says:

Why didn’t you use multi-subs? the recursive solution looks like a cry for multiple dispatch to me. So here’s my approach (didn’t time it, because i don’t expect any speed gain).

subset Odd of Int where {\$_ !%% 2};
subset Even of Int where {\$_ %% 2};

multi collatz-length (1 ) { 1 }
multi collatz-length (Even \$n) { 1 + collatz-length(\$n div 2) }
multi collatz-length (Odd \$n) { 1 + collatz-length(3 * \$n + 1) }

sub MAIN(*@numbers) {
for @numbers -> \$n {
say “\$n: ” ~ collatz-length(\$n.Int);
}
}

6. Kaz says:

Combining some recursion steps by using the property that (Odd * 3 + 1) ~~ Even:

sub collatz-length(Int \$n) {
state %seq = { 1 => 1 };
%seq{\$n} //= \$n +& 1 ??
2 + collatz-length((3 * \$n + 1) +> 1) !!
1 + collatz-length(\$n +> 1);
}

1. Kaz says:

Same thing, but with sparser memoization for better performance:

sub collatz-length(Int \$n) {
state %seq = { 1 => 1 };
if \$n +& 1 {
%seq{\$n} //= 2 + collatz-length((3 * \$n + 1) +> 1);
} else {
collatz-length(\$n +> 1) + 1;
}
}