Day 23: Lazy fruits from the gather of Eden

Today’s gift is a construct not often seen in other languages. It’s an iterator builder! And it’s called gather.

But first, let’s try a bit of historical perspective. Many Perl people know their map, grep and sort, convenient functions to carry out simple list transformations, filterings and rankings without having to resort to for loops.

my @squares = map { $_ * $_ }, @numbers;
my @primes  = grep { is-prime($_) }, @numbers;

The map and grep constructs are especially powerful once we get comfortable with chaining them:

my @children-of-single-moms =
    map  {  .children },
    grep { !.is-married },
    grep {  .gender == FEMALE },
         @citizens;

(Note that .children may return one child, a list of several children or an empty list. map has a flattening influence on the lists thus produced, so the final result is a flat list of children.)

The chaining of map and sort gave rise to the famous Schwartzian transform, a Perl 5 caching idiom for when the thing being sorted on is computationally expensive:

my @files-by-modification-date =
    map  { .[0] },                # deconstruct
    sort { $^a[1] <=> $^b[1] },
    map  { [$_, $_ ~~ :M] },      # compute and construct
         @files;

It’s unfortunate that the functional paradigm puts the steps in reverse order of processing. The proposed pipe syntax, notably the ==>, would solve that. But it’s not implemented in Rakudo yet, only described in S03.

Anyway, if you’ve read the post from day 20, you know that the Schwartzian transform is built into sort nowadays:

my @files-by-modification-date =
    sort { $_ ~~ :M },
    @files;

So that’s, you know, coming along.

Now, what about this gather construct? Well, it’s a kind of generalization of map and grep.

sub mymap(&transform, @list) {
    gather for @list {
        take transform($_);
    }
};

sub mygrep(&condition, @list) {
    gather for @list {
        take $_ if condition($_);
    }
};

(The real map can swallow several argument at a time, making it more powerful than the &mymap above.)

Just to be clear about what happens: gather signals that within the subsequent block, we’ll be building a list. Each take adds an element to the list. You could think of it as pushing to an anonymous array if you want:

my @result = gather { take $_ for 5..7 }; # this...

my @result;
push @result, $_ for 5..7; # ...is the same as this

Which brings us to the first property of gather: it’s the construct you can use for building lists when map, grep and sort aren’t sufficient. Of course, there’s no need to reinvent those constructions… but the fact that you can do that, or roll your own special variants, is kinda nice.

sub incremental-concat(@list) {
  my $string-accumulator = "";
  gather for @list {
    # RAKUDO: The ~() is a workaround for [perl #62178]
    take ~($string-accumulator ~= $_);
  }
};

say incremental-concat(<a b c>).perl; # ["a", "ab", "abc"]

The above is nicer than using map, since we need to manage the $string-accumulator between iterations.

(Implementing &incremental-concat by hand is silly in an implementation which implements the [\~] operator. Just a short announcement to people who want to keep track of the extent to which Perl 6 is channeling APL. Rakudo doesn’t yet, though.)

The second property of gather is that while the take calls (of course) have to occur within the scope of a gather block, they do not necessarily have to occur in the lexical scope, only the dynamic scope. For those unfamiliar with the distinction, I think an example explains it best:

sub traverse-tree-inorder(Tree $t) {
  traverse-tree-inorder($t.left) if $t.left;
  take transform($t);
  traverse-tree-inorder($t.right) if $t.right;
}

my $tree = ...;
my @all-nodes = gather traverse-tree-inorder($tree);

See what’s happening here? We wrap the call to &traverse-tree-inorder in a gather statement. The statement itself doesn’t lexically contain any take calls, but the called subroutine does, and the take in there remembers that it’s in a gather context. That’s what makes the gather context dynamic rather than lexical.

Just to hammer in the point, traverse-tree-inorder does lexical recursion on a tree structure, and no matter how far down the call stack we find ourselves, the values passed to take find their way back into the same anonymous array, rooted in the gather around the original call. It’s as if the anonymous array were implicitly passed around for us automatically as invisible arguments. Another way to view it is that the gather mode works orthogonally to the call stack, essentially not caring about how many calls down it is.

Should we be unfortunate enough to do a take outside of any gather block, we’ll get a warning at runtime.

I’ve saved the best till last: the third property of gather: it’s lazy.

What does “lazy” mean here? Well, to take the above tree-traversing code as an example: when the assignment to @all-nodes has executed, the tree hasn’t yet been traversed. Only when you access the first element of the array, @all-nodes[0], does the traversal start. And it stops right after it finds the leftmost leaf node. Access @all-nodes[1], and the traversal will resume from where it left off, run just enough to find the second node in the traversal, and then stop.

In short, the code within the gather block starts and stops in such a way that it never does more work than you’ve asked it to do. That’s lazy.

It’s essentially a model of delayed execution. Perl 6 promises to run the code inside your gather block, but only if it turns out that you actually need the information. Operationally, you can think of it as a separate thread that starts and stops, always doing the smallest possible amount of work to keep the main thread satisfied. But under the hood in a given implementation, it’s likely implemented by continuations — or, failing that, by painful, complicated cheating.

Now here’s the thing: most any array in Perl 6 has the lazy behavior by default, and things like reading all lines from a file are lazy by default, and not only can map and grep be implemented using gather; it turns out that they actually are, too. So map and grep are also lazy.

Now, it’s nice to know that values aren’t unnecessarily generated when you’re doing calculations with arrays… but the really nice thing is that lazy arrays open up the door for stream-based programming and, by extension, infinite arrays.

Unfortunately, laziness hasn’t landed in Rakudo yet. We’re nearly there though, so I don’t feel too bad dangling these examples in front of you, even though they will currently cause Rakudo to spin the fans of your computer and nothing more:

my @natural-numbers = 0 .. Inf;

my @even-numbers  = 0, 2 ... *;    # arithmetic seq
my @odd-numbers   = 1, 3 ... *;
my @powers-of-two = 1, 2, 4 ... *; # geometric seq

my @squares-of-odd-numbers = map { $_ * $_ }, @odd-numbers;

sub enumerate-positive-rationals() { # with duplicates, but still
  take 1;
  for 1..Inf -> $total {
    for 1..^$total Z reverse(1..^$total) -> $numerator, $denominator {
      take $numerator / $denominator;
    }
  }
}

sub enumerate-all-rationals() {
  map { $_, -$_ }, enumerate-positive-rationals();
}

sub fibonacci() {
  gather {
    take 0;
    my ($last, $this) = 0, 1;
    loop { # infinitely!
      take $this;
      ($last, $this) = $this, $last + $this;
    }
  }
}
say fibonacci[10]; # 55

# Merge two sorted (potentially infinite) arrays
sub merge(@a, @b) {
  !@a && !@b ?? () !!
  !@a        ?? @b !!
         !@b ?? @a !!
  (@a[0] < @b[0] ?? @a.shift !! @b.shift, merge(@a, @b))
}

sub hamming-sequence() # 2**a * 3**b * 5**c, where { all(a,b,c) >= 0 }
  gather {
    take 1;
    take $_ for
        merge( (map { 2 * $_ } hamming-sequence()),
               merge( (map { 3 * $_ }, hamming-sequence()),
                      (map { 5 * $_ }, hamming-sequence()) ));
  }
}

(That last subroutine is a Perl 6 solution to the Hamming problem, described in section 6.4 of mjd++’s Higher Order Perl. A seriously cool book, by the way. It builds iterators from scratch; we just use gather for the same result.)

Today’s obfu award goes to David Brunton, who has written a Perl 6 tweet which draws a rule #30 cellular automaton, which also happens to look like a Christmas tree.

$ perl6 -e 'my %r=[^8]>>.fmt("%03b") Z (0,1,1,1,1,0,0,0);\
say <. X>[my@i=0 xx 9,1,0 xx 9];\
for ^9 {say <. X>[@i=map {%r{@i[($_-1)%19,$_,($_+1)%19].join}},^19]};'
.........X.........
........XXX........
.......XX..X.......
......XX.XXXX......
.....XX..X...X.....
....XX.XXXX.XXX....
...XX..X....X..X...
..XX.XXXX..XXXXXX..
.XX..X...XXX.....X.
XX.XXXX.XX..X...XXX

10 thoughts on “Day 23: Lazy fruits from the gather of Eden

  1. In Haskell one can write something like (written in Perl code):

    @fibs = (0, 1, (@fibs >>+<< @fibs[1 .. Inf]))

    ie. you construct the list by referring to itself. Would something like that be possible in Perl6?

    1. Yes; that’s the technique I’m using to construct the Hamming sequence in the post, i.e. creating an infinite sequence which is defined in terms of itself.

      Pugs once did something very much like the code you propose above (but used := binding, if I recall correctly). Last I heard, that semantics has fallen out of favour, and I’d recommend using a subroutine instead. Also, fib is less complicated than hamming-sequence, so no need to use the merge function.

  2. When I read that “take” may occur anywhere not only in the lexical scope, I ask myself if it’s implemented using some kind of global variable.

    I.e. if I build an array like:

    my @very_private_array = map { some_function_from_some_other_module(); 2 * $_ } 1..10;

    and the foreign function happens to be nasty and likes to call “take” out of the blue, will that modify my array?

    1. You are correct.

      $ perl6 -e 'sub foo { take 42 }; my @a = map { foo; $_ }, 1..3; say @a.perl'
      [42, 1, 42, 2, 42, 3]
      

      What you call “some kind of global” variable is probably a dynamic variable, i.e. one following dynamic scoping rules rather than lexical ones.

      I agree that what you describe is slightly problematic, in that it feels a bit “too powerful” — should foreign modules be able to affect the purity of my map call? Also, just because map is defined using gather in Rakudo doesn’t mean that it will be in all implementations of Perl 6. Thus your example, and mine, should probably be considered implementation-dependent code, and thus avoidable.

      Maybe that’s it. At least until I or someone else comes up with a good technical patch for the problem you envision, the solution is social, and the way to live with the fact that calling an impure foreign function inside a map block can lead to unexpected return values, is “well, don’t do that”.

      1. I think this could produce bugs that are hard to trace. This is a bit like in Perl 5 when subroutines modify $_ by accident (maybe because of a typo or something). This confuses the outer subroutines and produces strange behaviour. This is particularly hard to fix if the inner subroutine is in a module and I did not read its code. I am happy that for $_ this problem disappears in Perl 6.

        As for my example code: As the map block runs lazily, code like this could be used to e.g. display a message whenever a new number for the array is computed. Suppose “2 * $_” is replaced by some computation that takes some time, and some_function_from_some_other_module is used to print a banner that displays something like “please wait while we compute the next number”. In this case, the function is invoked only for its side effects, and the programmer would not expect that there is any chance that it might pollute the resulting array by inadvertently calling “take” without “gather”.

        This is not only problematic for “map”, but for ordinary uses of gather/take, too, I suppose.

        Just my two cents.

      2. I agree with you that this could be a problem. I’m not sure how big of one it would be in practice. Nor do I have even the beginning of a fix to that problem that would retain the flexibility of a dynamic gather/take in the scenarios when you do want the effect you’re describing. (One case where the dynamical-ness turns out to be very useful is when traversing a tree structure, for example.)

      3. Maybe it would be a fix if “gather” and “take” were matched only if they are in the same file (or module or something like that).

      4. Maybe it would, but it also feels a bit arbitrary. Does that mean that providing tree traversal callbacks from another file or module is out of the question? Why?

        I don’t think it’s up to the compiler to detect dubious cases of ‘take without gather’, but maybe a sufficiently brave static analyzer could catch some common errors.

    2. More generally, there will always be some level at which you’ll be able to subvert even the built-ins of Perl 6, it being self-hosting. The pipes are showing, and they can sometimes be bent in ways not originally intended.

Leave a reply to reply Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.