## Day 23: Lazy fruits from the gather of Eden

by

Today’s gift is a construct not often seen in other languages. It’s an iterator builder! And it’s called `gather`.

But first, let’s try a bit of historical perspective. Many Perl people know their `map`, `grep` and `sort`, convenient functions to carry out simple list transformations, filterings and rankings without having to resort to `for` loops.

```my @squares = map { \$_ * \$_ }, @numbers;
my @primes  = grep { is-prime(\$_) }, @numbers;```

The `map` and `grep` constructs are especially powerful once we get comfortable with chaining them:

```my @children-of-single-moms =
map  {  .children },
grep { !.is-married },
grep {  .gender == FEMALE },
@citizens;```

(Note that `.children` may return one child, a list of several children or an empty list. `map` has a flattening influence on the lists thus produced, so the final result is a flat list of children.)

The chaining of `map` and `sort` gave rise to the famous Schwartzian transform, a Perl 5 caching idiom for when the thing being sorted on is computationally expensive:

```my @files-by-modification-date =
map  { .[0] },                # deconstruct
sort { \$^a[1] <=> \$^b[1] },
map  { [\$_, \$_ ~~ :M] },      # compute and construct
@files;```

It’s unfortunate that the functional paradigm puts the steps in reverse order of processing. The proposed pipe syntax, notably the `==>`, would solve that. But it’s not implemented in Rakudo yet, only described in S03.

Anyway, if you’ve read the post from day 20, you know that the Schwartzian transform is built into `sort` nowadays:

```my @files-by-modification-date =
sort { \$_ ~~ :M },
@files;```

So that’s, you know, coming along.

Now, what about this `gather` construct? Well, it’s a kind of generalization of `map` and `grep`.

```sub mymap(&transform, @list) {
gather for @list {
take transform(\$_);
}
};

sub mygrep(&condition, @list) {
gather for @list {
take \$_ if condition(\$_);
}
};```

(The real `map` can swallow several argument at a time, making it more powerful than the `&mymap` above.)

Just to be clear about what happens: `gather` signals that within the subsequent block, we’ll be building a list. Each `take` adds an element to the list. You could think of it as `push`ing to an anonymous array if you want:

```my @result = gather { take \$_ for 5..7 }; # this...

my @result;
push @result, \$_ for 5..7; # ...is the same as this```

Which brings us to the first property of `gather`: it’s the construct you can use for building lists when `map`, `grep` and `sort` aren’t sufficient. Of course, there’s no need to reinvent those constructions… but the fact that you can do that, or roll your own special variants, is kinda nice.

```sub incremental-concat(@list) {
my \$string-accumulator = "";
gather for @list {
# RAKUDO: The ~() is a workaround for [perl #62178]
take ~(\$string-accumulator ~= \$_);
}
};

say incremental-concat(<a b c>).perl; # ["a", "ab", "abc"]```

The above is nicer than using `map`, since we need to manage the `\$string-accumulator` between iterations.

(Implementing `&incremental-concat` by hand is silly in an implementation which implements the `[\~]` operator. Just a short announcement to people who want to keep track of the extent to which Perl 6 is channeling APL. Rakudo doesn’t yet, though.)

The second property of `gather` is that while the `take` calls (of course) have to occur within the scope of a `gather` block, they do not necessarily have to occur in the lexical scope, only the dynamic scope. For those unfamiliar with the distinction, I think an example explains it best:

```sub traverse-tree-inorder(Tree \$t) {
traverse-tree-inorder(\$t.left) if \$t.left;
take transform(\$t);
traverse-tree-inorder(\$t.right) if \$t.right;
}

my \$tree = ...;
my @all-nodes = gather traverse-tree-inorder(\$tree);```

See what’s happening here? We wrap the call to `&traverse-tree-inorder` in a `gather` statement. The statement itself doesn’t lexically contain any `take` calls, but the called subroutine does, and the `take` in there remembers that it’s in a gather context. That’s what makes the `gather` context dynamic rather than lexical.

Just to hammer in the point, `traverse-tree-inorder` does lexical recursion on a tree structure, and no matter how far down the call stack we find ourselves, the values passed to `take` find their way back into the same anonymous array, rooted in the `gather` around the original call. It’s as if the anonymous array were implicitly passed around for us automatically as invisible arguments. Another way to view it is that the `gather` mode works orthogonally to the call stack, essentially not caring about how many calls down it is.

Should we be unfortunate enough to do a `take` outside of any `gather` block, we’ll get a warning at runtime.

I’ve saved the best till last: the third property of `gather`: it’s lazy.

What does “lazy” mean here? Well, to take the above tree-traversing code as an example: when the assignment to `@all-nodes` has executed, the tree hasn’t yet been traversed. Only when you access the first element of the array, `@all-nodes[0]`, does the traversal start. And it stops right after it finds the leftmost leaf node. Access `@all-nodes[1]`, and the traversal will resume from where it left off, run just enough to find the second node in the traversal, and then stop.

In short, the code within the gather block starts and stops in such a way that it never does more work than you’ve asked it to do. That’s lazy.

It’s essentially a model of delayed execution. Perl 6 promises to run the code inside your gather block, but only if it turns out that you actually need the information. Operationally, you can think of it as a separate thread that starts and stops, always doing the smallest possible amount of work to keep the main thread satisfied. But under the hood in a given implementation, it’s likely implemented by continuations — or, failing that, by painful, complicated cheating.

Now here’s the thing: most any array in Perl 6 has the lazy behavior by default, and things like reading all lines from a file are lazy by default, and not only can `map` and `grep` be implemented using `gather`; it turns out that they actually are, too. So `map` and `grep` are also lazy.

Now, it’s nice to know that values aren’t unnecessarily generated when you’re doing calculations with arrays… but the really nice thing is that lazy arrays open up the door for stream-based programming and, by extension, infinite arrays.

Unfortunately, laziness hasn’t landed in Rakudo yet. We’re nearly there though, so I don’t feel too bad dangling these examples in front of you, even though they will currently cause Rakudo to spin the fans of your computer and nothing more:

```my @natural-numbers = 0 .. Inf;

my @even-numbers  = 0, 2 ... *;    # arithmetic seq
my @odd-numbers   = 1, 3 ... *;
my @powers-of-two = 1, 2, 4 ... *; # geometric seq

my @squares-of-odd-numbers = map { \$_ * \$_ }, @odd-numbers;

sub enumerate-positive-rationals() { # with duplicates, but still
take 1;
for 1..Inf -> \$total {
for 1..^\$total Z reverse(1..^\$total) -> \$numerator, \$denominator {
take \$numerator / \$denominator;
}
}
}

sub enumerate-all-rationals() {
map { \$_, -\$_ }, enumerate-positive-rationals();
}

sub fibonacci() {
gather {
take 0;
my (\$last, \$this) = 0, 1;
loop { # infinitely!
take \$this;
(\$last, \$this) = \$this, \$last + \$this;
}
}
}
say fibonacci[10]; # 55

# Merge two sorted (potentially infinite) arrays
sub merge(@a, @b) {
!@a && !@b ?? () !!
!@a        ?? @b !!
!@b ?? @a !!
(@a[0] < @b[0] ?? @a.shift !! @b.shift, merge(@a, @b))
}

sub hamming-sequence() # 2**a * 3**b * 5**c, where { all(a,b,c) >= 0 }
gather {
take 1;
take \$_ for
merge( (map { 2 * \$_ } hamming-sequence()),
merge( (map { 3 * \$_ }, hamming-sequence()),
(map { 5 * \$_ }, hamming-sequence()) ));
}
}```

(That last subroutine is a Perl 6 solution to the Hamming problem, described in section 6.4 of mjd++’s Higher Order Perl. A seriously cool book, by the way. It builds iterators from scratch; we just use `gather` for the same result.)

Today’s obfu award goes to David Brunton, who has written a Perl 6 tweet which draws a rule #30 cellular automaton, which also happens to look like a Christmas tree.

```\$ perl6 -e 'my %r=[^8]>>.fmt("%03b") Z (0,1,1,1,1,0,0,0);\
say <. X>[my@i=0 xx 9,1,0 xx 9];\
for ^9 {say <. X>[@i=map {%r{@i[(\$_-1)%19,\$_,(\$_+1)%19].join}},^19]};'
.........X.........
........XXX........
.......XX..X.......
......XX.XXXX......
.....XX..X...X.....
....XX.XXXX.XXX....
...XX..X....X..X...
..XX.XXXX..XXXXXX..
.XX..X...XXX.....X.
XX.XXXX.XX..X...XXX```

### 9 Responses to “Day 23: Lazy fruits from the gather of Eden”

1. mina86 Says:

In Haskell one can write something like (written in Perl code):

@fibs = (0, 1, (@fibs >>+<< @fibs[1 .. Inf]))

ie. you construct the list by referring to itself. Would something like that be possible in Perl6?

• carl Says:

Yes; that’s the technique I’m using to construct the Hamming sequence in the post, i.e. creating an infinite sequence which is defined in terms of itself.

Pugs once did something very much like the code you propose above (but used `:=` binding, if I recall correctly). Last I heard, that semantics has fallen out of favour, and I’d recommend using a subroutine instead. Also, fib is less complicated than hamming-sequence, so no need to use the merge function.

When I read that “take” may occur anywhere not only in the lexical scope, I ask myself if it’s implemented using some kind of global variable.

I.e. if I build an array like:

my @very_private_array = map { some_function_from_some_other_module(); 2 * \$_ } 1..10;

and the foreign function happens to be nasty and likes to call “take” out of the blue, will that modify my array?

• carl Says:

You are correct.

```\$ perl6 -e 'sub foo { take 42 }; my @a = map { foo; \$_ }, 1..3; say @a.perl'
[42, 1, 42, 2, 42, 3]
```

What you call “some kind of global” variable is probably a dynamic variable, i.e. one following dynamic scoping rules rather than lexical ones.

I agree that what you describe is slightly problematic, in that it feels a bit “too powerful” — should foreign modules be able to affect the purity of my `map` call? Also, just because `map` is defined using `gather` in Rakudo doesn’t mean that it will be in all implementations of Perl 6. Thus your example, and mine, should probably be considered implementation-dependent code, and thus avoidable.

Maybe that’s it. At least until I or someone else comes up with a good technical patch for the problem you envision, the solution is social, and the way to live with the fact that calling an impure foreign function inside a `map` block can lead to unexpected return values, is “well, don’t do that”.

I think this could produce bugs that are hard to trace. This is a bit like in Perl 5 when subroutines modify \$_ by accident (maybe because of a typo or something). This confuses the outer subroutines and produces strange behaviour. This is particularly hard to fix if the inner subroutine is in a module and I did not read its code. I am happy that for \$_ this problem disappears in Perl 6.

As for my example code: As the map block runs lazily, code like this could be used to e.g. display a message whenever a new number for the array is computed. Suppose “2 * \$_” is replaced by some computation that takes some time, and some_function_from_some_other_module is used to print a banner that displays something like “please wait while we compute the next number”. In this case, the function is invoked only for its side effects, and the programmer would not expect that there is any chance that it might pollute the resulting array by inadvertently calling “take” without “gather”.

This is not only problematic for “map”, but for ordinary uses of gather/take, too, I suppose.

Just my two cents.

• carl Says:

I agree with you that this could be a problem. I’m not sure how big of one it would be in practice. Nor do I have even the beginning of a fix to that problem that would retain the flexibility of a dynamic gather/take in the scenarios when you do want the effect you’re describing. (One case where the dynamical-ness turns out to be very useful is when traversing a tree structure, for example.)

Maybe it would be a fix if “gather” and “take” were matched only if they are in the same file (or module or something like that).

• carl Says:

Maybe it would, but it also feels a bit arbitrary. Does that mean that providing tree traversal callbacks from another file or module is out of the question? Why?

I don’t think it’s up to the compiler to detect dubious cases of ‘take without gather’, but maybe a sufficiently brave static analyzer could catch some common errors.

• carl Says:

More generally, there will always be some level at which you’ll be able to subvert even the built-ins of Perl 6, it being self-hosting. The pipes are showing, and they can sometimes be bent in ways not originally intended.