Day 7 — Set In Your Ways: Perl 6’s Setty and Baggy Types

There’s a relatively common pattern I see with people writing code that counts… say, DNA bases in a string:

my %counts;
%counts{$_}++ for 'AGTCAGTCAGTCTTTCCCAAAAT'.comb;
say %counts<A T G C>; # (7 7 3 6)

Make a Hash. For each thing you want to count, ++ that key in that Hash. So what’s the problem?

Perl 6 actually has specialized types that are more appropriate for this operation; for example, the Bag:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

Let’s talk about these types and all the fancy operators that come with them!

A Note on Unicode

I’ll be using fancy-pants Unicode versions of operators and symbols in this post, because they look purty. However, all of them have what we call “Texas” equivalents you can use instead.

Ready. Set. Go.

The simplest of these types is a Set. It will keep exactly one of each item, so if you have multiple objects that are the same, the duplicates will be discarded:

say set 1, 2, 2, "foo", "a", "a", "a", "a", "b";
# OUTPUT: set(a, foo, b, 1, 2)

As you can see, the result has only one a and only one 2. We can use the , U+2208 ELEMENT OF, set membership operator to check if an item is in a set:

my $mah-peeps = set <babydrop iBakeCake Zoffix viki>;
say 'Weeee \o/' if 'Zoffix'  $mah-peeps;
# OUTPUT: Weeee \o/

The set operators are coercive, so we don’t need to explicitly create a set; they’ll do it for us:

say 'Weeee \o/' if 'Zoffix'  <babydrop iBakeCake Zoffix viki>;
# OUTPUT: Weeee \o/

But pay attention when using allomorphs:

say 'Weeee \o/' if 42  <1 42 72>;
# No output

say 'Weeee \o/' if 42  +«<1 42 72>; # coerce allomorphs to Numeric
# OUTPUT: Weeee \o/

The angle brackets create allomorphs for numerics, so in the first case above, our set contains a bunch of IntStr objects, while the left hand side of the operator has a regular Int, and so the comparison fails. In the second case, we coerce allomorphs to their numeric component with a hyper operator and the test succeeds.

While testing membership is super exciting, we can do more with our sets! How about some intersections?

my $admins = set <Zoffix mst [Coke] lizmat>;
my $live-in-North-America = set <Zoffix [Coke] TimToady huggable>;

my $North-American-admins = $admins  $live-in-North-America;
say $North-American-admins;
# OUTPUT: set(Zoffix, [Coke])

We intersected two sets with the , U+2229 INTERSECTION, intersection operator and received a set that contains only the elements present in both original sets. You can chain these operations too, so membership will be checked in all of the provided sets in the chain:

say <Zoffix lizmat>  <huggable Zoffix>  <TimToady huggable Zoffix>;
# OUTPUT: set(Zoffix)

Another handy operator is the set difference operator, whose Unicode look I find somewhat annoying: No, it’s not a backslash (\), but a U+2216 SET MINUS character (luckily, you can use the much more obvious (-) Texas version).

The usefulness of the operator compensates its shoddy looks:

my @spammers = <spammety@sam.com  spam@in-a-can.com  yum@spam.com>;
my @senders  = <perl6@perl6.org   spammety@sam.com   good@guy.com>;

for keys @senders  @spammers -> $non-spammer {
    say "Message from $non-spammer";
}

# OUTPUT:
# Message from perl6@perl6.org
# Message from good@guy.com

We have two arrays: one contains a list of spammers’ addresses and another contains a list of senders. How to get a list of senders, without any spammers in it? Just use the (fine, fine, the (-)) operator!

We then use the for loop to iterate over the results, and as you can see from the output, all spammers were omitted… But why is keys there?

The reason is Setty and Mixy types are a lot like hashes, in a sense that they have keys and values for those keys. Set types always have True as values, and since we don’t care about iterating over Pair objects in our loop, we use the keys to get just the keys of the set: the email addresses.

However, hash-like semantics aren’t useless on Sets. For example, we can take a slice, and with :k adverb return just the elements that the set contains:

my $meows = set <
    Abyssinian  Aegean  Manx      Siamese  Siberian  Snowshoe
    Sokoke      Sphynx  Suphalak  Thai
>;

say $meows<Sphynx  Raas  Ragamuffin  Thai>:k;
# OUTPUT: (Sphynx Thai)

But what happens if we try to delete an item from a set?

say $meows<Siamese>:delete;
# Cannot call 'DELETE-KEY' on an immutable 'Set'
# in block <unit> at z.p6 line 6

We can’t! The Set type is immutable. However, just like Map type has a mutable version Hash, so does the Set type has a mutable version: the SetHash. There isn’t a cutesy helper sub to create one, so we’ll use the constructor instead:

my $s = SetHash.new: <a a a b c d>;
say $s;
$s<a d>:delete;
say $s;

# SetHash.new(a, c, b, d)
# SetHash.new(c, b)

Voilà! We successfully deleted a slice. So, what other goodies does Santa have in his… bag?

Gag ’em ‘n’ Bag ’em

Related to Sets is another type: a Bag, and yes, it’s also immutable, with the corresponding mutable type being BagHash. We already saw at the start of this article we can use a Bag to count stuff, and just like a Set, a Bag is hash-like, which is why we could view a slice of the four DNA amino acids:

'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)

While a Set has all values set to True, a Bag‘s values are integer weights. If you put two things that are the same into a Bag there’ll be just one key for them, but the value will be 2:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

And of course, we can use our handy operators to combine bags! Here, we’ll be using , U+228E MULTISET UNION, operator, which looks a lot clearer in its Texas version: (+)

my $pancakes = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
my $omelette = bag 'egg', 'egg',  'egg', 'cup of milk';

my $shopping-bag = $pancakes  $omelette  <gum  chocolate>;
say $shopping-bag;
# bag(gum, cup of flour, egg(5), cup of milk(2), chocolate)

We used two of our Bags along with a 2-item list, which got correctly coerced for us, so we didn’t have to do anything.

A more impressive operator is , U+227C PRECEDES OR EQUAL TO, and it’s mirror , U+227D SUCCEEDS OR EQUAL TO, which tell whether a Baggy on the narrow side of the operator is a subset of the Baggy on the other side; meaning all the objects in the smaller Baggy are present in the larger one and their weights are at most as big.

Here’s a challenge: we have some materials and some stuff we want to build. Problem is, we don’t have enough materials to build all the stuff, so what we want to know is what combinations of stuff we can build. Let’s use some Bags!

my $materials = bag 'wood' xx 300, 'glass' xx 100, 'brick' xx 3000;
my @wanted =
    bag('wood' xx 200, 'glass' xx 50, 'brick' xx 3000) but 'house',
    bag('wood' xx 100, 'glass' xx 50)                  but 'shed',
    bag('wood' xx 50)                                  but 'dog-house';

say 'We can build...';
.put for @wanted.combinations.grep: { $materials  [] |$^stuff-we-want };

# OUTPUT:
# We can build...
#
# house
# shed
# dog-house
# house shed
# house dog-house
# shed dog-house

The $materials is a Bag with our materials. We used xx repetition operator to indicate quantities of each. Then we have a @wanted Array with three Bags in it: that’s the stuff we want to build. We’ve also used used the but operator on them to mix in names for them to override what those bags will .put out as at the end.

Now for the interesting part! We call .combinations on our list of stuff we want, and just as the name suggests, we get all the possible combinations of stuff we can build. Then, we .grep over the result, looking for any combination that takes at most all of the materials we have (that’s the operator). On it’s fatter end, we have our $materials Bag and on its narrower end, we have the operator that adds the bags of each combination of our stuff we want together, except we use it as a metaoperator [⊎], which is the same as putting that operator between each item of $^stuff-we-want. In case you it’s new to you: the $^ twigil on $^stuff-we-want creates an implicit signature on our .grep block and we can name that variable anything we want.

And there we have it! The output of the program shows we can build any combination of stuff, except the one that contains all three items. I guess we just can’t have it all…

…But wait! There’s more!

Mixing it Up

Let’s look back at our recipe code. There’s something not quite perfect about it:

my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)

What if a recipe calls for half a cup of milk instead of a whole one? How do we represent a quarter of a teaspoon of salt, if Bags can only ever have integer weights?

The answer to that is the Mix type (with the corresponding mutable version, MixHash). Unlike a Bag, a Mix supports all Real weights, including negative weights. Thus, our recipe is best modeled with a Mix.

my $recipe = Mix.new-from-pairs:  'egg'          => 2, 'cup of milk' => ½,
                                  'cup of flour' => ¾, 'salt'        => ¼;
say $recipe;
# mix(salt(0.25), cup of flour(0.75), egg(2), cup of milk(0.5))

Be sure to quote your keys and don’t use colonpair form (:42a, or :a(42)), since those are treated as named arguments. There’s also a mix routine, but it doesn’t take weights and functions just like bag routine, except returning a Mix. And, of course, you can use a .Mix coercer on a hash or a list of pairs.

Less-Than-Awesome creation aside, let’s make something with mixes! Say, you’re an alchemist. You want to make a bunch of awesome potions and you need to know the total amount of ingredients you’ll need. However, you realize that some of the ingredients needed by some reactions are actually produced as a byproduct by other reactions you’re making. So, what’s the most efficient amount of stuff you’ll need? Mixes to the rescue!

my %potions =
    immortality  => (:oxium(6.98), :morphics(123.3),  :unobtainium(2)   ).Mix,
    invisibility => (:forma(9.85), :rubidium(56.3),   :unobtainium(−0.3)).Mix,
    strength     => (:forma(9.15), :rubidium(−30.3),  :kuva(0.3)        ).Mix,
    speed        => (:forma(1.35), :nano-spores(1.3), :kuva(1.3)        ).Mix;

say [] %potions.values;
# OUTPUT: mix(unobtainium(1.7), nano-spores(1.3), morphics(123.3),
#              forma(20.35), oxium(6.98), rubidium(26), kuva(1.6))

For convenience, we set up a Hash, with keys being names of potions and values being Mixes with quantities of ingredients. For reactions that produce one of the ingredients we seek, we’ve used negative weights, indicating the amount produced.

Then, we used the same set addition operator we saw earlier, in it’s meta form: [⊎]. We supply it the .values of our Hash that are our Mixes, and it happily adds up all of our ingredients, which we see in the output.

Look at unobtainium and rubidium: the set operator correctly accounted for the quantities produced by reactions where those ingredients have negative weights!

With immortality potion successfully mixed, all we need to do now is figure out what to do for the next few millennia… How about coding some Perl 6?