Posts Tagged ‘grammar’

Day 18 – A Grammar with duplicate checking

December 18, 2013

Today’s example constructs a grammar for tracking playing cards in a single deal. We’ll say it’s poker with one or more players and that each player is being dealt a hand that contains exactly five cards.

There is, however, the need to detect duplicate cards. We’ll need some way of tracking cards both within each card-hand and between hands.

A simple Card Game Grammar

To start with, here’s the basic grammar (no duplicate checks yet):

grammar CardGame {

    rule TOP { ^ <deal> $ }

    rule deal {
        <hand>+ % ';'
    }

    rule hand { [ <card> ]**5 }
    token card {<face><suit>}

    proto token suit {*}
    token suit:sym<♥>  {<sym>}
    token suit:sym<♦>  {<sym>}
    token suit:sym<♣>  {<sym>}
    token suit:sym<♠>  {<sym>}

    token face {:i <[2..9]> | 10 | j | q | k | a }
}

say CardGame.parse("2♥ 5♥ 7♦ 8♣ 9♠");
say CardGame.parse("2♥ a♥ 7♦ 8♣ j♥");

The  top-level rule consists of a deal. The deal consists of one or more hands separated by ';'. Each hand consists of 5 playing cards.

Each card is represented by a face, one of: a (ace), j (jack), q (queen) or k (king), or 2 – 10. This is followed by a suite: ♥ (hearts) ♦ (diamonds) ♣ (clubs) or ♠ (spades).

[We could have used the playing cards characters, newly introduced in Unicode 6.0, but these aren't widely supported yet].

As expected, the first cut of the grammar cheerily parses any hand:

say CardGame.parse("a♥ a♥ 7♦ 8♣ j♥");
# one hand, duplicate a♥
say CardGame.parse("a♥ 7♥ 7♦ 8♣ j♥; 10♥ j♥ q♥ k♥ a♥");
# two hands, duplicate j♥

Detecting Duplicates

We start by adding a Perl 6 variable declaration to the grammar. This will be used to track cards:

rule deal {
    :my %*PLAYED = ();
    <hand>+ % ';'
}

This declares %*PLAYED [1]. The '%*' twigil  indicates that it’s a hash '%' and that’s dynamically scoped '*'.

Dynamic scoping is not only for subroutine and method calls [1]. It also works seamlessly with grammar rules, tokens and actions.

Being dynamically scoped, %*PLAYED is available to callees of the deal rule; the hand token, and its callee, the card token.

It’s also available to any actions, that then get called. So we can track and report on duplicates by creating an action class with a method for the card token:

class CardGame::Actions {
    method card($/) {
       my $card = $/.lc;
       say "Hey, there's an extra $card"
           if %*PLAYED{$card}++;
   }
}

my $a = CardGame::Actions.new;
say CardGame.parse("a♥ a♥ 7♦ 8♣ j♥", :actions($a));
# "Hey there's an extra a♥"
say CardGame.parse("a♥ 7♥ 7♦ 8♣ j♥; 10♥ j♥ q♥ k♥ a♦",
                   :actions($a));
# "Hey there's an extra j♥"

And that might be all that’s needed  for tracking and reporting on duplicates. There’s a pretty good separation between the declarative grammar and procedural actions, with just one dynamically scoped hash variable.

Disallowing Duplicates

But I had a situation where I wanted duplicate checking to be a parsing constraint. Parsing needed to fail when duplicates were encountered.

I achieved this by moving the duplicate check grammar side:

token card {<face><suit>
    <?{
        # only allow each card to appear once
        my $card = $/.lc;
        say "Hey, there's an extra $card"
            if %*PLAYED{$card};

        ! %*PLAYED{$card}++;
     }>
}

This has introduced a code assertion between the <?{ and }>  [2]. The rule succeeds when the code evaluates to a True value. The card token thus fails when the same card is detected more than once in a single deal.

say CardGame.parse("2♥ 7♥ 2♦ 3♣ 3♦");
# legitimate, parses

say CardGame.parse("a♥ a♥ 7♦ 8♣ j♥");
# fails with message: Hey, there's an extra a♥

say CardGame.parse("a♥ 7♥ 7♦ 8♣ j♥; 10♥ j♥ q♥ k♥ a♦");
# fails with message: Hey, there's an extra j♥

Discussion/Conclusion

One thing to be careful of with this type of technique is back-tracking (trying of alternatives). If, for instance, the grammar was modified in such a way that the card token could be called more than once for single input card, then we might erroneously report a duplicate. It’s still possible to track, but becomes a bit more involved. The simplest answer is to keep the grammars as simple as possible and minimize back-tracking.

If in any doubt,  please consider using one or more of the Grammar::Debugger, Grammar::Tracer [3] or the debugger [4] modules [5] to track what’s going on. You can also insert debugging code into tokens or rules as closures: { say "here" } [6].

That the exercise for today; a simple Perl 6 Grammar to parse playing-cards in a card-game, but with duplicate checks using either actions or code assertions.

Day 06 – Parsing and generating recurring dates

December 6, 2013

There are a lot of events that are scheduled on particular days of the week each month, for example the regular Windows Patch Day on the second Tuesday of each month, or in Perl 6 land that Rakudo Perl 6 compiler release, which is scheduled for two days after the Parrot release day, which again is scheduled for the third Tuesday of the month.

So let's write something that calculates those dates.

The specification format I have chosen looks like 3rd tue + 2 for the Rakudo release date, that is, two days after the 3rd Tuesday of each month (note that this isn't always the same as the 3rd Thursday).

Parsing it isn't hard with a simple grammar:

grammar DateSpec::Grammar {
    rule TOP {
        [<count><.quant>?]?
        <day-of-week>
        [<sign>? <offset=count>]?
    }
    token count { \d+ }
    token quant { st | nd | rd | th }
    token day-of-week { :i
        [ mon | tue | wed | thu | fri | sat | sun ]
    }
    token sign { '+' | '-' }
}

As you can see, everything except the day of the week is optional, so sun would simply be the first Sunday of the month, and 2 sun - 1 the Saturday before the second Sunday of the month.

Now it's time to actually turn this specification into a data structure that does something useful. And for that, a class wouldn't be a bad choice:

my %dow = (mon => 1, tue => 2, wed => 3, thu => 4,
        fri => 5, sat => 6, sun => 7);

class DateSpec {
    has $.day-of-week;
    has $.count;
    has $.offset;

    multi method new(Str $s) {
        my $m = DateSpec::Grammar.parse($s);
        die "Invalid date specification '$s'\n" unless $m;
        self.bless(
            :day-of-week(%dow{lc $m<day-of-week>}),
            :count($m<count> ?? +$m<count>[0] !! 1),
            :offset( ($m<sign> eq '-' ?? -1 !! 1)
                    * ($m<offset> ?? +$m<offset> !! 0)),
        );
    }

We only need three pieces of data from those date specification strings: the day of the week, whether the 1st, 2nd, 3rd. etc is wanted (here named $.count), and the offset. Extracting them is a wee bit fiddly, mostly because so many pieces of the grammar are optional, and because the grammar allows a space between the sign and the offset, which means we can't use the Perl 6 string-to-number conversion directly.

There is a cleaner but longer method of extracting the relevant data using an actions class.

The closing } is missing, because the class doesn't do anything useful yet, and that should be added. The most basic operation is to find the specified date in a given month. Since Perl 6 has no built-in type for months, we use a Date object where the .day is one, that is, a Date object for the first day of the month.

    method based-on(Date $d is copy where { .day == 1}) {
        ++$d until $d.day-of-week == $.day-of-week;
        $d += 7 * ($.count - 1) + $.offset;
        return $d;
    }

The algorithm is quite simple: Proceed to the next date (++$d) until the day of week matches, then advance as many weeks as needed, plus as many days as needed for the offset. Date objects support addition and subtraction of integers, and the integers are interpreted as number of days to add or subtract. Handy, and exactly what we need here. (The API is blatantly copied from the Date::Simple Perl 5 module).

Another handy convenience method to implement is next, which returns the next date matching the specification, on or after a reference date.

    method next(Date $d = Date.today) {
        my $month-start = $d.truncated-to(month);
        my $candidate   = $.based-on($month-start);
        if $candidate ge $d {
            return $candidate;
        }
        else {
            return $.based-on($month-start + $month-start.days-in-month);
        }
    }
}

Again there's no rocket science involved: try the date based on the month of $d, and if that's before $d, try again, but with the next month as base.

Time to close the class :-).

So, when is the next Rakudo release? And the next Rakudo release after Christmas?

my $spec = DateSpec.new('3rd Tue + 2');
say $spec.next;
say $spec.next(Date.new(2013, 12, 25));

Output:

2013-12-19
2014-01-23

The code works fine on Rakudo with both the Parrot and the JVM backend.

Happy recurring hollidates!

Day 10: A Regex Story

December 10, 2009

On this tenth day of advent, we have the gift of a story …

Once upon a time in a land closer than you might think, an apprentice Perl 6 programmer named Tim was working on a simple parsing problem. His boss (let’s just call him Mr. C) had asked him to parse log files containing inventory information to make sure that there were only valid lines within the file. Each valid line within the file looked like this:

    <part number> <quantity> <color> <description>

So the Perl 6 apprentice, having some familiarity with regular expressions wrote a nice little regex that could be used to identify valid lines. The code that validated each line looked like this:

    next unless $line ~~ / ^^ \d+ \s+ \d+ \s+ \S+ \s+ \N* $$ /

The ~~ operator causes the regex on the right hand side to be matched against the scalar on the left hand side. In the regex itself, ^^ matches the beginning of a line, \d+ matches one or more digits (as the part number and quantity were so composed), \S+ matches one or more non- whitespace characters, \N* matches zero or more non-newline characters, \s+ matches whitespace in between each of these and $$ matches the end of a line. This being Perl 6, these individual components of the regex could be spread out a bit with spaces in between so that it could be more readable. All was good.

But then Mr. C decided that it would be nicer if the individual pieces of information could also be extracted from each in addition to validating it. Tim thought, “No problem, I’ll just use capturing parentheses”. And that’s just what he did:

    next unless $line ~~ / ^^ (\d+) \s+ (\d+) \s+ (\S+) \s+ (\N*) $$ /

After a successful pattern match, each parenthesized portion is available either as part of the match object itself ($/) via $/[0], $/[1], $/[2], or $/[3]. Or it could be accessed via the special variables $0, $1, $2, or $3. Tim was happy. Mr. C was happy.

But then it was discovered that some of the lines didn’t separate the color from the description and that these lines should be considered valid too. Lines where the color was integrated into the description were written a special way. They were always of the form:

    <part number> <quantity> <description> (<color>)

Where description, as before, could be any number of characters including spaces. “Blah!” thought Tim, “now this simple parser suddenly seems more complicated.” Luckily, Tim knew of a place to ask for help. He quickly logged on to irc.freenode.org, joined the #perl6 channel, and asked for help. Someone suggested that he should name the individual parts of his regex to make things easier and then use an alternation to match one or the other alternatives for the last part of the regex.

First, Tim tried naming the parts of his regex. Looking at the synopsis for Perl 6 regex, Tim found he could assign into the match object, so that’s what he did:

    next unless $line ~~ / ^^ $<product>=(\d+) \s+ $<quantity>=(\d+) \s+ $<color>=(\S+) \s+ $<description>=(\N*) $$ /

Now, after a successful match, the individual pieces are available via the match object or via special variables $<product>, $<quantity>, $<color>, and $<description>. This was turning out easier than expected and Tim was feeling quite confident. Next he added the alternation to distinguish between the two different valid lines:

    next unless $line ~~ / ^^
        $<product>=(\d+) \s+ $<quantity>=(\d+) \s+
        [
        | $<description>=(\N*) \s+ '(' $<color>=(\S+) ')'
        | $<color>=(\S+) \s+ $<description>=(\N*)
        ]
      $$
    /

In order to isolate the alternation from the rest of the regex, Tim surrounded it with grouping brackets ([ and ]). These group a portion of a regex much like parentheses only without capturing into $0 and friends. Since he needed to match literal parentheses, Tim took advantage of another useful Perl 6 regex feature: quoted strings are matched literally. And because of the assignments within the regex, $<color> and $<description> always contain the appropriate part of the string.

Tim was elated! He showed his code to Mr. C and Mr. C was elated too! “Well done Tim!” said Mr. C.

Everybody was happy. Tim beamed with pride.

However, after the initial glow of success faded, Tim started looking at his work with a more critical eye. For some of the lines where the color was at the end of the description, it was described as “( color)” or “( color )” or “( color )”. His current regex worked, but it would include the color as part of the description and wouldn’t set $<color> at all. That hardly seemed appropriate. Tim initially fixed this by adding more \s*:

    next unless $line ~~ / ^^
        $<product>=(\d+) \s+ $<quantity>=(\d+) \s+
        [
        | $<description>=(\N*) \s+ '(' \s* $<color>=(\S+) \s* ')'
        | $<color>=(\S+) \s+ $<description>=(\N*)
        ]
      $$
    /

This worked well, but the regex was starting to look a little cluttered. Again, Tim turned to #perl6 for help.

This time someone named PerlJam piped up, “Why don’t you put your regex in a grammar? That’s what you’re effectively doing by assigning each piece to a variable within the match object.” Wha??? Tim had no idea what PerlJam was talking about. After a brief exchange, Tim thought he understood and knew where to look if he needed more information. After thanking PerlJam, he went back to coding. This time the regex virtually disappeared as it turned into a grammar. Here’s what that grammar and matching code looked like:

grammar Inventory {
    regex product { \d+ }
    regex quantity { \d+ }
    regex color { \S+ }
    regex description { \N* }
    regex TOP { ^^ <product> \s+ <quantity>  \s+
                [
                | <description> \s+ '(' \s* <color> \s*  ')'
                | <color> \s+ <description>
                ]
                $$
    }
}

# ... and then later where the match happens
next unless Inventory.parse($line);

“Well,” thought Tim, “it is certainly more organized.”

Each of his variables in the previous incarnation of the regex simply became named regex within the grammar. Within Perl 6 regex, named regex are matched by simply enclosing the name within angle brackets (< and >). The specially named regex TOP is used when Grammar.parse is called with the scalar to match against. And the behavior is exactly the same as before because when a named regex matches as part of another regex, the text that was matched is saved in the match object and referenced by that name.

And though there was still room for improvement, both Tim and Mr. C were very happy with the result.

The End

* At the time of posting, Rakudo cannot correctly use this grammar to parse lines in the format

    <part number> <quantity> <description> (<color>)

Follow

Get every new post delivered to your Inbox.

Join 37 other followers