René Nyffenegger's collection of things on the web | |
René Nyffenegger on Oracle - Most wanted - Feedback
- Follow @renenyffenegger
|
Parse::RecDescent (Perl) | ||
Parse::RecDescent can be used to generate recursive descent parsers.
A parser is used to make meaning of an inputs (mostly given in textual form) that follow certain patterns. These patterns are described in a
grammar. The grammer consists of a set of rules (called productions). When the parser reads the input,
it tries to find a rule that matches and whenever it finds such a rule, it triggers an action. It then goes on to read from the input and to find matching patterns, until
the input has finished.
An example for a rule looks like so:
name_of_rule : name_of_other_rule 'SOME_LITERAL_TEXT' /regular-expression/ {action_to_trigger_if_rule_matches()} | alternative_rule {alternative_action_to_trigger()}
Note the bar (|) that seperates alternatives.
The entire grammar is passed through the new method of Parse::RecDescent.
Here's a simple example to demonstrate this:
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start : character character character(s) {print "Found: ", $item[1], $item[2], join "", @{$item[3]}, "\n"; } character: /\w/ }; my $parser = Parse::RecDescent->new($grammar); defined $parser->start("ABCD") or die "didn't match";
The special expression (s) means: one more more occurances of the preceeding item.
First production, not longestuse strict; use warnings; use Parse::RecDescent; my $grammar = q { start : seq_1 seq_2 seq_1 : 'A' 'B' 'C' 'D' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" } | 'A' 'B' 'C' 'D' 'E' 'F' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" } seq_2 : character(s) character: /\w/ { print "character: $item[1]\n" } }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B C D E F G"); seq_1: A B C D character: E character: F character: G use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: seq_1 seq_2 seq_1 : 'A' 'B' 'C' 'D' 'E' 'F' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" } | 'A' 'B' 'C' 'D' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" } seq_2 : character(s) character: /\w/ { print "character: $item[1]\n" } }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B C D E F"); seq_1: A B C D E F @item and %item@item
$item[0] is the name of the rule, as is $item{__RULE__}.
$item[n] (n > 0) is the value of the nth subitem in the rule.
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: seq seq : char_1 char_2 char_3 char_4 { print join (" -- ", @item); } char_1 : character char_2 : character char_3 : character char_4 : character character: /\w/ }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B C D"); seq -- A -- B -- C -- D
If an item in a production is an (s) item, $item[$prod_no] is a reference to an array:
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: seq seq : character(s) { print join (" -- ", @{$item[1]}) } character: /\w/ }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B C D"); A -- B -- C -- D %item
$item{__RULE__} is the name of the rule, as is $item[0]
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: seq seq : char_1 char_2 char_3 char_4 { print map {$_ . "=" . $item{$_} . "; "} keys %item} char_1 : character char_2 : character char_3 : character char_4 : character character: /\w/ }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B C D"); char_4=D; char_2=B; char_3=C; char_1=A; __RULE__=seq; dynamically matched rules
It is even possible to use @item to dynamically generate rules:
command: keyword body "end $item[1]" Cardinality of items
Usually, an item must occur exactly once in order to match the rule. However, it is possible to alter this behaviour.
item(s)
item occurs 1 or more times.
item(?)
.tem occurs 0 or 1 times.
item(s?)
.tem occurs 0, 1 or more times
item(n..m)
.tem occurs at least n times but no more than m times.
item(..m)
item occurs at most m times (0 times possible)
item(n..)
item occurs at least n times.
$text
The remaining (unparsed) text. Changes to $text do not propagate out of
unsuccessful productions, but do survive successful productions. Hence it is
possible to dynamically alter the text being parsed - for example, to provide a
#include-like facility:
hash_include: '#include' filename { $text = ::loadfile($item[2]) . $text } filename: '<' /[a-z0-9._-]+/i '>' { $return = $item[2] } | '"' /[a-z0-9._-]+/i '"' { $return = $item[2] } $thisline and $prevline
Note, $thisline starts with 1!
I have no idea what $prevline is or indicates.
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: line(s) line : "print_line" {print "print_line found on: $thisline\n"} |/.*/ }; my $parser=Parse::RecDescent->new($grammar); $parser->start(' hello world i want a print_line right now and print_line here again print_line '); print_line found on: 6 print_line found on: 8 Autoactions
An autoaction triggers at the end of each production.
$::RD_AUTOACTION must be defined when Parse::RecDescent::new() is called.
use strict; use warnings; use Parse::RecDescent; use Data::Dumper; $::RD_AUTOACTION = q { [@item] }; my $grammar = q { start : seq seq : char_1 char_2 number(s) thing(s) char_1 : character char_2 : character character : /[A-Z]/ number : /\d+/ thing : '(' key_value_pair(s) ')' key_value_pair: identifier "=" identifier identifier : m([A-Za-z_]\w*) }; my $parser=Parse::RecDescent->new($grammar); my $result = $parser->start("A B 42 29 5 ( foo = bar perl = cool hello = world )"); print Dumper($result);
Prints a parse tree
$VAR1 = [ 'start', [ 'seq', [ 'char_1', [ 'character', 'A' ] ], [ 'char_2', [ 'character', 'B' ] ], [ [ 'number', '42' ], [ 'number', '29' ], [ 'number', '5' ] ], [ [ 'thing', '(', [ [ 'key_value_pair', [ 'identifier', 'foo' ], '=', [ 'identifier', 'bar' ] ], [ 'key_value_pair', [ 'identifier', 'perl' ], '=', [ 'identifier', 'cool' ] ], [ 'key_value_pair', [ 'identifier', 'hello' ], '=', [ 'identifier', 'world' ] ] ], ')' ] ] ] ]; Zero width assertionsZero-width positive look ahead assertioninner_word: word ...word
Consumes word once if followed by another word.
use strict; use warnings; use Parse::RecDescent; my $grammar = q { start: seq(s) seq : 'A' 'B' ...'C' { print "A B eaten, C follows\n"} | 'A' 'B' character { print "A B $item[3] eaten\n"} | character character { print "two character eaten: $item[1] and $item[2]\n" } character: /\w/ }; my $parser=Parse::RecDescent->new($grammar); $parser->start("A B Q P A B E A B C A B X"); A B Q eaten two character eaten: P and A two character eaten: B and E A B eaten, C follows two character eaten: C and A two character eaten: B and X Zero-width negative look ahead assertion
Too lazy at the moment....
Directives<commit> and <uncommit><reject>
Causes the current production to fail.
<score><autoscore><skip><resync><error><rulevar><matchrule><leftop>
The <leftop:...> directive specifies a left-associative binary operator>.
<rightop>
The <lrightop:...> directive specifies a right-associative binary operator>.
<defer><nocheck><perl_quotelike><perl_codeblock><perl_variable><token>A calculatorWithout leftop and rightopuse strict; use warnings; use Parse::RecDescent; my $grammar = q { {my %vars} start: statements {print $item[1]} statements: statement ';' statements | statement statement: variable '=' statement {$vars {$item [1]} = $item [3]} | expression {$item [1]} expression: term '+' expression {$item [1] + $item [3]} | term '-' expression {$item [1] - $item [3]} | term term: factor '*' term {$item [1] * $item [3]} | factor '/' term {$item [1] / $item [3]} | factor factor: number | variable {$vars {$item [1]} ||= 0 } | '+' factor {$item [2]} | '-' factor {$item [2] * -1} | '(' statement ')' {$item [2]} number: /\d+/ {$item [1]} variable: /[a-z]+/i }; my $parser=Parse::RecDescent->new($grammar); my $result = $parser->start("three=3;six=2*three;eight=three+5;2+eight*six+50"); With leftop and rightopuse strict; use warnings; use Parse::RecDescent; my $grammar = q { {my %vars} start: statements {print $item[1]} statements: statement ';' statements | statement statement: <rightop: variable '=' expression> {my $value = pop @{$item [1]}; while (@{$item [1]}) { $vars {shift @{$item [1]}} = $value; } $value } expression: <leftop: term ('+' | '-') term> {my $s = shift @{$item [1]}; while (@{$item [1]}) { my ($op, $t) = splice @{$item [1]}, 0, 2; if ($op eq '+') {$s += $t} else {$s -= $t} } $s } term: <leftop: factor m{([*/])} factor> {my $t = shift @{$item [1]}; while (@{$item [1]}) { my ($op, $f) = splice @{$item [1]}, 0, 2; if ($op eq '/') {$t /= $f} else {$t *= $f} } $t } factor: number | variable {$vars {$item [1]} ||= 0 } | '+' factor {$item [2]} | '-' factor {$item [2] * -1} | '(' statement ')' {$item [2]} number: /\d+/ {$item [1]} variable: /[a-z]+/i }; my $parser=Parse::RecDescent->new($grammar); my $result = $parser->start("three=3;six=2*three;eight=three+5;2+eight*six+50"); LinksExamples using Parse::RecDescent
Parsing SQL create table statements and producing an ERD out of it.
|