| René Nyffenegger's collection of things on the web | |
|
René Nyffenegger on Oracle - Most wanted - Feedback
|
Parse::RecDescent (Perl) | ||
|
Parse::RecDescent can be used to generate recursive descent parsers.
A parser is used to make meaning of an inputs (mostly given in textual form) that follow certain patterns. These patterns are described in a
grammar. The grammer consists of a set of rules (called productions). When the parser reads the input,
it tries to find a rule that matches and whenever it finds such a rule, it triggers an action. It then goes on to read from the input and to find matching patterns, until
the input has finished.
An example for a rule looks like so:
name_of_rule : name_of_other_rule 'SOME_LITERAL_TEXT' /regular-expression/
{action_to_trigger_if_rule_matches()} |
alternative_rule
{alternative_action_to_trigger()}
Note the bar (|) that seperates alternatives.
The entire grammar is passed through the new method of Parse::RecDescent.
Here's a simple example to demonstrate this:
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start : character character character(s)
{print "Found: ", $item[1], $item[2], join "", @{$item[3]}, "\n"; }
character: /\w/
};
my $parser = Parse::RecDescent->new($grammar);
defined $parser->start("ABCD") or die "didn't match";
The special expression (s) means: one more more occurances of the preceeding item.
First production, not longestuse strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start : seq_1 seq_2
seq_1 : 'A' 'B' 'C' 'D' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" }
| 'A' 'B' 'C' 'D' 'E' 'F' { print "seq_1: " . join (" ", @item[1..$#item]) . "\n" }
seq_2 : character(s)
character: /\w/ { print "character: $item[1]\n" }
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B C D E F G");
seq_1: A B C D character: E character: F character: G use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: seq_1 seq_2
seq_1 : 'A' 'B' 'C' 'D' 'E' 'F'
{ print "seq_1: " . join (" ", @item[1..$#item]) . "\n" }
| 'A' 'B' 'C' 'D'
{ print "seq_1: " . join (" ", @item[1..$#item]) . "\n" }
seq_2 : character(s)
character: /\w/
{ print "character: $item[1]\n" }
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B C D E F");
seq_1: A B C D E F @item and %item@item
$item[0] is the name of the rule, as is $item{__RULE__}.
$item[n] (n > 0) is the value of the nth subitem in the rule.
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: seq
seq : char_1 char_2 char_3 char_4
{ print join (" -- ", @item); }
char_1 : character
char_2 : character
char_3 : character
char_4 : character
character: /\w/
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B C D");
seq -- A -- B -- C -- D
If an item in a production is an (s) item, $item[$prod_no] is a reference to an array:
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: seq
seq : character(s)
{ print join (" -- ", @{$item[1]}) }
character: /\w/
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B C D");
A -- B -- C -- D %item
$item{__RULE__} is the name of the rule, as is $item[0]
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: seq
seq : char_1 char_2 char_3 char_4
{ print map {$_ . "=" . $item{$_} . "; "} keys %item}
char_1 : character
char_2 : character
char_3 : character
char_4 : character
character: /\w/
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B C D");
char_4=D; char_2=B; char_3=C; char_1=A; __RULE__=seq; dynamically matched rules
It is even possible to use @item to dynamically generate rules:
command: keyword body "end $item[1]" Cardinality of items
Usually, an item must occur exactly once in order to match the rule. However, it is possible to alter this behaviour.
item(s)
item occurs 1 or more times.
item(?)
.tem occurs 0 or 1 times.
item(s?)
.tem occurs 0, 1 or more times
item(n..m)
.tem occurs at least n times but no more than m times.
item(..m)
item occurs at most m times (0 times possible)
item(n..)
item occurs at least n times.
$text
The remaining (unparsed) text. Changes to $text do not propagate out of
unsuccessful productions, but do survive successful productions. Hence it is
possible to dynamically alter the text being parsed - for example, to provide a
#include-like facility:
hash_include: '#include' filename
{ $text = ::loadfile($item[2]) . $text }
filename: '<' /[a-z0-9._-]+/i '>' { $return = $item[2] }
| '"' /[a-z0-9._-]+/i '"' { $return = $item[2] }
$thisline and $prevline
Note, $thisline starts with 1!
I have no idea what $prevline is or indicates.
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: line(s)
line : "print_line" {print "print_line found on: $thisline\n"}
|/.*/
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start('
hello world
i want a print_line right
now
and
print_line
here again
print_line
');
print_line found on: 6 print_line found on: 8 Autoactions
An autoaction triggers at the end of each production.
$::RD_AUTOACTION must be defined when Parse::RecDescent::new() is called.
use strict;
use warnings;
use Parse::RecDescent;
use Data::Dumper;
$::RD_AUTOACTION = q { [@item] };
my $grammar = q {
start : seq
seq : char_1 char_2 number(s) thing(s)
char_1 : character
char_2 : character
character : /[A-Z]/
number : /\d+/
thing : '(' key_value_pair(s) ')'
key_value_pair: identifier "=" identifier
identifier : m([A-Za-z_]\w*)
};
my $parser=Parse::RecDescent->new($grammar);
my $result = $parser->start("A B 42 29 5 ( foo = bar perl = cool hello = world )");
print Dumper($result);
Prints a parse tree
$VAR1 = [
'start',
[
'seq',
[
'char_1',
[
'character',
'A'
]
],
[
'char_2',
[
'character',
'B'
]
],
[
[
'number',
'42'
],
[
'number',
'29'
],
[
'number',
'5'
]
],
[
[
'thing',
'(',
[
[
'key_value_pair',
[
'identifier',
'foo'
],
'=',
[
'identifier',
'bar'
]
],
[
'key_value_pair',
[
'identifier',
'perl'
],
'=',
[
'identifier',
'cool'
]
],
[
'key_value_pair',
[
'identifier',
'hello'
],
'=',
[
'identifier',
'world'
]
]
],
')'
]
]
]
];
Zero width assertionsZero-width positive look ahead assertioninner_word: word ...word
Consumes word once if followed by another word.
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
start: seq(s)
seq : 'A' 'B' ...'C'
{ print "A B eaten, C follows\n"}
| 'A' 'B' character
{ print "A B $item[3] eaten\n"}
| character character
{ print "two character eaten: $item[1] and $item[2]\n" }
character: /\w/
};
my $parser=Parse::RecDescent->new($grammar);
$parser->start("A B Q P A B E A B C A B X");
A B Q eaten two character eaten: P and A two character eaten: B and E A B eaten, C follows two character eaten: C and A two character eaten: B and X Zero-width negative look ahead assertion
Too lazy at the moment....
Directives<commit> and <uncommit><reject>
Causes the current production to fail.
<score><autoscore><skip><resync><error><rulevar><matchrule><leftop>
The <leftop:...> directive specifies a left-associative binary operator>.
<rightop>
The <lrightop:...> directive specifies a right-associative binary operator>.
<defer><nocheck><perl_quotelike><perl_codeblock><perl_variable><token>A calculatorWithout leftop and rightop
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
{my %vars}
start: statements {print $item[1]}
statements: statement ';' statements
| statement
statement: variable '=' statement {$vars {$item [1]} = $item [3]}
| expression {$item [1]}
expression: term '+' expression {$item [1] + $item [3]}
| term '-' expression {$item [1] - $item [3]}
| term
term: factor '*' term {$item [1] * $item [3]}
| factor '/' term {$item [1] / $item [3]}
| factor
factor: number
| variable {$vars {$item [1]} ||= 0 }
| '+' factor {$item [2]}
| '-' factor {$item [2] * -1}
| '(' statement ')' {$item [2]}
number: /\d+/ {$item [1]}
variable: /[a-z]+/i
};
my $parser=Parse::RecDescent->new($grammar);
my $result = $parser->start("three=3;six=2*three;eight=three+5;2+eight*six+50");
With leftop and rightop
use strict;
use warnings;
use Parse::RecDescent;
my $grammar = q {
{my %vars}
start: statements {print $item[1]}
statements: statement ';' statements
| statement
statement: <rightop: variable '=' expression>
{my $value = pop @{$item [1]};
while (@{$item [1]}) {
$vars {shift @{$item [1]}} = $value;
}
$value
}
expression: <leftop: term ('+' | '-') term>
{my $s = shift @{$item [1]};
while (@{$item [1]}) {
my ($op, $t) = splice @{$item [1]}, 0, 2;
if ($op eq '+') {$s += $t}
else {$s -= $t}
}
$s
}
term: <leftop: factor m{([*/])} factor>
{my $t = shift @{$item [1]};
while (@{$item [1]}) {
my ($op, $f) = splice @{$item [1]}, 0, 2;
if ($op eq '/') {$t /= $f}
else {$t *= $f}
}
$t
}
factor: number
| variable {$vars {$item [1]} ||= 0 }
| '+' factor {$item [2]}
| '-' factor {$item [2] * -1}
| '(' statement ')' {$item [2]}
number: /\d+/ {$item [1]}
variable: /[a-z]+/i
};
my $parser=Parse::RecDescent->new($grammar);
my $result = $parser->start("three=3;six=2*three;eight=three+5;2+eight*six+50");
LinksExamples using Parse::RecDescent
Parsing SQL create table statements and producing an ERD out of it.
|