René Nyffenegger's collection of things on the web
HTML::TokeParser [Perl]
use strict;
use warnings;

use LWP::Simple;
use HTML::TokeParser;

# Fetch the page; get() returns undef if the request fails.
my $doc = get("http://www.adp-gmbh.ch/sitemap.html");
defined $doc or die "Could not fetch the document\n";

my $parser = HTML::TokeParser->new(\$doc);

my $indent   = 0;   # current nesting depth of <ul> elements
my $print_it = 0;   # true while inside a list item whose text should be printed

r();

sub r {
  while (my $x = $parser->get_token) {
    if ($x->[0] eq 'S') {          # start tag
      if ($x->[1] eq 'ul') {
        $indent++;
      }
      elsif ($x->[1] eq 'li') {
        print "\n" if $print_it;
        print " " x $indent;
        $print_it = 1;
      }
    }
    elsif ($x->[0] eq 'T') {       # text
      print $x->[1] if $print_it;
    }
    elsif ($x->[0] eq 'E') {       # end tag
      if ($x->[1] eq 'ul') {
        $indent--;
        $print_it = 0;
      }
    }
  }
}

get_token
get_token returns the next token of the parsed document. After the last token, it returns undef. The returned token is a reference
to an array whose first element describes the type of the token.
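
To make the shape of a token concrete, here is a minimal sketch that parses a small inline string instead of a fetched page. The sample markup and the helper name token_summary are made up for illustration; the sketch only looks at the first two elements of each token (the type, then the tag name or the text).

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup, used here instead of a fetched page.
my $html = '<ul><li>first</li></ul>';

# Hypothetical helper: returns one "type:value" string per token,
# reading tokens until get_token returns undef.
sub token_summary {
    my ($markup) = @_;
    my $parser = HTML::TokeParser->new(\$markup);
    my @summary;
    while (my $token = $parser->get_token) {
        # Element 0 is the token type; element 1 is the tag name
        # (start and end tags) or the text itself (text tokens).
        my ($type, $value) = @$token;
        push @summary, "$type:$value";
    }
    return @summary;
}

print "$_\n" for token_summary($html);
```

Each printed line shows the type followed by the tag name or text, so the start tags, the text and the end tags of the sample appear in document order.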
The following types can be returned:
|