package Parse::YYLex; =head1 NAME Parse::YYLex - version of Parse::Lex to be used by a byacc parser. =head1 SYNOPSIS I requires this perl version: require 5.004; use Parse::YYLex; If using a procedural parser: Parse::YYLex->create ...; # exports &yylex and $yylval # see Parse::Lex for the token table args <...> Parse::YYLex::lex->from(\*FH); require 'MyParser.pl'; # generated by byacc yyparse(); If using an object-oriented parser: $lexer = new Parse::YYLex ...; # see Parse::Lex for the token table args <...> use MyParser; # generated by byacc5 $parser = new MyParser($lexer->getyylex, \&yyerror, $debug); # you must write &yyerror $lexer->from(\*STREAM); $parser->yyparse(*STREAM); To get the token definitions from F instead of F or to change the skip regexp (default whitespace), do this before calling C or C: Parse::YYLex->ytab('MyParser.ph'); Parse::YYLex->skip(''); =head1 DESCRIPTION Often times you'd use a lexer in conjunction with a parser. And most of the time you'd want to generate that parser with a yacc parser generator. B is a derivative of I compatible with yacc parsers, by adapting it to the byacc calling conventions: =over 4 =item * The parser wants to receive integer token types as defined in F instead of the symbolic types that I returns. =item * The parser wants its tokens as two components (type and value), whereas I returns one object with these two components. Furthermore, a procedural parser wants the value stored in a variable C<$yylval>. =item * The parser wants to receive the tokens by calling a yylex B, not an object B. Thus we have to give the parser a curried form of the lexer function, where the self argument is fixed. =back =head2 Procedural Parsers Yacc (and Bison) traditionally generate C or C++ parsers. Fortunately, Berkeley yacc has been modified to generate Perl, see ftp://ftp.sterling.com/local/perl-byacc.tar.Z Byacc with the I<-P> option generates procedural perl code that is compatible with both perl4 and perl5. (However you cannot use B with perl4.) Use this variant for quick hacks, as it is more convenient than the one below. In this case Ccreate>> instantiates a lexer and exports a C<&yylex> function (the lexer) and a C<$yylval> variable (the token value) to its caller's namespace (which should be the namespace of the parser). If you need to call any object methods of the created lexer (see I for documentation), use the C<$Parse::YYLex::lex> variable. =head2 Object-Oriented Parsers Another byacc modification (I call it byacc5) generates object-oriented Perl5 code: CPAN/authors/id/JAKE/perl5-byacc-patches-0.5.tar.gz Use this variant if you need more than one parser, you need flexibility, or you simply like OO. In this case you need to use I, and pass the return value of I (a reference to the curried lexing function) to the parser. The lexing function returns a two-element array, the token type and value. =head2 Numeric Token Table Yacc parsers insist on using numeric token types, and define these in a file customarily named F. That is where B will look by default, and the file has to be in the @INC path (which includes the current directory). You can specify a different token table before calling C or C: Parse::YYLex->ytab('MyParser.ph'); =head1 LIMITATIONS C is based on I which requires perl 5.004 and will not work with earlier versions. A slightly different version, I, works with earlier perl versions. It would be easy to allow a choice between I and I, but the latter has some limitations, and presently seems to have some bugs. =head1 AUTHOR Vladimir Alexiev =head1 SEE ALSO byacc(1), L. =cut use strict; use Parse::Lex; use vars qw(@ISA $VERSION); @ISA = qw(Parse::Lex); $VERSION = 0.91; my ($TABLE,$yylval); my $ytab = 'y.tab.ph'; use vars '$lex'; # global lexer sub ytab { # class method my ($self,$tab) = @_; $ytab = $tab; } sub new { my $class = ref $_[0] || $_[0]; shift; my ($path) = grep -f "$_/$ytab", @INC; $path or die "can't locate the file $ytab in \@INC\n"; my %table; open(FILE,"$path/$ytab") or die "can't open $ytab: !$\n"; for () {$table{$1}=$2 if /^\$(\w+)=(\d+);$/} close(FILE); # Normally we would create the object, but Parse::Lex uses weird array-based # attributes instead of the conventional hash-based attributes. my $self = bless new Parse::Lex(@_), $class; # index of my slot in the Lex attr array if (!defined $TABLE) {$TABLE = scalar @$self;} elsif ($TABLE != scalar @$self) {die "Can't get a consistent slot for my TABLE attribute";} push @$self, \%table; $self; } sub create { # class method $lex = shift->new(@_); # get an instance my $caller = caller; no strict 'refs'; *{"${caller}::yylex"} = $lex->getyylex; $yylval = \${"${caller}::yylval"}; } sub yylex { # convert &next to yyparse calling conv my $self = shift; # my $stream = shift; $self->from($stream) if FIXME # it's different from last time... my $token = $self->next; my $text = $token->text; my $symbolic = $token->type; # symbolic token type my $numeric = # numeric token code $symbolic eq 'EOI' ? 0 : # end of input $self->[$TABLE]{$symbolic} || # token defined in y.tab.ph (length $text==1 ? ord $text : # single-character token, eg '+' die "Please declare %token $symbolic (`$text') in the parser.\n"); if ($yylval) { # global lexer $$yylval = $text; return $numeric; } else {return ($numeric,$text);} # it doesn't hurt to return both in both cases, but let's play safe } sub getyylex { my $self = shift; sub{$self->yylex(@_)} } 1;