[prog] lex/yacc problem
Almut Behrens
almut-behrens at gmx.net
Wed May 28 22:50:12 EST 2003
On Wed, May 28, 2003 at 09:14:01AM -1000, Jimen Ching wrote:
> On Wed, 28 May 2003, Almut Behrens wrote:
> >do you have a rule somewhere that handles the occurrence of whitespace, e.g.
> >
> ><INITIAL>{WS} { }
> >
> >in its most simple case (i.e. to ignore whitespace)? If not, the lexer
> >will pass the string " = " to the parser, instead of "=", as your yacc
> >grammar expects.
>
> I have the following two patterns in the lex specification:
>
> WSs {WS}+
>
> <INITIAL>{NL} { cur_lineno++; }
> <INITIAL>{WSs} { }
>
> Note, these two patterns are at the top of the lexer specification (comes
> before all other patterns).
Okay, that should be fine then.

Taking a closer look, I think the problem is in your lexer: it already
tokenizes "#1 'b1001" into the sequence YYPOUND, Binary instead of
YYPOUND, Number, Binary. (I'm assuming that you do in fact want to
split it up into (#1)('b1001), with the (#1) being the optional_delay;
at least that's how I read your grammar.)
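The thing is that, at the lexer level, this is ambiguous: your {Binary}
pattern can also take an optional number at the beginning, so the split
(#)(1 'b1001) is just as valid. And since lex prefers the longest
match, {Binary}, which can consume all of "1 'b1001", beats any pattern
that only matches the "1".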
One way around this is to use a start condition that is enabled when
the '#' is encountered. In that context, you'd return a {Number} early,
i.e. before the Binary pattern gets a chance to match.
Just played around a little... I think the following simplified lexer
would return a token sequence that your grammar can handle:
%{
#include <stdio.h>   /* for printf() in main() below */
int yywrap(void) { return 1; }
%}
WS [ \t\r\b]
Digit [0-9]
DigitU [0-9_]
Letter [a-zA-Z]
LetterU [a-zA-Z_]
WordNum [0-9a-zA-Z]
WordNumU [0-9a-zA-Z_]
Number {Digit}{DigitU}*
Word {LetterU}{WordNumU}*
Binary ([-+]?{Number}{WS}*)?\'[bB]{WS}*[01xXzZ?][01xXzZ?_]*
%x delay
%%
#                { BEGIN(delay); return 1; }
<delay>{Number}  { BEGIN(INITIAL); return 2; }
{Binary}         return 3;
{Word}           return 4;
{WS}             /* eat up whitespace */
.                return (int) yytext[0];  /* default rule for literal character tokens such as '=', ';' */
%%
int main(void) {
    int r;
    /* print every token code until yylex() returns 0 at end of input */
    while ((r = yylex()) != 0) { printf("[r=%d]", r); }
    return 0;
}
When you run this standalone, you should get (from the printf in the
while loop) for your string "result = #1 'b1001;":
[r=4][r=61][r=1][r=2][r=3][r=59]
which corresponds to the token sequence below (61 and 59 are simply the
ASCII codes of '=' and ';', as returned by the default rule):

    Word, '=', '#', Number, Binary, ';'
      |               |       |
   result             1     'b1001

I think this is a sequence your parser should be able to reduce
correctly...
(Instead of the 1, 2, 3, 4 constants here, you'd of course have other
values corresponding to YYPOUND, YYNUMBER, etc.)
Hope that makes a bit more sense,
Almut