The language I'm writing a parser for has three constructs that are relevant here: the ord operator, represented by TOK_ORD, which casts character expressions into integer expressions, and [ ] and ., which are used for index and field access, respectively, just like in C.
Here's an excerpt from my precedence rules:
%right TOK_ORD
%left PREC_INDEX PREC_MEMBER
My grammar has a nonterminal expr, which represents expressions. Here's some relevant snippets from the grammar (TOK_IDENT is a regular expression for identifiers defined in my .l file):
expr : TOK_ORD expr { /* semantic actions */ }
| variable { /* semantic actions */ }
;
variable : expr '[' expr ']' %prec PREC_INDEX { /* semantic actions */ }
| expr '.' TOK_IDENT %prec PREC_MEMBER { /* semantic actions */ }
;
Here's information about the shift/reduce conflict from the bison output file:
state 61
42 expr: expr . '=' expr
43 | expr . TOK_EQ expr
44 | expr . TOK_NE expr
45 | expr . TOK_LT expr
46 | expr . TOK_LE expr
47 | expr . TOK_GT expr
48 | expr . TOK_GE expr
49 | expr . '+' expr
50 | expr . '-' expr
51 | expr . '*' expr
52 | expr . '/' expr
53 | expr . '%' expr
57 | TOK_ORD expr .
72 variable: expr . '[' expr ']'
73 | expr . '.' TOK_IDENT
'[' shift, and go to state 92
'.' shift, and go to state 93
'[' [reduce using rule 57 (expr)]
'.' [reduce using rule 57 (expr)]
$default reduce using rule 57 (expr)
States 92 and 93, for reference:
state 92
72 variable: expr '[' . expr ']'
TOK_FALSE shift, and go to state 14
TOK_TRUE shift, and go to state 15
TOK_NULL shift, and go to state 16
TOK_NEW shift, and go to state 17
TOK_IDENT shift, and go to state 53
TOK_INTCON shift, and go to state 19
TOK_CHARCON shift, and go to state 20
TOK_STRINGCON shift, and go to state 21
TOK_ORD shift, and go to state 22
TOK_CHR shift, and go to state 23
'+' shift, and go to state 24
'-' shift, and go to state 25
'!' shift, and go to state 26
'(' shift, and go to state 29
expr go to state 125
allocator go to state 44
call go to state 45
callargs go to state 46
variable go to state 47
constant go to state 48
state 93
73 variable: expr '.' . TOK_IDENT
I don't understand why there is a shift/reduce conflict. My grammar clearly defines index and field access to have higher precedence than ord, but that doesn't seem to be enough.
In case you're wondering, yes, this is homework, but the assignment has already been turned in and graded. I'm going back through and trying to fix shift/reduce conflicts so I can better understand what's going on.
Precedence resolution of shift/reduce conflicts works by comparing the precedence of a production (or reduction, if you prefer) with the precedence of a token (the lookahead token).
This simple fact is slightly obscured because bison sets the precedence of a production based on the precedence of the last token in the production (by default), so it appears that precedence values are only being assigned to tokens and the precedence comparison is between token precedences. But that's not accurate: the precedence comparison is always between a production (which might be reduced) and a token (which might be shifted).
As the bison manual says:
…the resolution of conflicts works by comparing the
precedence of the rule being considered with that of the lookahead
token. If the token's precedence is higher, the choice is to shift.
If the rule's precedence is higher, the choice is to reduce.
Now, you define the precedence of the two variable productions using explicit declarations:
variable : expr '[' expr ']' %prec PREC_INDEX { /* semantic actions */ }
| expr '.' TOK_IDENT %prec PREC_MEMBER { /* semantic actions */ }
But those productions never participate in shift/reduce conflicts. When the end of either of the variable rules is reached, there is no possibility of shifting. The itemsets in which those productions can be reduced have no shift actions.
In the state which contains the shift/reduce conflict, the conflict is between the potential reduction:
expr: TOK_ORD expr
and the possible shifts involving lookahead tokens . and [. These conflicts will be resolved by comparing the precedence of the TOK_ORD reduction with the precedences of the lookahead tokens . and [, respectively. If those tokens do not have declared precedences, then the shift/reduce conflict cannot be resolved using the precedence mechanism.
In that case, though, I would expect there to be a large number of shift/reduce conflicts in other states, where the options are to reduce a binary operator (such as expr: expr '+' expr) or to shift the ./[ to extend the rightmost expr. So if there are no such shift/reduce conflicts, the explanation is more complicated and I'd need to see a lot more of the grammar to understand it.
Related
Problem part of grammar:
expr_var: var_or_ID
| expr_var '[' expr ']'
| NEW expr_var '(' expr_listE ')'
| expr_var '(' expr_listE ')'
| expr_var ARROW expr_var
| expr_var ARROW '{' expr_var '}'
| expr_var DCOLON expr_var
| expr_var DCOLON '{' expr_var '}'
| '(' expr_var ')'
;
Description of that problem:
expr_var -> NEW expr_var '(' expr_listE ')' . (rule 87)
expr_var -> expr_var '(' expr_listE ')' . (rule 88)
DCOLON reduce using rule 87 (expr_var)
DCOLON [reduce using rule 88 (expr_var)]
ARROW reduce using rule 87 (expr_var)
ARROW [reduce using rule 88 (expr_var)]
'[' reduce using rule 87 (expr_var)
'[' [reduce using rule 88 (expr_var)]
'(' reduce using rule 87 (expr_var)
'(' [reduce using rule 88 (expr_var)]
$default reduce using rule 87 (expr_var)
Each operator(ARROW,DCOLON...) is left-associative. Operator NEW is non-associative.
How can i resolve that conflict?
The issue here is the ambiguity in:
new foo(1)(2)
This could be parsed in two ways:
(new foo(1))(2) // call the new object
new (foo(1))(2) // function returns class to construct
Which (if either) of these is semantically feasible depends on your language. It is certainly conceivable that both are meaningful. The grammar does not provide any hints about the semantics.
So it is necessary for you to decide which (if either) of these is the intended parse, and craft the grammar accordingly.
Precedence declarations will not help because precedence is only used to resolve shift/reduce conflicts: a precedence relationship is always between a production and a lookahead token, never between two productions.
Bison will resolve reduced/reduce conflicts in favour of the first production, as it has done in this case. So if the syntax is legal and unambiguous then you could select which parse is desired by reordering the productions if necessary. That will leave you with the warning, but you can suppress it with an %expect-rr declaration.
You could also suppress one or the other parse (or both) by explicitly modifying the grammar to exclude new expressions from being the first argument in a call and/or exclude calls from being the first argument of a new, by means of an explicit subset of expr_var.
I feel like the Lemon parser generator is doing it wrong with nonassoc precedence. I have a simplified grammar that exhibits the problems I'm seeing.
%nonassoc EQ.
%left PLUS.
stmt ::= expr.
expr ::= expr EQ expr.
expr ::= expr PLUS expr.
expr ::= IDENTIFIER.
Yields a report with a conflict like so:
State 4:
expr ::= expr * EQ expr
(1) expr ::= expr EQ expr *
expr ::= expr * PLUS expr
EQ shift 2
EQ reduce 1 ** Parsing conflict **
PLUS shift 1
{default} reduce 1
If I tell it that equals is left associative, the problem goes away. It's as if nonassoc doesn't put the rule into the precedence set. Comparing to a Bison version of that grammar, there is no conflict. And assignment really should be nonassociative. I'd rather not lie to it about that to work around this.
After spending some time poring over the reports generated by both Lemon and Bison for the associated grammars, I can only conclude that Lemon is, indeed, mishandling nonassoc precedence. The smoking gun is contained in that state 4 quoted above, but I should probably lay out some more detail for clarity.
The states the build up to expr EQ are straightforward. You arrive at state 2 then:
State 2:
expr ::= * expr EQ expr
expr ::= expr EQ * expr
expr ::= * expr PLUS expr
expr ::= * IDENTIFIER
IDENTIFIER shift 5
expr shift 4
This state contains the current expr EQ item, which expects to be followed by another expr. Because of that, it contains the First set for expr, which are the 3 entries starting with * in the state. If we read an expr in this state, we'll land in state 4 with an item either partway through the reduction or at the end.
expr ::= expr * EQ expr
expr ::= expr EQ expr *
What happens if we read an EQ in this state? I told Lemon the answer. It's an error because EQ is nonassociative. Instead it reports a shift/reduce conflict. In practice, it will shift, which will let it accept an illegal parse, such as x=y=z.
Bison contains these same states, numbered differently, but with a telling distinction.
state 8
2 expr: expr . EQ expr [$end, PLUS]
2 | expr EQ expr . [$end, PLUS]
3 | expr . PLUS expr
EQ error (nonassociative)
$default reduce using rule 2 (expr)
Conflict between rule 2 and token PLUS resolved as reduce (PLUS < EQ).
Conflict between rule 2 and token EQ resolved as an error (%nonassoc EQ).
Bison knows what nonassociative means, and uses that to eliminate the supposed ambiguity if it sees a second EQ in an expression.
I have a YACC grammar for parsing expressions in C++. Here is lite version:
// yacc.y
%token IDENT
%%
expr:
call_expr
| expr '<' call_expr
| expr '>' call_expr
;
call_expr:
IDENT
| '(' expr ')'
| IDENT '<' args '>' '(' args ')'
;
args:
IDENT
| args ',' IDENT
;
%%
When I want to support function call with template arguments, I got a shift/reduce conflict.
When we got input IDENT '<' IDENT, yacc doesn't know whether we should shift or reduce.
I want IDENT '<' args '>' '(' args ')' got higher precedence level than expr '<' call_expr, so I can parse the following exprs.
x < y
f<x>(a,b)
f<x,y>(a,b) < g<x,y>(c,d)
I see C++/C# both support this syntax. Is there any way to solve this problem with yacc?
How do I modify the .y file?
Thank you!
You want the -v option to yacc/bison. It will give you a .output file with all the info about the generated shift/reduce parser. With your grammar, bison gives you:
State 1 conflicts: 1 shift/reduce
:
state 1
4 call_expr: IDENT .
6 | IDENT . '<' args '>' '(' args ')'
'<' shift, and go to state 5
'<' [reduce using rule 4 (call_expr)]
$default reduce using rule 4 (call_expr)
which shows you where the problem is. After seeing an IDENT, when the next token is <, it doesn't know if it should reduce that call_expr (to ultimately match the rule expr: expr '<' call_expr) or if it should shift to match rule 6.
Parsing this with only 1 token lookahead is hard, as you have two distinct meanings of the token < (less-than or open angle bracket), and which is meant depends on the later tokens.
This case is actually even worse, as it is ambiguous, as an input like
a < b > ( c )
is probably a template call with both args lists being singletons, but it could also be
( a < b ) > ( c )
so just unfactoring the grammar won't help. Your best bet is to use a more powerful parsing method, like bison's %glr-parser option or btyacc
Assume this code works:
left '*'
left '+'
expr: expr '+' expr
| expr '*' expr
;
I want to define an other precedence marker like:
left MULTIPLY
left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
Yet this is not actually effective.
I suppose these two forms should be equivalent, however, they're not.
It's not on practical problem. I just want to know the reason and the principle for this phenomenon.
Thanks.
Yacc precedence rules aren't really about the precedence of expressions, though they can be used for that. Instead, they are a way of resolving shift/reduce conflicts (and ONLY shift/reduce conflicts) explicitly.
Understanding how it works requires understanding how shift/reduce (bottom up) parsing works. The basic idea is that you read token symbols from the input and push ("shift") those tokens onto a stack. When the symbols on the top of the stack match the right hand side of some rule in the grammar, you may "reduce" the rule, popping the symbols from the stack and replacing them with a single symbol from the left side of the rule. You repeat this process, shifting tokens and reducing rules until you've read the entire input and reduced it to a single instance of the start symbol, at which point you've successfully parsed the entire input.
The essential problem with the above (and what the whole machinery of the parser generator is solving) is knowing when to reduce a rule vs when to shift a token if both are possible. The parser generator (yacc or bison) builds a state machine that tracks which symbols have been shifted and so knows what 'partially matched' rules are currently possible and limits shifts just to those tokens that can match more of such a rule. This does not work if the grammar in question is not LALR(1), and so in such cases yacc/bsion reports shift/reduce or reduce/reduce conflicts.
The way that precedence rules work to resolve shift reduce conflicts is by assigning a precedence to certain tokens and rules in the grammar. Whenever there is a shift/reduce conflict between a token to be shifted and a rule to be reduced, and BOTH have a precedence it will do whichever one has higher precedence. If they have the SAME precedence, then it looks at the %left/%right/%nonassoc flag associated with the precedence level -- %left means reduce, %right means shift, and %nonassoc means do neither and treat it as a syntax error.
The only tricky remaining bit is how tokens and rules get their precedence. Tokens get theirs from the %left/%right/%nonassoc directive they are in, which also sets the ordering. Rules get precedence from a %prec directive OR from the right-most terminal on their right-hand-side. So when you have:
%left '*'
%left '+'
expr: expr '+' expr
| expr '*' expr
;
You are setting the precedence of '*' and '+' with the %left directives, and the two rules get their precedence from those tokens.
When you have:
%left MULTIPLY
%left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
You are setting the precedence of the tokens MULTIPLY and PLUS and then explicitly setting the rules to have those precedences. However you are NOT SETTING ANY PRECEDENCE for the tokens '*' and '+'. So when there is a shift/reduce conflict between one of the two rules and either '*' or '+', precedence does not resolve it because the token has no precedence.
You say you are not trying to solve a specific, practical problem. And from your question, I'm a little confused about how you are trying to use the precedence marker.
I think you will find that you don't need to use the precedence marker often. It is usually simpler, and clearer to the reader, to rewrite your grammar so that precedence is explicitly accounted for. To give multiply and divide higher precedence than add and subtract, you can do something like this (example adapted from John Levine, lex & yacc 2/e, 1992):
%token NAME NUMBER
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' term
| expr '-' term
| term
;
term : term '*' factor
| term '/' factor
| factor
;
factor : '(' expr ')'
| '-' factor
| NUMBER
;
In your example, PLUS and MULTIPLY are not real tokens; you can't use them interchangeably with '+' and '*'. Levine calls them pseudo-tokens. They are there to link your productions back to your list of precedences that you have defined with %left and %nonassoc declarations. He gives this example of how you might use %prec to give unary minus high precedence even though the '-' token has low precedence:
%token NAME NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr %prec UMINUS
| '(' expr ')'
| NUMBER
;
To sum up, I would recommend following the pattern of my first code example rather than the second; make the grammar explicit.
Shift-reduce conflicts are a conflict between trying to reduce a production versus shifting a token and moving to the nest state. When Bison is resolving a conflict its not comparing two rules and choosing one of them - its comparing one rule that it wants to reduce and the token that you want to shift in the other rule(s). This might be clearer if you have two rules to shift:
expr: expr '+' expr
| expr '*' expr
| expr '*' '*' expr
The reason this is all confusing is that the way Bison gives a precedence to the "reduce" rule is to associate it with a token (the last terminal in the rule by default or the token from the prec declaration) and then it uses the precedence table to compares that token to the token you are trying to shift. Basically, prec declarations only make sense for the "reduce" part of a conflict and they are not counted for the shift part.
One way to see this is with the following grammar
command: IF '(' expr ')' command %prec NOELSE
: IF '(' expr ')' command ELSE command
In this grammar you need to choose between reducing the first rule or shifting the ELSE token. You do this by either giving precedences to the ')' token and to the ELSE token or by using a prec declaration and giving a precedence for NOELSE instead of ')'. If you try to give a prec declaration to the second it will get ignored and Bison will continue trying to look for the precedence of the ELSE token in the precedence table.
I'm writing a grammar in YACC (actually Bison), and I'm having a shift/reduce problem. It results from including the postfix increment and decrement operators. Here is a trimmed down version of the grammar:
%token NUMBER ID INC DEC
%left '+' '-'
%left '*' '/'
%right PREINC
%left POSTINC
%%
expr: NUMBER
| ID
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| INC expr %prec PREINC
| DEC expr %prec PREINC
| expr INC %prec POSTINC
| expr DEC %prec POSTINC
| '(' expr ')'
;
%%
Bison tells me there are 12 shift/reduce conflicts, but if I comment out the lines for the postfix increment and decrement, it works fine. Does anyone know how to fix this conflict? At this point, I'm considering moving to an LL(k) parser generator, which makes it much easier, but LALR grammars have always seemed much more natural to write. I'm also considering GLR, but I don't know of any good C/C++ GLR parser generators.
Bison/Yacc can generate a GLR parser if you specify %glr-parser in the option section.
Try this:
%token NUMBER ID INC DEC
%left '+' '-'
%left '*' '/'
%nonassoc '++' '--'
%left '('
%%
expr: NUMBER
| ID
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '++' expr
| '--' expr
| expr '++'
| expr '--'
| '(' expr ')'
;
%%
The key is to declare postfix operators as non associative. Otherwise you would be able to
++var++--
The parenthesis also need to be given a precedence to minimize shift/reduce warnings
I like to define more items. You shouldn't need the %left, %right, %prec stuff.
simple_expr: NUMBER
| INC simple_expr
| DEC simple_expr
| '(' expr ')'
;
term: simple_expr
| term '*' simple_expr
| term '/' simple_expr
;
expr: term
| expr '+' term
| expr '-' term
;
Play around with this approach.
This basic problem is that you don't have a precedence for the INC and DEC tokens, so it doesn't know how to resolve ambiguities involving a lookahead of INC or DEC. If you add
%right INC DEC
at the end of the precedence list (you want unaries to be higher precedence and postfix higher than prefix), it will fix it, and you can even get rid of all the PREINC/POSTINC stuff, as it's irrelevant.
preincrement and postincrement operators have nonassoc so define that in the precedence section and in the rules make the precedence of these operators high by using %prec