I've got a shift/reduce conflict that I can't figure out why it's occurring, and how to resolve it.
Given this grammar:
%token IDENTIFIER
%start Expression
%%
CallExpression
: Expression "(" ")"
;
Lambda
: "(" ")" "=>" Expression
;
Expression
: IDENTIFIER
| CallExpression
| Lambda
;
I want to be able to parse expressions like this (not exhaustive):
foo
foo()
() => foo
() => () => foo
() => foo()
But I receive a shift/reduce conflict here:
State 11
1 CallExpression: Expression . "(" ")"
2 Lambda: "(" ")" "=>" Expression . [$end, "("]
"(" shift, and go to state 8
"(" [reduce using rule 2 (Lambda)]
$default reduce using rule 2 (Lambda)
I thought I understood when shift/reduces occur, but this one is escaping me so I need to get schooled.
I tried learning more about the precedence directives available, left, right, precedence, nonassoc but any use of them I try does not resolve the ambiguity and they also give me warning: useless precedence and associativity so I'm doing something wrong. Hoping the answer is obvious to others.
I initially thought this was related to the fact that Lambdas start with () and CallExpressions end with it, but changing these tokens to not conflict makes no difference.
/facepalm
Related
I have a simple grammar (for demonstration)
grammar Test;
program
: expression* EOF
;
expression
: Identifier
| expression '(' expression? ')'
| '(' expression ')'
;
Identifier
: [a-zA-Z_] [a-zA-Z_0-9?]*
;
WS
: [ \r\t\n]+ -> channel(HIDDEN)
;
Obviously the second and third alternatives in the expression rule are ambiguous. I want to resolve this ambiguity by permitting the second alternative only if an expression is immediately followed by a '('.
So the following
bar(foo)
should match the second alternative while
bar
(foo)
should match the 1st and 3rd alternatives (even if the token between them is in the HIDDEN channel).
How can I do that? I have seen these ambiguities, between call expressions and parenthesized expressions, present in languages that have no (or have optional) expression terminator tokens (or rules) - example
The solution to this is to temporary "unhide" whitespace in your second alternative. Have a look at this question for how this can be done.
With that solution your code could look somthing like this
expression
: Identifier
| {enableWS();} expression '(' {disableWS();} expression? ')'
| '(' expression ')'
;
That way the second alternative matches the input WS-sensitive and will therefore only be matched if the identifier is directly followed by the bracket.
See here for the implementation of the MultiChannelTokenStream that is mentioned in the linked question.
I am new to bison and I am trying to make a grammar parsing expressions.
I am facing a shift/reduce conflight right now I am not able to solve.
The grammar is the following:
%left "[" "("
%left "+"
%%
expression_list : expression_list "," expression
| expression
| /*empty*/
;
expression : "(" expression ")"
| STRING_LITERAL
| INTEGER_LITERAL
| DOUBLE_LITERAL
| expression "(" expression_list ")" /*function call*/
| expression "[" expression "]" /*index access*/
| expression "+" expression
;
This is my grammar, but I am facing a shift/reduce conflict with those two rules "(" expression ")" and expression "(" expression_list ")".
How can I resolve this conflict?
EDIT: I know I could solve this using precedence climbing, but I would like to not do so, because this is only a small part of the expression grammar, and the size of the expression grammar would explode using precedence climbing.
There is no shift-reduce conflict in the grammar as presented, so I suppose that it is just an excerpt of the full grammar. In particular, there will be precisely the shift/reduce conflict mentioned if the real grammar includes:
%start program
%%
program: %empty
| program expression
In that case, you will run into an ambiguity because given, for example, a(b), the parser cannot tell whether it is a single call-expression or two consecutive expressions, first a single variable, and second a parenthesized expression. To avoid this problem you need to have some token which separates expression (statements).
There are some other issues:
expression_list : expression_list "," expression
| expression
| /*empty*/
;
That allows an expression list to be ,foo (as in f(,foo)), which is likely not desirable. Better would be
arguments: %empty
| expr_list
expr_list: expr
| expr_list ',' expr
And the precedences are probably backwards. Usually one wants postfix operators like call and index to bind more tightly than arithmetic operators, so they should come at the end. Otherwise a+b(7) is (a+b)(7), which is unconventional.
Let's imagine I want to be able to parse values like this (each line is a separate example):
x
(x)
((((x))))
x = x
(((x))) = x
(x) = ((x))
I've written this YACC grammar:
%%
Line: Binding | Expr
Binding: Pattern '=' Expr
Expr: Id | '(' Expr ')'
Pattern: Id | '(' Pattern ')'
Id: 'x'
But I get a reduce/reduce conflict:
$ bison example.y
example.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Any hint as to how to solve it? I am using GNU bison 3.0.2
Reduce/reduce conflicts often mean there is a fundamental problem in the grammar.
The first step in resolving is to get the output file (bison -v example.y produces example.output). Bison 2.3 says (in part):
state 7
4 Expr: Id .
6 Pattern: Id .
'=' reduce using rule 6 (Pattern)
')' reduce using rule 4 (Expr)
')' [reduce using rule 6 (Pattern)]
$default reduce using rule 4 (Expr)
The conflict is clear; after the grammar reads an x (and reduces that to an Id) and a ), it doesn't know whether to reduce the expression as an Expr or as a Pattern. That presents a problem.
I think you should rewrite the grammar without one of Expr and Pattern:
%%
Line: Binding | Expr
Binding: Expr '=' Expr
Expr: Id | '(' Expr ')'
Id: 'x'
Your grammar is not LR(k) for any k. So you either need to fix the grammar or use a GLR parser.
Suppose the input starts with:
(((((((((((((x
Up to here, there is no problem, because every character has been shifted onto the parser stack.
But now what? At the next step, x must be reduced and the lookahead is ). If there is an = somewhere in the future, x is a Pattern. Otherwise, it is an Expr.
You can fix the grammar by:
getting rid of Pattern and changing Binding to Expr | Expr '=' Expr;
getting rid of all the definitions of Expr and replacing them with Expr: Pattern
The second alternative is probably better in the long run, because it is likely that in the full grammar which you are imagining (or developing), Pattern is a subset of Expr, rather than being identical to Expr. Factoring Expr into a unit production for Pattern and the non-Pattern alternatives will allow you to parse the grammar with an LALR(1) parser (if the rest of the grammar conforms).
Or you can use a GLR grammar, as noted above.
I have a YACC grammar for parsing expressions in C++. Here is lite version:
// yacc.y
%token IDENT
%%
expr:
call_expr
| expr '<' call_expr
| expr '>' call_expr
;
call_expr:
IDENT
| '(' expr ')'
| IDENT '<' args '>' '(' args ')'
;
args:
IDENT
| args ',' IDENT
;
%%
When I want to support function call with template arguments, I got a shift/reduce conflict.
When we got input IDENT '<' IDENT, yacc doesn't know whether we should shift or reduce.
I want IDENT '<' args '>' '(' args ')' got higher precedence level than expr '<' call_expr, so I can parse the following exprs.
x < y
f<x>(a,b)
f<x,y>(a,b) < g<x,y>(c,d)
I see C++/C# both support this syntax. Is there any way to solve this problem with yacc?
How do I modify the .y file?
Thank you!
You want the -v option to yacc/bison. It will give you a .output file with all the info about the generated shift/reduce parser. With your grammar, bison gives you:
State 1 conflicts: 1 shift/reduce
:
state 1
4 call_expr: IDENT .
6 | IDENT . '<' args '>' '(' args ')'
'<' shift, and go to state 5
'<' [reduce using rule 4 (call_expr)]
$default reduce using rule 4 (call_expr)
which shows you where the problem is. After seeing an IDENT, when the next token is <, it doesn't know if it should reduce that call_expr (to ultimately match the rule expr: expr '<' call_expr) or if it should shift to match rule 6.
Parsing this with only 1 token lookahead is hard, as you have two distinct meanings of the token < (less-than or open angle bracket), and which is meant depends on the later tokens.
This case is actually even worse, as it is ambiguous, as an input like
a < b > ( c )
is probably a template call with both args lists being singletons, but it could also be
( a < b ) > ( c )
so just unfactoring the grammar won't help. Your best bet is to use a more powerful parsing method, like bison's %glr-parser option or btyacc
I've been trying to tackle a seemingly simple shift/reduce conflict with no avail. Naturally, the parser works fine if I just ignore the conflict, but I'd feel much safer if I reorganized my rules. Here, I've simplified a relatively complex grammar to the single conflict:
statement_list
: statement_list statement
|
;
statement
: lvalue '=' expression
| function
;
lvalue
: IDENTIFIER
| '(' expression ')'
;
expression
: lvalue
| function
;
function
: IDENTIFIER '(' ')'
;
With the verbose option in yacc, I get this output file describing the state with the mentioned conflict:
state 2
lvalue -> IDENTIFIER . (rule 5)
function -> IDENTIFIER . '(' ')' (rule 9)
'(' shift, and go to state 7
'(' [reduce using rule 5 (lvalue)]
$default reduce using rule 5 (lvalue)
Thank you for any assistance.
The problem is that this requires 2-token lookahead to know when it has reached the end of a statement. If you have input of the form:
ID = ID ( ID ) = ID
after parser shifts the second ID (lookahead is (), it doesn't know whether that's the end of the first statement (the ( is the beginning of a second statement), or this is a function. So it shifts (continuing to parse a function), which is the wrong thing to do with the example input above.
If you extend function to allow an argument inside the parenthesis and expression to allow actual expressions, things become worse, as the lookahead required is unbounded -- the parser needs to get all the way to the second = to determine that this is not a function call.
The basic problem here is that there's no helper punctuation to aid the parser in finding the end of a statement. Since text that is the beginning of a valid statement can also appear in the middle of a valid statement, finding statement boundaries is hard.