Warning: "rule useless in parser due to conflicts" in Bison - parsing

I am trying to make a C lexical analyzer and I have some warnings:
rule useless in parser due to conflicts: sentenceList: sentenceList sentence
rule useless in parser due to conflicts: sentSelection: IF '(' expression ')' sentence
rule useless in parser due to conflicts: sentSelection: IF '(' expression ')' sentence ELSE sentence
rule useless in parser due to conflicts: sentSelection: SWITCH '(' expression ')' sentence
rule useless in parser due to conflicts: sentIteration: WHILE '(' expression ')' sentence
rule useless in parser due to conflicts: sentIteration: FOR '(' expression ';' expression ';' expression ')' sentence
This is the part of the code where the warnings come from:
input: /* nothing */
| input line
;
line: '\n'
| sentence '\n'
;
sentence : sentComposed
|sentSelection
|sentExpression
|sentIteration
;
sentComposed: statementsList
|sentenceList
;
statementsList: statement
| statementsList statement
;
sentenceList: sentence
|sentenceList sentence
;
sentExpression: expression ';'
|';'
;
sentSelection: IF '(' expression ')' sentence
|IF '(' expression ')' sentence ELSE sentence
|SWITCH '(' expression ')' sentence
;
sentIteration: WHILE '(' expression ')' sentence
|DO sentence WHILE '(' expression ')' ';'
|FOR '(' expression ';' expression ';' expression ')' sentence
;
statement: DATATYPE varList
;
varList: aVar
|varList ',' aVar
;
aVar: variable inicial
;
variable: IDENTIFIER
;
initial: '=' NUM
;
I have just added some more information
Every word in uppercase letters are tokens.
If you need any aditional information please tell me

Here's a considerably simplified (but complete) excerpt of your grammar. I've declared expression to be a terminal so as to avoid having to define it:
%token expression IF
%%
sentence : sentComposed
|sentSelection
|sentExpression
sentComposed: sentenceList
sentenceList: sentence
|sentenceList sentence
sentExpression: expression ';'
|';'
sentSelection: IF '(' expression ')' sentence
When I run that through bison, it reports:
ez.y: warning: 4 shift/reduce conflicts [-Wconflicts-sr]
ez.y: warning: 8 reduce/reduce conflicts [-Wconflicts-rr]
Those conflicts are the actual problem, as indicated by the following warnings ("due to conflicts"):
ez.y:8.18-38: warning: rule useless in parser due to conflicts [-Wother]
|sentenceList sentence
^^^^^^^^^^^^^^^^^^^^^
ez.y:11.18-47: warning: rule useless in parser due to conflicts [-Wother]
sentSelection: IF '(' expression ')' sentence
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When bison finds a conflict in a grammar, it resolves it according to a simple procedure:
shift-reduce conflicts are resolved in favour of the shift
reduce-reduce conflicts are resolved in favour of the production which occurs earlier in the grammar.
Once it does that, it might turn out that some production can no longer ever be used, because it was eliminated from every context in which it might have been reduced. That's a clear sign that the grammar is problematic. [Note 1]
The basic problem here is that sentComposed means that statements can just be strung together to make a longer statement. So what happens if you write:
IF (e) statement1 statement2
It could be that statement1 statement2 is intended to be reduced into a single sentComposed which is the target of the IF, so the two statements execute only if e is true. Or it could be that the sentComposed consists of the IF statement with target statement1, followed by statement2. In C terms, the difference is between:
if (e) { statement1; statement2; }
and
{ if (e) { statement1; } statement2; }
So that's a real ambiguity, and you probably need to rethink the absence of braces in order to fix it.
But that's not the only problem; you also have a bunch of reduce-reduce conflicts. Those come about in a much simpler way, because part of the above grammar is the following loop:
sentence: sentComposed
sentComposed: sentenceList
sentenceList: sentence
That loop means that your grammar allows a single sentence to be wrapped in an arbitrary number of unit reductions. You certainly did not intend that; I'm certain that your intent was that sentComposed only be used if actually necessary. But bison doesn't know your intent; it only knows what you say.
Again, you will probably solve this problem when you figure out how you actually want to identify the boundaries of a sentComposed.
Notes:
In some cases, conflicts are not actually a problem. For example, there is a shift-reduce conflict between these two productions; the so-called "dangling-else" ambiguity:
sentSelection: IF '(' expression ')' sentence
|IF '(' expression ')' sentence ELSE sentence
In a nested IF statement:
IF (e) IF (f) s1 ELSE s2
it's not clear whether the ELSE should apply to the inner or outer IF. If it applies to the inner IF, it must be shifted to allow the second production for sentSelection. If it applies to the outer IF, a reduction must first be performed to complete the inner (else-less) IF before shifting ELSE into the outer IF. Bison's default action ("prefer shift") does exactly the right thing in this case, which is to shift the ELSE immediately. (Indeed, that's why the default was chosen to be "prefer shift").

Related

Ambiguous call expression in ANTLR4 grammar

I have a simple grammar (for demonstration)
grammar Test;
program
: expression* EOF
;
expression
: Identifier
| expression '(' expression? ')'
| '(' expression ')'
;
Identifier
: [a-zA-Z_] [a-zA-Z_0-9?]*
;
WS
: [ \r\t\n]+ -> channel(HIDDEN)
;
Obviously the second and third alternatives in the expression rule are ambiguous. I want to resolve this ambiguity by permitting the second alternative only if an expression is immediately followed by a '('.
So the following
bar(foo)
should match the second alternative while
bar
(foo)
should match the 1st and 3rd alternatives (even if the token between them is in the HIDDEN channel).
How can I do that? I have seen these ambiguities, between call expressions and parenthesized expressions, present in languages that have no (or have optional) expression terminator tokens (or rules) - example
The solution to this is to temporary "unhide" whitespace in your second alternative. Have a look at this question for how this can be done.
With that solution your code could look somthing like this
expression
: Identifier
| {enableWS();} expression '(' {disableWS();} expression? ')'
| '(' expression ')'
;
That way the second alternative matches the input WS-sensitive and will therefore only be matched if the identifier is directly followed by the bracket.
See here for the implementation of the MultiChannelTokenStream that is mentioned in the linked question.

yacc shift-reduce for ambiguous lambda syntax

I'm writing a grammar for a toy language in Yacc (the one packaged with Go) and I have an expected shift-reduce conflict due to the following pseudo-issue. I have to distilled the problem grammar down to the following.
start:
stmt_list
expr:
INT | IDENT | lambda | '(' expr ')' { $$ = $2 }
lambda:
'(' params ')' '{' stmt_list '}'
params:
expr | params ',' expr
stmt:
/* empty */ | expr
stmt_list:
stmt | stmt_list ';' stmt
A lambda function looks something like this:
map((v) { v * 2 }, collection)
My parser emits:
conflicts: 1 shift/reduce
Given the input:
(a)
It correctly parses an expr by the '(' expr ')' rule. However given an input of:
(a) { a }
(Which would be a lambda for the identity function, returning its input). I get:
syntax error: unexpected '{'
This is because when (a) is read, the parser is choosing to reduce it as '(' expr ')', rather than consider it to be '(' params ')'. Given this conflict is a shift-reduce and not a reduce-reduce, I'm assuming this is solvable. I just don't know how to structure the grammar to support this syntax.
EDIT | It's ugly, but I'm considering defining a token so that the lexer can recognize the ')' '{' sequence and send it through as a single token to resolve this.
EDIT 2 | Actually, better still, I'll make lambdas require syntax like ->(a, b) { a * b} in the grammar, but have the lexer emit the -> rather than it being in the actual source code.
Your analysis is indeed correct; although the grammar is not ambiguous, it is impossible for the parser to decide with the input reduced to ( <expr> and with lookahead ) whether or not the expr should be reduced to params before shifting the ) or whether the ) should be shifted as part of a lambda. If the next token were visible, the decision could be made, so the grammar LR(2), which is outside of the competence of go/yacc.
If you were using bison, you could easily solve this problem by requesting a GLR parser, but I don't believe that go/yacc provides that feature.
There is an LR(1) grammar for the language (there is always an LR(1) grammar corresponding to any LR(k) grammar for any value of k) but it is rather annoying to write by hand. The essential idea of the LR(k) to LR(1) transformation is to shift the reduction decisions k-1 tokens forward by accumulating k-1 tokens of context into each production. So in the case that k is 2, each production P: N → α will be replaced with productions TNU → Tα U for each T in FIRST(α) and each U in FOLLOW(N). [See Note 1] That leads to a considerable blow-up of non-terminals in any non-trivial grammar.
Rather than pursuing that idea, let me propose two much simpler solutions, both of which you seem to be quite close to.
First, in the grammar you present, the issue really is simply the need for a two-token lookahead when the two tokens are ){. That could easily be detected in the lexer, and leads to a solution which is still hacky but a simpler hack: Return ){ as a single token. You need to deal with intervening whitespace, etc., but it doesn't require retaining any context in the lexer. This has the added bonus that you don't need to define params as a list of exprs; they can just be a list of IDENT (if that's relevant; a comment suggests that it isn't).
The alternative, which I think is a bit cleaner, is to extend the solution you already seem to be proposing: accept a little too much and reject the errors in a semantic action. In this case, you might do something like:
start:
stmt_list
expr:
INT
| IDENT
| lambda
| '(' expr_list ')'
{ // If $2 has more than one expr, report error
$$ = $2
}
lambda:
'(' expr_list ')' '{' stmt_list '}'
{ // If anything in expr_list is not a valid param, report error
$$ = make_lambda($2, $4)
}
expr_list:
expr | expr_list ',' expr
stmt:
/* empty */ | expr
stmt_list:
stmt | stmt_list ';' stmt
Notes
That's only an outline; the complete algorithm includes the mechanism to recover the original parse tree. If k is greater than 2 then T and U are strings the the FIRSTk-1 and FOLLOWk-1 sets.
If it really is a shift-reduce conflict, and you want only the shift behavior, your parser generator may give you a way to prefer a shift vs. a reduce. This is classically how the conflict for grammar rules for "if-then-stmt" and "if-then-stmt-else-stmt" is resolved, when the if statement can also be a statement.
See http://www.gnu.org/software/bison/manual/html_node/Shift_002fReduce.html
You can get this effect two ways:
a) Count on the accidental behavior of the parsing engine.
If an LALR parser handles shifts first, and then reductions if there are no shifts, then you'll get this "prefer shift" for free. All the parser generator has to do is built the parse tables anyway, even if there is a detected conflict.
b) Enforce the accidental behavior. Design (or a get a) parser generator to accept "prefer shift on token T". Then one can supress the ambiguity. One still have to implement the parsing engine as in a) but that's pretty easy.
I think this is easier/cleaner than abusing the lexer to make strange tokens (and that doesn't always work anyway).
Obviously, you could make a preference for reductions to turn it the other way. With some extra hacking, you could make shift-vs-reduce specific the state in which the conflict occured; you can even make it specific to the pair of conflicting rules but now the parsing engine needs to keep preference data around for nonterminals. That still isn't hard. Finally, you could add a predicate for each nonterminal which is called when a shift-reduce conflict is about to occur, and it have it provide a decision.
The point is you don't have to accept "pure" LALR parsing; you can bend it easily in a variety of ways, if you are willing to modify the parser generator/engine a little bit. This gives a really good reason to understand how these tools work; then you can abuse them to your benefit.

Reduce/reduce conflict in grammar

Let's imagine I want to be able to parse values like this (each line is a separate example):
x
(x)
((((x))))
x = x
(((x))) = x
(x) = ((x))
I've written this YACC grammar:
%%
Line: Binding | Expr
Binding: Pattern '=' Expr
Expr: Id | '(' Expr ')'
Pattern: Id | '(' Pattern ')'
Id: 'x'
But I get a reduce/reduce conflict:
$ bison example.y
example.y: warning: 1 reduce/reduce conflict [-Wconflicts-rr]
Any hint as to how to solve it? I am using GNU bison 3.0.2
Reduce/reduce conflicts often mean there is a fundamental problem in the grammar.
The first step in resolving is to get the output file (bison -v example.y produces example.output). Bison 2.3 says (in part):
state 7
4 Expr: Id .
6 Pattern: Id .
'=' reduce using rule 6 (Pattern)
')' reduce using rule 4 (Expr)
')' [reduce using rule 6 (Pattern)]
$default reduce using rule 4 (Expr)
The conflict is clear; after the grammar reads an x (and reduces that to an Id) and a ), it doesn't know whether to reduce the expression as an Expr or as a Pattern. That presents a problem.
I think you should rewrite the grammar without one of Expr and Pattern:
%%
Line: Binding | Expr
Binding: Expr '=' Expr
Expr: Id | '(' Expr ')'
Id: 'x'
Your grammar is not LR(k) for any k. So you either need to fix the grammar or use a GLR parser.
Suppose the input starts with:
(((((((((((((x
Up to here, there is no problem, because every character has been shifted onto the parser stack.
But now what? At the next step, x must be reduced and the lookahead is ). If there is an = somewhere in the future, x is a Pattern. Otherwise, it is an Expr.
You can fix the grammar by:
getting rid of Pattern and changing Binding to Expr | Expr '=' Expr;
getting rid of all the definitions of Expr and replacing them with Expr: Pattern
The second alternative is probably better in the long run, because it is likely that in the full grammar which you are imagining (or developing), Pattern is a subset of Expr, rather than being identical to Expr. Factoring Expr into a unit production for Pattern and the non-Pattern alternatives will allow you to parse the grammar with an LALR(1) parser (if the rest of the grammar conforms).
Or you can use a GLR grammar, as noted above.

Why if I add lambda left recursion occurs?

I am trying to write if syntax by using flex bison and in parser I have a problem
here is a grammar for if syntax in cpp
program : //start rule
|statements;
block:
TOKEN_BEGIN statements ';'TOKEN_END;
reexpression:
| TOKEN_OPERATOR expression;
expression: //x+2*a*3
TOKEN_ID reexpression
| TOKEN_NUMBER reexpression;
assignment:
TOKEN_ID'='expression
statement:
assignment;
statements:
statement';'
| block
| if_statement;
else_statement:
TOKEN_ELSE statements ;
else_if_statement:
TOKEN_ELSE_IF '(' expression ')' statements;
if_statement:
TOKEN_IF '(' expression ')' statements else_if_statement else_statement;
I can't understand why if I replace these three rules , left recursion happen I just add lambda to These rules
else_statement:
|TOKEN_ELSE statements ;
else_if_statement:
|TOKEN_ELSE_IF '(' expression ')' statements;
if_statement:
TOKEN_IF '(' expression ')' statements else_if_statement else_statement;
please help me understand.
There's no lambda or left-recursion involved.
When you add epsilon to the if rules (making the else optional), you get conflicts, because the resulting grammar is ambiguous. This is the classic dangling else ambiguity where when you have TWO ifs with a single else, the else can bind to either if.
IF ( expr1 ) IF ( expr2 ) block1 ELSE block2

Shift/reduce conflict in yacc due to look-ahead token limitation?

I've been trying to tackle a seemingly simple shift/reduce conflict with no avail. Naturally, the parser works fine if I just ignore the conflict, but I'd feel much safer if I reorganized my rules. Here, I've simplified a relatively complex grammar to the single conflict:
statement_list
: statement_list statement
|
;
statement
: lvalue '=' expression
| function
;
lvalue
: IDENTIFIER
| '(' expression ')'
;
expression
: lvalue
| function
;
function
: IDENTIFIER '(' ')'
;
With the verbose option in yacc, I get this output file describing the state with the mentioned conflict:
state 2
lvalue -> IDENTIFIER . (rule 5)
function -> IDENTIFIER . '(' ')' (rule 9)
'(' shift, and go to state 7
'(' [reduce using rule 5 (lvalue)]
$default reduce using rule 5 (lvalue)
Thank you for any assistance.
The problem is that this requires 2-token lookahead to know when it has reached the end of a statement. If you have input of the form:
ID = ID ( ID ) = ID
after parser shifts the second ID (lookahead is (), it doesn't know whether that's the end of the first statement (the ( is the beginning of a second statement), or this is a function. So it shifts (continuing to parse a function), which is the wrong thing to do with the example input above.
If you extend function to allow an argument inside the parenthesis and expression to allow actual expressions, things become worse, as the lookahead required is unbounded -- the parser needs to get all the way to the second = to determine that this is not a function call.
The basic problem here is that there's no helper punctuation to aid the parser in finding the end of a statement. Since text that is the beginning of a valid statement can also appear in the middle of a valid statement, finding statement boundaries is hard.

Resources