Solving shift/reduce conflicts - parsing

I'm using PLY to parse this grammar. I implemented a metagrammar for EBNF used in the linked spec, but PLY reports multiple shift/reduce conflicts.
Grammar:
Rule 0 S' -> grammar
Rule 1 grammar -> prod_list
Rule 2 grammar -> empty
Rule 3 prod_list -> prod
Rule 4 prod_list -> prod prod_list
Rule 5 prod -> id : : = rule_list
Rule 6 rule_list -> rule
Rule 7 rule_list -> rule rule_list
Rule 8 rule -> rule_simple
Rule 9 rule -> rule_group
Rule 10 rule -> rule_opt
Rule 11 rule -> rule_rep0
Rule 12 rule -> rule_rep1
Rule 13 rule -> rule_alt
Rule 14 rule -> rule_except
Rule 15 rule_simple -> terminal
Rule 16 rule_simple -> id
Rule 17 rule_simple -> char_range
Rule 18 rule_group -> ( rule_list )
Rule 19 rule_opt -> rule_simple ?
Rule 20 rule_opt -> rule_group ?
Rule 21 rule_rep0 -> rule_simple *
Rule 22 rule_rep0 -> rule_group *
Rule 23 rule_rep1 -> rule_simple +
Rule 24 rule_rep1 -> rule_group +
Rule 25 rule_alt -> rule | rule
Rule 26 rule_except -> rule - rule_simple
Rule 27 rule_except -> rule - rule_group
Rule 28 terminal -> SQ string_no_sq SQ
Rule 29 terminal -> DQ string_no_dq DQ
Rule 30 string_no_sq -> LETTER string_no_sq
Rule 31 string_no_sq -> DIGIT string_no_sq
Rule 32 string_no_sq -> SYMBOL string_no_sq
Rule 33 string_no_sq -> DQ string_no_sq
Rule 34 string_no_sq -> + string_no_sq
Rule 35 string_no_sq -> * string_no_sq
Rule 36 string_no_sq -> ( string_no_sq
Rule 37 string_no_sq -> ) string_no_sq
Rule 38 string_no_sq -> ? string_no_sq
Rule 39 string_no_sq -> | string_no_sq
Rule 40 string_no_sq -> [ string_no_sq
Rule 41 string_no_sq -> ] string_no_sq
Rule 42 string_no_sq -> - string_no_sq
Rule 43 string_no_sq -> : string_no_sq
Rule 44 string_no_sq -> = string_no_sq
Rule 45 string_no_sq -> empty
Rule 46 string_no_dq -> LETTER string_no_dq
Rule 47 string_no_dq -> DIGIT string_no_dq
Rule 48 string_no_dq -> SYMBOL string_no_dq
Rule 49 string_no_dq -> SQ string_no_dq
Rule 50 string_no_dq -> + string_no_dq
Rule 51 string_no_dq -> * string_no_dq
Rule 52 string_no_dq -> ( string_no_dq
Rule 53 string_no_dq -> ) string_no_dq
Rule 54 string_no_dq -> ? string_no_dq
Rule 55 string_no_dq -> | string_no_dq
Rule 56 string_no_dq -> [ string_no_dq
Rule 57 string_no_dq -> ] string_no_dq
Rule 58 string_no_dq -> - string_no_dq
Rule 59 string_no_dq -> : string_no_dq
Rule 60 string_no_dq -> = string_no_dq
Rule 61 string_no_dq -> empty
Rule 62 id -> LETTER LETTER id
Rule 63 id -> LETTER DIGIT id
Rule 64 id -> LETTER
Rule 65 id -> DIGIT
Rule 66 rest_of_id -> LETTER rest_of_id
Rule 67 rest_of_id -> DIGIT rest_of_id
Rule 68 rest_of_id -> empty
Rule 69 char_range -> [ UNI_CH - UNI_CH ]
Rule 70 empty -> <empty>
Conflicts:
id : LETTER LETTER id
| LETTER DIGIT id
| LETTER
| DIGIT
.
state 4
(62) id -> LETTER . LETTER id
(63) id -> LETTER . DIGIT id
(64) id -> LETTER .
! shift/reduce conflict for LETTER resolved as shift
! shift/reduce conflict for DIGIT resolved as shift
LETTER shift and go to state 10
DIGIT shift and go to state 9
| reduce using rule 64 (id -> LETTER .)
- reduce using rule 64 (id -> LETTER .)
( reduce using rule 64 (id -> LETTER .)
SQ reduce using rule 64 (id -> LETTER .)
DQ reduce using rule 64 (id -> LETTER .)
[ reduce using rule 64 (id -> LETTER .)
$end reduce using rule 64 (id -> LETTER .)
) reduce using rule 64 (id -> LETTER .)
: reduce using rule 64 (id -> LETTER .)
? reduce using rule 64 (id -> LETTER .)
* reduce using rule 64 (id -> LETTER .)
+ reduce using rule 64 (id -> LETTER .)
! LETTER [ reduce using rule 64 (id -> LETTER .) ]
! DIGIT [ reduce using rule 64 (id -> LETTER .) ]
The id rule is supposed to guarantee that productions' ids start with a letter.
Next conflict:
rule_alt : rule '|' rule
.
state 113
(25) rule_alt -> rule | rule .
(25) rule_alt -> rule . | rule
(26) rule_except -> rule . - rule_simple
(27) rule_except -> rule . - rule_group
! shift/reduce conflict for | resolved as shift
! shift/reduce conflict for - resolved as shift
( reduce using rule 25 (rule_alt -> rule | rule .)
SQ reduce using rule 25 (rule_alt -> rule | rule .)
DQ reduce using rule 25 (rule_alt -> rule | rule .)
LETTER reduce using rule 25 (rule_alt -> rule | rule .)
DIGIT reduce using rule 25 (rule_alt -> rule | rule .)
[ reduce using rule 25 (rule_alt -> rule | rule .)
) reduce using rule 25 (rule_alt -> rule | rule .)
$end reduce using rule 25 (rule_alt -> rule | rule .)
| shift and go to state 76
- shift and go to state 74
! | [ reduce using rule 25 (rule_alt -> rule | rule .) ]
! - [ reduce using rule 25 (rule_alt -> rule | rule .) ]
Connected to a similar one:
rule_except : rule '-' rule_simple
| rule '-' rule_group
How do I fix these?

You really should think seriously about using the usual scanner/parser architecture. Otherwise, you will have to find a way to deal with whitespace.
As it is, you seem to be ignoring whitespace altogether. That means that the parser cannot see the whitespace between three consecutive identifiers. It will see them run together as asoupofundifferentiatedletters, and it has no way to know what the original intent was. This makes your grammar deeply ambiguous, because in the grammar two identifiers can follow each other on the assumption that something will cause them to be differentiated from each other. And ambiguous grammars always result in LR conflicts.
Having the identifiers (and other multi-character tokens) recognized by the lexer is much easier. Otherwise, you will have to rewrite your grammar to identify all the places where whitespace is allowed (such as around the punctuation in (identifier1|identifier2)) or required (such as between two identifiers).
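As a rough illustration of letting the scanner recognize multi-character tokens, here is a minimal regex-based tokenizer using only Python's standard re module. The token names (DEFINE, ID, TERMINAL, OP) and the patterns are assumptions made for this sketch, not something taken from the question, and character ranges are left out for brevity.

```python
import re

# Hypothetical token names and patterns, chosen only for this sketch.
TOKEN_SPEC = [
    ("DEFINE", r"::="),
    ("ID", r"[A-Za-z][A-Za-z0-9]*"),
    ("TERMINAL", r"'[^']*'" + "|" + r'"[^"]*"'),
    ("OP", r"[()|?*+\-]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Yield (type, value) pairs; whitespace separates tokens but is not emitted."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
        pos = match.end()

print(list(tokenize("foo bar")))   # two ID tokens
print(list(tokenize("foobar")))    # one ID token
```

With this in place the parser sees whole identifiers, so "foo bar" and "foobar" are no longer indistinguishable runs of LETTER tokens.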
Identifying identifiers in the scanner using regular expressions will also remove the other problems with your grammar and identifiers:
id -> LETTER LETTER id
id -> LETTER DIGIT id
id -> LETTER
These rules require id to be an odd number of characters, where the digits only appear in even positions. So a1b would be an id, but not ab1 or ab or a1. I'm sure that's not what you meant.
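The claim about odd lengths can be checked with a small hand-written recognizer for exactly the three rules quoted above. This is only a sketch to test strings against those rules; the function name is_id is made up for the illustration.

```python
def is_id(s):
    """Recognizer for: id -> LETTER LETTER id | LETTER DIGIT id | LETTER."""
    if len(s) == 1:
        # id -> LETTER
        return s.isalpha()
    if len(s) >= 3 and s[0].isalpha() and (s[1].isalpha() or s[1].isdigit()):
        # id -> LETTER LETTER id  or  id -> LETTER DIGIT id
        return is_id(s[2:])
    return False

for candidate in ["a1b", "ab1", "ab", "a1"]:
    print(candidate, is_id(candidate))
# a1b True
# ab1 False
# ab False
# a1 False
```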
You seem to be trying to avoid left-recursion. Instead, you should embrace left-recursion. Bottom-up parsers, like PLY, love left-recursion. (They handle right-recursion, but at the cost of excessive parser stack usage.) So what you really want is:
id: LETTER | id LETTER | id DIGIT
There are other places in the grammar where similar changes are necessary.
The other conflict is caused by your unorthodox handling of operator precedence, which might also be a result of your attempt to avoid left-recursion. The EBNF operators can be parsed with a simple precedence scheme, as with algebraic operators. However, the use of precedence declarations (%left and friends) will be complicated because of the "invisible" concatenation operator. Generally, you'll find it easier to use explicit precedence as in the standard expr/factor/term algebraic grammar. In your case, the equivalent would be something like:
item: id
| terminal
| '(' rule ')'
term: item
| item '*'
| item '+'
| item '?'
seq : term
| seq term
alt : seq
| alt '|' seq
except: term '-' term
rule: alt
| except
The handling of except in the above corresponds to the lack of information about the precedence of the - operator. That's expressed by effectively disallowing any mix of - and | operators without explicit parentheses.
You will also find that you have a shift/reduce conflict here:
# The following will create a problem
prod: id "::=" rule
prod_list
: prod
| prod_list prod
(NOTE: the fact that I wrote that with left-recursion does not create the problem.)
That is not ambiguous, but it is not left-to-right parseable with a single lookahead token. It requires two tokens, because you cannot know whether or not the id is part of the currently-being-parsed sequence, or the beginning of a new production until you see the token after the id: if it is ::=, then the id was the start of a new production and should not be shifted into the current rule. The usual solution to that problem is a hack in the lexer: the lexer is wrapped by a function which keeps one extra token of lookahead, so that it can emit id ::= as a single token of type definition. There are a number of examples of this hack for various LR parsers in other SO questions.
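A minimal sketch of that lexer wrapper, assuming the underlying scanner produces (type, value) tuples and uses the hypothetical token types ID and DEFINE (for ::=). It buffers one extra token and emits a single DEFINITION token whenever an ID is immediately followed by ::=.

```python
def merge_definitions(tokens):
    """Wrap a token stream, turning ID followed by DEFINE into one DEFINITION token."""
    tokens = iter(tokens)
    pending = next(tokens, None)
    while pending is not None:
        following = next(tokens, None)      # the one extra token of lookahead
        if pending[0] == "ID" and following is not None and following[0] == "DEFINE":
            yield ("DEFINITION", pending[1])
            pending = next(tokens, None)    # both tokens were consumed
        else:
            yield pending
            pending = following

stream = [("ID", "a"), ("DEFINE", "::="), ("ID", "b"), ("ID", "c"),
          ("ID", "d"), ("DEFINE", "::="), ("TERMINAL", "'x'")]
print(list(merge_definitions(stream)))
```

The parser then uses a rule along the lines of prod: DEFINITION rule, so an ordinary ID inside a rule can always be shifted without looking past it.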
Having said all of that, I really don't understand why you want to build a parser for EBNF in order to parse XML. Building a working parser from EBNF is basically what PLY does, except that it doesn't implement the "E" part, so you have to rewrite rules which use the ?, *, + and - operators. This can be handled automatically, although the - operator is non-trivial in general, but it is not going to be simple. It would be easier, IMHO, to rewrite the few EBNF rules into BNF and then just use PLY. But if you are looking for a challenge, go for it.

First of all, you have apparently slavishly translated the grammar. You need to tokenize the input stream.
Normally, something like id would be a terminal to be discerned by the lexical analyzer, rather than parsed as part of the grammar
id : LETTER LETTER id
| LETTER DIGIT id
| LETTER
| DIGIT
It looks like everything you have under terminal should not be part of the grammar.
Second, you use right recursion in your grammar. While LALR works with both left and right recursion, you get smaller tables with left recursion.
Suppose you have the input string AA
If you were to insist on parsing identifiers, you'd want something more like
id : id LETTER
| id DIGIT
| LETTER
Finally, shift-reduce conflicts are not necessarily bad. They frequently occur in numeric expressions, where they are resolved by operator precedence.
Reduce-Reduce conflicts are always bad.

Related

Shift/reduce conflict with ambiguous grammar

I've been stuck with some ambiguous grammar for a while now as yacc reports 6 shift/reduce conflicts. I've looked in the y.output file and have tried to understand how to look at the states and figure out what to do to fix the ambiguous grammar but to no avail. I'm legitimately stuck at how I'm supposed to fix the issues. I've looked at a lot of questions on stack overflow to see if other people's explanation would help me with my problem, but that hasn't helped me much either. For the record, I cannot use any precedence defining directives such as %left to solve the parsing conflicts.
Would someone be able to help me out by guiding me as to how I should change the grammar to fix the shift/reduce conflicts? Maybe by trying to resolve one of the issues and showing me the thinking process behind it? I know the grammar is quite long and hefty and I apologize in advance for that. If anyone is willing to spare their free time on this it would be greatly appreciated, but I realize that I may not be able to have that.
Anyways, here is my grammar in question (it is a slight expansion of the MiniJava grammar):
Grammar
0 $accept: program $end
1 program: main_class class_decl_list
2 main_class: CLASS ID '{' PUBLIC STATIC VOID MAIN '(' STRING '[' ']' ID ')' '{' statement '}' '}'
3 class_decl_list: class_decl_list class_decl
4 | %empty
5 class_decl: CLASS ID '{' var_decl_list method_decl_list '}'
6 | CLASS ID EXTENDS ID '{' var_decl_list method_decl_list '}'
7 var_decl_list: var_decl_list var_decl
8 | %empty
9 method_decl_list: method_decl_list method_decl
10 | %empty
11 var_decl: type ID ';'
12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list statement_list RETURN exp ';' '}'
13 formal_list: type ID formal_rest_list
14 | %empty
15 formal_rest_list: formal_rest_list formal_rest
16 | %empty
17 formal_rest: ',' type ID
18 type: INT
19 | BOOLEAN
20 | ID
21 | type '[' ']'
22 statement: '{' statement_list '}'
23 | IF '(' exp ')' statement ELSE statement
24 | WHILE '(' exp ')' statement
25 | SOUT '(' exp ')' ';'
26 | SOUT '(' STRING_LITERAL ')' ';'
27 | ID '=' exp ';'
28 | ID index '=' exp ';'
29 statement_list: statement_list statement
30 | %empty
31 index: '[' exp ']'
32 | index '[' exp ']'
33 exp: exp OP exp
34 | '!' exp
35 | '+' exp
36 | '-' exp
37 | '(' exp ')'
38 | ID index
39 | ID '.' LENGTH
40 | ID index '.' LENGTH
41 | INTEGER_LITERAL
42 | TRUE
43 | FALSE
44 | object
45 | object '.' ID '(' exp_list ')'
46 object: ID
47 | THIS
48 | NEW ID '(' ')'
49 | NEW type index
50 exp_list: exp exp_rest_list
51 | %empty
52 exp_rest_list: exp_rest_list exp_rest
53 | %empty
54 exp_rest: ',' exp
And here are the relevant states from y.output that have shift/reduce conflicts.
State 58
7 var_decl_list: var_decl_list . var_decl
12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list . statement_list RETURN exp ';' '}'
INT shift, and go to state 20
BOOLEAN shift, and go to state 21
ID shift, and go to state 22
ID [reduce using rule 30 (statement_list)]
$default reduce using rule 30 (statement_list)
var_decl go to state 24
type go to state 25
statement_list go to state 69
State 76
38 exp: ID . index
39 | ID . '.' LENGTH
40 | ID . index '.' LENGTH
46 object: ID .
'[' shift, and go to state 64
'.' shift, and go to state 97
'.' [reduce using rule 46 (object)]
$default reduce using rule 46 (object)
index go to state 98
State 100
33 exp: exp . OP exp
34 | '!' exp .
OP shift, and go to state 103
OP [reduce using rule 34 (exp)]
$default reduce using rule 34 (exp)
State 101
33 exp: exp . OP exp
35 | '+' exp .
OP shift, and go to state 103
OP [reduce using rule 35 (exp)]
$default reduce using rule 35 (exp)
State 102
33 exp: exp . OP exp
36 | '-' exp .
OP shift, and go to state 103
OP [reduce using rule 36 (exp)]
$default reduce using rule 36 (exp)
State 120
33 exp: exp . OP exp
33 | exp OP exp .
OP shift, and go to state 103
OP [reduce using rule 33 (exp)]
$default reduce using rule 33 (exp)
And there we have it. I apologize again for the length of this grammar and the number of shift/reduce conflicts. I just cannot seem to understand how to fix them by changing the grammar in question. Any help would be thoroughly appreciated, though if no one has time to look through such a massive post, I would understand. If anyone needs more information, don't hesitate to ask.

The basic problem is that when parsing a method_decl body, the parser can't tell where the var_decl_list ends and the statement_list begins. When the lookahead is ID, it doesn't know whether that is the start of another var_decl or the start of the first statement, and it needs to reduce an empty statement_list before it can start working on the statements.
There are a number of ways you can deal with this:
have the lexer return different tokens for type IDs and other IDs -- that way the difference will tell the parser which is next.
don't require an empty statement at the start of a statement list. Change the grammar to:
statement_list: statement | statement_list statement ;
opt_statement_list: statement_list | %empty ;
and use opt_statement_list in the method_decl rule. This gets around the problem of having to reduce an empty statement_list before you start parsing statements. This is a process known as "unfactoring" the grammar, as you are replacing rules with multiple variations. It makes the grammar more complex, and in this case it doesn't solve the problem, it just moves it; you'll then see shift/reduce conflicts between statement: ID . index and type: ID on a [ lookahead. This problem can also be solved by unfactoring, but is harder.
So this brings up the general idea of resolving shift-reduce conflicts by unfactoring. The basic idea is to get rid of the rule causing the reduce half of the shift reduce conflict, replacing it with rules that are more limited in context, so don't trigger the conflict. The example above is easily solved by the "replace a 0-or-more recursive repeat with a 1-or-more recursive repeat and an optional rule". This works well for shift-reduce conflicts on the epsilon rule of the repeat if the following context means you can easily resolve when the 0-case should be legal (only when the next token is } in this case.)
The second conflict is tougher. Here the conflict is on reducing type: ID when the lookahead is [. So we need to duplicate type rules until that is not necessary. Something like:
type: simpleType | arrayType ;
simpleType: INT | BOOLEAN | ID ;
arrayType: INT '[' ']' | BOOLEAN '[' ']' | ID '[' ']'
| arrayType '[' ']' ;
replaces the "0 or more repetitions of the '[' ']' suffix" with "1 or more" and works for similar reasons (defers the reduction until after seeing the '[' ']' instead of requiring it before.) The key being that the simpleType: ID rule never needs to be reduced when the lookahead is '[' as it is only valid in other contexts.

Explain this shift-reduce conflict

Below is a very simplified XML grammar for Bison:
head : NODE_START NAME atts
| NODE_START NAME
;
element : head NODE_CLOSE NODE_END
| head NODE_END anys NODE_START NODE_CLOSE NAME NODE_END
| head NODE_END NODE_START NODE_CLOSE NAME NODE_END
;
text : TEXT
;
comment : NODE_START COMMENT_START COMMENT_END NODE_END
;
cdata : NODE_START CDATA_START CDATA_END NODE_END
;
attr : NAME EQUALS value
;
value : QUOTED
| APOSED
;
atts : attr atts
| attr
elt : element
| comment
| cdata
any : elt
| text
;
elts : elt elts
| elt
;
anys : text elts anys
| elts
| text
;
s : any
| PROLOG any
;
The alleged conflict is the rule anys -> text.
When I look at the corresponding output:
State 35
21 anys: text elts . anys
NODE_START shift, and go to state 1
TEXT shift, and go to state 2
head go to state 4
element go to state 5
text go to state 25
comment go to state 7
cdata go to state 8
elt go to state 26
elts go to state 27
anys go to state 42
How do I understand what is at conflict here?

1. Interpreting the dump file
If you look at the beginning of the .output file, you will see the following:
Rules useless in parser due to conflicts
23 anys: text
State 26 conflicts: 1 shift/reduce
State 27 conflicts: 1 shift/reduce
The first warning tells you that the production anys: text was eliminated altogether because the resolution of parsing conflicts (elsewhere in the grammar) made it impossible for the rule to ever be used. (Thus, it is "useless".) The next two lines tell you where to find the conflicts: in states 26 and 27.
So the rule you quote is not the "alleged conflict" and the state you quote has nothing to do with conflicts (indeed, I have no idea why you focused on it.)
In the states with conflicts, you will see, for example:
State 26
21 anys: text . elts anys
23 | text .
NODE_START shift, and go to state 1
NODE_START [reduce using rule 23 (anys)]
head go to state 4
element go to state 5
comment go to state 7
cdata go to state 8
elt go to state 27
elts go to state 35
The conflict is indicated by a lookahead (in this case NODE_START) with two or more different actions. The action(s) enclosed in brackets (in this case [reduce using rule 23 (anys)]) were eliminated by bison's conflict resolution mechanism (which, in the absence of precedence declarations, chooses the shift action if there is one, and otherwise the reduce action with the smallest production number).
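The default resolution described above can be written down as a tiny function. This is only a model of the stated rule (prefer a shift, otherwise take the reduction with the smallest production number) for the case where no precedence declarations apply; the tuple representation of actions is an assumption for the sketch, not bison's internal data structure.

```python
def resolve(actions):
    """Pick the surviving action for one lookahead token.

    actions: list of ("shift", state) or ("reduce", rule_number) tuples.
    """
    shifts = [action for action in actions if action[0] == "shift"]
    if shifts:
        return shifts[0]                       # a shift always wins
    reduces = [action for action in actions if action[0] == "reduce"]
    return min(reduces, key=lambda action: action[1])

# State 26 on NODE_START: shift to state 1 versus reduce by rule 23.
print(resolve([("shift", 1), ("reduce", 23)]))
```

The losing action is the one bison prints in square brackets in the state dump.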
The state dumps should make it clear why the rule anys: text became useless. In both cases where it could be reduced, there was a shift-reduce conflict and the shift action was preferred.
2. Cause of the shift-reduce conflict
The problem is anys: text elts anys. Consider an input consisting of three elts. This could be parsed as an elts consisting of two elts followed by an elts consisting of a single elt, or vice versa. The ambiguity causes a shift-reduce conflict.
Another problem with that production is that it does not permit an elts to end with a text (unless it consists only of a single text).
A better definition would be the simple
anys: any | anys any
Note: you are using a bottom-up parser and right recursion is (literally) an anti-pattern. Writing your lists left-recursively as above will limit parser stack usage and cause semantic actions to run in the expected order (that is, left to right). Unless you have very specific needs, you should avoid right recursion.
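To make the stack-usage point concrete, the following sketch hand-simulates the shift and reduce steps for a list of n items under both styles of rule. It is a simplified model of a shift-reduce stack holding grammar symbols only, not the output of a real parser generator, and the function names are invented for the illustration.

```python
def max_stack_left(n):
    """list: item | list item   (reduce right after every shift)."""
    stack, deepest = [], 0
    for _ in range(n):
        stack.append("item")                          # shift
        deepest = max(deepest, len(stack))
        # The handle is "list item" once a list exists, otherwise just "item".
        handle = stack[-2:] if stack[:-1] == ["list"] else stack[-1:]
        del stack[-len(handle):]
        stack.append("list")                          # reduce
    return deepest

def max_stack_right(n):
    """list: item | item list   (nothing reduces until the last item is seen)."""
    stack = ["item"] * n                              # n shifts in a row
    deepest = len(stack)
    stack[-1] = "list"                                # reduce list: item
    while len(stack) > 1:
        stack[-2:] = ["list"]                         # reduce list: item list
    return deepest

print(max_stack_left(1000), max_stack_right(1000))    # 2 1000
```

The left-recursive version also reduces after each item arrives, which is why its semantic actions run in input order.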

yacc shift/reduce in parser

I am writing a parser for a compiler as a homework assignment, and when I run the command
$ bison --yacc -v --defines -o parser.c parser.y
parser.y: warning: 8 shift/reduce conflicts [-Wconflicts-sr]
$
Apart from the if/else shift/reduce conflict, which is expected, the parser.output file shows conflicts in the following states:
State 34
35 term: lvalue . PLUSPLUS
37 | lvalue . MINUSMINUS
39 assignexpr: lvalue . ASSIGN expr
40 primary: lvalue .
49 member: lvalue . FULLSTOP IDENTIFIER
50 | lvalue . LEFTSQUARE expr RIGHTSQUARE
54 call: lvalue . callsuffix
ASSIGN shift, and go to state 88
PLUSPLUS shift, and go to state 89
MINUSMINUS shift, and go to state 90
LEFTSQUARE shift, and go to state 91
FULLSTOP shift, and go to state 92
LEFTPAR shift, and go to state 93
PLUSPLUS [reduce using rule 40 (primary)]
MINUSMINUS [reduce using rule 40 (primary)]
LEFTSQUARE [reduce using rule 40 (primary)]
LEFTPAR [reduce using rule 40 (primary)]
$default reduce using rule 40 (primary)
callsuffix go to state 94
normcall go to state 95
methodcall go to state 96
State 36
41 primary: call .
51 member: call . FULLSTOP IDENTIFIER
52 | call . LEFTSQUARE expr RIGHTSQUARE
53 call: call . LEFTPAR elist RIGHTPAR
LEFTSQUARE shift, and go to state 97
FULLSTOP shift, and go to state 98
LEFTPAR shift, and go to state 99
LEFTSQUARE [reduce using rule 41 (primary)]
LEFTPAR [reduce using rule 41 (primary)]
$default reduce using rule 41 (primary)
State 52
16 expr: expr . PLUS expr
17 | expr . MINUS expr
18 | expr . MUL expr
19 | expr . DIV expr
20 | expr . MOD expr
21 | expr . GREATER expr
22 | expr . GREATER_EQUAL expr
23 | expr . LESS expr
24 | expr . LESS_EQUAL expr
25 | expr . EQUAL expr
26 | expr . NOTEQUAL expr
27 | expr . AND expr
28 | expr . OR expr
95 returnstmt: RETURN expr .
PLUS shift, and go to state 74
MINUS shift, and go to state 75
MUL shift, and go to state 76
DIV shift, and go to state 77
MOD shift, and go to state 78
EQUAL shift, and go to state 79
NOTEQUAL shift, and go to state 80
OR shift, and go to state 81
AND shift, and go to state 82
GREATER shift, and go to state 83
LESS shift, and go to state 84
GREATER_EQUAL shift, and go to state 85
LESS_EQUAL shift, and go to state 86
MINUS [reduce using rule 95 (returnstmt)]
$default reduce using rule 95 (returnstmt)
Any idea how to solve it?

It's difficult to say what the problem is, as you don't show your full grammar. Conflicts are often caused by other rules in the grammar (not the rules shown in the states with the conflict), because of the context or the way the rules are combined.
state 34/36:
It looks like you have some sort of circular ambiguity between the rules for primary, lvalue, and call. What are these (full) rules? How do you expect to know the difference between an lvalue and a primary?
state 52: Here it looks like an ambiguity between a returnstmt and a following expression that begins with a MINUS. It looks like you do not have a statement terminator/separator?
These all might be the same underlying problem -- the parser can't figure out where one statement ends and the next begins...

Rewriting Bison grammar to fix shift/reduce conflicts

Here are the relevant parts of my Bison grammar rules:
statement:
expression ';' |
IF expression THEN statement ELSE statement END_IF ';'
;
expression:
IDENTIFIER |
IDENTIFIER '('expressions')' |
LIT_INT |
LIT_REAL |
BOOL_OP |
LOG_NOT expression |
expression operator expression |
'('expression')'
;
expressions:
expression |
expressions ',' expression
;
operator:
REL_OP |
ADD_OP |
MULT_OP |
LOG_OR |
LOG_AND
;
When compiling, I get 10 shift/reduce conflicts:
5 conflicts are caused by the LOG_NOT expression rule:
State 45
25 expression: LOG_NOT expression .
26 | expression . operator expression
REL_OP shift, and go to state 48
ADD_OP shift, and go to state 49
MULT_OP shift, and go to state 50
LOG_OR shift, and go to state 51
LOG_AND shift, and go to state 52
REL_OP [reduce using rule 25 (expression)]
ADD_OP [reduce using rule 25 (expression)]
MULT_OP [reduce using rule 25 (expression)]
LOG_OR [reduce using rule 25 (expression)]
LOG_AND [reduce using rule 25 (expression)]
$default reduce using rule 25 (expression)
operator go to state 54
5 conflicts are caused by the expression operator expression rule:
State 62
26 expression: expression . operator expression
26 | expression operator expression .
REL_OP shift, and go to state 48
ADD_OP shift, and go to state 49
MULT_OP shift, and go to state 50
LOG_OR shift, and go to state 51
LOG_AND shift, and go to state 52
REL_OP [reduce using rule 26 (expression)]
ADD_OP [reduce using rule 26 (expression)]
MULT_OP [reduce using rule 26 (expression)]
LOG_OR [reduce using rule 26 (expression)]
LOG_AND [reduce using rule 26 (expression)]
$default reduce using rule 26 (expression)
operator go to state 54
I know that the problem has to do with precedence. For instance, if the expression was:
a + b * c
Does Bison shift after the a + and hope to find an expression, or does it reduce the a to an expression? I have a feeling that this is due to the Bison 1-token look-ahead limitation, but I can't figure out how to rewrite the rule(s) to resolve the conflicts.
My professor will take points off for shift/reduce conflicts, so I can't use %expect. My professor has also stated that we cannot use %left or %right precedence values.
This is my first post on Stack, so please let me know if I'm posting this all wrong. I've searched existing posts, but this really seems a case-by-case thing. If I use any code from Stack, I will note the source in my submitted project.
Thanks!

As written, your grammar is ambiguous. So it must have conflicts.
There is no inherent rule of binding precedences, and apparently you're not allowed to use bison's precedence declarations either. If you were allowed to, you wouldn't be able to use operator as a non-terminal, because you need to distinguish between
expr1 + expr2 * expr3         expr1 * expr2 + expr3
  |       |       |             |       |       |
  |       +---+---+             +---+---+       |
  |           |                     |           |
  |         expr                  expr          |
  |           |                     |           |
  +-----+-----+                     +-----+-----+
        |                                 |
      expr                              expr
And you cannot distinguish between them if + and * are replaced with operator. The terminals actually have to be visible.
Now, here's a quick clue:
expr1 + expr2 + expr3 reduces expr1 + expr2 first
expr1 * expr2 + expr3 reduces expr1 * expr2 first
So in non-terminal-1 + non-terminal-2, non-terminal-1 can produce x + y or x * y. But in non-terminal-1 * non-terminal-2, non-terminal-1 cannot produce x + y.

Thanks! I did some more troubleshooting, and fixed the shift/reduce conflicts by rewriting the expression and operator rules:
expression:
expression LOG_OR term1 |
term1
;
term1:
term1 LOG_AND term2 |
term2
;
term2:
term2 REL_OP term3 |
term3
;
term3:
term3 ADD_OP term4 |
term4
;
term4:
term4 MULT_OP factor |
factor
;
factor:
IDENTIFIER |
IDENTIFIER '('expressions')' |
LIT_INT |
LIT_REAL |
BOOL_OP |
LOG_NOT factor |
'('expression')'
;
expressions:
expression |
expressions ',' expression
;
I had to rearrange what was actually an expression and what was actually a factor. I made a factor rule that includes all of the operands (terminals, calls, negations and parenthesized expressions), which has the highest precedence. I then made a term# rule for each operator, which puts each operator at its own precedence level (term4 binds tighter than term3, term3 binds tighter than term2, and so on).
This allowed me to set each operator at a different precedence without using any of the built-in % precedence declarations.
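The effect of layering the rules this way can be seen in a small hand-written parser with one level per operator class. Note that this is a recursive-descent sketch, not what Bison generates, and the concrete operator spellings in LEVELS are assumptions (the grammar only names token types such as ADD_OP and MULT_OP). It returns a fully parenthesized string so the grouping is visible.

```python
# Lowest to highest precedence, mirroring expression, term1, term2, term3, term4.
LEVELS = [{"||"}, {"&&"}, {"<", ">", "=="}, {"+", "-"}, {"*", "/"}]

def parse(tokens):
    """Return a fully parenthesized string showing how the tokens group."""
    pos = 0

    def level(i):
        nonlocal pos
        if i == len(LEVELS):
            return factor()
        left = level(i + 1)
        while pos < len(tokens) and tokens[pos] in LEVELS[i]:
            op = tokens[pos]
            pos += 1
            right = level(i + 1)
            left = f"({left} {op} {right})"    # left-associative, like term: term OP next
        return left

    def factor():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "!":
            return f"(!{factor()})"            # LOG_NOT factor
        if token == "(":
            inner = level(0)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return inner
        return token

    result = level(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(parse(["a", "+", "b", "*", "c"]))   # (a + (b * c))
print(parse(["a", "*", "b", "+", "c"]))   # ((a * b) + c)
```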
I was able to parse all of my test input files without error. Any thoughts on the design?

I am trying to build a parser for MiniJava and I am getting shift/reduce conflicts in the expression part of the grammar. I can't resolve these conflicts.

This is part of the y.output file:
state 65
15 Expression: Expression . "&&" Expression
16 | Expression . "<" Expression
17 | Expression . "+" Expression
18 | Expression . "-" Expression
19 | Expression . "*" Expression
20 | Expression . "[" Expression "]"
21 | Expression . "." "length"
22 | Expression . "." Identifier "(" Expression "," Expression ")"
25 | "!" Expression .
"[" shift, and go to state 67
"<" shift, and go to state 69
"+" shift, and go to state 70
"-" shift, and go to state 71
"*" shift, and go to state 72
"." shift, and go to state 73
"[" [reduce using rule 25 (Expression)]
"<" [reduce using rule 25 (Expression)]
"+" [reduce using rule 25 (Expression)]
"-" [reduce using rule 25 (Expression)]
"*" [reduce using rule 25 (Expression)]
"." [reduce using rule 25 (Expression)]
$default reduce using rule 25 (Expression)
This is how the precedence of the operators is set:
%left "&&"
%left '<'
%left '-' '+'
%left '*'
%right '!'
%left '.'
%left '(' ')'
%left '[' ']'

In bison, there is a difference between "x" and 'x'; they are not the same token. So, assuming you are using bison, your precedence declarations don't refer to the terminals in the productions.
Bison also allows %token definitions of the following form:
%token name quoted-string ...
For example (a short excerpt from bison's own grammar file):
%token
PERCENT_CODE "%code"
PERCENT_DEBUG "%debug"
PERCENT_DEFAULT_PREC "%default-prec"
PERCENT_DEFINE "%define"
PERCENT_DEFINES "%defines"
PERCENT_ERROR_VERBOSE "%error-verbose"
Once the symbols have been aliased, they can be used interchangeably in the grammar, making it possible to use the double-quoted string in productions; some people find such grammars easier to read. However, there is no mechanism to ensure that the lexer produces the correct token number for a double-quoted string since it only has access to the token names.
The "original" yacc, at least in the current "byacc" version maintained by Thomas Dickey, allows both single- and double-quoted token names, but does not distinguish between them; both "+" and '+' are mapped to token number 43 ('+'). It also does not provide any easy way to alias token names, so the double-quoted multi-character strings are not particularly easy to use in a reliable way.

Resources