In order to learn Lex/Yacc, I'm writing a CSV parser following the grammar specified on Page 3 of RFC 4180.
I've run into a "reduce/reduce conflict," and I'm not sure how to progress. It seems to be a conflict between Rules 1 and 3 of my grammar, but I don't know of any other way to describe a CSV with or without a line break following the last record. Also, when I remove Rule 10 (the empty field rule) the reduce/reduce conflict disappears; however, I need to handle empty fields.
What is the issue with my grammar and how should I correct it?
Yacc Source
%token COMMA
%token DQUOTE
%token CRLF
%token TEXTDATA
%%
file: records CRLF
| records;
records: records CRLF record
| record;
record: fields;
fields: fields COMMA field
| field;
field: DQUOTE escaped DQUOTE
| TEXTDATA
| ;
escaped: escaped TEXTDATA
| escaped COMMA
| escaped CRLF
| escaped DQUOTE DQUOTE
| TEXTDATA
| COMMA
| CRLF
| DQUOTE DQUOTE;
yacc -v Output
State 14 conflicts: 1 reduce/reduce
Grammar
0 $accept: file $end
1 file: records CRLF
2 | records
3 records: records CRLF record
4 | record
5 record: fields
6 fields: fields COMMA field
7 | field
8 field: DQUOTE escaped DQUOTE
9 | TEXTDATA
10 | /* empty */
11 escaped: escaped TEXTDATA
12 | escaped COMMA
13 | escaped CRLF
14 | escaped DQUOTE DQUOTE
15 | TEXTDATA
16 | COMMA
17 | CRLF
18 | DQUOTE DQUOTE
Terminals, with rules where they appear
$end (0) 0
error (256)
COMMA (258) 6 12 16
DQUOTE (259) 8 14 18
CRLF (260) 1 3 13 17
TEXTDATA (261) 9 11 15
Nonterminals, with rules where they appear
$accept (7)
on left: 0
file (8)
on left: 1 2, on right: 0
records (9)
on left: 3 4, on right: 1 2 3
record (10)
on left: 5, on right: 3 4
fields (11)
on left: 6 7, on right: 5 6
field (12)
on left: 8 9 10, on right: 6 7
escaped (13)
on left: 11 12 13 14 15 16 17 18, on right: 8 11 12 13 14
state 0
0 $accept: . file $end
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$default reduce using rule 10 (field)
file go to state 3
records go to state 4
record go to state 5
fields go to state 6
field go to state 7
state 1
8 field: DQUOTE . escaped DQUOTE
COMMA shift, and go to state 8
DQUOTE shift, and go to state 9
CRLF shift, and go to state 10
TEXTDATA shift, and go to state 11
escaped go to state 12
state 2
9 field: TEXTDATA .
$default reduce using rule 9 (field)
state 3
0 $accept: file . $end
$end shift, and go to state 13
state 4
1 file: records . CRLF
2 | records .
3 records: records . CRLF record
CRLF shift, and go to state 14
$default reduce using rule 2 (file)
state 5
4 records: record .
$default reduce using rule 4 (records)
state 6
5 record: fields .
6 fields: fields . COMMA field
COMMA shift, and go to state 15
$default reduce using rule 5 (record)
state 7
7 fields: field .
$default reduce using rule 7 (fields)
state 8
16 escaped: COMMA .
$default reduce using rule 16 (escaped)
state 9
18 escaped: DQUOTE . DQUOTE
DQUOTE shift, and go to state 16
state 10
17 escaped: CRLF .
$default reduce using rule 17 (escaped)
state 11
15 escaped: TEXTDATA .
$default reduce using rule 15 (escaped)
state 12
8 field: DQUOTE escaped . DQUOTE
11 escaped: escaped . TEXTDATA
12 | escaped . COMMA
13 | escaped . CRLF
14 | escaped . DQUOTE DQUOTE
COMMA shift, and go to state 17
DQUOTE shift, and go to state 18
CRLF shift, and go to state 19
TEXTDATA shift, and go to state 20
state 13
0 $accept: file $end .
$default accept
state 14
1 file: records CRLF .
3 records: records CRLF . record
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$end reduce using rule 1 (file)
$end [reduce using rule 10 (field)]
$default reduce using rule 10 (field)
record go to state 21
fields go to state 6
field go to state 7
state 15
6 fields: fields COMMA . field
DQUOTE shift, and go to state 1
TEXTDATA shift, and go to state 2
$default reduce using rule 10 (field)
field go to state 22
state 16
18 escaped: DQUOTE DQUOTE .
$default reduce using rule 18 (escaped)
state 17
12 escaped: escaped COMMA .
$default reduce using rule 12 (escaped)
state 18
8 field: DQUOTE escaped DQUOTE .
14 escaped: escaped DQUOTE . DQUOTE
DQUOTE shift, and go to state 23
$default reduce using rule 8 (field)
state 19
13 escaped: escaped CRLF .
$default reduce using rule 13 (escaped)
state 20
11 escaped: escaped TEXTDATA .
$default reduce using rule 11 (escaped)
state 21
3 records: records CRLF record .
$default reduce using rule 3 (records)
state 22
6 fields: fields COMMA field .
$default reduce using rule 6 (fields)
state 23
14 escaped: escaped DQUOTE DQUOTE .
$default reduce using rule 14 (escaped)
If the input is, for example, TEXTDATA CRLF, the parser cannot tell whether to derive file -> records CRLF, with records reducing to a single record, or file -> records, with records deriving two records, the second of which contains only an empty field.
To avoid this ambiguity you can just remove the records CRLF alternative. Files ending with a CRLF will still be accepted - they'll be treated as having an empty field at the end.
If that's not what you want, you'll need to rewrite fields, so that the last record is not allowed to be empty (and then keep the file: records CRLF production).
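To make the two competing readings concrete, here is a minimal Python sketch (not yacc) that works on a list of token names. The helper names split_records, reading_separator and reading_terminator are made up for illustration; an empty list stands for a record consisting of a single empty field.

```python
def split_records(tokens):
    """Split a token list into records at every CRLF."""
    records, current = [], []
    for tok in tokens:
        if tok == "CRLF":
            records.append(current)
            current = []
        else:
            current.append(tok)
    records.append(current)  # whatever follows the last CRLF, possibly nothing
    return records


def reading_separator(tokens):
    """file -> records: every CRLF separates two records."""
    return split_records(tokens)


def reading_terminator(tokens):
    """file -> records CRLF: a final CRLF only ends the file."""
    if tokens and tokens[-1] == "CRLF":
        return split_records(tokens[:-1])
    return split_records(tokens)


tokens = ["TEXTDATA", "CRLF"]
print(reading_separator(tokens))   # [['TEXTDATA'], []]
print(reading_terminator(tokens))  # [['TEXTDATA']]
```

Both results are valid derivations under the original grammar, which is exactly what the reduce/reduce conflict in state 14 is reporting.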
PS: On an unrelated note, it seems to me that you should move some of your parsing work to the lexer, specifically the part where you parse the contents of quoted strings. Something like "abc" would be best handled by making the lexer turn it into a single token.
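As a rough illustration of that suggestion, here is a Python sketch of such a lexer (a real project would put the equivalent patterns in the lex file). The token name QUOTED and the function tokenize are assumptions for this example, and the TEXTDATA pattern is looser than the character ranges RFC 4180 specifies.

```python
import re

# One alternative per token kind; the whole quoted string is a single token.
TOKEN_RE = re.compile(
    r'''
    (?P<QUOTED>"(?:[^"]|"")*")   # quoted field, "" is an escaped quote
  | (?P<COMMA>,)
  | (?P<CRLF>\r\n)
  | (?P<TEXTDATA>[^",\r\n]+)
    ''',
    re.VERBOSE,
)


def tokenize(text):
    """Return a list of (kind, value) pairs for a CSV string."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        kind, value = m.lastgroup, m.group()
        if kind == "QUOTED":
            # Strip the surrounding quotes and undo the "" escaping.
            value = value[1:-1].replace('""', '"')
        tokens.append((kind, value))
        pos = m.end()
    return tokens


print(tokenize('"a,""b""",c\r\n'))
```

With the quoted string handled here, the grammar no longer needs the escaped rules at all; field can simply be QUOTED, TEXTDATA, or empty.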
Related
I've been stuck with some ambiguous grammar for a while now as yacc reports 6 shift/reduce conflicts. I've looked in the y.output file and have tried to understand how to look at the states and figure out what to do to fix the ambiguous grammar but to no avail. I'm legitimately stuck at how I'm supposed to fix the issues. I've looked at a lot of questions on stack overflow to see if other people's explanation would help me with my problem, but that hasn't helped me much either. For the record, I cannot use any precedence defining directives such as %left to solve the parsing conflicts.
Would someone be able to help me out by guiding me as to how I should change the grammar to fix the shift/reduce conflicts? Maybe by trying to resolve one of the issues and showing me the thinking process behind it? I know the grammar is quite long and hefty and I apologize in advance for that. If anyone is willing to spare their free time on this it would be greatly appreciated, but I realize that I may not be able to have that.
Anyways, here is my grammar in question (it is a slight expansion of the MiniJava grammar):
Grammar
0 $accept: program $end
1 program: main_class class_decl_list
2 main_class: CLASS ID '{' PUBLIC STATIC VOID MAIN '(' STRING '[' ']' ID ')' '{' statement '}' '}'
3 class_decl_list: class_decl_list class_decl
4 | %empty
5 class_decl: CLASS ID '{' var_decl_list method_decl_list '}'
6 | CLASS ID EXTENDS ID '{' var_decl_list method_decl_list '}'
7 var_decl_list: var_decl_list var_decl
8 | %empty
9 method_decl_list: method_decl_list method_decl
10 | %empty
11 var_decl: type ID ';'
12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list statement_list RETURN exp ';' '}'
13 formal_list: type ID formal_rest_list
14 | %empty
15 formal_rest_list: formal_rest_list formal_rest
16 | %empty
17 formal_rest: ',' type ID
18 type: INT
19 | BOOLEAN
20 | ID
21 | type '[' ']'
22 statement: '{' statement_list '}'
23 | IF '(' exp ')' statement ELSE statement
24 | WHILE '(' exp ')' statement
25 | SOUT '(' exp ')' ';'
26 | SOUT '(' STRING_LITERAL ')' ';'
27 | ID '=' exp ';'
28 | ID index '=' exp ';'
29 statement_list: statement_list statement
30 | %empty
31 index: '[' exp ']'
32 | index '[' exp ']'
33 exp: exp OP exp
34 | '!' exp
35 | '+' exp
36 | '-' exp
37 | '(' exp ')'
38 | ID index
39 | ID '.' LENGTH
40 | ID index '.' LENGTH
41 | INTEGER_LITERAL
42 | TRUE
43 | FALSE
44 | object
45 | object '.' ID '(' exp_list ')'
46 object: ID
47 | THIS
48 | NEW ID '(' ')'
49 | NEW type index
50 exp_list: exp exp_rest_list
51 | %empty
52 exp_rest_list: exp_rest_list exp_rest
53 | %empty
54 exp_rest: ',' exp
And here are the relevant states from y.output that have shift/reduce conflicts.
State 58
7 var_decl_list: var_decl_list . var_decl
12 method_decl: PUBLIC type ID '(' formal_list ')' '{' var_decl_list . statement_list RETURN exp ';' '}'
INT shift, and go to state 20
BOOLEAN shift, and go to state 21
ID shift, and go to state 22
ID [reduce using rule 30 (statement_list)]
$default reduce using rule 30 (statement_list)
var_decl go to state 24
type go to state 25
statement_list go to state 69
State 76
38 exp: ID . index
39 | ID . '.' LENGTH
40 | ID . index '.' LENGTH
46 object: ID .
'[' shift, and go to state 64
'.' shift, and go to state 97
'.' [reduce using rule 46 (object)]
$default reduce using rule 46 (object)
index go to state 98
State 100
33 exp: exp . OP exp
34 | '!' exp .
OP shift, and go to state 103
OP [reduce using rule 34 (exp)]
$default reduce using rule 34 (exp)
State 101
33 exp: exp . OP exp
35 | '+' exp .
OP shift, and go to state 103
OP [reduce using rule 35 (exp)]
$default reduce using rule 35 (exp)
State 102
33 exp: exp . OP exp
36 | '-' exp .
OP shift, and go to state 103
OP [reduce using rule 36 (exp)]
$default reduce using rule 36 (exp)
State 120
33 exp: exp . OP exp
33 | exp OP exp .
OP shift, and go to state 103
OP [reduce using rule 33 (exp)]
$default reduce using rule 33 (exp)
And there we have it. I apologize again for the length of this grammar and the number of shift/reduce conflicts. I just cannot seem to understand how to fix them by changing the grammar in question. Any help would be thoroughly appreciated, though if no one has time to look through such a massive post, I would understand. If anyone needs more information, don't hesitate to ask.
The basic problem is that when parsing a method_decl body, the parser can't tell where the var_decl_list ends and the statement_list begins. When the lookahead is ID, it doesn't know whether that is the start of another var_decl or the start of the first statement, and it needs to reduce an empty statement_list before it can start parsing any statement.
There are a number of ways you can deal with this:
Have the lexer return different tokens for type IDs and other IDs. That way the token itself tells the parser which construct comes next.
Don't require an empty statement_list at the start of a statement list. Change the grammar to:
statement_list: statement | statement_list statement ;
opt_statement_list: statement_list | %empty ;
and use opt_statement_list in the method_decl rule. This avoids having to reduce an empty statement_list before you start parsing statements. The process is known as "unfactoring" the grammar, since you replace one rule with multiple variations. It makes the grammar more complex and, in this case, doesn't solve the problem but just moves it: you'll then see a shift/reduce conflict between statement: ID . index and type: ID on a [ lookahead. That conflict can also be solved by unfactoring, but it is harder.
So this brings up the general idea of resolving shift/reduce conflicts by unfactoring. The basic idea is to get rid of the rule causing the reduce half of the shift/reduce conflict, replacing it with rules that are more limited in context, so that they don't trigger the conflict. The example above is easily solved by the transformation "replace a 0-or-more recursive repeat with a 1-or-more recursive repeat and an optional rule". This works well for shift/reduce conflicts on the epsilon rule of the repeat if the following context lets you easily decide when the 0-case should be legal (only when the next token is } in this case).
The second conflict is tougher. Here the conflict is on reducing type: ID when the lookahead is [. So we need to duplicate type rules until that is not necessary. Something like:
type: simpleType | arrayType ;
simpleType: INT | BOOLEAN | ID ;
arrayType: INT '[' ']' | BOOLEAN '[' ']' | ID '[' ']'
| arrayType '[' ']' ;
This replaces the "0 or more repetitions of the '[' ']' suffix" with "1 or more" and works for similar reasons: it defers the reduction until after seeing the '[' ']' instead of requiring it before. The key is that the simpleType: ID rule never needs to be reduced when the lookahead is '[', as it is only valid in other contexts.
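The reason deferring the reduction helps can be seen in a small Python sketch. It is not part of the grammar, and classify_after_id is a hypothetical helper; it only shows that the token after ID (and sometimes the one after that) decides between a declaration and a statement, so a parser that must commit on the ID lookahead alone has too little information.

```python
def classify_after_id(tokens):
    """Classify a token sequence that starts with ID inside a method body."""
    if not tokens or tokens[0] != "ID":
        raise ValueError("expected the sequence to start with ID")
    rest = tokens[1:]
    if rest[:1] == ["ID"]:
        return "var_decl"    # type ID ';'
    if rest[:1] == ["="]:
        return "statement"   # ID '=' exp ';'
    if rest[:2] == ["[", "]"]:
        return "var_decl"    # array type: ID '[' ']' ... ID ';'
    if rest[:1] == ["["]:
        return "statement"   # ID index '=' exp ';'
    raise ValueError("not a declaration or an assignment")


print(classify_after_id(["ID", "ID", ";"]))             # var_decl
print(classify_after_id(["ID", "[", "]", "ID", ";"]))   # var_decl
print(classify_after_id(["ID", "=", "INTEGER_LITERAL", ";"]))  # statement
```

The unfactored rules achieve the same effect inside the LR automaton: the ID is shifted first, and the choice between type and statement is made only once the distinguishing tokens have been seen.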
I am writing a parser for a compiler as a homework assignment, and when I run the command
$ bison --yacc -v --defines -o parser.c parser.y
parser.y: warning: 8 shift/reduce conflicts [-Wconflicts-sr]
$
Apart from the if/else shift/reduce conflict, which is expected, the parser.output file reports conflicts in the following states:
State 34
35 term: lvalue . PLUSPLUS
37 | lvalue . MINUSMINUS
39 assignexpr: lvalue . ASSIGN expr
40 primary: lvalue .
49 member: lvalue . FULLSTOP IDENTIFIER
50 | lvalue . LEFTSQUARE expr RIGHTSQUARE
54 call: lvalue . callsuffix
ASSIGN shift, and go to state 88
PLUSPLUS shift, and go to state 89
MINUSMINUS shift, and go to state 90
LEFTSQUARE shift, and go to state 91
FULLSTOP shift, and go to state 92
LEFTPAR shift, and go to state 93
PLUSPLUS [reduce using rule 40 (primary)]
MINUSMINUS [reduce using rule 40 (primary)]
LEFTSQUARE [reduce using rule 40 (primary)]
LEFTPAR [reduce using rule 40 (primary)]
$default reduce using rule 40 (primary)
callsuffix go to state 94
normcall go to state 95
methodcall go to state 96
State 36
41 primary: call .
51 member: call . FULLSTOP IDENTIFIER
52 | call . LEFTSQUARE expr RIGHTSQUARE
53 call: call . LEFTPAR elist RIGHTPAR
LEFTSQUARE shift, and go to state 97
FULLSTOP shift, and go to state 98
LEFTPAR shift, and go to state 99
LEFTSQUARE [reduce using rule 41 (primary)]
LEFTPAR [reduce using rule 41 (primary)]
$default reduce using rule 41 (primary)
State 52
16 expr: expr . PLUS expr
17 | expr . MINUS expr
18 | expr . MUL expr
19 | expr . DIV expr
20 | expr . MOD expr
21 | expr . GREATER expr
22 | expr . GREATER_EQUAL expr
23 | expr . LESS expr
24 | expr . LESS_EQUAL expr
25 | expr . EQUAL expr
26 | expr . NOTEQUAL expr
27 | expr . AND expr
28 | expr . OR expr
95 returnstmt: RETURN expr .
PLUS shift, and go to state 74
MINUS shift, and go to state 75
MUL shift, and go to state 76
DIV shift, and go to state 77
MOD shift, and go to state 78
EQUAL shift, and go to state 79
NOTEQUAL shift, and go to state 80
OR shift, and go to state 81
AND shift, and go to state 82
GREATER shift, and go to state 83
LESS shift, and go to state 84
GREATER_EQUAL shift, and go to state 85
LESS_EQUAL shift, and go to state 86
MINUS [reduce using rule 95 (returnstmt)]
$default reduce using rule 95 (returnstmt)
Any idea how to solve it?
It's difficult to say what the problem is, as you don't show your full grammar. Conflicts are often caused by other rules in the grammar (not the rules shown in the states with the conflict), because of the context or how the rules are combined.
state 34/36:
It looks like you have some sort of circular ambiguity between the rules for primary, lvalue, and call. What are these (full) rules? How do you expect to know the difference between an lvalue and a primary?
state 52: Here it looks like an ambiguity between a returnstmt and a following expression that begins with a MINUS. It looks like you do not have a statement terminator/separator?
These all might be the same underlying problem -- the parser can't figure out where one statement ends and the next begins...
How can I change this to remove the shift/reduce conflict?
var_part
:
| VAR var_declaration SEMIC var_part_multi
;
var_part_multi
: var_declaration SEMIC var_part_multi
|
;
var_declaration
: ID id_list COLON ID
;
id_list
: COMMA ID id_list
|
;
I have two conflicts and the y.output gives me this:
State 19 conflicts: 1 shift/reduce
State 59 conflicts: 1 shift/reduce
state 19
4 var_part: VAR var_declaration SEMIC . var_part_multi
5 var_part_multi: . var_declaration SEMIC var_part_multi
6
7 var_declaration: . ID id_list COLON ID
ID shift, and go to state 12
ID [reduce using rule 6 (var_part_multi)]
$default reduce using rule 6 (var_part_multi)
var_part_multi go to state 33
var_declaration go to state 34
state 59
5 var_part_multi: . var_declaration SEMIC var_part_multi
5 | var_declaration SEMIC . var_part_multi
6 | . [ID, BEGIN, DOT, IF, FUNCTION, REPEAT, SEMIC, VAL, WHILE, WRITELN]
7 var_declaration: . ID id_list COLON ID
ID shift, and go to state 12
ID [reduce using rule 6 (var_part_multi)]
$default reduce using rule 6 (var_part_multi)
var_part_multi go to state 95
var_declaration go to state 34
I know the problem is with the ID, it has two possible routes but I've been trying for the last hour changing the rules, adding precedences and whatnot and wasn't able to remove the conflict. Can you guys help?
You haven't pasted enough of your grammar to answer the question, but it is almost certainly related to the fact that var_part_multi can be empty.
The question is what is the context of the use of var_part; specifically, how it is possible for var_part to be followed by something which starts with ID.
In that case, since var_part_multi can be empty, the parser will have to choose between starting a non-empty var_part_multi using the ID, or reducing an empty var_part_multi (and then reducing a var_part), which will allow the ID to start the non-terminal which can follow var_part.
By the way, in your paste of the y.output file, the third line under State 19 (the one which starts with the number 6) has been truncated. It should resemble the third line under State 59.
If you can't figure out by examining your grammar how ID could follow var_part, it might help to trace the state machine backwards from one of the two conflicted states.
I've tried more than one independent yacc implementation and all of them agree that the grammar you've posted is conflict-free. For example:
$ cat so.y
%token VAR ID COLON SEMIC COMMA
%%
var_part
:
| VAR var_declaration SEMIC var_part_multi
;
var_part_multi
: var_declaration SEMIC var_part_multi
|
;
var_declaration
: ID id_list COLON ID
;
id_list
: COMMA ID id_list
|
;
$ cat y.output
state 0 //
0 $accept: . var_part
1 var_part: . [$end]
$end reduce using rule 1 (var_part)
VAR shift, and goto state 2
var_part goto state 1
state 1 // [$end]
0 $accept: var_part . [$end]
$end accept
state 2 // VAR
2 var_part: VAR . var_declaration SEMIC var_part_multi
ID shift, and goto state 4
var_declaration goto state 3
state 3 // VAR ID COLON ID [SEMIC]
2 var_part: VAR var_declaration . SEMIC var_part_multi
SEMIC shift, and goto state 11
state 4 // VAR ID
5 var_declaration: ID . id_list COLON ID
7 id_list: . [COLON]
COLON reduce using rule 7 (id_list)
COMMA shift, and goto state 6
id_list goto state 5
state 5 // VAR ID [COLON]
5 var_declaration: ID id_list . COLON ID
COLON shift, and goto state 9
state 6 // VAR ID COMMA
6 id_list: COMMA . ID id_list
ID shift, and goto state 7
state 7 // VAR ID COMMA ID
6 id_list: COMMA ID . id_list
7 id_list: . [COLON]
COLON reduce using rule 7 (id_list)
COMMA shift, and goto state 6
id_list goto state 8
state 8 // VAR ID COMMA ID [COLON]
6 id_list: COMMA ID id_list . [COLON]
COLON reduce using rule 6 (id_list)
state 9 // VAR ID COLON
5 var_declaration: ID id_list COLON . ID
ID shift, and goto state 10
state 10 // VAR ID COLON ID
5 var_declaration: ID id_list COLON ID . [SEMIC]
SEMIC reduce using rule 5 (var_declaration)
state 11 // VAR ID COLON ID SEMIC
2 var_part: VAR var_declaration SEMIC . var_part_multi
4 var_part_multi: . [$end]
$end reduce using rule 4 (var_part_multi)
ID shift, and goto state 4
var_declaration goto state 13
var_part_multi goto state 12
state 12 // VAR ID COLON ID SEMIC [$end]
2 var_part: VAR var_declaration SEMIC var_part_multi . [$end]
$end reduce using rule 2 (var_part)
state 13 // VAR ID COLON ID SEMIC ID COLON ID [SEMIC]
3 var_part_multi: var_declaration . SEMIC var_part_multi
SEMIC shift, and goto state 14
state 14 // VAR ID COLON ID SEMIC ID COLON ID SEMIC
3 var_part_multi: var_declaration SEMIC . var_part_multi
4 var_part_multi: . [$end]
$end reduce using rule 4 (var_part_multi)
ID shift, and goto state 4
var_declaration goto state 13
var_part_multi goto state 15
state 15 // VAR ID COLON ID SEMIC ID COLON ID SEMIC [$end]
3 var_part_multi: var_declaration SEMIC var_part_multi . [$end]
$end reduce using rule 3 (var_part_multi)
$
Here are the relevant parts of my Bison grammar rules:
statement:
expression ';' |
IF expression THEN statement ELSE statement END_IF ';'
;
expression:
IDENTIFIER |
IDENTIFIER '('expressions')' |
LIT_INT |
LIT_REAL |
BOOL_OP |
LOG_NOT expression |
expression operator expression |
'('expression')'
;
expressions:
expression |
expressions ',' expression
;
operator:
REL_OP |
ADD_OP |
MULT_OP |
LOG_OR |
LOG_AND
;
When compiling, I get 10 shift/reduce conflicts:
5 conflicts are caused by the LOG_NOT expression rule:
State 45
25 expression: LOG_NOT expression .
26 | expression . operator expression
REL_OP shift, and go to state 48
ADD_OP shift, and go to state 49
MULT_OP shift, and go to state 50
LOG_OR shift, and go to state 51
LOG_AND shift, and go to state 52
REL_OP [reduce using rule 25 (expression)]
ADD_OP [reduce using rule 25 (expression)]
MULT_OP [reduce using rule 25 (expression)]
LOG_OR [reduce using rule 25 (expression)]
LOG_AND [reduce using rule 25 (expression)]
$default reduce using rule 25 (expression)
operator go to state 54
5 conflicts are caused by the expression operator expression rule:
State 62
26 expression: expression . operator expression
26 | expression operator expression .
REL_OP shift, and go to state 48
ADD_OP shift, and go to state 49
MULT_OP shift, and go to state 50
LOG_OR shift, and go to state 51
LOG_AND shift, and go to state 52
REL_OP [reduce using rule 26 (expression)]
ADD_OP [reduce using rule 26 (expression)]
MULT_OP [reduce using rule 26 (expression)]
LOG_OR [reduce using rule 26 (expression)]
LOG_AND [reduce using rule 26 (expression)]
$default reduce using rule 26 (expression)
operator go to state 54
I know that the problem has to do with precedence. For instance, if the expression was:
a + b * c
Does Bison shift after the a + and hope to find an expression, or does it reduce the a to an expression? I have a feeling that this is due to the Bison 1-token look-ahead limitation, but I can't figure out how to rewrite the rule(s) to resolve the conflicts.
My professor will take points off for shift/reduce conflicts, so I can't use %expect. My professor has also stated that we cannot use %left or %right precedence values.
This is my first post on Stack, so please let me know if I'm posting this all wrong. I've searched existing posts, but this really seems a case-by-case thing. If I use any code from Stack, I will note the source in my submitted project.
Thanks!
As written, your grammar is ambiguous. So it must have conflicts.
There is no inherent rule of binding precedences, and apparently you're not allowed to use bison's precedence declarations either. If you were allowed to, you wouldn't be able to use operator as a non-terminal, because you need to distinguish between
expr1 + expr2 * expr3         expr1 * expr2 + expr3
  |       |       |             |       |       |
  |       +---+---+             +---+---+       |
  |           |                     |           |
  |         expr                  expr          |
  |           |                     |           |
  +-----+-----+                     +-----+-----+
        |                                 |
      expr                              expr
And you cannot distinguish between them if + and * are replaced with operator. The terminals actually have to be visible.
Now, here's a quick clue:
expr1 + expr2 + expr3 reduces expr1 + expr2 first
expr1 * expr2 + expr3 reduces expr1 * expr2 first
So in non-terminal-1 + non-terminal-2, non-terminal-1 can produce x + y or x * y. But in non-terminal-1 * non-terminal-2, non-terminal-1 cannot produce x + y.
Thanks! I did some more troubleshooting, and fixed the shift/reduce conflicts by rewriting the expression and operator rules:
expression:
expression LOG_OR term1 |
term1
;
term1:
term1 LOG_AND term2 |
term2
;
term2:
term2 REL_OP term3 |
term3
;
term3:
term3 ADD_OP term4 |
term4
;
term4:
term4 MULT_OP factor |
factor
;
factor:
IDENTIFIER |
IDENTIFIER '('expressions')' |
LIT_INT |
LIT_REAL |
BOOL_OP |
LOG_NOT factor |
'('expression')'
;
expressions:
expression |
expressions ',' expression
;
I had to rearrange what was actually an expression and what was actually a factor. I made a factor rule that includes all of the operands (the terminals, plus function calls, LOG_NOT and parenthesized expressions), which has the highest precedence. I then made a term# rule for each binary operator, which also sets them at different precedence levels (term4 has higher precedence than term3, term3 has higher precedence than term2, etc.).
This allowed me to set each operator at a different precedence without using any of the built-in % precedence declarations.
I was able to parse all of my test input files without error. Any thoughts on the design?
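To see why the layered rules encode precedence, here is a minimal Python sketch of a recursive-descent parser that mirrors the same levels (expression, term1 to term4, factor). It is not Bison output; the function parse, the LEVELS table and the tuple-shaped tree are assumptions for illustration, tokens are (kind, text) pairs, and the function-call form of factor is left out for brevity.

```python
# Binary operator token kinds, loosest binding first, matching
# expression / term1 / term2 / term3 / term4 in the grammar above.
LEVELS = ["LOG_OR", "LOG_AND", "REL_OP", "ADD_OP", "MULT_OP"]


def parse(tokens):
    """Parse a list of (kind, text) tokens into a nested tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def level(i):
        if i == len(LEVELS):
            return factor()
        left = level(i + 1)
        # The left-recursive grammar rule becomes a loop: left-associative.
        while peek() == LEVELS[i]:
            op = advance()[1]
            left = (op, left, level(i + 1))
        return left

    def factor():
        kind = peek()
        if kind == "LOG_NOT":
            advance()
            return ("not", factor())
        if kind == "(":
            advance()
            inner = level(0)
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if kind in ("IDENTIFIER", "LIT_INT", "LIT_REAL", "BOOL_OP"):
            return advance()[1]
        raise SyntaxError(f"unexpected token {kind!r}")

    tree = level(0)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


tokens = [("IDENTIFIER", "a"), ("ADD_OP", "+"), ("IDENTIFIER", "b"),
          ("MULT_OP", "*"), ("IDENTIFIER", "c")]
print(parse(tokens))  # ('+', 'a', ('*', 'b', 'c'))
```

Because each level can only contain operators from tighter-binding levels on its right-hand side, a + b * c has exactly one parse, which is why the rewritten grammar has no conflicts.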
This is part of the y.output file:
state 65
15 Expression: Expression . "&&" Expression
16 | Expression . "<" Expression
17 | Expression . "+" Expression
18 | Expression . "-" Expression
19 | Expression . "*" Expression
20 | Expression . "[" Expression "]"
21 | Expression . "." "length"
22 | Expression . "." Identifier "(" Expression "," Expression ")"
25 | "!" Expression .
"[" shift, and go to state 67
"<" shift, and go to state 69
"+" shift, and go to state 70
"-" shift, and go to state 71
"*" shift, and go to state 72
"." shift, and go to state 73
"[" [reduce using rule 25 (Expression)]
"<" [reduce using rule 25 (Expression)]
"+" [reduce using rule 25 (Expression)]
"-" [reduce using rule 25 (Expression)]
"*" [reduce using rule 25 (Expression)]
"." [reduce using rule 25 (Expression)]
$default reduce using rule 25 (Expression)
This is how the precedence of the operators is set:
%left "&&"
%left '<'
%left '-' '+'
%left '*'
%right '!'
%left '.'
%left '(' ')'
%left '[' ']'
In bison, there is a difference between "x" and 'x'; they are not the same token. So, assuming you are using bison, your precedence declarations don't refer to the terminals in the productions.
Bison also allows %token definitions of the following form:
%token name quoted-string ...
For example (a short excerpt from bison's own grammar file):
%token
PERCENT_CODE "%code"
PERCENT_DEBUG "%debug"
PERCENT_DEFAULT_PREC "%default-prec"
PERCENT_DEFINE "%define"
PERCENT_DEFINES "%defines"
PERCENT_ERROR_VERBOSE "%error-verbose"
Once the symbols have been aliased, they can be used interchangeably in the grammar, making it possible to use the double-quoted string in productions; some people find such grammars easier to read. However, there is no mechanism to ensure that the lexer produces the correct token number for a double-quoted string since it only has access to the token names.
The "original" yacc, at least in the current "byacc" version maintained by Thomas Dickey, allows both single- and double-quoted token names, but does not distinguish between them; both "+" and '+' are mapped to token number 43 ('+'). It also does not provide any easy way to alias token names, so the double-quoted multi-character strings are not particularly easy to use in a reliable way.