I'm attempting to write a grammar for C and am having an issue that I don't quite understand. Relevant portions of the grammar:
stmt :
types decl SEMI { marks (A.Declare ($1, $2)) (1, 2) }
| simp SEMI { marks $1 (1, 1) }
| RETURN exp SEMI { marks (A.Return $2) (1, 2) }
| control { $1 }
| block { marks $1 (1, 1) }
;
control :
if { $1 }
| WHILE RPAREN exp LPAREN stmt { marks (A.While ($3, $5)) (1, 5) }
| FOR LPAREN simpopt SEMI exp SEMI simpopt RPAREN stmt { marks (A.For ($3, $5, $7, $9)) (1, 9) }
;
if :
IF RPAREN exp LPAREN stmt { marks (A.If ($3, $5, None)) (1, 5) }
| IF RPAREN exp LPAREN stmt ELSE stmt { marks (A.If ($3, $5, $7)) (1, 7) }
;
This doesn't work. I ran ocamlyacc -v and got the following report:
83: shift/reduce conflict (shift 86, reduce 14) on ELSE
state 83
if : IF RPAREN exp LPAREN stmt . (14)
if : IF RPAREN exp LPAREN stmt . ELSE stmt (15)
ELSE shift 86
IF reduce 14
WHILE reduce 14
FOR reduce 14
BOOL reduce 14
IDENT reduce 14
RETURN reduce 14
INT reduce 14
MAIN reduce 14
LBRACE reduce 14
RBRACE reduce 14
LPAREN reduce 14
I've read that shift/reduce conflicts are due to ambiguity in the specification of the grammar, but I don't see how I can specify this in a way that isn't ambiguous?
The grammar is certainly ambiguous, although you know what every string means, and furthermore despite the fact that ocamlyacc reports a shift/reduce conflict, its generated grammar will also produce the correct parse for every valid input.
The ambiguity comes from
if ( exp1 ) if ( exp2) stmt1 else stmt2;
Clearly stmt1 only executes if both exp1 and exp2 are true. But does stmt1 execute if exp1 is false, or if exp1 is true and exp2 is false? Those represent different parses; the first (invalid) parse attaches else stmt2 to if (exp1), while the parse that you, I and ocamlyacc know to be correct attaches else stmt2 to if (exp2).
The grammar can be rewritten, although it's a bit of a nuisance. The basic idea is to divide statements into two categories: "matched" (which means that every else in the statement is matched with some if) and "unmatched" (which means that a following else would match some if in the statement. A complete statement may be unmatched, because else clauses are optional, but you can never have an unmatched statement between an if and an else, because that else must match an if in the unmatched statement.
The following grammar is basically the one you provided, but rewritten to use bison-style single-quoted tokens, which I find more readable. I don't know if ocamlyacc handles those. (By the way, your grammar says IF RPAREN exp LPAREN... which, with the common definition of left and right parentheses, would mean if ) exp (. That's one reason I find single-quoted character terminals much more readable.)
Bison handles this grammar with no conflicts.
/* Fake non-terminals */
%token types decl simp exp
/* Keywords */
%token ELSE FOR IF RETURN WHILE
%%
stmt: matched_stmt | unmatched_stmt ;
stmt_list: stmt | stmt_list stmt ;
block: '{' stmt_list '}' ;
matched_stmt
: types decl ';'
| simp ';'
| RETURN exp ';'
| block
| matched_control
;
simpopt : simp | /* EMPTY */;
matched_control
: IF '(' exp ')' matched_stmt ELSE matched_stmt
| WHILE '(' exp ')' matched_stmt
| FOR '(' simpopt ';' exp ';' simpopt ')' matched_stmt
;
unmatched_stmt
: IF '(' exp ')' stmt
| IF '(' exp ')' matched_stmt ELSE unmatched_stmt
| WHILE '(' exp ')' unmatched_stmt
| FOR '(' simpopt ';' exp ';' simpopt ')' unmatched_stmt
;
Personally, I'd refactor a bit. Eg:
if_prefix : IF '(' exp ')'
;
loop_prefix: WHILE '(' exp ')'
| FOR '(' simpopt ';' exp ';' simpopt ')'
;
matched_control
: if_prefix matched_stmt ELSE matched_stmt
| loop_prefix matched_stmt
;
unmatched_stmt
: if_prefix stmt
| if_prefix ELSE unmatched_stmt
| loop_prefix unmatched_stmt
;
A common and simpler but less rigorous solution is to use precedence declarations as suggested in the bison manual.
Related
I am using Bison, together with Flex, to try and parse a simple grammar that was provided to me. In this grammar (almost) everything is considered an expression and has some kind of value; there are no statements. What's more, the EBNF definition of the grammar comes with certain ambiguities:
expression OP expression where op may be '+', '-' '&' etc. This can easily be solved using bison's associativity operators and setting %left, %right and %nonassoc according to common language standards.
IF expression THEN expression [ELSE expression] as well as DO expression WHILE expression, for which ignoring the common case dangling else problem I want the following behavior:
In if-then-else as well as while expressions, the embedded expressions are taken to be as long as possible (allowed by the grammar). E.g 5 + if cond_expr then then_expr else 10 + 12 is equivalent to 5 + (if cond_expr then then_expr else (10 + 12)) and not 5 + (if cond_expr then then_expr else 10) + 12
Given that everything in the language is considered an expression, I cannot find a way to re-write the production rules in a form that does not cause conflicts. One thing I tried, drawing inspiration from the dangling else example in the bison manual was:
expression: long_expression
| short_expression
;
long_expression: short_expression
| IF long_expression THEN long_expression
| IF long_expression long_expression ELSE long_expression
| WHILE long_expression DO long_expression
;
short_expression: short_expression '+' short_expression
| short_expression '-' short_expression
...
;
However this does not seem to work and I cannot figure out how I could tweak it into working. Note that I (assume I) have resolved the dangling ELSE problem using nonassoc for ELSE and THEN and the above construct as suggested in some book, but I am not sure this is even valid in the case where there are not statements but only expressions. Note as well as that associativity has been set for all other operators such as +, - etc. Any solutions or hints or examples that resolve this?
----------- EDIT: MINIMAL EXAMPLE ---------------
I tried to include all productions with tokens that have specific associativity, including some extra productions to show a bit of the grammar. Notice that I did not actually use my idea explained above. Notice as well that I have included a single binary and unary operator just to make the code a bit shorter, the rules for all operators are of the same form. Bison with -Wall flag finds no conflicts with these declarations (but I am pretty sure they are not 100% correct).
%token<int> INT32 LET IF WHILE INTEGER OBJECTID TYPEID NEW
%right <str> THEN ELSE STR
%right '^' UMINUS NOT ISNULL ASSIGN DO IN
%left '+' '-'
%left '*' '/'
%left <str> AND '.'
%nonassoc '<' '='
%nonassoc <str> LOWEREQ
%type<ast_expr> expression
%type ...
exprlist: expression { ; }
| exprlist ';' expression { ; };
block: '{' exprlist '}' { ; };
args: %empty { ; }
| expression { ; }
| args ',' expression { ; };
expression: IF expression THEN expression { ; }
| IF expression THEN expression ELSE expression { ; }
| WHILE expression DO expression { ; }
| LET OBJECTID ':' type IN expression { ; }
| NOT expression { /* UNARY OPERATORS */ ; }
| expression '=' expression { /* BINARY OPERATORS */ ; }
| OBJECTID '(' args ')' { ; }
| expression '.' OBJECTID '(' args ')' { ; }
| NEW TYPEID { ; }
| STR { ; }
| INTEGER { ; }
| '(' ')' { ; }
| '(' expression ')' { ; }
| block { ; }
;
The following associativity declarations resolved all shift/reduce conflicts and produced the expected output (in all tests I could think of at least):
...
%right <str> THEN ELSE
%right DO IN
%right ASSIGN
%left <str> AND
%right NOT
%nonassoc '<' '=' LOWEREQ
%left '+' '-'
%left '*' '/'
%right UMINUS ISNULL
%right '^'
%left '.'
...
%%
...
expression: IF expression THEN expression
| IF expression THEN expression ELSE expression
| WHILE expression DO expression
| LET OBJECTID ':' type IN expression
| LET OBJECTID ':' type ASSIGN expression IN expression
| OBJECTID ASSIGN expression
...
| '-' expression %prec UMINUS
| expression '=' expression
...
| expression LOWEREQ expression
| OBJECTID '(' args ')'
...
...
Notice that the order of declaration of associativity and precedence rules for all symbols matters! I have not included all the production rules but if-else-then, while-do, let in, unary and binary operands are the ones that produced conflicts or wrong results with different associativity declarations.
I'm trying to write a little interpreter with GNU bison.
I wanted to ask if anyone could explain the difference between the directive% right and% left and where my mistake is in the code below.
%token <flo> FLO
%token <name> NAME
%right '='
%left '+' '-'
%left '*' '/' '%'
%left '&' '|' 'x'
%left NEG NOT LOGIC_NOT
%left '^'
%left ARG
%type <flo> exp
%%
language: /* nothing */
| language statment
statment: '\n'
| exp
| error { yyerrok; }
;
exp: FLO { $$ = $1; }
| NAME '(' ')' { $$ = ycall($1); }
| NAME '(' exp ')' { $$ = ycall($1, $3); }
| NAME '(' exp ',' exp ')' { $$ = ycall($1, $3, $5); }
| NAME '=' exp { $$ = 1; ysetvar($1, $3); }
| NAME %prec VAR { $$ = ygetvar($1); }
| '_' exp %prec ARG { $$ = ygetarg($2, args); }
| '(' exp ')' { $$ = $2; }
/* 1 Operand */
| '-' exp %prec NEG { $$ = - $2; }
| '~' exp %prec NOT { $$ = ~ static_cast<int>($2); }
| '!' exp %prec LOGIC_NOT { $$ = ! static_cast<int>($2); }
/* 2 Operands */
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| exp '%' exp { $$ = static_cast<int>($1) % static_cast<int>($3); }
| exp '^' exp { $$ = pow($1, $3); }
| exp '&' exp { $$ = static_cast<int>($1) & static_cast<int>($3); }
| exp '|' exp { $$ = static_cast<int>($1) | static_cast<int>($3); }
| exp 'x' exp { $$ = static_cast<int>($1) ^ static_cast<int>($3); }
;
Look at the y.output file produced by yacc or bison with the -v argument. The first conflict is in state 5:
State 5
7 exp: NAME . '(' ')'
8 | NAME . '(' exp ')'
9 | NAME . '(' exp ',' exp ')'
10 | NAME . '=' exp
11 | NAME .
'=' shift, and go to state 14
'(' shift, and go to state 15
'(' [reduce using rule 11 (exp)]
$default reduce using rule 11 (exp)
In this case the conflcit is when there's a '(' after a NAME -- this is an ambiguity in your grammar in which it might be a call expression, or it might be a simple NAME expression followed by a parenthesized expression, due to the fact that you have no separator between statements in your language.
The second conflict is:
State 13
4 statment: exp .
17 exp: exp . '+' exp
18 | exp . '-' exp
19 | exp . '*' exp
20 | exp . '/' exp
21 | exp . '%' exp
22 | exp . '^' exp
23 | exp . '&' exp
24 | exp . '|' exp
25 | exp . 'x' exp
'+' shift, and go to state 21
'-' shift, and go to state 22
'*' shift, and go to state 23
'/' shift, and go to state 24
'%' shift, and go to state 25
'&' shift, and go to state 26
'|' shift, and go to state 27
'x' shift, and go to state 28
'^' shift, and go to state 29
'-' [reduce using rule 4 (statment)]
$default reduce using rule 4 (statment)
which is essentially the same problem, this time with a '-' -- the input NAME - NAME might be a single binary subtract statements, or it might be two statements -- a NAME followed by a unary negate.
If you add a separator between statements (such as ;), both of these conflicts would go away.
to solve the dangling else problem, I used the following solution:
stmt : stmt_matched
| stmt_unmatched
;
stmt_unmatched : IF '(' exp ')' stmt
| IF '(' exp ')' stmt_matched ELSE stmt_unmatched
;
stmt_matched : IF '(' exp ')' stmt_matched ELSE stmt_matched
| stmt_for
| ...
;
For defining the rules of grammar about the for loop, I produce a conflict shift/reduce due to the same problem:
stmt_for : FOR '(' exp ';' exp ';' exp ')' stmt
;
How can I solve this problem?
Not all for statements are matched. Consider, for example
if (c) for (;;) if (d) ; else ;
So it is necessary to divide for statements into for_matched and for_unmatched. (And similarly with other compound statements such as while.)
I wrote a grammar in ANTLR for a Java-like if statement as follows:
if_statement
: 'if' expression
(statement | '{' statement+ '}')
('elif' expression (statement | '{' statement+ '}'))*
('else' (statement | '{' statement+ '}'))?
;
I've implemented the "statement" and "expression" correctly, but the if_statement is giving me the following error:
Decision can match input such as "'elif'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
|---> ('elif' expression (statement | '{' statement+ '}'))*
warning(200): /OptDB/src/OptDB/XL.g:38:9:
Decision can match input such as "'else'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
|---> ('else' (statement | '{' statement+ '}'))?
It seems like there are problems with the "elif" and "else" block.
Basically, we can have 0 or more "elif" blocks, so I wrapped them with *
Also we can have 0 or 1 "else" block, so I wrapped it it with ?.
What seems to cause the error?
========================================================================
I'll also put the implementations of "expression" and "statements":
statement
: assignment_statement
| if_statement
| while_statement
| for_statement
| function_call_statement
;
term
: IDENTIFIER
| '(' expression ')'
| INTEGER
| STRING_LITERAL
| CHAR_LITERAL
| IDENTIFIER '(' actualParameters ')'
;
negation
: 'not'* term
;
unary
: ('+' | '-')* negation
;
mult
: unary (('*' | '/' | 'mod') unary)*
;
add
: mult (('+' | '-') mult)*
;
relation
: add (('=' | '/=' | '<' | '<=' | '>=' | '>') add)*
;
expression
: relation (('and' | 'or') relation)*
;
actualParameters
: expression (',' expression)*
;
Because your grammar allows for statement block without being grouped by {...}, you've got yourself a classic dangling else ambiguity.
Short explanation. The input:
if expr1 if expr2 ... else ...
could be parsed as:
Parse 1
if expr1
if expr2
...
else
...
but also as this:
Parse 2
if expr1
if expr2
...
else
...
To eliminate the ambiguity, either change:
(statement | '{' statement+ '}')
into:
'{' statement+ '}'
// or
'{' statement* '}'
so that it's clear by looking at the braces to which if the else belongs to, or add a predicate to force the parser to choose Parse 1:
if_statement
: 'if' expression statement_block
(('elif')=> 'elif' expression statement_block)*
(('else')=> 'else' statement_block)?
;
statement_block
: '{' statement* '}'
| statement
;
I am writing a simple parser in bison. The parser checks whether a program has any syntax errors with respect to my following grammar:
%{
#include <stdio.h>
void yyerror (const char *s) /* Called by yyparse on error */
{
printf ("%s\n", s);
}
%}
%token tNUM tINT tREAL tIDENT tINTTYPE tREALTYPE tINTMATRIXTYPE
%token tREALMATRIXTYPE tINTVECTORTYPE tREALVECTORTYPE tTRANSPOSE
%token tIF tENDIF tDOTPROD tEQ tNE tGTE tLTE tGT tLT tOR tAND
%left "(" ")" "[" "]"
%left "<" "<=" ">" ">="
%right "="
%left "+" "-"
%left "*" "/"
%left "||"
%left "&&"
%left "==" "!="
%% /* Grammar rules and actions follow */
prog: stmtlst ;
stmtlst: stmt | stmt stmtlst ;
stmt: decl | asgn | if;
decl: type vars "=" expr ";" ;
type: tINTTYPE | tINTVECTORTYPE | tINTMATRIXTYPE | tREALTYPE | tREALVECTORTYPE
| tREALMATRIXTYPE ;
vars: tIDENT | tIDENT "," vars ;
asgn: tIDENT "=" expr ";" ;
if: tIF "(" bool ")" stmtlst tENDIF ;
expr: tIDENT | tINT | tREAL | vectorLit | matrixLit | expr "+" expr| expr "-" expr
| expr "*" expr | expr "/" expr| expr tDOTPROD expr | transpose ;
transpose: tTRANSPOSE "(" expr ")" ;
vectorLit: "[" row "]" ;
matrixLit: "[" row ";" rows "]" ;
row: value | value "," row ;
rows: row | row ";" rows ;
value: tINT | tREAL | tIDENT ;
bool: comp | bool tAND bool | bool tOR bool ;
comp: expr relation expr ;
relation: tGT | tLT | tGTE | tLTE | tNE | tEQ ;
%%
int main ()
{
if (yyparse()) {
// parse error
printf("ERROR\n");
return 1;
}
else {
// successful parsing
printf("OK\n");
return 0;
}
}
The code may look long and complicated, but i think what i am going to ask does not need the full code, but in any case i preferred to write the code. I am sure my grammar is correct, but ambiguous. When i try to create the executable of the program by writing "bison -d filename.y", i get an error saying that conflicts: 13 shift/reduce. I defined the precedence of the operators at the beginning of this file, and i tried a lot of combinations of these precedences, but i still get this error. How can i remove this ambiguity? Thank you
tOR, tAND, and tDOTPROD need to have their precedence specified as well.