Ply shift/reduce conflicts: dangling else and empty productions - parsing

I had lots of conflicts; most of them were due to arithmetic and relational operators with different precedences. But I still face some conflicts that I don't really know how to tackle. Some of them are below. I suspect that I should do epsilon elimination for stmtlist, but to be honest I'm not sure about it.
state 70:
state 70
(27) block -> LCB varlist . stmtlist RCB
(25) varlist -> varlist . vardec
(28) stmtlist -> . stmt
(29) stmtlist -> . stmtlist stmt
(30) stmtlist -> .
(15) vardec -> . type idlist SEMICOLON
(33) stmt -> . RETURN exp SEMICOLON
(34) stmt -> . exp SEMICOLON
(35) stmt -> . WHILE LRB exp RRB stmt
(36) stmt -> . FOR LRB exp SEMICOLON exp SEMICOLON exp RRB stmt
(37) stmt -> . IF LRB exp RRB stmt elseiflist
(38) stmt -> . IF LRB exp RRB stmt elseiflist ELSE stmt
(39) stmt -> . PRINT LRB ID RRB SEMICOLON
(40) stmt -> . block
(7) type -> . INTEGER
(8) type -> . FLOAT
(9) type -> . BOOLEAN
(44) exp -> . lvalue ASSIGN exp
(45) exp -> . exp SUM exp
(46) exp -> . exp MUL exp
(47) exp -> . exp SUB exp
(48) exp -> . exp DIV exp
(49) exp -> . exp MOD exp
(50) exp -> . exp AND exp
(51) exp -> . exp OR exp
(52) exp -> . exp LT exp
(53) exp -> . exp LE exp
(54) exp -> . exp GT exp
(55) exp -> . exp GE exp
(56) exp -> . exp NE exp
(57) exp -> . exp EQ exp
(58) exp -> . const
(59) exp -> . lvalue
(60) exp -> . ID LRB explist RRB
(61) exp -> . LRB exp RRB
(62) exp -> . ID LRB RRB
(63) exp -> . SUB exp
(64) exp -> . NOT exp
(27) block -> . LCB varlist stmtlist RCB
(31) lvalue -> . ID
(32) lvalue -> . ID LSB exp RSB
(72) const -> . INTEGERNUMBER
(73) const -> . FLOATNUMBER
(74) const -> . TRUE
(75) const -> . FALSE
! shift/reduce conflict for RETURN resolved as shift
! shift/reduce conflict for WHILE resolved as shift
! shift/reduce conflict for FOR resolved as shift
! shift/reduce conflict for IF resolved as shift
! shift/reduce conflict for PRINT resolved as shift
! shift/reduce conflict for ID resolved as shift
! shift/reduce conflict for LRB resolved as shift
! shift/reduce conflict for SUB resolved as shift
! shift/reduce conflict for NOT resolved as shift
! shift/reduce conflict for LCB resolved as shift
! shift/reduce conflict for INTEGERNUMBER resolved as shift
! shift/reduce conflict for FLOATNUMBER resolved as shift
! shift/reduce conflict for TRUE resolved as shift
! shift/reduce conflict for FALSE resolved as shift
RCB reduce using rule 30 (stmtlist -> .)
RETURN shift and go to state 99
WHILE shift and go to state 101
FOR shift and go to state 102
IF shift and go to state 103
PRINT shift and go to state 104
INTEGER shift and go to state 8
FLOAT shift and go to state 9
BOOLEAN shift and go to state 10
ID shift and go to state 31
LRB shift and go to state 36
SUB shift and go to state 34
NOT shift and go to state 37
LCB shift and go to state 45
INTEGERNUMBER shift and go to state 38
FLOATNUMBER shift and go to state 39
TRUE shift and go to state 40
FALSE shift and go to state 41
! RETURN [ reduce using rule 30 (stmtlist -> .) ]
! WHILE [ reduce using rule 30 (stmtlist -> .) ]
! FOR [ reduce using rule 30 (stmtlist -> .) ]
! IF [ reduce using rule 30 (stmtlist -> .) ]
! PRINT [ reduce using rule 30 (stmtlist -> .) ]
! ID [ reduce using rule 30 (stmtlist -> .) ]
! LRB [ reduce using rule 30 (stmtlist -> .) ]
! SUB [ reduce using rule 30 (stmtlist -> .) ]
! NOT [ reduce using rule 30 (stmtlist -> .) ]
! LCB [ reduce using rule 30 (stmtlist -> .) ]
! INTEGERNUMBER [ reduce using rule 30 (stmtlist -> .) ]
! FLOATNUMBER [ reduce using rule 30 (stmtlist -> .) ]
! TRUE [ reduce using rule 30 (stmtlist -> .) ]
! FALSE [ reduce using rule 30 (stmtlist -> .) ]
stmtlist shift and go to state 96
vardec shift and go to state 97
stmt shift and go to state 98
type shift and go to state 72
exp shift and go to state 100
block shift and go to state 105
lvalue shift and go to state 33
const shift and go to state 35
Here is a list of all productions:
program → declist main ( ) block
declist → dec | declist dec | ε
dec → vardec | funcdec
type → int | float | bool
iddec → id | id [ exp ] | id=exp
idlist → iddec | idlist , iddec
vardec → type idlist ;
funcdec → type id (paramdecs) block | void id (paramdecs) block
paramdecs → paramdecslist | ε
paramdecslist → paramdec | paramdecslist , paramdec
paramdec → type id | type id []
varlist → vardec | varlist vardec | ε
block → { varlist stmtlist }
stmtlist → stmt | stmtlist stmt | ε
lvalue → id | id [exp]
stmt → return exp ; | exp ;| block |
while (exp) stmt |
for(exp ; exp ; exp) stmt |
if (exp) stmt elseiflist | if (exp) stmt elseiflist else stmt |
print ( id) ;
elseiflist → elif (exp) stmt | elseiflist elif (exp) stmt | ε
exp → lvalue=exp | exp operator exp |exp relop exp|
const | lvalue | id(explist) | (exp) | id() | - exp | ! exp
operator → “||” | && | + | - | * | / | %
const → intnumber | floatnumber | true | false
relop → > | < | != | == | <= | >=
explist → exp | explist,exp
Another problem is the famous dangling else. I added ('nonassoc', 'IFP'), ('left', 'ELSE' , 'ELIF') to the precedence tuple and changed the grammar this way:
def p_stmt_5(self, p):
    """stmt : IF LRB exp RRB stmt elseiflist %prec IFP """
    print("""stmt : IF LRB exp RRB stmt elseiflist """)
def p_stmt_6(self, p):
    """stmt : IF LRB exp RRB stmt elseiflist ELSE stmt"""
    print("""stmt : IF LRB exp RRB stmt elseiflist else stmt """)
But it didn't make the conflict go away. Below is the state where the shift/reduce conflict happens.
state 130
(37) stmt -> IF LRB exp RRB stmt . elseiflist
(38) stmt -> IF LRB exp RRB stmt . elseiflist ELSE stmt
(41) elseiflist -> . ELIF LRB exp RRB stmt
(42) elseiflist -> . elseiflist ELIF LRB exp RRB stmt
(43) elseiflist -> .
! shift/reduce conflict for ELIF resolved as shift
ELIF shift and go to state 134
RCB reduce using rule 43 (elseiflist -> .)
RETURN reduce using rule 43 (elseiflist -> .)
WHILE reduce using rule 43 (elseiflist -> .)
FOR reduce using rule 43 (elseiflist -> .)
IF reduce using rule 43 (elseiflist -> .)
PRINT reduce using rule 43 (elseiflist -> .)
ID reduce using rule 43 (elseiflist -> .)
LRB reduce using rule 43 (elseiflist -> .)
SUB reduce using rule 43 (elseiflist -> .)
NOT reduce using rule 43 (elseiflist -> .)
LCB reduce using rule 43 (elseiflist -> .)
INTEGERNUMBER reduce using rule 43 (elseiflist -> .)
FLOATNUMBER reduce using rule 43 (elseiflist -> .)
TRUE reduce using rule 43 (elseiflist -> .)
FALSE reduce using rule 43 (elseiflist -> .)
ELSE reduce using rule 43 (elseiflist -> .)
! ELIF [ reduce using rule 43 (elseiflist -> .) ]
elseiflist shift and go to state 133
Finally, there are two more states with shift/reduce conflicts, which I list below:
state 45
(27) block -> LCB . varlist stmtlist RCB
(24) varlist -> . vardec
(25) varlist -> . varlist vardec
(26) varlist -> .
(15) vardec -> . type idlist SEMICOLON
(7) type -> . INTEGER
(8) type -> . FLOAT
(9) type -> . BOOLEAN
! shift/reduce conflict for INTEGER resolved as shift
! shift/reduce conflict for FLOAT resolved as shift
! shift/reduce conflict for BOOLEAN resolved as shift
RETURN reduce using rule 26 (varlist -> .)
WHILE reduce using rule 26 (varlist -> .)
FOR reduce using rule 26 (varlist -> .)
IF reduce using rule 26 (varlist -> .)
PRINT reduce using rule 26 (varlist -> .)
ID reduce using rule 26 (varlist -> .)
LRB reduce using rule 26 (varlist -> .)
SUB reduce using rule 26 (varlist -> .)
NOT reduce using rule 26 (varlist -> .)
LCB reduce using rule 26 (varlist -> .)
INTEGERNUMBER reduce using rule 26 (varlist -> .)
FLOATNUMBER reduce using rule 26 (varlist -> .)
TRUE reduce using rule 26 (varlist -> .)
FALSE reduce using rule 26 (varlist -> .)
RCB reduce using rule 26 (varlist -> .)
INTEGER shift and go to state 8
FLOAT shift and go to state 9
BOOLEAN shift and go to state 10
! INTEGER [ reduce using rule 26 (varlist -> .) ]
! FLOAT [ reduce using rule 26 (varlist -> .) ]
! BOOLEAN [ reduce using rule 26 (varlist -> .) ]
varlist shift and go to state 70
vardec shift and go to state 71
type shift and go to state 72
And:
state 0
(0) S' -> . program
(1) program -> . declist MAIN LRB RRB block
(2) declist -> . dec
(3) declist -> . declist dec
(4) declist -> .
(5) dec -> . vardec
(6) dec -> . funcdec
(15) vardec -> . type idlist SEMICOLON
(16) funcdec -> . type ID LRB paramdecs RRB block
(17) funcdec -> . VOID ID LRB paramdecs RRB block
(7) type -> . INTEGER
(8) type -> . FLOAT
(9) type -> . BOOLEAN
! shift/reduce conflict for VOID resolved as shift
! shift/reduce conflict for INTEGER resolved as shift
! shift/reduce conflict for FLOAT resolved as shift
! shift/reduce conflict for BOOLEAN resolved as shift
MAIN reduce using rule 4 (declist -> .)
VOID shift and go to state 7
INTEGER shift and go to state 8
FLOAT shift and go to state 9
BOOLEAN shift and go to state 10
! VOID [ reduce using rule 4 (declist -> .) ]
! INTEGER [ reduce using rule 4 (declist -> .) ]
! FLOAT [ reduce using rule 4 (declist -> .) ]
! BOOLEAN [ reduce using rule 4 (declist -> .) ]
program shift and go to state 1
declist shift and go to state 2
dec shift and go to state 3
vardec shift and go to state 4
funcdec shift and go to state 5
type shift and go to state 6
Thank you so much in advance.

There are actually two somewhat related problems here, both having to do with ambiguity induced by duplicate base cases in recursive productions:
1. Ambiguity in stmtlist
First, as you imply, there is a problem with stmtlist. Your grammar for stmtlist is:
stmtlist → stmt | stmtlist stmt | ε
which has two base cases: stmtlist → stmt and stmtlist → ε. This duplication means that a single stmt can be parsed in two ways:
stmtlist → stmt
stmtlist → stmtlist stmt → ε stmt
Grammatical ambiguities always manifest as conflicts. To eliminate the conflict, eliminate the ambiguity. If you want stmtlist to be possibly empty, use:
stmtlist → stmtlist stmt | ε
If you want to insist that stmtlist contains at least one stmt, use:
stmtlist → stmtlist stmt | stmt
Above all, try to understand the logic of the above suggestion.
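To make the duplicate base case concrete, one can count parse trees. The following is a minimal sketch in plain Python (it does not use PLY; the function name count_parses and the labels "empty" and "single" are invented for this illustration) that counts how many distinct parse trees of stmtlist yield exactly n statements, depending on which base cases the grammar contains:

```python
from functools import lru_cache


def count_parses(n, bases):
    """Count parse trees of `stmtlist` that derive exactly n stmts.

    `bases` uses hypothetical labels to select the base cases:
      "empty"  -> stmtlist : <empty>
      "single" -> stmtlist : stmt
    The recursive rule  stmtlist : stmtlist stmt  is always present.
    """
    bases = frozenset(bases)

    @lru_cache(maxsize=None)
    def trees(k):
        total = 0
        if k == 0 and "empty" in bases:
            total += 1              # stmtlist : <empty>
        if k == 1 and "single" in bases:
            total += 1              # stmtlist : stmt
        if k >= 1:
            total += trees(k - 1)   # stmtlist : stmtlist stmt
        return total

    return trees(n)


if __name__ == "__main__":
    for n in range(4):
        print(n,
              count_parses(n, {"empty", "single"}),  # grammar from the question
              count_parses(n, {"empty"}),            # possibly-empty variant
              count_parses(n, {"single"}))           # at-least-one variant
```

With both base cases present, every non-empty list has two parse trees, which is the ambiguity the parser generator reports as conflicts. With only one base case there is at most one tree for any n.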
In addition, you allow stmt to be empty. It should be obvious that this is going to lead to an ambiguity in stmtlist because it is impossible to know how many empty stmts there are in a list. It could be 3; it could be 42; it could be eight million. Empty is invisible.
The potential nothingness of stmt also creates an ambiguity with those compound statements which end with stmt, such as "while" '(' exp ')' stmt. If stmt could be nothing, then
while (x) while(y) c;
could be two statements: while(x) with an empty repeated statement, and then while(y) with a loop on c;. Or it could have the (probably expected) meaning of a while(x) loop whose repeated statement is a nested while(y) c;. I would suggest that no-one would expect the first interpretation and that the grammar should not allow it. If you wanted an empty while target, you would use ; as the repeated statement, not nothing.
I'm sure you didn't intend that a stmt can be nothing. It makes lots of sense to allow the empty statement written as ; (that is, an emptiness followed by a semicolon), but that's obviously a different syntax. (Inside {…} you might want to allow nothing, rather than insisting on a semicolon. To achieve that, you need an empty stmtlist, not an empty stmt.)
2. Dangling else: actually an ambiguity in elseiflist
I think this is the grammar you are using:
(37) stmt -> "if" '(' exp ')' stmt elseiflist %prec IFP
(38) stmt -> "if" '(' exp ')' stmt elseiflist "else" stmt
(41) elseiflist -> "elif" '(' exp ')' stmt
(42) elseiflist -> elseiflist "elif" '(' exp ')' stmt
(43) elseiflist ->
Just as with the stmtlist production, elseiflist is a recursive production with two base cases, one of which is redundant. Again, it is necessary to decide whether or not elseiflist can really be empty (Hint: it can be), and then to remove one or the other of the base cases to avoid an ambiguous parse.
Having said that, I don't think that's the best way of writing the grammar for an if statement; the parse tree it builds might not be quite as you expect. But I guess it will work.

Related

Understand potential conflicts

I have a small expression parser built with Menhir. I'm trying to recover expressions with incomplete parentheses during parsing by writing recovery rules in parser.mly:
%{
open AST
%}
%token<int> LINT
%token<string> ID
%token LPAREN RPAREN COMMA
%token EOF PLUS STAR EQ
%start<AST.expression> expressionEOF
%right LPAREN RPAREN
%nonassoc EQ
%left PLUS
%left STAR
%%
expressionEOF: e=expression EOF
{
e
}
expression:
| x=LINT
{
Int x
}
| x=identifier
{
Read x
}
| e1=expression b=binop e2=expression
{
Binop (b, e1, e2)
}
| e1=expression b=binop
(* for "2+", "2*3+" *)
{
Binop (b, e1, FakeExpression)
}
| LPAREN e=expression RPAREN
{
Paren e
}
| LPAREN RPAREN
(* for "()" *)
{
Paren FakeExpression
}
| LPAREN
(* for "(" *)
{
ParenMissingRparen FakeExpression
}
| LPAREN e=expression
(* for "(1", "(1+2", "(1+2*3", "((1+2)" *)
{
ParenMissingRparen e
}
| RPAREN
(* for ")" *)
{
ExtraRparen FakeExpression
}
| e=expression RPAREN
(* for "3)", "4))", "2+3)" *)
{
ExtraRparen e
}
%inline binop:
PLUS { Add }
| STAR { Mul }
| EQ { Equal }
identifier: x=ID
{
Id x
}
It works fine on a set of incomplete expressions. However, menhir --explain parser.mly produces the following parser.conflicts:
** Conflict (reduce/reduce) in state 10.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:
LPAREN expression RPAREN
** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)
expressionEOF
expression EOF
expression STAR expression // lookahead token appears
(?)
** In state 10, looking ahead at STAR, reducing production
** expression -> LPAREN expression RPAREN
** is permitted because of the following sub-derivation:
LPAREN expression RPAREN .
** In state 10, looking ahead at STAR, reducing production
** expression -> expression RPAREN
** is permitted because of the following sub-derivation:
LPAREN expression // lookahead token is inherited
expression RPAREN .
** Conflict (reduce/reduce) in state 3.
** Tokens involved: STAR RPAREN PLUS EQ EOF
** The following explanations concentrate on token STAR.
** This state is reached from expressionEOF after reading:
LPAREN RPAREN
** The derivations that appear below have the following common factor:
** (The question mark symbol (?) represents the spot where the derivations begin to differ.)
expressionEOF
expression EOF
expression STAR expression // lookahead token appears
(?)
** In state 3, looking ahead at STAR, reducing production
** expression -> LPAREN RPAREN
** is permitted because of the following sub-derivation:
LPAREN RPAREN .
** In state 3, looking ahead at STAR, reducing production
** expression -> RPAREN
** is permitted because of the following sub-derivation:
LPAREN expression // lookahead token is inherited
RPAREN .
I don't understand what it is trying to explain. Could anyone tell me what the potential conflicts are (preferably with an example) and what the solutions would be?
You have:
expr: '(' expr ')'
| '(' expr
| expr ')'
So, you want ( x ) to match the first rule:
expr
-> '(' expr ')' (rule 1)
Which it does. But it also matches another way:
expr
-> expr ')' (rule 3)
-> '(' expr ')' (rule 2)
And it also matches like this:
expr
-> '(' expr (rule 2)
-> '(' expr ')' (rule 3)
Since you also let expr match ( and ), ( ) can also be matched several ways, including as expr ')' (with expr -> '('), or '(' expr (with expr -> ')').
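The three derivations of ( x ) listed above can be checked mechanically. Here is a minimal sketch in Python (not Menhir) that counts parse trees for a token list under a simplified version of the grammar: only the three parenthesis rules plus a single atom x. The bare ( and ) alternatives and the binary operators are left out, and count_parses is a name made up for this illustration:

```python
from functools import lru_cache


def count_parses(tokens):
    """Count parse trees for `tokens` under the simplified grammar

        expr : 'x' | '(' expr ')' | '(' expr | expr ')'
    """
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of ways expr can derive tokens[i:j].
        length = j - i
        if length <= 0:
            return 0
        total = 0
        if length == 1 and tokens[i] == "x":
            total += 1                      # expr : 'x'
        if length >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += trees(i + 1, j - 1)    # expr : '(' expr ')'
        if length >= 2 and tokens[i] == "(":
            total += trees(i + 1, j)        # expr : '(' expr
        if length >= 2 and tokens[j - 1] == ")":
            total += trees(i, j - 1)        # expr : expr ')'
        return total

    return trees(0, len(tokens))
```

count_parses("( x )".split()) evaluates to 3, one tree per derivation shown above, while "( x" and "x )" each have exactly one. Any input with more than one tree is one the LR automaton cannot decide deterministically, hence the reduce/reduce conflicts.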
The "solution" is to give up trying to add recognition of invalid sentences. The parse should fail on a syntax error; once it fails you can try to use Menhir's error recovery mechanism to produce an error message and continue the parse. See section 11 of the manual.

How do I perform 'lookahead' in an OCaml lexer / how do I rollback a lexeme?

Well, I'm writing my first parser, in OCaml, and I immediately somehow managed to make one with an infinite loop.
Of particular note, I'm trying to lex identifiers according to the rules of the Scheme specification (I have no idea what I'm doing, obviously), and there's some language in there about identifiers requiring that they are followed by a delimiter. My approach, right now, is to have a delimited_identifier regex that includes one of the delimiter characters, that should not be consumed by the main lexer … and then once that's been matched, the reading of that lexeme is reverted by Sedlexing.rollback (well, my wrapper thereof), before being passed to a sublexer that only eats the actual identifier, hopefully leaving the delimiter in the buffer to be eaten as a different lexeme by the parent lexer.
I'm using Menhir and Sedlex, mostly synthesizing the examples from #smolkaj's ocaml-parsing example-repo and RWO's parsing chapter; here's the simplest reduction of my current parser and lexer:
%token LPAR RPAR LVEC APOS TICK COMMA COMMA_AT DQUO SEMI EOF
%token <string> IDENTIFIER
(* %token <bool> BOOL *)
(* %token <int> NUM10 *)
(* %token <string> STREL *)
%start <Parser.AST.t> program
%%
program:
| p = list(expression); EOF { p }
;
expression:
| i = IDENTIFIER { Parser.AST.Atom i }
%%
… and …
(** Regular expressions *)
let newline = [%sedlex.regexp? '\r' | '\n' | "\r\n" ]
let whitespace = [%sedlex.regexp? ' ' | newline ]
let delimiter = [%sedlex.regexp? eof | whitespace | '(' | ')' | '"' | ';' ]
let digit = [%sedlex.regexp? '0'..'9']
let letter = [%sedlex.regexp? 'A'..'Z' | 'a'..'z']
let special_initial = [%sedlex.regexp?
'!' | '$' | '%' | '&' | '*' | '/' | ':' | '<' | '=' | '>' | '?' | '^' | '_' | '~' ]
let initial = [%sedlex.regexp? letter | special_initial ]
let special_subsequent = [%sedlex.regexp? '+' | '-' | '.' | '#' ]
let subsequent = [%sedlex.regexp? initial | digit | special_subsequent ]
let peculiar_identifier = [%sedlex.regexp? '+' | '-' | "..." ]
let identifier = [%sedlex.regexp? initial, Star subsequent | peculiar_identifier ]
let delimited_identifier = [%sedlex.regexp? identifier, delimiter ]
(** Swallow whitespace and comments. *)
let rec swallow_atmosphere buf =
match%sedlex buf with
| Plus whitespace -> swallow_atmosphere buf
| ";" -> swallow_comment buf
| _ -> ()
and swallow_comment buf =
match%sedlex buf with
| newline -> swallow_atmosphere buf
| any -> swallow_comment buf
| _ -> assert false
(** Return the next token. *)
let rec token buf =
swallow_atmosphere buf;
match%sedlex buf with
| eof -> EOF
| delimited_identifier ->
Sedlexing.rollback buf;
identifier buf
| '(' -> LPAR
| ')' -> RPAR
| _ -> illegal buf (Char.chr (next buf))
and identifier buf =
match%sedlex buf with
| _ -> IDENTIFIER (Sedlexing.Utf8.lexeme buf)
(Yes, it's basically a no-op / the simplest thing possible rn. I'm trying to learn! :x)
Unfortunately, this combination results in an infinite loop in the parsing automaton:
State 0:
Lookahead token is now IDENTIFIER (1-1)
Shifting (IDENTIFIER) to state 1
State 1:
Lookahead token is now IDENTIFIER (1-1)
Reducing production expression -> IDENTIFIER
State 5:
Shifting (IDENTIFIER) to state 1
State 1:
Lookahead token is now IDENTIFIER (1-1)
Reducing production expression -> IDENTIFIER
State 5:
Shifting (IDENTIFIER) to state 1
State 1:
...
I'm new to parsing and lexing and all this; any advice would be welcome. I'm sure it's just a newbie mistake, but …
Thanks!
As said before, implementing too much logic inside the lexer is a bad idea.
However, the infinite loop does not come from the rollback but from your definition of identifier:
identifier buf =
match%sedlex buf with
| _ -> IDENTIFIER (Sedlexing.Utf8.lexeme buf)
Within this definition, _ matches the shortest possible word in the language consisting of all possible characters. In other words, _ always matches the empty word μ without consuming any part of its input, sending the parser into an infinite loop.
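The same failure mode can be reproduced outside sedlex. The following is only an analogy, written with Python's re module rather than OCaml: a tokenizer loop whose pattern can match the empty string never advances its position. The function name tokenize and the max_tokens safety cap are assumptions added so that the demonstration terminates:

```python
import re


def tokenize(text, pattern, max_tokens=10):
    """Repeatedly match `pattern` at the current position.

    `max_tokens` is a safety cap so the demonstration terminates.
    Returns (tokens, final position).
    """
    pos = 0
    tokens = []
    while pos < len(text) and len(tokens) < max_tokens:
        match = pattern.match(text, pos)
        if match is None:
            raise ValueError(f"no token at position {pos}")
        tokens.append(match.group())
        pos = match.end()  # an empty match leaves pos unchanged
    return tokens, pos


good = re.compile(r"[a-z]+|\s+")  # every match consumes at least one character
empty = re.compile(r"")           # always matches the empty string

if __name__ == "__main__":
    print(tokenize("ab cd", good))
    print(tokenize("ab cd", empty))
```

With the good pattern the loop consumes the whole input. With the empty pattern it produces empty tokens until the cap is hit and the position is still 0, which mirrors the parser trace in the question where the same IDENTIFIER is delivered over and over.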

Attempting to resolve shift-reduce parsing issue

I'm attempting to write a grammar for C and am having an issue that I don't quite understand. Relevant portions of the grammar:
stmt :
types decl SEMI { marks (A.Declare ($1, $2)) (1, 2) }
| simp SEMI { marks $1 (1, 1) }
| RETURN exp SEMI { marks (A.Return $2) (1, 2) }
| control { $1 }
| block { marks $1 (1, 1) }
;
control :
if { $1 }
| WHILE RPAREN exp LPAREN stmt { marks (A.While ($3, $5)) (1, 5) }
| FOR LPAREN simpopt SEMI exp SEMI simpopt RPAREN stmt { marks (A.For ($3, $5, $7, $9)) (1, 9) }
;
if :
IF RPAREN exp LPAREN stmt { marks (A.If ($3, $5, None)) (1, 5) }
| IF RPAREN exp LPAREN stmt ELSE stmt { marks (A.If ($3, $5, $7)) (1, 7) }
;
This doesn't work. I ran ocamlyacc -v and got the following report:
83: shift/reduce conflict (shift 86, reduce 14) on ELSE
state 83
if : IF RPAREN exp LPAREN stmt . (14)
if : IF RPAREN exp LPAREN stmt . ELSE stmt (15)
ELSE shift 86
IF reduce 14
WHILE reduce 14
FOR reduce 14
BOOL reduce 14
IDENT reduce 14
RETURN reduce 14
INT reduce 14
MAIN reduce 14
LBRACE reduce 14
RBRACE reduce 14
LPAREN reduce 14
I've read that shift/reduce conflicts are due to ambiguity in the specification of the grammar, but I don't see how I can specify this in a way that isn't ambiguous?
The grammar is certainly ambiguous, although you know what every string means; furthermore, despite the fact that ocamlyacc reports a shift/reduce conflict, its generated parser will also produce the correct parse for every valid input.
The ambiguity comes from
if ( exp1 ) if ( exp2) stmt1 else stmt2;
Clearly stmt1 only executes if both exp1 and exp2 are true. But does stmt2 execute if exp1 is false, or if exp1 is true and exp2 is false? Those represent different parses; the first (invalid) parse attaches else stmt2 to if (exp1), while the parse that you, I and ocamlyacc know to be correct attaches else stmt2 to if (exp2).
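Resolving the conflict in favour of the shift amounts to attaching each else to the nearest if that does not yet have one. A minimal hand-written sketch of that behaviour in Python (this is not ocamlyacc output; conditions and simple statements are reduced to single words without parentheses, and parse_stmt is a name made up for this illustration):

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos].

    Simplified grammar (an assumption for this sketch):
        stmt : 'if' COND stmt
             | 'if' COND stmt 'else' stmt
             | WORD
    Returns (tree, next position). An 'if' node is the tuple
    ("if", condition, then_branch, else_branch_or_None).
    """
    if tokens[pos] == "if":
        cond = tokens[pos + 1]
        then_branch, pos = parse_stmt(tokens, pos + 2)
        else_branch = None
        # Greedy choice: take the 'else' now, as the innermost open 'if'.
        # This is what "shift on ELSE" does in the generated parser.
        if pos < len(tokens) and tokens[pos] == "else":
            else_branch, pos = parse_stmt(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return tokens[pos], pos + 1


if __name__ == "__main__":
    tree, _ = parse_stmt("if exp1 if exp2 stmt1 else stmt2".split())
    print(tree)
```

For the example input, the resulting tree is ("if", "exp1", ("if", "exp2", "stmt1", "stmt2"), None): the else belongs to the inner if, and the outer if has no else branch.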
The grammar can be rewritten, although it's a bit of a nuisance. The basic idea is to divide statements into two categories: "matched" (which means that every else in the statement is matched with some if) and "unmatched" (which means that a following else would match some if in the statement). A complete statement may be unmatched, because else clauses are optional, but you can never have an unmatched statement between an if and an else, because that else must match an if in the unmatched statement.
The following grammar is basically the one you provided, but rewritten to use bison-style single-quoted tokens, which I find more readable. I don't know if ocamlyacc handles those. (By the way, your grammar says IF RPAREN exp LPAREN... which, with the common definition of left and right parentheses, would mean if ) exp (. That's one reason I find single-quoted character terminals much more readable.)
Bison handles this grammar with no conflicts.
/* Fake non-terminals */
%token types decl simp exp
/* Keywords */
%token ELSE FOR IF RETURN WHILE
%%
stmt: matched_stmt | unmatched_stmt ;
stmt_list: stmt | stmt_list stmt ;
block: '{' stmt_list '}' ;
matched_stmt
: types decl ';'
| simp ';'
| RETURN exp ';'
| block
| matched_control
;
simpopt : simp | /* EMPTY */;
matched_control
: IF '(' exp ')' matched_stmt ELSE matched_stmt
| WHILE '(' exp ')' matched_stmt
| FOR '(' simpopt ';' exp ';' simpopt ')' matched_stmt
;
unmatched_stmt
: IF '(' exp ')' stmt
| IF '(' exp ')' matched_stmt ELSE unmatched_stmt
| WHILE '(' exp ')' unmatched_stmt
| FOR '(' simpopt ';' exp ';' simpopt ')' unmatched_stmt
;
Personally, I'd refactor a bit. Eg:
if_prefix : IF '(' exp ')'
;
loop_prefix: WHILE '(' exp ')'
| FOR '(' simpopt ';' exp ';' simpopt ')'
;
matched_control
: if_prefix matched_stmt ELSE matched_stmt
| loop_prefix matched_stmt
;
unmatched_stmt
: if_prefix stmt
| if_prefix matched_stmt ELSE unmatched_stmt
| loop_prefix unmatched_stmt
;
A common and simpler but less rigorous solution is to use precedence declarations as suggested in the bison manual.

Shift/Reduce conflicts in a propositional logic parser in Happy

I'm making a simple propositional logic parser in Happy, based on this BNF definition of the propositional logic grammar. This is my code:
{
module FNC where
import Data.Char
import System.IO
}
-- Parser name, token types and error function name:
--
%name parse Prop
%tokentype { Token }
%error { parseError }
-- Token list:
%token
var { TokenVar $$ } -- alphabetic identifier
or { TokenOr }
and { TokenAnd }
'¬' { TokenNot }
"=>" { TokenImp } -- Implication
"<=>" { TokenDImp } --double implication
'(' { TokenOB } --open bracket
')' { TokenCB } --closing bracket
'.' {TokenEnd}
%left "<=>"
%left "=>"
%left or
%left and
%left '¬'
%left '(' ')'
%%
--Grammar
Prop :: {Sentence}
Prop : Sentence '.' {$1}
Sentence :: {Sentence}
Sentence : AtomSent {Atom $1}
| CompSent {Comp $1}
AtomSent :: {AtomSent}
AtomSent : var { Variable $1 }
CompSent :: {CompSent}
CompSent : '(' Sentence ')' { Bracket $2 }
| Sentence Connective Sentence {Bin $2 $1 $3}
| '¬' Sentence {Not $2}
Connective :: {Connective}
Connective : and {And}
| or {Or}
| "=>" {Imp}
| "<=>" {DImp}
{
--Error function
parseError :: [Token] -> a
parseError _ = error ("parseError: Syntax analysis error.\n")
--Data types to represent the grammar
data Sentence
= Atom AtomSent
| Comp CompSent
deriving Show
data AtomSent = Variable String deriving Show
data CompSent
= Bin Connective Sentence Sentence
| Not Sentence
| Bracket Sentence
deriving Show
data Connective
= And
| Or
| Imp
| DImp
deriving Show
--Data types for the tokens
data Token
= TokenVar String
| TokenOr
| TokenAnd
| TokenNot
| TokenImp
| TokenDImp
| TokenOB
| TokenCB
| TokenEnd
deriving Show
--Lexer
lexer :: String -> [Token]
lexer [] = [] -- empty string
lexer (c:cs) -- the string is a character, c, followed by characters, cs.
| isSpace c = lexer cs
| isAlpha c = lexVar (c:cs)
| isSymbol c = lexSym (c:cs)
| c== '(' = TokenOB : lexer cs
| c== ')' = TokenCB : lexer cs
| c== '¬' = TokenNot : lexer cs --solved
| c== '.' = [TokenEnd]
| otherwise = error "lexer: Token invalido"
lexVar cs =
case span isAlpha cs of
("or",rest) -> TokenOr : lexer rest
("and",rest) -> TokenAnd : lexer rest
(var,rest) -> TokenVar var : lexer rest
lexSym cs =
case span isSymbol cs of
("=>",rest) -> TokenImp : lexer rest
("<=>",rest) -> TokenDImp : lexer rest
}
Now, I have two problems here:
1. For some reason I get 4 shift/reduce conflicts. I don't really know where they might be, since I thought the precedence would solve them (and I think I followed the BNF grammar correctly)...
2. (This is rather a Haskell problem.) In my lexer function, for some reason I get parsing errors on the line where I say what to do with '¬'; if I remove that line it works. Why could that be? (This issue is solved.)
Any help would be great.
If you use happy with -i it will generate an info file. The file lists all the states that your parser has. It will also list all the possible transitions for each state. You can use this information to determine if the shift/reduce conflict is one you care about.
Information about invoking happy and conflicts:
http://www.haskell.org/happy/doc/html/sec-invoking.html
http://www.haskell.org/happy/doc/html/sec-conflict-tips.html
Below is some of the output of -i. I've removed all but State 17. You'll want to get a copy of this file so that you can properly debug the problem. What you see here is just to help talk about it:
-----------------------------------------------------------------------------
Info file generated by Happy Version 1.18.10 from FNC.y
-----------------------------------------------------------------------------
state 17 contains 4 shift/reduce conflicts.
-----------------------------------------------------------------------------
Grammar
-----------------------------------------------------------------------------
%start_parse -> Prop (0)
Prop -> Sentence '.' (1)
Sentence -> AtomSent (2)
Sentence -> CompSent (3)
AtomSent -> var (4)
CompSent -> '(' Sentence ')' (5)
CompSent -> Sentence Connective Sentence (6)
CompSent -> '¬' Sentence (7)
Connective -> and (8)
Connective -> or (9)
Connective -> "=>" (10)
Connective -> "<=>" (11)
-----------------------------------------------------------------------------
Terminals
-----------------------------------------------------------------------------
var { TokenVar $$ }
or { TokenOr }
and { TokenAnd }
'¬' { TokenNot }
"=>" { TokenImp }
"<=>" { TokenDImp }
'(' { TokenOB }
')' { TokenCB }
'.' { TokenEnd }
-----------------------------------------------------------------------------
Non-terminals
-----------------------------------------------------------------------------
%start_parse rule 0
Prop rule 1
Sentence rules 2, 3
AtomSent rule 4
CompSent rules 5, 6, 7
Connective rules 8, 9, 10, 11
-----------------------------------------------------------------------------
States
-----------------------------------------------------------------------------
State 17
CompSent -> Sentence . Connective Sentence (rule 6)
CompSent -> Sentence Connective Sentence . (rule 6)
or shift, and enter state 12
(reduce using rule 6)
and shift, and enter state 13
(reduce using rule 6)
"=>" shift, and enter state 14
(reduce using rule 6)
"<=>" shift, and enter state 15
(reduce using rule 6)
')' reduce using rule 6
'.' reduce using rule 6
Connective goto state 11
-----------------------------------------------------------------------------
Grammar Totals
-----------------------------------------------------------------------------
Number of rules: 12
Number of terminals: 9
Number of non-terminals: 6
Number of states: 19
That output basically says that it runs into a bit of ambiguity when it's looking at connectives. It turns out, the slides you linked mention this (Slide 11), "ambiguities are resolved through precedence ¬∧∨⇒⇔ or parentheses".
At this point, I would recommend looking at the shift/reduce conflicts and your desired precedences to see if the parser you have will do the right thing. If so, then you can safely ignore the warnings. If not, you have more work for yourself.
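One way to decide what "the right thing" is for a given input is to compare the parser's output against a small reference implementation. The sketch below is a precedence-climbing parser in Python, not Happy. It assumes the binding order quoted from the slides (negation tightest, then and, or, =>, <=>), treats all binary connectives as left-associative as the %left declarations in the question do, and spells negation as the word not to stay in ASCII. All function names are made up for this illustration:

```python
# Higher number = binds tighter (assumed order: and > or > => > <=>).
PRECEDENCE = {"<=>": 1, "=>": 2, "or": 3, "and": 4}


def parse(tokens):
    """Parse a full token list into a nested tuple tree."""
    tree, pos = parse_binary(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


def parse_binary(tokens, pos, min_prec):
    """Parse binary connectives whose precedence is at least min_prec."""
    left, pos = parse_unary(tokens, pos)
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # "+ 1" makes the operator left-associative.
        right, pos = parse_binary(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos


def parse_unary(tokens, pos):
    """Parse a variable, a negation, or a parenthesised sentence."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "not":
        operand, pos = parse_unary(tokens, pos + 1)
        return ("not", operand), pos
    if tok == "(":
        tree, pos = parse_binary(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok == ")" or tok in PRECEDENCE:
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1
```

For example, parse("a or b and c".split()) gives ("or", "a", ("and", "b", "c")). If the Happy parser groups the same input differently, the conflicts in state 17 are not being resolved the way the precedence list intends.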
I can answer No. 2:
| c== '¬' == TokenNot : lexer cs --problem here
--        ^^
You have a == there where you should have a =.

Creating Bison File for Simple Grammar

I have the following simple grammar:
E -> T | ^ v . E
T -> F T1
T1 -> F T1 | epsilon
F -> ( E ) | v
I'm pretty new to Bison, so I was hoping someone could help show me how to write it out in that format. All I have so far is the following, but I'm not sure if it's correct:
%left '.'
%left 'v'
%% /* The grammar follows. */
exp:
term {printf("1");}
| '^' 'v' '.' exp {printf("2");}
;
term:
factor term1 {printf("3");}
;
term1:
factor term1 {printf("4");}
| {printf("5");}
;
factor:
'(' exp ')' {printf("6");}
| 'v' {printf("7");}
;
%%
You are missing the closing semicolon from several of the productions. There's nothing in the source grammar to suggest you need the productions about lines.
