I would like to write an LR(1) grammar in BNF form for the language described by these two rules from The Complete Syntax of Lua:
parlist ::= namelist [`,´ `...´] | `...´
namelist ::= Name {`,´ Name}
I have tried the following grammars, but according to the tool I am using, both are "not LR(1) due to shift-reduce conflict":
parlist ::= namelist
parlist ::= namelist , ...
parlist ::= ...
namelist ::= Name namelist1
namelist1 ::= , Name namelist1
namelist1 ::= <epsilon>
parlist ::= namelist
parlist ::= namelist , ...
parlist ::= ...
namelist ::= namelist1 Name
namelist1 ::= namelist1 Name ,
namelist1 ::= <epsilon>
Is there an LR(1) grammar, in BNF form, for this language?
Here's a simple one:
parlist ::= Name
parlist ::= ...
parlist ::= Name , parlist
Here's a slightly less simple one which has the advantage of being left-recursive:
parlist ::= namelist
parlist ::= namelist , ...
parlist ::= ...
namelist ::= Name
namelist ::= namelist , Name
The rather artificial distortion of the grammar using an artificial namelist1 non-terminal, which looks like it is taken out of an LL grammar, is just causing problems. (It's not making the grammar LL(1) either, although that could easily be done by left factoring the first alternative above.)
Here's a bison grammar with no conflicts. It's LALR(1), which makes it also LR(1):
%token ELIPSES COMMA NAME
%%
parlist : namelist parlistsuffix
| ELIPSES
;
parlistsuffix: COMMA ELIPSES
| /* epsilon */
;
namelist: namelist COMMA NAME
| NAME
;
Related
I'm trying to implement a context-free grammar for the language of logical operators with parentheses including operator precedence.
For example, the follows:
{1} or {2}
{1} and {2} or {3}
({1} or {2}) and {3}
not {1} and ({2} or {3})
...
I've started from the following grammar:
expr := control | expr and expr | expr or expr | not expr | (expr)
control := {d+}
To implement operator precedence and eliminate left recursion, I changed it in the following way:
S ::= expr
expr ::= control or expr1 | expr1
expr1 ::= control and expr2 | expr2
expr2 ::= not expr3 | expr3
expr3 ::= (expr) | expr | control
control := {d+}
But such grammar doesn't support examples like: ({1} or {2}) and {3} that contain 'and' / 'or' after parentheses.
For now, I have the following grammar:
S ::= expr
expr ::= control or expr1 | expr1
expr1 ::= control and expr2 | expr2
expr2 ::= not expr3 | expr3
expr3 ::= (expr) | (expr) expr4 | expr | control
expr4 :: = and expr | or expr
control := {d+}
Is this grammar correct?
Can it be simplified in some way?
Thanks!
To implement your operator precedence, you want just:
S ::= expr
expr ::= expr or expr1 | expr1
expr1 ::= expr1 and expr2 | expr2
expr2 ::= not expr2 | expr3
expr3 ::= (expr) | control
control := {d+}
This is left-recursive in order to be left-associative, as that's generally what you want (both for correctness and most parser-generators), but if you need to avoid left recursion for some reason, you can use a right-recursive, right associative grammar:
S ::= expr
expr ::= expr1 or expr | expr1
expr1 ::= expr2 and expr1 | expr2
expr2 ::= not expr2 | expr3
expr3 ::= (expr) | control
control := {d+}
as both and and or are associative operators, so left-vs-right doesn't matter.
In both cases, you can "simplify" it by folding expr3 and control into expr2:
expr2 ::= not expr2 | ( expr ) | {d+}
I used this tool to generate the SLR(1) parsing table for this LL(1)/LR(1) grammar (which generates a small subset of XML):
document ::= element EOF
element ::= < elementPrefix
elementPrefix ::= NAME attribute elementSuffix
attribute ::= NAME = STRING attribute
attribute ::= EPSILON
elementSuffix ::= > elementOrData endTag
elementSuffix ::= />
elementOrData ::= < elementPrefix elementOrData
elementOrData ::= DATA elementOrData
elementOrData ::= EPSILON
endTag ::= </ NAME >
The tool correctly generates the table and associated automaton, which suggests that the grammar is SLR(1). Is that really the case? I understand that every LR(0) grammar is also SLR(1), but I was not sure how that relates to LL(1)/LR(1) grammars.
LL(1) and SLR(1) are both subsets of LR(1). They don't have a simple relationship to each other.
Given following grammar:
comment "/*" "*/" ;
TInt. Type1 ::= "int" ;
TBool. Type1 ::= "bool" ;
coercions Type 1 ;
BTrue. BExp ::= "true" ;
BFalse. BExp ::= "false" ;
EOr. Exp ::= Exp "||" Exp1 ;
EAnd. Exp1 ::= Exp1 "&&" Exp2 ;
EEq. Exp2 ::= Exp2 "==" Exp3 ;
ENeq. Exp2 ::= Exp2 "!=" Exp3 ;
ELt. Exp3 ::= Exp3 "<" Exp4 ;
EGt. Exp3 ::= Exp3 ">" Exp4 ;
ELte. Exp3 ::= Exp3 "<=" Exp4 ;
EGte. Exp3 ::= Exp3 ">=" Exp4 ;
EAdd. Exp4 ::= Exp4 "+" Exp5 ;
ESub. Exp4 ::= Exp4 "-" Exp5 ;
EMul. Exp5 ::= Exp5 "*" Exp6 ;
EDiv. Exp5 ::= Exp5 "/" Exp6 ;
EMod. Exp5 ::= Exp5 "%" Exp6 ;
ENot. Exp6 ::= "!" Exp ;
EVar. Exp8 ::= Ident ;
EInt. Exp8 ::= Integer ;
EBool. Exp8 ::= BExp ;
EIver. Exp8 ::= "[" Exp "]" ;
coercions Exp 8 ;
Decl. Decl ::= Ident ":" Type ;
terminator Decl ";" ;
LIdent. Lvalue ::= Ident ;
SBlock. Stm ::= "{" [Decl] [Stm] "}" ;
SExp. Stm ::= Exp ";" ;
SWhile. Stm ::= "while" "(" Exp ")" Stm ;
SReturn. Stm ::= "return" Exp ";" ;
SAssign. Stm ::= Lvalue "=" Exp ";" ;
SPrint. Stm ::= "print" Exp ";" ;
SIf. Stm ::= "if" "(" Exp ")" "then" Stm "endif" ;
SIfElse. Stm ::= "if" "(" Exp ")" "then" Stm "else" Stm "endif" ;
terminator Stm "" ;
entrypoints Stm;
parser created with bnfc fails to parse
{ c = a; }
although it parses
c = a;
or
{ print a; c = a; }
I think it could be a problem that parser sees Ident and doesn't know whether it's declaration or statement, LR stuff etc (still one token of lookeahed should be enough??). However I couldn't find any note in BNFC documentation that would say that it doesn't work for all grammars.
Any ideas how to get this working?
I would think you would get a shift/reduce conflict report for that grammar, although where that error message shows up might well depend on which tool BNFC is using to generate the parser. As far as I know, all the backend tools have the same approach to dealing with shift/reduce conflicts, which is to (1) warn the user about the conflict, and then (2) resolve the conflict in favour of shifting.
The problematic production is this one: (I've left out type annotations to reduce clutter)
Stm ::= "{" [Decl] [Stm] "}" ;
Here, [Decl] and [Stm] are macros, which automatically produce definitions for the non-terminals with those names (or something equivalent which will be accepted by the backend tool). Specifically, the automatically-produced productions are:
[Decl] ::= /* empty */
| Decl ';' [Decl]
[Stm] ::= /* empty */
| Stm [Stm]
(The ; in the first rule is the result of a "terminator" declaration. I don't know why BNFC generates right-recursive rules, but that's how I interpret the reference manual -- after a very quick glance -- and I'm sure they have their reasons. For the purpose of this problem, it doesn't matter.
What's important is that both Decl and Stm can start with an Ident. So let's suppose we're parsing { id ..., which might be { id : ... or { id = ..., but we've only read the { and the lookahead token id. So there are two possibilities:
id is the start of a Decl. We should shift the Ident and go to the state which includes Decl → Ident • ':' Type
id is the start of a Stm. In this case, we need to reduce the production [Decl] → • before we shift Ident into a Stm production.
So we have a shift/reduce conflict, because we cannot see the second next token (either : or =). And, as mentioned above, shift usually wins in this case, so the LR(1) parser will commit itself to expect a Decl. Consequently, { a = b ; } will fail.
An LR(2) parser generator would do fine with this grammar, but those are much harder to find. (Modern bison can produce GLR parsers, which are even more powerful than LR(2) at the cost of a bit of extra compute time, but not the version required by the BNFC tool.)
Possible solutions
Allow declarations to be intermingled with statements. This one is my preference. It is simple, and many programmers expect to be able to declare a variable at first use rather than at the beginning of the enclosing block.
Make the declaration recognizable from the first token, either by putting the type first (as in C) or by adding a keyword such as var (as in Javascript):
Modify the grammar to defer the lookahead. It is always possible to find an LR(1) grammar for any LR(k) language (provided k is finite), but it can be tedious. An ugly but effective alternative is to continue the lexical scan until either a : or some other non-whitespace character is found, so that id : gets tokenized as IdentDefine or some such. (This is the solution used by bison, as it happens. It means that you can't put comments between an identifier and the following :, but there are few, if any, good reasons to put a comment in that context.
Hi i am writing a Parser for a Programming language my university uses, with jflex and Cup
I started with just the first basic structures such as Processes an Variable Declarations.
I get the following Errors
Warning : *** Shift/Reduce conflict found in state #4
between vardecls ::= (*)
and vardecl ::= (*) IDENT COLON vartyp SEMI
and vardecl ::= (*) IDENT COLON vartyp EQEQ INT SEMI
under symbol IDENT
Resolved in favor of shifting.
Warning : *** Shift/Reduce conflict found in state #2
between vardecls ::= (*)
and vardecl ::= (*) IDENT COLON vartyp SEMI
and vardecl ::= (*) IDENT COLON vartyp EQEQ INT SEMI
under symbol IDENT
Resolved in favor of shifting.
My Code in Cup looks like this :
non terminal programm;
non terminal programmtype;
non terminal vardecl;
non terminal vardecls;
non terminal processdecl;
non terminal processdecls;
non terminal vartyp;
programm ::= programmtype:pt vardecls:vd processdecls:pd
{: RESULT = new SolutionNode(pt, vd, pd); :} ;
programmtype ::= IDENT:v
{: RESULT = ProblemType.KA; :} ;
vardecls ::= vardecl:v1 vardecls:v2
{: v2.add(v1);
RESULT = v2; :}
|
{: ArrayList<VarDecl> list = new ArrayList<VarDecl>() ;
RESULT = list; :}
;
vardecl ::= IDENT:id COLON vartyp:vt SEMI
{: RESULT = new VarDecl(id, vt); :}
| IDENT:id COLON vartyp:vt EQEQ INT:i1 SEMI
{: RESULT = new VarDecl(id, vt, i1); :}
;
vartyp ::= INTEGER
{: RESULT = VarType.Integer ; :}
;
processdecls ::= processdecl:v1 processdecls:v2
{: v2.add(v1);
RESULT = v2; :}
| {: ArrayList<ProcessDecl> list = new ArrayList<ProcessDecl>() ;
RESULT = list; :}
;
processdecl ::= IDENT:id COLON PROCESS vardecls:vd BEGIN END SEMI
{: RESULT = new ProcessDecl(id, vd); :}
;
I Guess i get the Errors because the Process Declaration and the VariableDeclaration both start with Identifiers then a ":" and then either the Terminal PROCESS or a Terminal like INTEGER. If so i'd like to know how i can tell my Parser to look ahead a bit more. Or whatever Solution is possible.
Thanks for your answers.
Your diagnosis is absolutely correct. Because the parser cannot know whether IDENT starts a processdecl or a vardecl without two more lookahead tokens, it cannot know when it has just reduced a vardecl and is looking at an IDENT whether it is about to see another vardecl or a processdecl.
In the first case, it must just shift the IDENT as part of the following vardecl. In the second case, it needs to first reduce an empty vardecls and then successively reduce vardecls until it has constructed the complete list.
To get rid of the shift reduce conflict, you need to simplify the parser's decision-making.
The simplest solution is to allow the parser to accept declarations in any order. Then you end up with something like this:
program ::= program_type declaration_list ;
declaration_list ::=
var_declaration declaration_list
| process_declaration declaration_list
|
;
var_declaration_list ::=
var_declaration var_declaration_list
|
;
process_declaration ::=
IDENT:id COLON PROCESS var_declaration_list BEGIN END SEMI ;
(Personally, I'd make the declaration lists left-recursive rather than right-recursive, but it depends whether you prefer to append or prepend in the list's action. Left-recursion uses less parser stack.)
If you really want to insist that all variable declarations come before any process declaration, you can check for that in the action for declaration_list.
Alternatively, you can start by making both types of declaration list left-recursive instead of right recursive. That will almost work, but it will still generate a shift-reduce conflict in the same state as the original grammar, this time because it needs to reduce an empty process declaration list before the first process declaration can be reduced.
Fortunately, that's easier to work around. If the process declaration list cannot be empty, there is no problem, so it's just a question of rearranging the productions:
program ::= program_type var_declaration_list process_declaration_list
| program_type var_declaration_list
;
var_declaration_list ::=
var_declaration var_declaration_list
|
;
process_declaration_list ::=
process_declaration_list process_declaration
| process_declaration
;
Finally, an ugly but possible alternative is to make the variable declaration list left-recursive and the process declaration list right-recursive. In that case, there is no empty production between the last variable declaration and the first process declaration.
I need help to solve this one and explanation how to deal with this SHIFT/REDUCE CONFLICTS in future.
I have some conflicts between few states in my cup file.
Grammer look like this:
I have conflicts between "(" [ActPars] ")" states.
1. Statement = Designator ("=" Expr | "++" | "‐‐" | "(" [ActPars] ")" ) ";"
2. Factor = number | charConst | Designator [ "(" [ActPars] ")" ].
I don't want to paste whole 700 lines of cup file.
I will give you the relevant states and error output.
This is code for the line 1.)
Matched ::= Designator LPAREN ActParamsList RPAREN SEMI_COMMA
ActParamsList ::= ActPars
|
/* EPS */
;
ActPars ::= Expr
|
Expr ActPComma
;
ActPComma ::= COMMA ActPars;
This is for the line 2.)
Factor ::= Designator ActParamsOptional ;
ActParamsOptional ::= LPAREN ActParamsList2 RPAREN
|
/* EPS */
;
ActParamsList2 ::= ActPars
|
/* EPS */
;
Expr ::= SUBSTRACT Term RepOptionalExpression
|
Term RepOptionalExpression
;
The ERROR output looks like this:
Warning : *** Shift/Reduce conflict found in state #182
between ActParamsOptional ::= LPAREN ActParamsList RPAREN (*)
and Matched ::= Designator LPAREN ActParamsList RPAREN (*) SEMI_COMMA
under symbol SEMI_COMMA
Resolved in favor of shifting.
Error : * More conflicts encountered than expected -- parser generation aborted
I believe the problem is that your parser won't know if it should shift to the token:
SEMI_COMMA
or reduce to the token
ActParamsOptional
since the tokens defined in both ActParamsOptional and Matched are
LPAREN ActPars RPAREN