I'm trying to generate the parser table using the lemon parser generator, but the .out file generated when I run lemon grammar.y only contains the states of the automaton.
Is there a way to also get the goto table for non-terminals, not only the states of the automaton?
Or can this only be done by reading the generated code?
Are there any other tools that can generate both the action and the goto tables?
PS:
The .out file (generated by lemon) for a simple grammar looks like this:
State 0:
start ::= * e
e ::= * e PLUS t
e ::= * t
t ::= * t MUL f
t ::= * f
f ::= * LPAR e RPAR
f ::= * ID
LPAR shift 1
ID shift 4
start accept
e shift 11
t shift 6
f shift 5
State 1:
e ::= * e PLUS t
e ::= * t
t ::= * t MUL f
t ::= * f
f ::= * LPAR e RPAR
f ::= LPAR * e RPAR
f ::= * ID
LPAR shift 1
ID shift 4
e shift 10
t shift 6
f shift 5
State 2:
e ::= e PLUS * t
t ::= * t MUL f
t ::= * f
f ::= * LPAR e RPAR
f ::= * ID
LPAR shift 1
ID shift 4
t shift 9
f shift 5
State 3:
t ::= t MUL * f
f ::= * LPAR e RPAR
f ::= * ID
LPAR shift 1
ID shift 4
f shift 8
State 4:
(6) f ::= ID *
$ reduce 6 f ::= ID
PLUS reduce 6 f ::= ID
MUL reduce 6 f ::= ID
RPAR reduce 6 f ::= ID
State 5:
(4) t ::= f *
$ reduce 4 t ::= f
PLUS reduce 4 t ::= f
MUL reduce 4 t ::= f
RPAR reduce 4 t ::= f
State 6:
(2) e ::= t *
t ::= t * MUL f
$ reduce 2 e ::= t
PLUS reduce 2 e ::= t
MUL shift 3
RPAR reduce 2 e ::= t
State 7:
(5) f ::= LPAR e RPAR *
$ reduce 5 f ::= LPAR e RPAR
PLUS reduce 5 f ::= LPAR e RPAR
MUL reduce 5 f ::= LPAR e RPAR
RPAR reduce 5 f ::= LPAR e RPAR
State 8:
(3) t ::= t MUL f *
$ reduce 3 t ::= t MUL f
PLUS reduce 3 t ::= t MUL f
MUL reduce 3 t ::= t MUL f
RPAR reduce 3 t ::= t MUL f
State 9:
(1) e ::= e PLUS t *
t ::= t * MUL f
$ reduce 1 e ::= e PLUS t
PLUS reduce 1 e ::= e PLUS t
MUL shift 3
RPAR reduce 1 e ::= e PLUS t
State 10:
e ::= e * PLUS t
f ::= LPAR e * RPAR
PLUS shift 2
RPAR shift 7
State 11:
(0) start ::= e *
e ::= e * PLUS t
$ reduce 0 start ::= e
PLUS shift 2
----------------------------------------------------
Symbols:
0: $:
1: PLUS
2: MUL
3: LPAR
4: RPAR
5: ID
6: error:
7: start: LPAR ID
8: e: LPAR ID
9: t: LPAR ID
10: f: LPAR ID
Lemon outputs the action table and the goto table in a single block. The goto function looks like shift actions, except that the lookahead is a non-terminal rather than a terminal.
So if we take State 0:
LPAR shift 1
ID shift 4
start accept
e shift 11
t shift 6
f shift 5
The first two lines are the actions on reading LPAR and ID, respectively. The remaining lines are the goto function, which is used when a reduce action reveals this state by popping the stack. (Unlike a traditional LR machine, in Lemon the accept action is in the goto table rather than in the action table for the end-of-input pseudo-terminal.)
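To make the split concrete, here is a minimal Python sketch (not part of Lemon) that reads a listing in the format shown above and separates the rows into an action table and a goto table. It assumes the listing format from the question, where every row is `symbol shift N`, `symbol reduce N ...` or `symbol accept`, and it relies on Lemon's naming convention that terminal names start with an upper-case letter. The names `split_tables`, `SAMPLE` and `VERBS` are made up for this example.

```python
import re

# A fragment of the listing from the question (parts of State 0 and State 4).
SAMPLE = """\
State 0:
start ::= * e
f ::= * ID
LPAR shift 1
ID shift 4
start accept
e shift 11
t shift 6
f shift 5
State 4:
(6) f ::= ID *
$ reduce 6 f ::= ID
PLUS reduce 6 f ::= ID
"""

# Assumption: these are the only verbs that appear in the listing.
VERBS = {"shift", "reduce", "accept"}

def split_tables(text):
    """Return (action, goto) dicts keyed by (state, symbol)."""
    action, goto = {}, {}
    state = None
    for line in text.splitlines():
        tokens = line.split()
        header = re.fullmatch(r"State (\d+):", line.strip())
        if header:
            state = int(header.group(1))
        elif state is not None and len(tokens) >= 2 and tokens[1] in VERBS:
            symbol, verb = tokens[0], tokens[1]
            target = int(tokens[2]) if len(tokens) > 2 else None
            # Terminals start with an upper-case letter; "$" is the
            # end-of-input pseudo-terminal. Everything else is a non-terminal.
            is_terminal = symbol == "$" or symbol[0].isupper()
            table = action if is_terminal else goto
            table[(state, symbol)] = (verb, target)
    return action, goto

action, goto = split_tables(SAMPLE)
```

Item lines such as `f ::= * ID` are skipped because their second token is not one of the verbs, so only the rows below the items end up in the two tables.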
Although most descriptions of the LR parser distinguish between the action table and the goto table, there is very little difference between a "shift" action and the "goto" part of a reduce action. Both of these push the current state number and a symbol onto the parser stack. The difference is that a reduce action (such as reduce 6, which means "reduce using production 6" -- it has nothing to do with state 6) first pops the right-hand side of the indicated production off of the stack and sets the current state to the newly-revealed state on the top of the stack before executing the shift/goto. (Another difference is that after a shift action, it is necessary to read a new lookahead token, whereas the reduce action does not consume the input.)
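The description above can be turned into a small table-driven driver. The following Python sketch hard-codes the rows from the listing in the question; it is not code that Lemon generates, and names such as `ACTION`, `GOTO`, `RULES` and `parse` are only for illustration. It keeps a stack of state numbers only and returns the production numbers in the order they are reduced.

```python
# (left-hand side, length of right-hand side) for productions 0..6.
RULES = [("start", 1), ("e", 3), ("e", 1), ("t", 3),
         ("t", 1), ("f", 3), ("f", 1)]

OPEN = {"LPAR": ("shift", 1), "ID": ("shift", 4)}

def reduce_all(n):
    """Row that reduces by production n on $, PLUS, MUL and RPAR."""
    return {la: ("reduce", n) for la in ("$", "PLUS", "MUL", "RPAR")}

# Rows whose symbol is a terminal.
ACTION = {
    0: OPEN, 1: OPEN, 2: OPEN, 3: OPEN,
    4: reduce_all(6), 5: reduce_all(4),
    6: {**reduce_all(2), "MUL": ("shift", 3)},
    7: reduce_all(5), 8: reduce_all(3),
    9: {**reduce_all(1), "MUL": ("shift", 3)},
    10: {"PLUS": ("shift", 2), "RPAR": ("shift", 7)},
    11: {"$": ("reduce", 0), "PLUS": ("shift", 2)},
}

# Rows whose symbol is a non-terminal (note the accept in state 0).
GOTO = {
    0: {"start": "accept", "e": 11, "t": 6, "f": 5},
    1: {"e": 10, "t": 6, "f": 5},
    2: {"t": 9, "f": 5},
    3: {"f": 8},
}

def parse(tokens):
    """Return the production numbers used, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]
    stack = [0]                 # state numbers only, symbols omitted
    pos, used = 0, []
    while True:
        lookahead = tokens[pos]
        act = ACTION[stack[-1]].get(lookahead)
        if act is None:
            raise SyntaxError(f"unexpected {lookahead} in state {stack[-1]}")
        kind, arg = act
        if kind == "shift":
            stack.append(arg)
            pos += 1            # a shift consumes the lookahead
        else:
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]   # pop the right-hand side
            used.append(arg)
            target = GOTO[stack[-1]][lhs]     # goto from the revealed state
            if target == "accept":
                return used
            stack.append(target)              # a reduce consumes no input
```

For the input `ID PLUS ID MUL ID` this returns `[6, 4, 2, 6, 4, 6, 3, 1, 0]`. The final `reduce 0` pops back to state 0, and the accept is found in that state's goto row for `start`.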
Related
I'm trying to make an expression evaluator in Haskell:
data Parser i o
  = Success o [i]
  | Failure String [i]
  | Parser
      {parse :: [i] -> Parser i o}
data Operator = Add | Sub | Mul | Div | Pow
data Expr
  = Op Operator Expr Expr
  | Val Double
expr :: Parser Char Expr
expr = add_sub
  where
    add_sub = calc Add '+' mul_div <|> calc Sub '-' mul_div <|> mul_div
    mul_div = calc Mul '*' pow <|> calc Div '/' pow <|> pow
    pow = calc Pow '^' factor <|> factor
    factor = parens <|> val
    val = Val <$> parseDouble
    parens = parseChar '(' *> expr <* parseChar ')'
    calc c o p = Op c <$> (p <* parseChar o) <*> p
My problem is that when I try to evaluate an expression with two operators of the same priority (e.g. 1+1-1), the parser fails.
How can I say that an add_sub can be an operation between two other add_subs without creating an infinite loop?
As explained by @chi, the problem is that calc used p twice, which doesn't allow for patterns like mul_div + ... | mul_div - ... | ...
I just changed the definition of calc to:
calc c o p p2 = Op c <$> (p <* parseChar o) <*> p2
where p2 is the current priority level (mul_div in the definition of mul_div).
It works much better, but the order of calculations is backwards:
2/3/4 is parsed as 2/(3/4) instead of (2/3)/4
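One common way to get left-to-right grouping without left recursion is to parse one operand and then loop over `operator operand` pairs, folding each pair into the tree built so far (Parsec's chainl1 combinator works along these lines). Below is a rough Python sketch of that idea, not a drop-in fix for the Haskell code above: `number`, `chain_left` and the tuple-shaped tree are invented for the illustration, and `^` and parentheses are left out.

```python
def number(s, i):
    """Parse a run of digits at s[i:]; return (value, next_index) or None."""
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return (float(s[i:j]), j) if j > i else None

def chain_left(operand, ops):
    """Build a parser for `operand (op operand)*` that groups to the left.

    `ops` maps an operator character to the tag stored in the tree.
    """
    def parser(s, i):
        first = operand(s, i)
        if first is None:
            return None
        left, i = first
        while i < len(s) and s[i] in ops:
            nxt = operand(s, i + 1)
            if nxt is None:
                break                      # dangling operator: stop here
            right, j = nxt
            # The tree built so far becomes the LEFT child of the new node.
            left, i = (ops[s[i]], left, right), j
        return left, i
    return parser

# One level per priority, lowest priority outermost.
mul_div = chain_left(number, {"*": "Mul", "/": "Div"})
add_sub = chain_left(mul_div, {"+": "Add", "-": "Sub"})

tree, rest = add_sub("2/3/4", 0)
```

Here `tree` is `("Div", ("Div", 2.0, 3.0), 4.0)`, i.e. (2/3)/4, because each new operator wraps everything parsed before it.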
What is an unambiguous grammar equivalent to the following ambiguous grammar for a language of expressions with let and addition?
E ⇒ let id = E in E
E ⇒ E + E
E ⇒ num
The ambiguity should be solved so that:
addition is left associative
addition has higher precedence than let expressions when it appears on the right
addition has lower precedence than let expressions when it appears on the left
Using braces to show the grouping of sub-expressions, the following illustrates how expressions should be interpreted:
num + num + num => { num + num } + num
let id = num in num + num => let id = num in { num + num }
num + let id = num in num => num + { let id = num in num }
Consider the expression
E1 + E2
E1 cannot have the form let ID = E3 because let ID = E3 + E2 must be parsed as let ID = (E3 + E2). This restriction is recursive: it also cannot have the form E4 + let ID = E3.
E2 can have the form let ID = E3 but it cannot have the form E3 + E4 (because E1 + E3 + E4 must be parsed as (E1 + E3) + E4). Only E1 can have the form E3 + E4.
It's straight-forward (but repetitive) to translate these restrictions to BNF:
Expr ⇒ Sum
Sum ⇒ SumNoLet '+' Atom
| Atom
SumNoLet ⇒ SumNoLet '+' AtomNoLet
| AtomNoLet
AtomNoLet ⇒ num
| id
| '(' Expr ')'
Atom ⇒ AtomNoLet
| 'let' id '=' Expr
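As a sanity check, the grammar above can be written as a small recursive-descent parser. The Python sketch below is only an illustration under some assumptions: tokens are separated by spaces, any token that is not a keyword or punctuation counts as num or id, and the let form is taken exactly as written in the BNF above ('let' id '=' Expr). The left-recursive SumNoLet rule becomes a loop, and the name `parse` and the tuple-shaped tree are made up here.

```python
def parse(text):
    """Parse per the Sum/SumNoLet/Atom grammar; return a nested tuple."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "let":                 # Atom => 'let' id '=' Expr
            name = take()
            take("=")
            return ("let", name, expr())
        if tok == "(":                   # AtomNoLet => '(' Expr ')'
            inner = expr()
            take(")")
            return inner
        if tok in ("+", ")", "="):
            raise SyntaxError(f"unexpected {tok!r}")
        return tok                       # num or id

    def expr():
        # Sum => SumNoLet '+' Atom | Atom, written as a loop. The body of
        # a let is itself an Expr, so it swallows every following '+';
        # a let can therefore only end up as the LAST operand of a sum.
        left = atom()
        while peek() == "+":
            take("+")
            left = ("+", left, atom())
        return left

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

With this sketch, `1 + 2 + 3` groups to the left, `let x = 1 + 2` keeps the whole sum inside the let, and `1 + let x = 2 + 3` puts the let (including its sum) on the right of the outer addition.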
To make the pattern clearer, we can add the * operator:
Expr ⇒ Sum
Sum ⇒ SumNoLet '+' Prod
| Prod
SumNoLet ⇒ SumNoLet '+' ProdNoLet
| ProdNoLet
Prod ⇒ ProdNoLet '*' Atom
| Atom
ProdNoLet ⇒ ProdNoLet '*' AtomNoLet
| AtomNoLet
AtomNoLet ⇒ num
| id
| '(' Expr ')'
Atom ⇒ AtomNoLet
| 'let' id '=' Expr
It is possible to implement this in bison (or other similar parser generators) using precedence declarations. But the precedence solution is harder to reason about, and can be confusing to incorporate into more complicated grammars.
I have the following simple grammar:
E -> T | ^ v . E
T -> F T1
T1 -> F T1 | epsilon
F -> ( E ) | v
I'm pretty new to Bison, so I was hoping someone could help show me how to write it out in that format. All I have so far is the following, but I'm not sure if it's correct:
%left '.'
%left 'v'
%% /* The grammar follows. */
exp:
term {printf("1");}
| '^' 'v' '.' exp {printf("2");}
;
term:
factor term1 {printf("3");}
;
term1:
factor term1 {printf("4");}
| {printf("5");}
;
factor:
'(' exp ')' {printf("6");}
| 'v' {printf("7");}
;
%%
You are missing the closing semicolon from several of the productions. There's nothing in the source grammar to suggest you need the productions about lines.
I'm trying to use SableCC to generate a parser for models, which I call LAM. LAMs themselves are simple, and a simple grammar (where I omit a lot of things) for them is:
L := 0 | (x,y) | F(x1,...,xn) | L || L | L ; L
I wrote this grammar:
Helpers
number = ['0' .. '9'] ;
letter = ['a' .. 'z'] ;
uletter = ['A' .. 'Z'] ;
Tokens
zero = '0' ;
comma = ',' ;
parallel = '||' ;
point = ';' ;
lpar = '(' ;
rpar = ')' ;
identifier = letter+ number* ;
uidentifier = uletter+ number* ;
Productions
expr = {term} term |
{parallel} expr parallel term |
{point} expr point term;
term = {parenthesis} lpar expr rpar |
{zero} zero |
{invk} uidentifier lpar paramlist rpar |
{pair} lpar [left]:identifier comma [right]:identifier rpar ;
paramlist = {list} list |
{empty} ;
list = {var} identifier |
{com} identifier comma list ;
This basically works, but there is a side effect: it is left associative. For example, if I have
L = L1 || L2 ; L3 || L4
Then it is parsed like:
L = ((L1 || L2) ; L3) || L4
I want to give all precedence to the ";" operator, and so have L parsed like
L = (L1 || L2) ; (L3 || L4)
(Other operators, like "||", can remain left-associative.)
My questions are:
Are there tips for doing such conversions in an "automated" way?
What would a grammar that gives all the precedence to ";" look like?
An "RTFM" link is also an acceptable answer :-D
Thank you all
You need to create a hierarchy of rules that matches the desired operator precedence.
expr = {subexp} subexp |
{parallel} subexp parallel expr ;
subexp = {term} term |
{point} term point subexp;
Note that I also changed the associativity.
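To see what grouping a rule hierarchy like this produces, here is a small Python sketch of the same two-level shape (not SableCC code). The function name `parse` and its `loose`/`tight` parameters are hypothetical, terms are cut down to bare names and parenthesised expressions, and tokens must be separated by spaces. The operator handled in the inner rule binds more tightly, and both rules recurse on the right, as in the rules above.

```python
def parse(text, loose, tight):
    """Parse `text` with `tight` binding more strongly than `loose`."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def term():                      # term = name | '(' expr ')'
        tok = advance()
        if tok != "(":
            return tok
        inner = expr()
        if advance() != ")":
            raise SyntaxError("expected ')'")
        return inner

    def subexp():                    # subexp = term | term TIGHT subexp
        left = term()
        if peek() == tight:
            advance()
            return (tight, left, subexp())
        return left

    def expr():                      # expr = subexp | subexp LOOSE expr
        left = subexp()
        if peek() == loose:
            advance()
            return (loose, left, expr())
        return left

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree

as_written = parse("L1 || L2 ; L3 || L4", loose="||", tight=";")
swapped = parse("L1 || L2 ; L3 || L4", loose=";", tight="||")
```

With `loose="||"` and `tight=";"` (the rules as written) the input groups as L1 || ((L2 ; L3) || L4). Swapping the two operators in the hierarchy gives (L1 || L2) ; (L3 || L4).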
I am supposed to make a parser for a language with the following grammar:
Program ::= Stmts "return" Expr ";"
Stmts ::= Stmt Stmts
| ε
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= Choice Choices
| Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Shift "<<" integer
| Shift ">>" integer
| Term
Term ::= Term "+" Prod
| Term "-" Prod
| Prod
Prod ::= Prod "*" Prim
| Prim
Prim ::= ident
| integer
| "(" Expr ")"
With the following data type for Expr:
data Expr = Var Ident
          | Val Int
          | Lshift Expr Int
          | Rshift Expr Int
          | Plus Expr Expr
          | Minus Expr Expr
          | Mult Expr Expr
          deriving (Eq, Show, Read)
My problem is implementing the Shift operator, because I get the following error when I encounter a left or right shift:
unexpected ">"
expecting operator or ";"
Here is the code I have for Expr:
expr = try (exprOp)
   <|> exprShift
exprOp = buildExpressionParser arithmeticalOps prim <?> "arithmetical expression"
prim :: Parser Expr
prim = new_ident <|> new_integer <|> pE <?> "primitive expression"
  where
    new_ident = do {i <- ident; return $ Var i }
    new_integer = do {i <- first_integer; return $ Val i }
    pE = parens expr
arithmeticalOps = [ [binary "*" Mult AssocLeft],
                    [binary "+" Plus AssocLeft, binary "-" Minus AssocLeft]
                  ]
binary name fun assoc = Infix (do{ reservedOp name; return fun }) assoc
exprShift =
  do
    e <- expr
    a <- aShift
    i <- first_integer
    return $ a e i
aShift = (reservedOp "<<" >> return Lshift)
     <|> (reservedOp ">>" >> return Rshift)
I suspect the problem is concerning lookahead, but I can't seem to figure it out.
Here's a grammar with left recursion eliminated (untested). Stmts and Choices can be simplified with Parsec's many and many1. The other recursive productions have to be expanded:
Program ::= Stmts "return" Expr ";"
Stmts ::= #many# Stmt
Stmt ::= ident "=" Expr ";"
| "{" Stmts "}"
| "for" ident "=" Expr "to" Expr Stmt
| "choice" "{" Choices "}"
Choices ::= #many1# Choice
Choice ::= integer ":" Stmt
Expr ::= Shift
Shift ::= Term ShiftRest
ShiftRest ::= <empty>
| "<<" integer
| ">>" integer
Term ::= Prod TermRest
TermRest ::= <empty>
| "+" Term
| "-" Term
Prod ::= Prim ProdRest
ProdRest ::= <empty>
| "*" Prod
Prim ::= ident
| integer
| "(" Expr ")"
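One thing to watch: if the tree is built directly from a right-recursive Rest production such as TermRest ::= "-" Term, then a - b - c groups as a - (b - c). A hand-written parser can keep the left-to-right grouping of the original grammar by looping instead of recursing. The Python sketch below illustrates that for the expression part only; it is not Parsec code, the tuple tags merely echo the constructors in the question, and `tokenize` and `parse_expr` are names invented here. It also loops on the shift operators, so it accepts chains like 1 << 2 >> 3, as the original left-recursive Shift rule does.

```python
import re

TOKEN = re.compile(r"\s*(<<|>>|\d+|\w+|[-+*()])")

def tokenize(text):
    """Split the input into operators, integers, identifiers and parens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"bad input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_expr(text):
    toks = tokenize(text)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def advance():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return toks[pos - 1]

    def prim():                      # Prim ::= ident | integer | "(" Expr ")"
        tok = advance()
        if tok == "(":
            inner = shift()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.isdigit():
            return ("Val", int(tok))
        if tok.isidentifier():
            return ("Var", tok)
        raise SyntaxError(f"unexpected {tok!r}")

    def prod():                      # Prod ::= Prim ("*" Prim)*
        left = prim()
        while peek() == "*":
            advance()
            left = ("Mult", left, prim())
        return left

    def term():                      # Term ::= Prod (("+" | "-") Prod)*
        left = prod()
        while peek() in ("+", "-"):
            tag = "Plus" if advance() == "+" else "Minus"
            left = (tag, left, prod())
        return left

    def shift():                     # Shift ::= Term (("<<" | ">>") integer)*
        left = term()
        while peek() in ("<<", ">>"):
            tag = "Lshift" if advance() == "<<" else "Rshift"
            amount = advance()
            if not amount.isdigit():
                raise SyntaxError("shift amount must be an integer")
            left = (tag, left, int(amount))
        return left

    tree = shift()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree
```

Each loop wraps the tree built so far as the left operand of the next operator, which is what the left-recursive productions in the original grammar describe.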
Edit - "Part Two"
"empty" (in angle brackets) is the empty production; you were using epsilon in the original post, but I don't know its Unicode code point and didn't think to copy-paste it.
Here's an example of how I would code the grammar. Note: unlike in the grammar I posted, the empty versions must always be the last choice, to give the other productions a chance to match. Also, your datatypes and constructors for the abstract syntax tree probably differ from the guesses I've made, but it should be fairly clear what's going on. The code is untested; hopefully any errors are obvious:
shift :: Parser Expr
shift = do
    t <- term
    leftShift t <|> rightShift t <|> emptyShift t
-- Note - this gets an Expr passed in - it is the "prefix"
-- of the shift production.
--
leftShift :: Expr -> Parser Expr
leftShift t = do
    reservedOp "<<"
    i <- int
    return (LShift t i)
-- Again this gets an Expr passed in.
--
rightShift :: Expr -> Parser Expr
rightShift t = do
    reservedOp ">>"
    i <- int
    return (RShift t i)
-- The empty version does no parsing.
-- Usually I would change the definition of "shift"
-- and not bother defining "emptyShift", the last
-- line of "shift" would then be:
--
-- > leftShift t <|> rightShift t <|> return t
--
emptyShift :: Expr -> Parser Expr
emptyShift t = return t
Parsec is still Greek to me, but my vague guess is that aShift should use try.
The parsec docs on Hackage have an example explaining the use of try with <|> that might help you out.