While exploring parsing libraries in Haskell, I came across this project: haskell-parser-examples. Running some of the examples, I found a problem with operator precedence. It works fine when using Parsec:
$ echo "3*2+1" | dist/build/lambda-parsec/lambda-parsec
Op Add (Op Mul (Num 3) (Num 2)) (Num 1)
Num 7
But not with Happy/Alex:
$ echo "3*2+1" | dist/build/lambda-happy-alex/lambda-happy-alex
Op Mul (Num 3) (Op Add (Num 2) (Num 1))
Num 9
Even though the operator precedence seems well-defined. Excerpt from the parser:
%left '+' '-'
%left '*' '/'
%%
Exprs : Expr { $1 }
| Exprs Expr { App $1 $2 }
Expr : Exprs { $1 }
| let var '=' Expr in Expr end { App (Abs $2 $6) $4 }
| '\\' var '->' Expr { Abs $2 $4 }
| Expr op Expr { Op (opEnc $2) $1 $3 }
| '(' Expr ')' { $2 }
| int { Num $1 }
Any hint? (I opened a bug report some time ago, but no response).
[Using ghc 7.6.3, alex 3.1.3, happy 1.19.4]
This appears to be a bug in haskell-parser-examples' usage of token precedence. Happy's operator precedence only affects the rules that use the tokens directly. In the parser we want to apply precedence to the Expr rule, but the only applicable rule,
| Expr op Expr { Op (opEnc $2) $1 $3 }
doesn't use tokens itself, instead relying on opEnc to expand them. If opEnc is inlined into Expr,
| Expr '*' Expr { Op Mul $1 $3 }
| Expr '+' Expr { Op Add $1 $3 }
| Expr '-' Expr { Op Sub $1 $3 }
it should work properly.
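To see why the two parse trees in the question give 7 and 9, here is a minimal sketch that evaluates both by hand. The BinOp and Expr types and the eval function are assumptions made for illustration; they only mirror the constructors shown in the printed output, not the project's actual source.

```haskell
-- Hypothetical AST mirroring the constructors in the printed output.
data BinOp = Add | Sub | Mul deriving (Show, Eq)

data Expr = Num Int | Op BinOp Expr Expr deriving (Show, Eq)

-- Evaluate an expression tree bottom-up.
eval :: Expr -> Int
eval (Num n) = n
eval (Op Add l r) = eval l + eval r
eval (Op Sub l r) = eval l - eval r
eval (Op Mul l r) = eval l * eval r

-- "3*2+1" with '*' binding tighter than '+' (the Parsec result).
expected :: Expr
expected = Op Add (Op Mul (Num 3) (Num 2)) (Num 1)

-- "3*2+1" as parsed when the precedence declarations have no effect.
actual :: Expr
actual = Op Mul (Num 3) (Op Add (Num 2) (Num 1))
```

In GHCi, eval expected gives 7 and eval actual gives 9, matching the two outputs above.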
Related
Let's assume I have grammar like this:
expr : expr '-' expr { $$ = $1 - $3; }
| "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
| NUMBER { $$ = $1; };
How can I use the rule
expr : expr '-' expr { $$ = $1 - $3; }
inside
expr : "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
so that the implementation of $1 - $3 is not repeated? It would be much better if I could reuse the subtraction already implemented in the first rule and only add the multiplication by 2. This is just a basic example, but I have a very big grammar with a lot of repeated calculations.
How would I parse something like
f x y
Into
APPLY (APPLY f x) y
using Happy? Right now I have a rule that says
%left APP
Expr : Expr Expr %prec APP { APPLY $1 $2 }
But that parses the above as
APPLY f (APPLY x y)
The accepted answer is not satisfactory.
The correct way of solving this problem is:
%nonassoc VAR LPAREN -- etc...
%nonassoc APP
Expr : Expr Expr %prec APP { APPLY $1 $2 }
That is:
Add a ghost precedence token called APP. There is no need to make it left or right, since its associativity won't be relevant; you can keep it nonassoc to avoid the wrong intuition that it matters.
Mark your Expr rule with %prec APP, like you did.
Most importantly, and often forgotten: give every token that may appear as the first token of an Expr production a precedence lower than that of APP, usually by listing it somewhere above APP with %left, %right, or %nonassoc for tokens that don't associate.
Your attempt probably failed because you missed the last step.
The reason why the last step is needed is that the algorithm, in deciding whether to shift the next token or to reduce the APP rule, will compare the precedence of the APP rule with the precedence of the incoming token. And by default, tokens that you don't mention have high precedence. So when faced with:
Expr Expr . LPAREN VAR RPAREN
for instance, it would compare the precedence of the APP rule (to reduce), with the precedence of LPAREN (to shift), and unless you set it up correctly, it will shift, and do the wrong thing.
Staging your grammar is just ugly and unpleasant.
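The shift/reduce decision described in this answer can be modelled roughly as below. This is a toy sketch, not Happy's implementation: the names Assoc, Action, Prec and resolve are made up for illustration, and the rules follow my reading of the Happy documentation on precedences, where a missing precedence on either side defaults to a shift.

```haskell
data Assoc = LeftA | RightA | NonA deriving (Show, Eq)

data Action = Shift | Reduce | Fail deriving (Show, Eq)

-- A precedence is a level (higher binds tighter) plus an associativity.
type Prec = (Int, Assoc)

-- First argument: precedence of the rule that could be reduced.
-- Second argument: precedence of the incoming (lookahead) token.
resolve :: Maybe Prec -> Maybe Prec -> Action
resolve (Just (r, _)) (Just (t, assoc))
  | r > t = Reduce
  | r < t = Shift
  | otherwise = case assoc of
      LeftA -> Reduce
      RightA -> Shift
      NonA -> Fail
-- If either side has no declared precedence, assume the default is to shift.
resolve _ _ = Shift
```

With APP at level 2, resolve (Just (2, NonA)) Nothing shifts (LPAREN was never declared), while resolve (Just (2, NonA)) (Just (1, NonA)) reduces, which is the behaviour you want for application.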
You can encode left/right associativity using grammar rules.
For example, have a look at this basic lambda calculus parser:
https://github.com/ghulette/haskell-parser-examples/blob/master/src/HappyParser.y
The operative productions are:
Expr : let VAR '=' Expr in Expr { App (Abs $2 $6) $4 }
| '\\' VAR '->' Expr { Abs $2 $4 }
| Form { $1 }
Form : Form '+' Form { Binop Add $1 $3 }
| Juxt { $1 }
Juxt : Juxt Atom { App $1 $2 }
| Atom { $1 }
Atom : '(' Expr ')' { $2 }
| NUM { Num $1 }
| VAR { Var $1 }
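The left-recursive rule Juxt : Juxt Atom builds the same shape as a left fold over the atoms. A minimal sketch, assuming a hypothetical Term type with the APPLY constructor from the question (applyAll is an illustrative name, not part of any library):

```haskell
-- Hypothetical term type using the constructor names from the question.
data Term = Var String | APPLY Term Term deriving (Show, Eq)

-- Left-nest a head term and its arguments, as Juxt : Juxt Atom does.
applyAll :: Term -> [Term] -> Term
applyAll = foldl APPLY
```

Here applyAll (Var "f") [Var "x", Var "y"] yields APPLY (APPLY (Var "f") (Var "x")) (Var "y"), the left-nested tree asked for.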
My objective is to create a parser for a small language. It is currently giving me one shift/reduce error.
My CFG is ambiguous somewhere, but I can't figure out where:
prog: PROGRAM beg {$$ = "program" $2;}
| PROGRAM stmt beg {$$ = "program" $2 $3;}
beg: BEG stmt END {$$ = "begin" $2 "end";}
| BEG END {$$ = "begin" "end";}
stmt: beg {$$ = $1;}
| if_stmt {$$ = $1;}/*
| IF expr THEN stmt {$$ = $1 $2 $3 $4;}*/
| WHILE expr beg {$$ = "while" $2 $3;}
| VAR COLEQUALS arithexpr SEMI {$$ = $1 ":=" $3 ";";}
| VAR COLON INTEGER SEMI {$$ = $1 ":" "integer" ";";} /*Declaring an integer */
| VAR COLON REAL SEMI {$$ = $1 ":" "real" ";";} /*declaring a real */
if_stmt: IF expr THEN stmt {$$ = "if" $2 "then" $4;}
| IF expr THEN stmt ELSE stmt {$$ = "if" $2 "then" $4 "else" $6;}
expr: NOT VAR {$$ = "!" $2;}
| VAR GREATERTHAN arithexpr {$$ = $1 ">" $3;}
| VAR LESSTHAN arithexpr {$$ = $1 "<" $3;}
| VAR GREATERTHANEQUALTO arithexpr {$$ = $1 ">=" $3;}
| VAR LESSTHANEQUALTO arithexpr {$$ = $1 "<=" $3;}
| VAR EQUALS arithexpr {$$ = $1 "==" $3;}
| VAR NOTEQUALS arithexpr {$$ = $1 "!=" $3;}
| arithexpr AND arithexpr {$$ = $1 "&&" $3;}
| arithexpr OR arithexpr {$$ = $1 "||" $3;}
arithexpr: arithexpr PLUS term {$$ = $1 + $3;}
| arithexpr MINUS term {$$ = $1 - $3;}
| term {$$ = $1;}
term: term TIMES factor {$$ = $1 * $3;}
| term DIVIDE factor {$$ = $1 / $3;}
| factor {$$ = $1;}
factor: VAL {$$ = $1;}
The "error" comes from the ambiguity in the if_stmt's else part: stmt can be an if_stmt, and it's not clear to which if an else-part belongs, e.g. if you write:
if y1 then if y2 then x=1 else x=2
Then the else-part could either belong to the first if or the second one.
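To make the two readings concrete, here is a small sketch that builds both trees by hand and runs them. Everything in it is an assumption for illustration (the Stmt type, exec, and conditions simplified to plain Bool values); it is not generated by bison.

```haskell
-- Hypothetical statement type; conditions are already-evaluated Bools.
data Stmt
  = Assign String Int
  | If Bool Stmt (Maybe Stmt)
  deriving (Show, Eq)

type Env = [(String, Int)]

-- Run a statement, returning the updated variable bindings.
exec :: Stmt -> Env -> Env
exec (Assign v n) env = (v, n) : filter ((/= v) . fst) env
exec (If c t e) env
  | c = exec t env
  | otherwise = maybe env (`exec` env) e

-- Reading 1: the else belongs to the inner if (what shifting ELSE gives).
innerElse :: Bool -> Bool -> Stmt
innerElse y1 y2 = If y1 (If y2 (Assign "x" 1) (Just (Assign "x" 2))) Nothing

-- Reading 2: the else belongs to the outer if.
outerElse :: Bool -> Bool -> Stmt
outerElse y1 y2 = If y1 (If y2 (Assign "x" 1) Nothing) (Just (Assign "x" 2))
```

With y1 false, exec (innerElse False True) [] leaves the environment empty, while exec (outerElse False True) [] sets x to 2, so the two parses really do mean different programs.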
This question has been asked in many variations; just search for "if then else shift reduce".
For diagnosis (to find out that you are also a victim of that if then else shift reduce problem) you can tell bison to produce an output-file with
bison -r all myparser.y
which will produce a file myparser.output, in which you can find for your case:
State 50 conflicts: 1 shift/reduce
....
state 50
11 if_stmt: IF expr THEN stmt . [ELSE, BEG, END]
12 | IF expr THEN stmt . ELSE stmt
ELSE shift, and go to state 60
ELSE [reduce using rule 11 (if_stmt)]
$default reduce using rule 11 (if_stmt)
state 51
...
One solution for this would be to introduce a block statement and only allow blocks as statements in the if and else parts:
stmt: ...
| blk_stmt
blk_stmt: BEGIN stmt END
if_stmt: IF expr THEN blk_stmt
| IF expr THEN blk_stmt ELSE blk_stmt
For a modified C-like language, this would mean that only
if x1 then {if x2 then {y=1}} else {y=2}
is possible (with { representing the BEGIN token and } representing the END token), thus resolving the ambiguity.
Calculating the value of the expression on the fly in the production rules in Happy doesn't work when I use lambda expressions.
For example this code
Exp : let var '=' Exp in Exp { \p -> $6 (($2,$4 p):p) }
| Exp1 { $1 }
Exp1 : Exp1 '+' Term { \p -> $1 p + $3 p }
| Exp1 '-' Term { \p -> $1 p - $3 p }
| Term { $1 }
Term : Term '*' Factor { \p -> $1 p * $3 p }
| Term '/' Factor { \p -> $1 p `div` $3 p }
| Factor { $1 }
Factor
: int { \p -> $1 }
| var { \p -> case lookup $1 p of
Nothing -> error "no var"
Just i -> i }
| '(' Exp ')' { $2 }
from http://www.haskell.org/happy/doc/html/sec-using.html doesn't work.
Or, more precisely, I get this error message:
No instance for (Show ([(String, Int)] -> Int))
arising from a use of `print'
Possible fix:
add an instance declaration for (Show ([(String, Int)] -> Int))
In a stmt of an interactive GHCi command: print it
It would be nice if you could explain to me what I have to change.
It must have something to do with the lambda expression and the environment variable p.
When I'm using data types everything is fine.
The thing to note here is that the result of this parser is a function which takes an environment of variable bindings. The error message is basically GHCi telling you that it can't print functions, presumably because you forgot to pass an environment:
> eval "1 + 1"
when you should have either passed an empty environment
> eval "1 + 1" []
or one with some pre-defined variables
> eval "x + x" [("x", 1)]
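As a sketch of what is going on, the value below is written by hand to have the same shape as what the semantic actions would build for "x + x". The names Env, var and parsed are assumptions for illustration; the real parser function generated by Happy is not shown here.

```haskell
type Env = [(String, Int)]

-- Same lookup logic as the `var` action in the grammar.
var :: String -> Env -> Int
var name p = case lookup name p of
  Nothing -> error "no var"
  Just i -> i

-- Hand-built equivalent of the parse result for "x + x":
-- a function waiting for an environment, not a number.
parsed :: Env -> Int
parsed = \p -> var "x" p + var "x" p
```

Typing parsed alone at the GHCi prompt triggers the same No instance for Show error, because it is a function; parsed [("x", 1)] is an Int and prints fine.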
I tried to extend the example grammar that comes as part of the "F# Parsed Language Starter" to support unary minus (for expressions like 2 * -5).
I hit a block like Samsdram here
Basically, I extended the header of the .fsy file to include precedence like so:
......
%nonassoc UMINUS
....
and then the rules of the grammar like so:
...
Expr:
| MINUS Expr %prec UMINUS { Negative ($2) }
...
also, the definition of the AST:
...
and Expr =
| Negative of Expr
.....
but I still get a parse error when trying to parse the expression mentioned above.
Any ideas what's missing? I read the source code of the F# compiler and it is not clear how they solve this, seems quite similar
EDIT
The precedences are ordered this way:
%left ASSIGN
%left AND OR
%left EQ NOTEQ LT LTE GTE GT
%left PLUS MINUS
%left ASTER SLASH
%nonassoc UMINUS
Had a play around and managed to get the precedence working without the need for %prec. Modified the starter a little though (more meaningful names)
Prog:
| Expression EOF { $1 }
Expression:
| Additive { $1 }
Additive:
| Multiplicative { $1 }
| Additive PLUS Multiplicative { Plus($1, $3) }
| Additive MINUS Multiplicative { Minus($1, $3) }
Multiplicative:
| Unary { $1 }
| Multiplicative ASTER Unary { Times($1, $3) }
| Multiplicative SLASH Unary { Divide($1, $3) }
Unary:
| Value { $1 }
| MINUS Value { Negative($2) }
Value:
| FLOAT { Value(Float($1)) }
| INT32 { Value(Integer($1)) }
| LPAREN Expression RPAREN { $2 }
I also grouped the expressions into a single variant, as I didn't like the way the starter did it (it was awkward to walk through).
type Value =
| Float of Double
| Integer of Int32
| Expression of Expression
and Expression =
| Value of Value
| Negative of Expression
| Times of Expression * Expression
| Divide of Expression * Expression
| Plus of Expression * Expression
| Minus of Expression * Expression
and Equation =
| Equation of Expression
Taking code from my article Parsing text with Lex and Yacc (October 2007).
My precedences look like:
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc prec_uminus
%right POWER
%nonassoc FACTORIAL
and the yacc parsing code is:
expr:
| NUM { Num(float_of_string $1) }
| MINUS expr %prec prec_uminus { Neg $2 }
| expr FACTORIAL { Factorial $1 }
| expr PLUS expr { Add($1, $3) }
| expr MINUS expr { Sub($1, $3) }
| expr TIMES expr { Mul($1, $3) }
| expr DIVIDE expr { Div($1, $3) }
| expr POWER expr { Pow($1, $3) }
| OPEN expr CLOSE { $2 }
;
Looks equivalent. I don't suppose the problem is your use of UMINUS in capitals instead of prec_uminus in my case?
Another option is to split expr into several mutually-recursive parts, one for each precedence level.
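For illustration, here is what "one part per precedence level" looks like as a small hand-written recursive-descent parser. It is a sketch in Haskell rather than yacc, and every name in it (Expr, additive, multiplicative, unary, atom, parseExpr) is made up for the example; unlike the grammars above, unary here also accepts repeated minus signs.

```haskell
import Data.Char (isDigit, isSpace)

data Expr
  = Num Int
  | Neg Expr
  | Add Expr Expr
  | Sub Expr Expr
  | Mul Expr Expr
  | Div Expr Expr
  deriving (Show, Eq)

-- Each level returns the parsed expression and the unconsumed input.
type Parser = String -> Maybe (Expr, String)

skip :: String -> String
skip = dropWhile isSpace

-- Lowest precedence: left-associative '+' and '-'.
additive :: Parser
additive s = multiplicative s >>= uncurry loop
  where
    loop l rest = case skip rest of
      '+' : r -> multiplicative r >>= \(e, r') -> loop (Add l e) r'
      '-' : r -> multiplicative r >>= \(e, r') -> loop (Sub l e) r'
      _ -> Just (l, rest)

-- Next level: left-associative '*' and '/'.
multiplicative :: Parser
multiplicative s = unary s >>= uncurry loop
  where
    loop l rest = case skip rest of
      '*' : r -> unary r >>= \(e, r') -> loop (Mul l e) r'
      '/' : r -> unary r >>= \(e, r') -> loop (Div l e) r'
      _ -> Just (l, rest)

-- Unary minus binds tighter than any binary operator.
unary :: Parser
unary s = case skip s of
  '-' : r -> unary r >>= \(e, r') -> Just (Neg e, r')
  _ -> atom s

-- Highest precedence: integer literals and parenthesised expressions.
atom :: Parser
atom s = case skip s of
  '(' : r -> case additive r of
    Just (e, r') | ')' : r'' <- skip r' -> Just (e, r'')
    _ -> Nothing
  r | (ds@(_ : _), r') <- span isDigit r -> Just (Num (read ds), r')
  _ -> Nothing

-- Parse a whole string, rejecting trailing garbage.
parseExpr :: String -> Maybe Expr
parseExpr s = case additive s of
  Just (e, rest) | null (skip rest) -> Just e
  _ -> Nothing
```

With this layering, parseExpr "2 * -5" gives Just (Mul (Num 2) (Neg (Num 5))) without any precedence declarations, because the minus can only be read at the unary level once the parser is past the '*'.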