Parsing function application with Happy - parsing

How would I parse something like
f x y
Into
APPLY (APPLY f x) y
using Happy? Right now I have a rule that says
%left APP
Expr : Expr Expr %prec APP { APPLY $1 $2 }
But that parses the above as
APPLY f (APPLY x y)

The accepted answer is not satisfactory.
The correct way of solving this problem is:
%nonassoc VAR LPAREN -- etc...
%nonassoc APP
Expr : Expr Expr %prec APP { APPLY $1 $2 }
That is:
Adding a ghost precedence token called APP, and no need to make it left or right since it won't be relevant, so you can keep it nonassoc to not get the wrong intuition that it matters
Marking your Expr rule with %prec APP like you did
and most importantly and often forgotten, you need to give all tokens that may appear as the first token of an Expr production a precedence lower than that of APP, usually achieved by listing them somewhere above, either with left, right, or nonassoc for the ones that don't associate
The reason why your trial failed is probably that you missed the last step.
The reason why the last step is needed is that the algorithm, in deciding whether to shift the next token or to reduce the APP rule, will compare the precedence of the APP rule with the precedence of the incoming token. And by default, tokens that you don't mention have high precedence. So when faced with:
Expr Expr . LPAREN VAR RPAREN
for instance, it would compare the precedence of the APP rule (to reduce), with the precedence of LPAREN (to shift), and unless you set it up correctly, it will shift, and do the wrong thing.
Staging your grammar is just ugly and unpleasant.

You can encode left/right associativity using grammar rules.
For example, have a look at this basic lambda calculus parser:
https://github.com/ghulette/haskell-parser-examples/blob/master/src/HappyParser.y
The operative productions are:
Expr : let VAR '=' Expr in Expr { App (Abs $2 $6) $4 }
| '\\' VAR '->' Expr { Abs $2 $4 }
| Form { $1 }
Form : Form '+' Form { Binop Add $1 $3 }
| Juxt { $1 }
Juxt : Juxt Atom { App $1 $2 }
| Atom { $1 }
Atom : '(' Expr ')' { $2 }
| NUM { Num $1 }
| VAR { Var $1 }

Related

Odd bison parsing

To pretext this, I understand that the formatting of the parse structure is weird, the teacher wanted it to be ~roughly~ in this format.
I'm making a simple "calculator" parser form an assignment using flex and bison, but I am getting odd or unusual output for the answer when using modulus. IT seems to work fine for all other operations
Input: "10 % 5"
Output: " % 10"
Input: "101 % 12"
Output: " % 101"
Input: "2^(-1 + 15/5) - 3*(4-1) + (-6)"
Output: "-11" //Correct
Relevant section of bison.y
command : pexpri {printf("%d\n", $1); return;}
;
pexpri : '-' expri '+' termi {$$ = -$2 + $4;} /* Super glued on unary, also reduce conflict, TODO: find bug */
| '-' expri '-' termi {$$ = -$2 - $4;}
| '-' expri {$$ = -$2;}
| expri {$$ = $1;}
;
expri : expri '+' termi {$$ = $1 + $3;} /* Addition subtraction level operations*/
| expri '-' termi {$$ = $1 - $3;}
| termi {$$ = $1;}
;
termi : termi '*' factori {$$ = $1 * $3;} /* Multiplication division level operations*/
| termi '/' factori {$$ = $1 / $3;}
| termi '%' factori {$$ = $1 % $3;}
| factori {$$ = $1;}
;
factori : factori '^' parti {$$ = pow($1, $3);} /* Exponentiation level operations */
| parti {$$ = $1;}
;
parti : '(' pexpri ')' {$$ = $2;} /* Parentheses handling or terminal, also adds even more reduction errors.... */
| INTEGER
;
Relevant section tokenizer.l
0 { /* To avoid useless trailing zeros. */
yylval.iVal = atoi(yytext);
return INTEGER;
}
[1-9][0-9]* {
yylval.iVal = atoi(yytext);
return INTEGER;
}
[-()^\+\*/] {return *yytext;}
The main function is essentially just a wrapper for yyparse.
I don't understand how or why it is printing the modulus symbol in the output because the ONLY print in the entire code is in the command section. I understand that the code isn't the best (in fact, it is awful), but any insight is much appreciated.
Also, if anybody can help me figure out how to manage unary negation in a more elegant way (Hopefully without spoiling much), that would also be super appreciated. (I cant just use %precidence or %left) The way I have it currently set up is ambiguous and is causing reduction errors.
If you look closely at
[-()^\+\*/] {return *yytext;}
you'll notice that it is not going to match %. The most likely consequence is that (f)lex's default fallback rule will apply. That rule matches any single character and uses ECHO to copy the matched token to the output stream.
It looks to me like whitespace characters might also be falling through to the default rule. They should be ignored explicitly.
By the way, it is not necessary to backslash-escape regular expression operators inside character classes, since they have no special meaning in that context. Hence a correct and easier to read rule would be
[-+*/%^()] {return *yytext;}
However, I strongly recommend using a fallback rule instead of listing all the possible single-character tokens. If an invalid single-character token is handled by a fallback rule, then the parser will respond by flagging an error.
[[:space:]]+ { /* Ignore whitespace*/ }
0|[1-9][0-9]* { yylval.iVal = atoi(yytext); return INTEGER; }
. { return *yytext; /* Fallback rule */ }
The default fallback rule is rarely useful in parsing, and I find it useful to add
%option nodefault
to my flex prolog, which will cause flex to produce an error message if a fallback rule is required.

How does a parser solves shift/reduce conflict?

I have a grammar for arithmetic expression which solves number of expression (one per line) in a text file. While compiling YACC I am getting message 2 shift reduce conflicts. But my calculations are proper. If parser is giving proper output how does it resolves the shift/reduce conflict. And In my case is there any way to solve it in YACC Grammar.
YACC GRAMMAR
Calc : Expr {printf(" = %d\n",$1);}
| Calc Expr {printf(" = %d\n",$2);}
| error {yyerror("\nBad Expression\n ");}
;
Expr : Term { $$ = $1; }
| Expr '+' Term { $$ = $1 + $3; }
| Expr '-' Term { $$ = $1 - $3; }
;
Term : Fact { $$ = $1; }
| Term '*' Fact { $$ = $1 * $3; }
| Term '/' Fact { if($3==0){
yyerror("Divide by Zero Encountered.");
break;}
else
$$ = $1 / $3;
}
;
Fact : Prim { $$ = $1; }
| '-' Prim { $$ = -$2; }
;
Prim : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id :NUM { $$ = yylval; }
;
What change should I do to remove such conflicts in my grammar ?
Bison/yacc resolves shift-reduce conflicts by choosing to shift. This is explained in the bison manual in the section on Shift-Reduce conflicts.
Your problem is that your input is just a series of Exprs, run together without any delimiter between them. That means that:
4 - 2
could be one expression (4-2) or it could be two expressions (4, -2). Since bison-generated parsers always prefer to shift, the parser will choose to parse it as one expression, even if it were typed on two lines:
4
-2
If you want to allow users to type their expressions like that, without any separator, then you could either live with the conflict (since it is relatively benign) or you could codify it into your grammar, but that's quite a bit more work. To put it into the grammar, you need to define two different types of Expr: one (which is the one you use at the top level) cannot start with an unary minus, and the other one (which you can use anywhere else) is allowed to start with a unary minus.
I suspect that what you really want to do is use newlines or some other kind of expression separator. That's as simple as passing the newline through to your parser and changing Calc to Calc: | Calc '\n' | Calc Expr '\n'.
I'm sure that this appears somewhere else on SO, but I can't find it. So here is how you disallow the use of unary minus at the beginning of an expression, so that you can run expressions together without delimiters. The non-terminals starting n_ cannot start with a unary minus:
input: %empty | input n_expr { /* print $2 */ }
expr: term | expr '+' term | expr '-' term
n_expr: n_term | n_expr '+' term | n_expr '-' term
term: factor | term '*' factor | term '/' factor
n_term: value | n_term '+' factor | n_term '/' factor
factor: value | '-' factor
value: NUM | '(' expr ')'
That parses the same language as your grammar, but without generating the shift-reduce conflict. Since it parses the same language, the input
4
-2
will still be parsed as a single expression; to get the expected result you would need to type
4
(-2)

Wrong operator precedence with Happy

Exploring parsing libraries in Haskell I came across this project: haskell-parser-examples. Running some examples I found a problem with the operator precedence. It works fine when using Parsec:
$ echo "3*2+1" | dist/build/lambda-parsec/lambda-parsec
Op Add (Op Mul (Num 3) (Num 2)) (Num 1)
Num 7
But not with Happy/Alex:
$ echo "3*2+1" | dist/build/lambda-happy-alex/lambda-happy-alex
Op Mul (Num 3) (Op Add (Num 2) (Num 1))
Num 9
Even though the operator precedence seems well-defined. Excerpt from the parser:
%left '+' '-'
%left '*' '/'
%%
Exprs : Expr { $1 }
| Exprs Expr { App $1 $2 }
Expr : Exprs { $1 }
| let var '=' Expr in Expr end { App (Abs $2 $6) $4 }
| '\\' var '->' Expr { Abs $2 $4 }
| Expr op Expr { Op (opEnc $2) $1 $3 }
| '(' Expr ')' { $2 }
| int { Num $1 }
Any hint? (I opened a bug report some time ago, but no response).
[Using gch 7.6.3, alex 3.1.3, happy 1.19.4]
This appears to be a bug in haskell-parser-examples' usage of token precedence. Happy's operator precedence only affects the rules that use the tokens directly. In the parser we want to apply precedence to the Expr rule, but the only applicable rule,
| Expr op Expr { Op (opEnc $2) $1 $3 }
doesn't use tokens itself, instead relying on opEnc to expand them. If opEnc is inlined into Expr,
| Expr '*' Expr { Op Mul $1 $3 }
| Expr '+' Expr { Op Add $1 $3 }
| Expr '-' Expr { Op Sub $1 $3 }
it should work properly.

Bison parser won't look-ahead for token

I have the following parser grammar (this is a small sample):
expr:
ident assignop expr
{
$$ = new NAssignment(new NAssignmentIdentifier(*$1), $2, *$3);
} |
STAR expr %prec IDEREF
{
$$ = new NDereferenceOperator(*$2);
} |
STAR expr assignop expr %prec IDEREF
{
$$ = new NAssignment(new NAssignmentDereference(*$2), $3, *$4);
} |
... ;
...
assignop:
ASSIGN_EQUAL |
ASSIGN_ADD |
ASSIGN_SUBTRACT |
ASSIGN_MULTIPLY |
ASSIGN_DIVIDE ;
Now I'm trying to parse any of the following lines:
*0x8000 = 0x7000;
*mem = 0x7000;
However, Bison keeps seeing "*mem" and reducing on the 'STAR expr' rule and not performing look-ahead to see whether 'STAR expr assignop...' matches. As far as I understand Bison, it should be doing this look-ahead. My closest guess is that %prec is turning off look-ahead or something strange like that, but I can't see why it would do so (since the prec values are equivalent).
How do I make it perform look-ahead in this case?
EDIT:
The state that it enters when encountering 'STAR expr' is:
state 45
28 expr: STAR expr .
29 | STAR expr . assignop expr
35 | expr . binaryop expr
$default reduce using rule 28 (expr)
assignop go to state 81
binaryop go to state 79
So I don't understand why it's picking $default when it could pick assignop (note that the order of the rules in the parser.y file don't affect which one it picks in this case; I've tried reordering the assignop one above the standard 'STAR expr').
This will happen if IDREF is higher precedence than ASSIGN_EQUAL, ASSIGN_ADD, etc. Specifically, in this case with the raw parser (before precedence is applies), you have shift/reduce conflicts between the expr: STAR expr rule and the various ASSIGN_XXX tokens. The precedence rules you have resolve all the conflicts in favor of the reduce.
The assignop in the state is a goto, not a shift or reduce, so doesn't enter into the lookahead or token handling at all -- gotos only occur after some token has been shifted and then later reduced to the non-terminal in question.
I ended up solving this problem by creating another rule 'deref' like so:
deref:
STAR ident
{
$$ = new NDereferenceOperator(*$<ident>2);
} |
STAR numeric
{
$$ = new NDereferenceOperator(*$2);
} |
STAR CURVED_OPEN expr CURVED_CLOSE
{
$$ = new NDereferenceOperator(*$3);
} |
deref assignop expr
{
if ($1->cType == "expression-dereference") // We can't accept NAssignments as the deref in this case.
$$ = new NAssignment(new NAssignmentDereference(((NDereferenceOperator*)$1)->expr), $2, *$3);
else
throw new CompilerException("Unable to apply dereferencing assignment operation to non-dereference operator based LHS.");
} ;
replacing both rules in 'expr' with a single 'deref'.

Extending example grammar for Fsyacc with unary minus

I tried to extend the example grammar that comes as part of the "F# Parsed Language Starter" to support unary minus (for expressions like 2 * -5).
I hit a block like Samsdram here
Basically, I extended the header of the .fsy file to include precedence like so:
......
%nonassoc UMINUS
....
and then the rules of the grammar like so:
...
Expr:
| MINUS Expr %prec UMINUS { Negative ($2) }
...
also, the definition of the AST:
...
and Expr =
| Negative of Expr
.....
but still get a parser error when trying to parse the expression mentioned above.
Any ideas what's missing? I read the source code of the F# compiler and it is not clear how they solve this, seems quite similar
EDIT
The precedences are ordered this way:
%left ASSIGN
%left AND OR
%left EQ NOTEQ LT LTE GTE GT
%left PLUS MINUS
%left ASTER SLASH
%nonassoc UMINUS
Had a play around and managed to get the precedence working without the need for %prec. Modified the starter a little though (more meaningful names)
Prog:
| Expression EOF { $1 }
Expression:
| Additive { $1 }
Additive:
| Multiplicative { $1 }
| Additive PLUS Multiplicative { Plus($1, $3) }
| Additive MINUS Multiplicative { Minus($1, $3) }
Multiplicative:
| Unary { $1 }
| Multiplicative ASTER Unary { Times($1, $3) }
| Multiplicative SLASH Unary { Divide($1, $3) }
Unary:
| Value { $1 }
| MINUS Value { Negative($2) }
Value:
| FLOAT { Value(Float($1)) }
| INT32 { Value(Integer($1)) }
| LPAREN Expression RPAREN { $2 }
I also grouped the expressions into a single variant, as I didn't like the way the starter done it. (was awkward to walk through it).
type Value =
| Float of Double
| Integer of Int32
| Expression of Expression
and Expression =
| Value of Value
| Negative of Expression
| Times of Expression * Expression
| Divide of Expression * Expression
| Plus of Expression * Expression
| Minus of Expression * Expression
and Equation =
| Equation of Expression
Taking code from my article Parsing text with Lex and Yacc (October 2007).
My precedences look like:
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc prec_uminus
%right POWER
%nonassoc FACTORIAL
and the yacc parsing code is:
expr:
| NUM { Num(float_of_string $1) }
| MINUS expr %prec prec_uminus { Neg $2 }
| expr FACTORIAL { Factorial $1 }
| expr PLUS expr { Add($1, $3) }
| expr MINUS expr { Sub($1, $3) }
| expr TIMES expr { Mul($1, $3) }
| expr DIVIDE expr { Div($1, $3) }
| expr POWER expr { Pow($1, $3) }
| OPEN expr CLOSE { $2 }
;
Looks equivalent. I don't suppose the problem is your use of UMINUS in capitals instead of prec_uminus in my case?
Another option is to split expr into several mutually-recursive parts, one for each precedence level.

Resources