Context-Free Grammar for Custom Programming Language - parsing

After having completed the Compiler Design course at my university I have been playing around with making a compiler for a simple programming language, but I'm having trouble with the parser. I'm making the compiler in mosml and using its builtin parser mosmlyac for constructing the parser. Here is an excerpt from my parser showing the grammar and associativity+precedence.
...
%right ASSIGN
%left OR
%left AND
%nonassoc NOT
%left EQUAL LESS
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc NEGATE
...
Prog : FunDecs EOF { $1 }
;
FunDecs : Fun FunDecs { $1 :: $2 }
| { [] }
;
Fun : Type ID LPAR TypeIds RPAR StmtBlock { FunDec (#1 $2, $1, $4, $6, #2 $2) }
| Type ID LPAR RPAR StmtBlock { FunDec (#1 $2, $1, [], $5, #2 $2) }
;
TypeIds : Type ID COMMA TypeIds { Param (#1 $2, $1) :: $4 }
| Type ID { [Param (#1 $2, $1)] }
;
Type : VOID { Void }
| INT { Int }
| BOOL { Bool }
| CHAR { Char }
| STRING { Array (Char) }
| Type LBRACKET RBRACKET { Array ($1) }
;
StmtBlock : LCURLY StmtList RCURLY { $2 }
;
StmtList : Stmt StmtList { $1 :: $2 }
| { [] }
;
Stmt : Exp SEMICOLON { $1 }
| IF Exp StmtBlock { IfElse ($2, $3, [], $1) }
| IF Exp StmtBlock ELSE StmtBlock { IfElse ($2, $3, $5, $1) }
| WHILE Exp StmtBlock { While ($2, $3, $1) }
| RETURN Exp SEMICOLON { Return ($2, (), $1) }
;
Exps : Exp COMMA Exps { $1 :: $3 }
| Exp { [$1] }
;
Index : LBRACKET Exp RBRACKET Index { $2 :: $4 }
| { [] }
;
Exp : INTLIT { Constant (IntVal (#1 $1), #2 $1) }
| TRUE { Constant (BoolVal (true), $1) }
| FALSE { Constant (BoolVal (false), $1) }
| CHRLIT { Constant (CharVal (#1 $1), #2 $1) }
| STRLIT { StringLit (#1 $1, #2 $1) }
| LCURLY Exps RCURLY { ArrayLit ($2, (), $1) }
| ARRAY LPAR Exp RPAR { ArrayConst ($3, (), $1) }
| Exp PLUS Exp { Plus ($1, $3, $2) }
| Exp MINUS Exp { Minus ($1, $3, $2) }
| Exp TIMES Exp { Times ($1, $3, $2) }
| Exp DIVIDE Exp { Divide ($1, $3, $2) }
| NEGATE Exp { Negate ($2, $1) }
| Exp AND Exp { And ($1, $3, $2) }
| Exp OR Exp { Or ($1, $3, $2) }
| NOT Exp { Not ($2, $1) }
| Exp EQUAL Exp { Equal ($1, $3, $2) }
| Exp LESS Exp { Less ($1, $3, $2) }
| ID { Var ($1) }
| ID ASSIGN Exp { Assign (#1 $1, $3, (), #2 $1) }
| ID LPAR Exps RPAR { Apply (#1 $1, $3, #2 $1) }
| ID LPAR RPAR { Apply (#1 $1, [], #2 $1) }
| ID Index { Index (#1 $1, $2, (), #2 $1) }
| ID Index ASSIGN Exp { AssignIndex (#1 $1, $2, $4, (), #2 $1) }
| PRINT LPAR Exp RPAR { Print ($3, (), $1) }
| READ LPAR Type RPAR { Read ($3, $1) }
| LPAR Exp RPAR { $2 }
;
Prog is the %start symbol and I have left out the %token and %type declaration on purpose.
The problem I have is that this grammar seems to be ambiguous and looking at the output of running mosmlyac -v on the grammar it seems that it is the rules containing the token ID that is the problem and creates shift/reduce and reduce/reduce conflicts. The output also tells me that the rule Exp : ID is never reduced.
Can anyone help me make this grammar unambiguous?

Index has an empty production.
Now consider:
Exp : ID
| ID Index
Which of those applies? Since Index is allowed to be empty, there is no context in which only one of those is applicable. The parser generator you are using evidently prefers to reduce an empty INDEX, making Exp : ID unusable and creating a large number of conflicts.
I'd suggesting changing Index to:
Index : LBRACKET Exp RBRACKET Index { $2 :: $4 }
| LBRACKET Exp RBRACKET { [ $2 ] }
although in the long run, you might be better off with a more traditional "lvalue/rvalue" grammar, in which lvalue includes ID and lvalue [ Exp ] and rvalue includes lvalue. (That will give a more elaborate parse tree for ID [ Exp ] [ Exp ], but there is an obvious homormorphism.)

Related

Cannot find shift/reduce & reduce/reduce errors

Ive been working on a project and have come across:
Below is the list of if declarations, I have everything here except for the types in typeSpec, but we don't have to worry about those for right now. I am fairly sure that I understand what shift/reduce and reduce/reduce errors are but (lucky for me) the compiler doesn't tell me where the conflicts are. Can someone point me in the right direction on where the conflicts are, or give me some tips on how to find where the conflicts are would be really appreciated. Also if I am missing something from the post, anything and everything is greatly appreciated aswell!
%%
program
: declList { syntaxTree = $1; }
;
declList
: declList decl { $$ = AddSibling($1, $2); }
| decl { $$ = $1; }
;
decl
: varDecl { $$ = $1; }
| funDecl { $$ = $1; }
;
varDecl
: typeSpec varDeclList SEMICOLON { $$ = $2; SetType($2, $1, false); }
;
scopedVarDecl
: STATIC typeSpec varDeclList SEMICOLON { $$ = $3; SetType($3, $2, true); }
| typeSpec varDeclList SEMICOLON { $$ = $2; SetType($2, $1, false); }
;
varDeclList
: varDeclList COMMA varDeclInit { $$ = AddSibling($1, $3); }
| varDeclInit { $$ = $1; }
;
varDeclInit
: varDeclId { $$ = $1; }
| varDeclId COLON simpleExp { $$ = $1; AddChild($$, $3); }
;
varDeclId
: ID { $$ = NewDeclNode(VarK, UndefinedType, $1); $$->tmp = $1->idIndex; }
| ID LBRACKET NUMCONST RBRACKET{ $$ = NewDeclNode(VarK, UndefinedType, $1); $$->isArray = true; $$->aSize = $3->nvalue; $$->tmp = $1->idIndex;}
;
typeSpec
: {}
;
funDecl
: typeSpec ID LPAREN parms RPAREN compoundStmt { $$ = NewDeclNode(FuncK, $1, $2, $4, $6);$$->tmp = $2->idIndex; SetType($$, $1, true);}
| ID LPAREN parms RPAREN compoundStmt{ $$ = NewDeclNode(FuncK, Void, $1, $3, $5); $$->tmp = $1->idIndex; }
;
parms
: parmList { $$ = $1; }
| { $$ = NULL; }
;
parmList
: parmList SEMICOLON parmTypeList { $$ = AddSibling($1, $3); }
| parmTypeList { $$ = $1; }
;
parmTypeList
: typeSpec parmIdList { $$ = $2; SetType($2, $1, false); }
;
parmIdList
: parmIdList COMMA parmId { $$ = AddSibling($1, $3); }
| parmId { $$ = $1; }
;
parmId
: ID { $$ = NewDeclNode(ParamK, Void, $1); $$->tmp = $1->svalue; }
| ID LBRACKET RBRACKET { $$ = NewDeclNode(ParamK, Void, $1); $$->isArray = true; $$->tmp = $1->svalue; }
;
stmt
: matched { $$ = $1; }
| unmatched { $$ = $1; }
;
matched
: expStmt { $$ = $1; }
| compoundStmt { $$ = $1; }
| returnStmt { $$ = $1; } //
| breakStmt { $$ = $1; } //
| matchedSelectStmt { $$ = $1; }
| matchedIterStmt { $$ = $1; }
;
unmatched
: unmatchedSelectStmt{ $$ = $1; }
| unmatchedIterStmt { $$ = $1; }
;
expStmt
: exp SEMICOLON { $$ = $1; }
| SEMICOLON { $$ = NULL; }
;
compoundStmt
: BEG localDecls stmtList END { $$ = NewStmtNode(CompoundK, $1, $2, $3); }
;
localDecls
: localDecls scopedVarDecl { $$ = AddSibling($1, $2); }
| { $$ = NULL; }
;
stmtList
: stmtList stmt { $$ = AddSibling($1, $2); }
| { $$ = NULL; }
;
matchedSelectStmt : IF simpleExp THEN matched ELSE matched
{ $$ = NewStmtNode(IfK, $1, $2, $4, $6); }
;
unmatchedSelectStmt
: IF simpleExp THEN stmt { $$ = NewStmtNode(IfK, $1, $2, $4); }
| IF simpleExp THEN matched ELSE unmatched { $$ = NewStmtNode(IfK, $1, $2, $4, $6); }
;
matchedIterStmt
: WHILE simpleExp DO matched { $$ = NewStmtNode(WhileK, $1, $2, $4); }
| FOR ID ASGN iterRange DO matched { $$ = NewStmtNode(ForK, $1, NewDeclNode(VarK, Integer, $2), $4, $6); $$->tmp = $2->idIndex; }
;
unmatchedIterStmt
: WHILE simpleExp DO unmatched { $$ = NewStmtNode(WhileK, $1, $2, $4); }
| FOR ID ASGN iterRange DO unmatched { $$ = NewStmtNode(ForK, $1, NewDeclNode(VarK, Integer, $2), $4, $6); $$->tmp = $2->idIndex; }
;
iterRange
: simpleExp TO simpleExp { $$ = NewStmtNode(RangeK, $2, $1, $3);}
| simpleExp TO simpleExp BY simpleExp { $$ = NewStmtNode(RangeK, $2, $1, $3, $5); $$->tmp = $2->idIndex; }
;
returnStmt
: RETURN SEMICOLON { $$ = NewStmtNode(ReturnK, $1); }
| RETURN exp SEMICOLON { $$ = NewStmtNode(ReturnK, $1, $2); }
;
breakStmt
: BREAK SEMICOLON { $$ = NewStmtNode(BreakK, $1); }
;
exp
: mutable ASGN exp { $$ = NewExpNode(AssignK, $2, $1, $3); }
| mutable ADDASGN exp { $$ = NewExpNode(AssignK, $2, $1, $3); }
| mutable SUBASGN exp { $$ = NewExpNode(AssignK, $2, $1, $3); }
| mutable MULASGN exp { $$ = NewExpNode(AssignK, $2, $1, $3); }
| mutable DIVASGN exp { $$ = NewExpNode(AssignK, $2, $1, $3); }
| mutable INC { $$ = NewExpNode(AssignK, $2, $1); }
| mutable DEC { $$ = NewExpNode(AssignK, $2, $1); }
| simpleExp {$$ = $1; }
;
simpleExp
: simpleExp OR andExp { $$ = NewExpNode(OpK, $2, $1, $3); }
| andExp { $$ = $1; }
;
andExp
: andExp AND unaryRelExp { $$ = NewExpNode(OpK, $2, $1, $3); }
| unaryExp { $$ = $1; }
;
unaryRelExp
: NOT unaryRelExp { $$ = NewExpNode(OpK, $1, $2); }
| relExp { $$ = $1; }
;
relExp
: sumExp relop sumExp { $$ = $2; AddChild($$, $1); AddChild($$, $3); }
| sumExp { $$ = $1; }
;
relop
: GT { $$ = NewExpNode(OpK, $1); }
| GEQ { $$ = NewExpNode(OpK, $1); }
| LT { $$ = NewExpNode(OpK, $1); }
| LEQ { $$ = NewExpNode(OpK, $1); }
| EQ { $$ = NewExpNode(OpK, $1); }
| NEQ { $$ = NewExpNode(OpK, $1); }
;
sumExp
: sumExp sumop mulExp { $$ = $2; AddChild($$,$1); AddChild($$,$3); }
| mulExp { $$ = $1; }
;
sumop
: ADD { $$ = NewExpNode(OpK, $1); }
| MINUS { $$ = NewExpNode(OpK, $1); }
;
mulExp
: mulExp mulop unaryExp { $$ = $2; AddChild($$, $1); AddChild($$, $3); }
| unaryExp { $$ = $1; }
;
mulop
: STAR { $$ = NewExpNode(OpK, $1); }
| DIV { $$ = NewExpNode(OpK, $1); }
| PERCENT { $$ = NewExpNode(OpK, $1); }
;
unaryExp
: unaryop unaryExp { $$ = $1; AddChild($$, $2); }
| factor { $$ = $1; }
;
unaryop
: MINUS { $$ = NewExpNode(OpK, $1); }
| STAR { $$ = NewExpNode(OpK, $1); }
| QMARK { $$ = NewExpNode(OpK, $1); }
;
factor
: mutable { $$ = $1; }
| immutable { $$ = $1; }
mutable
: ID { $$ = NewExpNode(IdK, $1); $$->name = $1->idIndex; }
| ID LBRACKET exp RBRACKET { $$ = NewExpNode(OpK, $2, NewExpNode(IdK, $1), $3); $$->child[0]->name = $1->idIndex; }
;
immutable
: LPAREN exp RPAREN { $$ = $2; }
| call { $$ = $1; }
| constant { $$ = $1; }
;
call
: ID LPAREN args RPAREN { $$ = NewExpNode(CallK, $1, $3); $$->name = $1->idIndex; }
;
args
: argList { $$ = $1; }
| { $$ = NULL; }
;
argList
: argList COMMA exp { $$ = AddSibling($1, $3); $$->name = $2->svalue; }
| exp { $$ = $1; }
;
constant
: NUMCONST { $$ = NewExpNode(ConstantK, $1); $$->expType = Integer; $$->value = $1->nvalue; }
| CHARCONST { $$ = NewExpNode(ConstantK, $1); $$->expType = Char; $$->cvalue = $1->cvalue; }
| STRINGCONST { $$ = NewExpNode(ConstantK, $1); $$->expType = String; $$->string = $1->svalue; }
| BOOLCONST { $$ = NewExpNode(ConstantK, $1); $$->expType = Boolean; $$->value = $1->nvalue;}
;
%%

Yacc shift/reduce that I cannot identify

So I am having this .y file on which I am trying to parse and evaluate a function with it's parameters, but a have one shift/reduce conflict that I cannot identify:
.y
%{
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include "types.h"
#define YYDEBUG 0
/* prototypes */
nodeType *opr(int oper, int nops, ...);
nodeType *id(int i);
nodeType *con(int value);
void freeNode(nodeType *p);
void yyerror(char *s);
nodeType *RadEc;
int sym[26]; /* symbol table */
%}
%union {
int iValue; /* integer value */
char sIndex; /* symbol table index */
nodeType *nPtr; /* node pointer */
};
%token <iValue> INTEGER
%token <sIndex> VARIABLE
%token WHILE IF PRINT SUBRAD ENDSUB THEN DO ENDIF RAD
%nonassoc IFX
%nonassoc ELSE
%left GE LE EQ NE '>' '<'
%left '+' '-'
%left '*' '/'
%nonassoc UMINUS
%type <nPtr> statement expr stmt_list
%type <iValue> expresie
%start program
%%
program : declaratii cod { exit(0); }
;
declaratii: SUBRAD stmt_list ENDSUB { RadEc=$2; }
| /* NULL */
;
statement : '\n' { $$ = opr(';', 2, NULL, NULL); }
| expr '\n' { $$ = $1; }
| PRINT expr '\n' { $$ = opr(PRINT, 1, $2); }
| VARIABLE '=' expr '\n' { $$ = opr('=', 2, id($1), $3); }
| DO stmt_list WHILE expr { $$ = opr(WHILE, 2, $4, $2); }
| IF expr THEN stmt_list ENDIF %prec IFX { $$ = opr(IF, 2, $2, $4); }
| IF expr THEN stmt_list ELSE stmt_list ENDIF { $$ = opr(IF, 3, $2, $4, $6); }
;
stmt_list : statement
| stmt_list statement { $$ = opr(';', 2, $1, $2); }
;
expr : INTEGER { $$ = con($1); }
| VARIABLE { $$ = id($1); }
| '-' expr %prec UMINUS { $$ = opr(UMINUS, 1, $2); }
| expr '+' expr { $$ = opr('+', 2, $1, $3); }
| expr '-' expr { $$ = opr('-', 2, $1, $3); }
| expr '*' expr { $$ = opr('*', 2, $1, $3); }
| expr '/' expr { $$ = opr('/', 2, $1, $3); }
| expr '<' expr { $$ = opr('<', 2, $1, $3); }
| expr '>' expr { $$ = opr('>', 2, $1, $3); }
| expr GE expr { $$ = opr(GE, 2, $1, $3); }
| expr LE expr { $$ = opr(LE, 2, $1, $3); }
| expr NE expr { $$ = opr(NE, 2, $1, $3); }
| expr EQ expr { $$ = opr(EQ, 2, $1, $3); }
| '(' expr ')' { $$ = $2; }
;
cod : '.' {exit(0);}
| instruc '\n' cod
;
instruc : '\n'
| PRINT expresie {printf("%d\n",$2);}
| VARIABLE '=' expresie {sym[$1]=$3;}
| RAD'('expresie','expresie','expresie')' {sym[0]=$3; sym[1]=$5; sym[2]=$7; ex(RadEc);}
;
expresie : INTEGER { $$ = $1; }
| VARIABLE { $$ = sym[$1]; }
| '-' expresie %prec UMINUS { $$ = -$2; }
| expresie '+' expresie { $$ = $1+$3; }
| expresie '-' expresie { $$ = $1-$3; }
| expresie '*' expresie { $$ = $1*$3; }
| expresie '/' expresie { $$ = $1/$3; }
| expresie '<' expresie { $$ = $1<$3; }
| expresie '>' expresie { $$ = $1>$3; }
| expresie GE expresie { $$ = $1>=$3; }
| expresie LE expresie { $$ = $1<=$3; }
| expresie NE expresie { $$ = $1!=$3; }
| expresie EQ expresie { $$ = $1==$3; }
| '(' expresie ')' { $$ = $2; }
;
%%
nodeType *con(int value)
{
nodeType *p;
/* allocate node */
if ((p = malloc(sizeof(conNodeType))) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeCon;
p->con.value = value;
return p;
}
nodeType *id(int i)
{
nodeType *p;
/* allocate node */
if ((p = malloc(sizeof(idNodeType))) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeId;
p->id.i = i;
return p;
}
nodeType *opr(int oper, int nops, ...)
{
va_list ap;
nodeType *p;
size_t size;
int i;
/* allocate node */
size = sizeof(oprNodeType) + (nops - 1) * sizeof(nodeType*);
if ((p = malloc(size)) == NULL)
yyerror("out of memory");
/* copy information */
p->type = typeOpr;
p->opr.oper = oper;
p->opr.nops = nops;
va_start(ap, nops);
for (i = 0; i < nops; i++)
p->opr.op[i] = va_arg(ap, nodeType*);
va_end(ap);
return p;
}
void freeNode(nodeType *p)
{
int i;
if (!p)
return;
if (p->type == typeOpr) {
for (i = 0; i < p->opr.nops; i++)
freeNode(p->opr.op[i]);
}
free (p);
}
int ex(nodeType *p)
{
if (!p)
return 0;
switch(p->type)
{
case typeCon: return p->con.value;
case typeId: return sym[p->id.i];
case typeOpr: switch(p->opr.oper)
{
case WHILE: while(ex(p->opr.op[0]))
ex(p->opr.op[1]);
return 0;
case IF: if (ex(p->opr.op[0]))
ex(p->opr.op[1]);
else if (p->opr.nops > 2)
ex(p->opr.op[2]);
return 0;
case PRINT: printf("%d\n", ex(p->opr.op[0]));
return 0;
case ';': ex(p->opr.op[0]);
return ex(p->opr.op[1]);
case '=': return sym[p->opr.op[0]->id.i] = ex(p->opr.op[1]);
case UMINUS: return -ex(p->opr.op[0]);
case '+': return ex(p->opr.op[0]) + ex(p->opr.op[1]);
case '-': return ex(p->opr.op[0]) - ex(p->opr.op[1]);
case '*': return ex(p->opr.op[0]) * ex(p->opr.op[1]);
case '/': return ex(p->opr.op[0]) / ex(p->opr.op[1]);
case '<': return ex(p->opr.op[0]) < ex(p->opr.op[1]);
case '>': return ex(p->opr.op[0]) > ex(p->opr.op[1]);
case GE: return ex(p->opr.op[0]) >= ex(p->opr.op[1]);
case LE: return ex(p->opr.op[0]) <= ex(p->opr.op[1]);
case NE: return ex(p->opr.op[0]) != ex(p->opr.op[1]);
case EQ: return ex(p->opr.op[0]) == ex(p->opr.op[1]);
}
}
}
void yyerror(char *s)
{
fprintf(stdout, "%s\n", s);
}
int main(void)
{
#if YYDEBUG
yydebug = 1;
#endif
yyparse();
return 0;
}
I tried different ways to see were am I losing something, but I am pretty new at this and still cannot figure it out very well the conflicts.
Any help much appreciated.
Your grammar allows statements to be expressions and it allows two statements to appear in sequence without any separator.
Now, both of the following are expressions:
a
-1
Suppose they appear like that in a statement list. How is that different from this single expression?
a - 1
Ambiguity always shows up as a parsing conflict.
By the way, delimited if statements (with an endif marker) cannot exhibit the dangling else ambiguity. The endif bracket makes the parse unambiguous. So all of the precedence apparatus copied from a different grammar is totally redundant here.

Wrong operator precedence with Happy

Exploring parsing libraries in Haskell I came across this project: haskell-parser-examples. Running some examples I found a problem with the operator precedence. It works fine when using Parsec:
$ echo "3*2+1" | dist/build/lambda-parsec/lambda-parsec
Op Add (Op Mul (Num 3) (Num 2)) (Num 1)
Num 7
But not with Happy/Alex:
$ echo "3*2+1" | dist/build/lambda-happy-alex/lambda-happy-alex
Op Mul (Num 3) (Op Add (Num 2) (Num 1))
Num 9
Even though the operator precedence seems well-defined. Excerpt from the parser:
%left '+' '-'
%left '*' '/'
%%
Exprs : Expr { $1 }
| Exprs Expr { App $1 $2 }
Expr : Exprs { $1 }
| let var '=' Expr in Expr end { App (Abs $2 $6) $4 }
| '\\' var '->' Expr { Abs $2 $4 }
| Expr op Expr { Op (opEnc $2) $1 $3 }
| '(' Expr ')' { $2 }
| int { Num $1 }
Any hint? (I opened a bug report some time ago, but no response).
[Using gch 7.6.3, alex 3.1.3, happy 1.19.4]
This appears to be a bug in haskell-parser-examples' usage of token precedence. Happy's operator precedence only affects the rules that use the tokens directly. In the parser we want to apply precedence to the Expr rule, but the only applicable rule,
| Expr op Expr { Op (opEnc $2) $1 $3 }
doesn't use tokens itself, instead relying on opEnc to expand them. If opEnc is inlined into Expr,
| Expr '*' Expr { Op Mul $1 $3 }
| Expr '+' Expr { Op Add $1 $3 }
| Expr '-' Expr { Op Sub $1 $3 }
it should work properly.

Extending example grammar for Fsyacc with unary minus

I tried to extend the example grammar that comes as part of the "F# Parsed Language Starter" to support unary minus (for expressions like 2 * -5).
I hit a block like Samsdram here
Basically, I extended the header of the .fsy file to include precedence like so:
......
%nonassoc UMINUS
....
and then the rules of the grammar like so:
...
Expr:
| MINUS Expr %prec UMINUS { Negative ($2) }
...
also, the definition of the AST:
...
and Expr =
| Negative of Expr
.....
but still get a parser error when trying to parse the expression mentioned above.
Any ideas what's missing? I read the source code of the F# compiler and it is not clear how they solve this, seems quite similar
EDIT
The precedences are ordered this way:
%left ASSIGN
%left AND OR
%left EQ NOTEQ LT LTE GTE GT
%left PLUS MINUS
%left ASTER SLASH
%nonassoc UMINUS
Had a play around and managed to get the precedence working without the need for %prec. Modified the starter a little though (more meaningful names)
Prog:
| Expression EOF { $1 }
Expression:
| Additive { $1 }
Additive:
| Multiplicative { $1 }
| Additive PLUS Multiplicative { Plus($1, $3) }
| Additive MINUS Multiplicative { Minus($1, $3) }
Multiplicative:
| Unary { $1 }
| Multiplicative ASTER Unary { Times($1, $3) }
| Multiplicative SLASH Unary { Divide($1, $3) }
Unary:
| Value { $1 }
| MINUS Value { Negative($2) }
Value:
| FLOAT { Value(Float($1)) }
| INT32 { Value(Integer($1)) }
| LPAREN Expression RPAREN { $2 }
I also grouped the expressions into a single variant, as I didn't like the way the starter done it. (was awkward to walk through it).
type Value =
| Float of Double
| Integer of Int32
| Expression of Expression
and Expression =
| Value of Value
| Negative of Expression
| Times of Expression * Expression
| Divide of Expression * Expression
| Plus of Expression * Expression
| Minus of Expression * Expression
and Equation =
| Equation of Expression
Taking code from my article Parsing text with Lex and Yacc (October 2007).
My precedences look like:
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc prec_uminus
%right POWER
%nonassoc FACTORIAL
and the yacc parsing code is:
expr:
| NUM { Num(float_of_string $1) }
| MINUS expr %prec prec_uminus { Neg $2 }
| expr FACTORIAL { Factorial $1 }
| expr PLUS expr { Add($1, $3) }
| expr MINUS expr { Sub($1, $3) }
| expr TIMES expr { Mul($1, $3) }
| expr DIVIDE expr { Div($1, $3) }
| expr POWER expr { Pow($1, $3) }
| OPEN expr CLOSE { $2 }
;
Looks equivalent. I don't suppose the problem is your use of UMINUS in capitals instead of prec_uminus in my case?
Another option is to split expr into several mutually-recursive parts, one for each precedence level.

Fsyacc: an item with the same key has been added

I'm starting to play with Fslex/Fsyacc. When trying to generate the parser using this input
Parser.fsy:
%{
open Ast
%}
// The start token becomes a parser function in the compiled code:
%start start
// These are the terminal tokens of the grammar along with the types of
// the data carried by each token:
%token <System.Int32> INT
%token <System.String> STRING
%token <System.String> ID
%token PLUS MINUS ASTER SLASH LT LT EQ GTE GT
%token LPAREN RPAREN LCURLY RCURLY LBRACKET RBRACKET COMMA
%token ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF BREAK NIL FUNCTION VAR TYPE IMPORT PRIMITIVE
%token EOF
// This is the type of the data produced by a successful reduction of the 'start'
// symbol:
%type <Ast.Program> start
%%
// These are the rules of the grammar along with the F# code of the
// actions executed as rules are reduced. In this case the actions
// produce data using F# data construction terms.
start: Prog { Program($1) }
Prog:
| Expr EOF { $1 }
Expr:
// literals
| NIL { Ast.Nil ($1) }
| INT { Ast.Integer($1) }
| STRING { Ast.Str($1) }
// arrays and records
| ID LBRACKET Expr RBRACKET OF Expr { Ast.Array ($1, $3, $6) }
| ID LCURLY AssignmentList RCURLY { Ast.Record ($1, $3) }
AssignmentList:
| Assignment { [$1] }
| Assignment COMMA AssignmentList {$1 :: $3 }
Assignment:
| ID EQ Expr { Ast.Assignment ($1,$3) }
Ast.fs
namespace Ast
open System
type Integer =
| Integer of Int32
and Str =
| Str of string
and Nil =
| None
and Id =
| Id of string
and Array =
| Array of Id * Expr * Expr
and Record =
| Record of Id * (Assignment list)
and Assignment =
| Assignment of Id * Expr
and Expr =
| Nil
| Integer
| Str
| Array
| Record
and Program =
| Program of Expr
Fsyacc reports the following error: "FSYACC: error FSY000: An item with the same key has already been added."
I believe the problem is in the production for AssignmentList, but can't find a way around...
Any tips will be appreciated
Hate answering my own questions, but the problem was here (line 15 of parser input file)
%token PLUS MINUS ASTER SLASH LT LT EQ GTE GT
Note the double definition (should have been LTE)
My vote goes for finding a way to improve the output of the Fslex/Fsyacc executables/msbuild tasks

Resources