Let's assume I have a grammar like this:
expr : expr '-' expr { $$ = $1 - $3; }
| "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
| NUMBER { $$ = $1; };
How can I use the rule
expr : expr '-' expr { $$ = $1 - $3; }
inside
expr : "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
given that the implementation of $1 - $3 is repeated? It would be much better if I could reuse the subtraction already implemented in the first rule and only add the multiplication by 2. This is just a basic example, but I have a very big grammar with a lot of repeated calculations.
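One common way to avoid the repetition (a minimal sketch, not tied to any particular yacc/bison version; the helper name subtract is my own) is to factor the shared computation into an ordinary C function and call it from both actions:

%{
/* hypothetical helper shared by every action that needs the subtraction */
static int subtract(int lhs, int rhs)
{
    return lhs - rhs;
}
%}

%%
expr : expr '-' expr                      { $$ = subtract($1, $3); }
     | "Function" '(' expr ',' expr ')'   { $$ = subtract($3, $5) * 2; }
     | NUMBER                             { $$ = $1; }
     ;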
Related
I have a grammar for arithmetic expressions which evaluates a number of expressions (one per line) in a text file. When compiling with YACC I get the message "2 shift/reduce conflicts", but my calculations are correct. If the parser gives proper output, how does it resolve the shift/reduce conflicts? And in my case, is there any way to resolve them in the YACC grammar?
YACC GRAMMAR
Calc : Expr {printf(" = %d\n",$1);}
| Calc Expr {printf(" = %d\n",$2);}
| error {yyerror("\nBad Expression\n ");}
;
Expr : Term { $$ = $1; }
| Expr '+' Term { $$ = $1 + $3; }
| Expr '-' Term { $$ = $1 - $3; }
;
Term : Fact { $$ = $1; }
| Term '*' Fact { $$ = $1 * $3; }
| Term '/' Fact { if ($3 == 0) {
                      yyerror("Divide by Zero Encountered.");
                      break;
                  } else
                      $$ = $1 / $3;
                }
;
Fact : Prim { $$ = $1; }
| '-' Prim { $$ = -$2; }
;
Prim : '(' Expr ')' { $$ = $2; }
| Id { $$ = $1; }
;
Id : NUM { $$ = yylval; }
;
What change should I make to remove these conflicts from my grammar?
Bison/yacc resolves shift-reduce conflicts by choosing to shift. This is explained in the bison manual in the section on Shift-Reduce conflicts.
Your problem is that your input is just a series of Exprs, run together without any delimiter between them. That means that:
4 - 2
could be one expression (4-2) or it could be two expressions (4, -2). Since bison-generated parsers always prefer to shift, the parser will choose to parse it as one expression, even if it were typed on two lines:
4
-2
If you want to allow users to type their expressions like that, without any separator, then you could either live with the conflict (since it is relatively benign) or you could codify it into your grammar, but that's quite a bit more work. To put it into the grammar, you need to define two different types of Expr: one (the one you use at the top level) cannot start with a unary minus, and the other (which you can use anywhere else) is allowed to start with a unary minus.
I suspect that what you really want to do is use newlines or some other kind of expression separator. That's as simple as passing the newline through to your parser and changing Calc to Calc: | Calc '\n' | Calc Expr '\n'.
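A minimal sketch of that change, assuming the lexer is modified to return '\n' instead of discarding it:

Calc : /* empty */
     | Calc '\n'
     | Calc Expr '\n'   { printf(" = %d\n", $2); }
     ;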
I'm sure that this appears somewhere else on SO, but I can't find it. So here is how you disallow the use of unary minus at the beginning of an expression, so that you can run expressions together without delimiters. The non-terminals starting n_ cannot start with a unary minus:
input: %empty | input n_expr { /* print $2 */ }
expr: term | expr '+' term | expr '-' term
n_expr: n_term | n_expr '+' term | n_expr '-' term
term: factor | term '*' factor | term '/' factor
n_term: value | n_term '*' factor | n_term '/' factor
factor: value | '-' factor
value: NUM | '(' expr ')'
That parses the same language as your grammar, but without generating the shift-reduce conflict. Since it parses the same language, the input
4
-2
will still be parsed as a single expression; to get the expected result you would need to type
4
(-2)
I was doing a simple C parser project.
This problem occurred when I was writing the grammar for the if-else construct.
The grammar that I have written is as follows:
iexp: IF OP exp CP block eixp END {
printf("Valid if-else ladder\n");
};
eixp:
|ELSE iexp
|ELSE block;
exp: id
|NUMBER
|exp EQU exp
|exp LESS exp
|exp GRT exp
|OP exp CP;
block: statement
|iexp
|OCP statement CCP
|OCP iexp CCP;
statement: id ASSIGN NUMBER SEMICOL;
id: VAR;
where the lex part looks something like this
"if" {return IF;}
"else" {return ELSE;}
[0-9]+ {return NUMBER;}
">" {return GRT;}
"<" {return LESS;}
"==" {return EQU;}
"{" {return OCP;}
"}" {return CCP;}
"(" {return OP;}
")" {return CP;}
"$" {return END;}
";" {return SEMICOL;}
"=" {return ASSIGN;}
[a-zA-Z]+ {return VAR;}
. {;}
I am getting this output:
yacc: 9 shift/reduce conflicts, 1 reduce/reduce conflict.
When I eliminate the left recursion on the exp derivation, the conflicts vanish, but why is that so?
The revised grammar after eliminating the left recursion was:
exp: id
|NUMBER
|id EQU exp
|id LESS exp
|id GRT exp
|OP exp CP;
I was able to successfully parse arithmetic expressions with the grammar below. Is it the %right and %left declarations that made it work?
%token ID
%left '+' '-'
%left '*' '/'
%right NEGATIVE
%%
S:E {
printf("\nexpression : %s\nResult=%d\n", buf, $$);
buf[0] = '\0';
};
E: E '+' E {
printf("+");
($$ = $1 + $3);
} |
E '-' E {
printf("-");
($$ = $1 - $3);
} |
E '*' E {
printf("*");
($$ = $1 * $3);
} |
E '/' E {
printf("/");
($$ = $1 / $3);
} |
'(' E ')' {
($$ = $2);
} |
ID {
/*do nothing done by lex*/
};
This is not homework, but it is from a book.
I'm given a following bison spec file:
%{
#include <stdio.h>
#include <ctype.h>
int yylex();
int yyerror();
%}
%token NUMBER
%%
command : exp { printf("%d\n", $1); }
; /* allows printing of the result */
exp : exp '+' term { $$ = $1 + $3; }
| exp '-' term { $$ = $1 - $3; }
| term { $$ = $1; }
;
term : term '*' factor { $$ = $1 * $3; }
| factor { $$ = $1; }
;
factor : NUMBER { $$ = $1; }
| '(' exp ')' { $$ = $2; }
;
%%
int main() {
return yyparse();
}
int yylex() {
int c;
/* eliminate blanks*/
while((c = getchar()) == ' ');
if (isdigit(c)) {
ungetc(c, stdin);
scanf("%d", &yylval);
return (NUMBER);
}
/* makes the parse stop */
if (c == '\n') return 0;
return (c);
}
int yyerror(char * s) {
fprintf(stderr, "%s\n", s);
return 0;
} /* allows for printing of an error message */
The task is to do the following:
Rewrite the spec to add the following useful error messages:
"missing right parenthesis," generated by the string (2+3
"missing left parenthesis," generated by the string 2+3)
"missing operator," generated by the string 2 3
"missing operand," generated by the string (2+)
The simplest solution that I was able to come up with is to do the following:
half_exp : exp '+' { $$ = $1; }
| exp '-' { $$ = $1; }
| exp '*' { $$ = $1; }
;
factor : NUMBER { $$ = $1; }
| '(' exp '\n' { yyerror("missing right parenthesis"); }
| exp ')' { yyerror("missing left parenthesis"); }
| '(' exp '\n' { yyerror("missing left parenthesis"); }
| '(' exp ')' { $$ = $2; }
| '(' half_exp ')' { yyerror("missing operand"); exit(0); }
;
exp : exp '+' term { $$ = $1 + $3; }
| exp '-' term { $$ = $1 - $3; }
| term { $$ = $1; }
| exp exp { yyerror("missing operator"); }
;
These changes work; however, they lead to a lot of conflicts.
Here is my question.
Is there a way to rewrite this grammar in such a way so that it wouldn't generate conflicts?
Any help is appreciated.
Yes, it is possible:
command : exp { printf("%d\n", $1); }
; /* allows printing of the result */
exp: exp '+' exp {
// code
}
| exp '-' exp {
// code
}
| exp '*' exp {
// code
}
| exp '/' exp {
// code
}
|'(' exp ')' {
// code
}
Bison allows Ambiguous grammars.
I don't see how you could rewrite the grammar to avoid conflicts. You have just missed the point of terms, factors, etc.; you use those when you want a left-recursive context-free grammar.
From this grammar:
E -> E+T
|T
T -> T*F
|F
F -> (E)
|num
Once you free it from left recursion, you get:
E -> TE' { num , ( }
E' -> +TE' { + }
| eps { ) , EOI }
T -> FT' { ( , num }
T' -> *FT' { * }
|eps { + , ) , EOI }
F -> (E) { ( }
|num { num }
The sets alongside the rules show which input symbol must come next in order to use that rule. Of course, this is just an example for simple arithmetic expressions, for example 2*(3+4)*5+(3*3*3+4+5*6), etc.
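To make those predict sets concrete, here is a minimal recursive-descent sketch in C of the non-left-recursive grammar above (my own illustration: only '+', '*', single-digit numbers and parentheses, no error handling). Each function looks at the current character to decide which rule to apply:

#include <stdio.h>

static const char *p;               /* current input position */

static int E(void);

static int F(void)                  /* F -> (E) | num */
{
    if (*p == '(') { ++p; int v = E(); if (*p == ')') ++p; return v; }
    return *p++ - '0';
}

static int Tp(int acc)              /* T' -> *FT' | eps ; predict { * } */
{
    if (*p == '*') { ++p; return Tp(acc * F()); }
    return acc;                     /* eps on + , ) or end of input */
}

static int T(void) { return Tp(F()); }   /* T -> FT' */

static int Ep(int acc)              /* E' -> +TE' | eps ; predict { + } */
{
    if (*p == '+') { ++p; return Ep(acc + T()); }
    return acc;                     /* eps on ) or end of input */
}

static int E(void) { return Ep(T()); }   /* E -> TE' */

int main(void)
{
    p = "2*(3+4)*5+(3*3*3+4+5*6)";
    printf("%d\n", E());            /* prints 131 */
    return 0;
}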
If you want to learn more about this topic, I suggest you read about left recursion in context-free grammars. There are some great books covering this topic, including how to derive these predict sets.
But as I said above, all of this can be avoided because bison allows ambiguous grammars.
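If you do keep the ambiguous form, the usual way to silence the conflicts (and get the intended grouping) is to add precedence declarations; a sketch, mirroring conventional arithmetic precedence:

%left '+' '-'
%left '*' '/'

%%
exp : exp '+' exp   { $$ = $1 + $3; }
    | exp '-' exp   { $$ = $1 - $3; }
    | exp '*' exp   { $$ = $1 * $3; }
    | exp '/' exp   { $$ = $1 / $3; }
    | '(' exp ')'   { $$ = $2; }
    | NUMBER        { $$ = $1; }
    ;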
Exploring parsing libraries in Haskell I came across this project: haskell-parser-examples. Running some examples I found a problem with the operator precedence. It works fine when using Parsec:
$ echo "3*2+1" | dist/build/lambda-parsec/lambda-parsec
Op Add (Op Mul (Num 3) (Num 2)) (Num 1)
Num 7
But not with Happy/Alex:
$ echo "3*2+1" | dist/build/lambda-happy-alex/lambda-happy-alex
Op Mul (Num 3) (Op Add (Num 2) (Num 1))
Num 9
Even though the operator precedence seems well-defined. Excerpt from the parser:
%left '+' '-'
%left '*' '/'
%%
Exprs : Expr { $1 }
| Exprs Expr { App $1 $2 }
Expr : Exprs { $1 }
| let var '=' Expr in Expr end { App (Abs $2 $6) $4 }
| '\\' var '->' Expr { Abs $2 $4 }
| Expr op Expr { Op (opEnc $2) $1 $3 }
| '(' Expr ')' { $2 }
| int { Num $1 }
Any hint? (I opened a bug report some time ago, but no response).
[Using ghc 7.6.3, alex 3.1.3, happy 1.19.4]
This appears to be a bug in haskell-parser-examples' usage of token precedence. Happy's operator precedence only affects the rules that use the tokens directly. In the parser we want to apply precedence to the Expr rule, but the only applicable rule,
| Expr op Expr { Op (opEnc $2) $1 $3 }
doesn't use tokens itself, instead relying on opEnc to expand them. If opEnc is inlined into Expr,
| Expr '*' Expr { Op Mul $1 $3 }
| Expr '+' Expr { Op Add $1 $3 }
| Expr '-' Expr { Op Sub $1 $3 }
it should work properly.
I am trying to write an interpreter but am having difficulty understanding the theoretical underpinnings of the process.
I understand that the first part is to write a lexer, which splits the string up into a list of valid tokens, and that a parser then generates the corresponding abstract syntax tree from this stream of tokens. However, the parser is built from grammar rules, which is what I'm having difficulty understanding.
A grammar rule is obviously used to shape the resulting abstract syntax tree, but how exactly does this middle step work? Does it pattern-match on string characters, on a specific list of tokens, or ...?
Any type of intuition or explanation is welcomed. Thanks!
Search the internet for lex/yacc examples and tutorials. Learning by doing.
Ability to program in C is also necessary.
http://ds9a.nl/lex-yacc/cvs/lex-yacc-howto.html
lex is the ancient Unix lexer generator, which generates C code from a regexp-based spec.
yacc is the ancient Unix parser generator for building syntax trees. It generates C code too.
The modern GNU versions of the tools are called flex and bison.
Here's the core of the yacc code of a calculator. It shows how higher level constructs are built from the tokens, and what to do when such constructs are encountered.
%%
list : // empty
| list stm '\n' { print(); }
| list cmd '\n' { print(); }
| list cmd stm '\n' { print(); }
| list stm cmd '\n' { print(); }
| list cmd stm cmd '\n' { print(); }
| list error '\n' { yyerrok; print(); }
;
cmd : COMMAND { commands[$1](); }
;
stm : expr { output = $1; outputPush(); }
| VAR '=' expr { vars_set($1, &$3); }
;
expr : { outputGet(); $$ = output; }
| '_' { outputGet(); $$ = output; }
| '(' expr ')' { $$ = $2; }
| expr OPADD expr { $$ = tNumOpIn ($1, $2, $3); }
| expr OPMUL expr { $$ = tNumOpIn ($1, $2, $3); }
| expr OPPOW expr { $$ = tNumOpIn ($1, $2, $3); }
| OPPRE expr { $$ = tNumOpPre($1, $2); }
| VAR { if (vars_get($1,&$$)) $$=output; }
| NUMBER { $$ = $1; }
;
%%
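For the lexer side, a matching lex spec might look roughly like the sketch below. The token names come from the grammar above, but the regular expressions, the yylval handling, and the lookup helper are hypothetical, since the original lex file is not shown (COMMAND and OPPRE are omitted):

%%
[0-9]+          { yylval = atoi(yytext); return NUMBER; }    /* numeric literal */
[a-z][a-z0-9]*  { yylval = lookup_var(yytext); return VAR; } /* hypothetical symbol lookup */
[-+]            { yylval = yytext[0]; return OPADD; }        /* additive operators */
[*/]            { yylval = yytext[0]; return OPMUL; }        /* multiplicative operators */
"^"             { yylval = yytext[0]; return OPPOW; }
[_=()\n]        { return yytext[0]; }                        /* single-character tokens */
[ \t]           ;                                            /* skip blanks */
%%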