Solving the shift/reduce conflict - parsing

I have the following grammar and have been asked whether it is SLR(1) or not. The grammar is:
E -> E + A + A | E - A + A | E + A - A | E - A - A | T .
T -> T + A | T - A | A .
A -> A * B | A / B | B .
B -> ( E ) | x .
The grammar is ambiguous. Using Grammophone ( http://mdaines.github.io/grammophone/#/ ; just paste the grammar into the edit area), I found a shift/reduce conflict in line 2 of the parsing table when + and - are the next symbols. The next task is to fix the grammar so that it becomes SLR(1). How can I do that? I have searched the web and cannot find an answer. Sorry for my bad English.
Edit:
The original grammar was:
E->E+E+E|E-E-E|E+E-E|E-E+E
E->E+E|E-E|E*E|E/E
E->(E)|x
The exercise says that * and / have equal priority, which is higher than that of + and -, and that + and - have equal priority with each other. Given that, you have to change the given grammar into another one that is SLR(1). All I could come up with was the grammar at the top. Operators are applied from left to right, and we cannot remove any rules.

Related

What decides which production the parser tries?

I am trying to build a parser for a desk calculator and am using the following bison code for it.
%union{
float f;
char c;
// int
}
%token <f> NUM
%token <c> ID
%type <f> S E T F G
%%
C : S ';'
| C S ';'
;
S : ID '=' E {fprintf(debug,"13\n");printf("%c has been assigned the value %f.",$1,$3);symbolTable[$1]=$3;}
| E {fprintf(debug,"12\n");result = $$;}
;
E : E '+' T {fprintf(debug,"11\n");$$ = $1+$3;}
| E '-' T {fprintf(debug,"10\n");$$ = $1-$3;}
| T {fprintf(debug,"9\n");$$ = $1;}
;
T : T '*' F {fprintf(debug,"7\n");$$ = $1*$3;}
| T '/' F {fprintf(debug,"6\n");$$ = $1/$3;}
| F {fprintf(debug,"5\n");$$ = $1;}
;
F : G '#' F {fprintf(debug,"4\n");$$ = pow($1,$3);}
| G {fprintf(debug,"3\n");$$ = $1;}
;
G : '(' E ')' {fprintf(debug,"2\n");$$ = $2;}
| NUM {fprintf(debug,"1\n");$$ = $1;}
| ID {fprintf(debug,"0\n");$$ = symbolTable[$1];}
;
%%
My LEX rules are
digit [0-9]
num {digit}+
alpha [A-Za-z]
id {alpha}({alpha}|{digit})*
white [\ \t]
%%
let {printf("let");return LET;}
{num} {yylval.f = atoi(yytext);return NUM;}
{alpha} {yylval.c = yytext[0];return ID;}
[+\-\*/#\(\)] {return yytext[0];}
. {}
%%
The input I gave is a=2+3
When the lexer returns an ID (for 'a'), the parser goes for the production with fprintf(debug,"0\n"), but I want it to go for the production with fprintf(debug,"13\n").
So I am wondering what made my parser reduce by production 0 instead of shifting = onto the stack, and how I can control it.
What you actually specified is a translation grammar, given by the following:
C → S ';' 14 | C S ';' 8
S → ID '=' E 13 | E 12
E → E '+' T 11 | E '-' T 10 | T 9
T → T '*' F 7 | T '/' F 6 | F 5
F → G '#' F 4 | G 3
G → '(' E ')' 2 | NUM 1 | ID 0
with top-level/start configuration C. (For completeness, I added in 8 and 14).
There is only one word generated from C, by this translation grammar, containing ID '=' NUM '+' NUM as the subword of input tokens, and that is ID ('a') '=' NUM('2') 1 3 5 9 '+' NUM('3') 1 3 5 11 13 ';' 14, which is equal to the input-output pair (ID '=' NUM '+' NUM ';', 1 3 5 9 1 3 5 11 13 14). So, the sequence 1 3 5 9 1 3 5 11 13 14 is the one and only translation. Provided the grammar is LALR(1), then this translation will be produced, as a result; and the grammar is LALR(1).
If you're not getting this result, then that can only mean that whatever you left out of your description (i.e. the lexer) is implemented incorrectly, or that your grammar processor has a bug, or that your machine is failing.
And, no; actually what you did is the better way to see what's going on - just stick in a single printf statement to the right hand side of each rule and run it that way to see what translation sequences are produced. The "trace" facility in the parser generator is superfluous for that very reason ... at least the way it is usually implemented (more on that below). In addition, you can get a direct view of everything with the -v option, which produces the LR(0) tables with LALR(1) annotations.
The kind of built-in testing facility that would actually be more helpful - especially for examples like this - is just what I described: one that echoes the inputs interleaved with the output actions. So, when you run it on "a = 2 + 3 ;", it would give you ID('a') '=' NUM('2') 1 3 5 9 '+' NUM('3') 1 3 5 11 13 ';' 14 with echo turned on, and just 1 3 5 9 1 3 5 11 13 14 with echo turned off. That would actually be more useful to have as a built-in capability, instead of the trace mode you normally see in implementations of yacc.
The POSIX specification actually leaves open the issue of how "YYDEBUG", "yydebug" and "-t" are to be implemented in a compliant implementation of yacc, to make room for alternative approaches like this.
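To make the translation sequence concrete, here is a minimal Python sketch (not bison output) that walks a token stream and records the rule numbers in the order the reductions would fire. It is a hand-written recursive-descent translator that peeks two tokens ahead to choose between the S alternatives, so it only approximates what an LALR(1) parser does. The names translate and ParseError are made up for this illustration.

```python
class ParseError(Exception):
    """Hypothetical error type that remembers the actions emitted so far."""

    def __init__(self, message, actions):
        super().__init__(message)
        self.actions = actions


def translate(tokens):
    """Return the rule numbers of the translation grammar in reduction order."""
    out = []
    pos = 0

    def peek(k=0):
        return tokens[pos + k] if pos + k < len(tokens) else None

    def eat(expected):
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r}, got {peek()!r}", list(out))
        pos += 1

    def G():
        if peek() == "(":
            eat("("); E(); eat(")"); out.append(2)
        elif peek() == "NUM":
            eat("NUM"); out.append(1)
        else:
            eat("ID"); out.append(0)

    def F():
        G()
        if peek() == "#":          # F -> G '#' F
            eat("#"); F(); out.append(4)
        else:                      # F -> G
            out.append(3)

    def T():
        F(); out.append(5)
        while peek() in ("*", "/"):
            op = peek(); eat(op); F()
            out.append(7 if op == "*" else 6)

    def E():
        T(); out.append(9)
        while peek() in ("+", "-"):
            op = peek(); eat(op); T()
            out.append(11 if op == "+" else 10)

    def S():
        if peek() == "ID" and peek(1) == "=":   # S -> ID '=' E
            eat("ID"); eat("="); E(); out.append(13)
        else:                                   # S -> E
            E(); out.append(12)

    S(); eat(";"); out.append(14)               # C -> S ';'
    while peek() is not None:                   # C -> C S ';'
        S(); eat(";"); out.append(8)
    return out


if __name__ == "__main__":
    print(translate(["ID", "=", "NUM", "+", "NUM", ";"]))
```

On ID '=' NUM '+' NUM ';' this yields 1 3 5 9 1 3 5 11 13 14. If the '=' and ';' tokens never reach the parser, the first recorded action is 0, after which the sketch raises ParseError.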
Well, it turns out that the problem is I am not identifying = as a token here, in my LEX.
As silly as it sounds, it points out a very important concept of yacc/Bison. The question of whether to shift or reduce is answered by checking the next symbol, also called the lookahead. In this case, the lookahead was NUM(for 2) and not =, because of my faulty LEX code. Since there is no production involving ID followed by NUM, it is going for a reduction to G.
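To see what the parser actually received, here is a rough Python emulation of the LEX rules above (longest match wins, the earlier rule wins ties). It is a sketch, not flex itself. The function name lex and the FIXED character class, which adds = and ;, are assumptions made for the illustration.

```python
import re

FAULTY = r"[+\-*/#()]"      # the operator class from the rules above
FIXED = r"[+\-*/#()=;]"     # assumed fix: also return '=' and ';' as tokens


def lex(text, operator_class):
    """Return the token names the parser would see for the given input."""
    rules = [
        ("LET", re.compile(r"let")),
        ("NUM", re.compile(r"[0-9]+")),
        ("ID", re.compile(r"[A-Za-z]")),
        ("OP", re.compile(operator_class)),
        ("SKIP", re.compile(r".")),          # the catch-all rule `. {}`
    ]
    tokens, pos = [], 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            # Strictly longer matches replace the current best, so the
            # earlier rule is kept when two rules match the same length.
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:                # e.g. a newline, which `.` skips
            pos += 1
            continue
        lexeme = text[pos:pos + best_len]
        pos += best_len
        if best_name == "SKIP":
            continue                         # silently dropped, like `. {}`
        tokens.append(lexeme if best_name == "OP" else best_name)
    return tokens


if __name__ == "__main__":
    print(lex("a=2+3", FAULTY))
    print(lex("a=2+3;", FIXED))
```

With the original operator class, a=2+3 comes out as ID NUM + NUM, so the lookahead after ID is NUM. With = and ; added to the class, a=2+3; comes out as ID = NUM + NUM ;.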
As for how I figured it out: bison has a built-in trace feature. It neatly lays out whatever it does while parsing, like a diary entry; each and every step is written down.
To enable it:
Run bison with the -Dparse.trace option:
bison calc.y -d -Dparse.trace
In the main function of the parser, declare the extern yydebug and set it to a non-zero value:
int main(){
    extern int yydebug;
    yydebug = 1;
    /* ... */
}

SLR parsing conflicts with epsilon production

Consider the following grammar
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA, we can see there will be a state that contains the items
Q -> .tS
Q -> . (epsilon as a blank string)
Since t is in FOLLOW(Q), there appears to be a shift/reduce conflict.
Can we conclude that the grammar isn't SLR(1)?
(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configuring set is sufficient to show that this grammar isn't SLR(1).
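The claim that t is in FOLLOW(Q) can be checked mechanically. Below is a small Python sketch of the usual fixed-point computation of FIRST and FOLLOW for this grammar. The dictionary encoding and the function names are just one way to write it, and $ is assumed as the end-of-input marker.

```python
EPS = "ε"
GRAMMAR = {
    "S": [["a", "P", "b", "S", "Q"], ["a"]],
    "Q": [["t", "S"], []],          # the empty list is the ε production
    "P": [["r"]],
}
START = "S"


def first_of_sequence(symbols, first):
    """FIRST of a symbol string; contains EPS if the whole string can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")
    changed = True
    while changed:
        changed = False
        for nt, bodies in GRAMMAR.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:             # the tail can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first()
FOLLOW = compute_follow(FIRST)

if __name__ == "__main__":
    print(sorted(FOLLOW["Q"]))
```

FOLLOW(Q) comes out as {$, t}. In the state holding Q -> .tS and Q -> . the SLR(1) table therefore gets both a shift on t and a reduce by Q -> ε on t, which is the conflict described above.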

Converting given ambiguous arithmetic expression grammar to unambiguous LL(1)

This term I have a course on compilers, and we are currently studying syntax: different grammars and types of parsers. I came across a problem that I can't quite figure out, or at least I can't be sure I'm doing it correctly. I have already made two attempts, and counterexamples were found for both.
I am given this ambiguous grammar for arithmetic expressions:
E → E+E | E-E | E*E | E/E | E^E | -E | (E)| id | num , where ^ stands for power.
I figured out what the priorities should be. Parentheses have the highest priority, followed by power, followed by unary minus, followed by multiplication and division, and then addition and subtraction. I am asked to convert this into an equivalent LL(1) grammar. So I wrote this:
E → E+A | E-A | A
A → A*B | A/B | B
B → -C | C
C → D^C | D
D → (E) | id | num
The problem seems to be that this grammar, although unambiguous, is not equivalent to the first one. For example, the given grammar recognizes the input --5 while my grammar can't. How can I make sure I'm covering all cases? How should I modify my grammar to make it equivalent to the given one? Thanks in advance.
Edit: Also, I would of course do elimination of left recursion and left factoring to make this LL(1), but first I need to figure out this main part I asked above.
Here's one that should work for your case:
E = E+A | E-A | A
A = A*C | A/C | C
C = C^B | B
B = -B | D
D = (E) | id | num
As a side note, also pay attention to the requirements of your task, since some applications assign the unary minus operator a higher priority than the binary power operator.
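For comparison, here is a hedged Python sketch of a one-token-lookahead recursive-descent evaluator that follows the priorities stated in the question (power binds tighter than unary minus) rather than the grammar in this answer. It assumes the variant B → -B | C and C → D ^ B | D, with the left-recursive rules written as loops; allowing a unary minus in the exponent is my assumption, made so that inputs like 2^-1 from the original grammar are still accepted. Only num operands are handled, not id.

```python
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[-+*/^()])")


def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"bad input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_e():                      # E -> A (('+' | '-') A)*
        value = parse_a()
        while peek() in ("+", "-"):
            if take() == "+":
                value += parse_a()
            else:
                value -= parse_a()
        return value

    def parse_a():                      # A -> B (('*' | '/') B)*
        value = parse_b()
        while peek() in ("*", "/"):
            if take() == "*":
                value *= parse_b()
            else:
                value /= parse_b()
        return value

    def parse_b():                      # B -> '-' B | C
        if peek() == "-":
            take()
            return -parse_b()
        return parse_c()

    def parse_c():                      # C -> D '^' B | D  (right associative)
        base = parse_d()
        if peek() == "^":
            take()
            return base ** parse_b()
        return base

    def parse_d():                      # D -> '(' E ')' | num
        tok = peek()
        if tok == "(":
            take()
            value = parse_e()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            take()
            return value
        if tok is not None and tok[0].isdigit():
            take()
            return float(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    value = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return value


if __name__ == "__main__":
    for source in ("--5", "-2^2", "2^3^2", "2^-1"):
        print(source, "=", evaluate(source))
```

With these rules --5 is accepted and evaluates to 5, and -2^2 evaluates to -4 because the power is applied before the unary minus. Under the grammar in the answer above, where unary minus sits below power in the rule chain, -2^2 would group as (-2)^2 instead.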

Arithmetic expression grammar in prefix notation (Java Cup)

I'm writing a grammar for arithmetic expressions in prefix notation. However, I have an issue when parsing negative numbers or subtraction. An example of the grammar is this:
precedence right +, -;
precedence right *, /;
precedence right uminus;
E ::= + E E
| - E E
| * E E
| / E E
| ( E )
| - E %prec uminus
| id
| digit
;
But if my input is - 5 4, it reduces 5 to E, then it reduces - E (negation), and then the parser gives me a syntax error at 4. The correct behaviour would be to reduce 5 to E, then 4 to E, and then - E E to E. How can I solve this problem using associativity? Or do I need to rewrite my grammar?
(Promoted from comment)
Your grammar really is ambiguous, and precedence declarations won't help you a bit.
Consider the input consisting of N - tokens, followed by M 1 tokens.
- - - - - - - ... - 1 1 1 ... 1
In order for this to be an expression, M-1 of the - tokens must be binary, and the remaining N-(M-1) unary, but there is no way to tell which is which (unless they are all binary).
Even if you arbitrarily say that the first N-(M-1) -s are unary, you can't tell what the value of N-(M-1) is until you read the entire input, which means you can't parse with a finite lookahead.
But the whole point of prefix notation is to avoid the need for parentheses. Arbitrary declarations like the above make it impossible to represent alternative interpretations, so that some expressions would be impossible to represent in prefix notation. That's just plain wrong.
Here's a simple case:
- 5 - - - 4 3 1
is one of
5 - (- ((4 - 3) - 1))
5 - ((- (4 - 3)) - 1)
5 - (((- 4) - 3) - 1)
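The ambiguity can be checked by brute force. The Python sketch below tries both the unary and the binary reading of every - token and collects each reading that consumes the whole input. The function names are made up for the illustration, and the approach is exponential, so it is only meant for small inputs like this one.

```python
def parses(tokens, i=0):
    """Yield (infix_string, next_index) for every way to read one
    expression starting at tokens[i]."""
    if i >= len(tokens):
        return
    tok = tokens[i]
    if tok != "-":
        yield tok, i + 1                       # a plain operand
        return
    for operand, j in parses(tokens, i + 1):   # '-' read as unary
        yield f"(-{operand})", j
    for left, j in parses(tokens, i + 1):      # '-' read as binary
        for right, k in parses(tokens, j):
            yield f"({left} - {right})", k


def all_readings(text):
    """All readings that use up every token of the input."""
    tokens = text.split()
    return sorted({s for s, end in parses(tokens) if end == len(tokens)})


if __name__ == "__main__":
    for reading in all_readings("- 5 - - - 4 3 1"):
        print(reading)
```

For - 5 - - - 4 3 1 it finds exactly three readings, one for each choice of which of the last three - tokens is the unary one. For - 5 4 it finds only the binary reading.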
In prefix notation, you need to declare the "arity" of every operator, either implicitly (every operator has a known number of arguments), or explicitly using a notation like this, borrowed from Prolog:
-/2 5 -/2 -/2 -/1 4 3 1
Alternatively, you can delimit the arguments with mandatory parentheses, as with Lisp/Scheme "s-exprs":
(- 5 (- (- (- 4) 3) 1))
In the first place, remove all precedence declarations. They are not needed in prefix grammars. In fact, that should be enough to solve the issue in any parser generator. Which one are you using, BTW?
Cup has a finite lookahead. As @rici points out, the ambiguity can't be resolved in this case. What you can do is restrict the grammar so that at most one unary - can appear in a row.
B ::= E
| - E
;
E ::= + B B
| - B B
| * B B
| / B B
| ( B )
| id
| digit
;
Please check the above several times as I'm pretty rusty.

Shift / reduce conflicts in grammar of arithmetic expression with n-ary sums / products

Parsing binary sums / products is easy, but I'm having trouble defining a grammar that parses
a + b * c + d + e
as
sum(a, prod(b, c), d, e)
My initial (naive) attempt generated 61 shift / reduce conflicts.
I'm using java cup (but I suppose a solution for any other parser generator would be easily translated).
The following ANTLR grammar:
parse
: exp EOF
;
exp
: add_exp
;
add_exp
: mul_exp ('+' mul_exp)*
;
mul_exp
: atom ('*' atom)*
;
atom
: Number
| '(' exp ')'
;
Number
: 'a'..'z'
;
parses the input a + b * c + d + e as:
[Image: parse tree for a + b * c + d + e, originally at http://img266.imageshack.us/img266/7099/17212574.png]
As you can see, the mul_exp is the furthest away from the root of the tree and (using an appropriate "walk" through your tree) will be evaluated first.
The input a + b * (c + d) + e is parsed as:
[Image: parse tree for a + b * (c + d) + e, originally at http://img688.imageshack.us/img688/2207/89332200.png]
The images were generated with ANTLRWorks.
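The shape of those trees can also be sketched in plain code. Below is a minimal hand-written Python recursive-descent parser that mirrors the add_exp / mul_exp / atom rules above and collects the operands of each (...)* loop into a single n-ary node. The tuple representation and the helper show are assumptions made for the illustration, not ANTLR output.

```python
def tokenize(text):
    """Split into single-character tokens; operands are lowercase letters."""
    tokens = []
    for ch in text:
        if ch.isspace():
            continue
        if not (ch in "+*()" or "a" <= ch <= "z"):
            raise SyntaxError(f"unexpected character {ch!r}")
        tokens.append(ch)
    return tokens


def parse(text):
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def add_exp():                      # add_exp : mul_exp ('+' mul_exp)*
        operands = [mul_exp()]
        while peek() == "+":
            advance()
            operands.append(mul_exp())
        return operands[0] if len(operands) == 1 else ("sum", *operands)

    def mul_exp():                      # mul_exp : atom ('*' atom)*
        operands = [atom()]
        while peek() == "*":
            advance()
            operands.append(atom())
        return operands[0] if len(operands) == 1 else ("prod", *operands)

    def atom():                         # atom : Number | '(' exp ')'
        tok = peek()
        if tok == "(":
            advance()
            inner = add_exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if tok is not None and tok.isalpha():
            advance()
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = add_exp()
    if peek() is not None:
        raise SyntaxError(f"trailing input at token {pos}")
    return tree


def show(node):
    """Render a tree as sum(...) / prod(...) text."""
    if isinstance(node, str):
        return node
    return f"{node[0]}({', '.join(show(child) for child in node[1:])})"


if __name__ == "__main__":
    print(show(parse("a + b * c + d + e")))
    print(show(parse("a + b * (c + d) + e")))
```

The first input renders as sum(a, prod(b, c), d, e) and the second as sum(a, prod(b, sum(c, d)), e). Because each loop gathers all of its operands before building a node, no binary nesting has to be flattened afterwards.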
EDIT:
A tool like ANTLRWorks makes debugging a grammar a breeze! For example, if I click on the atom rule in the grammar above, the following is automatically generated and displayed at the bottom of the screen:
[Image: diagram generated by ANTLRWorks for the atom rule, originally at http://img340.imageshack.us/img340/6793/53395907.png]
Of course, that rule isn't complex at all, but when you do get to work with more complex rules, it's pretty darn easy to visualize them like that.
HTH.

Resources