Are these two grammars equal? - parsing

I have the following grammar, which is ambiguous and of course not SLR(1):
E -> E+A+A | E+A-A | E-A+A | E-A-A | T
T -> T+A | T-A | A
A -> A*B | A/B | B
B -> (E) | x
I used the following transformation rule:
E -> E + T | T  ----->  E -> T E'
E' -> +TE' | ε
so the first grammar transforms into this:
E -> T E' .
E' -> + A + A E' .
E' -> + A - A E' .
E' -> - A + A E' .
E' -> - A - A E' .
E' -> .
T -> A T' .
T' -> + A T' .
T' -> - A T' .
T' -> .
A -> B A' .
A' -> * B A' .
A' -> / B A' .
A' -> .
B -> ( E ) .
B -> x .
This solves the ambiguity, but the grammar is still not SLR(1). The transformation is correct. After that, I erased the T rules and moved them into E'. So the final grammar, which is SLR(1), is the following:
E -> A E' .
E' -> + A + A E' .
E' -> + A - A E' .
E' -> - A + A E' .
E' -> - A - A E' .
E' -> + A .
E' -> - A .
E' -> .
A -> B A' .
A' -> * B A' .
A' -> / B A' .
A' -> .
B -> ( E ) .
B -> x .
Now I have two questions.
Are the two final grammars equal? I define equality as accepting the same sentences. It seems that they do.
Is it correct that I erased the T rules? My exercise asks me to change the very first grammar into an SLR(1) grammar, and the final one is all I came up with. Thanks in advance, and sorry for my English.

I hope that your assignment is marked by someone who provides good feedback. I would like to believe that higher education still works in some places, but obviously that is a bit of an illusion.
Anyway. The grammar you end up with is a valid solution to the problem as you present it, but the solution is based on a misconception and an error, which coincidentally cancel each other out to produce a valid result.
First, the misconception: left-recursion is not the same as ambiguity, and consequently left-factoring and eliminating left-recursion does not remove ambiguity. In particular, your claim that "This is solving the ambiguity but it continues not being SLR(1)" is mistaken. The transformation does not remove the ambiguity; the grammar continues to not be SLR(1) because it is still ambiguous.
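For example, x + x + x has two different leftmost derivations in the transformed grammar. In the first, both + operators come from T':
E -> T E'
-> A T' E'
-> B A' T' E'
-> x A' T' E'
-> x T' E'
-> x + A T' E'
-> x + B A' T' E'
-> x + x A' T' E'
-> x + x T' E'
-> x + x + A T' E'
-> x + x + B A' T' E'
-> x + x + x A' T' E'
-> x + x + x T' E'
-> x + x + x E'
-> x + x + x
In the second, both come from a single E' production:
E -> T E'
-> A T' E'
-> B A' T' E'
-> x A' T' E'
-> x T' E'
-> x E'
-> x + A + A E'
-> x + B A' + A E'
-> x + x A' + A E'
-> x + x + A E'
-> x + x + B A' E'
-> x + x + x A' E'
-> x + x + x E'
-> x + x + x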
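The ambiguity can also be confirmed by counting parse trees. The sketch below is a small memoized chart counter (the function and grammar names are illustrative, not from the question): it counts the ways each symbol derives each slice of the input, so a result above 1 means the sentence is ambiguous. It assumes the grammar has no left recursion, which holds for both the left-factored grammar and the question's final grammar.

```python
from functools import lru_cache

# The left-factored grammar from the question (E', T', A' may derive nothing).
TRANSFORMED = {
    "E": [("T", "E'")],
    "E'": [("+", "A", "+", "A", "E'"), ("+", "A", "-", "A", "E'"),
           ("-", "A", "+", "A", "E'"), ("-", "A", "-", "A", "E'"), ()],
    "T": [("A", "T'")],
    "T'": [("+", "A", "T'"), ("-", "A", "T'"), ()],
    "A": [("B", "A'")],
    "A'": [("*", "B", "A'"), ("/", "B", "A'"), ()],
    "B": [("(", "E", ")"), ("x",)],
}

# The question's final grammar: T is gone and E' gained "+ A" and "- A".
FINAL = {
    "E": [("A", "E'")],
    "E'": [("+", "A", "+", "A", "E'"), ("+", "A", "-", "A", "E'"),
           ("-", "A", "+", "A", "E'"), ("-", "A", "-", "A", "E'"),
           ("+", "A"), ("-", "A"), ()],
    "A": [("B", "A'")],
    "A'": [("*", "B", "A'"), ("/", "B", "A'"), ()],
    "B": [("(", "E", ")"), ("x",)],
}


def count_trees(grammar, tokens, start="E"):
    """Number of distinct parse trees for tokens (more than 1 means ambiguous)."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def symbol(sym, i, j):
        # Ways that sym derives toks[i:j]; anything not in grammar is a terminal.
        if sym not in grammar:
            return 1 if j == i + 1 and toks[i] == sym else 0
        return sum(sequence(rhs, i, j) for rhs in grammar[sym])

    @lru_cache(maxsize=None)
    def sequence(rhs, i, j):
        # Ways that the symbol sequence rhs derives toks[i:j].
        if not rhs:
            return 1 if i == j else 0
        total = 0
        for k in range(i, j + 1):
            head = symbol(rhs[0], i, k)
            if head:  # skipping zero heads also prevents endless recursion on empty spans
                total += head * sequence(rhs[1:], k, j)
        return total

    return symbol(start, 0, len(toks))


print(count_trees(TRANSFORMED, "x + x + x".split()))
print(count_trees(FINAL, "x + x + x".split()))
```

For x + x + x the transformed grammar yields the two trees shown above, while the final grammar yields one, because its E' only accepts a single leftover binary operation at the end.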
The mistake is the erasure of the T rules. You start with
E -> T E' .
T -> A T' .
T' -> + A T' .
T' -> - A T' .
T' -> .
From that, you can easily erase T, since it is only used in one place:
E -> A T' E'.
T' -> + A T' .
T' -> - A T' .
T' -> .
Erasing T' is not so simple, though, because it is recursive. And, in any event, E' does not have any production which uses T', so adding new productions to E' is not a mechanical elimination of T'.
However, the productions you choose to add to E' do, in fact, eliminate the ambiguity. So well done, in that sense. But note that you could have done this without the left-recursion elimination:
E -> E + A + A .
E -> E + A - A .
E -> E - A + A .
E -> E - A - A .
E -> A + A .
E -> A - A .
E -> A .
A -> A * B .
A -> A / B .
A -> B .
B -> ( E ) .
B -> x .
That grammar is unambiguous, for the same reason yours is: the + and - operators are decomposed into a sequence of ternary operations, possibly preceded by a single binary operation (in case the sequence of additive operators contains an odd number of operators). But it is not SLR(1). Indeed, it is not LR(k) for any k, because it is impossible to know whether the sequence of operations should start with a ternary or binary operation until we know whether there are an even or odd number of operators.
But we can solve that problem (in effect, in the same way as your grammar) by making the additive operators right-associative:
E -> A + A + E .
E -> A + A - E .
E -> A - A + E .
E -> A - A - E .
# Remaining productions unchanged: E -> A + A, E -> A - A, E -> A, and the A and B rules
This grammar is not LL(1), of course; it is not left-factored. But the original problem did not require an LL(1) grammar, and the above is SLR(1).
However, that is just one possible interpretation of the original ambiguous grammar, and quite possibly not the most natural one since right-associativity is not usually the natural interpretation. Unless the problem specifies a desired associativity, there is no way to know what is desired.
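The question's first point, whether the original grammar and the final grammar accept the same sentences, can be spot-checked mechanically. The sketch below (names are illustrative) enumerates every sentence up to a length bound from each grammar by leftmost derivation, pruning any sentential form whose shortest possible yield is already too long, and compares the two sets. This is evidence, not a proof: agreement up to 7 tokens says nothing about longer sentences. It also assumes, as holds for these two grammars, that only finitely many sentential forms fit under the bound, so the search terminates.

```python
# Grammars as dicts: nonterminal -> list of right-hand sides (tuples of symbols).
# An empty tuple is an epsilon production; anything that is not a key is a terminal.
ORIGINAL = {
    "E": [("E", "+", "A", "+", "A"), ("E", "+", "A", "-", "A"),
          ("E", "-", "A", "+", "A"), ("E", "-", "A", "-", "A"), ("T",)],
    "T": [("T", "+", "A"), ("T", "-", "A"), ("A",)],
    "A": [("A", "*", "B"), ("A", "/", "B"), ("B",)],
    "B": [("(", "E", ")"), ("x",)],
}

FINAL = {
    "E": [("A", "E'")],
    "E'": [("+", "A", "+", "A", "E'"), ("+", "A", "-", "A", "E'"),
           ("-", "A", "+", "A", "E'"), ("-", "A", "-", "A", "E'"),
           ("+", "A"), ("-", "A"), ()],
    "A": [("B", "A'")],
    "A'": [("*", "B", "A'"), ("/", "B", "A'"), ()],
    "B": [("(", "E", ")"), ("x",)],
}


def min_lengths(grammar):
    """Length of the shortest terminal string each nonterminal derives (fixpoint)."""
    best = {nt: float("inf") for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, rhss in grammar.items():
            for rhs in rhss:
                n = sum(best[s] if s in grammar else 1 for s in rhs)
                if n < best[nt]:
                    best[nt] = n
                    changed = True
    return best


def sentences(grammar, start, max_len):
    """All sentences of at most max_len tokens, found by leftmost derivation."""
    shortest = min_lengths(grammar)

    def lower_bound(form):
        # No sentence derived from form can be shorter than this.
        return sum(shortest[s] if s in grammar else 1 for s in form)

    seen, stack, found = set(), [(start,)], set()
    while stack:
        form = stack.pop()
        if form in seen or lower_bound(form) > max_len:
            continue
        seen.add(form)
        # Index of the leftmost nonterminal, or None if form is all terminals.
        i = next((k for k, s in enumerate(form) if s in grammar), None)
        if i is None:
            found.add(" ".join(form))
            continue
        for rhs in grammar[form[i]]:
            stack.append(form[:i] + rhs + form[i + 1:])
    return found


same = sentences(ORIGINAL, "E", 7) == sentences(FINAL, "E", 7)
print("same sentences up to 7 tokens:", same)
```

Comparing sets of sentences deliberately ignores how many trees each sentence has, which matches the question's definition of equality; ambiguity is a separate property of the grammar, not of the language.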

Related

LR(1) item sets for left recursive grammar

I read several papers about creating LR(1) item sets, but none of them pertained to left recursive grammars, such as those used for parsing expressions. If I had the following grammar,
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
How would I go about creating the LR(1) item sets?
Left recursion isn't inherently a problem for LR(1) parsers, and the rules for determining the configurating sets and lookaheads are the same regardless of whether your grammar has left recursion.
In your case, we begin by augmenting the grammar with our new start symbol:
S -> E
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
Our initial configurating set corresponds to looking at the production S -> E with a lookahead of $. Initially, that gives us the following:
(1)
S -> .E [$]
We now need to expand out what E could be. That gives us these new items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
Now, let's look at the item E -> .E + T [$]. We need to expand out what E could be here, and the rules for doing so are the same as in the non-left-recursive case: we list all productions for E with the dot at the front, with a lookahead given by what follows the E in the production E -> .E + T [$]. In this case we're looking for an E with a lookahead of +, because that's what follows it in the production. That adds these items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
From here, we expand out all the cases where there's a dot before a T, which gives the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
We now have to expand out the Ts in the context of T -> .T * F [$], and we do so by listing all productions of T followed by what the T is followed by in T -> .T * F [$] (namely, *). That gives us the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
T -> .T * F [*]
T -> .F [*]
And from here we'd expand out the productions that have a dot before the F. Do you see how to do that based on how things have played out so far?
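The closure step applied by hand above can be written as a short fixpoint loop. A minimal sketch for this augmented grammar, assuming items are (lhs, rhs, dot, lookahead) tuples with one lookahead per item, as in the listings above; FIRST is simplified because this grammar has no ε-productions, and the function names are illustrative. Running it completes the exercise: it prints the whole initial configurating set, including the F items.

```python
# Augmented grammar from the question, as (lhs, rhs) pairs.
GRAMMAR = [
    ("S", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("(", "E", ")")),
    ("F", ("NUMBER",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets; simplified because this grammar has no epsilon productions."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def closure(items, first):
    """LR(1) closure. An item is (lhs, rhs, dot, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue  # dot at the end or before a terminal: nothing to expand
        # New lookaheads: FIRST of what follows the nonterminal, then the old lookahead.
        rest = rhs[dot + 1:] + (lookahead,)
        head = rest[0]
        lookaheads = first[head] if head in NONTERMINALS else {head}
        for lhs2, rhs2 in GRAMMAR:
            if lhs2 == rhs[dot]:
                for b in lookaheads:
                    item = (lhs2, rhs2, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result


I0 = closure({("S", ("E",), 0, "$")}, first_sets())
for lhs, rhs, dot, la in sorted(I0):
    print(f"{lhs} -> {' '.join(rhs[:dot] + ('.',) + rhs[dot:])} [{la}]")
```

Note that the left-recursive items need no special treatment: E -> .E + T [$] simply adds E items with lookahead +, and adding them again changes nothing, so the loop stops.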

How to handle operator precedence in an LL(1) parser

I was writing an LL(1) parser for an expression grammar. I had the following grammar:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> INT
However, this is left recursive and I removed the left recursion with the following grammar:
E -> INT E'
E' -> + INT E'
E' -> - INT E'
E' -> * INT E'
E' -> / INT E'
E' -> ε
If I had the expression 1 + 2 * 3, how would the parser know to evaluate the multiplication before the addition?
Try this:
; an expression is an addition
E -> ADD
; an addition is a multiplication that is optionally followed
; by +- and another addition
ADD -> MUL T
T -> PM ADD
T -> ε
PM -> +
PM -> -
; a multiplication is an integer that is optionally followed
; by */ and another multiplication
MUL -> INT G
G -> MD MUL
G -> ε
MD -> *
MD -> /
; an integer is a digit that is optionally followed by an integer
INT -> DIGIT J
J -> INT
J -> ε
; digits
DIGIT -> 0
DIGIT -> 1
DIGIT -> 2
DIGIT -> 3
DIGIT -> 4
DIGIT -> 5
DIGIT -> 6
DIGIT -> 7
DIGIT -> 8
DIGIT -> 9
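The grammar answers the question through nesting: + and - only ever combine complete MULs, so 2 * 3 becomes a single operand before the addition sees it. A minimal recursive-descent sketch with one method per nonterminal; it assumes INT is read as one multi-digit token rather than through the DIGIT/J rules and that / is ordinary division, and the class and method names are illustrative. Because T -> PM ADD and G -> MD MUL are right-recursive, the tree for 8 - 2 - 1 is 8 - (2 - 1); the sketch evaluates the tree exactly as the grammar builds it, so that input gives 7, not 5.

```python
import re


class Parser:
    """Recursive-descent evaluator following the ADD/MUL grammar above."""

    def __init__(self, text):
        # Assumption: INT is one token of digits; whitespace is skipped and
        # any other character becomes its own (invalid) token.
        self.tokens = re.findall(r"\d+|[-+*/]|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def add(self):
        # ADD -> MUL T ;  T -> PM ADD | ε
        left = self.mul()
        if self.peek() in ("+", "-"):
            op = self.advance()
            right = self.add()  # right recursion, as in the grammar
            return left + right if op == "+" else left - right
        return left

    def mul(self):
        # MUL -> INT G ;  G -> MD MUL | ε
        left = self.integer()
        if self.peek() in ("*", "/"):
            op = self.advance()
            right = self.mul()
            return left * right if op == "*" else left / right
        return left

    def integer(self):
        token = self.advance()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected an integer, got {token!r}")
        return int(token)


def evaluate(text):
    parser = Parser(text)
    value = parser.add()  # E -> ADD
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value


print(evaluate("1 + 2 * 3"))
```

The precedence comes entirely from which method calls which: add() never looks at * or /, because mul() has already consumed them.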

How to solve shift/reduce conflict using operator precedence?

So I have this grammar I'm trying to build an LR(1) table for
E' -> E
E -> E + E
E -> E * E
E -> ( E )
E -> a
So far, I have built a table, but it has shift/reduce conflicts that I'm trying to resolve. I thought about changing the grammar to postfix instead of infix, but I'm not sure I can do that. Any ideas?
Here is your grammar, with precedence:
E' -> E
E -> E + T
E -> T
T -> T * F
T -> F
F -> ( E )
F -> a
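Don't forget the extra productions E -> T and T -> F; without them, E and T could never derive a terminal string, and the grammar would be useless.
Note: This will not work with LR(0), because you'll get a shift/reduce conflict (for example, between E -> T . and T -> T . * F). It does work with LR(1).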
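The grammar rewrite above is one fix. The question's title points at the other common one: keep the ambiguous grammar and settle each shift/reduce conflict by precedence, which is what yacc and Bison do with %left/%right declarations. In a state containing E -> E + E . and E -> E . * E, the parser must choose between reducing the addition and shifting *. A minimal sketch of that decision rule, assuming + binds looser than * and both are left-associative; the table and function names are illustrative, not any tool's API.

```python
# Hypothetical precedence table in the style of yacc/Bison "%left '+'" then
# "%left '*'": a larger number binds tighter.
PRECEDENCE = {"+": (1, "left"), "*": (2, "left")}


def resolve(rule_operator, lookahead):
    """Choose between reducing E -> E <rule_operator> E and shifting <lookahead>."""
    rule_level, associativity = PRECEDENCE[rule_operator]
    token_level, _ = PRECEDENCE[lookahead]
    if token_level > rule_level:
        return "shift"   # the lookahead binds tighter: finish it first
    if token_level < rule_level:
        return "reduce"  # the rule's operator binds tighter: build it now
    # Same level: left-associative reduces, right-associative shifts.
    return "reduce" if associativity == "left" else "shift"


# The four conflicting (item, lookahead) pairs in the E -> E op E states.
for rule_op in "+*":
    for tok in "+*":
        print(f"E -> E {rule_op} E .  on {tok}: {resolve(rule_op, tok)}")
```

Filling each conflicting table cell with the chosen action gives a conflict-free table for the original four-production grammar, at the cost of encoding precedence outside the grammar itself.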

Grammar Precedence and associativity

if i am given following grammar
E->E W T|T
T->L S T|L
L->a|b|c
W->*
S->+|-
From the following grammar, I see that since + and - are deeper down the tree, they have higher precedence than *. Am I correct about that?
Also, since this is left recursion, can I assume left associativity?
Since operators can have different associativity, I am confused about how to tell which one has which.
I guess what I am asking is: how can I tell operator associativity from the grammar?
Start with
T->L S T|L
and consider a+b+c, which can be produced from T as follows:
T -> L S T
-> L S (L S T)
-> L S (L S (L))
-> L S (L S (c))
-> L S (b + (c))
-> L + (b + (c))
-> a + (b + (c))
(The parentheses are only there as a shorthand for the parse tree.)
That rightmost derivation is unique; T cannot match (a + b) + c because a + b is not an L.
Consequently, + and - are "right-associative".
By contrast, we have
E->E W T|T
so a*b*c will be produced as follows:
E -> E W T
-> E W L
-> E W c
-> E * c
-> (E W T) * c
-> (E W L) * c
-> (E W b) * c
-> (E * b) * c
-> ((T) * b) * c
-> ((L) * b) * c
-> ((a) * b) * c
Again, that parse is unique, so * is left-associative.
I didn't work through a+b*c, so that would be a good exercise.
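The same conclusions can be checked mechanically with a small recursive-descent parser that prints the grouping it finds. A minimal sketch (the function name is illustrative): the left-recursive E rule is implemented as a loop that folds to the left, the usual way to handle left recursion in a hand-written parser, which produces the same tree shape as E -> E W T; the right-recursive T rule is implemented directly as a recursive call. Running it on a+b*c is a quick way to check the exercise.

```python
def parenthesize(text):
    """Parse with E -> E W T | T, T -> L S T | L and show the grouping."""
    tokens = [c for c in text if not c.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(allowed):
        nonlocal pos
        token = peek()
        if token not in allowed:
            raise SyntaxError(f"expected one of {sorted(allowed)}, got {token!r}")
        pos += 1
        return token

    def parse_e():
        # E -> E W T | T. The left recursion becomes a loop that folds to the
        # left, giving the same tree shape as the left-recursive rule.
        tree = parse_t()
        while peek() == "*":  # W -> *
            op = take({"*"})
            tree = f"({tree}{op}{parse_t()})"
        return tree

    def parse_t():
        # T -> L S T | L. The right recursion nests to the right.
        left = take({"a", "b", "c"})  # L -> a | b | c
        if peek() in ("+", "-"):      # S -> + | -
            op = take({"+", "-"})
            return f"({left}{op}{parse_t()})"
        return left

    tree = parse_e()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r}")
    return tree


print(parenthesize("a+b+c"))
print(parenthesize("a*b*c"))
```

The two printed groupings match the derivations above: + nests to the right and * to the left.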

Building LR(1) configuration lookahead

I'm really having trouble calculating the lookaheads when building LR(1) item sets. I have tried lecture notes from several sites, but I'm still stuck.
My example is
S -> E + S | E
E -> num | ( S )
The item sets are:
I0:
S’ -> . S $
S -> . E + S $
S -> . E $
E -> . num +,$
E -> . ( S ) +,$
I1:
S ->E .+ S $
S ->E . $
The first item in set I0
S’ -> . S $
is initialization.
The second item in set I0
S -> . E + S $
means that nothing is on the stack yet; we expect to read E + S, then reduce if and only if the token after E + S is $.
The third item in set I0
S -> . E $
means that we expect to read E and reduce iff the token after E is $.
Then I am confused about the fourth item in set I0,
E -> . num +,$
I have no idea why the + and $ tokens are there.
Could anyone also explain this rule to me in plain English, please?
For each configuration [A -> u•Bv, a] in I, for each production B -> w in G', and for each terminal b in First(va) such that [B -> •w, b] is not in I: add [B -> •w, b] to I.
Thanks!
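I think I figured it out.
I am using the following algorithm.
For set I0:
Begin with [S' -> .S, $].
For each item [A -> α.Bβ, a],
add [B -> .γ, b] for each production B -> γ and each terminal b in FIRST(βa).
For sets I1...In:
Compute GOTO(I, X) by moving the dot past X (each item keeps its lookahead),
then add in the productions for any nonterminal now after the dot, with lookaheads computed as above.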
In the example
S -> E + S
S -> E
E -> num
E -> ( S )
First, we take
S' -> . S $
and match it to [A -> α.Bβ, a]. That is,
A = S', α = ε, B = S, β = ε, a = $, and
FIRST(βa) = {$}.
Add in [B -> .γ, b], which are
S -> . E + S $ ...1
S -> . E $ ...2
in I0.
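Then, we need to add in the productions for E, because the dot is before E in items 1 and 2.
In this case, our [A -> α.Bβ, a] items are 1 and 2.
For item 1, β = + S, so FIRST(βa) = {+}; for item 2, β = ε, so FIRST(βa) = {$}. Together that gives { + , $ }, and we have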
E -> . num +,$
E -> . ( S ) +,$
Now, we compute GOTO(I0, X).
For X = E,
we move the dot one position. No nonterminal now follows the dot, so no productions need to be added; we just carry over the lookahead $ from
S -> . E + S $
S -> . E $
which gives us I1:
S ->E .+ S $
S ->E . $
and so on...
So, is this the correct and efficient way to build LR(1) item sets?
For
E -> . num +,$
E -> . ( S ) +,$
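the +,$ indicate that, in this state, only these tokens can follow a number or a closing parenthesis, because either one ends the E that S -> . E + S and S -> . E are waiting for. Think about it: the grammar does not allow adjacent nums or ()s; at the top level, they must either be at the end of the sentence or be followed by a +. (Inside parentheses, an E can also be followed by ), which is why ) shows up as a lookahead in the item set reached after reading a (.)
As for the translation request, that rule is a fancy way of saying how to calculate the set of tokens that can follow a given nonterminal in a given item. The +,$ above are an example: in this state, they are the only legal tokens that can follow num and ).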
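The algorithm from the self-answer can be sketched in a few lines: closure adds [B -> .γ, b] for every b in FIRST(βa), and GOTO advances the dot while keeping each item's lookahead. A minimal sketch, assuming items are (lhs, rhs, dot, lookahead) tuples and that FIRST of a sequence is FIRST of its first symbol (true here because the grammar has no ε-productions); the function names are illustrative. It reproduces I0 and I1 from the question and shows the ) lookahead that appears after reading a (.

```python
# Augmented grammar from the question, as (lhs, rhs) pairs.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("E", "+", "S")),
    ("S", ("E",)),
    ("E", ("num",)),
    ("E", ("(", "S", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets; no epsilon productions, so only the first symbol matters."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def closure(items, first):
    """For each [A -> α.Bβ, a], add [B -> .γ, b] for every b in FIRST(βa)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        head = (rhs[dot + 1:] + (a,))[0]  # first symbol of βa
        lookaheads = first[head] if head in NONTERMINALS else {head}
        for lhs2, rhs2 in GRAMMAR:
            if lhs2 == rhs[dot]:
                for b in lookaheads:
                    item = (lhs2, rhs2, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result


def goto(items, symbol, first):
    """Move the dot past symbol (lookaheads carried along), then close."""
    moved = {(lhs, rhs, dot + 1, a) for lhs, rhs, dot, a in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved, first)


def lookaheads_of(items, lhs, rhs):
    """Lookaheads of the item lhs -> .rhs in a set."""
    return {a for l, r, d, a in items if l == lhs and r == rhs and d == 0}


FIRST = first_sets()
I0 = closure({("S'", ("S",), 0, "$")}, FIRST)
print(sorted(lookaheads_of(I0, "E", ("num",))))
print(sorted(lookaheads_of(goto(I0, "(", FIRST), "E", ("num",))))
```

In I0 the E items get + from S -> . E + S and $ from S -> . E; after a ( the enclosing item E -> ( . S ) contributes ) instead of $.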
