Follow Set of grammar - parsing

I am trying to compute the FOLLOW sets of the following grammar:
E -> TX
T -> int Y | ( E )
X -> + E | ε
Y -> * T | ε
I have calculated the following FOLLOW set so far:
follow (E) = {$} U {)}
follow (Y) = follow (T)
follow (T) = follow (Y)
follow (X) = follow (E) = {$, )}
follow (E) = first ()) = {)}
I know that the follow (T) / follow (Y) contains {+,$,)} but I am struggling to get to that point.
Any assistance in explaining the method here would be greatly helpful.
Note: I have followed these rules
1) If A is the start symbol, put $ in Follow (A)
2) If there is a production B -> αAβ, then everything in First (β) except ε is in Follow (A)
3) If there is a production B -> αA, or B -> αAβ where First (β) contains ε, then everything in Follow (B) is in Follow (A)

I have figured it out (and spent the better part of the afternoon)!
So, for anyone who finds this, the relations I'm using are (A ⊇ B means everything in B is also in A):
follow(T) ⊇ follow(E) //from E -> TX, because X can be ε
follow(E) ⊇ first ()) = {)}
follow(X) = follow(E)
follow(Y) = follow(T)
**follow(T) ⊇ first(X) - {ε} = {+}** //the important one!
Following these rules you can build the sets:
follow(E) = {$, )}
follow(T) = {$, ), +}
follow(X) = {$, )}
follow(Y) = {$, ), +}
Which concludes the follow sets for the grammar!
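The relations above can also be checked mechanically. The following is a minimal sketch, not the method from any particular textbook implementation: it computes FIRST and FOLLOW by repeating the three rules until nothing changes. The names `GRAMMAR`, `first_of_seq`, `compute_first` and `compute_follow` are made up for this illustration, and an ε-production is written as an empty list.

```python
EPS = "ε"   # marker for the empty string
END = "$"   # end-of-input marker

# Each nonterminal maps to its alternatives; [] stands for an ε-production.
GRAMMAR = {
    "E": [["T", "X"]],
    "T": [["int", "Y"], ["(", "E", ")"]],
    "X": [["+", "E"], []],
    "Y": [["*", "T"], []],
}

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST of each nonterminal."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol was nullable, or seq is empty
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no set grows any more
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)  # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_seq(alt[i + 1:], first)
                    new = rest - {EPS}          # rule 2
                    if EPS in rest:
                        new |= follow[lhs]      # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FOLLOW = compute_follow(GRAMMAR, "E")
for nt in GRAMMAR:
    print(f"follow({nt}) = {sorted(FOLLOW[nt])}")
```

The loop is a plain fixed-point iteration: it is not the fastest approach, but it mirrors the hand calculation, where a set is revisited whenever a set it depends on grows.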

Related

Compilers: First and Follow Sets of a grammar that does not contain epsilon

In my current compilers course, I've understood how to find the first and follow sets of a grammar, and so far all of the grammars I have dealt with have contained epsilon. Now I am being asked to find the first and follow sets of a grammar without epsilon, and to determine whether it is LR(0) and SLR. Not having epsilon has thrown me off, so I don't know if I've done it correctly. I would appreciate any comments on whether I am on the right track with the first and follow sets, and how to begin determining if it is LR(0)
Consider the following grammar describing Lisp arithmetic:
S -> E // S is start symbol, E is expression
E -> (FL) // F is math function, L is a list
L -> LI | I // I is an item in a list
I -> n | E // an item is a number n or an expression E
F -> + | - | *
FIRST:
FIRST(S)= FIRST(E) = {(}
FIRST(L)= FIRST(I) = {n,(}
FIRST(F) = {+, -, *}
FOLLOW:
FOLLOW(S) = {$}
FOLLOW(E) = FOLLOW(L) = {), n, $}
FOLLOW(I) = {),$}
FOLLOW(F) = {),$}
The FIRST sets are right, but the FOLLOW sets are incorrect.
The FOLLOW(S) = {$} is right, though technically this is for the augmented grammar S' -> S$ .
E appears on the right side of S -> E and I -> E. In both cases nothing comes after E, so the FOLLOW set of the left-hand side is contained in FOLLOW(E): FOLLOW(E) = FOLLOW(S) ∪ FOLLOW(I) .
L appears on the right hand side of L -> LI, which gives FOLLOW(L) ⊇ FIRST(I) , and E -> (FL), which gives FOLLOW(L) ⊇ {)} .
I appears on the right side of L -> LI | I , which gives FOLLOW(I) = FOLLOW(L) .
F appears on the right side in E -> (FL) , which gives FOLLOW(F) = FIRST(L)
Solving for these gives:
FOLLOW(F) = {n, (}
FOLLOW(L) = FIRST(I) ∪ {)} = {n, (, )}
FOLLOW(I) = {n, (, )}
FOLLOW(E) = {$} ∪ {n, (, )} = {n, (, ), $}
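As a cross-check, the inclusion constraints listed above can be written down as data and solved by iteration. This is only a sketch under that framing: `CONSTRAINTS` and `solve` are hypothetical names, and the FIRST sets are copied from the question rather than computed.

```python
# FIRST sets taken from the question (they were confirmed as correct).
FIRST_I = {"n", "("}
FIRST_L = {"n", "("}

# FOLLOW(key) includes: (these terminals, the FOLLOW sets of these nonterminals)
CONSTRAINTS = {
    "S": ({"$"}, []),                 # start symbol
    "E": (set(), ["S", "I"]),         # S -> E and I -> E
    "L": (FIRST_I | {")"}, []),       # L -> LI and E -> (FL)
    "I": (set(), ["L"]),              # L -> LI | I
    "F": (FIRST_L, []),               # E -> (FL)
}

def solve(constraints):
    """Grow each FOLLOW set until every inclusion constraint holds."""
    follow = {nt: set(terms) for nt, (terms, _) in constraints.items()}
    changed = True
    while changed:
        changed = False
        for nt, (_, includes) in constraints.items():
            for other in includes:
                if not follow[other] <= follow[nt]:
                    follow[nt] |= follow[other]
                    changed = True
    return follow

FOLLOW = solve(CONSTRAINTS)
for nt, terminals in FOLLOW.items():
    print(f"FOLLOW({nt}) = {sorted(terminals)}")
```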

proving that a grammar is LL(1)

I'm given the following grammar :
S -> A a A b | B b B a
A -> epsilon
B -> epsilon
I know that it's obvious that it's LL(1), but I'm having trouble constructing the parsing table. I followed the algorithm word by word to find the first and follow of each non-terminal. Correct me if I'm wrong:
First(S) = {a,b}
First(A) = First(B) = epsilon
Follow(S) = {$}
Follow(A) = {a,b}
Follow(B) = {a,b}
when I construct the parsing table, according to the algorithm, I get a conflict under the $ symbol... what the hell am I doing wrong??
a b $
A A-> epsilon
B B-> epsilon
S S -> AaAb
S -> BbBa
is it ok if I get 2 productions under $ or something?? or am I constructing the parsing table wrong? please help I'm new to the compiler course
There is a tiny mistake. The algorithm, from the Dragon Book, is as follows (X -> α is any rule, and "empty" means epsilon):
for each rule (X -> α):
    for each terminal a in First(α):
        add (X -> α) to M[X, a]
    if First(α) contains empty:
        for each terminal b in Follow(X):
            add (X -> α) to M[X, b]
Let's take them one by one.
S -> AaAb. Here, First(AaAb) = {a}. So add S -> AaAb to M[S, a].
S -> BbBa. Here, First(BbBa) = {b}. So add S -> BbBa to M[S, b].
A -> epsilon. Here, Follow(A) = {a, b}. So add A -> epsilon to M[A, a] and M[A, b].
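B -> epsilon. Here, Follow(B) = {a, b}. So add B -> epsilon to M[B, a] and M[B, b].
Every cell ends up with at most one production and nothing is placed under $, so there is no conflict.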
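The table-filling steps above can be sketched in a few lines. This is an illustration only: the FIRST and FOLLOW sets are copied from the question instead of being computed, and `RULES`, `first_of_seq` and `build_table` are names invented here.

```python
EPS = "epsilon"

# Sets as given in the question.
FIRST = {"S": {"a", "b"}, "A": {EPS}, "B": {EPS}}
FOLLOW = {"S": {"$"}, "A": {"a", "b"}, "B": {"a", "b"}}

# (left-hand side, right-hand side); [] is the epsilon production.
RULES = [
    ("S", ["A", "a", "A", "b"]),
    ("S", ["B", "b", "B", "a"]),
    ("A", []),
    ("B", []),
]

def first_of_seq(seq):
    """FIRST of a whole right-hand side, not just of its first symbol."""
    out = set()
    for sym in seq:
        sym_first = FIRST.get(sym, {sym})  # a terminal's FIRST is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # everything in seq can vanish
    return out

def build_table(rules):
    table = {}
    for lhs, rhs in rules:
        first = first_of_seq(rhs)
        lookaheads = first - {EPS}
        if EPS in first:              # only then is FOLLOW(lhs) consulted
            lookaheads |= FOLLOW[lhs]
        for terminal in lookaheads:
            table.setdefault((lhs, terminal), []).append(rhs)
    return table

TABLE = build_table(RULES)
conflicts = {cell: prods for cell, prods in TABLE.items() if len(prods) > 1}
print("conflicts:", conflicts)
```

The key point is that FOLLOW(S) is only used for a rule whose right-hand side can derive epsilon. Neither S rule can, because each contains terminals, so no S production lands in the $ column.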

How Follow function works? (compiler)

Grammar:
E -> TE’
E’ -> +TE’ | ε
T -> FT’
T’ -> *FT’ | ε
F -> (E)| id
Functions:
1. FIRST(F) = FIRST(T) = FIRST(E) = {(, id}
2. FIRST(E’) = {+, ε}
3. FIRST(T’) = {*, ε}
4. FOLLOW(E) = FOLLOW(E’) = {), $}
5. FOLLOW(T) = FOLLOW(T’) = {+, ), $}
6. FOLLOW(F) = {*, +, ), $}
Here is the grammar and the functions from my lectures. Can someone explain to me how FOLLOW works? I understood how FIRST works, but FOLLOW is very difficult to understand...
Have a look at Wikipedia's FIRST_and_FOLLOW_sets.
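FOLLOW(E):
You look for every occurrence of E on a right-hand side.
Here that is only (E). Take the union of all terminals that directly follow E and the FIRST sets of the nonterminals that directly follow it.
Here only the terminal ) follows E. E is also the start symbol, so $ is added: FOLLOW(E) = {), $}.
FOLLOW(F):
F occurs in FT' and *FT', and in both it is followed by T'. So FOLLOW(F) contains FIRST(T') without ε, which is {*}. Because T' can derive ε, whatever follows T or T' can also follow F, so FOLLOW(F) also contains FOLLOW(T) = FOLLOW(T') = {+, ), $}.
Finally, FOLLOW(F) = {*, +, ), $}. Note that ε is never a member of a FOLLOW set.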
Here FOLLOW(F) is found this way:
T --> FT' : T' can derive epsilon, so FOLLOW(T) is a subset of FOLLOW(F).
T' --> *FT' : F is followed by T', so everything in FIRST(T') except epsilon is added to FOLLOW(F).

LL(1) grammar verification

Let G be a grammar such that:
S -> aBa
B -> bB | \epsilon
where \epsilon represents the empty string.
After computing FIRST and FOLLOW, is there a way to tell if G is LL(1) without resorting to the parsing table?
After computing the FIRST and FOLLOW sets for the variables of G, you can compute the length 1 lookahead sets LA(1) for the variables and rules of G. Then G is strong LL(1) iff the following condition holds:
LA(1)(A -> wi) partition LA(1)(A) for each variable A such that A -> wi is a rule.
Alternatively, you can prove that G is strong LL(1) from the definition of a strong LL(k) grammar without computing the FIRST and FOLLOW sets. This is oftentimes easier and less tedious than computing FIRST and FOLLOW for small grammars like G.
I don't have a book handy, so there might be an error in some of these definitions or computations. But this is how I would approach the problem. Computing the FIRST and FOLLOW sets gives:
FIRST(1)(S) = trunc(1)({x : S =>* x AND x IN Σ*})
= trunc(1)({ab^na : n >= 0})
= {a}
FIRST(1)(B) = trunc(1)({x : B =>* x AND x IN Σ*})
= trunc(1)({b^n : n >= 0})
= {ε,b}
FOLLOW(1)(S) = trunc(1)({x : S =>* uSv AND x IN FIRST(1)(v)})
= trunc(1)({x : x IN FIRST(1)(ε)})
= trunc(1)(FIRST(1)(ε))
= {ε}
FOLLOW(1)(B) = trunc(1)({x : S =>* uBv AND x IN FIRST(1)(v)})
= trunc(1)({x : x IN FIRST(1)(a)})
= trunc(1)(FIRST(1)(a))
= {a}
Computing the length 1 lookahead sets for the variables and rules gives:
LA(1)(S) = trunc(1)(FIRST(1)(S)FOLLOW(1)(S))
= trunc(1)({a}{ε})
= trunc(1){a}
= {a}
LA(1)(B) = trunc(1)(FIRST(1)(B)FOLLOW(1)(B))
= trunc(1)({ε,b}{a})
= trunc(1){a,b}
= {a,b}
LA(1)(S -> aBa) = trunc(1)(FIRST(1)(a)FIRST(1)(B)FIRST(1)(a)FOLLOW(1)(S))
= trunc(1){a}
= {a}
LA(1)(B -> bB) = trunc(1)(FIRST(1)(b)FIRST(1)(B)FOLLOW(1)(B))
= trunc(1){b}
= {b}
LA(1)(B -> ε) = trunc(1)(FIRST(1)(ε)FOLLOW(1)(B))
= trunc(1)({ε}{a})
= {a}
Since LA(1)(B -> ε) and LA(1)(B -> bB) partition LA(1)(B) and LA(1)(S -> aBa) trivially partitions LA(1)(S), G is strong LL(1).
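The partition check can be sketched in code as well. This is an illustration under a few assumptions: terminals are single characters so a string can represent a terminal sequence, the empty string "" plays the role of ε, and the FIRST(1)/FOLLOW(1) values are copied from the computation above. The names `trunc1`, `concat`, `la1` and `is_strong_ll1` are invented for the example.

```python
from itertools import combinations

def trunc1(strings):
    """Keep at most the first symbol of every string in the set."""
    return {s[:1] for s in strings}

def concat(*sets):
    """Concatenate sets of strings: every x + y + ... with one pick per set."""
    result = {""}
    for s in sets:
        result = {x + y for x in result for y in s}
    return result

# Values from the computation above; "" stands for ε.
FIRST1 = {"a": {"a"}, "b": {"b"}, "B": {"", "b"}}
FOLLOW1 = {"S": {""}, "B": {"a"}}

# [] is the ε-production.
RULES = {"S": [["a", "B", "a"]], "B": [["b", "B"], []]}

def la1(lhs, rhs):
    """LA(1)(lhs -> rhs) = trunc1(FIRST1 of each rhs symbol, then FOLLOW1(lhs))."""
    parts = [FIRST1[sym] for sym in rhs] + [FOLLOW1[lhs]]
    return trunc1(concat(*parts))

def is_strong_ll1(rules):
    """True when the lookahead sets of each variable's rules are pairwise disjoint."""
    for lhs, alts in rules.items():
        sets = [la1(lhs, rhs) for rhs in alts]
        if any(x & y for x, y in combinations(sets, 2)):
            return False
    return True

print(la1("B", ["b", "B"]), la1("B", []), is_strong_ll1(RULES))
```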

Building LR(1) configuration lookahead

I am having real trouble calculating the lookahead when building the LR(1) item sets. I have tried lecture notes from several different sites, but still...
My example is
S -> E + S | E
E -> num | ( S )
The item set is
I0:
S’ -> . S $
S -> . E + S $
S -> . E $
E -> . num +,$
E -> . ( S ) +,$
I1:
S ->E .+ S $
S ->E . $
The first item in set I0
S’ -> . S $
is initialization.
The second item in set I0
S -> . E + S $
means there is nothing on stack, we expect to read E+S, then reduce iff the token after E+S is $.
The third item in set I0
S -> . E $
means that we expect to read E and reduce iff the token after E is $.
Then I am confused about the fourth item in set I0,
E -> . num +,$
I have no idea why there are + and $ tokens.
Could anyone explain the following rule to me in plain English, please?
For each configuration [A –> u•Bv, a] in I, for each production B –> w in G', and for
each terminal b in First(va) such that [B –> •w, b] is not in I: add [B –> •w, b] to I.
Thanks!!!
I think I figured it out.
I am using this algorithm:
for set I0:
Begin with [S' -> .S, $]
Match [A -> α.Bβ, a]
Then add in [B -> .γ, b]
Where b is each terminal in FIRST(βa)
for set I1...In
Compute GOTO(I0,X)
Add in X productions and LOOKAHEAD token
In the example
S -> E + S
S -> E
E -> num
E -> ( S )
Firstly,
S’ -> . S $
we try to match it to [A -> α.Bβ, a], That is
A =S', α = ε, B = S , β = ε , a = $ and
FIRST(βa) = {$}
Add in [B -> .γ, b], which are
S -> . E + S $ ...1
S -> . E $ ...2
in I0.
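Then we need to add the productions for E, because the dot is in front of E in items 1 and 2.
In this case, our [A -> α.Bβ, a] are 1 and 2.
Item 1 has β = + S and a = $, so FIRST(βa) = { + }. Item 2 has β = ε and a = $, so FIRST(βa) = { $ }. Together the lookaheads are { + , $ }, and we have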
E -> . num +,$
E -> . ( S ) +,$
Now, we compute GOTO(I0, X)
For X = E
we move the dot one position and find that no productions need to be added. So we just carry over the second component $ from
S -> . E + S $
S -> . E $
which gives us I1
S ->E .+ S $
S ->E . $
and so on...
So, is this a correct and efficient way to build LR(1) item sets?
For
E -> . num +,$
E -> . ( S ) +,$
the +,$ indicate that, in this item set, only these tokens can follow a number or a closing parenthesis. Think about it: the grammar does not allow adjacent num's or ()'s. At the top level they must either be at the end of the sentence or be followed by a +. (Inside parentheses a ) can follow as well, which shows up as a lookahead in later item sets.)
As for the plain-English request, the rule is a fancy way of saying how to calculate the set of tokens that can follow a given nonterminal in a given context. The +,$ above are an example: in I0 they are the only legal tokens that can follow num and ).
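To make the closure rule concrete, here is a minimal sketch that builds I0 and I1 for this grammar. It relies on an assumption that holds here but not in general: no symbol is nullable, so FIRST(βa) depends only on the first symbol of β. The item encoding `(lhs, rhs, dot, lookahead)` and the names `first_of`, `closure` and `goto` are choices made for this illustration.

```python
GRAMMAR = {
    "S'": [("S",)],
    "S": [("E", "+", "S"), ("E",)],
    "E": [("num",), ("(", "S", ")")],
}

# FIRST sets of the nonterminals; nothing in this grammar derives ε.
FIRST = {"S'": {"num", "("}, "S": {"num", "("}, "E": {"num", "("}}

def first_of(beta, lookahead):
    """FIRST(βa). With no nullable symbols, only the first symbol matters."""
    if not beta:
        return {lookahead}
    head = beta[0]
    return FIRST.get(head, {head})  # a terminal's FIRST is itself

def closure(items):
    """Items are (lhs, rhs, dot, lookahead) tuples."""
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, a = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue  # dot at the end, or in front of a terminal
        b_sym, beta = rhs[dot], rhs[dot + 1:]
        for b in first_of(beta, a):
            for gamma in GRAMMAR[b_sym]:
                item = (b_sym, gamma, 0, b)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return result

def goto(items, symbol):
    """Move the dot over `symbol`, keep the lookahead, then close the set."""
    moved = {(l, r, d + 1, a) for l, r, d, a in items
             if d < len(r) and r[d] == symbol}
    return closure(moved)

I0 = closure({("S'", ("S",), 0, "$")})
I1 = goto(I0, "E")
for lhs, rhs, dot, a in sorted(I0):
    print(lhs, "->", " ".join(rhs[:dot]), ".", " ".join(rhs[dot:]), ",", a)
```

Each item carries a single lookahead here, so E -> . num +,$ from the post appears as two items, one with + and one with $.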

Resources