Follow set example doesn't follow any rules? - parsing

S → asg
S → if C then S E
C → bool
E → else S
E → λ
All the lowercase symbols and λ are terminal symbols.
I need help deriving the follow set of this grammar. I normally do not have trouble with these problems and I know the rules, but when I practiced this example from my book this is the only thing I could get:
Follow(S) = {$} U Follow(E)
Follow(C) =
Follow(E) =

According to https://www.cs.uaf.edu/~cs331/notes/FirstFollow.pdf:
To compute FOLLOW(A) for all nonterminals A, apply the following rules until nothing can be added to any FOLLOW set:
1. Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
2. If there is a production A → αBβ, then everything in FIRST(β), except for ε, is placed in FOLLOW(B).
3. If there is a production A → αB, or a production A → αBβ where FIRST(β) contains ε (i.e., β ⇒* ε), then everything in FOLLOW(A) is in FOLLOW(B).
Assuming S is the start symbol in your grammar and λ represents an empty string, we get:
{$} ⊆ Follow(S) by rule 1.
(First(E) \ {λ}) ⊆ Follow(S) by rule 2 / production 2.
Follow(E) ⊆ Follow(S) by rule 3 / production 4.
(First(then S E) \ {λ}) ⊆ Follow(C) by rule 2 / production 2.
Follow(S) ⊆ Follow(E) by rule 3 / production 2.
First(then S E) is just {then} (because then is a terminal), so we have {then} ⊆ Follow(C).
This is the only constraint on Follow(C), so the smallest set that satisfies it is:
Follow(C) = {then}
Because we have Follow(E) ⊆ Follow(S) and Follow(S) ⊆ Follow(E), it follows (hah) that they're equal:
Follow(E) = Follow(S)
Finally we have
Follow(S) = {$} ∪ (First(E) \ {λ})
Fortunately First(E) is easy because E only has two productions, one of which is empty and the other starts with a terminal symbol:
First(E) = {λ, else}
Therefore
Follow(S) = {$, else}
and
Follow(E) = {$, else}
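The same result can be checked mechanically. The following is a minimal sketch, not a reference implementation: it applies the three rules quoted above as a fixed-point iteration. The names GRAMMAR, first_of, compute_first and compute_follow are made up for this illustration, and an empty list stands for the λ-production.

```python
EPS = "λ"   # empty string, as written in the question
END = "$"   # end-of-input marker

# Productions from the question; an empty list is the λ-production.
GRAMMAR = {
    "S": [["asg"], ["if", "C", "then", "S", "E"]],
    "C": [["bool"]],
    "E": [["else", "S"], []],
}
START = "S"

def first_of(seq, first):
    """FIRST of a sequence of symbols, given the FIRST sets found so far."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)  # every symbol was nullable (or the sequence was empty)
    return out

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for lhs, alts in GRAMMAR.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:      # only nonterminals have FOLLOW sets
                        continue
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {EPS}          # rule 2
                    if EPS in rest:
                        new |= follow[lhs]      # rule 3
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(f"Follow({nt}) = {sorted(FOLLOW[nt])}")
```

The loop keeps re-applying the rules until no set grows, which is exactly the "apply until nothing can be added" wording of the notes.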

Related

How to make a parse tree from Follow() set in LL parsing?

I was given the question for test:
Show the parse tree of a string proving that b is in the follow of T.
S=> ET
T=>bSc | d
E=>aTE| ε
I solved the First set :
First(S)=>First(E)=>{a} U First(T)=> {a,b,d}
First(T)=>{b,d}
First(E)=>{a,ε}
And the Follow set :
Follow(S)=>{$,c}
Follow(T)=>Follow(S) U First(E)=> {$,c} U First(E)=>{$,c,a}
Follow(E)=>First(T) U Follow(E)=>{b,d}
Where I am going wrong ?
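You wrote:
Follow(T) ⇒ Follow(S) ∪ First(E)
But ε is in First(E), so in E → aTE whatever can follow E can also follow T. Dropping ε from First(E), that should be:
Follow(T) ⇒ Follow(S) ∪ (First(E) \ {ε}) ∪ Follow(E) = {$, c} ∪ {a} ∪ {b, d} = {$, a, b, c, d}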
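To actually exhibit b after T, as the test question asks, one can write out a derivation in which E disappears right after a T. The sketch below is only an illustration under that reading of the question; the helper apply and the chosen derivation steps are my own, not part of the original exercise.

```python
# Grammar from the question; [] is the ε-production.
GRAMMAR = {
    "S": [["E", "T"]],
    "T": [["b", "S", "c"], ["d"]],
    "E": [["a", "T", "E"], []],
}

def apply(form, index, alt):
    """Rewrite the nonterminal at position `index` using alternative `alt`."""
    nt = form[index]
    assert alt in GRAMMAR[nt], f"{nt} -> {alt} is not a production"
    return form[:index] + alt + form[index + 1:]

steps = [["S"]]
steps.append(apply(steps[-1], 0, ["E", "T"]))        # S => E T
steps.append(apply(steps[-1], 0, ["a", "T", "E"]))   # E => a T E   giving a T E T
steps.append(apply(steps[-1], 2, []))                # E => ε       giving a T T
steps.append(apply(steps[-1], 2, ["b", "S", "c"]))   # T => b S c   giving a T b S c

final = steps[-1]
for form in steps:
    print(" ".join(form))
```

In the last sentential form T is immediately followed by b, so b is in Follow(T). Finishing the derivation with T ⇒ d and S ⇒ ET ⇒ T ⇒ d yields the string a d b d c, whose parse tree has exactly this shape.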

Finding FIRST sets in a grammar

Today I am reading how to find First and Follow of a grammar. I saw this grammar:
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
The claim is that
FIRST(S) = FIRST(ACB) U FIRST(CbB) U FIRST(Ba)
= {d, g, h, ε} U {h, b} U {g, a}
= {d, g, h, ε, b, a}
I don't understand how a and b are in this set. Can anyone explain this?
Notice that B and C both are nullable (they can produce ε). This means that from the production
S → CbB
we get that b ∈ FIRST(S), since if we use the production C → ε we can derive a string that starts with b.
Similarly, note that
S → Ba
is a production, so we get a ∈ FIRST(S) because we can use the production B → ε to get an a at the front of a string derivable from S.
Hope this helps!
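If it helps to see the nullable symbols at work, here is a small sketch that computes the FIRST sets of this grammar by repeated passes. The encoding (an empty list for an ε-production) and the names first_of and per_alternative are assumptions made for the example.

```python
EPS = "ε"

# Grammar from the question; [] is the ε-production.
GRAMMAR = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}

def first_of(seq, first):
    """FIRST of a symbol sequence: skip past symbols that can derive ε."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    return out | {EPS}                     # the whole sequence can vanish

first = {nt: set() for nt in GRAMMAR}
changed = True
while changed:                             # repeat until no set grows
    changed = False
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            new = first_of(alt, first) - first[nt]
            if new:
                first[nt] |= new
                changed = True

# FIRST of each alternative of S, to see where a and b come from.
per_alternative = {" ".join(alt): first_of(alt, first) for alt in GRAMMAR["S"]}
for alt, fs in per_alternative.items():
    print(f"FIRST({alt}) = {sorted(fs)}")
```

The entries for "C b B" and "B a" contain b and a precisely because first_of steps over the nullable C and B.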

How to calculate FIRST sets by hand

I don't understand one of the examples provided by my tutor.
Example
S ::= aBA | BB | Bc
A ::= Ad | d
B ::= ε
We have
FIRST(B) = FIRST(ε)
= {ε}
FIRST(A) = FIRST(Ad) ∪ FIRST(d)
= FIRST(A) ∪ {d}
= {d}
FIRST(S) = FIRST(aBA) ∪ FIRST(BB) ∪ FIRST(Bc)
= FIRST(a) ∪ (FIRST(B)\{ε}) ∪ FIRST(B) ∪ (FIRST(B)\{ε}) ∪ FIRST(c)
= {a, ε, c}
Why is there a FIRST(B) in the FIRST(S) calculation? Shouldn't it be (FIRST(B)\{ε})?
Why is A missing from FIRST(S) calculation?
This page gives the mechanical rules for deriving FIRST (and FOLLOW) sets. I'll try to explain the logic behind these rules and how they apply to your example.
FIRST sets
FIRST(u) is the set of terminals that can occur first in a full derivation of u, where u is a sequence of terminals and non-terminals. In other words, when calculating the FIRST(u) set, we are looking only for the terminals that could possibly be the first terminal of a string that can be derived from u.
FIRST(aBA)
Given the definition, we can see that FIRST(aBA) reduces to FIRST(a), which is {a}. This is because no matter what the A and B productions are, the terminal a will always occur first in anything derived from aBA: a is a terminal and can't be removed from the front of that sequence.
FIRST(Bc)
I'm going to skip FIRST(BB) for now and move on to FIRST(Bc). Things are different here, since B is a non-terminal. At first, we say that anything in FIRST(B) is also in FIRST(S). Unfortunately, FIRST(B) contains ε which causes problems, as we could have the scenario
FIRST(Bc)
-> FIRST(εc)
= FIRST(c)
= {c}
where the arrow is a possible derivation/reduction. In general, we therefore say that FIRST(Xu), where ε is in FIRST(X), is equal to (FIRST(X)\{ε}) ∪ FIRST(u). This explains the last two terms in your calculation.
FIRST(BB)
Using the above rule, we can now derive FIRST(BB) as (FIRST(B)\{ε}) ∪ FIRST(B). Similarly, if we were calculating FIRST(BBB) we would reduce it as
FIRST(BBB)
= (FIRST(B)\{ε}) ∪ FIRST(BB)
= (FIRST(B)\{ε}) ∪ (FIRST(B)\{ε}) ∪ FIRST(B)
Of note is that while calculating a FIRST set, the last symbol in a sequence of symbols never has the empty string removed from it, because at this point, the empty string is a legitimate possibility. This can be seen in a possible derivation in your example:
S
-> BB
-> εε
-> ε
Hopefully you can see from all of the above why FIRST(B) appears in your calculation while FIRST(A) does not.
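The sequence rule can also be written as a few lines of code. This is just a sketch: the FIRST sets of the single symbols are copied from your tutor's example, and first_of_sequence is a name I made up for the illustration.

```python
EPS = "ε"

# FIRST sets of the individual symbols, taken from the example.
FIRST = {
    "a": {"a"}, "c": {"c"}, "d": {"d"},   # terminals
    "A": {"d"},
    "B": {EPS},
}

def first_of_sequence(symbols):
    """FIRST(X1 X2 ... Xn): ε survives only if every symbol can vanish."""
    result = set()
    for sym in symbols:
        result |= FIRST[sym] - {EPS}
        if EPS not in FIRST[sym]:
            return result      # this symbol cannot vanish, so stop here
    return result | {EPS}      # every symbol could vanish

first_aBA = first_of_sequence("aBA")   # stops at the terminal a
first_BB = first_of_sequence("BB")     # both symbols vanish, so ε stays
first_Bc = first_of_sequence("Bc")     # B vanishes, c cannot
first_S = first_aBA | first_BB | first_Bc
print(sorted(first_S))
```

The loop never reaches A in aBA because it returns as soon as it meets the non-nullable a, which is the same reason FIRST(A) is absent from the hand calculation.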

Multiple entries in an LL(1) parsing table?

Given this grammar:
S → S1 S2
S1 → a | ε
S2 → ab | ε
Therefore, we have
FIRST(S1) = { a, ε }
FOLLOW(S1) = { a }
Does that mean that in the parsing table I'll have multiple definitions in the row for S1 and the column for a?
Yes, that's correct. (However, note that your FOLLOW set is wrong; it also contains the end-of-input marker $). The issue here is that if the parser sees an a, it can't tell if that's because it wants to use the derivation
S → S1S2 → a S2
Or the derivation
S → S1S2 → S2 → ab
To fix this, note that your grammar generates only the strings { ε, a, ab, aab }. Therefore, you can build an LL(1) grammar for the language that directly produces those four strings:
S → ε | aY
Y → ε | b | ab
Hope this helps!
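To make the conflict concrete, here is a sketch that fills in the LL(1) table for the original grammar and reports cells with more than one entry. The FIRST and FOLLOW values are entered by hand (with $ included in FOLLOW(S1)), and the names PRODUCTIONS, FIRST_OF_ALT and conflicts are illustrative only.

```python
from collections import defaultdict

EPS, END = "ε", "$"

# Grammar from the question; "" stands for the ε-alternative.
PRODUCTIONS = {
    "S":  ["S1 S2"],
    "S1": ["a", ""],
    "S2": ["a b", ""],
}
# FIRST of each alternative and FOLLOW of each nonterminal, worked out by hand.
FIRST_OF_ALT = {
    ("S", "S1 S2"): {"a", EPS},
    ("S1", "a"): {"a"}, ("S1", ""): {EPS},
    ("S2", "a b"): {"a"}, ("S2", ""): {EPS},
}
FOLLOW = {"S": {END}, "S1": {"a", END}, "S2": {END}}

table = defaultdict(list)
for nt, alts in PRODUCTIONS.items():
    for alt in alts:
        first = FIRST_OF_ALT[(nt, alt)]
        lookaheads = first - {EPS}
        if EPS in first:
            lookaheads |= FOLLOW[nt]    # a nullable alternative is predicted on FOLLOW
        for tok in lookaheads:
            table[(nt, tok)].append(alt or EPS)

# Any cell holding more than one alternative is an LL(1) conflict.
conflicts = {cell: alts for cell, alts in table.items() if len(alts) > 1}
print(conflicts)
```

The only conflicting cell is the one in row S1, column a, where both S1 → a and S1 → ε are candidates.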

LL(1) Parsing -- First(A) with Recursive First Alternatives

How would I apply the FIRST() rule on a production such as :
A -> AAb | Ab | s
where A is a non-terminal, and b,s are terminals.
For alternatives 1 and 2, FIRST leads back to A again, which seems to result in infinite applications of FIRST, since I need a terminal to get the FIRST set?
To compute FIRST sets, you typically perform a fixed-point iteration. That is, you start off with a small set of values, then iteratively recompute FIRST sets until the sets converge.
In this case, you would start off by noting that the production A → s means that FIRST(A) must contain {s}. So initially you set FIRST(A) = {s}.
Now, you iterate across each production of A and update FIRST based on the knowledge of the FIRST sets you've computed so far. For example, the rule
A → AAb
Means that you should update FIRST(A) to include all elements of FIRST(AAb). This causes no change to FIRST(A). You then visit
A → Ab
You again update FIRST(A) to include FIRST(Ab), which is again a no-op. Finally, you visit
A → s
And since FIRST(A) already contains s, this causes no change.
Since nothing changed on this iteration, you would end up with FIRST(A) = {s}, which is indeed correct because any derivation starting at A ultimately will produce an s as its first character.
For more information, you might find these lecture slides useful (here's part two). They describe in detail how top-down parsing works and how to iteratively compute FIRST sets.
Hope this helps!
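As a rough sketch of that iteration for this particular grammar: the code below starts from an empty set rather than from {s} (an assumption on my part; both starting points converge to the same result) and counts the passes. Since nothing here is nullable, FIRST of a right-hand side is just FIRST of its leading symbol.

```python
# Productions of A from the question.
GRAMMAR = {"A": [["A", "A", "b"], ["A", "b"], ["s"]]}

def first_of(seq, first):
    # No symbol in this grammar is nullable, so FIRST of a sequence is
    # simply FIRST of its leading symbol.
    head = seq[0]
    return set(first[head]) if head in first else {head}

first = {"A": set()}
passes = 0
changed = True
while changed:
    changed = False
    passes += 1
    for nt, alts in GRAMMAR.items():
        for alt in alts:
            new = first_of(alt, first) - first[nt]
            if new:
                first[nt] |= new
                changed = True

print(first["A"], "after", passes, "passes")
```

The left-recursive alternatives only ever copy FIRST(A) into itself, so they never add anything and the loop stops after the first pass that changes nothing.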
My teaching notes are in Spanish, but the algorithms are in English. This is one way to calculate FIRST:
for each a ∈ Σ do
    F(a) := {a}
for each A ∈ N do
    if A → ε ∈ P then
        F(A) := {ε}
    else
        F(A) := ∅
repeat
    for each A ∈ N do
        F'(A) := F(A)
    for each A → X1X2...Xn ∈ P do
        if n > 0 then
            F(A) := F(A) ∪ F'(X1) ⋅k F'(X2) ⋅k ... ⋅k F'(Xn)
until F(A) = F'(A) forall A ∈ N
FIRSTk(X) := F(X) forall X ∈ (Σ ∪ N)
Σ is the alphabet (terminals), N is the set of non-terminals, P is the set of productions (rules), ε is the null string, and ⋅k is concatenation trimmed to k places. Note that ∅ ⋅k x = ∅, and that concatenating two sets produces the concatenation of the elements in the Cartesian product.
The easiest way to calculate FIRST sets by hand is by using one table per algorithm iteration.
F(A) = ∅
F'(A) = F(A) ⋅1 F(A) ⋅1 F(b) ∪ F(A) ⋅1 F(b) ∪ F(s)
F'(A) = ∅ ⋅1 ∅ ⋅1 {b} ∪ ∅ ⋅1 {b} ∪ {s}
F'(A) = ∅ ∪ ∅ ∪ {s}
F'(A) = {s}
F''(A) = F'(A) ⋅1 F'(A) ⋅1 F'(b) ∪ F'(A) ⋅1 F'(b) ∪ F'(s)
F''(A) = {s} ⋅1 {s} ⋅1 {b} ∪ {s} ⋅1 {b} ∪ {s}
F''(A) = {s} ∪ {s} ∪ {s}
F''(A) = {s}
And we're done, because F' = F'', so FIRST = F'', and FIRST(A) = {s}.
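For anyone who prefers running code, here is one possible Python rendering of the algorithm, written as a sketch rather than as the definitive version. Strings are represented as tuples of symbols, the empty tuple plays the role of ε, and the names concat_k and first_k are my own.

```python
from itertools import product

def concat_k(xs, ys, k):
    """Concatenate every pair from xs × ys and trim the result to k symbols."""
    return {(x + y)[:k] for x, y in product(xs, ys)}

def first_k(terminals, productions, k):
    nonterminals = {lhs for lhs, _ in productions}
    F = {a: {(a,)} for a in terminals}
    for A in nonterminals:
        # () plays the role of ε
        F[A] = {()} if (A, ()) in productions else set()
    while True:
        prev = {X: set(v) for X, v in F.items()}   # F' in the pseudocode
        for A, rhs in productions:
            if rhs:                                # n > 0
                acc = {()}
                for X in rhs:
                    acc = concat_k(acc, prev[X], k)
                F[A] |= acc
        if F == prev:
            return F

# A → AAb | Ab | s, with terminals b and s.
PRODUCTIONS = [("A", ("A", "A", "b")), ("A", ("A", "b")), ("A", ("s",))]
result = first_k({"b", "s"}, PRODUCTIONS, k=1)
result2 = first_k({"b", "s"}, PRODUCTIONS, k=2)
print(result["A"])
```

Because the product of anything with an empty set is empty, the left-recursive alternatives contribute nothing in the first round, just as in the tables above.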
Your grammar rule has left recursion, as you already realized, and LL parsers are not able to parse grammars with left recursion.
So you need to get rid of the left recursion first, and then you should be able to compute the FIRST set for the rule.

Resources