LR(1) Item DFA - Computing Lookaheads - parsing

I have trouble understanding how to compute the lookaheads for the LR(1)-items.
Lets say that I have this grammar:
S -> AB
A -> aAb | a
B -> d
A LR(1)-item is an LR(0) item with a lookahead. So we will get the following LR(0)-item for state 0:
S -> .AB , {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}
State: 1
A -> a.Ab, {lookahead}
A -> a. ,{lookahead}
A -> .aAb ,{lookahead}
A ->.a ,{lookahead}
Can somebody explain how to compute the lookaheads ? What is the general approach ?
Thank you in advance

The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form
S -> .w ($)
for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.
Next, for any state that contains an item of the form A -> x.By (t), where x is an arbitrary string of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes).
Let's try this out on your grammar. We start off by creating an item set containing
S -> .AB ($)
Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = d, so all of the productions we introduce will have lookahead d. This gives
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:
A -> a.Ab (d)
A -> a. (d)
Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = b. This gives
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:
[0]
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
[1]
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
[2]
A -> a.Ab (b)
A -> a. (b)
A -> .aAb (b)
A -> .a (b)
[3]
A -> aA.b (d)
[4]
A -> aAb. (d)
[5]
S -> A.B ($)
B -> .d ($)
[6]
B -> d. ($)
[7]
S -> AB. ($)
[8]
A -> aA.b (b)
[9]
A -> aAb. (b)
In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!
Hope this helps!

here is the LR(1) automaton for the grammar as the follow has been done above
I think it's better for the understanding to trying draw the automaton and the flow will make the idea of the lookaheads clearer

The LR(1) item set constructed by you should have two more items.
I8 A--> aA.b , b from I2
I9 A--> aAb. , b from I8

I also get 11 states, not 8:
State 0
S: .A B ["$"]
A: .a A b ["d"]
A: .a ["d"]
Transitions
S -> 1
A -> 2
a -> 5
Reductions
none
State 1
S_Prime: S .$ ["$"]
Transitions
none
Reductions
none
State 2
S: A .B ["$"]
B: .d ["$"]
Transitions
B -> 3
d -> 4
Reductions
none
State 3
S: A B .["$"]
Transitions
none
Reductions
$ => S: A B .
State 4
B: d .["$"]
Transitions
none
Reductions
$ => B: d .
State 5
A: a .A b ["d"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["d"]
Transitions
A -> 6
a -> 8
Reductions
d => A: a .
State 6
A: a A .b ["d"]
Transitions
b -> 7
Reductions
none
State 7
A: a A b .["d"]
Transitions
none
Reductions
d => A: a A b .
State 8
A: a .A b ["b"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["b"]
Transitions
A -> 9
a -> 8
Reductions
b => A: a .
State 9
A: a A .b ["b"]
Transitions
b -> 10
Reductions
none
State 10
A: a A b .["b"]
Transitions
none
Reductions
b => A: a A b .

Related

Calculating First and Follow of a grammar

I'm trying to calculate First and Follow of the following grammar:
S -> A B C D E
A -> a
A -> EPSILON
B -> b
B -> EPSILON
C -> c
D -> d
D -> EPSILON
E -> e
E -> EPSILON
I calculated them and got First(S)={a,b,c}. But using this tools, says: First(S)= {a, ε, c, b}. Why epsilon is part of First(S)? As I understand it should not be there. Is it my mistake or a bug? In case it's a bug. Are there other tools I can use to verify my results? In case it's my mistake, it would be helpful to understand why. Printscreen:
Also I got Follow(C)={d,e,$} but their result is Follow(C)={c, d, $}. Why?

Which productions are considered in LR(1) lookahead?

I'm currently looking at two closure calculation examples using the tool at
http://jsmachines.sourceforge.net/machines/lr1.html
Example 1
S -> A c
A -> b B
B -> A b
Here, in the initial state ends up with a closure of:
[S -> .A c, $]; [A -> .b B, c]}
Example 2
S -> A B
A -> a
B -> b
B -> ''
The calculated first step closure is:
{[S -> .A B, $]; [A -> .a, b/$]}
In example 1, why is the follow of b from rule 3 not included in the lookahead? In case 2, we follow B to figure out that $ is part of the lookahead so is there some special reason to not consider all rules in case 1?
When doing a closure with ". A α" we use FIRST(α) as the lookahead, and only include the containing (parent) lookahead if ε ∈ FIRST(α). In example 1, ε ∉ FIRST(c), so the lookahead is just c. In example 2, ε ∈ FIRST(B), so we add the containing lookahead ($ in this case) to the lookahead.
FOLLOW is never relevant.

Handling nullable productions in LR(0) grammars

I think it's a pretty straightforward question, but I couldn't find the answer anywhere.
If I have a grammar with a non-terminal that derivates NULL, like this:
S -> B$
B -> idP
P -> (E)
P ->
E -> B
How do I handle the production #3 to diagram the LR(0) states of it? Do I have to include a column corresponding to the transition with the empty set in my LR(0) parsing table?
The item P -> · is not different from any other item with the · at the right-hand end; the fact that nothing precedes the · does not make it special. The closure of the item
B -> id · P
will be the state q:
B -> id · P
P -> · ( E )
P -> ·
from which goto(q, P) will indicate a transition to B -> id P · and goto(q, () will indicate a transition to P -> ( · E ). goto on $ and ) are not defined on that state, but action is; it will indicate that P should be reduced using the P -> rule, after which goto(q, P) will be used.

SLR Parsing - with an epsilon production

Say I have:
S -> A
A -> B C A a | ϵ
B -> k | ϵ
C -> m
Now in the initial state S' -> S, I'm going to include:
S' -> .S
Then the closure of S:
A -> .B C A a , A -> .
Closure would also include B -> .k and B -> . obviously.
But since B -> ϵ is a production, would I also have to include C -> ,m in the initial state? Since in A -> B C A a, B can be ϵ.
I just wanted to know if I'm right and if this is the right way to deal with epsilons in grammar. If not, do guide me in the right direction. Thanks!
No, C -> . m is not part of the initial state, because C cannot be reduced without a preceding B (even if the B is reduced from ε).

Why do we put 'A' as the look ahead symbol when all have `$`?

I am using canonical LR Method to construct the Parsing table.
Consider the grammar :
s -> D C A
s -> D a B
a -> C
s -> a A
The book I am reading mentions the first closure state as :
I(0) = [s -> .D C A , $]
[s -> .D a B , $]
[a -> .C , A]
[s -> .a A , $]
In the state
[a -> .C , A]
from where does A in the item come ? All the items have $ as a Look ahead symbol and third item has A .
Please explain this.
The item:
[ a -> · C, A ]
Results from the expansion of the item:
[ s -> · a A ]
in which the nonterminal a is followed by the terminal A. That means that the reduction of C to a can occur in a successor state whose context is s -> a · A; or, in other words, when the lookahead is A.
All of the other items in the state you mention result either from the initial (implicit) item
[ s' -> · s $ ]
where the nonterminal s is followed by the pseudo-terminal $ (that is, the end-of-input marker), so that their lookaheads are all $.

Resources