Handling nullable productions in LR(0) grammars - parsing

I think it's a pretty straightforward question, but I couldn't find the answer anywhere.
If I have a grammar with a non-terminal that derives the empty string (ε), like this:
S -> B$
B -> idP
P -> (E)
P ->
E -> B
How do I handle the empty production (P ->) when diagramming the LR(0) states? Do I have to include a column in my LR(0) parsing table for a transition on the empty string?

The item P -> · is not different from any other item with the · at the right-hand end; the fact that nothing precedes the · does not make it special. The closure of the item
B -> id · P
will be the state q:
B -> id · P
P -> · ( E )
P -> ·
from which goto(q, P) will indicate a transition to B -> id P · and goto(q, () will indicate a transition to P -> ( · E ). goto is not defined on $ or ) in that state, but action is: it will indicate that P should be reduced using the P -> rule, after which goto(q, P) will be used.
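For what it's worth, here is a minimal Python sketch of an LR(0) closure that treats the empty production exactly like any other production. The grammar encoding and item representation are illustrative choices only, not the output of any particular tool.

# LR(0) closure sketch for the grammar in the question.
# An item is (lhs, rhs, dot); the empty production has rhs == ().
GRAMMAR = {
    "S": [("B", "$")],
    "B": [("id", "P")],
    "P": [("(", "E", ")"), ()],   # P -> ( E )  and  P ->
    "E": [("B",)],
}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(result):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for prod in GRAMMAR[rhs[dot]]:
                    # The empty production yields an item that is already
                    # complete: P -> ·
                    new_item = (rhs[dot], prod, 0)
                    if new_item not in result:
                        result.add(new_item)
                        changed = True
    return result

# The state q from the answer: closure of  B -> id . P
for lhs, rhs, dot in sorted(closure({("B", ("id", "P"), 1)})):
    print(lhs, "->", " ".join(rhs[:dot]), "·", " ".join(rhs[dot:]))

Nothing about the empty right-hand side needs special handling: it simply produces an item whose dot is already at the right-hand end, which is where the reduce action for P -> comes from.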


LALR(1) Parser DFA Lookahead Core Question

I am having trouble understanding what the rules are for adding a lookahead to a core production during the construction of the DFA. To illustrate my confusion, I will be using an online parser generator that exposes all the internal calculations: this_tool.
(The formatting is: NONTERMINAL -> RULE, LOOKAHEADS, where the lookaheads are forward-slash separated.)
Using this grammar as an example:
S -> E
E -> ( E )
E -> N O E
E -> N
N -> 1
N -> 2
N -> 3
O -> +
O -> -
Copying and pasting the above grammar into the LALR parser generator produces a DFA with 12 states (click the >>). My question, finally: why are the goto(0, N) kernel items ( {[E -> N.O E, $/)]; [E -> N., $/)]} ) initialized with the ) terminal? Where does the ) come from? I would expect goto(0, N) to be {[E -> N.O E, $]; [E -> N., $]}. Equally, the kernel items in goto(0, () have an 'extra' ).
As the DFA is being constructed, equal cores are merged (the core is the set of items that introduce a new state by performing closure on that set). State 2 has the item [E -> .N, )], which when merged with [E -> N., $] produces the correct output, but there's no way for state 0 to have known about the ) lookahead.
Thanks in advance; sorry if this was a confusing and overly specific question, and for using an external website to demonstrate my issue. ✌️
The solution is to propagate any newly found lookaheads, and then take the goto of the states whose kernel (core) items those lookaheads were added to.
The method is described in Section 4.7.5 of the Dragon Book, 2nd ed.
(here: https://github.com/muthukumarse/books/blob/master/Dragon%20Book%20Compilers%20Principle%20Techniques%20and%20Tools%202nd%20Edtion.pdf)
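To give a rough idea of the propagation step, here is a Python sketch of the final fixpoint loop only, not the whole of the Dragon Book algorithm: computing which lookaheads are spontaneous and which kernel items propagate to which (the dummy-lookahead trick from section 4.7.5) is assumed to have happened already, and the two input tables are illustrative assumptions.

# Sketch of LALR lookahead propagation (cf. Dragon Book section 4.7.5).
# `spontaneous` maps a kernel item to its spontaneously generated
# lookaheads; `propagates_to` maps a kernel item to the kernel items its
# lookaheads propagate to. Both tables are assumed inputs here.
def propagate_lookaheads(spontaneous, propagates_to):
    lookaheads = {item: set(las) for item, las in spontaneous.items()}
    changed = True
    while changed:
        changed = False
        for source, targets in propagates_to.items():
            for target in targets:
                current = lookaheads.setdefault(target, set())
                new = lookaheads.get(source, set()) - current
                if new:
                    current |= new
                    changed = True
    return lookaheads

In the example grammar, the ) on the goto(0, N) kernel is one of the spontaneously generated lookaheads: the state reached on ( contains the item [E -> ( . E ), $], whose closure adds E-items with lookahead ) (the symbol that follows E in that item), and the goto of that state on N has the same core as goto(0, N), so in the merged LALR automaton the single kernel carries both $ and ).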

Which productions are considered in LR(1) lookahead?

I'm currently looking at two closure calculation examples using the tool at
http://jsmachines.sourceforge.net/machines/lr1.html
Example 1
S -> A c
A -> b B
B -> A b
Here, the initial state ends up with a closure of:
{[S -> .A c, $]; [A -> .b B, c]}
Example 2
S -> A B
A -> a
B -> b
B -> ''
The calculated first step closure is:
{[S -> .A B, $]; [A -> .a, b/$]}
In example 1, why is the follow of b from rule 3 not included in the lookahead? In case 2, we follow B to figure out that $ is part of the lookahead, so is there some special reason not to consider all rules in case 1?
When closing an item of the form [X -> ... · A α, t] we use FIRST(α) as the lookahead for the generated A-items, and only include the containing (parent) lookahead t if ε ∈ FIRST(α). In example 1, ε ∉ FIRST(c), so the lookahead is just c. In example 2, ε ∈ FIRST(B), so we add the containing lookahead ($ in this case) to the lookahead.
FOLLOW is never relevant.
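A minimal Python sketch of that rule; FIRST and NULLABLE are assumed to have been precomputed for the grammar at hand, with FIRST excluding ε (nullability is tracked separately) and mapping each terminal to the set containing just itself.

def closure_lookaheads(alpha, parent_la, FIRST, NULLABLE):
    # Lookaheads for the items generated when closing [X -> ... · A alpha, parent_la].
    result = set()
    for symbol in alpha:
        result |= FIRST[symbol]
        if symbol not in NULLABLE:
            return result          # alpha cannot derive epsilon: parent_la is excluded
    result.add(parent_la)          # epsilon in FIRST(alpha): include the parent lookahead
    return result

For example 1 this gives {c} (c is a terminal, so it is never nullable); for example 2 it gives {b, $}, because B -> '' makes B nullable.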

Why do we put 'A' as the lookahead symbol when all the others have '$'?

I am using canonical LR Method to construct the Parsing table.
Consider the grammar :
s -> D C A
s -> D a B
a -> C
s -> a A
The book I am reading gives the first closure state as:
I(0) = [s -> .D C A , $]
[s -> .D a B , $]
[a -> .C , A]
[s -> .a A , $]
In the item
[a -> .C , A]
where does the A come from? All the other items have $ as the lookahead symbol, but this one has A.
Please explain this.
The item:
[ a -> · C, A ]
results from the expansion of the item:
[ s -> · a A, $ ]
in which the nonterminal a is followed by the terminal A. That means that the reduction of C to a can occur in a successor state whose context is s -> a · A; or, in other words, when the lookahead is A.
All of the other items in the state you mention result from the initial (implicit) item
[ s' -> · s $ ]
where the nonterminal s is followed by the pseudo-terminal $ (that is, the end-of-input marker), so that their lookaheads are all $.
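Written out with the same FIRST-based closure rule as in the previous answer, the computation is just:

closure of [ s -> · a A, $ ]:
    the dot is before the nonterminal a, which is followed by the string A $
    FIRST(A $) = { A }, since A is a terminal in this grammar
    so every production a -> C generates the item [ a -> · C, A ]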

LR(1) Item DFA - Computing Lookaheads

I have trouble understanding how to compute the lookaheads for the LR(1)-items.
Let's say that I have this grammar:
S -> AB
A -> aAb | a
B -> d
An LR(1) item is an LR(0) item with a lookahead, so we will get the following LR(1) items for state 0:
S -> .AB , {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}
State: 1
A -> a.Ab, {lookahead}
A -> a. ,{lookahead}
A -> .aAb ,{lookahead}
A ->.a ,{lookahead}
Can somebody explain how to compute the lookaheads? What is the general approach?
Thank you in advance
The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form
S -> .w ($)
for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.
Next, for any state that contains an item of the form A -> x.By (t), where x is an arbitrary string of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal s in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes).
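As a sketch (not any particular implementation), that rule can be written as a small worklist closure in Python; the item layout, the grammar dictionary (nonterminal -> tuple of right-hand sides, each a tuple of symbols), and the FIRST/NULLABLE helpers are all illustrative assumptions.

# LR(1) closure: items are (lhs, rhs, dot, lookahead).
def first_of_string(symbols, lookahead, FIRST, NULLABLE):
    # FIRST of the string `symbols` followed by the single terminal `lookahead`.
    out = set()
    for s in symbols:
        out |= FIRST[s]
        if s not in NULLABLE:
            return out
    out.add(lookahead)
    return out

def closure(items, grammar, FIRST, NULLABLE):
    result = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:     # dot is before a nonterminal B
            rest = rhs[dot + 1:]                       # the string y that follows B
            for s in first_of_string(rest, la, FIRST, NULLABLE):
                for prod in grammar[rhs[dot]]:
                    item = (rhs[dot], prod, 0, s)      # B -> .w (s)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result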
Let's try this out on your grammar. We start off by creating an item set containing
S -> .AB ($)
Next, using our second rule, for every production of A, we add in a new item corresponding to that production with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = {d}, so all of the items we introduce will have lookahead d. This gives
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:
A -> a.Ab (d)
A -> a. (d)
Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = {b}. This gives
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:
[0]
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
[1]
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
[2]
A -> a.Ab (b)
A -> a. (b)
A -> .aAb (b)
A -> .a (b)
[3]
A -> aA.b (d)
[4]
A -> aAb. (d)
[5]
S -> A.B ($)
B -> .d ($)
[6]
B -> d. ($)
[7]
S -> AB. ($)
[8]
A -> aA.b (b)
[9]
A -> aAb. (b)
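The "move the dot over one step and re-close" step used to get from one of these states to the next is the goto function; repeating closure and goto until no new item sets appear produces exactly the collection listed above. A rough Python sketch, building on the closure function sketched earlier in this answer (the representation is again just an illustrative assumption):

def goto(items, X, grammar, FIRST, NULLABLE):
    # Advance the dot over X in every item that has X after the dot, then close.
    moved = {(lhs, rhs, dot + 1, la)
             for (lhs, rhs, dot, la) in items
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved, grammar, FIRST, NULLABLE)

def lr1_states(start_items, symbols, grammar, FIRST, NULLABLE):
    states = [closure(start_items, grammar, FIRST, NULLABLE)]
    changed = True
    while changed:
        changed = False
        for state in list(states):
            for X in symbols:
                successor = goto(state, X, grammar, FIRST, NULLABLE)
                if successor and successor not in states:
                    states.append(successor)
                    changed = True
    return states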
In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!
Hope this helps!
Here is the LR(1) automaton for the grammar, with the lookahead computation done as above.
I think it helps understanding to try drawing the automaton yourself; the flow will make the idea of the lookaheads clearer.
The collection of LR(1) item sets you constructed should have two more states:
I8: A -> aA.b , b   (from I2)
I9: A -> aAb. , b   (from I8)
I also get 11 states, not 8:
State 0
S: .A B ["$"]
A: .a A b ["d"]
A: .a ["d"]
Transitions
S -> 1
A -> 2
a -> 5
Reductions
none
State 1
S_Prime: S .$ ["$"]
Transitions
none
Reductions
none
State 2
S: A .B ["$"]
B: .d ["$"]
Transitions
B -> 3
d -> 4
Reductions
none
State 3
S: A B .["$"]
Transitions
none
Reductions
$ => S: A B .
State 4
B: d .["$"]
Transitions
none
Reductions
$ => B: d .
State 5
A: a .A b ["d"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["d"]
Transitions
A -> 6
a -> 8
Reductions
d => A: a .
State 6
A: a A .b ["d"]
Transitions
b -> 7
Reductions
none
State 7
A: a A b .["d"]
Transitions
none
Reductions
d => A: a A b .
State 8
A: a .A b ["b"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["b"]
Transitions
A -> 9
a -> 8
Reductions
b => A: a .
State 9
A: a A .b ["b"]
Transitions
b -> 10
Reductions
none
State 10
A: a A b .["b"]
Transitions
none
Reductions
b => A: a A b .
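Read mechanically, each state listing above translates directly into parse-table entries: a transition on a terminal is a shift, a transition on a nonterminal is a goto, and each Reductions line is a reduce entry for exactly the lookaheads listed. A small Python sketch of that translation (the dictionary layout is an assumption, and the accept entry for the augmented start item is omitted):

def build_tables(transitions, reductions, terminals):
    # transitions[state] maps a grammar symbol to a target state;
    # reductions[state] maps a lookahead terminal to the production to reduce by.
    action, goto_table = {}, {}
    for state, edges in transitions.items():
        for symbol, target in edges.items():
            if symbol in terminals:
                action[(state, symbol)] = ("shift", target)
            else:
                goto_table[(state, symbol)] = target
    for state, reds in reductions.items():
        for lookahead, production in reds.items():
            if (state, lookahead) in action:
                raise ValueError("shift/reduce conflict")   # would not be LR(1)
            action[(state, lookahead)] = ("reduce", production)
    return action, goto_table

For State 5 above this gives action[(5, 'a')] = ('shift', 8), action[(5, 'd')] = ('reduce', 'A: a'), and goto_table[(5, 'A')] = 6.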

How to determine the FIRST set of E in this grammar?

I wonder how to determine the FIRST set of E with grammar:
E -> XYE | e
X -> x
Y -> y
Can anyone give me some direction?
Well, assuming that you're starting with E, then either the first terminal is x via the E→XYE production (since X always produces x) or it is e via the E→e production. So First(E) = {x,e}.
That seems pretty straightforward...
Treat rules of the form A -> ...x... | ...y...
as two rules A -> ...x... and A -> ...y...
Form a set S initially containing the rules of the form E -> ...
then
Set a set P to empty.
Set a set F to empty.
Repeat until S is empty
    Choose an element of S, and call it R.
    If R is in P, remove R from S.
    Elsif R is of the form A -> b ... (b a terminal)
        then { add b to F,
               add R to P,
               remove R from S }
    Else (R is of the form A -> B ..., B a nonterminal)
        then { place all rules of the form B -> ... into S,
               add R to P,
               remove R from S }
End
When the loop terminates, F contains the tokens which
are First(E).
This does not take into account empty productions.
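For completeness, here is a standard iterate-to-fixpoint FIRST computation in Python that does handle empty productions; the dictionary encoding of the grammar is just an illustrative choice ('' stands for ε).

# Standard fixpoint FIRST computation for the grammar in the question.
GRAMMAR = {
    "E": [["X", "Y", "E"], ["e"]],
    "X": [["x"]],
    "Y": [["y"]],
}

def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    def first_of(symbol):
        # A terminal's FIRST set is just the terminal itself.
        return first[symbol] if symbol in grammar else {symbol}
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for rhs in productions:
                add, nullable = set(), True
                for symbol in rhs:
                    add |= first_of(symbol) - {""}
                    if "" not in first_of(symbol):
                        nullable = False
                        break
                if nullable:          # every symbol (or an empty rhs) can derive epsilon
                    add.add("")
                if not add <= first[nt]:
                    first[nt] |= add
                    changed = True
    return first

print(first_sets(GRAMMAR)["E"])       # {'x', 'e'}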
