I'm currently looking at two closure calculation examples using the tool at
http://jsmachines.sourceforge.net/machines/lr1.html
Example 1
S -> A c
A -> b B
B -> A b
Here, in the initial state ends up with a closure of:
[S -> .A c, $]; [A -> .b B, c]}
Example 2
S -> A B
A -> a
B -> b
B -> ''
The calculated first step closure is:
{[S -> .A B, $]; [A -> .a, b/$]}
In example 1, why is the follow of b from rule 3 not included in the lookahead? In case 2, we follow B to figure out that $ is part of the lookahead so is there some special reason to not consider all rules in case 1?
When doing a closure with ". A α" we use FIRST(α) as the lookahead, and only include the containing (parent) lookahead if ε ∈ FIRST(α). In example 1, ε ∉ FIRST(c), so the lookahead is just c. In example 2, ε ∈ FIRST(B), so we add the containing lookahead ($ in this case) to the lookahead.
FOLLOW is never relevant.
Related
I'm trying to calculate First and Follow of the following grammar:
S -> A B C D E
A -> a
A -> EPSILON
B -> b
B -> EPSILON
C -> c
D -> d
D -> EPSILON
E -> e
E -> EPSILON
I calculated them and got First(S)={a,b,c}. But using this tools, says: First(S)= {a, ε, c, b}. Why epsilon is part of First(S)? As I understand it should not be there. Is it my mistake or a bug? In case it's a bug. Are there other tools I can use to verify my results? In case it's my mistake, it would be helpful to understand why. Printscreen:
Also I got Follow(C)={d,e,$} but their result is Follow(C)={c, d, $}. Why?
The following grammar generates the sentences a, a, a, b, b, b, ..., h, b. Unfortunately it is not LR(1) so cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there's fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.
Say I have:
S -> A
A -> B C A a | ϵ
B -> k | ϵ
C -> m
Now in the initial state S' -> S, I'm going to include:
S' -> .S
Then the closure of S:
A -> .B C A a , A -> .
Closure would also include B -> .k and B -> . obviously.
But since B -> ϵ is a production, would I also have to include C -> ,m in the initial state? Since in A -> B C A a, B can be ϵ.
I just wanted to know if I'm right and if this is the right way to deal with epsilons in grammar. If not, do guide me in the right direction. Thanks!
No, C -> . m is not part of the initial state, because C cannot be reduced without a preceding B (even if the B is reduced from ε).
I am using canonical LR Method to construct the Parsing table.
Consider the grammar :
s -> D C A
s -> D a B
a -> C
s -> a A
The book I am reading mentions the first closure state as :
I(0) = [s -> .D C A , $]
[s -> .D a B , $]
[a -> .C , A]
[s -> .a A , $]
In the state
[a -> .C , A]
from where does A in the item come ? All the items have $ as a Look ahead symbol and third item has A .
Please explain this.
The item:
[ a -> · C, A ]
Results from the expansion of the item:
[ s -> · a A ]
in which the nonterminal a is followed by the terminal A. That means that the reduction of C to a can occur in a successor state whose context is s -> a · A; or, in other words, when the lookahead is A.
All of the other items in the state you mention result either from the initial (implicit) item
[ s' -> · s $ ]
where the nonterminal s is followed by the pseudo-terminal $ (that is, the end-of-input marker), so that their lookaheads are all $.
I have trouble understanding how to compute the lookaheads for the LR(1)-items.
Lets say that I have this grammar:
S -> AB
A -> aAb | a
B -> d
A LR(1)-item is an LR(0) item with a lookahead. So we will get the following LR(0)-item for state 0:
S -> .AB , {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}
State: 1
A -> a.Ab, {lookahead}
A -> a. ,{lookahead}
A -> .aAb ,{lookahead}
A ->.a ,{lookahead}
Can somebody explain how to compute the lookaheads ? What is the general approach ?
Thank you in advance
The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form
S -> .w ($)
for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.
Next, for any state that contains an item of the form A -> x.By (t), where x is an arbitrary string of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes).
Let's try this out on your grammar. We start off by creating an item set containing
S -> .AB ($)
Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = d, so all of the productions we introduce will have lookahead d. This gives
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:
A -> a.Ab (d)
A -> a. (d)
Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = b. This gives
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:
[0]
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
[1]
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
[2]
A -> a.Ab (b)
A -> a. (b)
A -> .aAb (b)
A -> .a (b)
[3]
A -> aA.b (d)
[4]
A -> aAb. (d)
[5]
S -> A.B ($)
B -> .d ($)
[6]
B -> d. ($)
[7]
S -> AB. ($)
[8]
A -> aA.b (b)
[9]
A -> aAb. (b)
In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!
Hope this helps!
here is the LR(1) automaton for the grammar as the follow has been done above
I think it's better for the understanding to trying draw the automaton and the flow will make the idea of the lookaheads clearer
The LR(1) item set constructed by you should have two more items.
I8 A--> aA.b , b from I2
I9 A--> aAb. , b from I8
I also get 11 states, not 8:
State 0
S: .A B ["$"]
A: .a A b ["d"]
A: .a ["d"]
Transitions
S -> 1
A -> 2
a -> 5
Reductions
none
State 1
S_Prime: S .$ ["$"]
Transitions
none
Reductions
none
State 2
S: A .B ["$"]
B: .d ["$"]
Transitions
B -> 3
d -> 4
Reductions
none
State 3
S: A B .["$"]
Transitions
none
Reductions
$ => S: A B .
State 4
B: d .["$"]
Transitions
none
Reductions
$ => B: d .
State 5
A: a .A b ["d"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["d"]
Transitions
A -> 6
a -> 8
Reductions
d => A: a .
State 6
A: a A .b ["d"]
Transitions
b -> 7
Reductions
none
State 7
A: a A b .["d"]
Transitions
none
Reductions
d => A: a A b .
State 8
A: a .A b ["b"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["b"]
Transitions
A -> 9
a -> 8
Reductions
b => A: a .
State 9
A: a A .b ["b"]
Transitions
b -> 10
Reductions
none
State 10
A: a A b .["b"]
Transitions
none
Reductions
b => A: a A b .