How to understand Example-4.64 of the syntax analysis chapter in Dragon Book? - parsing

Hi everybody!
While studying the Dragon Book, I ran into some trouble: I can't understand the first step of Example 4.64, which appears in subsection 4.7.5 on page 273.
Problem
First, Example 4.61 gives an augmented non-SLR grammar. The original text is as follows:
Example 4.61 : We shall use as an example of the efficient LALR(1) table construction method the non-SLR grammar from Example 4.48, which we reproduce below in its augmented form:
S' -> S
S -> L = R | R
L -> *R | id
R -> L
Then Example 4.64 constructs the kernels of the LALR(1) items for the above grammar. The original text is as follows:
Example 4.64 : Let us construct the kernels of the LALR(1) items for the grammar of Example 4.61. The kernels of the LR(0) items were shown in Fig. 4.44. When we apply Algorithm 4.62 to the kernel of set of items I0, we first compute CLOSURE({ [S'->.S , #] }), which is
S' -> .S, #
S -> .L = R, #
S -> .R, #
L -> .*R, #/= // why is there a "#"?
L -> .id, #/= // why is there a "#"?
R -> .L, #
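The book's pseudocode for CLOSURE(I) is the usual one: for each item [A -> α.Bβ, a] in I, each production B -> γ, and each terminal b in FIRST(βa), add [B -> .γ, b] to I, repeating until nothing new can be added.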
But I think the answer is:
S' -> .S, #
S -> .L = R, #
S -> .R, #
L -> .*R, = // the difference
L -> .id, = // the difference
R -> .L, #
I don't understand how the # is derived in L -> .*R, #/= and L -> .id, #/=. Could anybody explain the reason? Thanks!

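Both of them come from the closure of the item R→.L, # (itself added when closing S→.R, #). That item matches A→α.Bβ, a with A=R, α=ε, B=L, β=ε, a=#, so FIRST(βa) is {#}, which adds both productions for L with lookahead #. The lookahead = arises separately from S→.L = R, #, where β is "= R" and FIRST(βa) is {=}.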
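The closure step described above can be checked mechanically. The following is a minimal sketch, not the book's code: the item representation (head, body, dot position, lookahead) and the helper names are assumptions made for illustration, and the FIRST helper relies on this grammar having no epsilon productions and no left recursion.

```python
# Augmented grammar of Example 4.61; "#" is the end marker used in the question.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}


def first(symbol):
    """FIRST of a single symbol (valid here: no epsilon rules, no left recursion)."""
    if symbol not in GRAMMAR:          # terminal or end marker
        return {symbol}
    result = set()
    for body in GRAMMAR[symbol]:
        result |= first(body[0])
    return result


def closure(items):
    """LR(1) closure: for [A -> alpha . B beta, a] add [B -> . gamma, b], b in FIRST(beta a)."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, lookahead = work.pop()
        if dot == len(body) or body[dot] not in GRAMMAR:
            continue
        beta_a = body[dot + 1:] + (lookahead,)
        # Without epsilon rules, FIRST(beta a) is FIRST of its leading symbol.
        for terminal in first(beta_a[0]):
            for gamma in GRAMMAR[body[dot]]:
                item = (body[dot], gamma, 0, terminal)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items


def lookaheads_by_core(items):
    """Group lookaheads per LR(0) core so that '#' and '=' show up as '#/='."""
    grouped = {}
    for head, body, dot, lookahead in items:
        grouped.setdefault((head, body, dot), set()).add(lookahead)
    return grouped
```

Calling `lookaheads_by_core(closure({("S'", ("S",), 0, "#")}))` yields both `#` and `=` for the two L items: `=` via S -> .L = R, # and `#` via R -> .L, #.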

Related

LALR(1) Parser DFA Lookahead Core Question

I am having trouble understanding the rules for adding a lookahead to a core production during the construction of the DFA. To illustrate my confusion, I will be using an online parser generator that exposes all the internal calculations: this_tool (open it in a new tab).
(The format is NONTERMINAL -> RULE, LOOKAHEADS, where the lookaheads are separated by forward slashes.)
Using this grammar as an example:
S -> E
E -> ( E )
E -> N O E
E -> N
N -> 1
N -> 2
N -> 3
O -> +
O -> -
Copying and pasting the above grammar into the LALR parser generator produces a DFA with 12 states (click the >>). My question, finally: why are the goto(0, N) kernel productions ( {[E -> N.O E, $/)]; [E -> N., $/)]} ) initialized with the ) terminal? Where does the ) come from? I would expect goto(0, N) to be {[E -> N.O E, $]; [E -> N., $]}. Likewise, the kernel production in goto(0, ( ) has an 'extra' ).
As the DFA is being constructed, states with equal cores are merged (the core is the set of productions that introduce a new state by performing closure on that set). State 2 has the production [E -> .N, )], which when merged with [E -> N., $] produces the correct output, but there's no way for state 0 to have known about the lookahead ).
Thanks in advance, and sorry if this was a confusing and overly specific question that relies on an external website to demonstrate my issue. ✌️
The solution is to propagate each newly found lookahead along the goto transitions to the kernel items that receive it, and to repeat this until no new lookaheads appear.
The method is described in section 4.7.5 of the Dragon Book, 2nd ed.
(here: https://github.com/muthukumarse/books/blob/master/Dragon%20Book%20Compilers%20Principle%20Techniques%20and%20Tools%202nd%20Edtion.pdf)
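To see where the ) comes from, one option is to build the canonical LR(1) states and merge those with equal cores. Note that this is the merge-based LALR construction rather than the propagation algorithm of section 4.7.5, although both yield the same lookaheads. The sketch below is an illustration only: the item tuples and function names are assumptions, and the FIRST helper relies on this grammar having no epsilon productions and no left recursion.

```python
# Grammar from the question; "$" is the end marker and S -> E is the start rule.
GRAMMAR = {
    "S": [("E",)],
    "E": [("(", "E", ")"), ("N", "O", "E"), ("N",)],
    "N": [("1",), ("2",), ("3",)],
    "O": [("+",), ("-",)],
}


def first(symbol):
    """FIRST of a single symbol (valid here: no epsilon rules, no left recursion)."""
    if symbol not in GRAMMAR:
        return {symbol}
    return set().union(*(first(body[0]) for body in GRAMMAR[symbol]))


def closure(items):
    """LR(1) closure over items of the form (head, body, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot, lookahead = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            rest = body[dot + 1:] + (lookahead,)
            for terminal in first(rest[0]):
                for gamma in GRAMMAR[body[dot]]:
                    item = (body[dot], gamma, 0, terminal)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return frozenset(items)


def goto(state, symbol):
    """Advance the dot over `symbol` and close the result."""
    return closure({(h, b, d + 1, la) for h, b, d, la in state
                    if d < len(b) and b[d] == symbol})


def lr1_states():
    """Canonical collection of LR(1) item sets."""
    start = closure({("S", ("E",), 0, "$")})
    states, work = {start}, [start]
    while work:
        state = work.pop()
        for symbol in {b[d] for _, b, d, _ in state if d < len(b)}:
            nxt = goto(state, symbol)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return states


def merged_lookaheads(states):
    """LALR merging: union the lookaheads of LR(1) states that share a core."""
    merged = {}
    for state in states:
        core = frozenset((h, b, d) for h, b, d, _ in state)
        per_item = merged.setdefault(core, {})
        for h, b, d, la in state:
            per_item.setdefault((h, b, d), set()).add(la)
    return merged


# Core of goto(0, N): the two kernel items plus the closure items for O.
GOTO_0_N_CORE = frozenset({
    ("E", ("N", "O", "E"), 1),
    ("E", ("N",), 1),
    ("O", ("+",), 0),
    ("O", ("-",), 0),
})
```

In the canonical LR(1) automaton, goto(0, N) carries only `$` on its kernel items. A second LR(1) state with the same core is reached by reading N inside parentheses and carries `)`. Merging the two gives `$/)`, which is what the tool displays for the LALR state.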

Find a s-grammar (simple grammar)

Find a simple grammar (a.k.a. s-grammar) for the following language:
L = {(ab)^(2m) b : m >= 0}
[I did this, but it is wrong]
S-> aASBB|b
A-> a
B->b
What about this?
S -> aA | T
A -> bB
B -> aC
C -> bS
T -> b
This is a regular grammar: once the unit production S -> T is inlined as S -> b, all productions have the form X -> sY or X -> t. It corresponds to a minimal DFA for the language in question via a direct mapping of productions to transitions and nonterminal symbols to states.
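Because each (nonterminal, terminal) pair in an s-grammar selects at most one production, the grammar can be run directly as a deterministic recognizer. The sketch below assumes the unit production S -> T has been inlined as S -> b; the table layout and the function name `accepts` are illustrative choices, not part of the original answer.

```python
# (nonterminal, terminal) -> nonterminals that follow the terminal in the chosen production.
# Productions: S -> aA | b, A -> bB, B -> aC, C -> bS.
TABLE = {
    ("S", "a"): ["A"],
    ("S", "b"): [],
    ("A", "b"): ["B"],
    ("B", "a"): ["C"],
    ("C", "b"): ["S"],
}


def accepts(text):
    """Return True if `text` is in {(ab)^(2m) b : m >= 0}."""
    stack = ["S"]
    for ch in text:
        if not stack:                 # input left over after the derivation finished
            return False
        key = (stack.pop(), ch)
        if key not in TABLE:          # no production for this nonterminal/terminal pair
            return False
        stack.extend(reversed(TABLE[key]))
    return not stack                  # accept only if nothing remains to be derived
```

For example, `accepts("b")` and `accepts("ababb")` are true, while `accepts("abb")` is false because two repetitions of ab are required before the final b.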

LR(1) parsing table with epsilon productions

I'm having trouble building the collection of sets of items for LR(1) parsers with a grammar containing epsilon productions. For example, given the following grammar (where eps stands for epsilon)
S -> a S U
U -> b
| eps
State0 would be
S' -> .S, $
S -> .a S U, $
Moving with 'a' from State0 would give the following state, let's call it State2
S -> a .S U, $
S -> .a S U, $/???
To get the lookahead for the second item of State2 I need to calculate FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: are the lookaheads of the second item of State2 both $ and 'b'? Since U can be eps, my intuition says that $ can be a lookahead as well, not just 'b'. It would have been just 'b' if FIRST(U) were just {'b'}. Is that correct?
Second question: at some point I will have a state as the following one
S -> a S .U, $
U -> .b, $
U -> .eps, $
What do I do here? Do I need to move with eps and have a set with the item U -> eps., $? What if I have another terminal as lookahead, i.e. X -> .eps, a/$? And if I move, ending up having a set of the form X -> eps., $, do I reduce?
And more: do I need to insert eps in the parse table as a symbol?
Thanks
FIRST(U$) means "the set of symbols which could be first in a derivation of U$". Clearly, if U can derive the empty string, $ must be part of this set. The end-of-input marker $ ensures that we never have to worry about epsilons in the FIRST sets. (If we were doing LR(k) instead of LR(1), we would use k end markers so that all the strings in FIRST_k had length k.)
The item associated with U → (or with U → ε if you insist) is U → • . In other words, it is reducible and should trigger a reduce action on matching lookahead.
ε is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.
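The FIRST(U$) computation described above can be sketched as follows. The representation is an assumption made for illustration: the epsilon production is an empty list, and the string "eps" marks the empty string inside FIRST sets only.

```python
EPS = "eps"  # marker for the empty string inside FIRST sets; never a grammar symbol

GRAMMAR = {
    "S": [["a", "S", "U"]],
    "U": [["b"], []],        # the empty list is the production U -> eps
}


def first_of_sequence(symbols, first, grammar):
    """FIRST of a symbol string, skipping over nullable nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result            # this symbol cannot vanish, so stop here
    result.add(EPS)                  # every symbol could derive the empty string
    return result


def first_sets(grammar):
    """Fixed-point computation of FIRST for every nonterminal."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first
```

With `FIRST = first_sets(GRAMMAR)`, the call `first_of_sequence(["U", "$"], FIRST, GRAMMAR)` returns the set containing `b` and `$`, with no epsilon, because the end marker can never vanish.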

fixing a grammar to LR(0)

Question:
Given the following grammar, fix it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking these kinds of questions on tests without giving us a methodology for approaching them (other than repeated trial and error). Is there any method that can be applied to answer these kinds of questions? Can anyone show how such a method can be applied to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset including T -> c . and T -> c . T, both shift and reduce are possible: shift c and reduce before b. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c+$, since m > 0). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straightforward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
(Many thanks to @templatetypedef for checking the original answer and identifying a flaw, and also to @Dennis, who observed that c* was omitted.)
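The LR(0) claims above can be checked with a small item-set construction. This is a sketch under stated assumptions rather than a full parser generator: grammars are written with space-separated symbols, $ is treated as an ordinary terminal, the first rule's head is taken as the start symbol, and a grammar is reported as LR(0) only if every completed item sits alone in its state. The function and constant names are hypothetical.

```python
def parse_grammar(text):
    """Read lines like "T -> b | c T" into a list of (head, body) productions."""
    productions = []
    for line in text.strip().splitlines():
        head, rhs = line.split("->")
        for alternative in rhs.split("|"):
            productions.append((head.strip(), tuple(alternative.split())))
    return productions


def closure(items, productions):
    """LR(0) closure over items of the form (head, body, dot)."""
    items = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body):
            for h, b in productions:
                if h == body[dot] and (h, b, 0) not in items:
                    items.add((h, b, 0))
                    work.append((h, b, 0))
    return frozenset(items)


def is_lr0(text):
    """True if no LR(0) state mixes a completed item with any other item."""
    productions = parse_grammar(text)
    start_symbol = productions[0][0]
    start = closure({(h, b, 0) for h, b in productions if h == start_symbol},
                    productions)
    states, work = {start}, [start]
    while work:
        state = work.pop()
        if len(state) > 1 and any(dot == len(body) for _, body, dot in state):
            return False              # shift/reduce or reduce/reduce conflict
        for symbol in {body[dot] for _, body, dot in state if dot < len(body)}:
            moved = {(h, b, d + 1) for h, b, d in state
                     if d < len(b) and b[d] == symbol}
            nxt = closure(moved, productions)
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return True


ORIGINAL = """
S -> S' $
S' -> a S' b | T
T -> c T | c
"""

UNION = """
S -> S1' $ | S2' $
S1' -> U | a S1' b
U -> a c T
T -> b | c T
S2' -> c | S2' c
"""

FACTORED = """
S -> S' $
S' -> S1' | S2'
S1' -> U | a S1' b
U -> a c T
T -> b | c T
S2' -> c | S2' c
"""
```

Under these assumptions `is_lr0(ORIGINAL)` and `is_lr0(FACTORED)` are false while `is_lr0(UNION)` is true. In the factored grammar the conflict is in the state reached on S2', which holds both S' -> S2' . and S2' -> S2' . c.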

Generating the LL(1) parsing table for the given CFG

The CFG is as follows:
S -> SD|SB
B -> b|c
D -> a|dB
The method I tried is as follows:
I removed the non-determinism from the first production (S->SD|SB) by left-factoring.
So, the CFG after applying left-factoring is as follows:
S -> SS'
S'-> D|B
B -> b|c
D -> a|dB
I need to find FIRST(S) for the production S -> SS' in order to proceed further. Could someone please help or advise?
You cannot build an LL(1) parser from this grammar as it stands: the grammar is left-recursive, so you have to remove the left recursion first. Here a simple trick works. Assuming S also has an epsilon production (as written, S derives no terminal string at all), the only rules for S are S -> SS' and S -> (epsilon), so S derives any sequence of S's and you can simply reverse the order, giving S -> S'S. The grammar then becomes:
S -> S'S | (epsilon)
S'-> D|B
B -> b|c
D -> a|dB
Now we can construct the FIRST sets: first(B)={b,c}, first(D)={a,d}, first(S')={a,b,c,d} and first(S)={a,b,c,d,(epsilon)}.
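From these FIRST sets the LL(1) table follows directly. The sketch below assumes, as the answer does, that S also has the production S -> (epsilon), which gives FOLLOW(S) = {$} and puts the empty production in the column for $. The table is written out by hand for illustration, and the function name `parse` is a hypothetical choice.

```python
# LL(1) table for: S -> S'S | eps, S' -> D | B, B -> b | c, D -> a | dB.
# Non-empty bodies are entered under their FIRST sets; S -> eps under FOLLOW(S) = {$}.
TABLE = {
    ("S", "$"): [],
    ("B", "b"): ["b"],
    ("B", "c"): ["c"],
    ("D", "a"): ["a"],
    ("D", "d"): ["d", "B"],
}
for terminal in "abcd":
    TABLE[("S", terminal)] = ["S'", "S"]
for terminal in "ad":
    TABLE[("S'", terminal)] = ["D"]
for terminal in "bc":
    TABLE[("S'", terminal)] = ["B"]

NONTERMINALS = {"S", "S'", "B", "D"}


def parse(text):
    """Table-driven predictive parse; returns True if `text` is derivable from S."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:          # empty table cell: syntax error
                return False
            stack.extend(reversed(body))
        elif top == look:             # terminal (or end marker) matches the input
            pos += 1
        else:
            return False
    return pos == len(tokens)
```

Each table cell holds at most one production, which is what makes the rewritten grammar LL(1). For example, `parse("adb")` succeeds (a, then d followed by B -> b), while `parse("d")` fails because d must be followed by b or c.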
