Generating the LL(1) parsing table for the given CFG

The CFG is as follows:
S -> SD|SB
B -> b|c
D -> a|dB
The method I tried is as follows:
I removed the non-determinism from the first production (S->SD|SB) by left factoring.
So, the CFG after left factoring is as follows:
S -> SS'
S'-> D|B
B -> b|c
D -> a|dB
I need to find FIRST(S) for the production S -> SS' in order to proceed. Could someone please help or advise?

You cannot turn this grammar into an LL(1) parser that way: the grammar is left-recursive, so you first have to remove the left recursion. Here you can use a simple trick: since the only rules for S are S -> SS' and S -> ε (epsilon), S just derives a sequence of S' symbols, so you can reverse the order and use the rule S -> S'S instead. So now the grammar is:
S -> S'S
S'-> D|B
B -> b|c
D -> a|dB
Now we can compute the FIRST sets: FIRST(B) = {b, c}, FIRST(D) = {a, d}, FIRST(S') = {a, b, c, d} and FIRST(S) = {a, b, c, d}.
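To double-check these FIRST sets and see how they fill in the LL(1) table, here is a minimal Python sketch. It assumes the rewritten grammar exactly as listed (no ε-alternative for S), writes each alternative as a tuple of symbols, and treats any symbol without its own rules as a terminal. The names `first_sets` and `ll1_table` are illustrative, not from any library.

```python
EPS = "ε"  # marks "can derive the empty string" inside FIRST sets

# Rewritten grammar from the answer; a symbol without rules is a terminal.
GRAMMAR = {
    "S": [("S'", "S")],
    "S'": [("D",), ("B",)],
    "B": [("b",), ("c",)],
    "D": [("a",), ("d", "B")],
}


def first_of_seq(seq, first, grammar):
    """FIRST of a sequence of symbols; includes EPS only if every symbol is nullable."""
    out = set()
    for sym in seq:
        if sym not in grammar:          # terminal: it starts the sequence
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:       # this nonterminal cannot vanish
            return out
    out.add(EPS)
    return out


def first_sets(grammar):
    """Repeat FIRST(A) |= FIRST(alpha) for every A -> alpha until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(alt, first, grammar)
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def ll1_table(grammar, first):
    """M[A, t] = alpha for each t in FIRST(alpha); raises if two alternatives clash."""
    table = {}
    for lhs, alts in grammar.items():
        for alt in alts:
            starts = first_of_seq(alt, first, grammar)
            if EPS in starts:
                raise NotImplementedError("ε-alternatives also need FOLLOW(A)")
            for t in starts:
                if (lhs, t) in table:
                    raise ValueError(f"not LL(1): two entries for ({lhs}, {t})")
                table[(lhs, t)] = alt
    return table


first = first_sets(GRAMMAR)
table = ll1_table(GRAMMAR, first)
for nt in GRAMMAR:
    print(nt, sorted(first[nt]))
```

Since no alternative in this grammar can derive ε, every table entry comes from FIRST alone. If the S -> ε alternative mentioned in the answer is added, that alternative would be entered under the terminals of FOLLOW(S) instead, which this sketch deliberately does not handle.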

Related

SLR parsing conflicts with epsilon production

Consider the following grammar:
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA, we can see that there will be a state containing the items
Q -> .tS
Q -> . (epsilon as a blank string)
Since t is in FOLLOW(Q), there appears to be a shift/reduce conflict.
Can we conclude that the grammar isn't SLR(1)?
(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configurating set is sufficient to show that this grammar isn't SLR(1).
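The conflict hinges on t really being in FOLLOW(Q). A minimal Python sketch of the usual FIRST/FOLLOW fixpoint confirms it; it assumes ε-alternatives are written as empty tuples and `$` is the end-of-input marker, and the helper names are illustrative.

```python
EPS = "ε"  # marks "can derive the empty string" inside FIRST sets

# Grammar from the question; the empty tuple is the alternative Q -> ε.
GRAMMAR = {
    "S": [("a", "P", "b", "S", "Q"), ("a",)],
    "Q": [("t", "S"), ()],
    "P": [("r",)],
}


def first_of_seq(seq, first, grammar):
    """FIRST of a sequence of symbols; includes EPS only if every symbol is nullable."""
    out = set()
    for sym in seq:
        if sym not in grammar:          # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_follow(grammar, start):
    """Grow FIRST and FOLLOW together until a full pass changes neither."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")                      # end-of-input marker
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                new_first = first_of_seq(alt, first, grammar)
                if not new_first <= first[lhs]:
                    first[lhs] |= new_first
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in grammar:      # FOLLOW is only kept for nonterminals
                        continue
                    rest = first_of_seq(alt[i + 1:], first, grammar)
                    new_follow = rest - {EPS}
                    if EPS in rest:             # sym can end this alternative
                        new_follow |= follow[lhs]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


first, follow = first_follow(GRAMMAR, "S")
print("FOLLOW(Q) =", sorted(follow["Q"]))
# SLR(1) reduces Q -> . on FOLLOW(Q), while Q -> .tS shifts on t.
print("shift/reduce on:", follow["Q"] & {"t"})
```

The reasoning the sketch mechanizes: S -> aPbSQ puts FIRST(Q) = {t} into FOLLOW(S), and because Q ends that same rule, FOLLOW(S) flows into FOLLOW(Q), so t ∈ FOLLOW(Q).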

Understand whether a grammar is LR(1) with no parsing table

I've found an exercise that requires a trick to determine whether a grammar is LR(1) without building the parsing table.
The grammar is the following:
S -> Aa | Bb
A -> aAb | ab
B -> aBbb | abb
Do you know what the trick is?
Thanks :)
Imagine that you're an LR(1) parser and that you've just read aab with a lookahead of b. (I know, you're probably thinking "man, that happens to me all the time!") What exactly should you do here?
Looking at the grammar, you can't tell whether the initial production was Aa or Bb, so you're going to have to consider production rules for A and for B simultaneously. If you look at the A options, you'll see that one option here would be to reduce A → ab, which is plausible because the lookahead is a b, and that's precisely what you'd expect to find after seeing an ab when expanding an A (notice that there's the rule A → aAb, so any recursively expanded A would be followed by a b). So that tells you to reduce. On the other hand, look at the B options. If you see aab followed by a b, you'd be thinking "oh, that second b is going to make aabb, and then I'd go and reduce B → abb, because that's totally a thing I like to do because I'm an LR(1) parser." So that tells you to shift. At that point, bam! You've got a shift/reduce conflict, so you're almost certainly not going to have an LR(1) grammar.
So does that actually happen? Well, let's build the LR(1) configurating sets that we'd see if we did indeed read aab and see b as a lookahead:
Initial State
S' -> .S [$]
S -> .Aa [$]
S -> .Bb [$]
A -> .aAb [a]
A -> .ab [a]
B -> .aBbb [b]
B -> .abb [b]
State after reading a
A -> a.Ab [a]
A -> a.b [a]
A -> .aAb [b]
A -> .ab [b]
B -> a.Bbb [b]
B -> a.bb [b]
B -> .aBbb [b]
B -> .abb [b]
State after reading aa
A -> a.Ab [b]
A -> a.b [b]
A -> .aAb [b]
A -> .ab [b]
B -> a.Bbb [b]
B -> a.bb [b]
B -> .aBbb [b]
B -> .abb [b]
State after reading aab
A -> ab. [b]
B -> ab.b [b]
And hey! There's that shift/reduce conflict we were talking about. The first item reduces on b, but the second shifts on b. So there you go! Our intuition told us that this isn't going to be an LR(1) grammar, and the configurating sets confirm it.
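The configurating sets above can also be reproduced mechanically. The following Python sketch builds the LR(1) closure of the initial item, follows the transitions on a, a, b, and reports the terminals on which the resulting state can both shift and reduce. It assumes a grammar without ε-productions (true here), so FIRST of a sequence is FIRST of its first symbol; all helper names are illustrative.

```python
# An LR(1) item is (lhs, rhs, dot, lookahead); rhs is a tuple of symbols.
GRAMMAR = {
    "S'": [("S",)],                              # augmented start rule
    "S": [("A", "a"), ("B", "b")],
    "A": [("a", "A", "b"), ("a", "b")],
    "B": [("a", "B", "b", "b"), ("a", "b", "b")],
}


def first_sets(grammar):
    """FIRST sets, assuming no ε-productions, so FIRST(X ...) = FIRST(X)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def closure(items, grammar, first):
    """Add [B -> .gamma, b] for each [A -> alpha . B beta, la] and b in FIRST(beta la)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            beta = rhs[dot + 1:]
            if not beta:
                lookaheads = {la}
            elif beta[0] in grammar:
                lookaheads = first[beta[0]]
            else:
                lookaheads = {beta[0]}
            for alt in grammar[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], alt, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, symbol, grammar, first):
    """Move the dot over `symbol` wherever possible, then close the result."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state if d < len(r) and r[d] == symbol}
    return closure(moved, grammar, first)


def shift_reduce_conflicts(state, grammar):
    """Terminals on which `state` can both reduce (complete item) and shift."""
    reduce_on = {la for (_, r, d, la) in state if d == len(r)}
    shift_on = {r[d] for (_, r, d, _) in state if d < len(r) and r[d] not in grammar}
    return reduce_on & shift_on


first = first_sets(GRAMMAR)
state = closure({("S'", ("S",), 0, "$")}, GRAMMAR, first)
for symbol in ["a", "a", "b"]:                  # read aab
    state = goto(state, symbol, GRAMMAR, first)
for item in sorted(state):
    print(item)
print("conflict on:", shift_reduce_conflicts(state, GRAMMAR))
```

The final state holds exactly the two items A -> ab. [b] and B -> ab.b [b], and the reported conflict is on b, matching the hand construction.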
So how would you know to try that? Well, in general, it's pretty hard. The main cue, for me at least, is that the parser has to guess whether it wants A or B at some point, but the way it breaks the tie is the number of bs. At some point the parser has to decide whether it likes ab and goes with A or likes abb and goes with B, but it can't see both of the bs before making the decision. That led me to look for a conflict where we've seen enough to know that some recursion was happening (so that the trailing b would cause problems), at a place where the recursion differs between the two production rules.

fixing a grammar to LR(0)

Question:
Given the following grammar, fix it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking this kind of question on tests without giving us a methodology for approaching it (other than repeated trial and error). Is there any method that can be applied to questions like this? Can anyone show how it applies to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, the item set containing T -> c . and T -> c . T allows both actions: shift c, or reduce before b. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after shifting c, no reduction is possible.
But that forces every sentence to have at least one b. It omits the sentences for which n = 0 (that is, the regular language c+$, since m > 0). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straightforward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
(Many thanks to @templatetypedef for checking the original answer and identifying a flaw, and also to @Dennis, who observed that c* was omitted.)
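Checking candidate grammars by hand is tedious, so here is a minimal Python sketch of an LR(0) conflict check: it builds the LR(0) item sets and reports every state that contains a complete item together with a terminal shift or a second complete item. The `parse_grammar` helper and its input format (symbols separated by spaces, no trailing periods, each left-hand side on a single line) are assumptions made for this sketch.

```python
from collections import deque


def parse_grammar(text):
    """Hypothetical helper: "T -> c T | c" becomes {"T": [("c", "T"), ("c",)]}."""
    grammar = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        grammar[lhs.strip()] = [tuple(alt.split()) for alt in rhs.split("|")]
    return grammar


def closure(items, grammar):
    """Add B -> .gamma for every nonterminal B that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        _, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def lr0_conflict_states(grammar, start="S"):
    """Build all LR(0) item sets; return those with shift/reduce or reduce/reduce clashes."""
    first_state = closure({(start, alt, 0) for alt in grammar[start]}, grammar)
    seen, queue, bad = {first_state}, deque([first_state]), []
    while queue:
        state = queue.popleft()
        complete = [it for it in state if it[2] == len(it[1])]
        shifts = [it for it in state if it[2] < len(it[1]) and it[1][it[2]] not in grammar]
        if complete and (shifts or len(complete) > 1):
            bad.append(state)
        for sym in {r[d] for (_, r, d) in state if d < len(r)}:
            nxt = closure({(l, r, d + 1) for (l, r, d) in state
                           if d < len(r) and r[d] == sym}, grammar)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return bad


original = parse_grammar("""
S -> S' $
S' -> a S' b | T
T -> c T | c
""")
fixed_n_positive = parse_grammar("""
S -> S' $
S' -> U | a S' b
U -> a c T
T -> b | c T
""")
union = parse_grammar("""
S -> S1' $ | S2' $
S1' -> U | a S1' b
U -> a c T
T -> b | c T
S2' -> c | S2' c
""")
right_factored = parse_grammar("""
S -> S' $
S' -> S1' | S2'
S1' -> U | a S1' b
U -> a c T
T -> b | c T
S2' -> c | S2' c
""")
for name, g in [("original", original), ("n > 0", fixed_n_positive),
                ("union", union), ("right-factored", right_factored)]:
    print(name, "LR(0)" if not lr0_conflict_states(g) else "has LR(0) conflicts")
```

The conflict reported for the original grammar is the state with T -> c . and T -> c . T described above, and the right-factored version fails at the state containing S' -> S2' . and S2' -> S2' . c.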

Correct Unrestricted Grammar for:

I can't seem to figure out the unrestricted grammar for
L = { w a^m b^n | w ∈ {a,b}*, m = number of a's in w, n = number of b's in w }.
I've constructed the following grammar for it, but JFLAP rejects every string I enter, even though I have no trouble building a parse tree for it by hand. Can anyone look at it for me and see what's wrong?
S -> AST | BSU | epsilon
UT -> TU
T -> A
U -> B
A -> a
B -> b
I downloaded JFLAP and tried it on your grammar. I think the issue is that you have not used the notation JFLAP expects for grammar entry. It does not use the | symbol; instead, you have to supply several separate rules. Therefore, in JFLAP notation (and still a valid grammar), you would have:
S -> AST
S -> BSU
S -> ε
UT -> TU
T -> A
U -> B
A -> a
B -> b
You would also need to set the empty-string symbol to ε in the JFLAP preferences. If you can create a parse tree manually, you can also do this in JFLAP to show the derivations.
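When JFLAP and a hand derivation disagree, a tool-independent check can help. This Python sketch (hypothetical, not part of JFLAP) runs a breadth-first search over sentential forms using the one-rule-per-line version of the grammar above, writing each symbol as one character and ε as the empty string.

```python
from collections import deque

# One rule per line, as JFLAP expects; each symbol is one character and "" is ε.
RULES = [
    ("S", "AST"), ("S", "BSU"), ("S", ""),
    ("UT", "TU"),
    ("T", "A"), ("U", "B"),
    ("A", "a"), ("B", "b"),
]


def derives(target, rules=RULES, start="S"):
    """Breadth-first search over sentential forms, rewriting one occurrence at a time.

    Forms longer than len(target) + 1 are pruned: only S can be erased, and a
    form never contains more than one S, so no derivation of target needs them.
    """
    limit = len(target) + 1
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                      # try every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False


for s in ["", "aa", "bb", "abab", "baab", "ab"]:
    print(repr(s), derives(s))
```

For example, `derives("abab")` and `derives("baab")` return True, while `derives("ab")` returns False. The same search also returns True for "abba", which is not in L (for w = ab the suffix must be ab), because T -> A and U -> B may fire before UT -> TU has moved every T ahead of every U; so it is worth checking the grammar itself as well as the notation.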

Dealing with infinite loops when constructing states for LR(1) parsing

I'm currently constructing LR(1) states from the following grammar:
S->AS
S->c
A->aA
A->b
where A and S are nonterminals and a, b, c are terminals.
This is the construction of I0:
I0: S' -> .S, epsilon
---------------
S -> .AS, epsilon
S -> .c, epsilon
---------------
S -> .AS, a
S -> .c, c
A -> .aA, a
A -> .b, b
And I1.
From S, I1: S' -> S., epsilon //DONE
And so on. But when I get to constructing I4...
From a, I4: A -> a.A, a
-----------
A -> .aA, a
A -> .b, b
The problem is
A -> .aA
When I attempt to construct the next state from a, I once again get exactly the same content as I4, and this continues forever. A similar loop occurs with
S -> .AS
So, what am I doing wrong? There has to be some detail that I'm missing, but I've browsed my notes and my book and either can't find or just don't understand what's wrong here. Any help?
I'm pretty sure I figured out the answer. Obviously, states can point to each other, so there is no need to create a new state if its content already exists. I'd still like it if someone could confirm this, though.
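The self-answer is right: a transition may lead back to an existing state, including the current one. A minimal Python sketch of the canonical LR(1) construction makes this explicit by numbering each distinct item set once and reusing that number whenever goto produces a set it has already seen. It assumes `$` as the end marker (the question writes epsilon) and a grammar without ε-productions, as here; the helper names are illustrative.

```python
from collections import deque

# Augmented grammar from the question; "$" is used for the end marker.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("A", "S"), ("c",)],
    "A": [("a", "A"), ("b",)],
}


def first_sets(grammar):
    """FIRST sets, assuming no ε-productions, so FIRST(X ...) = FIRST(X)."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[lhs]:
                    first[lhs] |= new
                    changed = True
    return first


def closure(items, grammar, first):
    """Add [B -> .gamma, b] for each [A -> alpha . B beta, la] and b in FIRST(beta la)."""
    result, work = set(items), list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            beta = rhs[dot + 1:]
            if not beta:
                lookaheads = {la}
            elif beta[0] in grammar:
                lookaheads = first[beta[0]]
            else:
                lookaheads = {beta[0]}
            for alt in grammar[rhs[dot]]:
                for b in lookaheads:
                    item = (rhs[dot], alt, 0, b)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


def goto(state, symbol, grammar, first):
    """Move the dot over `symbol` wherever possible, then close the result."""
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state if d < len(r) and r[d] == symbol}
    return closure(moved, grammar, first)


def canonical_collection(grammar, start_item):
    """Number each distinct item set once; goto results that already exist are reused."""
    first = first_sets(grammar)
    start = closure({start_item}, grammar, first)
    index = {start: 0}                  # item set -> state number
    transitions = {}                    # (state number, symbol) -> state number
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for sym in sorted({r[d] for (_, r, d, _) in state if d < len(r)}):
            target = goto(state, sym, grammar, first)
            if target not in index:     # only a genuinely new item set gets a number
                index[target] = len(index)
                queue.append(target)
            transitions[(index[state], sym)] = index[target]   # may point back
    return index, transitions


index, transitions = canonical_collection(GRAMMAR, ("S'", ("S",), 0, "$"))
print("number of states:", len(index))
on_a = transitions[(0, "a")]
print("state", on_a, "on a ->", transitions[(on_a, "a")])
```

With this bookkeeping the construction stops after 8 states, and the state reached on a from the initial state has an a-transition to itself (the numbers this sketch assigns need not match the hand numbering in the question). Note that the sketch gives the A items in the initial state the lookaheads {a, b, c}, because they come from S -> .AS, $ and FIRST(S) = {a, b, c}.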

Resources