I am trying to write an LL(1) parser generator and I am running into an issue with grammars which I know to be LL(1) but which I cannot factor properly.
For example consider the grammar:
S -> As Ao
As -> a As
As -> Ɛ
Ao -> a
Ao -> Ɛ
Now this grammar has a first-follow conflict in As, so I perform Ɛ-elimination and get:
S -> As Ao
S -> Ao
As -> a As
As -> a
Ao -> a
Ao -> Ɛ
This has first-first conflicts in S and As. Resolving the conflict in As produces:
S -> As Ao
S -> Ao
As -> a As'
As' -> As
As' -> Ɛ
Ao -> a
Ao -> Ɛ
This has a first-follow conflict in As' which, when eliminated, simply cycles. Further, the conflict in S cannot be solved via left factorization.
I believe the issue is that As Ao == As. If I knew how to prove this, I believe the issue would go away, as the initial grammar could be transformed to:
S -> As
As -> a As
As -> Ɛ
Are there standard techniques for resolving such conflicts?
edit:
I realize the grammar above is ambiguous. The grammar I am really interested in parsing is:
S -> a As Ao
As -> , a As
As -> Ɛ
Ao -> ,
Ao -> Ɛ
I.e. a comma-separated list of a's with an optional trailing comma.
The original grammar is ambiguous, so no deterministic parser can be produced.
Of course, you can eliminate the ambiguities easily enough, since the language is just "zero or more a's":
S ⇒ As
As ⇒ a As
As ⇒ Ɛ
But presumably the question is a simplification of some more complicated grammar in which Ao is not the same as a, but in which FIRST(As) and FIRST(Ao) have some common element.
In general, it is difficult to write LL(1) grammars for such languages, and it is indeed possible that such a grammar does not exist for the language. In order to answer the question in more detail, it would be necessary to understand what is meant by the claim that the grammar is known to be LL(1).
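The first-follow conflict reported in the question can be reproduced mechanically. Below is a minimal Python sketch that computes FIRST and FOLLOW sets and lists the (nonterminal, lookahead) pairs where two productions compete. The helper names (`first_of_seq`, `first_follow`, `ll1_conflicts`) are made up for this illustration. `LIST_GRAMMAR` is the comma-list grammar from the question's edit; `REFACTORED` is one possible rewrite of the same language that I am assuming here, not something taken from the question or the answer.

```python
EPS = "ε"

def first_of_seq(seq, first):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        sym_first = first.get(sym, {sym})  # a terminal's FIRST set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def first_follow(grammar, start):
    """Fixed-point computation of FIRST and FOLLOW for every nonterminal."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue
                    tail = first_of_seq(prod[i + 1:], first)
                    add = tail - {EPS}
                    if EPS in tail:
                        add |= follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow

def ll1_conflicts(grammar, start):
    """Return (nonterminal, token) pairs claimed by more than one production."""
    first, follow = first_follow(grammar, start)
    conflicts = []
    for nt, prods in grammar.items():
        seen = {}  # lookahead token -> production that already claims it
        for prod in prods:
            look = first_of_seq(prod, first)
            if EPS in look:
                look = (look - {EPS}) | follow[nt]
            for tok in sorted(look):
                if tok in seen:
                    conflicts.append((nt, tok))
                seen.setdefault(tok, prod)
    return conflicts

# Grammar from the question's edit; () is the Ɛ-production.
LIST_GRAMMAR = {
    "S": [("a", "As", "Ao")],
    "As": [(",", "a", "As"), ()],
    "Ao": [(",",), ()],
}

# Hypothetical rewrite: after a comma, decide whether another item follows.
REFACTORED = {
    "S": [("a", "R")],
    "R": [(",", "T"), ()],
    "T": [("a", "R"), ()],
}

print(ll1_conflicts(LIST_GRAMMAR, "S"))  # conflict in As on ","
print(ll1_conflicts(REFACTORED, "S"))    # no conflicts
```

The rewrite works by pushing the decision one token later: a comma is always consumed first, and only then does the parser look at whether an `a` follows. Whether a similar trick exists for a more complicated grammar depends on that grammar.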
Related
Suppose you have a grammar G and we find an LR(1) automaton for it. We can transform it into an LALR(1) or SLR(1) parser by merging states and transforming rules, but conflicts may appear.
My question is the following: must all problems appear in merged states? Is it possible for a non-conflict LR(1) state that wasn't merged to have a conflict either in LALR(1) or in SLR(1) automaton?
Interesting question! The answer is that
if a grammar is LR(1), any conflicts in the LALR(1) parser must occur in merged states, but
if a grammar is LR(1), conflicts in the SLR(1) parser may appear in LR(1) states that were not merged.
For the first point, suppose you have a grammar that’s LR(1), so you can form its LR(1) parser. We can convert that to an LALR(1) parser by merging together all states with the same productions, ignoring lookaheads. If you have an LR(1) state that doesn’t get merged with anything, then that LR(1) state is present verbatim in the LALR(1) parser. And since the LR(1) state has no shift/reduce or reduce/reduce conflicts, the corresponding LALR(1) parser state won’t have any conflicts.
On the SLR(1) front, you can end up with states where no LR(1) state merging would occur, yet there's a reduce/reduce conflict. The intuition behind this is that you can have a state with no reduce/reduce conflicts in the LR(1) parser because the lookaheads have enough detail to resolve the conflict, yet when switching from LR(1) to SLR(1) and expanding the lookahead sets we accidentally introduce a reduce/reduce conflict. Here's an example of a grammar where this happens:
S → aTb | aR | cT
T → d
R → d
Here are the LR(1) configurating sets:
(1)
S' -> .S [$]
S -> .aTb [$]
S -> .aR [$]
S -> .cT [$]
(2)
S' -> S. [$]
(3)
S -> a.Tb [$]
S -> a.R [$]
T -> .d [b]
R -> .d [$]
(4)
T -> d. [b]
R -> d. [$]
(5)
S -> aT.b [$]
(6)
S -> aTb. [$]
(7)
S -> aR. [$]
(8)
S -> c.T [$]
T -> .d [$]
(9)
T -> d. [$]
(10)
S -> cT. [$]
These are the same item sets that you'd have in the SLR(1) parser. Notice, also, that FOLLOW(T) = {$, b}. This means that the LR(1) state
(4)
T -> d. [b]
R -> d. [$]
is converted to the SLR(1) state
(4)
T -> d. [b, $]
R -> d. [$]
which has a reduce/reduce conflict on $.
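To double-check the FOLLOW sets used above, here is a small Python sketch. It relies on two properties of this particular grammar (no ε-productions and no left recursion), so it is not a general FOLLOW implementation; the function names are my own.

```python
GRAMMAR = {
    "S": ["aTb", "aR", "cT"],
    "T": ["d"],
    "R": ["d"],
}

def first(sym):
    """FIRST of one symbol. Valid here because the grammar has no
    ε-productions and no left recursion."""
    if sym not in GRAMMAR:
        return {sym}
    return set().union(*(first(prod[0]) for prod in GRAMMAR[sym]))

def follow_sets(start="S"):
    """Fixed-point FOLLOW computation for an ε-free grammar."""
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue
                    # Next symbol's FIRST, or FOLLOW(lhs) at the end of the rule.
                    add = first(prod[i + 1]) if i + 1 < len(prod) else follow[lhs]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return follow

follow = follow_sets()
# State (4) holds the complete items T -> d. and R -> d.
# SLR(1) reduces by each of them on FOLLOW of its left-hand side.
overlap = follow["T"] & follow["R"]
print(sorted(follow["T"]), sorted(follow["R"]), sorted(overlap))
```

A non-empty overlap means both reductions are requested on the same token in the SLR(1) table, which is exactly the reduce/reduce conflict on $ described above.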
Consider the following grammar
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA we can see that there will be a state which contains the items
Q -> .tS
Q -> . (epsilon as a blank string)
Since t is in FOLLOW(Q), there appears to be a shift-reduce conflict.
Can we conclude that the grammar isn't SLR(1)?
(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configuring set is sufficient to show that this grammar isn't SLR(1).
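The claim that t is in FOLLOW(Q) can be verified with a short Python sketch. It assumes single-character grammar symbols and uses the empty string for the ε-production; `first_of` and `analyse` are names invented for this example.

```python
GRAMMAR = {
    "S": ["aPbSQ", "a"],
    "Q": ["tS", ""],   # "" stands for the ε-production
    "P": ["r"],
}

def first_of(seq, first, nullable):
    """Return (FIRST of seq without ε, whether seq can derive ε)."""
    out = set()
    for sym in seq:
        if sym not in GRAMMAR:        # terminal
            out.add(sym)
            return out, False
        out |= first[sym]
        if sym not in nullable:
            return out, False
    return out, True

def analyse(start="S"):
    """Fixed-point computation of FIRST and FOLLOW."""
    nullable = set()
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, prods in GRAMMAR.items():
            for prod in prods:
                f, eps = first_of(prod, first, nullable)
                if eps and lhs not in nullable:
                    nullable.add(lhs)
                    changed = True
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue
                    f, eps = first_of(prod[i + 1:], first, nullable)
                    add = f | (follow[lhs] if eps else set())
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow

first, follow = analyse()
# In the state containing Q -> .tS and Q -> . the parser shifts on t,
# and SLR(1) reduces Q -> ε on every token in FOLLOW(Q).
print(sorted(follow["Q"]), "t" in follow["Q"])
```

Since t is both a shift symbol in that state and a member of FOLLOW(Q), the SLR(1) table gets two actions for t in the same state.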
Question:
Given the following grammar, convert it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking this kind of question on tests without giving us a methodology for approaching it (except for repeated trying). Is there any method that can be applied to answer these kinds of questions? Can anyone show how such a method can be applied to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset including T -> c . and T -> c . T, both shift and reduce are possible: shift c and reduce before b. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c*$). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straight-forward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
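A rewritten grammar like the union grammar above is easy to get subtly wrong, so a brute-force comparison against the intended language is a cheap sanity check. The Python sketch below enumerates every sentence of the union grammar up to a length bound and compares the result with a^n c^m b^n $ (n ≥ 0, m > 0). It only checks the generated language for short strings; it says nothing about whether the grammar is LR(0). The function names are my own.

```python
from collections import deque

# The union grammar from above (S1' and S2' written as S1 and S2).
GRAMMAR = {
    "S": [("S1", "$"), ("S2", "$")],
    "S1": [("U",), ("a", "S1", "b")],
    "U": [("a", "c", "T")],
    "T": [("b",), ("c", "T")],
    "S2": [("c",), ("S2", "c")],
}

def generated(max_len):
    """All sentences of length <= max_len, found by leftmost derivations."""
    seen, out = set(), set()
    queue = deque([("S",)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if idx is None:               # only terminals left: a sentence
            out.add("".join(form))
            continue
        for rhs in GRAMMAR[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # No ε-productions, so sentential forms never shrink:
            # anything longer than max_len can be pruned.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return out

def expected(max_len):
    """a^n c^m b^n $ with n >= 0, m > 0, up to the same length bound."""
    return {
        "a" * n + "c" * m + "b" * n + "$"
        for n in range(max_len)
        for m in range(1, max_len)
        if 2 * n + m + 1 <= max_len
    }

print(generated(9) == expected(9))
```

The same check can be pointed at the right-factored variant by changing the `S` entry; both describe the same language, which is why the difference between them only shows up in the parser construction.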
(Many thanks to #templatetypedef for checking the original answer, and identifying a flaw, and also to #Dennis, who observed that c* was omitted.)
The CFG is as follows:
S -> SD|SB
B -> b|c
D -> a|dB
The method I tried is as follows:
I removed the non-determinism from the first production (S->SD|SB) by left factoring.
So, the CFG after applying left factoring is as follows:
S -> SS'
S'-> D|B
B -> b|c
D -> a|dB
I need to find FIRST(S) for the production S -> SS'
in order to proceed further. Could someone please help or advise?
You cannot convert this grammar into an LL(1) grammar that way: the grammar is left-recursive, so you have to remove the left recursion first. Here you can use the following trick: since the only rules for S are S -> SS' and S -> ε (without the ε-production S could not derive any string at all), S simply derives a sequence of S', so you can reverse the order and use the rule S -> S'S instead. The grammar then becomes:
S -> S'S | ε
S'-> D|B
B -> b|c
D -> a|dB
Now we can construct the FIRST sets: first(B)={b,c}, first(D)={a,d}, first(S')={a,b,c,d} and first(S)={a,b,c,d,ε}.
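To see how these FIRST sets drive an LL(1) parse, here is a minimal recursive-descent recognizer in Python. It assumes S also has the ε-production mentioned above, chosen whenever the lookahead is not in first(S'). The function name `accepts` and the use of "$" as an end-of-input marker are my own choices.

```python
def accepts(text):
    """Return True if text is derivable from S, using one token of lookahead."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else "$"

    def eat(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}")
        pos += 1

    def B():                      # B -> b | c
        if peek() in ("b", "c"):
            eat(peek())
        else:
            raise SyntaxError(f"expected b or c at position {pos}")

    def D():                      # D -> a | d B
        if peek() == "a":
            eat("a")
        else:
            eat("d")
            B()

    def S_prime():                # S' -> D | B, chosen by first(D) = {a, d}
        if peek() in ("a", "d"):
            D()
        else:
            B()

    def S():                      # S -> S' S | ε
        if peek() in ("a", "b", "c", "d"):   # first(S')
            S_prime()
            S()
        # otherwise take S -> ε

    try:
        S()
    except SyntaxError:
        return False
    return pos == len(text)       # all input must be consumed

print(accepts("adbc"), accepts("d"))
```

Each function picks its alternative by looking only at the next character, which is possible precisely because first(D) and first(B) are disjoint.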
I can't seem to figure out the Unrestricted Grammar for
L = { w a^m b^n | w ∈ {a,b}*, m = number of a's in w, n = number of b's in w }.
I've constructed the following grammar for it, but it keeps rejecting every string I enter in JFLAP. But manually creating a parse tree for it gives me no problem. Can anyone look at it for me and see what's wrong?
S -> AST | BSU | epsilon
UT -> TU
T -> A
U -> B
A -> a
B -> b
I've downloaded JFLAP and tried it on your grammar. I think the issue is that you have not used the notation that JFLAP expects for grammar entry. It does not use the | symbol; you have to supply several rules instead. Therefore in JFLAP notation (which is still a valid grammar) you would have:
S -> AST
S -> BSU
S -> ε
UT -> TU
T -> A
U -> B
A -> a
B -> b
You would also need to set the empty string as ε in the JFLAP preferences. If you can manually create a parse tree you can also do this in JFLAP to show the derivations.
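If you want to check derivations outside JFLAP, a breadth-first search over sentential forms is enough for short strings. The Python sketch below uses the rules exactly as listed above (one rule per alternative, with the empty string for ε). It is only an illustration: `derives` is a made-up name, and the length bound relies on the fact that a sentential form of this grammar contains at most one S and that no other rule changes the length.

```python
from collections import deque

# One (left-hand side, right-hand side) pair per rule; "" is ε.
RULES = [
    ("S", "AST"), ("S", "BSU"), ("S", ""),
    ("UT", "TU"),
    ("T", "A"), ("U", "B"),
    ("A", "a"), ("B", "b"),
]

def derives(target, start="S"):
    """Breadth-first search for a derivation of target from start."""
    # Only S -> ε shortens a form, and there is at most one S,
    # so useful forms are at most one symbol longer than the target.
    limit = len(target) + 1
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        for lhs, rhs in RULES:
            i = form.find(lhs)
            while i != -1:            # try every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= limit and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

print(derives("abab"), derives("ab"))
```

Here "abab" corresponds to w = ab followed by one a and one b, while "ab" cannot be split into w a^m b^n for any w.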