Given this grammar:
S → S1 S2
S1 → a | ε
S2 → ab | ε
Therefore, we have
FIRST(S1) = { a, ε }
FOLLOW(S1) = { a }
Does that mean that in the parsing table I'll have multiple definitions in the row for S1 and the column for a?
Yes, that's correct. (However, note that your FOLLOW set is wrong; it also contains the end-of-input marker $). The issue here is that if the parser sees an a, it can't tell if that's because it wants to use the derivation
S → S1S2 → a S2
Or the derivation
S → S1S2 → S2 → ab
To fix this, note that your grammar generates only the strings { ε, a, ab, aab }. You can therefore write an LL(1) grammar that produces those four strings directly:
S → ε | aY
Y → ε | b | ab
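The conflict in the original grammar's table can be reproduced mechanically. The following is a minimal Python sketch, assuming a dictionary encoding of the grammar from the question; the helper names first_of_seq, first_follow and ll1_table are illustrative and not taken from any library.

```python
EPS, END = "ε", "$"

# Grammar from the question; each alternative is a tuple of symbols, () is ε.
GRAMMAR = {
    "S": [("S1", "S2")],
    "S1": [("a",), ()],
    "S2": [("a", "b"), ()],
}

def first_of_seq(seq, first, grammar):
    """FIRST of a symbol sequence; contains EPS if the whole sequence is nullable."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in grammar else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def first_follow(grammar, start):
    """Compute FIRST and FOLLOW for every nonterminal by fixed-point iteration."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                new = first_of_seq(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # terminals have no FOLLOW set
                    tail = first_of_seq(rhs[i + 1:], first, grammar)
                    add = tail - {EPS}
                    if EPS in tail:
                        add |= follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    return first, follow

def ll1_table(grammar, first, follow):
    """Map (nonterminal, lookahead) to the list of alternatives predicted there."""
    table = {}
    for nt, alts in grammar.items():
        for rhs in alts:
            f = first_of_seq(rhs, first, grammar)
            lookaheads = (f - {EPS}) | (follow[nt] if EPS in f else set())
            for t in lookaheads:
                table.setdefault((nt, t), []).append(rhs)
    return table

first, follow = first_follow(GRAMMAR, "S")
table = ll1_table(GRAMMAR, first, follow)
conflicts = {cell: alts for cell, alts in table.items() if len(alts) > 1}
print("FOLLOW(S1) =", sorted(follow["S1"]))
print("conflicting cells:", conflicts)
```

Under these assumptions the only cell with more than one entry is (S1, a), which holds both S1 → a and S1 → ε.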
Hope this helps!
I have a grammar with one production rule:
S → aSbS | bSaS | ε
This is ambiguous. Is it allowed to remove the ambiguity this way?
S → A | B | ε
A → aS
B → bS
This makes it unambiguous.
Another grammar:
S → A | B
A → aAb | ab
B → abB | ε
Correction to make it unambiguous
S → A | B
A → aA'b
A' → ab
B → abB | ε
I am not using any rules to make the grammars unambiguous. If it is wrong to remove ambiguity in a grammar this way, can anyone point to a proper set of rules for removing ambiguity in ambiguous grammars?
As @kaby76 points out in a comment, in your first example you haven't just removed the ambiguity. You have also changed the language recognised by the grammar.
S → a S b S
| b S a S
| ε
recognises only strings with the same number of a's and b's, while
S → A
| B
| ε
A → a S
B → b S
recognises any string made up of a's and b's.
So that's certainly not a legitimate disambiguation.
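Both claims can be checked by brute force on short inputs. The following is a minimal sketch rather than a general parser: count_parses is a hypothetical helper that counts parse trees by trying every split of the input, and it assumes the grammar has no left recursion (directly or through nullable prefixes), which holds for the two grammars here.

```python
from functools import lru_cache

# Each alternative is a tuple of symbols; () is the ε alternative.
ORIGINAL = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
REWRITTEN = {
    "S": [("A",), ("B",), ()],
    "A": [("a", "S")],
    "B": [("b", "S")],
}

def count_parses(grammar, start, text):
    """Count the distinct parse trees of `text` derived from `start`."""
    grammar = {nt: tuple(alts) for nt, alts in grammar.items()}

    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of ways `symbol` derives text[i:j].
        if symbol not in grammar:  # terminal symbol
            return 1 if text[i:j] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives text[i:j].
        if not rhs:
            return 1 if i == j else 0
        total = 0
        for k in range(i, j + 1):
            left = sym(rhs[0], i, k)
            if left:  # only recurse on the rest when the head matched
                total += left * seq(rhs[1:], k, j)
        return total

    return sym(start, 0, len(text))

print(count_parses(ORIGINAL, "S", "abab"))   # more than 1 means ambiguous
print(count_parses(ORIGINAL, "S", "aa"))     # 0 means not in the language
print(count_parses(REWRITTEN, "S", "aa"))
```

The string abab has two parse trees in the original grammar, while aa is rejected by the original grammar and accepted by the rewritten one, so the rewrite changed the language.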
By the way, your rewritten grammar could have been simplified; A and B serve no useful purpose.
S → a S
| b S
| ε
There are unambiguous grammars for the original language (strings with the same number of a's and b's). One example:
S → a B S
| b A S
| ε
A → a
| b A A
B → b
| a B B
See this post on the Computer Science StackExchange for an explanation.
In your second example, the grammar
S → A
| B
A → a A b
| a b
B → a b B
| ε
is ambiguous, but only because A and B both match ab. Every other recognised string has exactly one possible parse.
In this grammar, A matches strings which consist of some number of as followed by the same number of bs, while B matches strings which consist of any number of repetitions of ab.
ab fits both criteria: it is a repetition of ab (consisting of just one copy) and it is a sequence of as followed by the same number of bs (in this case, one of each). The empty string also matches both criteria (with repetition count 0), but it has been excluded from A by making the starting rule A → a b. An easy way to make the grammar unambiguous would be to change that base rule to A → a a b b.
Again, your disambiguated grammar does not recognise the same language as the original grammar. Your change makes A non-recursive, so it now recognises only aabb; strings like aaabbb, aaaabbbb, and so on are no longer recognised. So once again, it is not simply an unambiguous version of the original.
Note that this grammar only matches strings with an equal number of as and bs, but it does not match all such strings. There are many other strings with an equal number of as and bs which are not matched, such as ba or aabbab. So its language is a subset of the language of the first grammar, but it is not the same language.
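The difference between the two languages in the second example can be made concrete by listing every string up to a length bound. This is a rough sketch under stated assumptions: language_up_to is a hypothetical helper, terminals are single characters, and the bound of 6 is arbitrary.

```python
# Each alternative is a tuple of symbols; () is the ε alternative.
ORIGINAL = {
    "S": [("A",), ("B",)],
    "A": [("a", "A", "b"), ("a", "b")],
    "B": [("a", "b", "B"), ()],
}
CHANGED = {
    "S": [("A",), ("B",)],
    "A": [("a", "A'", "b")],
    "A'": [("a", "b")],
    "B": [("a", "b", "B"), ()],
}

def language_up_to(grammar, start, max_len):
    """Return every string of length <= max_len derivable from `start`."""
    lang = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no nonterminal gains a new string
        changed = False
        for nt, alts in grammar.items():
            for rhs in alts:
                partial = {""}
                for sym in rhs:
                    choices = lang[sym] if sym in grammar else {sym}
                    partial = {
                        p + c
                        for p in partial
                        for c in choices
                        if len(p) + len(c) <= max_len
                    }
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]

before = language_up_to(ORIGINAL, "S", 6)
after = language_up_to(CHANGED, "S", 6)
print("lost by the change:", sorted(before - after))
print("'ba' in original language:", "ba" in before)
```

Up to length 6, the only string the changed grammar loses is aaabbb; longer bounds lose the longer strings of the same shape as well.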
Finally, you ask if there is a mechanical procedure which can create an unambiguous grammar given an ambiguous grammar. The answer is no. It can be proven that there is no algorithm which can even decide whether a particular context-free grammar is ambiguous. Nor is there an algorithm which can decide whether two context-free grammars have the same language. So it shouldn't be surprising that there is no algorithm which can construct an equivalent unambiguous grammar from an ambiguous grammar.
That doesn't mean that the task cannot be done for certain grammars. But it's not a simple mechanical procedure, and it might not work for a particular grammar.
I know that First/First and First/Follow conflicts can exist in a grammar, making it "not LL(1)". I was just wondering whether a Follow/Follow conflict can exist in a grammar as well.
Yes, this is possible, but it requires an unusual configuration to make it happen. Consider the following grammar, which has been augmented with a new start symbol:
S' → S$
S → tT
T → A | B
A → ε
B → ε
Now, let's imagine trying to fill in our LL(1) parse table, which is shown here:
        $          t
   +----------+----------+
S' |          | S' -> S$ |
   +----------+----------+
S  |          | S -> tT  |
   +----------+----------+
T  | T -> A   |          |
   | T -> B   |          |
   +----------+----------+
A  | A -> ε   |          |
   +----------+----------+
B  | B -> ε   |          |
   +----------+----------+
Notice that there are two items in the entry for (T, $). And that makes sense: if we have the active nonterminal T and see a $, we know that we need to select a production that's going to expand out to the empty string. And we have two different ways of doing this: we could use T → A or T → B, with the ultimate goal of expanding each of those nonterminals out to the empty string. This is a problem - we can't predict which route to take.
Now, what sort of conflict is this? It can't be a FIRST/FIRST conflict, because FIRST(A) = {ε} and FIRST(B) = {ε}, so neither A nor B has any terminals in its first set. It can't be a FIRST/FOLLOW conflict for the same reason.
That means that it's the rare FOLLOW/FOLLOW conflict - we know that we'd choose the production based on what's in the FOLLOW sets of A and B, and yet they're exactly identical to one another and so the parser can't choose what to do next unambiguously.
This is perhaps a simpler example:
S → A a
A → B | C
B → ε
C → ε
Here, since a is in both FOLLOW(B) and FOLLOW(C), the table entry (A, a) has a conflict between A → B and A → C. Note that there are no other conflicts.
I am having trouble transforming the following non-LL(1) grammar into an LL(1) grammar. Can it be transformed?
A ::= B | A ; B
B ::= C | [ A ]
C ::= D | C , D
D ::= x | ( C )
where ;, x, (, ), [, ] and the comma are terminals.
The main problems here are the productions
A → A ; B
and
C → C, D
which are left-recursive. In both cases, these productions generate a sequence of objects separated by some kind of delimiter (a semicolon in the first case, a comma in the second), so you can rewrite them like this:
A → B ; A
C → D, C
This gives the grammar
A → B | B; A
B → C | [A]
C → D | D, C
D → x | (C)
The problem now is that there are productions for A and C that have a common prefix. But that's nothing to worry about: we can left-factor them like this:
A → B H
H → ε | ; A
B → C | [A]
C → D I
I → ε | , C
D → x | (C)
I believe that this grammar is now LL(1).
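One way to gain confidence in the LL(1) claim is to write the predictive parser by hand: every choice below is made from a single token of lookahead. This is a minimal sketch assuming single-character tokens with no whitespace, and taking the left-factored comma rule to be I → ε | , C; the class and function names are illustrative.

```python
class ParseError(Exception):
    pass

class Parser:
    """Recursive-descent parser with one method per nonterminal."""

    def __init__(self, text):
        self.toks = list(text) + ["$"]  # "$" marks end of input
        self.pos = 0

    def peek(self):
        return self.toks[self.pos]

    def eat(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r} at {self.pos}, found {self.peek()!r}")
        self.pos += 1

    def A(self):  # A → B H
        self.B()
        self.H()

    def H(self):  # H → ε | ; A
        if self.peek() == ";":
            self.eat(";")
            self.A()

    def B(self):  # B → C | [ A ]
        if self.peek() == "[":
            self.eat("[")
            self.A()
            self.eat("]")
        else:
            self.C()

    def C(self):  # C → D I
        self.D()
        self.I()

    def I(self):  # I → ε | , C
        if self.peek() == ",":
            self.eat(",")
            self.C()

    def D(self):  # D → x | ( C )
        if self.peek() == "(":
            self.eat("(")
            self.C()
            self.eat(")")
        else:
            self.eat("x")

def accepts(text):
    """Return True if `text` is derivable from A."""
    parser = Parser(text)
    try:
        parser.A()
        parser.eat("$")
        return True
    except ParseError:
        return False

for sample in ["x", "x,x;[x;(x,x)]", "x;", "(x;x)"]:
    print(repr(sample), accepts(sample))
```

The parser never backtracks, which is what the LL(1) property buys; a trailing delimiter such as "x;" and a semicolon inside parentheses are both rejected, as in the original grammar.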
For the given context free grammar:
S -> G $
G -> PG | P
P -> id : R
R -> id R | epsilon
How do I rewrite the grammar so that it is LR(1)?
The current grammar has shift/reduce conflicts when parsing the input "id : .id", where "." is the input pointer for the parser.
This grammar produces the language satisfying the regular expression (id:(id)*)+
It's easy enough to produce an LR(1) grammar for the same language. The trick is finding one which has a similar parse tree, or at least from which the original parse tree can be recovered easily.
Here's a manually generated grammar, which is slightly simplified from the general algorithm. In effect, we rewrite the regular expression:
(id:id*)+
to:
id(:id+)*:id*
which induces the grammar:
S → id G $
G → P G | P'
P' → : R'
P → : R
R' → ε | id R'
R → id | id R
which is LALR(1).
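The regular-expression rewrite above can be sanity-checked by brute force. In this sketch each token is encoded as one character ("i" for id, ":" for the colon), which is an assumption made only for the check, and the length bound of 10 is arbitrary; agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

ORIGINAL = re.compile(r"(i:i*)+")      # (id : id*)+
REWRITTEN = re.compile(r"i(:i+)*:i*")  # id (: id+)* : id*

def disagreements(max_len):
    """Token strings up to max_len on which the two expressions differ."""
    bad = []
    for n in range(max_len + 1):
        for toks in product("i:", repeat=n):
            s = "".join(toks)
            in_original = ORIGINAL.fullmatch(s) is not None
            in_rewritten = REWRITTEN.fullmatch(s) is not None
            if in_original != in_rewritten:
                bad.append(s)
    return bad

print(disagreements(10))
```

An empty list means the two expressions accept exactly the same token strings up to the bound.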
In effect, we've just shifted all the productions one token to the right, and there is a general algorithm which can be used to create an LR(1) grammar from an LR(k+1) grammar for any k≥1. (The version of this algorithm I'm using comes from Parsing Theory by S. Sippu & E. Soisalon-Soininen, Vol II, section 6.7.)
The non-terminals of the new grammar will have the form (x, V, y) where V is a symbol from the original grammar (either a terminal or a non-terminal) and x and y are terminal sequences of maximum length k such that:
y ∈ FOLLOWₖ(V)
x ∈ FIRSTₖ(Vy)
(The lengths of y and consequently x might be less than k if the end of input is included in the follow set. Some people avoid this issue by adding k end symbols, but I think this version is just as simple.)
A non-terminal (x, V, y) will generate the x-derivative of the strings derived from Vy from the original grammar. Informally, the entire grammar is shifted k tokens to the right; each non-terminal matches a string which is missing the first k tokens but is augmented with the following k tokens.
The productions are generated mechanically from the original productions. First, we add a new start symbol, S' with productions:
S' → x (x, S, ε)
for every x ∈ FIRSTₖ(S). Then, for every production
T → V₀ V₁ … Vₘ
we generate the set of productions:
(x₀, T, xₘ₊₁) → (x₀, V₀, x₁) (x₁, V₁, x₂) … (xₘ, Vₘ, xₘ₊₁)
and for every terminal A we generate the set of productions
(Ax,A,xB) → B if |x| = k
(Ax,A,x) → ε if |x| ≤ k
Since there is an obvious homomorphism from the productions in the new grammar to the productions in the old grammar, we can directly create the original parse tree, although we need to play some tricks with the semantic values in order to correctly attach them to the parse tree.
Today I am reading how to find First and Follow of a grammar. I saw this grammar:
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
The claim is that
FIRST(S) = FIRST(ACB) ∪ FIRST(CbB) ∪ FIRST(Ba)
= {d, g, h, ε} ∪ {h, b} ∪ {g, a}
= {d, g, h, ε, b, a}
I don't understand how a and b are in this set. Can anyone explain this?
Notice that B and C both are nullable (they can produce ε). This means that from the production
S → CbB
we get that b ∈ FIRST(S), since if we use the production C → ε we get the derivation S ⇒ CbB ⇒ bB, whose result starts with b.
Similarly, note that
S → Ba
is a production, so we get a ∈ FIRST(S) because we can use the production B → ε to get an a at the front of a string derivable from S.
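The same reasoning can be run mechanically. Below is a minimal sketch of the usual fixed-point FIRST computation, assuming one character per symbol with upper-case letters as nonterminals; the names first_of and compute_first are illustrative.

```python
EPS = "ε"

# One string per alternative; "" is the ε alternative.
GRAMMAR = {
    "S": ["ACB", "CbB", "Ba"],
    "A": ["da", "BC"],
    "B": ["g", ""],
    "C": ["h", ""],
}

def first_of(seq, first):
    """FIRST of a symbol sequence, skipping over nullable symbols."""
    out = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out  # this symbol cannot vanish, so stop here
    out.add(EPS)  # every symbol was nullable (or seq was empty)
    return out

def compute_first(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

first = compute_first(GRAMMAR)
for alt in GRAMMAR["S"]:
    print(f"FIRST({alt}) =", sorted(first_of(alt, first)))
print("FIRST(S) =", sorted(first["S"]))
```

The early return in first_of is the step that matters here: scanning CbB moves past the nullable C and stops at b, and scanning Ba moves past the nullable B and stops at a.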
Hope this helps!