Can a follow-follow conflict exist in a grammar? - parsing

I know that First/First and First/Follow conflicts exist in a grammar which makes the grammar "not LL(1)". I was just wondering if Follow/Follow conflict exist in a grammar.

Yes, this is possible, but it requires an unusual configuration to make it happen. Consider the following grammar, which has been augmented with a new start symbol:
S' → S$
S → tT
T → A | B
A → ε
B → ε
Now, let's imagine trying to fill in our LL(1) parse table, which is shown here:
$ t
+----------+----------+
S' | | S' -> S$ |
+----------+----------+
S | | S -> tT |
+----------+----------+
T | T -> A | |
| T -> B | |
+----------+----------+
A | A -> e | |
+----------+----------+
B | B -> e | |
+----------+----------+
Notice that there are two items in the entry for (T, $). And that makes sense: if we have the active nonterminal T and see a $, we know that we need to select a production that's going to expand out to the empty string. And we have two different ways of doing this: we could use T → A or T → B, with the ultimate goal of expanding each of those nonterminals out to the empty string. This is a problem - we can't predict which route to take.
Now, what sort of conflict is this? It can't be a FIRST/FIRST conflict, because FIRST(A) = {ε} and FIRST(B) = {ε}, so neither A nor B has any terminals in its first set. It can't be a FIRST/FOLLOW conflict for the same reason.
That means that it's the rare FOLLOW/FOLLOW conflict - we know that we'd choose the production based on what's in the FOLLOW sets of A and B, and yet they're exactly identical to one another and so the parser can't choose what to do next unambiguously.

This is prehaps a simpler example
S → A a
A → B | C
B → ε
C → ε
Here, since a is both in the FOLLOW of B and C, on (A, a) there will be a conflict between A → B and A → C. Note that there are no other conflicts.

Related

Does an empty column in an LL(1) parsing table mean that it is wrong?

I have a grammar and I am asked to build a parsing table and to verify that it is an LL(1) grammar. I noticed after building the parsing table that some of the columns were empty and had no production rules in them. Does this mean that it is wrongfully built? Or that it is not LL(1)? Or maybe something is missing?
Thank you.
That’s nothing to worry about. An empty column indicates that there’s a terminal symbol that never starts a production (among other things). For example, take this simple LL(1) grammar:
S → abcdefg
Here, in the row for S, the columns for b, c, d, e, f, and g will all be empty, and the column for a will be the rule S → abcdefg. More specifically, the augmented grammar looks like this:
S' → S
S → abcdefg
The parsing table looks like this:
| a | b | c | d | e | f | g
---+---------+---+---+---+---+---+---
S'| S | | | | | |
S | abcdefg | | | | | |
Notice that most columns are empty.

Is it possible to transform this grammar to be LR(1)?

The following grammar generates the sentences a, a, a, b, b, b, ..., h, b. Unfortunately it is not LR(1) so cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there's fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.

can removing left recursion introduce ambiguity?

Let's assume we have the following CFG G:
A -> A b A
A -> a
Which should produce the strings
a, aba, ababa, abababa, and so on. Now I want to remove the left recursion to make it suitable for predictive parsing. The dragon book gives the following rule to remove immediate left recursions.
Given
A -> Aa | b
rewrite as
A -> b A'
A' -> a A'
| ε
If we simply apply the rule to the grammar from above, we get grammar G':
A -> a A'
A' -> b A A'
| ε
Which looks good to me, but apparently this grammar is not LL(1), because of some ambiguity. I get the following First/Follow sets:
First(A) = { "a" }
First(A') = { ε, "b" }
Follow(A) = { $, "b" }
Follow(A') = { $, "b" }
From which I construct the parsing table
| a | b | $ |
----------------------------------------------------
A | A -> a A' | | |
A' | | A' -> b A A' | A' -> ε |
| | A' -> ε | |
and there is a conflict in T[A',b], so the grammar isn't LL(1), although there are no left recursions any more and there are also no common prefixes of the productions so that it would require left factoring.
I'm not completely sure where the ambiguity comes from. I guess that during parsing the stack would fill with S'. And you can basically remove it (reduce to epsilon), if it isn't needed any more. I think this is the case if another S' is below on on the stack.
I think the LL(1) grammar G'' that I try to get from the original one would be:
A -> a A'
A' -> b A
| ε
Am I missing something? Did I do anything wrong?
Is there a more general procedure for removing left recursion that considers this edge case? If I want to automatically remove left recursions I should be able to handle this, right?
Is the second grammar G' a LL(k) grammar for some k > 1?
The original grammar was ambiguous, so it is not surprising that the new grammar is also ambiguous.
Consider the string a b a b a. We can derive this in two ways from the original grammar:
A ⇒ A b A
⇒ A b a
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
A ⇒ A b A
⇒ A b A b A
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
Unambiguous grammars are, of course possible. Here are right- and left-recursive versions:
A ⇒ a A ⇒ a
A ⇒ a b A A ⇒ A b a
(Although these represent the same language, they have different parses: the right-recursive version associates to the right, while the left-recursive one associates to the left.)
Removing left recursion cannot introduce ambiguity. This kind of transformation preserves ambiguity. If the CFG is already ambiguous, the result will be ambiguous too, and if the original is not, the resulting neither.

Confused at transferring an ambiguous grammar to an unambiguous one

An ambiguous grammar is given and I am asked to rewrite the grammar to make it unambiguous. In fact, I don't know why the given grammar is ambiguous, let alone rewriting it to an unambiguous one.
The given grammar is S -> SS | a | b , and I have four choices:
A: S -> Sa | Sb | epsilon
B: S -> SS’
S’-> a | b
C: S -> S | S’
S’-> a | b
D: S -> Sa | Sb.
For each choice, I have already know that D is incorrect because it generates no strings at all,C is incorrect because it only matches the strings 'a' and 'b'.
However, I think the answer is A while the correct answer is B.I think B is wrong because it just generates S over and over again, and B can't deal with empty strings.
Why is the given grammar ambiguous?
Why is A incorrect while B is correct?
The original grammar is ambiguous because multiple right-most (or left-most) derivations are possible for any string of at least three letters. For example:
S -> SS -> SSS -> SSa -> Saa -> aaa
S -> SS -> Sa -> SSa -> Saa -> aaa
The first one corresponds, roughly speaking, to the parse a(aa) while the second to the parse (aa)a.
None of the alternatives is correct. A incorrectly matches ε while B does not match anything (like D). If B were, for example,
S -> SS' | S'
S' -> a | b
it would be correct. (This grammar is left-associative.)

Making a Grammar LL(1)

I have the following grammar:
S → a S b S | b S a S | ε
Since I'm trying to write a small compiler for it, I'd like to make it LL(1). I see that there seems to be a FIRST/FOLLOW conflict here, and I know I have to use substitution to resolve it, but I'm not exactly sure how to go about it. Here is my proposed grammar, but I'm not sure if it's correct:
S-> aSbT | epsilon
T-> bFaF| epsilon
F-> epsilon
Can someone help out?
In his original paper on LR parsing, Knuth gives the following grammar for this language, which he conjectures "is the briefest possible unambiguous grammar for this language:"
S → ε | aAbS | bBaS
A → ε | aAbA
B → ε | bBaB
Intuitively, this tries to break up any string of As and Bs into blocks that balance out completely. Some blocks start with a and end with b, while others start with b and end with a.
We can compute FIRST and FOLLOW sets as follows:
FIRST(S) = { ε, a, b }
FIRST(A) = { ε, a }
FIRST(B) = { ε, b }
FOLLOW(S) = { $ }
FOLLOW(A) = { b }
FOLLOW(B) = { a }
Based on this, we get the following LL(1) parse table:
| a | b | $
--+-------+-------+-------
S | aAbS | bBaS | e
A | aAbA | e |
B | e | bBaB |
And so this grammar is not only LR(1), but it's LL(1) as well.
Hope this helps!

Resources