How can I eliminate left recursion in a CFG? [duplicate]

I understand that in order to eliminate immediate left recursion from a grammar containing productions of the form A ⇒ Aα | β, I need to replace them with A ⇒ βA' and A' ⇒ αA' | ε.
I have the following productions, and I need to eliminate the immediate left recursion:
E ⇒ E+T | T
T ⇒ T*F | F
F ⇒ (E) | id
I can see that after elimination the first production becomes
E⇒TE'
E' ⇒ +TE' | ε
Can somebody explain how this comes about?

It's really just a matter of following the algorithm. Let's look at the general case. According to the algorithm, consider a rule of the form
A => A a1 | ... | A aN | b1 | ... | bM
where A a1, ..., A aN are the left-recursive alternatives (each ai a nonempty sequence of terminals and nonterminals) and b1, ..., bM are sequences of terminals and nonterminals that do not start with the nonterminal A.
The algorithm says we need to replace this by
A => b1 A' | ... | bM A'
A' => a1 A' | ... | aN A' | epsilon
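The general rule above is mechanical enough to code directly. Below is a minimal Python sketch, assuming a grammar is stored as a dict from each nonterminal to a list of alternatives (each a list of symbol strings, with [] standing for epsilon) and that the fresh nonterminal is named by appending a prime. The function name and this representation are illustrative assumptions, not part of any library.

```python
from typing import Dict, List

# Assumed encoding: nonterminal -> list of alternatives; [] means epsilon.
Grammar = Dict[str, List[List[str]]]


def remove_immediate_left_recursion(grammar: Grammar, nt: str) -> Grammar:
    """Rewrite A -> A a1 | ... | A aN | b1 | ... | bM as
    A -> b1 A' | ... | bM A' and A' -> a1 A' | ... | aN A' | epsilon."""
    # the ai: what follows the leading A in each left-recursive alternative
    alphas = [alt[1:] for alt in grammar[nt] if alt and alt[0] == nt]
    # the bj: every alternative that does not start with A
    betas = [alt for alt in grammar[nt] if not alt or alt[0] != nt]
    if not alphas:
        return dict(grammar)  # no immediate left recursion: nothing to do
    new_nt = nt + "'"  # hypothetical naming scheme for the fresh nonterminal
    result = dict(grammar)
    result[nt] = [beta + [new_nt] for beta in betas]
    result[new_nt] = [alpha + [new_nt] for alpha in alphas] + [[]]
    return result


g = {"E": [["E", "+", "T"], ["T"]]}
print(remove_immediate_left_recursion(g, "E"))
# {'E': [['T', "E'"]], "E'": [['+', 'T', "E'"], []]}
```

Applied to T -> T * F | F it gives T -> F T' and T' -> * F T' | epsilon in the same way. The sketch handles immediate left recursion only, and it assumes every ai is nonempty (an alternative A -> A would turn into the useless rule A' -> A').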
Let's look at your case. Here we have
E => E + T | T
So you can think of a1 as the sequence + T, since E + T is a left-recursive alternative. Likewise, you can think of b1 as T, since it is not left-recursive. We now use this to redefine the nonterminal E as:
E => b1 E'
And since b1 is T this becomes
E => T E'
Defining the new nonterminal E', we get
E' => a1 E' | epsilon
And since a1 is + T this becomes
E' => + T E' | epsilon
Thus you end up with the grammar
E => T E'
E' => + T E' | epsilon
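One way to see why the rewrite matters: E => T E' can be parsed top-down by a recursive-descent procedure, whereas a procedure for E => E + T would call itself before consuming any input and never terminate. The recognizer below is a small Python sketch; it assumes the T rule from the question is rewritten the same way (T => F T', T' => * F T' | epsilon), that F => ( E ) | id, and that input arrives as a list of tokens. The class and function names are hypothetical.

```python
from typing import List


class Recognizer:
    """Predictive (recursive-descent) recognizer for
    E -> T E'    E' -> + T E' | epsilon
    T -> F T'    T' -> * F T' | epsilon
    F -> ( E ) | id
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def E(self) -> None:
        self.T()
        self.E_prime()

    def E_prime(self) -> None:
        if self.peek() == "+":  # E' -> + T E'
            self.expect("+")
            self.T()
            self.E_prime()
        # any other lookahead: E' -> epsilon, consume nothing

    def T(self) -> None:
        self.F()
        self.T_prime()

    def T_prime(self) -> None:
        if self.peek() == "*":  # T' -> * F T'
            self.expect("*")
            self.F()
            self.T_prime()
        # any other lookahead: T' -> epsilon

    def F(self) -> None:
        if self.peek() == "(":  # F -> ( E )
            self.expect("(")
            self.E()
            self.expect(")")
        else:  # F -> id
            self.expect("id")


def accepts(tokens: List[str]) -> bool:
    r = Recognizer(tokens)
    try:
        r.E()
        r.expect("$")  # the whole input must be consumed
        return True
    except SyntaxError:
        return False


print(accepts(["id", "+", "id", "*", "id"]))  # True
print(accepts(["id", "+"]))                    # False
```

Each nonterminal becomes one method, and the single token of lookahead decides between the non-epsilon alternative and epsilon.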

Related

Is it possible to transform this grammar to be LR(1)?

The following grammar generates the sentences "a, a", "a, b", "b, b", ..., "h, b". Unfortunately, it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you keep the nonterminal C unchanged in front of comma in some rule.
In that case, a parser that has seen an "a" with lookahead "comma" cannot decide whether to shift (as in S -> a comma a) or to reduce the "a" to C (as in S -> C comma b). So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases "having seen an 'a'" and "C unchanged". You asked if there's a fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.
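Because both grammars here generate finite languages, it is easy to confirm mechanically that removing "a" from C and adding S -> a comma b leaves the language unchanged. This Python sketch enumerates every sentence of each grammar; the helper name and the dict-of-alternatives encoding are illustrative assumptions. It checks language equality only, not that the new grammar is LR(1).

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives


def language(grammar: Grammar, start: str) -> Set[Tuple[str, ...]]:
    """All terminal strings derivable from start.
    Only terminates for non-recursive grammars (finite languages)."""
    def expand(symbol: str) -> Set[Tuple[str, ...]]:
        if symbol not in grammar:  # terminal symbol
            return {(symbol,)}
        results: Set[Tuple[str, ...]] = set()
        for alt in grammar[symbol]:
            # concatenate every combination of expansions of the symbols in alt
            for parts in product(*(expand(s) for s in alt)):
                results.add(sum(parts, ()))
        return results
    return expand(start)


letters = list("abcdefgh")
original: Grammar = {
    "S": [["a", "comma", "a"], ["C", "comma", "b"]],
    "C": [[x] for x in letters],
}
rewritten: Grammar = {
    "S": [["a", "comma", "a"], ["a", "comma", "b"], ["C", "comma", "b"]],
    "C": [[x] for x in letters if x != "a"],
}

print(language(original, "S") == language(rewritten, "S"))  # True
```

Both grammars yield the same nine sentences, while the rewritten one no longer forces a shift/reduce choice after an initial "a".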

Can removing left recursion introduce ambiguity?

Let's assume we have the following CFG G:
A -> A b A
A -> a
It should produce the strings a, aba, ababa, abababa, and so on. Now I want to remove the left recursion to make it suitable for predictive parsing. The Dragon Book gives the following rule to remove immediate left recursion.
Given
A -> Aa | b
rewrite as
A -> b A'
A' -> a A'
| ε
If we simply apply the rule to the grammar from above, we get grammar G':
A -> a A'
A' -> b A A'
| ε
This looks good to me, but apparently this grammar is not LL(1) because of some ambiguity. I get the following First/Follow sets:
First(A) = { "a" }
First(A') = { ε, "b" }
Follow(A) = { $, "b" }
Follow(A') = { $, "b" }
From these I construct the parsing table:
      | a          | b             | $
------+------------+---------------+---------
A     | A -> a A'  |               |
A'    |            | A' -> b A A'  | A' -> ε
      |            | A' -> ε       |
There is a conflict in T[A', b], so the grammar isn't LL(1), although there is no left recursion any more and there are no common prefixes among the productions that would require left factoring.
I'm not completely sure where the ambiguity comes from. I guess that during parsing the stack would fill with A's, and you can basically remove one (derive ε) if it isn't needed any more. I think this is the case if another A' is below it on the stack.
I think the LL(1) grammar G'' that I am trying to get from the original one would be:
A -> a A'
A' -> b A
| ε
Am I missing something? Did I do anything wrong?
Is there a more general procedure for removing left recursion that considers this edge case? If I want to automatically remove left recursions I should be able to handle this, right?
Is the second grammar G' an LL(k) grammar for some k > 1?
The original grammar was ambiguous, so it is not surprising that the new grammar is also ambiguous.
Consider the string a b a b a. We can derive this in two ways from the original grammar:
A ⇒ A b A
⇒ A b a
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
A ⇒ A b A
⇒ A b A b A
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
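The ambiguity can also be counted directly. This Python sketch counts the distinct parse trees that the original grammar A -> A b A | a assigns to a string, by trying every "b" as the operator of the root production and multiplying the counts for the two sides; the function name is illustrative.

```python
from functools import lru_cache


def count_parses(s: str) -> int:
    """Number of distinct parse trees of s under A -> A b A | a."""
    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # parse trees rooted at A for the substring s[i:j]
        total = 1 if s[i:j] == "a" else 0  # A -> a
        # A -> A b A: both A's must cover a nonempty span around the "b" at k
        for k in range(i + 1, j - 1):
            if s[k] == "b":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))


print(count_parses("ababa"))    # 2
print(count_parses("abababa"))  # 5
```

The two trees for "ababa" correspond to the two derivations above, and the count grows with every additional "b a".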
Unambiguous grammars are, of course, possible. Here are right- and left-recursive versions:
Right-recursive:  A ⇒ a b A | a
Left-recursive:   A ⇒ A b a | a
(Although these represent the same language, they have different parses: the right-recursive version associates to the right, while the left-recursive one associates to the left.)
Removing left recursion cannot introduce ambiguity; this kind of transformation preserves it. If the CFG is already ambiguous, the result will be ambiguous too, and if the original is not, neither is the result.
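The First/Follow sets and the table conflict from the question can be reproduced mechanically. Below is a minimal Python sketch of the textbook LL(1) construction (FIRST and FOLLOW grown to a fixed point, then one table entry per lookahead), assuming the same dict-of-alternatives encoding with [] for ε; all names are illustrative. It reports the conflict at T[A', b] for G' and no conflict for the G'' proposed in the question.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # [] is an epsilon alternative
EPS = "ε"  # marker inside FIRST sets: "can derive the empty string"
END = "$"  # end-of-input marker


def first_of_seq(seq: List[str], first: Dict[str, Set[str]], g: Grammar) -> Set[str]:
    """FIRST of a symbol sequence; includes EPS if every symbol is nullable."""
    out: Set[str] = set()
    for sym in seq:
        if sym not in g:  # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def ll1_table(g: Grammar, start: str):
    """Return FIRST, FOLLOW, the LL(1) table and the cells with >1 production."""
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow[start].add(END)
    changed = True
    while changed:  # grow FIRST and FOLLOW together until nothing changes
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                f = first_of_seq(alt, first, g)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                for i, sym in enumerate(alt):
                    if sym not in g:
                        continue
                    rest = first_of_seq(alt[i + 1:], first, g)
                    add = rest - {EPS}
                    if EPS in rest:  # sym can end the alternative
                        add |= follow[nt]
                    if not add <= follow[sym]:
                        follow[sym] |= add
                        changed = True
    table: Dict[Tuple[str, str], List[List[str]]] = {}
    for nt, alts in g.items():
        for alt in alts:
            f = first_of_seq(alt, first, g)
            lookaheads = f - {EPS}
            if EPS in f:  # nullable alternative: predict it on FOLLOW(nt)
                lookaheads |= follow[nt]
            for t in lookaheads:
                table.setdefault((nt, t), []).append(alt)
    conflicts = {cell: alts for cell, alts in table.items() if len(alts) > 1}
    return first, follow, table, conflicts


g_prime: Grammar = {"A": [["a", "A'"]], "A'": [["b", "A", "A'"], []]}
g_double_prime: Grammar = {"A": [["a", "A'"]], "A'": [["b", "A"], []]}

print(ll1_table(g_prime, "A")[3])         # {("A'", 'b'): [['b', 'A', "A'"], []]}
print(ll1_table(g_double_prime, "A")[3])  # {}
```

In G'' the A' after b A is gone, so b no longer appears in FOLLOW(A') and the ε-production is predicted only on $.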

Transform a grammar G into LL(1)

I have the following grammar, and I need to convert it to an LL(1) grammar:
G = (N, T, P, S), N = {S, A, B, C}, T = {a, b, c, d}
P = {
S -> CbSb | adB | bc
A -> BdA | b
B -> aCd | ε
C -> Cca | bA | a
}
The point is that I know how to convert it when it's just one production, but I can't find any clear method for solving this on the internet.
Thanks in advance!
Remove left recursion, direct and indirect.
Build an LL(k) parsing table (k tokens of lookahead). If it has no conflicts, the grammar (and therefore the language) is LL(k).
The obvious left recursion in the grammar is in C, which S reaches first:
S ==> C b S b ==> C c a b S b
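As a sketch of the suggested procedure, the Python below takes the grammar after one hypothetical first step, removing C's immediate left recursion by the usual rule (C -> b A C' | a C', C' -> c a C' | ε), and checks the FIRST sets of each nonterminal's alternatives for overlaps. It detects FIRST/FIRST conflicts only, not FIRST/FOLLOW ones, and the encoding and helper names are assumptions for illustration. It shows that S still conflicts on a and on b, so further rewriting (for example left factoring) would be needed before the grammar is LL(1).

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # [] is an epsilon alternative

# Hypothetical intermediate grammar: P from the question with C -> Cca | bA | a
# rewritten by the immediate-left-recursion rule as C -> bAC' | aC', C' -> caC' | ε.
grammar: Grammar = {
    "S": [["C", "b", "S", "b"], ["a", "d", "B"], ["b", "c"]],
    "A": [["B", "d", "A"], ["b"]],
    "B": [["a", "C", "d"], []],
    "C": [["b", "A", "C'"], ["a", "C'"]],
    "C'": [["c", "a", "C'"], []],
}


def first_of(seq: List[str], first: Dict[str, Set[str]], g: Grammar) -> Set[str]:
    """FIRST of a symbol sequence; "" in the result means it can derive ε."""
    out: Set[str] = set()
    for sym in seq:
        if sym not in g:  # terminal
            out.add(sym)
            return out
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")
    return out


def first_sets(g: Grammar) -> Dict[str, Set[str]]:
    """FIRST set of every nonterminal, computed to a fixed point."""
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                f = first_of(alt, first, g)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
    return first


def first_first_conflicts(g: Grammar) -> List[Tuple[str, str, int, int]]:
    """(nonterminal, terminal, earlier alternative, later alternative) per clash."""
    first = first_sets(g)
    conflicts = []
    for nt, alts in g.items():
        owner: Dict[str, int] = {}  # terminal -> first alternative predicting it
        for idx, alt in enumerate(alts):
            for t in sorted(first_of(alt, first, g) - {""}):
                if t in owner:
                    conflicts.append((nt, t, owner[t], idx))
                else:
                    owner[t] = idx
    return conflicts


print(first_first_conflicts(grammar))
# [('S', 'a', 0, 1), ('S', 'b', 0, 2)]
```

Alternative 0 of S starts with C, whose FIRST set {a, b} overlaps both a d B and b c.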

Correct LL(1) grammar for arithmetic expressions

This is a correct LL grammar:
E->TX
T->(E)Y |intY
X->+E | -E | ε
Y->*E | /E | ε
but it'll produce the same AST for the expressions
int-int+int and int-(int+int),
e.g.
Sub(Simple(int), Add(Simple(int), Simple(int)))
Of course, I can use some lookahead, but this isn't cool.
Try this grammar:
E -> T E'
E' -> + T E' | - T E' | epsilon
T -> F T'
T' -> * F T' | / F T' | epsilon
F -> (E) | int
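With the grammar above, a common way to get left associativity in a hand-written parser is to implement each primed rule (E', T') as a loop that folds results from left to right, rather than as a recursive call. The following Python sketch evaluates integer expressions this way; the tokenizer, the use of integer division for /, and the class and function names are illustrative assumptions.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # integers, operators and parentheses; whitespace is skipped
    return re.findall(r"\d+|[-+*/()]", text)


class Evaluator:
    """Recursive-descent evaluator for
    E -> T E'   E' -> + T E' | - T E' | epsilon
    T -> F T'   T' -> * F T' | / F T' | epsilon
    F -> ( E ) | int
    E' and T' run as loops that fold left to right, so - and / are
    left-associative."""

    def __init__(self, text: str) -> None:
        self.tokens = tokenize(text) + ["$"]  # "$" marks the end of input
        self.pos = 0

    def peek(self) -> str:
        return self.tokens[self.pos]

    def next(self) -> str:
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self) -> int:
        value = self.term()
        while self.peek() in ("+", "-"):  # each pass is one E' -> +/- T E' step
            op = self.next()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> int:
        value = self.factor()
        while self.peek() in ("*", "/"):  # each pass is one T' -> */÷ F T' step
            op = self.next()
            rhs = self.factor()
            value = value * rhs if op == "*" else value // rhs  # integer division
        return value

    def factor(self) -> int:
        tok = self.next()
        if tok == "(":  # F -> ( E )
            value = self.expr()
            if self.next() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok.isdigit():  # F -> int
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def evaluate(text: str) -> int:
    ev = Evaluator(text)
    value = ev.expr()
    if ev.peek() != "$":
        raise SyntaxError(f"unexpected token {ev.peek()!r}")
    return value


print(evaluate("8-2+1"))    # 7
print(evaluate("8-(2+1)"))  # 5
```

With this, the two expressions from the question are no longer conflated: 8-2+1 is evaluated as (8-2)+1, and the same loop shape could build left-nested Sub/Add nodes instead of computing values.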
I solved this problem by adding some additional "if"s when evaluating the AST. The grammar remains the same.
