I'm having problems understanding an online explanation of how to remove the left recursion in this grammar. I know how to remove direct recursion, but I'm not clear how to handle the indirect. Could anyone explain it?
A --> B x y | x
B --> C D
C --> A | c
D --> d
The way I learned to do this is to replace one of the offending non-terminal symbols with each of its expansions. In this case, we first replace B with its expansions:
A --> B x y | x
B --> C D
becomes
A --> C D x y | x
Now, we do the same for non-terminal symbol C:
A --> C D x y | x
C --> A | c
becomes
A --> A D x y | c D x y | x
The only other remaining grammar rule is
D --> d
so you can also make that replacement, leaving your entire grammar as
A --> A d x y | c d x y | x
There is no indirect left recursion now, since there is nothing indirect at all; only the direct left recursion in A --> A d x y remains.
To eliminate left recursion altogether (not merely indirect left recursion), apply the standard transformation and introduce an A' symbol (credit to OP for this clarification and completion):
A -> c d x y A' | x A'
A' -> d x y A' | epsilon
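If you want to automate this, the substitution-plus-direct-elimination procedure above is Paull's algorithm. Here is a minimal Python sketch, assuming the grammar has no epsilon-productions or cycles (the algorithm's usual preconditions, which this grammar satisfies); the function name and the tuple-based grammar encoding are my own choices:

def eliminate_left_recursion(grammar, order):
    # Paull's algorithm. `grammar` maps each nonterminal to a list of
    # alternatives, each a tuple of symbols.
    grammar = {nt: [tuple(a) for a in alts] for nt, alts in grammar.items()}
    for i, A in enumerate(order):
        # Substitution step: replace a leading earlier nonterminal B by each
        # of B's alternatives (this turns indirect left recursion into
        # direct left recursion on A).
        for B in order[:i]:
            new = []
            for alt in grammar[A]:
                if alt[:1] == (B,):
                    new.extend(beta + alt[1:] for beta in grammar[B])
                else:
                    new.append(alt)
            grammar[A] = new
        # Direct elimination: A -> A alpha | beta  becomes
        # A -> beta A'  and  A' -> alpha A' | epsilon.
        rec = [alt[1:] for alt in grammar[A] if alt[:1] == (A,)]
        if rec:
            Aprime = A + "'"
            base = [alt for alt in grammar[A] if alt[:1] != (A,)]
            grammar[A] = [beta + (Aprime,) for beta in base]
            grammar[Aprime] = [alpha + (Aprime,) for alpha in rec] + [()]
    return grammar

g = {"A": [("B", "x", "y"), ("x",)], "B": [("C", "D")],
     "C": [("A",), ("c",)], "D": [("d",)]}
for nt, alts in eliminate_left_recursion(g, ["B", "C", "D", "A"]).items():
    print(nt, "->", " | ".join(" ".join(a) or "epsilon" for a in alts))

This prints A -> c D x y A' | x A' and A' -> D x y A' | epsilon; substituting the remaining D --> d (and dropping the now-unreachable B and C rules) gives exactly the grammar above.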
Response to naomik's comments
Yes, grammars have interesting properties, and you can characterize certain semantic capabilities in terms of constraints on grammar rules. There are transformation algorithms to handle certain types of parsing problems.
In this case, we want to remove left recursion: one desirable property for top-down parsing is that every application of a rule consumes at least one input token (terminal symbol). Left recursion opens the door to infinite recursion in the parser.
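To see the problem concretely, here is a deliberately naive recursive-descent sketch (hypothetical code, not from any real parser) for the left-recursive rule A -> A x y | x:

def parse_A(tokens, pos):
    # A -> A x y : the parser calls itself at the same input position
    # before consuming a single token, so the recursion never bottoms out.
    end = parse_A(tokens, pos)
    if tokens[end:end + 2] == ["x", "y"]:
        return end + 2
    # A -> x : never reached, because the call above never returns.
    if tokens[pos:pos + 1] == ["x"]:
        return pos + 1
    raise SyntaxError("expected x")

# parse_A(["x"], 0) raises RecursionError instead of parsing anything.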
I learned these things in my "Foundations of Computing" and "Compiler Construction" classes many years ago. Instead of writing a parser to adapt to a particular grammar, we'd transform the grammar to fit the parser style we wanted.
Related
How do you identify whether a grammar is LL(1), LR(0), or SLR(1)?
Can anyone please explain it using this example, or any other example?
X → Yz | a
Y → bZ | ε
Z → ε
To check if a grammar is LL(1), one option is to construct the LL(1) parsing table and check for any conflicts. These conflicts can be
FIRST/FIRST conflicts, where two different productions would have to be predicted for a nonterminal/terminal pair.
FIRST/FOLLOW conflicts, where two different productions are predicted for the same nonterminal/terminal pair: one chosen because the lookahead can begin that production's expansion, and one chosen because the nonterminal can ultimately expand to the empty string and the lookahead can follow it.
FOLLOW/FOLLOW conflicts, where two productions indicating that a nonterminal should ultimately be expanded to the empty string conflict with one another.
Let's try this on your grammar by building the FIRST and FOLLOW sets for each of the nonterminals. Here, we get that
FIRST(X) = {a, b, z}
FIRST(Y) = {b, epsilon}
FIRST(Z) = {epsilon}
We also have that the FOLLOW sets are
FOLLOW(X) = {$}
FOLLOW(Y) = {z}
FOLLOW(Z) = {z}
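If you would rather compute these sets than derive them by hand, they fall out of a simple fixed-point iteration. A Python sketch (my own encoding of the grammar; [] stands for an epsilon-alternative):

EPS = "eps"
grammar = {"X": [["Y", "z"], ["a"]],
           "Y": [["b", "Z"], []],   # [] is an epsilon-alternative
           "Z": [[]]}
nonterms = set(grammar)

def first_of(seq, first):
    # FIRST of a symbol sequence, given the current per-nonterminal FIRST sets.
    out = set()
    for sym in seq:
        f = first[sym] if sym in nonterms else {sym}
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}   # every symbol in seq can vanish

first = {nt: set() for nt in grammar}
follow = {nt: set() for nt in grammar}
follow["X"].add("$")     # X is the start symbol
changed = True
while changed:           # iterate both computations to a fixed point
    changed = False
    for nt, alts in grammar.items():
        for alt in alts:
            f = first_of(alt, first)
            if not f <= first[nt]:
                first[nt] |= f
                changed = True
            for i, sym in enumerate(alt):
                if sym in nonterms:
                    tail = first_of(alt[i + 1:], first)
                    new = (tail - {EPS}) | (follow[nt] if EPS in tail else set())
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True

print(first)    # X: {a, b, z}   Y: {b, eps}   Z: {eps}
print(follow)   # X: {$}         Y: {z}        Z: {z}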
From this, we can build the following LL(1) parsing table:
        a       b       z       $
X       a       Yz      Yz
Y               bZ      eps
Z                       eps
Since we can build this parsing table with no conflicts, the grammar is LL(1).
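Building the table is just as mechanical: each production A -> α is entered under every terminal in FIRST(α), plus everything in FOLLOW(A) when α can derive ε. A sketch, with the FIRST and FOLLOW sets from above written in by hand for brevity:

EPS = "eps"
first = {"X": {"a", "b", "z"}, "Y": {"b", EPS}, "Z": {EPS},
         "a": {"a"}, "b": {"b"}, "z": {"z"}}   # FIRST(t) = {t} for terminals
follow = {"X": {"$"}, "Y": {"z"}, "Z": {"z"}}
productions = [("X", ("Y", "z")), ("X", ("a",)),
               ("Y", ("b", "Z")), ("Y", ()), ("Z", ())]

def first_of(seq):
    out = set()
    for sym in seq:
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    return out | {EPS}

table, conflicts = {}, []
for nt, rhs in productions:
    f = first_of(rhs)
    # Predict on FIRST(rhs); if rhs can vanish, also on FOLLOW(nt).
    for la in (f - {EPS}) | (follow[nt] if EPS in f else set()):
        if (nt, la) in table:
            conflicts.append((nt, la, table[(nt, la)], rhs))
        table[(nt, la)] = rhs

print(conflicts or "no conflicts: the grammar is LL(1)")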
To check if a grammar is LR(0) or SLR(1), we begin by building up all of the LR(0) configurating sets for the grammar. In this case, assuming that X is your start symbol, we get the following:
(1)
X' -> .X
X -> .Yz
X -> .a
Y -> .
Y -> .bZ
(2)
X' -> X.
(3)
X -> Y.z
(4)
X -> Yz.
(5)
X -> a.
(6)
Y -> b.Z
Z -> .
(7)
Y -> bZ.
From this, we can see that the grammar is not LR(0) because there is a shift/reduce conflict in state (1). Specifically, because we have the shift item X → .a and the reduce item Y → ., we can't tell whether to shift the a or reduce the empty string. More generally, hardly any grammar with ε-productions is LR(0): an ε-item sitting in the same state as a shift item always produces exactly this conflict.
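Computing these configurating sets can also be automated; the two core operations are closure (for every item with the dot before a nonterminal B, add B -> .γ) and goto (advance the dot over a symbol). A sketch, with the augmented start rule X' -> X added by hand:

grammar = {"X'": [("X",)],                 # augmented start rule
           "X": [("Y", "z"), ("a",)],
           "Y": [("b", "Z"), ()],
           "Z": [()]}

def closure(items):
    # For every item A -> alpha . B beta, add B -> . gamma for each rule of B.
    items, work = set(items), list(items)
    while work:
        nt, rhs, dot = work.pop()
        if dot < len(rhs) and rhs[dot] in grammar:
            for alt in grammar[rhs[dot]]:
                item = (rhs[dot], alt, 0)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(items, sym):
    # Advance the dot over `sym`, then close the result.
    return closure((nt, rhs, dot + 1) for nt, rhs, dot in items
                   if dot < len(rhs) and rhs[dot] == sym)

state1 = closure([("X'", ("X",), 0)])
for nt, rhs, dot in sorted(state1):        # reproduces state (1) above
    print(nt, "->", " ".join(rhs[:dot]) + "." + " ".join(rhs[dot:]))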
However, this grammar might be SLR(1). To see this, we augment each reduction with the lookahead set for the nonterminal being reduced, namely its FOLLOW set. This gives the following set of SLR(1) configurating sets:
(1)
X' -> .X
X -> .Yz [$]
X -> .a [$]
Y -> . [z]
Y -> .bZ [z]
(2)
X' -> X.
(3)
X -> Y.z [$]
(4)
X -> Yz. [$]
(5)
X -> a. [$]
(6)
Y -> b.Z [z]
Z -> . [z]
(7)
Y -> bZ. [z]
The shift/reduce conflict in state (1) has been eliminated because we only reduce when the lookahead is z, which doesn't conflict with any of the other items.
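In other words, the SLR(1) condition is a per-state check: the FOLLOW set attached to each reduce item must be disjoint from the state's shift terminals and from every other reduce item's FOLLOW set. A quick sanity check for state (1), with the sets transcribed by hand from the items above:

follow = {"X": {"$"}, "Y": {"z"}, "Z": {"z"}}

shift_terminals = {"a", "b"}          # terminals after a dot in state (1)
reduce_items = ["Y"]                  # Y -> .  is the only completed item

for nt in reduce_items:
    assert not (follow[nt] & shift_terminals), "shift/reduce conflict"
print("state (1) passes the SLR(1) check")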
If you have no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, your grammar is LL(1).
An example of a FIRST/FIRST conflict:
S -> Xb | Yc
X -> a
Y -> a
By seeing only the first input symbol "a", you cannot know whether to apply the production S -> Xb or S -> Yc, because "a" is in the FIRST set of both X and Y.
An example of a FIRST/FOLLOW conflict:
S -> AB
A -> fe | ε
B -> fg
By seeing only the first input symbol "f", you cannot decide whether to apply the production A -> fe or A -> ε, because "f" is in both the FIRST set of A and the FOLLOW set of A (A can be parsed as ε/empty and B as f).
Notice that if you have no epsilon-productions you cannot have a FIRST/FOLLOW conflict.
Simple answer: a grammar is said to be LL(1) if the associated LL(1) parsing table has at most one production in each table entry.
Take the simple grammar A --> Aa | b. [A is a nonterminal; a and b are terminals.]
Then find the FIRST and FOLLOW sets of A:
FIRST(A) = {b}
FOLLOW(A) = {$, a}
Parsing table for our grammar, with terminals as columns and the nonterminal A as the row:
          a          b            $
--------------------------------------------
A   |           | A --> Aa   |
    |           | A --> b    |
--------------------------------------------
As [A, b] contains two productions, there is a confusion as to which rule to choose, so the grammar is not LL(1).
Some simple checks to see whether a grammar is LL(1) or not.
Check 1: The grammar should not be left-recursive.
Example: E --> E + T is not LL(1) because it is left-recursive.
Check 2: The grammar should be left factored.
Left factoring is required when two or more grammar rule choices share a common prefix string.
Example: S --> A + int | A (both alternatives begin with A; see the factored form below).
Check 3: The grammar should not be ambiguous.
These are some simple checks.
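For example, the grammar from Check 2 can be left factored by pulling out the common prefix A:
S --> A S'
S' --> + int | epsilon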
An LL(1) grammar is an unambiguous context-free grammar that can be parsed by an LL(1) parser.
In LL(1), the first L stands for scanning the input from left to right, the second L stands for leftmost derivation, and the 1 stands for using one symbol of lookahead at each step.
To check whether a grammar is LL(1), you can draw its predictive parsing table; if you find any multiple entries in the table, then the grammar is not LL(1).
There is also a shortcut to check whether the grammar is LL(1) or not.
Shortcut Technique
With these two checks we can tell whether it is LL(1) or not. Both of them have to be satisfied.
1. If we have the productions A -> α1 | α2 | ... | αn, then FIRST(αi) ∩ FIRST(αj) must be φ (the empty set) for all i ≠ j.
2. For every nonterminal A, if FIRST(A) contains ε, then FIRST(A) ∩ FOLLOW(A) must be φ (the empty set).
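The shortcut is easy to apply mechanically. A sketch for the earlier example A --> Aa | b, with its FIRST and FOLLOW sets written in by hand:

EPS = "eps"
first_of_alt = {"Aa": {"b"}, "b": {"b"}}   # FIRST of each alternative of A
first = {"A": {"b"}}
follow = {"A": {"$", "a"}}

# Rule 1: FIRST sets of distinct alternatives must be pairwise disjoint.
alts = list(first_of_alt)
for i in range(len(alts)):
    for j in range(i + 1, len(alts)):
        if first_of_alt[alts[i]] & first_of_alt[alts[j]]:
            print(f"rule 1 fails on {alts[i]!r} vs {alts[j]!r}: not LL(1)")

# Rule 2: if FIRST(A) contains epsilon, FIRST(A) and FOLLOW(A) must be disjoint.
if EPS in first["A"] and first["A"] & follow["A"]:
    print("rule 2 fails for A: not LL(1)")

Running this reports that rule 1 fails (both alternatives begin with b), matching the parsing-table result above.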
The following grammar generates the sentences "a,a", "a,b", "b,b", "c,b", ..., "h,b". Unfortunately it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there's a fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.
Let's assume we have the following CFG G:
A -> A b A
A -> a
Which should produce the strings
a, aba, ababa, abababa, and so on. Now I want to remove the left recursion to make it suitable for predictive parsing. The dragon book gives the following rule to remove immediate left recursions.
Given
A -> Aa | b
rewrite as
A -> b A'
A' -> a A'
| ε
If we simply apply the rule to the grammar from above, we get grammar G':
A -> a A'
A' -> b A A'
| ε
Which looks good to me, but apparently this grammar is not LL(1), because of some ambiguity. I get the following First/Follow sets:
First(A) = { "a" }
First(A') = { ε, "b" }
Follow(A) = { $, "b" }
Follow(A') = { $, "b" }
From which I construct the parsing table
| a | b | $ |
----------------------------------------------------
A | A -> a A' | | |
A' | | A' -> b A A' | A' -> ε |
| | A' -> ε | |
and there is a conflict in T[A',b], so the grammar isn't LL(1), although there is no left recursion any more and there are no common prefixes among the productions that would require left factoring.
I'm not completely sure where the ambiguity comes from. I guess that during parsing the stack would fill with A', and you can basically remove it (expand it to epsilon) when it isn't needed any more. I think this is the case whenever another A' is below it on the stack.
I think the LL(1) grammar G'' that I try to get from the original one would be:
A -> a A'
A' -> b A
| ε
Am I missing something? Did I do anything wrong?
Is there a more general procedure for removing left recursion that considers this edge case? If I want to automatically remove left recursions I should be able to handle this, right?
Is the second grammar G' an LL(k) grammar for some k > 1?
The original grammar was ambiguous, so it is not surprising that the new grammar is also ambiguous.
Consider the string a b a b a. We can derive this in two ways from the original grammar:
A ⇒ A b A
⇒ A b a
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
A ⇒ A b A
⇒ A b A b A
⇒ A b A b a
⇒ A b a b a
⇒ a b a b a
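You can confirm this mechanically by counting distinct parse trees. A small sketch for A -> A b A | a, which tries every b as the top-level split point:

from functools import lru_cache

def count_trees(tokens):
    # Number of parse trees of  A -> A b A | a  over tokens[i:j].
    @lru_cache(maxsize=None)
    def count(i, j):
        n = 1 if tokens[i:j] == ("a",) else 0        # A -> a
        for k in range(i + 1, j - 1):                # A -> A b A
            if tokens[k] == "b":
                n += count(i, k) * count(k + 1, j)
        return n
    return count(0, len(tokens))

print(count_trees(("a", "b", "a", "b", "a")))        # prints 2

The two trees it finds correspond exactly to the two derivations above.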
Unambiguous grammars are, of course, possible. Here are right- and left-recursive versions:
Right-recursive:        Left-recursive:
A → a                   A → a
A → a b A               A → A b a
(Although these represent the same language, they have different parses: the right-recursive version associates to the right, while the left-recursive one associates to the left.)
Removing left recursion cannot introduce ambiguity: this kind of transformation preserves ambiguity. If the CFG is already ambiguous, the result will be ambiguous too, and if the original is not, neither is the result.
I was assigned the task of creating a parser for arithmetic expressions (with parentheses and unary operators). I just want to know whether this grammar is correct and whether it is in LL(1) form; I'm having real problems constructing the parse table for it.
S -> TS'
S' -> +TS' | -TS' | epsilon
T -> UT'
T' -> *UT' | /UT' | epsilon
U -> VX
X -> ^U | epsilon
V -> (W) | -W | W | epsilon
W -> S | number
Precedence (high to low)
(), unary -
^
*, /
+, -
Associativity for binary operators
^ = right
+, -, *, / = left
Is it in LL(1) form?
To tell if the grammar is LL(1) or not, you need to expand the production rules out. If you can generate any sequence of productions which results in the left-hand-side appearing as the first thing on the right-hand-side, the grammar is not LL(1).
For example, consider this rule:
X --> X | x | epsilon
This clearly can't be part of an LL(1) grammar, since it's left-recursive if you apply the leftmost production. But what about this?
X --> Y | x
Y --> X + X
This isn't an LL(1) grammar either, but it's more subtle: first you have to apply X --> Y, then apply Y --> X + X to see that you now have X --> X + X, which is left-recursive.
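That expansion test can be automated by computing, for each nonterminal, the set of nonterminals that can appear leftmost in something it derives (taking ε-productions into account, since a vanishing symbol exposes the one after it). A sketch over the grammar from the question:

grammar = {
    "S":  [["T", "S'"]],
    "S'": [["+", "T", "S'"], ["-", "T", "S'"], []],
    "T":  [["U", "T'"]],
    "T'": [["*", "U", "T'"], ["/", "U", "T'"], []],
    "U":  [["V", "X"]],
    "X":  [["^", "U"], []],
    "V":  [["(", "W", ")"], ["-", "W"], ["W"], []],
    "W":  [["S"], ["number"]],
}

# Nonterminals that can derive the empty string (fixed point).
nullable = set()
while True:
    new = {nt for nt, alts in grammar.items()
           if any(all(s in nullable for s in alt) for alt in alts)} - nullable
    if not new:
        break
    nullable |= new

# begins[A]: nonterminals that can appear leftmost in a string derived from A.
begins = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, alts in grammar.items():
        for alt in alts:
            for sym in alt:
                if sym in grammar:
                    new = {sym} | begins[sym]
                    if not new <= begins[nt]:
                        begins[nt] |= new
                        changed = True
                if sym not in nullable:
                    break    # this symbol always consumes input

for nt in grammar:
    if nt in begins[nt]:
        print(nt, "can derive a string beginning with itself: left-recursive")

Run on the posted grammar, this flags S, T, U, V and W: V -> W, W -> S, and S expands back to something beginning with U and then V again, so by this test the grammar is indirectly left-recursive as written and is not LL(1) without further transformation.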
You also seem to be missing a production for the unary plus operator. Try this instead...
V -> (W) | -W | +W | epsilon