Suppose you have a grammar G and you build an LR(1) automaton for it. You can transform it into an LALR(1) or SLR(1) parser by merging states and transforming rules, but conflicts may appear.
My question is the following: must all conflicts appear in merged states? Is it possible for a conflict-free LR(1) state that wasn't merged to have a conflict in either the LALR(1) or the SLR(1) automaton?
Interesting question! The answer is
if a grammar is LR(1), any conflicts in the LALR(1) parser must occur in merged states, but
if a grammar is LR(1), conflicts in the SLR(1) parser may appear in states that correspond to LR(1) states that were not merged.
For the first point, suppose you have a grammar that's LR(1), so you can form its LR(1) parser. We can convert that to an LALR(1) parser by merging together all states that have the same core, that is, the same items once lookaheads are ignored. If you have an LR(1) state that doesn't get merged with anything, then that LR(1) state is present verbatim in the LALR(1) parser. And since the LR(1) state has no shift/reduce or reduce/reduce conflicts, the corresponding LALR(1) parser state won't have any conflicts.
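As a rough illustration of the merging step, here is a minimal Python sketch. The item representation `(lhs, rhs, dot, lookahead)` and the three toy states are assumptions made up for this example, not part of any particular parser generator.

```python
from collections import defaultdict

# An LR(1) item is (lhs, rhs, dot, lookahead); rhs is a tuple of symbols.
def core(state):
    """Return the state's items with the lookaheads stripped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_by_core(states):
    """Group LR(1) states by core: {core: [indices of states sharing it]}."""
    groups = defaultdict(list)
    for index, state in enumerate(states):
        groups[core(state)].append(index)
    return dict(groups)

# Hypothetical states: 0 and 1 share a core, 2 stands alone.
states = [
    {("A", ("x",), 1, "a")},
    {("A", ("x",), 1, "b")},
    {("B", ("y",), 1, "$")},
]
groups = merge_by_core(states)

# States that were not merged keep exactly the lookaheads they had in LR(1).
unmerged = [ids[0] for ids in groups.values() if len(ids) == 1]
print(unmerged)
```

Any state listed in `unmerged` is carried over unchanged, which is why it cannot gain a conflict in the LALR(1) automaton.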
On the SLR(1) front, you can end up with states where no LR(1) state merging would occur, yet there's a reduce/reduce conflict. The intuition behind this is that you can have a state with no reduce/reduce conflicts in the LR(1) parser because the lookaheads have enough detail to resolve the conflict, yet when switching from LR(1) to SLR(1) and expanding the lookahead sets we accidentally introduce a reduce/reduce conflict. Here's an example of a grammar where this happens:
S → aTb | aR | cT
T → d
R → d
Here are the LR(1) configurating sets:
(1)
S' -> .S [$]
S -> .aTb [$]
S -> .aR [$]
S -> .cT [$]
(2)
S' -> S. [$]
(3)
S -> a.Tb [$]
S -> a.R [$]
T -> .d [b]
R -> .d [$]
(4)
T -> d. [b]
R -> d. [$]
(5)
S -> aT.b [$]
(6)
S -> aTb. [$]
(7)
S -> aR. [$]
(8)
S -> c.T [$]
T -> .d [$]
(9)
T -> d. [$]
(10)
S -> cT. [$]
Ignoring the lookaheads, these are the same item sets that you'd have in the SLR(1) parser, so no two LR(1) states get merged. Notice, also, that FOLLOW(T) = {$, b} and FOLLOW(R) = {$}. This means that the LR(1) state
(4)
T -> d. [b]
R -> d. [$]
is converted to the SLR(1) state
(4)
T -> d. [b, $]
R -> d. [$]
which has a reduce/reduce conflict on $.
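To double-check the FOLLOW sets behind that conflict, here is a small Python sketch. It assumes the grammar has no ε-productions (true for this one), which keeps the FIRST computation trivial; the function names are invented for the example.

```python
GRAMMAR = {
    "S": [["a", "T", "b"], ["a", "R"], ["c", "T"]],
    "T": [["d"]],
    "R": [["d"]],
}

def first_sets(grammar):
    # Assumption: no epsilon-productions, so only the leading symbol matters.
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                head = rhs[0]
                new = first[head] if head in grammar else {head}
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def follow_sets(grammar, start):
    first = first_sets(grammar)
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[nt]  # sym ends the production
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

follow = follow_sets(GRAMMAR, "S")
# State (4) holds the two complete items T -> d. and R -> d.
conflict_tokens = follow["T"] & follow["R"]
print(follow["T"], follow["R"], conflict_tokens)
```

A non-empty `conflict_tokens` means SLR(1) would reduce by both T → d and R → d on those tokens.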
I am trying to write an LL(1) parser generator and I am running into an issue with grammars which I know to be LL(1) but which I cannot factor properly.
For example consider the grammar:
S -> As Ao
As -> a As
As -> Ɛ
Ao -> a
Ao -> Ɛ
Now this grammar has a first-follow conflict in As, so I perform Ɛ-elimination and have:
S -> As Ao
S -> Ao
As -> a As
As -> a
Ao -> a
Ao -> Ɛ
Which has first-first conflicts in S and As. Resolving the conflict in As produces:
S -> As Ao
S -> Ao
As -> a As'
As' -> As
As' -> Ɛ
Ao -> a
Ao -> Ɛ
Which has a first-follow conflict in As', which when eliminated simply cycles. Further, the conflict in S cannot be solved via left factorization.
I believe the issue is that As Ao == As. If I knew how to prove this, I believe the issue would go away, as the initial grammar could be transformed to:
S -> As
As -> a As
As -> Ɛ
Are there standard techniques for resolving such conflicts?
edit:
I realize the grammar above is ambiguous. The grammar I am really interested in parsing is:
S -> a As Ao
As -> , a As
As -> Ɛ
Ao -> ,
Ao -> Ɛ
I.e. a comma separated list of a with an optional trailing comma.
The original grammar is ambiguous, so no deterministic parser can be produced.
Of course, you can eliminate the ambiguities easily enough, since the language is just "zero or more a's":
S ⇒ As
As ⇒ a As
As ⇒ Ɛ
But presumably the question is a simplification of some more complicated grammar in which Ao is not the same as a, but in which FIRST(As) and FIRST(Ao) have some common element.
In general, it is difficult to write LL(1) grammars for such languages, and it is indeed possible that such a grammar does not exist for the language. In order to answer the question in more detail, it would be necessary to understand what is meant by the claim that the grammar is known to be LL(1).
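For the specific language in the edit (a comma-separated list of `a` with an optional trailing comma), one possible refactoring is S -> a Tail, Tail -> , Rest | Ɛ, Rest -> a Tail | Ɛ. This grammar is an assumption of this sketch, not something given in the question; the Python below is a hand-written recursive-descent recognizer for it that decides each step from a single token of lookahead.

```python
def parse_list(tokens):
    """Return True if tokens match: a (',' a)* [','], using one token of lookahead."""
    pos = 0

    def peek():
        # "$" stands for end of input.
        return tokens[pos] if pos < len(tokens) else "$"

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, got {peek()!r}")
        pos += 1

    def tail():  # Tail -> ',' Rest | epsilon
        if peek() == ",":
            expect(",")
            rest()

    def rest():  # Rest -> 'a' Tail | epsilon
        if peek() == "a":
            expect("a")
            tail()

    try:
        expect("a")  # S -> 'a' Tail
        tail()
        expect("$")
    except SyntaxError:
        return False
    return True

print(parse_list(["a", ",", "a", ","]))
print(parse_list(["a", ",", ","]))
```

The point of the refactoring is that, after a comma, the choice between "another element" and "that was the trailing comma" is deferred to Rest, where the next token settles it.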
How do you identify whether a grammar is LL(1), LR(0), or SLR(1)?
Can anyone please explain it using this example, or any other example?
X → Yz | a
Y → bZ | ε
Z → ε
To check if a grammar is LL(1), one option is to construct the LL(1) parsing table and check for any conflicts. These conflicts can be
FIRST/FIRST conflicts, where two different productions would have to be predicted for a nonterminal/terminal pair.
FIRST/FOLLOW conflicts, where two different productions are predicted for the same nonterminal/terminal pair: one because its right-hand side can begin with that terminal, and one because its right-hand side can derive the empty string and the terminal is in the nonterminal's FOLLOW set.
FOLLOW/FOLLOW conflicts, where two productions that can each derive the empty string are both predicted for a terminal in the nonterminal's FOLLOW set.
Let's try this on your grammar by building the FIRST and FOLLOW sets for each of the nonterminals. Here, we get that
FIRST(X) = {a, b, z}
FIRST(Y) = {b, epsilon}
FIRST(Z) = {epsilon}
We also have that the FOLLOW sets are
FOLLOW(X) = {$}
FOLLOW(Y) = {z}
FOLLOW(Z) = {z}
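If you'd rather not compute these by hand, the sets can be reproduced with the usual fixed-point iteration. This is a minimal Python sketch; the grammar encoding (an empty list for the ε alternative) and the function names are choices made for this example.

```python
EPS = "ε"

GRAMMAR = {
    "X": [["Y", "z"], ["a"]],
    "Y": [["b", "Z"], []],  # [] is the epsilon alternative
    "Z": [[]],
}

def first_of_string(symbols, first, grammar):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in grammar else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first, grammar)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    rest = first_of_string(rhs[i + 1:], first, grammar)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, "X", first)
print(first)
print(follow)
```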
From this, we can build the following LL(1) parsing table:
      a     b     z     $
X     a     Yz    Yz
Y           bZ    eps
Z                 eps
Since we can build this parsing table with no conflicts, the grammar is LL(1).
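The table-filling rule can be sketched in a few lines of Python. The FIRST set of each right-hand side is supplied by hand here (FIRST(Yz) = {b, z} follows from FIRST(Y) and the trailing z); the data layout is an assumption of this example.

```python
EPS = "ε"

# Each production is (lhs, rhs, FIRST(rhs)); the sets are taken from the text above.
PRODUCTIONS = [
    ("X", "Yz", {"b", "z"}),
    ("X", "a", {"a"}),
    ("Y", "bZ", {"b"}),
    ("Y", "eps", {EPS}),
    ("Z", "eps", {EPS}),
]
FOLLOW = {"X": {"$"}, "Y": {"z"}, "Z": {"z"}}

def build_ll1_table(productions, follow):
    """Map (nonterminal, terminal) to the list of right-hand sides predicted there."""
    table = {}
    for lhs, rhs, first_rhs in productions:
        lookaheads = first_rhs - {EPS}
        if EPS in first_rhs:
            # A nullable right-hand side is predicted on FOLLOW(lhs).
            lookaheads |= follow[lhs]
        for terminal in lookaheads:
            table.setdefault((lhs, terminal), []).append(rhs)
    return table

table = build_ll1_table(PRODUCTIONS, FOLLOW)
conflicts = {cell: rhss for cell, rhss in table.items() if len(rhss) > 1}
print(conflicts)
```

An empty `conflicts` dictionary corresponds to a table with at most one production per cell.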
To check if a grammar is LR(0) or SLR(1), we begin by building up all of the LR(0) configurating sets for the grammar. In this case, assuming that X is your start symbol, we get the following:
(1)
X' -> .X
X -> .Yz
X -> .a
Y -> .
Y -> .bZ
(2)
X' -> X.
(3)
X -> Y.z
(4)
X -> Yz.
(5)
X -> a.
(6)
Y -> b.Z
Z -> .
(7)
Y -> bZ.
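From this, we can see that the grammar is not LR(0) because there is a shift/reduce conflict in state (1). Specifically, because we have the shift items X → .a and Y → .bZ alongside the reduce item Y → ., we can't tell whether to shift or to reduce the empty string. More generally, ε-productions often cause this kind of conflict in LR(0) parsers, because the reduce item for the ε-production lands in the same state as the shift items next to it.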
However, this grammar might be SLR(1). To check this, we annotate each item with the FOLLOW set of the nonterminal on its left-hand side. This gives the following SLR(1) configurating sets:
(1)
X' -> .X
X -> .Yz [$]
X -> .a [$]
Y -> . [z]
Y -> .bZ [z]
(2)
X' -> X.
(3)
X -> Y.z [$]
(4)
X -> Yz. [$]
(5)
X -> a. [$]
(6)
Y -> b.Z [z]
Z -> . [z]
(7)
Y -> bZ. [z]
The shift/reduce conflict in state (1) has been eliminated because we only reduce Y → ε when the lookahead is z, which doesn't conflict with the shifts on a and b. No other state has a conflict, so the grammar is SLR(1).
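Both checks can be automated. The following Python sketch builds the LR(0) item sets for this grammar and then looks for conflicting actions, first with no lookahead (LR(0)) and then with FOLLOW sets (SLR(1)). The FOLLOW sets are copied from the answer, FOLLOW(X') = {$} is added as an assumption, and the accept item X' -> X. is simply treated like any other reduction.

```python
GRAMMAR = [  # production 0 is the augmented start production
    ("X'", ("X",)),
    ("X", ("Y", "z")),
    ("X", ("a",)),
    ("Y", ("b", "Z")),
    ("Y", ()),
    ("Z", ()),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
TERMINALS = {s for _, rhs in GRAMMAR for s in rhs if s not in NONTERMINALS} | {"$"}
FOLLOW = {"X'": {"$"}, "X": {"$"}, "Y": {"z"}, "Z": {"z"}}

def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    items = set(items)
    work = list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in items:
                    items.add((i, 0))
                    work.append((i, 0))
    return frozenset(items)

def goto(state, symbol):
    return closure({(p, d + 1) for p, d in state
                    if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol})

def lr0_states():
    start = closure({(0, 0)})
    states, work = [start], [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for symbol in symbols:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
    return states

def conflicts(state, use_follow):
    """Return the terminals on which the state has more than one action."""
    actions = {}
    for p, d in state:
        lhs, rhs = GRAMMAR[p]
        if d < len(rhs):
            if rhs[d] not in NONTERMINALS:
                actions.setdefault(rhs[d], set()).add("shift")
        else:
            # LR(0) reduces on every terminal; SLR(1) only on FOLLOW(lhs).
            for t in (FOLLOW[lhs] if use_follow else TERMINALS):
                actions.setdefault(t, set()).add(("reduce", p))
    return {t for t, acts in actions.items() if len(acts) > 1}

states = lr0_states()
lr0_conflicts = [conflicts(s, use_follow=False) for s in states]
slr_conflicts = [conflicts(s, use_follow=True) for s in states]
print(len(states), lr0_conflicts[0], any(slr_conflicts))
```

Here `states[0]` is the initial state, numbered (1) in the listing above; the remaining states may come out in a different order than in the text.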
If you have no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, your grammar is LL(1).
An example of a FIRST/FIRST conflict:
S -> Xb | Yc
X -> a
Y -> a
By seeing only the first input symbol "a", you cannot know whether to apply the production S -> Xb or S -> Yc, because "a" is in the FIRST set of both X and Y.
An example of a FIRST/FOLLOW conflict:
S -> AB
A -> fe | ε
B -> fg
By seeing only the first input symbol "f", you cannot decide whether to apply the production A -> fe or A -> ε, because "f" is in both the FIRST set of A and the FOLLOW set of A (A can be parsed as ε/empty and B as f).
Notice that if you have no epsilon-productions you cannot have a FIRST/FOLLOW conflict.
Simple answer: A grammar is said to be LL(1) if the associated LL(1) parsing table has at most one production in each table entry.
Take the simple grammar A --> Aa | b. [A is a nonterminal and a, b are terminals]
Then find the FIRST and FOLLOW sets of A.
First{A}={b}.
Follow{A}={$,a}.
Parsing table for our grammar, with terminals as columns and the nonterminal A as the row:
        a        b          $
--------------------------------------------
A  |         | A-->b    |
   |         | A-->Aa   |
--------------------------------------------
As [A,b] contains two productions, there is confusion as to which rule to choose. So the grammar is not LL(1).
Some simple checks to see whether a grammar is LL(1) or not.
Check 1: The grammar should not be left recursive.
Example: E --> E+T is not LL(1) because it is left recursive.
Check 2: The grammar should be left factored.
Left factoring is required when two or more alternatives of a rule share a common prefix.
Example: S --> A+int | A.
Check 3: The grammar should not be ambiguous.
These checks are necessary but not sufficient: a grammar that fails any of them is not LL(1), but passing all three does not guarantee that it is.
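Checks 1 and 2 are easy to mechanize in their simplest form. The Python sketch below only detects direct left recursion and alternatives that share their first symbol; indirect left recursion and ambiguity are not covered. The extra productions E -> T, T -> id and A -> id are hypothetical, added only to make the two example grammars complete.

```python
def has_direct_left_recursion(grammar):
    """True if some production A -> A ... exists (indirect recursion is not detected)."""
    return any(rhs and rhs[0] == nt
               for nt, alternatives in grammar.items()
               for rhs in alternatives)

def needs_left_factoring(grammar):
    """True if two alternatives of one nonterminal start with the same symbol."""
    for alternatives in grammar.values():
        heads = [rhs[0] for rhs in alternatives if rhs]
        if len(heads) != len(set(heads)):
            return True
    return False

# E --> E+T from Check 1 (E -> T and T -> id are hypothetical completions).
left_recursive = {"E": [["E", "+", "T"], ["T"]], "T": [["id"]]}
# S --> A+int | A from Check 2 (A -> id is a hypothetical completion).
unfactored = {"S": [["A", "+", "int"], ["A"]], "A": [["id"]]}

print(has_direct_left_recursion(left_recursive), needs_left_factoring(left_recursive))
print(has_direct_left_recursion(unfactored), needs_left_factoring(unfactored))
```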
An LL(1) grammar is a context-free grammar that can be parsed by an LL(1) parser; every such grammar is unambiguous.
In LL(1), the first L stands for scanning the input from left to right, the second L stands for leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step.
To check whether a grammar is LL(1), you can draw the predictive parsing table. If you find multiple entries in any cell of the table, the grammar is not LL(1).
There is also a shortcut to check whether the grammar is LL(1).
Shortcut Technique
With these two steps we can check whether it is LL(1).
Both of them have to be satisfied.
1. If we have the production A -> a1 | a2 | a3 | a4 | ... | an,
then First(a(i)) intersection First(a(j)) must be phi (the empty set) for all i ≠ j. [a(i) means a subscript i.]
2. For every nonterminal 'A', if First(A) contains epsilon,
then First(A) intersection Follow(A) must be phi (the empty set).
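The two conditions translate almost directly into code. In this Python sketch the FIRST set of every alternative and the FOLLOW set of every nonterminal are supplied by hand, and the function name and data layout are assumptions of the example. It is run on the two small grammars used earlier in this thread to illustrate FIRST/FIRST and FIRST/FOLLOW conflicts.

```python
from itertools import combinations

EPS = "ε"

def ll1_violations(alternatives_first, follow):
    """alternatives_first maps a nonterminal to the FIRST set of each alternative."""
    problems = []
    for nt, firsts in alternatives_first.items():
        # Condition 1: FIRST sets of distinct alternatives are pairwise disjoint.
        if any(fi & fj for fi, fj in combinations(firsts, 2)):
            problems.append((nt, "FIRST/FIRST"))
        # Condition 2: if A can derive epsilon, FIRST(A) and FOLLOW(A) are disjoint.
        all_first = set().union(*firsts)
        if EPS in all_first and (all_first - {EPS}) & follow[nt]:
            problems.append((nt, "FIRST/FOLLOW"))
    return problems

# S -> Xb | Yc, X -> a, Y -> a
g1 = ll1_violations(
    {"S": [{"a"}, {"a"}], "X": [{"a"}], "Y": [{"a"}]},
    {"S": {"$"}, "X": {"b"}, "Y": {"c"}},
)
# S -> AB, A -> fe | ε, B -> fg
g2 = ll1_violations(
    {"S": [{"f"}], "A": [{"f"}, {EPS}], "B": [{"f"}]},
    {"S": {"$"}, "A": {"f"}, "B": {"$"}},
)
print(g1)
print(g2)
```

An empty result means both conditions hold for every nonterminal.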
I am new to the subject of compilation and have just started an exercise on bottom-up parsing.
I am stuck on the following problem.
Build an LR(0) parsing table for the following grammar:
1) E –> E + T
2) E –> T
3) T –> (E)
4) T –> id
I0 :
E' –> .E
E –> .E + T
E –> .T
T –> .(E)
T –> .id
on E the next state in the DFA would be :
I1:
E' -> E.
E -> E. + T
From what I have learned so far, isn't this an S-R conflict?
The parser would not know whether to reduce or shift, as it has no lookahead.
So this should not be an LR(0) grammar?
But the PDF which I am reading has built the LR(0) table.
So is there a mistake in the PDF, or have I gone wrong somewhere in understanding the concept?
You augmented the grammar with E' -> E. Normally, you'd augment with a production like E' -> E $, where $ is a (terminal) symbol that doesn't otherwise occur in the grammar, and denotes end-of-input.
So I1 would actually be
E' -> E. $
E -> E. + T
and there isn't a conflict. (And I believe the grammar is LR(0).)
This is indeed a shift/reduce conflict. This grammar isn't LR(0). You can also see this because the language isn't prefix-free; it contains strings that are proper prefixes of other strings in the language (id is a prefix of id + id), so the grammar can't be LR(0).
That said, you can still construct all the LR(0) configurating sets and make the LR(0) automaton. It just won't be deterministic because of the shift/reduce conflict. It's possible, therefore, that you're right and the handout is right.
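Whether I1 counts as a conflict depends on which augmentation you use, and a mechanical check makes the difference visible. This Python sketch applies the plain LR(0) adequacy rule (a complete item may not share a state with another complete item or with an item that shifts a terminal) to both versions of I1. The item encoding is an assumption of the example, and some textbooks special-case the item E' -> E. as "accept" instead of treating it as a reduction.

```python
NONTERMINALS = {"E'", "E", "T"}

def lr0_conflict(items):
    """items: list of (lhs, rhs, dot). True if the state is inadequate for LR(0)."""
    complete = [it for it in items if it[2] == len(it[1])]
    shifts = [it for it in items
              if it[2] < len(it[1]) and it[1][it[2]] not in NONTERMINALS]
    return len(complete) > 1 or (bool(complete) and bool(shifts))

# I1 when the grammar is augmented with E' -> E
i1_plain = [("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)]
# I1 when the grammar is augmented with E' -> E $
i1_marker = [("E'", ("E", "$"), 1), ("E", ("E", "+", "T"), 1)]

print(lr0_conflict(i1_plain))
print(lr0_conflict(i1_marker))
```

With the end marker, I1 contains only shift items, which is how a handout can present a conflict-free LR(0) table for this grammar.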
Hope this helps!
We find follow(A) in case we find a production of type
A → α
Can α here be ε?
As in the example below:
P → aPa | bPb | ε
If α could be ε, it is not LR(1)
Yes, α can be ε. α represents an arbitrary string, and since ε is a string, it is a possible α.
Because of this, your grammar isn't LR(1), and therefore it isn't SLR(1) either (since all SLR(1) grammars are also LR(1)).
To see this, we can construct the LR(1) configurating sets:
(1) P' -> .P ($)
P -> .aPa ($)
P -> .bPb ($)
P -> . ($)
(2) P -> a.Pa ($)
P -> .aPa (a)
P -> .bPb (a)
P -> . (a)
At this point we can stop because there's a shift/reduce conflict: we can't tell whether to shift a or reduce P → ε given the terminal a.
With some more advanced math, you can prove that there are no LR(1) grammars for this language (the language of all even-length palindromes). This follows because the languages with LR(1) grammars are precisely the deterministic context-free languages, and the set of all even-length palindromes is not a deterministic context-free language.
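The two configurating sets above can be reproduced with a small LR(1) closure routine. This Python sketch is specific to the palindrome grammar: the FIRST sets are entered by hand, and the item encoding `(lhs, rhs, dot, lookahead)` is an assumption of the example.

```python
EPS = "ε"
GRAMMAR = {"P'": [("P",)], "P": [("a", "P", "a"), ("b", "P", "b"), ()]}
FIRST = {"P'": {"a", "b", EPS}, "P": {"a", "b", EPS}}  # computed by hand

def first_of(symbols):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for s in symbols:
        f = FIRST[s] if s in GRAMMAR else {s}
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out

def closure(items):
    """LR(1) closure; an item is (lhs, rhs, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            # New lookaheads are FIRST(beta lookahead) for [A -> alpha . B beta, lookahead].
            for b in first_of(rhs[dot + 1:] + (la,)) - {EPS}:
                for gamma in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], gamma, 0, b)
                    if item not in items:
                        items.add(item)
                        work.append(item)
    return items

def goto(items, symbol):
    return closure({(l, r, d + 1, la) for l, r, d, la in items
                    if d < len(r) and r[d] == symbol})

def shift_reduce_conflicts(items):
    """Terminals that can be shifted and are also a lookahead of a complete item."""
    shifts = {r[d] for _, r, d, _ in items if d < len(r) and r[d] not in GRAMMAR}
    reduces = {la for _, r, d, la in items if d == len(r)}
    return shifts & reduces

state1 = closure({("P'", ("P",), 0, "$")})
state2 = goto(state1, "a")
print(shift_reduce_conflicts(state1), shift_reduce_conflicts(state2))
```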
Hope this helps!