Removing ambiguity in a grammar

Removing ambiguity in a grammar - parsing

I have a grammar with one production rule:
S → aSbS | bSaS | ∈
This is ambiguous. Is it allowed to remove the ambiguity this way?
S → A|B|∈
A → aS
B → bS
This makes it unambiguous.
Another grammar:
S → A | B
A → aAb | ab
B → abB | ∈
Correction to make it unambiguous
S → A | B
A → aA'b
A' → ab
B → abB | ∈
I am not using any rules to make the grammars unambiguous. If it is wrong to remove ambiguity in a grammar this way, can anyone point a proper set of rules for removing ambiguity in ambiguous grammars?

As #kaby76 points out in a comment, in your first example, you haven't just removed the unambiguity. You have also changed the language recognised by the grammar.
S → a S b S
| b S a A
| ε
recognises only strings with the same number of a's and b's, while
S → A
| B
| ε
A → a S
B → b S
recognises any string made up of a's and b's.
So that's certainly not a legitimate disambiguation.
By the way, your second grammar could have been simplified; A and B serve no useful purpose.
S → a S
| b S
| ε
There are unambiguous grammars for this language. One example:
S → a A S
| b B S
A → a
| b A A
B → b
| a B B
See this post on the Computer Science StackExchange for an explanation.
In your second example, the grammar
S → A
| B
A → a A b
| a b
B → a b B
| ε
is ambiguous, but only because A and B both match ab. Every other recognised string has exactly one possible parse.
In this grammar, A matches strings which consist of some number of as followed by the same number of bs, while B matches strings which consist of any number of repetitions of ab.
ab fits both criteria: it is a repetition of ab (consisting of just one copy) and it is a sequence of as followed by the same number of bs (in this case, one of each). The empty string also matches both criteria (with repetition count 0), but it has been excluded from A by making the starting rule A → a b. An easy way to make the grammar unambiguous would be to change that base rule to A → a a b b.
Again, your disambiguated grammar does not recognise the same language as the original grammar. Your change makes A non-recursive, so that it only recognises aabb and now strings like aaabbb, aaaabbbb, and so on are not recognised. So once again, it is not simply an unambiguous version of the original.
Note that this grammar only matches strings with an equal number of as and bs, but it does not match all such strings. There are many other strings with an equal number of as and bs which are not matched, such as ba or aabbab. So its language is a subset of the language of the first grammar, but it is not the same language.
Finally, you ask if there is a mechanical procedure which can create an unambiguous grammar given an ambiguous grammar. The answer is no. It can be proven that there is no algorithm which can even decide whether a particular context-free grammar is ambiguous. Nor is there an algorithm which can decide whether two context-free grammars have the same language. So it shouldn't be surprising that there is no algorithm which can construct an equivalent unambiguous grammar from an ambiguous grammar.
That doesn't mean that the task cannot be done for certain grammars. But it's not a simple mechanical procedure, and it might not work for a particular grammar.

Related

Compilers, finding FIRST set for grammar

I am reading the famous violet dragon book 2nd edition, and can't get the example from page 65, about creating the FIRST set:
We have the following grammar (terminals are bolded):
stmt → expr;
| if ( expr ) stmt
| for ( optexpr ; optexpr ; optexpr ) stmt
| other
optexpr → ε
| expr
And the book suggests that the following is the correct calculation of FIRST:
FIRST(stmt) → {expr, if, for, other} // agree on this
FIRST(expr ;) → {expr} // Where does this come from?
As the comment suggests, from where does the second line come from?

There is no error in the textbook.
The FIRST function is defined (on page 64, emphasis added):
Let α be a string of grammar symbols (terminals and/or nonterminals). We define FIRST(α) to be the set of terminals that appear as the first symbols of one or more strings of terminals generated from α.
In this example, expr ; is a string of grammar symbols consisting of two terminals, so it is a possible value of α. Since it includes no nonterminals, it cannot only generate itself; thus the only string of terminals generated from that value of α is precisely expr ;, and the only terminal that will appear in FIRST(α) is the first symbol in that string, expr.
This may all seem to be belabouring the obvious, but it leads up to the important note just under the example you cite:
The FIRST sets must be considered if there are two productions A → α and A → β. Ignoring ε-productions for the moment, predictive parsing requires FIRST(α) and FIRST(β) to be disjoint.
Since expr ; is one of the possible right-hand sides for stmt, we need to compute its FIRST set (even though the computation in this case is trivial) in order to test this prerequisite.

Can an LL(1) grammar have multiple rules that begin with the same non-terminal?

Given a grammar G defined by
A -> Ca
B -> Cb
C -> e|f
Is this grammar LL(1)?
I realize that we could compress this down into a single line, but that's not the point of this question.
Mainly, can an LL(1) grammar have multiple rules that begin with the same non-terminal?
As a follow up question, how do I construct a parse table for the above grammar?
I've worked out the following:
FIRST(A) = {e,f}
FIRST(B) = {e,f}
FIRST(C) = {a,b}
FOLLOW(A) = {}
FOLLOW(B) = {}
FOLLOW(C) = {a,b}
I looked at this post, but didn't understand how they went from the FIRSTs and FOLLOWs to a table.

The grammar you've given is not LL(1) because there is a FIRST/FIRST conflict between the two productions A → Ca and A → Cb.
In general, grammars with multiple productions for the same nonterminal that begin with the same nonterminal will not be LL(1) because you will get a FIRST/FIRST conflict. There are grammars with this property that are LL(1), though they're essentially degenerate cases. Consider, for example, the following grammar:
A → Ea
A → Eb
E → ε
Here, the nonterminal E only expands out to the empty string ε and therefore, in effect, isn't really there. Therefore, the above grammar is LL(1) because there are no FIRST/FIRST conflicts between the two productions. To see this, here's the parse table:
a b $
A Ea Eb -
E ε ε -
Hope this helps!

I am solve your question in 2 case:
First case if the terminal is {a,b,e,f}
Second case if the terminal is {a,b,f} and e is epsilon
so no multipe entry this is LL(1).
For more information:
look at this explanation and example below:
Best regards

Determining FOLLOW sets of CFG

On this page the author explains how to determine the FOLLOW sets of a CFG. Under the headline Syntax Analysis Goal: FOLLOW Sets he states:
Steps to Make the Follow Set
Conventions: a, b, and c represent a terminal or non-terminal. a*
represents zero or more terminals or non-terminals (possibly both). a+
represents one or more... D is a non-terminal.
Place an End of Input token ($) into the starting rule's follow set.
Suppose we have a rule R → a*Db. Everything in First(b) (except for ε)
is added to Follow(D). If First(b) contains ε then everything in
Follow(R) is put in Follow(D).
Finally, if we have a rule R → a*D,
then everything in Follow(R) is placed in Follow(D).
The Follow set of
a terminal is an empty set.
So far so good. But in the box below this item, we read:
[...] Step 2 on rule 1 (N → V = E) indicates that first(=) is in Follow(V).
Now this is the part I don't understand. When he says that First(=) is in Follow (V), he obviously maps = to b and V to D (b and D from the explication in the first box). But (a*)(D)(b) does not match ()(V)(=)E.
Am I reading this completely wrong, or did the author maybe write a*Db instead of a*Dba*?
(Especially if you read this on wikipedia: "FOLLOW(I) of an Item I [A → α • B β, x] is the set of terminals that can appear after nonterminal B, where α, β are arbitrary symbol strings, and x is an arbitrary lookahead terminal.")

Yes, he meant:
R → a* D b*
and since b* could be zero symbols, i.e. ε, the second rule is unneeded. Remember that FIRST is defined on arbitrary sequences of symbols.
In other words, for:
A → α B β
Every terminal in FIRST(β) is in FOLLOW(B), and
If β ⇒* ε, then everything in FOLLOW(A) is in FOLLOW(B).
Here's what Aho, Sethi & Ullman say in the dragon book:
Formally, we say LR(1) item [A → α·β, a] is valid for a viable prefix γ if there is a derivation S ⇒* δAw ⇒ δαβw
where γ &equals; δα and either a is the first symbol of w or w is ε and a is $.
(The ⇒'s above are marked rm, meaning right-most derivation; in other words, in every derivation step, the right-most non-terminal is substituted with one of its productions. Consequently, w only contains terminals.)
In other words, the LR(1) item is valid (could apply) if we've reached some point where we've decided that A might be the next reduction and a might follow A; at the current point in the parse, we've read α. So if a follows β, then the reduction is possible. We don't yet know that, unless β is the empty sequence, but we need to remember the fact in case it turns out that β can derive the empty sequence.
I hope that helps. It's late here and I'm too tired to check it again. Maybe tomorrow...

How to identify whether a grammar is LL(1), LR(0) or SLR(1)?

How do you identify whether a grammar is LL(1), LR(0), or SLR(1)?
Can anyone please explain it using this example, or any other example?
X → Yz | a
Y → bZ | ε
Z → ε

To check if a grammar is LL(1), one option is to construct the LL(1) parsing table and check for any conflicts. These conflicts can be
FIRST/FIRST conflicts, where two different productions would have to be predicted for a nonterminal/terminal pair.
FIRST/FOLLOW conflicts, where two different productions are predicted, one representing that some production should be taken and expands out to a nonzero number of symbols, and one representing that a production should be used indicating that some nonterminal should be ultimately expanded out to the empty string.
FOLLOW/FOLLOW conflicts, where two productions indicating that a nonterminal should ultimately be expanded to the empty string conflict with one another.
Let's try this on your grammar by building the FIRST and FOLLOW sets for each of the nonterminals. Here, we get that
FIRST(X) = {a, b, z}
FIRST(Y) = {b, epsilon}
FIRST(Z) = {epsilon}
We also have that the FOLLOW sets are
FOLLOW(X) = {$}
FOLLOW(Y) = {z}
FOLLOW(Z) = {z}
From this, we can build the following LL(1) parsing table:
a b z $
X a Yz Yz
Y bZ eps
Z eps
Since we can build this parsing table with no conflicts, the grammar is LL(1).
To check if a grammar is LR(0) or SLR(1), we begin by building up all of the LR(0) configurating sets for the grammar. In this case, assuming that X is your start symbol, we get the following:
(1)
X' -> .X
X -> .Yz
X -> .a
Y -> .
Y -> .bZ
(2)
X' -> X.
(3)
X -> Y.z
(4)
X -> Yz.
(5)
X -> a.
(6)
Y -> b.Z
Z -> .
(7)
Y -> bZ.
From this, we can see that the grammar is not LR(0) because there is a shift/reduce conflicts in state (1). Specifically, because we have the shift item X → .a and Y → ., we can't tell whether to shift the a or reduce the empty string. More generally, no grammar with ε-productions is LR(0).
However, this grammar might be SLR(1). To see this, we augment each reduction with the lookahead set for the particular nonterminals. This gives back this set of SLR(1) configurating sets:
(1)
X' -> .X
X -> .Yz [$]
X -> .a [$]
Y -> . [z]
Y -> .bZ [z]
(2)
X' -> X.
(3)
X -> Y.z [$]
(4)
X -> Yz. [$]
(5)
X -> a. [$]
(6)
Y -> b.Z [z]
Z -> . [z]
(7)
Y -> bZ. [z]
The shift/reduce conflict in state (1) has been eliminated because we only reduce when the lookahead is z, which doesn't conflict with any of the other items.

If you have no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, your grammar is LL(1).
An example of a FIRST/FIRST conflict:
S -> Xb | Yc
X -> a
Y -> a
By seeing only the first input symbol "a", you cannot know whether to apply the production S -> Xb or S -> Yc, because "a" is in the FIRST set of both X and Y.
An example of a FIRST/FOLLOW conflict:
S -> AB
A -> fe | ε
B -> fg
By seeing only the first input symbol "f", you cannot decide whether to apply the production A -> fe or A -> ε, because "f" is in both the FIRST set of A and the FOLLOW set of A (A can be parsed as ε/empty and B as f).
Notice that if you have no epsilon-productions you cannot have a FIRST/FOLLOW conflict.

Simple answer:A grammar is said to be an LL(1),if the associated LL(1) parsing table has atmost one production in each table entry.
Take the simple grammar A -->Aa|b.[A is non-terminal & a,b are terminals]
then find the First and follow sets A.
First{A}={b}.
Follow{A}={$,a}.
Parsing table for Our grammar.Terminals as columns and Nonterminal S as a row element.
a b $
--------------------------------------------
S | A-->a |
| A-->Aa. |
--------------------------------------------
As [S,b] contains two Productions there is a confusion as to which rule to choose.So it is not LL(1).
Some simple checks to see whether a grammar is LL(1) or not.
Check 1: The Grammar should not be left Recursive.
Example: E --> E+T. is not LL(1) because it is Left recursive.
Check 2: The Grammar should be Left Factored.
Left factoring is required when two or more grammar rule choices share a common prefix string.
Example: S-->A+int|A.
Check 3:The Grammar should not be ambiguous.
These are some simple checks.

LL(1) grammar is Context free unambiguous grammar which can be parsed by LL(1) parsers.
In LL(1)
First L stands for scanning input from Left to Right. Second L stands
for Left Most Derivation. 1 stands for using one input symbol at each
step.
For Checking grammar is LL(1) you can draw predictive parsing table. And if you find any multiple entries in table then you can say grammar is not LL(1).
Their is also short cut to check if the grammar is LL(1) or not .
Shortcut Technique

With these two steps we can check if it LL(1) or not.
Both of them have to be satisfied.
1.If we have the production:A->a1|a2|a3|a4|.....|an.
Then,First(a(i)) intersection First(a(j)) must be phi(empty set)[a(i)-a subscript i.]
2.For every non terminal 'A',if First(A) contains epsilon
Then First(A) intersection Follow(A) must be phi(empty set).

Conversion to Chomsky Normal Form

I do need your help.
I have these productions:
1) A--> aAb
2) A--> bAa
3) A--> ε
I should apply the Chomsky Normal Form (CNF).
In order to apply the above rule I should:
eliminate ε producions
eliminate unitary productions
remove useless symbols
Immediately I get stuck. The reason is that A is a nullable symbol (ε is part of its body)
Of course I can't remove the A symbol.
Can anyone help me to get the final solution?

As the Wikipedia notes, there are two definitions of Chomsky Normal Form, which differ in the treatment of ε productions. You will have to pick the one where these are allowed, as otherwise you will never get an equivalent grammar: your grammar produces the empty string, while a CNF grammar following the other definition isn't capable of that.

To begin conversion to Chomsky normal form (using Definition (1) provided by the Wikipedia page), you need to find an equivalent essentially noncontracting grammar. A grammar G with start symbol S is essentially noncontracting iff
1. S is not a recursive variable
2. G has no ε-rules other than S -> ε if ε ∈ L(G)
Calling your grammar G, an equivalent grammar G' with a non-recursive start symbol is:
G' : S -> A
A -> aAb | bAa | ε
Clearly, the set of nullable variables of G' is {S,A}, since A -> ε is a production in G' and S -> A is a chain rule. I assume that you have been given an algorithm for removing ε-rules from a grammar. That algorithm should produce a grammar similar to:
G'' : S -> A | ε
A -> aAb | bAa | ab | ba
The grammar G'' is essentially noncontracting; you can now apply the remaining algorithms to the grammar to find an equivalent grammar in Chomsky normal form.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart