Determining the type of grammar [duplicate]

How do you identify whether a grammar is LL(1), LR(0), or SLR(1)?
Can anyone please explain it using this example, or any other example?
X → Yz | a
Y → bZ | ε
Z → ε

To check if a grammar is LL(1), one option is to construct the LL(1) parsing table and check for any conflicts. These conflicts can be:
FIRST/FIRST conflicts, where two different productions would have to be predicted for the same nonterminal/terminal pair.
FIRST/FOLLOW conflicts, where two different productions are predicted for the same entry: one because the terminal can begin a nonempty expansion of that production, and one because the nonterminal can ultimately expand to the empty string and the terminal can follow it.
FOLLOW/FOLLOW conflicts, where two productions that both let a nonterminal ultimately expand to the empty string conflict with one another.
Let's try this on your grammar by building the FIRST and FOLLOW sets for each of the nonterminals. Here, we get that
FIRST(X) = {a, b, z}
FIRST(Y) = {b, epsilon}
FIRST(Z) = {epsilon}
We also have that the FOLLOW sets are
FOLLOW(X) = {$}
FOLLOW(Y) = {z}
FOLLOW(Z) = {z}
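The sets above can be reproduced mechanically. What follows is a minimal sketch of the usual fixed-point computation, not a definitive implementation; the dictionary encoding of the grammar (an empty tuple for an ε-production) and all function names are assumptions made for illustration.

```python
EPS = "ε"
END = "$"

# Grammar from the question; an empty tuple stands for an ε-production.
GRAMMAR = {
    "X": [("Y", "z"), ("a",)],
    "Y": [("b", "Z"), ()],
    "Z": [()],
}
START = "X"


def first_of_sequence(symbols, first):
    """FIRST of a symbol string, given the FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal's FIRST is itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol was nullable, or the string was empty
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # repeat until no FIRST set grows
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:  # repeat until no FOLLOW set grows
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    rest = first_of_sequence(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # everything after sym can vanish
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, START)
print(FIRST)
print(FOLLOW)
```

Both loops simply re-apply the textbook rules until nothing changes, which is easy to check by hand on a grammar this small.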
From this, we can build the following LL(1) parsing table:
      a        b         z         $
X     X → a    X → Yz    X → Yz
Y              Y → bZ    Y → ε
Z                        Z → ε
Since we can build this parsing table with no conflicts, the grammar is LL(1).
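As a rough sketch of how the table can be filled in and checked for conflicts, the snippet below takes the FIRST set of each production body and the FOLLOW sets as given data (copied from the sets derived above) and reports any entry that holds more than one production. The names `PRODUCTIONS` and `build_ll1_table` are made up for this example.

```python
EPS = "ε"

# (nonterminal, production body, FIRST of that body), using the sets derived above.
PRODUCTIONS = [
    ("X", "Yz", {"b", "z"}),
    ("X", "a", {"a"}),
    ("Y", "bZ", {"b"}),
    ("Y", EPS, {EPS}),
    ("Z", EPS, {EPS}),
]
FOLLOW = {"X": {"$"}, "Y": {"z"}, "Z": {"z"}}


def build_ll1_table(productions, follow):
    table = {}  # (nonterminal, lookahead terminal) -> list of production bodies
    for head, body, body_first in productions:
        lookaheads = body_first - {EPS}
        if EPS in body_first:  # the body can vanish, so predict it on FOLLOW(head)
            lookaheads |= follow[head]
        for terminal in lookaheads:
            table.setdefault((head, terminal), []).append(body)
    conflicts = {cell: bodies for cell, bodies in table.items() if len(bodies) > 1}
    return table, conflicts


TABLE, CONFLICTS = build_ll1_table(PRODUCTIONS, FOLLOW)
for (head, terminal), bodies in sorted(TABLE.items()):
    print(f"[{head}, {terminal}] = {head} -> {' | '.join(bodies)}")
print("LL(1)" if not CONFLICTS else f"not LL(1): {CONFLICTS}")
```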
To check if a grammar is LR(0) or SLR(1), we begin by building up all of the LR(0) configurating sets for the grammar. In this case, assuming that X is your start symbol, we get the following:
(1)
X' -> .X
X -> .Yz
X -> .a
Y -> .
Y -> .bZ
(2)
X' -> X.
(3)
X -> Y.z
(4)
X -> Yz.
(5)
X -> a.
(6)
Y -> b.Z
Z -> .
(7)
Y -> bZ.
From this, we can see that the grammar is not LR(0) because there is a shift/reduce conflict in state (1). Specifically, because we have the shift items X → .a and Y → .bZ alongside the reduce item Y → ., we can't tell whether to shift or to reduce the empty string. More generally, ε-productions very often cause this kind of conflict, so grammars that contain them are rarely LR(0).
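For readers who want to rebuild the configurating sets rather than trust the listing, here is a small sketch of the standard closure/goto construction followed by an LR(0) conflict check. The item encoding (head, body, dot position) and every function name are illustrative assumptions, and the code is hard-wired to this one augmented grammar.

```python
# Augmented grammar; an empty tuple stands for an ε-production.
GRAMMAR = {
    "X'": [("X",)],
    "X": [("Y", "z"), ("a",)],
    "Y": [(), ("b", "Z")],
    "Z": [()],
}


def closure(items):
    """Add X -> .γ for every nonterminal X that appears right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for alt in GRAMMAR[body[dot]]:
                item = (body[dot], alt, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved)


def build_states():
    start = closure({("X'", ("X",), 0)})
    states, work = [start], [start]
    while work:
        state = work.pop()
        for symbol in {b[d] for _, b, d in state if d < len(b)}:
            target = goto(state, symbol)
            if target not in states:
                states.append(target)
                work.append(target)
    return states


def lr0_conflicts(state):
    reduces = [(h, b) for h, b, d in state if d == len(b)]
    shifts = {b[d] for _, b, d in state if d < len(b) and b[d] not in GRAMMAR}
    found = []
    if reduces and shifts:
        found.append("shift/reduce")
    if len(reduces) > 1:
        found.append("reduce/reduce")
    return found


STATES = build_states()
CONFLICTS = {i: c for i, s in enumerate(STATES) if (c := lr0_conflicts(s))}
print(len(STATES), CONFLICTS)
```

The states are numbered from 0 in discovery order, so only index 0 is guaranteed to line up with the listing: it is the initial state, labelled (1) above.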
However, this grammar might be SLR(1). To check, we annotate each item with the FOLLOW set of its left-hand nonterminal, which is the set of lookaheads on which a reduction by that item is allowed. This gives the following SLR(1) configurating sets:
(1)
X' -> .X
X -> .Yz [$]
X -> .a [$]
Y -> . [z]
Y -> .bZ [z]
(2)
X' -> X.
(3)
X -> Y.z [$]
(4)
X -> Yz. [$]
(5)
X -> a. [$]
(6)
Y -> b.Z [z]
Z -> . [z]
(7)
Y -> bZ. [z]
The shift/reduce conflict in state (1) has been eliminated because we only reduce when the lookahead is z, which doesn't conflict with any of the other items.
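The same check can be written down as a short sketch: for each state, collect the shift actions (keyed by the terminal after the dot) and the reduce actions (keyed by each terminal in FOLLOW of the item's head), and flag any terminal with more than one action. The states are copied from the listing above; FOLLOW(X') = {$} for the augmented start symbol and the helper name `slr1_conflicts` are assumptions of this example.

```python
NONTERMINALS = {"X'", "X", "Y", "Z"}
# FOLLOW(X') = {$} is assumed for the augmented start symbol.
FOLLOW = {"X'": {"$"}, "X": {"$"}, "Y": {"z"}, "Z": {"z"}}

# The seven configurating sets listed above, as (head, body, dot position) triples.
# Bodies are strings of one-character symbols; "" is the ε body.
STATES = {
    1: [("X'", "X", 0), ("X", "Yz", 0), ("X", "a", 0), ("Y", "", 0), ("Y", "bZ", 0)],
    2: [("X'", "X", 1)],
    3: [("X", "Yz", 1)],
    4: [("X", "Yz", 2)],
    5: [("X", "a", 1)],
    6: [("Y", "bZ", 1), ("Z", "", 0)],
    7: [("Y", "bZ", 2)],
}


def slr1_conflicts(items):
    actions = {}  # lookahead terminal -> set of possible actions
    for head, body, dot in items:
        if dot == len(body):  # reduce item: allowed on FOLLOW(head)
            for terminal in FOLLOW[head]:
                actions.setdefault(terminal, set()).add(f"reduce {head} -> {body or 'ε'}")
        elif body[dot] not in NONTERMINALS:  # shift item: keyed by the terminal after the dot
            actions.setdefault(body[dot], set()).add("shift")
    return {t: a for t, a in actions.items() if len(a) > 1}


RESULT = {number: slr1_conflicts(items) for number, items in STATES.items()}
print(RESULT)
```

Every state maps to an empty dictionary here, which matches the conclusion that the grammar is SLR(1).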

If you have no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, your grammar is LL(1).
An example of a FIRST/FIRST conflict:
S -> Xb | Yc
X -> a
Y -> a
By seeing only the first input symbol "a", you cannot know whether to apply the production S -> Xb or S -> Yc, because "a" is in the FIRST set of both X and Y.
An example of a FIRST/FOLLOW conflict:
S -> AB
A -> fe | ε
B -> fg
By seeing only the first input symbol "f", you cannot decide whether to apply the production A -> fe or A -> ε, because "f" is in both the FIRST set of A and the FOLLOW set of A (A can be parsed as ε/empty and B as f).
Notice that if you have no epsilon-productions you cannot have a FIRST/FOLLOW conflict.

Simple answer: a grammar is LL(1) if the associated LL(1) parsing table has at most one production in each table entry.
Take the simple grammar A → Aa | b (A is a nonterminal; a and b are terminals),
then find the FIRST and FOLLOW sets of A.
FIRST(A) = {b}
FOLLOW(A) = {$, a}
Parsing table for our grammar, with the terminals as columns and the nonterminal A as the row:
      a     b                 $
A           A → Aa, A → b
Since entry [A, b] contains two productions, the parser cannot tell which rule to choose, so the grammar is not LL(1).
Some simple checks to see whether a grammar is LL(1) or not:
Check 1: The grammar should not be left recursive.
Example: E → E+T is not LL(1) because it is left recursive.
Check 2: The grammar should be left factored.
Left factoring is required when two or more alternatives of a rule share a common prefix.
Example: S → A+int | A.
Check 3: The grammar should not be ambiguous.
A grammar that fails one of these checks is not LL(1); passing all three does not by itself guarantee that it is.
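Checks 1 and 2 are easy to automate in a simplified form. The sketch below only detects immediate left recursion and alternatives that begin with the same symbol, so it is a quick filter rather than a full LL(1) test; the function names and the tuple encoding of alternatives are assumptions made for this example.

```python
def is_immediately_left_recursive(head, alternatives):
    """True if some alternative starts with the nonterminal being defined."""
    return any(len(alt) > 0 and alt[0] == head for alt in alternatives)


def has_common_prefix(alternatives):
    """True if two alternatives start with the same symbol, so left factoring is needed."""
    firsts = [alt[0] for alt in alternatives if alt]
    return len(firsts) != len(set(firsts))


# Alternatives are tuples of grammar symbols.
A_RULES = [("A", "a"), ("b",)]          # A -> Aa | b
S_RULES = [("A", "+", "int"), ("A",)]   # S -> A+int | A

print(is_immediately_left_recursive("A", A_RULES))  # check 1 on A -> Aa | b
print(has_common_prefix(S_RULES))                   # check 2 on S -> A+int | A
```

Indirect left recursion (for example A → Bc with B → Ad) and ambiguity are not covered by this sketch.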

An LL(1) grammar is an unambiguous context-free grammar that can be parsed by an LL(1) parser.
In LL(1), the first L stands for scanning the input from left to right, the second L stands for leftmost derivation, and the 1 stands for using one input symbol of lookahead at each step.
To check whether a grammar is LL(1), you can draw the predictive parsing table. If any entry in the table holds more than one production, the grammar is not LL(1).
There is also a shortcut for checking whether a grammar is LL(1).
Shortcut technique

With these two conditions we can check whether a grammar is LL(1). Both of them have to be satisfied.
1. For every production A → α1 | α2 | ... | αn, FIRST(αi) ∩ FIRST(αj) must be the empty set for all i ≠ j.
2. For every nonterminal A, if FIRST(A) contains ε, then FIRST(A) ∩ FOLLOW(A) must be the empty set.
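The two conditions translate almost directly into code. This is only a sketch: the FIRST set of each alternative and the FOLLOW sets are supplied by hand, `ll1_shortcut` is a made-up name, and the two data sets are the grammar from the question and the FIRST/FOLLOW example S → AB, A → fe | ε, B → fg from an earlier answer, with FOLLOW(S) = FOLLOW(B) = {$} assumed because S is the start symbol.

```python
from itertools import combinations

EPS = "ε"


def ll1_shortcut(alt_firsts, follow):
    """alt_firsts: nonterminal -> list of FIRST sets, one per alternative.
    follow: nonterminal -> FOLLOW set. Returns the list of violations found."""
    problems = []
    for nt, firsts in alt_firsts.items():
        # Condition 1: the alternatives must have pairwise disjoint FIRST sets.
        for fi, fj in combinations(firsts, 2):
            if fi & fj:
                problems.append((nt, "FIRST/FIRST", fi & fj))
        # Condition 2: if A can derive ε, FIRST(A) and FOLLOW(A) must be disjoint.
        first_nt = set().union(*firsts)
        overlap = (first_nt - {EPS}) & follow[nt]
        if EPS in first_nt and overlap:
            problems.append((nt, "FIRST/FOLLOW", overlap))
    return problems


# Grammar from the question: X -> Yz | a, Y -> bZ | ε, Z -> ε
QUESTION = ll1_shortcut(
    {"X": [{"b", "z"}, {"a"}], "Y": [{"b"}, {EPS}], "Z": [{EPS}]},
    {"X": {"$"}, "Y": {"z"}, "Z": {"z"}},
)

# FIRST/FOLLOW example: S -> AB, A -> fe | ε, B -> fg
EXAMPLE = ll1_shortcut(
    {"S": [{"f"}], "A": [{"f"}, {EPS}], "B": [{"f"}]},
    {"S": {"$"}, "A": {"f"}, "B": {"$"}},
)

print(QUESTION)
print(EXAMPLE)
```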

Related

Why this grammar has Reduce/Reduce conflict in LR(0)?

I have the following grammar:
S -> a b D E
S -> A B E F
D -> M x
E -> N y
F -> z
M -> epsilon
N -> epsilon
My textbook says there is a Reduce/Reduce conflict in LR(0). I built a diagram and found out that there is a state:
S -> a b . D E
S -> A B . E F
D -> . M x
E -> . N y
M -> .
N -> .
The textbook says that it's a Reduce/Reduce conflict. I'm trying to figure out why. If I build the SLR table I get the following row (3 is the state above):
That's because:
Follow(M)={x} so we can do reduce to rule 6 from state 3.
Follow(N)={y} so we can do reduce to rule 7 from state 3.
I was taught that there is a conflict S/R if there is a cell with S/R and conflict R/R if there is a cell with R/R. But I don't see two Rs in the same cell in the table. So why is it a reduce/reduce conflict?

You show an SLR(1) parsing table, in which the columns correspond to a lookahead of length 1. It's correct, and there is no conflict.
But here we're talking about an LR(0) machine, in which there is no lookahead. (That's the 0 in LR(0).) The only decision the machine can make is to shift or reduce, and since it cannot use lookahead, it can only use the state itself. A given state must be either a shift state or a reduce state (and, if a reduce state, which production is being reduced).
(In case it's confusing, and it often is, the concept of lookahead does not refer to the use of the shifted symbol to decide which state to transition to. The transition is taken based on the shifted symbol, which is at that point no longer part of the lookahead.)
So in that state, there is no possible shift action; in all items in the itemset, either the dot is at the end or the next symbol is a non-terminal (implying a GOTO action after returning from a reduce).
But the state does not have a unique reduction. Depending on the lookahead, the parser needs to choose between reducing M and reducing N. Since there is no lookahead, the decision cannot be made, and hence it's a reduce/reduce conflict.
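The difference between the two machines can be illustrated with a few lines of code. This sketch encodes the state from the question as (head, body, dot position) triples and uses the FOLLOW sets quoted there; the variable names are invented for the example.

```python
# The state from the question; bodies are tuples of symbols, () is the ε body.
STATE = [
    ("S", ("a", "b", "D", "E"), 2),
    ("S", ("A", "B", "E", "F"), 2),
    ("D", ("M", "x"), 0),
    ("E", ("N", "y"), 0),
    ("M", (), 0),
    ("N", (), 0),
]
FOLLOW = {"M": {"x"}, "N": {"y"}}  # as given in the question

# Complete items (dot at the end) are the possible reductions in this state.
complete = [(head, body) for head, body, dot in STATE if dot == len(body)]

# LR(0): the state alone must pick the action, so two complete items already clash.
lr0_reduce_reduce = len(complete) > 1

# SLR(1): a clash needs two complete items whose FOLLOW sets share a terminal.
slr1_reduce_reduce = any(
    FOLLOW[h1] & FOLLOW[h2]
    for i, (h1, _) in enumerate(complete)
    for (h2, _) in complete[i + 1:]
)

print(complete, lr0_reduce_reduce, slr1_reduce_reduce)
```

The state contains two possible reductions, which is already a conflict for LR(0), while the disjoint FOLLOW sets keep the SLR(1) table free of one.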

SLR parsing conflicts with epsilon production

Consider the following grammar
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA, we can see that there will be a state containing the items
Q -> .tS
Q -> . (ε shown as an empty body)
Since t is in FOLLOW(Q), there appears to be a shift/reduce conflict.
Can we conclude that the grammar isn't SLR(1)?

(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configuring set is sufficient to show that this grammar isn't SLR(1).
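To see where the conflict comes from, FOLLOW(Q) can be worked out as a small fixed point. The constraints in the comments are read off the grammar by hand, assuming S is the start symbol; this is a sketch for this one grammar, not a general FOLLOW algorithm.

```python
# FOLLOW constraints read off S -> aPbSQ | a, Q -> tS | ε (S assumed to be the start symbol):
#   S is the start symbol and is followed by Q in S -> aPbSQ, so FOLLOW(S) contains $ and t
#   Q ends S -> aPbSQ, so FOLLOW(Q) contains FOLLOW(S)
#   S ends Q -> tS,    so FOLLOW(S) contains FOLLOW(Q)
follow = {"S": {"$", "t"}, "Q": set()}
while True:
    before = (set(follow["S"]), set(follow["Q"]))
    follow["Q"] |= follow["S"]
    follow["S"] |= follow["Q"]
    if before == (follow["S"], follow["Q"]):
        break  # nothing grew, so the fixed point is reached

shift_terminals = {"t"}          # from the item Q -> .tS
reduce_lookaheads = follow["Q"]  # from the item Q -> .
conflict_on = shift_terminals & reduce_lookaheads
print(follow["Q"], conflict_on)
```

The terminal t ends up in both sets, so the SLR(1) table cell for that state and t would hold both a shift and a reduce.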

How do I rewrite a context free grammar so that it is LR(1)?

For the given context free grammar:
S -> G $
G -> PG | P
P -> id : R
R -> id R | epsilon
How do I rewrite the grammar so that it is LR(1)?
The current grammar has shift/reduce conflicts when parsing the input "id : .id", where "." is the input pointer for the parser.
This grammar produces the language satisfying the regular expression (id:(id)*)+

It's easy enough to produce an LR(1) grammar for the same language. The trick is finding one which has a similar parse tree, or at least from which the original parse tree can be recovered easily.
Here's a manually generated grammar, which is slightly simplified from the general algorithm. In effect, we rewrite the regular expression:
(id:id*)+
to:
id(:id+)*:id*
which induces the grammar:
S → id G $
G → P G | P'
P' → : R'
P → : R
R' → ε | id R'
R → ε | id R
which is LALR(1).
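The regular-expression rewrite above can be spot-checked by brute force. The sketch below encodes each token as one character ("i" for id, ":" for the colon, which is an encoding chosen for this example) and compares the two expressions on every token string up to a given length. It checks only the regular expressions, not the grammar derived from them.

```python
import re
from itertools import product

# One character per token: "i" stands for id, ":" for the colon.
ORIGINAL = re.compile(r"(?:i:i*)+")       # (id:id*)+
REWRITTEN = re.compile(r"i(?::i+)*:i*")   # id(:id+)*:id*


def disagreements(max_len):
    """Token strings of length <= max_len on which the two expressions disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("i:", repeat=n):
            s = "".join(chars)
            if bool(ORIGINAL.fullmatch(s)) != bool(REWRITTEN.fullmatch(s)):
                bad.append(s)
    return bad


print(disagreements(10))
```

An empty result for a bounded length is evidence rather than proof; the equivalence itself follows from the identity (ab)*a = a(ba)*.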
In effect, we've just shifted all the productions one token to the right, and there is a general algorithm which can be used to create an LR(1) grammar from an LR(k+1) grammar for any k≥1. (The version of this algorithm I'm using comes from Parsing Theory by S. Sippu & E. Soisalon-Soininen, Vol II, section 6.7.)
The non-terminals of the new grammar will have the form (x, V, y) where V is a symbol from the original grammar (either a terminal or a non-terminal) and x and y are terminal sequences of maximum length k such that:
y ∈ FOLLOW_k(V)
x ∈ FIRST_k(Vy)
(The lengths of y and consequently x might be less than k if the end of input is included in the follow set. Some people avoid this issue by adding k end symbols, but I think this version is just as simple.)
A non-terminal (x, V, y) will generate the x-derivative of the strings derived from Vy from the original grammar. Informally, the entire grammar is shifted k tokens to the right; each non-terminal matches a string which is missing the first k tokens but is augmented with the following k tokens.
The productions are generated mechanically from the original productions. First, we add a new start symbol, S' with productions:
S' → x (x, S, ε)
for every x ∈ FIRST_k(S). Then, for every production
T → V_0 V_1 … V_m
we generate the set of productions:
(x_0, T, x_{m+1}) → (x_0, V_0, x_1) (x_1, V_1, x_2) … (x_m, V_m, x_{m+1})
and for every terminal A we generate the set of productions
(Ax,A,xB) → B if |x| = k
(Ax,A,x) → ε if |x| ≤ k
Since there is an obvious homomorphism from the productions in the new grammar to the productions in the old grammar, we can directly create the original parse tree, although we need to play some tricks with the semantic values in order to correctly attach them to the parse tree.

Conversion to Chomsky Normal Form

I do need your help.
I have these productions:
1) A--> aAb
2) A--> bAa
3) A--> ε
I should convert them to Chomsky Normal Form (CNF).
To do that, I should:
eliminate ε-productions
eliminate unit productions
remove useless symbols
I get stuck immediately, because A is a nullable symbol (ε is one of its bodies).
Of course I can't remove the symbol A.
Can anyone help me get to the final solution?

As Wikipedia notes, there are two definitions of Chomsky Normal Form, which differ in their treatment of ε-productions. You will have to pick the one that allows them, as otherwise you will never get an equivalent grammar: your grammar produces the empty string, while a CNF grammar following the other definition isn't capable of that.

To begin conversion to Chomsky normal form (using Definition (1) provided by the Wikipedia page), you need to find an equivalent essentially noncontracting grammar. A grammar G with start symbol S is essentially noncontracting iff
1. S is not a recursive variable
2. G has no ε-rules other than S -> ε if ε ∈ L(G)
Calling your grammar G, an equivalent grammar G' with a non-recursive start symbol is:
G' : S -> A
     A -> aAb | bAa | ε
Clearly, the set of nullable variables of G' is {S,A}, since A -> ε is a production in G' and S -> A is a chain rule. I assume that you have been given an algorithm for removing ε-rules from a grammar. That algorithm should produce a grammar similar to:
G'' : S -> A | ε
      A -> aAb | bAa | ab | ba
The grammar G'' is essentially noncontracting; you can now apply the remaining algorithms to the grammar to find an equivalent grammar in Chomsky normal form.
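The step from G' to G'' can be sketched as follows. This is one common way to remove ε-rules, assuming a non-recursive start symbol as in G'; the string encoding (uppercase letters for variables, an empty string for ε) and the function names are choices made for this example only.

```python
from itertools import product

# G' from above; an empty string stands for ε. Uppercase letters are variables.
RULES = {"S": ["A"], "A": ["aAb", "bAa", ""]}
START = "S"


def nullable_variables(rules):
    """Variables that can derive ε, found by iterating to a fixed point."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in rules.items():
            if head not in nullable and any(all(c in nullable for c in b) for b in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon_rules(rules, start):
    nullable = nullable_variables(rules)
    result = {}
    for head, bodies in rules.items():
        new_bodies = set()
        for body in bodies:
            # For each nullable occurrence, either keep the symbol or drop it.
            options = [(c, "") if c in nullable else (c,) for c in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:  # never emit a new ε-rule here
                    new_bodies.add(candidate)
        result[head] = new_bodies
    if start in nullable:
        result[start].add("")  # keep S -> ε because ε is in the language
    return result


G2 = remove_epsilon_rules(RULES, START)
print(nullable_variables(RULES))
print(G2)
```

The chain rule S -> A is still present in the result, so unit-production elimination and the remaining CNF steps have to be applied afterwards.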
