SLR(1) parser with epsilon - parsing

let's suppose I have the following grammar:
E --> TX
T --> (E) | int Y
X --> + E | ε
Y --> * T | ε
Building the item sets I get a state like this one:
T --> int . Y
Y --> . * T
Y --> .
This state is adequate or not? That is, the grammar is SLR(1) or not?
Thanks

Yes the entries specified by you in the state absolutely correct.
T->int.Y
Y->.*T
Y->.
This is the 5th state in the DFA created for SLR(1) parser for given grammar.
Confusion may have arise in Y->Ɛ . When you place a dot in the augmented productions for example S->A.B it means that A is completed and B is yet to be completed (by completion here means progress in parsing) . Similarly if you write Y->.Ɛ , it means Ɛ is yet to be over, but we also know that Ɛ is null string i.e nothing therefore Y->.Ɛ is interpreted as Y->.
I created the DFA (13 states) for this grammar and found that the given grammar is SLR(1) as there is no Reduce-Reduce or Shift-Reduce conflict.

you have to construct the FOLLOW sets and see if FOLLOW(Y) contains int or *. if that was the case there'd be a shift/reduce conflict and the grammar wouldn't be SLR(1).
check all the states and if none has conflicts the grammar is SLR(1).

Related

SLR parsing conflicts with epsilon production

Consider the following grammar
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA we can see there shall be a state which contains Items
Q -> .tS
Q -> . (epsilon as a blank string)
since t is in follow(Q) there appears to be a shift - reduce conflict.
Can we conclude the nature of the grammar isn't SLR(1) ?
(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configuring set is sufficient to show that this grammar isn't SLR(1).

LR(0) or SLR(1) or LALR(1)

I am badly stuck on a question i am attempting from a sample final exam of compilers. I will really appreciate if someone can help me out with an explanation. Thanks
Consider the grammar G listed below
S = E $
E = E + T | T
T = T * F | F
F = ident | ( E )
Where + * ident ( ) are terminal symbols and $ is end of file.
a) is this grammar LR( 0 )? Justify your answer.
b) is the grammar SLR( 1 ) ? Justify your answer.
c) is this grammar LALR( 1 )? Justify your answer.
If you can show that the grammar is LR(0) then of course it is SLR(1) and LALR(1) because LR(0) is more restrictive.
Unfortunately, the grammar isn't LR(0).
For instance suppose you have just recognized E:
S -> E . $
You must not reduce this E to S if what follows is a + or * symbol, because E can be followed by + or * which continue to build a larger expression:
S -> E . $
E -> E . + T
T -> T . * F
This requires us to look ahead one token to know what to do in that state: to shift (+ or *) or reduce ($).
SLR(1) adds lookahead, and makes use of the follow-set information to make reductions (better than nothing, but the follow-set information globally obtained from the grammar is not context sensitive, like the state-specific lookahead sets in LALR(1)).
Under SLR(1), the above conflict goes away, because the S -> E reduction is considered only when the lookahead symbol is in the follow set of S, and the only thing in the follow set of S is the EOF symbol $. If the input symbol is not $, like +, then the reduction is not considered; a shift takes place which doesn't conflict with the reduction.
So the grammar does not fail to be SLR(1) on account of that conflict. It might, however, have some other conflict. Glancing through it, I can't see one; but to "justify that answer" properly, you have to generate all of the LR(0) state items, and go through the routine of verifying that the SLR(1) constraints are not violated. (You use the simple LR(0) items for SLR(1) because SLR(1) doesn't augment these items in any new way. Remember, it just uses the follow-set information cribbed from the grammar to eliminate conflicts.)
If it is SLR(1) then LALR(1) falls by subset relationship.
Update
The Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman, 1988) uses exactly the same grammar in a set of examples that show the derivation of the canonical LR(0) item sets and the associated DFA, and some of the steps of filling in the parsing tables. This is in section 4.7, starting with example 4.34.

How to identify whether a grammar is LL(1), LR(0) or SLR(1)?

How do you identify whether a grammar is LL(1), LR(0), or SLR(1)?
Can anyone please explain it using this example, or any other example?
X → Yz | a
Y → bZ | ε
Z → ε
To check if a grammar is LL(1), one option is to construct the LL(1) parsing table and check for any conflicts. These conflicts can be
FIRST/FIRST conflicts, where two different productions would have to be predicted for a nonterminal/terminal pair.
FIRST/FOLLOW conflicts, where two different productions are predicted, one representing that some production should be taken and expands out to a nonzero number of symbols, and one representing that a production should be used indicating that some nonterminal should be ultimately expanded out to the empty string.
FOLLOW/FOLLOW conflicts, where two productions indicating that a nonterminal should ultimately be expanded to the empty string conflict with one another.
Let's try this on your grammar by building the FIRST and FOLLOW sets for each of the nonterminals. Here, we get that
FIRST(X) = {a, b, z}
FIRST(Y) = {b, epsilon}
FIRST(Z) = {epsilon}
We also have that the FOLLOW sets are
FOLLOW(X) = {$}
FOLLOW(Y) = {z}
FOLLOW(Z) = {z}
From this, we can build the following LL(1) parsing table:
a b z $
X a Yz Yz
Y bZ eps
Z eps
Since we can build this parsing table with no conflicts, the grammar is LL(1).
To check if a grammar is LR(0) or SLR(1), we begin by building up all of the LR(0) configurating sets for the grammar. In this case, assuming that X is your start symbol, we get the following:
(1)
X' -> .X
X -> .Yz
X -> .a
Y -> .
Y -> .bZ
(2)
X' -> X.
(3)
X -> Y.z
(4)
X -> Yz.
(5)
X -> a.
(6)
Y -> b.Z
Z -> .
(7)
Y -> bZ.
From this, we can see that the grammar is not LR(0) because there is a shift/reduce conflicts in state (1). Specifically, because we have the shift item X → .a and Y → ., we can't tell whether to shift the a or reduce the empty string. More generally, no grammar with ε-productions is LR(0).
However, this grammar might be SLR(1). To see this, we augment each reduction with the lookahead set for the particular nonterminals. This gives back this set of SLR(1) configurating sets:
(1)
X' -> .X
X -> .Yz [$]
X -> .a [$]
Y -> . [z]
Y -> .bZ [z]
(2)
X' -> X.
(3)
X -> Y.z [$]
(4)
X -> Yz. [$]
(5)
X -> a. [$]
(6)
Y -> b.Z [z]
Z -> . [z]
(7)
Y -> bZ. [z]
The shift/reduce conflict in state (1) has been eliminated because we only reduce when the lookahead is z, which doesn't conflict with any of the other items.
If you have no FIRST/FIRST conflicts and no FIRST/FOLLOW conflicts, your grammar is LL(1).
An example of a FIRST/FIRST conflict:
S -> Xb | Yc
X -> a
Y -> a
By seeing only the first input symbol "a", you cannot know whether to apply the production S -> Xb or S -> Yc, because "a" is in the FIRST set of both X and Y.
An example of a FIRST/FOLLOW conflict:
S -> AB
A -> fe | ε
B -> fg
By seeing only the first input symbol "f", you cannot decide whether to apply the production A -> fe or A -> ε, because "f" is in both the FIRST set of A and the FOLLOW set of A (A can be parsed as ε/empty and B as f).
Notice that if you have no epsilon-productions you cannot have a FIRST/FOLLOW conflict.
Simple answer:A grammar is said to be an LL(1),if the associated LL(1) parsing table has atmost one production in each table entry.
Take the simple grammar A -->Aa|b.[A is non-terminal & a,b are terminals]
then find the First and follow sets A.
First{A}={b}.
Follow{A}={$,a}.
Parsing table for Our grammar.Terminals as columns and Nonterminal S as a row element.
a b $
--------------------------------------------
S | A-->a |
| A-->Aa. |
--------------------------------------------
As [S,b] contains two Productions there is a confusion as to which rule to choose.So it is not LL(1).
Some simple checks to see whether a grammar is LL(1) or not.
Check 1: The Grammar should not be left Recursive.
Example: E --> E+T. is not LL(1) because it is Left recursive.
Check 2: The Grammar should be Left Factored.
Left factoring is required when two or more grammar rule choices share a common prefix string.
Example: S-->A+int|A.
Check 3:The Grammar should not be ambiguous.
These are some simple checks.
LL(1) grammar is Context free unambiguous grammar which can be parsed by LL(1) parsers.
In LL(1)
First L stands for scanning input from Left to Right. Second L stands
for Left Most Derivation. 1 stands for using one input symbol at each
step.
For Checking grammar is LL(1) you can draw predictive parsing table. And if you find any multiple entries in table then you can say grammar is not LL(1).
Their is also short cut to check if the grammar is LL(1) or not .
Shortcut Technique
With these two steps we can check if it LL(1) or not.
Both of them have to be satisfied.
1.If we have the production:A->a1|a2|a3|a4|.....|an.
Then,First(a(i)) intersection First(a(j)) must be phi(empty set)[a(i)-a subscript i.]
2.For every non terminal 'A',if First(A) contains epsilon
Then First(A) intersection Follow(A) must be phi(empty set).

Conversion to Chomsky Normal Form

I do need your help.
I have these productions:
1) A--> aAb
2) A--> bAa
3) A--> ε
I should apply the Chomsky Normal Form (CNF).
In order to apply the above rule I should:
eliminate ε producions
eliminate unitary productions
remove useless symbols
Immediately I get stuck. The reason is that A is a nullable symbol (ε is part of its body)
Of course I can't remove the A symbol.
Can anyone help me to get the final solution?
As the Wikipedia notes, there are two definitions of Chomsky Normal Form, which differ in the treatment of ε productions. You will have to pick the one where these are allowed, as otherwise you will never get an equivalent grammar: your grammar produces the empty string, while a CNF grammar following the other definition isn't capable of that.
To begin conversion to Chomsky normal form (using Definition (1) provided by the Wikipedia page), you need to find an equivalent essentially noncontracting grammar. A grammar G with start symbol S is essentially noncontracting iff
1. S is not a recursive variable
2. G has no ε-rules other than S -> ε if ε ∈ L(G)
Calling your grammar G, an equivalent grammar G' with a non-recursive start symbol is:
G' : S -> A
A -> aAb | bAa | ε
Clearly, the set of nullable variables of G' is {S,A}, since A -> ε is a production in G' and S -> A is a chain rule. I assume that you have been given an algorithm for removing ε-rules from a grammar. That algorithm should produce a grammar similar to:
G'' : S -> A | ε
A -> aAb | bAa | ab | ba
The grammar G'' is essentially noncontracting; you can now apply the remaining algorithms to the grammar to find an equivalent grammar in Chomsky normal form.

Shift-reduce: when to stop reducing?

I'm trying to learn about shift-reduce parsing. Suppose we have the following grammar, using recursive rules that enforce order of operations, inspired by the ANSI C Yacc grammar:
S: A;
P
: NUMBER
| '(' S ')'
;
M
: P
| M '*' P
| M '/' P
;
A
: M
| A '+' M
| A '-' M
;
And we want to parse 1+2 using shift-reduce parsing. First, the 1 is shifted as a NUMBER. My question is, is it then reduced to P, then M, then A, then finally S? How does it know where to stop?
Suppose it does reduce all the way to S, then shifts '+'. We'd now have a stack containing:
S '+'
If we shift '2', the reductions might be:
S '+' NUMBER
S '+' P
S '+' M
S '+' A
S '+' S
Now, on either side of the last line, S could be P, M, A, or NUMBER, and it would still be valid in the sense that any combination would be a correct representation of the text. How does the parser "know" to make it
A '+' M
So that it can reduce the whole expression to A, then S? In other words, how does it know to stop reducing before shifting the next token? Is this a key difficulty in LR parser generation?
Edit: An addition to the question follows.
Now suppose we parse 1+2*3. Some shift/reduce operations are as follows:
Stack | Input | Operation
---------+-------+----------------------------------------------
| 1+2*3 |
NUMBER | +2*3 | Shift
A | +2*3 | Reduce (looking ahead, we know to stop at A)
A+ | 2*3 | Shift
A+NUMBER | *3 | Shift (looking ahead, we know to stop at M)
A+M | *3 | Reduce (looking ahead, we know to stop at M)
Is this correct (granted, it's not fully parsed yet)? Moreover, does lookahead by 1 symbol also tell us not to reduce A+M to A, as doing so would result in an inevitable syntax error after reading *3 ?
The problem you're describing is an issue with creating LR(0) parsers - that is, bottom-up parsers that don't do any lookahead to symbols beyond the current one they are parsing. The grammar you've described doesn't appear to be an LR(0) grammar, which is why you run into trouble when trying to parse it w/o lookahead. It does appear to be LR(1), however, so by looking 1 symbol ahead in the input you could easily determine whether to shift or reduce. In this case, an LR(1) parser would look ahead when it had the 1 on the stack, see that the next symbol is a +, and realize that it shouldn't reduce past A (since that is the only thing it could reduce to that would still match a rule with + in the second position).
An interesting property of LR grammars is that for any grammar which is LR(k) for k>1, it is possible to construct an LR(1) grammar which is equivalent. However, the same does not extend all the way down to LR(0) - there are many grammars which cannot be converted to LR(0).
See here for more details on LR(k)-ness:
http://en.wikipedia.org/wiki/LR_parser
I'm not exactly sure of the Yacc / Bison parsing algorithm and when it prefers shifting over reducing, however I know that Bison supports LR(1) parsing which means it has a lookahead token. This means that tokens aren't passed to the stack immediately. Rather they wait until no more reductions can happen. Then, if shifting the next token makes sense it applies that operation.
First of all, in your case, if you're evaluating 1 + 2, it will shift 1. It will reduce that token to an A because the '+' lookahead token indicates that its the only valid course. Since there are no more reductions, it will shift the '+' token onto the stack and hold 2 as the lookahead. It will shift the 2 and reduce to an M since A + M produces an A and the expression is complete.

Resources