Conflict in CLR parsing

This is the grammar:
S' -> S
S -> aBc | bCc | aCd | bBd
B -> e
C -> e
I built the CLR parsing table for it and a reduce/reduce conflict arose. What should I do next? I have attached my worked solution below.

Could anybody please tell me what to do next?
Err... fix the conflict?
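Look at the last two productions and at what the parser sees after e. In a canonical LR(1) construction the state reached after a e holds B -> e . {c} and C -> e . {d}, and the state reached after b e holds C -> e . {c} and B -> e . {d}, so those two states are conflict-free and it is worth rechecking the lookaheads in your item sets. The reduce/reduce conflict appears once the two states share their lookaheads, which is what happens when they are merged (LALR(1)) or when FOLLOW sets are used (SLR(1)):
B -> e . {c, d}
C -> e . {c, d}
In that state a single lookahead is not enough to determine whether e should reduce to B or to C.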
Parser generators usually resolve such a conflict by picking the production that appears first in the grammar, but that is not always a good choice. In the grammar above, the parser would then be unable to parse bec and aed, because e would always reduce to B.
I suggest changing the grammar so that no conflict occurs. The whole grammar can only produce aec, bec, aed and bed, so look at which parts of those sequences could be split into separate productions that reduce uniquely.
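To see where the conflict comes from, the item sets can be computed mechanically. The following is a minimal sketch of LR(1) closure and goto for this grammar. The tuple encoding of items and the helper names (first, closure, goto, reduce_conflicts) are choices made for this illustration, and first is only valid because the grammar has no empty productions and no left recursion. Under those assumptions, the states reached after a e and after b e come out conflict-free individually, and a reduce/reduce conflict on c and d appears only in their union, which is the state an LALR(1) generator would build by merging them.

```python
# Items are tuples (lhs, rhs, dot_position, lookahead); hypothetical encoding.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("a", "B", "c"), ("b", "C", "c"), ("a", "C", "d"), ("b", "B", "d")],
    "B": [("e",)],
    "C": [("e",)],
}

def first(symbol):
    # FIRST of a single symbol. Sufficient here only because the grammar
    # has no empty productions and no left recursion (assumption).
    if symbol not in GRAMMAR:
        return {symbol}
    return set().union(*(first(rhs[0]) for rhs in GRAMMAR[symbol]))

def closure(items):
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, lookahead = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue
        rest = rhs[dot + 1:]
        # Lookaheads of the new items: FIRST(rest followed by lookahead).
        lookaheads = first(rest[0]) if rest else {lookahead}
        for production in GRAMMAR[rhs[dot]]:
            for la in lookaheads:
                item = (rhs[dot], production, 0, la)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return frozenset(items)

def goto(state, symbol):
    moved = {(l, r, d + 1, la) for (l, r, d, la) in state
             if d < len(r) and r[d] == symbol}
    return closure(moved)

def reduce_conflicts(state):
    # Map each lookahead to the nonterminals that could be reduced on it
    # and keep only the lookaheads with more than one candidate.
    by_lookahead = {}
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):
            by_lookahead.setdefault(la, set()).add(lhs)
    return {la: lhss for la, lhss in by_lookahead.items() if len(lhss) > 1}

start = closure({("S'", ("S",), 0, "$")})
after_ae = goto(goto(start, "a"), "e")
after_be = goto(goto(start, "b"), "e")

print(reduce_conflicts(after_ae))             # {} : no conflict
print(reduce_conflicts(after_be))             # {} : no conflict
print(reduce_conflicts(after_ae | after_be))  # B and C clash on both c and d
```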

Related

Is it possible that FIRST SET contains same terminal more than one time

I am confused about whether a FIRST set can contain the same terminal twice.
For example, I have this grammar:
E->T+E|T FIRST(E)={a,a}
T->a FIRST(T)={a}
..
Is this correct, or should I write
FIRST(E)={a}
By definition, sets cannot contain the same element multiple times. This applies to FIRST sets as much as to any other set, so {a} is the proper way to write it.
I guess you're trying to compute the FIRST and FOLLOW sets in order to construct the final predictive table. Generally, though, you need to resolve all the conflicts first, which are:
ε-derivation
Direct left recursion
Indirect left recursion
Ambiguous prefixes
In your example (or the part of it shown, I guess), you need to factor out the ambiguous prefix, the T:
E -> T E'
E' -> + E | ε
T -> a
Formally, for any non-terminal with derivation rules of the form A → αβ | αγ:
1- Remove these 2 derivation rules
2- Create a rule A′ → β | γ
3- Create a rule A → α A′
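The three steps can be sketched in code roughly as follows. This is an illustrative single round of left factoring for one non-terminal, not a complete algorithm: the function names, the tuple encoding of alternatives (with the empty tuple standing for ε) and the priming scheme for fresh names are all assumptions of the sketch.

```python
def common_prefix(alternatives):
    # Longest sequence of symbols shared by the start of every alternative.
    prefix = []
    for column in zip(*alternatives):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)

def left_factor(nonterminal, alternatives):
    """One round of left factoring: A -> αβ | αγ becomes A -> α A', A' -> β | γ."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)  # group by leading symbol
    rules = {nonterminal: []}
    fresh = nonterminal + "'"
    for head, group in groups.items():
        if len(group) == 1 or head == ():
            rules[nonterminal].extend(group)        # nothing to factor
            continue
        alpha = common_prefix(group)
        rules[nonterminal].append(alpha + (fresh,))           # A  -> α A'
        rules[fresh] = [alt[len(alpha):] for alt in group]    # A' -> β | γ
        fresh += "'"
    return rules

# E -> T + E | T  becomes  E -> T E' and E' -> + E | ε
print(left_factor("E", [("T", "+", "E"), ("T",)]))
```

A full implementation would repeat this until no two alternatives of any non-terminal share a leading symbol.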
Check out this paper about conflicts; it was very helpful for me. You might also check this slide and this one if you have any problems with top-down parsing.

Is this grammar LR(2) and how can i determine it?

To determine whether my parser is working correctly, I need to find an LR(2+) grammar. After some quick research I found this grammar, and I believe it is LR(2). However, I am not sure how to determine this.
Terminals: b, e, o, r, s
NonTerminals: A, B, E, Q, SL
Start: P
Productions:
P -> A
A -> E B SL E | b e
B -> b | o r
E -> e | ε
SL -> s SL | s
I would be glad if someone could confirm or deny that this grammar is LR(2) and, ideally, give me a brief explanation of how to determine it myself.
Thank you very much!
I'm pretty sure it's LR(2), but I don't have an LR(2) parser generator handy to test it, which would be the definitive way to do the test. Of course, you could generate the parser tables by hand. It's not that complicated a grammar, so it shouldn't take you too long.
It's certainly not LR(1), as can be seen from the pair of inputs:
b e
b s e
The left-most derivations are:
P->A->b e
P->A->E B SL E->B SL E->b SL E->b s E->b s e
So at the beginning of the parse, the parser can either shift a b in order to follow the first derivation chain or reduce an empty sequence to E in order to proceed with the second derivation chain. The second token is needed to choose between these two options, hence a lookahead of at least 2 is required.
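The argument about the first parser decision can be checked by brute force. The sketch below enumerates a finite sample of sentences of the grammar together with the first action an LR parser must take, then groups the actions by the first k tokens of input. The function names, the action labels and the cap on the number of s tokens are assumptions made for the illustration. It only examines the very first decision, so it supports the "not LR(1)" claim but is not a proof that the grammar is LR(2).

```python
from itertools import product

def sentences(max_s=3):
    """Yield (tokens, first_action) for a finite sample of the language."""
    yield ("b", "e"), "shift"                      # A -> b e
    leads = [(), ("e",)]                           # E -> ε | e
    bodies = [("b",), ("o", "r")]                  # B -> b | o r
    tails = [(), ("e",)]                           # E -> ε | e
    for lead, body, n, tail in product(leads, bodies, range(1, max_s + 1), tails):
        # If the leading E is empty, the parser must reduce ε to E first.
        action = "shift" if lead else "reduce E -> ε"
        yield lead + body + ("s",) * n + tail, action   # A -> E B SL E

def actions_by_prefix(k):
    table = {}
    for tokens, action in sentences():
        table.setdefault(tokens[:k], set()).add(action)
    return table

for k in (1, 2):
    ambiguous = {p: a for p, a in actions_by_prefix(k).items() if len(a) > 1}
    print(k, ambiguous)   # k=1: prefix ('b',) needs both actions; k=2: none
```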
As a side note, it should be pretty simple to mine StackOverflow for LR(2) grammars; they come up from time to time in questions. Here are a few I found by searching for LALR(2). (I used a Google search with site:stackoverflow.com because SO's own search engine doesn't do well with search patterns which aren't words. Not that Google does it well, but it does do it better.)
Solving bison conflict over 2nd lookahead
Solving small shift reduce conflict
Persistent Shift - Reduce Conflict in Goldparser
How to reduce parser stack or 'unshift' the current token depending on what follows?
I didn't verify the claims in those questions and answers, and there are other questions which didn't seem to have as clear a result.
The most classic LALR(2) grammar is the grammar for Yacc itself, which is pretty ironic. Here's a simplified version:
grammar: %empty | grammar production
production: ID ':' symbols
symbols: %empty | symbols symbol
symbol: ID | QUOTED_LITERAL
That simple grammar leaves out actions and the optional semicolon. But it captures the essence of the LALR(2)-ness of the grammar, which is precisely the result of the semicolon being optional. That's not a complaint; the grammar is unambiguous so the semicolon really is redundant and no-one should be forced to type a redundant token :-)

Removing Ambiguity Caused By Dangling Else For LL(1) Grammars

In the case of the dangling else problem for compiler design, is there a reason to left factor it before removing ambiguity?
We are transforming a CFG into an LL(1) grammar so my professor is asking us to first eliminate recursion, then left factor, then remove ambiguity from our grammar. But, from what I've read, ambiguity is usually eliminated first. I'm not sure how to remove ambiguity after left factoring.
This is what I got after left factoring it:
S -> i E t S S' | other
S' -> e S | epsilon
However, as I understand it, removing the ambiguity requires rewriting the grammar, so the result will always look similar to this, right?
S -> U | M
M -> i E t M e M | other
U -> i E t U'
U' -> M e U | S
Or is there another way to do it? As far as I can see, this is the only way to remove ambiguity from the dangling else.
As it turns out, a good way to deal with the ambiguity caused by a dangling else in an LL(1) grammar is to handle it in the parser. Rewriting the grammar is another way to handle it, as is adding 'begin' and 'end' to the grammar like so:
S -> i E t a S z S' | other
S' -> e S | epsilon
Although it might be intuitive for some, for other beginners, this is what the symbols mean:
S: Statement
i: if
E: Expression
t: then
a: begin
z: end
S': Statement'
e: else
other: any other productions
Note: lower case letters represent terminals; Uppercase letters represent variables.
If anything is wrong, please let me know and I'll correct it.
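Handling the dangling else in the parser usually means that, when the next token is e, the parser always chooses S' -> e S over S' -> epsilon, so each else attaches to the nearest unmatched if. Here is a minimal recursive-descent sketch of that idea for the left-factored grammar S -> i E t S S' | other, S' -> e S | epsilon. The function name, the tree shape and the treatment of E as a single token are assumptions of the sketch.

```python
def parse_statement(tokens, pos=0):
    """Parse S -> i E t S S' | other, with S' -> e S | epsilon.

    Returns (tree, next_position). E is treated as one opaque token.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "other":
        return "other", pos + 1
    if tokens[pos:pos + 3] != ["i", "E", "t"]:
        raise SyntaxError(f"unexpected token at position {pos}")
    then_branch, pos = parse_statement(tokens, pos + 3)
    # S': whenever an 'e' follows, take S' -> e S rather than epsilon.
    # This greedy choice is what resolves the dangling else.
    if pos < len(tokens) and tokens[pos] == "e":
        else_branch, pos = parse_statement(tokens, pos + 1)
        return ("if", then_branch, else_branch), pos
    return ("if", then_branch, None), pos

tokens = "i E t i E t other e other".split()
tree, end = parse_statement(tokens)
print(tree)  # ('if', ('if', 'other', 'other'), None)
```

The else ends up inside the inner if, and the outer if has no else branch.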
I think this can be a possible answer:
[After left factoring and making it unambiguous]
Let other = a
S -> iEtT | a
T -> S | aeS
I generate all the ifs first and associate each else with the most recent unassociated if.
To get an else, I have to eliminate the possibility of a new if appearing between the current unassociated if and its corresponding else.
However, I still allow an if to appear after the corresponding else has been generated.
Please point out any errors.
Thank you.

Is this grammar LR(1)?

I'm a bit confused about whether this grammar is ambiguous or not:
C' -> C
C -> d C u C
C -> d C
C -> ε
I tried building the DFA for this but I get this in one of the states:
C -> d C DOT u C, $
C -> d C DOT, $
Isn't this a shift-reduce conflict, so surely it means the grammar is not LR(1)? Or does it reduce regardless since $ and u are both in the follow set of C?
It does have a shift-reduce conflict. Here's the state machine produced by selecting shift. The conflict is in state 4.
I should point out that your question is a bit off. A grammar can be unambiguous and still not LR(1).
But this one happens to be provably ambiguous. Consider the string ddudu. Two leftmost derivations are
C'->C->dCuC->ddCuCuC->dduCuC->ddudCuC->dduduC->ddudu
C'->C->dCuC->ddCuC->dduC->ddudCuC->dduduC->ddudu
The existence of these says the grammar is ambiguous.
Deciding whether an arbitrary grammar is ambiguous is an undecidable problem: there can be no algorithm that does it for every grammar. Happily, this one is not so hard to sort out.
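For a specific string, the number of parse trees can be counted directly. The sketch below does this for the grammar C -> d C u C | d C | ε with a memoized recursion over substrings; the function names are made up for the illustration. Any count above 1 demonstrates ambiguity for that string.

```python
from functools import lru_cache

def count_parses(s):
    """Number of distinct parse trees deriving s from C."""
    @lru_cache(maxsize=None)
    def count(i, j):
        total = 1 if i == j else 0                 # C -> ε
        if i < j and s[i] == "d":
            total += count(i + 1, j)               # C -> d C
            for k in range(i + 1, j):
                if s[k] == "u":                    # C -> d C u C, split at this u
                    total += count(i + 1, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_parses("du"))     # 1
print(count_parses("ddu"))    # 2
print(count_parses("ddudu"))  # 3
```

For ddudu the count includes the two derivations shown above plus a third tree that starts with C -> d C.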

What are FIRST and FOLLOW sets used for in parsing?

What are FIRST and FOLLOW sets? What are they used for in parsing?
Are they used for top-down or bottom-up parsers?
Can anyone explain the FIRST and FOLLOW sets for the following set of grammar rules:
E := E+T | T
T := T*V | T
V := <id>
They are typically used in LL (top-down) parsers to check if the running parser would encounter any situation where there is more than one way to continue parsing.
If you have the alternative A | B and also have FIRST(A) = {"a"} and FIRST(B) = {"b", "a"} then you would have a FIRST/FIRST conflict because when "a" comes next in the input you wouldn't know whether to expand A or B. (Assuming lookahead is 1).
On the other hand, if you have a nullable nonterminal such as AOpt: ("a")?, you have to make sure that FOLLOW(AOpt) doesn't contain "a", because otherwise you wouldn't know whether to expand AOpt or not. For example, in S: AOpt "a", either S or AOpt could consume "a", which gives a FIRST/FOLLOW conflict.
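Both checks reduce to set intersections once the FIRST and FOLLOW sets are known. A minimal sketch, assuming the sets are given as Python sets and ε is represented by the string "ε" (the function names are hypothetical):

```python
EPS = "ε"

def first_first_conflict(first_a, first_b):
    # Tokens on which a 1-token lookahead cannot choose between two alternatives.
    return (first_a & first_b) - {EPS}

def first_follow_conflict(first_nt, follow_nt):
    # Only relevant for a nullable nonterminal: tokens that could either
    # start the nonterminal or follow it when it derives ε.
    if EPS not in first_nt:
        return set()
    return (first_nt - {EPS}) & follow_nt

# Alternative A | B with FIRST(A) = {"a"} and FIRST(B) = {"b", "a"}
print(first_first_conflict({"a"}, {"b", "a"}))      # {'a'}
# AOpt: ("a")? used in S: AOpt "a", so FOLLOW(AOpt) contains "a"
print(first_follow_conflict({"a", EPS}, {"a"}))     # {'a'}
```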
FIRST sets can also be used during the parsing process for performance reasons. If you have a nullable nonterminal NullableNt you can expand it in order to see if it can consume anything, or it may be faster to check if FIRST(NullableNt) contains the next token and if not simply ignore it (backtracking vs predictive parsing). Another performance improvement would be to additionally provide the lexical scanner with the current FIRST set, so the scanner does not try all possible terminals but only those that are currently allowed by the context. This conflicts with reserved terminals but those are not always needed.
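Bottom-up parsers have different kinds of conflicts, namely reduce/reduce and shift/reduce. They detect conflicts from item sets rather than from FIRST/FOLLOW conflicts of this kind, although FOLLOW sets are still used for SLR lookaheads and FIRST sets in the LR(1) closure.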
Your grammar wouldn't work with LL parsers because it contains left recursion. But the FIRST sets for E, T and V would be {id} (assuming your T := T*V | T is meant to be T := T*V | V).
Answer :
E->E+T|T
left recursion
E->TE'
E'->+TE'|epsilon
T->T*V|V (reading the second alternative as V)
left recursion
T->VT'
T'->*VT'|epsilon
no left recursion in
V->(id)
Therefore the grammar is:
E->TE'
E'->+TE'|epsilon
T->VT'
T'->*VT'|epsilon
V-> (id)
FIRST(E)={(}
FIRST(E')={+,epsilon}
FIRST(T)={(}
FIRST(T')={*,epsilon}
FIRST(V)={(}
Starting symbol: FOLLOW(E)={$}
E->TE',E'->+TE'|epsilon: FOLLOW(E')=FOLLOW(E)={$}
E->TE',E'->+TE'|epsilon: FOLLOW(T)=(FIRST(E')-{epsilon}) ∪ FOLLOW(E)={+,$}
T->VT',T'->*VT'|epsilon: FOLLOW(T')=FOLLOW(T)={+,$}
T->VT',T'->*VT'|epsilon: FOLLOW(V)=(FIRST(T')-{epsilon}) ∪ FOLLOW(T)={*,+,$}
Rules for First Sets
If X is a terminal then First(X) is just X!
If there is a Production X → ε then add ε to first(X)
If there is a Production X → Y1Y2..Yk then add first(Y1Y2..Yk) to first(X)
First(Y1Y2..Yk) is either
First(Y1) (if First(Y1) doesn't contain ε)
OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is everything in First(Y1) except for ε as well as everything in First(Y2..Yk)
If First(Y1) First(Y2)..First(Yk) all contain ε then add ε to First(Y1Y2..Yk) as well.
Rules for Follow Sets
First put $ (the end of input marker) in Follow(S) (S is the start symbol)
If there is a production A → aBb, (where a can be a whole string) then everything in FIRST(b) except for ε is placed in FOLLOW(B).
If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B)
If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
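The rules above translate fairly directly into a fixed-point computation. The following is a sketch rather than a reference implementation: the dictionary encoding of the grammar, the names EPS, START, first_of_sequence, compute_first and compute_follow, and the choice to treat id as a single terminal token are all assumptions. It is applied to the question's grammar after left-recursion removal.

```python
EPS = "ε"
START = "E"
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], [EPS]],
    "T":  [["V", "T'"]],
    "T'": [["*", "V", "T'"], [EPS]],
    "V":  [["id"]],          # assumption: id is one terminal token
}

def first_of_sequence(symbols, first):
    """FIRST(Y1 Y2 .. Yk), given the FIRST sets computed so far."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)          # every symbol can derive ε
    return result

def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:           # repeat until no set grows any more
        changed = False
        for nt, alternatives in GRAMMAR.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(first):
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add("$")   # end-of-input marker for the start symbol
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in GRAMMAR.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in GRAMMAR:
                        continue
                    rest = first_of_sequence(alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:      # what follows can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first()
FOLLOW = compute_follow(FIRST)
for nt in GRAMMAR:
    print(nt, sorted(FIRST[nt]), sorted(FOLLOW[nt]))
```

With this encoding, FOLLOW(V) comes out as {*, +, $} and FOLLOW(T) as {+, $}.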
Wikipedia is your friend. See discussion of LL parsers and first/follow sets.
Fundamentally, they are used as the basis for parser construction, e.g., as part of parser generators. You can also use them to reason about properties of grammars, but most people don't have much of a need to do this.
