Resolving reduce/reduce conflicts

We have a context-free grammar and construct its LR(1) parsing table. One cell of the table has a reduce/reduce conflict. Is it possible to resolve this conflict by using more input symbols of lookahead at each step? I am asking because I think that increasing the number of lookahead symbols can only (and not always) resolve shift/reduce conflicts. That is, extra lookahead doesn't help with a reduce/reduce conflict. Am I right?

It might be possible to solve a reduce/reduce conflict with more lookahead. It might also be possible to solve it by refactoring.
It really depends on the nature of the conflict. There is no general procedure.
An example of a reduce/reduce conflict which can be solved by extra lookahead:
A → something
B → A
C → A
D → B u v
D → C u w
Here, the last two productions of D are unambiguous, but the decision about reducing A to B or to C cannot be made when the u is seen. One more symbol of lookahead would do it, though, because the second next symbol determines the reduction.
The refactoring solution:
Au → A u
Bu → Au
Cu → Au
D → Bu v
D → Cu w
By deferring the B/C choice by one token, we've succeeded in removing the reduce/reduce conflict. Note that this solution will work even if u is not a single token; it could, for example, be a non-terminal. So this approach may work in cases where simply increasing lookahead is not sufficient.
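As a sanity check on the refactoring, here is a minimal Python sketch that enumerates every sentence of both grammars and confirms they describe the same language. It assumes that "something" is a single terminal and that the refactored grammar keeps the production A → something, which the listing above leaves implicit. The names ORIGINAL, REFACTORED and language are made up for this illustration.

```python
from itertools import product

# Grammars as dicts: nonterminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a terminal.
ORIGINAL = {
    "A": [["something"]],
    "B": [["A"]],
    "C": [["A"]],
    "D": [["B", "u", "v"], ["C", "u", "w"]],
}
REFACTORED = {
    "A": [["something"]],          # assumed to be kept from the original
    "Au": [["A", "u"]],
    "Bu": [["Au"]],
    "Cu": [["Au"]],
    "D": [["Bu", "v"], ["Cu", "w"]],
}

def language(grammar, symbol):
    """Return every terminal string derivable from symbol.

    Only terminates for non-recursive grammars such as the two above.
    """
    if symbol not in grammar:
        return {(symbol,)}
    sentences = set()
    for rhs in grammar[symbol]:
        # Combine every choice of sub-sentence for each symbol of the rule.
        for parts in product(*(language(grammar, s) for s in rhs)):
            sentences.add(tuple(tok for part in parts for tok in part))
    return sentences

if __name__ == "__main__":
    print(sorted(language(ORIGINAL, "D")))
    print(language(ORIGINAL, "D") == language(REFACTORED, "D"))
```

This only shows that the language is preserved; whether the conflict is actually gone still has to be checked by rebuilding the parsing table.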

In general, any conflict in an unambiguous grammar can be resolved by additional lookahead. In the extreme case you need to read to the end of the file. There is no significant difference between shift/reduce and reduce/reduce conflicts. They are resolved in much the same way.
I wrote an article about conflict resolution. It proposes a method for finding the cause of a conflict. In certain cases this helps with refactoring the grammar or defining a resolution strategy.
Please, take a look: http://cdsan.com/LinkPool/Data/2915/Conflicts-In-The-LR-Grammars.pdf
If you have questions, please let me know.

Related

What production rule should I use to reduce in bottom-up parsing?

So far, my understanding of the algorithm of bottom-up parsing is this.
shift a token into the stack
check, starting from the top of the stack, whether some elements including the top can be reduced by a production rule
if the elements can be reduced, pop and push the left hand side of the production rule.
continue those steps until top is the start symbol and next input is EOF
So to support my question with an example grammar,
S → aABe
A → Abc
A → b
B → d
if we have input string as
abbcde$
we will shift a in stack
and because there is no production rule that reduces a, we shift the next token b.
Then we can find a production rule A → b and reduce b to A.
Then my question is this. We have aA on the stack and the next input is b. How can the parser determine whether to reduce b to A, or to wait for c to come and use the rule A → Abc?
Well of course, reducing b to A at that point results in an error. But how does the parser know at that point that we should wait for c?
I'm sorry if I missed something while studying.

That's an excellent question, and it will be addressed in the next part of your course.
For now, it's sufficient to pretend that there is some magic black box which tells the parser when it should reduce (and, sometimes, which of several possible productions to use).
The various parsing algorithms explain the construction of this black box. Note that one possible solution is to fork reality and try both actions in parallel, but a more common solution is to process the grammar in order to work out how to predict the correct action.
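To make the "fork reality" option concrete, here is a minimal Python sketch of a shift-reduce recognizer for the grammar in the question that simply tries every possible action and backtracks on failure. It is not how an LR parser works (an LR parser precomputes a table so that it never has to guess), and the names GRAMMAR and parse are made up for this illustration.

```python
# Grammar from the question: S -> aABe, A -> Abc | b, B -> d
GRAMMAR = [
    ("S", ("a", "A", "B", "e")),
    ("A", ("A", "b", "c")),
    ("A", ("b",)),
    ("B", ("d",)),
]

def parse(tokens, start="S"):
    """Return the actions of one successful parse, or None if there is none."""
    seen = set()  # (stack, position) states already known to be dead ends

    def explore(stack, pos):
        if (stack, pos) in seen:
            return None
        seen.add((stack, pos))
        if stack == (start,) and pos == len(tokens):
            return []
        # Option 1: reduce with any rule whose right-hand side is on top of the stack.
        for lhs, rhs in GRAMMAR:
            if len(stack) >= len(rhs) and stack[-len(rhs):] == rhs:
                rest = explore(stack[:-len(rhs)] + (lhs,), pos)
                if rest is not None:
                    return [f"reduce {lhs} -> {' '.join(rhs)}"] + rest
        # Option 2: shift the next input token.
        if pos < len(tokens):
            rest = explore(stack + (tokens[pos],), pos + 1)
            if rest is not None:
                return [f"shift {tokens[pos]}"] + rest
        return None

    return explore((), 0)

if __name__ == "__main__":
    for action in parse(list("abbcde")):
        print(action)
```

With aA on the stack and b just shifted, the sketch first tries reducing b to A, runs into a dead end, and only then backs up and shifts c instead. A table-driven parser avoids that wasted work by analysing the grammar in advance.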

How to parse this simple grammar? Is it ambiguous?

I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence of this grammar; hopefully that is correct, but I may have completely misunderstood something. If I try to implement this using recursive descent, I seem to have a problem when trying to parse abbbc using a leftmost derivation, e.g.
S => aSc
aSc => aRc
at this point I would have thought that recursive descent would pick the first option in the second production because the next token is b leading to:
aRc => abc
and we're finished since there are no more non-terminals, but the result is of course not abbbc. The only way to show that abbbc is valid is to pick the second option, but with one token of lookahead I assume it would always pick b. I don't think the grammar is ambiguous unless I missed something. So what am I doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used it to do a sanity check that abbbc is a valid sentence, and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar but in fact need LL(2)? With two tokens of lookahead I could correctly pick the second option in the second production, because I would then also know there are more tokens to be read and that picking b would therefore prematurely terminate the derivation.

For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have recursive production rules with two copies of the same nonterminal in it, it will lead to ambiguities because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
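The ambiguity can be checked by counting parse trees. The following Python sketch counts how many distinct parse trees R → b | RbR admits for a string of n b's; the function name count_trees is made up for this illustration. Any count above 1 means the string has more than one parse tree.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(n):
    """Number of parse trees for a string of n b's under R -> b | R b R."""
    if n == 1:
        return 1  # R -> b
    total = 0
    # R -> R b R: the middle b sits at index i, leaving i b's for the
    # left R and n - i - 1 b's for the right R.
    for i in range(1, n - 1):
        total += count_trees(i) * count_trees(n - i - 1)
    return total

if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_trees(n))
```

For bbbbb (n = 5) the count is 2, matching the two expansions described above, and even lengths give 0 because R only derives odd-length strings.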
The nonterminal R that you’ve described here generates strings consisting of an odd number of b’s, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.
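Combined with the original rule for S, the LL(1) version can be turned into a recursive descent recognizer almost mechanically. The Python sketch below assumes the full grammar is S → aSc | R, R → bX, X → bbX | ε; the names accepts, ParseError and the parse_* helpers are made up for this illustration.

```python
class ParseError(Exception):
    pass

def accepts(text):
    """Recognize S -> a S c | R, R -> b X, X -> b b X | epsilon."""
    pos = 0

    def peek():
        return text[pos] if pos < len(text) else None

    def expect(ch):
        nonlocal pos
        if peek() != ch:
            raise ParseError(f"expected {ch!r} at position {pos}")
        pos += 1

    def parse_s():
        if peek() == "a":      # S -> a S c
            expect("a")
            parse_s()
            expect("c")
        else:                  # S -> R
            parse_r()

    def parse_r():             # R -> b X
        expect("b")
        parse_x()

    def parse_x():
        if peek() == "b":      # X -> b b X
            expect("b")
            expect("b")
            parse_x()
        # otherwise X -> epsilon: consume nothing

    try:
        parse_s()
    except ParseError:
        return False
    return pos == len(text)

if __name__ == "__main__":
    for s in ["abbbc", "abc", "abbc", "bbb"]:
        print(s, accepts(s))
```

Every decision is made by looking at a single upcoming character, which is what being LL(1) buys you.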

Why can't a top-down parser handle left recursion?

I wanted to know why top-down parsers cannot handle left recursion, and why we therefore need to eliminate left recursion, as mentioned in the Dragon Book.

Think of what it's doing. Suppose we have a left-recursive production rule A -> Aa | b, and right now we try to match that rule. So we're checking whether we can match an A here, but in order to do that, we must first check whether we can match an A here. That sounds impossible, and it mostly is. Using a recursive-descent parser, that obviously represents an infinite recursion.
It is possible using more advanced techniques that are still top-down, for example see [1] or [2].
[1]: Richard A. Frost and Rahmatullah Hafiz. A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. SIGPLAN Notices, 41(5):46–54, 2006.
[2]: R. Frost, R. Hafiz, and P. Callaghan. Modular and efficient top-down parsing for ambiguous left-recursive grammars. ACL-IWPT, pp. 109–120, 2007.

Top-down parsers cannot handle left recursion
A top-down parser cannot handle left recursive productions. To understand why not, let's take a very simple left-recursive grammar.
S → a
S → S a
There is only one token, a, and only one nonterminal, S. So the parsing table has just one entry. Both productions must go into that one table entry.
The problem is that, on lookahead a, the parser cannot know if another a comes after the lookahead. But the decision of which production to use depends on that information.
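The effect is easy to reproduce. The Python sketch below is a naive recursive descent translation of S → S a | a next to a version with the left recursion removed (S → a S', S' → a S' | ε, written as a loop). The function names are made up for this illustration, and the naive version is deliberately broken.

```python
def parse_s_left_recursive(tokens, pos=0):
    """Naive translation of S -> S a | a. Returns the end position or None."""
    # S -> S a: to match S we must first match S at the same position,
    # so this call recurses forever without consuming any input.
    end = parse_s_left_recursive(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # S -> a (never reached)
    if pos < len(tokens) and tokens[pos] == "a":
        return pos + 1
    return None

def parse_s_iterative(tokens, pos=0):
    """Same language without left recursion: S -> a S', S' -> a S' | epsilon."""
    if pos >= len(tokens) or tokens[pos] != "a":
        return None
    pos += 1
    while pos < len(tokens) and tokens[pos] == "a":
        pos += 1
    return pos

def blows_up(tokens):
    """True if the naive parser exhausts Python's recursion limit."""
    try:
        parse_s_left_recursive(tokens)
    except RecursionError:
        return True
    return False

if __name__ == "__main__":
    print(blows_up("aaa"))            # the naive parser never terminates normally
    print(parse_s_iterative("aaa"))   # the rewritten parser consumes all three tokens
```

The rewritten grammar accepts the same strings but produces a different tree shape, which matters if the left-associative structure was the reason for writing the grammar left-recursively in the first place.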

What about these grammars and the minimal parser needed to recognize them?

I'm trying to learn how to make a compiler. In order to do so, I have read a lot about context-free languages. But there are some things I cannot work out by myself yet.
Since it's my first compiler, there are some practices that I'm not aware of. My questions are asked with a view to building a parser generator, not a compiler or a lexer. Some questions may be obvious.
Among my readings are: Bottom-Up Parsing, Top-Down Parsing, Formal Grammars. The picture shown comes from: Miscellaneous Parsing. All of these come from the Stanford CS143 class.
Here are the points :
0) How do ( ambiguous / unambiguous ) and ( left-recursive / right-recursive ) influence the need for one algorithm or another? Are there other ways to qualify a grammar?
1) An ambiguous grammar is one that has several parse trees for some sentence. But shouldn't the choice of a leftmost or rightmost derivation make the parse tree unique?
[EDIT: Answered here ]
2.1) But still, is the ambiguity of the grammar related to k? I mean, given an LR(2) grammar, is it ambiguous for an LR(1) parser and not ambiguous for an LR(2) one?
[EDIT: No, it's not. An LR(2) grammar is one for which the parser needs two tokens of lookahead to choose the right rule to use. An ambiguous grammar, on the other hand, is one that can lead to several parse trees for the same input. ]
2.2) So a LR(*) parser, as long as you can imagine it, will have no ambiguous grammar at all and can then parse the entire set of context free languages ?
[EDIT: Answered by Ira Baxter, LR(*) is less powerful than GLR, in that it can't handle multiple parse trees. ]
3) Depending on the previous answers, what follows may be self contradictory. Considering LR parsing, do ambiguous grammars trigger shift-reduce conflict ? Can an unambiguous grammar trigger one too ? In the same way, what about reduce-reduce conflicts ?
[EDIT: That's it: ambiguous grammars lead to shift-reduce and reduce-reduce conflicts. By contraposition, if there are no conflicts, the grammar is unambiguous. ]
4) The ability to parse left-recursive grammar is an advantage of LR(k) parser over LL(k), is it the only difference between them ?
[EDIT: yes. ]
5) Giving G1 :
G1 :
S -> S + S
S -> S - S
S -> a
5.1) G1 is left-recursive, right-recursive, and ambiguous, am I right? Is it an LR(2) grammar? One could make it unambiguous as follows:
G2 :
S -> S + a
S -> S - a
S -> a
5.2) Is G2 still ambiguous? Does a parser for G2 need two tokens of lookahead? By factorisation we have:
G3 :
S -> S V
V -> + a
V -> - a
S -> a
5.3) Now, does a parser for G3 need only one token of lookahead? What are the trade-offs of doing these transformations? Is LR(1) the minimal parser required?
5.4) G1 is left-recursive; in order to parse it with an LL parser, one needs to transform it into a right-recursive grammar:
G4 :
S -> a + S
S -> a - S
S -> a
then
G5 :
S -> a V
V -> - V
V -> + V
V -> a
5.5) Does G4 need at least an LL(2) parser? Only G5 is parsable by an LL(1) parser, G1-G5 all define the same language, and this language is ( a (+/- a)^n ). Is that true?
5.6) For each grammar G1 to G5, what is the minimal set to which it belongs ?
6) Finally, since many different grammars may define the same language, how does one choose the grammar and the associated parser? Is the resulting parse tree important? What is the influence of the parse tree?
I'm asking a lot, and I don't really expect a complete answer, but any help would be very much appreciated.
Thanks for reading!

"Many grammars may define the same language, how does one choose..."?
Usually, you choose the one that meets the following criteria:
conceptually as simple as you can make it (implication: smaller than others)
tracks the terminology in the language reference manual where possible
least amount of bending to meet the constraints of your parser generator
That last one can make a mess of your conceptual simplicity, and your chart of various parser styles shows the number of different issues that you face depending on your choice of generator. This is aggravated by the fact that the choice of generator is often made well before you actually choose the grammar.
One way to minimize grammar bending is to choose a parser generator which handles fully context-free grammars. GLR parsing has this very significant advantage. I've been using it for 15 years and have done dozens of real languages with it.

Is this grammar SLR?

E -> A | B
A -> a | c
B -> b | c
My answer is no, because it has a reduce/reduce conflict. Can anyone else verify this?
Also, I arrived at my answer by constructing the transition diagram. Is there a simpler way of finding this out?
Thanks for the help!
P.S. Would a recursive descent parser be able to parse this?

You're right -- starting from a 'c' in the input there's no way to decide whether to treat that as an 'A' or a 'B'. I doubt there's anything that can really parse this properly -- it's simply ambiguous. Using a different type of parser won't help; you really need to change the language.
There are some formal methods for detecting such ambiguities, but I can hardly imagine bothering with them for a grammar this small. One easy way to spot this particular problem is to mentally arrange it into a tree:
The two lines coming up out of the 'c' box represent the reduce/reduce conflict. There's no reason to prefer one route from 'c' to 'E' over the other, so the grammar is ambiguous.
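The same observation can be checked mechanically. The Python sketch below enumerates every parse tree that derives a single token from E; it relies on the fact that every rule in this grammar has exactly one symbol on its right-hand side, and the names GRAMMAR and trees are made up for this illustration.

```python
# Grammar from the question: E -> A | B, A -> a | c, B -> b | c
GRAMMAR = {
    "E": [["A"], ["B"]],
    "A": [["a"], ["c"]],
    "B": [["b"], ["c"]],
}

def trees(symbol, token):
    """All parse trees deriving the single token from symbol.

    Assumes every right-hand side consists of exactly one symbol,
    which holds for this grammar but not in general.
    """
    if symbol not in GRAMMAR:
        # Terminal: it matches only itself.
        return [symbol] if symbol == token else []
    found = []
    for (child,) in GRAMMAR[symbol]:
        for subtree in trees(child, token):
            found.append((symbol, subtree))
    return found

if __name__ == "__main__":
    for token in "abc":
        print(token, trees("E", token))
```

The tokens a and b each get exactly one tree, while c gets two, one through A and one through B, which is the reduce/reduce conflict seen from the other direction.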
