How to parse this simple grammar? Is it ambiguous? - parsing

I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar, hopefully, that is correct but I may have completely missunderstood something. If I try to implement this using recursive descent I seem to have a problem when trying to parse abbbc, using left-derivation eg
S => aSc
aSc => aRc
at this point I would have thought that recursive descent would pick the first option in the second production because the next token is b leading to:
aRc => abc
and we're finished since there are no more non-terminals, which isn't of course abbbc. The only way to show that abbbc is valid is to pick the second option but with one lookahead I assume it would always pick b. I don't think the grammar is ambiguous unless I missed something. So what I am doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used to do a sanity check that abbbc is a valid sentence and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar but in fact need LL(2)? With two lookaheads I could correctly pick the second option in the second production because I now also know there are more tokens to be read and therefore picking b would prematurely terminate the derivation.

For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have recursive production rules with two copies of the same nonterminal in it, it will lead to ambiguities because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
The nonterminal R that you’ve described here generates strings of odd numbers of b’s in them, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.

Related

What production rule should I use to reduce in bottom-up parsing?

So far, my understanding of the algorithm of bottom-up parsing is this.
shift a token into the stack
check the stack from top if some elements including the top can be reduced by some production rule
if the elements can be reduced, pop and push the left hand side of the production rule.
continue those steps until top is the start symbol and next input is EOF
So to support my question with an example grammar,
S → aABe
A → Abc
A → b
B → d
if we have input string as
abbcde$
we will shift a in stack
and because there are no production rule that reduces a, we shift the next token b.
Then we can find a production rule A → b and reduce b to A.
Then my question is this. We have aA on stack and the next input is b. Then how can the parser determine whether we reduce b to A we wait for c to come and use the rule A → Abc?
Well of course, reducing b to A at that point results in an error. But how does the parser know at that point that we should wait for c?
I'm sorry if I missed something while studying.
That's an excellent question, and it will be addressed in the next part of your course.
For now, it's sufficient to pretend that there is some magic black box which tells the parser when it should reduce (and, sometimes, which of several possible productions to use).
The various parsing algorithms explain the construction of this black box. Note that one possible solution is to fork reality and try both actions in parallel, but a more common solution is to process the grammar in order to work out how to predict the correct action.

Resolving reduce/reduce conflicts

We have a CFG grammar and we construct LR(1) Parsing Table. We see that one cell on the parsing table have a reduce - reduce conflict. Is it possible to solve this conflict by using more input symbols of lookahead at each step? I am asking this beacuse I think that by increasing lookahead symbols, we can(not always) only resolve shift - reduce conflicts.I mean the extra lookaheads in a reduce-reduce conflict doesn't help us. Am I right ?
It might be possible to solve a reduce/reduce conflict with more lookahead. It might also be possible to solve it by refactoring.
It really depends on the nature of the conflict. There is no general procedure.
An example of a reduce/reduce conflict which can be solved by extra lookahead:
A → something
B → A
C → A
D → B u v
D → C u w
Here, the last two productions of D are unambiguous, but the decision about reducing A to B or to C cannot be made when the u is seen. One more symbol of lookahead would do it, though, because the second next symbol determines the reduction.
The refactoring solution:
Au → A u
Bu → Au
Cu → Au
D → Bu v
D → Cu w
By deferring the B/C choice by one token, we've succeeded in removing the reduce/reduce conflict. Note that this solution will work even if u is not a single token; it could, for example, by a non-terminal. So this model may work in cases where simply increasing lookahead is not sufficient.
In general any conflict can be resolved by additional look ahead. In the extreme case you need to read to the end of the file. There is no significant difference between shift/reduce and reduce/reduce conflicts. Their resolution is kind of similar.
I wrote an article about conflict resolution. It proposes a method that allows finding out the reason of the conflict. In certain cases this helps to do refactoring of the grammar or defining the resolution strategy.
Please, take a look: http://cdsan.com/LinkPool/Data/2915/Conflicts-In-The-LR-Grammars.pdf
If you have questions, please let me know.

Issue with left recursion in top down parsing

I have read this to understand more the difference between top down and bottom up parsing, can anyone explain the problems associated with left recursion in a top down parser?
In a top-down parser, the parser begins with the start symbol and tries to guess which productions to apply to reach the input string. To do so, top-down parsers need to use contextual clues from the input string to guide its guesswork.
Most top-down parsers are directional parsers, which scan the input in some direction (typically, left to right) when trying to determine which productions to guess. The LL(k) family of parsers is one example of this - these parsers use information about the next k symbols of input to determine which productions to use.
Typically, the parser uses the next few tokens of input to guess productions by looking at which productions can ultimately lead to strings that start with the upcoming tokens. For example, if you had the production
A → bC
you wouldn't choose to use this production unless the next character to match was b. Otherwise, you'd be guaranteed there was a mismatch. Similarly, if the next input character was b, you might want to choose this production.
So where does left recursion come in? Well, suppose that you have these two productions:
A → Ab | b
This grammar generates all strings of one or more copies of the character b. If you see a b in the input as your next character, which production should you pick? If you choose Ab, then you're assuming there are multiple b's ahead of you even though you're not sure this is the case. If you choose b, you're assuming there's only one b ahead of you, which might be wrong. In other words, if you have to pick one of the two productions, you can't always choose correctly.
The issue with left recursion is that if you have a nonterminal that's left-recursive and find a string that might match it, you can't necessarily know whether to use the recursion to generate a longer string or avoid the recursion and generate a shorter string. Most top-down parsers will either fail to work for this reason (they'll report that there's some uncertainty about how to proceed and refuse to parse), or they'll potentially use extra memory to track each possible branch, running out of space.
In short, top-down parsers usually try to guess what to do from limited information about the string. Because of this, they get confused by left recursion because they can't always accurately predict which productions to use.
Hope this helps!
Reasons
1)The grammar which are left recursive(Direct/Indirect) can't be converted into {Greibach normal form (GNF)}* So the Left recursion can be eliminated to Right Recuraive Format.
2)Left Recursive Grammars are also nit LL(1),So again elimination of left Recursion may result into LL(1) grammer.
GNF
A Grammer of the form A->aV is Greibach Normal Form.

why top down parser cannot handle left recursion?

I wanted to know why top down parsers cannot handle left recursion and we need to eliminate left recursion due to this as mentioned in dragon book..
Think of what it's doing. Suppose we have a left-recursive production rule A -> Aa | b, and right now we try to match that rule. So we're checking whether we can match an A here, but in order to do that, we must first check whether we can match an A here. That sounds impossible, and it mostly is. Using a recursive-descent parser, that obviously represents an infinite recursion.
It is possible using more advanced techniques that are still top-down, for example see [1] or [2].
[1]: Richard A. Frost and Rahmatullah Hafiz. A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. SIGPLAN Notices, 41(5):46–54, 2006.
[2]: R. Frost, R. Hafiz, and P. Callaghan, Modular and efficient top-down
parsing for ambiguous left-recursive grammars. ACL-IWPT, pp. 109 –
120, 2007.
Top-down parsers cannot handle left recursion
A top-down parser cannot handle left recursive productions. To understand why not, let's take a very simple left-recursive grammar.
S → a
S → S a
There is only one token, a, and only one nonterminal, S. So the parsing table has just one entry. Both productions must go into that one table entry.
The problem is that, on lookahead a, the parser cannot know if another a comes after the lookahead. But the decision of which production to use depends on that information.

Is this grammar SLR?

E -> A | B
A -> a | c
B -> b | c
My answer is no because it has a reduce/reduce conflict, can anyone else verify this?
Also I gained my answer through constructing the transition diagram, is there a simpler way of finding this out?
Thanks for the help!
P.S Would a Recursive Descent be able to parse this?
You're right -- starting from a 'c' in the input there's no way to decide whether to treat that as an 'A' or a 'B'. I doubt there's anything that can really parse this properly -- it's simply ambiguous. Using a different type of parser won't help; you really need to change the language.
There are some formal methods for detecting such ambiguities, but I can hardly imagine bother with them for a grammar this small. One easy way to spot this particular problem is to mentally arrange it into a tree:
The two lines coming up out of the 'c' box represent the reduce/reduce conflict. There's no reason to prefer one route from 'c' to 'E' over the other, so the grammar is ambiguous.

Resources