I wanted to know why top-down parsers cannot handle left recursion, and why we therefore need to eliminate left recursion, as mentioned in the dragon book.
Think of what it's doing. Suppose we have a left-recursive production rule A -> Aa | b, and right now we try to match that rule. So we're checking whether we can match an A here, but in order to do that, we must first check whether we can match an A here. That sounds impossible, and it mostly is: in a recursive-descent parser it becomes an infinite recursion, because the procedure for A calls itself before consuming any input.
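To make that concrete, here is a minimal sketch in Python of what a naive recursive-descent procedure for A -> Aa | b looks like (the function and variable names are mine, purely for illustration):

```python
def parse_A(tokens, pos):
    # Alternative 1: A -> A a
    # To match A we must first match A at the *same* position, so the
    # function calls itself before consuming any input.
    new_pos = parse_A(tokens, pos)          # <-- infinite recursion
    if new_pos < len(tokens) and tokens[new_pos] == 'a':
        return new_pos + 1
    # Alternative 2: A -> b  (never reached: alternative 1 never returns)
    if pos < len(tokens) and tokens[pos] == 'b':
        return pos + 1
    raise SyntaxError("no alternative matched")

try:
    parse_A(list("baa"), 0)                 # a perfectly valid sentence, but...
except RecursionError:
    print("stack exhausted without consuming a single token")
```

The problem is the self-call at an unchanged input position, which is exactly the infinite descent described above.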
It is possible using more advanced techniques that are still top-down, for example see [1] or [2].
[1]: Richard A. Frost and Rahmatullah Hafiz. A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. SIGPLAN Notices, 41(5):46–54, 2006.
[2]: R. Frost, R. Hafiz, and P. Callaghan. Modular and efficient top-down parsing for ambiguous left-recursive grammars. ACL-IWPT, pp. 109–120, 2007.
Top-down parsers cannot handle left recursion
A top-down parser cannot handle left-recursive productions. To understand why not, let's take a very simple left-recursive grammar.
S → a
S → S a
There is only one token, a, and only one nonterminal, S. So the parsing table has just one entry. Both productions must go into that one table entry.
The problem is that, on lookahead a, the parser cannot know if another a comes after the lookahead. But the decision of which production to use depends on that information.
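To see the clash mechanically, here is a rough sketch (the data layout and names are my own, not from any book) that computes FIRST for this two-production grammar and fills the LL(1) table. Since neither right-hand side can derive the empty string, looking at the first symbol of each right-hand side is enough here:

```python
grammar = {"S": [["a"], ["S", "a"]]}    # S -> a,  S -> S a
terminals = {"a"}

def first_of(sym, first):
    return {sym} if sym in terminals else first[sym]

# Fixed-point computation of FIRST for each nonterminal.
first = {nt: set() for nt in grammar}
changed = True
while changed:
    changed = False
    for nt, rhss in grammar.items():
        for rhs in rhss:
            new = first_of(rhs[0], first) - first[nt]
            if new:
                first[nt] |= new
                changed = True

# Production S -> rhs goes into table[(S, t)] for every t in FIRST(rhs).
table = {}
for nt, rhss in grammar.items():
    for rhs in rhss:
        for t in first_of(rhs[0], first):
            table.setdefault((nt, t), []).append(rhs)

print(table)    # {('S', 'a'): [['a'], ['S', 'a']]}
```

Both productions land in the single cell for (S, a), and a predictive parser has no way to decide between them with one token of lookahead.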
Related
I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar; hopefully that is correct, but I may have completely misunderstood something. If I try to implement this using recursive descent, I seem to have a problem when trying to parse abbbc using a leftmost derivation, e.g.
S => aSc
aSc => aRc
at this point I would have thought that recursive descent would pick the first option in the second production, because the next token is b, leading to:
aRc => abc
and we're finished since there are no more non-terminals, but that of course isn't abbbc. The only way to show that abbbc is valid is to pick the second option, but with one token of lookahead I assume it would always pick b. I don't think the grammar is ambiguous unless I missed something. So what am I doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used it to do a sanity check that abbbc is a valid sentence, and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar, but in fact need LL(2)? With two tokens of lookahead I could correctly pick the second option in the second production, because I would then also know that there are more tokens to be read and that picking b would prematurely terminate the derivation.
For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have a recursive production rule with two copies of the same nonterminal in it, it will lead to ambiguities, because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
The nonterminal R that you’ve described here generates strings containing an odd number of b’s, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.
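For concreteness, here is a quick recursive-descent sketch in Python of the whole rewritten grammar (S -> R | aSc together with the R and X rules above). The class and method names are my own, and only one token of lookahead is ever consulted:

```python
class Parser:
    def __init__(self, text):
        self.toks = list(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {self.pos}")
        self.pos += 1

    def parse_S(self):
        if self.peek() == 'a':              # S -> a S c
            self.expect('a'); self.parse_S(); self.expect('c')
        else:                               # S -> R
            self.parse_R()

    def parse_R(self):                      # R -> b X
        self.expect('b'); self.parse_X()

    def parse_X(self):
        if self.peek() == 'b':              # X -> b b X
            self.expect('b'); self.expect('b'); self.parse_X()
        # otherwise X -> epsilon: consume nothing

def accepts(text):
    p = Parser(text)
    try:
        p.parse_S()
        return p.pos == len(p.toks)
    except SyntaxError:
        return False

print(accepts("abbbc"))   # True
print(accepts("abbc"))    # False: an even number of b's inside
```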
I have been reading about top-down parsing from the dragon book recently, and one of the questions asks to check whether a given grammar is suitable for top-down parsing or not. How do I determine this? Are the following conditions on the grammar sufficient for it to be valid?
1) Left factored. 2) No Left Recursion. 3) Unambiguous.
A grammar that can be parsed top-down using a leftmost derivation with k tokens of look-ahead, without ambiguity and without left recursion, is known as an LL(k) grammar; k is the amount of look-ahead used by the parser. Top-down parsing uses LL(k) grammars, so if the grammar is LL(k) it should be top-down parsable.
sources:
http://www.csd.uwo.ca/~moreno/CS447/Lectures/Syntax.html/node14.html
https://en.wikipedia.org/wiki/Top-down_parsing
The grammar S -> a S a | a a generates all even-length strings of a's. We can devise a recursive-descent parser with backtrack for this grammar.
If we choose to expand by production S -> aa first, then we shall only recognize the string aa. Thus, any reasonable recursive-descent parser will try S -> aSa first.
Show that this recursive-descent parser recognizes inputs aa, aaaa, and aaaaaaaa, but not aaaaaa.
The parser will first try to invoke match(a); S(); match(a); rather than match(a); match(a); as described in the problem. Note that when you recursively invoke S() inside the block match(a); S(); match(a);, you have only invoked match(a) once, so the 'a' symbol at the end has not been consumed yet.
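To see this play out end to end, here is a small sketch of such a backtracking parser in Python, written with the singleton-match behaviour the pseudocode implies: each alternative is tried in order and only the first successful match of the inner S is ever used (the names are mine):

```python
def parse_S(s, i):
    """Return the end position of the first match of S starting at i, or None.
    S -> a S a is tried before S -> a a, and only the first match of the
    inner S is considered (the "singleton match" strategy)."""
    # Alternative 1: S -> a S a
    if i < len(s) and s[i] == 'a':
        j = parse_S(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Alternative 2: S -> a a
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

def recognizes(s):
    end = parse_S(s, 0)
    return end is not None and end == len(s)    # the whole input must be consumed

for n in (2, 4, 6, 8):
    print(n, recognizes('a' * n))   # 2 True, 4 True, 6 False, 8 True
```

The length-6 input fails because the inner S settles on one match and is never asked for an alternative, which is exactly the singleton-match issue described in the next answer.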
I'll paraphrase myself from this other answer.
It's actually a property of the "singleton match strategy" usual in naive implementations of recursive descent parsers with backtracking.
Quoting this answer by Dr Adrian Johnstone:
The trick here is to realise that many backtracking parsers use what we call a singleton match strategy, in which as soon as the parse function for a rule finds a match, it returns. In general, parse functions need to return a set of putative matches. Just try working it through, and you'll see that a singleton match parser misses some of the possible derivations.
Also, the images available in this answer will help you visualize what's happening in the case you exemplified.
In my opinion, the first condition for a recursive-descent parser is that the given grammar must be neither left-recursive nor non-deterministic. Here the grammar given in the question is non-deterministic, so we need to left-factor it; that gives S -> aS', S' -> Sa | a, and I think this will work.
I want to solve this grammar:
S->SS+
S->SS*
S->a
I want to construct the SLR sets of items and the parsing table with action and goto.
Can this grammar be parsed without eliminating left recursion?
Is this grammar SLR?
No, this grammar is not SLR. It is ambiguous.
Left recursion is not a problem for LR parsers. Left recursion elimination is only necessary for LL parsers.
I am not entirely sure about this, but I think this grammar is actually SLR(1). I constructed the SLR(1) table by hand and obtained one with no conflicts (after augmenting the grammar with a new start symbol S' and the production S' -> S).
Can somebody provide a sentence that can be derived in two different ways from this grammar? I was able to get a parser for it in Bison without any warning. Are you sure it is ambiguous?
I have read this to better understand the difference between top-down and bottom-up parsing. Can anyone explain the problems associated with left recursion in a top-down parser?
In a top-down parser, the parser begins with the start symbol and tries to guess which productions to apply to reach the input string. To do so, top-down parsers need to use contextual clues from the input string to guide their guesswork.
Most top-down parsers are directional parsers, which scan the input in some direction (typically, left to right) when trying to determine which productions to guess. The LL(k) family of parsers is one example of this - these parsers use information about the next k symbols of input to determine which productions to use.
Typically, the parser uses the next few tokens of input to guess productions by looking at which productions can ultimately lead to strings that start with the upcoming tokens. For example, if you had the production
A → bC
you wouldn't choose to use this production unless the next character to match was b. Otherwise, you'd be guaranteed there was a mismatch. Similarly, if the next input character was b, you might want to choose this production.
So where does left recursion come in? Well, suppose that you have these two productions:
A → Ab | b
This grammar generates all strings of one or more copies of the character b. If you see a b in the input as your next character, which production should you pick? If you choose Ab, then you're assuming there are multiple b's ahead of you even though you're not sure this is the case. If you choose b, you're assuming there's only one b ahead of you, which might be wrong. In other words, if you have to pick one of the two productions, you can't always choose correctly.
The issue with left recursion is that if you have a nonterminal that's left-recursive and find a string that might match it, you can't necessarily know whether to use the recursion to generate a longer string or avoid the recursion and generate a shorter string. Most top-down parsers will either fail to work for this reason (they'll report that there's some uncertainty about how to proceed and refuse to parse), or they'll potentially use extra memory to track each possible branch, running out of space.
In short, top-down parsers usually try to guess what to do from limited information about the string. Because of this, they get confused by left recursion because they can't always accurately predict which productions to use.
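To make the "can't always choose correctly" point concrete, here is a tiny sketch (my own naming, one token of lookahead, no backtracking) of the two possible fixed commitments for A -> Ab | b, run on the input bbb:

```python
def parse_A_short(s, i):
    """Always commit to A -> b on lookahead 'b'."""
    if i < len(s) and s[i] == 'b':
        return i + 1                        # stops after one b; "bb" is left over
    raise SyntaxError("expected 'b'")

def parse_A_long(s, i):
    """Always commit to A -> A b on lookahead 'b'."""
    if i < len(s) and s[i] == 'b':
        j = parse_A_long(s, i)              # same position: no progress is made
        if j < len(s) and s[j] == 'b':
            return j + 1
    raise SyntaxError("expected 'b'")

print(parse_A_short("bbb", 0))              # 1 -- only part of the input consumed
try:
    parse_A_long("bbb", 0)
except RecursionError:
    print("left-recursive choice never consumed anything")
```

Neither fixed choice parses bbb, which is why the grammar has to be rewritten before an LL-style parser can handle it.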
Hope this helps!
Reasons
1) A grammar that is left-recursive (directly or indirectly) is not in Greibach Normal Form (GNF), so the left recursion is eliminated by rewriting the grammar in a right-recursive form.
2) Left-recursive grammars are also not LL(1), so eliminating the left recursion may result in an LL(1) grammar.
GNF
A grammar in which every production is of the form A -> aV, where a is a terminal and V is a string of nonterminals (possibly empty), is in Greibach Normal Form.
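Since both reasons boil down to rewriting the left recursion away, here is a short sketch in Python of the standard elimination of immediate left recursion (A -> Aα | β becomes A -> βA', A' -> αA' | ε). The function name and the grammar representation are my own:

```python
def eliminate_immediate_left_recursion(nt, productions):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn  as
       A  -> b1 A' | ... | bn A'
       A' -> a1 A' | ... | am A' | epsilon
    Each production is a list of symbols; [] stands for epsilon."""
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    others = [p for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}            # nothing to do
    new_nt = nt + "'"
    return {
        nt: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

# A -> A a | b   becomes   A -> b A',  A' -> a A' | epsilon
print(eliminate_immediate_left_recursion("A", [["A", "a"], ["b"]]))
```

Applied to A -> Ab | b from the earlier answer it gives A -> bA', A' -> bA' | ε, which is right-recursive and causes no trouble for a top-down parser.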