LL parser grammar - parsing

I have the grammar below and am trying to figure out whether it can be parsed using an LL parser. If not, please explain why.
S --> ab | cB
A --> b | Bb
B --> aAb | cC
C --> cA | Aba
From what I understand, to pass the pairwise disjointness test the intersection of the FIRST sets of the alternatives must be empty.
But I am not sure where to begin. I have been looking through my textbook and http://en.wikipedia.org/wiki/LL_parser#Parsing_procedure, but I can't quite understand the procedure or find any examples to follow along with. I just need to see the procedure or steps so that I can work through other similar problems. Any help is appreciated.

Compute FIRST sets for all the non-terminals, and check whether the FIRST sets of the alternatives of each non-terminal are pairwise disjoint. If they all are, the grammar is LL; if there is any non-terminal for which they are not, it is not. If there are any ε rules, you'll need FOLLOW sets as well.
Computing FIRST_1 sets is quite easy and will tell you whether the grammar is LL(1). Computing FIRST_k sets is quite a bit more involved, but will tell you whether the grammar is LL(k) for whichever specific k you compute the FIRST_k sets for.
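For the grammar in the question, a minimal sketch of this check in Python might look like the following (it assumes there are no ε productions, which is true here, so the FIRST set of an alternative is just the FIRST set of its leading symbol; all names are made up):

    # Compute FIRST sets by fixed-point iteration, then check that the FIRST
    # sets of each nonterminal's alternatives are pairwise disjoint.
    GRAMMAR = {
        "S": [["a", "b"], ["c", "B"]],
        "A": [["b"], ["B", "b"]],
        "B": [["a", "A", "b"], ["c", "C"]],
        "C": [["c", "A"], ["A", "b", "a"]],
    }
    NONTERMINALS = set(GRAMMAR)

    def first_sets(grammar):
        first = {nt: set() for nt in grammar}
        changed = True
        while changed:
            changed = False
            for nt, alternatives in grammar.items():
                for alt in alternatives:
                    head = alt[0]
                    new = first[head] if head in NONTERMINALS else {head}
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
        return first

    def first_of_alternative(alt, first):
        head = alt[0]
        return first[head] if head in NONTERMINALS else {head}

    first = first_sets(GRAMMAR)
    for nt, alternatives in GRAMMAR.items():
        sets = [first_of_alternative(alt, first) for alt in alternatives]
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                clash = sets[i] & sets[j]
                if clash:
                    print(f"{nt}: alternatives {i} and {j} overlap on {clash}")

Running it reports a clash only for C: FIRST(cA) = {c} and FIRST(Aba) = {a, b, c} share c, so the grammar fails the pairwise disjointness test and is not LL(1), even though the alternatives of S, A and B are fine.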

Related

Why does bison not convert the grammar automatically?

I am learning about lexers and parsers, so I am reading the classic book flex & bison (by John Levine, published by O'Reilly Media).
An example is given that cannot be parsed by bison:
phrase : cart_animal AND CART | work_animal AND PLOW
cart_animal -> HORSE | GOAT
work_animal -> HORSE | OX
I understand very well why it cannot: it requires TWO symbols of lookahead.
But with a simple modification, it can be parsed:
phrase : cart_animal CART | work_animal PLOW
cart_animal -> HORSE AND | GOAT AND
work_animal -> HORSE AND | OX AND
I wonder why bison is not able to transform the grammar automatically in simple cases like this?
Because simple cases like that are pretty much all artificial, and for real-life examples it is difficult or impossible.
To be clear, if you have an LR(k) grammar with k > 1 and you know the value of k, there is a mechanical transformation with which you can produce an equivalent LR(1) grammar, and moreover you can, with some juggling, fix up the reduction actions so that they have the same effect (at least as long as they don't contain side effects). I don't know of any parser generator which does that, in part because correctly translating the reduction actions is tricky, and in part because the resulting LR(1) grammar is typically quite large, even for small values of k.
But, as I mentioned above, you need to know the value of k to perform this transformation, and it turns out that there is no algorithm which can take a grammar and tell you whether it is LR(k) for some k. So all you can do is try successively larger values of k until you find one which works, or give up.

How to parse this simple grammar? Is it ambiguous?

I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar; hopefully that is correct, but I may have completely misunderstood something. If I try to implement this using recursive descent, I seem to have a problem when trying to parse abbbc using a leftmost derivation, e.g.
S => aSc
aSc => aRc
At this point I would have thought that recursive descent would pick the first option in the second production, because the next token is b, leading to:
aRc => abc
and we're finished since there are no more non-terminals, which of course isn't abbbc. The only way to show that abbbc is valid is to pick the second option, but with one token of lookahead I assume it would always pick b. I don't think the grammar is ambiguous, unless I missed something. So what am I doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used it to do a sanity check that abbbc is a valid sentence, and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar but in fact need LL(2)? With two tokens of lookahead I could correctly pick the second option in the second production, because I would now also know whether there are more tokens to be read and therefore whether picking b would prematurely terminate the derivation.
For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have a recursive production rule with two copies of the same nonterminal in it, it will lead to ambiguities because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
The nonterminal R that you've described here generates strings containing an odd number of b's, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.
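For concreteness, here is a minimal recursive-descent sketch in Python of the rewritten R plugged back into the question's S rule, i.e. S → aSc | R, R → bX, X → bbX | ε (the helper names are my own); every decision needs only the next input character:

    def parse(text):
        pos = 0

        def peek():
            return text[pos] if pos < len(text) else None

        def match(expected):
            nonlocal pos
            if peek() != expected:
                raise SyntaxError(f"expected {expected!r} at position {pos}")
            pos += 1

        def parse_S():
            if peek() == "a":       # S -> a S c
                match("a")
                parse_S()
                match("c")
            else:                   # S -> R
                parse_R()

        def parse_R():              # R -> b X
            match("b")
            parse_X()

        def parse_X():
            if peek() == "b":       # X -> b b X
                match("b")
                match("b")
                parse_X()
            # else: X -> epsilon (lookahead is 'c' or end of input)

        parse_S()
        if pos != len(text):
            raise SyntaxError("trailing input")
        return True

    print(parse("abbbc"))  # True

On abbbc it matches the outer a and c, and derives bbb via R → bX, X → bbX, X → ε, with one character of lookahead at every step.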

Why can't a top-down parser handle left recursion?

I wanted to know why top-down parsers cannot handle left recursion, and why we therefore need to eliminate left recursion, as mentioned in the dragon book.
Think of what it's doing. Suppose we have a left-recursive production rule A -> Aa | b, and right now we try to match that rule. So we're checking whether we can match an A here, but in order to do that, we must first check whether we can match an A here. That sounds impossible, and it mostly is. Using a recursive-descent parser, that obviously represents an infinite recursion.
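Concretely, a naive recursive-descent procedure for A -> Aa | b (a Python sketch; the names are invented) calls itself before consuming a single token, so it never terminates:

    # Naive recursive descent for the left-recursive rule A -> A a | b,
    # trying the alternatives in order. It is supposed to return the new
    # position on success or None on failure, but it never gets that far.
    def parse_A(tokens, pos):
        # A -> A a: to match an A we must first match an A, with no token
        # consumed in between, so this call recurses forever.
        sub = parse_A(tokens, pos)
        if sub is not None and tokens[sub:sub + 1] == ["a"]:
            return sub + 1
        # A -> b
        if tokens[pos:pos + 1] == ["b"]:
            return pos + 1
        return None

    # parse_A(["b", "a"], 0) blows the stack with a RecursionError instead of
    # returning 2.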
It is possible using more advanced techniques that are still top-down, for example see [1] or [2].
[1]: Richard A. Frost and Rahmatullah Hafiz. A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. SIGPLAN Notices, 41(5):46–54, 2006.
[2]: R. Frost, R. Hafiz, and P. Callaghan. Modular and efficient top-down parsing for ambiguous left-recursive grammars. ACL-IWPT, pp. 109–120, 2007.
Top-down parsers cannot handle left recursion
A top-down parser cannot handle left recursive productions. To understand why not, let's take a very simple left-recursive grammar.
S → a
S → S a
There is only one token, a, and only one nonterminal, S. So the parsing table has just one entry. Both productions must go into that one table entry.
The problem is that, on lookahead a, the parser cannot know if another a comes after the lookahead. But the decision of which production to use depends on that information.
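The standard fix from the dragon book is to eliminate the left recursion by rewriting the rule right-recursively, S → a S' with S' → a S' | ε. A quick Python sketch (hypothetical names) showing that the choice is then driven by the lookahead alone, and that the recursion always consumes input first:

    # Recursive descent for the rewritten grammar S -> a S', S' -> a S' | eps.
    def parse_S(tokens, pos):
        if tokens[pos:pos + 1] != ["a"]:
            raise SyntaxError("expected 'a'")
        return parse_S_tail(tokens, pos + 1)

    def parse_S_tail(tokens, pos):
        if tokens[pos:pos + 1] == ["a"]:    # S' -> a S'
            return parse_S_tail(tokens, pos + 1)
        return pos                          # S' -> eps

    # parse_S(["a", "a", "a"], 0) returns 3: all three a's are consumed.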

Is this grammar LL(1)?

I have derived the following grammar:
S -> a | aT
T -> b | bR
R -> cb | cbR
I understand that in order for a grammar to be LL(1) it has to be unambiguous and right-recursive. The problem is that I do not fully understand the concepts of left-recursive and right-recursive grammars, so I do not know whether or not my grammar is right-recursive. I would really appreciate a simple explanation of left-recursive and right-recursive grammars, and whether my grammar is LL(1).
Many thanks.
This grammar is not LL(1). In an LL(1) parser, it should always be possible to determine which production to use next based on the current nonterminal symbol and the next token of the input.
Let's look at this production, for example:
S → a | aT
Now, suppose that I told you that the current nonterminal symbol is S and the next symbol of input is an a. Could you determine which production to use? Unfortunately, without more context, you couldn't: perhaps you're supposed to use S → a, and perhaps you're supposed to use S → aT. Using similar reasoning, you can see that all the other productions have similar problems.
This doesn't have anything to do with left or right recursion, but rather the fact that no two productions for the same nonterminal in an LL(1) grammar can have a nonempty common prefix. In fact, a simple heuristic for checking if a grammar is not LL(1) is to see if you can find two production rules like this.
Hope this helps!
The grammar has only a single recursive rule: the last one, in which R is the symbol on the left-hand side and also appears on the right-hand side. It is right-recursive because R is the rightmost symbol of that right-hand side.
The language is LL(1). We know this because we can easily construct a recursive-descent parser that uses no backtracking and at most one token of lookahead.
But such a parser would be based on a slightly modified version of the grammar.
For instance, the two productions S -> a and S -> aT can be merged into a single one, expressed in EBNF as S -> a [ T ] (S derives a, followed by an optional T). This rule can be handled by a single parsing function for recognizing S.
The function matches a and then looks for the optional T, which would be indicated by the next input symbol being b.
We can write an LL(1) grammar for this, along these lines:
S -> a T_opt
T_opt -> b R_opt
T_opt -> <empty>
... et cetera
The optionality of T is handled explicitly, by making T (which we rename to T_opt) capable of deriving the empty string, and then condensing to a single rule for S, so that we don't have two phrases that both start with a.
So in summary, the language is LL(1), but the given grammar for it isn't. Since the language is LL(1) it is possible to find another grammar which is LL(1), and that grammar is not far off from the given one.
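A Python sketch of the parser described above, based on the left-factored grammar (the names T_opt and R_opt follow the answer's sketch; the R_opt rules are one way of finishing the "et cetera" in the same style, i.e. R_opt -> c b R_opt | <empty>):

    def parse(text):
        pos = 0

        def peek():
            return text[pos] if pos < len(text) else None

        def expect(ch):
            nonlocal pos
            if peek() != ch:
                raise SyntaxError(f"expected {ch!r} at position {pos}")
            pos += 1

        def parse_S():                # S -> a T_opt
            expect("a")
            parse_T_opt()

        def parse_T_opt():
            if peek() == "b":         # T_opt -> b R_opt
                expect("b")
                parse_R_opt()
            # else: T_opt -> <empty>

        def parse_R_opt():
            if peek() == "c":         # R_opt -> c b R_opt
                expect("c")
                expect("b")
                parse_R_opt()
            # else: R_opt -> <empty>

        parse_S()
        if pos != len(text):
            raise SyntaxError("trailing input")

    for s in ["a", "ab", "abcb", "abcbcb"]:
        parse(s)   # sentences of the original grammar; all accepted

Each decision looks at a single character, which is exactly the "no backtracking, at most one token of lookahead" property claimed above.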

Is this grammar SLR?

E -> A | B
A -> a | c
B -> b | c
My answer is no, because it has a reduce/reduce conflict. Can anyone else verify this?
Also, I arrived at my answer by constructing the transition diagram. Is there a simpler way of finding this out?
Thanks for the help!
P.S. Would a recursive descent parser be able to parse this?
You're right -- starting from a 'c' in the input, there's no way to decide whether to treat it as an 'A' or a 'B'. I doubt there's anything that can really parse this deterministically -- it's simply ambiguous. Using a different type of parser won't help; you really need to change the grammar.
There are some formal methods for detecting such ambiguities, but I can hardly imagine bothering with them for a grammar this small. One easy way to spot this particular problem is to mentally arrange it into a tree:

          E
        /   \
       A     B
      / \   / \
     a   \ /   b
          c

The two lines coming up out of the 'c' node represent the reduce/reduce conflict. There's no reason to prefer one route from 'c' to 'E' over the other, so the grammar is ambiguous.
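To make the conflict concrete: the one-token input c already has two distinct parse trees, which is exactly what the reduce/reduce conflict reflects. A trivial Python rendering (tuples used as an ad-hoc tree representation):

    # Two parse trees for the input "c" under E -> A | B, A -> a | c, B -> b | c:
    tree_via_A = ("E", ("A", "c"))   # reduce c to A, then A to E
    tree_via_B = ("E", ("B", "c"))   # reduce c to B, then B to E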

Resources