Is this grammar SLR? - parsing

E -> A | B
A -> a | c
B -> b | c
My answer is no because it has a reduce/reduce conflict, can anyone else verify this?
Also I gained my answer through constructing the transition diagram, is there a simpler way of finding this out?
Thanks for the help!
P.S Would a Recursive Descent be able to parse this?

You're right -- starting from a 'c' in the input there's no way to decide whether to treat that as an 'A' or a 'B'. I doubt there's anything that can really parse this properly -- it's simply ambiguous. Using a different type of parser won't help; you really need to change the language.
There are some formal methods for detecting such ambiguities, but I can hardly imagine bother with them for a grammar this small. One easy way to spot this particular problem is to mentally arrange it into a tree:
The two lines coming up out of the 'c' box represent the reduce/reduce conflict. There's no reason to prefer one route from 'c' to 'E' over the other, so the grammar is ambiguous.

Related

Are the following grammars LL(1)?

S -> Abc|aAcb
A -> b|c|ε
I think the first one is LL(1)
S -> aAS|b
A -> a|bSA
But the problem is second one. There's no conflict problem, but I think it doesn't satisfy right-recursion.
I'm not sure about those problems.
The first grammar is not LL(1), because of several conflicts, as for example for input bc, an LL parser will need 2 tokens of look-ahead to parse it:
enter in rule S
enter in rule A
recognize character b
load the next token c
exit rule A
another b is expected, but the current token is c
go back into A
move one token backwards, again to b
exit rule A without recognizing anything (because of the epsilon)
recognize the b character after the reference to rule A that was "jumped" without the use of any token
load the next token c
recognize c
success
You have a similar case in the second alternative of S for an input acb. The grammar is not ambiguous, because in the end there is only one possible syntax tree. Its not LL(1), as in fact is LL(2).
The second grammars is deterministic - there is only one way to parse any input that is valid according to the grammar. This means that it can be used for parsing by an LL(1) parser.
I have made a tool (Tunnel Grammar Studio) that detects the grammar conflicts for nondeterministic grammars and generates parsers. This grammar in ABNF (RFC 5234) like syntax is:
S = 'a' A S / 'b'
A = 'a' / 'b' S A
The right recursion by itself does not create ambiguities inside the grammar. One way to have a right recursion ambiguity is to have some dangling element as in this grammar:
S = 'c' / 'a' S 0*1 'b'
You can read it as: rule S recognizes character c or character a followed by rule S itself and maybe (zero or one time) followed by character b.
The grammar up has a right recursion related ambiguity because of the dangling b character. Meaning that for an input aacb, there is more then one way to parse it: recognize the first a in S; enter into S again; recognize the second a; enter again into S and recognize c; exit S one time; then there are two choices:
case one) recognize the b character, exit S two times or case two) first exit S one time and then recognize the b character. Both cases are these (screen shots from the visual debugger of TGS):
This grammar is thus ambiguous (i.e. not LL(1)) because more then one syntax tree can be generated for some valid inputs. For this input the possible trees are only two, but for an input aaacb there are three trees, as there are three trees for aaacbb, because of the possible 3 places where you can 'attach' the two b characters, two of this places will have b, and one will remain empty. For input aaacbbb there is of course only one possible syntax tree, but the grammar is defined to be ambiguous if there is at least one input for which there is more then one possible syntax tree.

How to parse this simple grammar? Is it ambiguous?

I'm delving deeper into parsing and came across an issue I don't quite understand. I made up the following grammar:
S = R | aSc
R = b | RbR
where S is the start symbol. It is possible to show that abbbc is a valid sentence based on this grammar, hopefully, that is correct but I may have completely missunderstood something. If I try to implement this using recursive descent I seem to have a problem when trying to parse abbbc, using left-derivation eg
S => aSc
aSc => aRc
at this point I would have thought that recursive descent would pick the first option in the second production because the next token is b leading to:
aRc => abc
and we're finished since there are no more non-terminals, which isn't of course abbbc. The only way to show that abbbc is valid is to pick the second option but with one lookahead I assume it would always pick b. I don't think the grammar is ambiguous unless I missed something. So what I am doing wrong?
Update: I came across this nice derivation app at https://web.stanford.edu/class/archive/cs/cs103/cs103.1156/tools/cfg/. I used to do a sanity check that abbbc is a valid sentence and it is.
Thinking more about this problem, is it true to say that I can't use LL(1) to parse this grammar but in fact need LL(2)? With two lookaheads I could correctly pick the second option in the second production because I now also know there are more tokens to be read and therefore picking b would prematurely terminate the derivation.
For starters, I’m glad you’re finding our CFG tool useful! A few of my TAs made that a while back and we’ve gotten a lot of mileage out of it.
Your grammar is indeed ambiguous. This stems from your R nonterminal:
R → b | RbR
Generally speaking, if you have recursive production rules with two copies of the same nonterminal in it, it will lead to ambiguities because there will be multiple options for how to apply the rule twice. For example, in this case, you can derive bbbbb by first expanding R to RbR, then either
expanding the left R to RbR and converting each R to a b, or
expanding the right R to RbR and converting each R to a b.
Because this grammar is ambiguous, it isn’t going to be LL(k) for any choice of k because all LL(k) grammars must be unambiguous. That means that stepping up the power of your parser won’t help here. You’ll need to rewrite the grammar to not be ambiguous.
The nonterminal R that you’ve described here generates strings of odd numbers of b’s in them, so we could try redesigning R to achieve this more directly. An initial try might be something like this:
R → b | bbR
This, unfortunately, isn’t LL(1), since after seeing a single b it’s unclear whether you’re supposed to apply the first production rule or the second. However, it is LL(2).
If you’d like an LL(1) grammar, you could do something like this:
R → bX
X → bbX | ε
This works by laying down a single b, then laying down as many optional pairs of b’s as you’d like.

How LL(1) parser handle Right Associative grammar

I am trying to find how LL(1) parser handle right associative grammar. For example in case of left associative grammar like this E->+TE' first() and follow() works smoothly and parsing table generated easily. But, in case of right-recursive grammar, for example, in case of power like E->T^E/T parsing table isn't generating properly. I am searching for resources but found every example avoiding right associativity like powers.
LL algorithms handle right-recursion with no problem whatsoever. In fact, the transformation you mention turns a left-associative grammar into a right-associative one, and left-associativity needs to restored by transforming the syntax tree in a semantic rule. So if the production is really right-associative, you can use the same grammar without the need for post- processing the tree.
The problem with E -> T ^ E | T is not that it is right recursive. The problem is that the two right-hand sides start with the same non-terminal, making prediction impossible. The solution is left-factoring, which will produce E -> T E' / E' -> ε | ^ T E'.

why top down parser cannot handle left recursion?

I wanted to know why top down parsers cannot handle left recursion and we need to eliminate left recursion due to this as mentioned in dragon book..
Think of what it's doing. Suppose we have a left-recursive production rule A -> Aa | b, and right now we try to match that rule. So we're checking whether we can match an A here, but in order to do that, we must first check whether we can match an A here. That sounds impossible, and it mostly is. Using a recursive-descent parser, that obviously represents an infinite recursion.
It is possible using more advanced techniques that are still top-down, for example see [1] or [2].
[1]: Richard A. Frost and Rahmatullah Hafiz. A new top-down parsing algorithm to accommodate ambiguity and left recursion in polynomial time. SIGPLAN Notices, 41(5):46–54, 2006.
[2]: R. Frost, R. Hafiz, and P. Callaghan, Modular and efficient top-down
parsing for ambiguous left-recursive grammars. ACL-IWPT, pp. 109 –
120, 2007.
Top-down parsers cannot handle left recursion
A top-down parser cannot handle left recursive productions. To understand why not, let's take a very simple left-recursive grammar.
S → a
S → S a
There is only one token, a, and only one nonterminal, S. So the parsing table has just one entry. Both productions must go into that one table entry.
The problem is that, on lookahead a, the parser cannot know if another a comes after the lookahead. But the decision of which production to use depends on that information.

When is an ambiguous grammar or production rule OK? (bison shift/reduce warnings)

There are certainly plenty of docs and howtos on resolving shift/reduce errors. The bison docs suggest the correct solution is usually to just %expect them and deal with it.
When you have things like this:
S: S 'b' S | 't'
You can easily resolve them like this:
S: S 'b' T | T
T: 't'
My question is: Is it better to leave the grammar a touch ambiguous and %expect shift/reduce problems or is it better to try to adjust the grammar to avoid them? I suspect there is a balance and it's based on the needs of the author, but I don't really know.
As I read it, Your question is "When is an ambiguous grammar or production rule OK?"
First consider the language you are describing. What would be the implication of allowing an ambiguous production rule into the language.
Your example describes a language which might include an expression like: t b t b t b t
The expression, resolved as in your second example would be (((( t ) b t) b t ) b t ) but in an ambiguous grammer it could also become ( t b ( t b ( t b ( t)))) or even ( t b t ) b ( t b t ). Which could be valid might depend on the language. If the b operator models subtraction, it really shouldn't be ambiguous, but if it was addition, it might be ok. This really depends on the language.
The second question to consider is what the resulting grammar source file ends up looking like, after the conflicts are resolved. As with other source code, a grammar is meant to be read by humans, and secondarily also by computers. Prefer a notation that gives a clearer explanation of what the parser is trying to do from the grammar. That is, if the parser is executing some possibly undefined behavior, for example, order of evaluation of a function's arguments in an eager language, make the grammar look ambiguous.
You can guide the conflict resolution with operator precedence. Declare 'b' as an left- or right-associative operator and you have covered at least that case.
For more complex patterns, as long as the final parser produces the correct result in all cases, the warnings isn't much to worry about. Though if you can't get it to give the correct result using declarations you would have to rewrite the grammar.
In my compiler course last semester we used bison, and built a compiler for a subset of pascal.
If the language is complex enough, you will have some errors. As long as you understand why they are there, and what you'd have to do to remove them, we found it to be alright. If something was there, but due to the behaviour would work as we wanted it to, and would require much to much thought and work to make it worth while (and also complicating the grammar), we left it alone. Just make sure you fully understand the error, and document it somewhere (even for yourself), so that you always know what's going on with it.
It's a cost/benefit analysis once things get really involved, but IMHO, fixing it should be considered FIRST, then actually figure out what the work would be (and if that work breaks something else, or makes something else harder), and go from there. Never pass them off as commonplace.
When I need to prove that a grammar is unambiguous, I tend to write it first as a Parsing Expression Grammar, and then convert it by hand to whatever grammar type the tool set I'm using for the project needs. In my experience, the need for this level of proof is very rare, though, since most shift/reduce conflicts I have come across have been fairly trivial ones to show the correctness of (on the order of your example).

Resources