I have derived the following grammar:
S -> a | aT
T -> b | bR
R -> cb | cbR
I understand that in order for a grammar to be LL(1) it has to be unambiguous and right-recursive. The problem is that I do not fully understand the concepts of left-recursive and right-recursive grammars, so I do not know whether the grammar above is right-recursive. I would really appreciate a simple explanation of left-recursive versus right-recursive grammars, and whether my grammar is LL(1).
Many thanks.
This grammar is not LL(1). In an LL(1) parser, it should always be possible to determine which production to use next based on the current nonterminal symbol and the next token of the input.
Let's look at this production, for example:
S → a | aT
Now, suppose that I told you that the current nonterminal symbol is S and the next symbol of input was an a. Could you determine which production to use? Unfortunately, without more context, you couldn't: perhaps you're supposed to use S → a, and perhaps you're supposed to use S → aT. Using similar reasoning, you can see that all the other productions have similar problems.
This doesn't have anything to do with left or right recursion, but rather the fact that no two productions for the same nonterminal in an LL(1) grammar can have a nonempty common prefix. In fact, a simple heuristic for checking if a grammar is not LL(1) is to see if you can find two production rules like this.
Hope this helps!
The grammar has only a single recursive rule: the last one, in which R is the symbol on the left and also appears on the right. It is right-recursive because in that rule the reference to R is the rightmost symbol.
The language is LL(1). We know this because we can easily construct a recursive descent parser for it that uses no backtracking and at most one token of lookahead.
But such a parser would be based on a slightly modified version of the grammar.
For instance, the two productions S -> a and S -> a T can be merged into a single one, expressed in EBNF as S -> a [ T ] (S derives a, followed by an optional T). This rule can be handled by a single parsing function for recognizing S.
The function matches a and then looks for the optional T, whose presence is indicated by the next input symbol being b.
We can write an LL(1) grammar for this, along these lines:
S -> a T_opt
T_opt -> b R_opt
T_opt -> <empty>
... et cetera
The optionality of T is handled explicitly, by making T (which we rename to T_opt) capable of deriving the empty string, and then condensing to a single rule for S, so that we don't have two phrases that both start with a.
So in summary, the language is LL(1), but the given grammar for it isn't. Since the language is LL(1) it is possible to find another grammar which is LL(1), and that grammar is not far off from the given one.
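To make this concrete, here is a minimal sketch of such a recursive descent parser in Python for the adjusted grammar (tokens are single characters, and the helper names are my own, not from the question):

def parse_S(s, i=0):
    i = match(s, i, 'a')              # S -> a [ T ]
    if i < len(s) and s[i] == 'b':
        i = parse_T(s, i)
    return i

def parse_T(s, i):
    i = match(s, i, 'b')              # T -> b [ R ]
    if i < len(s) and s[i] == 'c':
        i = parse_R(s, i)
    return i

def parse_R(s, i):
    i = match(s, i, 'c')              # R -> cb [ R ], written as a loop
    i = match(s, i, 'b')
    while i < len(s) and s[i] == 'c':
        i = match(s, i, 'c')
        i = match(s, i, 'b')
    return i

def match(s, i, ch):
    if i >= len(s) or s[i] != ch:
        raise SyntaxError(f"expected {ch!r} at position {i}")
    return i + 1

def accepts(s):
    try:
        return parse_S(s) == len(s)
    except SyntaxError:
        return False

print([w for w in ("a", "ab", "abcb", "abcbcb", "abc") if accepts(w)])
# ['a', 'ab', 'abcb', 'abcbcb']

Each function inspects at most one token ahead and never backtracks, which is exactly the LL(1) property.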
In many places (for example in this answer here), I have seen it written that an LR(0) grammar cannot contain ε productions.
Also, on Wikipedia I have seen statements like: an ε-free LL(1) grammar is also SLR(1).
Now the problem which I am facing is that I cannot reason out the logic behind these statements.
Well, I know that LR(0) grammars accept the languages accepted by a DPDA by empty stack, i.e., the language they accept must have the prefix property. [This prefix property can, however, be dealt with if we assume end markers, and with an end marker any language satisfies it. Many texts, like Theory of Computation by Sipser, assume this end marker to simplify their arguments.] That being said, we can say (informally?) that a grammar is LR(0) if no state in the canonical collection of LR(0) items has a shift-reduce conflict or a reduce-reduce conflict.
With this background, I tried to consider the following grammar:
S -> Aa
A -> ε
[Figure: the canonical collection of LR(0) items (DFA) for this grammar]
In the above DFA, I find that there is no state which has a shift-reduce conflict or reduce-reduce conflict.
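To double-check, the canonical collection can be built mechanically. Here is a minimal sketch of that check (the grammar encoding and helper names are my own choices, not from any particular text):

# Build the canonical collection of LR(0) items for
#   S' -> S,  S -> A a,  A -> ε
# and report any shift-reduce or reduce-reduce conflicts.
GRAMMAR = {
    "S'": [("S",)],       # augmented start symbol
    "S":  [("A", "a")],
    "A":  [()],           # the ε-production, encoded as the empty tuple
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(items):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for prod in GRAMMAR[body[dot]]:
                    if (body[dot], prod, 0) not in items:
                        items.add((body[dot], prod, 0))
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    return closure({(h, b, d + 1) for h, b, d in items
                    if d < len(b) and b[d] == symbol})

start = closure({("S'", ("S",), 0)})
states, work = [start], [start]
while work:
    state = work.pop()
    for sym in {b[d] for _, b, d in state if d < len(b)}:
        nxt = goto(state, sym)
        if nxt and nxt not in states:
            states.append(nxt)
            work.append(nxt)

# A conflict means: a complete item together with either a terminal
# shift item (shift-reduce) or another complete item (reduce-reduce).
for i, state in enumerate(states):
    complete = [it for it in state if it[2] == len(it[1])]
    shifts = [it for it in state
              if it[2] < len(it[1]) and it[1][it[2]] not in NONTERMINALS]
    status = "conflict" if complete and (shifts or len(complete) > 1) else "ok"
    print(f"state {i}: {status}")

Running this prints "ok" for every state.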
So this grammar should be LR(0) as per my analysis. But it also has ε production.
Isn't this example contradicting the statement:
"no grammar with ε productions can be LR(0)"
I guess if I know the logic behind the above quoted statement then I can understand the concept better.
Actually, my main problem arose with the statement:
An ε free LL(1) grammar is also SLR(1).
When I asked one of my friends, he argued that since the LL(1) grammar is ε-free, it is LR(0), and hence it is SLR(1).
But I could not understand his logic either. When I asked him for his reasoning, he just kept pointing me to posts about "a grammar with ε productions can never be LR(0)"...
But personally I cannot see how "an ε-free LL(1) grammar is SLR(1)" follows. Is it really related to the above property that "a grammar with ε productions cannot be LR(0)"? If so, please do help me out. If not, should I ask a separate question for the second confusion?
My concepts of compiler design come from the dragon book (Aho, Sethi, and Ullman) only, and my knowledge of the theory of computation from Ullman and a few other texts like Sipser and Linz.
A notable feature of your grammar is that A could just be eliminated. It serves absolutely no purpose. (By "eliminated", I mean simply removing all references to it; leaving productions otherwise intact.)
It is true that its existence doesn't preclude the grammar from being LR(0). Similarly, a grammar with an unreachable non-terminal and an ε-production for that non-terminal could also be LR(0).
So it would be more accurate to say that a grammar cannot be LR(0) if it has a productive non-terminal with both an ε-production and some other productive production. But since we usually only consider reduced grammars without pointless non-terminals, I'm not sure that this additional pedantry serves much purpose.
As for your question about ε-free LL(1) grammars, here's a rough outline:
If an ε-free grammar is not LR(0), then there is some state with both a shift and a reduce action. Since the grammar is ε-free, that state was reached by way of a shift or a goto. The previous state must then have had two different productions with the same FIRST set, contradicting the LL(1) condition.
Is there an easy way to tell whether a simple grammar is suitable for recursive descent? Is eliminating left recursion and left-factoring the grammar enough to achieve this?
Not necessarily.
To build a recursive descent parser (without backtracking), you need to eliminate or resolve all predict conflicts. So one definitive test is to see if the grammar is LL(1); LL(1) grammars have no predict conflicts, by definition. Left-factoring and left-recursion elimination are necessary for this task, but they might not be sufficient, since a predict conflict might be hiding behind two competing non-terminals:
list ::= item list'
list' ::= ε
| ';' item list'
item ::= expr1
| expr2
expr1 ::= ID '+' ID
expr2 ::= ID '(' list ')'
The problem with the above (or, at least, one problem) is that when the parser expects an item and sees an ID, it can't know which of expr1 and expr2 to try. (That's a predict conflict: both non-terminals could be predicted.) In this particular case, it's pretty easy to see how to eliminate that conflict, but it's not really left-factoring, since it starts by combining two non-terminals. (And in the full grammar this might be excerpted from, combining the two non-terminals might be much more difficult.)
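For instance, one way the conflict could be removed here (this rewriting is my own, not a general recipe) is to merge expr1 and expr2 into item and then factor out the common ID:

item  ::= ID item'
item' ::= '+' ID
        | '(' list ')'

Now a parser expecting an item matches the ID and uses the next token, '+' or '(', to choose the alternative.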
In the general case, there is no algorithm which can turn an arbitrary grammar into an LL(1) grammar, nor even one that can decide whether the language recognised by an arbitrary grammar has an LL(1) grammar at all. (However, it's easy to tell whether a given grammar itself is LL(1).) So there's always going to be some art and/or experimentation involved.
I think it's worth adding that you don't really need to eliminate left-recursion in a practical recursive descent parser, since you can usually turn it into a while-loop instead of recursion. For example, leaving aside the question of the two expr types above, the original grammar in an extended BNF with repetition operators might be something like
list ::= item (';' item)*
which translates into something like:
def parse_list():
    parse_item()
    while peek(';'):
        match(';')
        parse_item()
(Error checking and AST building omitted.)
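For completeness, here is a fuller sketch of the same loop with the peek/match helpers spelled out (the token representation is simplified to a list of strings, item is reduced to a single ID token, and all names are illustrative):

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, tok):
        return self.pos < len(self.tokens) and self.tokens[self.pos] == tok

    def match(self, tok):
        if not self.peek(tok):
            raise SyntaxError(f"expected {tok!r} at position {self.pos}")
        self.pos += 1

    def parse_list(self):
        self.parse_item()
        while self.peek(';'):      # iteration in place of right recursion
            self.match(';')
            self.parse_item()

    def parse_item(self):
        self.match('ID')           # stand-in for a real item/expr parser

Parser(['ID', ';', 'ID', ';', 'ID']).parse_list()   # accepts silently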
I am trying to find out how an LL(1) parser handles a right-associative grammar. For a left-associative grammar, after the usual transformation that yields productions like E' -> + T E', FIRST() and FOLLOW() work smoothly and the parsing table is generated easily. But for a right-recursive grammar, for example a power operator as in E -> T ^ E | T, the parsing table isn't generated properly. I have searched for resources, but every example I found avoids right-associative operators like powers.
LL algorithms handle right recursion with no problem whatsoever. In fact, the transformation you mention turns a left-associative grammar into a right-associative one, and left associativity then needs to be restored by transforming the syntax tree in a semantic rule. So if the production really is right-associative, you can use the same grammar without any post-processing of the tree.
The problem with E -> T ^ E | T is not that it is right-recursive. The problem is that the two right-hand sides start with the same non-terminal, making prediction impossible. The solution is left-factoring, which produces E -> T E' and E' -> ε | ^ T E'.
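Here is a minimal sketch of the left-factored grammar as a recursive-descent parser, building a right-associated tree for ^ (the token handling is simplified to a list of strings, T is reduced to a single token, and all names are illustrative):

def parse_E(tokens):
    left = parse_T(tokens)
    if tokens and tokens[0] == '^':   # E' -> ^ T E'
        tokens.pop(0)
        right = parse_E(tokens)       # parse_E = T E', so this matches T E'
        return ('^', left, right)
    return left                       # E' -> ε

def parse_T(tokens):
    return tokens.pop(0)              # stand-in for a real T parser

print(parse_E(['2', '^', '3', '^', '4']))
# ('^', '2', ('^', '3', '4')), i.e. 2^(3^4): the recursion naturally
# produces right associativity, with one token of lookahead and no backtracking.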
For example:
R → R bar R | R R | R star | ( R ) | a | b
construct an equivalent unambiguous grammar:
R → S | R bar S
S → T | S T
T → U | T star
U → a | b | ( R )
How about just eliminating the left recursion from R → R bar R | R R | R star | ( R ) | a | b?
What's the difference between eliminating left recursion and constructing an equivalent unambiguous grammar?
An unambiguous grammar is one where each string in the language has exactly one derivation from the grammar. In the context of compiler construction, the problem with an ambiguous grammar is that it's not obvious from the grammar what the parse tree for a given input string should be. Some tools solve this with their own rules for resolving ambiguities, while others simply require the grammar to be unambiguous.
A left-recursive grammar is one where the derivation of a given non-terminal can produce that same non-terminal again without first producing a terminal. This leads to infinite loops in recursive-descent-style parsers, but poses no problem for shift-reduce parsers.
Note that an unambiguous grammar can still be left-recursive and a grammar without left recursion can still be ambiguous. Also note that depending on your tools, you may need to only remove ambiguity, but not left-recursion, or you may need to remove left-recursion, but not ambiguity (though an unambiguous grammar is generally preferable).
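For instance (these examples are my own, not from the question): E → E + a | a is left-recursive but unambiguous, while the classic dangling-else grammar

S → i S | i S e S | x

has no left recursion but is ambiguous: the string i i x e x has two parse trees, depending on which i the e attaches to.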
So the difference is that eliminating left recursion and eliminating ambiguity solve different problems and are necessary in different situations.
Is it possible to construct an LR(0) parser that could parse a language with both prefix and postfix operators? For example, suppose I had a grammar with the + (addition) and ! (factorial) operators, with the usual precedences, so that 1+3! means 1 + 3! = 1 + 6 = 7. Surely if the parser were LR(0), then when it had 1+3 on the stack it would reduce rather than shift?
Also, do right-associative operators pose a problem? For example, 2^3^4 should parse as 2^(3^4), but again, when the parser has 2^3 on the stack, how would it know whether to reduce or shift?
If this isn't possible, is there still a way to use an LR(0) parser, possibly by altering the grammar to add brackets in the appropriate places?
LR(0) parsers have a weakness in that they can only parse prefix-free languages, languages where no string in the language is a proper prefix of any other. This generally makes it a bit tricky to parse expressions like these, since something like 5 is a prefix of 5!. It also explains why it's hard to get right-associative operators: given a production like
S → F | F ^ S
the parser will have a shift/reduce conflict after seeing an F, because it can't tell whether to extend it (by shifting the ^) or to reduce. This is related to the prefix-free property mentioned earlier.
This weakness of LR(0) is one of the reasons it isn't used much in practice. SLR(1) and LALR(1) parsers can usually handle these grammars because they have a token of lookahead that lets them decide whether to shift or reduce. In the above case, those parsers don't encounter a shift/reduce conflict: when deciding whether to reduce an F or shift a ^, they can see that they should shift the ^, because there is no correct string in which a ^ appears after a complete S.
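To make that concrete, here is the usual SLR calculation worked out (my own worked example): augment the grammar with S' → S. Then FOLLOW(S) = { $ }, because S only ever appears at the very end of a right-hand side, so in particular ^ ∉ FOLLOW(S). In the state containing the items S → F· and S → F·^S, an SLR(1) parser therefore reduces only when the lookahead is $ and shifts when it is ^, and the LR(0) conflict disappears.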