How to remove left-recursion in the following grammar? - parsing

Unfortunately, it is not possible for ANTLR to support direct-left recursion when the rule has parameters passed. The only viable option is to remove the left recursion. Is there a way to remove the left-recursion in the following grammar ?
a[int x]
: b a[$x] c
| a[$x - 1]
(
c a[$x - 1]
| b c
)
;
The problem is in the second alternative involving left recursion. Any kind of help would be much appreciated.

Without the parameters and easier formatting, it would look like this:
a
: b a c
| a (c a | b c)
;
When a's left recursive alternative is matched n times, it would just mean that (c a | b c) will be matched n times, pre-pended with the terminating b a c (the first alternative). That means that this rule will always start with b a c, followed by zero or more occurrences of (c a | b c):
a
: b a c (c a | b c)*
;

Related

parsing and semantic analysis using CUP - Access parser stack

I have a rule in my grammar such as
A -> B C D E {: ...some actions... :}
;
D -> /*empty*/ {: some actions using attributes of B and C :}
;
To implement the actions associated with production rule of D, I need to access the parser stack. How can I do that in CUP?
Rewrite your grammar:
A -> A1 E
A1 -> B C D
If the action for the first production requires B and C as well, then the semantic value of A1 will have to be more complicated in order to pass the semantic values through.

Is it possible to transform this grammar to be LR(1)?

The following grammar generates the sentences a, a, a, b, b, b, ..., h, b. Unfortunately it is not LR(1) so cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases, "having seen an 'a'" and "C unchanged". You asked if there's fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.

Transform a grammar G into LL(1)

I have the following grammar and I need to convert it to LL(1) grammar
G = (N; T; P; S) N = {S,A,B,C} T = {a, b, c, d}
P = {
S -> CbSb | adB | bc
A -> BdA | b
B -> aCd | ë
C -> Cca | bA | a
}
The point is that I know how to convert when its just a production, but I can't find any clear method of solving this on the internet.
Thanks in advance!
Remove left recursion, direct and indirect.
Build an LA(k) table. If there's no ambiguity, the grammar (and the language) is LL(k).
The obvious left recursion in the grammar is:
S ==> C... ==> C...

Eliminating Left Recursion

So I have some grammar that doesn't work for a top-down parser due to it having left recursion:
L::= A a | B b
A::= L b | a a
B::= b B b | b a
So in order to fix this, I have to remove the left recursion. To do this, I do a little substitute-like-thing:
L::= A a | B b
A::= A a b | B b b | a a (I plugged in the possible values of "L")
A then turns to (I believe):
A::= a a A' | B b b
A'::= a b A' | ε
I'm fairly certain that I'm correct up to there (wouldn't be surprised if I'm not, though). Where I'm struggling is now removing the left recursion out of "B b b". I've tried going about this so many ways, and I don't think any of them work. Here's one that seems most logical, but ugly as well (thus saying it's probably wrong). Starting by manipulating B::= b B b | b a
B::= b B b | b a
B::= b B' (they both start with b, so maybe i can pull it out?)
B'::= B b | a
B'::= b B' b | a (substituted B's value in)
B'::= b B" | a
B"::= b B" |a B" | ε
So I guess to show what the finalized B's would be:
B::= b B'
B'::= b B" | a
B"::= b B" | a B" | ε
This seems way too ugly to be correct. Especially since I'd have to plug that into the new "A" terminals that I created.
Can someone help me out? No idea if I'm going about this the right way. I'm supposed to be able to create an LL(1) parse table afterward (should be able to do that part on my own).
Thanks.
In a parser that tries to expand nonterminals from the left, if some nonterminal can expand to a string with itself on the left, the parser may expand that nonterminal forever without actually parsing anything. This is left recursion. On the other hand, it is perfectly fine if a nonterminal expands to a string with some different nonterminal on the left, as long as no chain of expansions produces a string with the original nonterminal on the left. Similarly, it is fine if nonterminals aren't all the way to the right in the expansion, as long as they're not all the way to the left.
tl;dr There's nothing wrong with B b b or b B b. You've removed all the left recursion. You don't need to keep going.

ANTLR rewrite rules in grammar file

I have a rule that looks like this:
a : (b | c) d;
b : 'B';
c : 'C';
d : 'D';
With this grammar ANTLR builds a flat parse tree. How can I rewrite the first rule (and leave the other two unchanged) so that whatever is matched is being returned under a root node called A?
If the first production rule was like this:
a : b d;
then it could have been rewritten as
a : b d -> ^(A b d)
and it would have solved my problem. However the first grammar rule yields more than one possibility for the resulting parse tree ^(A b d) or ^(A c d).
How do I express this when rewriting the rule?
You can use the ? operator in the rewrite as follows.
a : (b | c) d -> ^(A b? c? d);

Resources