Problems about LL(1) grammar transformation - parsing

I am having trouble transforming the following non-LL(1) grammar into an LL(1) grammar. Can it be transformed?
> A ::= B | A ; B
> B ::= C | [ A ]
> C ::= D | C , D
> D ::= x | (C)
where ;, the comma, x, (, ), [ and ] are terminals.

The main problems here are the productions
A → A ; B
and
C → C, D
which are left-recursive. In both cases, these productions generate a sequence of items separated by a delimiter (a semicolon in the first case, a comma in the second), so you can rewrite them as right-recursive productions like this:
A → B ; A
C → D, C
This gives the grammar
A → B | B; A
B → C | [A]
C → D | D, C
D → x | (C)
The problem now is that there are productions for A and C that have a common prefix. But that's nothing to worry about: we can left-factor them like this:
A → B H
H → ε | ; A
B → C | [A]
C → D I
I → ε | , C
D → x | (C)
I believe that this grammar is now LL(1).
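One way to sanity-check the claim is to hand-write a predictive recognizer with one function per nonterminal and a single token of lookahead. The sketch below is only an illustration: it assumes the comma production is read as I → ε | , C, treats every non-blank character as one token, and uses names (`Recognizer`, `accepts`) that are made up for this example.

```python
class ParseError(Exception):
    pass


class Recognizer:
    """Predictive recognizer: one method per nonterminal, one token of lookahead."""

    def __init__(self, text):
        # Whitespace is dropped; every remaining character is one token.
        # None marks the end of the input.
        self.tokens = [ch for ch in text if not ch.isspace()] + [None]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def expect(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def A(self):  # A -> B H
        self.B()
        self.H()

    def H(self):  # H -> ε | ; A
        if self.peek() == ";":
            self.expect(";")
            self.A()
        # any other lookahead selects H -> ε

    def B(self):  # B -> C | [ A ]
        if self.peek() == "[":
            self.expect("[")
            self.A()
            self.expect("]")
        else:
            self.C()

    def C(self):  # C -> D I
        self.D()
        self.I()

    def I(self):  # I -> ε | , C
        if self.peek() == ",":
            self.expect(",")
            self.C()
        # any other lookahead selects I -> ε

    def D(self):  # D -> x | ( C )
        if self.peek() == "(":
            self.expect("(")
            self.C()
            self.expect(")")
        else:
            self.expect("x")


def accepts(text):
    """Return True if the whole input is derivable from A."""
    parser = Recognizer(text)
    try:
        parser.A()
        parser.expect(None)  # all input must be consumed
        return True
    except ParseError:
        return False
```

Every branch is chosen by looking at the next token only, with no backtracking, which is what the LL(1) property buys you. For example, `accepts("x,x;[x;(x,x)]")` succeeds while `accepts("x;")` fails.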

Related

How to make this grammar LL(1)

I want to know if it would be possible to transform this grammar to LL(1). This is the grammar:
A -> B
| C
B -> a
| a ';'
C -> a D
| a D ';'
D -> ';' a
| D ';' a
Since this language is regular ( a;? | a(;a)+;? ), then yes, it would be possible.
Not sure if I'm using the right syntax, but the language is basically a or a; (using A->B), or any string that starts with an a, followed by one or more ;a pairs, optionally adding another ; on the end.
This is the same grammar but simpler:
A -> a | a ';' | a ';' A
It is still not LL(1), but after left-factoring it is:
A -> a B
B -> ε | ';' C
C -> ε | A
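To gain some confidence that the factored grammar still describes the same language, one can compare a small recognizer against a regular expression on all short strings. The sketch below assumes the language is a(;a)*;? (a restatement of the description above that also covers the plain string a); the function names are invented for this example.

```python
import itertools
import re


def parse_A(s, i):
    """A -> a B. Returns the index after the match, or -1 on failure."""
    if i < len(s) and s[i] == "a":
        return parse_B(s, i + 1)
    return -1


def parse_B(s, i):
    """B -> ε | ';' C, chosen by looking at the next character only."""
    if i < len(s) and s[i] == ";":
        return parse_C(s, i + 1)
    return i  # B -> ε


def parse_C(s, i):
    """C -> ε | A."""
    if i < len(s) and s[i] == "a":
        return parse_A(s, i)
    return i  # C -> ε


def accepts(s):
    """True if the whole string is derivable from A."""
    return parse_A(s, 0) == len(s)


# Assumed regular expression for the language: an 'a', then zero or more
# ';a' pairs, then an optional trailing ';'.
PATTERN = re.compile(r"a(;a)*;?")


def agrees_up_to(n):
    """Compare recognizer and regex on every string over {a, ;} of length <= n."""
    for length in range(n + 1):
        for chars in itertools.product("a;", repeat=length):
            s = "".join(chars)
            if accepts(s) != bool(PATTERN.fullmatch(s)):
                return False
    return True
```

An exhaustive comparison on short strings is not a proof, but a mismatch would show up quickly if the factoring had changed the language.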

Is it possible to transform this grammar to be LR(1)?

The following grammar generates the sentences "a , a", "a , b", "b , b", ..., "h , b". Unfortunately it is not LR(1), so it cannot be used with tools such as "yacc".
S -> a comma a.
S -> C comma b.
C -> a | b | c | d | e | f | g | h.
Is it possible to transform this grammar to be LR(1) (or even LALR(1), LL(k) or LL(1)) without the need to expand the nonterminal C and thus significantly increase the number of productions?
Not as long as you have the nonterminal C unchanged preceding comma in some rule.
In that case it is clear that a parser cannot decide, having seen an "a", and having lookahead "comma", whether to reduce or shift. So with C unchanged, this grammar is not LR(1), as you have said.
But the solution lies in the two phrases "having seen an 'a'" and "C unchanged". You asked if there's a fix that doesn't expand C. There isn't, but you could expand C "a little bit" by removing "a" from C, since that's the source of the problem:
S -> a comma a .
S -> a comma b .
S -> C comma b .
C -> b | c | d | e | f | g | h .
So, we did not "significantly" increase the number of productions.

Finding FIRST sets in a grammar

Today I am reading how to find First and Follow of a grammar. I saw this grammar:
S → ACB | CbB | Ba
A → da | BC
B → g | ε
C → h | ε
The claim is that
FIRST(S) = FIRST(ACB) U FIRST(CbB) U FIRST(Ba)
= {d, g, h, ε} U {h, b} U {g, a}
= {d, g, h, ε, b, a}
I don't understand how a and b are in this set. Can anyone explain this?
Notice that B and C both are nullable (they can produce ε). This means that from the production
S → CbB
we get that b ∈ FIRST(S), since if we use the production C → ε we can derive a string that starts with b.
Similarly, note that
S → Ba
is a production, so we get a ∈ FIRST(S) because we can use the production B → ε to get an a at the front of a string derivable from S.
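The same reasoning can be mechanized as a fixed-point iteration over the productions. The following is a minimal sketch, not a reference implementation: the grammar encoding (a dict from nonterminal to a list of bodies, with `[]` standing for ε) and the function names are choices made for this example.

```python
EPS = "ε"

# The grammar from the question; [] is the ε-production.
GRAMMAR = {
    "S": [["A", "C", "B"], ["C", "b", "B"], ["B", "a"]],
    "A": [["d", "a"], ["B", "C"]],
    "B": [["g"], []],
    "C": [["h"], []],
}


def first_of_sequence(symbols, first):
    """FIRST of a string of grammar symbols, given FIRST sets of the nonterminals."""
    result = set()
    for sym in symbols:
        if sym not in first:  # a terminal starts the string
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:  # sym is not nullable, so stop here
            return result
    result.add(EPS)  # every symbol was nullable (or the sequence was empty)
    return result


def compute_first(grammar):
    """Grow the FIRST sets until nothing changes."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_of_sequence(body, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


FIRST = compute_first(GRAMMAR)
```

With this encoding, `first_of_sequence(["C", "b", "B"], FIRST)` skips past the nullable C and picks up b, which is the step the explanation above describes.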
Hope this helps!

Multiple entries in an LL(1) parsing table?

Given this grammar:
S → S1 S2
S1 → a | ε
S2 → ab | ε
Therefore, we have
FIRST(S1) = { a, ε }
FOLLOW(S1) = { a }
Does that mean that in the parsing table I'll have multiple definitions in the row for S1 and the column for a?
Yes, that's correct. (However, note that your FOLLOW set is wrong; it also contains the end-of-input marker $). The issue here is that if the parser sees an a, it can't tell if that's because it wants to use the derivation
S → S1S2 → a S2
Or the derivation
S → S1S2 → S2 → ab
To fix this, you can note that your grammar only generates the strings { ε, a, ab, aab }. Therefore, you can build an LL(1) grammar that directly produces those four strings:
S → ε | aY
Y → ε | b | ab
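The conflict in the original grammar (S → S1 S2, S1 → a | ε, S2 → ab | ε) can also be found mechanically by building the LL(1) table and looking for cells with more than one production. This is a rough sketch under the usual textbook construction; the data layout and helper names are assumptions of this example.

```python
from collections import defaultdict

EPS, END = "ε", "$"

# The original grammar from the question; [] is the ε-production.
GRAMMAR = {
    "S": [["S1", "S2"]],
    "S1": [["a"], []],
    "S2": [["a", "b"], []],
}


def first_seq(symbols, first):
    """FIRST of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        if sym not in first:  # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                new = first_seq(body, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for nonterminals
                    rest = first_seq(body[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # the rest can vanish: inherit FOLLOW(nt)
                        new |= follow[nt]
                    new -= follow[sym]
                    if new:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(grammar, start):
    """Map (nonterminal, lookahead) to the list of applicable bodies."""
    first = compute_first(grammar)
    follow = compute_follow(grammar, first, start)
    table = defaultdict(list)
    for nt, bodies in grammar.items():
        for body in bodies:
            f = first_seq(body, first)
            lookaheads = f - {EPS}
            if EPS in f:
                lookaheads |= follow[nt]
            for tok in lookaheads:
                table[nt, tok].append(body)
    return table, first, follow


TABLE, FIRST, FOLLOW = build_table(GRAMMAR, "S")
# Cells holding more than one production are LL(1) conflicts.
CONFLICTS = {cell: bodies for cell, bodies in TABLE.items() if len(bodies) > 1}
```

Here the only conflicting cell is the row for S1 and the column for a, which holds both S1 → a and S1 → ε.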
Hope this helps!

Making a Grammar LL(1)

I have the following grammar:
S → a S b S | b S a S | ε
Since I'm trying to write a small compiler for it, I'd like to make it LL(1). I see that there seems to be a FIRST/FOLLOW conflict here, and I know I have to use substitution to resolve it, but I'm not exactly sure how to go about it. Here is my proposed grammar, but I'm not sure if it's correct:
S-> aSbT | epsilon
T-> bFaF| epsilon
F-> epsilon
Can someone help out?
In his original paper on LR parsing, Knuth gives the following grammar for this language, which he conjectures "is the briefest possible unambiguous grammar for this language:"
S → ε | aAbS | bBaS
A → ε | aAbA
B → ε | bBaB
Intuitively, this tries to break up any string of a's and b's into blocks that balance out completely. Some blocks start with a and end with b, while others start with b and end with a.
We can compute FIRST and FOLLOW sets as follows:
FIRST(S) = { ε, a, b }
FIRST(A) = { ε, a }
FIRST(B) = { ε, b }
FOLLOW(S) = { $ }
FOLLOW(A) = { b }
FOLLOW(B) = { a }
Based on this, we get the following LL(1) parse table:
  |  a   |  b   |  $
--+------+------+-----
S | aAbS | bBaS |  ε
A | aAbA |  ε   |
B |  ε   | bBaB |
And so this grammar is not only LR(1), but it's LL(1) as well.
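The table can be used directly to drive a stack-based predictive parser. The sketch below is one possible rendering, assuming each character of the input is a token and that empty table cells mean a syntax error; the names `TABLE` and `accepts` are made up for this example.

```python
# LL(1) table for S → ε | aAbS | bBaS, A → ε | aAbA, B → ε | bBaB.
# A missing key is an error cell; [] is the ε-production.
TABLE = {
    ("S", "a"): ["a", "A", "b", "S"],
    ("S", "b"): ["b", "B", "a", "S"],
    ("S", "$"): [],
    ("A", "a"): ["a", "A", "b", "A"],
    ("A", "b"): [],
    ("B", "a"): [],
    ("B", "b"): ["b", "B", "a", "B"],
}
NONTERMINALS = {"S", "A", "B"}


def accepts(text):
    """Table-driven LL(1) recognizer; returns True if text is in the language."""
    tokens = list(text) + ["$"]
    stack = ["$", "S"]  # end marker at the bottom, start symbol on top
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, look))
            if body is None:  # empty cell: no production applies
                return False
            stack.extend(reversed(body))  # leftmost symbol ends up on top
        elif top == look:
            pos += 1  # matched a terminal (or the end marker)
        else:
            return False
    return pos == len(tokens)
```

Because every cell holds at most one production, the loop never has to guess or backtrack. For instance, `accepts("abba")` succeeds and `accepts("aab")` fails.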
Hope this helps!
