How to determine if the language is LL(1)?

P → PL | L
L → N; | M; | C
N → print E
M → print "W"
W → TW | ε
C → if E {P} | if E {P} else {P}
E → (EOE) | V (note: this has a variable O)
O → + | - | *
V → 0 | 1 | 2 | 3 (note: this has a terminal 0 (zero))
T → a | b | c | d
For the above grammar G, is it not LL(1) because it causes a FIRST/FIRST conflict when trying to predict the production of P?
I am really struggling with how to prove that it is not LL(1).
Any help or advice would be much appreciated!

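A left-recursive grammar cannot be LL(k) for any k, and P → PL is left-recursive.
In addition, L has two productions whose expansions start with the same terminal: L → N; and L → M; both begin with print (via N → print E and M → print "W"), so it is impossible to choose between them with only one symbol of lookahead. The same happens with C, whose two productions both begin with if.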
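Both points can be checked mechanically. The approach is to compute FIRST sets, then look for alternatives of the same non-terminal whose FIRST sets overlap. The Python sketch below is not from the answer, and it makes two assumptions:

- The grammar is tokenized as shown, with `print`, `if`, `else` and the quote character treated as terminals.
- Only FIRST/FIRST conflicts are detected. A full LL(1) check would also compare FIRST against FOLLOW for ε-alternatives such as W → ε.

```python
EPS = "ε"

# Grammar G from the question as token sequences. Any symbol that is not a
# key is a terminal; () is the empty (ε) alternative.
G = {
    "P": [("P", "L"), ("L",)],
    "L": [("N", ";"), ("M", ";"), ("C",)],
    "N": [("print", "E")],
    "M": [("print", '"', "W", '"')],
    "W": [("T", "W"), ()],
    "C": [("if", "E", "{", "P", "}"),
          ("if", "E", "{", "P", "}", "else", "{", "P", "}")],
    "E": [("(", "E", "O", "E", ")"), ("V",)],
    "O": [("+",), ("-",), ("*",)],
    "V": [("0",), ("1",), ("2",), ("3",)],
    "T": [("a",), ("b",), ("c",), ("d",)],
}


def first_of(seq, first):
    """FIRST of a symbol sequence; contains EPS if the whole sequence can vanish."""
    out = set()
    for sym in seq:
        if sym not in first:            # terminal
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)
    return out


def first_sets(g):
    """Iterate until no FIRST set grows (this also copes with P -> P L)."""
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def first_first_conflicts(g):
    """(non-terminal, i, j, shared) for each pair of alternatives whose FIRST sets overlap."""
    first = first_sets(g)
    conflicts = []
    for nt, alts in g.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                shared = first_of(alts[i], first) & first_of(alts[j], first)
                if shared:
                    conflicts.append((nt, i, j, shared))
    return conflicts


def directly_left_recursive(g):
    return [nt for nt, alts in g.items() if any(alt[:1] == (nt,) for alt in alts)]


for nt, i, j, shared in first_first_conflicts(G):
    print(nt, i, j, sorted(shared))
# P 0 1 ['if', 'print']
# L 0 1 ['print']
# C 0 1 ['if']
print(directly_left_recursive(G))   # ['P']
```

The left recursion in P shows up here as a FIRST/FIRST conflict between P → PL and P → L, since FIRST(P) = FIRST(L) = {print, if}.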

Related

Writing the production rules of this finite state machine

Consider the following state diagram over the alphabet {0,1}, which accepts if the input string contains two consecutive 0s or two consecutive 1s:
01001 --> Accept
101 --> Reject
How would I write the production rules to show this? Is it just:
D -> C0 | B1 | D0 | D1
C -> A0 | B0
B -> A1 | C1
And if so, how would the terminals (0, 1) be differentiated from the states (A, B, C)? And should the state go before or after the input? That is, should it be A1 or 1A, for example?
The grammar you suggest never defines A. It's not a non-terminal, because it has no production rules, and it's not a terminal, because it never appears in the input. You could make that work by writing, for example, C → 0 | B 0. A more general solution is to make A a non-terminal with an ε-rule, A → ε, and then write C → A 0 | B 0.
Writing B0 is misleading, because it looks like a single thing. It is actually two grammar symbols: the non-terminal B followed by the terminal 0.
With those modifications, your grammar is fine. It's a left-linear grammar, built from in-transitions. A right-linear grammar can also be constructed from the FSA by considering out-transitions instead. In this version, the ε-production corresponds to final states rather than initial states.
A → 1 B | 0 C
B → 0 C | 1 D
C → 1 B | 0 D
D → 0 D | 1 D | ε
If it's not obvious why the FSM corresponds to these two grammars, it's probably worth grabbing a pad of paper and constructing a derivation with each grammar for a few sample sentences. Compare the derivations you produce with the progress through the FSM for the same input.
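If you would rather check this by machine than on paper, here is a small Python sketch. The transition table is an assumption read off the right-linear grammar above, since the diagram itself is not reproduced here. The two functions work as follows:

- `right_linear_rules` emits one rule per out-transition, plus an ε-rule for the final state.
- `derivation` lists the sentential forms step by step, so they line up with the FSM's run on the same input.

```python
# Transitions assumed from the right-linear grammar in the answer:
# state -> {input symbol: next state}.
DELTA = {
    "A": {"1": "B", "0": "C"},
    "B": {"0": "C", "1": "D"},
    "C": {"1": "B", "0": "D"},
    "D": {"0": "D", "1": "D"},
}
START = "A"
FINAL = {"D"}


def right_linear_rules(delta, final):
    """One rule X -> a Y per out-transition X --a--> Y, plus X -> ε for each final state."""
    rules = {q: [f"{a} {r}" for a, r in out.items()] for q, out in delta.items()}
    for q in sorted(final):
        rules[q].append("ε")
    return rules


def derivation(word, delta=DELTA, start=START, final=FINAL):
    """Sentential forms of the right-linear derivation of `word`, or None if rejected.

    Each step applies the rule for the current state's transition on the next
    input symbol, so the derivation follows the FSM one symbol at a time.
    """
    prefix, state = "", start
    steps = [state]
    for symbol in word:
        state = delta[state][symbol]
        prefix += symbol
        steps.append(prefix + state)
    if state not in final:
        return None
    steps.append(prefix)   # finish with the ε-rule of the final state
    return steps


for q, alts in right_linear_rules(DELTA, FINAL).items():
    print(q, "→", " | ".join(alts))
print(" ⇒ ".join(derivation("01001")))   # A ⇒ 0C ⇒ 01B ⇒ 010C ⇒ 0100D ⇒ 01001D ⇒ 01001
print(derivation("101"))                  # None (the run ends in B, which is not final)
```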

Cannot get LL(1) form of grammar for recursive descent parser

I have this grammar:
S -> bU | ad | d
U -> Ufab | VSc | bS
V -> fad | f | Ua
To construct a recursive descent parser I need an LL(1) form.
The best I have got is:
S -> bU | ad | d
U -> fY | bSX
Y -> adScX | ScX
X -> fabX | aScX | ε
I removed the left recursion and did some left factoring, but I am stuck.
I have tried for several hours but I cannot get it.
Examples of valid strings:
bbdfabadc
bbdfabfabfabfab
bfadadcfabfab
bbadaadc
bfbbdfabc
Obviously my grammar is ambiguous for some of these, so I cannot build a recursive descent parser from it.
Update, following the answer:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
This is still not LL(1): the FIRST and FOLLOW sets for Z are not disjoint.
Generally, to make a grammar LL(1) you'll need to repeatedly left factor and remove left recursion until you've managed to get rid of all the non-LL things. Which you do first depends on the grammar, but in this case you'll want to start with left factoring.
To left factor the rule
U -> Ufab | VSc | bS
you need to first substitute V giving
U -> Ufab | fadSc | fSc | UaSc | bS
which you then left factor into
U -> UX | fY | bS
X -> fab | aSc
Y -> adSc | Sc
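The left-factoring step can be mechanized. This Python sketch rests on a few assumptions:

- The function name `left_factor` and its fresh-name argument are hypothetical.
- Every grammar symbol is a single character.

The sketch groups alternatives by their first symbol and pulls out each group's longest common prefix. It performs only one round, so it may need repeating, and it does not do the substitution of V into U for you.

```python
import os


def left_factor(alternatives, fresh_names):
    """One round of left factoring for a single non-terminal.

    Alternatives are strings of one-character symbols. Alternatives sharing a
    first symbol are replaced by <common prefix><fresh name>, and the fresh
    non-terminal receives the leftover suffixes ("" would mean ε).
    """
    groups = {}  # first symbol -> alternatives, in order of first appearance
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)

    names = iter(fresh_names)
    factored, new_rules = [], {}
    for group in groups.values():
        if len(group) == 1:
            factored.append(group[0])
            continue
        prefix = os.path.commonprefix(group)   # longest shared prefix, char by char
        name = next(names)
        factored.append(prefix + name)
        new_rules[name] = [alt[len(prefix):] for alt in group]
    return factored, new_rules


# U after substituting V: U -> Ufab | fadSc | fSc | UaSc | bS
print(left_factor(["Ufab", "fadSc", "fSc", "UaSc", "bS"], ["X", "Y"]))
# (['UX', 'fY', 'bS'], {'X': ['fab', 'aSc'], 'Y': ['adSc', 'Sc']})

# The later Y step: Y -> adSc | bUc | adc | dc
print(left_factor(["adSc", "bUc", "adc", "dc"], ["A"]))
# (['adA', 'bUc', 'dc'], {'A': ['Sc', 'c']})
```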
Now U is simple enough that you can eliminate the left recursion directly:
U -> fYZ | bSZ
Z -> ε | XZ
giving you
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adSc | Sc
Z -> ε | XZ
Now you still have a left factoring problem with Y so you need to substitute S:
Y -> adSc | bUc | adc | dc
which you left factor to
Y -> adA | bUc | dc
A -> Sc | c
giving an almost LL(1) grammar:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
But now things are stuck. The ε-rule for Z means we need FIRST(X) and FOLLOW(Z) to be disjoint in order to decide between the two Z rules, and they are not. This is generally indicative of a non-LL language, as there's some trailing context that could be associated with more than one rule. For example, in the expansion chain S -> bU -> bbSZ -> bbbUZ -> bbbbSZZ, the trailing Xs can be recognized by either Z, since either might be empty. Often you can still recognize this language with a simple recursive-descent parser (or an LL-style state table) by resolving the Z conflict in favor of the non-ε rule.
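As an illustration of that last point, here is a minimal Python recognizer for the final grammar. It is a sketch, not code from the answer. Each non-terminal becomes a method with one character of lookahead, and Z is resolved in favor of the non-ε rule: it keeps taking X whenever the next character is f or a. It assumes single-character terminals and uses `$` internally to mean end of input. It only accepts or rejects; it builds no tree.

```python
class Recognizer:
    """Recursive descent for: S -> bU | ad | d, U -> fYZ | bSZ, X -> fab | aSc,
    Y -> adA | bUc | dc, Z -> ε | XZ, A -> Sc | c (one method per non-terminal)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def eat(self, terminals):
        """Consume the given terminals in order, or raise SyntaxError."""
        for ch in terminals:
            if self.peek() != ch:
                raise SyntaxError(f"expected {ch!r} at {self.pos}, found {self.peek()!r}")
            self.pos += 1

    def S(self):
        if self.peek() == "b":
            self.eat("b")
            self.U()
        elif self.peek() == "a":
            self.eat("ad")
        else:
            self.eat("d")

    def U(self):
        if self.peek() == "f":
            self.eat("f")
            self.Y()
        else:
            self.eat("b")
            self.S()
        self.Z()

    def X(self):
        if self.peek() == "f":
            self.eat("fab")
        else:
            self.eat("a")
            self.S()
            self.eat("c")

    def Y(self):
        if self.peek() == "a":
            self.eat("ad")
            self.A()
        elif self.peek() == "b":
            self.eat("b")
            self.U()
            self.eat("c")
        else:
            self.eat("dc")

    def Z(self):
        # Greedy resolution of the FIRST(X)/FOLLOW(Z) conflict: take X Z whenever
        # the next character can start an X. The tail call Z -> X Z becomes a loop.
        while self.peek() in ("f", "a"):
            self.X()

    def A(self):
        if self.peek() == "c":
            self.eat("c")
        else:
            self.S()
            self.eat("c")


def accepts(text):
    recognizer = Recognizer(text)
    try:
        recognizer.S()
    except SyntaxError:
        return False
    return recognizer.pos == len(text)


for s in ["bbdfabadc", "bbdfabfabfabfab", "bfadadcfabfab", "bbadaadc", "bfbbdfabc"]:
    print(s, accepts(s))
```

With this greedy choice, all five sample strings from the question are accepted. A string with a truncated fab, such as bfadadcfa, is rejected.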

Is this grammar LL(1)

I have been asked to convert:
S → Sa | bSb | bc
to LL(1). So far I have:
S → bY
Y → SbF | cF
F → aF | ε
Is this LL(1)? If not would this be LL(1):
S → bY
Y → bYbF | cF
F → aF | ε
If neither of these is LL(1), could somebody please give me the correct answer and explain why? Thanks in advance!
This is what I would do:
S → Sa | bSb | bc
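Remove left recursion:
S -> bSbF | bcF
F -> aF | EPSILON
Now left factor:
S -> bX
F -> aF | EPSILON
X -> SbF | cF
Check the FIRST and FOLLOW sets:
FIRST:
S: b
X: b, c
F: a, EPSILON
FOLLOW:
S: $, b
X: $, b
F: $, b
Everything checks out. The two alternatives of X start with different terminals (b and c), and FIRST(aF) = {a} is disjoint from FOLLOW(F) = {$, b}, so the grammar is LL(1).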
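The FIRST/FOLLOW computation can be sketched in a few lines of Python (the function names are illustrative). The sketch makes these assumptions:

- Every symbol is a single character.
- The dictionary keys are the non-terminals.
- "" stands for ε.

It then checks the LL(1) condition. For each non-terminal, the predict sets of its alternatives must be pairwise disjoint; a predict set is FIRST of the alternative, plus FOLLOW of the non-terminal when the alternative can derive ε.

```python
# Keys are non-terminals; any other character is a terminal; "" stands for ε.
GRAMMAR = {
    "S": ["bX"],
    "X": ["SbF", "cF"],
    "F": ["aF", ""],
}


def first_of(seq, first):
    """FIRST of a symbol string; contains "" when the whole string can derive ε."""
    out = set()
    for sym in seq:
        if sym not in first:            # terminal
            out.add(sym)
            return out
        out |= first[sym] - {""}
        if "" not in first[sym]:
            return out
    out.add("")
    return out


def first_sets(g):
    first = {nt: set() for nt in g}
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                new = first_of(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def follow_sets(g, first, start):
    follow = {nt: set() for nt in g}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in g.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in g:
                        continue
                    # FOLLOW(sym) gets FIRST(rest); if rest can vanish, also FOLLOW(nt).
                    rest = first_of(alt[i + 1:], first)
                    new = rest - {""}
                    if "" in rest:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def is_ll1(g, start):
    """True if, for every non-terminal, its alternatives have disjoint predict sets."""
    first = first_sets(g)
    follow = follow_sets(g, first, start)
    for nt, alts in g.items():
        seen = set()
        for alt in alts:
            f = first_of(alt, first)
            predict = f - {""}
            if "" in f:
                predict |= follow[nt]
            if predict & seen:
                return False
            seen |= predict
    return True


print(is_ll1(GRAMMAR, "S"))                      # True
print(is_ll1({"S": ["Sa", "bSb", "bc"]}, "S"))   # False
```

On the original grammar S → Sa | bSb | bc, all three alternatives predict on b, hence False. The second grammar in the question (S → bY, Y → bYbF | cF, F → aF | ε) also passes the check.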

Step by step elimination of this indirect left recursion

I've seen an algorithm that should be able to remove all left recursion.
Yet I'm running into problems with this particular grammar:
A -> Cd
B -> Ce
C -> A | B | f
Whatever I try I end up in loops or with a grammar that is still indirect left recursive.
What are the steps to properly implement this algorithm on this grammar?
The rule is that you first establish an order for the non-terminals, and then find all paths along which indirect recursion happens.
In this case the order would be A < B < C, and the possible paths for recursion of non-terminal C are
C=> A => Cd
and
C=> B => Ce
so new rules for C would be
C=> Cd | Ce | f
now you can simply just remove direct left recursion:
C=> fC'
C'=> dC' | eC' | eps
and the resulting non-recursive grammar would be:
A => Cd
B => Ce
C => fC'
C' => dC' | eC' | eps
Figured it out already.
My confusion was that, in this order, the algorithm seemed to do nothing. I assumed that must be wrong and started replacing A -> Cd in the first iteration (ignoring the rule that j cannot go beyond i), which got me into infinite loops.
1) By reordering the rules:
C -> A | B | f
A -> Cd
B -> Ce
2) replace C in A -> Cd
C -> A | B | f
A -> Ad | Bd | fd
B -> Ce
3) B is not yet in range of j, so leave it and remove the direct left recursion of A
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> Ce
4) replace C in B -> Ce
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> Ae | Be | fe
5) Not done yet! We also need to substitute A into the new rule B -> Ae (A is now in range of j)
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> BdA'e | fdA'e | Be | fe
6) replace direct left recursion in productions of B
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> fdA'eB' | feB'
B'-> dA'eB' | eB' | epsilon
woohoo! left-recursion free grammar!
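Both answers apply the same ordered algorithm. For each non-terminal Ai in the chosen order:

1. For every earlier Aj that appears first in an Ai production, substitute Aj's current alternatives.
2. Remove Ai's immediate left recursion.

Here is a Python sketch of it. As the algorithm requires, it assumes no ε-productions and no cycles. The primed names such as A' are generated by the sketch.

```python
EPS = "ε"


def alts(text):
    """Parse "X y | z | ε" into [("X", "y"), ("z",), ()]; () is the ε-alternative."""
    return [tuple(s for s in part.split() if s != EPS) for part in text.split("|")]


def remove_immediate(nt, productions):
    """A -> A a1 | ... | b1 | ...   ==>   A -> b1 A' | ... ;  A' -> a1 A' | ... | ε"""
    recursive = [p[1:] for p in productions if p[:1] == (nt,)]
    others = [p for p in productions if p[:1] != (nt,)]
    if not recursive:
        return {nt: productions}
    tail = nt + "'"
    return {
        nt: [p + (tail,) for p in others],
        tail: [a + (tail,) for a in recursive] + [()],
    }


def remove_left_recursion(grammar, order):
    """Ordered elimination; assumes no ε-productions and no cycles A =>+ A."""
    g = {nt: list(ps) for nt, ps in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Substitute Aj's current alternatives into every Ai -> Aj gamma, in place.
            expanded = []
            for p in g[ai]:
                if p[:1] == (aj,):
                    expanded.extend(d + p[1:] for d in g[aj])
                else:
                    expanded.append(p)
            g[ai] = expanded
        g.update(remove_immediate(ai, g[ai]))
    return g


GRAMMAR = {"A": alts("C d"), "B": alts("C e"), "C": alts("A | B | f")}

for nt, ps in remove_left_recursion(GRAMMAR, ["C", "A", "B"]).items():
    print(nt, "->", " | ".join(" ".join(p) or EPS for p in ps))
```

With the order C, A, B it prints the grammar from step 6. With the order A, B, C it instead yields C => fC' and C' => dC' | eC' | eps, the result of the first answer. Both are free of left recursion; they differ only in which non-terminal absorbs the recursion.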

How to implement a left recursion eliminator?

How can I implement an eliminator for this?
A := AB |
AC |
D |
E ;
This is an example of so-called immediate left recursion, and is removed like this:
A := DA' |
EA' ;
A' := ε |
BA' |
CA' ;
The basic idea is to first note that when parsing an A you will necessarily start with a D or an E. After the D or E you will either end (the tail is ε) or continue (if we're in an AB or AC construction).
The actual algorithm works like this:
For any left-recursive production like this: A -> A a1 | ... | A ak | b1 | b2 | ... | bm replace the production with A -> b1 A' | b2 A' | ... | bm A' and add the production A' -> ε | a1 A' | ... | ak A'.
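The replacement rule above translates almost directly into code. This is a minimal Python sketch with these conventions and assumptions:

- Alternatives are tuples of symbols, and () stands for ε.
- The tail name defaults to A'. It is a parameter so it can match other naming schemes.
- There is at least one non-recursive alternative and no A → A cycle.

```python
def eliminate_immediate(nt, alternatives, tail=None):
    """A -> A a1 | ... | A ak | b1 | ... | bm
       ==>  A  -> b1 A' | ... | bm A'
            A' -> a1 A' | ... | ak A' | ε
    Alternatives are tuples of symbols; () is ε. Returns the new rules as a dict."""
    tail = tail or nt + "'"
    recursive = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]   # the a_i
    others = [alt for alt in alternatives if alt[:1] != (nt,)]          # the b_j
    if not recursive:
        return {nt: list(alternatives)}
    return {
        nt: [b + (tail,) for b in others],
        tail: [a + (tail,) for a in recursive] + [()],
    }


print(eliminate_immediate("A", [("A", "B"), ("A", "C"), ("D",), ("E",)]))
# {'A': [('D', "A'"), ('E', "A'")], "A'": [('B', "A'"), ('C', "A'"), ()]}

# With tail="Z" this reproduces the U step from the LL(1) answer above.
print(eliminate_immediate("U", [("U", "X"), ("f", "Y"), ("b", "S")], tail="Z"))
# {'U': [('f', 'Y', 'Z'), ('b', 'S', 'Z')], 'Z': [('X', 'Z'), ()]}
```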
See Wikipedia: Left Recursion for more information on the elimination algorithm (including elimination of indirect left recursion).
Another form available is:
A := (D | E) (B | C)*
The mechanics of doing it are about the same, but some parsers might handle that form better. Also consider what it will take to munge the action rules along with the grammar itself. The other form requires the factoring tool to generate a new type for the A' rule to return, whereas this form doesn't.
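To make the second form concrete, here is a hypothetical Python recognizer for A := (D | E) (B | C)*, with D, E, B and C simplified to the single tokens d, e, b and c. The loop folds each B or C onto the tree built so far. The result therefore has the same left-nested shape that A := AB | AC | D | E describes, with no A' rule and no extra result type.

```python
def parse_A(tokens):
    """Recognize A := (D | E) (B | C)* and return a left-nested tree.

    Hypothetical simplification: D, E, B and C are the single tokens
    "d", "e", "b" and "c". The tree has the shape A := AB | AC | D | E
    would give, e.g. "dbc" -> (("d", "b"), "c").
    """
    if not tokens or tokens[0] not in ("d", "e"):
        raise SyntaxError("A must start with D or E")
    tree = tokens[0]
    pos = 1
    # (B | C)*: each iteration plays the role of one A -> AB or A -> AC step.
    while pos < len(tokens) and tokens[pos] in ("b", "c"):
        tree = (tree, tokens[pos])
        pos += 1
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
    return tree


print(parse_A("dbc"))   # (('d', 'b'), 'c')
print(parse_A("e"))     # e
```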
