I have this grammar:
S -> bU | ad | d
U -> Ufab | VSc | bS
V -> fad | f | Ua
To construct a recursive descent parser I need the grammar in LL(1) form.
The best I have got is:
S -> bU | ad | d
U -> fY | bSX
Y -> adScX | ScX
X -> fabX | aScX | ε
I removed the left recursion and did some left factoring, but I am stuck.
I tried for several hours but I cannot get it...
Examples of valid strings:
bbdfabadc
bbdfabfabfabfab
bfadadcfabfab
bbadaadc
bfbbdfabc
Obviously my form of the grammar is ambiguous for some of them, so I cannot build a recursive descent parser...
From the answer:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
This is still not LL(1): the FIRST and FOLLOW sets for Z are not disjoint.
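The overlap can be checked mechanically. The following is a minimal sketch of the usual fixed-point computation of FIRST and FOLLOW sets, applied to the grammar from the answer. The names `GRAMMAR`, `compute_first`, `compute_follow` and the `$` end marker are choices made for this illustration, not part of the original post.

```python
EPS = "ε"   # marker for the empty string
END = "$"   # end-of-input marker (an assumption of this sketch)

# Each nonterminal maps to its alternatives; [] is the epsilon alternative.
GRAMMAR = {
    "S": [["b", "U"], ["a", "d"], ["d"]],
    "U": [["f", "Y", "Z"], ["b", "S", "Z"]],
    "X": [["f", "a", "b"], ["a", "S", "c"]],
    "Y": [["a", "d", "A"], ["b", "U", "c"], ["d", "c"]],
    "Z": [[], ["X", "Z"]],
    "A": [["S", "c"], ["c"]],
}

def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given FIRST of each nonterminal."""
    out = set()
    for sym in seq:
        f = first[sym] if sym in first else {sym}  # a terminal starts with itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)  # every symbol in seq can vanish (or seq is empty)
    return out

def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                new = first_of_seq(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in grammar:
                        continue  # only nonterminals have FOLLOW sets
                    tail = first_of_seq(prod[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, "S")
conflict = FIRST["X"] & FOLLOW["Z"]  # tokens on which both Z rules apply
print(sorted(conflict))
```

A non-empty `conflict` set means a one-token lookahead cannot choose between Z -> ε and Z -> XZ.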
Generally, to make a grammar LL(1) you need to repeatedly left factor and remove left recursion until you have gotten rid of all the non-LL constructs. Which you do first depends on the grammar, but in this case you'll want to start with left factoring.
To left factor the rule
U -> Ufab | VSc | bS
you need to first substitute V giving
U -> Ufab | fadSc | fSc | UaSc | bS
which you then left factor into
U -> UX | fY | bS
X -> fab | aSc
Y -> adSc | Sc
now U is simple enough that you can eliminate the left recursion directly:
U -> fYZ | bSZ
Z -> ε | XZ
giving you
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adSc | Sc
Z -> ε | XZ
Now you still have a left factoring problem with Y so you need to substitute S:
Y -> adSc | bUc | adc | dc
which you left factor to
Y -> adA | bUc | dc
A -> Sc | c
giving an almost LL(1) grammar:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
but now things are stuck, as the epsilon rule for Z means we need FIRST(X) and FOLLOW(Z) to be disjoint (in order to decide between the two Z rules). Here FIRST(X) = {f, a}, and FOLLOW(Z) also contains f and a, because a Z can be directly followed by another Z. This is generally indicative of a non-LL language, as there's some trailing context that could be associated with more than one rule (via the S -> bU -> bbSZ -> bbbUZ -> bbbbSZZ expansion chain -- trailing Zs can be recognized, but either might be empty). Often you can still recognize this language with a simple recursive-descent parser (or LL-style state table) by resolving the Z conflict in favor of the non-epsilon rule.
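As an illustration of that last point, here is a minimal sketch of a recursive-descent recognizer for the final grammar in which Z greedily takes the non-epsilon rule whenever the next character is f or a. The class and function names (`Parser`, `ParseError`, `accepts`) are made up for this example, and the input is assumed to be a plain string with one character per token.

```python
class ParseError(Exception):
    """Raised when the input does not match the grammar."""

class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, ch):
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at position {self.pos}")
        self.pos += 1

    def S(self):  # S -> bU | ad | d
        if self.peek() == "b":
            self.expect("b")
            self.U()
        elif self.peek() == "a":
            self.expect("a")
            self.expect("d")
        else:
            self.expect("d")

    def U(self):  # U -> fYZ | bSZ
        if self.peek() == "f":
            self.expect("f")
            self.Y()
        else:
            self.expect("b")
            self.S()
        self.Z()

    def X(self):  # X -> fab | aSc
        if self.peek() == "f":
            self.expect("f")
            self.expect("a")
            self.expect("b")
        else:
            self.expect("a")
            self.S()
            self.expect("c")

    def Y(self):  # Y -> adA | bUc | dc
        if self.peek() == "a":
            self.expect("a")
            self.expect("d")
            self.A()
        elif self.peek() == "b":
            self.expect("b")
            self.U()
            self.expect("c")
        else:
            self.expect("d")
            self.expect("c")

    def Z(self):  # Z -> ε | XZ, conflict resolved in favor of XZ
        while self.peek() in ("f", "a"):
            self.X()

    def A(self):  # A -> Sc | c
        if self.peek() == "c":
            self.expect("c")
        else:
            self.S()
            self.expect("c")

def accepts(text):
    parser = Parser(text)
    try:
        parser.S()
    except ParseError:
        return False
    return parser.pos == len(text)  # all input must be consumed
```

With this resolution the inner Z simply absorbs every trailing X, and any enclosing Z then matches the empty string, so `accepts` returns True for the five sample strings from the question.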
I want to design a DFA for the following language after fixing ambiguity.
I have thought about it and tried a lot, but couldn't get a proper answer.
S->aA|aB|lambda
A->aA|aS
B->bB|aB|b
I recommend first getting an NFA by considering this to be a regular grammar; then, determinize the NFA, and then we can write down a new grammar that's equivalent to this one but unambiguous (for the same reason the determinized automaton is deterministic). Writing down the NFA for this grammar is easy: productions of the form X -> sY translate into transitions from state X to state Y on input s. Similarly, productions of the form X -> lambda mean X is an accepting state, and productions of the form X -> b add a transition on b from X to a new accepting state, all of whose transitions lead to a dead state.
We need states for each nonterminal symbol S, A and B; and we will have transitions for every production. Our NFA looks like this:
       /------a------\
       |             |
       V             |
----->(S)-----a---->(A)<--\
       |             |    |
       a             \--a-/    /--a,b---\
       |                       |        |
       V                       |        |
 /--->(B)--b-->(X)----a,b---->(Y)<------/
 |     |
 \-a,b-/
Here, states (S) and (X) are accepting, state (Y) is a dead state (we didn't really need to depict this explicitly, but bear with me) and this automaton is totally equivalent to the grammar. Now, we need to determinize this. States of the determinized automaton will correspond to subsets of states from the nondeterministic version. Our first deterministic state will correspond to the set containing just (S), and we will figure out the other required subsets (of which we can have at most 32, since we have 5 states and 2 to the power of 5 is 32) using the transitions:
Q                 s   Q'
{(S)}             a   {(A),(B)}
{(S)}             b   empty
{(A),(B)}         a   {(A),(B),(S)}
{(A),(B)}         b   {(B),(X)}
{(A),(B),(S)}     a   {(A),(B),(S)}
{(A),(B),(S)}     b   {(B),(X)}
{(B),(X)}         a   {(B),(Y)}
{(B),(X)}         b   {(B),(X),(Y)}
{(B),(Y)}         a   {(B),(Y)}
{(B),(Y)}         b   {(B),(X),(Y)}
{(B),(X),(Y)}     a   {(B),(Y)}
{(B),(X),(Y)}     b   {(B),(X),(Y)}
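The table can be reproduced with a short subset construction. This is a sketch only: the dictionary encoding of the NFA and the names `NFA`, `determinize` and `accepts` are assumptions of this example, with state names taken from the NFA described above.

```python
from collections import deque

# NFA read off the grammar: state -> symbol -> set of successor states.
NFA = {
    "S": {"a": {"A", "B"}},
    "A": {"a": {"A", "S"}},
    "B": {"a": {"B"}, "b": {"B", "X"}},
    "X": {"a": {"Y"}, "b": {"Y"}},
    "Y": {"a": {"Y"}, "b": {"Y"}},
}
NFA_ACCEPTING = {"S", "X"}
ALPHABET = ("a", "b")

def determinize(nfa, start, alphabet):
    """Return the reachable part of the subset automaton as a nested dict."""
    table = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in table:
            continue  # already expanded
        table[current] = {}
        for symbol in alphabet:
            # Union of the successors of every NFA state in the subset.
            target = frozenset(
                nxt for state in current for nxt in nfa[state].get(symbol, ())
            )
            table[current][symbol] = target
            if target not in table:
                queue.append(target)
    return table

def accepts(table, start, accepting, word):
    current = frozenset([start])
    for symbol in word:
        current = table[current][symbol]
    return bool(current & accepting)  # accept if any NFA state is accepting

DFA_TABLE = determinize(NFA, "S", ALPHABET)
for subset, moves in DFA_TABLE.items():
    print(sorted(subset), {s: sorted(t) for s, t in moves.items()})
```

The empty frozenset plays the role of the dead state, so the result has the six subsets from the table plus that one.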
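We encountered six states, plus a dead state (empty). We can name the six q1 through q6 in the order they first appear in the table ({(S)}, {(A),(B)}, {(A),(B),(S)}, {(B),(X)}, {(B),(Y)}, {(B),(X),(Y)}) and call the empty set qD. All of the states corresponding to subsets with either (S) or (X) in them are accepting, that is q1, q3, q4 and q6, and q1 (the set {(S)}) is the initial state. Our DFA looks like this: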
                 /-a,b--\
                 |      |
                 V      |
----->(q1)--b-->(qD)----/
       |
       a         /--a---\
       |         |      |
       V         V      |
      (q2)--a-->(q3)----/
       |         |
       b         |
       |         b
       V         |
   /--(q4)<------/       /--b--\
   |   |                 |     |
   |   \-------b------->(q6)<--+
   a    /---a----\       |     |
   |    |        |       |     |
   \-->(q5)<-----+---a---/     |
        |                      |
        \----------b-----------/
Finally, we can read off the unambiguous regular grammar from our DFA:
(q1) -> a(q2) | b(qD) | lambda
(qD) -> a(qD) | b(qD)
(q2) -> a(q3) | b(q4)
(q3) -> a(q3) | b(q4) | lambda
(q4) -> a(q5) | b(q6) | lambda
(q5) -> a(q5) | b(q6)
(q6) -> a(q5) | b(q6) | lambda
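The step from DFA to grammar is mechanical: one production per transition, plus a lambda alternative for each accepting state. A small sketch, assuming the DFA is stored as a nested dictionary (the names `DFA`, `ACCEPTING`, `productions` and `accepts` are illustrative only):

```python
# DFA from the determinization: state -> symbol -> next state.
DFA = {
    "q1": {"a": "q2", "b": "qD"},
    "qD": {"a": "qD", "b": "qD"},
    "q2": {"a": "q3", "b": "q4"},
    "q3": {"a": "q3", "b": "q4"},
    "q4": {"a": "q5", "b": "q6"},
    "q5": {"a": "q5", "b": "q6"},
    "q6": {"a": "q5", "b": "q6"},
}
ACCEPTING = {"q1", "q3", "q4", "q6"}

def productions(dfa, accepting):
    """Render each DFA state as one right-linear grammar rule."""
    lines = []
    for state, moves in dfa.items():
        alternatives = [f"{symbol}({target})" for symbol, target in moves.items()]
        if state in accepting:
            alternatives.append("lambda")  # accepting states may stop here
        lines.append(f"({state}) -> " + " | ".join(alternatives))
    return lines

def accepts(word, start="q1"):
    """Run the DFA; each input has exactly one path, hence one derivation."""
    state = start
    for symbol in word:
        state = DFA[state][symbol]
    return state in ACCEPTING

for line in productions(DFA, ACCEPTING):
    print(line)
```

Because every word follows exactly one path through the DFA, every word in the language has exactly one derivation in the grammar read off from it.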
I'm attempting to come up with a non-ambiguous grammar for arithmetic expressions to make an Earley parser faster but I seem to be having trouble.
This is the given ambiguous grammar:
S -> E | S,S
E -> E+E | E-E | E*E | (E) | -E | V
V -> a | b | c
This is my attempt at making it unambiguous:
S -> S+E | S-E | E | (S+E) | (S-E) | (E)
E -> E*T | E
T -> -V | V
V -> a | b | c
It parses everything fine but there isn't any significant speedup as compared to using the ambiguous one.
I have this grammar:
S->S+S|SS|(S)|S*|a
I want to know how to eliminate the left recursion from this grammar because the S+S is really confusing...
Let's see if we can simplify the given grammar.
S -> S*|S+S|SS|(S)|a
We can write it as:
S -> S*|SQ|SS|B|a
Q -> +S
B -> (S)
Now, you can eliminate left recursion in familiar territory.
S -> BS'|aS'
S' -> *S'|QS'|SS'|e
Q -> +S
B -> (S)
Note that e is epsilon/lambda.
We have removed the left recursion, so we no longer have need of Q and B.
S -> (S)S'|aS'
S' -> *S'|+SS'|SS'|e
You'll find this useful when dealing with left recursion elimination.
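To see that the rewritten grammar can now be processed top-down without looping, here is a sketch of a recognizer for S -> (S)S' | aS' and S' -> *S' | +SS' | SS' | e. It tries every alternative and collects all positions where a nonterminal can end, rather than committing on one token of lookahead; that strategy, and the names `recognizes`, `parse_S` and `parse_Sp`, are assumptions of this example and not part of the answer.

```python
from functools import lru_cache

def recognizes(s):
    """True if s can be derived from S in the grammar without left recursion."""

    @lru_cache(maxsize=None)
    def parse_S(i):
        # S -> (S)S' | aS' ; returns every position where an S starting at i can end
        ends = set()
        if i < len(s) and s[i] == "a":
            ends |= parse_Sp(i + 1)
        if i < len(s) and s[i] == "(":
            for j in parse_S(i + 1):
                if j < len(s) and s[j] == ")":
                    ends |= parse_Sp(j + 1)
        return frozenset(ends)

    @lru_cache(maxsize=None)
    def parse_Sp(i):
        # S' -> *S' | +SS' | SS' | e
        ends = {i}  # the e (epsilon) alternative consumes nothing
        if i < len(s) and s[i] == "*":
            ends |= parse_Sp(i + 1)
        if i < len(s) and s[i] == "+":
            for j in parse_S(i + 1):
                ends |= parse_Sp(j)
        for j in parse_S(i):  # S always consumes at least one character
            ends |= parse_Sp(j)
        return frozenset(ends)

    return len(s) in parse_S(0)

print(recognizes("(a+a)*a"), recognizes("a+"))
```

Every recursive call either advances the position or moves from S' to S at the same position, and S itself always consumes a character first, so the recursion terminates. With the original left-recursive rule S -> S+S, the same approach would call itself at the same position forever.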
My answer uses the theory from this reference:
How to Eliminate Left Recursion in Context-Free Grammar.
S --> S+S | SS | S* | a | (S)
      -------------   -------
         Sα form      β form
The Sα alternatives are the left-recursive rules; the β alternatives are the non-left-recursive rules.
We can write this as
S ---> Sα1 | Sα2 | Sα3 | β1 | β2
Rules to convert it into an equivalent grammar without left recursion:
S ---> β1 | β2
Z ---> α1 | α2 | α3
Z ---> α1Z | α2Z | α3Z
S ---> β1Z | β2Z
Where
α1 = +S
α2 = S
α3 = *
And the β-productions, which do not start with S:
β1 = a
β2 = (S)
Grammar without left recursion:
Non-left-recursive productions S --> βn:
S --> a | (S)
Introduce a new variable Z with the following productions, Z --> αn and Z --> αnZ:
Z --> +S | S | *
and
Z --> +SZ | SZ | *Z
And the new S productions, S --> βnZ:
S --> aZ | (S)Z
Second form (answer)
The productions Z --> +S | S | * and Z --> +SZ | SZ | *Z can be combined as Z --> +SZ | SZ | *Z | ^, where ^ is the null symbol.
Z --> ^ is used to drop Z from a production, so S --> aZ | (S)Z also covers S --> a | (S).
So the second answer is:
S --> aZ | (S)Z and Z --> +SZ | SZ | *Z | ^
How can I implement an eliminator for this?
A := AB |
AC |
D |
E ;
This is an example of so-called immediate left recursion, and is removed like this:
A := DA' |
EA' ;
A' := ε |
BA' |
CA' ;
The basic idea is to first note that when parsing an A you will necessarily start with a D or an E. After the D or the E you will either end (the tail is ε) or continue (if we're in an AB or AC construction).
The actual algorithm works like this:
For any left-recursive production like this: A -> A a1 | ... | A ak | b1 | b2 | ... | bm replace the production with A -> b1 A' | b2 A' | ... | bm A' and add the production A' -> ε | a1 A' | ... | ak A'.
See Wikipedia: Left Recursion for more information on the elimination algorithm (including elimination of indirect left recursion).
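The replacement rule above can be sketched in a few lines. This handles immediate left recursion only, for a single nonterminal; the function name, the tuple encoding of productions and the `'` suffix for the new nonterminal are assumptions of this example, and a degenerate production A -> A is not handled.

```python
def eliminate_immediate_left_recursion(name, productions):
    """Rewrite A -> A a1 | ... | A ak | b1 | ... | bm without left recursion.

    `productions` is a list of tuples of symbols; () stands for ε.
    Returns a dict mapping nonterminal names to their new productions.
    """
    # The a_i parts: what follows the leading A in left-recursive alternatives.
    alphas = [p[1:] for p in productions if p and p[0] == name]
    # The b_j parts: alternatives that do not start with A.
    betas = [p for p in productions if not p or p[0] != name]
    if not alphas:
        return {name: list(productions)}  # nothing to do
    tail = name + "'"
    return {
        name: [beta + (tail,) for beta in betas],            # A  -> b_j A'
        tail: [()] + [alpha + (tail,) for alpha in alphas],  # A' -> ε | a_i A'
    }

result = eliminate_immediate_left_recursion(
    "A", [("A", "B"), ("A", "C"), ("D",), ("E",)]
)
for head, alternatives in result.items():
    rendered = [" ".join(alt) if alt else "ε" for alt in alternatives]
    print(head, ":=", " | ".join(rendered))
```

Indirect left recursion needs the additional substitution step described in the Wikipedia article and is outside the scope of this sketch.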
Another form available is:
A := (D | E) (B | C)*
The mechanics of doing it are about the same, but some parsers might handle that form better. Also consider what it will take to munge the action rules along with the grammar itself; the other form requires the factoring tool to generate a new type for the A' rule to return, whereas this form doesn't.
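A hand-written parser for the A := (D | E) (B | C)* form shows why no extra return type is needed: the loop keeps extending a single value. This is a sketch under the assumption that tokens are the one-letter strings "B" to "E" and that a nested tuple is an acceptable tree; `parse_A` is a made-up name.

```python
def parse_A(tokens, pos=0):
    """Parse A := (D | E) (B | C)* starting at tokens[pos].

    Returns (tree, next_pos). The tree is left-nested, matching what the
    original left-recursive rules A := AB | AC | D | E would build.
    """
    if pos >= len(tokens) or tokens[pos] not in ("D", "E"):
        raise ValueError(f"expected D or E at position {pos}")
    tree = tokens[pos]
    pos += 1
    # Each further B or C wraps the tree built so far, like one A := AB / AC step.
    while pos < len(tokens) and tokens[pos] in ("B", "C"):
        tree = (tree, tokens[pos])
        pos += 1
    return tree, pos

tree, end = parse_A(["D", "B", "C"])
print(tree, end)
```

The action for each B or C receives the A built so far and returns a new A, so the semantic actions of the left-recursive grammar can be reused almost unchanged.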