Converting ambiguous to unambiguous grammar for arithmetic expressions - parsing

I'm attempting to come up with a non-ambiguous grammar for arithmetic expressions to make an Earley parser faster but I seem to be having trouble.
This is the given ambiguous grammar:
S -> E | S,S
E -> E+E | E-E | E*E | (E) | -E | V
V -> a | b | c
This is my attempt at making it unambiguous:
S -> S+E | S-E | E | (S+E) | (S-E) | (E)
E -> E*T | E
T -> -V | V
V -> a | b | c
It parses everything fine, but there isn't any significant speedup compared with using the ambiguous one.
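For comparison, one common way to make an expression grammar like this unambiguous is to give each precedence level its own nonterminal. The sketch below is not taken from the question. It assumes `,` binds loosest, then `+` and `-`, then `*`, with unary `-` binding tightest, and that all binary operators are left-associative. Under that layering every string has exactly one parse tree, so the left-recursive rules can be written as loops and a plain recursive-descent parser is enough. The names `ExprParser`, `ParseError` and the tuple-shaped tree are illustrative choices.

```python
class ParseError(ValueError):
    pass


class ExprParser:
    """Recursive-descent parser for an assumed layered grammar:
        S -> S , E | E
        E -> E + T | E - T | T
        T -> T * F | F
        F -> - F | ( E ) | V
        V -> a | b | c
    Each left-recursive rule is implemented as a loop, which makes the
    binary operators left-associative."""

    def __init__(self, text):
        self.toks = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise ParseError(f"expected {tok!r} at {self.pos}, got {self.peek()!r}")
        self.pos += 1

    def parse(self):
        """S: a comma-separated list of expressions."""
        items = [self.E()]
        while self.peek() == ",":
            self.eat(",")
            items.append(self.E())
        if self.peek() is not None:
            raise ParseError(f"unexpected {self.peek()!r} at {self.pos}")
        return items

    def E(self):
        node = self.T()
        while self.peek() in ("+", "-"):
            op = self.peek()
            self.eat(op)
            node = (op, node, self.T())
        return node

    def T(self):
        node = self.F()
        while self.peek() == "*":
            self.eat("*")
            node = ("*", node, self.F())
        return node

    def F(self):
        tok = self.peek()
        if tok == "-":
            self.eat("-")
            return ("neg", self.F())
        if tok == "(":
            self.eat("(")
            node = self.E()
            self.eat(")")
            return node
        if tok in ("a", "b", "c"):
            self.eat(tok)
            return tok
        raise ParseError(f"unexpected {tok!r} at {self.pos}")


print(ExprParser("a+b*c, -a-b").parse())
```

Here "a+b*c" can only come out as a + (b*c), and "-a-b" as (-a) - b; the precedence order is the assumption that removes the ambiguity.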

Related

Cannot get LL(1) form of grammar for recursive descent parser

I have this grammar:
S -> bU | ad | d
U -> Ufab | VSc | bS
V -> fad | f | Ua
To construct a recursive descent parser I need an LL(1) form.
The best I got is:
S -> bU | ad | d
U -> fY | bSX
Y -> adScX | ScX
X -> fabX | aScX | ε
I removed the left recursion and did some left factoring, but I am stuck.
I have tried for several hours but I cannot get it...
Examples of valid strings:
bbdfabadc
bbdfabfabfabfab
bfadadcfabfab
bbadaadc
bfbbdfabc
Obviously my grammar is ambiguous for some of these, so I cannot make a recursive descent parser...
From the answer:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
This is still not LL(1): the FIRST and FOLLOW sets for Z are not disjoint.
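The disjointness claim can be checked mechanically. This sketch computes FIRST and FOLLOW sets with the usual fixed-point iteration for the grammar above. It assumes one-character symbols, uppercase letters as nonterminals, "" (the empty string) for ε and "$" for end of input; the helper names are hypothetical.

```python
EPS = ""  # marker for epsilon inside FIRST sets

GRAMMAR = {
    "S": ["bU", "ad", "d"],
    "U": ["fYZ", "bSZ"],
    "X": ["fab", "aSc"],
    "Y": ["adA", "bUc", "dc"],
    "Z": ["", "XZ"],  # Z -> ε | XZ
    "A": ["Sc", "c"],
}


def first_of_seq(grammar, seq, first):
    """FIRST of a sequence of symbols, given the current FIRST sets."""
    result = set()
    for sym in seq:
        if sym not in grammar:  # terminal
            result.add(sym)
            return result
        result |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return result
    result.add(EPS)  # every symbol was nullable (or seq was empty)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = first_of_seq(grammar, alt, first) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                for i, sym in enumerate(alt):
                    if sym not in grammar:
                        continue
                    rest = first_of_seq(grammar, alt[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:  # sym can end the production
                        new |= follow[nt]
                    new -= follow[sym]
                    if new:
                        follow[sym] |= new
                        changed = True
    return follow


first = compute_first(GRAMMAR)
follow = compute_follow(GRAMMAR, first, "S")
print(sorted(first["X"]), sorted(follow["Z"]))
print("conflict:", sorted(first["X"] & follow["Z"]))
```

The intersection {a, f} is exactly what prevents choosing between Z -> XZ and Z -> ε with one symbol of lookahead.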
Generally, to make a grammar LL(1) you'll need to repeatedly left factor and remove left recursion until you've managed to get rid of all the non-LL things. Which you do first depends on the grammar, but in this case you'll want to start with left factoring.
To left factor the rule
U -> Ufab | VSc | bS
you need to first substitute V giving
U -> Ufab | fadSc | fSc | UaSc | bS
which you then left factor into
U -> UX | fY | bS
X -> fab | aSc
Y -> adSc | Sc
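The left-factoring step can be sketched as a small function. This is an illustration under assumptions: symbols are single characters, only one round of factoring is done (new rules may need another round), and the fresh nonterminal names are supplied by the caller because the answer picks X, Y and A by hand. `left_factor` is a hypothetical helper name.

```python
import os


def left_factor(alternatives, fresh_names):
    """One round of left factoring for a single nonterminal.

    alternatives: right-hand sides as strings of one-character symbols.
    fresh_names:  iterator yielding unused nonterminal names.
    Returns (new_alternatives, new_rules)."""
    groups = {}  # first symbol -> alternatives starting with it (insertion order kept)
    for alt in alternatives:
        groups.setdefault(alt[:1], []).append(alt)

    new_alts, new_rules = [], {}
    for group in groups.values():
        if len(group) == 1:
            new_alts.append(group[0])
            continue
        # Longest prefix shared by every alternative in the group.
        prefix = os.path.commonprefix(group)
        name = next(fresh_names)
        new_alts.append(prefix + name)
        # An alternative equal to the prefix leaves an epsilon remainder.
        new_rules[name] = [alt[len(prefix):] or "ε" for alt in group]
    return new_alts, new_rules


# U -> Ufab | fadSc | fSc | UaSc | bS   (after substituting V)
print(left_factor(["Ufab", "fadSc", "fSc", "UaSc", "bS"], iter("XY")))
# Y -> adSc | bUc | adc | dc            (after substituting S, later in the answer)
print(left_factor(["adSc", "bUc", "adc", "dc"], iter("A")))
```

The first call gives U -> UX | fY | bS with X -> fab | aSc and Y -> adSc | Sc; the second reproduces the later Y -> adA | bUc | dc, A -> Sc | c step.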
Now U is simple enough that you can eliminate the left recursion directly:
U -> fYZ | bSZ
Z -> ε | XZ
giving you
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adSc | Sc
Z -> ε | XZ
Now you still have a left factoring problem with Y so you need to substitute S:
Y -> adSc | bUc | adc | dc
which you left factor to
Y -> adA | bUc | dc
A -> Sc | c
giving an almost LL(1) grammar:
S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c
but now things are stuck, as the epsilon rule for Z means we need FIRST(X) and FOLLOW(Z) to be disjoint (in order to decide between the two Z rules). This is generally indicative of a non-LL language, as there's some trailing context that could be associated with more than one rule (via the S -> bU -> bbSZ -> bbbUZ -> bbbbSZZ expansion chain: trailing Zs can be recognized, but either might be empty). Often you can still recognize this language with a simple recursive-descent parser (or LL-style state table) by resolving the Z ambiguity/conflict in favor of the non-epsilon rule.
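A minimal sketch of that suggestion: a recursive-descent recognizer for the final grammar in which Z always takes XZ when the lookahead is in FIRST(X) = {f, a}. The class and function names are hypothetical. The greedy choice loses nothing here because f or a can only follow a Z when that Z is itself followed by another Z (through U -> bSZ), and both Zs just match sequences of X.

```python
class ParseError(Exception):
    pass


class Recognizer:
    """Recursive-descent recognizer for the "almost LL(1)" grammar
        S -> bU | ad | d        X -> fab | aSc
        U -> fYZ | bSZ          Y -> adA | bUc | dc
        Z -> ε | XZ             A -> Sc | c
    with the Z conflict resolved in favour of the non-epsilon rule."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # "$" marks end of input.
        return self.text[self.pos] if self.pos < len(self.text) else "$"

    def eat(self, expected):
        for ch in expected:
            if self.peek() != ch:
                raise ParseError(f"expected {ch!r} at position {self.pos}")
            self.pos += 1

    def S(self):
        la = self.peek()
        if la == "b":
            self.eat("b")
            self.U()
        elif la == "a":
            self.eat("ad")
        elif la == "d":
            self.eat("d")
        else:
            raise ParseError(f"unexpected {la!r} in S")

    def U(self):
        la = self.peek()
        if la == "f":
            self.eat("f")
            self.Y()
        elif la == "b":
            self.eat("b")
            self.S()
        else:
            raise ParseError(f"unexpected {la!r} in U")
        self.Z()  # both U alternatives end in Z

    def X(self):
        if self.peek() == "f":
            self.eat("fab")
        else:
            self.eat("a")
            self.S()
            self.eat("c")

    def Y(self):
        la = self.peek()
        if la == "a":
            self.eat("ad")
            self.A()
        elif la == "b":
            self.eat("b")
            self.U()
            self.eat("c")
        elif la == "d":
            self.eat("dc")
        else:
            raise ParseError(f"unexpected {la!r} in Y")

    def Z(self):
        # Greedy: whenever the lookahead is in FIRST(X) = {f, a}, take Z -> XZ
        # (written as a loop) instead of Z -> ε.
        while self.peek() in ("f", "a"):
            self.X()

    def A(self):
        if self.peek() == "c":
            self.eat("c")
        else:
            self.S()
            self.eat("c")


def accepts(text):
    parser = Recognizer(text)
    try:
        parser.S()
    except ParseError:
        return False
    return parser.pos == len(text)


for word in ["bbdfabadc", "bbdfabfabfabfab", "bfadadcfabfab", "bbadaadc", "bfbbdfabc"]:
    print(word, accepts(word))
```

All five example strings from the question are accepted by this sketch.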

Designing a DFA

I want to design a DFA for the following language, after fixing its ambiguity.
I have thought about it and tried a lot, but couldn't get a proper answer.
S->aA|aB|lambda
A->aA|aS
B->bB|aB|b
I recommend first getting an NFA by considering this to be a regular grammar; then, determinize the NFA, and then we can write down a new grammar that's equivalent to this one but unambiguous (for the same reason the determinized automaton is deterministic). Writing down the NFA for this grammar is easy: productions of the form X -> sY translate into transitions from state X to state Y on input s. Similarly, productions of the form X -> lambda mean X is an accepting state, and productions of the form X -> b imply a transition on b to a new accepting state, which in turn transitions to a dead state.
We need states for each nonterminal symbol S, A and B; and we will have transitions for every production. Our NFA looks like this:
(S) --a--> (A)
(S) --a--> (B)
(A) --a--> (A)
(A) --a--> (S)
(B) --a,b--> (B)
(B) --b--> (X)
(X) --a,b--> (Y)
(Y) --a,b--> (Y)
Here, states (S) and (X) are accepting, state (Y) is a dead state (we didn't really need to depict this explicitly, but bear with me) and this automaton is totally equivalent to the grammar. Now, we need to determinize this. States of the determinized automaton will correspond to subsets of states from the nondeterministic version. Our first deterministic state will correspond to the set containing just (S), and we will figure out the other required subsets (of which we can have at most 32, since we have 5 states and 2 to the power of 5 is 32) using the transitions:
Q s Q'
{(S)} a {(A),(B)}
{(S)} b empty
{(A),(B)} a {(A),(B),(S)}
{(A),(B)} b {(B),(X)}
{(A),(B),(S)} a {(A),(B),(S)}
{(A),(B),(S)} b {(B),(X)}
{(B),(X)} a {(B),(Y)}
{(B),(X)} b {(B),(X),(Y)}
{(B),(Y)} a {(B),(Y)}
{(B),(Y)} b {(B),(X),(Y)}
{(B),(X),(Y)} a {(B),(Y)}
{(B),(X),(Y)} b {(B),(X),(Y)}
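The table above can be reproduced with a short script. This is a sketch assuming one-character state names, with X and Y as the fresh accepting state and the dead state exactly as in the NFA above; the function names are illustrative. Exploring subsets breadth-first from {(S)} prints the rows in the same order as the table.

```python
from collections import deque

# Right-linear grammar from the question; "" stands for lambda.
GRAMMAR = {
    "S": ["aA", "aB", ""],
    "A": ["aA", "aS"],
    "B": ["bB", "aB", "b"],
}
ALPHABET = "ab"


def grammar_to_nfa(grammar, alphabet, final="X", dead="Y"):
    """X -> sY gives X --s--> Y, X -> lambda makes X accepting, and X -> s
    goes to a fresh accepting state `final`, which (as in the answer)
    moves to an explicit dead state on every symbol."""
    delta = {}
    accepting = {final}

    def add(src, sym, dst):
        delta.setdefault((src, sym), set()).add(dst)

    for lhs, alts in grammar.items():
        for alt in alts:
            if alt == "":
                accepting.add(lhs)
            elif len(alt) == 1:
                add(lhs, alt, final)
            else:
                add(lhs, alt[0], alt[1])
    for sym in alphabet:
        add(final, sym, dead)
        add(dead, sym, dead)
    return delta, accepting


def subset_construction(delta, start, alphabet):
    """Breadth-first subset construction; returns the DFA transition table
    keyed by (frozenset of NFA states, symbol)."""
    start_set = frozenset([start])
    table, seen, queue = {}, {start_set}, deque([start_set])
    while queue:
        q = queue.popleft()
        for sym in alphabet:
            target = frozenset(t for s in q for t in delta.get((s, sym), ()))
            table[(q, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return table


delta, nfa_accepting = grammar_to_nfa(GRAMMAR, ALPHABET)
table = subset_construction(delta, "S", ALPHABET)
for (q, sym), target in table.items():
    if q:  # skip the empty (dead) subset's self-loops
        print(sorted(q), sym, sorted(target))
```

A DFA state is accepting when its subset contains an accepting NFA state, here (S) or (X).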
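We encountered six states plus a dead state (the empty set); we can name them q1 through q6 in the order they appear in the table (q1 = {(S)}, q2 = {(A),(B)}, q3 = {(A),(B),(S)}, q4 = {(B),(X)}, q5 = {(B),(Y)}, q6 = {(B),(X),(Y)}), plus qD for the empty set. All of the states corresponding to subsets with either (S) or (X) in them (q1, q3, q4 and q6) are accepting, and q1 = {(S)} is the initial state. Our DFA looks like this: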
(q1) --a--> (q2)
(q1) --b--> (qD)
(qD) --a,b--> (qD)
(q2) --a--> (q3)
(q2) --b--> (q4)
(q3) --a--> (q3)
(q3) --b--> (q4)
(q4) --a--> (q5)
(q4) --b--> (q6)
(q5) --a--> (q5)
(q5) --b--> (q6)
(q6) --a--> (q5)
(q6) --b--> (q6)
Finally, we can read off the unambiguous regular grammar from our DFA:
(q1) -> a(q2) | b(qD) | lambda
(qD) -> a(qD) | b(qD)
(q2) -> a(q3) | b(q4)
(q3) -> a(q3) | b(q4) | lambda
(q4) -> a(q5) | b(q6) | lambda
(q5) -> a(q5) | b(q6)
(q6) -> a(q5) | b(q6) | lambda
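Reading a grammar off a DFA is mechanical: one production (X) -> s(Y) per transition, plus lambda for each accepting state. The sketch below hard-codes the DFA above as a Python dict (an assumed encoding) and regenerates the seven productions. The `accepts` helper shows that each word follows exactly one path through the DFA, which is why each word gets exactly one derivation in the read-off grammar.

```python
# DFA from the answer, as a transition table.
DFA = {
    "q1": {"a": "q2", "b": "qD"},
    "qD": {"a": "qD", "b": "qD"},
    "q2": {"a": "q3", "b": "q4"},
    "q3": {"a": "q3", "b": "q4"},
    "q4": {"a": "q5", "b": "q6"},
    "q5": {"a": "q5", "b": "q6"},
    "q6": {"a": "q5", "b": "q6"},
}
ACCEPTING = {"q1", "q3", "q4", "q6"}


def read_off_grammar(dfa, accepting):
    """One production per transition, plus lambda for accepting states."""
    rules = []
    for state, moves in dfa.items():
        alts = [f"{sym}({target})" for sym, target in moves.items()]
        if state in accepting:
            alts.append("lambda")
        rules.append(f"({state}) -> " + " | ".join(alts))
    return rules


def accepts(word, start="q1"):
    """Run the DFA; the path for a given word is unique."""
    state = start
    for sym in word:
        state = DFA[state][sym]
    return state in ACCEPTING


print("\n".join(read_off_grammar(DFA, ACCEPTING)))
print([w for w in ["", "a", "aa", "ab", "aab", "ba"] if accepts(w)])
```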

Can this sentence be parsed using this grammar?

Here is a selection of a lexicon and a grammar:
Noun -> stench | breeze | wumpus | pits ...
Verb -> is | feels | smells | smell | see | stinks ...
Adjective -> right | dead | smelly | breezy ...
Adverb -> here | ahead | nearby ...
Pronoun -> me | you | I | it ...
RelPro -> that | which | who | whom ...
Name -> John | Mary | Boston ...
Article -> the | a | an | every ...
Prep -> to | in | on | near ...
Conj -> and | or | but | yet ...
Digit -> 0 | 1 | 2 | 3 | 4 ...
The grammar rules are below:
S -> NP VP | S Conj S
NP -> Pronoun | Name | Noun | Article Noun | Article Adjs Noun | Digit Digit | NP PP | NP RelClause
VP -> Verb | VP NP | VP Adjective | VP PP | VP Adverb
Adjs -> Adjective | Adjective Adjs
PP -> Prep NP
RelClause -> RelPro VP
The Question: Is this sentence "Mary smells the wumpus in the pit that stinks." generated by this grammar?
My Answer: No, because "pit" was not defined in the grammar.
My question to the NLP experts: are my logic and my understanding of parse trees correct, and is my answer correct? My reasoning is that the sentence cannot be generated because "pit" is not defined in the grammar. Note: I am able to create and draw a parse tree if I change the sentence to use "pits".
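One way to check this mechanically is a small chart parser. The sketch below uses an Earley recognizer, a swapped-in technique (the question does not name a parsing algorithm), chosen because it copes with the left-recursive rules NP -> NP PP and VP -> VP NP. It assumes the lexicon contains only the words actually listed (the "..." entries are unknown and left out), reads "Namae" as "Name", and drops the final period. Under those assumptions it accepts the sentence with "pits" and rejects it with "pit".

```python
# Lexicon limited to the words actually listed; the "..." entries are unknown.
LEXICON = {
    "Noun": {"stench", "breeze", "wumpus", "pits"},
    "Verb": {"is", "feels", "smells", "smell", "see", "stinks"},
    "Adjective": {"right", "dead", "smelly", "breezy"},
    "Adverb": {"here", "ahead", "nearby"},
    "Pronoun": {"me", "you", "I", "it"},
    "RelPro": {"that", "which", "who", "whom"},
    "Name": {"John", "Mary", "Boston"},
    "Article": {"the", "a", "an", "every"},
    "Prep": {"to", "in", "on", "near"},
    "Conj": {"and", "or", "but", "yet"},
    "Digit": {"0", "1", "2", "3", "4"},
}

GRAMMAR = {
    "S": [["NP", "VP"], ["S", "Conj", "S"]],
    "NP": [["Pronoun"], ["Name"], ["Noun"], ["Article", "Noun"],
           ["Article", "Adjs", "Noun"], ["Digit", "Digit"],
           ["NP", "PP"], ["NP", "RelClause"]],
    "VP": [["Verb"], ["VP", "NP"], ["VP", "Adjective"], ["VP", "PP"], ["VP", "Adverb"]],
    "Adjs": [["Adjective"], ["Adjective", "Adjs"]],
    "PP": [["Prep", "NP"]],
    "RelClause": [["RelPro", "VP"]],
}


def earley_recognize(words, start="S"):
    """Earley recognizer. An item is (lhs, rhs, dot, origin); chart[i] holds
    the items whose recognized part ends at position i. Assumes no epsilon
    rules, which holds for this grammar."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0] = {(start, tuple(rhs), 0, 0) for rhs in GRAMMAR[start]}
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            new_items = []
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:  # predict
                    new_items = [(nxt, tuple(alt), 0, i) for alt in GRAMMAR[nxt]]
                elif i < len(words) and words[i] in LEXICON.get(nxt, ()):  # scan
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:  # complete: advance items in chart[origin] waiting for lhs
                new_items = [(p_lhs, p_rhs, p_dot + 1, p_origin)
                             for p_lhs, p_rhs, p_dot, p_origin in chart[origin]
                             if p_dot < len(p_rhs) and p_rhs[p_dot] == lhs]
            for item in new_items:
                if item not in chart[i]:
                    chart[i].add(item)
                    agenda.append(item)
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[-1])


print(earley_recognize("Mary smells the wumpus in the pits that stinks".split()))
print(earley_recognize("Mary smells the wumpus in the pit that stinks".split()))
```

With "pit", no lexicon category matches that word, so the scan step never moves past it and no complete S item spans the whole input.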

Step by step elimination of this indirect left recursion

I've seen this algorithm, which one should be able to use to remove all left recursion.
Yet I'm running into problems with this particular grammar:
A -> Cd
B -> Ce
C -> A | B | f
Whatever I try I end up in loops or with a grammar that is still indirect left recursive.
What are the steps to properly implement this algorithm on this grammar?
The rule is that you first establish an order on the non-terminals, and then find all paths where indirect recursion happens.
In this case order would be A < B < C, and possible paths for recursion of non-terminal C would be
C=> A => Cd
and
C=> B => Ce
so new rules for C would be
C=> Cd | Ce | f
Now you can simply remove the direct left recursion:
C=> fC'
C'=> dC' | eC' | eps
and the resulting non-recursive grammar would be:
A => Cd
B => Ce
C => fC'
C' => dC' | eC' | eps
Figured it out already.
My confusion was that in this order the algorithm seemed to do nothing, so I figured that must be wrong and started replacing A -> Cd in the first iteration (ignoring that j cannot go beyond i), which got me into infinite loops.
1) By reordering the rules:
C -> A | B | f
A -> Cd
B -> Ce
2) Replace C in A -> Cd:
C -> A | B | f
A -> Ad | Bd | fd
B -> Ce
3) B is not yet in range of j, so leave it and remove the direct left recursion of A:
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> Ce
B -> Ce
4) Replace C in B -> Ce:
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> Ae | Be | fe
5) Not done yet! We also need to replace A in the new rule B -> Ae (the productions of A are in range of j):
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> BdA'e | fdA'e | Be | fe
6) Replace the direct left recursion in the productions of B:
C -> A | B | f
A -> BdA' | fdA'
A'-> dA' | epsilon
B -> fdA'eB' | feB'
B'-> dA'eB' | eB' | epsilon
woohoo! left-recursion free grammar!
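A compact sketch of the textbook algorithm being discussed: order the nonterminals, substitute the productions of each earlier Aj into Ai -> Aj γ for j < i, then remove direct left recursion from Ai. It assumes the grammar has no ε-productions and no cycles (A =>+ A), as the textbook version requires. Alternatives are tuples of symbols so that primed names like A' stay single symbols, and () stands for epsilon; the function names are illustrative. With the order C, A, B it reproduces step 6 above, and with the order A, B, C it gives the other answer's C -> fC', C' -> dC' | eC' | eps, so both orderings work.

```python
def eliminate_left_recursion(grammar, order):
    """Remove direct and indirect left recursion using the given ordering."""
    g = {nt: [tuple(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        # Substitute every Ai -> Aj gamma with j < i.
        for aj in order[:i]:
            new_alts = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    new_alts.extend(delta + alt[1:] for delta in g[aj])
                else:
                    new_alts.append(alt)
            g[ai] = new_alts
        # Remove direct left recursion: Ai -> Ai alpha | beta.
        alphas = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        if alphas:
            betas = [alt for alt in g[ai] if not alt or alt[0] != ai]
            prime = ai + "'"
            g[ai] = [beta + (prime,) for beta in betas]
            g[prime] = [alpha + (prime,) for alpha in alphas] + [()]
    return g


def show(g):
    for nt, alts in g.items():
        print(nt, "->", " | ".join("".join(alt) or "epsilon" for alt in alts))


GRAMMAR = {"A": [("C", "d")], "B": [("C", "e")], "C": [("A",), ("B",), ("f",)]}
show(eliminate_left_recursion(GRAMMAR, ["C", "A", "B"]))
print()
show(eliminate_left_recursion(GRAMMAR, ["A", "B", "C"]))
```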

Left recursion elimination

I have this grammar
S->S+S|SS|(S)|S*|a
I want to know how to eliminate the left recursion from this grammar because the S+S is really confusing...
Let's see if we can simplify the given grammar.
S -> S*|S+S|SS|(S)|a
We can write it as:
S -> S*|SQ|SS|B|a
Q -> +S
B -> (S)
Now, you can eliminate left recursion in familiar territory.
S -> BS'|aS'
S' -> *S'|QS'|SS'|e
Q -> +S
B -> (S)
Note that e is epsilon/lambda.
We have removed the left recursion, so we no longer need Q and B and can substitute them back:
S -> (S)S'|aS'
S' -> *S'|+SS'|SS'|e
You'll find this useful when dealing with left recursion elimination.
My answer, using the theory from this reference:
How to Eliminate Left recursion in Context-Free-Grammar.
S --> S+S | SS | S* | a | (S)
Here S+S | SS | S* are the left-recursive rules (Sα form) and a | (S) are the non-left-recursive rules (β form).
We can write this as:
S ---> Sα1 | Sα2 | Sα3 | β1 | β2
Rules to convert it into an equivalent non-left-recursive grammar:
S ---> β1 | β2
Z ---> α1 | α2 | α3
Z ---> α1Z | α2Z | α3Z
S ---> β1Z | β2Z
Where
α1 = +S
α2 = S
α3 = *
and the β-productions (those that do not start with S) are:
β1 = a
β2 = (S)
Grammar without left-recursion:
Non-left-recursive productions, S --> βn:
S --> a | (S)
Introduce a new variable Z with the productions Z ---> αn and Z --> αnZ:
Z --> +S | S | *
and
Z --> +SZ | SZ | *Z
And the new S productions, S --> βnZ:
S --> aZ | (S)Z
Second form (answer)
The productions Z --> +S | S | * and Z --> +SZ | SZ | *Z can be combined as Z --> +SZ | SZ | *Z | ^, where ^ is the null symbol.
The rule Z --> ^ is what lets Z disappear at the end of a derivation.
So second answer:
S --> aZ | (S)Z and Z --> +SZ | SZ | *Z| ^
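To gain some confidence that the transformation preserves the language, one can compare every string up to a small length generated by both grammars. This is a brute-force check, not a proof. The sketch assumes one-character symbols, uppercase letters as nonterminals, "" for ^ (epsilon), and a length bound of 5 to keep it fast; `bounded_language` is a hypothetical helper name.

```python
def bounded_language(grammar, start, max_len):
    """Every terminal string of length <= max_len derivable from `start`.

    Computed as a least fixed point, so left recursion and epsilon rules
    are handled. Symbols are single characters, nonterminals are the keys
    of `grammar`, and "" is the epsilon alternative."""
    lang = {nt: set() for nt in grammar}

    def expand(alt):
        # All ways to concatenate the current approximations of each symbol,
        # discarding strings that would exceed the length bound.
        results = {""}
        for sym in alt:
            options = lang[sym] if sym in grammar else {sym}
            results = {r + o for r in results for o in options
                       if len(r) + len(o) <= max_len}
        return results

    changed = True
    while changed:
        changed = False
        for nt, alts in grammar.items():
            for alt in alts:
                new = expand(alt) - lang[nt]
                if new:
                    lang[nt] |= new
                    changed = True
    return lang[start]


ORIGINAL = {"S": ["S+S", "SS", "S*", "a", "(S)"]}
TRANSFORMED = {"S": ["aZ", "(S)Z"], "Z": ["+SZ", "SZ", "*Z", ""]}

orig = bounded_language(ORIGINAL, "S", 5)
transformed = bounded_language(TRANSFORMED, "S", 5)
print(len(orig), orig == transformed)
```

Because string lengths only add up, every piece of a derivation of a short string is itself short, so pruning at `max_len` keeps the comparison exact for that length.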

Resources