What is the difference between Left Factoring and Left Recursion ? I understand that Left factoring is a predictive top down parsing technique. But I get confused when I hear these two terms.
Left factoring is removing the common left factor that appears in two productions of the same non-terminal. It is done to avoid back-tracing by the parser. Suppose the parser has a look-ahead, consider this example:
A -> qB | qC
where A, B and C are non-terminals and q is a sentence.
In this case, the parser will be confused as to which of the two productions to choose and it might have to back-trace. After left factoring, the grammar is converted to:
A -> qD
D -> B | C
In this case, a parser with a look-ahead will always choose the right production.
Left recursion is a case when the left-most non-terminal in a production of a non-terminal is the non-terminal itself (direct left recursion) or through some other non-terminal definitions, rewrites to the non-terminal again (indirect left recursion).
Consider these examples:
(1) A -> Aq (direct)
(2) A -> Bq
B -> Ar (indirect)
Left recursion has to be removed if the parser performs top-down parsing.
Left Factoring is a grammar transformation technique. It consists in "factoring out" prefixes which are common to two or more productions.
For example, going from:
A → α β | α γ
to:
A → α A'
A' → β | γ
Left Recursion is a property a grammar has whenever you can derive from a given variable (non terminal) a rhs that begins with the same variable, in one or more steps.
For example:
A → A α
or
A → B α
B → A γ
There is a grammar transformation technique called Elimination of left recursion, which provides a method to generate, given a left recursive grammar, another grammar that is equivalent and is not left recursive.
The relationship/confusion between both terms probably derives from the fact that both transformation techniques may need to be applied to a grammar before being able to derive a predictive top down parser for it.
This is the way I've seen the two terms used:
Left recursion: when one or more productions can be reached from themselves with no tokens consumed in-between.
Left factoring: a process of transformation, turning the grammar from a left-recursive form to an equivalent non-left-recursive form.
left factor :
Let the given grammar :
A-->ab1 | ab2 | ab3
1) we can see that, for every production, there is a common prefix & if we choose any production here, it is not confirmed that we will not need to backtrack.
2) it is non deterministic, because we cannot choice any production and be assured that we will reach at our desired string by making the correct parse tree.
but if we rewrite the grammar in a way that is deterministic and also leaves us flexible enough to convert it into any string that is possible without backtracking, it will be:
A --> aA',
A' --> b1 | b2| b3
now if we are asked to make the parse tree for string ab2 and now we don't need back tracking. Because we can always choose the correct production when we get A' thus we will generate the correct parse tree.
Left recursion :
A --> Aa | b
here it is clear that the left child of A will always be A if we choose the first production,this is left recursion .because , A is calling itself over and over again .
the generated string from this grammar is :
ba*
since this cannot be in a grammar ... we eliminate the left recursion by writing :
A --> bA'
A' --> E | aA'
now we will not have left recursion and also we can generate ba* .
Left Recursion:
A grammar is left recursive if it has a nonterminal A such that there is a derivation A -> Aα | β where α and β are sequences of terminals and nonterminals that do not start with A.
While designing a top down-parser, if the left recursion exist in the grammar then the parser falls in an infinite loop, here because A is trying to match A itself, which is not possible.
We can eliminate the above left recursion by rewriting the offending production. As-
A -> βA'
A' -> αA' | epsilon
Left Factoring: Left factoring is required to eliminate non-determinism of a grammar. Suppose a grammar, S -> abS | aSb
Here, S is deriving the same terminal a in the production rule(two alternative choices for S), which follows non-determinism. We can rewrite the production to defer the decision of S as-
S -> aS'
S' -> bS | Sb
Thus, S' can be replaced for bS or Sb
Here is a simple way to differentiate between both terms:
Left Recursion:
When leftmost Element of a production is the Producing element itself (Non Terminal Element).
e.g. A -> Aα / Aβ
Left Factoring:
When leftmost Element of a production (Terminal element) is repeated in the same production.
e.g. A -> αB / αC
Furthermore,
If a Grammar is Left Recursive, it might result into infinite loop hence we need to Eliminate Left Recursion.
If a Grammar is Left Factoring, it confuses the parser hence we need to Remove Left Factoring as well.
left recursion:= when left hand non terminal is same as right hand non terminal.
Example:
A->A&|B where & is alpha.
We can remove left ricursion using rewrite this production as like.
A->BA'
A'->&A'|€
Left factor mean productn should not non deterministic. .
Example:
A->&A|&B|&C
Related
I am confused that can FIRST SET contains same terminal twice..
for example I have grammar
E->T+E|T FIRST(E)={a,a}
T->a FIRST(T)={a}
..
Is this correct? or I should write
FIRST(E)={a}
By definition sets can not contain the same element multiple times - this applies to first sets as much as any other set. So {a} is the proper way to write it.
I guess you're trying to compute the First and Follow sets, to construct the final predictive table, but generally, you need to resolve all the conflicts first, which are:
ε-derivation
Direct Left Recursion
Indirect Left Recursion
Ambiguous prefixes
In your example (Or part of it, I guess), you need to factor out ambiguous prefixes, the T.
E -> T E'
E' -> + E | ε
T -> a
Formally, for any non-terminal with derivation rules of the form A → αβ | αγ
1- Remove these 2 derivation rules
2- Create a rule A′ → β | γ
3- Create a rule A → α A′
Check out this Paper about Conflicts, it was very helpful for me, and you might as well check this slide and this, if you have any problem with top-down parsing.
I have this grammar
E -> E + i
E -> i
The augmented grammar
E' -> E
E -> E + i
E -> i
Now I try to expand the item set 0
I0)
E' -> .E
+E -> .E + i
+E -> .i
Then, since I have .E in I0 I would expand it but then I will get another E rule, and so on, this is my first doubt.
Assuming that this is alright the next item sets are
I0)
E' -> .E
+E -> .E + i
+E -> .i
I1) (I moved the dot from I0, no variables at rhs of dot)
E' -> E.
E -> E. + i
E -> i.
I2) (I moved the dot from I1, no vars at rhs of dot)
E -> E +. i
I3) (I moved the dot from I2, also no vars)
E -> E + i.
Then I will have this DFA
I0 -(E, i)-> I1 -(+)-> I2 -(i)-> I3
| |
+-(∅)-> acpt <-(∅)--+
I'm missing something because E -> E + i must accept i + i + .. but the DFA doesn't goes back to the I0, so it seems wrong to me. My guess is that it should have a I0 to I0 transition, but I then I don't know that to do with the dot.
What you call the "expansion" of the item set is actually a closure; that's how it's described in all the descriptions of the algorithm I've seen (at least in textbooks). Like any closure operation, you just keep on doing the transformation until you reach a fixed-point; once you've included the productions for E, they're included.
But the essential point is that you're not building a DFA. You're building a pushdown automaton, and the DFA is just one part of it. The DFA is used for shift operations; when you shift a new terminal (because the current parse stack is not a handle), you do a state transition according to the DFA. But you also push the current state onto the PDA's stack.
The interesting part is what happens when the automaton decides to perform a reduction, which replaces the right-hand side of a production with its left-hand side non-terminal. (The right-hand side at the top of the stack is called a "handle".) To do the reduction, you unwind the stack, popping each right-hand side symbol (and the corresponding DFA state) until you reach the beginning of the production. What that does is rewind the DFA to the state it was in before it shifted the first symbol from the right-hand side. (Note that it is only at this point that you know for sure which production was used.) With the DFA thus reset, you can now shift the non-terminal which was encountered, do the corresponding DFA transition, and continue with the parse.
The basis for this procedure is the fact that the parser stack is at all times a "viable prefix"; that is, a sequence of symbols which are the prefix of some right sentential form which can be derived from the start symbol. What's interesting about the set of viable prefixes for a context-free grammar is that it is a regular language, and consequently can be recognised by a DFA. The reduction procedure given above precisely represents this recognition procedure when handles are "pruned" (to use Knuth's original vocabulary).
In that sense, it doesn't really matter what procedure is used to determine which handle is to be pruned, as long as it provides a valid answer. You could, for example, fork the parse every time a potential handle is noticed at the top of the stack, and continue in parallel with both forks. With clever stack management, this parallel search can be done in worst-case O(n3) time for any context-free grammar (and this can be reduced if the grammar is not ambiguous). That's a very rough description of Earley parsers.
But in the case of an LR(k) parser, we require that the grammar be unambiguous, and we also require that we can identify a reduction by looking at no more than k more symbols from the input stream, which is an O(1) operation since k is fixed. If at each point in the parse we know whether to reduce or not, and if so which reduction to choose, then the reductions can be implemented as I outlined above. Each reduction can be performed in O(1) time for a fixed grammar (since the maximum size of a right-hand side in a particular grammar is fixed), and since the number of reductions in a parse is linear in the size of the input, the entire parse can be done in linear time.
That was all a bit informal, but I hope it serves as an intuitive explanation. If you're interested in the formal proof, Donald Knuth's original 1965 paper (On the Translation of Languages from Left to Right) is easy to find and highly readable as these things go.
How do you eliminate a left recursion of the following type. I can't seem to be able to apply the general rule on this particular one.
A -> A | a | b
By using the elimination rule you get:
A -> aA' | bA'
A' -> A' | epsilon
Which still has left recursion.
Does this say anything about the grammar being/not being LL(1)?
Thank you.
Notice that the rule
A → A
is, in a sense, entirely useless. It doesn't do anything to a derivation to apply this rule. As a result, we can safely remove it from the grammar without changing what the grammar produces. This leaves
A → a | b
which is LL(1).
Given the following grammar:
S -> S + S | S S | (S) | S* | a
S -> S S + | S S * | a
For the life of me I can't seem to figure out how to compute the FIRST and FOLLOW for the above grammar. The recursive non-terminal of S confuses me. Does that mean I have to factor out the grammar first before computing the FIRST and FOLLOW?
The general rule for computing FIRST sets in CFGs without ε productions is the following:
Initialize FIRST(A) as follows: for each production A → tω, where t is a terminal, add t to FIRST(A).
Repeatedly apply the following until nothing changes: for each production of the form A → Bω, where B is a nonterminal, set FIRST(A) = FIRST(A) ∪ FIRST(B).
We could follow the above rules as written, but there's something interesting here we can notice. Your grammar only has a single nonterminal, so that second rule - which imports elements into the FIRST set of one nonterminal from FIRST sets from another nonterminal - won't actually do anything. In other words, we can compute the FIRST set just by applying that initial rule. And that's not too bad here - we just look at all the productions that start with a terminal and get FIRST(S) = { a, ( }.
What are FIRST and FOLLOW sets? What are they used for in parsing?
Are they used for top-down or bottom-up parsers?
Can anyone explain me FIRST and FOLLOW SETS for the following set of grammar rules:
E := E+T | T
T := T*V | T
V := <id>
They are typically used in LL (top-down) parsers to check if the running parser would encounter any situation where there is more than one way to continue parsing.
If you have the alternative A | B and also have FIRST(A) = {"a"} and FIRST(B) = {"b", "a"} then you would have a FIRST/FIRST conflict because when "a" comes next in the input you wouldn't know whether to expand A or B. (Assuming lookahead is 1).
On the other side if you have a Nonterminal that is nullable like AOpt: ("a")? then you have to make sure that FOLLOW(AOpt) doesn't contain "a" because otherwise you wouldn't know if to expand AOpt or not like here: S: AOpt "a" Either S or AOpt could consume "a" which gives us a FIRST/FOLLOW conflict.
FIRST sets can also be used during the parsing process for performance reasons. If you have a nullable nonterminal NullableNt you can expand it in order to see if it can consume anything, or it may be faster to check if FIRST(NullableNt) contains the next token and if not simply ignore it (backtracking vs predictive parsing). Another performance improvement would be to additionally provide the lexical scanner with the current FIRST set, so the scanner does not try all possible terminals but only those that are currently allowed by the context. This conflicts with reserved terminals but those are not always needed.
Bottom up parsers have different kinds of conflicts namely Reduce/Reduce and Shift/Reduce. They also use item sets to detect conflicts and not FIRST,FOLLOW.
Your grammar would't work with LL-parsers because it contains left recursion. But the FIRST sets for E, T and V would be {id} (assuming your T := T*V | T is meant to be T := T*V | V).
Answer :
E->E+T|T
left recursion
E->TE'
E'->+TE'|eipsilon
T->T*V|T
left recursion
T->VT'
T'->*VT'|epsilon
no left recursion in
V->(id)
Therefore the grammar is:
E->TE'
E'->+TE'|epsilon
T->VT'
T'->*VT'|epsilon
V-> (id)
FIRST(E)={(}
FIRST(E')={+,epsilon}
FIRST(T)={(}
FIRST(T')={*,epsilon}
FIRST(V)={(}
Starting Symbol=FOLLOW(E)={$}
E->TE',E'->TE'|epsilon:FOLLOW(E')=FOLLOW(E)={$}
E->TE',E'->+TE'|epsilon:FOLLOW(T)=FIRST(E')={+,$}
T->VT',T'->*VT'|epsilon:FOLLOW(T')=FOLLOW(T)={+,$}
T->VT',T->*VT'|epsilon:FOLLOW(V)=FIRST(T)={ *,epsilon}
Rules for First Sets
If X is a terminal then First(X) is just X!
If there is a Production X → ε then add ε to first(X)
If there is a Production X → Y1Y2..Yk then add first(Y1Y2..Yk) to first(X)
First(Y1Y2..Yk) is either
First(Y1) (if First(Y1) doesn't contain ε)
OR (if First(Y1) does contain ε) then First (Y1Y2..Yk) is everything in First(Y1) except for ε as well as everything in First(Y2..Yk)
If First(Y1) First(Y2)..First(Yk) all contain ε then add ε to First(Y1Y2..Yk) as well.
Rules for Follow Sets
First put $ (the end of input marker) in Follow(S) (S is the start symbol)
If there is a production A → aBb, (where a can be a whole string) then everything in FIRST(b) except for ε is placed in FOLLOW(B).
If there is a production A → aB, then everything in FOLLOW(A) is in FOLLOW(B)
If there is a production A → aBb, where FIRST(b) contains ε, then everything in FOLLOW(A) is in FOLLOW(B)
Wikipedia is your friend. See discussion of LL parsers and first/follow sets.
Fundamentally they are used as the basic for parser construction, e.g., as part of parser generators. You can also use them to reason about properties of grammars, but most people don't have much of a need to do this.