How to eliminate this Left Recursion for LL Parser - parsing

How do you eliminate a left recursion of the following type. I can't seem to be able to apply the general rule on this particular one.
A -> A | a | b
By using the elimination rule you get:
A -> aA' | bA'
A' -> A' | epsilon
Which still has left recursion.
Does this say anything about the grammar being/not being LL(1)?
Thank you.

Notice that the rule
A → A
is, in a sense, entirely useless. It doesn't do anything to a derivation to apply this rule. As a result, we can safely remove it from the grammar without changing what the grammar produces. This leaves
A → a | b
which is LL(1).

Related

Removing Ambiguity Caused By Dangling Else For LL(1) Grammars

In the case of the dangling else problem for compiler design, is there a reason to left factor it before removing ambiguity?
We are transforming a CFG into an LL(1) grammar so my professor is asking us to first eliminate recursion, then left factor, then remove ambiguity from our grammar. But, from what I've read, ambiguity is usually eliminated first. I'm not sure how to remove ambiguity after left factoring.
This is how what I got after left factoring it:
S -> i E t S S' | other
S' -> e S | epsilon
However, as I understand it, removing ambiguity requires a rewrite of the grammar so the grammar will always result similar to this right?
S -> U | M
M -> i E t M e M | other
U -> i E t U'
U' -> M e U | S
Or is there another way to do it? As far as I can see, this is the only way to remove ambiguity from the dangling else.
As it turns out, a good way to deal with ambiguity caused by a dangling else in an LL(1) is to handle it in the parser. Rewriting the grammar is also another way to handle it, as is adding 'begin' and 'end' in the grammar like so:
S -> i E t a S z S' | other
S' -> e S | epsilon
Although it might be intuitive for some, for other beginners, this is what the symbols mean:
S: Statement
i: if
E: Expression
t: then
a: begin
z: end
S': Statement'
e: else
other: any other productions
Note: lower case letters represent terminals; Uppercase letters represent variables.
If anything is wrong, please let me know and I'll correct it.
I think this can be a possible answer:
[After left factoring and making it unambiguous]
Let other = a
S -> iEtT | a
T -> S | aeS
I am generating all if's first and associating the else with the recent unassociated if .
If I have to get an else, I should be eliminating the possibility of getting a new if between the current unassociated if and corresponding else.
However I am allowing the possibility of getting an if after generating the corresponding else.
Point out if there are any errors.
Thank you.

SLR parsing conflicts with epsilon production

Consider the following grammar
S -> aPbSQ | a
Q -> tS | ε
P -> r
While constructing the DFA we can see there shall be a state which contains Items
Q -> .tS
Q -> . (epsilon as a blank string)
since t is in follow(Q) there appears to be a shift - reduce conflict.
Can we conclude the nature of the grammar isn't SLR(1) ?
(Please ignore my incorrect previous answer.)
Yes, the fact that you have a shift/reduce conflict in this configuring set is sufficient to show that this grammar isn't SLR(1).

fixing a grammar to LR(0)

Question:
Given the following grammar, fix it to an LR(O) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite sometime, using automatic tools for checking my fixed grammars, with no success. Our professor likes asking this kind of questions on test without giving us a methodology for approaching this (except for repeated trying). Is there any method that can be applied to answer these kind of questions? Can anyone show this method can be applied on this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { ancmbn$ | n ≥ 0, m > 0}. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset including T -> c . and T -> c . T, both shift and reduce are possible: shift c and reduce before b. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c*$). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straight-forward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
(Many thanks to #templatetypedef for checking the original answer, and identifying a flaw, and also to #Dennis, who observed that c* was omitted.)

Difference between Left Factoring and Left Recursion

What is the difference between Left Factoring and Left Recursion ? I understand that Left factoring is a predictive top down parsing technique. But I get confused when I hear these two terms.
Left factoring is removing the common left factor that appears in two productions of the same non-terminal. It is done to avoid back-tracing by the parser. Suppose the parser has a look-ahead, consider this example:
A -> qB | qC
where A, B and C are non-terminals and q is a sentence.
In this case, the parser will be confused as to which of the two productions to choose and it might have to back-trace. After left factoring, the grammar is converted to:
A -> qD
D -> B | C
In this case, a parser with a look-ahead will always choose the right production.
Left recursion is a case when the left-most non-terminal in a production of a non-terminal is the non-terminal itself (direct left recursion) or through some other non-terminal definitions, rewrites to the non-terminal again (indirect left recursion).
Consider these examples:
(1) A -> Aq (direct)
(2) A -> Bq
B -> Ar (indirect)
Left recursion has to be removed if the parser performs top-down parsing.
Left Factoring is a grammar transformation technique. It consists in "factoring out" prefixes which are common to two or more productions.
For example, going from:
A → α β | α γ
to:
A → α A'
A' → β | γ
Left Recursion is a property a grammar has whenever you can derive from a given variable (non terminal) a rhs that begins with the same variable, in one or more steps.
For example:
A → A α
or
A → B α
B → A γ
There is a grammar transformation technique called Elimination of left recursion, which provides a method to generate, given a left recursive grammar, another grammar that is equivalent and is not left recursive.
The relationship/confusion between both terms probably derives from the fact that both transformation techniques may need to be applied to a grammar before being able to derive a predictive top down parser for it.
This is the way I've seen the two terms used:
Left recursion: when one or more productions can be reached from themselves with no tokens consumed in-between.
Left factoring: a process of transformation, turning the grammar from a left-recursive form to an equivalent non-left-recursive form.
left factor :
Let the given grammar :
A-->ab1 | ab2 | ab3
1) we can see that, for every production, there is a common prefix & if we choose any production here, it is not confirmed that we will not need to backtrack.
2) it is non deterministic, because we cannot choice any production and be assured that we will reach at our desired string by making the correct parse tree.
but if we rewrite the grammar in a way that is deterministic and also leaves us flexible enough to convert it into any string that is possible without backtracking, it will be:
A --> aA',
A' --> b1 | b2| b3
now if we are asked to make the parse tree for string ab2 and now we don't need back tracking. Because we can always choose the correct production when we get A' thus we will generate the correct parse tree.
Left recursion :
A --> Aa | b
here it is clear that the left child of A will always be A if we choose the first production,this is left recursion .because , A is calling itself over and over again .
the generated string from this grammar is :
ba*
since this cannot be in a grammar ... we eliminate the left recursion by writing :
A --> bA'
A' --> E | aA'
now we will not have left recursion and also we can generate ba* .
Left Recursion:
A grammar is left recursive if it has a nonterminal A such that there is a derivation A -> Aα | β where α and β are sequences of terminals and nonterminals that do not start with A.
While designing a top down-parser, if the left recursion exist in the grammar then the parser falls in an infinite loop, here because A is trying to match A itself, which is not possible.
We can eliminate the above left recursion by rewriting the offending production. As-
A -> βA'
A' -> αA' | epsilon
Left Factoring: Left factoring is required to eliminate non-determinism of a grammar. Suppose a grammar, S -> abS | aSb
Here, S is deriving the same terminal a in the production rule(two alternative choices for S), which follows non-determinism. We can rewrite the production to defer the decision of S as-
S -> aS'
S' -> bS | Sb
Thus, S' can be replaced for bS or Sb
Here is a simple way to differentiate between both terms:
Left Recursion:
When leftmost Element of a production is the Producing element itself (Non Terminal Element).
e.g. A -> Aα / Aβ
Left Factoring:
When leftmost Element of a production (Terminal element) is repeated in the same production.
e.g. A -> αB / αC
Furthermore,
If a Grammar is Left Recursive, it might result into infinite loop hence we need to Eliminate Left Recursion.
If a Grammar is Left Factoring, it confuses the parser hence we need to Remove Left Factoring as well.
left recursion:= when left hand non terminal is same as right hand non terminal.
Example:
A->A&|B where & is alpha.
We can remove left ricursion using rewrite this production as like.
A->BA'
A'->&A'|€
Left factor mean productn should not non deterministic. .
Example:
A->&A|&B|&C

Conversion to Chomsky Normal Form

I do need your help.
I have these productions:
1) A--> aAb
2) A--> bAa
3) A--> ε
I should apply the Chomsky Normal Form (CNF).
In order to apply the above rule I should:
eliminate ε producions
eliminate unitary productions
remove useless symbols
Immediately I get stuck. The reason is that A is a nullable symbol (ε is part of its body)
Of course I can't remove the A symbol.
Can anyone help me to get the final solution?
As the Wikipedia notes, there are two definitions of Chomsky Normal Form, which differ in the treatment of ε productions. You will have to pick the one where these are allowed, as otherwise you will never get an equivalent grammar: your grammar produces the empty string, while a CNF grammar following the other definition isn't capable of that.
To begin conversion to Chomsky normal form (using Definition (1) provided by the Wikipedia page), you need to find an equivalent essentially noncontracting grammar. A grammar G with start symbol S is essentially noncontracting iff
1. S is not a recursive variable
2. G has no ε-rules other than S -> ε if ε ∈ L(G)
Calling your grammar G, an equivalent grammar G' with a non-recursive start symbol is:
G' : S -> A
A -> aAb | bAa | ε
Clearly, the set of nullable variables of G' is {S,A}, since A -> ε is a production in G' and S -> A is a chain rule. I assume that you have been given an algorithm for removing ε-rules from a grammar. That algorithm should produce a grammar similar to:
G'' : S -> A | ε
A -> aAb | bAa | ab | ba
The grammar G'' is essentially noncontracting; you can now apply the remaining algorithms to the grammar to find an equivalent grammar in Chomsky normal form.

Resources