How to remove left recursion from a grammar with beta missing?

Here's the problem:
A -> A*B | A+CDE | Aab
All of the productions start with A. I guess that satisfies the rule? As you can see, it is missing beta. How do I perform left-recursion elimination on it? Can it even be done here?
What I have learned so far is that if it was like this: A -> A*B | A+CDE | Aab | b
Then I would consider b as beta and solve as:
A -> bA'
A' -> *BA' | +CDEA' | abA' | ϵ
Without beta, I am confused. What do I consider as beta?

You need to reduce the grammar by removing all useless and non-productive non-terminals before you try to remove left recursion. Since A is non-productive (it has no non-recursive production), it will get eliminated along with any production that uses it. That certainly gets rid of the left recursion :-).
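For completeness, here is a minimal F# sketch of the standard transformation for A -> A alpha1 | ... | A alphaN | beta1 | ... | betaM (the function name and the list-of-symbols representation are mine, not from the question). When there is no beta, the non-recursive list is empty, which is exactly the non-productive case described above:

// Hypothetical helper: eliminate immediate left recursion for one non-terminal A.
// Returns None when there is no beta, i.e. A is non-productive and should be removed.
let eliminateLeftRecursion (a: string) (productions: string list list) =
    let recursive, nonRecursive =
        productions |> List.partition (fun rhs -> List.tryHead rhs = Some a)
    if List.isEmpty nonRecursive then
        None // no beta: every derivation of A loops forever, so drop A entirely
    else
        let a' = a + "'"
        // A  -> beta1 A' | ... | betaM A'
        let newA = nonRecursive |> List.map (fun beta -> beta @ [ a' ])
        // A' -> alpha1 A' | ... | alphaN A' | epsilon
        let newA' =
            (recursive |> List.map (fun rhs -> List.tail rhs @ [ a' ])) @ [ [ "epsilon" ] ]
        Some (newA, newA')

// The grammar in the question: every alternative starts with A, so the result is None.
eliminateLeftRecursion "A" [ [ "A"; "*"; "B" ]; [ "A"; "+"; "C"; "D"; "E" ]; [ "A"; "a"; "b" ] ]
|> printfn "%A"

Running it on the version with the extra b alternative instead returns exactly the A -> bA' and A' -> *BA' | +CDEA' | abA' | ϵ productions shown above.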

Related

Removing Ambiguity Caused By Dangling Else For LL(1) Grammars

In the case of the dangling else problem for compiler design, is there a reason to left factor it before removing ambiguity?
We are transforming a CFG into an LL(1) grammar, and my professor is asking us to first eliminate left recursion, then left factor, then remove ambiguity. But from what I've read, ambiguity is usually eliminated first. I'm not sure how to remove ambiguity after left factoring.
This is what I got after left factoring it:
S -> i E t S S' | other
S' -> e S | epsilon
However, as I understand it, removing the ambiguity requires rewriting the grammar, so the result will always end up looking something like this, right?
S -> U | M
M -> i E t M e M | other
U -> i E t U'
U' -> M e U | S
Or is there another way to do it? As far as I can see, this is the only way to remove ambiguity from the dangling else.
As it turns out, a good way to deal with the ambiguity caused by a dangling else in an LL(1) grammar is to handle it in the parser. Rewriting the grammar is another way to handle it, as is adding 'begin' and 'end' markers to the grammar, like so:
S -> i E t a S z S' | other
S' -> e S | epsilon
Although it might be intuitive for some, here is what the symbols mean for other beginners:
S: Statement
i: if
E: Expression
t: then
a: begin
z: end
S': Statement'
e: else
other: any other statement
Note: lowercase letters represent terminals; uppercase letters represent variables.
If anything is wrong, please let me know and I'll correct it.
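To make "handle it in the parser" concrete, here is a minimal F# recursive-descent sketch (the token type and function name are my own, hypothetical choices, mirroring the symbols above with begin/end left out): after parsing the then-branch, the parser greedily consumes an else if one is present, which binds each else to the nearest unmatched if.

// Hypothetical tokens: i E t, e (Else), and "other".
type Token = I | E | T | Else | Other

// Parse one statement and return the remaining tokens.
let rec parseStmt tokens =
    match tokens with
    | I :: E :: T :: rest ->
        (match parseStmt rest with                // then-branch
         | Else :: rest' -> parseStmt rest'       // else binds to this, the nearest, if
         | rest' -> rest')                        // no else for this if
    | Other :: rest -> rest
    | _ -> failwith "parse error"

// i E t i E t other e other: the else attaches to the inner if.
parseStmt [ I; E; T; I; E; T; Other; Else; Other ] |> printfn "remaining: %A"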
I think this can be a possible answer:
[After left factoring and making it unambiguous]
Let other = a
S -> iEtT | a
T -> S | aeS
I am generating all the if's first and associating each else with the most recent unassociated if.
If I am to get an else, I should eliminate the possibility of getting a new if between the current unassociated if and its corresponding else.
However, I am allowing the possibility of getting an if after generating the corresponding else.
Point out if there are any errors.
Thank you.

Ambiguity in pattern matching syntax

I came across an oddity in the F# pattern matching syntax today, which can lead to apparent failures in the exhaustivity check.
type Thing =
    | This
    | That
    | Other
let useThing =
    function
    | This -> "A"
    | That -> "A"
    | That -> "B" // compiler complains
    | Other -> "B"
In the above scenario the compiler helpfully tells me that the second That rule will never be matched. However, if I had tried to make the code a bit more compact and had written
let useThing =
    function
    | This | That -> "A"
    | That | Other -> "B"
I do not get any help from the compiler. I think the reason is that | This | That -> "A" is not a shortcut for | This -> "A" | That -> "A", even though it looks very much like it is (and I've seen many code samples that treat it as such). Instead, from what I can find, the pipe symbol is used both to separate individual patterns and as an OR pattern.
This is not a big issue for most DUs, but I encountered the problem when mapping a DU with a large number of cases into another DU with a small number of cases. My attempt to use the shortcut syntax caused a bug.
So my questions are:
Is my interpretation correct?
Is there any workaround apart from listing each pattern on a separate line?
Your interpretation is correct.
By leaving out the actions for the first This and the second That, you are creating an OR pattern, as described in Pattern Matching (F#).
To me this is slightly confusing, too, since the logical 'or' is || in F#.
And while it is easy to see the first bar as a new alternative and the second bar as an 'or' in your formatting, it becomes less obvious in
let useThing =
    function
    | This
    | That -> "A"
    | That
    | Other -> "B"
However, the compiler can tell whether a whole pattern is useless, but it cannot simplify a pattern.
That | Other has a valid match and is therefore not considered redundant by the compiler.
You can think of much more involved patterns where it would not be at all clear whether parts could be left out or how to simplify them.
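A small, hedged illustration of the point (the function below is just an example, not from the question): because the bars inside one rule form a single OR pattern with a single action, one simple way to keep the compiler's checks useful is to make sure each constructor appears in exactly one rule.

type Thing = This | That | Other   // same DU as above, repeated so the sketch is self-contained

// One OR pattern per action, each constructor listed exactly once.
let useThing = function
    | This | That -> "A"   // single rule covering This and That
    | Other -> "B"
// Adding a whole extra rule such as  | That -> "C"  after these would be flagged,
// because that entire rule can never match; a duplicate inside an OR pattern is not.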

How to eliminate this Left Recursion for LL Parser

How do you eliminate a left recursion of the following type? I can't seem to apply the general rule to this particular one.
A -> A | a | b
By using the elimination rule you get:
A -> aA' | bA'
A' -> A' | epsilon
Which still has left recursion.
Does this say anything about the grammar being/not being LL(1)?
Thank you.
Notice that the rule
A → A
is, in a sense, entirely useless. It doesn't do anything to a derivation to apply this rule. As a result, we can safely remove it from the grammar without changing what the grammar produces. This leaves
A → a | b
which is LL(1).
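As a small, hedged sketch of that cleanup (the helper name and the list-of-symbols representation are mine): drop any production whose right-hand side is just the defining non-terminal before applying the usual elimination rule, since such rules contribute nothing to any derivation.

// Hypothetical helper: remove self-productions of the form X -> X.
let dropSelfProductions (x: string) (productions: string list list) =
    productions |> List.filter (fun rhs -> rhs <> [ x ])

dropSelfProductions "A" [ [ "A" ]; [ "a" ]; [ "b" ] ]
|> printfn "%A"   // [["a"]; ["b"]], i.e. A -> a | b, which is LL(1)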

Converting given ambiguous arithmetic expression grammar to unambiguous LL(1)

This term I have a course on Compilers, and we are currently studying syntax: different grammars and types of parsers. I came across a problem which I can't quite figure out, or at least I can't be sure I'm doing it correctly. I already made two attempts and counterexamples were found for both.
I am given this ambiguous grammar for arithmetic expressions:
E → E+E | E-E | E*E | E/E | E^E | -E | (E) | id | num, where ^ stands for power.
I figured out what the priorities should be. Highest priority is parentheses, followed by power, followed by unary minus, followed by multiplication and division, and then addition and subtraction. I am asked to convert this into an equivalent LL(1) grammar. So I wrote this:
E → E+A | E-A | A
A → A*B | A/B | B
B → -C | C
C → D^C | D
D → (E) | id | num
The problem seems to be that this is not equivalent to the first grammar, although it is unambiguous. For example, the given grammar can recognize the input --5 while my grammar can't. How can I make sure I'm covering all cases? How should I modify my grammar to be equivalent to the given one? Thanks in advance.
Edit: Also, I would of course eliminate left recursion and left factor to make this LL(1), but first I need to figure out the main part I asked about above.
Here's one that should work for your case
E = E+A | E-A | A
A = A*C | A/C | C
C = C^B | B
B = -B | D
D = (E) | id | num
As a side note: also pay attention to the requirements of your task, since some applications assign the unary minus operator higher priority than the binary power operator.
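As a hedged illustration of why B = -B | D fixes the --5 counterexample, here is an F# sketch of the recursive-descent structure this grammar leads to once its left recursion is removed (the token type, function names, and the fold into an evaluator are mine, not part of the answer; id is omitted and ^ is kept left-associative, exactly as written above):

type Token = Plus | Minus | Star | Slash | Caret | LParen | RParen | Num of float

// E = A (('+' | '-') A)*   -- each left-recursive level becomes a loop (here, tail recursion)
let rec parseE tokens =
    let v, rest = parseA tokens
    let rec loop acc rest =
        match rest with
        | Plus :: more  -> let v2, r = parseA more in loop (acc + v2) r
        | Minus :: more -> let v2, r = parseA more in loop (acc - v2) r
        | _ -> acc, rest
    loop v rest

// A = C (('*' | '/') C)*
and parseA tokens =
    let v, rest = parseC tokens
    let rec loop acc rest =
        match rest with
        | Star :: more  -> let v2, r = parseC more in loop (acc * v2) r
        | Slash :: more -> let v2, r = parseC more in loop (acc / v2) r
        | _ -> acc, rest
    loop v rest

// C = B ('^' B)*   -- left-associative power, as in the grammar above
and parseC tokens =
    let v, rest = parseB tokens
    let rec loop acc rest =
        match rest with
        | Caret :: more -> let v2, r = parseB more in loop (acc ** v2) r
        | _ -> acc, rest
    loop v rest

// B = '-' B | D    -- B recurses on itself, so --5, ---5, ... are accepted
and parseB tokens =
    match tokens with
    | Minus :: more -> let v, rest = parseB more in (-v, rest)
    | _ -> parseD tokens

// D = '(' E ')' | num
and parseD tokens =
    match tokens with
    | LParen :: more ->
        (match parseE more with
         | v, RParen :: rest -> v, rest
         | _ -> failwith "expected ')'")
    | Num n :: rest -> n, rest
    | _ -> failwith "unexpected token"

// --5 now parses (and evaluates to 5.0):
parseE [ Minus; Minus; Num 5.0 ] |> fst |> printfn "%f"

For the conventional interpretation (right-associative ^ that binds tighter than unary minus), the power level would recurse to the right and sit between the unary-minus level and D instead, which is the trade-off the side note mentions.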

fixing a grammar to LR(0)

Question:
Given the following grammar, fix it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking this kind of question on tests without giving us a methodology for approaching it (other than repeated trying). Is there any method that can be applied to answer these kinds of questions? Can anyone show how such a method applies to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset containing the items T -> c . and T -> c . T, both shift and reduce are possible: shift another c, or reduce (which is what must happen before a b). By moving the b into the productions for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c*$). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straight-forward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
(Many thanks to @templatetypedef for checking the original answer and identifying a flaw, and also to @Dennis, who observed that c* was omitted.)
