I read several papers about creating LR(1) item sets, but none of them pertained to left recursive grammars, such as those used for parsing expressions. If I had the following grammar,
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
How would I go about creating the LR(1) item sets?
Left recursion isn't inherently a problem for LR(1) parsers, and the rules for determining the configurating sets and lookaheads are the same regardless of whether your grammar has left recursion.
In your case, we begin by augmenting the grammar with our new start symbol:
S -> E
E -> E + T | T
T -> T * F | F
F -> (E) | NUMBER
Our initial configurating set corresponds to looking at the production S -> E with a lookahead of $. Initially, that gives us the following:
(1)
S -> .E [$]
We now need to expand out what E could be. That gives us these new items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
Now, let's look at the item E -> .E + T [$]. We need to expand out what E could be here, and the rules for doing so are the same as in the non-left-recursive case: we list all productions for E with the dot at the front, with a lookahead given by what follows the E in the production E -> .E + T [$]. In this case we're looking for an E with a lookahead of + because that's what follows it in the production. That adds these items:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
From here, we expand out all the cases where there's a dot before a T, which gives the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
We now have to expand out the Ts in the context of T -> .T * F [$], and we do so by listing all productions of T followed by what the T is followed by in T -> .T * F [$] (namely, *). That gives us the following:
(1)
S -> .E [$]
E -> .E + T [$]
E -> .T [$]
E -> .E + T [+]
E -> .T [+]
T -> .T * F [$]
T -> .F [$]
T -> .T * F [+]
T -> .F [+]
T -> .T * F [*]
T -> .F [*]
And from here we'd expand out the productions that have a dot before the F. Do you see how to do that based on how things have played out so far?
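If you'd like to check your work, the expansion rule used above can be sketched in a few lines of Python. This is only a minimal sketch: the names GRAMMAR, first_of, closure and show are made up for illustration, and first_of assumes that no production derives the empty string, which is true for this grammar but not in general.

```python
# Augmented grammar from above; any symbol that is not a key is a terminal.
GRAMMAR = {
    "S": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["NUMBER"]],
}

def first_of(symbols, lookahead):
    """Terminals that can start `symbols` followed by `lookahead`.

    Assumption: the grammar has no epsilon productions, so only the
    first symbol of `symbols` matters.
    """
    if not symbols:
        return {lookahead}
    result, seen, stack = set(), set(), [symbols[0]]
    while stack:
        sym = stack.pop()
        if sym not in GRAMMAR:
            result.add(sym)                      # terminal
        elif sym not in seen:
            seen.add(sym)
            stack.extend(rhs[0] for rhs in GRAMMAR[sym])
    return result

def closure(items):
    """Close a set of LR(1) items of the form (lhs, rhs, dot, lookahead)."""
    items = set(items)
    work = list(items)
    while work:
        lhs, rhs, dot, la = work.pop()
        if dot == len(rhs) or rhs[dot] not in GRAMMAR:
            continue                             # dot at end or before a terminal
        # The new lookaheads come from whatever follows the nonterminal.
        for t in first_of(rhs[dot + 1:], la):
            for prod in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], tuple(prod), 0, t)
                if item not in items:
                    items.add(item)
                    work.append(item)
    return items

def show(item):
    lhs, rhs, dot, la = item
    body = " ".join(rhs[:dot] + (".",) + rhs[dot:])
    return f"{lhs} -> {body} [{la}]"

state1 = closure({("S", ("E",), 0, "$")})
for item in sorted(state1):
    print(show(item))
```

The printed set should contain every item listed in (1) above, plus the F productions with the same three lookaheads that the T -> .F items carry.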
Related
I was writing an LL(1) parser for an expression grammar. I had the following grammar:
E -> E + E
E -> E - E
E -> E * E
E -> E / E
E -> INT
However, this is left recursive and I removed the left recursion with the following grammar:
E -> INT E'
E' -> + INT E'
E' -> - INT E'
E' -> * INT E'
E' -> / INT E'
E' -> ε
If I was to have the expression 1 + 2 * 3, how would the parser know to evaluate the multiplication before the addition?
Try this:
; an expression is an addition
E -> ADD
; an addition is a multiplication that is optionally followed
; by +- and another addition
ADD -> MUL T
T -> PM ADD
T -> ε
PM -> +
PM -> -
; a multiplication is an integer that is optionally followed
; by */ and another multiplication
MUL -> INT G
G -> MD MUL
G -> ε
MD -> *
MD -> /
; an integer is a digit that is optionally followed by an integer
INT -> DIGIT J
J -> INT
J -> ε
; digits
DIGIT -> 0
DIGIT -> 1
DIGIT -> 2
DIGIT -> 3
DIGIT -> 4
DIGIT -> 5
DIGIT -> 6
DIGIT -> 7
DIGIT -> 8
DIGIT -> 9
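To see how the layering answers the precedence question, here is a rough recursive-descent sketch of the grammar above. The function names (tokenize, parse_add, parse_mul, evaluate) are my own, and as a simplifying assumption integers are lexed with a regular expression instead of going through the DIGIT and J rules.

```python
import operator
import re

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    # Assumption: whole integers are single tokens (replaces INT/J/DIGIT).
    return re.findall(r"\d+|[-+*/]", text)

def parse_add(tokens, pos):
    # ADD -> MUL T ;  T -> PM ADD | epsilon
    left, pos = parse_mul(tokens, pos)
    if pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]
        right, pos = parse_add(tokens, pos + 1)
        return (op, left, right), pos
    return left, pos

def parse_mul(tokens, pos):
    # MUL -> INT G ;  G -> MD MUL | epsilon
    left = int(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] in "*/":
        op = tokens[pos]
        right, pos = parse_mul(tokens, pos + 1)
        return (op, left, right), pos
    return left, pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_add(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

def evaluate(tree):
    # A tree is either an int or a tuple (operator, left, right).
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left), evaluate(right))

tree = parse("1 + 2 * 3")
print(tree, "=", evaluate(tree))
```

Because parse_add has to finish a whole MUL before it ever looks for + or -, the 2 * 3 ends up as a subtree under the addition. One thing to keep in mind: as written, the grammar nests repeated operators to the right, so 8 - 3 - 2 is grouped as 8 - (3 - 2); if you need left associativity you would have to handle that when building or evaluating the tree.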
So I have this grammar I'm trying to build an LR(1) table for
E' -> E
E -> E + E
E -> E * E
E -> ( E )
E -> a
So far, this is my table
I'm trying to solve the conflicts here. I thought about changing the grammar to postfix instead of infix but I'm not really sure if I can do that. Any ideas?
Here is your grammar, with precedence:
E' -> E
E -> E + T
E -> T
T -> T * F
T -> F
F -> ( E )
F -> a
Don't forget the extra productions E -> T and T -> F; without them the grammar will be useless, since E and T could never derive a string of terminals.
Note: This will not work with LR(0), because you'll get a conflict.
I have the following grammar for basic arithmetic expressions
E -> E + T
E -> T
T -> T * F
T -> F
F -> (E)
F -> id
Where E is expression, T is term, F is factor. I'm wondering how I can extend this grammar to support further arithmetic operations, such as exponents (possibly represented with ^) or logarithms.
Thanks
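Since exponentiation has higher precedence than multiplication, you could add a new level G below F, as in the following grammar. Note that F -> G ^ F is right recursive, which makes ^ right associative: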
E -> E + T
E -> T
T -> T * F
T -> F
F -> G ^ F
F -> G
G -> log(E)
G -> (E)
G -> id
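To check how this grammar groups operators, here is a small hand-written parser that returns a fully parenthesised string. It is only a sketch: the class and method names are invented for illustration, the left-recursive rules for E and T are implemented as loops (a common way to hand-code them, not something the grammar itself dictates), and id is assumed to be any identifier other than log.

```python
import re

class Parser:
    def __init__(self, text):
        # Assumption: identifiers and the symbols + * ^ ( ) are the only tokens.
        self.tokens = re.findall(r"[A-Za-z_]\w*|[+*^()]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):    # E -> E + T | T   (left recursion written as a loop)
        node = self.term()
        while self.peek() == "+":
            self.take()
            node = f"({node} + {self.term()})"
        return node

    def term(self):    # T -> T * F | F   (left recursion written as a loop)
        node = self.factor()
        while self.peek() == "*":
            self.take()
            node = f"({node} * {self.factor()})"
        return node

    def factor(self):  # F -> G ^ F | G   (right recursion, so ^ nests to the right)
        base = self.atom()
        if self.peek() == "^":
            self.take()
            return f"({base} ^ {self.factor()})"
        return base

    def atom(self):    # G -> log(E) | (E) | id
        tok = self.take()
        if tok == "log":
            self.take("(")
            inner = self.expr()
            self.take(")")
            return f"log({inner})"
        if tok == "(":
            inner = self.expr()
            self.take(")")
            return inner
        if tok in {"+", "*", "^", ")"}:
            raise SyntaxError(f"unexpected {tok!r}")
        return tok

def group(text):
    parser = Parser(text)
    result = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected {parser.peek()!r}")
    return result

print(group("a + b * c ^ d ^ e"))
```

The exponent chain is grouped from the right and binds tighter than *, which in turn binds tighter than +.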
I have trouble understanding how to compute the lookaheads.
Let's say that I have this extended grammar:
S'-> S
S -> L=R | R
L -> *R | i
R -> L
I wrote the State 0 so:
S'-> .S, {$}
S -> .L=R, {$}
S -> .R, {$}
L -> .*R, {=,$}
L -> .i, {=,$}
R -> .L {=,$}
Using several parsing emulators, I see that all of them say:
R -> .L {$}
Why? Can't the R be followed by a "="?
This is not my homework, I'm trying to understand LALR(1) grammars. So I found this
S -> aEa | bEb | aFb | bFa
E -> e
F -> e
I wrote the LR items, but I can't figure out why this is an LR(1) grammar and not LALR(1).
Can anyone help me? Thank you
Let's begin by constructing LR(1) configurating sets for the grammar:
(1)
S' -> .S [$]
S -> .aEa [$]
S -> .aFb [$]
S -> .bFa [$]
S -> .bEb [$]
(2)
S' -> S. [$]
(3)
S -> a.Ea [$]
S -> a.Fb [$]
E -> .e [a]
F -> .e [b]
(4)
E -> e. [a]
F -> e. [b]
(5)
S -> aE.a [$]
(6)
S -> aEa. [$]
(7)
S -> aF.b [$]
(8)
S -> aFb. [$]
(9)
S -> b.Fa [$]
S -> b.Eb [$]
E -> .e [b]
F -> .e [a]
(10)
E -> e. [b]
F -> e. [a]
(11)
S -> bF.a [$]
(12)
S -> bFa. [$]
(13)
S -> bE.b [$]
(14)
S -> bEb. [$]
If you'll notice, states (4) and (10) have the same core, so in the LALR(1) automaton we'd merge them together to form the new state
(4, 10)
E -> e. [a, b]
F -> e. [a, b]
Which now has a reduce/reduce conflict in it (all conflicts in LALR(1) that weren't present in the LR(1) parser are reduce/reduce, by the way). This accounts for why the grammar is LR(1) but not LALR(1).
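The merge step can be checked mechanically. Below is a minimal sketch that takes states (4) and (10) exactly as listed above, merges states sharing a core, and looks for lookaheads on which two different reductions apply. The helper names (core, merge_by_core, reduce_reduce_conflicts) are hypothetical, and only these two states are modelled rather than the whole automaton.

```python
from collections import defaultdict

# LR(1) states (4) and (10) from the listing: (lhs, rhs, dot position, lookahead)
STATES = {
    4: {("E", "e", 1, "a"), ("F", "e", 1, "b")},
    10: {("E", "e", 1, "b"), ("F", "e", 1, "a")},
}

def core(state):
    """The items of a state with their lookaheads stripped."""
    return frozenset((lhs, rhs, dot) for lhs, rhs, dot, _ in state)

def merge_by_core(states):
    """Union the items of all states that share a core (the LALR(1) merge)."""
    merged = defaultdict(set)
    names = defaultdict(list)
    for name, items in states.items():
        merged[core(items)] |= items
        names[core(items)].append(name)
    return {tuple(names[c]): items for c, items in merged.items()}

def reduce_reduce_conflicts(state):
    """Lookaheads on which more than one completed item wants to reduce."""
    by_lookahead = defaultdict(set)
    for lhs, rhs, dot, la in state:
        if dot == len(rhs):                      # dot at the end: a reduce item
            by_lookahead[la].add((lhs, rhs))
    return {la: prods for la, prods in by_lookahead.items() if len(prods) > 1}

for name, items in STATES.items():
    print(name, "conflicts:", reduce_reduce_conflicts(items))

for name, items in merge_by_core(STATES).items():
    print(name, "conflicts:", reduce_reduce_conflicts(items))
```

Neither LR(1) state reports a conflict on its own, while the merged state (4, 10) reports one on both a and b.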
Hope this helps!