proving that a grammar is LL(1) - parsing

I'm given the following grammar :
S -> A a A b | B b B a
A -> epsilon
B -> epsilon
I know that it's obvious that it's LL(1), but I'm facing troubles constructing the parsing table.. I followed the algorithm word by word to find the first and follow of each non-terminal , correct me if I'm wrong:
First(S) = {a,b}
First(A) = First(B) = epsilon
Follow(S) = {$}
Follow(A) = {a,b}
Follow(B) = {a,b}
when I construct the parsing table, according to the algorithm, I get a conflict under the $ symbol... what the hell am I doing wrong??
a b $
A A-> epsilon
B B-> epsilon
S S -> AaAb
S -> BbBa
is it ok if I get 2 productions under $ or something?? or am I constructing the parsing table wrong? please help I'm new to the compiler course

There is a tiny mistake. Algorithm is as follows from dragon book,
for each rule (S -> A):
for each terminal a in First(A):
add (S -> A) to M[S, a]
if First(A) contains empty:
for each terminal b in Follow(S):
add (S -> A) to M[S, b]
Let's take them one by one.
S -> AaAb. Here, First(AaAb) = {a}. So add S -> AaAb to M[S, a].
S -> BbBa. Here, First(BbBa) = {b}. So add S -> BbBa to M[S, b].
A -> epsilon. Here, Follow(A) = {a, b}. So add A -> epsilon to M[A, a] and M[A, b].
B -> epsilon. Here, Follow(B) = {a, b}. So add B -> epsilon to M[B, a] and M[B, b].

Related

checking if the following grammar is LR(0) and LR(1)

I have the following grammar:
S -> A
A -> B | B[*]
B -> [AB]
AB -> *,AB | epsilon
S,A,B,AB are variables and [ ] , * are terminals
Is it:
LR(0)?
LR(1)?
and how can I prove it?

Compilers: First and Follow Sets of a grammar that does not contain epsilon

In my current compilers course, I've understood how to find the first and follow sets of a grammar, and so far all of the grammars I have dealt with have contained epsilon. Now I am being asked to find the first and follow sets of a grammar without epsilon, and to determine whether it is LR(0) and SLR. Not having epsilon has thrown me off, so I don't know if I've done it correctly. I would appreciate any comments on whether I am on the right track with the first and follow sets, and how to begin determining if it is LR(0)
Consider the following grammar describing Lisp arithmetic:
S -> E // S is start symbol, E is expression
E -> (FL) // F is math function, L is a list
L -> LI | I // I is an item in a list
I -> n | E // an item is a number n or an expression E
F -> + | - | *
FIRST:
FIRST(S)= FIRST(E) = {(}
FIRST(L)= FIRST(I) = {n,(}
FIRST(F) = {+, -, *}
FOLLOW:
FOLLOW(S) = {$}
FOLLOW(E) = FOLLOW(L) = {), n, $}
FOLLOW(I) = {),$}
FOLLOW(F) = {),$}
The FIRST sets are right, but the FOLLOW sets are incorrect.
The FOLLOW(S) = {$} is right, though technically this is for the augmented grammar S' -> S$ .
E appears on the right side of S -> E and I -> E, both of which mean that the follow of that set is in the follow of E, so: FOLLOW(E) = FOLLOW(S) ∪ FOLLOW(I) .
L appears on the right hand side of L -> LI, which gives FOLLOW(L) ⊇ FIRST(I) , and E -> (FL), which gives FOLLOW(L) ⊇ {)} .
I appears on the right side of L -> LI | I , which gives FOLLOW(I) = FOLLOW(L) .
F appears on the right side in E -> (FL) , which gives FOLLOW(F) = FIRST(L)
Solving for these gives:
FOLLOW(F) = {n, (}
FOLLOW(L) = FIRST(I) ∪ {)} = {n, (, )}
FOLLOW(I) = {n, (, )}
FOLLOW(E) = {$} ∪ {n, (, )} = {n, (, ), $}

LR(1) - How to compute Lookaheads

I have trouble understanding how to compute the lookaheads.
Lets say that I have this extend grammar:
S'-> S
S -> L=R | R
L -> *R | i
R -> L
I wrote the State 0 so:
S'-> .S, {$}
S -> .L=R, {$}
S -> .R, {$}
L -> .*R, {=,$}
L -> .i, {=,$}
R -> .L {=,$}
Using many parsing emulator i see that all calculators says:
R -> .L {$}
Why? Can't the R be followed by a "="?

Is this incremental parser a functor, if so how would `fmap` be implemented?

I really hate asking this kind of question but I'm at the end of my wits here. I am writing an incremental parser but for some reason, just cannot figure out how to implement functor instance for it. Here's the code dump:
Input Data Type
Input is data type yielded by parser to the coroutine. It contains the current list of input chars being operated on by coroutine and end of line condition
data Input a = S [a] Bool deriving (Show)
instance Functor Input where
fmap g (S as x) = S (g <$> as) x
Output Data Type
Output is data type yielded by coroutine to Parser. It is either a Failed message, Done [b], or Partial ([a] -> Output a b), where [a] is the current buffer passed back to the parser
data Output a b = Fail String | Done [b] | Partial ([a] -> Output a b)
instance Functor (Output a) where
fmap _ (Fail s) = Fail s
fmap g (Done bs) = Done $ g <$> bs
fmap g (Partial f) = Partial $ \as -> g <$> f as
The Parser
The parser takes [a] and yields a buffer [a] to coroutine, which yields back Output a b
data ParserI a b = PP { runPi :: [a] -> (Input a -> Output a b) -> Output a b }
Functor Implementation
It seems like all I have to do is fmap the function g onto the coroutine, like follows:
instance Functor (ParserI a) where
fmap g p = PP $ \as k -> runPi p as (\xs -> fmap g $ k xs)
But it does not type check:
Couldn't match type `a1' with `b'
`a1' is a rigid type variable bound by
the type signature for
fmap :: (a1 -> b) -> ParserI a a1 -> ParserI a b
at Tests.hs:723:9
`b' is a rigid type variable bound by
the type signature for
fmap :: (a1 -> b) -> ParserI a a1 -> ParserI a b
at Tests.hs:723:9
Expected type: ParserI a b
Actual type: ParserI a a1
As Philip JF declared, it's not possible to have an instance Functor (ParserI a). The proof goes by variance of functors—any (mathematical) functor must, for each of its arguments, be either covariant or contravariant. Normal Haskell Functors are always covariant which is why
fmap :: (a -> b) -> (f a -> f b)`
Haskell Contravariant functors have the similar
contramap :: (b -> a) -> (f a -> f b)`
In your case, the b index in ParserI a b would have to be both covariant and contravariant. The quick way of figuring this out is to relate covariant positions to + and contravariant to - and build from some basic rules.
Covariant positions are function results, contravariant are function inputs. So a type mapping like type Func1 a b c = (a, b) -> c has a ~ -, b ~ -, and c ~ +. If you have functions in output positions, you multiply all of the argument variances by +1. If you have functions in input positions you multiply all the variances by -1. Thus
type Func2 a b c = a -> (b -> c)
has the same variances as Func1 but
type Func3 a b c = (a -> b) -> c
has a ~ 1, b ~ -1, and c ~ 1. Using these rules you can pretty quickly see that Output has variances like Output - + and then ParserI uses Output in both negative and positive positions, thus it can't be a straight up Functor.
But there are generalizations like Contravariant. The particular generalization of interest is Profunctor (or Difunctors which you see sometimes) which goes like so
class Profunctor f where
promap :: (a' -> a) -> (b -> b') -> (f a b -> f a' b')
the quintessential example of which being (->)
instance Profunctor (->) where
promap f g orig = g . orig . f
i.e. it "extends" the function both after (like a usual Functor) and before. Profunctors f are thus always mathematical functors of arity 2 with variance signature f - +.
So, by generalizing your ParserI slightly, letting there be an extra parameter to split the ouput types in half, we can make it a Profunctor.
data ParserIC a b b' = PP { runPi :: [a] -> (Input a -> Output a b) -> Output a b' }
instance Profunctor (ParserIC a) where
promap before after (PP pi) =
PP $ \as k -> fmap after $ pi as (fmap before . k)
and then you can wrap it up
type ParserI a b = ParserIC a b b
and provide a slightly less convenient mapping function over b
mapPi :: (c -> b) -> (b -> c) -> ParserI a b -> ParserI a c
mapPi = promap
which really drives home the burden of having the variances go both ways---you need to have bidirectional maps!

Building LR(1) configuration lookahead

I really have some troubles to cauculate the lookahead when building the LR(1) item sets, i had tried some lecture notes form different sites, but still...
My example is
S -> E + S | E
E -> num | ( S )
The item set is
I0:
S’ -> . S $
S -> . E + S $
S -> . E $
E -> . num +,$
E -> . ( S ) +,$
I1:
S ->E .+ S $
S ->E . $
The first item in set I0
S’ -> . S $
is initialization.
The second item in set I0
S -> . E + S $
means there is nothing on stack, we expect to read E+S, then reduce iff the token after E+S is $.
The third item in set I0
S -> . E $
means that we expect to read E and reduce iff the token after E is $.
Then i am confused about the fouth item in set I0,
E -> . num +,$
I have no ideas why there are + and $ tokens.
and if anyone can explain this for me in plain English please.
For each configuration [A –> u•Bv, a] in I, for each production B –> w in G', and for
each terminal b in First(va) such that [B –> •w, b] is not in I: add [B –> •w, b] to I.
Thanks!!!
I think i figured it out.
i am using the algorithm of
for set I0:
Begin with [S' -> .S, $]
Match [A -> α.Bβ, a]
Then add in [B -> .γ, b]
Where terminal b is FIRST(βa)
for set I1...In
Compute GOTO(I0,X)
Add in X productions and LOOKAHEAD token
In the example
S -> E + S
S -> E
E -> num
E -> ( S )
Firstly,
S’ -> . S $
we try to match it to [A -> α.Bβ, a], That is
A =S', α = ε, B = S , β = ε , a = $ and
FIRST(βa) = {$}
Add in [B -> .γ, b], which are
S -> . E + S $ ...1
S -> . E $ ...2
in I0.
Then, we need to add in productions for E as 1 and 2.
In this case, our [A -> α.Bβ, a] are 1 and 2.
Thus, FIRST(βa) = { + , $ }, and we have
E -> . num +,$
E -> . ( S ) +,$
Now, we compute GOTO(I0, X)
For X = E
we move dot one position and found no productions need to be added. So we just add in second component $ from
S -> . E + S $
S -> . E $
which gives us I1
S ->E .+ S $
S ->E . $
and so on...
So, is this the correct and efficient way when building LR(1) item sets?
For
E -> . num +,$
E -> . ( S ) +,$
the +,$ indicate that only these tokens can follow a number or a closing parenthesis. Think about it: The grammar does noty allow adjacent num's or ()'s, they must either be at the end of the sentence or followed by a +.
As for translation request, it is a fancy way of saying how to calculate the set of tokens that can follow a given token. The +,$ above are an example. They are the only legal tokens that can follow num and ).

Resources