Calculating First and Follow of a grammar - parsing

I'm trying to calculate First and Follow of the following grammar:
S -> A B C D E
A -> a
A -> EPSILON
B -> b
B -> EPSILON
C -> c
D -> d
D -> EPSILON
E -> e
E -> EPSILON
I calculated them and got First(S) = {a, b, c}. But this tool says First(S) = {a, ε, c, b}. Why is epsilon part of First(S)? As I understand it, it should not be there. Is it my mistake or a bug? If it's a bug, are there other tools I can use to verify my results? If it's my mistake, it would help to understand why.
Also, I got Follow(C) = {d, e, $}, but the tool's result is Follow(C) = {c, d, $}. Why?
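One way to cross-check results without another tool is to run the textbook fixed-point computation yourself. Below is a minimal sketch in Haskell, assuming the grammar is encoded as a list of (LHS, RHS) pairs where an empty RHS means EPSILON; the names `firstSets`, `followSets` and the `"EPSILON"`/`"$"` markers are my own choices, not anything from the linked tool. Because C -> c is the only C production, C is not nullable, so the FIRST computation stops at C and EPSILON never enters FIRST(S).

```haskell
import Data.List (tails)
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

-- A production LHS -> RHS; an empty RHS stands for EPSILON.
type Production = (String, [String])

grammar :: [Production]
grammar =
  [ ("S", ["A", "B", "C", "D", "E"])
  , ("A", ["a"]), ("A", [])
  , ("B", ["b"]), ("B", [])
  , ("C", ["c"])
  , ("D", ["d"]), ("D", [])
  , ("E", ["e"]), ("E", [])
  ]

-- Marker meaning "this string can derive the empty string" inside FIRST sets.
epsilon :: String
epsilon = "EPSILON"

nonterminals :: [Production] -> Set String
nonterminals = Set.fromList . map fst

-- Apply f until the value stops changing.
fixpoint :: Eq a => (a -> a) -> a -> a
fixpoint f x = let x' = f x in if x' == x then x else fixpoint f x'

-- FIRST of a symbol string, given the current FIRST sets of nonterminals.
firstOfString :: Set String -> Map String (Set String) -> [String] -> Set String
firstOfString _ _ [] = Set.singleton epsilon
firstOfString nts firsts (x : rest)
  | x `Set.notMember` nts = Set.singleton x      -- a terminal starts the string
  | epsilon `Set.member` fx =                    -- x is nullable: look further right
      Set.delete epsilon fx `Set.union` firstOfString nts firsts rest
  | otherwise = fx
  where
    fx = Map.findWithDefault Set.empty x firsts

firstSets :: [Production] -> Map String (Set String)
firstSets g = fixpoint step (Map.fromSet (const Set.empty) nts)
  where
    nts = nonterminals g
    step firsts = Map.fromListWith Set.union
      [ (lhs, firstOfString nts firsts rhs) | (lhs, rhs) <- g ]

-- FOLLOW(X) gets FIRST(beta) minus EPSILON for every "X beta" in a right-hand
-- side, plus FOLLOW(lhs) whenever beta is nullable; "$" marks end of input.
followSets :: String -> [Production] -> Map String (Set String)
followSets start g = fixpoint step initial
  where
    nts = nonterminals g
    firsts = firstSets g
    initial = Map.insert start (Set.singleton "$") (Map.fromSet (const Set.empty) nts)
    step follows = Map.unionWith Set.union follows $ Map.fromListWith Set.union
      [ (x, Set.delete epsilon fb `Set.union` inherited)
      | (lhs, rhs) <- g
      , (x : beta) <- tails rhs
      , x `Set.member` nts
      , let fb = firstOfString nts firsts beta
      , let inherited = if epsilon `Set.member` fb then follows Map.! lhs else Set.empty
      ]

main :: IO ()
main = do
  let firsts = firstSets grammar
      follows = followSets "S" grammar
  putStrLn ("FIRST(S)  = " ++ show (Set.toList (firsts Map.! "S")))   -- ["a","b","c"]
  putStrLn ("FOLLOW(C) = " ++ show (Set.toList (follows Map.! "C")))  -- ["$","d","e"]
```

Both sets are computed by iterating until nothing changes, which is why the order of productions in `grammar` does not matter.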

Related

Which productions are considered in LR(1) lookahead?

I'm currently looking at two closure calculation examples using the tool at
http://jsmachines.sourceforge.net/machines/lr1.html
Example 1
S -> A c
A -> b B
B -> A b
Here, the initial state ends up with a closure of:
{[S -> .A c, $]; [A -> .b B, c]}
Example 2
S -> A B
A -> a
B -> b
B -> ''
The calculated closure of the initial state is:
{[S -> .A B, $]; [A -> .a, b/$]}
In example 1, why is b, which follows A in rule 3 (B -> A b), not included in the lookahead? In example 2, we look past B to figure out that $ is part of the lookahead, so is there some special reason not to consider all rules in example 1?
When doing a closure with ". A α" we use FIRST(α) as the lookahead, and only include the containing (parent) lookahead if ε ∈ FIRST(α). In example 1, ε ∉ FIRST(c), so the lookahead is just c. In example 2, ε ∈ FIRST(B), so we add the containing lookahead ($ in this case) to the lookahead.
FOLLOW is never relevant when computing LR(1) lookaheads.
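The closure rule in this answer can be sketched directly. This is a minimal Haskell illustration, assuming items are represented as (lhs, rhs, dot position, lookahead) and that an empty right-hand side encodes the `''` production; `Item`, `closure` and `firstSeq` are hypothetical names. Note that only FIRST of the text after the nonterminal (followed by the parent lookahead) is ever consulted, never FOLLOW.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.Map.Strict (Map)
import Data.Set (Set)

type Production = (String, [String])

-- An LR(1) item: lhs, rhs, dot position, lookahead terminal.
data Item = Item String [String] Int String
  deriving (Eq, Ord, Show)

-- Examples 1 and 2 from the question; [] is the empty (epsilon) right-hand side.
example1, example2 :: [Production]
example1 = [("S", ["A", "c"]), ("A", ["b", "B"]), ("B", ["A", "b"])]
example2 = [("S", ["A", "B"]), ("A", ["a"]), ("B", ["b"]), ("B", [])]

-- FIRST sets of nonterminals by fixed-point iteration; "" marks epsilon.
firstSets :: [Production] -> Map String (Set String)
firstSets g = go (Map.fromList [ (n, Set.empty) | (n, _) <- g ])
  where
    go m =
      let m' = Map.fromListWith Set.union [ (l, firstSeq m r) | (l, r) <- g ]
      in if m' == m then m else go m'

-- FIRST of a symbol string; symbols with no entry in the map are terminals.
firstSeq :: Map String (Set String) -> [String] -> Set String
firstSeq _ [] = Set.singleton ""
firstSeq m (x : xs) = case Map.lookup x m of
  Nothing -> Set.singleton x
  Just fx
    | "" `Set.member` fx -> Set.delete "" fx `Set.union` firstSeq m xs
    | otherwise -> fx

-- For each item [A -> x . B y, t], add [B -> . w, s] for every s in FIRST(y t).
closure :: [Production] -> Set Item -> Set Item
closure g = grow
  where
    firsts = firstSets g
    grow items
      | new `Set.isSubsetOf` items = items
      | otherwise = grow (items `Set.union` new)
      where
        new = Set.fromList
          [ Item b w 0 s
          | Item _ r d t <- Set.toList items
          , (b : y) <- [drop d r]
          , (b', w) <- g
          , b' == b
          , s <- Set.toList (firstSeq firsts (y ++ [t]))
          ]

showItem :: Item -> String
showItem (Item l r d t) =
  "[" ++ l ++ " -> " ++ unwords (take d r ++ ["."] ++ drop d r) ++ ", " ++ t ++ "]"

main :: IO ()
main = do
  mapM_ (putStrLn . showItem) (Set.toList (closure example1 (Set.singleton (Item "S" ["A", "c"] 0 "$"))))
  putStrLn "--"
  mapM_ (putStrLn . showItem) (Set.toList (closure example2 (Set.singleton (Item "S" ["A", "B"] 0 "$"))))
```

In example 1, FIRST(c $) = {c}, so A -> .b B gets only c; in example 2, B is nullable, so FIRST(B $) = {b, $} and A -> .a gets both.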

Find an s-grammar (simple grammar)

Find a simple grammar (a.k.a. s-grammar) for the following language:
$L = \{(ab)^{2m} b : m \ge 0\}$
[I did this, but it is wrong:]
S-> aASBB|b
A-> a
B->b
What about this?
S -> aA | b
A -> bB
B -> aC
C -> bS
This is a regular grammar (all productions have the form X -> sY or X -> t) and corresponds to a minimal DFA for the language in question via a direct mapping of productions to transitions and nonterminal symbols to states. It is also an s-grammar: every right-hand side starts with a terminal, and no nonterminal has two productions starting with the same terminal.
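Because every alternative of an s-grammar starts with a distinct terminal, one symbol of lookahead is enough to pick the production, so a recognizer needs no backtracking. A minimal Haskell sketch for the grammar S -> aA | b, A -> bB, B -> aC, C -> bS (writing S -> b directly, which is the same as a unit step S -> T followed by T -> b); the function names are hypothetical:

```haskell
-- Each nonterminal consumes a prefix of the input and returns the rest,
-- or Nothing if no production matches the next symbol.
parseS, parseA, parseB, parseC :: String -> Maybe String
parseS ('a' : rest) = parseA rest   -- S -> aA
parseS ('b' : rest) = Just rest     -- S -> b
parseS _ = Nothing
parseA ('b' : rest) = parseB rest   -- A -> bB
parseA _ = Nothing
parseB ('a' : rest) = parseC rest   -- B -> aC
parseB _ = Nothing
parseC ('b' : rest) = parseS rest   -- C -> bS
parseC _ = Nothing

-- A word is in L if S consumes all of it.
accepts :: String -> Bool
accepts w = parseS w == Just ""

main :: IO ()
main = print (map accepts ["b", "ababb", "abb", "ab", "ababababb"])
-- [True,True,False,False,True]
```

Each function corresponds to one DFA state, and each clause to one transition.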

Parse error on input `_'

The error is on line 7, at the _. I have no idea what the problem might be. Any tips?
term :: Parser Expr
term s1 = case factor s1 of
            Just (a, s2) -> case s2 of
                              '*':s3 -> case term s3 of
                                          Just (b, s4) -> Just (Mul a b, s4)
                                          Nothing -> Just (a, s2)
                                _ -> Just (a, s2)
            Nothing -> Nothing
I'm trying to parse a string into an Expr (a self-made datatype). I think this is how we're supposed to do it, but I can't test it since I can't compile it. GHCi and ghc -Wall give me the same error: a parse error at that specific point.
My code is larger than this, but this is the relevant piece of code.
Edit: Code posted here, sorry.
It is a syntax problem. Haskell uses two-dimensional (layout-sensitive) syntax, so all alternatives of the same case expression must start at the same indentation.
So, to fix the error, move line 7 two characters to the left:
term :: Parser Expr
term s1 = case factor s1 of
            Just (a, s2) -> case s2 of
                              '*':s3 -> case term s3 of
                                          Just (b, s4) -> Just (Mul a b, s4)
                                          Nothing -> Just (a, s2)
                              _ -> Just (a, s2)
            Nothing -> Nothing
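To check the fix in isolation, here is a self-contained sketch. The question does not show `Parser`, `Expr` or `factor`, so the definitions below are assumptions made only so that `term` compiles: `Parser a` as `String -> Maybe (a, String)` (consistent with how `term` uses it), an `Expr` with hypothetical `Num` and `Mul` constructors, and a `factor` that only reads digits.

```haskell
import Data.Char (isDigit)

-- Hypothetical stand-ins for the asker's own definitions.
data Expr = Num Integer | Mul Expr Expr
  deriving (Show, Eq)

type Parser a = String -> Maybe (a, String)

-- factor: one or more digits (a placeholder for the real factor parser).
factor :: Parser Expr
factor s = case span isDigit s of
  ("", _) -> Nothing
  (ds, rest) -> Just (Num (read ds), rest)

-- term as in the answer: each case's alternatives share one column.
term :: Parser Expr
term s1 = case factor s1 of
  Just (a, s2) -> case s2 of
    '*' : s3 -> case term s3 of
      Just (b, s4) -> Just (Mul a b, s4)
      Nothing -> Just (a, s2)
    _ -> Just (a, s2)
  Nothing -> Nothing

main :: IO ()
main = print (term "2*3*4")
```

Any consistent indentation works; the only requirement is that `_` lines up with `'*' : s3`, the other alternative of the inner `case s2 of`.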

SLR Parsing - with an epsilon production

Say I have:
S -> A
A -> B C A a | ϵ
B -> k | ϵ
C -> m
Now in the initial state S' -> S, I'm going to include:
S' -> .S
Then the closure of S:
A -> .B C A a , A -> .
Closure would also include B -> .k and B -> . obviously.
But since B -> ϵ is a production, would I also have to include C -> .m in the initial state? Since in A -> B C A a, B can be ϵ.
I just want to know whether I'm right and whether this is the right way to deal with epsilons in a grammar. If not, please guide me in the right direction. Thanks!
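No, C -> .m is not part of the initial state. Closure only adds items for the nonterminal immediately after the dot, which is B in A -> .B C A a. C cannot be reduced without a preceding B (even if the B is reduced from ε): once B -> ε is reduced, the parser takes the goto on B to a state containing A -> B .C A a, and C -> .m is added by that state's closure.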
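This can be checked mechanically with an LR(0) closure and goto. A minimal Haskell sketch, assuming the augmented grammar S' -> S, S -> A, A -> B C A a | ϵ, B -> k | ϵ, C -> m, with an empty right-hand side encoding ϵ; `Item`, `closure` and `goto` are hypothetical names:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Production = (String, [String])

-- An LR(0) item: lhs, rhs, dot position.
data Item = Item String [String] Int
  deriving (Eq, Ord, Show)

grammar :: [Production]
grammar =
  [ ("S'", ["S"]), ("S", ["A"])
  , ("A", ["B", "C", "A", "a"]), ("A", [])
  , ("B", ["k"]), ("B", [])
  , ("C", ["m"])
  ]

-- Closure adds X -> .w only for the symbol X directly after a dot.
closure :: Set Item -> Set Item
closure items
  | new `Set.isSubsetOf` items = items
  | otherwise = closure (items `Set.union` new)
  where
    new = Set.fromList
      [ Item x w 0
      | Item _ r d <- Set.toList items
      , (x : _) <- [drop d r]
      , (x', w) <- grammar
      , x' == x
      ]

-- goto: move the dot over symbol x where possible, then close.
goto :: Set Item -> String -> Set Item
goto items x = closure $ Set.fromList
  [ Item l r (d + 1) | Item l r d <- Set.toList items, take 1 (drop d r) == [x] ]

showItem :: Item -> String
showItem (Item l r d) = l ++ " -> " ++ unwords (take d r ++ ["."] ++ drop d r)

main :: IO ()
main = do
  let i0 = closure (Set.singleton (Item "S'" ["S"] 0))
  putStrLn "I0:"
  mapM_ (putStrLn . ("  " ++) . showItem) (Set.toList i0)
  putStrLn "goto(I0, B):"
  mapM_ (putStrLn . ("  " ++) . showItem) (Set.toList (goto i0 "B"))
```

The initial state contains B -> . (the ϵ item) but not C -> . m; C -> . m only shows up in goto(I0, B), the state entered after B has been reduced.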

LR(1) Item DFA - Computing Lookaheads

I have trouble understanding how to compute the lookaheads for the LR(1)-items.
Let's say that I have this grammar:
S -> AB
A -> aAb | a
B -> d
An LR(1) item is an LR(0) item with a lookahead. So we will get the following items for state 0:
S -> .AB , {lookahead}
A -> .aAb, {lookahead}
A -> .a, {lookahead}
State: 1
A -> a.Ab, {lookahead}
A -> a. ,{lookahead}
A -> .aAb ,{lookahead}
A ->.a ,{lookahead}
Can somebody explain how to compute the lookaheads ? What is the general approach ?
Thank you in advance
The lookaheads used in an LR(1) parser are computed as follows. First, the start state has an item of the form
S -> .w ($)
for every production S -> w, where S is the start symbol. Here, the $ marker denotes the end of the input.
Next, for any state that contains an item of the form A -> x.By (t), where x and y are arbitrary strings of terminals and nonterminals and B is a nonterminal, you add an item of the form B -> .w (s) for every production B -> w and for every terminal s in the set FIRST(yt). (Here, FIRST refers to FIRST sets, which are usually introduced when talking about LL parsers. If you haven't seen them before, I would take a few minutes to look over those lecture notes.)
Let's try this out on your grammar. We start off by creating an item set containing
S -> .AB ($)
Next, using our second rule, for every production of A, we add in a new item corresponding to that production and with lookaheads of every terminal in FIRST(B$). Since B always produces the string d, FIRST(B$) = {d}, so all of the productions we introduce will have lookahead d. This gives
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
Now, let's build the state corresponding to seeing an 'a' in this initial state. We start by moving the dot over one step for each production that starts with a:
A -> a.Ab (d)
A -> a. (d)
Now, since the first item has a dot before a nonterminal, we use our rule to add one item for each production of A, giving those items lookahead FIRST(bd) = {b}. This gives
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
Continuing this process will ultimately construct all the LR(1) states for this LR(1) parser. This is shown here:
[0]
S -> .AB ($)
A -> .aAb (d)
A -> .a (d)
[1]
A -> a.Ab (d)
A -> a. (d)
A -> .aAb (b)
A -> .a (b)
[2]
A -> a.Ab (b)
A -> a. (b)
A -> .aAb (b)
A -> .a (b)
[3]
A -> aA.b (d)
[4]
A -> aAb. (d)
[5]
S -> A.B ($)
B -> .d ($)
[6]
B -> d. ($)
[7]
S -> AB. ($)
[8]
A -> aA.b (b)
[9]
A -> aAb. (b)
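"Continuing this process" can be automated with a worklist over closure and goto. A minimal Haskell sketch, assuming the grammar is augmented with a start rule S' -> S (as the tool output further down does) and using a FIRST shortcut that is only valid because this grammar has no ϵ-productions and no left recursion; all names are hypothetical:

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

type Production = (String, [String])

-- An LR(1) item: lhs, rhs, dot position, lookahead terminal.
data Item = Item String [String] Int String
  deriving (Eq, Ord, Show)

-- The question's grammar, augmented with S' -> S.
grammar :: [Production]
grammar =
  [ ("S'", ["S"]), ("S", ["A", "B"])
  , ("A", ["a", "A", "b"]), ("A", ["a"])
  , ("B", ["d"])
  ]

isNonterminal :: String -> Bool
isNonterminal x = any ((== x) . fst) grammar

-- FIRST of a string = FIRST of its first symbol (no epsilon, no left recursion).
firstOf :: [String] -> Set String
firstOf [] = Set.empty
firstOf (x : _)
  | isNonterminal x = Set.unions [ firstOf w | (l, w) <- grammar, l == x ]
  | otherwise = Set.singleton x

-- For [A -> x . B y, t], add [B -> . w, s] for every s in FIRST(y t).
closure :: Set Item -> Set Item
closure items
  | new `Set.isSubsetOf` items = items
  | otherwise = closure (items `Set.union` new)
  where
    new = Set.fromList
      [ Item b w 0 s
      | Item _ r d t <- Set.toList items
      , (b : y) <- [drop d r]
      , (b', w) <- grammar
      , b' == b
      , s <- Set.toList (firstOf (y ++ [t]))
      ]

-- Move the dot over symbol x where possible, then close.
goto :: Set Item -> String -> Set Item
goto items x = closure $ Set.fromList
  [ Item l r (d + 1) t | Item l r d t <- Set.toList items, take 1 (drop d r) == [x] ]

-- Symbols that appear directly after a dot somewhere in the state.
nextSymbols :: Set Item -> [String]
nextSymbols items =
  Set.toList (Set.fromList [ x | Item _ r d _ <- Set.toList items, (x : _) <- [drop d r] ])

-- All states reachable from the start state.
collection :: [Set Item]
collection = go [start] (Set.singleton start)
  where
    start = closure (Set.singleton (Item "S'" ["S"] 0 "$"))
    go [] seen = Set.toList seen
    go (i : rest) seen =
      let fresh = Set.toList (Set.fromList
            [ j | x <- nextSymbols i, let j = goto i x, j `Set.notMember` seen ])
      in go (rest ++ fresh) (foldr Set.insert seen fresh)

showItem :: Item -> String
showItem (Item l r d t) =
  l ++ " -> " ++ concat (take d r) ++ "." ++ concat (drop d r) ++ " (" ++ t ++ ")"

main :: IO ()
main = do
  putStrLn ("number of states: " ++ show (length collection))
  mapM_ printState collection
  where
    printState st = putStrLn "--" >> mapM_ (putStrLn . showItem) (Set.toList st)
```

With the augmented start rule this finds 11 states: the ten listed above plus the accept state containing S' -> S. ($). The states are printed in Set order, so their numbering will not match the listing.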
In case it helps, I taught a compilers course last summer and have all the lecture slides available online. The slides on bottom-up parsing should cover all of the details of LR parsing and parse table construction, and I hope that you find them useful!
Hope this helps!
Here is the LR(1) automaton for the grammar, following the construction above.
I think trying to draw the automaton helps understanding; following its transitions makes the idea of lookaheads clearer.
The LR(1) collection you constructed should have two more item sets (states):
I8 A -> aA.b , b from I2
I9 A -> aAb. , b from I8
I also get 11 states, not 8:
State 0
S: .A B ["$"]
A: .a A b ["d"]
A: .a ["d"]
Transitions
S -> 1
A -> 2
a -> 5
Reductions
none
State 1
S_Prime: S .$ ["$"]
Transitions
none
Reductions
none
State 2
S: A .B ["$"]
B: .d ["$"]
Transitions
B -> 3
d -> 4
Reductions
none
State 3
S: A B .["$"]
Transitions
none
Reductions
$ => S: A B .
State 4
B: d .["$"]
Transitions
none
Reductions
$ => B: d .
State 5
A: a .A b ["d"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["d"]
Transitions
A -> 6
a -> 8
Reductions
d => A: a .
State 6
A: a A .b ["d"]
Transitions
b -> 7
Reductions
none
State 7
A: a A b .["d"]
Transitions
none
Reductions
d => A: a A b .
State 8
A: a .A b ["b"]
A: .a A b ["b"]
A: .a ["b"]
A: a .["b"]
Transitions
A -> 9
a -> 8
Reductions
b => A: a .
State 9
A: a A .b ["b"]
Transitions
b -> 10
Reductions
none
State 10
A: a A b .["b"]
Transitions
none
Reductions
b => A: a A b .
