LR(1) parsing, problem with lookahead symbols

I understand the concept of LR(1) parsing and lookahead symbols. I have the solution to the exercise and it does not agree with my solution.
I'm trying to fill the LR(1) parsing table for the grammar below:
S->xAz
S->BAx
A->Ay
A->e
B->yB
B->y
I don't have to augment the grammar since S does not appear on the right-hand side of any rule.
First(A)=y,e
First(Ax)=x,y
First(B)=y
First(Ay)=y
Lookahead symbols in brackets.
So, I0 = Closure(S->.xAz($) , S->.BAx($) ) =
S->.xAz($)
S->.BAx($)
B->.yB(x,y)
B->.y(x,y)
When I try GOTO(0,x) I think that I should go to:
S->x.Az($)
A->.Ay(z)
A->. (z)
To find the lookahead symbol for A->. and A->.Ay I take First(z). But the official book solution says the lookahead is (z,y).
Where does that y come from?
Thank you in advance
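One way to check where the y comes from is to run the LR(1) closure mechanically. The sketch below assumes the usual closure rule: for an item [A -> α.Bβ, a], add [B -> .γ, b] for every b in FIRST(βa), and repeat until nothing new appears. The function names and the tuple encoding of items are illustrative choices, not taken from the book. Applied to the kernel item S->x.Az($), the item A->.Ay(z) itself has A after the dot followed by y, so the closure also adds A->.Ay and A->. with lookahead FIRST(yz) = y.

```python
# Grammar from the question; the production A->e (epsilon) is an empty tuple.
GRAMMAR = {
    "S": [("x", "A", "z"), ("B", "A", "x")],
    "A": [("A", "y"), ()],
    "B": [("y", "B"), ("y",)],
}
NONTERMINALS = set(GRAMMAR)


def nullable_set():
    """Nonterminals that can derive the empty string (fixed point)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True
    return nullable


def first_sets(nullable):
    """FIRST sets of the nonterminals, without epsilon (tracked via nullable)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                for sym in body:
                    add = first[sym] if sym in NONTERMINALS else {sym}
                    if not add <= first[head]:
                        first[head] |= add
                        changed = True
                    if sym not in nullable:
                        break
    return first


NULLABLE = nullable_set()
FIRST = first_sets(NULLABLE)


def first_of(seq, lookahead):
    """FIRST(seq lookahead): terminals that can start seq followed by lookahead."""
    out = set()
    for sym in seq:
        if sym not in NONTERMINALS:
            out.add(sym)
            return out
        out |= FIRST[sym]
        if sym not in NULLABLE:
            return out
    out.add(lookahead)  # the whole of seq can vanish, so the lookahead shows through
    return out


def closure(items):
    """LR(1) closure; an item is (head, body, dot position, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        head, body, dot, la = work.pop()
        if dot < len(body) and body[dot] in NONTERMINALS:
            for t in first_of(body[dot + 1:], la):
                for prod in GRAMMAR[body[dot]]:
                    item = (body[dot], prod, 0, t)
                    if item not in result:
                        result.add(item)
                        work.append(item)
    return result


def lookaheads(items, head, body, dot):
    """All lookaheads attached to one LR(0) core inside a state."""
    return {la for h, b, d, la in items if (h, b, d) == (head, body, dot)}


# GOTO(0, x): the kernel is S -> x.Az with lookahead $
state = closure({("S", ("x", "A", "z"), 1, "$")})
print(sorted(lookaheads(state, "A", ("A", "y"), 0)))  # ['y', 'z']
print(sorted(lookaheads(state, "A", (), 0)))          # ['y', 'z']
```

The same function reproduces I0 from the question: closing S->.xAz($) and S->.BAx($) gives B->.yB and B->.y with lookaheads x and y.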

Related

Problem with parsing file ending on a newline

It seems a bit like a trivial question, but I am stuck on parsing the end of file (EOF) using my own island grammar. I am using the new VS Code extension, by the way.
I've mostly been using the examples from the basic recipes and have a simple grammar with the following layout rules:
layout Whitespace = [\t-\n\r\ ]*;
lexical IntegerLiteral = [0-9]+ !>> [0-9];
lexical Comment = "%%" ![\n]* $;
Using this, and some rules it parses some simple files, but will give a parse error anytime a file ends in a newline. (newlines in between lines are no problem).
Am I missing something obvious?
Thanks!
It sounds a bit like your grammar is missing a start nonterminal. All grammar rules get whitespace in between their constituent symbols but not at the start or the end.
A start nonterminal is the exception:
start syntax Islands = Island+;
Islands parseIslands(loc input)
= parse(#start[Islands], input).top;
Passing the start nonterminal to parse will allow the file to start and end with whitespace, and using the .top field you can ignore that whitespace from the parse tree again by projecting out the middle Islands tree.
Island grammars tend to be a complex beast, so without sharing the full grammar and input string, it might be a bit hard to answer this question. But I'll share some generic feedback.
The layout production might be ambiguous if any other part of your language has optional parts. Rascal's parsing is non-greedy. So if you have:
lexical A = "a";
lexical B = "b";
lexical C = "c";
syntax A = A? B? C;
After fusing in the layouts, this becomes:
A` = A? Whitespace? B? Whitespace? C;
Now, since Whitespace is not required to eat all the whitespace characters it can, the grammar is ambiguous: the parser can "bind" a whitespace between the A and B, or between the B and C. So in most cases, you want to make sure it's a greedy match by adding a follow restriction:
layout Whitespace = [\t-\n \r \ ]* !>> [\t-\n \r \ ];
Also, I fixed a bug: the layout definition didn't include a space as valid whitespace. Rascal allows spaces inside a character class (for readability), so to match a literal space you have to write "\ ".
For the rest, it looks okay, but like I started with, island grammars are a bit harder to debug without both the full syntax, and what you want to have as water and what as island.

First and follow in the following grammar

The following grammar is given:-
E->E+T|T
T->T*F|F
F->id
I have tried to find the FIRST and FOLLOW sets. Can anyone verify whether they are correct?
First(E)={id}
First(T)={id}
First(F)={id}
Follow(E)={+,id}
Follow(T)={+}
Follow(F)={id,*}
The FIRST sets are correct.
FOLLOW(A) of a non-terminal A is the set of terminal symbols that can appear immediately after A in some step of a derivation from the start symbol.
For FOLLOW(E), check where E occurs on the right-hand side of a production. It occurs in
E->E+T
Here E is followed by '+'. In addition, '$' (end of input) is always added to the FOLLOW set of the start symbol, so
FOLLOW(E)={+,$}
For FOLLOW(T), T occurs on the right-hand side of three productions:
E->E+T E->T T->T*F
In the first two, T is the last symbol, so everything in FOLLOW(E) is also in FOLLOW(T). In the third, T is followed by '*'.
FOLLOW(T)={*} U FOLLOW(E)={*,+,$}
For FOLLOW(F), F occurs on the right-hand side of two productions:
T->T*F T->F
F is the last symbol in both, so
FOLLOW(F)=FOLLOW(T)={*,+,$}
If you are doing this exercise to compute an LL(1) parsing table, first eliminate left recursion and then proceed.
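The rules above can be turned into a small fixed-point computation to double-check the sets. This is a minimal sketch, assuming the textbook FIRST/FOLLOW rules; the dictionary encoding of the grammar and the name `first_follow` are illustrative choices. An epsilon production would be written as an empty list, although this grammar has none.

```python
EPS = "ε"   # marker for "can derive the empty string"
END = "$"   # end-of-input marker


def first_follow(grammar, start):
    """Compute FIRST and FOLLOW by iterating until nothing changes."""
    nonterminals = set(grammar)
    first = {n: set() for n in nonterminals}
    follow = {n: set() for n in nonterminals}
    follow[start].add(END)

    def first_of(seq):
        """FIRST of a sequence of symbols, using the current FIRST sets."""
        out = set()
        for sym in seq:
            sym_first = first[sym] if sym in nonterminals else {sym}
            out |= sym_first - {EPS}
            if EPS not in sym_first:
                return out
        out.add(EPS)  # every symbol in seq (possibly none) can vanish
        return out

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                f = first_of(body)
                if not f <= first[head]:
                    first[head] |= f
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in nonterminals:
                        continue
                    rest = first_of(body[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:
                        # nothing (or only nullable symbols) after sym:
                        # whatever follows the head also follows sym
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return first, follow


EXPR = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["id"]],
}

first, follow = first_follow(EXPR, "E")
for nt in "ETF":
    print(nt, sorted(first[nt]), sorted(follow[nt]))
```

For this grammar the computed FOLLOW sets are {+,$} for E and {*,+,$} for both T and F, matching the hand derivation.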

How to find the follow of the following grammar

Grammar
S->(A)
A->CB
B->;A|ε
C->x|S
I have found the FIRST sets of the grammar:
First(S)={(}
First(B)={;,ε}
First(C)={x,(}
First(A)=First(C)={x,(}
I have trouble finding the Follow of the grammar.
The Follow sets are
Follow(S)={$,),;}
Follow(B)={)}
Follow(C)={;,)}
Follow(A)={)}

Eliminating Epsilon Production for Left Recursion Elimination

I'm following the algorithm for eliminating left recursion from a grammar. It says to remove the epsilon productions first, if there are any.
I have the following grammar:
S-->Aa/b
A-->Ac/Sd/ε
I can see that after removing the epsilon productions the grammar becomes
1) S-->Aa/a/b
2) A-->Ac/Sd/c/d
I'm confused about where the a/b in 1) and the c/d in 2) come from.
Can someone explain this?
Let's look at the rule S->Aa. If A->ε, then S->εa, which is just S->a, so together with the existing rules we get S->Aa|a|b.
Now take the rule A->Ac: with A->ε it becomes A->εc, which gives us A->c.
What about A->Sd? I don't see how you got A->d as a rule. S cannot derive ε, so A->Sd stays as it is. If A->d were a rule, the string "da" would be accepted by the new grammar (S->Aa, then A->d). But try to construct this string with the original grammar: if you start with S and the string ends with "a", you must use S->Aa; then, to get a "d", you must use A->Sd, which forces another "a" or "b" into the string. So "da" cannot be derived, and the rule A->d is not correct.
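The substitution argument above can be checked mechanically. The following is a minimal sketch, assuming the standard procedure: find the nullable nonterminals, then for every production emit each variant obtained by keeping or dropping its nullable symbols, discarding empty bodies. The name `eliminate_epsilon` and the tuple encoding are illustrative. The sketch does not handle a nullable start symbol, which would need a fresh start rule; that case does not arise here.

```python
from itertools import product


def eliminate_epsilon(grammar):
    """Return an equivalent grammar without epsilon productions
    (assumes the start symbol itself is not nullable)."""
    # 1. Nonterminals that can derive the empty string.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                all(s in nullable for s in body) for body in bodies
            ):
                nullable.add(head)
                changed = True

    # 2. Each nullable symbol in a body may be kept or dropped.
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            options = [((s,), ()) if s in nullable else ((s,),) for s in body]
            for choice in product(*options):
                new_body = tuple(sym for part in choice for sym in part)
                if new_body:  # skip the empty body, i.e. the epsilon production
                    new_bodies.add(new_body)
        result[head] = new_bodies
    return result


GRAMMAR = {
    "S": [("A", "a"), ("b",)],
    "A": [("A", "c"), ("S", "d"), ()],  # () is the epsilon production
}

for head, bodies in eliminate_epsilon(GRAMMAR).items():
    print(head, "->", " | ".join("".join(b) for b in sorted(bodies)))
```

Only A is nullable, so the result is S->Aa|a|b and A->Ac|Sd|c. No rule A->d is produced, in line with the argument above.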

Help with Shift/Reduce conflict - Trying to model (X A)* (X B)*

I'm trying to model the EBNF expression
("declare" "namespace" ";")* ("declare" "variable" ";")*
I have built up the yacc grammar (I'm using MPPG), which seems to represent this, but it fails to match my test expression.
The test case I'm trying to match is
declare variable;
The Token stream from the lexer is
KW_Declare
KW_Variable
Separator
The grammar parse says there is a "Shift/Reduce conflict, state 6 on KW_Declare". I have attempted to solve this with "%left PrologHeaderList PrologBodyList", but neither solution works.
Program : Prolog;
Prolog : PrologHeaderList PrologBodyList;
PrologHeaderList : /*EMPTY*/
| PrologHeaderList PrologHeader;
PrologHeader : KW_Declare KW_Namespace Separator;
PrologBodyList : /*EMPTY*/
| PrologBodyList PrologBody;
PrologBody : KW_Declare KW_Variable Separator;
KW_Declare, KW_Namespace, KW_Variable and Separator are all tokens, with values "declare", "namespace", "variable" and ";".
It's been a long time since I've used anything yacc-like, but here are a couple of suggestions that may or may not help.
It seems that you need a 2-token lookahead in this situation. The parser gets to the last PrologHeader, and it has to decide whether the next construct is a PrologHeader or a PrologBody, and it can't tell that from the KW_Declare. If there's a directive to increase lookahead in this situation, it will probably solve the problem.
You could also introduce context into your actions: rather than define PrologHeaderList and PrologBodyList, define PrologRuleList and have the actions throw an error if a header appears after a body. Ugly, but sometimes you have to do it: what appears simple in a grammar may not be simple in the generated parser.
A hackish approach might be to combine the tokens: rather than KW_Declare and KW_Variable, have your lexer recognize the space and use KW_Declare_Variable. Since both are keywords, you're not going to run into namespace collision problems.
The grammar at the top is regular, so IIRC you can plot it out as a DFA (or as an NFA and convert it to a DFA) and then convert the DFA to a grammar. It's been a while, so I'll leave the work as an exercise for the reader.
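As a rough illustration of the DFA idea, the expression ("declare" "namespace" ";")* ("declare" "variable" ";")* can be recognized directly over the token stream with a handful of states. This is only a sketch in Python, not MPPG input; the state names are made up for the example, and the token names are the ones from the question.

```python
# state -> {token: next state}; a missing entry means "reject".
TRANSITIONS = {
    # headers are still allowed here
    "start":      {"KW_Declare": "declared_h"},
    # the token after "declare" decides between header and body
    "declared_h": {"KW_Namespace": "ns", "KW_Variable": "var"},
    "ns":         {"Separator": "start"},
    "var":        {"Separator": "body"},
    # once a body item has been seen, only body items may follow
    "body":       {"KW_Declare": "declared_b"},
    "declared_b": {"KW_Variable": "var"},
}
ACCEPTING = {"start", "body"}


def accepts(tokens):
    """Run the DFA over a list of token names."""
    state = "start"
    for tok in tokens:
        state = TRANSITIONS.get(state, {}).get(tok)
        if state is None:
            return False
    return state in ACCEPTING


print(accepts(["KW_Declare", "KW_Variable", "Separator"]))  # True
```

The state "declared_h" is where the automaton postpones the header-or-body decision until it has seen the token after KW_Declare, which is the decision the one-token-lookahead parser is forced to make too early.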
