I'm having trouble understanding how lookaheads are computed in LR(1). I've already found this question about the same problem, LR(1) - Items, Look Ahead, but it didn't help me.
S'->.S,$
S->.L=R,$
S->.R,$
L->.*R,=/$
L->.id,=/$
R->.L,$
I understand the lookaheads of the S' and S productions, but not the L and R ones.
Can you help me, please? Thanks in advance.
For a really nice treatment of LR/SLR/LALR parsing, I'd recommend The Dragon Book.
Constructing the LR(1) sets of items is in Section 4.7.2, the procedure CLOSURE.
For your example, consider "expanding" (during CLOSURE) the LR(1) item S->.L=R,$. The result is the two LR(1) items
L->.*R,=
L->.id,=
The lookaheads are the terminal symbols in FIRST("=R$"), i.e. everything that can follow the non-terminal in front of the dot, plus the item's own lookahead. Since = is a terminal, FIRST("=R$") = {=}, so the new items' lookahead is just = at this point.
Next, consider the "expansion" of S->.R,$. Again, the lookaheads of the new items are the terminal symbols in FIRST("$"): nothing follows the R in the body, so only the item's lookahead $ remains. This produces the item
R ->.L,$
Further expanding this item, using as lookaheads all the terminals in FIRST("$"), gives us
L->.*R,$
L->.id,$
So you can see that L->.*R,=/$ in your original example is just shorthand for two individual items, L->.*R,= and L->.*R,$, each obtained along a separate "expansion" chain.
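To make the computation concrete, here is a minimal Python sketch of CLOSURE for exactly this grammar. The item representation and helper names (`first_of`, `closure`) are my own, not from any standard library, and `first_of` is simplified under the assumption that no production here is nullable:

```python
# Grammar from the question: S' -> S, S -> L=R | R, L -> *R | id, R -> L.
# An LR(1) item is (head, body, dot_position, lookahead).
GRAMMAR = {
    "S'": [("S",)],
    "S":  [("L", "=", "R"), ("R",)],
    "L":  [("*", "R"), ("id",)],
    "R":  [("L",)],
}
NONTERMS = set(GRAMMAR)

def first_of(symbols):
    """FIRST of a symbol string (simplified: no nullable productions)."""
    out = set()
    for x in symbols:
        if x in NONTERMS:
            for body in GRAMMAR[x]:
                out |= first_of(body)
        else:
            out.add(x)
        break  # first symbol is never nullable here, so stop
    return out

def closure(items):
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (head, body, dot, la) in list(items):
            if dot < len(body) and body[dot] in NONTERMS:
                b = body[dot]
                # New lookaheads are FIRST(beta la), beta = rest of body
                for t in first_of(body[dot + 1:] + (la,)):
                    for prod in GRAMMAR[b]:
                        item = (b, prod, 0, t)
                        if item not in items:
                            items.add(item)
                            changed = True
    return items

start = closure({("S'", ("S",), 0, "$")})
```

Computing `closure` of S'->.S,$ yields the eight items from the question, with L->.*R and L->.id each appearing twice, once with lookahead = (via S->.L=R,$) and once with lookahead $ (via R->.L,$).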
Related
The grammar S -> a S a | a a generates all even-length strings of a's. We can devise a recursive-descent parser with backtrack for this
grammar.
If we choose to expand by production S -> aa first, then we shall
only recognize the string aa.
Thus, any reasonable recursive-descent parser will
try S -> aSa first.
Show that this recursive-descent parser recognizes inputs aa, aaaa, and
aaaaaaaa, but not aaaaaa.
The parser will first try to invoke match(a); S(); match(a); rather than match(a); match(a);, as described in the problem. Note that when you recursively invoke S() inside the block match(a); S(); match(a);, you have only invoked match(a) once, so the 'a' symbol at the end has not yet been consumed.
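To see this concretely, here is a small Python sketch of such a backtracking parser with the singleton-match behavior described below (function names are mine, for illustration):

```python
def parse_S(s, i):
    """Match S -> aSa | aa starting at position i.
    Singleton match: return the end position of the FIRST
    alternative that succeeds, or None if both fail."""
    # Alternative 1: S -> a S a
    if i < len(s) and s[i] == 'a':
        j = parse_S(s, i + 1)
        if j is not None and j < len(s) and s[j] == 'a':
            return j + 1
    # Alternative 2: S -> a a (tried only if alternative 1 failed)
    if i + 1 < len(s) and s[i] == 'a' and s[i + 1] == 'a':
        return i + 2
    return None

def accepts(s):
    return parse_S(s, 0) == len(s)
```

Running this, `accepts` returns True for "aa", "aaaa", and "aaaaaaaa", but False for "aaaaaa": the inner S() reports only its first successful match, and the caller has no way to ask it to backtrack to a shorter one.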
I'll paraphrase myself from this other answer.
It's actually a property of the "singleton match strategy" usual in naive implementations of recursive descent parsers with backtracking.
Quoting this answer by Dr Adrian Johnstone:
The trick here is to realise that many backtracking parsers use what we call a singleton match strategy, in which as soon as the parse function for a rule finds a match, it returns. In general, parse functions need to return a set of putative matches. Just try working it through, and you'll see that a singleton match parser misses some of the possible derivations.
Also, the images available in this answer will help you visualize what's happening in the case you exemplified.
As I understand it, the first condition for a recursive descent parser is that the grammar should be neither left-recursive nor non-deterministic. Here the given grammar is non-deterministic, so we need to left-factor it; we get S -> aS', S' -> Sa | a, and I guess this will work.
I want to solve this Grammar.
S->SS+
S->SS*
S->a
I want to construct SLR sets of items and parsing table with action and goto.
Can this grammar be parsed without eliminating the left recursion?
Is this grammar SLR?
No, this grammar is not SLR. It is ambiguous.
Left recursion is not a problem for LR parsers. Left recursion elimination is only necessary for LL parsers.
I am not entirely sure about this, but I think this grammar is actually SLR(1). I constructed the SLR(1) table by hand and obtained one with no conflicts (after adding the augmenting production S' -> S, where S' is the new start symbol).
Can somebody provide a sentence that can be derived in two different ways from this grammar? I was able to get a parser for it in Bison without any warning. Are you sure it is ambiguous?
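One way to probe the ambiguity claim is to count parse trees by brute force. This Python sketch (helper names are mine, not from the thread) counts, for every short string over the grammar's alphabet, how many distinct parse trees derive it from S:

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def count_parses(s):
    """Number of distinct parse trees deriving s from S in
    the grammar S -> S S + | S S * | a."""
    n = 1 if s == "a" else 0
    if len(s) >= 3 and s[-1] in "+*":
        body = s[:-1]
        # Every parse of s as SSop splits the body into two S's.
        for i in range(1, len(body)):
            n += count_parses(body[:i]) * count_parses(body[i:])
    return n

# Collect every string over {a, +, *} up to length 7 that has
# more than one parse tree.
ambiguous = ["".join(w) for k in range(1, 8)
             for w in product("a+*", repeat=k)
             if count_parses("".join(w)) > 1]
```

The `ambiguous` list comes back empty, which is evidence (not a proof) that this postfix-expression grammar is unambiguous, consistent with the Bison result above.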
I have read this to better understand the difference between top-down and bottom-up parsing. Can anyone explain the problems associated with left recursion in a top-down parser?
In a top-down parser, the parser begins with the start symbol and tries to guess which productions to apply to derive the input string. To do so, top-down parsers use contextual clues from the input string to guide their guesswork.
Most top-down parsers are directional parsers, which scan the input in some direction (typically, left to right) when trying to determine which productions to guess. The LL(k) family of parsers is one example of this - these parsers use information about the next k symbols of input to determine which productions to use.
Typically, the parser uses the next few tokens of input to guess productions by looking at which productions can ultimately lead to strings that start with the upcoming tokens. For example, if you had the production
A → bC
you wouldn't choose to use this production unless the next character to match were b; otherwise, you'd be guaranteed a mismatch. Conversely, if the next input character is b, this production is a reasonable candidate.
So where does left recursion come in? Well, suppose that you have these two productions:
A → Ab | b
This grammar generates all strings of one or more copies of the character b. If you see a b in the input as your next character, which production should you pick? If you choose Ab, then you're assuming there are multiple b's ahead of you even though you're not sure this is the case. If you choose b, you're assuming there's only one b ahead of you, which might be wrong. In other words, if you have to pick one of the two productions, you can't always choose correctly.
The issue with left recursion is that if you have a nonterminal that's left-recursive and find a string that might match it, you can't necessarily know whether to use the recursion to generate a longer string or avoid the recursion and generate a shorter string. Most top-down parsers will either fail to work for this reason (they'll report that there's some uncertainty about how to proceed and refuse to parse), or they'll potentially use extra memory to track each possible branch, running out of space.
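As a concrete illustration of that failure mode, here is a naive Python sketch (not production code) of a recursive-descent function for A -> Ab | b that tries the left-recursive alternative first:

```python
def parse_A(s, i):
    # A -> A b | b, alternatives tried in order.  The left-recursive
    # alternative re-enters parse_A at the SAME position i, so the
    # parser makes no progress and recurses until the stack overflows.
    j = parse_A(s, i)                    # A -> A b
    if j is not None and j < len(s) and s[j] == 'b':
        return j + 1
    if i < len(s) and s[i] == 'b':       # A -> b
        return i + 1
    return None

try:
    parse_A("bbb", 0)
except RecursionError:
    print("left recursion: no progress before recursing, stack overflows")
```

The very first thing the function does is call itself on unchanged input, so no amount of lookahead helps; this is the "fail to work" outcome described above.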
In short, top-down parsers usually try to guess what to do from limited information about the string. Because of this, they get confused by left recursion because they can't always accurately predict which productions to use.
Hope this helps!
Reasons
1) A grammar that is left-recursive (directly or indirectly) is not in Greibach Normal Form (GNF); converting it to GNF requires eliminating the left recursion, which yields a right-recursive form.
2) Left-recursive grammars are also not LL(1), so eliminating the left recursion may turn the grammar into an LL(1) grammar.
GNF
A grammar is in Greibach Normal Form when every production has the form A -> aV, where a is a terminal and V is a (possibly empty) string of non-terminals.
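As a standard example of point 1 (my own illustration, not from the original answer), the left-recursive rule A -> Ab | b is rewritten right-recursively as

```
A  -> b A'
A' -> b A' | ε
```

Both forms generate exactly the strings b, bb, bbb, ..., but the rewritten one begins every alternative with a terminal, as GNF and LL(1) parsing require.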
I'm doing exam questions for revision for an exam. One of the questions is to construct an LL(1) parse table from the first and follow sets calculated in the previous question.
Now I am nearly positive that I have constructed the first and follow sets correctly, and the table does not have any duplicate entries in any of its cells, so I assumed that the grammar was a valid LL(1) grammar (we are asked to determine whether it is valid, hence why I needed to construct the table).
However, the next question is to convert the grammar into a valid LL(1) grammar, clearly implying that it is not LL(1).
So my question is actually 2 questions.
Is the grammar not an LL(1) grammar due to the fact that there is a column without any entries?
OR
If this is allowable in an LL(1) parse table, is it most likely that I went wrong creating the first and follow sets?
Here is my working out of the question and the grammar which is in the box
http://imgur.com/UwmOAvX
It is perfectly OK for a column to have no entries -- that just means the terminal in question is not in the FIRST set of any non-terminal, which can easily happen for symbols that never appear in leading context (for example, a ) will often be such a symbol).
In your case, the problem appears to be that you forgot to put the rule B -> B v into the table. You also have an error in FIRST(D) and FOLLOW(B) -- the latter comes from the former.
I wrote grammar for LALR parser and I am stuck at optional non-terminal. Consider for example C++ dereference, when you can write:
******expression;
Of course you can write:
expression;
And here is my problem: the dereference non-terminal really is optional, and this has such an impact on the grammar that the parser now sees it fitting (almost) everywhere, because, well, it might be empty.
Is there a common pattern for how I should rewrite the grammar to fix this?
I would also be grateful for pointers to books or other resources that deal with common problems and patterns when writing grammars.
First of all, the problem you are having is not the one you are claiming to have. Having a nullable (possibly empty) nonterminal does not mean that the parser will try to stick it everywhere. (I use the term “nullable” here to avoid confusion, because “optional” might refer to an optional occurrence of a nonterminal, as in x? where x is the nonterminal name). It just means that whenever you use that nonterminal in your grammar, the parser might skip over it or match with an empty word (details are according to the rules of the particular parsing algorithm, in your case LALR).
Secondly, the problem most probably is that the resulting grammar is ambiguous. My guess is that you used some kind of combination of right recursion for defining the nonterminal with the stars, and having an asterisk as a binary multiplication operator. (Feel free to update the question with a grammar fragment, then I might be able to offer more detailed help).
Thirdly, and mainly concerning your quest for general problems and patterns in grammars: usually people would not put the stars in one nonterminal and the expression in another, because ultimately you would want to transform your parse tree into an abstract syntax tree on which you probably intend to perform some calculations, in that case you would prefer to have a construction that says “dereference of a dereference of a dereference of an expression” rather than “three stars followed by an expression”. Again, the answer would have been less vague if you provided more details.
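A tiny sketch of that last point (illustrative names, assuming a token list as input; recursive descent here only to keep the example short, though the structural point applies to an LALR grammar just as well): folding the stars into the expression rule yields the nested "dereference of a dereference" tree directly:

```python
def parse_expr(toks, i):
    # expr -> '*' expr | IDENT   (one tree node per star)
    if i < len(toks) and toks[i] == '*':
        node, j = parse_expr(toks, i + 1)
        return ('deref', node), j
    return toks[i], i + 1

tree, _ = parse_expr(['*', '*', 'x'], 0)
# tree == ('deref', ('deref', 'x'))
```

With this shape, later passes (type checking, code generation) can recurse over the tree instead of re-counting a flat run of stars.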