Need an example of an LR(1) grammar that is not LR(0)? - parsing

Could anyone give me a worked example of an LR(1) grammar which is not an LR(0) grammar? I was trying to understand why an LR(1) parser is more powerful, so I tried an example grammar and found it was not LR(0) (there was a conflict in the parsing table); then I tried LR(1), also with no luck...
A very simple example grammar (augmented):
S -> A
A -> a B e d | a E e f
B -> m
E -> m
A detailed analysis is needed.
Could anyone explain with examples? I'm getting confused here.

For example:
S -> Aa | Bb
A -> c
B -> c
In order to decide if a c is an A or a B, you need to know the following symbol.
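To make that concrete, here is a minimal Python sketch (not a real parser generator; the grammar is hard-coded and only the interesting state is inspected) that builds the LR(0) state reached after shifting c and exposes the reduce/reduce conflict:

GRAMMAR = {
    "S'": [["S"]],                      # augmented start production
    "S":  [["A", "a"], ["B", "b"]],
    "A":  [["c"]],
    "B":  [["c"]],
}

def closure(items):
    # LR(0) closure; an item is (lhs, rhs_tuple, dot_position)
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:   # dot before a nonterminal
                for prod in GRAMMAR[rhs[dot]]:
                    item = (rhs[dot], tuple(prod), 0)
                    if item not in items:
                        items.add(item)
                        changed = True
    return frozenset(items)

def goto(items, symbol):
    # move the dot over `symbol` wherever possible, then take the closure
    moved = {(l, r, d + 1) for (l, r, d) in items if d < len(r) and r[d] == symbol}
    return closure(moved)

start = closure({("S'", ("S",), 0)})
after_c = goto(start, "c")
complete = [(l, r) for (l, r, d) in after_c if d == len(r)]
print(sorted(complete))   # [('A', ('c',)), ('B', ('c',))]

Two complete items share one state, and LR(0) must reduce with no lookahead: a reduce/reduce conflict. The corresponding LR(1) items carry lookaheads, [A -> c., a] and [B -> c., b], which are disjoint, so the LR(1) parser has no conflict here.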
In real life, you most commonly need LR(1) for epsilon productions:
OPTIONAL_A -> ε | A
MULTI_A -> ε | MULTI_A A
... where ε matches only the empty string. In order to reduce an epsilon production, you always need to see past it.
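As a tiny illustration, take the hypothetical fragment X -> b OPTIONAL_A c with A -> a (the names are made up for this example). Right after shifting b, the parser can only decide whether to reduce ε by looking at the next token:

def action_after_b(lookahead):
    # the parser must reduce ε -> OPTIONAL_A *before* shifting c,
    # so it has to see past the empty production
    if lookahead == "a":
        return "shift a"                  # the optional A is present
    if lookahead == "c":
        return "reduce ε -> OPTIONAL_A"   # the optional A is absent
    return "syntax error"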

Related

Is this grammar LR(2), and how can I determine that?

To determine whether my parser is working correctly, I need to find an LR(2+) grammar. After some quick research I found this grammar, and I believe it is LR(2). However, I am not sure how to determine this.
Terminals: b, e, o, r, s
NonTerminals: A, B, E, P, SL
Start: P
Productions:
P -> A
A -> E B SL E | b e
B -> b | o r
E -> e | ε
SL -> s SL | s
I would be glad if someone could confirm or deny that this grammar is LR(2), and ideally give me a brief explanation of how to determine it myself.
Thank you very much!
I'm pretty sure it's LR(2), but I don't have an LR(2) parser generator handy, which would be the definitive way to check. Of course, you could generate the parser tables by hand; it's not that complicated a grammar, so it shouldn't take you too long.
It's certainly not LR(1), as can be seen from the pair of inputs:
b e
b s e
The left-most derivations are:
P -> A -> b e
P -> A -> E B SL E -> B SL E -> b SL E -> b s E -> b s e
So at the beginning of the parse, the parser can either shift a b in order to follow the first derivation chain or reduce an empty sequence to E in order to proceed with the second derivation chain. The second token is needed to choose between these two options, hence a lookahead of at least 2 is required.
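A sketch of that initial decision as a two-token lookahead function (covering only inputs that start with b, which is all this pair of examples needs; a full LR(2) table would handle every two-token window):

def initial_action(tok1, tok2):
    # with one token of lookahead, both inputs present tok1 == 'b',
    # so an LR(1) table cannot separate the two cases
    if tok1 == "b" and tok2 == "e":
        return "shift b"         # heading for P -> A -> b e
    if tok1 == "b":
        return "reduce ε -> E"   # heading for P -> A -> E B SL E
    return "handled by other entries of the LR(2) table"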
As a side note, it should be pretty simple to mine StackOverflow for LR(2) grammars; they come up from time to time in questions. Here are a few I found by searching for LALR(2). (I used a Google search with site:stackoverflow.com, because SO's own search engine doesn't handle search patterns which aren't words. Not that Google handles them well, but it does do better.)
Solving bison conflict over 2nd lookahead
Solving small shift reduce conflict
Persistent Shift - Reduce Conflict in Goldparser
How to reduce parser stack or 'unshift' the current token depending on what follows?
I didn't verify the claims in those questions and answers, and there are other questions which didn't seem to have as clear a result.
The most classic LALR(2) grammar is the grammar for Yacc itself, which is pretty ironic. Here's a simplified version:
grammar: %empty | grammar production
production: ID ':' symbols
symbols: %empty | symbols symbol
symbol: ID | QUOTED_LITERAL
That simple grammar leaves out actions and the optional semicolon. But it captures the essence of the LALR(2)-ness of the grammar, which is precisely the result of the semicolon being optional. That's not a complaint; the grammar is unambiguous so the semicolon really is redundant and no-one should be forced to type a redundant token :-)
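The two tokens of lookahead are needed while the parser is inside symbols and sees another ID: only the token after that ID says whether it is one more symbol or the name of the next production. A hypothetical sketch of that choice:

def action_in_symbols(next1, next2):
    # sitting inside:  production: ID ':' symbols ...
    if next1 == "ID" and next2 == "':'":
        # this ID starts the next production, so the current
        # symbols list must be reduced now
        return "reduce symbols"
    if next1 in ("ID", "QUOTED_LITERAL"):
        return "shift"            # just one more symbol
    return "reduce symbols"       # e.g. end of input

With a mandatory semicolon, next1 alone would settle the question, and the grammar would be LALR(1).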

LALR(1) and SLR(1) parser

We got a hypothesis from our teacher, and he wants us to research and validate it. We have SLR(1) and LALR(1) parsers. The hypothesis is:
Suppose we have a language construct X. If we cannot provide an LALR(1) grammar for this construct, then we cannot provide an SLR(1) grammar either, and maybe an LR(1) grammar could solve the problem. But if we can provide an LALR(1) grammar for this construct, then we can provide an SLR(1) grammar too.
If you search the internet, you will find a lot of sites which say this grammar is not SLR(1) but is LALR(1):
S -> R
S -> L = R
L -> * R
L -> id
R -> L
("id", "*" and "=" are terminals and others are non-terminals)
If we try to build the SLR(1) items, we will see a shift/reduce conflict. That is true, but my hypothesis says something else. The hypothesis talks about the language described by the grammar, not the grammar itself! We can remove "R" and convert the grammar to LL(1), and it is then also SLR(1) and LALR(1):
S -> LM
M -> epsilon
M -> = L
L -> * L
L -> id
You can try this grammar and see that it describes the same language as the previous grammar, yet is both SLR(1) and LALR(1)!
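To see concretely where the SLR(1) conflict in the original grammar comes from, here is a small Python sketch that computes its FOLLOW sets (the grammar has no ε-productions, which keeps the computation short):

GRAMMAR = {
    "S": [["R"], ["L", "=", "R"]],
    "L": [["*", "R"], ["id"]],
    "R": [["L"]],
}

def follow_sets(grammar, start):
    # FIRST by fixed point (no ε-productions, so FIRST of a right-hand
    # side is just FIRST of its first symbol)
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                f = first[rhs[0]] if rhs[0] in grammar else {rhs[0]}
                if not f <= first[lhs]:
                    first[lhs] |= f
                    changed = True
    # FOLLOW by fixed point
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            for rhs in rhss:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue
                    if i + 1 < len(rhs):
                        nxt = rhs[i + 1]
                        new = first[nxt] if nxt in grammar else {nxt}
                    else:
                        new = follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

print(follow_sets(GRAMMAR, "S"))
# {'S': {'$'}, 'L': {'=', '$'}, 'R': {'=', '$'}}

'=' is in FOLLOW(R), so in the state containing [S -> L . = R] and [R -> L .] the SLR(1) parser sees both "shift =" and "reduce R -> L on =": a shift/reduce conflict. LALR(1) computes the lookahead for R -> L . in that particular state as just {$}, so it has no conflict there.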
So my problem is not finding a grammar which is LALR(1) but not SLR(1); there are plenty of those on the internet. I want to know: is there any language which has an LALR(1) grammar but no SLR(1) grammar? If our hypothesis is true, then there is no real need for LALR(1): SLR(1) could do everything for us (although LALR(1) is easier to use), unless some future language rejects the hypothesis.
I'm sorry for my bad English.
Thanks.
Every LR(k) language has an SLR(1) grammar.
There is a proof in this paper from 1976, which provides an algorithm for constructing the SLR(1) grammar, if you have an LR(k) grammar and know the value of k. Unfortunately, there is no algorithm which can tell you definitely whether a CFG is LR(k), much less provide the value of k. (If you somehow know that the grammar is LR(k), you can try successive values of k until you find one which works. But that procedure will never terminate if the grammar is not LR(k).)
The above comes from this reference question on the Computer Science Stack Exchange site, which is a better place for this kind of question.
LR(1) > LALR(1) > SLR(1)
LR(1) is the most powerful, LALR(1) is less powerful, and SLR(1) is the least powerful.
This is a fact, and it follows from the way the lookahead sets are computed. The (1) means a lookahead of one token. Here is a grammar that is LR(1), but not LALR(1) and definitely not SLR(1):
G : S... <eof>
;
S : c A1 t ';'
| c A2 n ';'
| r A2 t ';'
| r A1 n ';'
;
A1 : a
;
A2 : a
;
This grammar cannot be made LALR(1) or SLR(1). Sure, you can remove A1 and A2 and replace them with a, but then you have a different grammar. The problem is that one action may be attached to the rule A1 : a and a different action may be attached to A2 : a. For example:
A1 : a => X()
;
A2 : a => Y()
;
An SLR(1) parser generator will report conflicts in your grammar that are not real conflicts. I'm talking about real-world use with large grammars (e.g. C11.grm).
The SLR(1) lookahead computation is simplistic: it takes the lookaheads from FOLLOW sets computed on the grammar, instead of from the LR(0) state machine that an LALR(1) parser generator builds. That is why Frank DeRemer's 1969 paper on LALR(1) is so important.
Looking only at the grammar, A1 can be followed by either t or n, so SLR(1) reports a conflict. But in the LR(1) state machine there is no conflict, because each state keeps track of which token can actually follow A1 in that context.
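Here is a sketch of what goes wrong when the LALR(1) construction merges states for the grammar above (item sets written as strings for brevity):

# the LR(1) states reached after "c a" and after "r a" share the same
# LR(0) core {A1 -> a., A2 -> a.} but carry different lookaheads
after_ca = {("A1 -> a.", "t"), ("A2 -> a.", "n")}
after_ra = {("A1 -> a.", "n"), ("A2 -> a.", "t")}

# LALR(1) merges states with identical cores, unioning the lookaheads
merged = {}
for item, la in after_ca | after_ra:
    merged.setdefault(item, set()).add(la)
print(merged)   # {'A1 -> a.': {'t', 'n'}, 'A2 -> a.': {'t', 'n'}}

After the merge, both reductions apply on both t and n: a reduce/reduce conflict. Canonical LR(1) keeps the two states separate, so each reduction keeps its single lookahead and there is no conflict. SLR(1) is worse off still, since FOLLOW(A1) = FOLLOW(A2) = {t, n} straight from the grammar.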

Fixing a grammar to LR(0)

Question:
Given the following grammar, fix it to an LR(0) grammar:
S -> S' $
S'-> aS'b | T
T -> cT | c
Thoughts
I've been trying this for quite some time, using automatic tools to check my fixed grammars, with no success. Our professor likes asking this kind of question on tests without giving us a methodology for approaching it (other than repeated trying). Is there a method that can be applied to answer this kind of question? Can anyone show how such a method applies to this example?
I don't know of an automatic procedure, but the basic idea is to defer decisions. That is, if at a particular state in the parse, both shift and reduce actions are possible, find a way to defer the reduction.
In the LR(0) parser, you can make a decision based on the token you just shifted, but not on the token you (might be) about to shift. So you need to move decisions to the end of productions, in a manner of speaking.
For example, your language consists of all sentences { a^n c^m b^n $ | n ≥ 0, m > 0 }. If we restrict that to n > 0, then an LR(0) grammar can be constructed by deferring the reduction decision to the point following a b:
S -> S' $.
S' -> U | a S' b.
U -> a c T.
T -> b | c T.
That grammar is LR(0). In the original grammar, at the itemset including T -> c . and T -> c . T, both shift and reduce are possible: shift if the next token is c, reduce if it is b; but an LR(0) parser cannot look ahead to tell those cases apart. By moving the b into the production for T, we defer the decision until after the shift: after shifting b, a reduction is required; after c, the reduction is impossible.
But that forces every sentence to have at least one b. It omits sentences for which n = 0 (that is, the regular language c*$). That subset has an LR(0) grammar:
S -> S' $.
S' -> c | S' c.
We can construct the union of these two languages in a straight-forward manner, renaming one of the S's:
S -> S1' $ | S2' $.
S1' -> U | a S1' b.
U -> a c T.
T -> b | c T.
S2' -> c | S2' c.
This grammar is LR(0), but the form in which the end-of-input sentinel $ has been included seems to be cheating. At least, it violates the rule for augmented grammars, because an augmented grammar's base rule is always S -> S' $ where S' and $ are symbols not used in the original grammar.
It might seem that we could avoid that technicality by right-factoring:
S -> S' $
S' -> S1' | S2'
Unfortunately, while that grammar is still deterministic, and does recognise exactly the original language, it is not LR(0).
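As a sanity check on the combined grammar above (a brute-force enumeration up to a length bound, not a proof), one can compare its sentences, ignoring the $ sentinel, against { a^n c^m b^n | n ≥ 0, m > 0 }:

from collections import deque

GRAMMAR = {
    "S1": [["U"], ["a", "S1", "b"]],
    "U":  [["a", "c", "T"]],
    "T":  [["b"], ["c", "T"]],
    "S2": [["c"], ["S2", "c"]],
}

def sentences(start, max_len):
    # breadth-first expansion of sentential forms; pruning on length is
    # safe because the grammar has no ε-productions
    done, seen, queue = set(), set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        i = next((i for i, s in enumerate(form) if s in GRAMMAR), None)
        if i is None:
            done.add(form)
            continue
        for rhs in GRAMMAR[form[i]]:
            new = form[:i] + tuple(rhs) + form[i + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return done

language = sentences("S1", 7) | sentences("S2", 7)
expected = {tuple("a" * n + "c" * m + "b" * n)
            for n in range(4) for m in range(1, 8) if 2 * n + m <= 7}
print(language == expected)    # True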
(Many thanks to @templatetypedef for checking the original answer and identifying a flaw, and also to @Dennis, who observed that c* was omitted.)

Automatic grammar transformation for left-factoring and left-recursion removal

Standard methods are readily available to transform a context-free grammar which is not LL(1) into an equivalent grammar which is. Are there any tools available which can automate this process?
In the examples below I use upper-case lettering for non-terminals, and lower-case for terminals.
The following left-recursive non-terminal:
A -> A a | b
can be transformed into a right-recursive form:
A -> b A'
A' -> NIL | a A'
Note, though, that left-recursive production rules make expressions associate to the left, and right-recursive ones to the right; so this grammar modification also changes expression associativity.
Another issue is indirect left-recursion, such as the following:
A -> B a
B -> A b
Left-factoring is also used to ensure that the parser requires only one token of lookahead. The following productions require looking ahead by two tokens:
A -> a b | a c
This can be left-factored to:
A -> a (b | c)
Are there any software tools which can automate these grammar transformations; and so produce an equivalent grammar suitable for a LL(1) parser?
The Haskell grammar-combinators library here allows a grammar to be transformed into a non-left-recursive form. The input grammar must, however, be a parsing expression grammar.
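For the two immediate transformations themselves, here is a minimal Python sketch of the textbook steps (hypothetical helper code, not a library API; productions are lists of symbols, [] stands for NIL/ε, and the fresh nonterminal names are assumed not to clash with existing ones):

from collections import defaultdict

def remove_left_recursion(nt, productions):
    # A -> A a | b   becomes   A -> b A'  and  A' -> a A' | ε
    recursive = [p[1:] for p in productions if p and p[0] == nt]
    base      = [p     for p in productions if not p or p[0] != nt]
    if not recursive:
        return {nt: productions}
    fresh = nt + "'"
    return {nt:    [b + [fresh] for b in base],
            fresh: [r + [fresh] for r in recursive] + [[]]}

def left_factor_once(nt, productions):
    # A -> a b | a c   becomes   A -> a A_a  and  A_a -> b | c
    groups = defaultdict(list)
    for p in productions:
        groups[p[0] if p else None].append(p)
    rules = {nt: []}
    for head, ps in groups.items():
        if head is None or len(ps) == 1:
            rules[nt].extend(ps)
        else:
            fresh = nt + "_" + head
            rules[nt].append([head, fresh])
            rules[fresh] = [p[1:] for p in ps]
    return rules

print(remove_left_recursion("A", [["A", "a"], ["b"]]))
# {'A': [['b', "A'"]], "A'": [['a', "A'"], []]}
print(left_factor_once("A", [["a", "b"], ["a", "c"]]))
# {'A': [['a', 'A_a']], 'A_a': [['b'], ['c']]}

Indirect left recursion, as in the A/B example above, additionally requires substituting productions in a fixed nonterminal ordering before applying the immediate step (Paull's algorithm), and left-factoring may need to be repeated until no two alternatives share a prefix.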

Grammar for arithmetic expressions without alternatives?

How do I write an unambiguous grammar for arithmetic expressions, e.g. a+(b+c)*d?
E.g.
E -> E + T | T
T -> T * F | F
F -> ( E ) | i
WITHOUT alternatives - in my case without |T and |F and |i
This should be possible by adding more productions to the grammar, but I'm having a hard time figuring out how...
NOTE: this is for University... so may be not a good real world Grammar :)
What you're asking for is impossible. If you do not have alternative productions in your grammar, then it is not possible for there to be any decisions about which productions to use. As a result, your grammar will either generate no strings, or will generate a single string. Grammars with these properties are called LL(0) grammars and are not at all practical.
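To see this, take any grammar with exactly one production per nonterminal, for example:

E -> T + T
T -> i

Every derivation step is forced (E => T + T => i + T => i + i), so this grammar generates exactly one sentence, i + i, and nothing else; without alternatives there is never a choice point, so no grammar of this shape can cover more than one expression.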
Hope this helps!
