I am badly stuck on a question from a sample final exam for a compilers course. I would really appreciate it if someone could help me out with an explanation. Thanks!
Consider the grammar G listed below:
S = E $
E = E + T | T
T = T * F | F
F = ident | ( E )
Where +, *, ident, ( and ) are terminal symbols, and $ is the end-of-file marker.
a) Is this grammar LR(0)? Justify your answer.
b) Is this grammar SLR(1)? Justify your answer.
c) Is this grammar LALR(1)? Justify your answer.
If you can show that the grammar is LR(0), then of course it is also SLR(1) and LALR(1), because LR(0) is more restrictive.
Unfortunately, the grammar isn't LR(0).
For instance, suppose you have just recognized a T at the start of the input. The LR(0) state you are in then contains both a completed item and an item that can still shift:
E -> T .
T -> T . * F
An LR(0) parser must choose its action without looking at the next token. It must not reduce this T to E if what follows is a *, because the T can still grow into a larger term (as in ident * ident); but if what follows is +, ) or $, the reduction is exactly what is needed. The state reached after E + T, which contains E -> E + T . and T -> T . * F, has the same shift/reduce conflict.
This requires us to look ahead one token to know what to do in that state: shift (on *) or reduce (on +, ) or $).
SLR(1) adds lookahead, and makes use of the follow-set information to decide on reductions (better than nothing, but the follow-set information obtained globally from the grammar is not context-sensitive, unlike the state-specific lookahead sets of LALR(1)).
Under SLR(1), the above conflict goes away, because the E -> T reduction is considered only when the lookahead symbol is in the follow set of E, which is { +, ), $ }. If the input symbol is not in that set, like *, then the reduction is not considered; a shift takes place, which doesn't conflict with the reduction.
So the grammar does not fail to be SLR(1) on account of that conflict. It might, however, have some other conflict. Glancing through it, I can't see one; but to "justify that answer" properly, you have to generate all of the LR(0) item sets and go through the routine of verifying that the SLR(1) constraints are not violated. (You use the plain LR(0) items for SLR(1) because SLR(1) doesn't augment these items in any new way. Remember, it just uses the follow-set information cribbed from the grammar to eliminate conflicts.)
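That routine is mechanical enough to script. Below is a minimal sketch in Python (the language is an editorial choice; nothing in the question requires it) that builds the canonical LR(0) item sets for G and then checks each state twice: once with no lookahead, as LR(0) requires, and once letting a completed item A -> w . reduce only on FOLLOW(A), as SLR(1) does. It relies on G having no ε-productions and treats $ as an ordinary terminal; the function names are invented for this illustration.

```python
from itertools import chain

# Grammar G from the question. "$" is treated as an ordinary terminal.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("ident",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result, todo = set(items), list(items)
    while todo:
        prod, dot = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    todo.append((i, 0))
    return frozenset(result)


def lr0_states():
    """Canonical collection of LR(0) item sets."""
    start = closure({(0, 0)})
    states, todo = [start], [start]
    symbols = set(chain.from_iterable(rhs for _, rhs in GRAMMAR))
    while todo:
        state = todo.pop()
        for sym in symbols:
            moved = {(p, d + 1) for p, d in state
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
            if moved:
                nxt = closure(moved)
                if nxt not in states:
                    states.append(nxt)
                    todo.append(nxt)
    return states


def follow_sets():
    """FOLLOW sets; G has no epsilon productions, so FIRST(X Y ...) = FIRST(X)."""
    first = {a: set() for a in NONTERMINALS}
    follow = {a: set() for a in NONTERMINALS}

    def first_of(sym):
        return first[sym] if sym in NONTERMINALS else {sym}

    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            if not first_of(rhs[0]) <= first[lhs]:
                first[lhs] |= first_of(rhs[0])
                changed = True
            for i, sym in enumerate(rhs):
                if sym in NONTERMINALS:
                    extra = first_of(rhs[i + 1]) if i + 1 < len(rhs) else follow[lhs]
                    if not extra <= follow[sym]:
                        follow[sym] |= extra
                        changed = True
    return follow


def conflicted_states(states, follow=None):
    """States with a shift/reduce or reduce/reduce conflict.

    follow=None models LR(0): a completed item reduces whatever comes next.
    Otherwise a completed item A -> w . reduces only on FOLLOW(A) (SLR(1)).
    """
    bad = []
    for state in states:
        shifts = {GRAMMAR[p][1][d] for p, d in state
                  if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] not in NONTERMINALS}
        reduces = [p for p, d in state if d == len(GRAMMAR[p][1])]
        if follow is None:
            clash = len(reduces) > 1 or (bool(reduces) and bool(shifts))
        else:
            clash, claimed = False, set(shifts)
            for p in reduces:
                lookahead = follow[GRAMMAR[p][0]]
                clash = clash or bool(lookahead & claimed)
                claimed |= lookahead
        if clash:
            bad.append(state)
    return bad


if __name__ == "__main__":
    states = lr0_states()
    print(len(states), "LR(0) states")
    print(len(conflicted_states(states)), "states with LR(0) conflicts")
    print(len(conflicted_states(states, follow_sets())), "states with SLR(1) conflicts")
```

Run as a script, it should report 13 states, 2 of them with LR(0) conflicts (the states containing E -> T . and E -> E + T .) and none with SLR(1) conflicts, which is consistent with the conclusion above.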
If it is SLR(1), then it is LALR(1) as well, since every SLR(1) grammar is also LALR(1).
Update
The Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman, 1988) uses exactly the same grammar in a set of examples that show the derivation of the canonical LR(0) item sets and the associated DFA, and some of the steps of filling in the parsing tables. This is in section 4.7, starting with example 4.34.
Related
To determine whether my parser is working correctly, I need to find an LR(2+) grammar. After some quick research I found this grammar, and I believe that it is LR(2). However, I am not sure how to determine this.
Terminals: b, e, o, r, s
NonTerminals: A, B, E, Q, SL
Start: P
Productions:
P -> A
A -> E B SL E | b e
B -> b | o r
E -> e | Ɛ
SL -> s SL | s
I would be glad if someone could confirm or deny that this grammar is LR(2) and, ideally, give me a brief explanation of how to determine it myself.
Thank you very much!
I'm pretty sure it's LR(2), but I don't have an LR(2) parser generator handy to test it, which would be the definitive way to do the test. Of course, you could generate the parser tables by hand. It's not that complicated a grammar, so it shouldn't take you too long.
It's certainly not LR(1), as can be seen from the pair of inputs:
b e
b s e
The left-most derivations are:
P->A->b e
P->A->E B SL E->B SL E->b SL E->b s E->b s e
So at the beginning of the parse, the parser can either shift a b in order to follow the first derivation chain or reduce an empty sequence to E in order to proceed with the second derivation chain. The second token is needed to choose between these two options, hence a lookahead of at least 2 is required.
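The lookahead argument can be checked with FIRST_k sets. The Python sketch below uses invented function names and covers only this one decision in the initial state, so it is not a full LR(2) test. It computes the k-token strings that can come next for each option: reducing ε to E, after which the input must start with something derived from B SL E, versus shifting b for A -> b e. With k = 1 the two sets share b; with k = 2 they are disjoint. Because B SL E always derives at least two tokens, no end-of-input padding is needed here.

```python
# Grammar from the question; [] is the empty (epsilon) alternative.
GRAMMAR = {
    "A": [["E", "B", "SL", "E"], ["b", "e"]],
    "B": [["b"], ["o", "r"]],
    "E": [["e"], []],
    "SL": [["s", "SL"], ["s"]],
}


def first_k_of(seq, first, k):
    """All k-token prefixes of strings derivable from seq (shorter ones are complete)."""
    result = {()}
    for sym in seq:
        sym_first = first[sym] if sym in GRAMMAR else {(sym,)}
        result = {(x + y)[:k] for x in result for y in sym_first}
    return result


def first_k(k):
    """FIRST_k for every nonterminal, by fixed-point iteration."""
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                new = first_k_of(alt, first, k) - first[nt]
                if new:
                    first[nt] |= new
                    changed = True
    return first


# Initial state: reduce the empty string to E (the input then continues with
# B SL E) or shift b (for A -> b e). Compare the lookaheads of the two actions.
for k in (1, 2):
    first = first_k(k)
    reduce_la = first_k_of(["B", "SL", "E"], first, k)
    shift_la = first_k_of(["b", "e"], first, k)
    print(k, sorted(reduce_la), sorted(shift_la), reduce_la & shift_la)
```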
As a side note, it should be pretty simple to mine StackOverflow for LR(2) grammars; they come up from time to time in questions. Here are a few I found by searching for LALR(2): (I used a Google search with site:stackoverflow.com because SO's own search engine doesn't do well with search patterns which aren't words. Not that Google does it well, but it does do it better.)
Solving bison conflict over 2nd lookahead
Solving small shift reduce conflict
Persistent Shift - Reduce Conflict in Goldparser
How to reduce parser stack or 'unshift' the current token depending on what follows?
I didn't verify the claims in those questions and answers, and there are other questions which didn't seem to have as clear a result.
The most classic LALR(2) grammar is the grammar for Yacc itself, which is pretty ironic. Here's a simplified version:
grammar: %empty | grammar production
production: ID ':' symbols
symbols: %empty | symbols symbol
symbol: ID | QUOTED_LITERAL
That simple grammar leaves out actions and the optional semicolon. But it captures the essence of the LALR(2)-ness of the grammar, which is precisely the result of the semicolon being optional. That's not a complaint; the grammar is unambiguous so the semicolon really is redundant and no-one should be forced to type a redundant token :-)
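A common workaround, sketched here in Python with hypothetical token names, is to move the second token of lookahead into the lexer: when an ID is immediately followed by ':', emit one ID_COLON token instead of two. The rule then becomes production: ID_COLON symbols, and the parser needs only one token to tell whether the next identifier continues the current production or starts a new one. This illustrates the idea; it is not a description of how any particular Yacc implementation does it.

```python
import re

# Hypothetical token names matching the simplified grammar above.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z_]\w*)|(?P<QUOTED_LITERAL>'[^']*')|(?P<COLON>:))")


def tokenize(text):
    """Split grammar source into (kind, text) pairs, skipping whitespace."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def merge_id_colon(tokens):
    """Fold ID ':' into one ID_COLON token: the lexer peeks, so the parser needn't."""
    out, i = [], 0
    while i < len(tokens):
        kind, text = tokens[i]
        if kind == "ID" and i + 1 < len(tokens) and tokens[i + 1][0] == "COLON":
            out.append(("ID_COLON", text))
            i += 2
        else:
            out.append((kind, text))
            i += 1
    return out


print(merge_id_colon(tokenize("expr: expr '+' term\nterm: factor")))
```

For the sample input, the two rule heads come out as ID_COLON tokens and every other identifier stays an ID, even though no semicolon separates the two productions.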
Our teacher gave us a hypothesis and wants us to research and validate it. We have SLR(1) and LALR(1) parsers. The hypothesis is:
Suppose we have a language structure called X. If we cannot provide an LALR(1) grammar for this structure, we cannot provide an SLR(1) grammar either, and maybe an LR(1) grammar could solve the problem. But if we can provide an LALR(1) grammar for this structure, we can provide an SLR(1) grammar too.
If you search on the internet, you will find a lot of sites which say that this grammar is not SLR(1) but is LALR(1):
S -> R
S -> L = R
L -> * R
L -> id
R -> L
("id", "*" and "=" are terminals and others are non-terminals)
If we try to build the SLR(1) items, we will see a shift/reduce conflict. That is true, but my hypothesis says something else: it talks about the language described by the grammar, not the grammar itself! We can remove R and convert the grammar to LL(1), and the result is also SLR(1) and LALR(1):
S -> L M
M -> epsilon
M -> = L
L -> * L
L -> id
You can try this grammar and see that it describes the same language as the previous grammar and is both SLR(1) and LALR(1)!
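One way to see that the rewritten grammar really is LL(1) is to write its recursive-descent parser, in which every choice is made by peeking at a single token. A minimal Python sketch, assuming the input is already a list of token strings and using "$" as an assumed end-of-input marker:

```python
def accepts(tokens):
    """Recursive-descent recognizer for  S -> L M,  M -> epsilon | = L,  L -> * L | id."""
    tokens = list(tokens) + ["$"]   # "$" is an assumed end-of-input marker
    pos = 0

    def peek():
        return tokens[pos]

    def expect(tok):
        nonlocal pos
        if tokens[pos] != tok:
            raise SyntaxError(f"expected {tok!r}, got {tokens[pos]!r}")
        pos += 1

    def parse_l():
        if peek() == "*":       # L -> * L
            expect("*")
            parse_l()
        else:                   # L -> id
            expect("id")

    def parse_m():
        if peek() == "=":       # M -> = L
            expect("=")
            parse_l()
        # otherwise M -> epsilon; expect("$") below checks FOLLOW(M) = {$}

    try:
        parse_l()               # S -> L M
        parse_m()
        expect("$")
        return True
    except SyntaxError:
        return False


print(accepts(["*", "id", "=", "id"]), accepts(["id", "=", "id", "=", "id"]))
```

Chained assignments such as id = id = id are rejected, just as they are by the original grammar, where R can only be an L.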
So my problem is not finding a grammar which is LALR(1) but not SLR(1); there are a lot of them on the internet. I want to know: is there any language which has an LALR(1) grammar but no SLR(1) grammar? If our hypothesis is true, then there is no need for LALR(1), and SLR(1) could do everything for us; however, LALR(1) is easier to use, and maybe in the future some language will refute this hypothesis.
I'm sorry for my bad English.
Thanks.
Every LR(k) language has an SLR(1) grammar.
There is a proof in this paper from 1976, which provides an algorithm for constructing the SLR(1) grammar, if you have an LR(k) grammar and know the value of k. Unfortunately, there is no algorithm which can tell you definitely whether a CFG is LR(k), much less provide the value of k. (If you somehow know that the grammar is LR(k), you can try successive values of k until you find one which works. But that procedure will never terminate if the grammar is not LR(k).)
The above comes from this reference question on the Computer Science Stack Exchange site, which is a better place for this kind of question.
LR(1) > LALR(1) > SLR(1)
LR(1) is the most powerful, LALR(1) is less powerful, and SLR(1) is the least powerful.
This is a fact, because of the way the lookahead sets are computed. (1) means a lookahead of one token. Here is a grammar that is LR(1), but not LALR(1), and definitely not SLR(1):
G : S... <eof>
;
S : c A1 t ';'
| c A2 n ';'
| r A2 t ';'
| r A1 n ';'
;
A1 : a
;
A2 : a
;
This grammar cannot be made LALR(1) or SLR(1). Of course, you can remove A1 and A2 and replace them with a, but then you have a different grammar. The problem is that an action may be attached to the rule A1 : a and a different action may be attached to A2 : a. For example:
A1 : a => X()
;
A2 : a => Y()
;
An SLR(1) parser generator will report conflicts in your grammar that are not real conflicts. I'm talking about the real world using large grammars (e.g. C11.grm).
The SLR(1) lookahead computation is simplistic, getting the lookaheads from the grammar, instead of the LR(0) state machine created by an LALR(1) parser generator.
That is why Frank DeRemer's paper, in 1969, on LALR(1) is so important.
Looking at the grammar alone, A1 (and likewise A2) can be followed by either t or n, so SLR(1) reports a reduce/reduce conflict; but in the LR(1) state machine there is no conflict, because each state knows which token follows A1.
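The difference can be reproduced mechanically. This minimal Python sketch builds the canonical LR(1) item sets, then merges the states that share an LR(0) core, which is how LALR(1) is defined (practical generators compute the same lookaheads more efficiently). Assumptions: the S... <eof> rule is simplified to a single S, augmented with G -> S and an end marker "$", and only reduce/reduce conflicts are checked, since those are the ones at issue here.

```python
# Single-statement version of the grammar, augmented with G -> S; "$" = end.
GRAMMAR = [
    ("G", ("S",)),
    ("S", ("c", "A1", "t", ";")),
    ("S", ("c", "A2", "n", ";")),
    ("S", ("r", "A2", "t", ";")),
    ("S", ("r", "A1", "n", ";")),
    ("A1", ("a",)),
    ("A2", ("a",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """FIRST sets; no epsilon productions, so FIRST(X Y ...) = FIRST(X)."""
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not head <= first[lhs]:
                first[lhs] |= head
                changed = True
    return first


FIRST = first_sets()


def closure(items):
    """LR(1) closure; an item is (production index, dot position, lookahead)."""
    result, todo = set(items), list(items)
    while todo:
        prod, dot, la = todo.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            rest = rhs[dot + 1:]
            if not rest:
                lookaheads = {la}
            elif rest[0] in NONTERMINALS:
                lookaheads = FIRST[rest[0]]
            else:
                lookaheads = {rest[0]}
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot]:
                    for b in lookaheads:
                        if (i, 0, b) not in result:
                            result.add((i, 0, b))
                            todo.append((i, 0, b))
    return frozenset(result)


def lr1_states():
    """Canonical collection of LR(1) item sets."""
    start = closure({(0, 0, "$")})
    states, todo = {start}, [start]
    symbols = {s for _, rhs in GRAMMAR for s in rhs}
    while todo:
        state = todo.pop()
        for sym in symbols:
            moved = {(p, d + 1, la) for p, d, la in state
                     if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == sym}
            if moved:
                nxt = closure(moved)
                if nxt not in states:
                    states.add(nxt)
                    todo.append(nxt)
    return states


def lalr_merge(states):
    """LALR(1): union the LR(1) states that share the same LR(0) core."""
    merged = {}
    for state in states:
        core = frozenset((p, d) for p, d, _ in state)
        merged[core] = merged.get(core, frozenset()) | state
    return list(merged.values())


def reduce_reduce_conflicts(states):
    """States where two different completed items share a lookahead token."""
    bad = []
    for state in states:
        owner = {}
        for p, d, la in state:
            if d == len(GRAMMAR[p][1]) and owner.setdefault(la, p) != p:
                bad.append(state)
                break
    return bad


if __name__ == "__main__":
    lr1 = lr1_states()
    lalr = lalr_merge(lr1)
    print(len(reduce_reduce_conflicts(lr1)), "LR(1) states with conflicts")
    print(len(reduce_reduce_conflicts(lalr)), "LALR(1) states with conflicts")
```

It should report no conflicts for LR(1) and exactly one conflicted state for LALR(1): the merged state reached after c a and after r a, where both A1 : a and A2 : a now reduce on t and on n.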
Below is a Bison grammar which illustrates my problem. The actual grammar that I'm using is more complicated.
%glr-parser
%%
s : e | p '=' s;
p : fp | p ',' fp;
fp : 'x';
e : te | e ';' te;
te : fe | te ',' fe;
fe : 'x';
Some examples of input would be:
x
x = x
x,x = x,x
x,x = x;x
x,x,x = x,x;x,x
x = x,x = x;x
What I'm after is for the x's on the left side of an '=' to be parsed differently than those on the right. However, the set of legal "expressions" which may appear on the right of an '='-sign is larger than those on the left (because of the ';').
Bison prints the message (input file was test.y):
test.y: conflicts: 1 reduce/reduce.
There must be some way around this problem. In C, you have a similar situation. The program below passes through gcc with no errors.
int main(void) {
    int x;
    int *px = &x;   /* point px somewhere valid so that *px is well defined */
    x;
    *px;
    *px = x = 1;
}
In this case, the 'px' and 'x' get treated differently depending on whether they appear to the left or right of an '='-sign.
You're using %glr-parser, so there's no need to "fix" the reduce/reduce conflict. Bison just tells you there is one, so that you know your grammar might be ambiguous and you might need to add ambiguity resolution with %dprec or %merge directives. But in your case, the grammar is not ambiguous, so you don't need to do anything.
A conflict is NOT an error; it's just an indication that your grammar is not LALR(1).
The reduce-reduce conflict in your grammar comes from the context:
... = ... x ,
At this point, the parser has to decide whether x is an fe or an fp, and it cannot know with one symbol lookahead. Indeed, it cannot know with any finite lookahead, you could have any number of repetitions of x , following that point without encountering a =, ; or the end of the input, any of which would reveal the answer.
This is not quite the same as the C issue, which can be resolved with single symbol lookahead. However, the C example is a classic illustration of why SLR(1) grammars are less powerful than LALR(1) grammars -- it's used for that purpose in the dragon book -- and a similarly problematic grammar is an example of the difference between LALR(1) and LR(1); it can be found in the bison manual (here):
def: param_spec return_spec ',';
param_spec: type | name_list ':' type;
return_spec: type | name ':' type;
type: "id";
name: "id";
name_list: name | name ',' name_list;
(The bison manual explains how to resolve this issue for LALR(1) grammars, although using a GLR grammar is always a possibility.)
The key to resolving such conflicts without using a GLR grammar is to avoid forcing the parser to make premature decisions.
For example, it is traditional to distinguish syntactically between lvalues and rvalues, and some languages continue to do so. C and C++ do not, however; and this turns out to be an extremely powerful feature in C++ because it allows the definition of functions which can act as lvalues.
In C, I think it's just to simplify the grammar a bit: the C grammar allows the result of any unary operator to appear on the left-hand side of an assignment operator, but unary operators are actually a mix of lvalues (*v, v[expr]) and rvalues (sizeof v, f(expr)). The grammar could have distinguished between the two kinds of unary operators, but it could not express the actual restriction, which is that only modifiable lvalues may appear on the left side of an assignment operator.
C++ allows an arbitrary expression to appear on the left-hand side of an assignment operator (although some need to be parenthesized); consequently, the following is totally legal:
(predicate(x) ? *some_pointer : some_variable) = 42;
In your case, you could resolve the conflict syntactically by replacing te with p, since both non-terminals produce the same set of derivations. That's probably not the general solution, unless it is really the case in your full grammar that left-side expressions are a strict subset of right-side expressions. In a full grammar, you might end up with three types of expression (left-only, right-only, common), which could considerably complicate the grammar, and leaving the resolution for semantic analysis might prove to be easier (and even, as in the case of C++, surprisingly useful).
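Here is what leaving it to semantic analysis could look like for the toy language in the question. It is a hand-written Python sketch with hypothetical names, not a Bison grammar: every side of '=' is parsed with the single right-hand-side syntax, so no fp/fe decision is ever made early, and a separate pass then rejects ';' in the assignment targets.

```python
def tokenize(text):
    """Single-character tokens: 'x', ',', ';', '='; whitespace is ignored."""
    return [ch for ch in text if not ch.isspace()]


def check_expression(tokens):
    """e : te | e ';' te ;  te : fe | te ',' fe ;  fe : 'x'  (the right-side syntax)."""
    if len(tokens) % 2 == 0:
        raise SyntaxError("malformed expression")
    for i, tok in enumerate(tokens):
        allowed = ("x",) if i % 2 == 0 else (",", ";")
        if tok not in allowed:
            raise SyntaxError(f"unexpected {tok!r}")


def parse_statement(tokens):
    """Parse every side of '=' with the same syntax, then check the targets separately."""
    sides, current = [], []
    for tok in tokens:
        if tok == "=":
            sides.append(current)
            current = []
        else:
            current.append(tok)
    sides.append(current)
    for side in sides:
        check_expression(side)
    *targets, value = sides
    # The "semantic" pass: a target may use ',' but not ';'.
    for target in targets:
        if ";" in target:
            raise SyntaxError("';' is not allowed on the left of '='")
    return targets, value


print(parse_statement(tokenize("x,x,x = x,x;x,x")))
```

Because the left/right distinction is made only after the whole statement has been read, the unbounded lookahead problem never arises.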
How to write Unambiguous Grammar for arithmetic expressions e.g. a+(b+c)*d
E.g.
E -> E + T | T
T -> T * F | F
F -> ( E ) | i
WITHOUT alternatives - in my case without |T and |F and |i
This should be possible by adding more sentences to the grammar, but I'm having a hard time figuring out how...
NOTE: this is for University... so may be not a good real world Grammar :)
What you're asking for is impossible. If you do not have alternative productions in your grammar, then it is not possible for there to be any decisions about which productions to use. As a result, your grammar will either generate no strings, or will generate a single string. Grammars with these properties are called LL(0) grammars and are not at all practical.
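The argument can be made concrete with a small sketch (Python; the dict-of-lists grammar format and the max_steps cut-off are assumptions for illustration). With exactly one production per nonterminal, every step of a leftmost derivation is forced, so expanding the start symbol either ends in the one string the grammar generates or runs forever. Here a derivation that hits the cut-off is treated as non-terminating.

```python
def derive_unique(grammar, start, max_steps=1000):
    """Leftmost derivation in a grammar with exactly one production per nonterminal.

    grammar maps each nonterminal to its single right-hand side (a list of
    symbols); any symbol that is not a key is a terminal.
    """
    form = [start]
    for _ in range(max_steps):
        # Find the leftmost nonterminal; there is never a choice of production.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            return " ".join(form)        # the single string the grammar generates
        form[idx:idx + 1] = grammar[form[idx]]
    return None                          # cut-off reached: treated as non-terminating


print(derive_unique({"E": ["T", "+", "T"], "T": ["F", "*", "F"], "F": ["i"]}, "E"))  # i * i + i * i
print(derive_unique({"E": ["E", "+", "i"]}, "E"))                                  # None
```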
Hope this helps!
I am learning now about parsers on my Theory Of Compilation course.
I need to find an example of a grammar which is LL(1) but not LALR(1).
I know one should exist. Please help me think of the simplest example for this problem.
Some googling brings up this example for a non-LALR(1) grammar, which is LL(1):
S ::= '(' X
| E ']'
| F ')'
X ::= E ')'
| F ']'
E ::= A
F ::= A
A ::= ε
The LALR(1) construction fails, because there is a reduce-reduce conflict between E and F. In the set of LR(0) states, there is a state made up of
E ::= A . ;
F ::= A . ;
which is needed for both S and X contexts. The LALR(1) lookahead sets for these items thus mix up tokens originating from the S and X productions. This is different for LR(1), where there are different states for these cases.
With LL(1), decisions are made by looking at FIRST sets of the alternatives, where ')' and ']' always occur in different alternatives.
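The LL(1) half of the claim can be checked mechanically: compute the nullable nonterminals, FIRST and FOLLOW, then the predict set of each alternative, and confirm that no nonterminal has two alternatives whose predict sets overlap. A minimal Python sketch, with "$" as an assumed end-of-input marker:

```python
GRAMMAR = {
    "S": [["(", "X"], ["E", "]"], ["F", ")"]],
    "X": [["E", ")"], ["F", "]"]],
    "E": [["A"]],
    "F": [["A"]],
    "A": [[]],                      # A ::= epsilon
}
START, END = "S", "$"               # "$" is an assumed end-of-input marker


def first_of(seq, first, nullable):
    """FIRST of a symbol string, plus whether the whole string can derive epsilon."""
    result = set()
    for sym in seq:
        if sym not in GRAMMAR:
            result.add(sym)
            return result, False
        result |= first[sym]
        if sym not in nullable:
            return result, False
    return result, True


def analyse():
    """Nullable set, FIRST and FOLLOW, by fixed-point iteration."""
    first = {nt: set() for nt in GRAMMAR}
    follow = {nt: set() for nt in GRAMMAR}
    follow[START].add(END)
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, alts in GRAMMAR.items():
            for alt in alts:
                f, empty = first_of(alt, first, nullable)
                if not f <= first[nt]:
                    first[nt] |= f
                    changed = True
                if empty and nt not in nullable:
                    nullable.add(nt)
                    changed = True
                for i, sym in enumerate(alt):
                    if sym in GRAMMAR:
                        f_rest, rest_empty = first_of(alt[i + 1:], first, nullable)
                        new = (f_rest | follow[nt]) if rest_empty else f_rest
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return first, follow, nullable


def predict_sets():
    """Predict set of each alternative: FIRST, plus FOLLOW if it can vanish."""
    first, follow, nullable = analyse()
    table = {}
    for nt, alts in GRAMMAR.items():
        table[nt] = []
        for alt in alts:
            f, empty = first_of(alt, first, nullable)
            table[nt].append(f | follow[nt] if empty else f)
    return table


def is_ll1():
    for sets in predict_sets().values():
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                if sets[i] & sets[j]:
                    return False
    return True


print(predict_sets()["S"], predict_sets()["X"], is_ll1())
```

For S the predict sets come out as {(}, {]} and {)}, and for X as {)} and {]}, so every choice is made on one token even though E and F both derive the empty string.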
From the Dragon book (Second Edition, p. 242):
The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods. For a grammar to be LR(k), we must be able to recognize the occurrence of the right side of a production in a right-sentential form, with k input symbols of lookahead. This requirement is far less stringent than that for LL(k) grammars where we must be able to recognize the use of a production seeing only the first k symbols of what the right side derives. Thus, it should not be surprising that LR grammars can describe more languages than LL grammars.