Grammar for arithmetic expressions without alternatives? - parsing

How to write Unambiguous Grammar for arithmetic expressions e.g. a+(b+c)*d
E.g.
E -> E + T | T
T -> T * F | F
F -> ( E ) | i
WITHOUT alternatives - in my case without |T and |F and |i
This should be possible by adding more sentences to the grammar but I'm having hard time to figure out how...
NOTE: this is for University... so may be not a good real world Grammar :)

What you're asking for is impossible. If you do not have alternative productions in your grammar, then it is not possible for there to be any decisions about which productions to use. As a result, your grammar will either generate no strings, or will generate a single string. Grammars with these properties are called LL(0) grammars and are not at all practical.
Hope this helps!

Related

Is this grammar LR(2) and how can i determine it?

to determine if my parser is working correctly i need to find a lr(2+) grammar. After a quick research i have found this grammar and i believe that it is lr(2). However, i am not sure how to determine this.
Terminals: b, e, o, r, s
NonTerminals: A, B, E, Q, SL
Start: P
Productions:
P -> A
A -> E B SL E | b e
B -> b | o r
E -> e | Ɛ
SL -> s SL | s
I would be glad, if someone is able to confirm or deny that this grammar is lr(2) and at best give me a brief explanation on how to determine it by myself.
Thank you very much!
I'm pretty sure it's LR(2), but I don't have an LR(2) parser generator handy to test it, which would be the definitive way to do the test. Of course, you could generate the parser tables by hand. It's not that complicated a grammar, so it shouldn't take you too long.
It's certainly not LR(1), as can be seen from the pair of inputs:
b e
b s e
The left-most derivations are:
P->A->b e
P->E B SL E->B SL E->b SL E->b s E->b s e
So at the beginning of the parse, the parser can either shift a b in order to follow the first derivation chain or reduce an empty sequence to E in order to proceed with the second derivation chain. The second token is needed to choose between these two options, hence a lookahead of at least 2 is required.
As a side note, it should be pretty simple to mine StackOverflow for LR(2) grammars; they come up from time to time in questions. Here's a few I found by searching for LALR(2): (I used a Google search with site:stackoverflow.com because SO's own search engine doesn't do well with search patterns which aren't words. Not that Google does it well, but it does do it better.)
Solving bison conflict over 2nd lookahead
Solving small shift reduce conflict
Persistent Shift - Reduce Conflict in Goldparser
How to reduce parser stack or 'unshift' the current token depending on what follows?
I didn't verify the claims in those questions and answers, and there are other questions which didn't seem to have as clear a result.
The most classic LALR(2) grammar is the grammar for Yacc itself, which is pretty ironic. Here's a simplified version:
grammar: %empty | grammar production
production: ID ':' symbols
symbols: %empty | symbols symbol
symbol: ID | QUOTED_LITERAL
That simple grammar leaves out actions and the optional semicolon. But it captures the essence of the LALR(2)-ness of the grammar, which is precisely the result of the semicolon being optional. That's not a complaint; the grammar is unambiguous so the semicolon really is redundant and no-one should be forced to type a redundant token :-)

Needed example of LR(1) grammar that is not LR(0)?

Anyone would give me an processed example of LR(1) grammar which is not LR(0) Grammar? I was just trying to find out why LR(1) parser is more efficient and powerful, and tried an example of grammar and found it non LR(0) ,there was conflict in parsing table,then tried LR(1) also no use...
A very simple example of grammar ,(augmented)
S->A
A->aBed | aEef
B->m
E->m
Needed details analysis.
Anyone would explain with examples? Getting confused here.
For example:
S -> Aa | Bb
A->c
B->c
In order to decide if a c is an A or a B, you need to know the following symbol.
In real life, you most commonly need LR(1) for epsilon productions:
OPTIONAL_A -> ε | A
MULTI_A -> ε | MULTI_A A
... where ε matches only the empty string. In order to reduce an epsilon production, you always need to see past it.

Automatic grammar transformation for left-factoring; and left-recursion removal

Standard methods are readily available to transform a context-free grammar which is not LL(1) into an equivalent grammar which is. Are there any tools available which can automate this process?
In the examples below I use upper-case lettering for non-terminals, and lower-case for terminals.
The following left-recursive non-terminal:
A -> A a | b
can be transformed into a right-recursive form:
A -> b A'
A' -> NIL | a A'
Note though that left-recursive production rules ensure that expressions associate to the left, and similarly for right recursive productions; and so a grammar modification will also change expression associativity.
Another issue is indirect left-recursion, such as the following:
A -> B a
B -> A b
Left-factoring is also used to ensure that only one look-ahead token is required by the parser. The following production must look ahead by two tokens:
A -> a b | a c
This can also be refactored; to:
A -> a (b | c)
Are there any software tools which can automate these grammar transformations; and so produce an equivalent grammar suitable for a LL(1) parser?
The Haskell grammar-combinators library here allows a grammar to be transformed into a non-left-recursive form. The input grammar must though be a parsing expression grammar.

LR(0) or SLR(1) or LALR(1)

I am badly stuck on a question i am attempting from a sample final exam of compilers. I will really appreciate if someone can help me out with an explanation. Thanks
Consider the grammar G listed below
S = E $
E = E + T | T
T = T * F | F
F = ident | ( E )
Where + * ident ( ) are terminal symbols and $ is end of file.
a) is this grammar LR( 0 )? Justify your answer.
b) is the grammar SLR( 1 ) ? Justify your answer.
c) is this grammar LALR( 1 )? Justify your answer.
If you can show that the grammar is LR(0) then of course it is SLR(1) and LALR(1) because LR(0) is more restrictive.
Unfortunately, the grammar isn't LR(0).
For instance suppose you have just recognized E:
S -> E . $
You must not reduce this E to S if what follows is a + or * symbol, because E can be followed by + or * which continue to build a larger expression:
S -> E . $
E -> E . + T
T -> T . * F
This requires us to look ahead one token to know what to do in that state: to shift (+ or *) or reduce ($).
SLR(1) adds lookahead, and makes use of the follow-set information to make reductions (better than nothing, but the follow-set information globally obtained from the grammar is not context sensitive, like the state-specific lookahead sets in LALR(1)).
Under SLR(1), the above conflict goes away, because the S -> E reduction is considered only when the lookahead symbol is in the follow set of S, and the only thing in the follow set of S is the EOF symbol $. If the input symbol is not $, like +, then the reduction is not considered; a shift takes place which doesn't conflict with the reduction.
So the grammar does not fail to be SLR(1) on account of that conflict. It might, however, have some other conflict. Glancing through it, I can't see one; but to "justify that answer" properly, you have to generate all of the LR(0) state items, and go through the routine of verifying that the SLR(1) constraints are not violated. (You use the simple LR(0) items for SLR(1) because SLR(1) doesn't augment these items in any new way. Remember, it just uses the follow-set information cribbed from the grammar to eliminate conflicts.)
If it is SLR(1) then LALR(1) falls by subset relationship.
Update
The Red Dragon Book (Compilers: Principles, Techniques and Tools, Aho, Sethi, Ullman, 1988) uses exactly the same grammar in a set of examples that show the derivation of the canonical LR(0) item sets and the associated DFA, and some of the steps of filling in the parsing tables. This is in section 4.7, starting with example 4.34.

Example of an LR grammar that cannot be represented by LL?

All LL grammars are LR grammars, but not the other way around, but I still struggle to deal with the distinction. I'm curious about small examples, if any exist, of LR grammars which do not have an equivalent LL representation.
Well, as far as grammars are concerned, its easy -- any simple left-recursive grammar is LR (probably LR(1)) and not LL. So a list grammar like:
list ::= list ',' element | element
is LR(1) (assuming the production for element is) but not LL. Such grammars can be fairly easily converted into LL grammars by left-factoring and such, so this is not too interesting however.
Of more interest is LANGUAGES that are LR but not LL -- that is a language for which there exists an LR(1) grammar but no LL(k) grammar for any k. An example is things that need optional trailing matches. For example, the language of any number of a symbols followed by the same number or fewer b symbols, but not more bs -- { a^i b^j | i >= j }. There's a trivial LR(1) grammar:
S ::= a S | P
P ::= a P b | \epsilon
but no LL(k) grammar. The reason is that an LL grammar needs to decide whether to match an a+b pair or an odd a when looking at an a, while the LR grammar can defer that decision until after it sees the b or the end of the input.
This post on cs.stackechange.com has lots of references about this

Resources