How can I tell Bison I also expect reduce-reduce conflicts? - parsing

My C#-ish toy grammar now has its first reduce-reduce conflicts! I'm so proud of me.
It seems all right to me, however (I switched off to a GLR parser for the occasion). The problem is, while I know the %expect directive can shut up Bison about shift/reduce conflicts, I can't find the equivalent for reduce/reduce conflicts. So what should I use to make it silent about my 3 shift/reduces and my 2 reduce/reduces?

From the GNU Bison documentation, found here
For normal LALR(1) parsers,
reduce/reduce conflicts are more
serious, and should be eliminated
entirely. Bison will always report
reduce/reduce conflicts for these
parsers. With GLR parsers, however,
both kinds of conflicts are routine;
otherwise, there would be no need to
use GLR parsing. Therefore, it is also
possible to specify an expected number
of reduce/reduce conflicts in GLR
parsers, using the declaration:
%expect-rr n

Related

Epsilon(ε) productions and LR(0) grammars and LL(1) grammars

At many places (for example in this answer here), I have seen it is written that an LR(0) grammar cannot contain ε productions.
Also in Wikipedia I have seen statements like: An ε free LL(1) grammar is also SLR(1).
Now the problem which I am facing is that I cannot reason out the logic behind these statements.
Well, I know that LR(0) grammars accept the languages accepted by a DPDA by empty stack, i.e. the language they accept must have prefix property. [This prefix property can, however, be dealt with if we assume end markers and as such given any language the prefix property shall always be satisfied. Many texts like Theory of Computation by Sipser assume this end marker to simply their argument]. That being said, we can say (informally?) that a grammar is LR(0) if there is no state in the canonical collection of LR(0) items that have a shift-reduce conflict or reduce-reduce conflict.
With this background, I tried to consider the following grammar:
S -> Aa
A -> ε
canonical collection of LR(0) items
In the above DFA, I find that there is no state which has a shift-reduce conflict or reduce-reduce conflict.
So this grammar should be LR(0) as per my analysis. But it also has ε production.
Isn't this example contradicting the statement:
"no grammar with ε productions can be LR(0)"
I guess if I know the logic behind the above quoted statement then I can understand the concept better.
Actually my main problem arose with the statement :
An ε free LL(1) grammar is also SLR(1).
When I asked one of my friends, he gave the argument that as the LL(1) grammar is ε free hence it is LR(0) and hence it is SLR(1).
But I could not understand his logic either. When I asked him about reasoning, he started sharing post regarding "grammar with ε productions can never be LR(0)"...
But personally I could not think of any logic as to how "ε free LL(1) grammar is SLR(1)". Is it really related to the above property of "grammar with ε productions cannot be LR(0)"? If so, please do help me out.. If not, then should I consider asking a separate question for the second confusion?
I have got my concepts of compiler design from the dragon book by Ullman only. Also the knowledge of TOC from Ullman and from few other texts like Sipser, Linz.
A notable feature of your grammar is that A could just be eliminated. It serves absolutely no purpose. (By "eliminated", I mean simply removing all references to it; leaving productions otherwise intact.)
It is true that it's existence doesn't preclude the grammar from being LR(0). Similarly, a grammar with an unreachable non-terminal and an ε-production for that non-terminal could also be LR(0).
So it would be more accurate to say that a grammar cannot be LR(0) if it has a productive non-terminal with both an ε-production and some other productive production. But since we usually only consider reduced grammars without pointless non-terminals, I'm not sure that this additional pedantry serves much purpose.
As for your question about ε-free LL(1) grammars, here's a rough outline:
If an ε-free grammar is not LR(0), then there is some state with both a shift and a reduce action. Since the grammar is ε-free, that state was reached by way of a shift or a goto. The previous state must then have had two different productions with the same FIRST set, contradicting the LL(1) condition.

Parsing in compiler design

As far as I know, Left recursion is not a problem for LR parser.And I know that an ambiguous grammar can't be parsed by any kind of parser.So if I have an ambiguous grammar as follows,how can I remove ambiguity so that I can check if this grammar is SLR(1) or not?
E->E+E|E-E|(E)|id
And one more question,is left factoring needed for a grammar to check if the grammar is LL(1) or SLR(1)?
Any help will be appreciated.
Any parser generator you are likely to encounter will be able to handle the ambiguities in your grammar simply.
Your grammar produces shift/reduce conflicts. These are not necessarily a problem (as are reduce/reduce conflicts). The default action on a shift/reduce conflict in every parser generator is to shift, which solves your problem. There are usually mechanisms (as in YACC or Bison) to ignore this as a warning.
You can remove the conflicts in your grammar by setting up multiple levels of expressions so that you force the precedence of the operators.

Testing grammar for ambiguities

I'm writing a grammar for a formal language. Ideally I'd want that grammar to be unambiguous, but that might not be possible. In either case, I want to know about all possible ambiguities while developing the grammar. How can I do that?
So far, most of the time when I developed a language, I'd turn to Bison, write a LR(1) grammar for it, run Bison in verbose mode and have a look at all the shift-reduce and reduce-reduce conflicts it tells me about. Make sure that I agree with its choice in each case.
But now I'm in a project where Bison doesn't have a code generator for one of the required target languages, and where ANTLR is already being used. Plus the language isn't LR(1), and rewriting it as LR(1) would entail additional syntax checks after the parser is done, thus reducing the expressiveness of the grammar as a tool to describe the language.
So I'm now working with ANTLR, fed it my grammar, and all seems to be working well. But ANTLR doesn't seem to check for ambiguities at compile time. For example, the following grammar is ambiguous:
grammar test;
lst: '(' ')' {System.out.println("a");}
| '(' elts ')' {System.out.println("b");} ;
elts: elt (',' elt)* ;
elt: 'x' | /* empty */ ;
The input () could be interpreted as the empty list, or it could be interpreted as a list consisting of a single empty element. The generated parser chooses the former interpretation, but I'd like to be able to manually verify that choice.
Is there command line switch I can use to get ANTLR to tell me about ambiguities?
Or perhaps an option I can set in the grammar file?
Or should I use some other tool to check the grammar for ambiguities?
If so, is there one which can already read ANTLR syntax, or do I have to strip out all the actions and boil this down to BNF myself?
The ANTLRErrorListener.reportAmbiguity method suggests that ANTLR might be able to perform some ambiguity testing at runtime. But I guess that's only going to tell you whether the parsing of a given input is ambiguous. Is there some strategy how I could leverage this to detect all ambiguities, using a carefully selected set of inputs?
Well, as far as I know, ANTLR has no real option to check for ambiguity, other than the errors it produced IF you write an ambiguous grammar and feed an input that triggers the ambiguity. I do, however know a few tools which can check for ambiguity. They all have different syntax, and I don't know any tool which uses ANTLR grammar.
A software called AtoCC has a tool called KfG which can check for ambiguity.
ACLA (Ambiguity Checking with Language Approximations).
Context Free Grammar Tool.
Personally, I find tool 3 easiest to use, but is the most limited as well. It is important, however to note that none of the tools can be 100% sure; if the tools says you're grammar is ambiguous, it is ambiguous, however if they say you're grammar is unambiguous, they might still be ambiguous, as they have no way of testing an infinite number of ways, that your language can be written.
Hope this helps.

LR(1) grammar: how to tell? examples for/against?

I'm currently having a look at GNU Bison to parse program code (or actually to extend a program that uses Bison for doing that). I understand that Bison can only (or: best) handle LR(1) grammars, i.e. a special form of context-free grammars; and I actually also (believe to) understand the rules of context-free and LR(1) grammars.
However, somehow I'm lacking a good understanding of the notion of a LR(1) grammar. Assume SQL, for instance. SQL incorporates - I believe - a context-free grammar. But is it also a LR(1) grammar? How could I tell? And if yes, what would violate the LR(1) rules?
LR(1) means that you can choose proper rule to reduce by knowing all tokens that will be reduced plus one token after them. There are no problems with AND in boolean queries and in BETWEEN operation. The following grammar, for example is LL(1), and thus is LR(1) too:
expr ::= and_expr | between_expr | variable
and_expr ::= expr "and" expr
between_expr ::= "between" expr "and" expr
variable ::= x
I believe that the whole SQL grammar is even simpler than LR(1). Probably LR(0) or even LL(n).
Some of my customers created SQL and DB2 parsers using my LALR(1) parser generator and used them successfully for many years. The grammars they sent me are LALR(1) (except for the shift-reduce conflicts which are resolved the way you would want). For the purists -- not LALR(1), but work fine in practice, no GLR or LR(1) needed. You don't even need the more powerful LR(1), AFAIK.
I think the best way to figure this out is to find an SQL grammar and a good LALR/LR(1) parser generator and see if you get a conflict report. As I remember an SQL grammar (a little out of date) that is LALR(1), is available in this download: http://lrstar.tech/downloads.html
LRSTAR is an LR(1) parser generator that will give you a conflict report. It's also LR(*) if you cannot resolve the conflicts.

LL(1) table-driven (non-recursive) generator

Please, need help. I'm searching for a LL(1) table-driven (non-recursive) generator. Can't find anything on Internet. All I found is a bunch of LR or recursive parsing generators :( Thanks in advance.
I did some searching since LL(1) table-driven compilers with ANTLR or ANTLR3 and found several pages in one of my old compiler books. "The Theory and Practice of Compiler Writing" by Tremblay and Sorenson. 1985
It predates the dragon books.
Section 6-2 is 38 pages.
6-2 Top-Down Parsing with No Backup
6-2.1 Notions of Parsing with No Backup
6-2.2 Simple LL(1) Grammars
6-2.3 LL(1) Grammars without e-Rules
6-2.4 LL(1) Grammars with e-Rules
6-2.5 Error Handling for LL(1) Parsers
EDIT
Found this: LL(1) Parser Applet
EDIT
You might be able to find a copy of "The Theory and Practice of Compiler Writing" in a near by library using WorldCat

Resources