Simple YACC grammar with problems - parsing

I want to implement a simple YACC parser, but I have problems with my grammar:
%union {
s string
b []byte
t *ASTNode
}
%token AND OR NOT
%token<s> STRING
%token<b> BYTES
%type<t> expr
%left AND
%left OR
%right NOT
%%
expr: '(' expr ')' {{ $$ = $2 }}
| expr AND expr {{ $$ = NewASTComplexNode(OPND_AND, $1, $3) }}
| expr AND NOT expr {{ $$ = NewASTComplexNode(OPND_AND, $1, NewASTComplexNode(OPND_NOT, $4, nil)) }}
| NOT expr AND expr {{ $$ = NewASTComplexNode(OPND_AND, NewASTComplexNode(OPND_NOT, $2, nil), $4) }}
| expr OR expr {{ $$ = NewASTComplexNode(OPND_OR, $1, $3) }}
| STRING {{ $$ = NewASTSimpleNode(OPRT_STRING, $1) }}
| BYTES {{ $$ = NewASTSimpleNode(OPRT_BYTES, $1) }}
;
Cam someone explain me why it gives me these errors?:
rule expr: NOT expr AND expr never reduced
1 rules never reduced
conflicts: 3 reduce/reduce

In a comment, it is clarified that the requirement is that:
The NOT operator should apply only to operands of AND and [the operands] shouldn't be both NOT.
The second part of that requirement is a little odd, since the AND operator is defined to be left-associative. That would mean that
a AND NOT b AND NOT c
would be legal, because it is grouped as (a AND NOT b) AND NOT c, in which both AND operators have one positive term. But rotating the arguments (which might not change the semantics at all) produces:
NOT b AND NOT c AND a
which would be illegal because the first grouping (NOT b AND NOT c) contains two NOT expressions.
It might be that the intention was that any conjunction (sequence of AND operators) contain at least one positive term.
Both of these constraints are possible, but they cannot be achieved using operator precedence declarations.
Operator precedence can be used to resolve ambiguity in an ambiguous grammar (and expr: expr OR expr is certainly ambiguous, since it allows OR to associate in either direction). But it cannot be used to import additional requirements on operands, particularly not a requirement which takes both operands into account [Note 1]. In order to do that, you need to write out an unambiguous grammar. Fortunately, that's not too difficult.
The basis for the unambiguous grammar is what is sometimes called cascading precedence style; this effectively encodes the precedence into rules, so that precedence declarations are unnecessary. The basic boolean grammar looks like this:
expr: conjunction /* Cascade to the next level */
| expr OR conjunction
conjunction
: term /* Continue the cascade */
| conjunction AND term
term: atom /* NOT is a prefix operator */
| NOT term /* which is not what you want */
atom: '(' expr ')'
| STRING
Each precedence level has a corresponding non-terminal, and the levels "cascade" in the sense that each one, except the last, includes the next level as a unit production.
To adapt this to the requirement that NOT be restricted to at most one operand of an AND operator, we can write out the possibilities more or less as you did in the original grammar, but respecting the cascading style:
expr: conjunction
| expr OR conjunction
conjunction
: atom /* NOT is integrated, so no need for 'term' */
| conjunction AND atom
| conjunction AND NOT atom /* Can extend with a negative */
| NOT atom AND atom /* Special case for negative at beginning */
atom: '(' expr ')'
| STRING
The third rule for conjunction (conjunction: conjunction AND NOT atom) allows any number of NOT applications to be added at the end of a list of operands, but does not allow consecutive NOT operands at the beginning of the list. A single NOT at the beginning is allowed by the fourth rule.
If you prefer the rule that a conjunction has to have at least one positive term, you can use the following very simple adaptation:
expr: conjunction
| expr OR conjunction
conjunction
: atom /* NOT is integrated, so no need for 'term' */
| conjunction AND atom
| conjunction AND NOT atom
| negative AND atom /* Possible initial list of negatives */
negative /* This is not part of the cascade */
: NOT atom
| negative AND NOT atom
atom: '(' expr ')'
| STRING
In this variant, negative will match, for example, NOT a AND NOT b AND NOT c. But because it's not in the cascade, that sequence doesn't automatically become a valid expression. In order for it to be used as an expression, it needs to be part of conjunction: negative AND atom, which requires that the sequence include a positive.
Notes
The %nonassoc precedence declaration can be used to reject chained use of operators from the same precedence level. It's a bit of a hack, and can sometimes have unexpected consequences. The expected use case is operators like < in languages like C which don't have special handling for chained comparison; using %nonassoc you can declare that chaining is illegal in the precedence level for comparison operators.
But %nonassoc only works within a single precedence level, and it only works if all the operators in the precedence level require the same treatment. If the intended grammar does not fully conform to those requirements, it will be necessary -- as with this grammar -- to abandon the use of precedence declarations and write out an unambiguous grammar.

Related

Remove ambiguity in grammar for expression casting

I'm working on a small translator in JISON, but I've run into a problem when trying to implement the cast of expressions, since it generates an ambiguity in the grammar when trying to add the production of cast. I need to add the productions to the cast option, so in principle I should have something like this:
expr: OPEN_PAREN type CLOSE_PAREN expr
However, since in my grammar I must be able to have expressions in parentheses, I already have the following production, so the grammar is now ambiguous:
expr: '(' expr ')'
Initially I had the following grammar for expressions:
expr : expr PLUS expr
| expr MINUS expr
| expr TIMESexpr
| expr DIV expr
| expr MOD expr
| expr POWER expr
| MINUS expr %prec UMINUS
| expr LESS_THAN expr
| expr GREATER_THAN expr
| expr LESS_OR_EQUAL expr
| expr GREATER_OR_EQUAL expr
| expr EQUALS expr
| expr DIFFERENT expr
| expr OR expr
| expr AND expr
| NOT expr
| OPEN_PAREN expr CLOSE_PAREN
| INT_LITERAL
| DOUBLE_LITERAL
| BOOLEAN_LITERAL
| CHAR_LITERAL
| STRING_LTIERAL
| ID;
Ambiguity was handled by applying the following precedence and associativity rules:
%left 'ASSIGNEMENT'
%left 'OR'
%left 'AND'
%left 'XOR'
%left 'EQUALS', 'DIFFERENT'
%left 'LESS_THAN ', 'GREATER_THAN ', 'LESS_OR_EQUAL ', 'GREATER_OR_EQUAL '
%left 'PLUS', 'MINUS'
%left 'TIMES', 'DIV', 'MOD'
%right 'POWER'
%right 'UMINUS', 'NOT'
I can't find a way to write a production that allows me to add the cast without falling into an ambiguity. Is there a way to modify this grammar without having to write an unambiguous grammar? Is there a way I can resolve this issue using JISON, which I may not have been able to see?
Any ideas are welcome.
This is what I was trying, however it's still ambiguous:
expr: OPEN_PAREN type CLOSE_PAREN expr
| OPEN_PAREN expr CLOSE_PAREN
The problem is that you don't specify the precedence of the cast operator, which is effectively a unary operator whose precedence should be the same as any other unary operator, such as NOT. (See below for a discussion of UMINUS.)
The parsing conflicts you received are not related to the fact that expr: '(' expr ')' is also a production. That would prevent LL(1) parsing, because the two productions start with the same sequence, but that's not an ambiguity. It doesn't affect bottom-up parsing in any way; the two productions are unambiguously recognisable.
Rather, the conflicts are the result of the parser not knowing whether (type)a+b means ((type)a+b or (type)(a+b), which is no different from the ambiguity of unary minus (should -a/b be parsed as (-a)/b or -(a/b)?), which is resolved by putting UMINUS at the end of the precedence list.
In the case of casts, you don't need to use a %prec declaration with a pseudo-token; that's only necessary for - because - could also be a binary operator, with a different (reduction) precedence. The precedence of the production:
expr: '(' type ')' expr
is ) (at least in yacc/bison), because that's the last terminal in the production. There's no need to give ) a shift precedence, because the grammar requires it to always be shifted.
Three notes:
Assignment is right-associative. a = b = 3 means a = (b = 3), not (a = b) = 3.
In the particular case of unary minus (and, by extension, unary plus if you feel like implementing it), there's a good argument for putting it ahead of exponentiation, so that -a**b is parsed as -(a**b). But that doesn't mean you should move other unary operators up from the end; (type)a**b should be parsed as ((type)a)**b. Nothing says that all unary operators have to have the same precedence.
When you add postfix operators -- notably function calls and array subscripts -- you will want to put them after the unary prefix operators. -a[3] most certainly does not mean (-a)[3]. These postfix operators are, in a way, duals of the prefix operators. As noted above, expr: '(' type ')' expr has precedence ')', which is only used as a reduction precedence. Conversely, expr: expr '(' expr-list ')' does not require a reduction precedence; the relevant token whose shift precedence needs to be declared is (.
So, according to all the above, your precedence declarations might be:
%right ASSIGNMENT
%left OR
%left AND
%left XOR
%left EQUALS DIFFERENT
%left LESS_THAN GREATER_THAN LESS_OR_EQUAL GREATER_OR_EQUAL
%left PLUS MINUS
%left TIMES DIV MOD
%right UMINUS
%right POWER
%right NOT CLOSE_PAREN
%right OPEN_PAREN OPEN_BRACKET
I listed all the unary operators using right associativity, which is somewhat arbitrary; either %left or %right would have the same effect, since it is impossible for a unary operator to compete with another instance of the same operator for the same operand; for unary operators, only the precedence level makes any difference. But it's customary to mark unary operators with %right.
Bison allows the use of %precedence to declare precedence levels for operators which have no associativity, but Jison doesn't have that feature. Both Bison and Jison do allow the use of %nonassoc, but that's very different: it says that it is a syntax error if either operand to the operator is an application of the same operator. That restriction is, for example, sometimes applied to comparison operators, in order to make a < b < c a syntax error.
Usually the way this problem is handled is by having type names as distinct keywords that can't be expressions by themselves. That way, after seeing an (, the next token being a type means it is a cast and the next token being an identifier means it is an expression, so there is no ambiguity.
However, your grammar appears to allow type names (INT, DOUBLE, etc) as expressions. This doesn't make a lot of sense, and causes your parsing problem, as differentiating between a cast and a parenthesized expression will require more lookahead.
The easiest fix would be to remove these productions (though you should still have something like expr : CONSTANT_LITERAL for literal constants)

Combining unary operators with different precedence

I was having some trouble with Bison creating an operator as such:
<- = identity postfix operator with a low precedence to force evaluation of what's on the left first, e.g. 1+2<-*3 (equivalent (1+2)*3) as well as -> which is a prefix operator which does the same thing but to the right.
I was not able to get the syntax to work properly and tested with Python using - not False, which resulted in a syntax error (in Python, - has a greater precedence than not). However, this is not a problem in C or C++, where - and !/not have the same precedence.
Of course, the difference in precedence has nothing to do with the relationship between the 2 operators, only a relationship with other operators that result in the relative precedences between them.
Why is chaining prefix or postfix operators with different precedences a problem when parsing and how can implement the <- and -> operators while still having higher-precedence operators like !, ++, NOT, etc.?
Obligatory Bison (this pattern is repeated for all operators, where copy has greater precedence than post_unary):
post_unary:
copy
| post_unary "++"
| post_unary "--"
| post_unary '!'
;
Chaining operators in this category, e.g. x ! -- ! works fine syntactically.
Ok, let me suggest a possible erroneous grammar based on your sketch:
low_postfix:
mid_infix
| low_postfix "<-"
mid_infix:
high_postfix
| mid_infix '+' high_postfix
high_postfix:
term
| high_postfix "++"
term:
ID
'(' expr ')'
It should be clear just looking at those productions that var <- ++ is not part of the language. The only things that can be used as an operand to ++ are terms and other applications of ++. var <- is neither of these things.
On the other hand, var ++ <- is fine, because the operand to <- can be a mid_infix which can be a high_postfix which is an application of the ++ operator.
If the intention were to allow both of those postfix sequences, then that grammar is incorrect.
A version of that cascade is present in the Python grammar (albeit using prefix operators) which is why not - False is OK, but - not False is a syntax error. I'm reluctant to call that a bug because it may have been intentional. (Really, neither of those expressions makes much sense.) We could disagree about the value of such an intention but not on SO, which prefers to avoid opinionated discussions.
Note that what we might call "strict precedence" in this grammar and the Python grammar is by no means restricted to combinations of unary operators. Here's another one which you have likely never tried:
$ python3 -c 'print(41 + not False)'
File "<string>", line 1
print(41 + not False)
^
SyntaxError: invalid syntax
So, how can we fix that?
On some level, it would be nice to be able to just write an unambiguous grammar which conveyed our intention. And it is certainly possible to write an unambiguous grammar, which would convey the intention to bison. But it's at least an open question as to whether it would convey anything to a human reader, because the massive clutter of multiple rules necessary in order to keep track of what is and is not an acceptable grouping would be pretty daunting.
On the other hand, it's dead simple to do with bison/yacc precedence declarations. We just list the operators in order, and the parser generator resolves all the ambiguities accordingly. [See Note 1 below]
Here's a similar grammar to the above, with precedence declarations. (I left the actions in place in case you want to play with it, although it's by no means a Reproducible Example; the infrastructure it relies upon is much bigger than the grammar itself, and of little use to anyone other than me. So you'll have to define the three functions and fill in some of the bison type declarations. Or just delete the AST functions and use your own.)
%left ','
%precedence "<-"
%precedence "->"
%left '+'
%left '*'
%precedence NEG
%right "++" '('
%%
expr: expr ',' expr { $$ = make_binop(OP_LIST, $1, $3); }
| "<-" expr { $$ = make_unop(OP_LARR, $2); }
| expr "->" { $$ = make_unop(OP_RARR, $1); }
| expr '+' expr { $$ = make_binop(OP_ADD, $1, $3); }
| expr '*' expr { $$ = make_binop(OP_MUL, $1, $3); }
| '-' expr %prec NEG { $$ = make_unop(OP_NEG, $2); }
| expr '(' expr ')' %prec '(' { $$ = make_binop(OP_CALL, $1, $3); }
| "++" expr { $$ = make_unop(OP_PREINC, $2); }
| expr "++" { $$ = make_unop(OP_POSTINC, $1); }
| VALUE { $$ = make_ident($1); }
| '(' expr ')' { $$ = $2; }
A couple of notes:
I used %prec NEG on the unary minus production in order to separate that production from the subtraction production. I also used a %prec declaration to modify the precedence of the call production (the default would be ')'), although in this particular case that's unnecessary. It is necessary to put '(' into the precedence list, though. ( is the lookahead symbol which is used in precedence comparisons.
For many unary operators, I used bison %precedence declaration in the precedence list, rather than %right or %left. Really, there is no such thing as associativity with unary operators, so I think that it's more self-documenting to use %precedence, which doesn't resolve conflicts involving reductions and shifts in the same precedence level. However, even though there is no such thing as associativity between unary operators, the nature of the precedence resolution algorithm is that you can put prefix operators and postfix operators in the same precedence level and choose whether the postfix or prefix operators have priority by using %right or %left, respectively. %right is almost always correct. I did that with ++, because I was a bit lazy by the time I got to that point.
This does "work" (I think). It certainly resolves all the conflicts; bison happily produces a parser without warnings. And the tests that I tried worked at least as I expected them to:
? a++->
=> [-> [++/post a]]
? a->++
=> [++/post [-> a]]
? 3*f(a)+2
=> [+ [* 3 [CALL f a]] 2]
? 3*f(a)->+2
=> [+ [-> [* 3 [CALL f a]]] 2]
? 2+<-f(a)*3
=> [+ 2 [<- [* [CALL f a] 3]]]
? 2+<-f(a)*3->
=> [+ 2 [<- [-> [* [CALL f a] 3]]]]
But there are some expressions where the operator precedence, while "correct", might not be easily explained to a novice user. For example, although the arrow operators look somewhat like parentheses, they don't group that way. Furthermore, the decision as to which of the two operators has higher precedence seems to me to be totally arbitrary (and indeed I might have done it differently from what you expected). Consider:
? <-2*f(a)->+3
=> [<- [+ [-> [* 2 [CALL f a]]] 3]]
? <-2+f(a)->*3
=> [<- [* [-> [+ 2 [CALL f a]]] 3]]
? 2+<-f(a)->*3
=> [+ 2 [<- [* [-> [CALL f a]] 3]]]
There's also something a bit odd about how the arrow operators override normal operator precedence, so that you can't just drop them into a formula without changing its meaning:
? 2+f(a)*3
=> [+ 2 [* [CALL f a] 3]]
? 2+f(a)->*3
=> [* [-> [+ 2 [CALL f a]]] 3]
If that's your intention, fine. It's your language.
Note that there are operator precedence problems which are not quite so easy to solve by just listing operators in precedence order. Sometimes it would be convenient for a binary operator to have different binding power on the left- and right-hand sides.
A classic (but perhaps controversial) case is the assignment operator, if it is an operator. Assignment must associate to the right (because parsing a = b = 0 as (a = b) = 0 would be ridiculous), and the usual expectation is that it greedily accepts as much to the right as possible. If assignment had consistent precedence, then it would also accept as much to the left as possible, which seems a bit strange, at least to me. If a = 2 + b = 7 is meaningful, my intuitions say that its meaning should be a = (2 + (b = 7)) [Note 2]. That would require differential precedence, which is a bit complicated but not unheard of. C solves this problem by restricting the left-hand side of the assignment operators to (syntactic) lvalues, which cannot be binary operator expressions. But in C++, it really does mean a = ((2 + b) = 7), which is semantically valid if 2 + b has been overloaded by a function which returns a reference.
Notes
Precedence declarations do not really add any power to the parser generator. The languages it can produce a parser for are exactly the same languages; it produces the same sort of parsing machine (a pushdown automaton); and it is at least theoretically possible to take that pushdown automaton and reverse engineer a grammar out of it. (In practice, the grammars produced by this process are usually monstrous. But they exist.)
All that the precedence declarations do is resolve parsing conflicts (typically in an ambiguous grammar) according to some user-supplied rules. So it's worth asking why it's so much simpler with precedence declarations than by writing an unambiguous grammar.
The simple hand-waving answer is that precedence rules only apply when there is a conflict. If the parser is in a state where only one action is possible, that's the action which remains, regardless of what the precedence rules might say. In a simple expression grammar, an infix operator followed by a prefix operator is not at all ambiguous: the prefix operator must be shifted, because there is no reduce action for a partial sequence ending with an infix operator.
But when we're writing a grammar, we have to specify explicitly what constructs are possible at each point in the grammar, which we usually do by defining a bunch of non-terminals, each corresponding to some parsing state. An unambiguous grammar for expressions already has split the expression non-terminal into a cascading series of non-terminals, one for each operator precedence value. But unary operators do not have the same binding power on both sides (since, as noted above, one side of the unary operator cannot take an operand). That means that a binary operator could well be able to accept a unary operator for one of its operands, and not be able to accept the same unary operator for its other operand. Which in turn means that we need to split all of our non-terminals again, corresponding to whether the non-terminal appears on the left or the right side of a binary operator.
That's a lot of work, and it's really easy to make a mistake. If you're lucky, the mistake will result in a parsing conflict; but equally it could result in the grammar not being able to recognise a particular construct which you would never think of trying, but which some irate language user feels is an absolute necessity. (Like 41 + not False)
It's possible that my intuitions have been permanently marked by learning APL at a very early age. In APL, all operators associate to the right, basically without any precedence differences.

ANTLR4 - Syntax error on '#' (alternative rule label)

I have made a grammar that will be used with ANTLR4 with the following definition for expressions:
// Expressions
Expr : Integer # Expr_Integer
| Float # Expr_Float
| Double # Expr_Double
| String # Expr_String
| Variable # Expr_Variable
| FuncCall # Expr_FuncCall
| Expr Op_Infix Expr # Expr_Infix
| Op_Prefix Expr # Expr_Prefix
| Expr Op_Postfix # Expr_Postfix
| Expr 'is' Id # Expr_Is
| 'this' # Expr_This
| Expr '?' Expr ':' Expr # Expr_Ternary
| '(' Expr ')' # Expr_Bracketed
;
I added the labels so that I could easily differentiate between the different expression types when analysing the generated syntax tree. However, ANTLR4 throws the following error for every single one of the above lines (excluding the one with the comment):
error(50): Ash.g4:88:19: syntax error: '#' came as a complete surprise to me while looking for lexer rule element
Line 88 is the final rule alternative ( '(' Expr ')' )
I have look through the documentation and various online examples and my syntax seems correct.
What could be causing the error to be thrown?
In Antlr, rules beginning with an uppercase letter are lexer rules, and those beginning with an lowercase letter are parser rules. Antlr uses these definitions a lot to define what you can and cannot do. Usually, the lexer is faster to proccess but less powerful than the parser.
In your case, Expr should definitely be a parser rule, as basically every other rule you have referenced there. Changing it to expr should match the expected behavior.
As a rule of thumb, lexer rules are to be used only when there is no context, it doesn't matter what is next to the generated token. Things like numeric constants, string constants, identifiers and such.

Why in some cases I can't use a token as precedence marker

Assume this code works:
left '*'
left '+'
expr: expr '+' expr
| expr '*' expr
;
I want to define an other precedence marker like:
left MULTIPLY
left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
Yet this is not actually effective.
I suppose these two forms should be equivalent, however, they're not.
It's not on practical problem. I just want to know the reason and the principle for this phenomenon.
Thanks.
Yacc precedence rules aren't really about the precedence of expressions, though they can be used for that. Instead, they are a way of resolving shift/reduce conflicts (and ONLY shift/reduce conflicts) explicitly.
Understanding how it works requires understanding how shift/reduce (bottom up) parsing works. The basic idea is that you read token symbols from the input and push ("shift") those tokens onto a stack. When the symbols on the top of the stack match the right hand side of some rule in the grammar, you may "reduce" the rule, popping the symbols from the stack and replacing them with a single symbol from the left side of the rule. You repeat this process, shifting tokens and reducing rules until you've read the entire input and reduced it to a single instance of the start symbol, at which point you've successfully parsed the entire input.
The essential problem with the above (and what the whole machinery of the parser generator is solving) is knowing when to reduce a rule vs when to shift a token if both are possible. The parser generator (yacc or bison) builds a state machine that tracks which symbols have been shifted and so knows what 'partially matched' rules are currently possible and limits shifts just to those tokens that can match more of such a rule. This does not work if the grammar in question is not LALR(1), and so in such cases yacc/bsion reports shift/reduce or reduce/reduce conflicts.
The way that precedence rules work to resolve shift reduce conflicts is by assigning a precedence to certain tokens and rules in the grammar. Whenever there is a shift/reduce conflict between a token to be shifted and a rule to be reduced, and BOTH have a precedence it will do whichever one has higher precedence. If they have the SAME precedence, then it looks at the %left/%right/%nonassoc flag associated with the precedence level -- %left means reduce, %right means shift, and %nonassoc means do neither and treat it as a syntax error.
The only tricky remaining bit is how tokens and rules get their precedence. Tokens get theirs from the %left/%right/%nonassoc directive they are in, which also sets the ordering. Rules get precedence from a %prec directive OR from the right-most terminal on their right-hand-side. So when you have:
%left '*'
%left '+'
expr: expr '+' expr
| expr '*' expr
;
You are setting the precedence of '*' and '+' with the %left directives, and the two rules get their precedence from those tokens.
When you have:
%left MULTIPLY
%left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
You are setting the precedence of the tokens MULTIPLY and PLUS and then explicitly setting the rules to have those precedences. However you are NOT SETTING ANY PRECEDENCE for the tokens '*' and '+'. So when there is a shift/reduce conflict between one of the two rules and either '*' or '+', precedence does not resolve it because the token has no precedence.
You say you are not trying to solve a specific, practical problem. And from your question, I'm a little confused about how you are trying to use the precedence marker.
I think you will find that you don't need to use the precedence marker often. It is usually simpler, and clearer to the reader, to rewrite your grammar so that precedence is explicitly accounted for. To give multiply and divide higher precedence than add and subtract, you can do something like this (example adapted from John Levine, lex & yacc 2/e, 1992):
%token NAME NUMBER
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' term
| expr '-' term
| term
;
term : term '*' factor
| term '/' factor
| factor
;
factor : '(' expr ')'
| '-' factor
| NUMBER
;
In your example, PLUS and MULTIPLY are not real tokens; you can't use them interchangeably with '+' and '*'. Levine calls them pseudo-tokens. They are there to link your productions back to your list of precedences that you have defined with %left and %nonassoc declarations. He gives this example of how you might use %prec to give unary minus high precedence even though the '-' token has low precedence:
%token NAME NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr %prec UMINUS
| '(' expr ')'
| NUMBER
;
To sum up, I would recommend following the pattern of my first code example rather than the second; make the grammar explicit.
Shift-reduce conflicts are a conflict between trying to reduce a production versus shifting a token and moving to the nest state. When Bison is resolving a conflict its not comparing two rules and choosing one of them - its comparing one rule that it wants to reduce and the token that you want to shift in the other rule(s). This might be clearer if you have two rules to shift:
expr: expr '+' expr
| expr '*' expr
| expr '*' '*' expr
The reason this is all confusing is that the way Bison gives a precedence to the "reduce" rule is to associate it with a token (the last terminal in the rule by default or the token from the prec declaration) and then it uses the precedence table to compares that token to the token you are trying to shift. Basically, prec declarations only make sense for the "reduce" part of a conflict and they are not counted for the shift part.
One way to see this is with the following grammar
command: IF '(' expr ')' command %prec NOELSE
: IF '(' expr ')' command ELSE command
In this grammar you need to choose between reducing the first rule or shifting the ELSE token. You do this by either giving precedences to the ')' token and to the ELSE token or by using a prec declaration and giving a precedence for NOELSE instead of ')'. If you try to give a prec declaration to the second it will get ignored and Bison will continue trying to look for the precedence of the ELSE token in the precedence table.

Creating a parser rule

I'm currently in a CSCI class, compiler at my college. I have to write a parser for the compiler and I've already done Adding subtracting multiplying dividing and the assignment statement. My question is we now have to do the less than equal (<=) and the greater than equal (>=) and I'm not sure how to write the rule for it...
I was thinking something like...
expr LESSTHAN expr { $1 <= $3 }
expr GREATERTHAN expr { $1 >= $3 }
any suggestions?
You should include a more precise question. Here are some general suggestions though.
The structure of the rule for relational operations should be the same as of the arithmetic operations. In both cases you have binary operators. The difference is that one returns a number, the other returns a boolean value. While 1 + 1 >= 3 usually is valid syntax, other combinations like 1 >= 2 => 5 is most likely invalid. Of course there are exceptions. Some languages allow it as syntactic sugar for multiple operations. Others simply define that boolean values are just integers (0 and 1). It's up to you (or your assignment) what you want the syntax to look like.
Anyway, you probably don't simply want append those rules to expr, but create a new rule. This way you distinguish between relational and arithmetical expressions.
expr :
expr PLUS expr |
expr MINUS expr |
... ;
relational_expr :
expr LESSTHAN expr |
expr GREATERTHAN expr ;
assignment :
identifier '=' relational_expr |
identifier '=' expr ;

Resources