Comparison operator precedence in Bison

I'm trying to set precedence rules for comparison operators: ==, <=, !=, etc.
I already have this precedence list:
%nonassoc "=="
%left '+' '-'
%left '*' '/'
%right '^'
%left UNARY
The first line, with "==", doesn't work. I guess it's because "==" isn't a character but a string, but I can't figure out how to do it otherwise.
It's supposed to be nonassoc, so that 1==2==3 will fail. Thanks.

As you write, Bison doesn't understand "==". You can use single-character tokens such as '+' directly, but for multi-character tokens you need to define them using Bison's %token directive. Then you must let the scanner return that token code.
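To make "let the scanner return that token code" concrete, here is a minimal sketch of a scanner in Python rather than flex. The token names EQ, LE, GE and NE are hypothetical; in a real setup they would be the names declared with %token and used in the precedence list, and the scanner would return their numeric codes.

```python
import re

# Hypothetical token names; in a Bison grammar these would be declared
# with %token and then listed in the precedence declarations.
MULTI_CHAR = {"==": "EQ", "<=": "LE", ">=": "GE", "!=": "NE"}

# Optional whitespace, then a number, a multi-character operator,
# or any other single character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(==|<=|>=|!=)|(.))")

def tokenize(text):
    """Return (kind, lexeme) pairs for the input string."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        number, multi, single = m.groups()
        if number is not None:
            tokens.append(("NUM", number))
        elif multi is not None:
            # Multi-character operators get a named token kind.
            tokens.append((MULTI_CHAR[multi], multi))
        else:
            # Single characters stand for themselves, like '+' in Bison.
            tokens.append((single, single))
        pos = m.end()
    return tokens
```

With this sketch, tokenize("1==2") yields a NUM, an EQ and another NUM, so the grammar only ever sees the named token.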

Related

Remove ambiguity in grammar for expression casting

I'm working on a small translator in JISON, but I've run into a problem when trying to implement the cast of expressions, since it generates an ambiguity in the grammar when trying to add the production of cast. I need to add the productions to the cast option, so in principle I should have something like this:
expr: OPEN_PAREN type CLOSE_PAREN expr
However, since in my grammar I must be able to have expressions in parentheses, I already have the following production, so the grammar is now ambiguous:
expr: '(' expr ')'
Initially I had the following grammar for expressions:
expr : expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIV expr
| expr MOD expr
| expr POWER expr
| MINUS expr %prec UMINUS
| expr LESS_THAN expr
| expr GREATER_THAN expr
| expr LESS_OR_EQUAL expr
| expr GREATER_OR_EQUAL expr
| expr EQUALS expr
| expr DIFFERENT expr
| expr OR expr
| expr AND expr
| NOT expr
| OPEN_PAREN expr CLOSE_PAREN
| INT_LITERAL
| DOUBLE_LITERAL
| BOOLEAN_LITERAL
| CHAR_LITERAL
| STRING_LITERAL
| ID;
Ambiguity was handled by applying the following precedence and associativity rules:
%left 'ASSIGNEMENT'
%left 'OR'
%left 'AND'
%left 'XOR'
%left 'EQUALS', 'DIFFERENT'
%left 'LESS_THAN ', 'GREATER_THAN ', 'LESS_OR_EQUAL ', 'GREATER_OR_EQUAL '
%left 'PLUS', 'MINUS'
%left 'TIMES', 'DIV', 'MOD'
%right 'POWER'
%right 'UMINUS', 'NOT'
I can't find a way to write a production that allows me to add the cast without falling into an ambiguity. Is there a way to modify this grammar without having to write an unambiguous grammar? Is there a way I can resolve this issue using JISON, which I may not have been able to see?
Any ideas are welcome.
This is what I was trying, however it's still ambiguous:
expr: OPEN_PAREN type CLOSE_PAREN expr
| OPEN_PAREN expr CLOSE_PAREN
The problem is that you don't specify the precedence of the cast operator, which is effectively a unary operator whose precedence should be the same as any other unary operator, such as NOT. (See below for a discussion of UMINUS.)
The parsing conflicts you received are not related to the fact that expr: '(' expr ')' is also a production. That would prevent LL(1) parsing, because the two productions start with the same sequence, but that's not an ambiguity. It doesn't affect bottom-up parsing in any way; the two productions are unambiguously recognisable.
Rather, the conflicts are the result of the parser not knowing whether (type)a+b means ((type)a)+b or (type)(a+b), which is no different from the ambiguity of unary minus (should -a/b be parsed as (-a)/b or -(a/b)?), which is resolved by putting UMINUS at the end of the precedence list.
In the case of casts, you don't need to use a %prec declaration with a pseudo-token; that's only necessary for - because - could also be a binary operator, with a different (reduction) precedence. The precedence of the production:
expr: '(' type ')' expr
is ) (at least in yacc/bison), because that's the last terminal in the production. There's no need to give ) a shift precedence, because the grammar requires it to always be shifted.
Three notes:
Assignment is right-associative. a = b = 3 means a = (b = 3), not (a = b) = 3.
In the particular case of unary minus (and, by extension, unary plus if you feel like implementing it), there's a good argument for putting it ahead of exponentiation, so that -a**b is parsed as -(a**b). But that doesn't mean you should move other unary operators up from the end; (type)a**b should be parsed as ((type)a)**b. Nothing says that all unary operators have to have the same precedence.
When you add postfix operators -- notably function calls and array subscripts -- you will want to put them after the unary prefix operators. -a[3] most certainly does not mean (-a)[3]. These postfix operators are, in a way, duals of the prefix operators. As noted above, expr: '(' type ')' expr has precedence ')', which is only used as a reduction precedence. Conversely, expr: expr '(' expr-list ')' does not require a reduction precedence; the relevant token whose shift precedence needs to be declared is (.
So, according to all the above, your precedence declarations might be:
%right ASSIGNMENT
%left OR
%left AND
%left XOR
%left EQUALS DIFFERENT
%left LESS_THAN GREATER_THAN LESS_OR_EQUAL GREATER_OR_EQUAL
%left PLUS MINUS
%left TIMES DIV MOD
%right UMINUS
%right POWER
%right NOT CLOSE_PAREN
%right OPEN_PAREN OPEN_BRACKET
I listed all the unary operators using right associativity, which is somewhat arbitrary; either %left or %right would have the same effect, since it is impossible for a unary operator to compete with another instance of the same operator for the same operand; for unary operators, only the precedence level makes any difference. But it's customary to mark unary operators with %right.
Bison allows the use of %precedence to declare precedence levels for operators which have no associativity, but Jison doesn't have that feature. Both Bison and Jison do allow the use of %nonassoc, but that's very different: it says that it is a syntax error if either operand to the operator is an application of the same operator. That restriction is, for example, sometimes applied to comparison operators, in order to make a < b < c a syntax error.
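The difference between left, right and non-associative operators can be sketched outside of any parser generator. The following is a small precedence-climbing parser in Python, not what Bison or Jison generate internally; the operator table OPS and its levels are assumptions chosen for illustration.

```python
import re

# Hypothetical operator table: symbol -> (precedence level, associativity).
OPS = {
    "==": (1, "nonassoc"),
    "<": (1, "nonassoc"),
    "+": (2, "left"),
    "-": (2, "left"),
    "*": (3, "left"),
    "/": (3, "left"),
    "^": (4, "right"),
}

def tokenize(text):
    # Operands are identifiers or numbers; anything else is silently skipped.
    return re.findall(r"\w+|==|[-+*/^<]", text)

def parse(tokens, min_level=1):
    """Parse tokens (consumed from the front) into a nested tuple tree."""
    left = tokens.pop(0)  # operands are plain names or numbers in this sketch
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_level:
        op = tokens.pop(0)
        level, assoc = OPS[op]
        # A right-associative operator lets its right operand reuse the level.
        next_min = level if assoc == "right" else level + 1
        right = parse(tokens, next_min)
        left = (op, left, right)
        # A non-associative operator may not be chained with its own level.
        if (assoc == "nonassoc" and tokens and tokens[0] in OPS
                and OPS[tokens[0]][0] == level):
            raise SyntaxError(f"operator {tokens[0]!r} is non-associative")
    return left
```

Here parse(tokenize("a < b")) succeeds, while parse(tokenize("a < b < c")) raises SyntaxError, which mirrors the effect described for %nonassoc.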
Usually the way this problem is handled is by having type names as distinct keywords that can't be expressions by themselves. That way, after seeing an (, the next token being a type means it is a cast and the next token being an identifier means it is an expression, so there is no ambiguity.
However, your grammar appears to allow type names (INT, DOUBLE, etc) as expressions. This doesn't make a lot of sense, and causes your parsing problem, as differentiating between a cast and a parenthesized expression will require more lookahead.
The easiest fix would be to remove these productions (though you should still have something like expr : CONSTANT_LITERAL for literal constants)

How does the declaration nonassoc work in fsyacc?

I'm trying to figure out how %nonassoc works in fsyacc.
I have this code snippet, and I'm trying to figure out how it works.
%nonassoc letprec
%left DEQ LTH
%left PLUS MINUS
DEQ = '=='
LTH = '<'
I'm not sure my understanding is fully correct right now.
As I understand it, nonassoc means that you can move the parentheses in an expression without affecting the result. I have also read that it is used in fsyacc to reject invalid expressions, but I am confused about how %nonassoc letprec works in this code snippet.
I have a feeling it has something to do with the two following declarations both being left-associative. What would happen if I omitted %nonassoc letprec? What would it mean if I put %nonassoc letprec between the two %left declarations?

operator precedence with more than 2 recursions

I am trying out some combinations of operator precedence and associativity in Bison. Some cases look odd, but my basic question is whether a rule like the one below is valid; it does not appear to be wrong.
expr: expr OP1 expr OP5 '+' expr
According to the Bison info page, a rule takes its precedence from its last terminal symbol, or from a precedence explicitly assigned to it.
Below are the precedence declarations and the full expr rules, pasted from my code:
%left OP4
%left OP3
%left OP2
%left OP1
%left '*'
%left '/'
%left '+'
%left '-'
expr: NUM { $$ = $1; }
| expr OP2 expr OP5 '+' expr { printf("+"); }
| expr OP1 expr OP5 '-' expr { printf("-"); }
| expr OP4 expr OP5 '*' expr { printf("*"); }
| expr OP3 expr OP5 '/' expr { printf("/"); }
;
Below is the token data given as input:
1op11op5-2op22op5+3op33op5/4op44op5*5
The output from running the parser is as expected:
-+/*
Now, once the precedence order of the arithmetic operators and the OP tokens is flipped, the result is reversed, indicating that the last terminals are not what determines the rule precedence.
%left '*'
%left '/'
%left '+'
%left '-'
%left OP4
%left OP3
%left OP2
%left OP1
Now the parser output is reversed, which indicates that the last terminal's precedence is not helping:
*/-+
Further, starting from the first combination (arithmetic operators at higher precedence than the OP operators), if the precedence declarations for the OP operators are removed, the result is the same as for the second combination, as if no rule precedence were in play. This result makes it hard to conclude that the second expr is used for precedence rather than the third one.
What can be concluded from the above results?
What is the precedence and associativity logic when a rule contains more than two recursions?
The 'precedence' rules in bison really don't have much to do with operator precedence in the traditional sense -- they're really just a hack for resolving shift/reduce conflicts in a way that can implement simple operator precedence in an ambiguous grammar. So to understand how they work, you need to understand shift/reduce parsing and how the precedence rules are used to resolve conflicts.
The mechanism is actually pretty simple. Whenever bison has a shift/reduce conflict in the parser it generates for a grammar, it looks at the rule (to reduce) and the token (to shift) involved in the conflict, and if BOTH have assigned precedence, it resolves the conflict in favor of whichever one has higher precedence. That's it.
So this works pretty well for resolving the precedence of simple binary operators, but if you try to use it for anything more complex, it will likely get you into trouble. In your examples, the conflicts all come between the rules and the shifts for OP1/2/3/4, so all that matters is the relative precedence of those two groups -- the precedence within each group is irrelevant as there are never any conflicts between them. So when the rules are higher precedence, they'll reduce left to right, and when the OP tokens are higher precedence it will shift (resulting in eventual right to left reductions).

Why is this Yacc/bison rule useless?

with
%nonassoc ELSE
%nonassoc THEN
I get
$ bison -dv tiger.yy
tiger.yy:74.5-28: warning: rule useless in parser due to conflicts [-Wother]
: IF exp THEN exp ELSE exp
^^^^^^^^^^^^^^^^^^^^^^^^
but with
%nonassoc THEN
%nonassoc ELSE
the rule works.
What's going on here? Why is this the case?
As the warning says, the rule is useless because if THEN has precedence over ELSE, then the resolution of a shift/reduce conflict makes it impossible for the rule to be applied.
I presume that the grammar actually includes something like:
exp: IF exp THEN exp ELSE exp
| IF exp THEN exp
because if the ELSE clause is mandatory, there wouldn't be a conflict. The rule above has a shift/reduce conflict because when the ELSE is the lookahead token in the parsing of IF exp THEN IF exp THEN exp ELSE... it would be possible to either shift the ELSE or reduce the inner IF exp THEN exp to exp.
In order to correctly parse the expression, it is necessary to favour the shift action, so that the ELSE will be associated with the innermost available IF. Without precedence declarations, that would be the default resolution, since yacc/bison prefers shift over reduce. However, if bison uses a default resolution, it also produces a warning about the resolution. To avoid the warning, it is common to explicitly force the default resolution by giving ELSE precedence over THEN. That's what
%nonassoc THEN
%nonassoc ELSE
does. If you write the precedence declarations in the other order,
%nonassoc ELSE
%nonassoc THEN
then you are giving THEN precedence over ELSE, which means that you are instructing the parser generator to prefer reducing the production whose last terminal is THEN over shifting ELSE. Bison/yacc will obey that request, but if it does so it can never shift the ELSE, making the rule containing the ELSE useless.
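The effect of favouring the shift can be illustrated with a hand-written parser. This is a recursive-descent analogue in Python, not the LALR machinery bison builds; the token spellings and the helper expect are assumptions made for the sketch. Taking the ELSE whenever one is available plays the role of "prefer shift".

```python
def expect(tokens, kind):
    """Consume the next token, which must equal kind."""
    if not tokens or tokens[0] != kind:
        raise SyntaxError(f"expected {kind}")
    tokens.pop(0)

def parse_exp(tokens):
    """Parse one expression from the front of the token list."""
    if tokens and tokens[0] == "IF":
        tokens.pop(0)
        cond = parse_exp(tokens)
        expect(tokens, "THEN")
        then_branch = parse_exp(tokens)
        else_branch = None
        # Taking ELSE whenever it is available mirrors "prefer shift":
        # the innermost open IF claims it first.
        if tokens and tokens[0] == "ELSE":
            tokens.pop(0)
            else_branch = parse_exp(tokens)
        return ("if", cond, then_branch, else_branch)
    return tokens.pop(0)  # any other token stands for a simple expression
```

For the input IF a THEN IF b THEN c ELSE d, the ELSE ends up inside the inner IF and the outer IF has no else branch.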

Why in some cases I can't use a token as precedence marker

Assume this code works:
%left '*'
%left '+'
expr: expr '+' expr
| expr '*' expr
;
I want to define another precedence marker, like:
%left MULTIPLY
%left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
Yet this doesn't actually work.
I thought these two forms would be equivalent, but they are not.
This isn't a practical problem. I just want to know the reason and the principle behind this behaviour.
Thanks.
Yacc precedence rules aren't really about the precedence of expressions, though they can be used for that. Instead, they are a way of resolving shift/reduce conflicts (and ONLY shift/reduce conflicts) explicitly.
Understanding how it works requires understanding how shift/reduce (bottom up) parsing works. The basic idea is that you read token symbols from the input and push ("shift") those tokens onto a stack. When the symbols on the top of the stack match the right hand side of some rule in the grammar, you may "reduce" the rule, popping the symbols from the stack and replacing them with a single symbol from the left side of the rule. You repeat this process, shifting tokens and reducing rules until you've read the entire input and reduced it to a single instance of the start symbol, at which point you've successfully parsed the entire input.
The essential problem with the above (and what the whole machinery of the parser generator is solving) is knowing when to reduce a rule vs when to shift a token if both are possible. The parser generator (yacc or bison) builds a state machine that tracks which symbols have been shifted and so knows what 'partially matched' rules are currently possible and limits shifts just to those tokens that can match more of such a rule. This does not work if the grammar in question is not LALR(1), and so in such cases yacc/bison reports shift/reduce or reduce/reduce conflicts.
The way that precedence rules work to resolve shift/reduce conflicts is by assigning a precedence to certain tokens and rules in the grammar. Whenever there is a shift/reduce conflict between a token to be shifted and a rule to be reduced, and BOTH have a precedence, it will do whichever one has higher precedence. If they have the SAME precedence, then it looks at the %left/%right/%nonassoc flag associated with the precedence level -- %left means reduce, %right means shift, and %nonassoc means do neither and treat it as a syntax error.
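The decision procedure just described fits in a few lines. The sketch below is a Python restatement of that paragraph, not code taken from yacc or bison; the function name resolve and the representation of levels as integers (higher binds tighter, None for "no precedence declared") are assumptions.

```python
def resolve(rule_prec, token_prec, assoc_of_level):
    """Decide a shift/reduce conflict.

    rule_prec and token_prec are integer levels (higher binds tighter),
    or None when no precedence was declared. assoc_of_level maps a level
    to "left", "right" or "nonassoc".
    """
    if rule_prec is None or token_prec is None:
        # Precedence does not apply; the conflict is reported as usual.
        return "conflict"
    if rule_prec > token_prec:
        return "reduce"
    if token_prec > rule_prec:
        return "shift"
    # Same level: the associativity of that level decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[
        assoc_of_level[rule_prec]
    ]
```

For example, with '+' at level 1 and '*' at level 2, both left-associative, a rule ending in '*' facing a lookahead of '+' gives resolve(2, 1, ...) == "reduce", so a*b+c groups as (a*b)+c.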
The only tricky remaining bit is how tokens and rules get their precedence. Tokens get theirs from the %left/%right/%nonassoc directive they are in, which also sets the ordering. Rules get precedence from a %prec directive OR from the right-most terminal on their right-hand-side. So when you have:
%left '*'
%left '+'
expr: expr '+' expr
| expr '*' expr
;
You are setting the precedence of '*' and '+' with the %left directives, and the two rules get their precedence from those tokens.
When you have:
%left MULTIPLY
%left PLUS
expr: expr '+' expr %prec PLUS
| expr '*' expr %prec MULTIPLY
;
You are setting the precedence of the tokens MULTIPLY and PLUS and then explicitly setting the rules to have those precedences. However you are NOT SETTING ANY PRECEDENCE for the tokens '*' and '+'. So when there is a shift/reduce conflict between one of the two rules and either '*' or '+', precedence does not resolve it because the token has no precedence.
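The two cases can be compared side by side with a small model of how levels are assigned. This is a Python sketch under the assumptions stated in this answer (a rule takes the level of its %prec token, else of its rightmost terminal); the helper names declare, rule_level and resolvable are hypothetical.

```python
def declare(levels):
    """Map each token to its level; later declarations get higher levels."""
    return {tok: lvl for lvl, toks in enumerate(levels, start=1) for tok in toks}

def rule_level(rhs, table, terminals, prec=None):
    """Level of a rule: the %prec token if given, else the rightmost terminal."""
    if prec is not None:
        return table.get(prec)
    for sym in reversed(rhs):
        if sym in terminals:
            return table.get(sym)
    return None

def resolvable(rule_lvl, lookahead, table):
    """Precedence only applies when both the rule and the token have a level."""
    return rule_lvl is not None and table.get(lookahead) is not None

terminals = {"+", "*"}
plus_rule = ["expr", "+", "expr"]

# First form: %left '*' then %left '+'.
first = declare([["*"], ["+"]])
# Second form: %left MULTIPLY then %left PLUS, with %prec on the rules.
second = declare([["MULTIPLY"], ["PLUS"]])

first_ok = resolvable(rule_level(plus_rule, first, terminals), "*", first)
second_ok = resolvable(
    rule_level(plus_rule, second, terminals, prec="PLUS"), "*", second
)
```

In this model first_ok is True and second_ok is False: in the second form the rule has a level, but the lookahead token '*' has none, so precedence cannot settle the conflict.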
You say you are not trying to solve a specific, practical problem. And from your question, I'm a little confused about how you are trying to use the precedence marker.
I think you will find that you don't need to use the precedence marker often. It is usually simpler, and clearer to the reader, to rewrite your grammar so that precedence is explicitly accounted for. To give multiply and divide higher precedence than add and subtract, you can do something like this (example adapted from John Levine, lex & yacc 2/e, 1992):
%token NAME NUMBER
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' term
| expr '-' term
| term
;
term : term '*' factor
| term '/' factor
| factor
;
factor : '(' expr ')'
| '-' factor
| NUMBER
;
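To see how the layered expr/term/factor grammar encodes precedence without any declarations, here is a rough Python sketch of the same structure as a hand-written evaluator. It is not generated by yacc: the left-recursive rules are written as loops, the stmt rule with NAME is left out, and the function names are assumptions.

```python
import re

def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_expr(tokens):
    # expr : expr '+' term | expr '-' term | term  (left recursion as a loop)
    value = parse_term(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_term(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value

def parse_term(tokens):
    # term : term '*' factor | term '/' factor | factor
    value = parse_factor(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        rhs = parse_factor(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value

def parse_factor(tokens):
    # factor : '(' expr ')' | '-' factor | NUMBER
    tok = tokens.pop(0)
    if tok == "(":
        value = parse_expr(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
        return value
    if tok == "-":
        return -parse_factor(tokens)
    return int(tok)

def evaluate(text):
    tokens = tokenize(text)
    value = parse_expr(tokens)
    if tokens:
        raise SyntaxError(f"unexpected {tokens[0]!r}")
    return value
```

Because a term can only appear as an operand of '+' or '-', multiplication is grouped first: evaluate("1+2*3") gives 7 and evaluate("(1+2)*3") gives 9.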
In your example, PLUS and MULTIPLY are not real tokens; you can't use them interchangeably with '+' and '*'. Levine calls them pseudo-tokens. They are there to link your productions back to your list of precedences that you have defined with %left and %nonassoc declarations. He gives this example of how you might use %prec to give unary minus high precedence even though the '-' token has low precedence:
%token NAME NUMBER
%left '-' '+'
%left '*' '/'
%nonassoc UMINUS
%%
stmt : NAME '=' expr
| expr
;
expr : expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| '-' expr %prec UMINUS
| '(' expr ')'
| NUMBER
;
To sum up, I would recommend following the pattern of my first code example rather than the second; make the grammar explicit.
Shift/reduce conflicts are conflicts between reducing a production and shifting a token and moving to the next state. When Bison is resolving a conflict, it's not comparing two rules and choosing one of them; it's comparing one rule that it wants to reduce with the token that it wants to shift in the other rule(s). This might be clearer if you have two rules to shift:
expr: expr '+' expr
| expr '*' expr
| expr '*' '*' expr
The reason this is all confusing is that the way Bison gives a precedence to the "reduce" rule is to associate it with a token (the last terminal in the rule by default, or the token from the %prec declaration), and it then uses the precedence table to compare that token to the token you are trying to shift. Basically, %prec declarations only make sense for the "reduce" part of a conflict; they are not counted for the shift part.
One way to see this is with the following grammar
command: IF '(' expr ')' command %prec NOELSE
| IF '(' expr ')' command ELSE command
In this grammar you need to choose between reducing the first rule or shifting the ELSE token. You do this by either giving precedences to the ')' token and to the ELSE token, or by using a %prec declaration and giving a precedence for NOELSE instead of ')'. If you try to give a %prec declaration to the second rule, it will be ignored, and Bison will continue to look up the precedence of the ELSE token in the precedence table.

Resources