How does the declaration nonassoc work in fsyacc? - f#

I'm trying to figure out how %nonassoc works in fsyacc, using this code snippet:
%nonassoc letprec
%left DEQ LTH
%left PLUS MINUS
DEQ = '=='
LTH = '<'
I'm not sure my understanding is fully correct right now.
As I understand it, nonassoc means that you can change the parenthesization of an expression without affecting the result. I have also read that it is used in fsyacc for invalid expressions, but I am confused about how %nonassoc letprec works in this code snippet.
I have a feeling it has something to do with the two following declarations both being left associative. What would happen if I omitted %nonassoc letprec? And what would it mean if I put %nonassoc letprec between the two %left declarations?

Related

ANTLR: Why is this grammar rule for a tuples not LL(1)?

I have the following grammar rules defined to cover tuples of the form: (a), (a,), (a,b), (a,b,) and so on. However, antlr3 gives the warning:
Decision can match input such as "COMMA" using multiple alternatives: 1, 2
I believe this means that my grammar is not LL(1). This caught me by surprise as, based on my extremely limited understanding of this topic, the parser would only need to look one token ahead from (COMMA)? to ')' in order to know which comma it was on.
Also based on the discussion I found here I am further confused: Amend JSON - based grammar to allow for trailing comma
And their source code here: https://github.com/doctrine/annotations/blob/1.13.x/lib/Doctrine/Common/Annotations/DocParser.php#L1307
Is this because of the kind of parser that antlr is trying to generate and not because my grammar isn't LL(1)? Any insight would be appreciated.
options {k=1; backtrack=no;}
tuple : '(' IDENT (COMMA IDENT)* (COMMA)? ')';
DIGIT : '0'..'9' ;
LOWER : 'a'..'z' ;
UPPER : 'A'..'Z' ;
IDENT : (LOWER | UPPER | '_') (LOWER | UPPER | '_' | DIGIT)* ;
edit: changed typo in tuple: ... from (IDENT)? to (COMMA)?
Note:
The question has been edited since this answer was written. In the original, the grammar had the line:
tuple : '(' IDENT (COMMA IDENT)* (IDENT)? ')';
and that's what this answer is referring to.
That grammar works without warnings, but it doesn't describe the language you intend to parse. It accepts, for example, (a, b c) but fails to accept (a, b,).
My best guess is that you actually used something like the grammars in the links you provide, in which the final optional element is a comma, not an identifier:
tuple : '(' IDENT (COMMA IDENT)* (COMMA)? ')';
That does give the warning you indicate, and it won't match (a,) (for example), because, as the warning says, the second alternative has been disabled.
LL(1) as a property of formal grammars only applies to grammars with fixed right-hand sides, as opposed to the "Extended" BNF used by many top-down parser generators, including Antlr, in which a right-hand side can be a set of possibilities. It's possible to expand EBNF using additional non-terminals for each subrule (although there is not necessarily a canonical expansion, and expansions might differ in their parsing category). But, informally, we could extend the concept of LL(k) by saying that in every EBNF right-hand side, at every point where there is more than one alternative, the parser must be able to predict the appropriate alternative looking only at the next k tokens.
You're right that the grammar you provide is LL(1) in that sense. When the parser has just seen IDENT, it has three clear alternatives, each marked by a different lookahead token:
COMMA ↠ predict another repetition of (COMMA IDENT).
IDENT ↠ predict (IDENT).
')' ↠ predict an empty (IDENT)?.
But in the correct grammar (with my modification above), IDENT is a syntax error and COMMA could be either another repetition of ( COMMA IDENT ), or it could be the COMMA in ( COMMA )?.
You could change k=1 to k=2, thereby allowing the parser to examine the next two tokens, and if you did so it would compile with no warnings. In effect, that grammar is LL(2).
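To make the one-token versus two-token point concrete, here is a minimal Python sketch of the decision the parser faces right after an IDENT. The function name predict and the string token names are my own illustrative assumptions; this is not what ANTLR generates.

```python
def predict(lookahead):
    """Choose an alternative after IDENT in
        tuple : '(' IDENT (COMMA IDENT)* (COMMA)? ')'
    given the names of the upcoming tokens (hypothetical encoding)."""
    first = lookahead[0] if len(lookahead) > 0 else None
    second = lookahead[1] if len(lookahead) > 1 else None
    if first == "RPAREN":
        return "leave the loop, empty (COMMA)?"
    if first == "COMMA":
        # COMMA alone fits two alternatives; only the second token decides.
        if second == "IDENT":
            return "repeat (COMMA IDENT)"
        if second == "RPAREN":
            return "leave the loop, match (COMMA)?"
    return "syntax error"


print(predict(["COMMA", "IDENT"]))
print(predict(["COMMA", "RPAREN"]))
```

With only the first token available, both COMMA cases look identical, which is the ambiguity the warning reports.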
You could make an LL(1) grammar by left-factoring the expansion of the EBNF, but it's not going to be as pretty (or as easy for a reader to understand). So if you have a parser generator which can cope with the grammar as written, you might as well not worry about it.
But, for what it's worth, here's a possible solution:
tuple : '(' idents ')' ;
idents : IDENT ( COMMA ( idents )? )? ;
Untested because I don't have a working Antlr3 installation, but it at least compiles the grammar without warnings. Sorry if there is a problem.
It would probably be better to use tuple : '(' (idents)? ')'; in order to allow empty tuples. Also, there's no obvious reason to insist on COMMA instead of just using ',', assuming that '(' and ')' work as expected on Antlr3.
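As a rough check of the left-factored rules, here is a hand-written recursive-descent sketch in Python that follows tuple and idents and never looks more than one token ahead. The tokenizer and the names tokenize and parse_tuple are assumptions made for illustration; the IDENT pattern mirrors the lexer rule in the question.

```python
import re

# IDENT as in the question, or one of the punctuation characters ( ) ,
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_][A-Za-z_0-9]*)|([(),]))")


def tokenize(text):
    """Split text into (kind, value) pairs; kind is 'IDENT' or the character."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at position {pos}")
        if m.group(1):
            tokens.append(("IDENT", m.group(1)))
        else:
            tokens.append((m.group(2), m.group(2)))
        pos = m.end()
    return tokens


def parse_tuple(text):
    """tuple : '(' idents ')' ;  idents : IDENT ( COMMA ( idents )? )? ;"""
    toks = tokenize(text) + [("EOF", "")]
    pos = 0
    items = []

    def peek():
        return toks[pos][0]

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        value = toks[pos][1]
        pos += 1
        return value

    def idents():
        items.append(expect("IDENT"))
        if peek() == ",":
            expect(",")
            # A single token decides: IDENT continues the list, ')' ends it.
            if peek() == "IDENT":
                idents()

    expect("(")
    idents()
    expect(")")
    expect("EOF")
    return items


print(parse_tuple("(a, b,)"))
```

Every decision point in idents inspects only peek(), which is the informal sense in which the left-factored version is LL(1).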

operator precedence with more than 2 recursions

I am trying out some combinations of operator precedence and associativity in bison. Some cases look odd, but the basic question is whether a rule like the one below is valid; it does not appear to be wrong.
expr: expr OP1 expr OP5 '+' expr
According to the bison info page, a rule takes its precedence from its last terminal symbol, or from a precedence explicitly assigned to it.
Below are the precedence declarations and the full expr rules, pasted from my code:
%left OP4
%left OP3
%left OP2
%left OP1
%left '*'
%left '/'
%left '+'
%left '-'
expr: NUM { $$ = $1; }
| expr OP2 expr OP5 '+' expr { printf("+"); }
| expr OP1 expr OP5 '-' expr { printf("-"); }
| expr OP4 expr OP5 '*' expr { printf("*"); }
| expr OP3 expr OP5 '/' expr { printf("/"); }
;
Below is the token input given:
1op11op5-2op22op5+3op33op5/4op44op5*5
The output on running the parser is as expected:
-+/*
Now, once the precedence is flipped between the arithmetic operators and the OP tokens, the result is reversed, which suggests that it is not the last terminals that determine the rule precedence.
%left '*'
%left '/'
%left '+'
%left '-'
%left OP4
%left OP3
%left OP2
%left OP1
Now the output of the parser is reversed, which indicates that the precedence of the last terminal is not helping:
*/-+
Furthermore, if I take the first combination (arithmetic operators at higher precedence than the OP operators) and remove the precedence of the OP operators, the result is still the same as for the second combination, with no rule precedence coming into play. This result makes it difficult to conclude that the second expr is used for precedence rather than the third one.
What can be concluded from the above results?
What is the precedence and associativity logic when more than two recursions are used in a rule?
The 'precedence' rules in bison really don't have much to do with operator precedence in the traditional sense -- they're really just a hack for resolving shift/reduce conflicts in a way that can implement simple operator precedence in an ambiguous grammar. So to understand how they work, you need to understand shift/reduce parsing and how the precedence rules are used to resolve conflicts.
The mechanism is actually pretty simple. Whenever bison has a shift/reduce conflict in the parser it generates for a grammar, it looks at the rule (to reduce) and the token (to shift) involved in the conflict, and if BOTH have assigned precedence, it resolves the conflict in favor of whichever one has higher precedence. That's it.
So this works pretty well for resolving the precedence of simple binary operators, but if you try to use it for anything more complex, it will likely get you into trouble. In your examples, the conflicts all come between the rules and the shifts for OP1/2/3/4, so all that matters is the relative precedence of those two groups -- the precedence within each group is irrelevant as there are never any conflicts between them. So when the rules are higher precedence, they'll reduce left to right, and when the OP tokens are higher precedence it will shift (resulting in eventual right to left reductions).
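The comparison described above can be sketched in a few lines of Python. This is only a model of the decision, not bison's code: the function names, the string token names and the explicit terminals set are assumptions made for illustration. The tie-break on equal precedence (associativity) is standard yacc/bison behaviour that does not come up in your grammar.

```python
def build_precedence(declarations):
    """Map token -> (level, assoc); later declarations get higher levels."""
    table = {}
    for level, (assoc, tokens) in enumerate(declarations, start=1):
        for tok in tokens:
            table[tok] = (level, assoc)
    return table


def rule_precedence(rhs, terminals, table):
    """A rule takes the precedence of its last terminal (None if undeclared)."""
    for sym in reversed(rhs):
        if sym in terminals:
            return table.get(sym)
    return None


def resolve(rule_prec, token_prec):
    """Decide a shift/reduce conflict between a rule and a lookahead token."""
    if rule_prec is None or token_prec is None:
        return "shift"  # default resolution; bison also reports the conflict
    (rule_level, _), (tok_level, tok_assoc) = rule_prec, token_prec
    if rule_level > tok_level:
        return "reduce"
    if tok_level > rule_level:
        return "shift"
    # Equal levels: associativity breaks the tie.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[tok_assoc]


terminals = {"OP1", "OP2", "OP3", "OP4", "OP5", "+", "-", "*", "/"}
rule = ["expr", "OP2", "expr", "OP5", "+", "expr"]

# First ordering from the question: OP tokens declared first (lowest).
ops_low = build_precedence(
    [("left", [t]) for t in ["OP4", "OP3", "OP2", "OP1", "*", "/", "+", "-"]])
# Second ordering: OP tokens declared last (highest).
ops_high = build_precedence(
    [("left", [t]) for t in ["*", "/", "+", "-", "OP4", "OP3", "OP2", "OP1"]])

print(resolve(rule_precedence(rule, terminals, ops_low), ops_low["OP1"]))    # reduce
print(resolve(rule_precedence(rule, terminals, ops_high), ops_high["OP1"]))  # shift
```

If the OP tokens have no declared precedence at all, token_prec is None and the sketch falls back to shifting, which fits your observation that removing their precedence behaves like the second ordering.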

Why is this Yacc/bison rule useless?

with
%nonassoc ELSE
%nonassoc THEN
I get
$ bison -dv tiger.yy
tiger.yy:74.5-28: warning: rule useless in parser due to conflicts [-Wother]
: IF exp THEN exp ELSE exp
^^^^^^^^^^^^^^^^^^^^^^^^
but with
%nonassoc THEN
%nonassoc ELSE
the rule works.
What's going on here? Why is this the case?
As the warning says, the rule is useless because if THEN has precedence over ELSE, then the resolution of a shift/reduce conflict makes it impossible for the rule to be applied.
I presume that the grammar actually includes something like:
exp: IF exp THEN exp ELSE exp
| IF exp THEN exp
because if the ELSE clause were mandatory, there wouldn't be a conflict. The rule above has a shift/reduce conflict because when ELSE is the lookahead token in the parsing of IF exp THEN IF exp THEN exp ELSE... it would be possible to either shift the ELSE or reduce the inner IF exp THEN exp to exp.
In order to correctly parse the expression, it is necessary to favour the shift action, so that the ELSE will be associated with the innermost available IF. Without precedence declarations, that would be the default resolution, since yacc/bison prefers shift over reduce. However, if bison uses a default resolution, it also produces a warning about the resolution. To avoid the warning, it is common to explicitly force the default resolution by giving ELSE precedence over THEN. That's what
%nonassoc THEN
%nonassoc ELSE
does. If you write the precedence declarations in the other order,
%nonassoc ELSE
%nonassoc THEN
then you are giving THEN precedence over ELSE, which means that you are instructing the parser generator to prefer reducing the production whose last terminal is THEN over shifting ELSE. Bison/yacc will obey that request, but if it does so it can never shift the ELSE, making the rule containing the ELSE useless.
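To see what "favour the shift" means for the resulting tree, here is a small hand-written recursive-descent sketch in Python (not a yacc-style parser). It takes an ELSE as soon as one is available, which has the same effect as preferring the shift: the ELSE attaches to the innermost IF. The ID token and the tuple tree shape are assumptions made for illustration.

```python
def parse_exp(tokens, pos=0):
    """Parse exp : IF exp THEN exp ELSE exp | IF exp THEN exp | ID.

    Returns (tree, next_position).
    """
    def at(i):
        return tokens[i] if i < len(tokens) else None

    if at(pos) == "ID":
        return "ID", pos + 1
    if at(pos) != "IF":
        raise SyntaxError(f"unexpected token {at(pos)!r} at position {pos}")
    cond, pos = parse_exp(tokens, pos + 1)
    if at(pos) != "THEN":
        raise SyntaxError(f"expected THEN at position {pos}")
    then_branch, pos = parse_exp(tokens, pos + 1)
    if at(pos) == "ELSE":
        # Taking the ELSE immediately mirrors preferring shift over reduce.
        else_branch, pos = parse_exp(tokens, pos + 1)
        return ("if", cond, then_branch, else_branch), pos
    return ("if", cond, then_branch), pos


tree, _ = parse_exp("IF ID THEN IF ID THEN ID ELSE ID".split())
print(tree)  # ('if', 'ID', ('if', 'ID', 'ID', 'ID'))
```

The outer IF ends up with no ELSE branch. With the reversed declarations the generated parser always reduces the inner IF exp THEN exp before it can take the ELSE, so the three-branch rule is never used.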

comparison operators precedence in Bison

I'm trying to set rules for comparison operators: ==, <=, !=, etc.
I already have this precedence list:
%nonassoc "=="
%left '+' '-'
%left '*' '/'
%right '^'
%left UNARY
The first line with == doesn't work. I guess it's because "==" isn't a character but a string, but I can't figure out how to do it otherwise.
It's supposed to be nonassoc, so that 1==2==3 will fail. thanks
As you write, Bison doesn't understand "==". You can use single-character tokens such as '+' directly, but for multi-character tokens you need to define them using Bison's %token directive. Then you must let the scanner return that token code.

Detect wrong grammar (error alternatives) for quick fixes

In my ANTLR grammar I would like to detect wrong keywords or mistyped constants, e.g. 'null' instead of 'NULL'. I added error alternatives in the grammar, e.g.:
| extra='null' #error1
If the wrong constant is detected in my custom editor I can fix it by replacing it with the correct constant.
But I don't know if this is the correct way to address and detect wrong keywords or constants in a grammar.
In addition I tried to detect missing closing parentheses and braces in the grammar (see ANTLR book chapter 9.4):
.......
| 'if' '(' expr expr #error2
| '{' exprlist #error3
| 'while' '(' expr expr #error4
........
But this massively slows down the parsing process and so I think that it is wrong to do so.
My questions are:
Is it correct to detect wrong keywords, constants, etc. in that way as described above?
Is it somehow possible to catch missing closing in the grammar without the massive speed decrease?
Any help is appreciated.
I found out that I can extract the second Token from the rule (to add a brace at that position):
| 'if' '(' expr expr #error2
with:
Token myToken = tokens.get(ctx.getChild(2).getSourceInterval().b);
parser.notifyErrorListeners(........
However it massively decreases the speed of the parser.
