Practical solution to fix a Grammar Problem - parsing

We have little snippets of VB6 code (they only use a subset of the language's features) that get written by non-programmers. These are called rules. For the people writing them they are hard to debug, so somebody wrote a kind of ad hoc parser to evaluate the subexpressions and thereby show more precisely where the problem is.
This ad hoc parser is very bad and does not really work well, so I'm trying to write a real parser. Because I'm writing it by hand (I couldn't find a parser generator with a VB6 backend that I could understand), I want to go with a recursive descent parser. I had to reverse-engineer the grammar because I couldn't find one anywhere. (Eventually I found something at http://www.notebar.com/GoldParserEngine.html, but it's LALR and way bigger than I need.)
Here is the grammar for the subset of VB:
<Rule> ::= expr rule | e
<Expr> ::= ( expr )
| Not_List CompareExpr <and_or> expr
| Not_List CompareExpr
<and_or> ::= Or | And
<Not_List> ::= Not Not_List | e
<CompareExpr> ::= ConcatExpr comp CompareExpr
| ConcatExpr
<ConcatExpr> ::= term term_tail & ConcatExpr
| term term_tail
<term> ::= factor factor_tail
<term_tail> ::= add_op term term_tail | e
<factor> ::= add_op Value | Value
<factor_tail> ::= multi_op factor factor_tail | e
<Value> ::= ConstExpr | function | expr
<ConstExpr> ::= <bool> | number | string | Nothing
<bool> ::= True | False
<Nothing> ::= Nothing | Null | Empty
<function> ::= id | id ( ) | id ( arg_list )
<arg_list> ::= expr , arg_list | expr
<add_op> ::= + | -
<multi_op> ::= * | /
<comp> ::= > | < | <= | => | =< | >= | = | <>
All in all it works pretty well. Here is a simple example:
my_function(1, 2 , 3)
looks like
(Programm
(rule
(expr
(Not_List)
(CompareExpr
(ConcatExpr
(term
(factor
(value
(function
my_function
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 1))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 2))) (term_tail))))
(arg_list
(expr
(Not_List)
(CompareExpr
(ConcatExpr (term (factor (value 3))) (term_tail))))
(arg_list))))))))
(term_tail))))
(rule)))
Now, what's my problem?
If you have code that looks like (( true OR false ) AND true), I get infinite recursion; but the real problem is that (true OR false) AND true (the part after the first opening parenthesis of ( expr )) is understood as only (true OR false).
Here is the parse tree:
So how do I solve this? Should I change the grammar somehow, or use some implementation hack?
Here is a harder example in case you need it:
(( f1 OR f1 ) AND (( f3="ALL" OR f4="test" OR f5="ALL" OR f6="make" OR f9(1, 2) ) AND ( f7>1 OR f8>1 )) OR f8 <> "")

You have several issues that I see.
You are treating OR and AND as operators with equal precedence. You should have separate rules for OR and for AND; otherwise you will get the wrong precedence (and therefore the wrong evaluation) for the expression A OR B AND C.
So as a first step, I'd revise your rules as follows:
<Expr> ::= ( expr )
| Not_List AndExpr Or Expr
| Not_List AndExpr
<AndExpr> ::= CompareExpr And AndExpr
| Not_List CompareExpr
The next problem is that you have ( expr ) as a top-level alternative of Expr. What if I write:
A AND (B OR C)
To fix this, change these two rules:
<Expr> ::= Not_List AndExpr Or Expr
| Not_List AndExpr
<Value> ::= ConstExpr | function | ( expr )
I think your implementation of Not is not appropriate. Not is an operator,
just with one operand, so its "tree" should have a Not node and a child which
is the expression being negated. What you have is a list of Nots with no operands.
Try this instead:
<Expr> ::= AndExpr Or Expr
| AndExpr
<Value> ::= ConstExpr | function | ( expr ) | Not Value
I haven't looked, but I think VB6 expressions have other messy things in them.
If you notice, the style of Expr and AndExpr I have written uses right recursion to avoid left recursion. You should change your Concat, Sum, and Factor rules to follow a similar style; what you have is pretty complicated and hard to follow.
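To make that concrete, here is a rough sketch of what a recursive descent parser over those precedence levels might look like. It is Python rather than VB6 (just to keep it compact), the tokenizer is crude (it ignores the =< and => spellings, and keyword matching is case-sensitive here even though real VB6 is not), and all the function and method names are my own, not anything from the original code. Each routine handles one precedence level, Not is treated as a unary operator, and ( expr ) lives down in primary, so (( True Or False ) And True) comes out as an And node over an Or subtree.

import re

def tokenize(src):
    # Crude tokenizer for the sketch: numbers, quoted strings, names/keywords,
    # and the operators used in the grammar above.
    pattern = r'\d+\.\d+|\d+|"[^"]*"|\w+|<>|<=|>=|[()+\-*/&<>=,]'
    return re.findall(pattern, src)

class Parser:
    COMPARE = {"=", "<>", "<", ">", "<=", ">="}

    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if expected is not None and tok != expected:
            raise SyntaxError("expected %r, got %r" % (expected, tok))
        self.i += 1
        return tok

    # One method per precedence level, loosest binding first.
    def expr(self):                                    # Or
        node = self.and_expr()
        while self.peek() == "Or":
            self.eat("Or")
            node = ("Or", node, self.and_expr())
        return node

    def and_expr(self):                                # And
        node = self.compare_expr()
        while self.peek() == "And":
            self.eat("And")
            node = ("And", node, self.compare_expr())
        return node

    def compare_expr(self):                            # = <> < > <= >=
        node = self.concat_expr()
        while self.peek() in self.COMPARE:
            op = self.eat()
            node = (op, node, self.concat_expr())
        return node

    def concat_expr(self):                             # &
        node = self.add_expr()
        while self.peek() == "&":
            self.eat("&")
            node = ("&", node, self.add_expr())
        return node

    def add_expr(self):                                # binary + -
        node = self.mul_expr()
        while self.peek() in ("+", "-"):
            op = self.eat()
            node = (op, node, self.mul_expr())
        return node

    def mul_expr(self):                                # * /
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.eat()
            node = (op, node, self.unary())
        return node

    def unary(self):                                   # Not, unary + -
        if self.peek() in ("Not", "+", "-"):
            return (self.eat(), self.unary())
        return self.primary()

    def primary(self):                                 # literal, call, ( expr )
        if self.peek() == "(":
            self.eat("(")
            node = self.expr()
            self.eat(")")
            return node
        tok = self.eat()
        if self.peek() == "(":                         # function call
            self.eat("(")
            args = []
            if self.peek() != ")":
                args.append(self.expr())
                while self.peek() == ",":
                    self.eat(",")
                    args.append(self.expr())
            self.eat(")")
            return ("call", tok, args)
        return tok

print(Parser(tokenize('(( True Or False ) And True)')).expr())
# -> ('And', ('Or', 'True', 'False'), 'True')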

If they are just creating snippets then perhaps VB5 is "good enough" for creating them. And if VB5 is good enough, the free VB5 Control Creation Edition might be worth tracking down for them to use:
http://www.thevbzone.com/vbcce.htm
You could have them start from a "test harness" project they add snippets to, and they can even test them out.
With a little orientation this will probably prove much more practical than hand crafting a syntax analyzer, and a lot more useful since they can test for more than correct syntax.
Where VB5 is lacking you might include a static module in the "test harness" that provides a rough and ready equivalent of Split(), Replace(), etc:
http://support.microsoft.com/kb/188007

Related

How would I implement operator-precedence in my grammar?

I'm trying to make an expression parser, and although it works, it does calculations chronologically rather than according to BIDMAS; 1 + 2 * 3 + 4 returns 15 instead of 11. I've rewritten the parser to use recursive descent parsing and a proper grammar, which I thought would work, but it makes the same mistake.
My grammar so far is:
exp ::= term op exp | term
op ::= "/" | "*" | "+" | "-"
term ::= number | (exp)
It also lacks other features, but right now I'm not sure how to make division precede multiplication, etc. How should I modify my grammar to implement operator precedence?
Try this:
exp ::= add
add ::= mul (("+" | "-") mul)*
mul ::= term (("*" | "/") term)*
term ::= number | "(" exp ")"
Here ()* means zero or more repetitions. The grammar is deterministic and unambiguous; multiplication and division have the same precedence, and so do addition and subtraction. Because the repetition replaces recursion, the grammar itself doesn't force an associativity on the trees: that is decided by how you combine operands inside the loop (combining as you go gives left-associative trees, which is what you want for - and /).
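For what it's worth, here is a minimal recursive descent evaluator for that grammar, sketched in Python (the tokenizer and all names are mine). Each ()* repetition becomes a while loop, and folding the result into the running value inside the loop gives left-to-right evaluation with the intended precedence.

import re

def tokenize(src):
    # integers and single-character operators only; just enough for the sketch
    return re.findall(r'\d+|[()+\-*/]', src)

class Evaluator:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def eat(self):
        tok = self.toks[self.i]
        self.i += 1
        return tok

    def exp(self):                     # exp ::= add
        return self.add()

    def add(self):                     # add ::= mul (("+" | "-") mul)*
        value = self.mul()
        while self.peek() in ("+", "-"):
            if self.eat() == "+":
                value += self.mul()
            else:
                value -= self.mul()
        return value

    def mul(self):                     # mul ::= term (("*" | "/") term)*
        value = self.term()
        while self.peek() in ("*", "/"):
            if self.eat() == "*":
                value *= self.term()
            else:
                value /= self.term()
        return value

    def term(self):                    # term ::= number | "(" exp ")"
        if self.peek() == "(":
            self.eat()
            value = self.exp()
            self.eat()                 # closing ")"
            return value
        return int(self.eat())

print(Evaluator(tokenize("1 + 2 * 3 + 4")).exp())   # prints 11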

Find out if there is a theory for writing the expression section of a BNF grammar

I want to write a parser for a new mathematical program. When I write the BNF grammar for it, I get stuck on expressions. This is the BNF notation I wrote:
program : expression*
expression: additive_expression
additive_expression : multiplicative_expression
| additive_expression '+' multiplicative_expression
| additive_expression '-' multiplicative_expression
multiplicative_expression : number
| number '*' multiplicative_expression
| number '/' multiplicative_expression
But I cannot understand how to write a BNF expression grammar for operators like ++, --, +=, -=, &&, ||, etc. I mean the operators found in languages like C, C++, C#, Python, Java, etc.
I know that when using the +, -, *, / operators, the grammar should be written according to BODMAS.
What I want to know is whether any theory should be used when writing the grammar for these other operators.
I took a good look at the C++ language grammar (the expressions part), but I cannot understand it.
<expression> ::= <assignment_expression>
| <assignment_expression> <expression>
<assignment_expression> ::= <logical_or_expression> "=" <assignment_expression>
| <logical_or_expression> "+=" <assignment_expression>
| <logical_or_expression> "-=" <assignment_expression>
| <logical_or_expression> "*=" <assignment_expression>
| <logical_or_expression> "/=" <assignment_expression>
| <logical_or_expression> "%=" <assignment_expression>
| <logical_or_expression> "<<=" <assignment_expression>
| <logical_or_expression> ">>=" <assignment_expression>
| <logical_or_expression> "&=" <assignment_expression>
| <logical_or_expression> "|=" <assignment_expression>
| <logical_or_expression> "^=" <assignment_expression>
| <logical_or_expression>
<constant_expression> ::= <conditional_expression>
<conditional_expression> ::= <logical_or_expression>
<logical_or_expression> ::= <logical_or_expression> "||" <logical_and_expression>
| <logical_and_expression>
<logical_and_expression> ::= <logical_and_expression> "&&" <inclusive_or_expression>
| <inclusive_or_expression>
<inclusive_or_expression> ::= <inclusive_or_expression> "|" <exclusive_or_expression>
| <exclusive_or_expression>
<exclusive_or_expression> ::= <exclusive_or_expression> "^" <and_expression>
| <and_expression>
<and_expression> ::= <and_expression> "&" <equality_expression>
| <equality_expression>
<equality_expression> ::= <equality_expression> "==" <relational_expression>
| <equality_expression> "!=" <relational_expression>
| <relational_expression>
<relational_expression> ::= <relational_expression> ">" <shift_expression>
| <relational_expression> "<" <shift_expression>
| <relational_expression> ">=" <shift_expression>
| <relational_expression> "<=" <shift_expression>
| <shift_expression>
<shift_expression> ::= <shift_expression> ">>" <additive_expression>
| <shift_expression> "<<" <additive_expression>
| <additive_expression>
<additive_expression> ::= <additive_expression> "+" <multiplicative_expression>
| <additive_expression> "-" <multiplicative_expression>
| <multiplicative_expression>
<multiplicative_expression> ::= <multiplicative_expression> "*" <unary_expression>
| <multiplicative_expression> "/" <unary_expression>
| <multiplicative_expression> "%" <unary_expression>
| <unary_expression>
<unary_expression> ::= "++" <unary_expression>
| "--" <unary_expression>
| "+" <unary_expression>
| "-" <unary_expression>
| "!" <unary_expression>
| "~" <unary_expression>
| "size" <unary_expression>
| <postfix_expression>
<postfix_expression> ::= <postfix_expression> "++"
| <postfix_expression> "--"
| <primary_expression>
<primary_expression> ::= <integer_literal>
| <floating_literal>
| <character_literal>
| <string_literal>
| <boolean_literal>
| "(" <expression> ")"
| IDENTIFIER
<integer_literal> ::= INTEGER
<floating_literal> ::= FLOAT
<character_literal> ::= CHARACTER
<string_literal> ::= STRING
<boolean_literal> ::= "true"
| "false"
Can you help me understand this? I searched a lot on the internet about this, but I could not find a clear explanation. Thank you.
I gather that the problem is not that you don't know how to write BNF grammars. Rather, you don't know which grammar you should write. In other words, if someone told you the precedence order for your operators, you would have no trouble writing down a grammar which parsed the operators with that particular precedence order. But you don't know which precedence order you should use.
Unfortunately there is not an International High Commission on Operator Precedence, and neither does any religion that I know of offer a spiritual reference including divinely inspired operator precedence rules.
For some operators, the precedence order is reasonably clear: BODMAS was adopted for good reasons, for example, but the main reason is that most people already wrote arithmetic according to those rules. You can certainly make a plausible mathematical argument based on group theory as to why it seems natural to give multiplication precedence over addition, but there will always be the suspicion that the argument was produced post facto to justify an already-made decision. Be that as it may, the argument for making multiplication bind more tightly than addition also works for giving bitwise-and precedence over bitwise-or, or boolean-and precedence over boolean-or. (Although I note that C compiler writers don't trust that programmers will have that particular intuition, since most modern compilers issue a warning if you omit redundant parentheses in such expressions.)
One precedence relationship which I think just about everyone would agree is intuitive is giving arithmetic operators precedence over comparison operators, and comparison operators precedence over boolean operators. Most programmers would find it very strange for a language to interpret a > b + 7 as meaning "add seven to the boolean value of comparing a with b". Similarly, it might be considered outrageous to interpret a > 0 || b > 0 in any way other than (a > 0) || (b > 0). But other precedence choices seem a lot more arbitrary, and not all languages make them the same. (For example, the relative precedence of "not equal to" and "greater than", or of "exclusive or" and "inclusive or".)
So what's a novice language designer to do? Well, first appeal to your own intuitions, particularly if you have used the operators in question a lot. Also, ask your friends, contacts, and potential language users what they think. Look at what other languages have done, but look with a critical eye. Search for complaints on language-specific forums. It may well be that the original language designer made an unfortunate (or even stupid) choice, and that choice now cannot be changed because it would break too many existing programs. (Which is why it is a good thing that you are worrying about operator precedence: getting it wrong can bring serious future problems.)
Direct experimentation can help, too, particularly with operators which are rarely used together. Define a plausible expression using the operators, and then write it out two (or more) times leaving out parentheses according to the various possible rules. Which of those looks more understandable? (Again, recruit your friends and ask them the same question.)
If, in the end, you really cannot decide what the precedence order between a particular pair of operators is, you can consider one more solution: Make parentheses mandatory for these operators. (Which is the intent of the C compiler whining about leaving out redundant parentheses in boolean expressions.)
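If it helps to see how such decisions end up in code, here is a small precedence-climbing sketch in Python. The binding-power table is purely my own illustrative choice, roughly the conventional C-like ordering discussed above (multiplicative over additive, arithmetic over comparison, comparison over boolean); reordering that one table is all it takes to change the language's precedence rules.

import re

# Illustrative binding powers only (higher binds tighter).
# Changing this table changes the language's precedence decisions.
PRECEDENCE = {
    "||": 1, "&&": 2,
    "==": 3, "!=": 3,
    "<": 4, ">": 4, "<=": 4, ">=": 4,
    "+": 5, "-": 5,
    "*": 6, "/": 6, "%": 6,
}

def tokenize(src):
    return re.findall(r'\d+|\w+|==|!=|<=|>=|&&|\|\||[-+*/%()<>]', src)

def parse_atom(toks, pos):
    # numbers, identifiers, or a parenthesized expression
    if toks[pos] == "(":
        node, pos = parse_expr(toks, pos + 1)
        assert toks[pos] == ")", "expected ')'"
        return node, pos + 1
    return toks[pos], pos + 1

def parse_expr(toks, pos, min_bp=1):
    # precedence climbing: parse one operand, then keep consuming operators
    # whose binding power is at least min_bp
    lhs, pos = parse_atom(toks, pos)
    while pos < len(toks) and PRECEDENCE.get(toks[pos], 0) >= min_bp:
        op = toks[pos]
        # the "+ 1" makes every operator left-associative in this sketch
        rhs, pos = parse_expr(toks, pos + 1, PRECEDENCE[op] + 1)
        lhs = (op, lhs, rhs)
    return lhs, pos

tree, _ = parse_expr(tokenize("a > b + 7 && c || d"), 0)
print(tree)   # ('||', ('&&', ('>', 'a', ('+', 'b', '7')), 'c'), 'd')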

Unbalanced tree. Most probably caused by unbalanced markers

I'm working on an IntelliJ plugin which will add support for a custom language. Currently, I'm still just trying to get used to grammar kit and how plugin development works.
To that end, I've started working on a parser for basic expressions:
(1.0 * 5 + (3.44 ^ -2))
Following the documentation provided by JetBrains, I've attempted to write BNF and JFlex grammars for the above example.
The generated code for these grammars compiles, but when the plugin is run, it crashes with:
java.lang.Throwable: Unbalanced tree. Most probably caused by unbalanced markers. Try calling setDebugMode(true) against PsiBuilder passed to identify exact location of the problem
Enabling debug mode prints a long list of traces:
java.lang.Throwable: Created at the following trace.
at com.intellij.lang.impl.MarkerOptionalData.notifyAllocated(MarkerOptionalData.java:83)
at com.intellij.lang.impl.PsiBuilderImpl.createMarker(PsiBuilderImpl.java:820)
at com.intellij.lang.impl.PsiBuilderImpl.precede(PsiBuilderImpl.java:457)
at com.intellij.lang.impl.PsiBuilderImpl.access$700(PsiBuilderImpl.java:51)
at com.intellij.lang.impl.PsiBuilderImpl$StartMarker.precede(PsiBuilderImpl.java:361)
java.lang.Throwable: Created at the following trace.
at com.intellij.lang.impl.MarkerOptionalData.notifyAllocated(MarkerOptionalData.java:83)
at com.intellij.lang.impl.PsiBuilderImpl.createMarker(PsiBuilderImpl.java:820)
at com.intellij.lang.impl.PsiBuilderImpl.mark(PsiBuilderImpl.java:810)
at com.intellij.lang.impl.PsiBuilderAdapter.mark(PsiBuilderAdapter.java:107)
at com.intellij.lang.parser.GeneratedParserUtilBase.enter_section_(GeneratedParserUtilBase.java:432)
at com.example.intellij.mylang.MyLangParser.exp_expr_0(MyLangParser.java:154)
java.lang.Throwable: Created at the following trace.
at com.intellij.lang.impl.MarkerOptionalData.notifyAllocated(MarkerOptionalData.java:83)
at com.intellij.lang.impl.PsiBuilderImpl.createMarker(PsiBuilderImpl.java:820)
at com.intellij.lang.impl.PsiBuilderImpl.precede(PsiBuilderImpl.java:457)
at com.intellij.lang.impl.PsiBuilderImpl.access$700(PsiBuilderImpl.java:51)
at com.intellij.lang.impl.PsiBuilderImpl$StartMarker.precede(PsiBuilderImpl.java:361)
Even with these debug logs, I still don't understand what's going wrong. I've tried googling around, and I can't even figure out what 'marker' means in this context...
Here's the BNF grammar:
root ::= expr *
expr ::= add_expr
left add_expr ::= add_op mod_expr | mod_expr
private add_op ::= '+'|'-'
left mod_expr ::= mod_op int_div_expr | int_div_expr
private mod_op ::= 'mod'
left int_div_expr ::= int_div_op mult_expr | mult_expr
private int_div_op ::= '\'
left mult_expr ::= mult_op unary_expr | unary_expr
private mult_op ::= '*'|'/'
unary_expr ::= '-' unary_expr | '+' unary_expr | exp_expr
left exp_expr ::= exp_op exp_expr | value
private exp_op ::= '^'
// TODO: Add support for left_expr. Example: "someVar.x"
value ::= const_expr | '(' expr ')'
const_expr ::= bool_literal | integer_literal | FLOAT_LITERAL | STRING_LITERAL | invalid
bool_literal ::= 'true' | 'false'
integer_literal ::= INT_LITERAL | HEX_LITERAL
I figured out the issue. It had nothing to do with my BNF. The problem was that in my jflex file I was calling yybegin(YYINITIAL) while already in the YYINITIAL state.

YACC grammar for arithmetic expressions, with no surrounding parentheses

I want to write the rules for arithmetic expressions in YACC, where the following operations are defined:
+ - * / ()
But, I don't want the statement to have surrounding parentheses. That is, a+(b*c) should have a matching rule but (a+(b*c)) shouldn't.
How can I achieve this?
The motive:
In my grammar I define a set like this: (1,2,3,4) and I want (5) to be treated as a 1-element set. The ambiguity causes a reduce/reduce conflict.
Here's a pretty minimal arithmetic grammar. It handles the four operators you mention and assignment statements:
stmt: ID '=' expr ';'
expr: term | expr '-' term | expr '+' term
term: factor | term '*' factor | term '/' factor
factor: ID | NUMBER | '(' expr ')' | '-' factor
It's easy to define "set" literals:
set: '(' ')' | '(' expr_list ')'
expr_list: expr | expr_list ',' expr
If we assume that a set literal can only appear as the value in an assignment statement, and not as the operand of an arithmetic operator, then we would add a syntax for "expressions or set literals":
value: expr | set
and modify the syntax for assignment statements to use that:
stmt: ID '=' value ';'
But that leads to the reduce/reduce conflict you mention because (5) could be an expr, through the expansion expr → term → factor → '(' expr ')'.
Here are three solutions to this ambiguity:
1. Explicitly remove the ambiguity
Disambiguating is tedious but not particularly difficult; we just define two kinds of subexpression at each precedence level, one which is possibly parenthesized and one which is definitely not surrounded by parentheses. We start with some short-hand for a parenthesized expression:
paren: '(' expr ')'
and then for each subexpression type X, we add a production pp_X:
pp_term: term | paren
and modify the existing production by allowing possibly parenthesized subexpressions as operands:
term: factor | pp_term '*' pp_factor | pp_term '/' pp_factor
Unfortunately, we will still end up with a shift/reduce conflict, because of the way expr_list was defined. Confronted with the beginning of an assignment statement:
a = ( 5 )
having finished with the 5, so that ) is the lookahead token, the parser does not know whether the (5) is a set (in which case the next token will be a ;) or a paren (which is only valid if the next token is an operator). This is not an ambiguity -- the parse could be trivially resolved with an LR(2) parse table -- but there are not many tools which can generate LR(2) parsers. So we sidestep the issue by insisting that an expr_list has to have at least two expressions, and adding paren to the productions for set:
set: '(' ')' | paren | '(' expr_list ')'
expr_list: expr ',' expr | expr_list ',' expr
Now the parser doesn't need to choose between expr_list and expr in the assignment statement; it simply reduces (5) to paren and waits for the next token to clarify the parse.
So that ends up with:
stmt: ID '=' value ';'
value: expr | set
set: '(' ')' | paren | '(' expr_list ')'
expr_list: expr ',' expr | expr_list ',' expr
paren: '(' expr ')'
pp_expr: expr | paren
expr: term | pp_expr '-' pp_term | pp_expr '+' pp_term
pp_term: term | paren
term: factor | pp_term '*' pp_factor | pp_term '/' pp_factor
pp_factor: factor | paren
factor: ID | NUMBER | '-' pp_factor
which has no conflicts.
2. Use a GLR parser
Although it is possible to explicitly disambiguate, the resulting grammar is bloated and not really very clear, which is unfortunate.
Bison can generate GLR parsers, which would allow a much simpler grammar. In fact, the original grammar would work almost without modification; we just need to use the Bison %dprec dynamic precedence declaration to indicate how to disambiguate:
%glr-parser
%%
stmt: ID '=' value ';'
value: expr %dprec 1
| set %dprec 2
expr: term | expr '-' term | expr '+' term
term: factor | term '*' factor | term '/' factor
factor: ID | NUMBER | '(' expr ')' | '-' factor
set: '(' ')' | '(' expr_list ')'
expr_list: expr | expr_list ',' expr
The %dprec declarations in the two productions for value tell the parser to prefer value: set if both productions are possible. (They have no effect in contexts in which only one production is possible.)
3. Fix the language
While it is possible to parse the language as specified, we might not be doing anyone any favours. There might even be complaints from people who are surprised when they change
a = ( some complicated expression ) * 2
to
a = ( some complicated expression )
and suddenly a becomes a set instead of a scalar.
It is often the case that languages for which the grammar is not obvious are also hard for humans to parse. (See, for example, C++'s "most vexing parse").
Python, which uses ( expression list ) to create tuple literals, takes a very simple approach: ( expression ) is always an expression, so a tuple needs to either be empty or contain at least one comma. To make the latter possible, Python allows a tuple literal to be written with a trailing comma; the trailing comma is optional unless the tuple contains a single element. So (5) is an expression, while (), (5,), (5,6) and (5,6,) are all tuples (the last two are semantically identical).
Python lists are written between square brackets; here, a trailing comma is again permitted, but it is never required because [5] is not ambiguous. So [], [5], [5,], [5,6] and [5,6,] are all lists.
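A quick way to see that rule in action at the Python prompt:

>>> type((5))
<class 'int'>
>>> type((5,))
<class 'tuple'>
>>> (5, 6) == (5, 6,)
True
>>> [5] == [5,]
True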

Relation between grammar and operator associativity

Some compiler books / articles / papers talk about the design of a grammar and its relation to operator associativity. I'm a big fan of top-down, especially recursive descent, parsers, and so far most (if not all) of the compilers I've written use the following expression grammar:
Expr ::= Term { ( "+" | "-" ) Term }
Term ::= Factor { ( "*" | "/" ) Factor }
Factor ::= INTEGER | "(" Expr ")"
which is an EBNF representation of this BNF:
Expr ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor = INTEGER | "(" Expr ")"
According to what I've read, some regard this grammar as "wrong" because it changes the operators' associativity (left-to-right for those four operators), as shown by the parse tree growing to the right instead of to the left. For a parser implemented through an attribute grammar this might be true, since an L-attributed value requires that the value be created first and then passed to child nodes. However, when implementing a normal recursive descent parser, it's up to me whether to construct this node first and then pass it to the child nodes (top-down) or to let the child nodes be created first and then add the returned values as children of this node, passed in this node's constructor (bottom-up). There must be something I'm missing here, because I don't agree with the claim that this grammar is "wrong", and this grammar has been used in many languages, especially Wirthian ones. Usually (or always?) the reading says it promotes LR parsing instead of LL.
I think the issue here is that a language has an abstract syntax which is just like:
E ::= E + E | E - E | E * E | E / E | Int | (E)
but this is actually implemented via a concrete syntax which is used to specify associativity and precedence. So, if you're writing a recursive descent parser, you're implicitly writing the concrete syntax into it as you go along, and that's fine, though it may be good to specify it exactly as a phrase-structured grammar as well!
There are a couple of issues with your grammar if it is to be a fully-fledged concrete grammar. First of all, you need to add productions to just 'go to the next level down', so relaxing your syntax a bit:
Expr ::= Term + Term | Term - Term | Term
Term ::= Factor * Factor | Factor / Factor | Factor
Factor ::= INTEGER | (Expr)
Otherwise there's no way to derive valid sentences starting from the start symbol (in this case Expr). For example, how would you derive '1 * 2' without those extra productions?
Expr -> Term
-> Factor * Factor
-> 1 * Factor
-> 1 * 2
We can see the other grammar handles this in a slightly different way:
Expr -> Term Expr'
-> Factor Term' Expr'
-> 1 Term' Expr'
-> 1 * Factor Term' Expr'
-> 1 * 2 Term' Expr'
-> 1 * 2 ε Expr'
-> 1 * 2 ε ε
= 1 * 2
but this achieves the same effect.
Your parser is actually non-associative. To see this, ask how E + E + E would be parsed, and find that it couldn't be. Whichever + is consumed first, we get E on one side and E + E on the other, but then we're trying to parse E + E as a Term, which is not possible. Equivalently, think about deriving that expression from the start symbol; again, not possible.
Expr -> Term + Term
-> ? (can't get another + in here)
The other grammar is left-associative because an arbitrarily long string of E + E + ... + E can be derived.
So anyway, to sum up, you're right that when writing the RDP, you can implement whatever concrete version of the abstract syntax you like and you probably know a lot more about that than me. But there are these issues when trying to produce the grammar which describes your RDP precisely. Hope that helps!
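To see the non-associativity concretely, here is a tiny Python sketch of a recursive descent parser that follows the relaxed grammar literally (the helper names and the whitespace-split "lexer" are mine, not from the question). It accepts 1 + 2, and it accepts 1 + ( 2 + 3 ) because the parentheses restart Expr, but it has no way to consume the second + in 1 + 2 + 3.

def parse_expr(toks, pos):
    # Expr ::= Term ("+" | "-") Term | Term  -- at most one operator, as written
    left, pos = parse_term(toks, pos)
    if pos < len(toks) and toks[pos] in ("+", "-"):
        op = toks[pos]
        right, pos = parse_term(toks, pos + 1)
        left = (op, left, right)
    return left, pos

def parse_term(toks, pos):
    # Term ::= Factor ("*" | "/") Factor | Factor
    left, pos = parse_factor(toks, pos)
    if pos < len(toks) and toks[pos] in ("*", "/"):
        op = toks[pos]
        right, pos = parse_factor(toks, pos + 1)
        left = (op, left, right)
    return left, pos

def parse_factor(toks, pos):
    # Factor ::= INTEGER | "(" Expr ")"
    if toks[pos] == "(":
        node, pos = parse_expr(toks, pos + 1)
        assert toks[pos] == ")", "expected ')'"
        return node, pos + 1
    return int(toks[pos]), pos + 1

def parse(src):
    toks = src.split()                   # trivial lexer: space-separated tokens
    tree, pos = parse_expr(toks, 0)
    if pos != len(toks):
        raise SyntaxError("could not consume the rest: %s" % toks[pos:])
    return tree

print(parse("1 + 2"))                    # ('+', 1, 2)
print(parse("1 + ( 2 + 3 )"))            # ('+', 1, ('+', 2, 3))
print(parse("1 + 2 + 3"))                # SyntaxError: could not consume the rest: ['+', '3']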
To get associative trees, you really need to have the trees formed with the operator as the subtree root node, with children having similar roots.
Your implementation grammar:
Expr ::= Term Expr'
Expr' ::= ( "+" | "-" ) Term Expr' | ε
Term ::= Factor Term'
Term' ::= ( "*" | "/" ) Factor Term' | ε
Factor ::= INTEGER | "(" Expr ")"
must make that awkward; if you implement recursive descent on this, the Expr' routine has no access to the "left child" and so can't build the tree. You can always patch this up by passing around pieces (in this case, passing tree parts up the recursion) but that just seems awkward. You could have chosen this instead as a grammar:
Expr ::= Term ( ("+"|"-") Term )*;
Term ::= Factor ( ( "*" | "/" ) Factor )* ;
Factor ::= INTEGER | "(" Expr ")"
which is just as easy (easier?) to code recursive descent-wise, but now you can form the trees you need without trouble.
This doesn't really get you associativity; it just shapes the trees so that it could be allowed. Associativity means that the tree (+ (+ a b) c) means the same thing as (+ a (+ b c)); it's actually a semantic property (it sure doesn't work for "-", but the grammar as posed can't distinguish).
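Here is a sketch (Python, and the names are mine) of recursive descent over that iterative grammar. Because each routine keeps the left operand in hand while it loops, it can wrap it into a left-leaning tree as it consumes each operator:

def parse_expr(toks, pos):
    # Expr ::= Term ( ("+" | "-") Term )*
    node, pos = parse_term(toks, pos)
    while pos < len(toks) and toks[pos] in ("+", "-"):
        op = toks[pos]
        right, pos = parse_term(toks, pos + 1)
        node = (op, node, right)         # the left child is already in hand
    return node, pos

def parse_term(toks, pos):
    # Term ::= Factor ( ("*" | "/") Factor )*
    node, pos = parse_factor(toks, pos)
    while pos < len(toks) and toks[pos] in ("*", "/"):
        op = toks[pos]
        right, pos = parse_factor(toks, pos + 1)
        node = (op, node, right)
    return node, pos

def parse_factor(toks, pos):
    # Factor ::= INTEGER | "(" Expr ")"
    if toks[pos] == "(":
        node, pos = parse_expr(toks, pos + 1)
        return node, pos + 1             # step over ")"
    return int(toks[pos]), pos + 1

tree, _ = parse_expr("7 - 2 - 1".split(), 0)
print(tree)   # ('-', ('-', 7, 2), 1) -- left-leaning, so it means (7 - 2) - 1 = 4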
We have a tool (the DMS Software Reengineering Toolkit) that includes parsers and term-rewriting (using source-to-source transformations) in which the associativity is explicitly expressed. We'd write your grammar:
Expr ::= Term ;
[Associative Commutative] Expr ::= Expr "+" Term ;
Expr ::= Expr "-" Term ;
Term ::= Factor ;
[Associative Commutative] Term ::= Term "*" Factor ;
Term ::= Term "/" Factor ;
Factor ::= INTEGER ;
Factor ::= "(" Expr ")" ;
The grammar seems longer and clumsier this way, but it in fact allows us to break out the special cases and mark them as needed. In particular, we can now distinguish operators that are associative from those that are not, and mark them accordingly. With that semantic marking, our tree-rewrite engine automatically accounts for associativity and commutativity. You can see a full example of such DMS rules being used to symbolically simplify high-school algebra using explicit rewrite rules over a typical expression grammar that don't have to account for such semantic properties. That is built into the rewrite engine.
