In the following example, the order matters in terms of precedence:
grammar Precedence;
root: expr EOF;
expr
: expr ('+'|'-') expr
| expr ('*' | '/') expr
| Atom
;
Atom: [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
For example, on the expression 1+1*2 the above would produce the following parse tree which would evaluate to (1+1)*2=4:
Whereas if I changed the first and second alternations in the expr I would then get the following parse tree which would evaluate to 1+(1*2)=3:
What are the 'rules' then for when it actually matters where the ordering in an alternation occurs? Is this only relevant if it one of the 'edges' of the alternation recursively calls the expr? For example, something like ~ expr or expr + expr would matter, but something like func_call '(' expr ')' or Atom would not. Or, when is it important to order things for precedence?
If ANTLR did not have the rule to give precedence to the first alternative that could match, then either of those trees would be valid interpretations of your input (and means the grammar is technically ambiguous).
However, when there are two alternatives that could be used to match your input, then ANTLR will use the first alternative to resolve the ambiguity, in this case establishing operator precedence, so typically you would put the multiplication/division operator before the addition/subtraction, since that would be the traditional order of operations:
grammar Precedence;
root: expr EOF;
expr
: expr ('+'|'-') expr
| expr ('*' | '/') expr
| Atom
;
Atom: [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
Most grammar authors will just put them in precedence order, but things like Atoms or parenthesized exprs won’t really care about the order since there’s only a single alternative that could be used.
I'm trying to write a grammar for Prolog interpreter. When I run grun from command line on input like "father(john,mary).", I get a message saying "no viable input at 'father(john,'" and I don't know why. I've tried rearranging rules in my grammar, used different entry points etc., but still get the same error. I'm not even sure if it's caused by my grammar or something else like antlr itself. Can someone point out what is wrong with my grammar or think of what could be the cause if not the grammar?
The commands I ran are:
antlr4 -no-listener -visitor Expr.g4
javac *.java
grun antlr.Expr start tests/test.txt -gui
And this is the resulting parse tree:
Here is my grammar:
grammar Expr;
#header{
package antlr;
}
//start rule
start : (program | query) EOF
;
program : (rule_ '.')*
;
query : conjunction '?'
;
rule_ : compound
| compound ':-' conjunction
;
conjunction : compound
| compound ',' conjunction
;
compound : Atom '(' elements ')'
| '.(' elements ')'
;
list : '[]'
| '[' element ']'
| '[' elements ']'
;
element : Term
| list
| compound
;
elements : element
| element ',' elements
;
WS : [ \t\r\n]+ -> skip ;
Atom : [a-z]([a-z]|[A-Z]|[0-9]|'_')*
| '0'
;
Var : [A-Z]([a-z]|[A-Z]|[0-9]|'_')*
;
Term : Atom
| Var
;
The lexer will always produce the same tokens for any input. The lexer does not "listen" to what the parser is trying to match. The rules the lexer applies are quite simple:
try to match as many characters as possible
when 2 or more lexer rules match the same amount of characters, let the rule defined first "win"
Because of the 2nd rule, the rule Term will never be matched. And moving the Term rule above Var and Atom will cause the latter rules to be never matched. The solution: "promote" the Term rule to a parser rule:
start : (program | query) EOF
;
program : (rule_ '.')*
;
query : conjunction '?'
;
rule_ : compound (':-' conjunction)?
;
conjunction : compound (',' conjunction)?
;
compound : Atom '(' elements ')'
| '.' '(' elements ')'
;
list : '[' elements? ']'
;
element : term
| list
| compound
;
elements : element (',' element)*
;
term : Atom
| Var
;
WS : [ \t\r\n]+ -> skip ;
Atom : [a-z] [a-zA-Z0-9_]*
| '0'
;
Var : [A-Z] [a-zA-Z0-9_]*
;
I have checked similar questions surrounding this issue but none seems to provide a solution to my version of the problem.
I just started Antlr4 recently and all has been going nicely until I hit this particular roadblock.
My grammar is a basic math expression grammar but for some reason I noticed the generated parser(?) is unable to walk from paser-rule "equal" to paser-rule "expr", in order to reach lexer-rule "NAME".
grammar MathCraze;
NUM : [0-9]+ ('.' [0-9]+)?;
WS : [ \t]+ -> skip;
NL : '\r'? '\n' -> skip;
NAME: [a-zA-Z_][a-zA-Z_0-9]*;
ADD: '+';
SUB : '-';
MUL : '*';
DIV : '/';
POW : '^';
equal
: add # add1
| NAME '=' equal # assign
;
add
: mul # mul1
| add op=('+'|'-') mul # addSub
;
mul
: exponent # power1
| mul op=('*'|'/') exponent # mulDiv
;
exponent
: expr # expr1
| expr '^' exponent # power
;
expr
: NUM # num
| NAME # name
| '(' add ')' # parens
;
If I pass a word as input, sth like "variable", the parser throws the error above, but if I pass a number as input (say "78"), the parser walks the tree successfully (i.e, from rule "equal" to "expr").
equal equal
| |
add add
| |
mul mul
| |
exponent exponent
| |
expr expr
| |
NUM NAME
| |
"78" # No Error "variable" # Error! Tree walk doesn't reach here.
I've checked for every type of ambiguity I know of, so I'm probably missing something here.
I'm using Antlr5.6 by the way and I will appreciate if this problem gets solved. Thanks in advance.
Your style of expression hierarchy is the one we use in parsers written by hand or in ANTLR v3, from low to high precedence.
As Raven said, ANTLR 4 is much more powerful. Note the <assoc = right> specification in the power rule, which is usually right-associative.
grammar Question;
question
: line+ EOF
;
line
: expr NL
| assign NL
;
assign
: NAME '=' expr # assignSingle
| NAME '=' assign # assignMulti
;
expr // from high to low precedence
: <assoc = right> expr '^' expr # power
| expr op=( '*' | '/' ) expr # mulDiv
| expr op=( '+' | '-' ) expr # addSub
| '(' expr ')' # parens
| atom_r # atom
;
atom_r
: NUM
| NAME
;
NAME: [a-zA-Z_][a-zA-Z_0-9]*;
NUM : [0-9]+ ('.' [0-9]+)?;
WS : [ \t]+ -> skip;
NL : [\r\n]+ ;
Run with the -gui option to see the parse tree :
$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar
$ alias grun
alias grun='java org.antlr.v4.gui.TestRig'
$ grun Question question -gui data.txt
and this data.txt file :
variable
78
a + b * c
a * b + c
a = 8 + (6 * 9)
a ^ b
a ^ b ^ c
7 * 2 ^ 5
a = b = c = 88
.
Added
Using your original grammar and starting with the equal rule, I have the following error :
$ grun Q2 equal -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,9:10='78',<NUM>,2:0]
...
[#41,89:88='<EOF>',<EOF>,10:0]
line 2:0 no viable alternative at input 'variable78'
If I start with rule expr, there is no error :
$ grun Q2 expr -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
...
[#41,89:88='<EOF>',<EOF>,10:0]
$
Run grun with the -gui option and you'll see the difference :
running with expr, the input token variable is catched in NAME, rule expr is satisfied and terminates;
running with equal it's all in error. The parser tries the first alternative equal -> add -> mul -> exponent -> expr -> NAME => OK. It consumes the token variable and tries to do something with the next token 78. It rolls back in each rule, see if it can do something with the alt of rule, but each alt requires an operator. Thus it arrives in equal and starts again with the token variable, this time using the alt | NAME '='. NAME consumes the token, then the rule requires '=', but the input is 78 and does not satisfies it. As there is no other choice, it says there is no viable alternative.
$ grun Q2 equal -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,8:7='<EOF>',<EOF>,1:8]
line 1:8 no viable alternative at input 'variable'
If variable is the only token, same reasoning : first alternative equal -> add -> mul -> exponent -> expr -> NAME => OK, consumes variable, back to equal, tries the alt which requires '=', but the input is at EOF. That's why it says there is no viable alternative.
$ grun Q2 equal -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
If 78 is the only token, do the same reasoning : first alternative equal -> add -> mul -> exponent -> expr -> NUM => OK, consumes 78, back to equal. The alternative is not an option. Satisfied ? oops, what about EOF.
Now let's add a NUM alt to equal :
equal
: add # add1
| NAME '=' equal # assign
| NUM '=' equal # assignNum
;
$ grun Q2 equal -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
line 1:2 no viable alternative at input '78'
First alternative equal -> add -> mul -> exponent -> expr -> NUM => OK, consumes 78, back to equal. Now there is also an alt for NUM, starts again, this time using the alt | NUM '='. NUM consumes the token 78,
then the parser requires '=', but the input is at EOF, hence the message.
Now let's add a new rule with EOF and let's run the grammar from all :
all : equal EOF ;
$ grun Q2 all -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
$ grun Q2 all -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,8:7='<EOF>',<EOF>,1:8]
The input corresponds to the grammar, and there is no more message.
Although I can't answer your question about why the parser can't reach NAME in expr I'd like to point out that with Antlr4 you can use direct left recursion in your rule specification which makes your grammar more compact and omproves readability.
With that in mind your grammar could be rewritten as
math:
assignment
| expression
;
assignment:
ID '=' (assignment | expression)
;
expression:
expression '^' expression
| expression ('*' | '/') expression
| expression ('+' | '-') expression
| NAME
| NUM
;
That grammar hapily takes a NAME as part of an expression so I guess it would solve your problem.
If you're really interested in why it didn't work with your grammar then I'd first check if the lexer has matched the input into the expected tokens. Afterwards I would have a look at the parse tree to see what the parser is making of the given token sequence and then trying to do the parsing manually accoding to your grammar and during that you should be able to find the point at which the parser does something different from what you'd expect it to do.
I am defining a grammar in ANTLR that will express an expression which includes logical operator and parenthesis together.
Here is the grammar
grammar simpleGrammar;
/* This will be the entry point of the parser. */
parse
:
expression EOF
;
expression
:
expression binOp expression | ID | unOp (expression) | '(' expression ')'
;
binOp
:
('AND' | 'OR')
;
unOp
:
'NOT'
;
ID :
('a'..'z' | 'A'..'Z')+
;
The defined grammar can able to express parse tree without parenthesis but when I input an example with parenthesis for example, (Apple OR Bananana)AND Orange
It is showing MismatchedTokenException
So, It will be really appreciated if someone explains how to define the grammar in order to express the parenthesis.
You forgot to tell ANTLR what to do with whitespace. For example:
WS : [ \t\r\n] -> skip;
Add this and you grammar will work.
As a side note, your grammar has the same precedence for the AND and OR operators. And these operators have higher precedence than NOT. As this goes against conventional rules, I'd advise you to write your expression rule like this instead:
expression
: '(' expression ')' # parenExp
| 'NOT' expression # notExpr
| expression 'AND' expression # andExpr
| expression 'OR' expression # orExpr
| ID # atomExpr
;
I would like to be able to write a "meta-rule" in ANTLR4 that takes a rule as an input argument and performs a set modification to that rule. Here's an example grammar:
grammar G;
WS: [ \t\n\r] + -> skip;
CHAR: [a-z];
term: (CHAR)+;
sum: term ('+' term)+;
pterm: '(' term ')' | '(' pterm ')';
psum: '(' sum ')' | '(' psum ')';
expr: term | sum | pterm | psum;
The rules for pterm and psum perform the same action on term and sum, enclosing them in possibly nested parentheses. I would like to be able to replace the last three lines above with something like the following:
enclose[rule]: '(' rule ')' | '(' enclose(rule) ')';
expr: term | sum | enclose(term) | enclose(sum);
Is there a way to construct a meta-rule like this?
The short answer is, no.
Better to resolve by refactoring the grammar and identifying the structurally significant terms:
expr: LPAREN sum RPAREN | LPAREN expr RPAREN ;
sum : term ('+' term)* ; // changed to Kleene star
term: CHAR+ ;
LPAREN : '(' ;
RPAREN : ')' ;
CHAR : [a-z] ;
WS : [ \t\n\r]+ -> skip ;
The sum rule will consume all terms, so the expr rule only needs to handle sums.