ANTLR 4 mismatched input on parsing

I'm taking my first steps with ANTLR 4 in IntelliJ. I am trying to create a simple recursive climbing parser for mathematical expressions. I get this error:
line 1:0 mismatched input '3' expecting {VARIABLE; REALNUM, INTNUM}
It seems the lexer does not correctly turn the 3 into the token the parser expects, but I cannot find the problem there.
Lexer:
lexer grammar testLexer;
PLUS: '+';
MINUS: '-';
TIMES: '*';
DIV: '/';
SIN: 'sin'|'Sin'|'SIN';
COS: 'cos'|'Cos'|'COS';
TAN: 'tan'|'Tan'|'TAN';
LN: 'ln'|'LN'|'Ln';
LOG: 'Log'|'log'|'LOG';
SQRT: 'sqrt'|'Sqrt'|'SQRT';
LBRACE: '(';
RBRACE: ')';
POW: '^';
SPACE: ' ' -> skip;
EQUAL: '=';
VARIABLE: [a-zA-Z][a-zA-Z0-9]*;
INTNUM: [0-9]+;
REALNUM: [0-9]+[,|.][0-9]+;
WS: [\r\t\n]+ -> skip;
SEMICOLON: ';';
Parser:
parser grammar testParser;
expression returns [double value]
: exp=additiveExpression {$value = $exp.value;};
equalityExpression returns [double value]
: m1 = additiveExpression (EQUAL additiveExpression)* {$value = $m1.value;};
additiveExpression returns [double value]
: m2 = multiplikativeExpression {$value = $m2.value;}
(PLUS m1=multiplikativeExpression {$value += $m1.value;}
|MINUS m1=multiplikativeExpression {$value -= $m1.value;}
)* ;
multiplikativeExpression returns [double value]
: m3 = powExpression {$value = $m3.value;}
(TIMES powExpression {$value *= $m3.value;}
|DIV powExpression {$value /= $m3.value;}
)* ;
powExpression returns [double value]
: (bracedExpression)
(POW (m4=expression) {$value = Math.pow($value, $m4.value);}
)*;
bracedExpression returns [double value]
: (LBRACE m5 = expression RBRACE {$value = $m5.value;}
|LBRACE m6 = unaryExpression RBRACE {$value = $m6.value;}
| m7 =unaryExpression {$value = $m7.value;});
unaryExpression returns [double value]
: m7= atomExpression {$value = $m7.value;}
| (SIN m6=bracedExpression {$value = Math.sin($m6.value);}
|COS m6=bracedExpression {$value = Math.cos($m6.value);}
|TAN m6=bracedExpression {$value = Math.tan($m6.value);}
|LOG m6=bracedExpression {$value = Math.log($m6.value);}
|SQRT m6=bracedExpression {$value = Math.sqrt($m6.value);}
)
|EOF;
atomExpression returns [double value]
: VARIABLE {$value = 1;}
|m7 = REALNUM {$value = Double.parseDouble($m7.text);}
| m7 = INTNUM {$value = Integer.parseInt($m7.text);};
The input is just the simple term 3, but the error also occurs on longer input strings like 2+3.

Your example lexes just fine for me. Here is the TestRig output with -tokens turned on:
C:\prj\ANTLR_SO_BENCH\Bench>java org.antlr.v4.gui.TestRig Grammar1 program -tokens SOURCE.txt
[#0,0:0='3',<INTNUM>,1:0]
[#1,1:1='+',<'+'>,1:1]
[#2,2:2='2',<INTNUM>,1:2]
[#3,5:4='<EOF>',<EOF>,2:0]
And REALNUM tokens work too:
C:\prj\ANTLR_SO_BENCH\Bench>java org.antlr.v4.gui.TestRig Grammar1 program -tokens SOURCE.txt
[#0,0:3='3.14',<REALNUM>,1:0]
[#1,5:5='*',<'*'>,1:5]
[#2,7:9='2.1',<REALNUM>,1:7]
[#3,12:11='<EOF>',<EOF>,2:0]
So I'm not sure anymore what to recommend based on your question.

Related

ANTLR4 precedence of operator

This is my grammar:
grammar FOOL;
@header {
import java.util.ArrayList;
}
@lexer::members {
public ArrayList<String> lexicalErrors = new ArrayList<>();
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
prog : exp SEMIC #singleExp
| let exp SEMIC #letInExp
| (classdec)+ SEMIC (let)? exp SEMIC #classExp
;
classdec : CLASS ID ( EXTENDS ID )? (LPAR (vardec ( COMMA vardec)*)? RPAR)? (CLPAR ((fun SEMIC)+)? CRPAR)?;
let : LET (dec SEMIC)+ IN ;
vardec : type ID ;
varasm : vardec ASM exp ;
fun : type ID LPAR ( vardec ( COMMA vardec)* )? RPAR (let)? exp ;
dec : varasm #varAssignment
| fun #funDeclaration
;
type : INT
| BOOL
| ID
;
exp : left=term (operator=(PLUS | MINUS) right=term)*
;
term : left=factor (operator=(TIMES | DIV) right=factor)*
;
factor : left=value (operator=(EQ | LESSEQ | GREATEREQ | GREATER | LESS | AND | OR ) right=value)*
;
value : MINUS?INTEGER #intVal
| (NOT)? ( TRUE | FALSE ) #boolVal
| LPAR exp RPAR #baseExp
| IF cond=exp THEN CLPAR thenBranch=exp CRPAR (ELSE CLPAR elseBranch=exp CRPAR)? #ifExp
| MINUS?ID #varExp
| THIS #thisExp
| funcall #funExp
| (ID | THIS) DOT funcall #methodExp
| NEW ID ( LPAR (exp (COMMA exp)* )? RPAR)? #newExp
| PRINT ( exp ) #print
;
/* PRINT LPAR exp RPAR */
funcall
: ID ( LPAR (exp (COMMA exp)* )? RPAR )
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
SEMIC : ';' ;
COLON : ':' ;
COMMA : ',' ;
EQ : '==' ;
ASM : '=' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
TRUE : 'true' ;
FALSE : 'false' ;
LPAR : '(' ;
RPAR : ')' ;
CLPAR : '{' ;
CRPAR : '}' ;
IF : 'if' ;
THEN : 'then' ;
ELSE : 'else' ;
PRINT : 'print' ;
LET : 'let' ;
IN : 'in' ;
VAR : 'var' ;
FUN : 'fun' ;
INT : 'int' ;
BOOL : 'bool' ;
CLASS : 'class' ;
EXTENDS : 'extends' ;
THIS : 'this' ;
NEW : 'new' ;
DOT : '.' ;
LESSEQ : ('<=' | '=<') ;
GREATEREQ : ('>=' | '=>') ;
GREATER: '>' ;
LESS : '<' ;
AND : '&&' ;
OR : '||' ;
NOT : '!' ;
//Numbers
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+;
//IDs
fragment CHAR : 'a'..'z' |'A'..'Z' ;
ID : CHAR (CHAR | DIGIT)* ;
//ESCAPED SEQUENCES
WS : (' '|'\t'|'\n'|'\r')-> skip;
LINECOMENTS : '//' (~('\n'|'\r'))* -> skip;
BLOCKCOMENTS : '/*'( ~('/'|'*')|'/'~'*'|'*'~'/'|BLOCKCOMENTS)* '*/' -> skip;
ERR_UNKNOWN_CHAR
: . { lexicalErrors.add("UNKNOWN_CHAR " + getText()); }
;
I think there is a problem in the grammar concerning operator precedence.
In particular, this program
let
int x = (5-2)+4;
in
print x;
prints 7, while this one:
let
int x = 5-2+4;
in
print x;
prints 9.
Why does the first one work? How can I make the second one work by changing only the grammar?
I think something needs to change in exp, term or factor.
This is the first parse tree http://it.tinypic.com/r/2nj8tqw/9 .
This is the second parse tree http://it.tinypic.com/r/2iv02z6/9 .
exp : left=term (operator=(PLUS | MINUS) right=exp)?
This right-recursive rule produces the parse tree that causes the problem. Simply put, 5 - 2 + 4 is grouped as 5 - (2 + 4):
term MINUS exp
5          term PLUS exp
           2         term
                     4
This should help, although you'll have to change the evaluation logic:
exp : left=term (operator=(PLUS | MINUS) right=term)*
Same for factor and any other possible binary operations.
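The difference between the two rule shapes can be illustrated outside ANTLR. The following is a minimal Java sketch, not the asker's evaluation code (which is not shown): the class AssocDemo and its methods are hypothetical names, and the evaluator is deliberately naive. It folds the same operands once left to right, as the `(operator term)*` form suggests, and once right-recursively, as the `right=exp` form does.

```java
import java.util.List;

// Hypothetical demo class; the names are illustrative and not part of the FOOL grammar.
class AssocDemo {
    // Left-associative fold, mirroring `exp : term ((PLUS | MINUS) term)*`.
    // `operands` must hold exactly one more element than `operators`.
    static int evalLeft(List<Integer> operands, List<Character> operators) {
        int value = operands.get(0);
        for (int i = 0; i < operators.size(); i++) {
            int right = operands.get(i + 1);
            value = operators.get(i) == '+' ? value + right : value - right;
        }
        return value;
    }

    // Right-recursive evaluation, mirroring `exp : term ((PLUS | MINUS) exp)?`.
    static int evalRight(List<Integer> operands, List<Character> operators, int index) {
        int left = operands.get(index);
        if (index == operators.size()) {
            return left; // last operand, nothing follows
        }
        int right = evalRight(operands, operators, index + 1);
        return operators.get(index) == '+' ? left + right : left - right;
    }

    public static void main(String[] args) {
        List<Integer> operands = List.of(5, 2, 4);
        List<Character> operators = List.of('-', '+');
        System.out.println(evalLeft(operands, operators));     // groups as (5 - 2) + 4
        System.out.println(evalRight(operands, operators, 0)); // groups as 5 - (2 + 4)
    }
}
```

With a real visitor or listener the numbers depend on how each node is evaluated, so the output of an actual FOOL interpreter may differ from this sketch; the grouping is what matters.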

ANTLR won't parse this easy input for simple calculator grammar

grammar TestCSharpParser;
options {
language=CSharp3;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse returns [double value]
: exp EOF {$value = $exp.value;}
;
exp returns [double value]
: addExp {$value = $addExp.value;}
;
addExp returns [double value]
: a=mulExp {$value = $a.value;}
( '+' b=mulExp {$value += $b.value;}
| '-' b=mulExp {$value -= $b.value;}
)*
;
mulExp returns [double value]
: a=unaryExp {$value = $a.value;}
( '*' b=unaryExp {$value *= $b.value;}
| '/' b=unaryExp {$value /= $b.value;}
)*
;
unaryExp returns [double value]
: '-' atom {$value = -1.0 * $atom.value;}
| atom {$value = $atom.value;}
;
atom returns [double value]
: Number {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
| '(' exp ')' {$value = $exp.value;}
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){$channel = HIDDEN;}
;
The grammar won't parse the simple input 4/5 or (4/5); I tried this using ANTLRWorks.
Does anyone have any idea why this is happening? To my mind this should work correctly.
It keeps giving me a NoViableAltException.
I see several problems related to the use of the CSharp3 target.
The CSharp2 and CSharp3 targets define the constant Hidden instead of HIDDEN, so the action in the Space rule should read {$channel = Hidden;}.
ANTLRWorks cannot be used to generate parsers for grammars targeting the CSharp2 or CSharp3 targets. The parser must be generated either by MSBuild (preferred) or by using Antlr3.exe. These are documented on the ANTLR 3 C# Releases wiki page.
ANTLRWorks cannot be used to test parsers generated for the CSharp2 or CSharp3 targets. Any results reported by the interpreter or debugger cannot be trusted.

antlr unparsed tokens at the end does not generate error

Getting an error is usually unpleasant, but sometimes it is also unpleasant when you expect one and do not get it. My parser does not generate an error for the string "2)". Can you suggest a solution?
grammar BasicArithmetic;
options {
language = Java;
output = AST;
}
expression returns [double value]:
p1=pm{$value=$pm.value;};
// never never reference FRAGMENTS from parsers
pm returns [double value]:
p1=dm{$value = $p1.value;}
(PLUS^p2=dm{$value += $p2.value;}|
MINUS^p2=dm{$value -= $p2.value;}
)*;
dm returns [double value]:
p1=atom {$value = $p1.value;}
( DIV^ p2=atom {$value /= $p2.value;}|
MUL^ p2=atom {$value *= $p2.value;}|
POW^ p2=atom {$value = Math.pow($value, $p2.value);}
)*;
atom returns [double value]:
p1=Number {$value = Double.parseDouble($p1.text);}
| LP p2=pm RP{$value = $p2.value;};
Number: Digit+;
MUL : '*';
DIV : '/';
PLUS : '+';
MINUS : '-';
POW : '^';
LP : '(';
RP : ')';
fragment Digit:'0'..'9';
WS :('\t'| ' '| '\r'| '\n'| '\u000C')+{$channel = HIDDEN;};
You need to change your grammar to specify that you expect an EOF token after your top-level rule finishes:
expression returns [double value]:
p1=pm EOF {$value=$pm.value;};
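The same effect can be seen in a hand-written parser. The following is a minimal Java sketch, assuming a much smaller grammar than the one in the question (only numbers, parentheses, + and -); EofDemo, parse and the requireEof flag are hypothetical names introduced for illustration. Without the end-of-input check, the parser accepts the longest valid prefix and never looks at what follows it.

```java
// Hypothetical hand-written parser, only to illustrate why the EOF check matters.
class EofDemo {
    private final String input;
    private int pos = 0;

    EofDemo(String input) { this.input = input; }

    // atom : Number | '(' sum ')'
    private double atom() {
        if (pos < input.length() && input.charAt(pos) == '(') {
            pos++;
            double value = sum();
            expect(')');
            return value;
        }
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) throw new IllegalStateException("number expected at " + pos);
        return Double.parseDouble(input.substring(start, pos));
    }

    // sum : atom (('+' | '-') atom)*
    private double sum() {
        double value = atom();
        while (pos < input.length() && (input.charAt(pos) == '+' || input.charAt(pos) == '-')) {
            char op = input.charAt(pos++);
            double right = atom();
            value = op == '+' ? value + right : value - right;
        }
        return value;
    }

    private void expect(char c) {
        if (pos >= input.length() || input.charAt(pos) != c) {
            throw new IllegalStateException("'" + c + "' expected at " + pos);
        }
        pos++;
    }

    // With requireEof == false the parser stops after the longest valid prefix,
    // which is what a top-level rule without EOF does.
    static double parse(String text, boolean requireEof) {
        EofDemo parser = new EofDemo(text);
        double value = parser.sum();
        if (requireEof && parser.pos != text.length()) {
            throw new IllegalStateException("unexpected input at " + parser.pos);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(parse("2)", false)); // the trailing ')' is silently ignored
        try {
            parse("2)", true);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

The sketch does not skip whitespace, so inputs need to be written without spaces.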

Error generating files in ANTLR

I'm trying to write a parser in ANTLR. This is my first time using it, and I'm running into a problem that I can't find a solution for; apologies if this is a very simple problem. The error I'm getting is:
"(100): Expr.g:1:13:syntax error: antlr: MismatchedTokenException(74!=52)"
The code I'm currently using is:
grammar Expr.g;
options{
output=AST;
}
tokens{
MAIN = 'main';
OPENBRACKET = '(';
CLOSEBRACKET = ')';
OPENCURLYBRACKET = '{';
CLOSECURLYBRACKET = '}';
COMMA = ',';
SEMICOLON = ';';
GREATERTHAN = '>';
LESSTHAN = '<';
GREATEROREQUALTHAN = '>=';
LESSTHANOREQUALTHAN = '<=';
NOTEQUAL = '!=';
ISEQUALTO = '==';
WHILE = 'while';
IF = 'if';
ELSE = 'else';
READ = 'read';
OUTPUT = 'output';
PRINT = 'print';
RETURN = 'return';
READC = 'readc';
OUTPUTC = 'outputc';
PLUS = '+';
MINUS = '-';
DIVIDE = '/';
MULTIPLY = '*';
PERCENTAGE = '%';
}
@header {
//package test;
import java.util.HashMap;
}
@lexer::header {
//package test;
}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr NEWLINE {System.out.println($expr.value);}
| ID '=' expr NEWLINE
{memory.put($ID.text, new Integer($expr.value));}
| NEWLINE
;
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*
;
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
atom returns [int value]
: INT {$value = Integer.parseInt($INT.text);}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| '(' e=expr ')' {$value = $e.value;}
;
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
Thanks for any help.
EDIT: Well, I'm an idiot, it's just a formatting error. Thanks for the responses from those who helped out.
You have some illegal characters after your IDENT token:
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
The : .; are invalid there. And you're also trying to mix the tree-rewrite operator ^ inside a lexer rule, which is illegal: remove them. Lastly, you've named it IDENT while in your parser rules, you're using ID.
It should be:
ID : ('a'..'z' | 'A'..'Z')+ ;

Assignment as expression in Antlr grammar

I'm trying to extend the grammar of the Tiny Language to treat assignment as expression. Thus it would be valid to write
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
a = 1 = 2; // invalid
Assignment differs from other operators in two respects. It is right associative (not a big deal), and its left-hand side has to be a variable. So I changed the grammar like this:
statement: assignmentExpr | functionCall ...;
assignmentExpr: Identifier indexes? '=' expression;
expression: assignmentExpr | condExpr;
It doesn't work, because it contains a non-LL(*) decision. I also tried this variant:
assignmentExpr: Identifier indexes? '=' (expression | condExpr);
but I got the same error. I am interested in:
This specific question
Given a grammar with a non-LL(*) decision, how to find the two paths that cause the problem
How to fix it
I think you can change your grammar like this to achieve the same, without using syntactic predicates:
statement: expr ';' | functionCall ';'...;
expr: Identifier indexes? '=' expr | condExpr ;
condExpr: .... and so on;
I altered Bart's example with this idea in mind:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
}
parse
: stat+ EOF -> ^(ROOT stat+)
;
stat
: expr ';'
;
expr
: Id Assign expr -> ^(Assign Id expr)
| add
;
add
: mult (('+' | '-')^ mult)*
;
mult
: atom (('*' | '/')^ atom)*
;
atom
: Id
| Num
| '('! expr ')' !
;
Assign : '=' ;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
And for the input:
a=b=4;
a = 2 * (b = 1);
you get the following parse tree:
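The right-associative grouping produced by the rule expr : Id Assign expr | add can also be checked by hand. The following is a rough Java sketch of that rule as a hand-written recursive descent evaluator; AssignDemo and its members are hypothetical names, and it uses simple backtracking instead of anything ANTLR generates, so treat it as an illustration of the grouping only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hand-written evaluator; class and method names are illustrative only.
class AssignDemo {
    final Map<String, Integer> vars = new HashMap<>();
    private String src;
    private int pos;

    // Evaluates one expression (without the trailing ';') and returns its value.
    int eval(String expression) {
        src = expression.replaceAll("\\s+", "");
        pos = 0;
        int value = expr();
        if (pos != src.length()) throw new IllegalStateException("unexpected input at " + pos);
        return value;
    }

    private static boolean isLetter(char c) { return c >= 'a' && c <= 'z'; }

    // expr : Id '=' expr | add   (right-recursive, so assignment is right associative)
    private int expr() {
        int start = pos;
        while (pos < src.length() && isLetter(src.charAt(pos))) pos++;
        if (pos > start && pos < src.length() && src.charAt(pos) == '=') {
            String name = src.substring(start, pos);
            pos++; // consume '='
            int value = expr();
            vars.put(name, value);
            return value;
        }
        pos = start; // not an assignment: backtrack and parse as add
        return add();
    }

    // add : mult (('+' | '-') mult)*
    private int add() {
        int value = mult();
        while (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) {
            char op = src.charAt(pos++);
            int right = mult();
            value = op == '+' ? value + right : value - right;
        }
        return value;
    }

    // mult : atom (('*' | '/') atom)*
    private int mult() {
        int value = atom();
        while (pos < src.length() && (src.charAt(pos) == '*' || src.charAt(pos) == '/')) {
            char op = src.charAt(pos++);
            int right = atom();
            value = op == '*' ? value * right : value / right;
        }
        return value;
    }

    // atom : Id | Num | '(' expr ')'
    private int atom() {
        if (pos >= src.length()) throw new IllegalStateException("unexpected end of input");
        char c = src.charAt(pos);
        if (c == '(') {
            pos++;
            int value = expr();
            if (pos >= src.length() || src.charAt(pos) != ')') {
                throw new IllegalStateException("')' expected at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        if (c >= '0' && c <= '9') {
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return Integer.parseInt(src.substring(start, pos));
        }
        while (pos < src.length() && isLetter(src.charAt(pos))) pos++;
        if (pos == start) throw new IllegalStateException("unexpected '" + c + "' at " + pos);
        String name = src.substring(start, pos);
        Integer value = vars.get(name);
        if (value == null) throw new IllegalStateException("undefined variable " + name);
        return value;
    }
}
```

In this sketch, eval("a = b = 4") assigns 4 to both variables, eval("a = 2 * (b = 1)") assigns 1 to b and 2 to a, and eval("a = 1 = 2") is rejected because the second '=' is left over once 1 has been parsed as an expression.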
The key here is that you need to "assure" the parser that inside an expression, there is something ahead that satisfies the expression. This can be done using a syntactic predicate (the ( ... )=> parts in the add and mult rules).
A quick demo:
grammar TL;
options {
output=AST;
}
tokens {
ROOT;
ASSIGN;
}
parse
: stat* EOF -> ^(ROOT stat*)
;
stat
: expr ';' -> expr
;
expr
: add
;
add
: mult ((('+' | '-') mult)=> ('+' | '-')^ mult)*
;
mult
: atom ((('*' | '/') atom)=> ('*' | '/')^ atom)*
;
atom
: (Id -> Id) ('=' expr -> ^(ASSIGN Id expr))?
| Num
| '(' expr ')' -> expr
;
Comment : '//' ~('\r' | '\n')* {skip();};
Id : 'a'..'z'+;
Num : '0'..'9'+;
Space : (' ' | '\t' | '\r' | '\n')+ {skip();};
which will parse the input:
a = b = 1; // -> a = (b = 1)
a = 2 * (b = 1); // contrived but valid
into the following AST:
