Error generating files in ANTLR - parsing

I'm trying to write a parser in ANTLR. This is my first time using it, and I'm running into a problem that I can't find a solution for; apologies if it is a very simple one. The error I'm getting is:
"(100): Expr.g:1:13:syntax error: antlr: MismatchedTokenException(74!=52)"
The code I'm currently using is:
grammar Expr.g;
options{
output=AST;
}
tokens{
MAIN = 'main';
OPENBRACKET = '(';
CLOSEBRACKET = ')';
OPENCURLYBRACKET = '{';
CLOSECURLYBRACKET = '}';
COMMA = ',';
SEMICOLON = ';';
GREATERTHAN = '>';
LESSTHAN = '<';
GREATEROREQUALTHAN = '>=';
LESSTHANOREQUALTHAN = '<=';
NOTEQUAL = '!=';
ISEQUALTO = '==';
WHILE = 'while';
IF = 'if';
ELSE = 'else';
READ = 'read';
OUTPUT = 'output';
PRINT = 'print';
RETURN = 'return';
READC = 'readc';
OUTPUTC = 'outputc';
PLUS = '+';
MINUS = '-';
DIVIDE = '/';
MULTIPLY = '*';
PERCENTAGE = '%';
}
@header {
//package test;
import java.util.HashMap;
}
@lexer::header {
//package test;
}
@members {
/** Map variable name to Integer object holding value */
HashMap memory = new HashMap();
}
prog: stat+ ;
stat: expr NEWLINE {System.out.println($expr.value);}
| ID '=' expr NEWLINE
{memory.put($ID.text, new Integer($expr.value));}
| NEWLINE
;
expr returns [int value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += $e.value;}
| '-' e=multExpr {$value -= $e.value;}
)*
;
multExpr returns [int value]
: e=atom {$value = $e.value;} ('*' e=atom {$value *= $e.value;})*
;
atom returns [int value]
: INT {$value = Integer.parseInt($INT.text);}
| ID
{
Integer v = (Integer)memory.get($ID.text);
if ( v!=null ) $value = v.intValue();
else System.err.println("undefined variable "+$ID.text);
}
| '(' e=expr ')' {$value = $e.value;}
;
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
Thanks for any help.
EDIT: Well, I'm an idiot, it's just a formatting error. Thanks for the responses from those who helped out.

You have some illegal characters after your IDENT token:
IDENT : ('a'..'z'^|'A'..'Z'^)+ ; : .;
The trailing : .; is invalid there. You're also using the tree-rewrite operator ^ inside a lexer rule, which is illegal: remove both occurrences. Lastly, you've named the rule IDENT, while your parser rules refer to ID.
It should be:
ID : ('a'..'z' | 'A'..'Z')+ ;

Related

Need help starting with Tatsu to parse grammar

I am getting a Tatsu error
"tatsu.exceptions.FailedExpectingEndOfText: (1:1) Expecting end of text"
running a test, using a grammar I supplied - it is not clear what the problem is.
In essence, the statement calling the parser is:
ast = parse(GRAMMAR, '(instance ?FIFI Dog)')
The whole python file follows:
GRAMMAR = """
@@grammar::SUOKIF
KIF = {KIFexpression}* $ ;
WHITESPACE = /\s+/ ;
StringLiteral = /['"'][A-Za-z]+['"']/ ;
NumericLiteral = /[0-9]+/ ;
Identifier = /[A-Za-z]+/ ;
LPAREN = "(" ;
RPAREN = ")" ;
QUESTION = "?" ;
MENTION = "#" ;
EQUALS = "=" ;
RARROW = ">" ;
LARROW = "<" ;
NOT = "not"|"NOT" ;
OR = "or"|"OR" ;
AND = "and"|"AND" ;
FORALL = "forall"|"FORALL" ;
EXISTS = "exists"|"EXISTS" ;
STRINGLITERAL = {StringLiteral} ;
NUMERICLITERAL = {NumericLiteral} ;
IDENTIFIER = {Identifier} ;
KIFexpression
= Word
| Variable
| String
| Number
| Sentence
;
Sentence = Equation
| RelSent
| LogicSent
| QuantSent
;
LogicSent
= Negation
| Disjunction
| Conjunction
| Implication
| Equivalence
;
QuantSent
= UniversalSent
| ExistentialSent
;
Word = IDENTIFIER ;
Variable = ( QUESTION | MENTION ) IDENTIFIER ;
String = STRINGLITERAL ;
Number = NUMERICLITERAL ;
ArgumentList
= {KIFexpression}*
;
VariableList
= {Variable}+
;
Equation = LPAREN EQUALS KIFexpression KIFexpression RPAREN ;
RelSent = LPAREN ( Variable | Word ) ArgumentList RPAREN ;
Negation = LPAREN NOT KIFexpression RPAREN ;
Disjunction
= LPAREN OR ArgumentList RPAREN
;
Conjunction
= LPAREN AND ArgumentList RPAREN
;
Implication
= LPAREN EQUALS RARROW KIFexpression KIFexpression RPAREN
;
Equivalence
= LPAREN LARROW EQUALS RARROW KIFexpression KIFexpression RPAREN
;
UniversalSent
= LPAREN FORALL LPAREN VariableList RPAREN KIFexpression RPAREN
;
ExistentialSent
= LPAREN EXISTS LPAREN VariableList RPAREN KIFexpression RPAREN
;
"""
if __name__ == '__main__':
    import pprint
    import json
    from tatsu import parse
    from tatsu.util import asjson
    ast = parse(GRAMMAR, '(instance ?FIFI Dog)')
    print('# PPRINT')
    pprint.pprint(ast, indent=2, width=20)
    print()
    print('# JSON')
    print(json.dumps(asjson(ast), indent=2))
    print()
Can anyone help me with a fix?
Thanks.
Colin Goldberg
I can see two problems with that grammar.
As the TatSu documentation notes, rule names that start with an uppercase character have a special meaning: such rules do not skip whitespace before they start parsing. Change all the rule names to lowercase.
Also let's review IDENTIFIER rule:
IDENTIFIER = {Identifier} ;
This means that an identifier may be repeated any number of times, or may be missing altogether, so the rule can succeed without consuming any input. Remove the closure by defining IDENTIFIER directly:
IDENTIFIER = /[A-Za-z]+/ ;
You can do the same for NUMERICLITERAL and STRINGLITERAL.
When I did those steps, the expression could be parsed.
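To illustrate why the closure matters, here is a minimal pure-Python sketch (it does not use TatSu; the `regex` and `closure` helpers are hypothetical names written for this example). It assumes that a zero-or-more closure behaves as described above: it always succeeds, even when nothing matches.

```python
import re

def regex(pattern):
    """Return a parser that matches `pattern` at `pos`, or returns None on failure."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse

def closure(parser):
    """Zero-or-more repetition: always succeeds, possibly consuming nothing."""
    def parse(text, pos):
        items = []
        while True:
            result = parser(text, pos)
            if result is None or result[1] == pos:  # failure, or no progress
                break
            items.append(result[0])
            pos = result[1]
        return items, pos
    return parse

identifier = regex(r"[A-Za-z]+")          # like IDENTIFIER = /[A-Za-z]+/ ;
identifier_closure = closure(identifier)  # like IDENTIFIER = {Identifier} ;

text = "(instance ?FIFI Dog)"
closure_result = identifier_closure(text, 0)
direct_result = identifier(text, 0)
print(closure_result)  # ([], 0): "succeeds" without consuming the "("
print(direct_result)   # None: fails, so another alternative can be tried
```

With the closure version, the first alternative that reaches IDENTIFIER succeeds on an empty match at position 1:1, after which the parser expects the end of the text, which is consistent with the reported error.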
You need to pass the name of the "start" symbol to parse().
You can also define:
start = KIF ;
in the grammar.

ANTLR 4 mismatched input on parsing

I'm taking my first steps with ANTLR 4 in IntelliJ. I am trying to create a simple recursive climbing parser for mathematical expressions. I get the error
line 1:0 mismatched input '3' expecting {VARIABLE; REALNUM, INTNUM}
It seems the lexer does not correctly turn the 3 into the token the parser expects, but I cannot find the problem there.
Lexer:
lexer grammar testLexer;
PLUS: '+';
MINUS: '-';
TIMES: '*';
DIV: '/';
SIN: 'sin'|'Sin'|'SIN';
COS: 'cos'|'Cos'|'COS';
TAN: 'tan'|'Tan'|'TAN';
LN: 'ln'|'LN'|'Ln';
LOG: 'Log'|'log'|'LOG';
SQRT: 'sqrt'|'Sqrt'|'SQRT';
LBRACE: '(';
RBRACE: ')';
POW: '^';
SPACE: ' ' -> skip;
EQUAL: '=';
VARIABLE: [a-zA-Z][a-zA-Z0-9]*;
INTNUM: [0-9]+;
REALNUM: [0-9]+[,|.][0-9]+;
WS: [\r\t\n]+ -> skip;
SEMICOLON: ';';
Parser:
parser grammar testParser;
expression returns [double value]
: exp=additiveExpression {$value = $exp.value;};
equalityExpression returns [double value]
: m1 = additiveExpression (EQUAL additiveExpression)* {$value = $m1.value;};
additiveExpression returns [double value]
: m2 = multiplikativeExpression {$value = $m2.value;}
(PLUS m1=multiplikativeExpression {$value += $m1.value;}
|MINUS m1=multiplikativeExpression {$value -= $m1.value;}
)* ;
multiplikativeExpression returns [double value]
: m3 = powExpression {$value = $m3.value;}
(TIMES powExpression {$value *= $m3.value;}
|DIV powExpression {$value /= $m3.value;}
)* ;
powExpression returns [double value]
: (bracedExpression)
(POW (m4=expression) {$value = Math.pow($value, $m4.value);}
)*;
bracedExpression returns [double value]
: (LBRACE m5 = expression RBRACE {$value = $m5.value;}
|LBRACE m6 = unaryExpression RBRACE {$value = $m6.value;}
| m7 =unaryExpression {$value = $m7.value;});
unaryExpression returns [double value]
: m7= atomExpression {$value = $m7.value;}
| (SIN m6=bracedExpression {$value = Math.sin($m6.value);}
|COS m6=bracedExpression {$value = Math.cos($m6.value);}
|TAN m6=bracedExpression {$value = Math.tan($m6.value);}
|LOG m6=bracedExpression {$value = Math.log($m6.value);}
|SQRT m6=bracedExpression {$value = Math.sqrt($m6.value);}
)
|EOF;
atomExpression returns [double value]
: VARIABLE {$value = 1;}
|m7 = REALNUM {$value = Double.parseDouble($m7.text);}
| m7 = INTNUM {$value = Integer.parseInt($m7.text);};
The input is just the simple term 3, but the error also occurs on longer input strings like 2+3.
Your example lexes just fine for me. Here is the TestRig output with -tokens turned on:
C:\prj\ANTLR_SO_BENCH\Bench>java org.antlr.v4.gui.TestRig Grammar1 program -tokens SOURCE.txt
[@0,0:0='3',<INTNUM>,1:0]
[@1,1:1='+',<'+'>,1:1]
[@2,2:2='2',<INTNUM>,1:2]
[@3,5:4='<EOF>',<EOF>,2:0]
And REALNUM tokens work too:
C:\prj\ANTLR_SO_BENCH\Bench>java org.antlr.v4.gui.TestRig Grammar1 program -tokens SOURCE.txt
[@0,0:3='3.14',<REALNUM>,1:0]
[@1,5:5='*',<'*'>,1:5]
[@2,7:9='2.1',<REALNUM>,1:7]
[@3,12:11='<EOF>',<EOF>,2:0]
So I'm not sure anymore what to recommend based on your question.
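The token dumps above are what you would expect from the way an ANTLR 4 lexer picks tokens: it takes the longest match at the current position and, on a tie, prefers the rule defined first. Below is a rough Python sketch of that selection over a subset of the question's lexer rules. The `RULES` table and the `tokenize` helper are illustrative names of my own, not ANTLR API, and the regular expressions are hand translations of the grammar.

```python
import re

# (name, pattern) pairs in the same order as in the lexer grammar (subset only).
RULES = [
    ("PLUS", r"\+"), ("MINUS", r"-"), ("TIMES", r"\*"), ("DIV", r"/"),
    ("SPACE", r" "),
    ("VARIABLE", r"[a-zA-Z][a-zA-Z0-9]*"),
    ("INTNUM", r"[0-9]+"),
    ("REALNUM", r"[0-9]+[,|.][0-9]+"),
    ("WS", r"[\r\t\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]
SKIPPED = {"SPACE", "WS"}  # rules marked "-> skip" in the grammar

def tokenize(text):
    """Longest match wins; on equal length the earlier rule is kept."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, compiled in COMPILED:
            m = compiled.match(text, pos)
            # Strictly greater, so an earlier rule survives a tie.
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"no rule matches at position {pos}")
        name, end = best
        if name not in SKIPPED:
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens

ints = tokenize("3+2")
reals = tokenize("3.14 * 2.1")
print(ints)
print(reals)
```

In this sketch a lone 3 can only ever come out as INTNUM, which agrees with the TestRig output.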

ANTLR 3 bug, mismatched input, but what's wrong?

I have the following problem:
My ANTLR 3 grammar compiles, but my simple test program doesn't work. The grammar is as follows:
grammar Rietse;
options {
k=1;
language=Java;
output=AST;
}
tokens {
COLON = ':' ;
SEMICOLON = ';' ;
OPAREN = '(' ;
CPAREN = ')' ;
COMMA = ',' ;
OCURLY = '{' ;
CCURLY = '}' ;
SINGLEQUOTE = '\'' ;
// operators
BECOMES = '=' ;
PLUS = '+' ;
MINUS = '-' ;
TIMES = '*' ;
DIVIDE = '/' ;
MODULO = '%' ;
EQUALS = '==' ;
LT = '<' ;
LTE = '<=' ;
GT = '>' ;
GTE = '>=' ;
UNEQUALS = '!=' ;
AND = '&&' ;
OR = '||' ;
NOT = '!' ;
// keywords
PROGRAM = 'program' ;
COMPOUND = 'compound' ;
UNARY = 'unary' ;
DECL = 'decl' ;
SDECL = 'sdecl' ;
STATIC = 'static' ;
PRINT = 'print' ;
READ = 'read' ;
IF = 'if' ;
THEN = 'then' ;
ELSE = 'else' ;
DO = 'do' ;
WHILE = 'while' ;
// types
INTEGER = 'int' ;
CHAR = 'char' ;
BOOLEAN = 'boolean' ;
TRUE = 'true' ;
FALSE = 'false' ;
}
@lexer::header {
package Eindopdracht;
}
@header {
package Eindopdracht;
}
// Parser rules
program
: program2 EOF
-> ^(PROGRAM program2)
;
program2
: (declaration* statement)+
;
declaration
: STATIC type IDENTIFIER SEMICOLON -> ^(SDECL type IDENTIFIER)
| type IDENTIFIER SEMICOLON -> ^(DECL type IDENTIFIER)
;
type
: INTEGER
| CHAR
| BOOLEAN
;
statement
: assignment_expr SEMICOLON!
| while_stat SEMICOLON!
| print_stat SEMICOLON!
| if_stat SEMICOLON!
| read_stat SEMICOLON!
;
while_stat
: WHILE^ OPAREN! or_expr CPAREN! OCURLY! statement+ CCURLY! // while (expression) {statement+}
;
print_stat
: PRINT^ OPAREN! or_expr (COMMA! or_expr)* CPAREN! // print(expression)
;
read_stat
: READ^ OPAREN! IDENTIFIER (COMMA! IDENTIFIER)+ CPAREN! // read(expression)
;
if_stat
: IF^ OPAREN! or_expr CPAREN! comp_expr (ELSE! comp_expr)? // if (expression) compound else compound
;
assignment_expr
: or_expr (BECOMES^ or_expr)*
;
or_expr
: and_expr (OR^ and_expr)*
;
and_expr
: compare_expr (AND^ compare_expr)*
;
compare_expr
: plusminus_expr ((LT|LTE|GT|GTE|EQUALS|UNEQUALS)^ plusminus_expr)?
;
plusminus_expr
: timesdivide_expr ((PLUS | MINUS)^ timesdivide_expr)*
;
timesdivide_expr
: unary_expr ((TIMES | DIVIDE | MODULO)^ unary_expr)*
;
unary_expr
: operand
| PLUS operand -> ^(UNARY PLUS operand)
| MINUS operand -> ^(UNARY MINUS operand)
| NOT operand -> ^(UNARY NOT operand)
;
operand
: TRUE
| FALSE
| charliteral
| IDENTIFIER
| NUMBER
| OPAREN! or_expr CPAREN!
;
comp_expr
: OCURLY program2 CCURLY -> ^(COMPOUND program2)
;
// Lexer rules
charliteral
: SINGLEQUOTE! LETTER SINGLEQUOTE!
;
IDENTIFIER
: LETTER (LETTER | DIGIT)*
;
NUMBER
: DIGIT+
;
COMMENT
: '//' .* '\n'
{ $channel=HIDDEN; }
;
WS
: (' ' | '\t' | '\f' | '\r' | '\n')+
{ $channel=HIDDEN; }
;
fragment DIGIT : ('0'..'9') ;
fragment LOWER : ('a'..'z') ;
fragment UPPER : ('A'..'Z') ;
fragment LETTER : LOWER | UPPER ;
// EOF
I then use the following java file to test programs:
package Package;
import java.io.FileInputStream;
import java.io.InputStream;
import org.antlr.runtime.ANTLRInputStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.tree.BufferedTreeNodeStream;
import org.antlr.runtime.tree.CommonTree;
import org.antlr.runtime.tree.CommonTreeNodeStream;
import org.antlr.runtime.tree.DOTTreeGenerator;
import org.antlr.runtime.tree.TreeNodeStream;
import org.antlr.stringtemplate.StringTemplate;
public class Rietse {
public static void main (String[] args)
{
String inputFile = args[0];
try {
InputStream in = inputFile == null ? System.in : new FileInputStream(inputFile);
RietseLexer lexer = new RietseLexer(new ANTLRInputStream(in));
CommonTokenStream tokens = new CommonTokenStream(lexer);
RietseParser parser = new RietseParser(tokens);
RietseParser.program_return result = parser.program();
} catch (RietseException e) {
System.err.print("ERROR: RietseException thrown by compiler: ");
System.err.println(e.getMessage());
} catch (RecognitionException e) {
System.err.print("ERROR: recognition exception thrown by compiler: ");
System.err.println(e.getMessage());
e.printStackTrace();
} catch (Exception e) {
System.err.print("ERROR: uncaught exception thrown by compiler: ");
System.err.println(e.getMessage());
e.printStackTrace();
}
}
}
And finally, the test program itself:
print('a');
Now when I run this, I get the following errors:
line 1:7 mismatched input 'a' expecting LETTER
line 1:9 mismatched input ')' expecting LETTER
I have no clue what causes this bug. I have tried several changes, but nothing fixed it. Does anyone here know what's wrong with my code and how I can fix it?
Every bit of help is greatly appreciated, thanks in advance.
Greetings,
Rien
Using a rule:
CHARLITERAL
: SINGLEQUOTE (LETTER | DIGIT) SINGLEQUOTE
;
and changing operand to:
operand
: TRUE
| FALSE
| CHARLITERAL
| IDENTIFIER
| NUMBER
| OPAREN! or_expr CPAREN!
;
will fix the problem. It does leave the single quotes in the AST, but that can optionally be fixed by changing the text of the node with the setText(String) method.
Turn charliteral into a lexer rule (rename it to CHARLITERAL). Right now, the string 'a' is tokenized like this: SINGLEQUOTE IDENTIFIER SINGLEQUOTE, so you're getting an IDENTIFIER instead of a LETTER.
I wonder how this code can compile at all given that you're using a fragment (LETTER) from a parser rule.
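The tokenization described above can be mimicked with a small Python sketch. This is only an approximation under my own assumptions: `make_lexer` and `BASE_RULES` are hypothetical names, the rules are a hand-picked subset of the grammar, and Python's regex alternation tries rules in order rather than reproducing ANTLR's exact matching.

```python
import re

def make_lexer(rules):
    """Build a lexer from ordered (name, regex) pairs using one alternation."""
    pattern = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in rules))
    def lex(text):
        # finditer silently skips characters that no rule matches (sketch only).
        return [(m.lastgroup, m.group()) for m in pattern.finditer(text)]
    return lex

# Subset of the question's tokens; LETTER and DIGIT are fragments, so they
# never appear as tokens of their own.
BASE_RULES = [
    ("PRINT", r"print\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("NUMBER", r"[0-9]+"),
    ("OPAREN", r"\("),
    ("CPAREN", r"\)"),
    ("SEMICOLON", r";"),
    ("SINGLEQUOTE", r"'"),
]

# Original grammar: no lexer rule covers a quoted character as a whole.
before = make_lexer(BASE_RULES)("print('a');")

# Suggested fix: a CHARLITERAL lexer rule that swallows both quotes.
after = make_lexer([("CHARLITERAL", r"'[A-Za-z0-9]'")] + BASE_RULES)("print('a');")

print([name for name, _ in before])
print([name for name, _ in after])
```

In the first run the a arrives at the parser as an IDENTIFIER between two SINGLEQUOTE tokens; in the second it is a single CHARLITERAL token whose text still includes the quotes.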

ANTLR won't parse this easy input for simple calculator grammar

grammar TestCSharpParser;
options {
language=CSharp3;
}
@parser::namespace { Demo.Antlr }
@lexer::namespace { Demo.Antlr }
parse returns [double value]
: exp EOF {$value = $exp.value;}
;
exp returns [double value]
: addExp {$value = $addExp.value;}
;
addExp returns [double value]
: a=mulExp {$value = $a.value;}
( '+' b=mulExp {$value += $b.value;}
| '-' b=mulExp {$value -= $b.value;}
)*
;
mulExp returns [double value]
: a=unaryExp {$value = $a.value;}
( '*' b=unaryExp {$value *= $b.value;}
| '/' b=unaryExp {$value /= $b.value;}
)*
;
unaryExp returns [double value]
: '-' atom {$value = -1.0 * $atom.value;}
| atom {$value = $atom.value;}
;
atom returns [double value]
: Number {$value = Double.Parse($Number.Text, CultureInfo.InvariantCulture);}
| '(' exp ')' {$value = $exp.value;}
;
Number
: ('0'..'9')+ ('.' ('0'..'9')+)?
;
Space
: (' ' | '\t' | '\r' | '\n'){$channel = HIDDEN;}
;
The grammar won't parse the simple input 4/5 or (4/5); I tried this using ANTLRWorks.
Does anyone have any idea why this is happening? To my mind, this should work correctly.
It keeps giving me a NoViableAltException.
I see several problems related to the use of the CSharp3 target.
The CSharp2 and CSharp3 targets define the constant Hidden instead of HIDDEN, so the action in the Space rule should assign $channel = Hidden.
ANTLRWorks cannot be used to generate parsers for grammars targeting the CSharp2 or CSharp3 targets. The parser must be generated either by MSBuild (preferred) or by using Antlr3.exe. These are documented on the ANTLR 3 C# Releases wiki page.
ANTLRWorks cannot be used to test parsers generated for the CSharp2 or CSharp3 targets. Any results reported by the interpreter or debugger cannot be trusted.

antlr unparsed tokens at the end does not generate error

Getting an error is usually unpleasant, but it is also unpleasant when you expect one and do not get it. My parser does not report an error for the string "2)". Can anyone suggest a solution?
grammar BasicArithmetic;
options {
language = Java;
output = AST;
}
expression returns [double value]:
p1=pm{$value=$pm.value;};
// never never reference FRAGMENTS from parsers
pm returns [double value]:
p1=dm{$value = $p1.value;}
(PLUS^p2=dm{$value += $p2.value;}|
MINUS^p2=dm{$value -= $p2.value;}
)*;
dm returns [double value]:
p1=atom {$value = $p1.value;}
( DIV^ p2=atom {$value /= $p2.value;}|
MUL^ p2=atom {$value *= $p2.value;}|
POW^ p2=atom {$value = Math.pow($value, $p2.value);}
)*;
atom returns [double value]:
p1=Number {$value = Double.parseDouble($p1.text);}
| LP p2=pm RP{$value = $p2.value;};
Number: Digit+;
MUL : '*';
DIV : '/';
PLUS : '+';
MINUS : '-';
POW : '^';
LP : '(';
RP : ')';
fragment Digit:'0'..'9';
WS :('\t'| ' '| '\r'| '\n'| '\u000C')+{$channel = HIDDEN;};
You need to change your grammar to specify that you expect an EOF token after your top-level rule finishes:
expression returns [double value]:
p1=pm EOF {$value=$pm.value;};
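The effect of the missing EOF can be reproduced with a small hand-written recursive-descent parser in Python. This is a sketch rather than a port of the generated ANTLR parser: the `Parser` class, `tokenize`, and the `require_eof` flag are names invented for the example, and POW is left out for brevity.

```python
import re

def tokenize(text):
    """Split the input into numbers and single-character operators."""
    return re.findall(r"\d+|[-+*/()]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def pm(self):  # pm: dm (('+' | '-') dm)*
        value = self.dm()
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.dm()
            else:
                value -= self.dm()
        return value

    def dm(self):  # dm: atom (('*' | '/') atom)*
        value = self.atom()
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.atom()
            else:
                value /= self.atom()
        return value

    def atom(self):  # atom: Number | '(' pm ')'
        tok = self.take()
        if tok == "(":
            value = self.pm()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok is None or not tok.isdigit():
            raise SyntaxError(f"unexpected token {tok!r}")
        return float(tok)

def parse(text, require_eof):
    parser = Parser(tokenize(text))
    value = parser.pm()
    # Without this check, trailing tokens are silently ignored.
    if require_eof and parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return value

lenient = parse("2)", require_eof=False)
try:
    parse("2)", require_eof=True)
    strict_error = None
except SyntaxError as exc:
    strict_error = str(exc)
print(lenient)
print(strict_error)
```

Without the end-of-input check the top-level rule simply stops after the 2 and leaves the ) unread, which is the behaviour the question describes.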

Resources