Unusual ANTLR error when attempting to reorganize grammar into two files - lua

I am reorganizing my grammar into two files in order to accomodate a tree grammar; Lua.g and LuaGrammar.g. Lua.g will have all of my lexer rules, and LuaGrammar.g will have all of my tree grammar and parser rules. However, when i try and compile LuaGrammar.g i get the following error:
[00:28:37] error(10): internal error: C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\LuaGrammar.g : java.lang.IllegalArgumentException: Can't find template ruleRefBang.st; group hierarchy is [CSharp2]
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:507)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.grammar.v2.CodeGenTreeWalker.getRuleElementST(CodeGenTreeWalker.java:152)
org.antlr.grammar.v2.CodeGenTreeWalker.atom(CodeGenTreeWalker.java:1986)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1708)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1556)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.ebnf(CodeGenTreeWalker.java:1871)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1704)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:797)
org.antlr.grammar.v2.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:588)
org.antlr.grammar.v2.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:530)
org.antlr.grammar.v2.CodeGenTreeWalker.grammar(CodeGenTreeWalker.java:336)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:432)
org.antlr.Tool.generateRecognizer(Tool.java:641)
org.antlr.Tool.process(Tool.java:454)
org.antlr.works.generate.CodeGenerate.generate(CodeGenerate.java:104)
org.antlr.works.generate.CodeGenerate.run(CodeGenerate.java:185)
java.lang.Thread.run(Unknown Source)
And, i'm getting the following error:
[00:34:58] error(100): C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\Lua.g:0:0: syntax error: codegen: <AST>:0:0: unexpected end of subtree
when attempting to generate Lua.g. Why am i getting these errors, and how can i fix them? (Using ANTLR V3, am able to provide grammar files)
Update: here is the grammar file i am trying to compile.
tree grammar LuaGrammar;
options {
backtrack=true;
language=CSharp2;
output=AST;
tokenVocab=Lua;
filter=true;
ASTLabelType=CommonTree;
}
assignment
:
^('=' left=NAME right=NAME) {Ast. };
/*
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
*/
/*
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | /*function |*\ prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
*/
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
Lua.g:
/*
* Lua 5.1 grammar
*
* Nicolai Mainiero
* May 2007
*
* This is a Lua (http://www.lua.org) grammar for the version 5.1 for ANTLR 3.
* I tested it with basic and extended examples and it worked fine. It is also used
* for LunarEclipse (http://lunareclipse.sf.net) a Lua editor based on Eclipse.
*
* Thanks to Johannes Luber and Gavin Lambert who helped me with some mutually left recursion.
*
*/
grammar Lua;
options {
backtrack=true;
language=CSharp2;
//output=AST;
//ASTLabelType=CommonTree;
}
#lexer::namespace{AguaLua}
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | function | prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
// LEXER
NAME :('a'..'z'|'A'..'Z'|'_')(options{greedy=true;}: 'a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
INT : ('0'..'9')+;
FLOAT :INT '.' INT ;
EXP : (INT| FLOAT) ('E'|'e') ('-')? INT;
HEX :'0x' ('0'..'9'| 'a'..'f')+ ;
NORMALSTRING
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
CHARSTRING
: '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
;
LONGSTRING
: '['('=')*'[' ( EscapeSequence | ~('\\'|']') )* ']'('=')*']'
;
fragment
EscapeSequence
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UnicodeEscape
| OctalEscape
;
fragment
OctalEscape
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
COMMENT
: '--[[' ( options {greedy=false;} : . )* ']]' {Skip();}
;
LINE_COMMENT : '--' (~ NEWLINE)* {Skip();};
fragment NEWLINE : '\r'|'\n' | '\r\n' ;
WS : (' '|'\t'|'\u000C') {Skip();};
(both are based off of a grammar produced by Nicolai Mainero and available at ANTLR's site, Lua 5.1 grammar)
If i uncomment anymore than this, it comes up with the error above.

Okay, a 'Can't find template ruleRefBang.st' has something to do with the illegal use of a "tree exclude" operator: !. Usually, it is a contradicting rewrite rule: somewhere you have a ! and then rewrite it using -> but use that ignored token anyway. Since I cannot see a -> in your grammar, that can't be the case (unless you simplified the tree grammar to post here and removed some rewrite rules?).
Anyway, I'd start by removing all ! operators in your tree grammar and if your grammar then works put them, one by one, back in again. Then you should be able to pin point the place in your grammar that houses the illegal !.
Good luck!

Related

Parse Rules Decaf grammar antlr4

I am creating parser and lexer rules for Decaf programming language written in ANTLR4. I'm trying to parse a test file and keep getting an error, there must be something wrong in the grammar but i cant figure it out.
My test file looks like:
class Program {
int i[10];
}
The error is : line 2:8 mismatched input '10' expecting INT_LITERAL
And here is the full Decaf.g4 grammar file
grammar Decaf;
/*
LEXER RULES
-----------
Lexer rules define the basic syntax of individual words and symbols of a
valid Decaf program. Lexer rules follow regular expression syntax.
Complete the lexer rules following the Decaf Language Specification.
*/
CLASS : 'class';
INT : 'int';
RETURN : 'return';
VOID : 'void';
IF : 'if';
ELSE : 'else';
FOR : 'for';
BREAK : 'break';
CONTINUE : 'continue';
CALLOUT : 'callout';
TRUE : 'True' ;
FALSE : 'False' ;
BOOLEAN : 'boolean';
LCURLY : '{';
RCURLY : '}';
LBRACE : '(';
RBRACE : ')';
LSQUARE : '[';
RSQUARE : ']';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
EQ : '=';
SEMI : ';';
COMMA : ',';
AND : '&&';
LESS : '<';
GREATER : '>';
LESSEQUAL : '<=' ;
GREATEREQUAL : '>=' ;
EQUALTO : '==' ;
NOTEQUAL : '!=' ;
EXCLAMATION : '!';
fragment CHAR : (' '..'!') | ('#'..'&') | ('('..'[') | (']'..'~') | ('\\'[']) | ('\\"') | ('\\') | ('\t') | ('\n');
CHAR_LITERAL : '\'' CHAR '\'';
//STRING_LITERAL : '"' CHAR+ '"' ;
HEXMARK : '0x';
fragment HEXA : [a-fA-F];
fragment HEXDIGIT : DIGIT | HEXA ;
HEX_LITERAL : HEXMARK HEXDIGIT+;
STRING : '"' (ESC|.)*? '"';
fragment ESC : '\\"' | '\\\\';
fragment DIGIT : [0-9];
DECIMAL_LITERAL : DIGIT(DIGIT)*;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;
fragment ALPHA : [a-zA-Z] | '_';
fragment ALPHA_NUM : ALPHA | DIGIT;
ID : ALPHA ALPHA_NUM*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
BOOL_LITERAL : TRUE | FALSE;
/*
PARSER RULES
------------
Parser rules are all lower case, and make use of lexer rules defined above
and other parser rules defined below. Parser rules also follow regular
expression syntax. Complete the parser rules following the Decaf Language
Specification.
*/
program : CLASS ID LCURLY field_decl* method_decl* RCURLY EOF;
field_name : ID | ID LSQUARE INT_LITERAL RSQUARE;
field_decl : datatype field_name (COMMA field_name)* SEMI;
method_decl : (datatype | VOID) ID LBRACE ((datatype ID) (COMMA datatype ID)*)? RBRACE block;
block : LCURLY var_decl* statement* RCURLY;
var_decl : datatype ID (COMMA ID)* SEMI;
datatype : INT | BOOLEAN;
statement : location assign_op expr SEMI
| method_call SEMI
| IF LBRACE expr RBRACE block (ELSE block)?
| FOR ID EQ expr COMMA expr block
| RETURN (expr)? SEMI
| BREAK SEMI
| CONTINUE SEMI
| block;
assign_op : EQ
| ADD EQ
| SUB EQ;
method_call : method_name LBRACE (expr (COMMA expr)*)? RBRACE
| CALLOUT LBRACE STRING(COMMA callout_arg (COMMA callout_arg)*) RBRACE;
method_name : ID;
location : ID | ID LSQUARE expr RSQUARE;
expr : location
| method_call
| literal
| expr bin_op expr
| SUB expr
| EXCLAMATION expr
| LBRACE expr RBRACE;
callout_arg : expr
| STRING ;
bin_op : arith_op
| rel_op
| eq_op
| cond_op;
arith_op : ADD | SUB | MUL | DIV | '%' ;
rel_op : LESS | GREATER | LESSEQUAL | GREATEREQUAL ;
eq_op : EQUALTO | NOTEQUAL ;
cond_op : AND | '||' ;
literal : INT_LITERAL | CHAR_LITERAL | BOOL_LITERAL ;
Whenever there are 2 or more lexer rules that match the same characters, the one defined first wins. In your case, these 2 rules both match 10:
DECIMAL_LITERAL : DIGIT(DIGIT)*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
and since INT_LITERAL is defined after DECIMAL_LITERAL, the lexer will never create a INT_LITERAL token. If you now try to use it in a parser rule, you get an error message you posted.
The solution: remove INT_LITERAL from your lexer and create a parser rule instead:
int_literal : DECIMAL_LITERAL | HEX_LITERAL;
and use int_literal in your parser rules instead.

antlr4 line 2:0 mismatched input 'if' expecting {'if', OTHER}

I am having a bit of difficulty in my g4 file. Below is my grammar:
// Define a grammar called Hello
grammar GYOO;
program : 'begin' block+ 'end';
block
: statement+
;
statement
: assign
| print
| add
| ifstatement
| OTHER {System.err.println("unknown char: " + $OTHER.text);}
;
assign
: 'let' ID 'be' expression
;
print
: 'print' (NUMBER | ID)
;
ifstatement
: 'if' condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
add
: (NUMBER | ID) OPERATOR (NUMBER | ID) ASSIGN ID
;
stat_block
: OBRACE block CBRACE
| statement
;
condition_block
: expression stat_block
;
expression
: NOT expression //notExpr
| expression (MULT | DIV | MOD) expression //multiplicationExpr
| expression (PLUS | MINUS) expression //additiveExpr
| expression (LTEQ | GTEQ | LT | GT) expression //relationalExpr
| expression (EQ | NEQ) expression //equalityExpr
| expression AND expression //andExpr
| expression OR expression //orExpr
| atom //atomExpr
;
atom
: (NUMBER | FLOAT) //numberAtom
| (TRUE | FALSE) //booleanAtom
| ID //idAtom
| STRING //stringAtom
| NULL //nullAtom
;
ID : [a-z]+ ;
NUMBER : [0-9]+ ;
OPERATOR : '+' | '-' | '*' | '/';
ASSIGN : '=';
WS : (' ' | '\t' | '\r' | '\n') + -> skip;
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
AND : 'and';
EQ : 'is'; //'=='
NEQ : 'is not'; //'!='
GT : 'greater'; //'>'
LT : 'lower'; //'<'
GTEQ : 'is greater'; //'>='
LTEQ : 'is lower'; //'<='
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
MOD : '%';
POW : '^';
NOT : 'not';
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
STRING
: '"' (~["\r\n] | '""')* '"'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(HIDDEN)
;
OTHER
: .
;
When i try to -gui tree from antlr it shows me this error:
line 2:3 missing OPERATOR at 'a'
This error is given from this code example:
begin
let a be true
if a is true
print a
end
Basically it does not recognizes the ifstatement beggining with IF 'if' and it shows the tree like i am making an assignment.
How can i fix this?
P.S. I also tried to reposition my statements. Also tried to remove all statements and leave only ifstatement, and same thing happens.
Thanks
There is at least one issue:
ID : [a-z]+ ;
...
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
...
NOT : 'not';
Since ID is placed before TRUE .. NOT, those tokens will never be created since ID has precedence over them (and ID matches these tokens as well).
Start by moving ID beneath the NOT token.

Antlr not recognizing number

I have 3 types of numbers defined, number, decimal and percentage.
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Percentage and decimal work fine but when I assign a number, unless I put a sign (+ or -) in front of the number, it doesn't recognize it as a number.
number foo = +5 // does recognize
number foo = 5; // does not recognize
It does recognize it in an evaluation expression.
if (foo == 5 ) // does recognize
Here is my language (I took out the functions and left only the language recognition).
grammar Fetal;
transaction : begin statements end;
begin : 'begin' ;
end : 'end' ;
statements : (statement)+
;
statement
: declaration ';'
| command ';'
| assignment ';'
| evaluation
| ';'
;
declaration : type var;
var returns : identifier;
type returns
: DecimalType
| NumberType
| StringType
| BooleanType
| DateType
| ObjectType
| DaoType
;
assignment
: lharg Equals rharg
| lharg unaryOP rharg
;
assignmentOp : Equals
;
unaryOP : PlusEquals
| MinusEquals
| MultiplyEquals
| DivideEquals
| ModuloEquals
| ExponentEquals
;
expressionOp : arithExpressOp
| bitwiseExpressOp
;
arithExpressOp : Multiply
| Divide
| Plus
| Minus
| Modulo
| Exponent
;
bitwiseExpressOp
: And
| Or
| Not
;
comparisonOp : IsEqualTo
| IsLessThan
| IsLessThanOrEqualTo
| IsGreaterThan
| IsGreaterThanOrEqualTo
| IsNotEqualTo
;
logicExpressOp : AndExpression
| OrExpression
| ExclusiveOrExpression
;
rharg returns
: rharg expressionOp rharg
| '(' rharg expressionOp rharg ')'
| var
| literal
| assignmentCommands
;
lharg returns : var;
identifier : Identifier;
evaluation : IfStatement '(' evalExpression ')' block (Else block)?;
block : OpenBracket statements CloseBracket;
evalExpression
: evalExpression logicExpressOp evalExpression
| '(' evalExpression logicExpressOp evalExpression ')'
| eval
| '(' eval ')'
;
eval : rharg comparisonOp rharg ;
assignmentCommands
: GetBalance '(' stringArg ')'
| GetVariableType '(' var ')'
| GetDescription
| Today
| GetDays '(' startPeriod=dateArg ',' endPeriod=dateArg ')'
| DayOfTheWeek '(' dateArg ')'
| GetCalendarDay '(' dateArg ')'
| GetMonth '(' dateArg ')'
| GetYear '(' dateArg ')'
| Import '(' stringArg ')' /* Import( path ) */
| Lookup '(' sql=stringArg ',' argumentList ')' /* Lookup( table, SQL) */
| List '(' sql=stringArg ',' argumentList ')' /* List( table, SQL) */
| invocation
;
command : Print '(' rharg ')'
| Credit '(' amtArg ',' stringArg ')'
| Debit '(' amtArg ',' stringArg ')'
| Ledger '(' debitOrCredit ',' amtArg ',' acc=stringArg ',' desc=stringArg ')'
| Alias '(' account=stringArg ',' name=stringArg ')'
| MapFile ':' stringArg
| invocation
| Update '(' sql=stringArg ',' argumentList ')'
;
invocation
: o=objectLiteral '.' m=identifier '('argumentList? ')'
| o=objectLiteral '.' m=identifier '()'
;
argumentList
: rharg (',' rharg )*
;
amtArg : rharg ;
stringArg : rharg ;
numberArg : rharg ;
dateArg : rharg ;
debitOrCredit : charLiteral ;
literal
: numericLiteral
| doubleLiteral
| booleanLiteral
| percentLiteral
| stringLiteral
| dateLiteral
;
fileName : '<' fn=Identifier ('.' ft=Identifier)? '>' ;
charLiteral : ('D' | 'C');
numericLiteral : Number ;
doubleLiteral : Decimal ;
percentLiteral : Percentage ;
booleanLiteral : Boolean ;
stringLiteral : String ;
dateLiteral : Date ;
objectLiteral : Identifier ;
daoLiteral : Identifier ;
//Below are Token definitions
// Data Types
DecimalType : 'decimal' ;
NumberType : 'number' ;
StringType : 'string' ;
BooleanType : 'boolean' ;
DateType : 'date' ;
ObjectType : 'object' ;
DaoType : 'dao' ;
/******************************************************************
* Assignmnt operator
******************************************************************/
Equals : '=' ;
/*****************************************************************
* Unary operators
*****************************************************************/
PlusEquals : '+=' ;
MinusEquals : '-=' ;
MultiplyEquals : '*=' ;
DivideEquals : '/=' ;
ModuloEquals : '%=' ;
ExponentEquals : '^=' ;
/*****************************************************************
* Binary operators
*****************************************************************/
Plus : '+' ;
Minus : '-' ;
Multiply : '*' ;
Divide : '/' ;
Modulo : '%' ;
Exponent : '^' ;
/***************************************************************
* Bitwise operators
***************************************************************/
And : '&' ;
Or : '|' ;
Not : '!' ;
/*************************************************************
* Compariso operators
*************************************************************/
IsEqualTo : '==' ;
IsLessThan : '<' ;
IsLessThanOrEqualTo : '<=' ;
IsGreaterThan : '>' ;
IsGreaterThanOrEqualTo : '>=' ;
IsNotEqualTo : '!=' ;
/*************************************************************
* Expression operators
*************************************************************/
AndExpression : '&&' ;
OrExpression : '||' ;
ExclusiveOrExpression : '^^' ;
// Reserve words (Assignment Commands)
GetBalance : 'getBalance';
GetVariableType : 'getVariableType' ;
GetDescription : 'getDescription' ;
Today : 'today';
GetDays : 'getDays' ;
DayOfTheWeek : 'dayOfTheWeek' ;
GetCalendarDay : 'getCalendarDay' ;
GetMonth : 'getMonth' ;
GetYear : 'getYear' ;
Import : 'import' ;
Lookup : 'lookup' ;
List : 'list' ;
// Reserve words (Commands)
Credit : 'credit';
Debit : 'debit';
Ledger : 'ledger';
Alias : 'alias' ;
MapFile : 'mapFile' ;
Update : 'update' ;
Print : 'print';
IfStatement : 'if';
Else : 'else';
OpenBracket : '{';
CloseBracket : '}';
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Boolean : 'true' | 'false';
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Date : Year '-' Month '-' Day;
Identifier
: IdentifierNondigit
( IdentifierNondigit
| Digit
)*
;
String: '"' ( ESC | ~[\\"] )* '"';
/************************************************************
* Fragment Definitions
************************************************************/
fragment
ESC : '\\' [abtnfrv"'\\]
;
fragment
IdentifierNondigit
: Nondigit
//| // other implementation-defined characters...
;
fragment
Nondigit
: [a-zA-Z_]
;
fragment
Digit
: [0-9]
;
fragment
Sign : Plus | Minus;
fragment
Digits
: [-+]?[0-9]+
;
fragment
Year
: Digit Digit Digit Digit;
fragment
Month
: Digit Digit;
fragment
Day
: Digit Digit;
fragment Dot : '.';
fragment
SCharSequence
: SChar+
;
fragment
SChar
: ~["\\\r\n]
| SimpleEscapeSequence
| '\\\n' // Added line
| '\\\r\n' // Added line
;
fragment
CChar
: ~['\\\r\n]
| SimpleEscapeSequence
;
fragment
SimpleEscapeSequence
: '\\' ['"?abfnrtv\\]
;
ExtendedAscii
: [\x80-\xfe]+
-> skip
;
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
BlockComment
: '/*' .*? '*/'
-> skip
;
LineComment
: '//' ~[\r\n]*
-> skip
;
I have a hunch that this use of a fragment is incorrect:
fragment Sign : Plus | Minus;
I couldn't find anything in the reference book, but I think it needs to be changed to something like this:
fragment Sign : [+-];
I found the issue. I was using version 4.5.2-1 because every attempt to upgrade to 4.7 caused more errors and I didn't want to cause more errors while trying to solve another. I finally broke down and upgraded the libraries to 4.7, fixed the errors and the number recognition issue disappeared. It was a bug in the library, all this time.

Antlr wrong rule invocation

I'm trying to implement a grammar for parsing lucene queries. So far everything went smooth until i tried to add support for range queries . Lucene details aside my grammar looks like this :
grammar ModifiedParser;
TERM_RANGE : '[' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) ']'
| '{' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) '}'
;
query : not (booleanOperator? not)* ;
booleanOperator : andClause
| orClause
;
andClause : 'AND' ;
notClause : 'NOT' ;
orClause : 'OR' ;
not : notClause? MODIFIER? clause;
clause : unqualified
| qualified
;
unqualified : TERM_RANGE # termRange
| TERM_PHRASE # termPhrase
| TERM_PHRASE_ANYTHING # termTruncatedPhrase
| '(' query ')' # queryUnqualified
| TERM_TEXT_TRUNCATED # termTruncatedText
| TERM_NORMAL # termText
;
qualified : TERM_NORMAL ':' unqualified
;
fragment TERM_CHAR : (~(' ' | '\t' | '\n' | '\r' | '\u3000'
| '\'' | '\"' | '(' | ')' | '[' | ']' | '{' | '}'
| '+' | '-' | '!' | ':' | '~' | '^'
| '?' | '*' | '\\' ))
;
fragment TERM_START_CHAR : TERM_CHAR
| ESCAPE
;
fragment ESCAPE : '\\' ~[];
MODIFIER : '-'
| '+'
;
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
TERM_PHRASE_ANYTHING : '"' (ESCAPE|~('\"'|'\\'))+ '"' ;
TERM_PHRASE : '"' (ESCAPE|~('\"'|'\\'|'?'|'*'))+ '"' ;
TERM_TEXT_TRUNCATED : ('*'|'?')(TERM_CHAR+ ('*'|'?'))+ TERM_CHAR*
| TERM_START_CHAR (TERM_CHAR* ('?'|'*'))+ TERM_CHAR+
| ('?'|'*') TERM_CHAR+
;
TERM_NORMAL : TERM_TEXT;
fragment TERM_TEXT : TERM_START_CHAR TERM_CHAR* ;
WS : [ \t\r\n] -> skip ;
When i try to do a visitor and work with the tokens apparently parsing asd [ 10 TO 100 ] { 1 TO 1000 } 100..1000 will throw token recognition error for [ , ] , } and {, and only tries to visit the termRange rule on the third range . do you guys know what i'm missing here ? Thanks in advance
Since you made TERM_RANGE a lexer rule, you must account for everything at a character level. In particular, you forgot to allow whitespace characters in your input.
You would likely be in a much better position if you instead created termRange, a parser rule.

antlrworks with multiple equals

I'm having a problem with my grammar.
It says Decision can match input such as "{EQUAL, GREATER..GREATER_EQUAL, LOWER..LOWER_EQUAL, NOT_EQUAL}" using multiple alternatives: 1, 2, Although all the trees of rules are correct.
Anyone to help?!
grammar test;
//parser : .* EOF;
program :T_PROGRAM ID T_LEFTPAR identifier_list T_RIGHTPAR T_SEMICOLON declarations subprogram_declarations compound_statement T_DOT;
identifier_list :(ID) (',' ID)*;
declarations :() (T_VAR identifier_list COLON type T_SEMICOLON)* ;
type : standard_type|
T_ARRAY T_LEFTBRACK NUM T_TO NUM T_RIGHTBRACK T_OF standard_type ;
standard_type : INT
| FLOAT ;
subprogram_declarations :() (subprogram_declaration T_SEMICOLON)* ;
subprogram_declaration : subprogram_head declarations compound_statement;
subprogram_head :T_FUNCTION ID arguments COLON standard_type |
T_PROCEDURE ID arguments ;
arguments :T_LEFTPAR parameter_list T_RIGHTPAR | ;
parameter_list :(identifier_list COLON type) (T_SEMICOLON identifier_list COLON type)*;
compound_statement : T_BEGIN optional_statements T_END;
optional_statements :statement_list | ;
statement_list :(statement) (T_SEMICOLON statement)*;
statement :
variable ASSIGN expression
| procedure_statement
| compound_statement
| T_IF expression T_THEN statement T_ELSE statement
| T_WHILE expression T_DO statement
;
procedure_statement :ID
| ID T_LEFTPAR expression_list T_RIGHTPAR;
expression_list : (expression) (',' expression)*;
variable : ID T_LEFTBRACK expression T_RIGHTBRACK | ;
expression :( () |simple_expression) (( LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL ) simple_expression)* ;
simple_expression :
( () | sign ) term (( PLUS | MINUS | T_OR ) term)*;
term :
(factor) (( CROSS | DIVIDE | MOD | T_AND ) factor)*;
factor :
variable
|ID T_LEFTPAR expression_list T_RIGHTPAR
| NUM
| T_LEFTPAR expression T_RIGHTPAR
| T_NOT factor;
sign :
'+'
| '-';
/********/
T_PROGRAM : 'program';
T_FUNCTION : 'function';
T_PROCEDURE : 'procedure';
T_READ : 'read';
T_WRITE : 'write';
T_OF : 'of';
T_ARRAY : 'array';
T_VAR : 'var';
T_FLOAT : 'float';
T_INT : 'int';
T_CHAR : 'char';
T_STRING : 'string';
T_BEGIN : 'begin';
T_END : 'end';
T_IF : 'if';
T_THEN : 'then';
T_ELSE : 'else';
T_WHILE : 'while';
T_DO : 'do';
T_NOT : 'not';
NUM : INT
| FLOAT;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
LOWER : '<';
LOWER_EQUAL : '<=';
GREATER : '>';
GREATER_EQUAL : '>=';
EQUAL : '=';
NOT_EQUAL : '<>';
ASSIGN :
':=';
COLON :
':';
PLUS : '+';
MINUS : '-';
T_OR : 'OR';
CROSS : '*';
DIVIDE : '/';
MOD : 'MOD';
T_AND : 'AND';
T_LEFTPAR
: '(';
T_RIGHTPAR
: ')';
T_LEFTBRACK
: '[';
T_RIGHTBRACK
: ']';
T_TO
: '..';
T_DOT : '.';
T_SEMICOLON
: ';';
T_COMMA
: ',';
T_BADNUM
: (NUM)(CHAR)*;
T_BADSTRING
: '"' ( ESC_SEQ | ~('\\'|'"') )*WS;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
fragment
INT : '0'..'9'+
;
fragment
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
Problem 1
The ambiguity (the errors/warnings that certain input can be matched in more than 1 way) originate mainly from the fact that both your variable and expression rules can match empty input:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
| /* nothing */
;
expression
: (
(/* nothing */)
| simple_expression
)
(
(LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression
)* /* because of the `*`, this can also match nothing */
;
(I added the /* ... */ comments and reformatted the rules to make them more readable)
Fix 1
You probably want to do it like this instead:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
;
expression
: simple_expression ((LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression)*
;
Problem 2
Another problem (one that will show up once you would have resolved the ambiguities, is that you defined the tokens MOD, T_AND and T_OR after your ID rule:
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
...
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
This will cause MOD, T_AND and T_OR to be never created since ID matches these characters too.
Fix 2
Place MOD, T_AND and T_OR before your ID rule:
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
...
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;

Resources