I'm trying to use ANTLR4 for a project and I got an error I'm not sure how to fix. It seems ANTLR4 is confused by two parser rules.
Here is my lexer/parser:
grammar PARSER;
@header {package VSOP.Parser;}
program : code+ ; //statement+ ;
code : classHeader | methodHeader | ;
statement : assign | ifStatement | whileStatement;
classHeader : 'class' TYPE_IDENTIFIER ('extends' TYPE_IDENTIFIER)? '{' classBody '}';
classBody : methodHeader* | field*;
methodHeader : OBJECT_IDENTIFIER '(' (((formal ',')+ (formal)) | (formal)?) ')' ':' varType '{' methodBody '}' ;
methodBody : statement* ;
formal : OBJECT_IDENTIFIER ':' varType ;
field : OBJECT_IDENTIFIER ':' varType ('<-' varValue)? ';' ;
assign : OBJECT_IDENTIFIER ':' varType ('<-' varValue)? ;
whileStatement : 'while' condition* 'do' statement* ;
ifStatement : ifStat elseStat? ; //ifStat elseIfStat* elseStat? ;
ifStat : 'if' condition 'then' statement* ;
//elseIfStat : 'else if' condition 'then' '{' statement* '}' ;
elseStat : 'else' statement* ;
condition : comparaiser CONDITIONAL_OPERATOR comparaiser ;
comparaiser : OBJECT_IDENTIFIER | integer | STRING ;
integer : INTEGER_HEX | INTEGER_DEC | INTEGER_BIN ;
varType : 'bool' | 'int32' | 'string' | 'unit' | TYPE_IDENTIFIER ;
varValue : ('true' | 'false' | STRING | integer) ;
// KEYWORD : 'and' | 'class' | 'do' | 'else' | 'extends' | 'false' | 'if' | 'in' | 'isnull' | 'let' | 'new' | 'not' | 'then' | 'true' | 'unit' | 'while' ;
ARITHMETIC_OPERATOR : '+' | '-' | '*' | '/' | '^' ;
CONDITIONAL_OPERATOR : '=' | '<' | '<=';
MULTILINE_OPEN_COMMENT : '(*' ;
MULTILINE_CLOSE_COMMENT : '*)' ;
MULTILINE_COMMENT : '(*' .*? '*)' ;
INTEGER_BIN : '0'[bB][0-9a-zA-Z]* ;
INTEGER_HEX : '0'[xX][0-9a-zA-Z]* ;
INTEGER_DEC : [0-9][0-9a-zA-Z]* ;
OBJECT_IDENTIFIER : [a-z][a-zA-Z0-9_]* ;
TYPE_IDENTIFIER : [A-Z][a-zA-Z0-9_]* ;
STRING : '"' ( '\\"' | . )*? ('"' | EOF) ;
SINGLE_LINE_COMMENT : '//'~[\r\n]* ;
WS : [ \r\n\t]+ -> skip;
With the input below, I get these errors:
line 5:15 mismatched input '(' expecting ':'
line 5:31 mismatched input ',' expecting {'<-', ';'}
line 5:50 mismatched input ',' expecting {'<-', ';'}
line 5:69 mismatched input ')' expecting {'<-', ';'}
The problem is that ANTLR4 confuses methodHeader and field. If I put the variable nbOfEngine below the function, the function is parsed correctly but the variable is not. If I try them separately, each works. I tried changing their order in the parser, without success.
class Plane extends Transport {
nbOfEngine: int32 ;
startEngine(gazLevel: int32, readyToStart:bool, foodOnBoard: bool) : bool {
}
}
Any idea how to fix this?
Thanks !
You define classBody to either be a sequence of field definitions or a sequence of method definitions. You don't allow for it to be a sequence of both.
If you change it to (methodHeader | field)* instead, you'll get a sequence that can contain either.
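The difference between the two forms can be checked with a small model outside ANTLR. The sketch below is plain Python regular expressions, not ANTLR, and it assumes each class member is abbreviated to a single letter (`m` for a methodHeader, `f` for a field). The function names are made up for this illustration.

```python
import re

# Each class member is abbreviated to one letter (an assumption of this sketch):
# "m" stands for a methodHeader, "f" for a field.
EITHER_ONLY = re.compile(r"m*|f*")   # models: classBody : methodHeader* | field* ;
MIXED = re.compile(r"(?:m|f)*")      # models: classBody : (methodHeader | field)* ;


def accepts_original(members: str) -> bool:
    """True if the original classBody rule would accept this member sequence."""
    return EITHER_ONLY.fullmatch(members) is not None


def accepts_fixed(members: str) -> bool:
    """True if the corrected classBody rule would accept this member sequence."""
    return MIXED.fullmatch(members) is not None


# The Plane class from the question has one field followed by one method: "fm".
plane_members = "fm"
print(accepts_original(plane_members), accepts_fixed(plane_members))  # False True
```

A body with only fields ("ff") or only methods ("mm") passes both checks, which matches the observation that each member parses fine on its own.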
I found the issue in the parser. The problem comes from classBody.
classBody : methodHeader* | field*;
Instead I've written:
classHeader : 'class' TYPE_IDENTIFIER ('extends' TYPE_IDENTIFIER)? '{' classBody* '}';
classBody : methodHeader | field;
I am writing parser and lexer rules in ANTLR4 for the Decaf programming language. I'm trying to parse a test file and keep getting an error. There must be something wrong in the grammar, but I can't figure it out.
My test file looks like:
class Program {
int i[10];
}
The error is: line 2:8 mismatched input '10' expecting INT_LITERAL
And here is the full Decaf.g4 grammar file:
grammar Decaf;
/*
LEXER RULES
-----------
Lexer rules define the basic syntax of individual words and symbols of a
valid Decaf program. Lexer rules follow regular expression syntax.
Complete the lexer rules following the Decaf Language Specification.
*/
CLASS : 'class';
INT : 'int';
RETURN : 'return';
VOID : 'void';
IF : 'if';
ELSE : 'else';
FOR : 'for';
BREAK : 'break';
CONTINUE : 'continue';
CALLOUT : 'callout';
TRUE : 'True' ;
FALSE : 'False' ;
BOOLEAN : 'boolean';
LCURLY : '{';
RCURLY : '}';
LBRACE : '(';
RBRACE : ')';
LSQUARE : '[';
RSQUARE : ']';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
EQ : '=';
SEMI : ';';
COMMA : ',';
AND : '&&';
LESS : '<';
GREATER : '>';
LESSEQUAL : '<=' ;
GREATEREQUAL : '>=' ;
EQUALTO : '==' ;
NOTEQUAL : '!=' ;
EXCLAMATION : '!';
fragment CHAR : (' '..'!') | ('#'..'&') | ('('..'[') | (']'..'~') | ('\\'[']) | ('\\"') | ('\\') | ('\t') | ('\n');
CHAR_LITERAL : '\'' CHAR '\'';
//STRING_LITERAL : '"' CHAR+ '"' ;
HEXMARK : '0x';
fragment HEXA : [a-fA-F];
fragment HEXDIGIT : DIGIT | HEXA ;
HEX_LITERAL : HEXMARK HEXDIGIT+;
STRING : '"' (ESC|.)*? '"';
fragment ESC : '\\"' | '\\\\';
fragment DIGIT : [0-9];
DECIMAL_LITERAL : DIGIT(DIGIT)*;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;
fragment ALPHA : [a-zA-Z] | '_';
fragment ALPHA_NUM : ALPHA | DIGIT;
ID : ALPHA ALPHA_NUM*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
BOOL_LITERAL : TRUE | FALSE;
/*
PARSER RULES
------------
Parser rules are all lower case, and make use of lexer rules defined above
and other parser rules defined below. Parser rules also follow regular
expression syntax. Complete the parser rules following the Decaf Language
Specification.
*/
program : CLASS ID LCURLY field_decl* method_decl* RCURLY EOF;
field_name : ID | ID LSQUARE INT_LITERAL RSQUARE;
field_decl : datatype field_name (COMMA field_name)* SEMI;
method_decl : (datatype | VOID) ID LBRACE ((datatype ID) (COMMA datatype ID)*)? RBRACE block;
block : LCURLY var_decl* statement* RCURLY;
var_decl : datatype ID (COMMA ID)* SEMI;
datatype : INT | BOOLEAN;
statement : location assign_op expr SEMI
| method_call SEMI
| IF LBRACE expr RBRACE block (ELSE block)?
| FOR ID EQ expr COMMA expr block
| RETURN (expr)? SEMI
| BREAK SEMI
| CONTINUE SEMI
| block;
assign_op : EQ
| ADD EQ
| SUB EQ;
method_call : method_name LBRACE (expr (COMMA expr)*)? RBRACE
| CALLOUT LBRACE STRING(COMMA callout_arg (COMMA callout_arg)*) RBRACE;
method_name : ID;
location : ID | ID LSQUARE expr RSQUARE;
expr : location
| method_call
| literal
| expr bin_op expr
| SUB expr
| EXCLAMATION expr
| LBRACE expr RBRACE;
callout_arg : expr
| STRING ;
bin_op : arith_op
| rel_op
| eq_op
| cond_op;
arith_op : ADD | SUB | MUL | DIV | '%' ;
rel_op : LESS | GREATER | LESSEQUAL | GREATEREQUAL ;
eq_op : EQUALTO | NOTEQUAL ;
cond_op : AND | '||' ;
literal : INT_LITERAL | CHAR_LITERAL | BOOL_LITERAL ;
Whenever there are 2 or more lexer rules that match the same characters, the one defined first wins. In your case, these 2 rules both match 10:
DECIMAL_LITERAL : DIGIT(DIGIT)*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
and since INT_LITERAL is defined after DECIMAL_LITERAL, the lexer will never create an INT_LITERAL token. If you then use it in a parser rule, you get the error message you posted.
The solution: remove INT_LITERAL from your lexer and create a parser rule instead:
int_literal : DECIMAL_LITERAL | HEX_LITERAL;
and use int_literal in your parser rules instead.
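To see why INT_LITERAL never appears, here is a minimal Python model of the tie-break described above. It is a sketch, not ANTLR's implementation: the `first_token` helper is hypothetical, and the regular expressions are simplified stand-ins for the three Decaf rules, listed in the order the grammar defines them.

```python
import re


def first_token(rules, text):
    """Return (rule_name, lexeme) for the token at the start of text.

    Models the tie-break described above: the longest match wins, and
    among equally long matches the rule listed first wins.
    """
    best_name, best_lexeme = None, ""
    for name, pattern in rules:
        match = re.match(pattern, text)
        # Strictly longer only: an earlier rule keeps its win on a tie.
        if match and len(match.group()) > len(best_lexeme):
            best_name, best_lexeme = name, match.group()
    return best_name, best_lexeme


# Simplified stand-ins for the Decaf lexer rules, in definition order.
DECAF_RULES = [
    ("HEX_LITERAL", r"0x[0-9a-fA-F]+"),
    ("DECIMAL_LITERAL", r"[0-9]+"),
    ("INT_LITERAL", r"0x[0-9a-fA-F]+|[0-9]+"),
]

print(first_token(DECAF_RULES, "10];"))  # ('DECIMAL_LITERAL', '10')
```

In this model INT_LITERAL can only tie with one of the two earlier rules, never beat it, so the parser never receives the token type it is waiting for.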
I am having a bit of difficulty in my g4 file. Below is my grammar:
// Define a grammar called Hello
grammar GYOO;
program : 'begin' block+ 'end';
block
: statement+
;
statement
: assign
| print
| add
| ifstatement
| OTHER {System.err.println("unknown char: " + $OTHER.text);}
;
assign
: 'let' ID 'be' expression
;
print
: 'print' (NUMBER | ID)
;
ifstatement
: 'if' condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
add
: (NUMBER | ID) OPERATOR (NUMBER | ID) ASSIGN ID
;
stat_block
: OBRACE block CBRACE
| statement
;
condition_block
: expression stat_block
;
expression
: NOT expression //notExpr
| expression (MULT | DIV | MOD) expression //multiplicationExpr
| expression (PLUS | MINUS) expression //additiveExpr
| expression (LTEQ | GTEQ | LT | GT) expression //relationalExpr
| expression (EQ | NEQ) expression //equalityExpr
| expression AND expression //andExpr
| expression OR expression //orExpr
| atom //atomExpr
;
atom
: (NUMBER | FLOAT) //numberAtom
| (TRUE | FALSE) //booleanAtom
| ID //idAtom
| STRING //stringAtom
| NULL //nullAtom
;
ID : [a-z]+ ;
NUMBER : [0-9]+ ;
OPERATOR : '+' | '-' | '*' | '/';
ASSIGN : '=';
WS : (' ' | '\t' | '\r' | '\n') + -> skip;
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
AND : 'and';
EQ : 'is'; //'=='
NEQ : 'is not'; //'!='
GT : 'greater'; //'>'
LT : 'lower'; //'<'
GTEQ : 'is greater'; //'>='
LTEQ : 'is lower'; //'<='
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
MOD : '%';
POW : '^';
NOT : 'not';
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
STRING
: '"' (~["\r\n] | '""')* '"'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(HIDDEN)
;
OTHER
: .
;
When I try to view the parse tree with the -gui option, it shows me this error:
line 2:3 missing OPERATOR at 'a'
This error is produced by this input:
begin
let a be true
if a is true
print a
end
Basically, it does not recognize the ifstatement beginning with 'if' (IF), and it shows the tree as if I were making an assignment.
How can I fix this?
P.S. I also tried reordering my statements. I also tried removing all statements and leaving only ifstatement, and the same thing happens.
Thanks
There is at least one issue:
ID : [a-z]+ ;
...
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
...
NOT : 'not';
Since ID is placed before TRUE .. NOT, those tokens will never be created: ID matches the same text and, being defined first, takes precedence over them.
Start by moving ID beneath the NOT token.
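The effect of moving ID beneath the keyword tokens can be modelled in a few lines of Python. This is only an illustration of the ordering rule, not ANTLR itself: `winning_rule` is a hypothetical helper, and just three of the keyword rules are included.

```python
import re


def winning_rule(rules, word):
    """Name of the first rule, in definition order, that matches all of word.

    Every rule that matches the whole word matches the same length, so the
    rule defined first is the one that wins.
    """
    for name, pattern in rules:
        if re.fullmatch(pattern, word):
            return name
    return None


# ID first, as in the grammar from the question.
ORIGINAL = [("ID", r"[a-z]+"), ("TRUE", r"true"), ("IF", r"if"), ("NOT", r"not")]
# ID moved beneath the keyword rules, as suggested.
REORDERED = [("TRUE", r"true"), ("IF", r"if"), ("NOT", r"not"), ("ID", r"[a-z]+")]

for word in ["if", "true", "a"]:
    print(word, winning_rule(ORIGINAL, word), winning_rule(REORDERED, word))
```

With the original order every word comes out as ID. After the reordering, `if` and `true` get their own token types while `a` is still an ID.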
I have 3 types of numbers defined, number, decimal and percentage.
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Percentage and decimal work fine but when I assign a number, unless I put a sign (+ or -) in front of the number, it doesn't recognize it as a number.
number foo = +5 // does recognize
number foo = 5; // does not recognize
It does recognize it in an evaluation expression.
if (foo == 5 ) // does recognize
Here is my language (I took out the functions and left only the language recognition).
grammar Fetal;
transaction : begin statements end;
begin : 'begin' ;
end : 'end' ;
statements : (statement)+
;
statement
: declaration ';'
| command ';'
| assignment ';'
| evaluation
| ';'
;
declaration : type var;
var returns : identifier;
type returns
: DecimalType
| NumberType
| StringType
| BooleanType
| DateType
| ObjectType
| DaoType
;
assignment
: lharg Equals rharg
| lharg unaryOP rharg
;
assignmentOp : Equals
;
unaryOP : PlusEquals
| MinusEquals
| MultiplyEquals
| DivideEquals
| ModuloEquals
| ExponentEquals
;
expressionOp : arithExpressOp
| bitwiseExpressOp
;
arithExpressOp : Multiply
| Divide
| Plus
| Minus
| Modulo
| Exponent
;
bitwiseExpressOp
: And
| Or
| Not
;
comparisonOp : IsEqualTo
| IsLessThan
| IsLessThanOrEqualTo
| IsGreaterThan
| IsGreaterThanOrEqualTo
| IsNotEqualTo
;
logicExpressOp : AndExpression
| OrExpression
| ExclusiveOrExpression
;
rharg returns
: rharg expressionOp rharg
| '(' rharg expressionOp rharg ')'
| var
| literal
| assignmentCommands
;
lharg returns : var;
identifier : Identifier;
evaluation : IfStatement '(' evalExpression ')' block (Else block)?;
block : OpenBracket statements CloseBracket;
evalExpression
: evalExpression logicExpressOp evalExpression
| '(' evalExpression logicExpressOp evalExpression ')'
| eval
| '(' eval ')'
;
eval : rharg comparisonOp rharg ;
assignmentCommands
: GetBalance '(' stringArg ')'
| GetVariableType '(' var ')'
| GetDescription
| Today
| GetDays '(' startPeriod=dateArg ',' endPeriod=dateArg ')'
| DayOfTheWeek '(' dateArg ')'
| GetCalendarDay '(' dateArg ')'
| GetMonth '(' dateArg ')'
| GetYear '(' dateArg ')'
| Import '(' stringArg ')' /* Import( path ) */
| Lookup '(' sql=stringArg ',' argumentList ')' /* Lookup( table, SQL) */
| List '(' sql=stringArg ',' argumentList ')' /* List( table, SQL) */
| invocation
;
command : Print '(' rharg ')'
| Credit '(' amtArg ',' stringArg ')'
| Debit '(' amtArg ',' stringArg ')'
| Ledger '(' debitOrCredit ',' amtArg ',' acc=stringArg ',' desc=stringArg ')'
| Alias '(' account=stringArg ',' name=stringArg ')'
| MapFile ':' stringArg
| invocation
| Update '(' sql=stringArg ',' argumentList ')'
;
invocation
: o=objectLiteral '.' m=identifier '('argumentList? ')'
| o=objectLiteral '.' m=identifier '()'
;
argumentList
: rharg (',' rharg )*
;
amtArg : rharg ;
stringArg : rharg ;
numberArg : rharg ;
dateArg : rharg ;
debitOrCredit : charLiteral ;
literal
: numericLiteral
| doubleLiteral
| booleanLiteral
| percentLiteral
| stringLiteral
| dateLiteral
;
fileName : '<' fn=Identifier ('.' ft=Identifier)? '>' ;
charLiteral : ('D' | 'C');
numericLiteral : Number ;
doubleLiteral : Decimal ;
percentLiteral : Percentage ;
booleanLiteral : Boolean ;
stringLiteral : String ;
dateLiteral : Date ;
objectLiteral : Identifier ;
daoLiteral : Identifier ;
//Below are Token definitions
// Data Types
DecimalType : 'decimal' ;
NumberType : 'number' ;
StringType : 'string' ;
BooleanType : 'boolean' ;
DateType : 'date' ;
ObjectType : 'object' ;
DaoType : 'dao' ;
/******************************************************************
* Assignmnt operator
******************************************************************/
Equals : '=' ;
/*****************************************************************
* Unary operators
*****************************************************************/
PlusEquals : '+=' ;
MinusEquals : '-=' ;
MultiplyEquals : '*=' ;
DivideEquals : '/=' ;
ModuloEquals : '%=' ;
ExponentEquals : '^=' ;
/*****************************************************************
* Binary operators
*****************************************************************/
Plus : '+' ;
Minus : '-' ;
Multiply : '*' ;
Divide : '/' ;
Modulo : '%' ;
Exponent : '^' ;
/***************************************************************
* Bitwise operators
***************************************************************/
And : '&' ;
Or : '|' ;
Not : '!' ;
/*************************************************************
* Compariso operators
*************************************************************/
IsEqualTo : '==' ;
IsLessThan : '<' ;
IsLessThanOrEqualTo : '<=' ;
IsGreaterThan : '>' ;
IsGreaterThanOrEqualTo : '>=' ;
IsNotEqualTo : '!=' ;
/*************************************************************
* Expression operators
*************************************************************/
AndExpression : '&&' ;
OrExpression : '||' ;
ExclusiveOrExpression : '^^' ;
// Reserve words (Assignment Commands)
GetBalance : 'getBalance';
GetVariableType : 'getVariableType' ;
GetDescription : 'getDescription' ;
Today : 'today';
GetDays : 'getDays' ;
DayOfTheWeek : 'dayOfTheWeek' ;
GetCalendarDay : 'getCalendarDay' ;
GetMonth : 'getMonth' ;
GetYear : 'getYear' ;
Import : 'import' ;
Lookup : 'lookup' ;
List : 'list' ;
// Reserve words (Commands)
Credit : 'credit';
Debit : 'debit';
Ledger : 'ledger';
Alias : 'alias' ;
MapFile : 'mapFile' ;
Update : 'update' ;
Print : 'print';
IfStatement : 'if';
Else : 'else';
OpenBracket : '{';
CloseBracket : '}';
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Boolean : 'true' | 'false';
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Date : Year '-' Month '-' Day;
Identifier
: IdentifierNondigit
( IdentifierNondigit
| Digit
)*
;
String: '"' ( ESC | ~[\\"] )* '"';
/************************************************************
* Fragment Definitions
************************************************************/
fragment
ESC : '\\' [abtnfrv"'\\]
;
fragment
IdentifierNondigit
: Nondigit
//| // other implementation-defined characters...
;
fragment
Nondigit
: [a-zA-Z_]
;
fragment
Digit
: [0-9]
;
fragment
Sign : Plus | Minus;
fragment
Digits
: [-+]?[0-9]+
;
fragment
Year
: Digit Digit Digit Digit;
fragment
Month
: Digit Digit;
fragment
Day
: Digit Digit;
fragment Dot : '.';
fragment
SCharSequence
: SChar+
;
fragment
SChar
: ~["\\\r\n]
| SimpleEscapeSequence
| '\\\n' // Added line
| '\\\r\n' // Added line
;
fragment
CChar
: ~['\\\r\n]
| SimpleEscapeSequence
;
fragment
SimpleEscapeSequence
: '\\' ['"?abfnrtv\\]
;
ExtendedAscii
: [\x80-\xfe]+
-> skip
;
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
BlockComment
: '/*' .*? '*/'
-> skip
;
LineComment
: '//' ~[\r\n]*
-> skip
;
I have a hunch that this use of a fragment is incorrect:
fragment Sign : Plus | Minus;
I couldn't find anything in the reference book, but I think it needs to be changed to something like this:
fragment Sign : [+-];
I found the issue. I was using version 4.5.2-1 because every attempt to upgrade to 4.7 caused more errors and I didn't want to cause more errors while trying to solve another. I finally broke down and upgraded the libraries to 4.7, fixed the errors and the number recognition issue disappeared. It was a bug in the library, all this time.
I'm trying to implement a grammar for parsing Lucene queries. So far everything went smoothly until I tried to add support for range queries. Lucene details aside, my grammar looks like this:
grammar ModifiedParser;
TERM_RANGE : '[' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) ']'
| '{' ('*' | TERM_TEXT) 'TO' ('*' | TERM_TEXT) '}'
;
query : not (booleanOperator? not)* ;
booleanOperator : andClause
| orClause
;
andClause : 'AND' ;
notClause : 'NOT' ;
orClause : 'OR' ;
not : notClause? MODIFIER? clause;
clause : unqualified
| qualified
;
unqualified : TERM_RANGE # termRange
| TERM_PHRASE # termPhrase
| TERM_PHRASE_ANYTHING # termTruncatedPhrase
| '(' query ')' # queryUnqualified
| TERM_TEXT_TRUNCATED # termTruncatedText
| TERM_NORMAL # termText
;
qualified : TERM_NORMAL ':' unqualified
;
fragment TERM_CHAR : (~(' ' | '\t' | '\n' | '\r' | '\u3000'
| '\'' | '\"' | '(' | ')' | '[' | ']' | '{' | '}'
| '+' | '-' | '!' | ':' | '~' | '^'
| '?' | '*' | '\\' ))
;
fragment TERM_START_CHAR : TERM_CHAR
| ESCAPE
;
fragment ESCAPE : '\\' ~[];
MODIFIER : '-'
| '+'
;
AND : 'AND';
OR : 'OR';
NOT : 'NOT';
TERM_PHRASE_ANYTHING : '"' (ESCAPE|~('\"'|'\\'))+ '"' ;
TERM_PHRASE : '"' (ESCAPE|~('\"'|'\\'|'?'|'*'))+ '"' ;
TERM_TEXT_TRUNCATED : ('*'|'?')(TERM_CHAR+ ('*'|'?'))+ TERM_CHAR*
| TERM_START_CHAR (TERM_CHAR* ('?'|'*'))+ TERM_CHAR+
| ('?'|'*') TERM_CHAR+
;
TERM_NORMAL : TERM_TEXT;
fragment TERM_TEXT : TERM_START_CHAR TERM_CHAR* ;
WS : [ \t\r\n] -> skip ;
When I write a visitor and work with the tokens, parsing asd [ 10 TO 100 ] { 1 TO 1000 } 100..1000 throws token recognition errors for [, ], } and {, and only tries to visit the termRange rule on the third range. Do you know what I'm missing here? Thanks in advance.
Since you made TERM_RANGE a lexer rule, you must account for everything at a character level. In particular, you forgot to allow whitespace characters in your input.
You would likely be in a much better position if you instead created termRange, a parser rule.
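The contrast between the two approaches can be sketched in Python. This is a simplified model, not the Lucene grammar: `\w+` stands in for TERM_TEXT, and the names `tokenize` and `is_term_range` are hypothetical.

```python
import re

# Lexer-rule style: the whole range is ONE token, so every character must be
# spelled out. Like the TERM_RANGE rule above, this pattern allows no spaces.
RANGE_AS_ONE_TOKEN = re.compile(r"\[(?:\*|\w+)TO(?:\*|\w+)\]")

# Parser-rule style: split the input into small tokens first, dropping
# whitespace (the job of `WS -> skip`), then match on the token sequence.
TOKEN = re.compile(r"\s+|(?P<tok>[\[\]{}*]|\w+)")


def tokenize(text):
    """Return the non-whitespace tokens of text; raise on an unknown character."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"token recognition error at: {text[pos]!r}")
        if match.group("tok"):
            tokens.append(match.group("tok"))
        pos = match.end()
    return tokens


def is_bound(token):
    """A range bound is either '*' or a plain term."""
    return token == "*" or re.fullmatch(r"\w+", token) is not None


def is_term_range(tokens):
    """Models: termRange : '[' bound 'TO' bound ']' | '{' bound 'TO' bound '}' ;"""
    if len(tokens) != 5 or tokens[2] != "TO":
        return False
    if not (is_bound(tokens[1]) and is_bound(tokens[3])):
        return False
    return (tokens[0], tokens[4]) in {("[", "]"), ("{", "}")}


print(RANGE_AS_ONE_TOKEN.fullmatch("[ 10 TO 100 ]"))   # None: spaces not allowed
print(is_term_range(tokenize("[ 10 TO 100 ]")))        # True
```

Once whitespace is discarded before the range is matched, the spacing inside the brackets no longer matters, which is what a termRange parser rule gives you for free.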
NOTE: This is a continuation of the topic posted HERE.
I'm working on a parser for the Jass scripting language (here's an excellent API reference for it) so that I may use it as an interpreter for another language. Using ANTLR4 + ANTLRWorks 2, I have run this complex script to test the lexer/parser's strength, and it passes nearly all tests. The part where it fails is an 'elseif' statement containing an expression with:
an outer parenthesis...
an array element...
a boolean/binary operation, AND...
a unary constant integer
...like so:
elseif(si__DroneSystem___data_V[this]!=-1)then (line #53 of the script).
None of the changes I've made to the grammar gets ANTLR to recognize this input as a proper expression. The following grammar is what I've managed to write so far:
grammar Jass;
//----------------------------------------------------------------------
// Global Declarations
//----------------------------------------------------------------------
program : file+
;
file : declaration* function
;
declaration : globals | typedef | native_func
;
typedef : KEYWORD_TYPE identifier KEYWORD_EXTENDS (TYPE_HANDLE | identifier)
;
globals : KEYWORD_GLOBALS global_var_list KEYWORD_ENDGLOBALS
;
global_var_list : var_declaration*
;
native_func : KEYWORD_CONSTANT? KEYWORD_NATIVE func_declaration
;
func_declaration : identifier KEYWORD_TAKES (KEYWORD_NOTHING | parameter_list) KEYWORD_RETURNS (KEYWORD_NOTHING | type)
;
parameter_list : type identifier (',' type identifier)*
;
function : KEYWORD_CONSTANT? KEYWORD_FUNCTION func_declaration local_var_list statement_list KEYWORD_ENDFUNCTION
;
//----------------------------------------------------------------------
// Local Declarations
//----------------------------------------------------------------------
local_var_list : (KEYWORD_LOCAL? var_declaration)*
;
var_declaration : KEYWORD_CONSTANT type identifier '=' expression
| type identifier ('=' expression)? | type TYPE_ARRAY identifier
;
//----------------------------------------------------------------------
// Statements
//----------------------------------------------------------------------
statement_list : statement*
;
statement : set | call | if_statement | loop | exitwhen | return_statement | debug
;
set : KEYWORD_SET identifier '=' expression | KEYWORD_SET identifier OPENBRACKET expression CLOSEBRACKET '=' expression
;
call : KEYWORD_CALL identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
;
args : expression (COMMA expression)*
;
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
| KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
;
loop : KEYWORD_LOOP statement_list KEYWORD_ENDLOOP
;
// must appear in a loop
exitwhen : KEYWORD_EXITWHEN expression
;
return_statement : KEYWORD_RETURN expression?
;
debug : KEYWORD_DEBUG (set | call | if_statement | loop)
;
//----------------------------------------------------------------------
// Expressions
//----------------------------------------------------------------------
expression : parenthesis
| func_call
| array_ref
| (boolean_expression | binary_operation)
| unary_operation
| function_reference
| const_statement
| identifier
;
binary_operation : terminal (('+'|'-'|'*'|'/'|'>'|'<'|'=='|'!='|'>='|'<=') terminal)
;
unary_operation : ('+'|'-'|'not') terminal
;
boolean_expression : ('and'|'not')? terminal (('=='|'!=') terminal) ('and'|'or')?
;
terminal : factor*/(factor)
;
factor : identifier
| const_statement
| parenthesis
| brackets
;
parenthesis : OPENPARENTHESIS expression CLOSEPARENTHESIS
;
brackets : OPENBRACKET expression CLOSEBRACKET
;
// expression must be integer or real when used with unary '+'
func_call : identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
;
array_ref : identifier OPENBRACKET expression CLOSEBRACKET
;
function_reference : KEYWORD_FUNCTION identifier
;
const_statement : INTEGER_CONST | REAL_CONST | BOOL_CONST | STRING_CONST | ASSIGNMENT_TYPE_NULL
;
FOURCC : QUOTATION_SINGLE . . . . QUOTATION_SINGLE
;
INTEGER_CONST : DECIMAL | OCTAL | HEXIDECIMAL | FOURCC
;
DECIMAL : (DIGIT)+ | (DIGIT+) '.' (DIGIT+)?
;
OCTAL : '0'..'7'+
;
HEXIDECIMAL : '$'(DIGIT|'a'..'f'|'A'..'F')+ | '0'('x'|'X')(DIGIT|'a'..'f'|'A'..'F')+
;
REAL_CONST : (DIGIT)+'.'(DIGIT)* | '.'(DIGIT)+
;
BOOL_CONST : ASSIGNMENT_TYPE_TRUE | ASSIGNMENT_TYPE_FALSE
;
// any double-quotes in the string must be escaped with \
STRING_CONST : QUOTATION_DOUBLE .*? QUOTATION_DOUBLE
;
//----------------------------------------------------------------------
// Base
//----------------------------------------------------------------------
type : nativetype | commontype
;
identifier : ID
;
//////////////////////////////////////////////////////////////////////////////////////////////
// TYPES
//////////////////////////////////////////////////////////////////////////////////////////////
TYPE_BOOLEAN : 'boolean'
;
TYPE_CODE : 'code'
;
TYPE_HANDLE : 'handle'
;
TYPE_INTEGER : 'integer'
;
TYPE_REAL : 'real'
;
TYPE_STRING : 'string'
;
TYPE_ARRAY : 'array'
;
nativetype : TYPE_BOOLEAN
| TYPE_CODE
| TYPE_HANDLE
| TYPE_INTEGER
| TYPE_REAL
| TYPE_STRING
| TYPE_ARRAY
;
TYPE_ABILITY : 'ability'
;
TYPE_AGENT : 'agent'
;
TYPE_AIDIFFICULTY : 'aidifficulty'
;
TYPE_ALLIANCETYPE : 'alliancetype'
;
TYPE_ATTACKTYPE : 'attacktype'
;
TYPE_BLENDMODE : 'blendmode'
;
TYPE_BOOLEXPR : 'boolexpr'
;
TYPE_BUFF : 'buff'
;
TYPE_BUTTON : 'button'
;
TYPE_CAMERAFIELD : 'camerafield'
;
TYPE_CAMERASETUP : 'camerasetup'
;
TYPE_CONDITIONFUNC : 'conditionfunc'
;
TYPE_DAMAGETYPE : 'damagetype'
;
TYPE_DEFEATCONDITION : 'defeatcondition'
;
TYPE_DESTRUCTABLE : 'destructable'
;
TYPE_DIALOG : 'dialog'
;
TYPE_DIALOGEVENT : 'dialogevent'
;
TYPE_EFFECT : 'effect'
;
TYPE_EVENTID : 'eventid'
;
TYPE_FGAMESTATE : 'fgamestate'
;
TYPE_FILTERFUNC : 'filterfunc'
;
TYPE_FOGMODIFIER : 'fogmodifier'
;
TYPE_FOGSTATE : 'fogstate'
;
TYPE_FORCE : 'force'
;
TYPE_GAMECACHE : 'gamecache'
;
TYPE_GAMEDIFFICULTY : 'gamedifficulty'
;
TYPE_GAMEEVENT : 'gameevent'
;
TYPE_GAMESPEED : 'gamespeed'
;
TYPE_GAMESTATE : 'gamestate'
;
TYPE_GAMETYPE : 'gametype'
;
TYPE_GROUP : 'group'
;
TYPE_HASHTABLE : 'hashtable'
;
TYPE_IGAMESTATE : 'igamestate'
;
TYPE_IMAGE : 'image'
;
TYPE_ITEM : 'item'
;
TYPE_ITEMPOOL : 'itempool'
;
TYPE_ITEMTYPE : 'itemtype'
;
TYPE_LEADERBOARD : 'leaderboard'
;
TYPE_LIGHTNING : 'lightning'
;
TYPE_LIMITOP : 'limitop'
;
TYPE_LOCATION : 'location'
;
TYPE_MAPCONTROL : 'mapcontrol'
;
TYPE_MAPDENSITY : 'mapdensity'
;
TYPE_MAPFLAG : 'mapflag'
;
TYPE_MAPSETTING : 'mapsettings'
;
TYPE_MAPVISIBILITY : 'mapvisibility'
;
TYPE_MULTIBOARD : 'multiboard'
;
TYPE_MULTIBOARDITEM : 'multiboarditem'
;
TYPE_PATHINGTYPE : 'pathingtype'
;
TYPE_PLACEMENT : 'placement'
;
TYPE_PLAYER : 'player'
;
TYPE_PLAYERCOLOR : 'playercolor'
;
TYPE_PLAYEREVENT : 'playerevent'
;
TYPE_PLAYERGAMERESULT : 'playergameresult'
;
TYPE_PLAYERSCORE : 'playerscore'
;
TYPE_PLAYERSLOTSTATE : 'playerslotstate'
;
TYPE_PLAYERSTATE : 'playerstate'
;
TYPE_PLAYERUNITEVENT : 'playerunitevent'
;
TYPE_QUEST : 'quest'
;
TYPE_QUESTITEM : 'questitem'
;
TYPE_RACE : 'race'
;
TYPE_RACEPREFERENCE : 'racepreference'
;
TYPE_RARITYCONTROL : 'raritycontrol'
;
TYPE_RECT : 'rect'
;
TYPE_REGION : 'region'
;
TYPE_SOUND : 'sound'
;
TYPE_SOUNDTYPE : 'soundtype'
;
TYPE_STARTLOCPRIO : 'startlocprio'
;
TYPE_TERRAINDEFORMATION : 'terraindeformation'
;
TYPE_TEXMAPFLAGS : 'texmapflags'
;
TYPE_TEXTTAG : 'texttag'
;
TYPE_TIMER : 'timer'
;
TYPE_TIMERDIALOG : 'timerdialog'
;
TYPE_TRACKABLE : 'trackable'
;
TYPE_TRIGGER : 'trigger'
;
TYPE_TRIGGERACTION : 'triggeraction'
;
TYPE_TRIGGERCONDITION : 'triggercondition'
;
TYPE_UBERSPLAT : 'ubersplat'
;
TYPE_UNIT : 'unit'
;
TYPE_UNITEVENT : 'unitevent'
;
TYPE_UNITPOOL : 'unitpool'
;
TYPE_UNITSTATE : 'unitstate'
;
TYPE_UNITTYPE : 'unittype'
;
TYPE_VERSION : 'version'
;
TYPE_VOLUMEGROUP : 'volumegroup'
;
TYPE_WEAPONTYPE : 'weapontype'
;
TYPE_WEATHEREFFECT : 'weathereffect'
;
TYPE_WIDGET : 'widget'
;
TYPE_WIDGETEVENT : 'widgetevent'
;
commontype : TYPE_ABILITY
| TYPE_AGENT
| TYPE_AIDIFFICULTY
| TYPE_ALLIANCETYPE
| TYPE_ATTACKTYPE
| TYPE_BLENDMODE
| TYPE_BOOLEXPR
| TYPE_BUFF
| TYPE_BUTTON
| TYPE_CAMERAFIELD
| TYPE_CAMERASETUP
| TYPE_CONDITIONFUNC
| TYPE_DAMAGETYPE
| TYPE_DEFEATCONDITION
| TYPE_DESTRUCTABLE
| TYPE_DIALOG
| TYPE_DIALOGEVENT
| TYPE_EFFECT
| TYPE_EVENTID
| TYPE_FGAMESTATE
| TYPE_FILTERFUNC
| TYPE_FOGMODIFIER
| TYPE_FOGSTATE
| TYPE_FORCE
| TYPE_GAMECACHE
| TYPE_GAMEDIFFICULTY
| TYPE_GAMEEVENT
| TYPE_GAMESPEED
| TYPE_GAMESTATE
| TYPE_GAMETYPE
| TYPE_GROUP
| TYPE_HASHTABLE
| TYPE_IGAMESTATE
| TYPE_IMAGE
| TYPE_ITEM
| TYPE_ITEMPOOL
| TYPE_ITEMTYPE
| TYPE_LEADERBOARD
| TYPE_LIGHTNING
| TYPE_LIMITOP
| TYPE_LOCATION
| TYPE_MAPCONTROL
| TYPE_MAPDENSITY
| TYPE_MAPFLAG
| TYPE_MAPSETTING
| TYPE_MAPVISIBILITY
| TYPE_MULTIBOARD
| TYPE_MULTIBOARDITEM
| TYPE_PATHINGTYPE
| TYPE_PLACEMENT
| TYPE_PLAYER
| TYPE_PLAYERCOLOR
| TYPE_PLAYEREVENT
| TYPE_PLAYERGAMERESULT
| TYPE_PLAYERSCORE
| TYPE_PLAYERSLOTSTATE
| TYPE_PLAYERSTATE
| TYPE_PLAYERUNITEVENT
| TYPE_QUEST
| TYPE_QUESTITEM
| TYPE_RACE
| TYPE_RACEPREFERENCE
| TYPE_RARITYCONTROL
| TYPE_RECT
| TYPE_REGION
| TYPE_SOUND
| TYPE_SOUNDTYPE
| TYPE_STARTLOCPRIO
| TYPE_TERRAINDEFORMATION
| TYPE_TEXMAPFLAGS
| TYPE_TEXTTAG
| TYPE_TIMER
| TYPE_TIMERDIALOG
| TYPE_TRACKABLE
| TYPE_TRIGGER
| TYPE_TRIGGERACTION
| TYPE_TRIGGERCONDITION
| TYPE_UBERSPLAT
| TYPE_UNIT
| TYPE_UNITEVENT
| TYPE_UNITPOOL
| TYPE_UNITSTATE
| TYPE_UNITTYPE
| TYPE_VERSION
| TYPE_VOLUMEGROUP
| TYPE_WEAPONTYPE
| TYPE_WEATHEREFFECT
| TYPE_WIDGET
| TYPE_WIDGETEVENT
;
//////////////////////////////////////////////////////////////////////////////////////////////
ASSIGNMENT_TYPE_NULL : 'null'
;
ASSIGNMENT_TYPE_INTEGER : DIGIT
;
ASSIGNMENT_TYPE_REAL : REAL_CONST
;
ASSIGNMENT_TYPE_TRUE : 'true'
;
ASSIGNMENT_TYPE_FALSE : 'false'
;
KEYWORD_DEBUG : 'debug'
;
KEYWORD_EXTENDS : 'extends'
;
KEYWORD_NATIVE : 'native'
;
KEYWORD_FUNCTION : 'function'
;
KEYWORD_ENDFUNCTION : 'endfunction'
;
KEYWORD_TAKES : 'takes'
;
KEYWORD_NOTHING : 'nothing'
;
KEYWORD_RETURNS : 'returns'
;
KEYWORD_CALL : 'call'
;
KEYWORD_RETURN : 'return'
;
KEYWORD_GLOBALS : 'globals'
;
KEYWORD_ENDGLOBALS : 'endglobals'
;
KEYWORD_LOCAL : 'local'
;
KEYWORD_CONSTANT : 'constant'
;
KEYWORD_SET : 'set'
;
KEYWORD_IF : 'if'
;
KEYWORD_THEN : 'then'
;
KEYWORD_ELSEIF : 'elseif'
;
KEYWORD_ELSE : 'else'
;
KEYWORD_ENDIF : 'endif'
;
KEYWORD_LOOP : 'loop'
;
KEYWORD_EXITWHEN : 'exitwhen'
;
KEYWORD_ENDLOOP : 'endloop'
;
KEYWORD_TYPE : 'type'
;
ID : (LETTER)((LETTER|DIGIT|'_'+)*)?
;
fragment
LETTER : '\u0024' // $
| '\u0041'..'\u005a' // A-Z
| '\u005f' // _
| '\u0061'..'\u007a' // a-z
| '\u00c0'..'\u00d6' // Latin Capital Letter A with grave - Latin Capital letter O with diaeresis
| '\u00d8'..'\u00f6' // Latin Capital letter O with stroke - Latin Small Letter O with diaeresis
| '\u00f8'..'\u00ff' // Latin Small Letter O with stroke - Latin Small Letter Y with diaeresis
| '\u0100'..'\u1fff' // Latin Capital Letter A with macron - Latin Small Letter O with stroke and acute
| '\u3040'..'\u318f' // Hiragana
| '\u3300'..'\u337f' // CJK compatibility
| '\u3400'..'\u3d2d' // CJK compatibility
| '\u4e00'..'\u9fff' // CJK compatibility
| '\uf900'..'\ufaff' // CJK compatibility
;
fragment
DIGIT : '0'..'9'/*'\u0030'..'\u0039' // 0-9
| '\u0660'..'\u0669' // Arabic-Indic Digit 0-9
| '\u06f0'..'\u06f9' // Extended Arabic-Indic Digit 0-9
| '\u0966'..'\u096f' // Devanagari 0-9
| '\u09e6'..'\u09ef' // Bengali 0-9
| '\u0a66'..'\u0a6f' // Gurmukhi 0-9
| '\u0ae6'..'\u0aef' // Gujarati 0-9
| '\u0b66'..'\u0b6f' // Oriya 0-9
| '\u0be7'..'\u0bef' // Tami 0-9
| '\u0c66'..'\u0c6f' // Telugu 0-9
| '\u0ce6'..'\u0cef' // Kannada 0-9
| '\u0d66'..'\u0d6f' // Malayala 0-9
| '\u0e50'..'\u0e59' // Thai 0-9
| '\u0ed0'..'\u0ed9' // Lao 0-9
| '\u1040'..'\u1049' // Myanmar 0-9?*/
;
OPENPARENTHESIS : '('
;
CLOSEPARENTHESIS : ')'
;
OPENBRACKET : '['
;
CLOSEBRACKET : ']'
;
QUOTATION_DOUBLE : '"'
;
QUOTATION_SINGLE : '\''
;
COMMA : ','
;
WS : (' ' | '\t' | '\n'+)+ {skip();}
;
LINE_COMMENT : '//' ~[\r\n]* -> channel(HIDDEN)
;
And from ANTLRWorks...
THIS file is the output log from using TestRig (starting with the first error), and here is an image of the generated parse tree where the first error occurs:
TO ANYONE who can help me fix this issue: I will gladly upvote your answers, as well as your next 3 questions if you are marked as the answer to this question.
Thanks!
When looking at the BNF rules of an if statement:
ifthenelse
::= 'if' expr 'then' newline statement_list else_clause? 'endif'
else_clause
::= 'else' newline statement_list
| 'elseif' expr 'then' newline statement_list else_clause?
your translation:
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
| KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
;
is incorrect (you have an optional else_clause in the KEYWORD_ELSE alternative).
It should be:
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSE statement_list
| KEYWORD_ELSEIF expression KEYWORD_THEN statement_list else_clause?
;
And note that you don't need ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression), since an expression already matches '(' expression ')'.
But the observations above are not the cause of your problem(s). The real issue is that your grammar does not account for unary expressions. It does not match the -1 in the expression si__DroneSystem___data_V[this]!=-1.
Change your expression rule into this:
expression : OPENPARENTHESIS expression CLOSEPARENTHESIS
| OPENBRACKET expression CLOSEBRACKET
| func_call
| array_ref
| function_reference
| const_statement
| identifier
| '+' expression
| '-' expression
| 'not' expression
| expression ('*'|'/') expression
| expression ('+'|'-') expression
| expression ('>'|'<'|'=='|'!='|'>='|'<=') expression
| expression ('and'|'or') expression
;
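As a rough cross-check of the precedence order in the rule above, here is a hand-written precedence-climbing parser in Python. This is a different technique from the one ANTLR uses internally, and it is only a sketch: it covers identifiers, integers, array references, parentheses, and the unary and binary operators, but not function calls or the other constants. All names in it are hypothetical.

```python
import re

# Binary operators and their binding strength (higher binds tighter),
# following the order of the alternatives in the rule above.
BINARY = {"*": 4, "/": 4, "+": 3, "-": 3,
          ">": 2, "<": 2, "==": 2, "!=": 2, ">=": 2, "<=": 2,
          "and": 1, "or": 1}
UNARY = {"+", "-", "not"}


def tokenize(text):
    # Characters that match no alternative (such as spaces) are skipped.
    return re.findall(r"==|!=|>=|<=|[-+*/<>()\[\]]|\w+", text)


def parse(text):
    """Parse an expression into nested tuples such as ('!=', left, right)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expect(token):
        if peek() != token:
            raise SyntaxError(f"expected {token!r}, got {peek()!r}")
        advance()

    def primary():
        if peek() is None:
            raise SyntaxError("unexpected end of input")
        token = advance()
        if token in UNARY:                 # '+' expr, '-' expr, 'not' expr
            return (token, primary())
        if token == "(":                   # '(' expr ')'
            inner = expression(1)
            expect(")")
            return inner
        if token in BINARY or not re.fullmatch(r"\w+", token):
            raise SyntaxError(f"unexpected token {token!r}")
        if peek() == "[":                  # array_ref: identifier '[' expr ']'
            advance()
            index = expression(1)
            expect("]")
            return ("index", token, index)
        return token                       # identifier or integer constant

    def expression(min_power):
        left = primary()
        while peek() in BINARY and BINARY[peek()] >= min_power:
            op = advance()
            # The right operand must bind tighter: left-associative operators.
            left = (op, left, expression(BINARY[op] + 1))
        return left

    tree = expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("(si__DroneSystem___data_V[this]!=-1)"))
```

For the condition from the question, the sketch produces `('!=', ('index', 'si__DroneSystem___data_V', 'this'), ('-', '1'))`: the `-1` is accepted because a unary operator is allowed wherever an operand is expected.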
Now input like this:
if this==null then
return
elseif(si__DroneSystem___data_V[this]!=-1)then
return
endif
will be parsed as follows: