I am creating parser and lexer rules for Decaf programming language written in ANTLR4. I'm trying to parse a test file and keep getting an error, there must be something wrong in the grammar but i cant figure it out.
My test file looks like:
class Program {
int i[10];
}
The error is : line 2:8 mismatched input '10' expecting INT_LITERAL
And here is the full Decaf.g4 grammar file
grammar Decaf;
/*
LEXER RULES
-----------
Lexer rules define the basic syntax of individual words and symbols of a
valid Decaf program. Lexer rules follow regular expression syntax.
Complete the lexer rules following the Decaf Language Specification.
*/
CLASS : 'class';
INT : 'int';
RETURN : 'return';
VOID : 'void';
IF : 'if';
ELSE : 'else';
FOR : 'for';
BREAK : 'break';
CONTINUE : 'continue';
CALLOUT : 'callout';
TRUE : 'True' ;
FALSE : 'False' ;
BOOLEAN : 'boolean';
LCURLY : '{';
RCURLY : '}';
LBRACE : '(';
RBRACE : ')';
LSQUARE : '[';
RSQUARE : ']';
ADD : '+';
SUB : '-';
MUL : '*';
DIV : '/';
EQ : '=';
SEMI : ';';
COMMA : ',';
AND : '&&';
LESS : '<';
GREATER : '>';
LESSEQUAL : '<=' ;
GREATEREQUAL : '>=' ;
EQUALTO : '==' ;
NOTEQUAL : '!=' ;
EXCLAMATION : '!';
fragment CHAR : (' '..'!') | ('#'..'&') | ('('..'[') | (']'..'~') | ('\\'[']) | ('\\"') | ('\\') | ('\t') | ('\n');
CHAR_LITERAL : '\'' CHAR '\'';
//STRING_LITERAL : '"' CHAR+ '"' ;
HEXMARK : '0x';
fragment HEXA : [a-fA-F];
fragment HEXDIGIT : DIGIT | HEXA ;
HEX_LITERAL : HEXMARK HEXDIGIT+;
STRING : '"' (ESC|.)*? '"';
fragment ESC : '\\"' | '\\\\';
fragment DIGIT : [0-9];
DECIMAL_LITERAL : DIGIT(DIGIT)*;
COMMENT : '//' ~('\n')* '\n' -> skip;
WS : (' ' | '\n' | '\t' | '\r') + -> skip;
fragment ALPHA : [a-zA-Z] | '_';
fragment ALPHA_NUM : ALPHA | DIGIT;
ID : ALPHA ALPHA_NUM*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
BOOL_LITERAL : TRUE | FALSE;
/*
PARSER RULES
------------
Parser rules are all lower case, and make use of lexer rules defined above
and other parser rules defined below. Parser rules also follow regular
expression syntax. Complete the parser rules following the Decaf Language
Specification.
*/
program : CLASS ID LCURLY field_decl* method_decl* RCURLY EOF;
field_name : ID | ID LSQUARE INT_LITERAL RSQUARE;
field_decl : datatype field_name (COMMA field_name)* SEMI;
method_decl : (datatype | VOID) ID LBRACE ((datatype ID) (COMMA datatype ID)*)? RBRACE block;
block : LCURLY var_decl* statement* RCURLY;
var_decl : datatype ID (COMMA ID)* SEMI;
datatype : INT | BOOLEAN;
statement : location assign_op expr SEMI
| method_call SEMI
| IF LBRACE expr RBRACE block (ELSE block)?
| FOR ID EQ expr COMMA expr block
| RETURN (expr)? SEMI
| BREAK SEMI
| CONTINUE SEMI
| block;
assign_op : EQ
| ADD EQ
| SUB EQ;
method_call : method_name LBRACE (expr (COMMA expr)*)? RBRACE
| CALLOUT LBRACE STRING(COMMA callout_arg (COMMA callout_arg)*) RBRACE;
method_name : ID;
location : ID | ID LSQUARE expr RSQUARE;
expr : location
| method_call
| literal
| expr bin_op expr
| SUB expr
| EXCLAMATION expr
| LBRACE expr RBRACE;
callout_arg : expr
| STRING ;
bin_op : arith_op
| rel_op
| eq_op
| cond_op;
arith_op : ADD | SUB | MUL | DIV | '%' ;
rel_op : LESS | GREATER | LESSEQUAL | GREATEREQUAL ;
eq_op : EQUALTO | NOTEQUAL ;
cond_op : AND | '||' ;
literal : INT_LITERAL | CHAR_LITERAL | BOOL_LITERAL ;
Whenever there are 2 or more lexer rules that match the same characters, the one defined first wins. In your case, these 2 rules both match 10:
DECIMAL_LITERAL : DIGIT(DIGIT)*;
INT_LITERAL : DECIMAL_LITERAL | HEX_LITERAL;
and since INT_LITERAL is defined after DECIMAL_LITERAL, the lexer will never create a INT_LITERAL token. If you now try to use it in a parser rule, you get an error message you posted.
The solution: remove INT_LITERAL from your lexer and create a parser rule instead:
int_literal : DECIMAL_LITERAL | HEX_LITERAL;
and use int_literal in your parser rules instead.
Related
I am having a bit of difficulty in my g4 file. Below is my grammar:
// Define a grammar called Hello
grammar GYOO;
program : 'begin' block+ 'end';
block
: statement+
;
statement
: assign
| print
| add
| ifstatement
| OTHER {System.err.println("unknown char: " + $OTHER.text);}
;
assign
: 'let' ID 'be' expression
;
print
: 'print' (NUMBER | ID)
;
ifstatement
: 'if' condition_block (ELSE IF condition_block)* (ELSE stat_block)?
;
add
: (NUMBER | ID) OPERATOR (NUMBER | ID) ASSIGN ID
;
stat_block
: OBRACE block CBRACE
| statement
;
condition_block
: expression stat_block
;
expression
: NOT expression //notExpr
| expression (MULT | DIV | MOD) expression //multiplicationExpr
| expression (PLUS | MINUS) expression //additiveExpr
| expression (LTEQ | GTEQ | LT | GT) expression //relationalExpr
| expression (EQ | NEQ) expression //equalityExpr
| expression AND expression //andExpr
| expression OR expression //orExpr
| atom //atomExpr
;
atom
: (NUMBER | FLOAT) //numberAtom
| (TRUE | FALSE) //booleanAtom
| ID //idAtom
| STRING //stringAtom
| NULL //nullAtom
;
ID : [a-z]+ ;
NUMBER : [0-9]+ ;
OPERATOR : '+' | '-' | '*' | '/';
ASSIGN : '=';
WS : (' ' | '\t' | '\r' | '\n') + -> skip;
OPAR : '(';
CPAR : ')';
OBRACE : '{';
CBRACE : '}';
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
AND : 'and';
EQ : 'is'; //'=='
NEQ : 'is not'; //'!='
GT : 'greater'; //'>'
LT : 'lower'; //'<'
GTEQ : 'is greater'; //'>='
LTEQ : 'is lower'; //'<='
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
MOD : '%';
POW : '^';
NOT : 'not';
FLOAT
: [0-9]+ '.' [0-9]*
| '.' [0-9]+
;
STRING
: '"' (~["\r\n] | '""')* '"'
;
COMMENT
: '/*' .*? '*/' -> channel(HIDDEN)
;
LINE_COMMENT
: '//' ~[\r\n]* -> channel(HIDDEN)
;
OTHER
: .
;
When i try to -gui tree from antlr it shows me this error:
line 2:3 missing OPERATOR at 'a'
This error is given from this code example:
begin
let a be true
if a is true
print a
end
Basically it does not recognizes the ifstatement beggining with IF 'if' and it shows the tree like i am making an assignment.
How can i fix this?
P.S. I also tried to reposition my statements. Also tried to remove all statements and leave only ifstatement, and same thing happens.
Thanks
There is at least one issue:
ID : [a-z]+ ;
...
TRUE : 'true';
FALSE : 'false';
NULL : 'null';
IF : 'if';
ELSE : 'else';
OR : 'or';
...
NOT : 'not';
Since ID is placed before TRUE .. NOT, those tokens will never be created since ID has precedence over them (and ID matches these tokens as well).
Start by moving ID beneath the NOT token.
I have 3 types of numbers defined, number, decimal and percentage.
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Percentage and decimal work fine but when I assign a number, unless I put a sign (+ or -) in front of the number, it doesn't recognize it as a number.
number foo = +5 // does recognize
number foo = 5; // does not recognize
It does recognize it in an evaluation expression.
if (foo == 5 ) // does recognize
Here is my language (I took out the functions and left only the language recognition).
grammar Fetal;
transaction : begin statements end;
begin : 'begin' ;
end : 'end' ;
statements : (statement)+
;
statement
: declaration ';'
| command ';'
| assignment ';'
| evaluation
| ';'
;
declaration : type var;
var returns : identifier;
type returns
: DecimalType
| NumberType
| StringType
| BooleanType
| DateType
| ObjectType
| DaoType
;
assignment
: lharg Equals rharg
| lharg unaryOP rharg
;
assignmentOp : Equals
;
unaryOP : PlusEquals
| MinusEquals
| MultiplyEquals
| DivideEquals
| ModuloEquals
| ExponentEquals
;
expressionOp : arithExpressOp
| bitwiseExpressOp
;
arithExpressOp : Multiply
| Divide
| Plus
| Minus
| Modulo
| Exponent
;
bitwiseExpressOp
: And
| Or
| Not
;
comparisonOp : IsEqualTo
| IsLessThan
| IsLessThanOrEqualTo
| IsGreaterThan
| IsGreaterThanOrEqualTo
| IsNotEqualTo
;
logicExpressOp : AndExpression
| OrExpression
| ExclusiveOrExpression
;
rharg returns
: rharg expressionOp rharg
| '(' rharg expressionOp rharg ')'
| var
| literal
| assignmentCommands
;
lharg returns : var;
identifier : Identifier;
evaluation : IfStatement '(' evalExpression ')' block (Else block)?;
block : OpenBracket statements CloseBracket;
evalExpression
: evalExpression logicExpressOp evalExpression
| '(' evalExpression logicExpressOp evalExpression ')'
| eval
| '(' eval ')'
;
eval : rharg comparisonOp rharg ;
assignmentCommands
: GetBalance '(' stringArg ')'
| GetVariableType '(' var ')'
| GetDescription
| Today
| GetDays '(' startPeriod=dateArg ',' endPeriod=dateArg ')'
| DayOfTheWeek '(' dateArg ')'
| GetCalendarDay '(' dateArg ')'
| GetMonth '(' dateArg ')'
| GetYear '(' dateArg ')'
| Import '(' stringArg ')' /* Import( path ) */
| Lookup '(' sql=stringArg ',' argumentList ')' /* Lookup( table, SQL) */
| List '(' sql=stringArg ',' argumentList ')' /* List( table, SQL) */
| invocation
;
command : Print '(' rharg ')'
| Credit '(' amtArg ',' stringArg ')'
| Debit '(' amtArg ',' stringArg ')'
| Ledger '(' debitOrCredit ',' amtArg ',' acc=stringArg ',' desc=stringArg ')'
| Alias '(' account=stringArg ',' name=stringArg ')'
| MapFile ':' stringArg
| invocation
| Update '(' sql=stringArg ',' argumentList ')'
;
invocation
: o=objectLiteral '.' m=identifier '('argumentList? ')'
| o=objectLiteral '.' m=identifier '()'
;
argumentList
: rharg (',' rharg )*
;
amtArg : rharg ;
stringArg : rharg ;
numberArg : rharg ;
dateArg : rharg ;
debitOrCredit : charLiteral ;
literal
: numericLiteral
| doubleLiteral
| booleanLiteral
| percentLiteral
| stringLiteral
| dateLiteral
;
fileName : '<' fn=Identifier ('.' ft=Identifier)? '>' ;
charLiteral : ('D' | 'C');
numericLiteral : Number ;
doubleLiteral : Decimal ;
percentLiteral : Percentage ;
booleanLiteral : Boolean ;
stringLiteral : String ;
dateLiteral : Date ;
objectLiteral : Identifier ;
daoLiteral : Identifier ;
//Below are Token definitions
// Data Types
DecimalType : 'decimal' ;
NumberType : 'number' ;
StringType : 'string' ;
BooleanType : 'boolean' ;
DateType : 'date' ;
ObjectType : 'object' ;
DaoType : 'dao' ;
/******************************************************************
* Assignmnt operator
******************************************************************/
Equals : '=' ;
/*****************************************************************
* Unary operators
*****************************************************************/
PlusEquals : '+=' ;
MinusEquals : '-=' ;
MultiplyEquals : '*=' ;
DivideEquals : '/=' ;
ModuloEquals : '%=' ;
ExponentEquals : '^=' ;
/*****************************************************************
* Binary operators
*****************************************************************/
Plus : '+' ;
Minus : '-' ;
Multiply : '*' ;
Divide : '/' ;
Modulo : '%' ;
Exponent : '^' ;
/***************************************************************
* Bitwise operators
***************************************************************/
And : '&' ;
Or : '|' ;
Not : '!' ;
/*************************************************************
* Compariso operators
*************************************************************/
IsEqualTo : '==' ;
IsLessThan : '<' ;
IsLessThanOrEqualTo : '<=' ;
IsGreaterThan : '>' ;
IsGreaterThanOrEqualTo : '>=' ;
IsNotEqualTo : '!=' ;
/*************************************************************
* Expression operators
*************************************************************/
AndExpression : '&&' ;
OrExpression : '||' ;
ExclusiveOrExpression : '^^' ;
// Reserve words (Assignment Commands)
GetBalance : 'getBalance';
GetVariableType : 'getVariableType' ;
GetDescription : 'getDescription' ;
Today : 'today';
GetDays : 'getDays' ;
DayOfTheWeek : 'dayOfTheWeek' ;
GetCalendarDay : 'getCalendarDay' ;
GetMonth : 'getMonth' ;
GetYear : 'getYear' ;
Import : 'import' ;
Lookup : 'lookup' ;
List : 'list' ;
// Reserve words (Commands)
Credit : 'credit';
Debit : 'debit';
Ledger : 'ledger';
Alias : 'alias' ;
MapFile : 'mapFile' ;
Update : 'update' ;
Print : 'print';
IfStatement : 'if';
Else : 'else';
OpenBracket : '{';
CloseBracket : '}';
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Boolean : 'true' | 'false';
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Date : Year '-' Month '-' Day;
Identifier
: IdentifierNondigit
( IdentifierNondigit
| Digit
)*
;
String: '"' ( ESC | ~[\\"] )* '"';
/************************************************************
* Fragment Definitions
************************************************************/
fragment
ESC : '\\' [abtnfrv"'\\]
;
fragment
IdentifierNondigit
: Nondigit
//| // other implementation-defined characters...
;
fragment
Nondigit
: [a-zA-Z_]
;
fragment
Digit
: [0-9]
;
fragment
Sign : Plus | Minus;
fragment
Digits
: [-+]?[0-9]+
;
fragment
Year
: Digit Digit Digit Digit;
fragment
Month
: Digit Digit;
fragment
Day
: Digit Digit;
fragment Dot : '.';
fragment
SCharSequence
: SChar+
;
fragment
SChar
: ~["\\\r\n]
| SimpleEscapeSequence
| '\\\n' // Added line
| '\\\r\n' // Added line
;
fragment
CChar
: ~['\\\r\n]
| SimpleEscapeSequence
;
fragment
SimpleEscapeSequence
: '\\' ['"?abfnrtv\\]
;
ExtendedAscii
: [\x80-\xfe]+
-> skip
;
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
BlockComment
: '/*' .*? '*/'
-> skip
;
LineComment
: '//' ~[\r\n]*
-> skip
;
I have a hunch that this use of a fragment is incorrect:
fragment Sign : Plus | Minus;
I couldn't find anything in the reference book, but I think it needs to be changed to something like this:
fragment Sign : [+-];
I found the issue. I was using version 4.5.2-1 because every attempt to upgrade to 4.7 caused more errors and I didn't want to cause more errors while trying to solve another. I finally broke down and upgraded the libraries to 4.7, fixed the errors and the number recognition issue disappeared. It was a bug in the library, all this time.
I wrote the following grammar which should check for a conditional expression.
Examples below is what I want to achieve using this grammar:
test invalid
test = 1 valid
test = 1 and another_test>=0.2 valid
test = 1 kasd y = 1 invalid (two conditions MUST be separated by AND/OR)
a = 1 or (b=1 and c) invalid (there cannot be a lonely character like 'c'. It should always be a triplet. i.e, literal operator literal)
grammar expression;
expr
: literal_value
| expr ( '='|'<>'| '<' | '<=' | '>' | '>=' ) expr
| expr K_AND expr
| expr K_OR expr
| function_name '(' ( expr ( ',' expr )* | '*' )? ')'
| '(' expr ')'
;
literal_value
: NUMERIC_LITERAL
| STRING_LITERAL
| IDENTIFIER
;
keyword
: K_AND
| K_OR
;
name
: any_name
;
function_name
: any_name
;
database_name
: any_name
;
table_name
: any_name
;
column_name
: any_name
;
any_name
: IDENTIFIER
| keyword
| STRING_LITERAL
| '(' any_name ')'
;
K_AND : A N D;
K_OR : O R;
IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]*
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
WS: [ \n\t\r]+ -> skip;
So my question is, how can I get the grammar to work for the examples mentioned above? Can we make certain words as mandatory between two triplets (literal operator literal)? In a sense I'm just trying to get a parser to validate the where clause condition but only simple condition and functions are permitted. I also want have a visitor that retrieves the values like function, parenthesis, any literal etc in Java, how to achieve that?
Yes and no.
You can change your grammar to only allow expressions that are comparisons and logical operations on the same:
expr
: term ( '='|'<>'| '<' | '<=' | '>' | '>=' ) term
| expr K_AND expr
| expr K_OR expr
| '(' expr ')'
;
term
: literal_value
| function_name '(' ( expr ( ',' expr )* | '*' )? ')'
;
The issue comes if you want to allow boolean variables or functions -- you need to classify the functions/vars in your lexer and have a different terminal for each, which is tricky and error prone.
Instead, it is generally better to NOT do this kind of checking in the parser -- have your parser be permissive and accept anything expression-like, and generate an expression tree for it. Then have a separate pass over the tree (called a type checker) that checks the types of the operands of operations and the arguments to functions.
This latter approach (with a separate type checker) generally ends up being much simpler, clearer, more flexible, and gives better error messages (rather than just 'syntax error').
I'm having a problem with my grammar.
It says Decision can match input such as "{EQUAL, GREATER..GREATER_EQUAL, LOWER..LOWER_EQUAL, NOT_EQUAL}" using multiple alternatives: 1, 2, Although all the trees of rules are correct.
Anyone to help?!
grammar test;
//parser : .* EOF;
program :T_PROGRAM ID T_LEFTPAR identifier_list T_RIGHTPAR T_SEMICOLON declarations subprogram_declarations compound_statement T_DOT;
identifier_list :(ID) (',' ID)*;
declarations :() (T_VAR identifier_list COLON type T_SEMICOLON)* ;
type : standard_type|
T_ARRAY T_LEFTBRACK NUM T_TO NUM T_RIGHTBRACK T_OF standard_type ;
standard_type : INT
| FLOAT ;
subprogram_declarations :() (subprogram_declaration T_SEMICOLON)* ;
subprogram_declaration : subprogram_head declarations compound_statement;
subprogram_head :T_FUNCTION ID arguments COLON standard_type |
T_PROCEDURE ID arguments ;
arguments :T_LEFTPAR parameter_list T_RIGHTPAR | ;
parameter_list :(identifier_list COLON type) (T_SEMICOLON identifier_list COLON type)*;
compound_statement : T_BEGIN optional_statements T_END;
optional_statements :statement_list | ;
statement_list :(statement) (T_SEMICOLON statement)*;
statement :
variable ASSIGN expression
| procedure_statement
| compound_statement
| T_IF expression T_THEN statement T_ELSE statement
| T_WHILE expression T_DO statement
;
procedure_statement :ID
| ID T_LEFTPAR expression_list T_RIGHTPAR;
expression_list : (expression) (',' expression)*;
variable : ID T_LEFTBRACK expression T_RIGHTBRACK | ;
expression :( () |simple_expression) (( LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL ) simple_expression)* ;
simple_expression :
( () | sign ) term (( PLUS | MINUS | T_OR ) term)*;
term :
(factor) (( CROSS | DIVIDE | MOD | T_AND ) factor)*;
factor :
variable
|ID T_LEFTPAR expression_list T_RIGHTPAR
| NUM
| T_LEFTPAR expression T_RIGHTPAR
| T_NOT factor;
sign :
'+'
| '-';
/********/
T_PROGRAM : 'program';
T_FUNCTION : 'function';
T_PROCEDURE : 'procedure';
T_READ : 'read';
T_WRITE : 'write';
T_OF : 'of';
T_ARRAY : 'array';
T_VAR : 'var';
T_FLOAT : 'float';
T_INT : 'int';
T_CHAR : 'char';
T_STRING : 'string';
T_BEGIN : 'begin';
T_END : 'end';
T_IF : 'if';
T_THEN : 'then';
T_ELSE : 'else';
T_WHILE : 'while';
T_DO : 'do';
T_NOT : 'not';
NUM : INT
| FLOAT;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
LOWER : '<';
LOWER_EQUAL : '<=';
GREATER : '>';
GREATER_EQUAL : '>=';
EQUAL : '=';
NOT_EQUAL : '<>';
ASSIGN :
':=';
COLON :
':';
PLUS : '+';
MINUS : '-';
T_OR : 'OR';
CROSS : '*';
DIVIDE : '/';
MOD : 'MOD';
T_AND : 'AND';
T_LEFTPAR
: '(';
T_RIGHTPAR
: ')';
T_LEFTBRACK
: '[';
T_RIGHTBRACK
: ']';
T_TO
: '..';
T_DOT : '.';
T_SEMICOLON
: ';';
T_COMMA
: ',';
T_BADNUM
: (NUM)(CHAR)*;
T_BADSTRING
: '"' ( ESC_SEQ | ~('\\'|'"') )*WS;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
fragment
INT : '0'..'9'+
;
fragment
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
Problem 1
The ambiguity (the errors/warnings that certain input can be matched in more than 1 way) originate mainly from the fact that both your variable and expression rules can match empty input:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
| /* nothing */
;
expression
: (
(/* nothing */)
| simple_expression
)
(
(LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression
)* /* because of the `*`, this can also match nothing */
;
(I added the /* ... */ comments and reformatted the rules to make them more readable)
Fix 1
You probably want to do it like this instead:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
;
expression
: simple_expression ((LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression)*
;
Problem 2
Another problem (one that will show up once you would have resolved the ambiguities, is that you defined the tokens MOD, T_AND and T_OR after your ID rule:
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
...
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
This will cause MOD, T_AND and T_OR to be never created since ID matches these characters too.
Fix 2
Place MOD, T_AND and T_OR before your ID rule:
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
...
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
I am reorganizing my grammar into two files in order to accomodate a tree grammar; Lua.g and LuaGrammar.g. Lua.g will have all of my lexer rules, and LuaGrammar.g will have all of my tree grammar and parser rules. However, when i try and compile LuaGrammar.g i get the following error:
[00:28:37] error(10): internal error: C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\LuaGrammar.g : java.lang.IllegalArgumentException: Can't find template ruleRefBang.st; group hierarchy is [CSharp2]
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:507)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:484)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:392)
org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:404)
org.antlr.grammar.v2.CodeGenTreeWalker.getRuleElementST(CodeGenTreeWalker.java:152)
org.antlr.grammar.v2.CodeGenTreeWalker.atom(CodeGenTreeWalker.java:1986)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1708)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1556)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.ebnf(CodeGenTreeWalker.java:1871)
org.antlr.grammar.v2.CodeGenTreeWalker.element(CodeGenTreeWalker.java:1704)
org.antlr.grammar.v2.CodeGenTreeWalker.alternative(CodeGenTreeWalker.java:1306)
org.antlr.grammar.v2.CodeGenTreeWalker.block(CodeGenTreeWalker.java:1081)
org.antlr.grammar.v2.CodeGenTreeWalker.rule(CodeGenTreeWalker.java:797)
org.antlr.grammar.v2.CodeGenTreeWalker.rules(CodeGenTreeWalker.java:588)
org.antlr.grammar.v2.CodeGenTreeWalker.grammarSpec(CodeGenTreeWalker.java:530)
org.antlr.grammar.v2.CodeGenTreeWalker.grammar(CodeGenTreeWalker.java:336)
org.antlr.codegen.CodeGenerator.genRecognizer(CodeGenerator.java:432)
org.antlr.Tool.generateRecognizer(Tool.java:641)
org.antlr.Tool.process(Tool.java:454)
org.antlr.works.generate.CodeGenerate.generate(CodeGenerate.java:104)
org.antlr.works.generate.CodeGenerate.run(CodeGenerate.java:185)
java.lang.Thread.run(Unknown Source)
And, i'm getting the following error:
[00:34:58] error(100): C:\Users\RCIX\Desktop\AguaLua\Project\trunk\AguaLua\AguaLua\ANTLR Data\Lua.g:0:0: syntax error: codegen: <AST>:0:0: unexpected end of subtree
when attempting to generate Lua.g. Why am i getting these errors, and how can i fix them? (Using ANTLR V3, am able to provide grammar files)
Update: here is the grammar file i am trying to compile.
tree grammar LuaGrammar;
options {
backtrack=true;
language=CSharp2;
output=AST;
tokenVocab=Lua;
filter=true;
ASTLabelType=CommonTree;
}
assignment
:
^('=' left=NAME right=NAME) {Ast. };
/*
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
*/
/*
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | /*function |*\ prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
*/
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
Lua.g:
/*
* Lua 5.1 grammar
*
* Nicolai Mainiero
* May 2007
*
* This is a Lua (http://www.lua.org) grammar for the version 5.1 for ANTLR 3.
* I tested it with basic and extended examples and it worked fine. It is also used
* for LunarEclipse (http://lunareclipse.sf.net) a Lua editor based on Eclipse.
*
* Thanks to Johannes Luber and Gavin Lambert who helped me with some mutually left recursion.
*
*/
grammar Lua;
options {
backtrack=true;
language=CSharp2;
//output=AST;
//ASTLabelType=CommonTree;
}
#lexer::namespace{AguaLua}
chunk : (stat (';'!)?)* (laststat (';'!)?)?;
block : chunk;
stat : varlist1 '='^ explist1 |
functioncall |
doblock |
'while'^ exp doblock |
'repeat'^ block untilrule |
'if'^ exp thenchunk elseifchunk* elsechunk? 'end'! |
'for'^ forinitializer doblock |
'for'^ namelist inlist doblock |
'function'^ funcname funcbody |
'local' 'function' NAME funcbody |
'local'^ namelist localstat? ;
localstat
: '='^ explist1;
untilrule
: 'until'^ exp;
elseifchunk
: 'elseif'^ exp thenchunk;
thenchunk
: 'then'^ block;
elsechunk
: 'else'^ block;
forinitializer
: NAME '='^ exp ','! exp (','! exp)?;
doblock
: 'do'^ block 'end'!;
inlist
: 'in'^ explist1;
laststat : 'return'^ (explist1)? | 'break';
dotname : '.'! funcname;
colonname
: ':' NAME;
funcname : NAME^ (dotname | colonname)?;
varlist1 : var (','! var)*;
namelist : NAME (','! NAME)*;
explist1 : (exp ','!)* exp;
exp : expelement (binop^ exp)* ;
expelement
: ('nil' | 'false' | 'true' | number | stringrule | '...' | function | prefixexp | tableconstructor | unop exp);
var: (namevar | dotvar | expvar | arrayvar)?;
namevar
: NAME^ var;
dotvar
: '.'! var;
expvar
: '('^ exp ')'! var;
arrayvar
: '['^ var ']'! var;
varSuffix: nameAndArgs* ('[' exp ']' | '.' NAME);
prefixexp: varOrExp nameAndArgs*;
functioncall: varOrExp nameAndArgs+;
varOrExp: var | '('! exp ')'!;
nameAndArgs: (':' NAME)? argsrule;
argsrule : '(' (explist1)? ')' | tableconstructor | stringrule ;
function : 'function' funcbody;
funcbody : funcparams funcblock;
funcblock
: ')'^ block 'end'!;
funcparams
: '('^ parlist1? ;
parlist1 : namelist (','! '...')? | '...';
tableconstructor : '{'^ (fieldlist)? '}'!;
fieldlist : field (fieldsep! field)* (fieldsep!)?;
field : '['! exp ']'! '='^ exp | NAME '='^ exp | exp;
fieldsep : ',' | ';';
binop : '+' | '-' | '*' | '/' | '^' | '%' | '..' |
'<' | '<=' | '>' | '>=' | '==' | '~=' |
'and' | 'or';
unop : '-' | 'not' | '#';
number : INT | FLOAT | EXP | HEX;
stringrule : NORMALSTRING | CHARSTRING | LONGSTRING;
// LEXER
NAME :('a'..'z'|'A'..'Z'|'_')(options{greedy=true;}: 'a'..'z'|'A'..'Z'|'_'|'0'..'9')*
;
INT : ('0'..'9')+;
FLOAT :INT '.' INT ;
EXP : (INT| FLOAT) ('E'|'e') ('-')? INT;
HEX :'0x' ('0'..'9'| 'a'..'f')+ ;
NORMALSTRING
: '"' ( EscapeSequence | ~('\\'|'"') )* '"'
;
CHARSTRING
: '\'' ( EscapeSequence | ~('\''|'\\') )* '\''
;
LONGSTRING
: '['('=')*'[' ( EscapeSequence | ~('\\'|']') )* ']'('=')*']'
;
fragment
EscapeSequence
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UnicodeEscape
| OctalEscape
;
fragment
OctalEscape
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UnicodeEscape
: '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
fragment
HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
COMMENT
: '--[[' ( options {greedy=false;} : . )* ']]' {Skip();}
;
LINE_COMMENT : '--' (~ NEWLINE)* {Skip();};
fragment NEWLINE : '\r'|'\n' | '\r\n' ;
WS : (' '|'\t'|'\u000C') {Skip();};
(both are based off of a grammar produced by Nicolai Mainero and available at ANTLR's site, Lua 5.1 grammar)
If i uncomment anymore than this, it comes up with the error above.
Okay, a 'Can't find template ruleRefBang.st' has something to do with the illegal use of a "tree exclude" operator: !. Usually, it is a contradicting rewrite rule: somewhere you have a ! and then rewrite it using -> but use that ignored token anyway. Since I cannot see a -> in your grammar, that can't be the case (unless you simplified the tree grammar to post here and removed some rewrite rules?).
Anyway, I'd start by removing all ! operators in your tree grammar and if your grammar then works put them, one by one, back in again. Then you should be able to pin point the place in your grammar that houses the illegal !.
Good luck!