The following sets of rules are mutually left-recursive TREE GRAMMAR - parsing

I have a complete parser grammar that generates an AST, which I believe is correct, using rewrite rules and tree operators. Now I am stuck at the phase of creating a tree grammar. I get this error:
The following sets of rules are mutually left-recursive [direct_declarator, declarator] and [abstract_declarator, direct_abstract_declarator]
rewrite syntax or operator with no output option; setting output=AST
Here is my Tree Grammar.
tree grammar walker;
options {
language = Java;
tokenVocab = c2p;
ASTLabelType = CommonTree;
backtrack = true;
}
@header
{
package com.frankdaniel.compiler;
}
translation_unit
: ^(PROGRAM (^(FUNCTION external_declaration))+)
;
external_declaration
options {k=1;}
: (declaration_specifiers? declarator declaration*)=> function_definition
| declaration
;
function_definition
: declaration_specifiers? declarator (declaration+ compound_statement|compound_statement)
;
declaration
: 'typedef' declaration_specifiers? init_declarator_list
| declaration_specifiers init_declarator_list?
;
declaration_specifiers
: ( type_specifier|type_qualifier)+
;
init_declarator_list
: ^(INIT_DECLARATOR_LIST init_declarator+)
;
init_declarator
: declarator (ASSIGN^ initializer)?
;
type_specifier : (CONST)? (VOID | CHAR | INT | FLOAT );
type_id
: IDENTIFIER
//{System.out.println($IDENTIFIER.text+" is a type");}
;
type_qualifier
: CONST
;
declarator
: pointer? direct_declarator
| pointer
;
direct_declarator
: (IDENTIFIER|declarator) declarator_suffix*
;
declarator_suffix
: constant_expression
| parameter_type_list
| identifier_list
;
pointer
: TIMES type_qualifier+ pointer?
| TIMES pointer
| TIMES
;
parameter_type_list
: parameter_list
;
parameter_list
: ^(PARAMETER_LIST parameter_declaration)
;
parameter_declaration
: declaration_specifiers (declarator|abstract_declarator)*
;
identifier_list
: ^(IDENTIFIER_LIST IDENTIFIER+)
;
type_name
: specifier_qualifier_list abstract_declarator?
;
specifier_qualifier_list
: ( type_qualifier | type_specifier )+
;
abstract_declarator
: pointer direct_abstract_declarator?
| direct_abstract_declarator
;
direct_abstract_declarator
: (abstract_declarator | abstract_declarator_suffix ) abstract_declarator_suffix*
;
abstract_declarator_suffix
: constant_expression
| parameter_type_list
;
initializer
: assignment_expression
| initializer_list?
;
initializer_list
: ^(INITIALIZER_LIST initializer+)
;
// EXPRESSIONS
argument_expression_list
: ^(EXPRESSION_LIST assignment_expression+)
;
multiplicative_expression
: ^((TIMES|DIV|MOD) cast_expression cast_expression);
additive_expression
: ^((PLUS|MINUS) multiplicative_expression multiplicative_expression);
cast_expression
: ^(CAST_EXPRESSION type_name cast_expression)
| unary_expression
;
unary_expression
: postfix_expression
| PPLUS unary_expression
| MMINUS unary_expression
| unary_operator cast_expression
;
postfix_expression
: primary_expression
( expression
| argument_expression_list
| IDENTIFIER
| IDENTIFIER
| PPLUS
| MMINUS
)*
;
unary_operator
: TIMES
| PLUS
| MINUS
| NOT
;
primary_expression
: IDENTIFIER
| constant
| expression
;
constant
: HEX_LITERAL
| OCTAL_LITERAL
| DECIMAL_LITERAL
| CHARACTER_LITERAL
| STRING_LITERAL
| FLOATING_POINT_LITERAL
;
////////////////////////////////////////////////////////
expression
: ^(EXPRESSION assignment_expression+)
;
constant_expression
: conditional_expression
;
assignment_expression
: ^(assignment_operator lvalue assignment_expression)
| conditional_expression
;
lvalue
: unary_expression
;
assignment_operator
: ASSIGN
;
conditional_expression : (logical_or_expression) (QUESTIONMARK expression COLON conditional_expression)?;
logical_or_expression : ^(OR logical_and_expression logical_and_expression);
logical_and_expression : ^(AND equality_expression equality_expression);
//equality_expression : (a=relational_expression) ((e=EQUAL|e=NONEQUAL)^ b=relational_expression)?;
equality_expression : ^((EQUAL|NONEQUAL) relational_expression relational_expression);
//relational_expression : additive_expression ((ST|GT|STEQ|GTEQ)^ additive_expression)* ;
relational_expression : ^((ST|GT|STEQ|GTEQ) additive_expression additive_expression);
// STATEMENTS
statement
: compound_statement
| expression_statement
| selection_statement
| iteration_statement
| jump_statement
;
compound_statement
: ^(STATEMENT declaration* statement_list? )
;
statement_list
: statement+
;
expression_statement
:expression
;
selection_statement
:^(IF expression statement (^(ELSE statement))? )
|^(SWITCH expression statement)
;
iteration_statement
: ^(WHILE expression statement)
| ^(DO statement ^(WHILE expression))
| ^(FOR expression_statement expression_statement expression? statement)
;
jump_statement
: CONTINUE
| BREAK
| RETURN
| ^(RETURN expression)
;

It seems clear that the following two rules are mutually left-recursive:
declarator
: pointer? direct_declarator
| pointer
;
direct_declarator
: (IDENTIFIER|declarator) declarator_suffix*
;
Rule "declarator" can start with "direct_declarator" (because "pointer" is optional), and "direct_declarator" can start with "declarator". Each rule can therefore invoke the other without consuming any input, and there are no predicates to guide the choice. The same applies to "abstract_declarator" and "direct_abstract_declarator".
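To see why ANTLR reports both pairs, follow only the rules that a rule can call before it consumes any token. The following is a minimal Java sketch of that idea, not ANTLR's actual analysis. It works over a hand-built "leftmost call" graph derived from the tree grammar above, and a rule counts as left-recursive if it can reach itself in that graph. The class and method names are hypothetical. The model also ignores whether a leftmost rule can match empty input.

Presumably, in the parser grammar these rules came from, the recursive alternatives were written as '(' declarator ')' and '(' abstract_declarator ')', so a token was consumed first. Once the parentheses disappear in the tree grammar, nothing is consumed before the recursive call.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class LeftRecursionCheck {
    // For each rule: the rules it can invoke at its leftmost position before any token
    // is consumed (hand-derived from the tree grammar; rules not listed start with a token).
    static Map<String, Set<String>> leftmostGraph() {
        Map<String, Set<String>> g = new HashMap<>();
        // declarator : pointer? direct_declarator | pointer ;
        g.put("declarator", Set.of("pointer", "direct_declarator"));
        // direct_declarator : (IDENTIFIER | declarator) declarator_suffix* ;
        g.put("direct_declarator", Set.of("declarator"));
        // pointer : TIMES ... ;  always starts with a token
        g.put("pointer", Set.of());
        // abstract_declarator : pointer direct_abstract_declarator? | direct_abstract_declarator ;
        g.put("abstract_declarator", Set.of("pointer", "direct_abstract_declarator"));
        // direct_abstract_declarator : (abstract_declarator | abstract_declarator_suffix) abstract_declarator_suffix* ;
        g.put("direct_abstract_declarator", Set.of("abstract_declarator", "abstract_declarator_suffix"));
        // abstract_declarator_suffix : constant_expression | parameter_type_list ;
        g.put("abstract_declarator_suffix", Set.of("constant_expression", "parameter_type_list"));
        return g;
    }

    // Depth-first search: can `target` be reached from `from` by following leftmost calls?
    static boolean reaches(Map<String, Set<String>> g, String from, String target, Set<String> visited) {
        for (String next : g.getOrDefault(from, Set.of())) {
            if (next.equals(target)) return true;
            if (visited.add(next) && reaches(g, next, target, visited)) return true;
        }
        return false;
    }

    // A rule is (directly or mutually) left-recursive if it can reach itself.
    static boolean isLeftRecursive(Map<String, Set<String>> g, String rule) {
        return reaches(g, rule, rule, new HashSet<>());
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = leftmostGraph();
        for (String rule : List.of("declarator", "direct_declarator", "abstract_declarator",
                                   "direct_abstract_declarator", "pointer")) {
            System.out.println(rule + " left-recursive: " + isLeftRecursive(g, rule));
        }
    }
}
```

Running it should flag the four declarator rules and not pointer, which matches the two sets named in the error message.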

Related

ANTLR4 precedence of operator

This is my grammar:
grammar FOOL;
@header {
import java.util.ArrayList;
}
@lexer::members {
public ArrayList<String> lexicalErrors = new ArrayList<>();
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
prog : exp SEMIC #singleExp
| let exp SEMIC #letInExp
| (classdec)+ SEMIC (let)? exp SEMIC #classExp
;
classdec : CLASS ID ( EXTENDS ID )? (LPAR (vardec ( COMMA vardec)*)? RPAR)? (CLPAR ((fun SEMIC)+)? CRPAR)?;
let : LET (dec SEMIC)+ IN ;
vardec : type ID ;
varasm : vardec ASM exp ;
fun : type ID LPAR ( vardec ( COMMA vardec)* )? RPAR (let)? exp ;
dec : varasm #varAssignment
| fun #funDeclaration
;
type : INT
| BOOL
| ID
;
exp : left=term (operator=(PLUS | MINUS) right=term)*
;
term : left=factor (operator=(TIMES | DIV) right=factor)*
;
factor : left=value (operator=(EQ | LESSEQ | GREATEREQ | GREATER | LESS | AND | OR ) right=value)*
;
value : MINUS?INTEGER #intVal
| (NOT)? ( TRUE | FALSE ) #boolVal
| LPAR exp RPAR #baseExp
| IF cond=exp THEN CLPAR thenBranch=exp CRPAR (ELSE CLPAR elseBranch=exp CRPAR)? #ifExp
| MINUS?ID #varExp
| THIS #thisExp
| funcall #funExp
| (ID | THIS) DOT funcall #methodExp
| NEW ID ( LPAR (exp (COMMA exp)* )? RPAR)? #newExp
| PRINT ( exp ) #print
;
/* PRINT LPAR exp RPAR */
funcall
: ID ( LPAR (exp (COMMA exp)* )? RPAR )
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
SEMIC : ';' ;
COLON : ':' ;
COMMA : ',' ;
EQ : '==' ;
ASM : '=' ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIV : '/' ;
TRUE : 'true' ;
FALSE : 'false' ;
LPAR : '(' ;
RPAR : ')' ;
CLPAR : '{' ;
CRPAR : '}' ;
IF : 'if' ;
THEN : 'then' ;
ELSE : 'else' ;
PRINT : 'print' ;
LET : 'let' ;
IN : 'in' ;
VAR : 'var' ;
FUN : 'fun' ;
INT : 'int' ;
BOOL : 'bool' ;
CLASS : 'class' ;
EXTENDS : 'extends' ;
THIS : 'this' ;
NEW : 'new' ;
DOT : '.' ;
LESSEQ : ('<=' | '=<') ;
GREATEREQ : ('>=' | '=>') ;
GREATER: '>' ;
LESS : '<' ;
AND : '&&' ;
OR : '||' ;
NOT : '!' ;
//Numbers
fragment DIGIT : '0'..'9';
INTEGER : DIGIT+;
//IDs
fragment CHAR : 'a'..'z' |'A'..'Z' ;
ID : CHAR (CHAR | DIGIT)* ;
//ESCAPED SEQUENCES
WS : (' '|'\t'|'\n'|'\r')-> skip;
LINECOMENTS : '//' (~('\n'|'\r'))* -> skip;
BLOCKCOMENTS : '/*'( ~('/'|'*')|'/'~'*'|'*'~'/'|BLOCKCOMENTS)* '*/' -> skip;
ERR_UNKNOWN_CHAR
: . { lexicalErrors.add("UNKNOWN_CHAR " + getText()); }
;
I think there is a problem in the grammar concerning operator precedence.
In particular, this one
let
int x = (5-2)+4;
in
print x;
prints 7, while this one:
let
int x = 5-2+4;
in
print x;
prints 9.
Why does the first one work? How can I make the second one work by changing only the grammar?
I think something needs to change in exp, term or factor.
This is the first parse tree http://it.tinypic.com/r/2nj8tqw/9 .
This is the second parse tree http://it.tinypic.com/r/2iv02z6/9 .
exp : left=term (operator=(PLUS | MINUS) right=exp)?
This produces the parse tree that is causing it. Simply put, 5 - 2 + 4 will be parsed as:
term(5) MINUS exp
              term(2) PLUS exp
                           term(4)
that is, as 5 - (2 + 4).
This should help, although you'll have to change the evaluation logic:
exp : left=term (operator=(PLUS | MINUS) right=term)*
Same for factor and any other possible binary operations.
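The following minimal sketch uses plain Java rather than an ANTLR visitor. It shows three ways the operands 5, 2, 4 and the operators -, + can be combined:

- `foldLeft` is the left-associative evaluation that `exp : term ((PLUS|MINUS) term)*` calls for.
- `foldRight` mirrors the right-recursive `exp : term ((PLUS|MINUS) exp)?`.
- `lastLabelOnly` models a hypothetical evaluator that reads only the `left`, `operator` and `right` labels. In ANTLR 4 these single-value labels keep only the last match when assigned inside `(...)*`.

That last case is one plausible explanation for the 9, not a confirmed diagnosis. All class and method names are illustrative.

```java
import java.util.List;

class PrecedenceDemo {
    // Left-associative: ((5 - 2) + 4), what term ((PLUS|MINUS) term)* should compute.
    static int foldLeft(List<Integer> terms, List<Character> ops) {
        int acc = terms.get(0);
        for (int i = 0; i < ops.size(); i++) {
            int rhs = terms.get(i + 1);
            acc = ops.get(i) == '+' ? acc + rhs : acc - rhs;
        }
        return acc;
    }

    // Right-recursive: 5 - (2 + 4), what term ((PLUS|MINUS) exp)? computes.
    static int foldRight(List<Integer> terms, List<Character> ops, int i) {
        if (i == ops.size()) return terms.get(i);
        int rest = foldRight(terms, ops, i + 1);
        return ops.get(i) == '+' ? terms.get(i) + rest : terms.get(i) - rest;
    }

    // Hypothetical evaluator using only left/operator/right labels that were
    // overwritten inside (...)*: it sees the first term, the last operator and the last term.
    static int lastLabelOnly(List<Integer> terms, List<Character> ops) {
        int left = terms.get(0);
        int right = terms.get(terms.size() - 1);
        char op = ops.get(ops.size() - 1);
        return op == '+' ? left + right : left - right;
    }

    public static void main(String[] args) {
        List<Integer> terms = List.of(5, 2, 4);
        List<Character> ops = List.of('-', '+');
        System.out.println(foldLeft(terms, ops));      // 7
        System.out.println(foldRight(terms, ops, 0));  // -1
        System.out.println(lastLabelOnly(terms, ops)); // 9
    }
}
```

In a real visitor, you can collect every operand instead of only the last one. One way is a list label such as `right+=term`; another is iterating over the generated `ctx.term()` list. Either way, the evaluation can then follow the `foldLeft` pattern.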

Antlr not recognizing number

I have 3 types of numbers defined, number, decimal and percentage.
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Percentage and decimal work fine but when I assign a number, unless I put a sign (+ or -) in front of the number, it doesn't recognize it as a number.
number foo = +5 // does recognize
number foo = 5; // does not recognize
It does recognize it in an evaluation expression.
if (foo == 5 ) // does recognize
Here is my language (I took out the functions and left only the language recognition).
grammar Fetal;
transaction : begin statements end;
begin : 'begin' ;
end : 'end' ;
statements : (statement)+
;
statement
: declaration ';'
| command ';'
| assignment ';'
| evaluation
| ';'
;
declaration : type var;
var returns : identifier;
type returns
: DecimalType
| NumberType
| StringType
| BooleanType
| DateType
| ObjectType
| DaoType
;
assignment
: lharg Equals rharg
| lharg unaryOP rharg
;
assignmentOp : Equals
;
unaryOP : PlusEquals
| MinusEquals
| MultiplyEquals
| DivideEquals
| ModuloEquals
| ExponentEquals
;
expressionOp : arithExpressOp
| bitwiseExpressOp
;
arithExpressOp : Multiply
| Divide
| Plus
| Minus
| Modulo
| Exponent
;
bitwiseExpressOp
: And
| Or
| Not
;
comparisonOp : IsEqualTo
| IsLessThan
| IsLessThanOrEqualTo
| IsGreaterThan
| IsGreaterThanOrEqualTo
| IsNotEqualTo
;
logicExpressOp : AndExpression
| OrExpression
| ExclusiveOrExpression
;
rharg returns
: rharg expressionOp rharg
| '(' rharg expressionOp rharg ')'
| var
| literal
| assignmentCommands
;
lharg returns : var;
identifier : Identifier;
evaluation : IfStatement '(' evalExpression ')' block (Else block)?;
block : OpenBracket statements CloseBracket;
evalExpression
: evalExpression logicExpressOp evalExpression
| '(' evalExpression logicExpressOp evalExpression ')'
| eval
| '(' eval ')'
;
eval : rharg comparisonOp rharg ;
assignmentCommands
: GetBalance '(' stringArg ')'
| GetVariableType '(' var ')'
| GetDescription
| Today
| GetDays '(' startPeriod=dateArg ',' endPeriod=dateArg ')'
| DayOfTheWeek '(' dateArg ')'
| GetCalendarDay '(' dateArg ')'
| GetMonth '(' dateArg ')'
| GetYear '(' dateArg ')'
| Import '(' stringArg ')' /* Import( path ) */
| Lookup '(' sql=stringArg ',' argumentList ')' /* Lookup( table, SQL) */
| List '(' sql=stringArg ',' argumentList ')' /* List( table, SQL) */
| invocation
;
command : Print '(' rharg ')'
| Credit '(' amtArg ',' stringArg ')'
| Debit '(' amtArg ',' stringArg ')'
| Ledger '(' debitOrCredit ',' amtArg ',' acc=stringArg ',' desc=stringArg ')'
| Alias '(' account=stringArg ',' name=stringArg ')'
| MapFile ':' stringArg
| invocation
| Update '(' sql=stringArg ',' argumentList ')'
;
invocation
: o=objectLiteral '.' m=identifier '('argumentList? ')'
| o=objectLiteral '.' m=identifier '()'
;
argumentList
: rharg (',' rharg )*
;
amtArg : rharg ;
stringArg : rharg ;
numberArg : rharg ;
dateArg : rharg ;
debitOrCredit : charLiteral ;
literal
: numericLiteral
| doubleLiteral
| booleanLiteral
| percentLiteral
| stringLiteral
| dateLiteral
;
fileName : '<' fn=Identifier ('.' ft=Identifier)? '>' ;
charLiteral : ('D' | 'C');
numericLiteral : Number ;
doubleLiteral : Decimal ;
percentLiteral : Percentage ;
booleanLiteral : Boolean ;
stringLiteral : String ;
dateLiteral : Date ;
objectLiteral : Identifier ;
daoLiteral : Identifier ;
//Below are Token definitions
// Data Types
DecimalType : 'decimal' ;
NumberType : 'number' ;
StringType : 'string' ;
BooleanType : 'boolean' ;
DateType : 'date' ;
ObjectType : 'object' ;
DaoType : 'dao' ;
/******************************************************************
* Assignmnt operator
******************************************************************/
Equals : '=' ;
/*****************************************************************
* Unary operators
*****************************************************************/
PlusEquals : '+=' ;
MinusEquals : '-=' ;
MultiplyEquals : '*=' ;
DivideEquals : '/=' ;
ModuloEquals : '%=' ;
ExponentEquals : '^=' ;
/*****************************************************************
* Binary operators
*****************************************************************/
Plus : '+' ;
Minus : '-' ;
Multiply : '*' ;
Divide : '/' ;
Modulo : '%' ;
Exponent : '^' ;
/***************************************************************
* Bitwise operators
***************************************************************/
And : '&' ;
Or : '|' ;
Not : '!' ;
/*************************************************************
* Compariso operators
*************************************************************/
IsEqualTo : '==' ;
IsLessThan : '<' ;
IsLessThanOrEqualTo : '<=' ;
IsGreaterThan : '>' ;
IsGreaterThanOrEqualTo : '>=' ;
IsNotEqualTo : '!=' ;
/*************************************************************
* Expression operators
*************************************************************/
AndExpression : '&&' ;
OrExpression : '||' ;
ExclusiveOrExpression : '^^' ;
// Reserve words (Assignment Commands)
GetBalance : 'getBalance';
GetVariableType : 'getVariableType' ;
GetDescription : 'getDescription' ;
Today : 'today';
GetDays : 'getDays' ;
DayOfTheWeek : 'dayOfTheWeek' ;
GetCalendarDay : 'getCalendarDay' ;
GetMonth : 'getMonth' ;
GetYear : 'getYear' ;
Import : 'import' ;
Lookup : 'lookup' ;
List : 'list' ;
// Reserve words (Commands)
Credit : 'credit';
Debit : 'debit';
Ledger : 'ledger';
Alias : 'alias' ;
MapFile : 'mapFile' ;
Update : 'update' ;
Print : 'print';
IfStatement : 'if';
Else : 'else';
OpenBracket : '{';
CloseBracket : '}';
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Boolean : 'true' | 'false';
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Date : Year '-' Month '-' Day;
Identifier
: IdentifierNondigit
( IdentifierNondigit
| Digit
)*
;
String: '"' ( ESC | ~[\\"] )* '"';
/************************************************************
* Fragment Definitions
************************************************************/
fragment
ESC : '\\' [abtnfrv"'\\]
;
fragment
IdentifierNondigit
: Nondigit
//| // other implementation-defined characters...
;
fragment
Nondigit
: [a-zA-Z_]
;
fragment
Digit
: [0-9]
;
fragment
Sign : Plus | Minus;
fragment
Digits
: [-+]?[0-9]+
;
fragment
Year
: Digit Digit Digit Digit;
fragment
Month
: Digit Digit;
fragment
Day
: Digit Digit;
fragment Dot : '.';
fragment
SCharSequence
: SChar+
;
fragment
SChar
: ~["\\\r\n]
| SimpleEscapeSequence
| '\\\n' // Added line
| '\\\r\n' // Added line
;
fragment
CChar
: ~['\\\r\n]
| SimpleEscapeSequence
;
fragment
SimpleEscapeSequence
: '\\' ['"?abfnrtv\\]
;
ExtendedAscii
: [\x80-\xfe]+
-> skip
;
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
BlockComment
: '/*' .*? '*/'
-> skip
;
LineComment
: '//' ~[\r\n]*
-> skip
;
I have a hunch that this use of a fragment is incorrect:
fragment Sign : Plus | Minus;
I couldn't find anything in the reference book, but I think it needs to be changed to something like this:
fragment Sign : [+-];
I found the issue. I was using version 4.5.2-1 because every attempt to upgrade to 4.7 caused more errors, and I didn't want to introduce new errors while trying to solve another one. I finally gave in, upgraded the libraries to 4.7 and fixed the resulting errors, and the number-recognition issue disappeared. It had been a bug in the library all along.
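As a rough cross-check that the token definitions were not the culprit, the sketch below emulates a simplified form of ANTLR's lexer disambiguation in plain Java. The longest match wins, and on a tie the rule defined first wins. It applies this to the Percentage, Number and Decimal rules, in the order they appear in the grammar, translated into Java regexes. The regex `[+-]` covers both spellings of Sign discussed above (`Plus | Minus` and `[+-]`). The class name is hypothetical, and the model ignores the grammar's other token rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberTokens {
    // The three number rules, in definition order (Sign -> [+-], Digit -> [0-9], Dot -> \.).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("Percentage", Pattern.compile("[+-]?[0-9]+(\\.[0-9]+)?%"));
        RULES.put("Number", Pattern.compile("[+-]?[0-9]+"));
        RULES.put("Decimal", Pattern.compile("[+-]?[0-9]+\\.[0-9]*"));
    }

    // Longest match wins; the strict '>' keeps the earlier rule on an equal-length tie.
    static String classify(String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"5", "+5", "5.25", "5%", "-2.5%"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Under this model a bare 5 is a Number, the same as +5. That is consistent with the conclusion that the runtime version, not the grammar, was at fault.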

antlrworks with multiple equals

I'm having a problem with my grammar.
It says: Decision can match input such as "{EQUAL, GREATER..GREATER_EQUAL, LOWER..LOWER_EQUAL, NOT_EQUAL}" using multiple alternatives: 1, 2, although all the rule trees are correct.
Can anyone help?
grammar test;
//parser : .* EOF;
program :T_PROGRAM ID T_LEFTPAR identifier_list T_RIGHTPAR T_SEMICOLON declarations subprogram_declarations compound_statement T_DOT;
identifier_list :(ID) (',' ID)*;
declarations :() (T_VAR identifier_list COLON type T_SEMICOLON)* ;
type : standard_type|
T_ARRAY T_LEFTBRACK NUM T_TO NUM T_RIGHTBRACK T_OF standard_type ;
standard_type : INT
| FLOAT ;
subprogram_declarations :() (subprogram_declaration T_SEMICOLON)* ;
subprogram_declaration : subprogram_head declarations compound_statement;
subprogram_head :T_FUNCTION ID arguments COLON standard_type |
T_PROCEDURE ID arguments ;
arguments :T_LEFTPAR parameter_list T_RIGHTPAR | ;
parameter_list :(identifier_list COLON type) (T_SEMICOLON identifier_list COLON type)*;
compound_statement : T_BEGIN optional_statements T_END;
optional_statements :statement_list | ;
statement_list :(statement) (T_SEMICOLON statement)*;
statement :
variable ASSIGN expression
| procedure_statement
| compound_statement
| T_IF expression T_THEN statement T_ELSE statement
| T_WHILE expression T_DO statement
;
procedure_statement :ID
| ID T_LEFTPAR expression_list T_RIGHTPAR;
expression_list : (expression) (',' expression)*;
variable : ID T_LEFTBRACK expression T_RIGHTBRACK | ;
expression :( () |simple_expression) (( LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL ) simple_expression)* ;
simple_expression :
( () | sign ) term (( PLUS | MINUS | T_OR ) term)*;
term :
(factor) (( CROSS | DIVIDE | MOD | T_AND ) factor)*;
factor :
variable
|ID T_LEFTPAR expression_list T_RIGHTPAR
| NUM
| T_LEFTPAR expression T_RIGHTPAR
| T_NOT factor;
sign :
'+'
| '-';
/********/
T_PROGRAM : 'program';
T_FUNCTION : 'function';
T_PROCEDURE : 'procedure';
T_READ : 'read';
T_WRITE : 'write';
T_OF : 'of';
T_ARRAY : 'array';
T_VAR : 'var';
T_FLOAT : 'float';
T_INT : 'int';
T_CHAR : 'char';
T_STRING : 'string';
T_BEGIN : 'begin';
T_END : 'end';
T_IF : 'if';
T_THEN : 'then';
T_ELSE : 'else';
T_WHILE : 'while';
T_DO : 'do';
T_NOT : 'not';
NUM : INT
| FLOAT;
STRING
: '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
;
CHAR: '\'' ( ESC_SEQ | ~('\''|'\\') ) '\''
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
LOWER : '<';
LOWER_EQUAL : '<=';
GREATER : '>';
GREATER_EQUAL : '>=';
EQUAL : '=';
NOT_EQUAL : '<>';
ASSIGN :
':=';
COLON :
':';
PLUS : '+';
MINUS : '-';
T_OR : 'OR';
CROSS : '*';
DIVIDE : '/';
MOD : 'MOD';
T_AND : 'AND';
T_LEFTPAR
: '(';
T_RIGHTPAR
: ')';
T_LEFTBRACK
: '[';
T_RIGHTBRACK
: ']';
T_TO
: '..';
T_DOT : '.';
T_SEMICOLON
: ';';
T_COMMA
: ',';
T_BADNUM
: (NUM)(CHAR)*;
T_BADSTRING
: '"' ( ESC_SEQ | ~('\\'|'"') )*WS;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
fragment
EXPONENT : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;
fragment
ESC_SEQ
: '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
| UNICODE_ESC
| OCTAL_ESC
;
fragment
OCTAL_ESC
: '\\' ('0'..'3') ('0'..'7') ('0'..'7')
| '\\' ('0'..'7') ('0'..'7')
| '\\' ('0'..'7')
;
fragment
UNICODE_ESC
: '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
;
fragment
INT : '0'..'9'+
;
fragment
FLOAT
: ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
| '.' ('0'..'9')+ EXPONENT?
| ('0'..'9')+ EXPONENT
;
Problem 1
The ambiguity (the errors/warnings saying that certain input can be matched in more than one way) originates mainly from the fact that both your variable and expression rules can match empty input:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
| /* nothing */
;
expression
: (
(/* nothing */)
| simple_expression
)
(
(LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression
)* /* because of the `*`, this can also match nothing */
;
(I added the /* ... */ comments and reformatted the rules to make them more readable)
Fix 1
You probably want to do it like this instead:
variable
: ID T_LEFTBRACK expression T_RIGHTBRACK
;
expression
: simple_expression ((LOWER | LOWER_EQUAL | GREATER | GREATER_EQUAL | EQUAL | NOT_EQUAL) simple_expression)*
;
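The effect of Fix 1 can be checked with the standard "nullable" fixed-point computation: a rule is nullable if one of its alternatives consists only of nullable rules, including the empty alternative. The following is a minimal Java sketch over a hand-simplified model of the rules above.

- The helper rules exprHead, relTail, signOpt, addTail and mulTail are hypothetical names for the `(...)?`, `()|...` and `(...)*` sub-blocks.
- The operator token groups are collapsed into RELOP, ADDOP and MULOP.
- Any symbol not modeled as a rule is treated as a token.

The `*` tails are expected to stay nullable. The problem is that, in the original rules, the empty `variable` propagates through factor, term and simple_expression to expression.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class NullableCheck {
    // Nonterminal -> alternatives; each alternative is a list of symbols.
    static Map<String, List<List<String>>> original() {
        Map<String, List<List<String>>> g = new HashMap<>();
        g.put("variable", List.of(List.of("ID", "T_LEFTBRACK", "expression", "T_RIGHTBRACK"), List.of()));
        g.put("expression", List.of(List.of("exprHead", "relTail")));
        g.put("exprHead", List.of(List.of(), List.of("simple_expression")));
        g.put("relTail", List.of(List.of(), List.of("RELOP", "simple_expression", "relTail")));
        g.put("simple_expression", List.of(List.of("signOpt", "term", "addTail")));
        g.put("signOpt", List.of(List.of(), List.of("sign")));
        g.put("sign", List.of(List.of("PLUS"), List.of("MINUS")));
        g.put("addTail", List.of(List.of(), List.of("ADDOP", "term", "addTail")));
        g.put("term", List.of(List.of("factor", "mulTail")));
        g.put("mulTail", List.of(List.of(), List.of("MULOP", "factor", "mulTail")));
        g.put("factor", List.of(
                List.of("variable"),
                List.of("ID", "T_LEFTPAR", "expression_list", "T_RIGHTPAR"),
                List.of("NUM"),
                List.of("T_LEFTPAR", "expression", "T_RIGHTPAR"),
                List.of("T_NOT", "factor")));
        return g;
    }

    // Fix 1: variable loses its empty alternative, and expression must start with simple_expression.
    static Map<String, List<List<String>>> fixed() {
        Map<String, List<List<String>>> g = original();
        g.put("variable", List.of(List.of("ID", "T_LEFTBRACK", "expression", "T_RIGHTBRACK")));
        g.put("exprHead", List.of(List.of("simple_expression")));
        return g;
    }

    // Iterate until no new nullable rule is found.
    static Set<String> nullable(Map<String, List<List<String>>> g) {
        Set<String> result = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : g.entrySet()) {
                if (result.contains(rule.getKey())) continue;
                for (List<String> alt : rule.getValue()) {
                    // Tokens never enter `result`, so this holds only for alternatives made
                    // entirely of nullable rules (the empty alternative trivially qualifies).
                    if (result.containsAll(alt)) {
                        result.add(rule.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("original: " + new TreeSet<>(nullable(original())));
        System.out.println("fixed:    " + new TreeSet<>(nullable(fixed())));
    }
}
```

With the original rules, variable, factor, term, simple_expression and expression all come out nullable. After Fix 1, only the helper tails remain nullable.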
Problem 2
Another problem (one that will show up once you have resolved the ambiguities) is that you defined the tokens MOD, T_AND and T_OR after your ID rule:
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
...
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
This means MOD, T_AND and T_OR tokens will never be created: ID matches the same characters, and when two lexer rules match input of the same length, the rule defined first wins.
Fix 2
Place MOD, T_AND and T_OR before your ID rule:
MOD : 'MOD';
T_AND : 'AND';
T_OR : 'OR';
...
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
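The ordering issue can be reproduced with a simplified model of lexer disambiguation in plain Java. A longer match wins, and an equal-length match goes to the rule defined first. The sketch builds the four rules in either order and reports which one claims a given input. The class and method names are hypothetical, and the regexes are translations of the ID, MOD, T_AND and T_OR rules above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenPriority {
    // Rules in definition order; idFirst = true mirrors the original grammar.
    static Map<String, Pattern> rules(boolean idFirst) {
        Map<String, Pattern> r = new LinkedHashMap<>();
        Pattern id = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_]*");
        if (idFirst) r.put("ID", id);
        r.put("MOD", Pattern.compile("MOD"));
        r.put("T_AND", Pattern.compile("AND"));
        r.put("T_OR", Pattern.compile("OR"));
        if (!idFirst) r.put("ID", id);
        return r;
    }

    // Longest match wins; the strict '>' keeps the earlier rule on an equal-length tie.
    static String pick(Map<String, Pattern> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pick(rules(true), "MOD"));   // ID: same length, ID is defined first
        System.out.println(pick(rules(false), "MOD"));  // MOD: keywords now come first
        System.out.println(pick(rules(false), "MODE")); // ID: the longer match still wins
    }
}
```

The last case shows that moving the keywords first does not break identifiers that merely begin with a keyword.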

Why isn't this expression being parsed correctly?

NOTE: This is a continuation of the topic posted HERE.
I'm working on a parser for the Jass scripting language (here's an excellent API reference for it) so that I can use it as an interpreter for another language. Using ANTLR4 + ANTLRWorks 2, I have run this complex script to test the lexer/parser's strength, and it passes nearly all tests. The part where it fails is an 'elseif' statement containing an expression with:
an outer parenthesis...
an array element...
a boolean/binary operation, AND...
a unary constant integer
...like so:
elseif(si__DroneSystem___data_V[this]!=-1)then (line #53 of the script).
None of the changes I've made to the grammar get ANTLR to recognize this input as a valid expression. The following grammar is what I've managed to write so far:
grammar Jass;
//----------------------------------------------------------------------
// Global Declarations
//----------------------------------------------------------------------
program : file+
;
file : declaration* function
;
declaration : globals | typedef | native_func
;
typedef : KEYWORD_TYPE identifier KEYWORD_EXTENDS (TYPE_HANDLE | identifier)
;
globals : KEYWORD_GLOBALS global_var_list KEYWORD_ENDGLOBALS
;
global_var_list : var_declaration*
;
native_func : KEYWORD_CONSTANT? KEYWORD_NATIVE func_declaration
;
func_declaration : identifier KEYWORD_TAKES (KEYWORD_NOTHING | parameter_list) KEYWORD_RETURNS (KEYWORD_NOTHING | type)
;
parameter_list : type identifier (',' type identifier)*
;
function : KEYWORD_CONSTANT? KEYWORD_FUNCTION func_declaration local_var_list statement_list KEYWORD_ENDFUNCTION
;
//----------------------------------------------------------------------
// Local Declarations
//----------------------------------------------------------------------
local_var_list : (KEYWORD_LOCAL? var_declaration)*
;
var_declaration : KEYWORD_CONSTANT type identifier '=' expression
| type identifier ('=' expression)? | type TYPE_ARRAY identifier
;
//----------------------------------------------------------------------
// Statements
//----------------------------------------------------------------------
statement_list : statement*
;
statement : set | call | if_statement | loop | exitwhen | return_statement | debug
;
set : KEYWORD_SET identifier '=' expression | KEYWORD_SET identifier OPENBRACKET expression CLOSEBRACKET '=' expression
;
call : KEYWORD_CALL identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
;
args : expression (COMMA expression)*
;
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
| KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
;
loop : KEYWORD_LOOP statement_list KEYWORD_ENDLOOP
;
// must appear in a loop
exitwhen : KEYWORD_EXITWHEN expression
;
return_statement : KEYWORD_RETURN expression?
;
debug : KEYWORD_DEBUG (set | call | if_statement | loop)
;
//----------------------------------------------------------------------
// Expressions
//----------------------------------------------------------------------
expression : parenthesis
| func_call
| array_ref
| (boolean_expression | binary_operation)
| unary_operation
| function_reference
| const_statement
| identifier
;
binary_operation : terminal (('+'|'-'|'*'|'/'|'>'|'<'|'=='|'!='|'>='|'<=') terminal)
;
unary_operation : ('+'|'-'|'not') terminal
;
boolean_expression : ('and'|'not')? terminal (('=='|'!=') terminal) ('and'|'or')?
;
terminal : factor*/(factor)
;
factor : identifier
| const_statement
| parenthesis
| brackets
;
parenthesis : OPENPARENTHESIS expression CLOSEPARENTHESIS
;
brackets : OPENBRACKET expression CLOSEBRACKET
;
// expression must be integer or real when used with unary '+'
func_call : identifier OPENPARENTHESIS args? CLOSEPARENTHESIS
;
array_ref : identifier OPENBRACKET expression CLOSEBRACKET
;
function_reference : KEYWORD_FUNCTION identifier
;
const_statement : INTEGER_CONST | REAL_CONST | BOOL_CONST | STRING_CONST | ASSIGNMENT_TYPE_NULL
;
FOURCC : QUOTATION_SINGLE . . . . QUOTATION_SINGLE
;
INTEGER_CONST : DECIMAL | OCTAL | HEXIDECIMAL | FOURCC
;
DECIMAL : (DIGIT)+ | (DIGIT+) '.' (DIGIT+)?
;
OCTAL : '0'..'7'+
;
HEXIDECIMAL : '$'(DIGIT|'a'..'f'|'A'..'F')+ | '0'('x'|'X')(DIGIT|'a'..'f'|'A'..'F')+
;
REAL_CONST : (DIGIT)+'.'(DIGIT)* | '.'(DIGIT)+
;
BOOL_CONST : ASSIGNMENT_TYPE_TRUE | ASSIGNMENT_TYPE_FALSE
;
// any double-quotes in the string must be escaped with \
STRING_CONST : QUOTATION_DOUBLE .*? QUOTATION_DOUBLE
;
//----------------------------------------------------------------------
// Base
//----------------------------------------------------------------------
type : nativetype | commontype
;
identifier : ID
;
//////////////////////////////////////////////////////////////////////////////////////////////
// TYPES
//////////////////////////////////////////////////////////////////////////////////////////////
TYPE_BOOLEAN : 'boolean'
;
TYPE_CODE : 'code'
;
TYPE_HANDLE : 'handle'
;
TYPE_INTEGER : 'integer'
;
TYPE_REAL : 'real'
;
TYPE_STRING : 'string'
;
TYPE_ARRAY : 'array'
;
nativetype : TYPE_BOOLEAN
| TYPE_CODE
| TYPE_HANDLE
| TYPE_INTEGER
| TYPE_REAL
| TYPE_STRING
| TYPE_ARRAY
;
TYPE_ABILITY : 'ability'
;
TYPE_AGENT : 'agent'
;
TYPE_AIDIFFICULTY : 'aidifficulty'
;
TYPE_ALLIANCETYPE : 'alliancetype'
;
TYPE_ATTACKTYPE : 'attacktype'
;
TYPE_BLENDMODE : 'blendmode'
;
TYPE_BOOLEXPR : 'boolexpr'
;
TYPE_BUFF : 'buff'
;
TYPE_BUTTON : 'button'
;
TYPE_CAMERAFIELD : 'camerafield'
;
TYPE_CAMERASETUP : 'camerasetup'
;
TYPE_CONDITIONFUNC : 'conditionfunc'
;
TYPE_DAMAGETYPE : 'damagetype'
;
TYPE_DEFEATCONDITION : 'defeatcondition'
;
TYPE_DESTRUCTABLE : 'destructable'
;
TYPE_DIALOG : 'dialog'
;
TYPE_DIALOGEVENT : 'dialogevent'
;
TYPE_EFFECT : 'effect'
;
TYPE_EVENTID : 'eventid'
;
TYPE_FGAMESTATE : 'fgamestate'
;
TYPE_FILTERFUNC : 'filterfunc'
;
TYPE_FOGMODIFIER : 'fogmodifier'
;
TYPE_FOGSTATE : 'fogstate'
;
TYPE_FORCE : 'force'
;
TYPE_GAMECACHE : 'gamecache'
;
TYPE_GAMEDIFFICULTY : 'gamedifficulty'
;
TYPE_GAMEEVENT : 'gameevent'
;
TYPE_GAMESPEED : 'gamespeed'
;
TYPE_GAMESTATE : 'gamestate'
;
TYPE_GAMETYPE : 'gametype'
;
TYPE_GROUP : 'group'
;
TYPE_HASHTABLE : 'hashtable'
;
TYPE_IGAMESTATE : 'igamestate'
;
TYPE_IMAGE : 'image'
;
TYPE_ITEM : 'item'
;
TYPE_ITEMPOOL : 'itempool'
;
TYPE_ITEMTYPE : 'itemtype'
;
TYPE_LEADERBOARD : 'leaderboard'
;
TYPE_LIGHTNING : 'lightning'
;
TYPE_LIMITOP : 'limitop'
;
TYPE_LOCATION : 'location'
;
TYPE_MAPCONTROL : 'mapcontrol'
;
TYPE_MAPDENSITY : 'mapdensity'
;
TYPE_MAPFLAG : 'mapflag'
;
TYPE_MAPSETTING : 'mapsettings'
;
TYPE_MAPVISIBILITY : 'mapvisibility'
;
TYPE_MULTIBOARD : 'multiboard'
;
TYPE_MULTIBOARDITEM : 'multiboarditem'
;
TYPE_PATHINGTYPE : 'pathingtype'
;
TYPE_PLACEMENT : 'placement'
;
TYPE_PLAYER : 'player'
;
TYPE_PLAYERCOLOR : 'playercolor'
;
TYPE_PLAYEREVENT : 'playerevent'
;
TYPE_PLAYERGAMERESULT : 'playergameresult'
;
TYPE_PLAYERSCORE : 'playerscore'
;
TYPE_PLAYERSLOTSTATE : 'playerslotstate'
;
TYPE_PLAYERSTATE : 'playerstate'
;
TYPE_PLAYERUNITEVENT : 'playerunitevent'
;
TYPE_QUEST : 'quest'
;
TYPE_QUESTITEM : 'questitem'
;
TYPE_RACE : 'race'
;
TYPE_RACEPREFERENCE : 'racepreference'
;
TYPE_RARITYCONTROL : 'raritycontrol'
;
TYPE_RECT : 'rect'
;
TYPE_REGION : 'region'
;
TYPE_SOUND : 'sound'
;
TYPE_SOUNDTYPE : 'soundtype'
;
TYPE_STARTLOCPRIO : 'startlocprio'
;
TYPE_TERRAINDEFORMATION : 'terraindeformation'
;
TYPE_TEXMAPFLAGS : 'texmapflags'
;
TYPE_TEXTTAG : 'texttag'
;
TYPE_TIMER : 'timer'
;
TYPE_TIMERDIALOG : 'timerdialog'
;
TYPE_TRACKABLE : 'trackable'
;
TYPE_TRIGGER : 'trigger'
;
TYPE_TRIGGERACTION : 'triggeraction'
;
TYPE_TRIGGERCONDITION : 'triggercondition'
;
TYPE_UBERSPLAT : 'ubersplat'
;
TYPE_UNIT : 'unit'
;
TYPE_UNITEVENT : 'unitevent'
;
TYPE_UNITPOOL : 'unitpool'
;
TYPE_UNITSTATE : 'unitstate'
;
TYPE_UNITTYPE : 'unittype'
;
TYPE_VERSION : 'version'
;
TYPE_VOLUMEGROUP : 'volumegroup'
;
TYPE_WEAPONTYPE : 'weapontype'
;
TYPE_WEATHEREFFECT : 'weathereffect'
;
TYPE_WIDGET : 'widget'
;
TYPE_WIDGETEVENT : 'widgetevent'
;
commontype : TYPE_ABILITY
| TYPE_AGENT
| TYPE_AIDIFFICULTY
| TYPE_ALLIANCETYPE
| TYPE_ATTACKTYPE
| TYPE_BLENDMODE
| TYPE_BOOLEXPR
| TYPE_BUFF
| TYPE_BUTTON
| TYPE_CAMERAFIELD
| TYPE_CAMERASETUP
| TYPE_CONDITIONFUNC
| TYPE_DAMAGETYPE
| TYPE_DEFEATCONDITION
| TYPE_DESTRUCTABLE
| TYPE_DIALOG
| TYPE_DIALOGEVENT
| TYPE_EFFECT
| TYPE_EVENTID
| TYPE_FGAMESTATE
| TYPE_FILTERFUNC
| TYPE_FOGMODIFIER
| TYPE_FOGSTATE
| TYPE_FORCE
| TYPE_GAMECACHE
| TYPE_GAMEDIFFICULTY
| TYPE_GAMEEVENT
| TYPE_GAMESPEED
| TYPE_GAMESTATE
| TYPE_GAMETYPE
| TYPE_GROUP
| TYPE_HASHTABLE
| TYPE_IGAMESTATE
| TYPE_IMAGE
| TYPE_ITEM
| TYPE_ITEMPOOL
| TYPE_ITEMTYPE
| TYPE_LEADERBOARD
| TYPE_LIGHTNING
| TYPE_LIMITOP
| TYPE_LOCATION
| TYPE_MAPCONTROL
| TYPE_MAPDENSITY
| TYPE_MAPFLAG
| TYPE_MAPSETTING
| TYPE_MAPVISIBILITY
| TYPE_MULTIBOARD
| TYPE_MULTIBOARDITEM
| TYPE_PATHINGTYPE
| TYPE_PLACEMENT
| TYPE_PLAYER
| TYPE_PLAYERCOLOR
| TYPE_PLAYEREVENT
| TYPE_PLAYERGAMERESULT
| TYPE_PLAYERSCORE
| TYPE_PLAYERSLOTSTATE
| TYPE_PLAYERSTATE
| TYPE_PLAYERUNITEVENT
| TYPE_QUEST
| TYPE_QUESTITEM
| TYPE_RACE
| TYPE_RACEPREFERENCE
| TYPE_RARITYCONTROL
| TYPE_RECT
| TYPE_REGION
| TYPE_SOUND
| TYPE_SOUNDTYPE
| TYPE_STARTLOCPRIO
| TYPE_TERRAINDEFORMATION
| TYPE_TEXMAPFLAGS
| TYPE_TEXTTAG
| TYPE_TIMER
| TYPE_TIMERDIALOG
| TYPE_TRACKABLE
| TYPE_TRIGGER
| TYPE_TRIGGERACTION
| TYPE_TRIGGERCONDITION
| TYPE_UBERSPLAT
| TYPE_UNIT
| TYPE_UNITEVENT
| TYPE_UNITPOOL
| TYPE_UNITSTATE
| TYPE_UNITTYPE
| TYPE_VERSION
| TYPE_VOLUMEGROUP
| TYPE_WEAPONTYPE
| TYPE_WEATHEREFFECT
| TYPE_WIDGET
| TYPE_WIDGETEVENT
;
//////////////////////////////////////////////////////////////////////////////////////////////
ASSIGNMENT_TYPE_NULL : 'null'
;
ASSIGNMENT_TYPE_INTEGER : DIGIT
;
ASSIGNMENT_TYPE_REAL : REAL_CONST
;
ASSIGNMENT_TYPE_TRUE : 'true'
;
ASSIGNMENT_TYPE_FALSE : 'false'
;
KEYWORD_DEBUG : 'debug'
;
KEYWORD_EXTENDS : 'extends'
;
KEYWORD_NATIVE : 'native'
;
KEYWORD_FUNCTION : 'function'
;
KEYWORD_ENDFUNCTION : 'endfunction'
;
KEYWORD_TAKES : 'takes'
;
KEYWORD_NOTHING : 'nothing'
;
KEYWORD_RETURNS : 'returns'
;
KEYWORD_CALL : 'call'
;
KEYWORD_RETURN : 'return'
;
KEYWORD_GLOBALS : 'globals'
;
KEYWORD_ENDGLOBALS : 'endglobals'
;
KEYWORD_LOCAL : 'local'
;
KEYWORD_CONSTANT : 'constant'
;
KEYWORD_SET : 'set'
;
KEYWORD_IF : 'if'
;
KEYWORD_THEN : 'then'
;
KEYWORD_ELSEIF : 'elseif'
;
KEYWORD_ELSE : 'else'
;
KEYWORD_ENDIF : 'endif'
;
KEYWORD_LOOP : 'loop'
;
KEYWORD_EXITWHEN : 'exitwhen'
;
KEYWORD_ENDLOOP : 'endloop'
;
KEYWORD_TYPE : 'type'
;
ID : (LETTER)((LETTER|DIGIT|'_'+)*)?
;
fragment
LETTER : '\u0024' // $
| '\u0041'..'\u005a' // A-Z
| '\u005f' // _
| '\u0061'..'\u007a' // a-z
| '\u00c0'..'\u00d6' // Latin Capital Letter A with grave - Latin Capital letter O with diaeresis
| '\u00d8'..'\u00f6' // Latin Capital letter O with stroke - Latin Small Letter O with diaeresis
| '\u00f8'..'\u00ff' // Latin Small Letter O with stroke - Latin Small Letter Y with diaeresis
| '\u0100'..'\u1fff' // Latin Capital Letter A with macron - Latin Small Letter O with stroke and acute
| '\u3040'..'\u318f' // Hiragana
| '\u3300'..'\u337f' // CJK compatibility
| '\u3400'..'\u3d2d' // CJK compatibility
| '\u4e00'..'\u9fff' // CJK compatibility
| '\uf900'..'\ufaff' // CJK compatibility
;
fragment
DIGIT : '0'..'9'/*'\u0030'..'\u0039' // 0-9
| '\u0660'..'\u0669' // Arabic-Indic Digit 0-9
| '\u06f0'..'\u06f9' // Extended Arabic-Indic Digit 0-9
| '\u0966'..'\u096f' // Devanagari 0-9
| '\u09e6'..'\u09ef' // Bengali 0-9
| '\u0a66'..'\u0a6f' // Gurmukhi 0-9
| '\u0ae6'..'\u0aef' // Gujarati 0-9
| '\u0b66'..'\u0b6f' // Oriya 0-9
| '\u0be7'..'\u0bef' // Tami 0-9
| '\u0c66'..'\u0c6f' // Telugu 0-9
| '\u0ce6'..'\u0cef' // Kannada 0-9
| '\u0d66'..'\u0d6f' // Malayala 0-9
| '\u0e50'..'\u0e59' // Thai 0-9
| '\u0ed0'..'\u0ed9' // Lao 0-9
| '\u1040'..'\u1049' // Myanmar 0-9?*/
;
OPENPARENTHESIS : '('
;
CLOSEPARENTHESIS : ')'
;
OPENBRACKET : '['
;
CLOSEBRACKET : ']'
;
QUOTATION_DOUBLE : '"'
;
QUOTATION_SINGLE : '\''
;
COMMA : ','
;
WS : (' ' | '\t' | '\n'+)+ {skip();}
;
LINE_COMMENT : '//' ~[\r\n]* -> channel(HIDDEN)
;
And from ANTLRWorks: the output log from TestRig (starting with the first error), and an image of the generated parse tree where the first error occurs.
Thanks!
Look at the BNF rules for an if statement:
ifthenelse
::= 'if' expr 'then' newline statement_list else_clause? 'endif'
else_clause
::= 'else' newline statement_list
| 'elseif' expr 'then' newline statement_list else_clause?
Your translation:
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSEIF ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression) KEYWORD_THEN statement_list
| KEYWORD_ELSE ((OPENPARENTHESIS statement_list CLOSEPARENTHESIS) | statement_list) else_clause?
;
is incorrect: you put the optional else_clause in the KEYWORD_ELSE alternative, but according to the BNF it belongs in the KEYWORD_ELSEIF alternative.
It should be:
if_statement : KEYWORD_IF expression KEYWORD_THEN statement_list else_clause? KEYWORD_ENDIF
;
else_clause : KEYWORD_ELSE statement_list
| KEYWORD_ELSEIF expression KEYWORD_THEN statement_list else_clause?
;
Also note that you don't need ((OPENPARENTHESIS expression CLOSEPARENTHESIS) | expression), since an expression already matches '(' expression ')'.
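To see how the corrected rules chain together, here is a hand-written recursive-descent sketch in Python. It is not what ANTLR generates, and to stay short it assumes that tokens are plain strings, that an expression is a single token, and that a statement is any single token other than elseif, else or endif. The point it shows: only the elseif branch recurses into else_clause, and endif is consumed exactly once, in if_statement.

```python
# Sketch of the corrected rules:
#   if_statement : 'if' expression 'then' statement_list else_clause? 'endif'
#   else_clause  : 'else' statement_list
#                | 'elseif' expression 'then' statement_list else_clause?
# Assumptions (not from the original grammar): tokens are plain strings,
# an expression is one token, a statement is any non-block-keyword token.

BLOCK_END = {"elseif", "else", "endif"}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def statement_list(self):
        stmts = []
        while self.peek() is not None and self.peek() not in BLOCK_END:
            stmts.append(self.next())
        return stmts

    def if_statement(self):
        self.expect("if")
        cond = self.next()
        self.expect("then")
        body = self.statement_list()
        orelse = self.else_clause() if self.peek() in ("else", "elseif") else None
        self.expect("endif")  # the only place 'endif' is matched
        return ("if", cond, body, orelse)

    def else_clause(self):
        if self.peek() == "else":
            self.next()
            # A plain 'else' is final: no else_clause may follow it.
            return ("else", self.statement_list())
        self.expect("elseif")
        cond = self.next()
        self.expect("then")
        body = self.statement_list()
        # Only the 'elseif' alternative recurses into another else_clause.
        orelse = self.else_clause() if self.peek() in ("else", "elseif") else None
        return ("elseif", cond, body, orelse)


def parse_if(tokens):
    p = Parser(tokens)
    tree = p.if_statement()
    if p.peek() is not None:
        raise SyntaxError(f"trailing input at {p.peek()!r}")
    return tree
```

With this structure, input such as `if x then else a elseif y then endif` is rejected at elseif, which is what the corrected grammar requires.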
But the observations above are not the cause of your problem(s). The real issue is that your grammar does not account for unary expressions. It does not match the -1 in the expression si__DroneSystem___data_V[this]!=-1.
Change your expression rule into this:
expression : OPENPARENTHESIS expression CLOSEPARENTHESIS
| OPENBRACKET expression CLOSEBRACKET
| func_call
| array_ref
| function_reference
| const_statement
| identifier
| '+' expression
| '-' expression
| 'not' expression
| expression ('*'|'/') expression
| expression ('+'|'-') expression
| expression ('>'|'<'|'=='|'!='|'>='|'<=') expression
| expression ('and'|'or') expression
;
Now input like this:
if this==null then
return
elseif(si__DroneSystem___data_V[this]!=-1)then
return
endif
will be parsed without errors.
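Why the unary alternatives fix the error can be illustrated with a small precedence-climbing parser in Python over the same operators. This is a sketch, not ANTLR's generated code: the token regex, the tuple tree shape and the numeric binding powers are assumptions, chosen so that earlier alternatives in the rule above bind tighter (unary, then * /, then + -, then comparisons, then and/or). func_call, function_reference and const_statement are left out.

```python
import re

# Assumed token shapes: identifiers, integers, two-char operators before one-char ones.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_$][A-Za-z0-9_$]*)|(\d+)|(==|!=|>=|<=|[-+*/<>()\[\]]))")

# Binding powers mirror the alternative order of the ANTLR rule (higher = tighter).
BINARY = {"and": 1, "or": 1,
          "==": 2, "!=": 2, ">": 2, "<": 2, ">=": 2, "<=": 2,
          "+": 3, "-": 3,
          "*": 4, "/": 4}
UNARY = {"+", "-", "not"}
UNARY_BP = 5  # prefix operators bind tighter than every binary operator


def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"bad character at {pos}: {src[pos]!r}")
        tokens.append(m.group(m.lastindex))  # exactly one group matched
        pos = m.end()
    return tokens


class ExprParser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expect(self, tok):
        if self.take() != tok:
            raise SyntaxError(f"expected {tok!r}")

    def primary(self):
        tok = self.take()
        if tok in UNARY:
            # '+' expression | '-' expression | 'not' expression:
            # without this branch the '-1' after '!=' cannot be matched.
            return (tok, self.expression(UNARY_BP))
        if tok == "(":
            inner = self.expression()
            self.expect(")")
            return inner
        if tok.isdigit():
            return int(tok)
        if tok in BINARY or tok in {")", "[", "]"}:
            raise SyntaxError(f"unexpected {tok!r}")
        if self.peek() == "[":  # array_ref: identifier '[' expression ']'
            self.take()
            index = self.expression()
            self.expect("]")
            return ("[]", tok, index)
        return tok  # plain identifier

    def expression(self, min_bp=0):
        left = self.primary()
        # Only operators binding tighter than min_bp extend this operand,
        # which makes every binary operator left-associative.
        while self.peek() in BINARY and BINARY[self.peek()] > min_bp:
            op = self.take()
            left = (op, left, self.expression(BINARY[op]))
        return left


def parse_expression(src):
    p = ExprParser(tokenize(src))
    tree = p.expression()
    if p.peek() is not None:
        raise SyntaxError(f"unexpected {p.peek()!r}")
    return tree
```

For example, `parse_expression("si__DroneSystem___data_V[this]!=-1")` yields a `!=` node whose right operand is the unary node `("-", 1)`.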

How to turn this into a parser

If I just add on to the following yacc file, will it turn into a parser?
/* C-Minus BNF Grammar */
%token ELSE
%token IF
%token INT
%token RETURN
%token VOID
%token WHILE
%token ID
%token NUM
%token LTE
%token GTE
%token EQUAL
%token NOTEQUAL
%%
program : declaration_list ;
declaration_list : declaration_list declaration | declaration ;
declaration : var_declaration | fun_declaration ;
var_declaration : type_specifier ID ';'
| type_specifier ID '[' NUM ']' ';' ;
type_specifier : INT | VOID ;
fun_declaration : type_specifier ID '(' params ')' compound_stmt ;
params : param_list | VOID ;
param_list : param_list ',' param
| param ;
param : type_specifier ID | type_specifier ID '[' ']' ;
compound_stmt : '{' local_declarations statement_list '}' ;
local_declarations : local_declarations var_declaration
| /* empty */ ;
statement_list : statement_list statement
| /* empty */ ;
statement : expression_stmt
| compound_stmt
| selection_stmt
| iteration_stmt
| return_stmt ;
expression_stmt : expression ';'
| ';' ;
selection_stmt : IF '(' expression ')' statement
| IF '(' expression ')' statement ELSE statement ;
iteration_stmt : WHILE '(' expression ')' statement ;
return_stmt : RETURN ';' | RETURN expression ';' ;
expression : var '=' expression | simple_expression ;
var : ID | ID '[' expression ']' ;
simple_expression : additive_expression relop additive_expression
| additive_expression ;
relop : LTE | '<' | '>' | GTE | EQUAL | NOTEQUAL ;
additive_expression : additive_expression addop term | term ;
addop : '+' | '-' ;
term : term mulop factor | factor ;
mulop : '*' | '/' ;
factor : '(' expression ')' | var | call | NUM ;
call : ID '(' args ')' ;
args : arg_list | /* empty */ ;
arg_list : arg_list ',' expression | expression ;
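Heh.
It's only a grammar of the programming language.
To make it a working parser you need to add code to it: semantic actions attached to the rules, as in http://dinosaur.compilertools.net/yacc/index.html
Look at chapter 2, "Actions".
You'd also need a lexical analyzer that supplies the tokens declared with %token; see chapter 3, "Lexical Analysis".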
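As a rough illustration of those two missing pieces, here is a Python sketch (not yacc, and not C): a lexer that returns the token names declared with %token, which is the job yylex() does for a yacc parser, and a small evaluator for the additive_expression/term/factor subset whose return values play the role of $$ in yacc actions. The ID pattern (letters only), NUM pattern (digits only) and /* */ comments are assumed C-Minus conventions, and the loops stand in for yacc's left-recursive rules.

```python
import re

# Token names taken from the %token declarations in the grammar above.
KEYWORDS = {"else": "ELSE", "if": "IF", "int": "INT",
            "return": "RETURN", "void": "VOID", "while": "WHILE"}
OPERATORS = {"<=": "LTE", ">=": "GTE", "==": "EQUAL", "!=": "NOTEQUAL"}

TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>/\*.*?\*/)
  | (?P<num>\d+)
  | (?P<id>[A-Za-z]+)
  | (?P<op><=|>=|==|!=)
  | (?P<char>[-+*/<>=;,()\[\]{}])
""", re.VERBOSE | re.DOTALL)


def lex(src):
    """Yield (token_type, value) pairs, as yylex() would for yacc."""
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
        kind, text = m.lastgroup, m.group()
        if kind in ("ws", "comment"):
            continue
        if kind == "num":
            yield ("NUM", int(text))
        elif kind == "id":
            yield (KEYWORDS.get(text, "ID"), text)
        elif kind == "op":
            yield (OPERATORS[text], text)
        else:
            yield (text, text)  # single-character literal such as ';' or '+'


def evaluate(src):
    """Evaluate NUM/parenthesized arithmetic; each return value acts like $$."""
    toks = list(lex(src)) + [("$end", None)]
    i = 0

    def peek():
        return toks[i][0]

    def take():
        nonlocal i
        tok = toks[i]
        i += 1
        return tok

    def factor():
        kind, value = take()
        if kind == "NUM":
            return value                      # factor : NUM { $$ = $1; }
        if kind == "(":
            v = additive()
            if take()[0] != ")":
                raise SyntaxError("expected ')'")
            return v                          # factor : '(' expression ')'
        raise SyntaxError(f"unexpected {kind}")

    def term():
        v = factor()
        while peek() in ("*", "/"):
            op, rhs = take()[0], factor()
            if op == "*":
                v = v * rhs
            else:  # C-style integer division, truncating toward zero
                q = abs(v) // abs(rhs)
                v = q if (v < 0) == (rhs < 0) else -q
        return v

    def additive():
        v = term()
        while peek() in ("+", "-"):
            op, rhs = take()[0], term()
            v = v + rhs if op == "+" else v - rhs
        return v

    result = additive()
    if peek() != "$end":
        raise SyntaxError(f"unexpected {peek()}")
    return result
```

In a real yacc file the same arithmetic would live in C actions such as `{ $$ = $1 + $3; }`, with yylex() written by hand or generated by lex.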
