Parse tree generation with Java CUP - parsing

I am using CUP with JFlex to validate expression syntax. I have the basic functionality working: I can tell if an expression is valid or not.
Next step is to implement simple arithmetic operations, such as "add 1". For example, if my expression is "1 + a", the result should be "2 + a". I need access to parse tree to do that, because simply identifying a numeric term won't do it: the result of adding 1 to "(1 + a) * b" should be "(1 + a) * b + 1", not "(2 + a) * b".
Does anyone have a CUP example that generates a parse tree? I think I will be able to take it from there.
As an added bonus, is there a way to get a list of all tokens in expression using JFlex? Seems like a typical use case, but I cannot figure out how to do it.
Edit: Found a promising clue on stack overflow:
Create abstract tree problem from parser
Discussion of CUP and AST:
http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture16.4up.pdf
Specifically, this paragraph:
The Symbol returned by the parser is associated with the grammar’s start
symbol and contains the AST for the whole source program
This does not help. How to traverse the tree given Symbol instance, if Symbol class does not have any navigation pointers to its children? In other words, it does not look or behave like a tree node:
package java_cup.runtime;
/**
* Defines the Symbol class, which is used to represent all terminals
* and nonterminals while parsing. The lexer should pass CUP Symbols
* and CUP returns a Symbol.
*
* #version last updated: 7/3/96
* #author Frank Flannery
*/
/* ****************************************************************
Class Symbol
what the parser expects to receive from the lexer.
the token is identified as follows:
sym: the symbol type
parse_state: the parse state.
value: is the lexical value of type Object
left : is the left position in the original input file
right: is the right position in the original input file
******************************************************************/
public class Symbol {
/*******************************
Constructor for l,r values
*******************************/
public Symbol(int id, int l, int r, Object o) {
this(id);
left = l;
right = r;
value = o;
}
/*******************************
Constructor for no l,r values
********************************/
public Symbol(int id, Object o) {
this(id, -1, -1, o);
}
/*****************************
Constructor for no value
***************************/
public Symbol(int id, int l, int r) {
this(id, l, r, null);
}
/***********************************
Constructor for no value or l,r
***********************************/
public Symbol(int sym_num) {
this(sym_num, -1);
left = -1;
right = -1;
value = null;
}
/***********************************
Constructor to give a start state
***********************************/
Symbol(int sym_num, int state)
{
sym = sym_num;
parse_state = state;
}
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/
/** The symbol number of the terminal or non terminal being represented */
public int sym;
/*. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .*/
/** The parse state to be recorded on the parse stack with this symbol.
* This field is for the convenience of the parser and shouldn't be
* modified except by the parser.
*/
public int parse_state;
/** This allows us to catch some errors caused by scanners recycling
* symbols. For the use of the parser only. [CSA, 23-Jul-1999] */
boolean used_by_parser = false;
/*******************************
The data passed to parser
*******************************/
public int left, right;
public Object value;
/*****************************
Printing this token out. (Override for pretty-print).
****************************/
public String toString() { return "#"+sym; }
}

Ok, I got it. But unfortunately I cannot publish all my code here as-is. I will try to outline solution anyway, and please ask questions if something is not clear.
JFlex uses its own Symbol class. Look here:
JFlex.jar/java_cup.runtime/Symbol.class
You will see a couple of constructors added:
public Symbol(int id, Symbol left, Symbol right, Object o){
this(id,left.left,right.right,o);
}
public Symbol(int id, Symbol left, Symbol right){
this(id,left.left,right.right);
}
The key here is Object o, which is the value of Symbol.
Define your own class to represent an AST tree node, and another one to represent lexer token. Granted, you can use the same class, but I found it more clear to use different classes to distinguish between the two. Both JFlex and CUP will generate java code, and it is easy to get your tokens and nodes mixed-up.
Then, in your parser.flex, in the lexical rules sections, you want to do something like this for each token:
{float_lit} { return symbol(sym.NUMBER, createToken(yytext(), yycolumn)); }
Do this for all your tokens. Your createToken could be something like this:
%{
private LexerToken createToken(String val, int start) {
LexerToken tk = new LexerToken(val, start);
addToken(tk);
return tk;
}
}%
Now let's move on to parser.cup. Declare all your terminals to be of type LexerToken, and all your non-terminals to be of type Node. You want to read CUP manual, but for quick refresher, a terminal would be anything recognized by the lexer (e.g. numbers, variables, operators), and non-terminal would be parts of your grammar (e.g. expression, factor, term...).
Finally, this all comes together in the grammar definition. Consider the following example:
factor ::= factor:f TIMES:times term:t
{: RESULT = new Node(times.val, f, t, times.start); :}
|
factor:f DIVIDE:div term:t
{: RESULT = new Node(div.val, f, t, div.start); :}
|
term:t
{: RESULT = t; :}
;
Syntax factor:f means you alias the factor's value to be f, and you can refer to it in the following section {: ... :}. Remember, our terminals have values of type LexerToken, and our non-terminals have values that are Nodes.
Your term in expression may have the following definition:
term ::= LPAREN expr:e RPAREN
{: RESULT = new Node(e.val, e.start); :}
|
NUMBER:n
{: RESULT = new Node(n.val, n.start); :}
;
When you successfully generate the parser code, you will see in your parser.java the part where the parent-child relationship between nodes is established:
case 16: // term ::= UFUN LPAREN expr RPAREN
{
Node RESULT =null;
int ufleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).left;
int ufright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)).right;
LexerToken uf = (LexerToken)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-3)).value;
int eleft = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).left;
int eright = ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-1)).right;
Node e = (Node)((java_cup.runtime.Symbol) CUP$parser$stack.elementAt(CUP$parser$top-1)).value;
RESULT = new Node(uf.val, e, null, uf.start);
CUP$parser$result = parser.getSymbolFactory().newSymbol("term",0, ((java_cup.runtime.Symbol)CUP$parser$stack.elementAt(CUP$parser$top-3)), ((java_cup.runtime.Symbol)CUP$parser$stack.peek()), RESULT);
}
return CUP$parser$result;
I am sorry that I cannot publish complete code example, but hopefully this will save someone a few hours of trial and error. Not having complete code is also good because it won't render all those CS homework assignments useless.
As a proof of life, here's a pretty-print of my sample AST.
Input expression:
T21 + 1A / log(max(F1004036, min(a1, a2))) * MIN(1B, 434) -LOG(xyz) - -3.5+10 -.1 + .3 * (1)
Resulting AST:
|--[+]
|--[-]
| |--[+]
| | |--[-]
| | | |--[-]
| | | | |--[+]
| | | | | |--[T21]
| | | | | |--[*]
| | | | | |--[/]
| | | | | | |--[1A]
| | | | | | |--[LOG]
| | | | | | |--[MAX]
| | | | | | |--[F1004036]
| | | | | | |--[MIN]
| | | | | | |--[A1]
| | | | | | |--[A2]
| | | | | |--[MIN]
| | | | | |--[1B]
| | | | | |--[434]
| | | | |--[LOG]
| | | | |--[XYZ]
| | | |--[-]
| | | |--[3.5]
| | |--[10]
| |--[.1]
|--[*]
|--[.3]
|--[1]

Related

How to use Listener method in antlr4 to get the contents of parsers?

As far as I am concerned, the Listener method of antlr4 seems can only directly get the informations of TerminalNodes --- specifically the Lexer Nodes.
However, now I am hoping to put out the information of Parser like this:
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
First of all, I want to figure out how to diirectly get the contents of primitiveType and put out the contents like byte or short without changing it to Lexer(TerminalNode). I've checked the code of aidlParser.java(aidl.g4 is my initial grammar file(
Second, I want to know that if there is a way to know what actually a parser matches. E.g I want to know which regulation(like primitiveType or referencedType ...) of type is used in matching a type in the grammar without having to visit each sub-node(actually the regulations in Lisenter method) of type and see which one contains something.
Here is the entire code of my .g4 file:
grammar aidl;
//parser
//file
file : packageDeclaration* importDeclaration* parcelableDeclaration? interfaceDeclaration? ;
//packageDeclaration
packageDeclaration :'package' packageName ';';
packageName : Identifier
|
packageName '.' Identifier;
// importDeclaration
importDeclaration
: 'import' importName ';'
;
importName : Identifier
|
importName '.' Identifier;
//parcelableDeclaration
parcelableDeclaration : 'parcelable' parcelableName ';' ;
parcelableName : Identifier ;
//interfaceDeclaration
interfaceDeclaration : interfaceTag? 'interface' interfaceName '{' methodsDeclaration+ '}' ;
interfaceTag : 'oneway' ;
interfaceName : Identifier ;
// methodsDeclaration
methodsDeclaration : methodTag? returnType methodName '(' parameters? ')' ';' ;
methodName : Identifier ;
methodTag: 'oneway';
returnType : type ;
// parameters
parameters
: parameter (',' parameter)*
;
parameter
: parameterTag? parameterType parameterName ;
parameterType : type ;
parameterName : Identifier;
parameterTag : 'in' | 'out' | 'inout' ;
// type
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
selfdefineType : Identifier;
arrayType : primitiveType dims
| referencedType dims
;
listType : 'List' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
mapType : 'Map' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
dims
: '[' ']' ( '[' ']')*
;
//Lexer
// Identifier
Identifier
: JavaLetter JavaLetterOrDigit*
;
fragment
JavaLetter
: [a-zA-Z$_] // these are the "java letters" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
JavaLetterOrDigit
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
WS : [ \t\r\n\u000C]+ -> skip
;
I would sincerely be grateful for your help in time!
Once your parse run is over you will get a parse tree. You can walk that tree down to the nodes you are interested in (usually you use a parse tree listener for that and only override the enter/exit* methods that are relevant for your problem). In your enterPrimitveType method you get an EnterPrimitiveTypeContext parameter. Use its getText method to get the text it matched.
For your second question you would do exactly the same, just use the enterType method instead. The EnterTypeContext parameter has members for each alternative in your rule. Check which one is not null to see which actually matched.

Flex and Yacc Grammar Issue

Edit #1: I think the problem is in my .l file. I don't think the rules are being treated as rules, and I'm not sure how to treat the terminals of the rules as strings.
My last project for a compilers class is to write a .l and a .y file for a simple SQL grammar. I have no experience with Flex or Yacc, so everything I have written I have pieced together. I only have a basic understanding of how these files work, so if you spot my problem can you also explain what that section of the file is supposed to do? I'm not even sure what the '%' symbols do.
Basically some rules just do not work when I try to parse something. Some rules hang and others reject when they should accept. I need to implement the following grammar:
start
::= expression
expression
::= one-relation-expression | two-relation-expression
one-relation-expression
::= renaming | restriction | projection
renaming
::= term RENAME attribute AS attribute
term
::= relation | ( expression )
restriction
::= term WHERE comparison
projection
::= term | term [ attribute-commalist ]
attribute-commalist
::= attribute | attribute , attribute-commalist
two-relation-expression
::= projection binary-operation expression
binary-operation
::= UNION | INTERSECT | MINUS | TIMES | JOIN | DIVIDEBY
comparison
::= attribute compare number
compare
::= < | > | <= | >= | = | <>
number
::= val | val number
val
::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
attribute
::= CNO | CITY | CNAME | SNO | PNO | TQTY |
SNAME | QUOTA | PNAME | COST | AVQTY |
S# | STATUS | P# | COLOR | WEIGHT | QTY
relation
::= S | P | SP | PRDCT | CUST | ORDERS
Here is my .l file:
%{
#include <stdio.h>
#include "p5.tab.h"
%}
binaryOperation UINION|INTERSECT|MINUS|TIMES|JOIN|DIVIDEBY
compare <|>|<=|>=|=|<>
attribute CNO|CITY|CNAME|SNO|PNO|TQTY|SNAME|QUOTA|PNAME|COST|AVQTY|S#|STATUS|P#|COLOR|WEIGHT|QTY
relation S|P|SP|PRDCT|CUST|ORDERS
%%
[ \t\n]+ ;
{binaryOperation} return binaryOperation;
{compare} return compare;
[0-9]+ return val;
{attribute} return attribute;
{relation} return relation;
"RENAME" return RENAME;
"AS" return AS;
"WHERE" return WHERE;
"(" return '(';
")" return ')';
"[" return '[';
"]" return ']';
"," return ',';
. {printf("REJECT\n");
exit(0);}
%%
Here is my .y file:
%{
#include <stdio.h>
#include <stdlib.h>
%}
%token RENAME attribute AS relation WHERE binaryOperation compare val
%%
start:
expression {printf("ACCEPT\n");}
;
expression:
oneRelationExpression
| twoRelationExpression
;
oneRelationExpression:
renaming
| restriction
| projection
;
renaming:
term RENAME attribute AS attribute
;
term:
relation
| '(' expression ')'
;
restriction:
term WHERE comparison
;
projection:
term
| term '[' attributeCommalist ']'
;
attributeCommalist:
attribute
| attribute ',' attributeCommalist
;
twoRelationExpression:
projection binaryOperation expression
;
comparison:
attribute compare number
;
number:
val
| val number
;
%%
yyerror() {
printf("REJECT\n");
exit(0);
}
main() {
yyparse();
}
yywrap() {}
Here is my makefile:
p5: p5.tab.c lex.yy.c
cc -o p5 p5.tab.c lex.yy.c
p5.tab.c: p5.y
bison -d p5.y
lex.yy.c: p5.l
flex p5.l
This works:
S RENAME CNO AS CITY
These do not:
S
S WHERE CNO = 5
I have not tested everything, but I think there is a common problem for these issues.
Your grammar is correct, the problem is that you are running interactively. When you call yyparse() it will attempt to read all input. Because the input
S
could be followed by either RENAME or WHERE it won't accept. Similarly,
S WHERE CNO = 5
could be followed by one or more numbers, so yyparse won't accept until it gets an EOF or an unexpected token.
What you want to do is follow the advice here and change p5.l to have these lines:
[ \t]+ ;
\n if (yyin==stdin) return 0;
That way when you are running interactively it will take the ENTER key to be the end of input.
Also, you want to use left recursion for number:
number:
val
| number val
;

Matching of tokens with Antlr4

I am a an Antlr4 newbie and have problems with a relatively simple grammar. The grammar is given at the bottom at the end. (This is a fragment from a grammar for parsing description of biological sequence variants).
I am trying to parse the string "p.A3L" in the following unit test.
#Test
public void testProteinSubtitutionWithoutRef() {
ANTLRInputStream inputStream = new ANTLRInputStream("p.A3L");
HGVSLexer l = new HGVSLexer(inputStream);
HGVSParser p = new HGVSParser(new CommonTokenStream(l));
p.setTrace(true);
p.addErrorListener(new BaseErrorListener() {
#Override
public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line,
int charPositionInLine, String msg, RecognitionException e) {
throw new IllegalStateException("failed to parse at line " + line + " due to " + msg, e);
}
});
p.hgvs();
}
The test fails with the message "line 1:2 mismatched input 'A3L' expecting AA". I assume that this is related to lexing, i.e. splitting "A3L" into the three tokens A, 3, and L, such that the parser can then generate the corresponding syntax subtree containing the three terminals from it.
What is going wrong here and where can I learn how to fix this?
The grammar
grammar HGVS;
hgvs: protein_var
;
// Basix lexemes
AA: AA1
| AA3
| 'X';
AA1: 'A'
| 'R'
| 'N'
| 'D'
| 'C'
| 'Q'
| 'E'
| 'G'
| 'H'
| 'I'
| 'L'
| 'K'
| 'M'
| 'F'
| 'P'
| 'S'
| 'T'
| 'W'
| 'Y'
| 'V';
AA3: 'Ala'
| 'Arg'
| 'Asn'
| 'Asp'
| 'Cys'
| 'Gln'
| 'Glu'
| 'Gly'
| 'His'
| 'Ile'
| 'Leu'
| 'Lys'
| 'Met'
| 'Phe'
| 'Pro'
| 'Ser'
| 'Thr'
| 'Trp'
| 'Tyr'
| 'Val';
NUMBER: [0-9]+;
NAME: [a-zA-Z0-9_]+;
// Top-level Rule
/** Variant in a protein. */
protein_var: 'p.' AA NUMBER AA
;
There are two problems:
Define the rule for protein_var ahead of the lexer rules (should work now to, but is not easy to read because the other parser rule is ahead).
Remove the rule for NAME. A3L is not (as you probably expected) AA NUMBER AA but NAME <= ANTLR always prefers the longest matching lexer rule
The resulting grammar should look like:
grammar HGVS;
hgvs
: protein_var
;
protein_var
: 'p.' AA NUMBER AA
;
AA: ...;
AA3: ...;
AA1: ...;
NUMBER: [0-9]+;
If you need NAME for other purposes, you will have to disambiguate it in the lexer (by a prefix that NAMEs and AA do not have in common or by using lexer modes).

ANTLR parse assignments

I want to parse some assignments, where I only care about the assignment as a whole. Not about whats inside the assignment. An assignment is indiciated by ':='. (EDIT: Before and after the assignments other things may come)
Some examples:
a := TRUE & FALSE;
c := a ? 3 : 5;
b := case
a : 1;
!a : 0;
esac;
Currently I make a difference between assignments containing a 'case' and other assignments. For simple assignments I tried something like ~('case' | 'esac' | ';') but then antlr complained about unmatched tokens (like '=').
assignment :
NAME ':='! expression ;
expression :
( simple_expression | case_expression) ;
simple_expression :
((OPERATOR | NAME) & ~('case' | 'esac'))+ ';'! ;
case_expression :
'case' .+ 'esac' ';'! ;
I tried replacing with the following, because the eclipse-interpreter did not seem to like the ((OPERATOR | NAME) & ~('case' | 'esac'))+ ';'! ; because of the 'and'.
(~(OPERATOR | ~NAME | ('case' | 'esac')) |
~(~OPERATOR | NAME | ('case' | 'esac')) |
~(~OPERATOR | ~NAME | ('case' | 'esac'))) ';'!
But this does not work. I get
"error(139): /AntlrTutorial/src/foo/NusmvInput.g:78:5: set complement is empty |---> ~(~OPERATOR | ~NAME | ('case' | 'esac'))) EOC! ;"
How can I parse it?
There are a couple of things going wrong here:
you're using & in your grammar while it should be with quotes around it: '&'
unless you know exactly what you're doing, don't use ~ and . (especially not .+ !) inside parser rules: use them in lexer rules only;
create lexer rules instead of defining 'case' and 'esac' in your parser rules (it's safe to use literal tokens in your parser rules if no other lexer rule can potentially match is, but 'case' and 'esac' look a lot like NAME and they could end up in your AST in which case it's better to explicitly define them yourself in the lexer)
Here's a quick demo:
grammar T;
options {
output=AST;
}
tokens {
ROOT;
CASES;
CASE;
}
parse
: (assignment SCOL)* EOF -> ^(ROOT assignment*)
;
assignment
: NAME ASSIGN^ expression
;
expression
: ternary_expression
;
ternary_expression
: or_expression (QMARK^ ternary_expression COL! ternary_expression)?
;
or_expression
: unary_expression ((AND | OR)^ unary_expression)*
;
unary_expression
: NOT^ atom
| atom
;
atom
: TRUE
| FALSE
| NUMBER
| NAME
| CASE single_case+ ESAC -> ^(CASES single_case+)
| '(' expression ')' -> expression
;
single_case
: expression COL expression SCOL -> ^(CASE expression expression)
;
TRUE : 'TRUE';
FALSE : 'FALSE';
CASE : 'case';
ESAC : 'esac';
ASSIGN : ':=';
AND : '&';
OR : '|';
NOT : '!';
QMARK : '?';
COL : ':';
SCOL : ';';
NAME : ('a'..'z' | 'A'..'Z')+;
NUMBER : ('0'..'9')+;
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
which will parse your input:
a := TRUE & FALSE;
c := a ? 3 : 5;
b := case
a : 1;
!a : 0;
esac;
as follows:

ANTLR assignment expression disambiguation

The following grammar works, but also gives a warning:
test.g
grammar test;
options {
language = Java;
output = AST;
ASTLabelType = CommonTree;
}
program
: expr ';'!
;
term: ID | INT
;
assign
: term ('='^ expr)?
;
add : assign (('+' | '-')^ assign)*
;
expr: add
;
// T O K E N S
ID : (LETTER | '_') (LETTER | DIGIT | '_')* ;
INT : DIGIT+ ;
WS :
( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
DOT : '.' ;
fragment
LETTER : ('a'..'z'|'A'..'Z') ;
fragment
DIGIT : '0'..'9' ;
Warning
[15:08:20] warning(200): C:\Users\Charles\Desktop\test.g:21:34:
Decision can match input such as "'+'..'-'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
Again, it does produce a tree the way I want:
Input: 0 + a = 1 + b = 2 + 3;
ANTLR produces | ... but I think it
this tree: | gives the warning
| because it _could_
+ | also be parsed this
/ \ | way:
0 = |
/ \ | +
a + | / \
/ \ | + 3
1 = | / \
/ \ | + =
b + | / \ / \
/ \ | 0 = b 2
2 3 | / \
| a 1
How can I explicitly tell ANTLR that I want it to create the AST on the left, thus making my intent clear and silencing the warning?
Charles wrote:
How can I explicitly tell ANTLR that I want it to create the AST on the left, thus making my intent clear and silencing the warning?
You shouldn't create two separate rules for assign and add. As your rules are now, assign has precedence over add, which you don't want: they should have equal precedence by looking at your desired AST. So, you need to wrap all operators +, - and = in one rule:
program
: expr ';'!
;
expr
: term (('+' | '-' | '=')^ expr)*
;
But now the grammar is still ambiguous. You'll need to "help" the parser to look beyond this ambiguity to assure there really is operator expr ahead when parsing (('+' | '-' | '=') expr)*. This can be done using a syntactic predicate, which looks like this:
(look_ahead_rule(s)_in_here)=> rule(s)_to_actually_parse
(the ( ... )=> is the predicate syntax)
A little demo:
grammar test;
options {
output=AST;
ASTLabelType=CommonTree;
}
program
: expr ';'!
;
expr
: term ((op expr)=> op^ expr)*
;
op
: '+'
| '-'
| '='
;
term
: ID
| INT
;
ID : (LETTER | '_') (LETTER | DIGIT | '_')* ;
INT : DIGIT+ ;
WS : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
fragment LETTER : ('a'..'z'|'A'..'Z');
fragment DIGIT : '0'..'9';
which can be tested with the class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "0 + a = 1 + b = 2 + 3;";
testLexer lexer = new testLexer(new ANTLRStringStream(source));
testParser parser = new testParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.program().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
And the output of the Main class corresponds to the following AST:
which is created without any warnings from ANTLR.

Resources