ANTLR position-specific tokens

We want an ANTLR grammar where a "0" in the first position of a line is treated differently from a "0" in any other position.
Sample Input
0 Test
World 0
Is there any way to define a position- or context-specific token?
Currently we are writing custom code to change the token type.
Grammar
// Define a grammar called Hello
grammar Hello;
r : (r2 | r1)+ ;
r2 : 'World' Num ;
r1 : keyZero 'Test';
keyZero : {_input.LT(1).getCharPositionInLine() == 0}? Num ;
Num : [0-9]+ ;
Ws : [ ] -> skip;
Eol : [\r\n]->skip;

It could be handled in the lexer:
grammar Hello;
r : (r2 | r1)+ EOF;
r2 : 'World' NUM ;
r1 : KEY_ZERO 'Test';
KEY_ZERO : {getCharPositionInLine() == 0}? [0-9]+;
NUM : [0-9]+;
SPACE : [ \r\n] -> skip;
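For completeness, a small Java driver (a sketch; it assumes ANTLR has generated HelloLexer from the combined grammar above) shows the effect: the leading 0 is tokenised as KEY_ZERO while the 0 after World becomes NUM.
import org.antlr.v4.runtime.*;

public class HelloTokens {
    public static void main(String[] args) {
        // Tokenise the sample input from the question; skipped SPACE tokens
        // do not show up in the stream.
        Lexer lexer = new HelloLexer(CharStreams.fromString("0 Test\nWorld 0\n"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        // Expected: KEY_ZERO `0`, 'Test', 'World', NUM `0`, EOF
        for (Token t : tokens.getTokens()) {
            System.out.printf("%-10s `%s`%n",
                    HelloLexer.VOCABULARY.getDisplayName(t.getType()), t.getText());
        }
    }
}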

Related

ANTLR Lexer matching the wrong rule

I'm working on a lexer and parser for an old object oriented chat system (MOO in case any readers are familiar with its language). Within this language, any of the below examples are valid floating point numbers:
2.3
3.
.2
3e+5
The language also implements an indexing syntax for extracting one or more characters from a string or list (which is a set of comma separated expressions enclosed in curly braces). The problem arises from the fact that the language supports a range operator inside the index brackets. For example:
a = foo[1..3];
I understand that ANTLR wants to match the longest possible match first. Unfortunately this results in the lexer seeing '1..3' as two floating points numbers (1. and .3), rather than two integers with a range operator ('..') between them. Is there any way to solve this short of using lexer modes? Given that the values inside of an indexing expression can be any valid expression, I would have to duplicate a lot of token rules (essentially all but the floating point numbers as I understand it). Now granted I'm new to ANTLR so I'm sure I'm missing something and any help is much appreciated. I will supply my lexer grammar below:
lexer grammar MooLexer;
channels { COMMENTS_CHANNEL }
SINGLE_LINE_COMMENT
: '//' INPUT_CHARACTER* -> channel(COMMENTS_CHANNEL);
DELIMITED_COMMENT
: '/*' .*? '*/' -> channel(COMMENTS_CHANNEL);
WS
: [ \t\r\n] -> channel(HIDDEN)
;
IF
: I F
;
ELSE
: E L S E
;
ELSEIF
: E L S E I F
;
ENDIF
: E N D I F
;
FOR
: F O R;
ENDFOR
: E N D F O R;
WHILE
: W H I L E
;
ENDWHILE
: E N D W H I L E
;
FORK
: F O R K
;
ENDFORK
: E N D F O R K
;
RETURN
: R E T U R N
;
BREAK
: B R E A K
;
CONTINUE
: C O N T I N U E
;
TRY
: T R Y
;
EXCEPT
: E X C E P T
;
ENDTRY
: E N D T R Y
;
IN
: I N
;
SPLICER
: '#';
UNDERSCORE
: '_';
DOLLAR
: '$';
SEMI
: ';';
COLON
: ':';
DOT
: '.';
COMMA
: ',';
BANG
: '!';
OPEN_QUOTE
: '`';
SINGLE_QUOTE
: '\'';
LEFT_BRACKET
: '[';
RIGHT_BRACKET
: ']';
LEFT_CURLY_BRACE
: '{';
RIGHT_CURLY_BRACE
: '}';
LEFT_PARENTHESIS
: '(';
RIGHT_PARENTHESIS
: ')';
PLUS
: '+';
MINUS
: '-';
STAR
: '*';
DIV
: '/';
PERCENT
: '%';
PIPE
: '|';
CARET
: '^';
ASSIGNMENT
: '=';
QMARK
: '?';
OP_AND
: '&&';
OP_OR
: '||';
OP_EQUALS
: '==';
OP_NOT_EQUAL
: '!=';
OP_LESS_THAN
: '<';
OP_GREATER_THAN
: '>';
OP_LESS_THAN_OR_EQUAL_TO
: '<=';
OP_GREATER_THAN_OR_EQUAL_TO
: '>=';
RANGE
: '..';
ERROR
: 'E_NONE'
| 'E_TYPE'
| 'E_DIV'
| 'E_PERM'
| 'E_PROPNF'
| 'E_VERBNF'
| 'E_VARNF'
| 'E_INVIND'
| 'E_RECMOVE'
| 'E_MAXREC'
| 'E_RANGE'
| 'E_ARGS'
| 'E_NACC'
| 'E_INVARG'
| 'E_QUOTA'
| 'E_FLOAT'
;
OBJECT
: '#' DIGIT+
| '#-' DIGIT+
;
STRING
: '"' ( ESC | [ !] | [#-[] | [\]-~] | [\t] )* '"';
INTEGER
: DIGIT+;
FLOAT
: DIGIT+ [.] (DIGIT*)? (EXPONENTNOTATION EXPONENTSIGN DIGIT+)?
| [.] DIGIT+ (EXPONENTNOTATION EXPONENTSIGN DIGIT+)?
| DIGIT+ EXPONENTNOTATION EXPONENTSIGN DIGIT+
;
IDENTIFIER
: (LETTER | DIGIT | UNDERSCORE)+
;
LETTER
: LOWERCASE
| UPPERCASE
;
/*
* fragments
*/
fragment LOWERCASE
: [a-z] ;
fragment UPPERCASE
: [A-Z] ;
fragment EXPONENTNOTATION
: ('E' | 'e');
fragment EXPONENTSIGN
: ('-' | '+');
fragment DIGIT
: [0-9] ;
fragment ESC
: '\\"' | '\\\\' ;
fragment INPUT_CHARACTER
: ~[\r\n\u0085\u2028\u2029];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
No, AFAIK, lexer modes won't solve this. You'll need a predicate with a bit of target-specific code. If Java is your target, that might look like this:
lexer grammar RangeTestLexer;
FLOAT
: [0-9]+ '.' [0-9]+
| [0-9]+ '.' {_input.LA(1) != '.'}?
| '.' [0-9]+
;
INTEGER
: [0-9]+
;
RANGE
: '..'
;
SPACES
: [ \t\r\n] -> skip
;
If you run the following Java code:
Lexer lexer = new RangeTestLexer(CharStreams.fromString("1 .2 3. 4.5 6..7 8 .. 9"));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-20s `%s`\n", RangeTestLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
you'll get the following output:
INTEGER `1`
FLOAT `.2`
FLOAT `3.`
FLOAT `4.5`
INTEGER `6`
RANGE `..`
INTEGER `7`
INTEGER `8`
RANGE `..`
INTEGER `9`
EOF `<EOF>`
The { ... }? is the predicate, and the embedded code must evaluate to a boolean. In my example, the Java expression _input.LA(1) != '.' returns true if the character one step ahead of the current position in the character stream is not a '.' char.

ANTLR4: How to set different context within rule for the same tag?

I have the following grammar:
grammar SearchQuery;
queryDeclaration : predicateGroupItem predicateGroupItemWithBooleanOperator* ;
predicateGroupItemWithBooleanOperator : groupOperator predicateGroupItem ;
predicateGroupItem : LEFT_BRACKET variable variableDelimiter
expression expressionWithBoolean* RIGHT_BRACKET ;
variable : VARIABLE_STRING ;
variableDelimiter : VAR_DELIMITER ;
expressionWithBoolean : boolOperator expression ;
expression : value ;
value : polygonType;
boolOperator : or
;
or : OR ;
groupOperator : AND ;
polygonType : POLYGON LEFT_BRACKET pointList (POLYGON_DELIMITER pointList)* RIGHT_BRACKET ;
longType : LONG ;
doubleType : DOUBLE ;
pointList : point
| LEFT_BRACKET point ( POLYGON_DELIMITER point)* RIGHT_BRACKET
;
point : latitude longitude ;
latitude : longType
| doubleType
;
longitude : longType
| doubleType
;
POLYGON : [pP] [oO] [lL] [yY] [gG] [oO] [nN] ;
LONG : DIGIT+ ;
DOUBLE : DIGIT+ '.' DIGIT*
| '.' DIGIT+
;
AND : [aA] [nN] [dD] ;
OR : COMMA
| [oO] [rR]
;
VARIABLE_STRING : [a-zA-Z0-9.]+ ;
COMMA : ',' ;
POLYGON_DELIMITER : ';' ;
VAR_DELIMITER : ':' ;
RIGHT_BRACKET : ')' ;
LEFT_BRACKET : '(' ;
WS : [ \t\r\n]+ -> skip ;
fragment DIGIT : [0-9] ;
The problem is that I can't use the COMMA token in different roles at the same time: in the polygonType and pointList rules (where I would like to use COMMA instead of POLYGON_DELIMITER) and in the boolOperator rule (where COMMA is already used).
In other words, if we change POLYGON_DELIMITER to COMMA and test the grammar with a value like this
(polygons: polygon(20 30.4, 23.4 23),
polygon(20 30.4, 23.4 23),
polygon(20 30.4, 23.4 23))
we get this error:
mismatch input: ',' expecting {',', ')'}
I would be happy if somebody could help me understand the problem.
P.S. If we do not change the current grammar, the test value is
(poligons: polygon(20 30.4; 23.4 23),
polygon(20 30.4; 23.4 23),
polygon(20 30.4; 23.4 23))
Because of these rules:
OR : COMMA
| [oO] [rR]
;
COMMA : ',' ;
the lexer will never produce a COMMA token since it is already matched by the OR token. And because OR is defined before COMMA, it gets precedence.
That is what the error message mismatch input: ',' expecting {',', ')'} really means. In other words: mismatch input: OR expecting {COMMA, RIGHT_BRACKET}
What you should do (if the OR operator can be either "or" or ",") is let the parser rule or match the COMMA:
or : OR
| COMMA
;
OR : [oO] [rR]
;
COMMA : ',' ;
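To see the difference concretely, here is a small token dump in the same style as the earlier RangeTestLexer example (a sketch; it assumes the grammar is regenerated with the corrected OR and COMMA rules, so ANTLR produces SearchQueryLexer). The ',' now comes out as a COMMA token, and the parser rule or is free to accept either OR or COMMA.
import org.antlr.v4.runtime.*;

public class SearchQueryTokens {
    public static void main(String[] args) {
        // Tokenise a single polygon value; ',' should now appear as COMMA.
        Lexer lexer = new SearchQueryLexer(
                CharStreams.fromString("polygon(20 30.4, 23.4 23)"));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();
        for (Token t : tokens.getTokens()) {
            System.out.printf("%-18s `%s`%n",
                    SearchQueryLexer.VOCABULARY.getDisplayName(t.getType()), t.getText());
        }
    }
}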

ANTLR parse key value list

I'm new to ANTLR and I am trying to parse something like
ref:something title:(something else) blah ref other
and to obtain a list like
KEY = ref VALUE = something
KEY = title VALUE = something else
KEY = null VALUE = blah
KEY = null VALUE = ref // same ref string as item 1 key
KEY = null VALUE = other
The grammar I have is
searchCriteriaList
locals[List<object> s = new List<object>()]
: t+=criteriaBean (WS t+=criteriaBean)* { $s.addAll($t); }
;
criteriaBean : (KEY ':' WS* expression)
| expression ;
expression : '(' WORD (WS WORD)* ')'
| WORD ;
/*
* Lexer Rules
*/
fragment A : ('A'|'a') ;
fragment B : ('B'|'b') ;
fragment C : ('C'|'c') ;
fragment D : ('D'|'d') ;
fragment E : ('E'|'e') ;
fragment F : ('F'|'f') ;
fragment G : ('G'|'g') ;
fragment H : ('H'|'h') ;
fragment I : ('I'|'i') ;
fragment J : ('J'|'j') ;
fragment K : ('K'|'k') ;
fragment L : ('L'|'l') ;
fragment M : ('M'|'m') ;
fragment N : ('N'|'n') ;
fragment O : ('O'|'o') ;
fragment P : ('P'|'p') ;
fragment Q : ('Q'|'q') ;
fragment R : ('R'|'r') ;
fragment S : ('S'|'s') ;
fragment T : ('T'|'t') ;
fragment U : ('U'|'u') ;
fragment V : ('V'|'v') ;
fragment W : ('W'|'w') ;
fragment X : ('X'|'x') ;
fragment Y : ('Y'|'y') ;
fragment Z : ('Z'|'z') ;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
TITLE : T I T L E ;
MESSAGE : M E S S A G E ;
REF : R E F ;
KEY : TITLE | MESSAGE | REF ;
WORD : (LOWERCASE | UPPERCASE | '_')+ ;
WS : [ \t\u000C\r\n] ;
When I try parsing the string I get two exceptions, and in the addAll method I end up with 3 elements rather than 5.
Can someone point me in the right direction? What am I doing wrong?
Thanks,
S
PS: The exception I am getting is:
Exception of type 'Antlr4.Runtime.InputMismatchException' was thrown.
InputStream: {ref:something title:(something else) blah ref other }
OffendingToken: {[#0,0:2='ref',<5>,1:0]}
The lexer tries to match as many characters as possible when constructing tokens. When 2 or more lexer rules match the same characters, the rule defined first "wins". With this in mind, the KEY token will never be created since TITLE, MESSAGE and REF are defined above it:
TITLE : T I T L E ;
MESSAGE : M E S S A G E ;
REF : R E F ;
KEY : TITLE | MESSAGE | REF ;
WORD : (LOWERCASE | UPPERCASE | '_')+ ;
So the input ref will always become a REF token, never a KEY or a WORD. What you need to do is create a parser rule from KEY instead.
Also, since you want a WORD to also match your keywords, you should not do this:
expression
: '(' WORD (WS WORD)* ')'
| WORD
;
but something like this instead:
expression
: '(' word (WS word)* ')'
| word
;
word
: key
| WORD
;
key
: TITLE
| MESSAGE
| REF
;
Oh, and this:
fragment Z : ('Z'|'z') ;
can be rewritten as:
fragment Z : [Zz] ;
And is there a particular reason you're littering your parser rules with WS tokens? You could just remove them during tokenisation:
WS : [ \t\u000C\r\n] -> skip;

In ANTLR 4.7, how to parse an ISO 8601 interval like "P3M2D" ahead of an "ID" rule

I am trying to parse ISO 8601 period expressions like "P3M2D" using ANTLR 4, but I am hitting some kind of roadblock and would appreciate help. I am rather new to both ANTLR and compilers.
My grammar is below; I have combined the lexer and parser rules into one file here:
grammar test_iso ;
// import testLexerRules ;
iso : ( date_expr NEWLINE)* EOF;
date_expr
: date_expr op=( '+' | '-' ) iso8601_interval #dateexpr_Interval
| date_expr op='-' date_expr #dateexpr_Diff
| DATETIME_NAME #dateexpr_Named
| '(' inner=date_expr ')' #dateexpr_Paren
;
///////////////////////////////////////////
iso8601_interval
: iso8601_interval_d
{ System.out.println("ISO8601_INTERVAL DATE seen " + $text);}
;
iso8601_interval_d
: 'P' ( y=NUMBER_INT 'Y' )? ( m=NUMBER_INT 'M' )? ( w=NUMBER_INT 'W' )? ( d=NUMBER_INT 'D' )?
;
///////////////////////////////////////////
// in separate file : test_lexer.g4
// lexer grammar testLexerRules ;
///////////////////////////////////////////
fragment
TODAY
: 'today' | 'TODAY'
;
fragment
NOW
: 'now' | 'NOW'
;
DATETIME_NAME
: TODAY
| NOW
;
///////////////////////////////////////////
NUMBER_INT
: '-'? INT // -3, 45
;
fragment
DIGIT : [0-9] ;
fragment
INT : '0' | [1-9] DIGIT* ;
//////////////////////////////////////////////
//
// identifiers
//
ID
: ALPHA ALPH_NUM*
{ System.out.println("ID seen " + getText()); }
;
ID_SQLFUNC
: 'h$' ALPHA_UPPER ALPHA_UPPER_NUM*
{ System.out.println("SQL FUNC seen " + getText()); }
;
fragment
ALPHA : [a-zA-Z] ;
fragment
ALPH_NUM : [a-zA-Z_0-9] ;
fragment
ALPHA_UPPER : [A-Z] ;
fragment
ALPHA_UPPER_NUM : [A-Z_0-9] ;
//////////////////////////////////////////////
NEWLINE : '\r\n' ;
WS : [ \t]+ -> skip ;
In a test run, it never hits the iso8601_interval_d rule; the input always matches the ID rule instead.
C:\lab>java org.antlr.v4.gui.TestRig test_iso iso -tokens -tree
now + P3M2D
^Z
ID seen P3M2D
[#0,0:2='now',<DATETIME_NAME>,1:0]
[#1,4:4='+',<'+'>,1:4]
[#2,6:10='P3M2D',<ID>,1:6]
[#3,11:12='\r\n',<'
'>,1:11]
[#4,13:12='<EOF>',<EOF>,2:0]
line 1:6 mismatched input 'P3M2D' expecting 'P'
ISO8601_INTERVAL DATE seen P3M2D
(iso (date_expr (date_expr now) + (iso8601_interval (iso8601_interval_d P3M2D))) \r\n <EOF>)
If I remove the "ID" rule and run again, it parses as desired:
now + P3M2D
^Z
[#0,0:2='now',<DATETIME_NAME>,1:0]
[#1,4:4='+',<'+'>,1:4]
[#2,6:6='P',<'P'>,1:6]
[#3,7:7='3',<NUMBER_INT>,1:7]
[#4,8:8='M',<'M'>,1:8]
[#5,9:9='2',<NUMBER_INT>,1:9]
[#6,10:10='D',<'D'>,1:10]
[#7,11:12='\r\n',<'
'>,1:11]
[#8,13:12='<EOF>',<EOF>,2:0]
ISO8601_INTERVAL DATE seen P3M2D
(iso (date_expr (date_expr now) + (iso8601_interval (iso8601_interval_d P 3 M 2 D))) \r\n <EOF>)
I also tried prefixing a special character like "#" in the parser rule
iso8601_interval_d
: '#P' ( y=NUMBER_INT 'Y' )? ( m=NUMBER_INT 'M' )? ( w=NUMBER_INT 'W' )? ( d=NUMBER_INT 'D' )?
;
but now I get a different kind of failure:
now + #P3M2D
^Z
ID seen M2D
[#0,0:2='now',<DATETIME_NAME>,1:0]
[#1,4:4='+',<'+'>,1:4]
[#2,6:7='#P',<'#P'>,1:6]
[#3,8:8='3',<NUMBER_INT>,1:8]
[#4,9:11='M2D',<ID>,1:9]
[#5,12:13='\r\n',<'
'>,1:12]
[#6,14:13='<EOF>',<EOF>,2:0]
line 1:9 no viable alternative at input '3M2D'
ISO8601_INTERVAL DATE seen #P3M2D
(iso (date_expr (date_expr now) + (iso8601_interval (iso8601_interval_d #P 3 M2D))) \r\n <EOF>)
I am sure I am not the first one to hit something like this. What is the ANTLR idiom here?
EDIT -- I need the ID token in other parts of my grammar, which I have omitted here to highlight the problem I am facing.
As others have found, the issue is in the ID token: the ISO 8601 duration syntax is itself a valid ID. Besides the solution figured out by Mike, if an island grammar is suitable for your needs you can use ANTLR's lexical modes to exclude the ID lexer rule while parsing an ISO date.
Below is an example of how that could work.
parser grammar iso;
options { tokenVocab=iso_lexer; }
iso : ISO_BEGIN ( date_expr NEWLINE)* ISO_END;
date_expr
: date_expr op=( '+' | '-' ) iso8601_interval #dateexpr_Interval
| date_expr op='-' date_expr #dateexpr_Diff
| DATETIME_NAME #dateexpr_Named
| '(' inner=date_expr ')' #dateexpr_Paren
;
///////////////////////////////////////////
iso8601_interval
: iso8601_interval_d
{ System.out.println("ISO8601_INTERVAL DATE seen " + $text);}
;
iso8601_interval_d
: 'P' ( y=NUMBER_INT 'Y' )? ( m=NUMBER_INT 'M' )? ( w=NUMBER_INT 'W' )? ( d=NUMBER_INT 'D' )?
;
then in the lexer
lexer grammar iso_lexer;
//
// identifiers (in DEFAULT_MODE)
//
ISO_BEGIN
: '<#' -> mode(ISO)
;
ID
: ALPHA ALPH_NUM*
{ System.out.println("ID seen " + getText()); }
;
ID_SQLFUNC
: 'h$' ALPHA_UPPER ALPHA_UPPER_NUM*
{ System.out.println("SQL FUNC seen " + getText()); }
;
WS0 : [ \t]+ -> skip ;
// all the following token are scanned only when iso mode is active
mode ISO;
ISO_END
: '#>' -> mode(DEFAULT_MODE)
;
ISO_WS : [ \t]+ -> skip ; // renamed from WS0: lexer rule names must be unique across modes
NEWLINE : '\r'? '\n' ;
ADD : '+' ;
SUB : '-' ;
LPAREN : '(' ;
RPAREN : ')' ;
P : 'P' ;
Y : 'Y' ;
M : 'M' ;
W : 'W' ;
D : 'D' ;
DATETIME_NAME
: TODAY
| NOW
;
fragment TODAY: 'today' | 'TODAY' ;
fragment NOW : 'now' | 'NOW' ;
///////////////////////////////////////////
NUMBER_INT
: '-'? INT // -3, 45
;
fragment DIGIT : [0-9] ;
fragment INT : '0' | [1-9] DIGIT* ;
//////////////////////////////////////////////
fragment ALPHA : [a-zA-Z] ;
fragment ALPH_NUM : [a-zA-Z_0-9] ;
fragment ALPHA_UPPER : [A-Z] ;
fragment ALPHA_UPPER_NUM : [A-Z_0-9] ;
Such a grammar can parse expressions like
Pluton Planet <# now + P10Y
#>
I changed the parser rule iso a bit to demonstrate mixing IDs and periods.
Hope this helps.
What you want to do is not possible as written: ID matches the same input as iso8601_interval, and in such cases ANTLR4 picks the longest match, which is ID since it can match an unlimited number of characters.
The only way you could possibly make it work in the grammar is to exclude P as a possible ID introducer, so that P can be used exclusively for the duration.
Another option is a post-processing step: parse durations like any other identifier and, in your semantic phase, check all IDs that look like a duration. This is probably the best solution.
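A minimal sketch of that post-processing idea (the class name, helper and regex below are illustrative, not from the original answer): after parsing, test whether an ID's text has the shape of an ISO 8601 period and treat it as one if it does.
import java.util.regex.Pattern;

public class DurationCheck {
    // Rough shape of the periods used in the question: 'P' followed by
    // optional year/month/week/day components.
    private static final Pattern ISO_DURATION =
            Pattern.compile("P(\\d+Y)?(\\d+M)?(\\d+W)?(\\d+D)?");

    static boolean looksLikeDuration(String idText) {
        // Require at least one component so a bare "P" does not count.
        return !idText.equals("P") && ISO_DURATION.matcher(idText).matches();
    }

    public static void main(String[] args) {
        System.out.println(looksLikeDuration("P3M2D"));  // true
        System.out.println(looksLikeDuration("Planet")); // false
    }
}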

Alternatives in ANTLR rewrite rule

I'm writing a grammar that supports arbitrary boolean expressions. The grammar is used to represent a program, which is later passed through the static analysis tool. The static analysis tool has certain limitations so I want to apply the following rewrite rules:
Strict inequalities are approximated with epsilon:
expression_a > expression_b -> expression_a >= expression_b + EPSILON
Inequality is approximated using "or" statement:
expression_a != expression_b -> expression_a > expression_b || expression_a < expression_b
Is there any easy way to do it using ANTLR? Currently my grammar looks like so:
comparison : expression ('=='^|'<='^|'>='^|'!='^|'>'^|'<'^) expression;
I'm not sure how to apply a different rewrite rule depending on what the operator is. I want the tree to stay as it is if the operator is ("==", "<=" or ">=") and to recursively transform it otherwise, according to the rules defined above.
[...] and to recursively transform it otherwise, [...]
You can do it partly.
You can't tell ANTLR to rewrite a > b to ^('>=' a ^('+' b epsilon)) and then define a != b to become ^('||' ^('>' a b) ^('<' a b)) and then have ANTLR automatically rewrite both ^('>' a b) and ^('<' a b) to ^('>=' a ^('+' b epsilon)) and ^('<=' a ^('-' b epsilon)) respectively.
A bit of manual work is needed here. The trick is that you can't just use a token like >= if this token isn't actually parsed. A solution to this is to use imaginary tokens.
A quick demo:
grammar T;
options {
output=AST;
}
tokens {
AND;
OR;
GTEQ;
LTEQ;
SUB;
ADD;
EPSILON;
}
parse
: expr
;
expr
: logical_expr
;
logical_expr
: comp_expr ((And | Or)^ comp_expr)*
;
comp_expr
: (e1=mult_expr -> $e1) ( Eq e2=mult_expr -> ^(AND ^(GTEQ $e1 $e2) ^(LTEQ $e1 $e2))
| LtEq e2=mult_expr -> ^(LTEQ $e1 $e2)
| GtEq e2=mult_expr -> ^(GTEQ $e1 $e2)
| NEq e2=mult_expr -> ^(OR ^(GTEQ $e1 ^(ADD $e2 EPSILON)) ^(LTEQ $e1 ^(SUB $e2 EPSILON)))
| Gt e2=mult_expr -> ^(GTEQ $e1 ^(ADD $e2 EPSILON))
| Lt e2=mult_expr -> ^(LTEQ $e1 ^(SUB $e2 EPSILON))
)?
;
add_expr
: mult_expr ((Add | Sub)^ mult_expr)*
;
mult_expr
: atom ((Mult | Div)^ atom)*
;
atom
: Num
| Id
| '(' expr ')'
;
Eq : '==';
LtEq : '<=';
GtEq : '>=';
NEq : '!=';
Gt : '>';
Lt : '<';
Or : '||';
And : '&&';
Mult : '*';
Div : '/';
Add : '+';
Sub : '-';
Num : '0'..'9'+ ('.' '0'..'9'+)?;
Id : ('a'..'z' | 'A'..'Z')+;
Space : ' ' {skip();};
The parser generated from the grammar above rewrites inputs such as
a == b
a != b
a > b
a < b
into the corresponding ASTs: a == b becomes ^(AND ^(GTEQ a b) ^(LTEQ a b)), a != b becomes ^(OR ^(GTEQ a ^(ADD b EPSILON)) ^(LTEQ a ^(SUB b EPSILON))), a > b becomes ^(GTEQ a ^(ADD b EPSILON)) and a < b becomes ^(LTEQ a ^(SUB b EPSILON)).
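A minimal ANTLR 3 test driver (a sketch; it assumes the grammar file is T.g, so ANTLR generates TLexer and TParser) can be used to print those rewritten trees:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class RewriteDemo {
    public static void main(String[] args) throws Exception {
        // Parse each sample comparison and print its rewritten AST,
        // e.g. "a > b" prints (GTEQ a (ADD b EPSILON)).
        for (String src : new String[] {"a == b", "a != b", "a > b", "a < b"}) {
            TLexer lexer = new TLexer(new ANTLRStringStream(src));
            TParser parser = new TParser(new CommonTokenStream(lexer));
            CommonTree tree = (CommonTree) parser.parse().getTree();
            System.out.println(src + "  ->  " + tree.toStringTree());
        }
    }
}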
