ANTLR4 - Parser's testcases are not working - parsing

I am trying to write some parser rules for an assignment. The assignment requires a "variable declaration" rule that supports several types. It works for all the primitive types. For example:
int i;
or:
boolean bc;
But it does not work with the Array type. For example:
int a[5];
Here is the code I wrote:
vardecl: pritype id (COMMA id)* SEMI ;
pritype: INTTYPE | BOOLEANTYPE | FLOATTYPE | STRINGTYPE ;
id: ID | ID LSB INTLIT RSB ;
INTTYPE: 'int' ;
BOOLEANTYPE: 'boolean' ;
FLOATTYPE: 'float' ;
STRINGTYPE: 'string' ;
ID: [_a-zA-Z] [_a-zA-Z0-9]* ;
INTLIT: [0-9]+ -> type(INTTYPE) ;
LSB: '[' ;
RSB: ']' ;
COMMA: ',' ;
SEMI: ';' ;
Can you show me what I did wrong, so that the array type works? Thanks in advance!

Your solution was very close, but the -> type(INTTYPE) command on INTLIT was causing problems: it retags every integer literal as an INTTYPE token, so the parser never sees the INTLIT it expects between the brackets. I removed that command and added handling for whitespace. I also added an array rule that will let you handle this construct more easily in your visitor or listener:
grammar Vardecl;
vardecl: pritype id (COMMA id)* SEMI ;
pritype: INTTYPE | BOOLEANTYPE | FLOATTYPE | STRINGTYPE ;
id: ID | ID array ;
array : LSB INTLIT RSB;
INTTYPE: 'int' ;
BOOLEANTYPE: 'boolean' ;
FLOATTYPE: 'float' ;
STRINGTYPE: 'string' ;
ID: [_a-zA-Z] [_a-zA-Z0-9]* ;
INTLIT: [0-9]+ ;
LSB: '[' ;
RSB: ']' ;
COMMA: ',' ;
SEMI: ';' ;
WS: [ \t\r\n] -> skip;
With this input:
int i[5], a[10];
You get this lexer tokenization:
[#0,0:2='int',<'int'>,1:0]
[#1,4:4='i',<ID>,1:4]
[#2,5:5='[',<'['>,1:5]
[#3,6:6='5',<INTLIT>,1:6]
[#4,7:7=']',<']'>,1:7]
[#5,8:8=',',<','>,1:8]
[#6,10:10='a',<ID>,1:10]
[#7,11:11='[',<'['>,1:11]
[#8,12:13='10',<INTLIT>,1:12]
[#9,14:14=']',<']'>,1:14]
[#10,15:15=';',<';'>,1:15]
[#11,16:15='<EOF>',<EOF>,1:16]
And this parse tree:
[image: parse tree of int i[5], a[10];]
So I think you're good to go now.
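To see why the original INTLIT rule broke array declarations, here is a minimal stand-alone sketch. It does not use the ANTLR runtime: `TypeCommandSketch`, `tokenTypes` and the `retagIntLit` flag are hypothetical names, and the hand-written tokenizer only imitates the relevant lexer rules (it knows the `int` keyword but not the other type keywords). With the flag on, the integer literal is reported as INTTYPE, which is what `-> type(INTTYPE)` does, so a parser rule expecting `LSB INTLIT RSB` cannot match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written stand-in for the generated lexer; NOT the ANTLR runtime.
final class TypeCommandSketch {
    // One token per match: keyword 'int', identifier, integer literal, or punctuation.
    private static final Pattern TOKEN = Pattern.compile(
        "\\s*(?:(?<kw>int\\b)|(?<id>[_a-zA-Z][_a-zA-Z0-9]*)|(?<num>[0-9]+)|(?<sym>[\\[\\],;]))");

    private static final Map<String, String> SYMBOLS =
        Map.of("[", "LSB", "]", "RSB", ",", "COMMA", ";", "SEMI");

    // Returns the token type names; retagIntLit imitates "-> type(INTTYPE)" on INTLIT.
    static List<String> tokenTypes(String source, boolean retagIntLit) {
        String s = source.trim();
        List<String> types = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        int pos = 0;
        while (pos < s.length()) {
            m.region(pos, s.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            if (m.group("kw") != null) {
                types.add("INTTYPE");
            } else if (m.group("id") != null) {
                types.add("ID");
            } else if (m.group("num") != null) {
                types.add(retagIntLit ? "INTTYPE" : "INTLIT");
            } else {
                types.add(SYMBOLS.get(m.group("sym")));
            }
            pos = m.end();
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("int a[5];", true));  // literal retagged as INTTYPE
        System.out.println(tokenTypes("int a[5];", false)); // literal stays INTLIT
    }
}
```

With the retagging switched off, the token between the brackets is INTLIT again, which is the only change the array rule needs.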

Related

How to use Listener method in antlr4 to get the contents of parsers?

As far as I can tell, an ANTLR4 listener seems to give direct access only to the information in TerminalNodes, that is, the lexer tokens.
However, I would now like to output the information matched by parser rules like these:
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
First of all, I want to figure out how to directly get the contents of primitiveType and output them (like byte or short) without turning it into a lexer rule (TerminalNode). I've checked the code of aidlParser.java (aidl.g4 is my initial grammar file).
Second, I want to know whether there is a way to find out what a parser rule actually matched. For example, I want to know which rule (primitiveType, referencedType, ...) was used to match a type, without having to visit each sub-node of type (that is, each rule's listener method) and check which one contains something.
Here is the entire code of my .g4 file:
grammar aidl;
//parser
//file
file : packageDeclaration* importDeclaration* parcelableDeclaration? interfaceDeclaration? ;
//packageDeclaration
packageDeclaration :'package' packageName ';';
packageName : Identifier
|
packageName '.' Identifier;
// importDeclaration
importDeclaration
: 'import' importName ';'
;
importName : Identifier
|
importName '.' Identifier;
//parcelableDeclaration
parcelableDeclaration : 'parcelable' parcelableName ';' ;
parcelableName : Identifier ;
//interfaceDeclaration
interfaceDeclaration : interfaceTag? 'interface' interfaceName '{' methodsDeclaration+ '}' ;
interfaceTag : 'oneway' ;
interfaceName : Identifier ;
// methodsDeclaration
methodsDeclaration : methodTag? returnType methodName '(' parameters? ')' ';' ;
methodName : Identifier ;
methodTag: 'oneway';
returnType : type ;
// parameters
parameters
: parameter (',' parameter)*
;
parameter
: parameterTag? parameterType parameterName ;
parameterType : type ;
parameterName : Identifier;
parameterTag : 'in' | 'out' | 'inout' ;
// type
type :
primitiveType
| referencedType
| arrayType
| listType
| mapType
| 'void'
;
primitiveType :
'byte'
| 'short'
| 'int'
| 'long'
| 'char'
| 'float'
| 'double'
| 'boolean'
;
referencedType :
'String'
| 'CharSequence'
| selfdefineType
;
selfdefineType : Identifier;
arrayType : primitiveType dims
| referencedType dims
;
listType : 'List' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
mapType : 'Map' ('<' (primitiveType | referencedType) (',' (primitiveType | referencedType))* '>')?;
dims
: '[' ']' ( '[' ']')*
;
//Lexer
// Identifier
Identifier
: JavaLetter JavaLetterOrDigit*
;
fragment
JavaLetter
: [a-zA-Z$_] // these are the "java letters" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierStart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
fragment
JavaLetterOrDigit
: [a-zA-Z0-9$_] // these are the "java letters or digits" below 0x7F
| // covers all characters above 0x7F which are not a surrogate
~[\u0000-\u007F\uD800-\uDBFF]
{Character.isJavaIdentifierPart(_input.LA(-1))}?
| // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
[\uD800-\uDBFF] [\uDC00-\uDFFF]
{Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2), (char)_input.LA(-1)))}?
;
WS : [ \t\r\n\u000C]+ -> skip
;
I would be sincerely grateful for your help!
Once your parse run is over you will get a parse tree. You can walk that tree down to the nodes you are interested in (usually you use a parse tree listener for that and only override the enter*/exit* methods that are relevant for your problem). In your enterPrimitiveType method you get a PrimitiveTypeContext parameter. Use its getText method to get the text it matched.
For your second question you would do exactly the same, just use the enterType method instead. The TypeContext parameter has an accessor method for each sub-rule referenced in your rule (primitiveType(), referencedType(), and so on). Check which one does not return null to see which alternative actually matched; if all of them return null, the 'void' literal matched.
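The null-check idea can be sketched without the generated code. In the sketch below, `TypeCtx` is a hypothetical hand-written stand-in for the generated `aidlParser.TypeContext`: the real accessors return context objects rather than strings, and only two of the sub-rules are modelled here. The `describe` helper is likewise an assumption, standing in for what an `enterType` override might do.

```java
// Stand-alone imitation of the "which accessor is non-null" check.
// TypeCtx is a hypothetical stand-in, NOT the class ANTLR generates.
final class AlternativeCheckSketch {

    static final class TypeCtx {
        private final String primitive;   // non-null only if primitiveType matched
        private final String referenced;  // non-null only if referencedType matched
        private final String text;        // full matched text, like getText()

        TypeCtx(String primitive, String referenced, String text) {
            this.primitive = primitive;
            this.referenced = referenced;
            this.text = text;
        }

        String primitiveType() { return primitive; }
        String referencedType() { return referenced; }
        String getText() { return text; }
    }

    // What an enterType(...) override could do with its context parameter.
    static String describe(TypeCtx ctx) {
        if (ctx.primitiveType() != null) {
            return "primitiveType: " + ctx.getText();
        }
        if (ctx.referencedType() != null) {
            return "referencedType: " + ctx.getText();
        }
        // No sub-rule matched: in this simplified model that leaves the 'void' literal.
        return "literal: " + ctx.getText();
    }

    public static void main(String[] args) {
        System.out.println(describe(new TypeCtx("int", null, "int")));
        System.out.println(describe(new TypeCtx(null, "String", "String")));
        System.out.println(describe(new TypeCtx(null, null, "void")));
    }
}
```

In real listener code the same chain of checks would also cover arrayType(), listType() and mapType().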

Match any printable letter-like characters in ANTLR4 with Go as target

This is freaking me out, I just can't find a solution to it. I have a grammar for search queries and would like to match any searchterm in a query composed out of printable letters except for special characters "(", ")". Strings enclosed in quotes are handled separately and work.
Here is a somewhat working grammar:
/* ANTLR Grammar for Minidb Query Language */
grammar Mdb;
start
: searchclause EOF
;
searchclause
: table expr
;
expr
: fieldsearch
| searchop fieldsearch
| unop expr
| expr relop expr
| lparen expr relop expr rparen
;
lparen
: '('
;
rparen
: ')'
;
unop
: NOT
;
relop
: AND
| OR
;
searchop
: NO
| EVERY
;
fieldsearch
: field EQ searchterm
;
field
: ID
;
table
: ID
;
searchterm
:
| STRING
| ID+
| DIGIT+
| DIGIT+ ID+
;
STRING
: '"' ~('\n'|'"')* ('"' )
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '='
;
fragment VALID_ID_START
: ('a' .. 'z') | ('A' .. 'Z') | '_'
;
fragment VALID_ID_CHAR
: VALID_ID_START | ('0' .. '9')
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: ('0' .. '9')
;
/*
NOT_SPECIAL
: ~(' ' | '\t' | '\n' | '\r' | '\'' | '"' | ';' | '.' | '=' | '(' | ')' )
; */
WS
: [ \r\n\t] + -> skip
;
The problem is that searchterm is too restricted. It should match any character that is in the commented out NOT_SPECIAL, i.e., valid queries would be:
Person Name=%
Person Address=^%Street%%%$^&*#^
But whenever I try to put NOT_SPECIAL in any way into the definition of searchterm it doesn't work. I have tried putting it literally into the rule, too (commenting out NOT_SPECIAL) and many other things, but it just doesn't work. In most of my attempts the grammar just complained about extraneous input after "=" and said it was expecting EOF. But I also cannot put EOF into NOT_SPECIAL.
Is there any way I can simply parse every text after "=" in rule fieldsearch until there is a whitespace or ")", "("?
N.B. The STRING rule works fine, but the user ought not be required to use quotes every time, because this is a command line tool and they'd need to be escaped.
Target language is Go.
You could solve that by introducing a lexical mode that you'll enter whenever you match an EQ token. Once in that lexical mode, you either match a (, ) or a whitespace (in which case you pop out of the lexical mode), or you keep matching your NOT_SPECIAL chars.
When you use lexical modes, you must define your lexer and parser rules in their own files. Be sure to use lexer grammar ... and parser grammar ... instead of the grammar ... you use in a combined .g4 file.
A quick demo:
lexer grammar MdbLexer;
STRING
: '"' ~[\r\n"]* '"'
;
OPAR
: '('
;
CPAR
: ')'
;
AND
: 'and'
;
OR
: 'or'
;
NOT
: 'not'
;
NO
: 'no'
;
EVERY
: 'every'
;
EQ
: '=' -> pushMode(NOT_SPECIAL_MODE)
;
ID
: VALID_ID_START VALID_ID_CHAR*
;
DIGIT
: [0-9]
;
WS
: [ \r\n\t]+ -> skip
;
fragment VALID_ID_START
: [a-zA-Z_]
;
fragment VALID_ID_CHAR
: [a-zA-Z_0-9]
;
mode NOT_SPECIAL_MODE;
OPAR2
: '(' -> type(OPAR), popMode
;
CPAR2
: ')' -> type(CPAR), popMode
;
WS2
: [ \t\r\n] -> skip, popMode
;
NOT_SPECIAL
: ~[ \t\r\n()]+
;
Your parser grammar would start like this:
parser grammar MdbParser;
options {
tokenVocab=MdbLexer;
}
start
: searchclause EOF
;
// your other parser rules
My Go is a bit rusty, but a small Java test:
String source = "Person Address=^%Street%%%$^&*#^()";
MdbLexer lexer = new MdbLexer(CharStreams.fromString(source));
CommonTokenStream tokens = new CommonTokenStream(lexer);
tokens.fill();
for (Token t : tokens.getTokens()) {
System.out.printf("%-15s %s\n", MdbLexer.VOCABULARY.getSymbolicName(t.getType()), t.getText());
}
prints the following:
ID              Person
ID              Address
EQ              =
NOT_SPECIAL     ^%Street%%%$^&*#^
OPAR            (
CPAR            )
EOF             <EOF>
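The mode switching itself can be illustrated with a small hand-written lexer. This is only a sketch of the pushMode/popMode mechanics, not the ANTLR runtime and not generated code: `ModeLexerSketch` and `tokenize` are hypothetical names, the keyword tokens (and, or, not, ...), STRING and DIGIT are left out, and no EOF token is emitted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hand-written imitation of the mode stack used by the lexer grammar above.
final class ModeLexerSketch {
    enum Mode { DEFAULT, NOT_SPECIAL_MODE }

    private static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    private static boolean isIdChar(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    // Returns tokens formatted as "TYPE:text".
    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Deque<Mode> modes = new ArrayDeque<>();
        modes.push(Mode.DEFAULT);
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean inTermMode = modes.peek() == Mode.NOT_SPECIAL_MODE;
            if (c == '(' || c == ')') {
                out.add((c == '(' ? "OPAR:" : "CPAR:") + c);
                if (inTermMode) modes.pop();          // OPAR2 / CPAR2: popMode
                i++;
            } else if (isBlank(c)) {
                if (inTermMode) modes.pop();          // WS2: skip, popMode
                i++;                                  // whitespace is never emitted
            } else if (inTermMode) {
                int start = i;                        // NOT_SPECIAL: ~[ \t\r\n()]+
                while (i < src.length() && !isBlank(src.charAt(i))
                        && src.charAt(i) != '(' && src.charAt(i) != ')') {
                    i++;
                }
                out.add("NOT_SPECIAL:" + src.substring(start, i));
            } else if (c == '=') {
                out.add("EQ:=");
                modes.push(Mode.NOT_SPECIAL_MODE);    // EQ: pushMode(NOT_SPECIAL_MODE)
                i++;
            } else if (isIdChar(c) && !(c >= '0' && c <= '9')) {
                int start = i;                        // ID: letter or '_' then id chars
                while (i < src.length() && isIdChar(src.charAt(i))) i++;
                out.add("ID:" + src.substring(start, i));
            } else {
                throw new IllegalArgumentException("Unexpected character at index " + i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        tokenize("Person Address=^%Street%%%$^&*#^()").forEach(System.out::println);
    }
}
```

The point of the stack is that "=" inside a search term is just another NOT_SPECIAL character, while the same character in the default mode starts a new term.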

ANTLR4 Parser confusion

I'm trying to use antlr4 for a project and I got an error I'm not sure how to fix. It seems antlr4 is confusing two parser rules.
Here is my lexer/parser :
grammar PARSER;
@header {package VSOP.Parser;}
program : code+ ; //statement+ ;
code : classHeader | methodHeader | ;
statement : assign | ifStatement | whileStatement;
classHeader : 'class' TYPE_IDENTIFIER ('extends' TYPE_IDENTIFIER)? '{' classBody '}';
classBody : methodHeader* | field*;
methodHeader : OBJECT_IDENTIFIER '(' (((formal ',')+ (formal)) | (formal)?) ')' ':' varType '{' methodBody '}' ;
methodBody : statement* ;
formal : OBJECT_IDENTIFIER ':' varType ;
field : OBJECT_IDENTIFIER ':' varType ('<-' varValue)? ';' ;
assign : OBJECT_IDENTIFIER ':' varType ('<-' varValue)? ;
whileStatement : 'while' condition* 'do' statement* ;
ifStatement : ifStat elseStat? ; //ifStat elseIfStat* elseStat? ;
ifStat : 'if' condition 'then' statement* ;
//elseIfStat : 'else if' condition 'then' '{' statement* '}' ;
elseStat : 'else' statement* ;
condition : comparaiser CONDITIONAL_OPERATOR comparaiser ;
comparaiser : OBJECT_IDENTIFIER | integer | STRING ;
integer : INTEGER_HEX | INTEGER_DEC | INTEGER_BIN ;
varType : 'bool' | 'int32' | 'string' | 'unit' | TYPE_IDENTIFIER ;
varValue : ('true' | 'false' | STRING | integer) ;
// KEYWORD : 'and' | 'class' | 'do' | 'else' | 'extends' | 'false' | 'if' | 'in' | 'isnull' | 'let' | 'new' | 'not' | 'then' | 'true' | 'unit' | 'while' ;
ARITHMETIC_OPERATOR : '+' | '-' | '*' | '/' | '^' ;
CONDITIONAL_OPERATOR : '=' | '<' | '<=';
MULTILINE_OPEN_COMMENT : '(*' ;
MULTILINE_CLOSE_COMMENT : '*)' ;
MULTILINE_COMMENT : '(*' .*? '*)' ;
INTEGER_BIN : '0'[bB][0-9a-zA-Z]* ;
INTEGER_HEX : '0'[xX][0-9a-zA-Z]* ;
INTEGER_DEC : [0-9][0-9a-zA-Z]* ;
OBJECT_IDENTIFIER : [a-z][a-zA-Z0-9_]* ;
TYPE_IDENTIFIER : [A-Z][a-zA-Z0-9_]* ;
STRING : '"' ( '\\"' | . )*? ('"' | EOF) ;
SINGLE_LINE_COMMENT : '//'~[\r\n]* ;
WS : [ \r\n\t]+ -> skip;
Using the code below, I get these errors:
line 5:15 mismatched input '(' expecting ':'
line 5:31 mismatched input ',' expecting {'<-', ';'}
line 5:50 mismatched input ',' expecting {'<-', ';'}
line 5:69 mismatched input ')' expecting {'<-', ';'}
The problem is that antlr4 confuses methodHeader and field. If I put the variable nbOfEngine below the function, the function is parsed correctly, but the variable is not. If I try them separately, it works as well. I tried changing their order in the parser, without success.
class Plane extends Transport {
nbOfEngine: int32 ;
startEngine(gazLevel: int32, readyToStart:bool, foodOnBoard: bool) : bool {
}
}
Any idea how to fix this ?
Thanks !
You define classBody to either be a sequence of field definitions or a sequence of method definitions. You don't allow for it to be a sequence of both.
If you change it to (methodHeader | field)* instead, you'll get a sequence that can contain either.
I found the issue in the parser. The problem comes from classBody.
classBody : methodHeader* | field*;
Instead I've written:
classHeader : 'class' TYPE_IDENTIFIER ('extends' TYPE_IDENTIFIER)? '{' classBody* '}';
classBody : methodHeader | field;
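The difference between the two shapes of classBody can be checked with a regular-expression analogy. This is only a sketch of the alternation structure, not of how the ANTLR parser reports errors: each class member is encoded as one letter ('M' for a methodHeader, 'F' for a field), and the class and method names are hypothetical.

```java
// Regex analogy for the two classBody variants; one letter per class member.
final class ClassBodyAlternationSketch {

    // classBody : methodHeader* | field* ;   (only methods, or only fields)
    static boolean originalRuleAccepts(String members) {
        return members.matches("M*|F*");
    }

    // classBody : (methodHeader | field)* ;  (any mix, in any order)
    static boolean fixedRuleAccepts(String members) {
        return members.matches("(M|F)*");
    }

    public static void main(String[] args) {
        // The Plane class from the question has a field followed by a method: "FM".
        System.out.println(originalRuleAccepts("FM")); // false
        System.out.println(fixedRuleAccepts("FM"));    // true
    }
}
```

The version with classBody* in classHeader and a single-member classBody accepts the same sequences as (methodHeader | field)*.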

Antlr not recognizing number

I have 3 types of numbers defined, number, decimal and percentage.
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Percentage and decimal work fine but when I assign a number, unless I put a sign (+ or -) in front of the number, it doesn't recognize it as a number.
number foo = +5 // does recognize
number foo = 5; // does not recognize
It does recognize it in an evaluation expression.
if (foo == 5 ) // does recognize
Here is my language (I took out the functions and left only the language recognition).
grammar Fetal;
transaction : begin statements end;
begin : 'begin' ;
end : 'end' ;
statements : (statement)+
;
statement
: declaration ';'
| command ';'
| assignment ';'
| evaluation
| ';'
;
declaration : type var;
var returns : identifier;
type returns
: DecimalType
| NumberType
| StringType
| BooleanType
| DateType
| ObjectType
| DaoType
;
assignment
: lharg Equals rharg
| lharg unaryOP rharg
;
assignmentOp : Equals
;
unaryOP : PlusEquals
| MinusEquals
| MultiplyEquals
| DivideEquals
| ModuloEquals
| ExponentEquals
;
expressionOp : arithExpressOp
| bitwiseExpressOp
;
arithExpressOp : Multiply
| Divide
| Plus
| Minus
| Modulo
| Exponent
;
bitwiseExpressOp
: And
| Or
| Not
;
comparisonOp : IsEqualTo
| IsLessThan
| IsLessThanOrEqualTo
| IsGreaterThan
| IsGreaterThanOrEqualTo
| IsNotEqualTo
;
logicExpressOp : AndExpression
| OrExpression
| ExclusiveOrExpression
;
rharg returns
: rharg expressionOp rharg
| '(' rharg expressionOp rharg ')'
| var
| literal
| assignmentCommands
;
lharg returns : var;
identifier : Identifier;
evaluation : IfStatement '(' evalExpression ')' block (Else block)?;
block : OpenBracket statements CloseBracket;
evalExpression
: evalExpression logicExpressOp evalExpression
| '(' evalExpression logicExpressOp evalExpression ')'
| eval
| '(' eval ')'
;
eval : rharg comparisonOp rharg ;
assignmentCommands
: GetBalance '(' stringArg ')'
| GetVariableType '(' var ')'
| GetDescription
| Today
| GetDays '(' startPeriod=dateArg ',' endPeriod=dateArg ')'
| DayOfTheWeek '(' dateArg ')'
| GetCalendarDay '(' dateArg ')'
| GetMonth '(' dateArg ')'
| GetYear '(' dateArg ')'
| Import '(' stringArg ')' /* Import( path ) */
| Lookup '(' sql=stringArg ',' argumentList ')' /* Lookup( table, SQL) */
| List '(' sql=stringArg ',' argumentList ')' /* List( table, SQL) */
| invocation
;
command : Print '(' rharg ')'
| Credit '(' amtArg ',' stringArg ')'
| Debit '(' amtArg ',' stringArg ')'
| Ledger '(' debitOrCredit ',' amtArg ',' acc=stringArg ',' desc=stringArg ')'
| Alias '(' account=stringArg ',' name=stringArg ')'
| MapFile ':' stringArg
| invocation
| Update '(' sql=stringArg ',' argumentList ')'
;
invocation
: o=objectLiteral '.' m=identifier '('argumentList? ')'
| o=objectLiteral '.' m=identifier '()'
;
argumentList
: rharg (',' rharg )*
;
amtArg : rharg ;
stringArg : rharg ;
numberArg : rharg ;
dateArg : rharg ;
debitOrCredit : charLiteral ;
literal
: numericLiteral
| doubleLiteral
| booleanLiteral
| percentLiteral
| stringLiteral
| dateLiteral
;
fileName : '<' fn=Identifier ('.' ft=Identifier)? '>' ;
charLiteral : ('D' | 'C');
numericLiteral : Number ;
doubleLiteral : Decimal ;
percentLiteral : Percentage ;
booleanLiteral : Boolean ;
stringLiteral : String ;
dateLiteral : Date ;
objectLiteral : Identifier ;
daoLiteral : Identifier ;
//Below are Token definitions
// Data Types
DecimalType : 'decimal' ;
NumberType : 'number' ;
StringType : 'string' ;
BooleanType : 'boolean' ;
DateType : 'date' ;
ObjectType : 'object' ;
DaoType : 'dao' ;
/******************************************************************
* Assignmnt operator
******************************************************************/
Equals : '=' ;
/*****************************************************************
* Unary operators
*****************************************************************/
PlusEquals : '+=' ;
MinusEquals : '-=' ;
MultiplyEquals : '*=' ;
DivideEquals : '/=' ;
ModuloEquals : '%=' ;
ExponentEquals : '^=' ;
/*****************************************************************
* Binary operators
*****************************************************************/
Plus : '+' ;
Minus : '-' ;
Multiply : '*' ;
Divide : '/' ;
Modulo : '%' ;
Exponent : '^' ;
/***************************************************************
* Bitwise operators
***************************************************************/
And : '&' ;
Or : '|' ;
Not : '!' ;
/*************************************************************
* Compariso operators
*************************************************************/
IsEqualTo : '==' ;
IsLessThan : '<' ;
IsLessThanOrEqualTo : '<=' ;
IsGreaterThan : '>' ;
IsGreaterThanOrEqualTo : '>=' ;
IsNotEqualTo : '!=' ;
/*************************************************************
* Expression operators
*************************************************************/
AndExpression : '&&' ;
OrExpression : '||' ;
ExclusiveOrExpression : '^^' ;
// Reserve words (Assignment Commands)
GetBalance : 'getBalance';
GetVariableType : 'getVariableType' ;
GetDescription : 'getDescription' ;
Today : 'today';
GetDays : 'getDays' ;
DayOfTheWeek : 'dayOfTheWeek' ;
GetCalendarDay : 'getCalendarDay' ;
GetMonth : 'getMonth' ;
GetYear : 'getYear' ;
Import : 'import' ;
Lookup : 'lookup' ;
List : 'list' ;
// Reserve words (Commands)
Credit : 'credit';
Debit : 'debit';
Ledger : 'ledger';
Alias : 'alias' ;
MapFile : 'mapFile' ;
Update : 'update' ;
Print : 'print';
IfStatement : 'if';
Else : 'else';
OpenBracket : '{';
CloseBracket : '}';
Percentage : (Sign)? Digit+ (Dot Digit+)? '%' ;
Boolean : 'true' | 'false';
Number : Sign? Digit+;
Decimal : Sign? Digit+ Dot Digit*;
Date : Year '-' Month '-' Day;
Identifier
: IdentifierNondigit
( IdentifierNondigit
| Digit
)*
;
String: '"' ( ESC | ~[\\"] )* '"';
/************************************************************
* Fragment Definitions
************************************************************/
fragment
ESC : '\\' [abtnfrv"'\\]
;
fragment
IdentifierNondigit
: Nondigit
//| // other implementation-defined characters...
;
fragment
Nondigit
: [a-zA-Z_]
;
fragment
Digit
: [0-9]
;
fragment
Sign : Plus | Minus;
fragment
Digits
: [-+]?[0-9]+
;
fragment
Year
: Digit Digit Digit Digit;
fragment
Month
: Digit Digit;
fragment
Day
: Digit Digit;
fragment Dot : '.';
fragment
SCharSequence
: SChar+
;
fragment
SChar
: ~["\\\r\n]
| SimpleEscapeSequence
| '\\\n' // Added line
| '\\\r\n' // Added line
;
fragment
CChar
: ~['\\\r\n]
| SimpleEscapeSequence
;
fragment
SimpleEscapeSequence
: '\\' ['"?abfnrtv\\]
;
ExtendedAscii
: [\x80-\xfe]+
-> skip
;
Whitespace
: [ \t]+
-> skip
;
Newline
: ( '\r' '\n'?
| '\n'
)
-> skip
;
BlockComment
: '/*' .*? '*/'
-> skip
;
LineComment
: '//' ~[\r\n]*
-> skip
;
I have a hunch that this use of a fragment is incorrect:
fragment Sign : Plus | Minus;
I couldn't find anything in the reference book, but I think it needs to be changed to something like this:
fragment Sign : [+-];
I found the issue. I was using version 4.5.2-1 because every attempt to upgrade to 4.7 caused more errors and I didn't want to cause more errors while trying to solve another. I finally broke down and upgraded the libraries to 4.7, fixed the errors and the number recognition issue disappeared. It was a bug in the library, all this time.

"No viable alternative at input" error for ANTLR 4 JSON grammar

I am trying to adapt the STRING part of pair in object to a CamelString, but it fails and reports "No viable alternative at input".
I have tried using my CamelString as an independent grammar, and it works well. I think this means there is an ambiguity in my grammar, but I cannot understand why.
For the test input
{
'BaaaBcccCdddd':'cc'
}
The error is
line 2:2 no viable alternative at input '{'BaaaBcccCdddd''
The following is my grammar. It's almost the same as the standard JSON grammar for ANTLR 4.
/** Taken from "The Definitive ANTLR 4 Reference" by Terence Parr */
// Derived from http://json.org
grammar Json;
json: object
| array
;
object
: '{' pair (',' pair)* '}'
| '{' '}' // empty object
;
pair : camel_string ':' value;
camel_string : '\'' (camel_body)*? '\'';
STRING
: '\'' (ESC | ~['\\])* '\'';
camel_body: CAMEL_BODY;
CAMEL_START: [a-z] ALPHA_NUM_LOWER*;
CAMEL_BODY: [A-Z] ALPHA_NUM_LOWER*;
CAMEL_END: [A-Z]+;
fragment ALPHA_NUM_LOWER : [0-9a-z];
array
: '[' value (',' value)* ']'
| '[' ']' // empty array
;
value
: STRING
| NUMBER
| object // recursion
| array // recursion
| 'true' // keywords
| 'false'
| 'null'
;
fragment ESC : '\\' (["\\/bfnrt] | UNICODE) ;
fragment UNICODE : 'u' HEX HEX HEX HEX ;
fragment HEX : [0-9a-fA-F] ;
NUMBER
: '-'? INT '.' [0-9]+ EXP? // 1.35, 1.35E-9, 0.3, -4.5
| '-'? INT EXP // 1e10 -3e4
| '-'? INT // -3, 45
;
fragment INT : '0' | [1-9] [0-9]* ; // no leading zeros
fragment EXP : [Ee] [+\-]? INT ; // \- since - means "range" inside [...]
WS : [ \t\n\r]+ -> skip ;
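One likely cause, assuming standard ANTLR lexer behaviour (the lexer tokenizes the input before the parser runs and prefers the longest match): the STRING rule matches the whole quoted key as a single token, so camel_string never receives a separate ' token followed by CAMEL_BODY tokens. That would fit the error message, which shows 'BaaaBcccCdddd' as one piece of input. The sketch below approximates the STRING rule with a Java regular expression to compare match lengths; `LongestMatchSketch` and its methods are hypothetical names, and the escape handling is looser than the grammar's ESC fragment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares how much input two candidate lexer rules would consume at the same position.
final class LongestMatchSketch {
    // Approximation of: STRING : '\'' (ESC | ~['\\])* '\'' ;  (any escaped char accepted)
    static final Pattern STRING = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");
    // The implicit token for the '\'' literal used in camel_string.
    static final Pattern QUOTE = Pattern.compile("'");

    // Length of the match starting at index 0, or 0 if the pattern does not match there.
    static int matchLength(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    // Name of the rule a longest-match lexer would choose at the start of the input.
    static String winner(String input) {
        return matchLength(STRING, input) > matchLength(QUOTE, input) ? "STRING" : "QUOTE";
    }

    public static void main(String[] args) {
        String input = "'BaaaBcccCdddd':'cc'";
        System.out.println(matchLength(STRING, input)); // 15
        System.out.println(matchLength(QUOTE, input));  // 1
        System.out.println(winner(input));              // STRING
    }
}
```

If that is what happens here, it would also explain why the CamelString rules work in a grammar of their own, where no STRING rule competes for the same characters.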
