Xtext mismatched input '0' expecting RULE_INT - xtext

I don't understand what is wrong with this grammar:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
header=Header (elements+=Element)*;
Header:
'Test:Revision' version=Decimal ';'
;
Decimal:
INT'.'INT
;
Element:
TableRow
;
TableRow:
'__Row' name=ID '{'
'__Alias' '=' Alias(','Alias)* ';'
'}'
;
Alias:
'0'|'1'|'H'|'L'
;
The following simple test statement fails with JUnit with the message "mismatched input '0' expecting RULE_INT on Header
Test:Revision2.0;
Everything works fine if I remove '0' from the Alias rule or I change the test statement to:
Test:Revision2.00;
Can you please tell me what is wrong with this grammar?

With Alias you turn '0' into a keyword so it can never be matched by the INT terminal rule. The same would happen if you create a element with name 'L' or name 'H' you could introduce a datatype rule like
IntValue: INT | '0' | '1';
and use that one instead of INT inside Decimal

Related

ANTLR Making Negative Test Cases

I'm new to ANTLR and am trying to understand how to do some things with it. I need it to throw an error when a statement is missing things, like a semicolon or an end bracket. It's been called negative test cases by the problem set that I'm working through.
For example, the below code returns true, which is correct.
val program = """
1 + 2;
"""
recognize(program)
However, this code also returns true, despite it missing the semicolon at the end. It should return false ([PARSER error at line=1]: missing ';' at '').
val program = """
1 + 2
""".trimIndent()
recognize(program)
The grammar is as follows:
program: (expression ';')* | EOF;
expression: INT PLUS INT | OPENBRAC INT PLUS INT CLOSEBRAC | QUOTE IDENT QUOTE PLUS QUOTE IDENT QUOTE;
IDENT: [A-Za-z0-9]+;
INT: [-][0-9]+ | ('0'..'9')+;
PLUS: '+';
OPENBRAC: '(';
CLOSEBRAC: ')';
QUOTE: '"';
program: (expression ';')* | EOF;
This means a program can either be zero or more instances of expression ';' followed by whatever else is in the input stream or it can be empty. Since (expression ';')* can already match the empty input by itself, the | EOF is just redundant.
What you want is program: (expression ';')* EOF, which means that a program consists of zero or more instances of expression ';', followed by the end of input, meaning there must be nothing left in the input afterwards.

ANTLR grammar not working as expected. What am I doing wrong?

I have this grammar below for implementing an IN operator taking a list of numbers or strings.
grammar listFilterExpr;
listFilterExpr: entityIdNumberListFilter | entityIdStringListFilter;
entityIdNumberProperty
: 'a.Id'
| 'c.Id'
| 'e.Id'
;
entityIdStringProperty
: 'f.phone'
;
listFilterExpr
: entityIdNumberListFilter
| entityIdStringListFilter
;
listOperator
: '$in:'
;
entityIdNumberListFilter
: entityIdNumberProperty listOperator numberList
;
entityIdStringListFilter
: entityIdStringProperty listOperator stringList
;
numberList: '[' ID (',' ID)* ']';
fragment ID: [1-9][0-9]*;
stringList: '[' STRING (',' STRING)* ']';
STRING
: '"'(ESC | SAFECODEPOINT)*'"'
;
fragment ESC
: '\\' (["\\/bfnrt] | UNICODE)
;
fragment SAFECODEPOINT
: ~ ["\\\u0000-\u001F]
;
If I try to parse the following input:
c.Id $in: [1,1]
Then I get the following error in the parser:
mismatched input '1' expecting ID
Please help me to correct this grammar.
Update
I found this following rule way above in the huge grammar file of my project that might be matching '1' before it gets to match to ID:
NUMBER
: '-'? INT ('.' [0-9] +)?
;
fragment INT
: '0' | [1-9] [0-9]*
;
But, If I write my ID rule before NUMBER then other things fail, because they have already matched ID which should have matched NUMBER
What should I do?
As mentioned by rici: ID should not be a fragment. Fragments can only be used by other lexer rules, they will never become a token on their own (and can therefor not be used in parser rules).
Just remove the fragment keyword from it: ID: [1-9][0-9]*;
Note that you'll also have to account for spaces. You probably want to skip them:
SPACES : [ \t\r\n] -> skip;
...
mismatched input '1' expecting ID
...
This looks like there's another lexer, besides ID, that also matches the input 1 and is defined before ID. In that case, have a look at this Q&A: ANTLR 4.5 - Mismatched Input 'x' expecting 'x'
EDIT
Because you have the rules ordered like this:
NUMBER
: '-'? INT ('.' [0-9] +)?
;
fragment INT
: '0' | [1-9] [0-9]*
;
ID
: [1-9][0-9]*
;
the lexer will never create an ID token (only NUMBER tokens will be created). This is just how ANTLR works: in case of 2 or more lexer rules match the same amount of characters, the one defined first "wins".
In the first place I think it's odd to have an ID rule that matches only digits, but, if that's the language you're parsing, OK. In your case, you could do something like this:
id : POS_NUMBER;
number : POS_NUMBER | NEG_NUMBER;
POS_NUMBER : INT ('.' [0-9] +)?;
NEG_NUMBER : '-' POS_NUMBER;
fragment INT
: '0' | [1-9] [0-9]*
;
and then instead of ID, use id in your parser rules. As well as using number instead of the NUMBER you're using now.

antlr4 matches longest one

I try an Antlr4 grammar file. When I change define of ID property
ID :[A-Z]+;
to
ID: [A-Z][A-Za-z0-9_]* ;
I got this error.
line 1:7 mismatched input 'E550' expecting {'W', 'I'}
line 1:12 mismatched input ';' expecting {'W', 'I'}
Actualy I know the reason. which mathces with the longest one. But I must use ID Like erroneous way. and my foo must be E or I and Number. How can I make it happen? any help is appreciate.
Here is my code snippet which causes the error.
QUEST E550 ;
Here is my grammar
grammar test;
block: foo+;
foo:ID op=(WARNING|INFORMATION)INT SCOL;
SCOL :';';
WARNING :'W';
INFORMATION :'I';
ID: [A-Z]+ ;
//if I change to ID: [A-Z][A-Za-z0-9_]* ; error occurs
INT : [0-9]+;
SPACE: [ \t\r\n] -> skip;
OTHER: . ;
If your ID rule cannot start with W, I or E, then you will need to exclude those from the start:
ID: [A-DF-HJ-VX-Z] [A-Za-z0-9_]* ;
Of course, then input like EEEEE will not become an ID. To account for such cases, you could (1) let your ID rule start with a single uppercase other than W, I or E followed by the rest, or (2) let it start with 2 letters followed by the rest:
ID
: [A-DF-HJ-VX-Z] [A-Za-z0-9_]* // (1)
| [A-Z] [A-Z] [A-Za-z0-9_]* // (2)
;

antlr4 not parsing according to grammar

I'm trying to parse 'for loop' according to this (partial) grammar:
grammar GaleugParserNew;
/*
* PARSER RULES
*/
relational
: '>'
| '<'
;
varChange
: '++'
| '--'
;
values
: ID
| DIGIT
;
for_stat
: FOR '(' ID '=' values ';' values relational values ';' ID varChange ')' '{' '}'
;
/*
* LEXER RULES
*/
FOR : 'for' ;
ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
DIGIT : [0-9]+ ;
SPACE : [ \t\r\n] -> skip ;
When I try to generate the gui of how it's parsed, it's not following the grammar I provided above. This is what it produces:
I've encountered this problem before, what I did then was simply exit cmd, open it again and compile everything and somehow that worked then. It's not working now though.
I'm not really very knowledgeable about antlr4 so I'm not sure where to look to solve this problem.
Must be a problem of the IDE you are using. The grammar is fine and produces this parse tree in Visual Studio Code:
I guess the IDE is using the wrong parser or lexer (maybe from a different work file?). Print the lexer tokens to see if they are what you expect. Hint: avoid defining implicit lexer tokens (like '(', '}' etc.), which will allow to give the tokens good names.

Require newline or EOF after statement match

Just looking for a simple way of getting ANTLR4 to generate a parser that will do the following (ignore anything after the ;):
int #i ; defines an int
int #j ; see how I have to go to another line for another statement?
My parser is as the following:
compilationUnit:
(statement END?)*
statement END?
EOF
;
statement:
intdef |
WS
;
// 10 - 1F block.
intdef:
'intdef' Identifier
;
// Lexer.
Identifier: '#' Letter LetterOrDigit*;
fragment Letter: [a-zA-Z_];
fragment LetterOrDigit: [a-zA-Z0-9$_];
// Whitespace, fragments and terminals.
WS: [ \t\r\n\u000C]+ -> skip;
//COMMENT: '/*' .*? '*/' -> channel(HIDDEN);
END: (';' ~[\r\n]*) | '\n';
In essence, any time I have a statement, I need it to REQUIRE a newline before another is entered. I don't care if there's 3 new lines and then on the second one a bunch of tabs persist, as long as there's a new line.
The issue is, the ANTLR4 Parse Tree seems to be giving me errors for inputs such as:
.
(Pretend the dot isnt there, its literally no input)
int #i int #j
Woops, we got two on the same line!
Any ideas on how I can achieve this? I appreciate the help.
I've simplified your grammar a bit but made it require an end-of-line sequence after each statement to parse correctly.
grammar Testnl;
program: (statement )* EOF ;
statement: 'int' Identifier EOL;
Identifier: '#' Letter LetterOrDigit*;
fragment Letter: [a-zA-Z_];
fragment LetterOrDigit: [a-zA-Z0-9$_];
EOL: ';' .*? '\r\n'
| ';' .*? '\n'
;
WS: [ \t\r\n\u000C]+ -> skip;
It parses
int #i ;
int #j;
[#0,0:2='int',<'int'>,1:0]
[#1,4:5='#i',<Identifier>,1:4]
[#2,7:9=';\r\n',<EOL>,1:7]
[#3,10:12='int',<'int'>,2:0]
[#4,14:15='#j',<Identifier>,2:4]
[#5,16:18=';\r\n',<EOL>,2:6]
[#6,19:18='<EOF>',<EOF>,3:0]
It also ignore stuff after the semicolon as just part of the EOL token:
[#0,0:2='int',<'int'>,1:0]
[#1,4:5='#i',<Identifier>,1:4]
[#2,7:20='; ignore this\n',<EOL>,1:7]
[#3,21:23='int',<'int'>,2:0]
[#4,25:26='#j',<Identifier>,2:4]
[#5,27:28=';\n',<EOL>,2:6]
[#6,29:28='<EOF>',<EOF>,3:0]
using either linefeed or carriagereturn-linefeed just fine. Is that what you're looking for?
EDIT
Per OP comment, made a small change to allow consecutive EOL tokens, and also move EOL token to statement to reduce repetition:
grammar Testnl;
program: ( statement EOL )* EOF ;
statement: 'int' Identifier;
Identifier: '#' Letter LetterOrDigit*;
fragment Letter: [a-zA-Z_];
fragment LetterOrDigit: [a-zA-Z0-9$_];
EOL: ';' .*? ('\r\n')+
| ';' .*? ('\n')+
;
WS: [ \t\r\n\u000C]+ -> skip;

Resources