Antlr4, mismatched input error: the token is not recognized - parsing

I have the following ANTLR grammar:
declareField : MODIFIER* typeVar nameType ASSIGN value ';';
nameType : NAME(('.')NAME)*;
typeVar : nameType | nameType'<'typeVar'>' | typeVar'['']';
value : PRIMITIVE_VALUE;
And this set of tokens:
ASSIGN : '=';
NULL : 'null';
INT : [0-9]+;
FLOAT : [0-9]+'.'[0-9]+;
STRING : '"'[a-zA-Z_0-9.]*'"';
CHAR : '\''[a-zA-Z_0-9]'\'';
BOOLEAN : TRUE | FALSE;
TRUE : 'true';
FALSE : 'false';
PRIMITIVE_VALUE : INT | FLOAT | STRING | CHAR | BOOLEAN | NULL;
PUBLIC : 'public';
PRIVATE : 'private';
FINAL : 'final';
STATIC : 'static';
VOLATILE : 'volatile';
TRANSIENT : 'transient';
SYNCHRONIZED : 'synchronized';
NATIVE : 'native';
ABSTRACT : 'abstract';
PROTECTED : 'protected';
MODIFIER : PUBLIC | PRIVATE | FINAL | STATIC | VOLATILE | TRANSIENT | SYNCHRONIZED | NATIVE | ABSTRACT | PROTECTED;
NAME : [a-zA-Z_][a-zA-Z_0-9]*;
WS: [ \t\r\n]+ -> channel(HIDDEN);
I expected my grammar to accept input like this:
protected static final int test = 10;
But I get the following error:
line 1:0 mismatched input 'protected' expecting {MODIFIER, NAME}
This surprises me, because 'protected' should definitely be matched by the lexer rule MODIFIER.

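The lexer rule MODIFIER will never match. When several lexer rules match the same input with the same length, ANTLR picks the one defined first, and every keyword that MODIFIER is built from (PUBLIC through PROTECTED) is defined above it. The input 'protected' is therefore tokenized as PROTECTED, which declareField does not accept. PRIMITIVE_VALUE has the same problem: INT is defined before it, so 10 becomes an INT token and value : PRIMITIVE_VALUE would fail next.
Change both to parser rules instead (and drop the lexer rules MODIFIER and PRIMITIVE_VALUE):
declareField : modifier* typeVar nameType ASSIGN value ';';
value : INT | FLOAT | STRING | CHAR | BOOLEAN | NULL;
...
modifier : PUBLIC | PRIVATE | FINAL | STATIC | VOLATILE | TRANSIENT | SYNCHRONIZED | NATIVE | ABSTRACT | PROTECTED;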
PUBLIC : 'public';
PRIVATE : 'private';
FINAL : 'final';
STATIC : 'static';
VOLATILE : 'volatile';
TRANSIENT : 'transient';
SYNCHRONIZED : 'synchronized';
NATIVE : 'native';
ABSTRACT : 'abstract';
PROTECTED : 'protected';
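The tie-breaking rule can be simulated outside ANTLR. The following Java sketch is a simplified imitation of how an ANTLR4 lexer chooses a token, not the ANTLR runtime itself: the longest match wins, and when two rules match the same number of characters, the one defined first wins. The class LexerPriorityDemo and its rule table are made up for illustration; they contain only a subset of the question's rules, kept in the same relative order as in the grammar.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LexerPriorityDemo {
    // A subset of the question's lexer rules, as Java regexes, in grammar order.
    // LinkedHashMap keeps insertion order, which stands in for "defined first".
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("INT", Pattern.compile("[0-9]+"));
        RULES.put("PRIMITIVE_VALUE", Pattern.compile("[0-9]+"));
        RULES.put("PUBLIC", Pattern.compile("public"));
        RULES.put("FINAL", Pattern.compile("final"));
        RULES.put("STATIC", Pattern.compile("static"));
        RULES.put("PROTECTED", Pattern.compile("protected"));
        RULES.put("MODIFIER", Pattern.compile("public|final|static|protected"));
        RULES.put("NAME", Pattern.compile("[a-zA-Z_][a-zA-Z_0-9]*"));
    }

    /** Returns the rule an ANTLR-style lexer would pick at the start of the input. */
    static String pickRule(String input) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // lookingAt() anchors the match at the start of the input;
            // a strictly longer match is required, so on a tie the earlier rule is kept
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(pickRule("protected static")); // PROTECTED, never MODIFIER
        System.out.println(pickRule("10;"));              // INT, never PRIMITIVE_VALUE
        System.out.println(pickRule("test = 10"));        // NAME
    }
}
```

Because PROTECTED and INT sit above MODIFIER and PRIMITIVE_VALUE, the later rules can only tie and never win. A longer identifier such as publicity still becomes NAME, since the longest match takes precedence over rule order.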

Related

Breaking head over how to get position of token with a rule - ANTLR4 / grammar

I'm writing a little grammar using ANTLR, and I have a rule like this:
operation : OPERATION (IDENT | EXPR) ',' (IDENT | EXPR);
...
OPERATION : 'ADD' | 'SUB' | 'MUL' | 'DIV' ;
IDENT : [a-z]+;
EXPR : INTEGER | FLOAT;
INTEGER : [0-9]+ | '-'[0-9]+ ;
FLOAT : [0-9]+'.'[0-9]+ | '-'[0-9]+'.'[0-9]+ ;
Now in the listener inside Java, how do I determine in the case of such a scenario where an operation consist of both IDENT and EXPR the order in which they appear?
Obviously the rule can match both
ADD 10, d
or
ADD d, 10
But in the listener that ANTLR4 generates for this rule, when both IDENT() and EXPR() are present, how do I get their order? I want to assign the left and right operands correctly.
I've been breaking my head over this. Is there a simple way, or should I rewrite the rule itself? ctx.getTokens() requires me to pass a token type, which defeats the purpose: once I specify the type, I can no longer see the order of the tokens in the rule.
You can do it like this:
operation : OPERATION lhs=(IDENT | EXPR) ',' rhs=(IDENT | EXPR);
and then inside your listener, do this:
@Override
public void enterOperation(TParser.OperationContext ctx) {
    if (ctx.lhs.getType() == TParser.IDENT) {
        // left hand side is an identifier
    } else {
        // left hand side is an expression
    }
    // check `rhs` the same way
}
Here TParser is the parser generated from a grammar file named T.g4; substitute your own parser class name.
Another solution would be something like this:
operation
: OPERATION ident_or_expr ',' ident_or_expr
;
ident_or_expr
: IDENT
| EXPR
;
and then in your listener:
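// hypothetical variable store, filled elsewhere in your listener (needs java.util.Map and java.util.HashMap)
private final Map<String, Double> memory = new HashMap<>();
@Override
public void enterOperation(TParser.OperationContext ctx) {
    // the list is in source order: index 0 is the left operand, index 1 the right one
    Double lhs = findValueFor(ctx.ident_or_expr().get(0));
    Double rhs = findValueFor(ctx.ident_or_expr().get(1));
    ...
}
private Double findValueFor(TParser.Ident_or_exprContext ctx) {
    if (ctx.IDENT() != null) {
        // it's an identifier: look up its current value
        return memory.get(ctx.IDENT().getText());
    } else {
        // it's an expression (a number literal)
        return Double.valueOf(ctx.EXPR().getText());
    }
}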
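The position-based dispatch in findValueFor can be tried without the ANTLR runtime. In the Java sketch below, the Operand record, the Kind enum and the memory map are hypothetical stand-ins for Ident_or_exprContext and a variable store; they are not generated by ANTLR. It uses SUB rather than ADD because subtraction makes a swapped operand order visible.

```java
import java.util.HashMap;
import java.util.Map;

class OperandOrderDemo {
    // Hypothetical stand-in for the two token types the grammar allows as operands.
    enum Kind { IDENT, EXPR }

    // Hypothetical stand-in for Ident_or_exprContext: which alternative matched, plus its text.
    record Operand(Kind kind, String text) {}

    // Hypothetical variable store; a real listener would fill it from earlier statements.
    private final Map<String, Double> memory = new HashMap<>();

    OperandOrderDemo() {
        memory.put("d", 2.5);
    }

    // Same shape as findValueFor: branch on which alternative matched.
    Double findValueFor(Operand operand) {
        if (operand.kind() == Kind.IDENT) {
            Double value = memory.get(operand.text());
            if (value == null) {
                throw new IllegalStateException("undefined identifier: " + operand.text());
            }
            return value;
        }
        return Double.valueOf(operand.text());
    }

    // The first argument is always the left operand and the second the right one,
    // whatever kind each of them is.
    double subtract(Operand lhs, Operand rhs) {
        return findValueFor(lhs) - findValueFor(rhs);
    }

    public static void main(String[] args) {
        OperandOrderDemo demo = new OperandOrderDemo();
        Operand ten = new Operand(Kind.EXPR, "10");
        Operand d = new Operand(Kind.IDENT, "d");
        System.out.println(demo.subtract(ten, d)); // SUB 10, d -> 7.5
        System.out.println(demo.subtract(d, ten)); // SUB d, 10 -> -7.5
    }
}
```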

create a type with a constructor that takes in a string and a list

I want to extend a type ExprTree (expression tree) with a new constructor App that takes in a string and a list as arguments. Below is the type ExprTree:
type ExprTree =
| Const of int
| Ident of string
| Sum of ExprTree * ExprTree
| Let of string * ExprTree * ExprTree
The definition that you posted in a comment is correct. Your version from the comments was:
type ExprTree =
| Const of int
| Ident of string
| Sum of ExprTree * ExprTree
| Let of string * ExprTree * ExprTree
| App of string * ((ExprTree) list)
The definition of the App constructor has some unnecessary parentheses: you do not need any and can write just string * ExprTree list, but they do not hurt either. I suspect the issue was not with the definition but with how you use the constructor. This is the correct syntax:
App("foo", [Const 1; Ident "x"])

Disable wrapping in Xtext formatter

I have an Xtext grammar which consists of one declaration per line. When I format the code, all the declarations end up on the same line; the line breaks are removed.
As I didn't manage to change the grammar to require line breaks, I would like to disable the removal of line breaks. How do I do that? Bonus points if someone can tell me how to require line breaks at the end of each declaration.
Part of the Grammar:
grammar com.example.Msg with org.eclipse.xtext.common.Terminals
hidden(WS, SL_COMMENT)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate msg_idl "http://www.example.com/ex/ample/msg"
Model:
MsgDef
;
MsgDef:
(definitions+=definition)+
;
definition:
type=fieldType ' '+ name=ValidID (' '* '=' ' '* const=Value)?
;
fieldType:
value = ( builtinType | header)
;
builtinType:
BOOL = "bool"
| INT32 = "int32"
| CHAR = "char"
;
header:
value="Header"
;
Bool_l:
target=BOOL_E
;
String_l:
target = ('""'|STRING)
;
Number_l:
Double_l | Integer_l | NegInteger_l
;
NegInteger_l:
target=NEG_INT
;
Integer_l :
target=INT
;
Double_l:
target=DOUBLE
;
terminal NEG_INT returns ecore::EInt:
'-' INT
;
terminal DOUBLE returns ecore::EDouble :
('-')? ('0'..'9')* ('.' INT) |
('-')? INT ('.') |
('-')? INT ('.' ('0'..'9')*)? (('e'|'E')('-'|'+')? INT )|
'nan' | 'inf' | '-inf'
;
enum BOOL_E :
true | false
;
ValidID:
"bool"
| "string"
| "time"
| "duration"
| "char"
| ID ;
Value:
String_l | Number_l
;
terminal SL_COMMENT :
' '* '#' !('\n'|'\r')* ('\r'? '\n')?
;
Example data
string left
string top
string right
string bottom
I already tried:
class MsgFormatter extends AbstractDeclarativeFormatter {
extension MsgGrammarAccess msgGrammarAccess = grammarAccess as MsgGrammarAccess
override protected void configureFormatting(FormattingConfig c) {
c.setLinewrap(0, 1, 2).before(SL_COMMENTRule)
c.setLinewrap(0, 1, 2).before(ML_COMMENTRule)
c.setLinewrap(0, 1, 1).after(ML_COMMENTRule)
c.setLinewrap().before(definitionRule); // does not work
c.setLinewrap(1,1,2).before(definitionRule); // does not work
c.setLinewrap().before(fieldTypeRule); // does not work
}
}
In general it is a bad idea to encode whitespace into the language itself. Most of the time it is better to write the language in a way that you can use all kinds of whitespaces (blanks, tabs, newlines ...) to separate tokens.
You should implement a custom formatter for your language that inserts the line breaks after each statement. Xtext comes with two formatter APIs (an old one and a new one introduced in Xtext 2.8). I suggest using the new one.
With the new API, you extend AbstractFormatter2 and implement its format methods.
You can find a bit of information in the online manual: https://www.eclipse.org/Xtext/documentation/303_runtime_concepts.html#formatting
Some more explanation in the following blog post: https://blogs.itemis.com/en/tabular-formatting-with-the-new-formatter-api
Some technical background: https://de.slideshare.net/meysholdt/xtexts-new-formatter-api

xtext - Couldn't resolve reference to

I have the following grammar:
Model: prog+=Program*;
Program: g=Greeting de+=DataEntry* s+=Statement*;
Greeting: 'Hello' t=ProgPara '!';
ProgPara: 'PROGRAM' pname=Progname ';';
Progname : name=ID;
DataEntry: a=INT v=Varname ';';
Varname : name = ID;
Statement: (c=CopyStmt|m=MoveStmt) ';';
CopyStmt: 'COPY' 'TO' qname=[IndexVarname|ID] ;
IndexVarname : (Indexname|Varname);
Indexname : '(' name = ID ')';
MoveStmt: 'MOVE' 'TO' p=[PrVarName|ID];
PrVarName : (Varname|Progname);
But it throws an error for:
PrVarName : (Varname|Progname);
So I modified the grammar as follows:
PrVarName : (v=Varname|Progname);
I updated the Scope provider as below:
override getScope(EObject context, EReference reference) {
if (context instanceof CopyStmt) {
if (reference.featureID == TestDslPackage.COPY_STMT__QNAME) {
val rootElement = EcoreUtil2.getRootContainer(context);
val candidates1 = EcoreUtil2.getAllContentsOfType(rootElement, Indexname);
val candidates2 = EcoreUtil2.getAllContentsOfType(rootElement, Varname);
val candidates = candidates1 + candidates2;
return Scopes.scopeFor(candidates);
}
} else if (context instanceof MoveStmt) {
if (reference.featureID == TestDslPackage.MOVE_STMT__P) {
val rootElement = EcoreUtil2.getRootContainer(context);
val candidates1 = EcoreUtil2.getAllContentsOfType(rootElement, Progname);
val candidates2 = EcoreUtil2.getAllContentsOfType(rootElement, Varname);
val candidates = candidates1 + candidates2;
return Scopes.scopeFor(candidates);
}
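}
// fall back to the default scope for all other references
return super.getScope(context, reference)
}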
Once the grammar was built and I ran the test case below, it throws an error in the MOVE statement saying "Couldn't resolve reference to PrVarName 'test1'."
Hello PROGRAM test;!
1 test1;
2 test2;
3 test3;
COPY TO test2;
MOVE TO test1;
It looks like I cannot use Varname in two different cross-references, but there is a very valid need for it. How do I achieve this?
Thanks in advance.
PrVarName : p=(Progname|Varname);
is problematic because it changes the type hierarchy: Progname and Varname are no longer subtypes of PrVarName. You can resolve this by reverting the change and introducing a common Named supertype:
Model:
prog+=Program*;
Program:
g=Greeting de+=DataEntry* s+=Statement*;
Greeting:
'Hello' t=ProgPara '!';
ProgPara:
'PROGRAM' pname=Progname ';';
DataEntry:
a=INT (v=Varname | in=Indexname) ';';
Statement:
(c=CopyStmt | m=MoveStmt) ';';
CopyStmt:
'COPY' 'TO' qname=[IndexVarname|ID];
MoveStmt:
'MOVE' 'TO' p=[PrVarName|ID];
PrVarName:
Progname | Varname;
IndexVarname:
(Indexname | Varname);
Named:Progname|Indexname|Varname;
Progname:
{Progname} name=ID;
Indexname:
{Indexname}'(' name=ID ')';
Varname:
{Varname} name=ID;
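A rough plain-Java analogy of the type hierarchy this grammar produces may help. This is not Xtext's generated EMF code or its scoping API: the interfaces and records below are hand-written, and resolve only imitates the idea that a cross-reference accepts a named candidate whose type conforms to the declared reference type. In the grammar, Varname appears as an alternative of both PrVarName and IndexVarname, which (as explained above) makes it a subtype of both; here it simply implements both interfaces.

```java
import java.util.List;
import java.util.Optional;

class CrossReferenceDemo {
    // Hand-written analogue of the inferred types; not generated by Xtext.
    interface Named { String name(); }
    interface PrVarName extends Named {}
    interface IndexVarname extends Named {}

    record Progname(String name) implements PrVarName {}
    // Varname is an alternative in both PrVarName and IndexVarname, so it implements both.
    record Varname(String name) implements PrVarName, IndexVarname {}
    record Indexname(String name) implements IndexVarname {}

    // Simplified idea of cross-reference resolution: keep only candidates whose type
    // conforms to the expected reference type, then match by name.
    static <T extends Named> Optional<T> resolve(List<Named> candidates, String name, Class<T> expected) {
        return candidates.stream()
                .filter(expected::isInstance)
                .map(expected::cast)
                .filter(c -> c.name().equals(name))
                .findFirst();
    }

    public static void main(String[] args) {
        List<Named> model = List.of(
                new Progname("test"),
                new Varname("test1"),
                new Varname("test2"),
                new Indexname("idx"));
        System.out.println(resolve(model, "test1", PrVarName.class).isPresent());    // MOVE TO test1 -> true
        System.out.println(resolve(model, "test2", IndexVarname.class).isPresent()); // COPY TO test2 -> true
        System.out.println(resolve(model, "test", IndexVarname.class).isPresent());  // a Progname is not an IndexVarname -> false
    }
}
```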

Antlr whitespace token error

I have the following grammar and I want to match the string "{name1, name2}". I just want lists of names/integers with at least one element. However, I get these errors:
line 1:6 no viable alternative at character ' '
line 1:11 no viable alternative at character '}'
line 1:7 mismatched input 'name' expecting SIMPLE_VAR_TYPE
I would expect whitespace to be ignored. Interestingly, the error does not occur with the input "{name1,name2}" (no space after ',').
Here's my grammar:
grammar NusmvInput;
options {
language = Java;
}
@header {
package secltlmc.grammar;
}
@lexer::header {
package secltlmc.grammar;
}
specification :
SIMPLE_VAR_TYPE EOF
;
INTEGER
: ('0'..'9')+
;
SIMPLE_VAR_TYPE
: ('{' (NAME | INTEGER) (',' (NAME | INTEGER))* '}' )
;
NAME
: ('A'..'Z' | 'a'..'z') ('a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '$' | '#' | '-')*
;
WS
: (' ' | '\t' | '\n' | '\r')+ {$channel = HIDDEN;}
;
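And this is my test code:
package secltlmc;
import java.io.IOException;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import secltlmc.grammar.NusmvInputLexer;
import secltlmc.grammar.NusmvInputParser;
public class Main {
    public static void main(String[] args) throws IOException, RecognitionException {
        CharStream stream = new ANTLRStringStream("{name1, name2}");
        NusmvInputLexer lexer = new NusmvInputLexer(stream);
        CommonTokenStream tokenStream = new CommonTokenStream(lexer);
        NusmvInputParser parser = new NusmvInputParser(tokenStream);
        parser.specification();
    }
}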
Thanks for your help.
The problem is that you are trying to parse SIMPLE_VAR_TYPE with the lexer, i.e. you are trying to make it a single token. In reality, you want a multi-token production, since you'd like whitespace to be redirected to the hidden channel through WS.
You should change SIMPLE_VAR_TYPE from a lexer rule to a parser rule by changing its initial letter (or better yet, the entire name) to lower case.
specification :
simple_var_type EOF
;
simple_var_type
: ('{' (NAME | INTEGER) (',' (NAME | INTEGER))* '}' )
;
The definition of SIMPLE_VAR_TYPE specifies the following expression:
Open {
followed by one of NAME or INTEGER
followed by zero or more of:
comma (,) followed by one of NAME or INTEGER
followed by closing }
Nowhere does it allow whitespace in the input (neither NAME nor INTEGER allows it either), so you get an error when you supply one.
Try:
SIMPLE_VAR_TYPE
: ('{' (NAME | INTEGER) (WS* ',' WS* (NAME | INTEGER))* '}' )
;
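The first answer's point, that whitespace can only be skipped between tokens, can be illustrated with a small hand-written Java lexer and recursive-descent parser. This is not ANTLR-generated code and all names are made up: the lexer drops whitespace (the analogue of sending WS to the hidden channel), and the list structure is checked by the parser, so spaces around the commas no longer matter.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleVarTypeDemo {
    enum Type { LBRACE, RBRACE, COMMA, NAME, INTEGER }

    record Token(Type type, String text) {}

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // NAME : letter (letter | digit | '_' | '$' | '#' | '-')*
    static boolean isNamePart(char c) {
        return isLetter(c) || isDigit(c) || c == '_' || c == '$' || c == '#' || c == '-';
    }

    // Lexer: whitespace is skipped (the analogue of WS -> hidden channel),
    // so the parser never sees it.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                i++;
            } else if (c == '{' || c == '}' || c == ',') {
                Type type = c == '{' ? Type.LBRACE : c == '}' ? Type.RBRACE : Type.COMMA;
                tokens.add(new Token(type, String.valueOf(c)));
                i++;
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
                tokens.add(new Token(Type.INTEGER, input.substring(start, i)));
            } else if (isLetter(c)) {
                while (i < input.length() && isNamePart(input.charAt(i))) i++;
                tokens.add(new Token(Type.NAME, input.substring(start, i)));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // Parser: '{' (NAME | INTEGER) (',' (NAME | INTEGER))* '}' followed by end of input
    static List<String> parse(List<Token> tokens) {
        List<String> values = new ArrayList<>();
        int pos = expect(tokens, 0, Type.LBRACE);
        pos = value(tokens, pos, values);
        while (pos < tokens.size() && tokens.get(pos).type() == Type.COMMA) {
            pos = value(tokens, pos + 1, values);
        }
        pos = expect(tokens, pos, Type.RBRACE);
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("expected end of input at token " + pos);
        }
        return values;
    }

    static int expect(List<Token> tokens, int pos, Type type) {
        if (pos >= tokens.size() || tokens.get(pos).type() != type) {
            throw new IllegalArgumentException("expected " + type + " at token " + pos);
        }
        return pos + 1;
    }

    // One list element: NAME | INTEGER
    static int value(List<Token> tokens, int pos, List<String> values) {
        Token t = pos < tokens.size() ? tokens.get(pos) : null;
        if (t == null || (t.type() != Type.NAME && t.type() != Type.INTEGER)) {
            throw new IllegalArgumentException("expected NAME or INTEGER at token " + pos);
        }
        values.add(t.text());
        return pos + 1;
    }

    public static void main(String[] args) {
        System.out.println(parse(lex("{name1, name2}"))); // [name1, name2]
        System.out.println(parse(lex("{name1,name2}")));  // [name1, name2]
        System.out.println(parse(lex("{ 1 ,x_2 }")));     // [1, x_2]
    }
}
```

Unlike the WS* variant of SIMPLE_VAR_TYPE, this split also tolerates whitespace right after '{' and before '}', because the parser never sees whitespace at all.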
