I am writing a grammar to recognize the following input:
Say Hello Boss
Hello friend
Here is my complete grammar:
grammar org.xtext.example.second.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/second/MyDsl"
Example:
statements+=Statement*;
Statement:
(IDLABEL)? Directives;
Directives:
TAG1 | TAG2 | TAG3 | TAG4;
TAG1: tag=('Hi'|'Hello') IDLABEL;
TAG2: tag=('Tag2') IDLABEL;
TAG3: tag=('Tag3') IDLABEL;
TAG4: tag=('Tag4') IDLABEL;
STRING_OPERANDS hidden(WS):
("*"|UNQUOTED|QUOTED)+;
terminal QUOTED:
"'" ( '\\' . /* 'b'|'t'|'n'|'f'|'r'|'u'|'"'|"'"|'\\' */ | !('\\'|"'") )* "'";
terminal UNQUOTED:
('a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '-' | '*' | "/" | "\\" | '(' | ')' | '$' | '=' |'#' |'.' | '"' |'#'|'+'|"'"|'<'|'>')*;
terminal IDLABEL:
('a'..'z' | 'A'..'Z' | '_' | '0'..'9'|'='|'#')*;
For the input Say Hello Boss, I get the error "missing EOF at Say",
and for the input Hello Boss, I get the error "mismatched input 'Boss' expecting RULE_IDLABEL".
What is wrong with this grammar?
Boss matches both IDLABEL and UNQUOTED. When two terminal rules can match the current input and both match the same prefix, the tokenizer uses the rule that comes first in the grammar. UNQUOTED is declared before IDLABEL, so the input Boss produces an UNQUOTED token, not an IDLABEL token.
In fact all valid IDLABELs are also valid UNQUOTEDs, so you'll never get any IDLABEL tokens.
To fix this, you can change the order of UNQUOTED and IDLABEL, so that IDLABEL comes first.
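The tie-breaking rule described above can be sketched in a few lines of Python. This is only a simplified model of "longest match wins, ties go to the rule listed first", not the lexer that Xtext actually generates. The regular expressions are my approximations of the two terminals (using + instead of * so that empty matches are ignored), and keywords such as 'Hello' are not modeled.

```python
import re

def tokenize(text, rules):
    """Longest match wins; on a tie, the rule listed first is kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # whitespace is skipped, like a hidden WS terminal
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strict '>' means a later rule never replaces an equally long earlier match.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Approximations of the two terminals from the question (assumption: '+' instead of '*').
UNQUOTED = r"[A-Za-z_0-9\-*/\\()$=#.\"+'<>]+"
IDLABEL = r"[A-Za-z_0-9=#]+"

original_order = [("UNQUOTED", UNQUOTED), ("IDLABEL", IDLABEL)]
swapped_order = [("IDLABEL", IDLABEL), ("UNQUOTED", UNQUOTED)]

print(tokenize("Boss", original_order))  # Boss becomes an UNQUOTED token
print(tokenize("Boss", swapped_order))   # Boss becomes an IDLABEL token
```

With the swapped order, input that only UNQUOTED can match in full (for example a.b) still comes out as UNQUOTED, because the longer match wins before the ordering is consulted.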
I have checked similar questions surrounding this issue but none seems to provide a solution to my version of the problem.
I just started Antlr4 recently and all has been going nicely until I hit this particular roadblock.
My grammar is a basic math expression grammar, but for some reason the generated parser is unable to get from parser rule "equal" to parser rule "expr" in order to reach lexer rule "NAME".
grammar MathCraze;
NUM : [0-9]+ ('.' [0-9]+)?;
WS : [ \t]+ -> skip;
NL : '\r'? '\n' -> skip;
NAME: [a-zA-Z_][a-zA-Z_0-9]*;
ADD: '+';
SUB : '-';
MUL : '*';
DIV : '/';
POW : '^';
equal
: add # add1
| NAME '=' equal # assign
;
add
: mul # mul1
| add op=('+'|'-') mul # addSub
;
mul
: exponent # power1
| mul op=('*'|'/') exponent # mulDiv
;
exponent
: expr # expr1
| expr '^' exponent # power
;
expr
: NUM # num
| NAME # name
| '(' add ')' # parens
;
If I pass a word as input, something like "variable", the parser reports an error, but if I pass a number (say "78"), the parser walks the tree successfully (i.e., from rule "equal" to "expr").
equal                 equal
  |                     |
 add                   add
  |                     |
 mul                   mul
  |                     |
exponent              exponent
  |                     |
 expr                  expr
  |                     |
 NUM                   NAME
  |                     |
 "78"                 "variable"
# No error            # Error! Tree walk doesn't reach here.
I've checked for every type of ambiguity I know of, so I'm probably missing something here.
I'm using ANTLR 4.6 by the way, and I would appreciate it if this problem gets solved. Thanks in advance.
Your style of expression hierarchy is the one we use in parsers written by hand or in ANTLR v3, from low to high precedence.
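As Raven said, ANTLR 4 is much more powerful: it accepts directly left-recursive rules and takes operator precedence from the order of the alternatives. Note the <assoc = right> specification on the power alternative, since exponentiation is usually right-associative.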
grammar Question;
question
: line+ EOF
;
line
: expr NL
| assign NL
;
assign
: NAME '=' expr # assignSingle
| NAME '=' assign # assignMulti
;
expr // from high to low precedence
: <assoc = right> expr '^' expr # power
| expr op=( '*' | '/' ) expr # mulDiv
| expr op=( '+' | '-' ) expr # addSub
| '(' expr ')' # parens
| atom_r # atom
;
atom_r
: NUM
| NAME
;
NAME: [a-zA-Z_][a-zA-Z_0-9]*;
NUM : [0-9]+ ('.' [0-9]+)?;
WS : [ \t]+ -> skip;
NL : [\r\n]+ ;
Run with the -gui option to see the parse tree :
$ echo $CLASSPATH
.:/usr/local/lib/antlr-4.6-complete.jar
$ alias grun
alias grun='java org.antlr.v4.gui.TestRig'
$ grun Question question -gui data.txt
and this data.txt file :
variable
78
a + b * c
a * b + c
a = 8 + (6 * 9)
a ^ b
a ^ b ^ c
7 * 2 ^ 5
a = b = c = 88
.
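To make the effect of precedence and of <assoc = right> concrete without opening the GUI, here is a small Python sketch that parenthesises an expression the way I would expect the expr rule to group it. It uses precedence climbing, which is my own stand-in and not the algorithm ANTLR generates. The OPS table and the function names are assumptions made up for this illustration.

```python
import re

# Precedence level and associativity per operator, mirroring the
# alternatives of expr (higher number binds tighter).
OPS = {
    "^": (3, "right"),
    "*": (2, "left"), "/": (2, "left"),
    "+": (1, "left"), "-": (1, "left"),
}

def tokenize(text):
    # Numbers, names, operators and parentheses; whitespace is dropped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()]", text)

def parse(tokens, min_prec=1):
    """Precedence climbing; returns a fully parenthesised string."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 1)
        tokens.pop(0)  # drop the closing ")"
    else:
        left = tok
    while tokens and tokens[0] in OPS and OPS[tokens[0]][0] >= min_prec:
        op = tokens.pop(0)
        prec, assoc = OPS[op]
        # A right-associative operator lets its right operand reuse the same
        # level; a left-associative one forces the next higher level.
        right = parse(tokens, prec if assoc == "right" else prec + 1)
        left = f"({left} {op} {right})"
    return left

def grouping(text):
    return parse(tokenize(text))

for line in ["a ^ b ^ c", "a - b - c", "7 * 2 ^ 5", "a + b * c", "a * b + c"]:
    print(line, "=>", grouping(line))
```

The interesting pair is a ^ b ^ c, which groups to the right, against a - b - c, which groups to the left.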
Added
Using your original grammar and starting with the equal rule, I have the following error :
$ grun Q2 equal -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,9:10='78',<NUM>,2:0]
...
[#41,89:88='<EOF>',<EOF>,10:0]
line 2:0 no viable alternative at input 'variable78'
If I start with rule expr, there is no error :
$ grun Q2 expr -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
...
[#41,89:88='<EOF>',<EOF>,10:0]
$
Run grun with the -gui option and you'll see the difference :
running with expr, the input token variable is caught by NAME, rule expr is satisfied and terminates;
running with equal, parsing fails. The parser tries the first alternative equal -> add -> mul -> exponent -> expr -> NAME => OK. It consumes the token variable and tries to do something with the next token 78. It backs out through each rule, checking whether another alternative of that rule can continue, but each of those alternatives requires an operator. It thus arrives back in equal and starts again with the token variable, this time using the alternative | NAME '='. NAME consumes the token, then the rule requires '=', but the next token is 78, which does not satisfy it. As there is no other choice, the parser reports that there is no viable alternative.
$ grun Q2 equal -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,8:7='<EOF>',<EOF>,1:8]
line 1:8 no viable alternative at input 'variable'
If variable is the only token, same reasoning : first alternative equal -> add -> mul -> exponent -> expr -> NAME => OK, consumes variable, back to equal, tries the alt which requires '=', but the input is at EOF. That's why it says there is no viable alternative.
$ grun Q2 equal -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
If 78 is the only token, apply the same reasoning: first alternative equal -> add -> mul -> exponent -> expr -> NUM => OK, consumes 78, back to equal. The second alternative is not an option because it starts with NAME. The rule is satisfied and no error is reported, but nothing in the grammar has checked for EOF yet.
Now let's add a NUM alt to equal :
equal
: add # add1
| NAME '=' equal # assign
| NUM '=' equal # assignNum
;
$ grun Q2 equal -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
line 1:2 no viable alternative at input '78'
First alternative equal -> add -> mul -> exponent -> expr -> NUM => OK, consumes 78, back to equal. Now there is also an alt for NUM, starts again, this time using the alt | NUM '='. NUM consumes the token 78,
then the parser requires '=', but the input is at EOF, hence the message.
Now let's add a new rule with EOF and let's run the grammar from all :
all : equal EOF ;
$ grun Q2 all -tokens data.txt
[#0,0:1='78',<NUM>,1:0]
[#1,2:1='<EOF>',<EOF>,1:2]
$ grun Q2 all -tokens data.txt
[#0,0:7='variable',<NAME>,1:0]
[#1,8:7='<EOF>',<EOF>,1:8]
The input now matches the grammar, and no error message is printed.
Although I can't answer your question about why the parser can't reach NAME in expr, I'd like to point out that ANTLR 4 lets you use direct left recursion in your rules, which makes the grammar more compact and improves readability.
With that in mind your grammar could be rewritten as
math:
assignment
| expression
;
assignment:
NAME '=' (assignment | expression)
;
expression:
expression '^' expression
| expression ('*' | '/') expression
| expression ('+' | '-') expression
| NAME
| NUM
;
That grammar happily takes a NAME as part of an expression, so I guess it would solve your problem.
If you're really interested in why it didn't work with your grammar, I'd first check whether the lexer has split the input into the expected tokens. Then I would look at the parse tree to see what the parser makes of that token sequence, and finally walk through the parse by hand according to your grammar. Along the way you should find the point at which the parser does something different from what you expect.
Currently I'm trying to create a DSL for the class diagrams of PlantUML. I'm new to Xtext and I can't get my head around several things. Before I list my problems I show you some parts of my current grammar:
ClassUml:
{ClassUml}
'#startuml' umlElements+=(ClassElement)* '#enduml';
ClassElement:
Class
| Association;
Class:
{Class}
'class' name=ClassName
(color=ColorTag)?
('{' (classContents+=ClassContent)* '}')?;
ClassContent:
Attribute | Method;
ClassName:
(ID | STRING);
Attribute:
{Attribute}
(visibility=Visibility)? name=ID (":" type=ID)?;
Method:
{Method}
(visibility=Visibility)? name=METHID
(":" type=ID)?;
Association:
{Association}
(classFrom=[Class]
associationType=Bidirectional
classTo=[Class])
|
(classTo=[Class]
associationType=UnidirectionalLeft
classFrom=[Class])
|
(classFrom=[Class]
associationType=UnidirectionalRight
classTo=[Class])
(':' text+=(ID)*)?;
Bidirectional:
{Bidrectional}
('-' ("[" color=ColorTag "]")? '-'?)
| ('.' ("[" color=ColorTag "]")? '.'?);
UnidirectionalLeft:
{UnidirectionalLeft}
('<-' ("[" color=ColorTag "]")? '-'?)
| ('<.' ("[" color=ColorTag "]")? '.'?);
UnidirectionalRight:
{UnidirectionalRight}
((('-[' color=ColorTag "]")|'-')? '->')
| ((('.[' color=ColorTag "]")|'.')? '.>');
ColorTag:
(COLOR | HEXCODE);
enum Visibility:
PROTECTED='#'
| PRIVATE='-'
| DEFAULT='~'
| PUBLIC='+';
terminal COLOR:
"#"
('red') | ('orange');
terminal HEXCODE:
"#"
('A' .. 'F'|'0' .. '9')('A' .. 'F'|'0' .. '9')('A' .. 'F'|'0' .. '9')
('A' .. 'F'|'0' .. '9')('A' .. 'F'|'0' .. '9')('A' .. 'F'|'0' .. '9');
terminal STRING:
'"' ('\\' . | !('\\' | '"'))* '"';
terminal ID:
('a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '\"\"' | '//' | '\\')
('a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '\"\"' | '//' | '\\' | ':')*;
I left out the other association types (--*, --o, --|>) because I've defined them in the same way.
Problems
1. The visibility enum '#' isn't working without a separation from the method / attribute name. But all the other cases (+,-,~) are fine, with and without a blank space between.
2. The associations don't seem to work in most cases. I've listed a few examples:
' Working '
Alice -* Bob : Hello
Alice - Bob
Alice .o Bob
Alice <|-[#002211]- Bob
Alice *-[#red]- Bob
Alice -[#000000]-> Bob
Alice .[#red].> Bob
' Not Working '
Alice *-- Bob
Alice --* Bob
Alice .. Bob
Alice -[#ff0022]- Bob
Alice <-- Bob
Alice ..> Bob
Alice -- Bob
3. I don't know how I can use cross references for classes which were defined by STRING and not ID.
4. Also, I'm guessing the additional terminal for the method name is a weird solution and should be handled differently.
1) Color should be a parser rule not a terminal rule.
Also remove the Hex rule and simply use your changed ID rule.
Color:
"#" ('red' | 'orange' | ID);
2) Make sure to unify the different association rules; for instance, there is a conflict between
Bidirectional:
...
('-' ("[" ...;
and
UnidirectionalRight:
((('-[' ...;
A sequence '-[' will always match the latter version, because the keyword '-[' is lexed as a single token. You should create one rule AssociationType and make that work for all cases. Something like this:
Association:
{Association}
(classFrom=[Class | ClassName]
associationType=AssociationType
classTo=[Class | ClassName])
(':' text+=(ID)*)?;
AssociationType:
{AssociationType}
left?='<'? ('-'|'.') ("[" color=Color "]")? ('-'|'.') right?='>'?;
3) You could allow a STRING in the cross references, as well, by using the following syntax for the crossrefs: classFrom=[Class|ClassName]
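To check which arrows the unified AssociationType rule from point 2 would accept, it can be mirrored as a regular expression. This is only a rough Python sketch of the rule exactly as written above; the colour pattern (letters and digits after '#') and the names ARROW and parse_arrow are my own assumptions, not part of Xtext.

```python
import re

# Each line mirrors one element of the AssociationType rule.
ARROW = re.compile(r"""
    (?P<left><)?                         # left?='<'?
    [-.]                                 # ('-'|'.')
    (?:\[\#(?P<color>[A-Za-z0-9]+)\])?   # ("[" color=Color "]")?
    [-.]                                 # ('-'|'.')
    (?P<right>>)?                        # right?='>'?
""", re.VERBOSE)

def parse_arrow(text):
    """Return the features of an arrow, or None if the rule rejects it."""
    m = ARROW.fullmatch(text)
    if m is None:
        return None
    return {
        "left": m.group("left") is not None,
        "color": m.group("color"),
        "right": m.group("right") is not None,
    }

for arrow in ["--", "..", "<--", "..>", "-[#ff0022]-", ".[#red].>", "-", "*--"]:
    print(arrow, "=>", parse_arrow(arrow))
```

As written, the rule requires both line segments, so a single '-' is rejected, and decorations such as '*', 'o' or '|>' would still need their own optional parts on either side.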
I'm writing a grammar that should convert infix to postfix. Our teacher told us to change this grammar:
E -> TT'
T -> FF'
T'-> +T | -T | nil
F -> (E) | id | num
F' -> *F | /F | nil
Note: the tokens are +, -, *, / and ^ (pow). The problem is the power operator: I don't know how to change the grammar so that it can parse powers too.
Thanks in advance.
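One possible direction, offered only as a sketch: put a separate level P for the power operator between the multiplicative level and F, and make it right-recursive so that ^ groups to the right. The layering below (E, T, P, F) is my own assumption and not the grammar from the course; the Python translator just shows that such a layering emits postfix in the expected order.

```python
import re

def tokenize(text):
    # Numbers, identifiers, operators and parentheses; whitespace is dropped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/^()]", text)

class Translator:
    """Recursive descent over an assumed grammar:
       E -> T { (+|-) T }
       T -> P { (*|/) P }
       P -> F [ ^ P ]        (right-recursive, so ^ is right-associative)
       F -> ( E ) | id | num
    """
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0
        self.out = []

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            self.term()
            self.out.append(op)   # operator is emitted after both operands

    def term(self):
        self.power()
        while self.peek() in ("*", "/"):
            op = self.take()
            self.power()
            self.out.append(op)

    def power(self):
        self.factor()
        if self.peek() == "^":
            self.take()
            self.power()          # recurse on the right operand first
            self.out.append("^")

    def factor(self):
        tok = self.take()
        if tok == "(":
            self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
        elif tok is None or tok in "+-*/^)":
            raise SyntaxError(f"unexpected token {tok!r}")
        else:
            self.out.append(tok)  # id or num goes straight to the output

def to_postfix(text):
    t = Translator(text)
    t.expr()
    if t.peek() is not None:
        raise SyntaxError(f"unexpected token {t.peek()!r}")
    return " ".join(t.out)

print(to_postfix("a + b * c ^ 2"))
print(to_postfix("2 ^ 3 ^ 2"))
```

The key design choice is that P calls itself for the right operand, whereas the loops in E and T make + - * / group to the left.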
I'm looking for a way to prevent KEYWORDS matching at a place where those KEYWORDS are not expected.
Take a look at the following grammar. Both 'APPLY' and 'OUTPUT' are keywords.
'OUTPUT' has an argument that contains any characters.
Everything works fine but if this argument contains the word APPLY, an error is raised (extraneous input APPLY expecting RULE_END).
Is there a way to solve this issue?
Thanks.
Sample text
APPLY, 'an id' $
OUTPUT, A text $
OUTPUT, A text with the word APPLY $
DSL
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
statement+=Statement*;
Statement:
ApplyStatement | OutputStatement;
OutputStatement:
'OUTPUT' ',' out+=EXTENDLABEL* end=END;
ApplyStatement:
'APPLY' ',' id=LABELIDENTIFIER end=END;
terminal fragment LETTER:
'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H' | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P' | 'Q' | 'R' | 'S' | 'T'
| 'U' | 'V' | 'W' | 'X' | 'Y' | 'Z' | 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h' | 'i' | 'j' | 'k' | 'l' | 'm' |
'n' | 'o' | 'p' | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x' | 'y' | 'z';
terminal LABELIDENTIFIER:
"'"->"'";
terminal EXTENDLABEL:
(LETTER) (LETTER)*;
terminal END:
'$' !('\n' | '\r')*;
I see a few different ways to handle this. First, you could escape keywords where they appear as plain text. For example, the Xbase language uses '^' as an escape character: if a keyword gets in the way, you can prefix it with '^' and it works. Similarly, enclosing the text in specific delimiters, e.g. apostrophes, would help a lot. Of course, these solutions require changing the language itself, which may or may not be an option for you.
You might also replace your EXTENDLABEL terminal with a datatype rule. This allows greater flexibility in conflict resolution; in the worst case you could add the language keywords as alternatives of that rule. This route was suggested to me in a tangentially related thread on the Eclipse forums.
Another solution is to change the type of a token before the parser uses it. Tokens are produced by the lexer, and the parser takes these tokens as input to build the AST. So the idea is to modify the tokens before passing them to the parser.
To do this, you need to declare your own parser (this override goes in the language's runtime module):
@Override
public Class<? extends IParser> bindIParser() {
return ModelParser.class;
}
Note: your parser will extend the parser generated from your grammar.
Then you need to override the following method to introduce your own token stream:
override protected XtextTokenStream createTokenStream(TokenSource tokenSource) {
// ModelTokenStream is a placeholder name for your own subclass of XtextTokenStream
return new ModelTokenStream(tokenSource, getTokenDefProvider());
}
Your own token stream class needs to extend 'XtextTokenStream'.
Next, override the method 'LT' as follows:
override LT(int k) {
var Token token = super.LT(k)
if(token != null && token.text != null) token.tokenOverride(k);
token
}
Then you just need to change the token type:
def void tokenOverride(Token token, int index){
switch (token.text){
case "APPLY" : {
overrideType(token, InternalModelParser.RULE_ID);
}
}
}
def void overrideType(Token token, int i) {
token.type = i
}
Note: don't forget to add your own condition before changing the token type; in this example every 'APPLY' token becomes an ID.
And of course, inside the switch you can test the type of the token 'APPLY' instead of its text.
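The "add your condition" part is where the real work is, so here is a language-neutral sketch of it in Python rather than Xtend. It retypes APPLY only while the stream is inside an OUTPUT statement, i.e. between an OUTPUT token and the next END token. The Token class and the token type names are hypothetical stand-ins for illustration, not the Xtext or ANTLR API.

```python
from dataclasses import dataclass

@dataclass
class Token:
    type: str   # hypothetical type names, e.g. "APPLY", "OUTPUT", "END"
    text: str

def retype_keywords(tokens):
    """Turn APPLY into EXTENDLABEL while inside an OUTPUT statement."""
    in_output = False
    result = []
    for tok in tokens:
        if tok.type == "OUTPUT":
            in_output = True          # an OUTPUT statement starts here
        elif tok.type == "END":
            in_output = False         # '$' closes the current statement
        elif tok.type == "APPLY" and in_output:
            # Inside OUTPUT the word APPLY is plain text, not a keyword.
            tok = Token("EXTENDLABEL", tok.text)
        result.append(tok)
    return result

# APPLY, 'an id' $   followed by   OUTPUT, A text with the word APPLY $
stream = [
    Token("APPLY", "APPLY"), Token("COMMA", ","),
    Token("LABELIDENTIFIER", "'an id'"), Token("END", "$"),
    Token("OUTPUT", "OUTPUT"), Token("COMMA", ","),
    Token("EXTENDLABEL", "A"), Token("EXTENDLABEL", "text"),
    Token("APPLY", "APPLY"), Token("END", "$"),
]
for tok in retype_keywords(stream):
    print(tok.type, repr(tok.text))
```

The same state (am I inside an OUTPUT statement?) would have to be tracked in the overridden token stream, since LT can be called with different lookahead distances.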