I'm starting out with Xtext and I've run into something I don't know how to solve.
I have two grammars:
A.xtext
Domain:
'domain' name=ID
'{'
(instances+=Instance)*
'}'
;
Instance:
'instance' name=ID
;
B.xtext
import "http://somewhere/languages/A" as A
MyCommand:
DomainCommand | InstanceCommand
;
DomainCommand:
'domain'
domain=[A::Domain]
;
InstanceCommand:
'instance'
instance=[A::Instance]
;
SomeFile.A
domain A {
instance X
instance Y
}
SomeFile.B
domain A
instance A.X
When I'm writing a text file in my B grammar, I can access the Domain values defined in SomeFile.A, but I don't know the best way to reference the Instance X and to make sure it comes from domain A.
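One way to do this (a sketch, assuming Xtext's default qualified-name provider and index-based scoping) is to reference instances by their qualified name: the nested Instance X is exported as A.X by default, so it is enough to let the cross-reference in B.xtext parse a dotted name. The rule name QualifiedName is my own choice here:

```xtext
InstanceCommand:
	'instance'
	instance=[A::Instance|QualifiedName]
;

QualifiedName:
	ID ('.' ID)*
;
```

Since the default qualified name of an instance includes its containing domain, writing instance A.X can only resolve to an Instance contained in domain A.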
I'm writing a custom Xtext editor for my own DSL and now want to add if-statements to my language.
The Statements look something like this:
if (TRUE) {
(...)
}
But when I try adding them in, I get an error "A class may not be a super type of itself".
This is my code so far:
grammar XtextTest with org.eclipse.xtext.common.Terminals
generate xtextTest "http://www.my.xtext/Test"
Model:
statements+=Statement*;
Statement:
VariableAssignment |
IfStatement;
IfStatement:
'if' '(' BooleanExpression ')' '{' Statement '}';
BooleanExpression:
'TRUE' | 'FALSE';
VariableAssignment:
name=ID "=" INT ';';
How can I implement this? Or am I doing something obviously wrong?
Any help is appreciated ^^
Assignments are an important thing in Xtext. If you just call rules without assigning the result to a feature, it influences the supertype hierarchy that is inferred. It is better to change the grammar to:
IfStatement:
'if' '(' condition=BooleanExpression ')' '{' statement=Statement '}';
If you want to introduce a common supertype/subtype relationship, on the other hand, don't use assignments:
Number: Double | Long;
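Applied to the grammar from the question, a fully assigned version might look like this (a sketch; the feature names condition and value, and allowing multiple statements in the if-body, are my own choices):

```xtext
grammar XtextTest with org.eclipse.xtext.common.Terminals

generate xtextTest "http://www.my.xtext/Test"

Model:
	statements+=Statement*;

Statement:
	VariableAssignment |
	IfStatement;

IfStatement:
	'if' '(' condition=BooleanExpression ')' '{' statements+=Statement* '}';

BooleanExpression:
	value=('TRUE' | 'FALSE');

VariableAssignment:
	name=ID '=' value=INT ';';
```

With every rule call assigned to a feature, no rule becomes an accidental supertype of itself, which is what triggered the "A class may not be a super type of itself" error.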
I have had an issue with my Xtext grammar in that the parser cannot recover upon reaching a misspelled keyword. Here is a minimal grammar which reproduces the issue and resembles my actual grammar structure:
Model:
'NS' name=ID
(
a+=TypeA |
b+=TypeB |
c+=TypeC
)*
'EndNS'
;
TypeA:
'TypeA' name=ID ';'
;
TypeB:
'TypeB' name=ID
'EndTypeB'
;
TypeC:
'TypeC' name=ID
'EndTypeC'
;
So if I created a file with the following text:
NS myNamespace
TypeA myA;
TypeB myB
EndTypeB
TypeC myC
EndTypeC
EndNS
If I then misspelled the TypeA keyword, both the TypeB and TypeC entries would also fail to parse, despite the grammar being keyword-centric. (The problem still occurs if you remove the namespace concept, or if you make the type entries ordered.) My expectation is that the TypeA entry would be null, but that upon reaching the TypeB keyword the parser would recover and add the remaining entries to the AST.
My question, then, is: is there an issue I am missing in my current grammar, and how can I structure it to give the parser the best recovery ability?
I'm defining a grammar for a small language with ANTLR 4. In that language there's a keyword "function" which can be used either to define a function or as a type specifier when defining parameters. I would like to be able to do something like this:
function aFunctionHere(int a, function callback) ....
However, it seems ANTLR doesn't like that I use "function" in two different places. As far as I can tell, the grammar isn't even ambiguous.
In the following grammar, if I remove LINE 1, the generated parser parses the sample input without a problem. Also, if I change the token string in either LINE 2 or LINE 3, so that they are not equal, the parser works.
The error I get with the grammar as-is:
line 1:0 mismatched input 'function' expecting <INVALID>
What does "expecting <INVALID>" mean?
The (stripped down) grammar:
grammar test;
begin : function ;
function: FUNCTION IDENTIFIER '(' parameterlist? ')' ;
parameterlist: parameter (',' parameter)+ ;
parameter: BaseParamType IDENTIFIER ;
// Lexer stuff
BaseParamType:
INT_TYPE
| FUNCTION_TYPE // <---- LINE 1
;
FUNCTION : 'function'; // <---- LINE 2
INT_TYPE : 'int';
FUNCTION_TYPE : 'function'; // <---- LINE 3
IDENTIFIER : [a-zA-Z_$]+[a-zA-Z_$0-9]*;
WS : [ \t\r\n]+ -> skip ;
The input I'm using:
function abc(int c, int d, int a)
The program to test the generated parser:
from antlr4 import *
from testLexer import testLexer as Lexer
from testParser import testParser as Parser
from antlr4.tree.Trees import Trees

def main(argv):
    input = FileStream(argv[1] if len(argv) > 1 else "test.in")
    lexer = Lexer(input)
    tokens = CommonTokenStream(lexer)
    parser = Parser(tokens)
    tree = parser.begin()
    print(Trees.toStringTree(tree, None, parser))

if __name__ == '__main__':
    import sys
    main(sys.argv)
Just use one name for the token function.
A token is just a token. Looking at function in isolation, it is not possible to decide whether it is a FUNCTION or a FUNCTION_TYPE. Since FUNCTION comes first in the file, that's what the lexer uses. That makes it impossible to ever match FUNCTION_TYPE, so it becomes an invalid token type.
The parser will figure out the syntactic role of the token function. So there would be no point using two different lexical descriptors for the same token, even if it would be possible.
In the grammar in the OP, BaseParamType is also a lexical type, which will absorb all uses of the token function, preventing FUNCTION from being recognized in the production for function. Changing its name to baseParamType, which effectively changes it to a parser non-terminal, will allow the parser to work, although I suppose it may alter the parse tree in undesirable ways.
I understand the objection that the parser "should know" which lexical tokens are possible in context, given the nature of ANTLR's predictive parsing strategy. I'm far from an ANTLR expert, so I won't pretend to explain why it doesn't seem to work, but with the majority of parser generators -- and all the ones I commonly use -- lexical analysis is effectively performed as a pass prior to parsing, so the conversion of textual input into a stream of tokens is done before the parser establishes context. (Most lexer generators, including ANTLR's, have mechanisms with which the user can build lexical context, but IMHO these mechanisms reduce grammar readability and should only be used if strictly necessary.)
Here's the grammar file which I tested:
grammar test;
begin : function ;
function: FUNCTION IDENTIFIER '(' parameterlist? ')' ;
parameterlist: parameter (',' parameter)+ ;
parameter: baseParamType IDENTIFIER ;
baseParamType:
INT_TYPE
| FUNCTION
;
// Lexer stuff
FUNCTION : 'function';
INT_TYPE : 'int';
IDENTIFIER : [a-zA-Z_$]+[a-zA-Z_$0-9]*;
WS : [ \t\r\n]+ -> skip ;
I am having problems with my ANTLR grammar. I'm trying to write a parser rule for 'typedident' which can accept the following inputs:
'int a' or 'char a'
The variable name 'a' is from my lexer rule 'IDENT' which is defined as follows:
IDENT : (('a'..'z'|'A'..'Z') | '_') (('a'..'z'|'A'..'Z')|('0'..'9')| '_')*;
My 'typedident' parser rule is as follows:
typedident : (INT|CHAR) IDENT;
INT and CHAR having been defined as tokens.
The problem I'm having is that when I test 'typedident' the variable name has to be more than one character. For example:
'int a' isn't accepted while 'int ab' is accepted.
The outputed error I get is:
"MismatchedTokenException: mismatched input 'a' expecting '$'"
Any idea why I'm getting this error? I'm fairly new to ANTLR, so apologies if the error is trivial.
EDIT
I literally just got it working, and I don't know why. I also had two other lexer rules defined as follows:
ALPH : ('a'..'z'|'A'..'Z');
DIGIT : ('0'..'9');
I realised these weren't being used at all, so I deleted them, and everything now works fine! My guess as to why this works is that ALPH and DIGIT were overriding my other lexer rules:
NUMBER : ('0'..'9')+;
CHARACTER : '\'' (~('\n' | '\r' |'\'')) '\'';
Does anyone know if this is the case? I'm curious as to why this problem has now been solved.
'int a' isn't accepted while 'int ab' is accepted.
...
My guess as to why this works is that ALPH and DIGIT were overriding ...
Yes, it appears ALPH was defined before the IDENT rule, in which case single letters were tokenized as ALPH tokens. If IDENT was defined before ALPH, it should all go okay (in your case).
To summarize how ANTLR's lexer rules work:
lexer rules match as many characters as possible (greedy);
if 2 (or more) lexer rules match the same input, the rule defined first will "win".
You must realize that the lexer does not produce tokens based on what the parser (at that time) needs. The lexer operates independently from the parser.
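These two rules can be sketched as a toy lexer in plain Python (an illustration only, not ANTLR itself; the rule names mirror the question):

```python
import re

# Toy lexer illustrating ANTLR's two rules: the longest match wins
# ("maximal munch"), and on a tie the rule defined first wins.
RULES = [
    ("INT",   re.compile(r"int")),                     # keyword
    ("ALPH",  re.compile(r"[a-zA-Z]")),                # single letter
    ("IDENT", re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")),  # identifier
]

def tokenize(text, rules):
    pos, tokens = 0, []
    while pos < len(text):
        if text[pos].isspace():          # skip whitespace, like WS -> skip
            pos += 1
            continue
        best = None
        for name, rx in rules:
            m = rx.match(text, pos)
            # Strictly longer matches win; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise SyntaxError("no rule matches at position %d" % pos)
        tokens.append(best)
        pos += len(best[1])
    return tokens

print(tokenize("int a", RULES))   # [('INT', 'int'), ('ALPH', 'a')]
print(tokenize("int ab", RULES))  # [('INT', 'int'), ('IDENT', 'ab')]
```

With ALPH in the rule list, the single-letter name in "int a" is tokenized as ALPH rather than IDENT, which is exactly the mismatch the parser reported; "ab" is longer, so IDENT wins there. Deleting ALPH (or defining IDENT before it) removes the tie.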
I'm having some trouble creating part of my ANTLR grammar for my programming language.
I'm getting an error involving the second alternative of my type rule:
public type
: ID ('.' ID)* ('?')? -> ^(R__Type ID ID* ('?')?)
| '(' type (',' type)* ')' ('?')? -> ^(R__Type type* ('?')?)
;
I'm trying to either match:
A line like System.String (works fine)
A tuple such as (System.String, System.Int32)
The error occurs slightly higher up the tree, and states:
[fatal] rule statement has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
What am I doing wrong?
Right, I managed to fix this a little earlier up the tree, by editing the rule that deals with variable declarations:
'my' ID (':' type '=' constant | ':' type | '=' constant) -> ^(R__VarDecl ID type? constant?)
So that it works like:
'my' ID
(
':' type ('=' constant)?
| '=' constant
) -> ^(R__VarDecl ID type? constant?)
I got the idea from the example of syntactic predicates here:
https://wincent.com/wiki/ANTLR_predicates
Luckily I didn't need a predicate in the end!