Extending the Xbase type system

I have this simple DSL, inspired by the mini-java example but based on Xbase.
See below for a quick look at my grammar:
Package returns Package:
{Package}
'package'
name = QualifiedName
(importSection = XImportSection)?
(classifiers += Classifier)*
;
Classifier returns Classifier :
Class
| DataType
| Enum
;
Class returns Class:
{Class}
((abstract?='abstract'? 'class') | interface?= 'interface') name = ID
('<' typeParameters+=JvmTypeParameter (','
typeParameters+=JvmTypeParameter)* '>')?
('extends' superType=JvmParameterizedTypeReference)?
'{'
(members+=Member)*
'}'
; ...
My question is as follows:
How can I extend the Xbase type system so that it recognizes type conformance between a super-class and a sub-class defined with this simple DSL?
I've spent a couple of days looking for examples, but I couldn't find a single clear one.
Thank you in advance for any hint or help!
Cheers,

You can have a look at what Xtend does about that, e.g.:
https://github.com/eclipse/xtext-xtend/blob/7ffa1888e0e8b2f1e960bcfd92b2cf4c74babcf1/org.eclipse.xtend.core/src/org/eclipse/xtend/core/validation/XtendValidator.java

Cross reference from 2 different DSLs

Hi everybody,
I have an intriguing scenario with Xtext and I am out of ideas, so I would like to ask you.
I am using cross references between two different DSLs in my project, but I can't figure out how to deal with the following scenario.
DSL1:
grammar com.test.DSL1 with org.eclipse.xtext.common.Terminals
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate DSL1 "http://test.com/DSL1"
Model:
(elements+=AbstractElement)*;
QualifiedName:
ID ('.' ID)*;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
AbstractElement:
Base;
Base:
'base' name=ID
'something' '=' (something=STRING);
DSL2:
grammar com.test.DSL2 with org.eclipse.xtext.common.Terminals
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate DSL2 "http://test.com/DSL2"
import "http://test.com/DSL1" as dsl1
Model:
(elements+=OtherElement)*;
QualifiedName:
ID ('.' ID)*;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
OtherElement:
Ceiling;
Ceiling:
'ceiling' name=ID
'otherthing' '=' (otherthing=STRING);
Plan:
'plan' name=ID
'element' element=[dsl1::Base|Ceiling];
As you might guess,
'element' element=[dsl1::Base|Ceiling]
is not working.
If Base and Ceiling were in the same DSL, I would do the following and it would work:
AbstractBaseCeiling:
Base | Ceiling;
Plan:
'plan' name=ID
'element' element=[AbstractBaseCeiling];
But
AbstractBaseCeiling:
dsl1::Base| Ceiling;
Plan:
'plan' name=ID
'element' element=[AbstractBaseCeiling];
is also not working.
Don't get me wrong: my cross-reference setup is working, because if I do the following, everything works fine:
Plan:
'plan' name=ID
'element' element=[dsl1::Base];
But I could not figure out a way to use a rule from another DSL so that element can be either dsl1::Base or DSL2's Ceiling.
Is what I am trying to do possible? If yes, how?
Thanks for any answers.

I think there are two ways to go about this:
Either you want to share a grammar rule between DSL1 and DSL2, in which case see grammar mixins.
Or you want to reference, from DSL2, an element defined using DSL1, in which case you need to set up an import mechanism (see e.g. a tutorial here, but I'm sure there are others somewhere in the documentation).

Validating unique names for strings and optional reference

New to Xtext, I am struggling with two issues with the following MWE grammar.
Metamodel:
(classes += Type)*
;
Type:
Enumeration | Class
;
Enumeration:
'enumeration' name = ValidID '{' (literals += EnumLiteral ';')+ '}'
;
EnumLiteral:
ValidID
;
Class:
'class' name = ValidID '{'
(references += Reference)*
'}'
;
Reference:
'reference' name = ValidID ':' type = Class ('#' opposite = [Reference])?
;
So my questions are:
Since the enumeration literals list is ValidID, it seems to be represented by EStrings. The documentation does not seem to deal with the case of primitive types in Ecore. How is it possible to check for duplicates among the literals, and report them adequately in the editor (i.e., the error should be at the first occurrence of a repeated literal)?
Despite my best efforts, I was unable to write a custom scope for the opposite reference. Since Xtext uses reflection to retrieve the scoping methods, I suspect I don't have the correct signature: I tried def scope_Reference_opposite(Reference context, EReference r). Is it correct? An example would be really appreciated; I am confident I can easily adapt it to my "real" DSL.
Thanks a lot for the help, you will save me a lot of time looking again and again for a solution in the documentation.

Errors can be attached to a certain index of a many-valued feature. Write a validation for the type Enumeration and check the list of literals for duplicates. Attach the error to the index in the list.
The signature is correct. Did you import the correct 'Reference', or did you use some other class with the same simple name by accident? Also, please note that your grammar appears to be wrong for the type of the reference. This should be type=[Class] or more likely type=[Class|ValidID].
If you plan to use or already use Xbase, things may look different. Xbase doesn't use the reflective scope provider.
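
The duplicate check described in the first point can be sketched independently of Xtext. The following is a minimal Python illustration, not validator code: the function name duplicate_indices and the sample literals are hypothetical, and the print call only marks the place where a real validator would attach the error to the literals feature at the given index.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_indices(literals: List[str]) -> Dict[str, List[int]]:
    """Map each literal that occurs more than once to all of its list indices."""
    positions: Dict[str, List[int]] = defaultdict(list)
    for index, literal in enumerate(literals):
        positions[literal].append(index)
    # Keep only the literals that appear at more than one index.
    return {name: idxs for name, idxs in positions.items() if len(idxs) > 1}


# Hypothetical content of one Enumeration's 'literals' list.
literals = ["RED", "GREEN", "RED", "BLUE", "GREEN", "RED"]
for name, indices in duplicate_indices(literals).items():
    for index in indices:
        # A real validator would attach the error to the 'literals'
        # feature at this index instead of printing.
        print(f"Duplicate literal '{name}' at index {index}")
```

Reporting every index of a repeated literal, including the first, is one possible choice; marking only the first or only the later occurrences means picking a subset of each index list.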

Xtext Grammar: "The following alternatives can never be matched"

I am running into a problem with ambiguity in a rather complicated grammar I have been building up. It's too complex to post here, so I've reduced my problem down to aid comprehension.
I am getting the following error:
error(201): ../org.xtext.example.mydsl.ui/src-gen/org/xtext/example/mydsl/ui/contentassist/antlr/internal/InternalMyDsl.g:398:1: The following alternatives can never be matched: 2
From this grammar:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
(contents+=ModelMember)*;
ModelMember:
Field | Assignment | Static | Class
;
Static:
"static" type=TypeDef name=ID
;
Class:
"class" name=ID "{"
(fields+=Field)*
"}"
;
Field:
"var" type=TypeDef name=ID
;
TypeDef:
{Primtive} ("String" | "int") |
{Object} clazz=[Class]
;
Reference:
(
{StaticField} static=[Static] (withDiamond?="<>")?
|
{DynamicField} field=[Field]
)
;
ObjectReference:
reference=Reference ({ObjectReference.target=current} '.' reference=Reference)*
;
Assignment:
field=ObjectReference "=" value=ObjectReference
;
I know the problem relates to Reference, which is struggling with the ambiguity of which rule to choose.
I can get it to compile with the following grammar change, but this allows syntax that I deem to be illegal:
Reference:
ref=[RefType] (withDiamond?="<>")?
;
RefType:
Static|Field
;
Where my use-case is:
static String a
class Person {
String name
}
Person paul
// This should be legal
paul.name = a<>;
// This should be illegal, diamond not valid against non-static vars
paul.name = paul.name<>;
// This should be legal
paul.name = paul.name

Your second grammar is the way to go. The fact that diamond is only legal for static variables can be handled in your language's validator.
Generally, make your grammar loose and your validation strict. That makes your grammar easier to maintain. It also gives your users better error messages ("Diamond is not allowed for non-static vars" instead of "Invalid input '<'").
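
To illustrate "loose grammar, strict validation", here is a small Python sketch of the check that would live in the validator. It does not use Xtext: the classes Declaration and Reference and the function check_diamond are hypothetical stand-ins for the model objects the loose grammar would produce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Declaration:
    """Stand-in for a Static or Field declaration."""
    name: str
    is_static: bool


@dataclass
class Reference:
    """Stand-in for 'ref=[RefType] (withDiamond?="<>")?'."""
    target: Declaration
    with_diamond: bool = False


def check_diamond(reference: Reference) -> Optional[str]:
    """Return an error message if the diamond is used on a non-static target."""
    if reference.with_diamond and not reference.target.is_static:
        return f"Diamond is not allowed for non-static var '{reference.target.name}'"
    return None


a = Declaration("a", is_static=True)
name = Declaration("name", is_static=False)

# a<>, name<> and name, as in the use-case from the question.
for ref in (Reference(a, True), Reference(name, True), Reference(name, False)):
    print(ref.target.name, "->", check_diamond(ref) or "ok")
```

The parser accepts all three references; only the second one is rejected, and with a message that names the actual problem.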

Antlr mismatched '>' for include macro

I started working with ANTLR a few days ago. I'd like to use it to parse #include macros in C. Only the includes are of interest to me; all other parts are irrelevant. Here is the simple grammar file I wrote:
... parser part omitted...
INCLUDE : '#include';
INCLUDE_FILE_QUOTE: '"'FILE_NAME'"';
INCLUDE_FILE_ANGLE: '<'FILE_NAME'>';
fragment
FILE_NAME: ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'.'|' ')+;
MACROS: '#'('if' | 'ifdef' | 'define' | 'endif' | 'undef' | 'elif' | 'else' );
//MACROS: '#'('a'..'z'|'A'..'Z')+;
OPERATORS: ('+'|'-'|'*'|'/'|'='|'=='|'!='|'>'|'>='|'<'|'<='|'>>'|'<<'|'<<<'|'|'|'&'|','|';'|'.'|'->'|'#');
... other supporting tokens like ID, WS and COMMENT ...
This grammar produces an ambiguity when a statement such as this is encountered:
(;i<listLength;i++)
output: mismatched character ';' expecting '>'
It seems to be trying to match INCLUDE_FILE_ANGLE instead of treating the ";" as OPERATORS.
I heard there's an operator called a syntactic predicate, but I'm not sure how to use it properly in this case.
How can I solve this problem in an ANTLR-encouraged way?

Looks like there's not a lot of activity about ANTLR here.
Anyway, I figured this out.
INCLUDE_MACRO: ('#include')=>'#include';
VERSION_MACRO: ('#version')=>'#version';
OTHER_MACRO:
(
// longer alternatives first, so that '#if' cannot shadow '#ifndef' or '#ifdef'
('#ifndef')=>'#ifndef'
|('#ifdef')=>'#ifdef'
|('#if')=>'#if'
|('#else')=>'#else'
|('#elif')=>'#elif'
|('#endif')=>'#endif'
);
This only solves the first half of the problem. Secondly, one cannot use INCLUDE_FILE_ANGLE to match the desired string in the #include directive.
The '<'FILE_NAME'>' rule creates an ambiguity, so it must either be broken down into basic lexer tokens or be handled with more advanced context-aware checks. I'm not familiar with the latter technique, so I wrote this in the parser rule:
include_statement :
INCLUDE_MACRO include_file
-> ^(INCLUDE_MACRO include_file);
include_file
: STRING
| LEFT_ANGLE(INT|ID|OPERATORS)+RIGHT_ANGLE
;
This works, but it admittedly looks ugly.
I hope experienced users can comment with a much better solution.
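
For comparison, the "context-aware" idea can be sketched outside ANTLR. The following is a plain Python regular expression, not an ANTLR solution: a file name in angle brackets or quotes is only recognised directly after #include at the start of a line, so a '<' anywhere else is never treated as the start of a file name. The names INCLUDE and find_includes are hypothetical, and the sketch deliberately ignores comments and line continuations.

```python
import re
from typing import List, Tuple

# The file name is only looked for directly after '#include', so the '<'
# in an expression such as 'i<listLength' is never considered.
INCLUDE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*(?:<([^<>\n]+)>|"([^"\n]+)")',
    re.MULTILINE,
)


def find_includes(source: str) -> List[Tuple[str, str]]:
    """Return (kind, file_name) pairs, kind being 'angle' or 'quote'."""
    result = []
    for match in INCLUDE.finditer(source):
        angle, quote = match.group(1), match.group(2)
        result.append(("angle", angle) if angle is not None else ("quote", quote))
    return result


source = '''#include <stdio.h>
#include "my_list.h"
int main(void) {
    for (;i<listLength;i++) { if (a > b) {} }
}
'''
print(find_includes(source))
# prints [('angle', 'stdio.h'), ('quote', 'my_list.h')]
```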

Parsing with incomplete grammars

Are there any common solutions for using incomplete grammars? In my case I just want to detect methods, that is, procedures and functions, in Delphi (Pascal) files. The following first attempt is working:
methods
: ( procedure | function | . )+
;
but is that a solution at all? Are there any better solutions? Is it possible to stop parsing with an action (e.g. after detecting implementation)? Does it make sense to use a preprocessor? And if so, how?

If you're only looking for names, then something as simple as this:
grammar PascalFuncProc;
parse
: (Procedure | Function)* EOF
;
Procedure
: 'procedure' Spaces Identifier
;
Function
: 'function' Spaces Identifier
;
Ignore
: (StrLiteral | Comment | .) {skip();}
;
fragment Spaces : (' ' | '\t' | '\r' | '\n')+;
fragment Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*;
fragment StrLiteral : '\'' ~'\''* '\'';
fragment Comment : '{' ~'}'* '}';
will do the trick. Note that I am not very familiar with Delphi/Pascal, so I am surely goofing up StrLiterals and/or Comments, but that'll be easily fixed.
The lexer generated from the grammar above will only produce two types of tokens (Procedures and Functions); the rest of the input (string literals, comments or, if nothing else matches, a single character: the .) is discarded by the lexer immediately (the skip() method).
For input like this:
some valid source
{
function NotAFunction ...
}
procedure Proc
Begin
...
End;
procedure Func
Begin
s = 'function NotAFunction!!!'
End;
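the resulting parse tree contains just two tokens, procedure Proc and procedure Func (the tree image from the original answer is not preserved).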

What you are asking about is called an island grammar. The notion is that you define a parser for the part of the language you care about (the "island") with all the classic tokenization needed for that part, and that you define an extremely sloppy parser to skip the rest (the "ocean" in which the island is embedded). One common trick for doing this is to define correspondingly sloppy lexers that pick up vast amounts of stuff (to skip past HTML to embedded code, you can try to skip past anything that doesn't look like a script tag in the lexer, for example).
The ANTLR site even discusses some related issues but notably says there are examples included with ANTLR. I have no experience with ANTLR so I don't know how useful this specific information is.
Having built many tools that use parsers to analyze/transform code (check my bio), I'm a bit of a pessimist about the general utility of island grammars. Unless your goal is to do something pretty trivial with the parsed island, you will need to collect the meaning of all identifiers it uses directly or indirectly, and most of them are, unfortunately for you, defined in the ocean. So IMHO you pretty much have to parse the ocean too to get past trivial tasks. You'll have other troubles, too, making sure you really skip the island stuff; this pretty much means your ocean lexer has to know about whitespace, comments, and all the picky syntax of character strings (this is harder than it looks with modern languages) so that these get properly skipped over. YMMV.
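
The skip-everything-else lexer from the first answer can be approximated in a few lines of Python, which also shows why the "ocean" part still has to know about comments and string literals. This is a sketch under the same simplifying assumptions as that grammar (brace comments, single-quoted strings, lowercase keywords); the names TOKEN and find_methods are hypothetical.

```python
import re
from typing import List, Tuple

# Alternatives are tried in order at each position, mirroring the lexer rules:
# comments and string literals are consumed whole, so their contents are never
# scanned for keywords; any other single character is skipped.
TOKEN = re.compile(
    r"(?P<comment>\{[^}]*\})"
    r"|(?P<string>'[^']*')"
    r"|(?P<kind>procedure|function)[ \t\r\n]+(?P<name>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<other>.)",
    re.DOTALL,
)


def find_methods(source: str) -> List[Tuple[str, str]]:
    """Return (keyword, identifier) pairs for every procedure/function header."""
    return [
        (m.group("kind"), m.group("name"))
        for m in TOKEN.finditer(source)
        if m.group("kind")
    ]


sample = """{ function Hidden }
procedure Save
Begin
  msg := 'function AlsoHidden';
End;
function Load
"""
print(find_methods(sample))
# prints [('procedure', 'Save'), ('function', 'Load')]
```

Dropping the comment or string alternative makes the two hidden occurrences show up as false positives, which is the kind of trouble the answer above warns about.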
