New to XText, I am struggling with two issues with the following MWE grammar.
Metamodel:
(classes += Type)*
;
Type:
Enumeration | Class
;
Enumeration:
'enumeration' name = ValidID '{' (literals += EnumLiteral ';')+ '}'
;
EnumLiteral:
ValidID
;
Class:
'class' name = ValidID '{'
(references += Reference)*
'}'
;
Reference:
'reference' name = ValidID ':' type = Class ('#' opposite = [Reference])?
;
So my questions are:
Since the enumeration literals list is ValidID, it seems to be represented by EStrings. The documentation does not seem to deal with the case of primitive types in ECore. How is it possible to check for non-duplicates in literals, and report it adequately in the editor (i.e., the error should be at the first occurence of a repeated literal)?
Despite my best efforts, I was unable to write a custom scope for the opposite reference. Since XText uses reflection for retrieving the scoping methods, I suspect I don't have the correct one: I tried def scope_Reference_opposite(Reference context, EReference r), is it correct? An example would be really appreciated, from which I am confident I can easily adapt to my "real" DSL.
Thanks a lot for the help, you will save me a lot of time looking again and again for a solution in documentation...
Errors can be attached to a certain index of a many-values feature. Write a validation for the type Enumeration and check the the list of literals for duplicates. Attach the error to the index in the list.
The signature is correct. Did you import the correct 'Reference' or did you use some other class with the same simple name by accident. Also please not that your grammar appears to be wrong for the type of the reference. This should be type=[Class] or more likely type=[Class|ValidID].
If you plan to use or do already use Xbase, things may look different. Xbase doesn't use the reflective scope provider.
Related
Could someone help me with using context free grammars. Up until now I've used regular expressions to remove comments, block comments and empty lines from a string so that it can be used to count the PLOC. This seems to be extremely slow so I was looking for a different more efficient method.
I saw the following post: What is the best way to ignore comments in a java file with Rascal?
I have no idea how to use this, the help doesn't get me far as well. When I try to define the line used in the post I immediately get an error.
lexical SingleLineComment = "//" ~[\n] "\n";
Could someone help me out with this and also explain a bit about how to setup such a context free grammar and then to actually extract the wanted data?
Kind regards,
Bob
First this will help: the ~ in Rascal CFG notation is not in the language, the negation of a character class is written like so: ![\n].
To use a context-free grammar in Rascal goes in three steps:
write it, like for example the syntax definition of the Func language here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func
Use it to parse input, like so:
// This is the basic parse command, but be careful it will not accept spaces and newlines before and after the TopNonTerminal text:
Prog myParseTree = parse(#Prog, "example string");
// you can do the same directly to an input file:
Prog myParseTree = parse(#TopNonTerminal, |home:///myProgram.func|);
// if you need to accept layout before and after the program, use a "start nonterminal":
start[Prog] myParseTree = parse(#start[TopNonTerminal], |home:///myProgram.func|);
Prog myProgram = myParseTree.top;
// shorthand for parsing stuff:
myProgram = [Prog] "example";
myProgram = [Prog] |home:///myLocation.txt|;
Once you have the tree you can start using visit and / deepmatch to extract information from the tree, or write recursive functions if you like. Examples can be found here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func , but here are some common idioms as well to extract information from a parse tree:
// produces the source location of each node in the tree:
myParseTree#\loc
// produces a set of all nodes of type Stat
{ s | /Stat s := myParseTree }
// pattern match an if-then-else and bind the three expressions and collect them in a set:
{ e1, e2, e3 | (Stat) `if <Exp e1> then <Exp e2> else <Exp e3> end` <- myExpressionList }
// collect all locations of all sub-trees (every parse tree is of a non-terminal type, which is a sub-type of Tree. It uses |unknown:///| for small sub-trees which have not been annotated for efficiency's sake, like literals and character classes:
[ t#\loc?|unknown:///| | /Tree t := myParseTree ]
That should give you a start. I'd go try out some stuff and look at more examples. Writing a grammar is a nice thing to do, but it does require some trial and error methods like writing a regex, but even more so.
For the grammar you might be writing, which finds source code comments but leaves the rest as "any character" you will need to use the longest match disambiguation a lot:
lexical Identifier = [a-z]+ !>> [a-z]; // means do not accept an Identifier if there is still [a-z] to add to it; so only the longest possible Identifier will match.
This kind of context-free grammar is called an "Island Grammar" metaphorically, because you will write precise rules for the parts you want to recognize (the comments are "Islands") while leaving the rest as everything else (the rest is "Water"). See https://dl.acm.org/citation.cfm?id=837160
I've defined a polymorphic binary relation type (a class) in Dafny:
class binRel<S,T>
The actual declaration is:
class binRel<S(!new,==),T(!new,==)>.
I'd like to add a new type constraint: that types S and T should implement a "show" operation (returning a string).
My reading of the Dafny Reference Manual suggests Dafny supports only a few built-in type constraints: ==, and evidently !new, and that there's no way to require that type support, e.g., some particular trait.
Perhaps I'm wrong and that updates more recent than the reference manual have provided such capabilities. Am I in luck? If not, is there perhaps a work-around?
Correct, there are only a few built in type constraints in Dafny. There is no mechanism to require that a type extend a trait.
I'm not aware of a good work around for the object-oriented/imperative fragment of Dafny. In the pure fragment, you could work around this using first-class functions.
datatype MyPair<A,B> = MakePair(a: A, b: B)
type Show<!A> = A -> string
function ShowMyPair<A,B>(sa: Show<A>, sb: Show<B>): Show<MyPair<A,B>>
{
(p: MyPair<A,B>) => "(" + sa(p.a) + "," + sb(p.b) + ")"
}
This syntax module is syntactically valid:
module mod1
syntax Empty =
;
And so is this one, which should be an equivalent grammar to the previous one:
module mod2
syntax Empty =
( )
;
(The resulting parser accepts only empty strings.)
Which means that you can make grammars such as this one:
module mod3
syntax EmptyOrKitchen =
( ) | "kitchen"
;
But, the following is not allowed (nested parenthesis):
module mod4
syntax Empty =
(( ))
;
I would have guessed that redundant parenthesis are allowed, since they are allowed in things like expressions, e.g. ((2)) + 2.
This problem came up when working with the data types for internal representation of rascal syntax definitions. The following code will create the same module as in the last example, namely mod4 (modulo some whitespace):
import Grammar;
import lang::rascal::format::Grammar;
str sm1 = definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
alt({seq([])})
],{})))))));
The problematic part of the data is on its own line - alt({seq([])}). If this code is changed to seq([]), then you get the same syntax module as mod2. If you further delete this whole expression, i.e. so that you get this:
str sm3 =
definition2rascal(\definition("unknown_main",("the-module":\module("unknown",{},{},grammar({sort("Empty")},(sort("Empty"):prod(sort("Empty"),[
], {})))))));
Then you get mod1.
So should such redundant parenthesis by printed by the definition2rascal(...) function? And should it matter with regards to making the resulting module valid or not?
Why they are not allowed is basically we wanted to see if we could do without. There is currently no priority relation between the symbol kinds, so in general there is no need to have a bracket syntax (like you do need to + and * in expressions).
Already the brackets have two different semantics, one () being the epsilon symbol and two (Sym1 Sym2 ...) being a nested sequence. This nested sequence is defined (syntactically) to expect at least two symbols. Now we could without ambiguity introduce a third semantics for the brackets with a single symbol or relax the requirement for sequence... But we reckoned it would be confusing that in one case you would get an extra layer in the resulting parse tree (sequence), while in the other case you would not (ignored superfluous bracket).
More detailed wise, the problem of printing seq([]) is not so much a problem of the meta syntax but rather that the backing abstract notation is more relaxed than the concrete notation (i.e. it is a bigger language or an over-approximation). The parser generator will generate a working parser for seq([]). But, there is no Rascal notation for an empty sequence and I guess the pretty printer should throw an exception.
I am trying to understand a xtext grammar I have found (below). I have two questions:
The XFeatureCall has return Type XExpression but this is overruled by {XFeatureCall} so I could set "returns XFeatureCall" as well?. Or is it actually necessary to do it this way?
Line 8 and 14 start with "=>". Are these "chosen predicates" or something else that did not come to my attention so far? I could not find this variation of chosen predicates in the xtext documentation. So I would appreciate clarification in its application.
xtext grammar:
StaticEquals:':=';
XFeatureCall returns XExpression:
// Same as Xbase...
{XFeatureCall}
(declaringType=[JvmDeclaredType|StaticQualifier])?
('<' typeArguments+=JvmArgumentTypeReference (',' typeArguments+=JvmArgumentTypeReference)* '>')?
(feature=[JvmIdentifiableElement|IdOrSuper]|'class')
(=>explicitOperationCall?='('
(
featureCallArguments+=XShortClosure
| featureCallArguments+=XExpression (',' featureCallArguments+=XExpression)*
)?
')')?
=>featureCallArguments+=XClosure?
// ... Except with this additional optional clause that allows static members to be set with := operator
({XAssignment.assignable = current} StaticEquals value = XAssignment)?;
First question: In fact in this case your rule returns a XFeatureCall but XFeatureCall has XExpression as its supertype. It is useful for example if you have:
SomeRule: (parts+=XFeatureCall)* (parts+=XOtherFeatureCall)*
Let XOtherFeatureCall also extend XExpression, and parts be a list of XExpressions.
Second question: it is a priority operator and means that what follows should be parsed now, even if there are other parsing solutions. See this classic example:
if a
if b
do;
else
doelse;
else could be parsed for the inner if or the outer if. Of course we want it in the inner if. Setting a rule such as:
=>'else' else=ElseExpression
forces the grammar to parse the else as soon as it finds it instead of returning to the outer rule that could consume a else too. So it solves an ambiguity.
In grako one can use the following name:e to add the result of e to the AST using name as key. For example
var_def
=
var+:ID {',' var+:ID}*
What would be a good translation of this to Xtext?
I tried
var_def:
var=ID (',' var=ID)*;
which is not failing, but is raising the following warning
Multiple markers at this line
- The possibly assigned value of feature 'var' may be overridden
by subsequent assignments.
- This assignment will override the possibly assigned value of
feature 'var'.
I think I am trying to mimic the name behavior, but do not have much success.
With your solution only the last ID will be available in the AST. I assume var should be a multi-valued feature holding all IDs, not only the last one. This can be expressed as
var_def:
var+=ID (',' var+=ID)*;
In the resulting AST var is a list of IDs.