Xtext: Cross-References with Alternative

I've started my first Xtext project and ran into a problem that I think is related to cross-references. I have a DataType rule, an InterfaceDescription rule and an enumeration. I want to describe an interface by letting the user either choose a datatype from the enumeration or define a new one.
The enum works without a problem, but when I define a new datatype with "datatype test1" and use it inside the InterfaceDescription, I get the following error: 'XtextReconcilerJob' has encountered a problem. An internal error occurred during: "XtextReconcilerJob". The error stack is here: http://pastebin.com/evFki2mB
DataType:
'datatype' name=ID ('mapto' mappedType = JAVAID)?
;
Interface:
interfaceType=InterfaceType name=ID datatype=([DataType]| DataTypeEnum)
;
enum InterfaceType:
INLET = 'inlet' |
OUTLET = 'outlet'
;
DataTypeEnum:
INT8 = 'int8' | INT16 = 'int16' | INT32 = 'int32' |
DOUBLE = 'double' | SINGLE = 'single' | REAL = 'real' |
BOOLEAN = 'boolean' | CHAR = 'char'
;
When I use the DataType cross-reference in another rule, it works:
ParamList:
'param:' datatype=[DataType] name=ID
;
Does anyone know what the problem is?

There are a few issues with the grammar that together cause this strange behaviour:
Despite its name, DataTypeEnum is not an enum, because it is declared without the enum keyword; it is an ordinary object that may carry a few string values. This hides the problem with the alternative type assignment in the Interface rule from the editor.
When generating the editor, some cryptic error messages appear in the output:
error(208): ../org.xtext.example.mydsl/src-gen/org/xtext/example/mydsl/parser/antlr/internal/InternalMyDsl.g:447:1: The following token definitions can never be matched because prior tokens match the same input: RULE_ID
ebnf2 is not supported for CrossReference. This means that an extended construct, such as the '|' pattern, is not allowed when defining references.
Once DataTypeEnum is prefixed with the enum keyword, the editor marks the datatype attribute definition as an error, because no type in EMF can be both an enum and an EObject; the location of the problem then becomes obvious.
Finally, the runtime error occurred because parts of the generated parser/lexer tooling were missing, and the resulting model was incorrect as well.
To be more constructive, I suggest replacing the offending line with a TypeReference element that can refer either to types mapped to Java or to data types. Your grammar could be extended in the following way:
Interface:
interfaceType=InterfaceType name=ID datatype=(TypeReference)
;
TypeReference:
JavaTypeReference | DataTypeReference
;
JavaTypeReference:
type = [DataType]
;
DataTypeReference:
type = DataTypeEnum
;
enum DataTypeEnum:
INT8 = 'int8' | INT16 = 'int16' | INT32 = 'int32' |
DOUBLE = 'double' | SINGLE = 'single' | REAL = 'real' |
BOOLEAN = 'boolean' | CHAR = 'char'
;
PS: I suggest adding some keywords to the language to ease parsing, especially error recovery. See the following blog post for details: http://zarnekow.blogspot.hu/2012/11/xtext-corner-7-parser-error-recovery.html

Related

Antlr4: access variable name in type annotation context (access grandparent's sibling rule context)

I need to parse a variable declaration using ANTLR4.
variableDeclaration
: name ':' type
;
name
: Name
;
type is the top-level rule for type expressions, so some simple types are parsed via a long chain of productions.
type
: // some expression
| lessComplexTypeExpression
;
... // many rules
verySimpleTypeExpression
: Many
| Simple
| Types
| But
| OneSpecificType
;
Unfortunately, I need to know the variable name when building the application-specific node for OneSpecificType. I could pass the name through the whole stack of my custom visit methods, but the name parameter would be ignored by the majority of simple types.
What is the best way to access name.Name() from OneSpecificTypeContext? Is there an ANTLR4 feature for this?

Cannot create list literal in F#

I have the following types
type StatusCode =
| OK = 200
| NoContent = 204
| MovedPerm = 301
| MovedTemp = 302
| SeeOther = 303
| NotModified = 304
| NotFound = 404
| ServerError = 500
[<Literal>]
let NoBodyAllowedStatusCodes = [StatusCode.NoContent; StatusCode.NotModified]
And I'm getting a compile-time error that says:
This is not a valid constant expression or custom attribute value
I can't really figure out what's wrong here.
In F#, and .NET in general, lists cannot be literals (constants in C#/VB.NET). Only primitive values such as string, bool, etc. can be. The F# 3.0 specification lists what can and cannot be a literal in section 10.2.2:
A value that has the Literal attribute is subject to the following restrictions:
It may not be marked mutable or inline.
It may not also have the ThreadStatic or ContextStatic attributes.
The right-hand side expression must be a literal constant expression that is made up of either:
A simple constant expression, with the exception of (), native integer literals, unsigned native integer literals, byte array literals, BigInteger literals, and user-defined numeric literals.
—OR—
A reference to another literal.
Depending on what you are trying to do, you could make your list static if the let binding is being used in a class. If it is in a module, I'd just remove the Literal attribute since let bindings are immutable by default, anyway.

How to resolve Xtext variables' names and keywords statically?

I have a grammar describing an assembler dialect. In the code section, the programmer can refer to registers from a fixed list and to defined variables. I also have a rule that matches both [reg0++413] and [myVariable++413]:
BinaryBiasInsideFetchOperation:
'['
v = (Register|[IntegerVariableDeclaration]) ( gbo = GetBiasOperation val = (Register|IntValue|HexValue) )?
']'
;
But when I try to compile it, Xtext issues a warning:
Decision can match input such as "'[' '++' 'reg0' ']'" using multiple alternatives: 2, 3. As a result, alternative(s) 3 were disabled for that input
Splitting the rules, I noticed that
BinaryBiasInsideFetchOperation:
'['
v = Register ( gbo = GetBiasOperation val = (Register|IntValue|HexValue) )?
']'
;
BinaryBiasInsideFetchOperation:
'['
v = [IntegerVariableDeclaration] ( gbo = GetBiasOperation val = (Register|IntValue|HexValue) )?
']'
;
work well separately, but not together. When I try to compile both of them, Xtext reports a number of errors saying that registers from the list could be processed ambiguously. So:
1) Am I right that the rule fragment v = (Register|[IntegerVariableDeclaration]) matches any IntegerVariable name, including the empty one, whereas v = [IntegerVariableDeclaration] matches only nonempty names?
2) Is it correct that when I compile the separate rules together, Xtext thinks that [IntegerVariableDeclaration] can conflict with Register?
3) How can I resolve this ambiguity?
Edit: definitions
Register:
areg = ('reg0' | 'reg1' | 'reg2' | 'reg3' | 'reg4' | 'reg5' | 'reg6' | 'reg7' )
;
IntegerVariableDeclaration:
section = SectionServiceWord? name=ID ':' type = IntegerType ('[' size = IntValue ']')? ( value = IntegerVariableDefinition )? ';'
;
ID is the standard terminal, which matches a single word, i.e. an identifier.
No, (Register|[IntegerVariableDeclaration]) can't match the empty input. In fact, [IntegerVariableDeclaration] is the same as [IntegerVariableDeclaration|ID]: it matches the ID rule.
Yes, I think you can't split your rules.
I can't reproduce your problem (I would need the full grammar), but to solve it you should look at this article about Xtext grammar debugging:
Compile the grammar in debug mode by adding the following line to your workflow.mwe2:
fragment = org.eclipse.xtext.generator.parser.antlr.DebugAntlrGeneratorFragment {}
Open the generated ANTLR debug grammar with ANTLRWorks and check the diagram.
In addition to Fabien's answer, I'd like to add that an omnimatching rule like
AnyId:
name = ID
;
instead of
(Register|[IntegerVariableDeclaration])
solves the problem. One then needs to check dynamically whether AnyId.name is a Register, a Variable or something else, such as a Constant.
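The dynamic check mentioned above could look roughly like the following. This is only a language-neutral sketch in Python; in a real Xtext project the logic would more likely live in a validator or scope provider. The names REGISTERS and classify are made up for illustration, and giving registers priority over variables of the same name is an assumption, not something stated in the question.

```python
# Hypothetical sketch: decide after parsing what a generic AnyId name refers to.
# The register names are taken from the Register rule in the question.
REGISTERS = frozenset(f"reg{i}" for i in range(8))


def classify(name, declared_variables):
    """Return 'register', 'variable' or 'unresolved' for a parsed AnyId name.

    declared_variables is assumed to be a collection of the names introduced
    by IntegerVariableDeclaration elements in the same model.
    """
    if name in REGISTERS:
        # Assumption: a register name always wins over a variable of that name.
        return "register"
    if name in declared_variables:
        return "variable"
    return "unresolved"


variables = {"myVariable"}
print(classify("reg0", variables))
print(classify("myVariable", variables))
print(classify("somethingElse", variables))
```

An "unresolved" result is the place where a validator would report an error to the user, since the parser itself no longer rejects unknown names.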

Parsing of optionals with PEG (Grako) falling short?

My colleague PaulS asked me the following:
I'm writing a parser for an existing language (SystemVerilog - an IEEE standard), and the specification has a rule in it that is similar in structure to this:
cover_point
=
[[data_type] identifier ':' ] 'coverpoint' identifier ';'
;
data_type
=
'int' | 'float' | identifier
;
identifier
=
?/\w+/?
;
The problem is that when parsing the following legal string:
anIdentifier: coverpoint another_identifier;
anIdentifier matches with data_type (via its identifier option) successfully, which means Grako is looking for another identifier after it and then fails. It doesn't then try to parse without the data_type part.
I can re-write the rule as follows,
cover_point_rewrite
=
[data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';'
;
but I wonder if:
this is intentional and
if there's a better syntax?
Is this a PEG-in-general issue, or a tool (Grako) one?
It says here that in PEGs the choice operator is ordered, avoiding the ambiguities of CFGs by always using the first match.
In your first example [data_type] succeeds in parsing the first identifier, so the parse fails when it finds ':' instead of another identifier.
That may be because [data_type] behaves like (data_type | ε), so it will always parse data_type with the first identifier.
In [data_type identifier ':' | identifier ':' ] the first choice fails when there is no second id, so the parser backtracks and tries with the second choice.
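The behaviour described above can be reproduced with a few hand-written PEG-style combinators. This is a minimal sketch and not Grako's API or its generated code; all function names (sequence, choice, optional and so on) are invented for the illustration. Each parser takes the text and a position and returns the new position, or None on failure.

```python
import re

IDENT = re.compile(r"\w+")


def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos


def identifier(text, pos):
    m = IDENT.match(text, skip_ws(text, pos))
    return m.end() if m else None


def literal(token):
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(token) if text.startswith(token, pos) else None
    return parse


def keyword(word):
    # A word that must not run on into a longer identifier.
    pattern = re.compile(re.escape(word) + r"\b")

    def parse(text, pos):
        m = pattern.match(text, skip_ws(text, pos))
        return m.end() if m else None
    return parse


def sequence(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def choice(*parsers):
    # Ordered choice: the first alternative that succeeds is final.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse


def optional(parser):
    # Equivalent to (parser | empty): once parser succeeds, empty is never tried.
    def parse(text, pos):
        end = parser(text, pos)
        return pos if end is None else end
    return parse


data_type = choice(keyword("int"), keyword("float"), identifier)
tail = (literal("coverpoint"), identifier, literal(";"))

# [[data_type] identifier ':' ] 'coverpoint' identifier ';'
cover_point = sequence(
    optional(sequence(optional(data_type), identifier, literal(":"))), *tail)

# [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';'
cover_point_rewrite = sequence(
    optional(choice(sequence(data_type, identifier, literal(":")),
                    sequence(identifier, literal(":")))), *tail)


def parses(rule, text):
    end = rule(text, 0)
    return end is not None and skip_ws(text, end) == len(text)


sample = "anIdentifier: coverpoint another_identifier;"
print(parses(cover_point, sample))          # False
print(parses(cover_point_rewrite, sample))  # True
```

In the first rule the inner optional consumes anIdentifier as a data_type and is never retried with the empty alternative, so the whole bracketed group fails and 'coverpoint' is then compared against anIdentifier. In the rewritten rule the failure of the first alternative makes the ordered choice move on to the second one.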

Let bindings support punctuation enclosed in double backticks, but types don't?

Using F# in Visual Studio 2012, this code compiles:
let ``foo.bar`` = 5
But this code does not:
type ``foo.bar`` = class end
Invalid namespace, module, type or union case name
According to section 3.4 of the F# language specification:
Any sequence of characters that is enclosed in double-backtick marks (````),
excluding newlines, tabs, and double-backtick pairs themselves, is treated
as an identifier.
token ident =
| ident-text
| `` [^ '\n' '\r' '\t']+ | [^ '\n' '\r' '\t'] ``
Section 5 defines type as:
type :=
( type )
type -> type -- function type
type * ... * type -- tuple type
typar -- variable type
long-ident -- named type, such as int
long-ident<types> -- named type, such as list<int>
long-ident< > -- named type, such as IEnumerable< >
type long-ident -- named type, such as int list
type[ , ... , ] -- array type
type lazy -- lazy type
type typar-defns -- type with constraints
typar :> type -- variable type with subtype constraint
#type -- anonymous type with subtype constraint
... and Section 4.2 defines long-ident as:
long-ident := ident '.' ... '.' ident
As far as I can tell from the spec, types are named with long-idents, and long-idents can be idents. Since idents support double-backtick-quoted punctuation, it therefore seems like types should too.
So am I misreading the spec? Or is this a compiler bug?
It definitely looks like the specification is not synchronized with the actual implementation, so there is a bug on one side or the other.
When you use an identifier in double backticks, the compiler treats it as a name and simply generates a type (or member) with the name you specified in backticks. It does not do any name mangling to make sure that the identifier is a valid type/member name.
This means that it is not too surprising that you cannot use identifiers that would clash with some standard meaning in the compiled code. In your example it is the dot, but here are a few other examples:
type ``Foo.Bar``() = // Dot is not allowed because it represents namespace
member x.Bar = 0
type ``Foo`1``() = // Single backtick is used to compile generic types
member x.Bar = 0
type ``Foo+Bar``() = // + is used in the name of a nested type
member x.Bar = 0
The above examples are not allowed as type names (because they clash with some standard meaning), but you can use them in let-bindings, because there are no such restrictions on variable names:
let ``foo`1`` = 0
let ``foo.bar`` = 2
let ``foo+bar`` = 1
This is definitely something that should be explained in the documentation & the specification, but I hope this helps to clarify what is going on.
