Generate custom terminal rule in Xtext - xtext

I'm trying to add a custom terminal rule that accepts a value made of any letters (uppercase or lowercase) and, optionally, '/'. None of the rules I tried worked. I'm totally new to Xtext, but I think my custom rule interferes with the ID rule:
terminal MYVALUE:
('a'..'z'|'A'..'Z'|'/')+;
With this rule I get an error on the ID field value that I use:
mismatched input 'David' expecting RULE_ID
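A rough way to see the suspected interference: the lexer runs without knowing what the parser expects, so for an input like David both ID and MYVALUE match the same characters and only one of them can win. The sketch below models that with plain regular expressions. The rule order (the grammar's own terminal ahead of the inherited ID), the tie-breaking and the `tokenize` helper are assumptions made for illustration, not Xtext's actual lexer.

```python
import re

# Hypothetical model of a context-free lexer: longest match wins, and on a
# tie the rule listed first wins. The order below is an assumption that
# mimics a grammar-local terminal shadowing the inherited ID rule.
RULES = [
    ("MYVALUE", re.compile(r"[A-Za-z/]+")),
    ("ID", re.compile(r"\^?[A-Za-z_][A-Za-z_0-9]*")),
    ("WS", re.compile(r"\s+")),
]

def tokenize(text):
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match replaces the current best; ties keep the earlier rule.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        name, m = best
        if name != "WS":
            tokens.append((name, m.group()))
        pos = m.end()
    return tokens
```

Under these assumptions `tokenize("David")` yields a MYVALUE token, so a parser rule asking for ID never sees one, which is consistent with the error above. If that is the cause, one commonly used way out is a datatype rule built from existing terminals, for example `MyValue: ID ('/' ID)*;`, though that would also accept digits and underscores.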

Related

Lexer rule optional suffix not matching, when it should match

Using ANTLR 3, my lexer has rule
SELECT_ASSIGN:
'SELECT' WS+ IDENTIFIER WS+ 'ASSIGN' WS+ (('TO'|'USING') WS+)?
With this rule, these inputs match correctly:
SELECT VAR1 ASSIGN TO
SELECT VAR1 ASSIGN USING
and this also matches
SELECT VAR1 ASSIGN FOO
However this does not match
SELECT VAR1 ASSIGN TWO
even though I have marked TO|USING as optional in the rule.
From the generated Java code I can see what happens: when the lexer sees the T of TWO, it calls match('TO'). Because no O follows the T, the match fails, and the failure propagates all the way out of the rule, so the rule does not match.
How do I get my lexer rule to match when the input contains a word that starts with the same characters as the optional suffix of the rule?
Basically I want my rule to also match this (besides what it already matches, as listed at the start):
SELECT VAR1 ASSIGN TWO
Kindly suggest how I approach/resolve this situation.
NOTE:
Rules like this are normally recommended for the parser. I have it in the lexer because I do not want to run the parser over the entire input, only over the content of interest. With rules like this in the lexer, I can locate the sections that I actually want to parse.
UPDATE 1
I could circumvent this problem by making 2 rules, like so:
SELECT_ASSIGN_USING_TO
    : tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN' WS+ ('USING'|'TO')
    ;
SELECT_ASSIGN
    : tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN'
    ;
But is it possible to do the desired in one lexer rule?
One approach to get this into a single rule, suggested by my senior, is to use a syntactic predicate:
SELECT_ASSIGN
    : tok='SELECT' WS+ name=IDENTIFIER WS+ 'ASSIGN'
      (
        (WS+ ('TO'|'USING') WS+)=> (WS+ ('TO'|'USING') WS+)
      | (WS+)
      )
    ;
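The matching behaviour being asked for (take the keyword suffix if it is really there, otherwise fall back to plain whitespace) can be modelled with a backtracking regular expression. This is only a sketch of the intended result, not of how ANTLR decides: Python's `re` engine backtracks out of a failed alternative, which is exactly what the generated lexer code described above does not do. The `\w+` pattern standing in for IDENTIFIER and the helper name `match_select_assign` are assumptions.

```python
import re

# Regex model of the single-rule idea: try the keyword suffix first and fall
# back to plain whitespace. \w+ is an assumed stand-in for IDENTIFIER.
SELECT_ASSIGN = re.compile(
    r"SELECT\s+(?P<name>\w+)\s+ASSIGN(?:\s+(?P<kw>TO|USING)\s+|\s+)"
)

def match_select_assign(text):
    """Return (name, keyword-or-None) if text starts with the construct, else None."""
    m = SELECT_ASSIGN.match(text)
    if m is None:
        return None
    return m.group("name"), m.group("kw")
```

With this model, `SELECT VAR1 ASSIGN TO ` reports the keyword TO, while `SELECT VAR1 ASSIGN TWO` still matches and simply reports no keyword.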
A token matches a complete character sequence or nothing; it cannot match partially, and the lexer rule determines exactly which sequences are accepted. You cannot expect a rule for TO to match TWO. If you want TWO to match too, you have to add it to your lexer rule.
A few notes here:
The solution your "senior" gave you makes no sense at all. A syntactic predicate is a kind of lookahead that guides the parser in case of ambiguities. There are no ambiguities involved here.
Writing the entire SELECT_ASSIGN rule as a lexer rule is very uncommon and not flexible. A lexer rule should not be used for entire sentences, but only for a small set of characters, to find tokens and assign them a type (usually elementary structures of a language such as strings, numbers, comments etc.).
ANTLR 3 is totally outdated and I wonder why it is still used in your class. ANTLR 4 has been out for 5 years and should be the choice for any new project.

AntlrWorks 2 output

I am using AntlrWorks 2 on a rather large grammar. The problem is that the grammar contains multiple ambiguities that I am trying to work through.
I was wondering if there is a way to see which rules were invoked when there was a failure.
For instance, when I run my rule I get the following output:
[#0,0:1='99',<20>,1:0]
[#1,2:1='<EOF>',<-1>,1:2]
line 1:0 mismatched input '99' expecting Digit2
(dummy 99)
I am wondering what [#0,0:1='99',<20>,1:0] means. Do the #0 or <20> have any relationship to the rule number in my grammar, or something similar?
Here is a breakdown of the default token formatting.
[#{TokenIndex},{StartIndex}:{StopIndex}={Text},<{TokenType}>,{Line}:{Column}]
The {TokenType} field generally corresponds to a particular lexer rule (the constant will be declared in your generated lexer). However, the -> type(X) command can be used in any lexer rule to reassign the tokens produced by that rule to another type. If the value 20 is assigned to the token named Foo, then the first token in your listing was produced by one of three things: a lexer rule named Foo, a lexer rule containing the command -> type(Foo), or a user-defined action that explicitly assigns the type Foo to a token produced by some other rule (this would be code you wrote, not code generated by ANTLR).
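To make the field layout concrete, here is a small sketch that splits one line of that output into the named fields of the template above. The regular expression and the `parse_token` helper are illustrative assumptions about how the pieces are delimited; they are not part of ANTLR.

```python
import re

# Field names follow the template quoted above; the regex itself is an
# illustrative assumption about how the pieces are delimited.
TOKEN_RE = re.compile(
    r"\[#(?P<token_index>\d+),"
    r"(?P<start>-?\d+):(?P<stop>-?\d+)="
    r"'(?P<text>.*)',"
    r"<(?P<token_type>-?\d+)>,"
    r"(?P<line>\d+):(?P<column>\d+)\]"
)

def parse_token(dump):
    """Split one token dump line into a dict of its fields."""
    m = TOKEN_RE.fullmatch(dump)
    if m is None:
        raise ValueError(f"unrecognised token dump: {dump!r}")
    fields = m.groupdict()
    text = fields.pop("text")
    # Every remaining field is numeric.
    result = {key: int(value) for key, value in fields.items()}
    result["text"] = text
    return result
```

For the first line of the listing this gives token index 0, characters 0 to 1, text 99, token type 20, line 1 and column 0. The token type is the number to look up among the constants in the generated lexer.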

JvmFormalParameter rule ambiguous?

I have a simple little grammar which keeps giving a multiple alternatives error when I try to generate Xtext artefacts.
The grammar is:
grammar org.xtext.example.hyrule.HyRule with org.eclipse.xtext.xbase.Xbase
generate hyRule (You can only use links to eclipse.org sites while you have fewer than 25 messages )
Start:
rules+=Rule+
;
Rule:
'FOR' 'PAYLOAD' payload=PAYLOAD 'ELEMENTS' elements+=JvmFormalParameter+ 'CONSTRAINED' 'BY' expressions+=XExpression*;
PAYLOAD:
"Stacons"|"PFResults"|"any"
;
And the exact error I get is:
warning(200): ../org.xtext.example.hyrule/src-gen/org/xtext/example/hyrule/parser/antlr/internal/InternalHyRule.g:3197:2: Decision can match input such as "{RULE_ID, '=>', '('}" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
error(201): ../org.xtext.example.hyrule/src-gen/org/xtext/example/hyrule/parser/antlr/internal/InternalHyRule.g:3197:2: The following alternatives can never be matched: 2
I have attached the syntax diagram for the generated ANTLR grammar in AntlrWorks, and I can clearly see the multiple alternatives (JvmFormalParameter can match RULE_ID via either the JvmTypeReference or the ValidID rule).
So it looks as if JvmFormalParameter is ambiguous. Apologies if this is a stupid question, but could someone point out what I'm missing? Is there some way of overcoming this ambiguity when using the JvmFormalParameter rule in my grammar?
The rule JvmFormalParameter is defined as
JvmFormalParameter returns types::JvmFormalParameter:
(parameterType=JvmTypeReference)? name=ValidID;
so the type of the parameter is optional. If you use elements+=JvmFormalParameter+, you allow multiple parameters without a delimiter, so the parser cannot decide how to read the input sequence
String s
because String and s could be the names of two parameters, or String s could be a single parameter with the type String and the name s. You should use a delimiter like
elements+=JvmFormalParameter (',' elements+=JvmFormalParameter)*
or use the rule FullJvmFormalParameter which is defined with a mandatory type reference:
FullJvmFormalParameter returns types::JvmFormalParameter:
parameterType=JvmTypeReference name=ValidID;
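The ambiguity can be counted directly. The sketch below enumerates every way a run of identifiers can be grouped into parameters of the form "optional type, then name". It is a simplification made for illustration: type references are treated as single identifiers, and `parameter_groupings` is a hypothetical helper, not anything generated by Xtext.

```python
def parameter_groupings(ids, type_optional=True):
    """Return every way to split ids into (type, name) parameters.

    type_optional=True models JvmFormalParameter (type may be missing);
    type_optional=False models FullJvmFormalParameter (type is mandatory).
    """
    if not ids:
        return [[]]
    results = []
    if type_optional:
        # Alternative 1: the next identifier is a name with no type.
        for rest in parameter_groupings(ids[1:], type_optional):
            results.append([(None, ids[0])] + rest)
    if len(ids) >= 2:
        # Alternative 2: the next two identifiers are a type followed by a name.
        for rest in parameter_groupings(ids[2:], type_optional):
            results.append([(ids[0], ids[1])] + rest)
    return results
```

For `["String", "s"]` the optional-type variant returns two groupings (two untyped parameters, or one parameter of type String), which is the conflict ANTLR reports. With `type_optional=False` only one grouping remains.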

How can I create cross-reference when a sigil is required for referring but not for defining?

In my DSL, I have something similar to:
x = 14
y = $x + 1
So an element is defined with just its name, but when referred to, some sigil must be added. Any whitespace between the sigil and the name is forbidden when referencing the element.
How can I do this in Xtext, while still allowing cross-reference between these elements?
Because it seems to me that I either have to use two different terminals for this - one to match x and the other to match $x - but then how would the cross-reference mechanism associate them together? Or alternatively, if I define:
ElementRef: '$' [Element|ELEMENT_NAME];
then Xtext will allow whitespace between the sigil and the name, which is illegal in my DSL. I guess an option such as "do not accept whitespace at this point" would be great, but I could not find anything in the Xtext documentation about something like that.
You have to use a datatype rule for the cross-reference token and register a value converter that strips the $ sign.
ElementRef: [Element|ReferenceID];
ReferenceID hidden(): '$' ID;
The value converter is responsible for the conversion between the abstract syntax (the ID) and the concrete syntax ($ID) for your tokens. Please refer to the docs for details.
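The conversion itself is tiny. An actual Xtext value converter is a Java class registered with the language; the Python sketch below only shows the two directions of the mapping, and the function names `to_value` and `to_string` are chosen here for illustration.

```python
SIGIL = "$"

def to_value(text):
    """Concrete syntax -> abstract syntax: '$x' becomes 'x'."""
    if not text.startswith(SIGIL) or len(text) == len(SIGIL):
        raise ValueError(f"expected '{SIGIL}' followed by a name, got {text!r}")
    return text[len(SIGIL):]

def to_string(name):
    """Abstract syntax -> concrete syntax: 'x' becomes '$x'."""
    return SIGIL + name
```

Because the cross-reference is resolved against the converted value, `$x` in a reference ends up being looked up as `x`, the same name used in the definition.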

How to return something when there is no match in flex (lexer)

I am using flex (the lexer generator) to do some lexical analysis.
What I need is:
If none of the rules match, a value is returned to indicate that this has happened.
This is like the default case of the switch statement in many programming languages.
Is there a way to do this?
EDIT 1:
Reference from the official doc
If no match is found, then the default rule is executed:
the next character in the input is considered matched and copied to the standard output.
But how can I change the default rule?
In acacia-lex it is done in the following way.
The lexer has a run method:
@Override
public void run() {
    Token token;
    while ((token = this.findNext()).isFound()) {
        System.out.println("LEXER RES = " + token.toString());
    }
}
When nothing is found, there is no default rule; the run method simply finishes.
To continue lexing, add a token "DOT" -> "." at the end of the token specification. Then, if no other token matches, DOT will match and run will carry on.
The default rule only applies if no other rule matches. So you can simply insert your own rule which matches any single character as the last rule:
.|\n { /* Your default action. */ }
It must go at the end because (F)lex will give priority to earlier rules in the file which have the same match. You need to explicitly mention \n (unless you are certain that some other rule will match it) because in (F)lex, . matches any character except a newline.
If you are using Flex, and you don't want the default rule to ever be used, it is advisable to put
%option nodefault
into your prologue. That will suppress the default rule and produce a warning if there is some input which might not match any rule. (If you ignore the warning, a runtime error will be produced for such input.)
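The effect of a trailing catch-all rule can be sketched outside of flex as well. The toy scanner below follows the matching policy described above (longest match wins, ties go to the rule listed first) and ends with the counterpart of `.|\n`. The token names and the `scan` helper are hypothetical; this is a model of the idea, not of flex's generated code.

```python
import re

# Rules are tried in file order; the longest match wins and ties go to the
# earlier rule, so the catch-all must come last. Token names are hypothetical.
RULES = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("SPACE", re.compile(r"[ \t]+")),
    ("UNKNOWN", re.compile(r".|\n")),  # catch-all: any single character, newline included
]

def scan(text):
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_end = None, pos
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match displaces an earlier rule.
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        # The catch-all guarantees at least a one-character match here.
        if best_name != "SPACE":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens
```

Here the UNKNOWN token plays the role of the "nothing else matched" value: the caller can test for it exactly as it would test for any other token.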
