How to implode an embedded choice of alternative symbols? - rascal

I have this syntax definition
syntax RuleData
= rule_data: ID+ RulePart '-\>' (Command|RulePart)+ Message? Newlines
;
This mostly works. The only problem is that I'm not sure how to implode (Command|RulePart)+. I looked through the Rascal docs but didn't find anything on how to define "union" types.
This is what my ADT looks like currently
data RULEDATA
= rule_data(list[str] prefix, list[RULEPART] left, list[???] right, list[str] message, str)
;
The ??? is the bit where it could either be a RulePart (which, for the sake of simplicity, is a list[str]) or a Command (which is a str).

Turns out I was overcomplicating the whole thing. Instead of trying to have a union of types, I simply added an additional constructor to RULEPART that could accommodate COMMAND. I think I was also initially confused because bugs in other parts of my code looked as though they were caused by this problem.
data RULEDATA
= rule_data(list[str] prefix, list[RULEPART] left, list[RULEPART] right, list[str] message, str)
;
data RULEPART
= part(list[RULECONTENT] contents)
| command(str command)
;

Related

Antlr Indirect Left Recursion

I've seen this question asked multiple times, and also seen people "solve" it... but it either confused me or didn't solve my specific situation:
Here's approximately what's going on:
block: statement*;
statement: <bunch of stuff> | expression_statement;
expression_statement: <more stuff, for example> | method_invoke;
method_invoke: expression LEFT_PAREN <args...> RIGHT_PAREN block;
expression: <bunch of stuff> | expression_statement;
Everything inside expression_statement that starts with an expression is indirectly left-recursive. I don't know how to fix that while still being able to use those constructs as statements, so that they remain usable in blocks. It should be possible to write something like Print("hello world"); on its own (a statement), but also something like int c = a + b.getValue() as part of an expression.
How would I handle it differently?
If you need more info please let me know and I'll try my best to provide
I knew that to solve indirect left recursion I'd have to duplicate one or more of the rules. I hoped there'd be a better way to handle it than what is written online and also said here, but there isn't. I ended up doing that and it worked, thank you.

Retaining separator while imploding

I have a syntax definition that looks like this
keyword LegendOperation = 'or' | 'and';
syntax LegendData
= legend_data: LegendKey '=' {ID LegendOperation}+ Newlines
;
I need to implode this in a way that lets me retain whether the separator for the ID is 'or' or 'and', but I didn't find anything in the docs on whether the separator is retained and whether implode can use it. Initially, I did something like the below to try to keep that information.
syntax LegendData
= legend_data_or: LegendKey '=' {ID 'or'}+ Newlines
> legend_data_and: LegendKey '=' {ID 'and'}+ Newlines
;
The issue that I run into is that there are three forms of text that it needs to be able to parse
. = Background
# = Crate and Target
# = Crate or Wall
And when it tries to parse the first line it hits an ambiguity error when it should instead parse it as a legend_data_or with a single element (perhaps I misunderstood how to use the priorities). Honestly, I would prefer to be able to use that second format, but is there a way to disambiguate it?
Either a way to implode the first syntax while retaining the separator or a way to disambiguate the second format would help me with this issue.
I did not manage to come up with an elegant solution in the end. Discussing with others the best we could come up with was
syntax LegendOperation
= legend_or: 'or' ID
| legend_and: 'and' ID
;
syntax LegendData
= legend_data: LegendKey '=' ID LegendOperation* Newlines
;
Which works and allows us to retain the information on the separator but requires post-processing to turn into a usable datatype.
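The post-processing step mentioned above could look roughly like the following. This is only a sketch of the logic, written in Python rather than Rascal; the function name fold_legend and the (kind, id) tuples are hypothetical stand-ins for the imploded legend_or / legend_and nodes, and rejecting mixed separators is an assumption about what a "usable datatype" should allow.

```python
from typing import List, Tuple


def fold_legend(first_id: str, operations: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Collapse the shape `ID LegendOperation*` into (separator, ids).

    `operations` holds hypothetical (kind, id) pairs such as ("or", "Wall"),
    standing in for the imploded legend_or / legend_and nodes.
    """
    ids = [first_id] + [name for _, name in operations]
    kinds = {kind for kind, _ in operations}
    if len(kinds) > 1:
        # Assumption: one legend entry may not mix 'or' and 'and'.
        raise ValueError(f"mixed separators in one legend entry: {sorted(kinds)}")
    # A single ID has no separator at all; report that case as "none".
    separator = kinds.pop() if kinds else "none"
    return separator, ids
```

With this shape, the single-element line `. = Background` no longer needs to be forced into either the 'or' or the 'and' alternative, which is where the second grammar attempt became ambiguous.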

How to use context free grammars?

Could someone help me with using context-free grammars? Up until now I've used regular expressions to remove comments, block comments and empty lines from a string so that it can be used to count the PLOC. This seems to be extremely slow, so I was looking for a different, more efficient method.
I saw the following post: What is the best way to ignore comments in a java file with Rascal?
I have no idea how to use this, and the help doesn't get me far either. When I try to define the line used in the post I immediately get an error.
lexical SingleLineComment = "//" ~[\n] "\n";
Could someone help me out with this and also explain a bit about how to set up such a context-free grammar and then actually extract the wanted data?
Kind regards,
Bob
First, this will help: ~ is not part of Rascal's grammar notation. The negation of a character class is written like so: ![\n].
Using a context-free grammar in Rascal takes three steps: write it, use it to parse input, and extract information from the resulting tree.
Write it, like for example the syntax definition of the Func language here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func
Use it to parse input, like so:
import ParseTree; // provides parse
// This is the basic parse command, but be careful: it will not accept spaces and newlines before and after the Prog text:
Prog myParseTree = parse(#Prog, "example string");
// you can do the same directly to an input file:
Prog myFileTree = parse(#Prog, |home:///myProgram.func|);
// if you need to accept layout before and after the program, use a "start nonterminal":
start[Prog] myStartTree = parse(#start[Prog], |home:///myProgram.func|);
Prog myProgram = myStartTree.top;
// shorthand for parsing stuff:
myProgram = [Prog] "example";
myProgram = [Prog] |home:///myLocation.txt|;
Once you have the tree you can start using visit and / (deep match) to extract information from the tree, or write recursive functions if you like. Examples can be found here: http://docs.rascal-mpl.org/unstable/Recipes/#Languages-Func , but here are some common idioms as well to extract information from a parse tree:
// produces the source location of a node in the tree (here the root):
myParseTree@\loc
// produces a set of all nodes of type Stat
{ s | /Stat s := myParseTree }
// pattern match an if-then-else and bind the three expressions and collect them in a set:
{ e1, e2, e3 | (Stat) `if <Exp e1> then <Exp e2> else <Exp e3> end` <- myExpressionList }
// collect all locations of all sub-trees (every parse tree is of a non-terminal type, which is a sub-type of Tree). It uses |unknown:///| for small sub-trees which have not been annotated for efficiency's sake, like literals and character classes:
[ t@\loc?|unknown:///| | /Tree t := myParseTree ]
That should give you a start. I'd go try out some stuff and look at more examples. Writing a grammar is a nice thing to do, but it does require some trial and error, like writing a regex, only more so.
For the grammar you might be writing, which finds source code comments but leaves the rest as "any character" you will need to use the longest match disambiguation a lot:
lexical Identifier = [a-z]+ !>> [a-z]; // means do not accept an Identifier if there is still [a-z] to add to it; so only the longest possible Identifier will match.
This kind of context-free grammar is called an "Island Grammar" metaphorically, because you will write precise rules for the parts you want to recognize (the comments are "Islands") while leaving the rest as everything else (the rest is "Water"). See https://dl.acm.org/citation.cfm?id=837160
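The island/water split can be illustrated outside Rascal with a small hand-written scanner. This is a Python sketch of the idea, not a grammar and not the Rascal solution itself: comments and string literals are the islands that get precise rules, every other character is water. The function names are hypothetical, and it assumes Java-style // and /* */ comments and double-quoted strings only (character literals such as '"' are not handled).

```python
def strip_comments(source: str) -> str:
    """Remove // and /* */ comments, leaving string literals and other text intact."""
    out = []
    i, n = 0, len(source)
    while i < n:
        two = source[i:i + 2]
        if two == "//":
            # Island: line comment; skip up to (not past) the newline.
            while i < n and source[i] != "\n":
                i += 1
        elif two == "/*":
            # Island: block comment; keep its newlines so line structure survives.
            end = source.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append("\n" * source.count("\n", i, end))
            i = end
        elif source[i] == '"':
            # Island: string literal, copied verbatim so "//" inside it is not a comment.
            j = i + 1
            while j < n and source[j] != '"' and source[j] != "\n":
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        else:
            # Water: any other character is copied through unchanged.
            out.append(source[i])
            i += 1
    return "".join(out)


def count_code_lines(source: str) -> int:
    """Count lines that still contain non-whitespace text after comment removal."""
    return sum(1 for line in strip_comments(source).splitlines() if line.strip())
```

The string-literal island is what a plain regex tends to get wrong: without it, the // inside "// not a comment" would be treated as the start of a comment.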

whitespace in flex patterns leads to "unrecognized rule"

The flex info manual says whitespace is allowed in regular expressions when using the "x" modifier in the (?r-s:pattern) form. It specifically offers a simple example (without whitespace)
(?:foo) same as (foo)
but the following program fails to compile with the error "unrecognized rule":
BAD (?:foo)
%%
{BAD} {}
I cannot find any form of (? that is acceptable as a rule pattern. Is the manual in error, or do I misunderstand?
The example in your question does not seem to reflect the question itself, since it shows neither the use of whitespace nor an x flag. So I'm going to assume that the pattern which is failing for you is something like
BAD (?x:two | lines |
of | words)
%%
{BAD} { }
And, indeed, that will not work. Although you can use extended format in a pattern, you can only use it in a definition if it doesn't contain a newline. The definition terminates at the last non-whitespace character on the definition line.
Anyway, definitions are overused. You could write the above as
%%
(?x:two | lines |
of | words ) { }
Which saves anyone reading your code from having to search for a definition.
I do understand that you might want to use a very long pattern in a rule, which is awkward, particularly if you want to use it twice. Regardless of the issue with newlines, this tends to run into problems with Flex's definition length limit (2047 characters). My approach has been to break the very long pattern into a series of definitions, and then define another symbol which concatenates the pieces.
Before v2.6, Flex did not chop whitespace off the end of the definition line, which also leads to mysterious "unrecognized rule" errors. The manual seems to still reflect the v2.5 behaviour:
The definition is taken to begin at the first non-whitespace character following the name and continuing to the end of the line.

XTEXT 2 Customized Rename Refactoring Example

Does anyone have an example of a customized rename refactoring in Xtext?
I guess it has to be similar to customized syntax highlighting: binding some classes, overriding some implementations, and then crawling through the EObjects you want to rename.
But I don't know where to start. Does anyone have an idea? Or has someone already implemented a customized rename refactoring in Xtext?
kind regards,
Example: If I rename the ruleName of a Rule, I also want to rename the ruleReferenceName of the RuleReference
Rule:
ruleName=(RuleName)':' ruleContent=RuleContent ';'
;
RuleContent:
ruleReferences+=RuleReference
;
RuleReference:
ruleReferenceName=RuleName (cardinality=Cardinality)?
;
RuleName:
value=RuleReferenceNameTerminal
;
I guess what I first planned to do isn't what the Xtext rename refactoring is intended for. So I took a closer look at the cross-reference concept again. I had tried rename refactoring through cross-references earlier, but stumbled over the fact that I didn't have an "ID" terminal defined. What solved my issue was to let the cross-reference know which terminal rule it should use and to set the name attribute in the right place.
Here is what the grammar should look like to have the rename refactoring work the way I wanted it to (note the square brackets and the name attribute). No binding and overriding needed at all.
Rule:
ruleName=(RuleName)':' ruleContent=RuleContent ';'
;
RuleContent:
ruleReferences+=RuleReference
;
RuleReference:
ruleReferenceName=[RuleName | RuleReferenceNameTerminal] (cardinality=Cardinality)?
;
RuleName:
name=RuleReferenceNameTerminal
;
It is important to know that the " | " between the square brackets is not an alternative.
