Xtext field dot-notation scope provider

Can the "memory" be avoided in the following Xtext scope provider for the field dot-notation of the grammar below, which is stock-standard except for the LocalRelation dot-notation?
grammar test.path.Path with org.eclipse.xtext.common.Terminals
generate path "http://www.path.test/Path"
Model:
(elements+=AbstractElement)*;
PackageDeclaration:
'package' name=QualifiedName '{'
(elements+=AbstractElement)*
'}';
AbstractElement:
PackageDeclaration | Entity | Import;
Import:
'import' importedNamespace=QualifiedNameWithWildcard;
Entity:
'entity' name=ID '{'
relations+=Relation*
'}';
Relation:
GlobalRelation | LocalRelation;
GlobalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? ref=[Entity|QualifiedName];
LocalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' head=[Relation|ID] ('.' tail+=[Relation|ID])*;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
QualifiedName:
ID ('.' ID)*;
The following grammar instance demonstrates the LocalRelation dot-notation.
entity A {
b : B
}
entity B {
a : A
}
entity C {
b : B
x1 : this.b
x2 : this.x1
// x3 : this.x3 No self ref
x4 : this.b.a
x5 : this.x1.a
x6 : this.x1.a.b.a.b.a.b.a
}
entity D {
c : C
x1 : this.c.b.a
}
The scoping resolution for GlobalRelations works out of the box, but of course the scoping for LocalRelations does not. I have come up with the following working scope provider, but it uses a global map to keep track of dot-depth, together with a special head case to reset the counter to zero, since one cannot read the value of a reference before it has been resolved, as that causes an infinite loop.
class PathScopeProvider extends AbstractPathScopeProvider {
@Inject extension IQualifiedNameProvider
override getScope(EObject context, EReference reference) {
if (context instanceof LocalRelation) {
return if (reference == PathPackage.Literals.LOCAL_RELATION__HEAD)
getScopeLocalRelation_HEAD(context as LocalRelation, context.eContainer as Entity)
else if (reference == PathPackage.Literals.LOCAL_RELATION__TAIL)
getScopeLocalRelation_TAIL(context as LocalRelation, context.eContainer as Entity)
}
return super.getScope(context, reference);
}
def IScope getScopeLocalRelation_HEAD(LocalRelation contextLocalRelation,
Entity contextLocalRelationContainingEntity) {
// Don't touch contextLocalRelation.head nor contextLocalRelation.tail!
val result = newArrayList
contextLocalRelationContainingEntity.relations.filter(
target |
target != contextLocalRelation
).forEach [ target |
{
result.add(EObjectDescription.create(QualifiedName.create(target.name), target))
resetDepth(contextLocalRelation)
}
]
return new SimpleScope(IScope.NULLSCOPE, result)
}
def IScope getScopeLocalRelation_TAIL(LocalRelation contextLocalRelation,
Entity contextLocalRelationContainingEntity) {
// Note that head is well-defined, while tail is well-defined up to depth
val head = contextLocalRelation.head
val result = newArrayList
val depthSoFar = getDepth(contextLocalRelation)
incDepth(contextLocalRelation)
val targetSoFar = if(depthSoFar === 0) head else contextLocalRelation.tail.get(depthSoFar - 1)
if (targetSoFar instanceof GlobalRelation) {
val targetSoFar_Global = targetSoFar as GlobalRelation
targetSoFar_Global.ref.relations.forEach [ t |
result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
]
} else if (targetSoFar instanceof LocalRelation) {
var Relation i = targetSoFar as LocalRelation
while (i instanceof LocalRelation) {
i = if(i.tail.empty) i.head else i.tail.last
}
(i as GlobalRelation).ref.relations.forEach [ t |
result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
]
}
return new SimpleScope(IScope.NULLSCOPE, result)
}
// DEPTH MEMORY
val entity_relation__depthSoFar = new HashMap<String, Integer>()
private def void resetDepth(LocalRelation r) {
entity_relation__depthSoFar.put(r.fullyQualifiedName.toString, 0)
}
private def int getDepth(LocalRelation r) {
entity_relation__depthSoFar.get(r.fullyQualifiedName.toString)
}
private def int incDepth(LocalRelation r) {
entity_relation__depthSoFar.put(r.fullyQualifiedName.toString, getDepth(r) + 1)
}
}
Can this additional depth memory be avoided in any way?
Is there some internal way of detecting the depth of the tail so far scoped?
I have experimented with a try-catch block but that doesn't work, and would be sloppy anyway.

The following solution is derived from Xtext and Dot/Path-Expressions, as suggested by Christian Dietrich, and that is the example to use if you are looking to implement standard field dot path scoping.
The grammar above requires a small change.
LocalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' path=Path;
Path: head=[Relation|ID] ({Path.curr=current} "." tail+=[Relation|ID])*;
The only change is {Path.curr=current} and this is the "magic ingredient" that makes all the difference. The scope provider is now almost trivial, especially compared to the monster in the OP.
override getScope(EObject context, EReference reference) {
if (context instanceof Path) {
if (reference == PathPackage.Literals.PATH__HEAD) {
// Filter self reference
return new FilteringScope(super.getScope(context, reference), [ e |
!Objects.equals(e.getEObjectOrProxy(), context)
]);
} else { // PATH__TAIL
val target = context.curr.followTheDots
if (target === null) {
return IScope::NULLSCOPE
}
return new FilteringScope(super.getScope(target, reference), [ e |
!Objects.equals(e.getEObjectOrProxy(), context)
]);
}
}
return super.getScope(context, reference);
}
def Entity followTheDots(Path path) {
var targetRelation = if(path.tail === null || path.tail.empty) path.head else path.tail.last;
return if (targetRelation instanceof GlobalRelation)
targetRelation.ref
else if (targetRelation instanceof LocalRelation)
targetRelation.path.followTheDots
else null // happens when dot path is nonsense
}
def GlobalRelation followTheDots(LocalRelation exp) {
var targetRelation = if(exp.tail === null || exp.tail.empty) exp.head else exp.tail.last;
return if (targetRelation instanceof GlobalRelation)
targetRelation
else if (targetRelation instanceof LocalRelation)
targetRelation.followTheDots
else null // happens when dot path is nonsense
}
While we cannot "touch" tail in the scope provider, the "magic ingredient" {Path.curr=current} gives the iterative scope provider full access to the previously fully resolved and well-defined reference via curr. Post scoping, the fully resolved and well-defined trail is available via head and tail.
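The resolution that followTheDots performs can be sketched outside Xtext. Below is a hedged, Xtext-free Java model (the POJO map and names are hypothetical, and LocalRelations are pre-flattened to the entity they point at for brevity): each dot segment is looked up in the entity reached by the previous segment, which is exactly why curr makes a depth counter unnecessary.

```java
import java.util.*;

// Hypothetical model of the example: each entity maps its relation names
// to the entity that relation ultimately points at.
public class DotPathDemo {
    static final Map<String, Map<String, String>> ENTITIES = new HashMap<>();
    static {
        // entity A { b : B }   entity B { a : A }   entity C { b : B ... }
        ENTITIES.put("A", Map.of("b", "B"));
        ENTITIES.put("B", Map.of("a", "A"));
        ENTITIES.put("C", Map.of("b", "B"));
    }

    // Resolve a dot path starting at 'entity': every segment must be a
    // relation of the entity reached so far, mirroring how curr re-anchors
    // each tail segment in the scope provider.
    static String follow(String entity, List<String> path) {
        String current = entity;
        for (String segment : path) {
            Map<String, String> relations = ENTITIES.get(current);
            if (relations == null || !relations.containsKey(segment)) {
                return null; // dot path is nonsense
            }
            current = relations.get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(follow("C", List.of("b", "a")));      // A  (x4 : this.b.a)
        System.out.println(follow("C", List.of("b", "a", "b"))); // B
        System.out.println(follow("C", List.of("b", "z")));      // null
    }
}
```

Each scoping step only needs the entity produced by the previous step, never a count of how many steps came before it.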

Related

Skipping sub-expressions in listener

Let's say I have logical expressions in this form
((a OR b) AND c) OR (d AND e = (f OR g))
I'm interested in getting a, b, c, d and ignoring e, f, g (everything that has to do with assignments).
My attempt was something like this
infixOp : OR | AND ;
assignment : Identifier EQUALS expression ;
expression :
Identifier |
assignment |
expression infixOp expression |
LEFTBRACKET expression RIGHTBRACKET ;
Then in my listener in enterExpression() I just printed expressionContext.getTokens(identifierType). Of course, this is too much and I got all of them.
How could I "skip" over an assignment expression? Could this be done in the grammar? Or can it only be done programmatically?
What you can do is create a small listener and keep track of when you enter and exit an assignment. Then inside the enterExpression method, check that you're NOT inside an assignment AND the Identifier token has a value.
A quick demo for the grammar:
grammar T;
parse
: expression EOF
;
expression
: '(' expression ')'
| expression 'AND' expression
| expression 'OR' expression
| assignment
| Identifier
;
assignment
: Identifier '=' expression
;
Identifier : [a-z]+;
Space : [ \t\r\n] -> skip;
and Java class:
public class Main {
public static void main(String[] args) {
TLexer lexer = new TLexer(CharStreams.fromString("((a OR b) AND c) OR (d AND e = (f OR g))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
MyListener listener = new MyListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.parse());
System.out.println(listener.ids);
}
}
class MyListener extends TBaseListener {
public List<String> ids = new ArrayList<String>();
private boolean inAssignment = false;
@Override
public void enterExpression(TParser.ExpressionContext ctx) {
if (!this.inAssignment && ctx.Identifier() != null) {
this.ids.add(ctx.Identifier().getText());
}
}
@Override
public void enterAssignment(TParser.AssignmentContext ctx) {
this.inAssignment = true;
}
@Override
public void exitAssignment(TParser.AssignmentContext ctx) {
this.inAssignment = false;
}
}
will print:
[a, b, c, d]
Maybe something like:
if (!(expressionContext.parent() instanceof AssignmentContext)) {
expressionContext.getTokens(identifierType);
}
You can find that by walking the parse tree structure and checking the different members in the expression context.
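One caveat with the boolean flag: it switches off too early when assignments nest (the inner exitAssignment clears the flag while the walker is still inside the outer assignment). A nesting counter is slightly more robust. The sketch below is ANTLR-free, so it is runnable on its own; the Node record and hand-built tree are hypothetical stand-ins for the parse tree, but the counter logic transfers directly to enterAssignment/exitAssignment.

```java
import java.util.*;

// Counter-based variant of the listener idea: identifiers are collected
// only while the assignment-nesting depth is zero.
public class SkipAssignments {
    // Minimal expression tree: a leaf identifier, or an operator/assignment node.
    record Node(String id, boolean isAssignment, List<Node> children) {
        static Node ident(String id) { return new Node(id, false, List.of()); }
        static Node op(Node... kids) { return new Node(null, false, List.of(kids)); }
        static Node assign(Node lhs, Node rhs) { return new Node(null, true, List.of(lhs, rhs)); }
    }

    static void collect(Node n, int assignDepth, List<String> out) {
        if (n.isAssignment()) assignDepth++;          // entering an assignment
        if (n.id() != null && assignDepth == 0) out.add(n.id());
        for (Node child : n.children()) collect(child, assignDepth, out);
        // depth is passed by value, so it "pops" automatically on return
    }

    public static void main(String[] args) {
        // ((a OR b) AND c) OR (d AND e = (f OR g))
        Node tree = Node.op(
            Node.op(Node.op(Node.ident("a"), Node.ident("b")), Node.ident("c")),
            Node.op(Node.ident("d"),
                    Node.assign(Node.ident("e"),
                                Node.op(Node.ident("f"), Node.ident("g")))));
        List<String> ids = new ArrayList<>();
        collect(tree, 0, ids);
        System.out.println(ids); // [a, b, c, d]
    }
}
```

In a real listener the counter would be a field incremented in enterAssignment and decremented in exitAssignment, instead of being threaded through recursion.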

Building expression parser with Dart petitparser, getting stuck on node visitor

I've got more of my expression parser working (Dart PetitParser, getting at the AST data structure created with ExpressionBuilder). It appears to generate accurate ASTs for floats, parens, power, multiply, divide, add, subtract, and unary negative in front of both numbers and expressions. (The nodes are either literal strings, or an object that has a precedence with a List payload that gets walked and concatenated.)
I'm stuck now on visiting the nodes. I have clean access to the top node (thanks to Lukas), but I'm stuck on deciding whether or not to add a paren. For example, in 20+30*40, we don't need parens around 30*40, and the parse tree correctly has the node for this closer to the root so I'll hit it first during traversal. However, I don't seem to have enough data when looking at the 30*40 node to determine if it needs parens before going on to the 20+.. A very similar case would be (20+30)*40, which gets parsed correctly with 20+30 closer to the root, so once again, when visiting the 20+30 node I need to add parens before going on to *40.
This has to be a solved problem, but I never went to compiler school, so I know just enough about ASTs to be dangerous. What "a ha" am I missing?
// rip-common.dart:
import 'package:petitparser/petitparser.dart';
// import 'package:petitparser/debug.dart';
class Node {
int precedence;
List<dynamic> args;
Node([this.precedence = 0, this.args = const []]) {
// nodeList.add(this);
}
@override
String toString() => 'Node($precedence $args)';
String visit([int fromPrecedence = -1]) {
print('=== visiting $this ===');
var buf = StringBuffer();
var parens = (precedence > 0) &&
(fromPrecedence > 0) &&
(precedence < fromPrecedence);
print('<$fromPrecedence $precedence $parens>');
// for debugging:
var curlyOpen = '';
var curlyClose = '';
buf.write(parens ? '(' : curlyOpen);
for (var arg in args) {
if (arg is Node) {
buf.write(arg.visit(precedence));
} else if (arg is String) {
buf.write(arg);
} else {
print('not Node or String: $arg');
buf.write('$arg');
}
}
buf.write(parens ? ')' : curlyClose);
print('$buf for buf');
return '$buf';
}
}
class RIPParser {
Parser _make_parser() {
final builder = ExpressionBuilder();
var number = char('-').optional() &
digit().plus() &
(char('.') & digit().plus()).optional();
// precedence 5
builder.group()
..primitive(number.flatten().map((a) => Node(0, [a])))
..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a]));
// negation is a prefix operator
// precedence 4
builder.group()..prefix(char('-').trim(), (op, a) => Node(4, [op, a]));
// power is right-associative
// precedence 3
builder.group()..right(char('^').trim(), (a, op, b) => Node(3, [a, op, b]));
// multiplication and addition are left-associative
// precedence 2
builder.group()
..left(char('*').trim(), (a, op, b) => Node(2, [a, op, b]))
..left(char('/').trim(), (a, op, b) => Node(2, [a, op, b]));
// precedence 1
builder.group()
..left(char('+').trim(), (a, op, b) => Node(1, [a, op, b]))
..left(char('-').trim(), (a, op, b) => Node(1, [a, op, b]));
final parser = builder.build().end();
return parser;
}
Result _result(String input) {
var parser = _make_parser(); // eventually cache
var result = parser.parse(input);
return result;
}
String parse(String input) {
var result = _result(input);
if (result.isFailure) {
return result.message;
} else {
print('result.value = ${result.value}');
return '$result';
}
}
String visit(String input) {
var result = _result(input);
var top_node = result.value; // result.isFailure ...
return top_node.visit();
}
}
// rip_cmd_example.dart
import 'dart:io';
import 'package:rip_common/rip_common.dart';
void main() {
print('start');
String input;
while (true) {
input = stdin.readLineSync();
if (input.isEmpty) {
break;
}
print(RIPParser().parse(input));
print(RIPParser().visit(input));
}
print('done');
}
As you've observed, the ExpressionBuilder already assembles the tree in the right precedence order based on the operator groups you've specified.
This also happens for the wrapping parens node created here: ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a])). If I test for this node, I get back the input string for your example expressions: var parens = precedence == 0 && args.length == 1 && args[0] is Node;.
Unless I am missing something, there should be no reason for you to track the precedence manually. I would also recommend that you create different node classes for the different operators: ValueNode, ParensNode, NegNode, PowNode, MulNode, ... A bit verbose, but much easier to understand what is going on, if each of them can just visit (print, evaluate, optimize, ...) itself.
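The parenthesization rule itself is small once each node knows its precedence: when printing a child, pass down the parent's precedence and wrap the child exactly when its own precedence is lower. A hedged Java sketch of that rule (class and method names are made up, and associativity ties are ignored for brevity):

```java
// Minimal-parentheses printing: a child is wrapped exactly when its
// precedence is lower than the operator above it.
abstract class Expr {
    abstract int precedence();               // higher binds tighter
    abstract String print(int parentPrec);   // parentPrec = precedence above
    String print() { return print(0); }
    String maybeParens(int parentPrec, String s) {
        return precedence() < parentPrec ? "(" + s + ")" : s;
    }
}

class Num extends Expr {
    final String text;
    Num(String text) { this.text = text; }
    int precedence() { return 99; }          // literals never need parens
    String print(int parentPrec) { return text; }
}

class Bin extends Expr {
    final Expr left, right; final String op; final int prec;
    Bin(Expr l, String op, Expr r, int prec) { left = l; this.op = op; right = r; this.prec = prec; }
    int precedence() { return prec; }
    String print(int parentPrec) {
        // children are printed with *this* node's precedence as context
        return maybeParens(parentPrec, left.print(prec) + op + right.print(prec));
    }
}

public class MinParens {
    public static void main(String[] args) {
        Expr a = new Bin(new Num("20"), "+", new Bin(new Num("30"), "*", new Num("40"), 2), 1);
        Expr b = new Bin(new Bin(new Num("20"), "+", new Num("30"), 1), "*", new Num("40"), 2);
        System.out.println(a.print()); // 20+30*40
        System.out.println(b.print()); // (20+30)*40
    }
}
```

This reproduces both example cases; equal-precedence right children of left-associative operators (e.g. a-(b-c)) would need one extra comparison.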

Breaking head over how to get position of token with a rule - ANTLR4 / grammar

I'm writing a little grammar using ANLTR, and I have a rule like this:
operation : OPERATION (IDENT | EXPR) ',' (IDENT | EXPR);
...
OPERATION : 'ADD' | 'SUB' | 'MUL' | 'DIV' ;
IDENT : [a-z]+;
EXPR : INTEGER | FLOAT;
INTEGER : [0-9]+ | '-'[0-9]+ ;
FLOAT : [0-9]+'.'[0-9]+ | '-'[0-9]+'.'[0-9]+ ;
Now in the listener inside Java, how do I determine in the case of such a scenario where an operation consist of both IDENT and EXPR the order in which they appear?
Obviously the rule can match both
ADD 10, d
or
ADD d, 10
But in the listener for the rule, generated by ANTLR4, if there is both IDENT() and EXPR() how to get their order, since I want to assign the left and right operands correctly.
Been breaking my head over this, is there any simple way or should I rewrite the rule itself? The ctx.getTokens () requires me to give the token type, which kind of defeats the purpose, since I cannot get the sequence of the tokens in the rule, if I specify their type.
You can do it like this:
operation : OPERATION lhs=(IDENT | EXPR) ',' rhs=(IDENT | EXPR);
and then inside your listener, do this:
@Override
public void enterOperation(TParser.OperationContext ctx) {
if (ctx.lhs.getType() == TParser.IDENT) {
// left hand side is an identifier
} else {
// left hand side is an expression
}
// check `rhs` the same way
}
where TParser comes from the grammar file T.g4. Change this accordingly.
Another solution would be something like this:
operation
: OPERATION ident_or_expr ',' ident_or_expr
;
ident_or_expr
: IDENT
| EXPR
;
and then in your listener:
@Override
public void enterOperation(TParser.OperationContext ctx) {
Double lhs = findValueFor(ctx.ident_or_expr().get(0));
Double rhs = findValueFor(ctx.ident_or_expr().get(1));
...
}
private Double findValueFor(TParser.Ident_or_exprContext ctx) {
if (ctx.IDENT() != null) {
// it's an identifier
} else {
// it's an expression
}
}

Abstract Syntax Tree for Source Code including Expressions

I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to form an Abstract Syntax Tree from the tokens, where the source code might contain an expression (with operator precedence).
For simplicity, I shall include only 4 basic operators: +, -, /, and * in addition to brackets (). Operator precedence will follow BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, form the tree and substitute it.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
Currently, the following program is tokenized by my lexer as follows:
I am demonstrating using a Java program for syntax familiarity.
Source Program:
public class Hello
{
public static void main(String[] args)
{
int a = 5;
int b = 6;
int c = 7;
int r = a + b * c;
System.out.println(r);
}
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}
// I know this might look ugly that I use a global variable ret to return parsed subtrees
// but please bear with it, I got used to this for various performance/usability reasons
var ret, tokens
function get_precedence(op) {
// this is an essential part, cannot parse an expression without the precedence checker
if (op == '*' || op == '/' || op == '%') return 14
if (op == '+' || op == '-') return 13
if (op == '<=' || op == '>=' || op == '<' || op == '>') return 11
if (op == '==' || op == '!=') return 10
if (op == '^') return 8
if (op == '&&') return 6
if (op == '||') return 5
return 0
}
function parse_primary(pos) {
// in the real language primary is almost everything that can be on the sides of +
// but here we only handle numbers detected with the JavaScript 'typeof' keyword
if (typeof tokens[pos] == 'number') {
ret = {
type: 'number',
value: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_operator(pos) {
// let's just reuse the function we already wrote instead of creating another huge 'if'
if (get_precedence(tokens[pos]) != 0) {
ret = {
type: 'operator',
operator: tokens[pos],
}
return pos + 1
}
else {
return undefined
}
}
function parse_expr(pos) {
var stack = [], code = [], n, op, next, precedence
pos = parse_primary(pos)
if (pos == undefined) {
// error, an expression can only start with a primary
return undefined
}
stack.push(ret)
while (true) {
n = pos
pos = parse_operator(pos)
if (pos == undefined) break
op = ret
pos = parse_primary(pos)
if (pos == undefined) break
next = ret
precedence = get_precedence(op.operator)
while (stack.length > 0 && get_precedence(stack[stack.length - 1].operator) >= precedence) {
code.push(stack.pop())
}
stack.push(op)
code.push(next)
}
while(stack.length > 0) {
code.push(stack.pop())
}
if (code.length == 1) ret = code[0]
else ret = {
type: 'expr',
stack: code,
}
return n
}
function main() {
tokens = [1, '+', 2, '*', 3]
var pos = parse_expr(0)
if (pos) {
console.log('parsed expression AST')
console.log(ret)
}
else {
console.log('unable to parse anything')
}
}
main()
Here is your bare-bones implementation of shunting-yard expression parsing, written in JavaScript. This is as minimalistic and simple as you can get. Tokenizing is left out for brevity; you give the parser the array of tokens (you call them lexemes).
The actual shunting yard is the parse_expr function. This is the "classic" implementation that uses a stack; that is my preference, though some people prefer functional recursion.
Functions that parse various syntax elements are usually called "parselets". Here we have three of them: one for expressions, and the others for primaries and operators. If a parselet detects the corresponding syntax construction at position pos, it returns the next position right after the construct, and the construct itself in AST form is returned via the global variable ret. If the parselet does not find what it expects, it returns undefined.
It is now trivially simple to add support for parens grouping '(' ... ')': just extend parse_primary with if (parse_group()) ... else if (parse_number()) ... etc. In the meantime your parse_primary will grow quite big, supporting various things: prefix operators, function calls, etc.
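For comparison, the same classic stack loop fits in a few lines of Java. This is a hedged sketch (helper names are made up, only * / + - are handled, and it emits postfix order rather than an AST, which is the other common output of shunting yard):

```java
import java.util.*;

// Classic shunting yard: primaries go straight to output; operators wait on
// a stack until an operator of lower precedence shows up.
public class ShuntingYard {
    static int precedence(Object tok) {
        if ("*".equals(tok) || "/".equals(tok)) return 14;
        if ("+".equals(tok) || "-".equals(tok)) return 13;
        return 0; // not an operator
    }

    // tokens: numbers and operator strings, e.g. [1, "+", 2, "*", 3]
    static List<Object> toPostfix(List<?> tokens) {
        Deque<Object> stack = new ArrayDeque<>();
        List<Object> out = new ArrayList<>();
        for (Object tok : tokens) {
            if (precedence(tok) == 0) {           // primary
                out.add(tok);
            } else {                              // operator
                // pop anything that binds at least as tightly (left-assoc)
                while (!stack.isEmpty() && precedence(stack.peek()) >= precedence(tok)) {
                    out.add(stack.pop());
                }
                stack.push(tok);
            }
        }
        while (!stack.isEmpty()) out.add(stack.pop());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(toPostfix(List.of(1, "+", 2, "*", 3))); // [1, 2, 3, *, +]
    }
}
```

The postfix output [1, 2, 3, *, +] encodes the same precedence decision the JS version makes when it pushes the '*' node deeper into the tree.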

Xtext parse of context-dependent grammars

I have the grammar
Model:
vars+=Vars*
funcs+=Funcs*;
Name:
name=ID;
VarName:
Name;
FuncName:
Name;
Funcs:
'func' left=FuncName (bracket?='(' ')')? '=' right=[Name]';';
Vars:
'var' VarName ';';
where the right-hand side of the Funcs rule can be either of type VarName or FuncName, depending on whether the brackets on the left-hand side appear.
Must I modify the Xtext grammar, or do a type of validation/scoping?
Update 1
the scope function:
override getScope(EObject context, EReference reference) {
if (context instanceof Funcs) {
val func = context as Funcs
if (reference == MultiNameDslPackage.Literals.FUNCS__RIGHT) {
if (func.bracket) {
val rootElement = EcoreUtil2.getRootContainer(context)
val candidates = EcoreUtil2.getAllContentsOfType(rootElement, VarName)
return Scopes.scopeFor(candidates)
} else {
val rootElement = EcoreUtil2.getRootContainer(context)
val candidates = EcoreUtil2.getAllContentsOfType(rootElement, FuncName)
return Scopes.scopeFor(candidates)
}
}
return super.getScope(context, reference);
}
}
The left-hand side is independent of the presence of the brackets in the editor.
Update 2
Using validation
@Check
def checkFuncContext(Funcs func) {
if (func.bracket) {
if (!(func.right instanceof VarName)) {
warning("Right-hand side must be of Var type",
MultiNameDslPackage.Literals.FUNCS__RIGHT
)
}
} else {
if (!(func.right instanceof FuncName)) {
warning("Right-hand side must be of Function type",
MultiNameDslPackage.Literals.FUNCS__RIGHT
)
}
}
}
The warning statements are not executed. The expression func.right instanceof FuncName does not behave as expected.
How can I test for the correct instance?
Update 3
Using a modified grammar
VarName:
name=ID;
FuncName:
name=ID;
Funcs:
'func' left=FuncName (bracket?='(' ')')? '=' (right=[FuncName] | r1=[VarName]) ';';
does not compile: Decision can match input such as "RULE_ID" using multiple alternatives: 1, 2
You need to change your grammar to get the inheritance order for Name, FuncName and VarName right (Name must be a super type of both).
Either use a parser fragment
fragment Name: name=ID;
Or use
Name:VarName|FuncName;
VarName: name=ID;
FuncName:name=ID;
