I have the grammar
Model:
    vars+=Vars*
    funcs+=Funcs*;

Name:
    name=ID;

VarName:
    Name;

FuncName:
    Name;

Funcs:
    'func' left=FuncName (bracket?='(' ')')? '=' right=[Name] ';';

Vars:
    'var' VarName ';';
where the right-hand side of the Funcs rule should be of type VarName or FuncName, depending on whether the brackets on the left-hand side appear.
Must I modify the Xtext grammar, or can this be handled with validation/scoping?
Update 1
The scope function:
override getScope(EObject context, EReference reference) {
    if (context instanceof Funcs) {
        val func = context as Funcs
        if (reference == MultiNameDslPackage.Literals.FUNCS__RIGHT) {
            val rootElement = EcoreUtil2.getRootContainer(context)
            if (func.bracket) {
                return Scopes.scopeFor(EcoreUtil2.getAllContentsOfType(rootElement, VarName))
            } else {
                return Scopes.scopeFor(EcoreUtil2.getAllContentsOfType(rootElement, FuncName))
            }
        }
    }
    return super.getScope(context, reference)
}
The left-hand side is independent of the presence of the brackets in the editor.
Update 2
Using validation
@Check
def checkFuncContext(Funcs func) {
    if (func.bracket) {
        if (!(func.right instanceof VarName)) {
            warning("Right-hand side must be of Var type",
                MultiNameDslPackage.Literals.FUNCS__RIGHT)
        }
    } else {
        if (!(func.right instanceof FuncName)) {
            warning("Right-hand side must be of Function type",
                MultiNameDslPackage.Literals.FUNCS__RIGHT)
        }
    }
}
The warning statements are never executed: the test func.right instanceof FuncName does not behave as expected.
How can I test for the correct instance?
Update 3
Using a modified grammar
VarName:
    name=ID;

FuncName:
    name=ID;

Funcs:
    'func' left=FuncName (bracket?='(' ')')? '=' (right=[FuncName] | r1=[VarName]) ';';
does not compile: Decision can match input such as "RULE_ID" using multiple alternatives: 1, 2
You need to change your grammar to get the inheritance order for Name, FuncName and VarName right (Name must be a supertype of both).
Either use a parser fragment:

fragment Name: name=ID;

or use:

Name: VarName | FuncName;
VarName: name=ID;
FuncName: name=ID;
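For illustration, here is a minimal sketch of the original grammar with the second option applied. Name is now a pure supertype, so VarName and FuncName get their own EClasses, and both the scope provider from Update 1 and the instanceof checks from Update 2 behave as expected:

Model:
    vars+=Vars*
    funcs+=Funcs*;

Name:
    VarName | FuncName;

VarName:
    name=ID;

FuncName:
    name=ID;

Funcs:
    'func' left=FuncName (bracket?='(' ')')? '=' right=[Name] ';';

Vars:
    'var' VarName ';';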
Related
Let's say I have logical expressions in this form
((a OR b) AND c) OR (d AND e = (f OR g))
I'm interested in getting a, b, c, d and ignoring e, f, g (everything that has to do with assignments).
My attempt was something like this:
infixOp : OR | AND ;

assignment : Identifier EQUALS expression ;

expression
 : Identifier
 | assignment
 | expression infixOp expression
 | LEFTBRACKET expression RIGHTBRACKET
 ;
Then, in my listener's enterExpression(), I just printed expressionContext.getTokens(identifierType). Of course, this is too much and I got all of them.
How could I "skip" over an assignment expression? Could this be done in the grammar, or can it only be done programmatically?
What you can do is create a small listener and keep track of when you enter and exit an assignment. Then, inside the enterExpression method, check that you're NOT inside an assignment AND the Identifier token has a value.
A quick demo for the grammar:
grammar T;

parse
 : expression EOF
 ;

expression
 : '(' expression ')'
 | expression 'AND' expression
 | expression 'OR' expression
 | assignment
 | Identifier
 ;

assignment
 : Identifier '=' expression
 ;

Identifier : [a-z]+;
Space      : [ \t\r\n] -> skip;
and the Java class:
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class Main {
    public static void main(String[] args) {
        TLexer lexer = new TLexer(CharStreams.fromString("((a OR b) AND c) OR (d AND e = (f OR g))"));
        TParser parser = new TParser(new CommonTokenStream(lexer));
        MyListener listener = new MyListener();
        ParseTreeWalker.DEFAULT.walk(listener, parser.parse());
        System.out.println(listener.ids);
    }
}
import java.util.ArrayList;
import java.util.List;

class MyListener extends TBaseListener {

    public List<String> ids = new ArrayList<String>();
    private boolean inAssignment = false;

    @Override
    public void enterExpression(TParser.ExpressionContext ctx) {
        if (!this.inAssignment && ctx.Identifier() != null) {
            this.ids.add(ctx.Identifier().getText());
        }
    }

    @Override
    public void enterAssignment(TParser.AssignmentContext ctx) {
        this.inAssignment = true;
    }

    @Override
    public void exitAssignment(TParser.AssignmentContext ctx) {
        this.inAssignment = false;
    }
}
will print:
[a, b, c, d]
Maybe something like:

if (!(expressionContext.getParent() instanceof AssignmentContext)) {
    expressionContext.getTokens(identifierType);
}
You can find that by walking the parse tree structure and checking the different members in the expression context.
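Note that checking only the direct parent misses identifiers nested deeper inside an assignment. A minimal sketch of an ancestor walk, assuming the TParser generated from the grammar above:

// Walk up the context chain to check whether this expression sits
// anywhere inside an assignment, not just directly under one.
private static boolean insideAssignment(org.antlr.v4.runtime.RuleContext ctx) {
    for (org.antlr.v4.runtime.RuleContext p = ctx.getParent(); p != null; p = p.getParent()) {
        if (p instanceof TParser.AssignmentContext) {
            return true;
        }
    }
    return false;
}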
I've got more of my expression parser working (Dart PetitParser to get an AST data structure created with ExpressionBuilder). It appears to be generating accurate ASTs for floats, parens, power, multiply, divide, add, subtract, and unary negative in front of both numbers and expressions. (The nodes are either literal strings, or an object that has a precedence with a List payload that gets walked and concatenated.)
I'm stuck now on visiting the nodes. I have clean access to the top node (thanks to Lukas), but I'm stuck on deciding whether or not to add a paren. For example, in 20+30*40 we don't need parens around 30*40, and the parse tree correctly has the node for this closer to the root, so I'll hit it first during traversal. However, I don't seem to have enough data when looking at the 30*40 node to determine whether it needs parens before going on to the 20+. A very similar case would be (20+30)*40, which gets parsed correctly with 20+30 closer to the root, so once again, when visiting the 20+30 node I need to add parens before going on to *40.
This has to be a solved problem, but I never went to compiler school, so I know just enough about ASTs to be dangerous. What "a-ha" am I missing?
// rip-common.dart:
import 'package:petitparser/petitparser.dart';
// import 'package:petitparser/debug.dart';
class Node {
  int precedence;
  List<dynamic> args;

  Node([this.precedence = 0, this.args = const []]) {
    // nodeList.add(this);
  }

  @override
  String toString() => 'Node($precedence $args)';

  String visit([int fromPrecedence = -1]) {
    print('=== visiting $this ===');
    var buf = StringBuffer();
    var parens = (precedence > 0) &&
        (fromPrecedence > 0) &&
        (precedence < fromPrecedence);
    print('<$fromPrecedence $precedence $parens>');
    // for debugging:
    var curlyOpen = '';
    var curlyClose = '';
    buf.write(parens ? '(' : curlyOpen);
    for (var arg in args) {
      if (arg is Node) {
        buf.write(arg.visit(precedence));
      } else if (arg is String) {
        buf.write(arg);
      } else {
        print('not Node or String: $arg');
        buf.write('$arg');
      }
    }
    buf.write(parens ? ')' : curlyClose);
    print('$buf for buf');
    return '$buf';
  }
}
class RIPParser {
  Parser _make_parser() {
    final builder = ExpressionBuilder();
    var number = char('-').optional() &
        digit().plus() &
        (char('.') & digit().plus()).optional();

    // precedence 5
    builder.group()
      ..primitive(number.flatten().map((a) => Node(0, [a])))
      ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a]));

    // negation is a prefix operator
    // precedence 4
    builder.group()..prefix(char('-').trim(), (op, a) => Node(4, [op, a]));

    // power is right-associative
    // precedence 3
    builder.group()..right(char('^').trim(), (a, op, b) => Node(3, [a, op, b]));

    // multiplication and division are left-associative
    // precedence 2
    builder.group()
      ..left(char('*').trim(), (a, op, b) => Node(2, [a, op, b]))
      ..left(char('/').trim(), (a, op, b) => Node(2, [a, op, b]));

    // precedence 1
    builder.group()
      ..left(char('+').trim(), (a, op, b) => Node(1, [a, op, b]))
      ..left(char('-').trim(), (a, op, b) => Node(1, [a, op, b]));

    final parser = builder.build().end();
    return parser;
  }

  Result _result(String input) {
    var parser = _make_parser(); // eventually cache
    var result = parser.parse(input);
    return result;
  }

  String parse(String input) {
    var result = _result(input);
    if (result.isFailure) {
      return result.message;
    } else {
      print('result.value = ${result.value}');
      return '$result';
    }
  }

  String visit(String input) {
    var result = _result(input);
    var top_node = result.value; // result.isFailure ...
    return top_node.visit();
  }
}
// rip_cmd_example.dart
import 'dart:io';
import 'package:rip_common/rip_common.dart';

void main() {
  print('start');
  String input;
  while (true) {
    input = stdin.readLineSync();
    if (input.isEmpty) {
      break;
    }
    print(RIPParser().parse(input));
    print(RIPParser().visit(input));
  }
  print('done');
}
As you've observed, the ExpressionBuilder already assembles the tree in the right precedence order based on the operator groups you've specified.
This also happens for the wrapping parens node created here: ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a])). If I test for this node, I get back the input string for your example expressions: var parens = precedence == 0 && args.length == 1 && args[0] is Node;.
Unless I am missing something, there should be no reason for you to track the precedence manually. I would also recommend that you create different node classes for the different operators: ValueNode, ParensNode, NegNode, PowNode, MulNode, ... A bit verbose, but much easier to understand what is going on, if each of them can just visit (print, evaluate, optimize, ...) itself.
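For illustration, a minimal sketch of that class-per-operator idea (the class and method names here are hypothetical, not part of PetitParser):

// Each node knows its own precedence and how to print itself; a child is
// parenthesized only when it binds looser than the operator above it.
abstract class AstNode {
  int get precedence;
  String emit();
  String emitChild(int parentPrecedence) =>
      precedence < parentPrecedence ? '(${emit()})' : emit();
}

class ValueNode extends AstNode {
  final String text;
  ValueNode(this.text);
  @override
  int get precedence => 99; // literals never need parens
  @override
  String emit() => text;
}

class BinNode extends AstNode {
  final String op;
  final AstNode left, right;
  @override
  final int precedence;
  BinNode(this.op, this.precedence, this.left, this.right);
  @override
  String emit() =>
      '${left.emitChild(precedence)}$op${right.emitChild(precedence)}';
}

With the precedence numbers from the question, BinNode('+', 1, ValueNode('20'), BinNode('*', 2, ValueNode('30'), ValueNode('40'))).emit() yields 20+30*40, while the mirrored tree for (20+30)*40 gets its parens back. (Strictly, a left-associative operator also needs parens around a right child of equal precedence, as in a-(b-c); emitChild could use <= on that side.)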
Can the "memory" be avoided in the following Xtext scope provider for the field dot-notation of the following grammar, which is stock-standard except for the field dot-notation of the LocalRelation.
grammar test.path.Path with org.eclipse.xtext.common.Terminals

generate path "http://www.path.test/Path"

Model:
    (elements+=AbstractElement)*;

PackageDeclaration:
    'package' name=QualifiedName '{'
        (elements+=AbstractElement)*
    '}';

AbstractElement:
    PackageDeclaration | Entity | Import;

Import:
    'import' importedNamespace=QualifiedNameWithWildcard;

Entity:
    'entity' name=ID '{'
        relations+=Relation*
    '}';

Relation:
    GlobalRelation | LocalRelation;

GlobalRelation:
    ('id')? name=ID ':' ('*' | '+' | '?')? ref=[Entity|QualifiedName];

LocalRelation:
    ('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' head=[Relation|ID] ('.' tail+=[Relation|ID])*;

QualifiedNameWithWildcard:
    QualifiedName '.*'?;

QualifiedName:
    ID ('.' ID)*;
The following grammar instance demonstrates the LocalRelation dot-notation.
entity A {
    b : B
}

entity B {
    a : A
}

entity C {
    b : B
    x1 : this.b
    x2 : this.x1
    // x3 : this.x3 No self ref
    x4 : this.b.a
    x5 : this.x1.a
    x6 : this.x1.a.b.a.b.a.b.a
}

entity D {
    c : C
    x1 : this.c.b.a
}
The scoping resolution for GlobalRelations works out of the box, but of course the scoping for LocalRelations does not. I have come up with the following working scope provider, but it uses a global map to keep track of the dot-depth, together with special handling of the head to set the counter to zero, since one cannot sample the value of a reference before it is defined (that causes an infinite loop).
class PathScopeProvider extends AbstractPathScopeProvider {

    @Inject extension IQualifiedNameProvider

    override getScope(EObject context, EReference reference) {
        if (context instanceof LocalRelation) {
            return if (reference == PathPackage.Literals.LOCAL_RELATION__HEAD)
                getScopeLocalRelation_HEAD(context as LocalRelation, context.eContainer as Entity)
            else if (reference == PathPackage.Literals.LOCAL_RELATION__TAIL)
                getScopeLocalRelation_TAIL(context as LocalRelation, context.eContainer as Entity)
        }
        return super.getScope(context, reference);
    }

    def IScope getScopeLocalRelation_HEAD(LocalRelation contextLocalRelation,
            Entity contextLocalRelationContainingEntity) {
        // Don't touch contextLocalRelation.head nor contextLocalRelation.tail!
        val result = newArrayList
        contextLocalRelationContainingEntity.relations.filter [ target |
            target != contextLocalRelation
        ].forEach [ target |
            result.add(EObjectDescription.create(QualifiedName.create(target.name), target))
            resetDepth(contextLocalRelation)
        ]
        return new SimpleScope(IScope.NULLSCOPE, result)
    }

    def IScope getScopeLocalRelation_TAIL(LocalRelation contextLocalRelation,
            Entity contextLocalRelationContainingEntity) {
        // Note that head is well-defined, while tail is well-defined only up to the current depth
        val head = contextLocalRelation.head
        val result = newArrayList
        val depthSoFar = getDepth(contextLocalRelation)
        incDepth(contextLocalRelation)
        val targetSoFar = if (depthSoFar === 0) head else contextLocalRelation.tail.get(depthSoFar - 1)
        if (targetSoFar instanceof GlobalRelation) {
            targetSoFar.ref.relations.forEach [ t |
                result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
            ]
        } else if (targetSoFar instanceof LocalRelation) {
            var Relation i = targetSoFar as LocalRelation
            while (i instanceof LocalRelation) {
                i = if (i.tail.empty) i.head else i.tail.last
            }
            (i as GlobalRelation).ref.relations.forEach [ t |
                result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
            ]
        }
        return new SimpleScope(IScope.NULLSCOPE, result)
    }

    // DEPTH MEMORY
    val entity_relation__depthSoFar = new HashMap<String, Integer>()

    private def void resetDepth(LocalRelation r) {
        entity_relation__depthSoFar.put(r.fullyQualifiedName.toString, 0)
    }

    private def int getDepth(LocalRelation r) {
        entity_relation__depthSoFar.get(r.fullyQualifiedName.toString)
    }

    private def void incDepth(LocalRelation r) {
        entity_relation__depthSoFar.put(r.fullyQualifiedName.toString, getDepth(r) + 1)
    }
}
Can this additional depth memory be avoided in any way?
Is there some internal way of detecting the depth of the tail scoped so far?
I have experimented with a try-catch block, but that doesn't work and would be sloppy anyway.
The following solution is derived from Xtext and Dot/Path-Expressions, as suggested by Christian Dietrich; that is the example to use if you are looking to implement standard field dot-path scoping.
The grammar above requires a small change.
LocalRelation:
    ('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' path=Path;

Path:
    head=[Relation|ID] ({Path.curr=current} "." tail+=[Relation|ID])*;
The only change is {Path.curr=current}, and this is the "magic ingredient" that makes all the difference. The scope provider is now almost trivial, especially compared to the monster in the OP.
override getScope(EObject context, EReference reference) {
    if (context instanceof Path) {
        if (reference == PathPackage.Literals.PATH__HEAD) {
            // Filter self reference
            return new FilteringScope(super.getScope(context, reference), [ e |
                !Objects.equals(e.getEObjectOrProxy(), context)
            ])
        } else { // PATH__TAIL
            val target = context.curr.followTheDots
            if (target === null) {
                return IScope::NULLSCOPE
            }
            return new FilteringScope(super.getScope(target, reference), [ e |
                !Objects.equals(e.getEObjectOrProxy(), context)
            ])
        }
    }
    return super.getScope(context, reference);
}
def Entity followTheDots(Path path) {
    var targetRelation = if (path.tail === null || path.tail.empty) path.head else path.tail.last
    return if (targetRelation instanceof GlobalRelation)
        targetRelation.ref
    else if (targetRelation instanceof LocalRelation)
        targetRelation.path.followTheDots
    else
        null // happens when dot path is nonsense
}

def GlobalRelation followTheDots(LocalRelation exp) {
    var targetRelation = if (exp.tail === null || exp.tail.empty) exp.head else exp.tail.last
    return if (targetRelation instanceof GlobalRelation)
        targetRelation
    else if (targetRelation instanceof LocalRelation)
        targetRelation.followTheDots
    else
        null // happens when dot path is nonsense
}
While we cannot "touch" tail in the scope provider, the "magic ingredient" {Path.curr=current} gives the iterative scope provider full access to the previously resolved and well-defined reference via curr. Post scoping, the fully resolved and well-defined trail is available via head and tail.
[... Edited because original was wrong! ...]
The following canonical Xbase entities grammar (from Bettini's "Implementing Domain-Specific Languages with Xtext and Xtend") permits entities to extend any Java class. As the commented line indicates, I would like to grammatically force entities to inherit only from other entities.
grammar org.example.xbase.entities.Entities with org.eclipse.xtext.xbase.Xbase

generate entities "http://www.example.org/xbase/entities/Entities"

Model:
    importSection=XImportSection?
    entities+=Entity*;

Entity:
    'entity' name=ID ('extends' superType=JvmParameterizedTypeReference)? '{'
    // 'entity' name=ID ('extends' superType=[Entity|QualifiedName])? '{'
        attributes+=Attribute*
        constructors+=Constructor*
        operations+=Operation*
    '}';

Attribute:
    'attr' (type=JvmTypeReference)? name=ID ('=' initexpression=XExpression)? ';';

Operation:
    'op' (type=JvmTypeReference)? name=ID
    '(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
    body=XBlockExpression;

Constructor:
    'new' '(' (params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)? ')'
    body=XBlockExpression;
Here is a working JvmModelInferrer for the model above, again where the commented line (and the extra method) reflect my intention.
package org.example.xbase.entities.jvmmodel

import com.google.inject.Inject
import org.eclipse.xtext.common.types.JvmTypeReference
import org.eclipse.xtext.naming.IQualifiedNameProvider
import org.eclipse.xtext.xbase.jvmmodel.AbstractModelInferrer
import org.eclipse.xtext.xbase.jvmmodel.IJvmDeclaredTypeAcceptor
import org.eclipse.xtext.xbase.jvmmodel.JvmTypesBuilder
import org.example.xbase.entities.entities.Entity

class EntitiesJvmModelInferrer extends AbstractModelInferrer {

    @Inject extension JvmTypesBuilder
    @Inject extension IQualifiedNameProvider

    def dispatch void infer(Entity entity, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
        acceptor.accept(entity.toClass("entities." + entity.name)) [
            documentation = entity.documentation
            if (entity.superType !== null) {
                superTypes += entity.superType.cloneWithProxies
                //superTypes += entity.superType.jvmTypeReference.cloneWithProxies
            }
            entity.attributes.forEach [ a |
                val type = a.type ?: a.initexpression?.inferredType
                members += a.toField(a.name, type) [
                    documentation = a.documentation
                    if (a.initexpression !== null)
                        initializer = a.initexpression
                ]
                members += a.toGetter(a.name, type)
                members += a.toSetter(a.name, type)
            ]
            entity.operations.forEach [ op |
                members += op.toMethod(op.name, op.type ?: inferredType) [
                    documentation = op.documentation
                    for (p : op.params) {
                        parameters += p.toParameter(p.name, p.parameterType)
                    }
                    body = op.body
                ]
            ]
            entity.constructors.forEach [ con |
                members += entity.toConstructor [
                    for (p : con.params) {
                        parameters += p.toParameter(p.name, p.parameterType)
                    }
                    body = con.body
                ]
            ]
        ]
    }

    def JvmTypeReference getJvmTypeReference(Entity e) {
        e.toClass(e.fullyQualifiedName).typeRef
    }
}
The following simple instance parses and infers perfectly (with the comments in place).
entity A {
    attr String y;
    new(String y) {
        this.y = y
    }
}

entity B extends A {
    new() {
        super("Hello World!")
    }
}
If, however, I uncomment the commented lines (and comment out the corresponding lines) in both the grammar and the inferrer, and regenerate, the above instance no longer parses. The message is "The method super(String) is undefined".
I understand how to leave the inheritance "loose" and restrict it using validators, etc., but I would far prefer to strongly type this into the model.
I am lost as to how to solve this, as I am not sure where things are breaking, given the roles of Xbase and the JvmModelInferrer. A pointer (or reference) would suffice.
[... I am able to implement all the scoping issues for a non-xbase version of this grammar ...]
This won't work. You either have to leave the grammar as is and customize the proposal provider and validation, or you have to use "f.q.n.o.y.Entity".typeRef. You can use NodeModelUtils to read the FQN, or try something like ("entities." + entity.superType.name).typeRef.
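For illustration, a minimal sketch of the last suggestion inside the inferrer above, assuming the grammar variant with superType=[Entity|QualifiedName] and the same "entities." prefix the inferrer already passes to toClass:

if (entity.superType !== null) {
    // Build the reference from the inferred class's qualified name instead of
    // cloning a JvmParameterizedTypeReference; typeRef(String) comes from the
    // JvmTypeReferenceBuilder that AbstractModelInferrer makes available.
    superTypes += ("entities." + entity.superType.name).typeRef
}

The intent is that entity B extends A then compiles against the inferred entities.A class, so super("Hello World!") can resolve.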
I'm using pegjs to define a grammar that allows new types to be defined. How do I then recognize those types after their definition? I have a production that defines the built-in types, e.g.
BuiltInType
  = "int"
  / "float"
  / "string"
  / TYPE_NAME
But what do I do for the last one? I don't know what strings will be type names until they are defined in the source code.
In the traditional way of parsing, where there is both a lexer and a parser, the parser would add the type name to a table, and the lexer would use this table to determine whether to return TYPE_NAME or IDENTIFIER for a particular token. But pegjs does not have this separation.
You're right: you cannot (easily) modify the pegjs-generated parser on the fly without knowing a lot about its internals. But what you lose from a standard LALR, you gain by interspersing JavaScript code throughout the parser rules themselves.
To accomplish your goal, you'll need to recognize new types (in context) and keep them for use later, as in:
{
  // predefined types
  const types = {'int':true, 'float':true, 'string':true}
  // variable storage
  const vars = {}
}

start = statement statement* {
  console.log(JSON.stringify({types:types, vars:vars}, null, 2))
}

statement
  = WS* typedef EOL
  / WS* vardef EOL

typedef "new type definition"  // eg. 'define myNewType'
  = 'define' SP+ type:symbol {
      if (types[type]) {
        throw `attempted redefinition of: "${type}"`
      }
      types[type] = true
    }

// And then, when you need to recognize a type, something like:

vardef "variable declaration"  // eg: 'let foo:myNewType=10'
  = 'let' SP+ name:symbol COLON type:symbol SP* value:decl_assign? {
      if (!types[type]) {
        throw `unknown type encountered: ${type}`
      }
      vars[name] = { name: name, type: type, value: value }
    }

decl_assign "variable declaration assignment"
  = '=' SP* value:number {
      return value
    }

symbol = $( [a-zA-Z][a-zA-Z0-9]* )
number = $( ('+' / '-')? [1-9][0-9]* ( '.' [0-9]+ )? )

COLON = ':'
SP    = [ \t]
WS    = [ \t\n]
EOL   = '\n'
which, when asked to parse:
define fooType
let bar:fooType = 1
will print:
{
  "types": {
    "int": true,
    "float": true,
    "string": true,
    "fooType": true
  },
  "vars": {
    "bar": {
      "name": "bar",
      "type": "fooType",
      "value": "1"
    }
  }
}
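One refinement worth noting: throwing a bare string from an action aborts the parse without any position information. PEG.js also exposes an error(message) helper inside actions that raises a proper syntax error carrying the current location, which would give friendlier diagnostics for the redefinition and unknown-type cases above.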