Skipping sub-expressions in listener - parsing

Let's say I have logical expressions in this form
((a OR b) AND c) OR (d AND e = (f OR g))
I'm interested in getting a, b, c, d and ignore e, f, g (everything that has to do with assignments).
My attempt was something like this
infixOp : OR | AND ;
assignment : Identifier EQUALS expression ;
expression :
Identifier |
assignment |
expression infixOp expression |
LEFTBRACKET expression RIGHTBRACKET ;
Then in my listener in enterExpression() I just printed expressionContext.getTokens(identifierType). Of course, this is too much and I got all of them.
How could "skip" over an assignment expression? Could this be done in the grammar? Or it can only be done programatically?

What you can do is create a small listener and keep track of the fact when you enter- and exit an assignment. Then inside the enterExpression method, check if you're NOT inside an assignment AND the Identifier token has a value.
A quick demo for the grammar:
grammar T;
parse
: expression EOF
;
expression
: '(' expression ')'
| expression 'AND' expression
| expression 'OR' expression
| assignment
| Identifier
;
assignment
: Identifier '=' expression
;
Identifier : [a-z]+;
Space : [ \t\r\n] -> skip;
and Java class:
public class Main {
public static void main(String[] args) {
TLexer lexer = new TLexer(CharStreams.fromString("((a OR b) AND c) OR (d AND e = (f OR g))"));
TParser parser = new TParser(new CommonTokenStream(lexer));
MyListener listener = new MyListener();
ParseTreeWalker.DEFAULT.walk(listener, parser.parse());
System.out.println(listener.ids);
}
}
class MyListener extends TBaseListener {
public List<String> ids = new ArrayList<String>();
private boolean inAssignment = false;
#Override
public void enterExpression(TParser.ExpressionContext ctx) {
if (!this.inAssignment && ctx.Identifier() != null) {
this.ids.add(ctx.Identifier().getText());
}
}
#Override
public void enterAssignment(TParser.AssignmentContext ctx) {
this.inAssignment = true;
}
#Override
public void exitAssignment(TParser.AssignmentContext ctx) {
this.inAssignment = false;
}
}
will print:
[a, b, c, d]

Maybe something like:
if (!(expressionContext.parent() instanceof AssignmentContext)) {
expressionContext.getTokens(identifierType);
}
You can fine that by walking the parse tree structure and check the different members in the expression context.

Related

Building expression parser with Dart petitparser, getting stuck on node visitor

I've got more of my expression parser working (Dart PetitParser to get at AST datastructure created with ExpressionBuilder). It appears to be generating accurate ASTs for floats, parens, power, multiply, divide, add, subtract, unary negative in front of both numbers and expressions. (The nodes are either literal strings, or an object that has a precedence with a List payload that gets walked and concatenated.)
I'm stuck now on visiting the nodes. I have clean access to the top node (thanks to Lukas), but I'm stuck on deciding whether or not to add a paren. For example, in 20+30*40, we don't need parens around 30*40, and the parse tree correctly has the node for this closer to the root so I'll hit it first during traversal. However, I don't seem to have enough data when looking at the 30*40 node to determine if it needs parens before going on to the 20+.. A very similar case would be (20+30)*40, which gets parsed correctly with 20+30 closer to the root, so once again, when visiting the 20+30 node I need to add parens before going on to *40.
This has to be a solved problem, but I never went to compiler school, so I know just enough about ASTs to be dangerous. What "a ha" am I missing?
// rip-common.dart:
import 'package:petitparser/petitparser.dart';
// import 'package:petitparser/debug.dart';
class Node {
int precedence;
List<dynamic> args;
Node([this.precedence = 0, this.args = const []]) {
// nodeList.add(this);
}
#override
String toString() => 'Node($precedence $args)';
String visit([int fromPrecedence = -1]) {
print('=== visiting $this ===');
var buf = StringBuffer();
var parens = (precedence > 0) &&
(fromPrecedence > 0) &&
(precedence < fromPrecedence);
print('<$fromPrecedence $precedence $parens>');
// for debugging:
var curlyOpen = '';
var curlyClose = '';
buf.write(parens ? '(' : curlyOpen);
for (var arg in args) {
if (arg is Node) {
buf.write(arg.visit(precedence));
} else if (arg is String) {
buf.write(arg);
} else {
print('not Node or String: $arg');
buf.write('$arg');
}
}
buf.write(parens ? ')' : curlyClose);
print('$buf for buf');
return '$buf';
}
}
class RIPParser {
Parser _make_parser() {
final builder = ExpressionBuilder();
var number = char('-').optional() &
digit().plus() &
(char('.') & digit().plus()).optional();
// precedence 5
builder.group()
..primitive(number.flatten().map((a) => Node(0, [a])))
..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a]));
// negation is a prefix operator
// precedence 4
builder.group()..prefix(char('-').trim(), (op, a) => Node(4, [op, a]));
// power is right-associative
// precedence 3
builder.group()..right(char('^').trim(), (a, op, b) => Node(3, [a, op, b]));
// multiplication and addition are left-associative
// precedence 2
builder.group()
..left(char('*').trim(), (a, op, b) => Node(2, [a, op, b]))
..left(char('/').trim(), (a, op, b) => Node(2, [a, op, b]));
// precedence 1
builder.group()
..left(char('+').trim(), (a, op, b) => Node(1, [a, op, b]))
..left(char('-').trim(), (a, op, b) => Node(1, [a, op, b]));
final parser = builder.build().end();
return parser;
}
Result _result(String input) {
var parser = _make_parser(); // eventually cache
var result = parser.parse(input);
return result;
}
String parse(String input) {
var result = _result(input);
if (result.isFailure) {
return result.message;
} else {
print('result.value = ${result.value}');
return '$result';
}
}
String visit(String input) {
var result = _result(input);
var top_node = result.value; // result.isFailure ...
return top_node.visit();
}
}
// rip_cmd_example.dart
import 'dart:io';
import 'package:rip_common/rip_common.dart';
void main() {
print('start');
String input;
while (true) {
input = stdin.readLineSync();
if (input.isEmpty) {
break;
}
print(RIPParser().parse(input));
print(RIPParser().visit(input));
}
;
print('done');
}
As you've observed, the ExpressionBuilder already assembles the tree in the right precedence order based on the operator groups you've specified.
This also happens for the wrapping parens node created here: ..wrapper(char('('), char(')'), (l, a, r) => Node(0, [a])). If I test for this node, I get back the input string for your example expressions: var parens = precedence == 0 && args.length == 1 && args[0] is Node;.
Unless I am missing something, there should be no reason for you to track the precedence manually. I would also recommend that you create different node classes for the different operators: ValueNode, ParensNode, NegNode, PowNode, MulNode, ... A bit verbose, but much easier to understand what is going on, if each of them can just visit (print, evaluate, optimize, ...) itself.

Breaking head over how to get position of token with a rule - ANTLR4 / grammar

I'm writing a little grammar using ANLTR, and I have a rule like this:
operation : OPERATION (IDENT | EXPR) ',' (IDENT | EXPR);
...
OPERATION : 'ADD' | 'SUB' | 'MUL' | 'DIV' ;
IDENT : [a-z]+;
EXPR : INTEGER | FLOAT;
INTEGER : [0-9]+ | '-'[0-9]+
FLOAT : [0-9]+'.'[0-9]+ | '-'[0-9]+'.'[0-9]+
Now in the listener inside Java, how do I determine in the case of such a scenario where an operation consist of both IDENT and EXPR the order in which they appear?
Obviously the rule can match both
ADD 10, d
or
ADD d, 10
But in the listener for the rule, generated by ANTLR4, if there is both IDENT() and EXPR() how to get their order, since I want to assign the left and right operands correctly.
Been breaking my head over this, is there any simple way or should I rewrite the rule itself? The ctx.getTokens () requires me to give the token type, which kind of defeats the purpose, since I cannot get the sequence of the tokens in the rule, if I specify their type.
You can do it like this:
operation : OPERATION lhs=(IDENT | EXPR) ',' rhs=(IDENT | EXPR);
and then inside your listener, do this:
#Override
public void enterOperation(TParser.OperationContext ctx) {
if (ctx.lhs.getType() == TParser.IDENT) {
// left hand side is an identifier
} else {
// left hand side is an expression
}
// check `rhs` the same way
}
where TParser comes from the grammar file T.g4. Change this accordingly.
Another solution would be something like this:
operation
: OPERATION ident_or_expr ',' ident_or_expr
;
ident_or_expr
: IDENT
| EXPR
;
and then in your listener:
#Override
public void enterOperation(TParser.OperationContext ctx) {
Double lhs = findValueFor(ctx.ident_or_expr().get(0));
Double rhs = findValueFor(ctx.ident_or_expr().get(1));
...
}
private Double findValueFor(TParser.Ident_or_exprContext ctx) {
if (ctx.IDENT() != null) {
// it's an identifier
} else {
// it's an expression
}
}

Xtext field dot-notation scope provider

Can the "memory" be avoided in the following Xtext scope provider for the field dot-notation of the following grammar, which is stock-standard except for the field dot-notation of the LocalRelation.
grammar test.path.Path with org.eclipse.xtext.common.Terminals
generate path "http://www.path.test/Path"
Model:
(elements+=AbstractElement)*;
PackageDeclaration:
'package' name=QualifiedName '{'
(elements+=AbstractElement)*
'}';
AbstractElement:
PackageDeclaration | Entity | Import;
Import:
'import' importedNamespace=QualifiedNameWithWildcard;
Entity:
'entity' name=ID '{'
relations+=Relation*
'}';
Relation:
GlobalRelation | LocalRelation;
GlobalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? ref=[Entity|QualifiedName];
LocalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' head=[Relation|ID] ('.' tail+=[Relation|ID])*;
QualifiedNameWithWildcard:
QualifiedName '.*'?;
QualifiedName:
ID ('.' ID)*;
The follow grammar instance demonstrates the LocalRelation dot-notion.
entity A {
b : B
}
entity B {
a : A
}
entity C {
b : B
x1 : this.b
x2 : this.x1
// x3 : this.x3 No self ref
x4 : this.b.a
x5 : this.x1.a
x6 : this.x1.a.b.a.b.a.b.a
}
entity D {
c : C
x1 : this.c.b.a
}
The scoping resolution for GlobalRelations works out of the box, but of course the scoping for LocalRelations does not. I have come up with the following working scope provider, but it uses a global map to keep track of dot-depth, together with a special head to set the counter to zero, since one cannot sample the value of a reference before it is defined as that causes an infinite loop.
class PathScopeProvider extends AbstractPathScopeProvider {
#Inject extension IQualifiedNameProvider
override getScope(EObject context, EReference reference) {
if (context instanceof LocalRelation) {
return if (reference == PathPackage.Literals.LOCAL_RELATION__HEAD)
getScopeLocalRelation_HEAD(context as LocalRelation, context.eContainer as Entity)
else if (reference == PathPackage.Literals.LOCAL_RELATION__TAIL)
getScopeLocalRelation_TAIL(context as LocalRelation, context.eContainer as Entity)
}
return super.getScope(context, reference);
}
def IScope getScopeLocalRelation_HEAD(LocalRelation contextLocalRelation,
Entity contextLocalRelationContainingEntity) {
// Don't touch contextLocalRelation.head not contextLocalRelation.tail!
val result = newArrayList
contextLocalRelationContainingEntity.relations.filter(
target |
target != contextLocalRelation
).forEach [ target |
{
result.add(EObjectDescription.create(QualifiedName.create(target.name), target))
resetDepth(contextLocalRelation)
}
]
return new SimpleScope(IScope.NULLSCOPE, result)
}
def IScope getScopeLocalRelation_TAIL(LocalRelation contextLocalRelation,
Entity contextLocalRelationContainingEntity) {
// Note that head is well-defined, while tail is well-defined up to depth
val head = contextLocalRelation.head
val result = newArrayList
val depthSoFar = getDepth(contextLocalRelation)
incDepth(contextLocalRelation)
val targetSoFar = if(depthSoFar === 0) head else contextLocalRelation.tail.get(depthSoFar - 1)
if (targetSoFar instanceof GlobalRelation) {
val targetSoFar_Global = targetSoFar as GlobalRelation
targetSoFar_Global.ref.relations.forEach [ t |
result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
]
} else if (targetSoFar instanceof LocalRelation) {
var Relation i = targetSoFar as LocalRelation
while (i instanceof LocalRelation) {
i = if(i.tail.empty) i.head else i.tail.last
}
(i as GlobalRelation).ref.relations.forEach [ t |
result.add(EObjectDescription.create(QualifiedName.create(t.name), t))
]
}
return new SimpleScope(IScope.NULLSCOPE, result)
}
// DEPTH MEMORY
val enity_relation__depthSoFar = new HashMap<String, Integer>()
private def void resetDepth(LocalRelation r) {
enity_relation__depthSoFar.put(r.fullyQualifiedName.toString, 0)
}
private def int getDepth(LocalRelation r) {
enity_relation__depthSoFar.get(r.fullyQualifiedName.toString)
}
private def int incDepth(LocalRelation r) {
enity_relation__depthSoFar.put(r.fullyQualifiedName.toString, getDepth(r) + 1)
}
}
Can this additional depth memory be avoided in any way?
Is there some internal way of detecting the depth of the tail so far scoped?
I have experimented with a try-catch block but that doesn't work, and would be sloppy anyway.
The following solution is derived from Xtext and Dot/Path-Expressions, as suggested by Christian Dietrich, and that is the example to use if you are looking to implement standard field dot path scoping.
The grammar above requires a small change.
LocalRelation:
('id')? name=ID ':' ('*' | '+' | '?')? 'this' '.' path=Path;
Path: head=[Relation|ID] ({Path.curr=current} "." tail+=[Relation|ID])*;
The only change is {Path.curr=current} and this is the "magic ingredient" that makes all the difference. The scope provider is now almost trivial, especially compared to the monster in the OP.
override getScope(EObject context, EReference reference) {
if (context instanceof Path) {
if (reference == PathPackage.Literals.PATH__HEAD) {
// Filter self reference
return new FilteringScope(super.getScope(context, reference), [ e |
!Objects.equals(e.getEObjectOrProxy(), context)
]);
} else { // DOT_EXPRESSION__TAIL
val target = context.curr.followTheDots
if (target === null) {
return IScope::NULLSCOPE
}
return new FilteringScope(super.getScope(target, reference), [ e |
!Objects.equals(e.getEObjectOrProxy(), context)
]);
}
}
return super.getScope(context, reference);
}
def Entity followTheDots(Path path) {
var targetRelation = if(path.tail === null || path.tail.empty) path.head else path.tail.last;
return if (targetRelation instanceof GlobalRelation)
targetRelation.ref
else if (targetRelation instanceof LocalRelation)
targetRelation.path.followTheDots
else null // happens when dot path is nonsense
}
def GlobalRelation followTheDots(LocalRelation exp) {
var targetRelation = if(exp.tail === null || exp.tail.empty) exp.head else exp.tail.last;
return if (targetRelation instanceof GlobalRelation)
targetRelation
else if (targetRelation instanceof LocalRelation)
targetRelation.followTheDots
else null // happens when dot path is nonsense
}
While we cannot "touch" tail in the scope provider, the "magic ingredient" {LocalRelation.curr=current} allows the iterative scope provider full access to the previously fully resolved and well-defined reference via curr. Post scoping, the full resolved and well-defined trail is available via head and tail.
[... Edited because original was wrong! ...]

xtext prase of context dependant grammars

I have the grammar
Model:
vars+=Vars*
funcs+=Funcs*;
Name:
name=ID;
VarName:
Name;
FuncName:
Name;
Funcs:
'func' left=FuncName (bracket?='(' ')')? '=' right=[Name]';';
Vars:
'var' VarName ';';
where the right hand size of the Func rule can be either of type VarName or FuncName depending is the brackets on the left hand size appear.
Must I modify the xtext grammar or do a type of validation/scoping?
Update 1
the scope function:
override getScope(EObject context, EReference reference) {
if (context instanceof Funcs) {
val func = context as Funcs
if (reference == MultiNameDslPackage.Literals.FUNCS__RIGHT) {
if (func.bracket) {
val rootElement = EcoreUtil2.getRootContainer(context)
val candidates = EcoreUtil2.getAllContentsOfType(rootElement, VarName)
return Scopes.scopeFor(candidates)
} else {
val rootElement = EcoreUtil2.getRootContainer(context)
val candidates = EcoreUtil2.getAllContentsOfType(rootElement, FuncName)
return Scopes.scopeFor(candidates)
}
}
return super.getScope(context, reference);
}
}
The left hand size is independent of the presence of the brackets in the editor.
Update 2
Using validation
#Check
def checkFuncContext(Funcs func) {
if (func.bracket) {
if (!(func.right instanceof VarName)) {
warning("Right hand size must be of Var type",
MultiNameDslPackage.Literals.FUNCS__RIGHT
)
}
} else {
if (!(func.right instanceof FuncName)) {
warning("Right hand size must be of Function type",
MultiNameDslPackage.Literals.FUNCS__RIGHT
)
}
}
}
The warning statements are not executed. The statement func.right instanceof FuncName) does not behave as expected.
How can I test for the correct instance?
Update 3
Using a modified grammar
VarName:
name=ID;
FuncName:
name=ID;
Funcs:
'func' left=FuncName (bracket?='(' ')')? '=' (right=[FuncName] | r1=[VarName]) ';';
does not compile: Decision can match input such as "RULE_ID" using multiple alternatives: 1, 2
You need to change your grammar to get the inheritance order for Name, FuncName and VarName right (Name super type of both)
Either use a parser fragment
fragment Name: name=ID;
Or use
Name:VarName|FuncName;
VarName: name=ID;
FuncName:name=ID;

Parse stacked comparison expression into logical Conjunction Tree with antlr3

I have run into a problem, when i tried to parse a stacked arithmetic comparison expression:
"1<2<3<4<5"
into a logical Tree of Conjunctions:
CONJUNCTION(COMPARISON(1,2,<) COMPARISON(2,3,<) COMPARISON(3,4,<) COMPARISON(4,5,<))
Is there a way in Antlr3 Tree Rewrite rules to iterate through matched tokens and create the result Tree from them in the target language (I'm using java)? So i could make COMPARISON nodes from element x, x-1 of matched 'addition' tokens. I know i can reference the last result of a rule but that way i'd only get nested COMPARISON rules, that's not what i wish for.
/This is how i approached the problem, sadly it doesn't do what i would like to do yet of course.
fragment COMPARISON:;
operator
:
('<'|'>'|'<='|'>='|'=='|'!=')
;
comparison
#init{boolean secondpart = false;}
:
e=addition (operator {secondpart=true;} k=addition)*
-> {secondpart}? ^(COMPARISON ^(VALUES addition*) ^(OPERATORS operator*))
-> $e
;
//Right now what this does is:
tree=(COMPARISON (VALUES (INTEGERVALUE (VALUE 1)) (INTEGERVALUE (VALUE 2)) (INTEGERVALUE (VALUE 3)) (INTEGERVALUE (VALUE 4)) (INTEGERVALUE (VALUE 5))) (OPERATORS < < < <))
//The label for the CONJUNCTION TreeNode that i would like to use:
fragment CONJUNCTION:;
I came up with a nasty solution to this problem by writing actual tree building java code:
grammar testgrammarforcomparison;
options {
language = Java;
output = AST;
}
tokens
{
CONJUNCTION;
COMPARISON;
OPERATOR;
ADDITION;
}
WS
:
('\t' | '\f' | ' ' | '\r' | '\n' )+
{$channel = HIDDEN;}
;
comparison
#init
{
List<Object> additions = new ArrayList<Object>();
List<Object> operators = new ArrayList<Object>();
boolean secondpart = false;
}
:
(( e=addition {additions.add(e.getTree());} ) ( op=operator k=addition {additions.add(k.getTree()); operators.add(op.getTree()); secondpart = true;} )*)
{
if(secondpart)
{
root_0 = (Object)adaptor.nil();
Object root_1 = (Object)adaptor.nil();
root_1 = (Object)adaptor.becomeRoot(
(Object)adaptor.create(CONJUNCTION, "CONJUNCTION")
, root_1);
Object lastaddition = additions.get(0);
for(int i=1;i<additions.size();i++)
{
Object root_2 = (Object)adaptor.nil();
root_2 = (Object)adaptor.becomeRoot(
(Object)adaptor.create(COMPARISON, "COMPARISON")
, root_2);
adaptor.addChild(root_2, additions.get(i-1));
adaptor.addChild(root_2, operators.get(i-1));
adaptor.addChild(root_2, additions.get(i));
adaptor.addChild(root_1, root_2);
}
adaptor.addChild(root_0, root_1);
}
else
{
root_0 = (Object)adaptor.nil();
adaptor.addChild(root_0, e.getTree());
}
}
;
/** lowercase letters */
fragment LOWCHAR
: 'a'..'z';
/** uppercase letters */
fragment HIGHCHAR
: 'A'..'Z';
/** numbers */
fragment DIGIT
: '0'..'9';
fragment LETTER
: LOWCHAR
| HIGHCHAR
;
IDENTIFIER
:
LETTER (LETTER|DIGIT)*
;
addition
:
IDENTIFIER ->^(ADDITION IDENTIFIER)
;
operator
:
('<'|'>') ->^(OPERATOR '<'* '>'*)
;
parse
:
comparison EOF
;
For input
"DATA1 < DATA2 > DATA3"
This outputs tree such as:
If you guys know any better solutions, please tell me about them

Resources