goto statement in antlr? - parsing

I would really appreciate if someone could give me advice,or point me to tutorial, or sample implementation, anything that could help me implement basic goto statement in ANTLR?
Thanks for any help
edit. ver2 of question:
Say I have this tree structure:
(BLOCK (PRINT 1) (PRINT 2) (PRINT 3) (PRINT 4) )
Now, I'm interested to know is there a way to
select, say, node (PRINT 2) and all nodes that follow
that node ((PRINT 2) (PRINT 3) (PRINT 4)) ?
I'm asking this because I'm trying to implement
basic goto mechanism.
I have print statement like this:
i=LABEL print
{interpreter.store($i.text, $print.tree);} //stores in hash table
-> print
However $print.tree just ignores later nodes,
so in input:
label: print 1
print 2
goto label
would print 121!
(What I would like is infinite loop 1212...)
I've also tried taking token
address of print statement with
getTokenStartIndex() and setting
roots node with setTokenStartIndex
but that just looped whatever was first node over and over.
My question is, how does one implement goto statement in antlr ?
Maybe my approach is wrong, as I have overlooked something?
I would really appreciate any help.
ps. even more detail, it is related to pattern 25 - Language Implementation patterns, I'm trying to add on to examples from that pattern.
Also, I've searched quite a bit on the web, looks like it is very hard to find goto example

... anything that could help me implement basic goto statement in ANTLR?
Note that it isn't ANTLR that implements this. With ANTLR you merely describe the language you want to parse to get a lexer, parser and possibly a tree-walker. After that, it's up to you to manipulate the tree and evaluate it.
Here's a possible way. Please don't look too closely at the code. It's a quick hack: there's a bit of code-duplication and I'm am passing package protected variables around which isn't as it should be done. The grammar also dictates you to start your input source with a label, but this is just a small demo of how you could solve it.
You need the following files:
Goto.g - the combined grammar file
GotoWalker.g - the tree walker grammar file
Main.java - the main class including the Node-model classes of the language
test.goto - the test input source file
antlr-3.3.jar - the ANTLR JAR (could also be another 3.x version)
Goto.g
grammar Goto;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
#members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
parse
: block EOF -> block
;
block
: ID ':' stats b=block? {labels.put($ID.text, new CommonTree[]{$stats.tree, $b.tree});} -> ^(BLOCK stats $b?)
;
stats
: stat*
;
stat
: Print Number -> ^(Print Number)
| Goto ID -> ^(Goto ID)
;
Goto : 'goto';
Print : 'print';
Number : '0'..'9'+;
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
GotoWalker.g
tree grammar GotoWalker;
options {
tokenVocab=Goto;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
#members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
walk returns [Node n]
: block {$n = $block.n;}
;
block returns [Node n]
: ^(BLOCK stats b=block?) {$n = new BlockNode($stats.n, $b.n);}
;
stats returns [Node n]
#init{List<Node> nodes = new ArrayList<Node>();}
: (stat {nodes.add($stat.n);})* {$n = new StatsNode(nodes);}
;
stat returns [Node n]
: ^(Print Number) {$n = new PrintNode($Number.text);}
| ^(Goto ID) {$n = new GotoNode($ID.text, labels);}
;
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
import java.util.*;
public class Main {
public static void main(String[] args) throws Exception {
GotoLexer lexer = new GotoLexer(new ANTLRFileStream("test.goto"));
GotoParser parser = new GotoParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(tree));
walker.labels = parser.labels;
Node root = walker.walk();
root.eval();
}
}
interface Node {
public static final Node VOID = new Node(){public Object eval(){throw new RuntimeException("VOID.eval()");}};
public static final Node BREAK = new Node(){public Object eval(){throw new RuntimeException("VOID.eval()");}};
Object eval();
}
class BlockNode implements Node {
Node stats;
Node child;
BlockNode(Node ns, Node ch) {
stats = ns;
child = ch;
}
public Object eval() {
Object o = stats.eval();
if(o != VOID) {
return o;
}
if(child != null) {
o = child.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class StatsNode implements Node {
List<Node> nodes;
StatsNode(List<Node> ns) {
nodes = ns;
}
public Object eval() {
for(Node n : nodes) {
Object o = n.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class PrintNode implements Node {
String text;
PrintNode(String txt) {
text = txt;
}
public Object eval() {
System.out.println(text);
return VOID;
}
}
class GotoNode implements Node {
String label;
Map<String, CommonTree[]> labels;
GotoNode(String lbl, Map<String, CommonTree[]> lbls) {
label = lbl;
labels = lbls;
}
public Object eval() {
CommonTree[] toExecute = labels.get(label);
try {
Thread.sleep(1000L);
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(toExecute[0]));
walker.labels = this.labels;
Node root = walker.stats();
Object o = root.eval();
if(o != VOID) {
return o;
}
walker = new GotoWalker(new CommonTreeNodeStream(toExecute[1]));
walker.labels = this.labels;
root = walker.block();
o = root.eval();
if(o != VOID) {
return o;
}
} catch(Exception e) {
e.printStackTrace();
}
return BREAK;
}
}
test.goto
root:
print 1
A:
print 2
B:
print 3
goto A
C:
print 4
To run the demo, do the following:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
or:
Windows
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which will print:
1
2
3
2
3
2
3
2
3
...
Note that the 2 and 3 are repeated until you terminate the app manually.

Related

In Xtext, how to tweak certain function calls

I am using Xtext 2.15 to generate a language that, among other things, processes asynchronous calls in a way they look synchronous.
For instance, the following code in my language:
int a = 1;
int b = 2;
boolean sleepSuccess = doSleep(2000); // sleep two seconds
int c = 3;
int d = 4;
would generate the following Java code:
int a = 1;
int b = 2;
doSleep(2000, new DoSleepCallback() {
public void onTrigger(boolean rc) {
boolean sleepSuccess = rc;
int c = 3;
int d = 4;
}
});
To achieve it, I defined the grammar this way:
grammar org.qedlang.qed.QED with jbase.Jbase // Jbase inherits Xbase
...
FunctionDeclaration return XExpression:
=>({FunctionDeclaration} type=JvmTypeReference name=ValidID '(')
(params+=FullJvmFormalParameter (',' params+=FullJvmFormalParameter)*)?
')' block=XBlockExpression
;
The FunctionDeclaration rule is used to define asynchronous calls. In my language library, I would have as system call:
boolean doSleep(int millis) {} // async FunctionDeclaration element stub
The underlying Java implementation would be:
public abstract class DoSleepCallback {
public abstract void onTrigger(boolean rc);
}
public void doSleep(int millis, DoSleepCallback callback) {
<perform sleep and call callback.onTrigger(<success>)>
}
So, using the inferrer, type computer and compiler, how to identify calls to FunctionDeclaration elements, add a callback parameter and process the rest of the body in an inner class?
I could, for instance, override appendFeatureCall in the language compiler, would it work? There is still a part I don't know how to do...
override appendFeatureCall(XAbstractFeatureCall call, ITreeAppendable b) {
...
val feature = call.feature
...
if (feature instanceof JvmExecutable) {
b.append('(')
val arguments = call.actualArguments
if (!arguments.isEmpty) {
...
arguments.appendArguments(b, shouldBreakFirstArgument)
// HERE IS THE PART I DON'T KNOW HOW TO DO
<IF feature IS A FunctionDeclaration>
<argument.appendArgument(NEW GENERATED CALLBACK PARAMETER)>
<INSERT REST OF XBlockExpression body INSIDE CALLBACK INSTANCE>
<ENDIF>
}
b.append(');')
}
}
So basically, how to tell if "feature" points to FunctionDeclaration? The rest, I may be able to do it...
Related to another StackOverflow entry, I had the idea of implementing FunctionDeclaration in the inferrer as a class instead of as a method:
def void inferExpressions(JvmDeclaredType it, FunctionDeclaration function) {
// now let's go over the features
for ( f : (function.block as XBlockExpression).expressions ) {
if (f instanceof FunctionDeclaration) {
members += f.toClass(f.fullyQualifiedName) [
inferVariables(f)
superTypes += typeRef(FunctionDeclarationObject)
// let's add a default constructor
members += f.toConstructor [
for (p : f.params)
parameters += p.toParameter(p.name, p.parameterType)
body = f.block
]
inferExpressions(f)
]
}
}
}
The generated class would extend FunctionDeclarationObject, so I thought there was a way to identify FunctionDeclaration as FunctionDeclarationObject subclasses. But then, I would need to extend the XFeatureCall default scoping to include classes in order to making it work...
I fully realize the question is not obvious, sorry...
Thanks,
Martin
EDIT: modified DoSleepCallback declaration from static to abstract (was erroneous)
I don't think you can generate what you need using the jvm model inferrer.
You should provide your own subclass of the XbaseCompiler (or JBaseCompiler, if any... and don't forget to register with guice in your runtime module), and override doInternalToJavaStatement(XExpression expr, ITreeAppendable it, boolean isReferenced) to manage how your FunctionDeclaration should be generated.

Xtext: Use EClass in XExpression

I am writing on a Xtext grammar that uses XExpressions and also operates on Eclasses. Now I want to also be able to access Eclasses from the XExpression, for example I write an expression like this:
Eclass1.attribute1 = Eclass2.attribute1
I would like to know, how I can use the Eclass from within the XExpression?
Grammar
grammar org.xtext.example.mydsl.Mydsl with
org.eclipse.xtext.xbase.Xbase
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate mydsl "http://www.xtext.org/example/mydsl/Mydsl"
Model:
(operations += Operation)*;
terminal ATTR : ID ('.' ID)+;
Operation:
'operation' left=[ecore::EClass|ATTR] 'and' right=
[ecore::EClass|ATTR] 'defined' 'as' condition=XExpression
;
Inferrer/ Infer method
def dispatch void infer(Model element, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
acceptor.accept(element.toClass("example.mydsl")) [
for (operation : element.operations) {
left = operation.left
right = operation.right
if (left.eIsProxy()) {
left = EcoreUtil.resolve(left, operation) as EClass
}
if (right.eIsProxy()) {
right = EcoreUtil.resolve(right, operation) as EClass
}
//field for right class left out, but works the same
members += left.toField(left.name,typeRef(left.EPackage.name+"."+left.name))
members += operation.toMethod("conditionExpr",
typeRef(Void.TYPE)) [
body = operation.condition
]
}
]
}
RuntimeModule
class MyDslRuntimeModule extends AbstractMyDslRuntimeModule {
def Class<? extends ImplicitlyImportedFeatures> bindImplicitlyImportedTypes() {
return MyImportFeature
}
}
MyImportFeature
class MyImportFeature extends ImplicitlyImportedFeatures{
override protected getStaticImportClasses() {
(super.getStaticImportClasses() + #[PackageFromWorkSpace]).toList
}
}
I Am not sure if i get your question.
Ususally EMF generates constants for EAttributes so if you want to access the attributes themselfs
so you could either do
MyDslPackage.Literals.GREETING__NAME
or
MyDslPackage.eINSTANCE.getGreeting_Name()
can you give some more hints on what you actually want to do
update: here is a snippet on how to get a java class from a reference to an eclass
Thingy:{
val EClass eclazz = f.clazz
val uri = EcorePlugin.getEPackageNsURIToGenModelLocationMap(true).get(eclazz.EPackage.nsURI)
val rs = new ResourceSetImpl
val r = rs.getResource(uri, true)
r.load(null)
val p = r.contents.head
if (p instanceof GenModel) {
val genClass = p.findGenClassifier(eclazz)
if (genClass instanceof GenClass) {
println(genClass.qualifiedInterfaceName)
members+=f.toField(eclazz.name, genClass.qualifiedInterfaceName.typeRef)
}
}
}

Replacement + Side effect

While visiting a compilation unit--- and given a certain condition- I would like to apply a transformation (using the => operator) and count the number of times the same transformation was applied for a given compilation unit.
I was able to perform that using a kind of "global module variable", but I am quite sure that it is possible to combine both replacements and actions within a single visit expression. Is that possible?
module MultiCatch
import lang::java::\syntax::Java18;
import ParseTree;
import IO;
import Map;
import Type;
import List;
// sure, I don't like global variables.
//
// However I could not find a way to perform both
// a replacement and count the number of times
// it was applied in the same compilation unit.
int numberOfOccurences = 0;
/**
* Refactor a try-catch statement to use the
* MultiCatch construct of Java 7.
*/
public tuple[int, CompilationUnit] refactorMultiCatch(CompilationUnit unit) {
numberOfOccurences = 0;
CompilationUnit cu = visit(unit) {
case (TryStatement)`try <Block b1> <Catches c1>` => (TryStatement)`try <Block b1> <Catches mc>`
when mc := computeMultiCatches(c1)
};
return <numberOfOccurences, cu>;
}
/*
* Based on a simple notion of similarity,
* this function calculates the possible
* occurences of MultiCatch.
*/
private Catches computeMultiCatches(cs){
map [Block, tuple[list[CatchType], VariableDeclaratorId, Block] ] mCatches =();
visit(cs){
case(CatchClause)`catch (<CatchType t> <VariableDeclaratorId vId>) <Block b>` :{
if (b in mCatches){
<ts, vId, blk> = mCatches[b];
ts += t;
mCatches[b] = <ts, vId, blk>;
numberOfOccurences += 1;
}
else{
mCatches[b] = <[t], vId, b>;
}
}
}
return generateMultiCatches([mCatches[b] | b <- mCatches]);
}
/*
* Creates a syntactic catch clause (either a simple one or
* a multicatch).
*
* This is a recursive definition. The base case expects only
* one tuple, and than it returns a single catch clause. In the
* recursive definition, at least two tuples must be passed as
* arguments, and thus it returns at least two catches clauses
* (actually, one catch clause for each element in the list)
*/
private Catches generateMultiCatches([<ts, vId, b>]) = {
types = parse(#CatchType, intercalate("| ", ts));
return (Catches)`catch(<CatchType types> <VariableDeclaratorId vId>) <Block b>`;
};
private Catches generateMultiCatches([<ts, vId, b>, C*]) = {
catches = generateMultiCatches(C);
types = parse(#CatchType, intercalate("| ", ts));
return (Catches)`catch(<CatchType types> <VariableDeclaratorId vId>) <Block b> <CatchClause+ catches>`;
};
One way to do it is using a local variable and a block with an insert:
module MultiCatch
import lang::java::\syntax::Java18;
import ParseTree;
import IO;
import Map;
import Type;
import List;
/**
* Refactor a try-catch statement to use the
* MultiCatch construct of Java 7.
*/
public tuple[int, CompilationUnit] refactorMultiCatch(CompilationUnit unit) {
int numberOfOccurences = 0; /* the type is superfluous */
CompilationUnit cu = visit(unit) {
case (TryStatement)`try <Block b1> <Catches c1>` : {
numberOfOccurences += 1;
mc = computeMultiCatches(c1)
insert (TryStatement)`try <Block b1> <Catches mc>`;
}
};
return <numberOfOccurences, cu>;
}
The {...} block allows multiple statements to be executed after the match;
The local variable is now only present in the frame of refactorMultiCatch;
The insert statement has the same effect as the previous => arrow did;
Since the match := always succeeds, I changed the when clause into a simple assigment
There is also other more complex ways to share state in Rascal, but I personally prefer to have state not escape the lexical scope of a function.

How to build a parse tree of a mathematical expression?

I'm learning how to write tokenizers, parsers and as an exercise I'm writing a calculator in JavaScript.
I'm using a prase tree approach (I hope I got this term right) to build my calculator. I'm building a tree of tokens based on operator precedence.
For example, given an expression a*b+c*(d*(g-f)) the correct tree would be:
+
/ \
* \
/ \ \
a b \
*
/ \
c *
/ \
d -
/ \
g f
Once I've the tree, I can just traverse it down and apply the operations at each root node on the left and right nodes recursively to find the value of the expression.
However, the biggest problem is actually building this tree. I just can't figure out how to do it correctly. I can't just split on operators +, -, / and * and create the tree from the left and right parts because of precedence.
What I've done so far is tokenize the expression. So given a*b+c*(d*(g-f)), I end up with an array of tokens:
[a, Operator*, b, Operator+, c, Operator*, OpenParen, d, Operator*, OpenParen, g, Operator-, f, CloseParen, CloseParen]
However I can't figure out the next step about how to go from this array of tokens to a tree that I can traverse and figure out the value. Can anyone help me with ideas about how to do that?
Right mate, I dont know how to make this look pretty
I wrote a similar program in C but my tree is upside down, meaning new operators become the root.
A calculator parse tree in C code, but read the readme
ex: input 2 + 3 - 4
Start with empty node
{}
Rule 1: when you read a number, append a child either left or right of the current node, whichever one is empty
{}
/
{2}
Rule 2: then you read an operator, you have to climb starting at the current node {2} to an empty node, when you find one, change its value to +, if there are no empty nodes, you must create one then make it the root of the tree
{+}
/
{2}
you encounter another number, go to rule 1, we are currently at {+} find an side that is empty (this time right)
{+}
/ \
{2} {3}
Now we have new operand '-' but because the parent of {3} is full, you must create new node and make it the root of everything
{-}
/
{+}
/ \
{2} {3}
oh look at that, another number, because we are currently pointing at the root, lets find a child of {-} that is empty, (hint the right side)
{-}
/ \
{+} {4}
/ \
{2} {3}
Once a tree is built, take a look at the function getresult() to calculate everything
You are probably wondering how Parenthesization works. I did it this way:
I create brand new tree every time I encounter a '(' and build the rest of the input into that tree. If I read another '(', I create another one and continue build with the new tree.
Once input is read, I attach all the trees's roots to one another to make one final tree. Check out the code and readme, I have to draw to explain everything.
Hope this helps future readers as well
I've seen this question more often so I just wrote this away. This should definitely push you in the right direction but be careful, this is a top-down parser so it may not interpret the expression with the precedence you would expect.
class Program
{
static void Main()
{
Console.WriteLine(SimpleExpressionParser.Parse("(10+30*2)/20").ToString());
Console.ReadLine();
//
// ouput: ((10+30)*2)/20
}
public static class SimpleExpressionParser
{
public static SimpleExpression Parse(string str)
{
if (str == null)
{
throw new ArgumentNullException("str");
}
int index = 0;
return InternalParse(str, ref index, 0);
}
private static SimpleExpression InternalParse(string str, ref int index, int level)
{
State state = State.ExpectLeft;
SimpleExpression _expression = new SimpleExpression();
int startIndex = index;
int length = str.Length;
while (index < length)
{
char chr = str[index];
if (chr == ')' && level != 0)
{
break;
}
switch (state)
{
case State.ExpectLeft:
case State.ExpectRight:
{
SimpleExpression expression = null;
if (Char.IsDigit(chr))
{
int findRep = FindRep(Char.IsDigit, str, index + 1, length - 1);
expression = new SimpleExpression(int.Parse(str.Substring(index, findRep + 1)));
index += findRep;
}
else if (chr == '(')
{
index++;
expression = InternalParse(str, ref index, level + 1);
}
if (expression == null)
{
throw new Exception(String.Format("Expression expected at index {0}", index));
}
if (state == State.ExpectLeft)
{
_expression.Left = expression;
state = State.ExpectOperator;
}
else
{
_expression.Right = expression;
state = State.ExpectFarOperator;
}
}
break;
case State.ExpectOperator:
case State.ExpectFarOperator:
{
SimpleExpressionOperator op;
switch (chr)
{
case '+': op = SimpleExpressionOperator.Add; break;
case '-': op = SimpleExpressionOperator.Subtract; break;
case '*': op = SimpleExpressionOperator.Multiply; break;
case '/': op = SimpleExpressionOperator.Divide; break;
default:
throw new Exception(String.Format("Invalid operator encountered at index {0}", index));
}
if (state == State.ExpectOperator)
{
_expression.Operator = op;
state = State.ExpectRight;
}
else
{
index++;
return new SimpleExpression(op, _expression, InternalParse(str, ref index, level));
}
}
break;
}
index++;
}
if (state == State.ExpectLeft || state == State.ExpectRight)
{
throw new Exception("Could not complete expression");
}
return _expression;
}
private static int FindRep(Func<char, bool> validator, string str, int beginPos, int endPos)
{
int pos;
for (pos = beginPos; pos <= endPos; pos++)
{
if (!validator.Invoke(str[pos]))
{
break;
}
}
return pos - beginPos;
}
private enum State
{
ExpectLeft,
ExpectRight,
ExpectOperator,
ExpectFarOperator
}
}
public enum SimpleExpressionOperator
{
Add,
Subtract,
Multiply,
Divide
}
public class SimpleExpression
{
private static Dictionary<SimpleExpressionOperator, char> opEquivs = new Dictionary<SimpleExpressionOperator, char>()
{
{ SimpleExpressionOperator.Add, '+' },
{ SimpleExpressionOperator.Subtract, '-' },
{ SimpleExpressionOperator.Multiply, '*' },
{ SimpleExpressionOperator.Divide, '/' }
};
public SimpleExpression() { }
public SimpleExpression(int literal)
{
Literal = literal;
IsLiteral = true;
}
public SimpleExpression(SimpleExpressionOperator op, SimpleExpression left, SimpleExpression right)
{
Operator = op;
Left = left;
Right = right;
}
public bool IsLiteral
{
get;
set;
}
public int Literal
{
get;
set;
}
public SimpleExpressionOperator Operator
{
get;
set;
}
public SimpleExpression Left
{
get;
set;
}
public SimpleExpression Right
{
get;
set;
}
public override string ToString()
{
StringBuilder sb = new StringBuilder();
AppendExpression(sb, this, 0);
return sb.ToString();
}
private static void AppendExpression(StringBuilder sb, SimpleExpression expression, int level)
{
bool enclose = (level != 0 && !expression.IsLiteral && expression.Right != null);
if (enclose)
{
sb.Append('(');
}
if (expression.IsLiteral)
{
sb.Append(expression.Literal);
}
else
{
if (expression.Left == null)
{
throw new Exception("Invalid expression encountered");
}
AppendExpression(sb, expression.Left, level + 1);
if (expression.Right != null)
{
sb.Append(opEquivs[expression.Operator]);
AppendExpression(sb, expression.Right, level + 1);
}
}
if (enclose)
{
sb.Append(')');
}
}
}
}
See https://github.com/carlos-chaguendo/arboles-binarios
This algorithm converts an algebraic expression to a binary tree. Converts the expression to postfix and then evaluates and draws the tree
test: 2+3*4+2^3
output:
22 =
└── +
├── +
│ ├── 2
│ └── *
│ ├── 3
│ └── 4
└── ^
├── 2
└── 3
You may be interested in math.js, a math library for JavaScript which contains an advanced expression parser (see docs and examples. You can simply parse an expression like:
var node = math.parse('a*b+c*(d*(g-f))');
which returns a node tree (which can be compiled and evaluated).
You can study the parsers code here: https://github.com/josdejong/mathjs/blob/master/src/expression/parse.js
You can find a few examples of simpler expression parsers here (for C++ and Java): http://www.speqmath.com/tutorials/, that can be useful to study the basics of an expression parser.

mapping list of string into hierarchical structure of objects

This is not a homework problem. This questions was asked to one of my friend in an interview test.
I have a list of lines read from a file as input. Each line has a identifier such as (A,B,NN,C,DD) at the start of line. Depending upon the identifier, I need to map the list of records into a single object A which contains a hierarchy structure of objects.
Description of Hierarchy :
Each A can have zero or more B types.
Each B identifier can have zero or more NN and C as child. Similarly each C segment can have zero or more NN and DD child. Abd each DD can have zero or more NN as child.
Mapping classes and their hierarchy:
All the class will have value to hold the String value from current line.
**A - will have list of B**
class A {
List<B> bList;
String value;
public A(String value) {
this.value = value;
}
public void addB(B b) {
if (bList == null) {
bList = new ArrayList<B>();
}
bList.add(b);
}
}
**B - will have list of NN and list of C**
class B {
List<C> cList;
List<NN> nnList;
String value;
public B(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
public void addC(C c) {
if (cList == null) {
cList = new ArrayList<C>();
}
cList.add(c);
}
}
**C - will have list of DDs and NNs**
class C {
List<DD> ddList;
List<NN> nnList;
String value;
public C(String value) {
this.value = value;
}
public void addDD(DD dd) {
if (ddList == null) {
ddList = new ArrayList<DD>();
}
ddList.add(dd);
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**DD - will have list of NNs**
class DD {
String value;
List<NN> nnList;
public DD(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**NN- will hold the line only**
class NN {
String value;
public NN(String value) {
this.value = value;
}
}
What I Did So Far :
The method public A parse(List<String> lines) reads the input list and returns the object A. Since, there might be multiple B, i have created separate method 'parseB to parse each occurrence.
At parseB method, loops through the i = startIndex + 1 to i < lines.size() and checks the start of lines. Occurrence of "NN" is added to current object of B. If "C" is detected at start, it calls another method parseC. The loop will break when we detect "B" or "A" at start.
Similar logic is used in parseC_DD.
public class GTTest {
public A parse(List<String> lines) {
A a;
for (int i = 0; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("A")) {
a = new A(curLine);
continue;
}
if (curLine.startsWith("B")) {
i = parseB(lines, i); // returns index i to skip all the lines that are read inside parseB(...)
continue;
}
}
return a; // return mapped object
}
private int parseB(List<String> lines, int startIndex) {
int i;
B b = new B(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
b.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("C")) {
i = parseC(b, lines, i);
continue;
}
a.addB(b);
if (curLine.startsWith("B") || curLine.startsWith("A")) { //ending condition
System.out.println("B A "+curLine);
--i;
break;
}
}
return i; // return nextIndex to read
}
private int parseC(B b, List<String> lines, int startIndex) {
int i;
C c = new C(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
c.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
i = parseC_DD(c, lines, i);
continue;
}
b.addC(c);
if (curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
private int parseC_DD(C c, List<String> lines, int startIndex) {
int i;
DD d = new DD(lines.get(startIndex));
c.addDD(d);
for (i = startIndex; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
d.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
d=new DD(curLine);
continue;
}
c.addDD(d);
if (curLine.startsWith("NN") || curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("NN C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
public static void main(String[] args) {
GTTest gt = new GTTest();
List<String> list = new ArrayList<String>();
list.add("A1");
list.add("B1");
list.add("NN1");
list.add("NN2");
list.add("C1");
list.add("NNXX");
list.add("DD1");
list.add("DD2");
list.add("NN3");
list.add("NN4");
list.add("DD3");
list.add("NN5");
list.add("B2");
list.add("NN6");
list.add("C2");
list.add("DD4");
list.add("DD5");
list.add("NN7");
list.add("NN8");
list.add("DD6");
list.add("NN7");
list.add("C3");
list.add("DD7");
list.add("DD8");
A a = gt.parse(list);
//show values of a
}
}
My logic is not working properly. Is there any other approach you can figure out? Do you have any suggestions/improvements to my way?
Use hierarchy of objects:
public interface Node {
Node getParent();
Node getLastChild();
boolean addChild(Node n);
void setValue(String value);
Deque getChildren();
}
private static abstract class NodeBase implements Node {
...
abstract boolean canInsert(Node n);
public String toString() {
return value;
}
...
}
public static class A extends NodeBase {
boolean canInsert(Node n) {
return n instanceof B;
}
}
public static class B extends NodeBase {
boolean canInsert(Node n) {
return n instanceof NN || n instanceof C;
}
}
...
public static class NN extends NodeBase {
boolean canInsert(Node n) {
return false;
}
}
Create a tree class:
public class MyTree {
Node root;
Node lastInserted = null;
public void insert(String label) {
Node n = NodeFactory.create(label);
if (lastInserted == null) {
root = n;
lastInserted = n;
return;
}
Node current = lastInserted;
while (!current.addChild(n)) {
current = current.getParent();
if (current == null) {
throw new RuntimeException("Impossible to insert " + n);
}
}
lastInserted = n;
}
...
}
And then print the tree:
public class MyTree {
...
public static void main(String[] args) {
List input;
...
MyTree tree = new MyTree();
for (String line : input) {
tree.insert(line);
}
tree.print();
}
public void print() {
printSubTree(root, "");
}
private static void printSubTree(Node root, String offset) {
Deque children = root.getChildren();
Iterator i = children.descendingIterator();
System.out.println(offset + root);
while (i.hasNext()) {
printSubTree(i.next(), offset + " ");
}
}
}
A mealy automaton solution with 5 states:
wait for A,
seen A,
seen B,
seen C, and
seen DD.
The parse is done completely in one method. There is one current Node that is the last Node seen except the NN ones. A Node has a parent Node except the root. In state seen (0), the current Node represents a (0) (e.g. in state seen C, current can be C1 in the example above). The most fiddling is in state seen DD, that has the most outgoing edges (B, C, DD, and NN).
public final class Parser {
private final static class Token { /* represents A1 etc. */ }
public final static class Node implements Iterable<Node> {
/* One Token + Node children, knows its parent */
}
private enum State { ExpectA, SeenA, SeenB, SeenC, SeenDD, }
public Node parse(String text) {
return parse(Token.parseStream(text));
}
private Node parse(Iterable<Token> tokens) {
State currentState = State.ExpectA;
Node current = null, root = null;
while(there are tokens) {
Token t = iterator.next();
switch(currentState) {
/* do stuff for all states */
/* example snippet for SeenC */
case SeenC:
if(t.Prefix.equals("B")) {
current.PN.PN.AddChildNode(new Node(t, current.PN.PN));
currentState = State.SeenB;
} else if(t.Prefix.equals("C")) {
}
}
return root;
}
}
I'm not satisfied with those trainwrecks to go up the hierarchy to insert a Node somewhere else (current.PN.PN). Eventually, explicit state classes would make the private parse method more readable. Then, the solution gets more akin to the one provided by #AlekseyOtrubennikov. Maybe a straight LL approach yields code that is more beautiful. Maybe best to just rephrase the grammar to a BNF one and delegate parser creation.
A straightforward LL parser, one production rule:
// "B" ("NN" || C)*
private Node rule_2(TokenStream ts, Node parent) {
// Literal "B"
Node B = literal(ts, "B", parent);
if(B == null) {
// error
return null;
}
while(true) {
// check for "NN"
Node nnLit = literal(ts, "NN", B);
if(nnLit != null)
B.AddChildNode(nnLit);
// check for C
Node c = rule_3(ts, parent);
if(c != null)
B.AddChildNode(c);
// finished when both rules did not match anything
if(nnLit == null && c == null)
break;
}
return B;
}
TokenStream enhances Iterable<Token> by allowing to lookahead into the stream - LL(1) because parser must choose between literal NN or deep diving in two cases (rule_2 being one of them). Looks nice, however, missing some C# features here...
#Stefan and #Aleksey are correct: this is simple parsing problem.
You can define your hierarchy constraints in Extended Backus-Naur Form:
A ::= { B }
B ::= { NN | C }
C ::= { NN | DD }
DD ::= { NN }
This description can be transformed into state machine and implemented. But there are a lot of tools that can effectively do this for you: Parser generators.
I am posting my answer only to show that it's quite easy to solve such problems with Haskell (or some other functional language).
Here is complete program that reads strings form stdin and prints parsed tree to the stdout.
-- We are using some standard libraries.
import Control.Applicative ((<$>), (<*>))
import Text.Parsec
import Data.Tree
-- This is EBNF-like description of what to do.
-- You can almost read it like a prose.
yourData = nodeA +>> eof
nodeA = node "A" nodeB
nodeB = node "B" (nodeC <|> nodeNN)
nodeC = node "C" (nodeNN <|> nodeDD)
nodeDD = node "DD" nodeNN
nodeNN = (`Node` []) <$> nodeLabel "NN"
node lbl children
= Node <$> nodeLabel lbl <*> many children
nodeLabel xx = (xx++)
<$> (string xx >> many digit)
+>> newline
-- And this is some auxiliary code.
f +>> g = f >>= \x -> g >> return x
main = do
txt <- getContents
case parse yourData "" txt of
Left err -> print err
Right res -> putStrLn $ drawTree res
Executing it with your data in zz.txt will print this nice tree:
$ ./xxx < zz.txt
A1
+- B1
| +- NN1
| +- NN2
| `- C1
| +- NN2
| +- DD1
| +- DD2
| | +- NN3
| | `- NN4
| `- DD3
| `- NN5
`- B2
+- NN6
+- C2
| +- DD4
| +- DD5
| | +- NN7
| | `- NN8
| `- DD6
| `- NN9
`- C3
+- DD7
`- DD8
And here is how it handles malformed input:
$ ./xxx
A1
B2
DD3
(line 3, column 1):
unexpected 'D'
expecting "B" or end of input

Resources