How to build a parse tree of a mathematical expression? - parsing

I'm learning how to write tokenizers, parsers and as an exercise I'm writing a calculator in JavaScript.
I'm using a prase tree approach (I hope I got this term right) to build my calculator. I'm building a tree of tokens based on operator precedence.
For example, given an expression a*b+c*(d*(g-f)) the correct tree would be:
+
/ \
* \
/ \ \
a b \
*
/ \
c *
/ \
d -
/ \
g f
Once I've the tree, I can just traverse it down and apply the operations at each root node on the left and right nodes recursively to find the value of the expression.
However, the biggest problem is actually building this tree. I just can't figure out how to do it correctly. I can't just split on operators +, -, / and * and create the tree from the left and right parts because of precedence.
What I've done so far is tokenize the expression. So given a*b+c*(d*(g-f)), I end up with an array of tokens:
[a, Operator*, b, Operator+, c, Operator*, OpenParen, d, Operator*, OpenParen, g, Operator-, f, CloseParen, CloseParen]
However I can't figure out the next step about how to go from this array of tokens to a tree that I can traverse and figure out the value. Can anyone help me with ideas about how to do that?

Right mate, I dont know how to make this look pretty
I wrote a similar program in C but my tree is upside down, meaning new operators become the root.
A calculator parse tree in C code, but read the readme
ex: input 2 + 3 - 4
Start with empty node
{}
Rule 1: when you read a number, append a child either left or right of the current node, whichever one is empty
{}
/
{2}
Rule 2: then you read an operator, you have to climb starting at the current node {2} to an empty node, when you find one, change its value to +, if there are no empty nodes, you must create one then make it the root of the tree
{+}
/
{2}
you encounter another number, go to rule 1, we are currently at {+} find an side that is empty (this time right)
{+}
/ \
{2} {3}
Now we have new operand '-' but because the parent of {3} is full, you must create new node and make it the root of everything
{-}
/
{+}
/ \
{2} {3}
oh look at that, another number, because we are currently pointing at the root, lets find a child of {-} that is empty, (hint the right side)
{-}
/ \
{+} {4}
/ \
{2} {3}
Once a tree is built, take a look at the function getresult() to calculate everything
You are probably wondering how Parenthesization works. I did it this way:
I create brand new tree every time I encounter a '(' and build the rest of the input into that tree. If I read another '(', I create another one and continue build with the new tree.
Once input is read, I attach all the trees's roots to one another to make one final tree. Check out the code and readme, I have to draw to explain everything.
Hope this helps future readers as well

I've seen this question more often so I just wrote this away. This should definitely push you in the right direction but be careful, this is a top-down parser so it may not interpret the expression with the precedence you would expect.
class Program
{
static void Main()
{
Console.WriteLine(SimpleExpressionParser.Parse("(10+30*2)/20").ToString());
Console.ReadLine();
//
// ouput: ((10+30)*2)/20
}
public static class SimpleExpressionParser
{
public static SimpleExpression Parse(string str)
{
if (str == null)
{
throw new ArgumentNullException("str");
}
int index = 0;
return InternalParse(str, ref index, 0);
}
private static SimpleExpression InternalParse(string str, ref int index, int level)
{
State state = State.ExpectLeft;
SimpleExpression _expression = new SimpleExpression();
int startIndex = index;
int length = str.Length;
while (index < length)
{
char chr = str[index];
if (chr == ')' && level != 0)
{
break;
}
switch (state)
{
case State.ExpectLeft:
case State.ExpectRight:
{
SimpleExpression expression = null;
if (Char.IsDigit(chr))
{
int findRep = FindRep(Char.IsDigit, str, index + 1, length - 1);
expression = new SimpleExpression(int.Parse(str.Substring(index, findRep + 1)));
index += findRep;
}
else if (chr == '(')
{
index++;
expression = InternalParse(str, ref index, level + 1);
}
if (expression == null)
{
throw new Exception(String.Format("Expression expected at index {0}", index));
}
if (state == State.ExpectLeft)
{
_expression.Left = expression;
state = State.ExpectOperator;
}
else
{
_expression.Right = expression;
state = State.ExpectFarOperator;
}
}
break;
case State.ExpectOperator:
case State.ExpectFarOperator:
{
SimpleExpressionOperator op;
switch (chr)
{
case '+': op = SimpleExpressionOperator.Add; break;
case '-': op = SimpleExpressionOperator.Subtract; break;
case '*': op = SimpleExpressionOperator.Multiply; break;
case '/': op = SimpleExpressionOperator.Divide; break;
default:
throw new Exception(String.Format("Invalid operator encountered at index {0}", index));
}
if (state == State.ExpectOperator)
{
_expression.Operator = op;
state = State.ExpectRight;
}
else
{
index++;
return new SimpleExpression(op, _expression, InternalParse(str, ref index, level));
}
}
break;
}
index++;
}
if (state == State.ExpectLeft || state == State.ExpectRight)
{
throw new Exception("Could not complete expression");
}
return _expression;
}
private static int FindRep(Func<char, bool> validator, string str, int beginPos, int endPos)
{
int pos;
for (pos = beginPos; pos <= endPos; pos++)
{
if (!validator.Invoke(str[pos]))
{
break;
}
}
return pos - beginPos;
}
private enum State
{
ExpectLeft,
ExpectRight,
ExpectOperator,
ExpectFarOperator
}
}
public enum SimpleExpressionOperator
{
Add,
Subtract,
Multiply,
Divide
}
public class SimpleExpression
{
private static Dictionary<SimpleExpressionOperator, char> opEquivs = new Dictionary<SimpleExpressionOperator, char>()
{
{ SimpleExpressionOperator.Add, '+' },
{ SimpleExpressionOperator.Subtract, '-' },
{ SimpleExpressionOperator.Multiply, '*' },
{ SimpleExpressionOperator.Divide, '/' }
};
public SimpleExpression() { }
public SimpleExpression(int literal)
{
Literal = literal;
IsLiteral = true;
}
public SimpleExpression(SimpleExpressionOperator op, SimpleExpression left, SimpleExpression right)
{
Operator = op;
Left = left;
Right = right;
}
public bool IsLiteral
{
get;
set;
}
public int Literal
{
get;
set;
}
public SimpleExpressionOperator Operator
{
get;
set;
}
public SimpleExpression Left
{
get;
set;
}
public SimpleExpression Right
{
get;
set;
}
public override string ToString()
{
StringBuilder sb = new StringBuilder();
AppendExpression(sb, this, 0);
return sb.ToString();
}
private static void AppendExpression(StringBuilder sb, SimpleExpression expression, int level)
{
bool enclose = (level != 0 && !expression.IsLiteral && expression.Right != null);
if (enclose)
{
sb.Append('(');
}
if (expression.IsLiteral)
{
sb.Append(expression.Literal);
}
else
{
if (expression.Left == null)
{
throw new Exception("Invalid expression encountered");
}
AppendExpression(sb, expression.Left, level + 1);
if (expression.Right != null)
{
sb.Append(opEquivs[expression.Operator]);
AppendExpression(sb, expression.Right, level + 1);
}
}
if (enclose)
{
sb.Append(')');
}
}
}
}

See https://github.com/carlos-chaguendo/arboles-binarios
This algorithm converts an algebraic expression to a binary tree. Converts the expression to postfix and then evaluates and draws the tree
test: 2+3*4+2^3
output:
22 =
└── +
├── +
│ ├── 2
│ └── *
│ ├── 3
│ └── 4
└── ^
├── 2
└── 3

You may be interested in math.js, a math library for JavaScript which contains an advanced expression parser (see docs and examples. You can simply parse an expression like:
var node = math.parse('a*b+c*(d*(g-f))');
which returns a node tree (which can be compiled and evaluated).
You can study the parsers code here: https://github.com/josdejong/mathjs/blob/master/src/expression/parse.js
You can find a few examples of simpler expression parsers here (for C++ and Java): http://www.speqmath.com/tutorials/, that can be useful to study the basics of an expression parser.

Related

Iterating through a linked list using recursion gives me a run time error (stack overflow) on LeetCode

I am new to programming and today I wanted to try out the LeetCode problem 234. Palindrome Linked List:
Given the head of a singly linked list, return true if it is a palindrome or false otherwise.
Example 1:
Input: head = [1,2,2,1]
Output: true
but I couldn't even manage the first problem.
I first tried to convert the linked list to a string and compare like:
String[i] == String[length-i-1]
which worked for small lists but not for the gigantic test list where I got:
Time Limit Exceeded
In my second attempt I used recursion like this:
/**
* Definition for singly-linked list.
* class ListNode {
* int val;
* ListNode? next;
* ListNode([this.val = 0, this.next]);
* }
*/
class Solution {
bool isPalindrome(ListNode? head) {
ListNode? current = head;
while(current?.next != null)
current = current?.next;
if (current?.val != head?.val)
return false;
else{
current = null;
isPalindrome(head?.next);
}
return true;
}
}
This also works with small lists, but for the test list I get a run time error:
Stack overflow
I wonder where this issue comes from.
Is it due to the maximum number of nested calls? And where can I find the recursion depth of Dart?
Or is there just a way simpler solution for this?
There are several issues with your attempt:
current = null will only set that variable to null. It does not affect the list. If you want to remove the last element from the list, you'll need access to the node that precedes it, and set its next property to null.
The boolean that the recursive call returns is always ignore. Instead execution continues with the return true statement, which will lead to incorrect results (when false was expected).
Before mentioning another problem, here is the correction for your algorithm:
bool isPalindrome(ListNode? head) {
ListNode? current = head;
ListNode? prev = null;
while(current?.next != null) {
prev = current; // follow behind current
current = current?.next;
}
if (current?.val != head?.val)
return false;
else if (prev == null)
return true; // List has only one node
else {
prev?.next = null; // Detach the tail node
return isPalindrome(head?.next); // Return the recursive result!
}
}
This will do the job correctly, but it is too slow. At every level of recursion almost all of the same nodes are iterated again (with that while loop), so for a list with 100 nodes, there are 100+98+96+94+...+2 iterations. In other words, the time complexity of this algorithm is quadratic. You'll need a different idea for the algorithm.
One idea for an efficient algorithm that doesn't require extra O(n) space, is to:
find the middle node of the list. You can for instance first determine the length of the list with a first iteration, and then in a second iteration you can stop half way.
reverse the second half of the list, giving you two shorter lists. Here you could use recursion if you wanted to.
and then compare those two lists node by node.
There are several Q&A on this algorithm, also on this site, so I'll leave that for your further research.
If you cannot make it work, here is a solution (spoiler!):
class Solution {
int listSize(ListNode? head) {
int size = 0;
while(head?.next != null) {
head = head?.next;
size++;
}
return size;
}
ListNode? nodeAt(ListNode? head, int index) {
while(head?.next != null && index > 0) {
head = head?.next;
index--;
}
return index == 0 ? head : null;
}
ListNode? reverse(ListNode? head) {
ListNode? prev = null;
ListNode? next;
while (head != null) {
next = head.next;
head.next = prev;
prev = head;
head = next;
}
return prev;
}
bool isEqual(ListNode? head1, ListNode? head2) {
// Only compares the nodes that both lists have:
while (head1 != null && head2 != null) {
if (head1.val != head2.val) return false;
head1 = head1.next;
head2 = head2.next;
}
return true;
}
bool isPalindrome(ListNode? head) {
return isEqual(head, reverse(nodeAt(head, listSize(head) >> 1)));
}
}
I can't quite understand your recursion solution without pointers. I solved it using a list. It is not the best solution but is simple.
// Definition for singly-linked list.
// class ListNode {
// int val;
// ListNode? next;
// ListNode([this.val = 0, this.next]);
// }
class Solution {
bool isPalindrome(ListNode? head) {
List<int> stack = [];
ListNode? p = head;
while (p != null) {
stack.add(p.val);
p = p.next;
}
print(stack);
bool isP = true;
while(head!.next!=null&&isP){
var a = stack.removeLast();
print('removed $a');
print('head val ${head.val}');
isP = head.val==a;
head=head.next!;
}
return isP;
}
}
The problem with your solution is
After a while loop current is rightmost node in the list
while (current?.next != null) {
current = current?.next;
}
comparing leftmost and rightmost node of LinkedList
if (current?.val != head?.val)
return false;
Start over with head shifted one place to the right
else {
current = null;
isPalindrome(head?.next);
}
But current is still rightmost node after a while loop
while (current?.next != null) {
current = current?.next;
}
And this will return false
if (current?.val != head?.val)
{
return false;
}
After second recursion program exits returning true

How do I pretty-print productions and line numbers, using ANTLR4?

I'm trying to write a piece of code that will take an ANTLR4 parser and use it to generate ASTs for inputs similar to the ones given by the -tree option on grun (misc.TestRig). However, I'd additionally like for the output to include all the line number/offset information.
For example, instead of printing
(add (int 5) '+' (int 6))
I'd like to get
(add (int 5 [line 3, offset 6:7]) '+' (int 6 [line 3, offset 8:9]) [line 3, offset 5:10])
Or something similar.
There aren't a tremendous number of visitor examples for ANTLR4 yet, but I am pretty sure I can do most of this by copying the default implementation for toStringTree (used by grun). However, I do not see any information about the line numbers or offsets.
I expected to be able to write super simple code like this:
String visit(ParseTree t) {
return "(" + t.productionName + t.visitChildren() + t.lineNumber + ")";
}
but it doesn't seem to be this simple. I'm guessing I should be able to get line number information from the parser, but I haven't figured out how to do so. How can I grab this line number/offset information in my traversal?
To fill in the few blanks in the solution below, I used:
List<String> ruleNames = Arrays.asList(parser.getRuleNames());
parser.setBuildParseTree(true);
ParserRuleContext prc = parser.program();
ParseTree tree = prc;
to get the tree and the ruleNames. program is the name for the top production in my grammar.
The Trees.toStringTree method can be implemented using a ParseTreeListener. The following listener produces exactly the same output as Trees.toStringTree.
public class TreePrinterListener implements ParseTreeListener {
private final List<String> ruleNames;
private final StringBuilder builder = new StringBuilder();
public TreePrinterListener(Parser parser) {
this.ruleNames = Arrays.asList(parser.getRuleNames());
}
public TreePrinterListener(List<String> ruleNames) {
this.ruleNames = ruleNames;
}
#Override
public void visitTerminal(TerminalNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
#Override
public void visitErrorNode(ErrorNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
#Override
public void enterEveryRule(ParserRuleContext ctx) {
if (builder.length() > 0) {
builder.append(' ');
}
if (ctx.getChildCount() > 0) {
builder.append('(');
}
int ruleIndex = ctx.getRuleIndex();
String ruleName;
if (ruleIndex >= 0 && ruleIndex < ruleNames.size()) {
ruleName = ruleNames.get(ruleIndex);
}
else {
ruleName = Integer.toString(ruleIndex);
}
builder.append(ruleName);
}
#Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
builder.append(')');
}
}
#Override
public String toString() {
return builder.toString();
}
}
The class can be used as follows:
List<String> ruleNames = ...;
ParseTree tree = ...;
TreePrinterListener listener = new TreePrinterListener(ruleNames);
ParseTreeWalker.DEFAULT.walk(listener, tree);
String formatted = listener.toString();
The class can be modified to produce the information in your output by updating the exitEveryRule method:
#Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
Token positionToken = ctx.getStart();
if (positionToken != null) {
builder.append(" [line ");
builder.append(positionToken.getLine());
builder.append(", offset ");
builder.append(positionToken.getStartIndex());
builder.append(':');
builder.append(positionToken.getStopIndex());
builder.append("])");
}
else {
builder.append(')');
}
}
}

Using scanner to read phrases

Hey StackOverflow Community,
So, I have this line of information from a txt file that I need to parse.
Here is an example lines:
-> date & time AC Power Insolation Temperature Wind Speed
-> mm/dd/yyyy hh:mm.ss kw W/m^2 deg F mph
Using a scanner.nextLine() gives me a String with a whole line in it, and then I pass this off into StringTokenizer, which then separates them into individual Strings using whitespace as a separator.
so for the first line it would break up into:
date
&
time
AC
Power
Insolation
etc...
I need things like "date & time" together, and "AC Power" together. Is there anyway I can specify this using a method already defined in StringTokenizer or Scanner? Or would I have to develop my own algorithm to do this?
Would you guys suggest I use some other form of parsing lines instead of Scanner? Or, is Scanner sufficient enough for my needs?
ejay
oh, this one was tricky, maybe you could build up some Trie structure with your tokens, i was bored and wrote a little class which solves your problem. Warning: it's a bit hacky, but was fun to implement.
The Trie class:
class Trie extends HashMap<String, Trie> {
private static final long serialVersionUID = 1L;
boolean end = false;
public void addToken(String strings) {
addToken(strings.split("\\s+"), 0);
}
private void addToken(String[] strings, int begin) {
if (begin == strings.length) {
end = true;
return;
}
String key = strings[begin];
Trie t = get(key);
if (t == null) {
t = new Trie();
put(key, t);
}
t.addToken(strings, begin + 1);
}
public List<String> tokenize(String data) {
String[] split = data.split("\\s+");
List<String> tokens = new ArrayList<String>();
int pos = 0;
while (pos < split.length) {
int tokenLength = getToken(split, pos, 0);
tokens.add(glue(split, pos, tokenLength));
pos += tokenLength;
}
return tokens;
}
public String glue(String[] parts, int pos, int length) {
StringBuilder sb = new StringBuilder();
sb.append(parts[pos]);
for (int i = pos + 1; i < pos + length; i++) {
sb.append(" ");
sb.append(parts[i]);
}
return sb.toString();
}
private int getToken(String[] tokens, int begin, int length) {
if (end) {
return length;
}
if (begin == tokens.length) {
return 1;
}
String key = tokens[begin];
Trie t = get(key);
if (t != null) {
return t.getToken(tokens, begin + 1, length + 1);
}
return 1;
}
}
and how to use it:
Trie t = new Trie();
t.addToken("AC Power");
t.addToken("date & time");
t.addToken("date & foo");
t.addToken("Speed & fun");
String data = "date & time AC Power Insolation Temperature Wind Speed";
List<String> tokens = t.tokenize(data);
for (String s : tokens) {
System.out.println(s);
}

mapping list of string into hierarchical structure of objects

This is not a homework problem. This questions was asked to one of my friend in an interview test.
I have a list of lines read from a file as input. Each line has a identifier such as (A,B,NN,C,DD) at the start of line. Depending upon the identifier, I need to map the list of records into a single object A which contains a hierarchy structure of objects.
Description of Hierarchy :
Each A can have zero or more B types.
Each B identifier can have zero or more NN and C as child. Similarly each C segment can have zero or more NN and DD child. Abd each DD can have zero or more NN as child.
Mapping classes and their hierarchy:
All the class will have value to hold the String value from current line.
**A - will have list of B**
class A {
List<B> bList;
String value;
public A(String value) {
this.value = value;
}
public void addB(B b) {
if (bList == null) {
bList = new ArrayList<B>();
}
bList.add(b);
}
}
**B - will have list of NN and list of C**
class B {
List<C> cList;
List<NN> nnList;
String value;
public B(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
public void addC(C c) {
if (cList == null) {
cList = new ArrayList<C>();
}
cList.add(c);
}
}
**C - will have list of DDs and NNs**
class C {
List<DD> ddList;
List<NN> nnList;
String value;
public C(String value) {
this.value = value;
}
public void addDD(DD dd) {
if (ddList == null) {
ddList = new ArrayList<DD>();
}
ddList.add(dd);
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**DD - will have list of NNs**
class DD {
String value;
List<NN> nnList;
public DD(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**NN- will hold the line only**
class NN {
String value;
public NN(String value) {
this.value = value;
}
}
What I Did So Far :
The method public A parse(List<String> lines) reads the input list and returns the object A. Since, there might be multiple B, i have created separate method 'parseB to parse each occurrence.
At parseB method, loops through the i = startIndex + 1 to i < lines.size() and checks the start of lines. Occurrence of "NN" is added to current object of B. If "C" is detected at start, it calls another method parseC. The loop will break when we detect "B" or "A" at start.
Similar logic is used in parseC_DD.
public class GTTest {
public A parse(List<String> lines) {
A a;
for (int i = 0; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("A")) {
a = new A(curLine);
continue;
}
if (curLine.startsWith("B")) {
i = parseB(lines, i); // returns index i to skip all the lines that are read inside parseB(...)
continue;
}
}
return a; // return mapped object
}
private int parseB(List<String> lines, int startIndex) {
int i;
B b = new B(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
b.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("C")) {
i = parseC(b, lines, i);
continue;
}
a.addB(b);
if (curLine.startsWith("B") || curLine.startsWith("A")) { //ending condition
System.out.println("B A "+curLine);
--i;
break;
}
}
return i; // return nextIndex to read
}
private int parseC(B b, List<String> lines, int startIndex) {
int i;
C c = new C(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
c.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
i = parseC_DD(c, lines, i);
continue;
}
b.addC(c);
if (curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
private int parseC_DD(C c, List<String> lines, int startIndex) {
int i;
DD d = new DD(lines.get(startIndex));
c.addDD(d);
for (i = startIndex; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
d.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
d=new DD(curLine);
continue;
}
c.addDD(d);
if (curLine.startsWith("NN") || curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("NN C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
public static void main(String[] args) {
GTTest gt = new GTTest();
List<String> list = new ArrayList<String>();
list.add("A1");
list.add("B1");
list.add("NN1");
list.add("NN2");
list.add("C1");
list.add("NNXX");
list.add("DD1");
list.add("DD2");
list.add("NN3");
list.add("NN4");
list.add("DD3");
list.add("NN5");
list.add("B2");
list.add("NN6");
list.add("C2");
list.add("DD4");
list.add("DD5");
list.add("NN7");
list.add("NN8");
list.add("DD6");
list.add("NN7");
list.add("C3");
list.add("DD7");
list.add("DD8");
A a = gt.parse(list);
//show values of a
}
}
My logic is not working properly. Is there any other approach you can figure out? Do you have any suggestions/improvements to my way?
Use hierarchy of objects:
public interface Node {
Node getParent();
Node getLastChild();
boolean addChild(Node n);
void setValue(String value);
Deque getChildren();
}
private static abstract class NodeBase implements Node {
...
abstract boolean canInsert(Node n);
public String toString() {
return value;
}
...
}
public static class A extends NodeBase {
boolean canInsert(Node n) {
return n instanceof B;
}
}
public static class B extends NodeBase {
boolean canInsert(Node n) {
return n instanceof NN || n instanceof C;
}
}
...
public static class NN extends NodeBase {
boolean canInsert(Node n) {
return false;
}
}
Create a tree class:
public class MyTree {
Node root;
Node lastInserted = null;
public void insert(String label) {
Node n = NodeFactory.create(label);
if (lastInserted == null) {
root = n;
lastInserted = n;
return;
}
Node current = lastInserted;
while (!current.addChild(n)) {
current = current.getParent();
if (current == null) {
throw new RuntimeException("Impossible to insert " + n);
}
}
lastInserted = n;
}
...
}
And then print the tree:
public class MyTree {
...
public static void main(String[] args) {
List input;
...
MyTree tree = new MyTree();
for (String line : input) {
tree.insert(line);
}
tree.print();
}
public void print() {
printSubTree(root, "");
}
private static void printSubTree(Node root, String offset) {
Deque children = root.getChildren();
Iterator i = children.descendingIterator();
System.out.println(offset + root);
while (i.hasNext()) {
printSubTree(i.next(), offset + " ");
}
}
}
A mealy automaton solution with 5 states:
wait for A,
seen A,
seen B,
seen C, and
seen DD.
The parse is done completely in one method. There is one current Node that is the last Node seen except the NN ones. A Node has a parent Node except the root. In state seen (0), the current Node represents a (0) (e.g. in state seen C, current can be C1 in the example above). The most fiddling is in state seen DD, that has the most outgoing edges (B, C, DD, and NN).
public final class Parser {
private final static class Token { /* represents A1 etc. */ }
public final static class Node implements Iterable<Node> {
/* One Token + Node children, knows its parent */
}
private enum State { ExpectA, SeenA, SeenB, SeenC, SeenDD, }
public Node parse(String text) {
return parse(Token.parseStream(text));
}
private Node parse(Iterable<Token> tokens) {
State currentState = State.ExpectA;
Node current = null, root = null;
while(there are tokens) {
Token t = iterator.next();
switch(currentState) {
/* do stuff for all states */
/* example snippet for SeenC */
case SeenC:
if(t.Prefix.equals("B")) {
current.PN.PN.AddChildNode(new Node(t, current.PN.PN));
currentState = State.SeenB;
} else if(t.Prefix.equals("C")) {
}
}
return root;
}
}
I'm not satisfied with those trainwrecks to go up the hierarchy to insert a Node somewhere else (current.PN.PN). Eventually, explicit state classes would make the private parse method more readable. Then, the solution gets more akin to the one provided by #AlekseyOtrubennikov. Maybe a straight LL approach yields code that is more beautiful. Maybe best to just rephrase the grammar to a BNF one and delegate parser creation.
A straightforward LL parser, one production rule:
// "B" ("NN" || C)*
private Node rule_2(TokenStream ts, Node parent) {
// Literal "B"
Node B = literal(ts, "B", parent);
if(B == null) {
// error
return null;
}
while(true) {
// check for "NN"
Node nnLit = literal(ts, "NN", B);
if(nnLit != null)
B.AddChildNode(nnLit);
// check for C
Node c = rule_3(ts, parent);
if(c != null)
B.AddChildNode(c);
// finished when both rules did not match anything
if(nnLit == null && c == null)
break;
}
return B;
}
TokenStream enhances Iterable<Token> by allowing to lookahead into the stream - LL(1) because parser must choose between literal NN or deep diving in two cases (rule_2 being one of them). Looks nice, however, missing some C# features here...
#Stefan and #Aleksey are correct: this is simple parsing problem.
You can define your hierarchy constraints in Extended Backus-Naur Form:
A ::= { B }
B ::= { NN | C }
C ::= { NN | DD }
DD ::= { NN }
This description can be transformed into state machine and implemented. But there are a lot of tools that can effectively do this for you: Parser generators.
I am posting my answer only to show that it's quite easy to solve such problems with Haskell (or some other functional language).
Here is complete program that reads strings form stdin and prints parsed tree to the stdout.
-- We are using some standard libraries.
import Control.Applicative ((<$>), (<*>))
import Text.Parsec
import Data.Tree
-- This is EBNF-like description of what to do.
-- You can almost read it like a prose.
yourData = nodeA +>> eof
nodeA = node "A" nodeB
nodeB = node "B" (nodeC <|> nodeNN)
nodeC = node "C" (nodeNN <|> nodeDD)
nodeDD = node "DD" nodeNN
nodeNN = (`Node` []) <$> nodeLabel "NN"
node lbl children
= Node <$> nodeLabel lbl <*> many children
nodeLabel xx = (xx++)
<$> (string xx >> many digit)
+>> newline
-- And this is some auxiliary code.
f +>> g = f >>= \x -> g >> return x
main = do
txt <- getContents
case parse yourData "" txt of
Left err -> print err
Right res -> putStrLn $ drawTree res
Executing it with your data in zz.txt will print this nice tree:
$ ./xxx < zz.txt
A1
+- B1
| +- NN1
| +- NN2
| `- C1
| +- NN2
| +- DD1
| +- DD2
| | +- NN3
| | `- NN4
| `- DD3
| `- NN5
`- B2
+- NN6
+- C2
| +- DD4
| +- DD5
| | +- NN7
| | `- NN8
| `- DD6
| `- NN9
`- C3
+- DD7
`- DD8
And here is how it handles malformed input:
$ ./xxx
A1
B2
DD3
(line 3, column 1):
unexpected 'D'
expecting "B" or end of input

goto statement in antlr?

I would really appreciate if someone could give me advice,or point me to tutorial, or sample implementation, anything that could help me implement basic goto statement in ANTLR?
Thanks for any help
edit. ver2 of question:
Say I have this tree structure:
(BLOCK (PRINT 1) (PRINT 2) (PRINT 3) (PRINT 4) )
Now, I'm interested to know is there a way to
select, say, node (PRINT 2) and all nodes that follow
that node ((PRINT 2) (PRINT 3) (PRINT 4)) ?
I'm asking this because I'm trying to implement
basic goto mechanism.
I have print statement like this:
i=LABEL print
{interpreter.store($i.text, $print.tree);} //stores in hash table
-> print
However $print.tree just ignores later nodes,
so in input:
label: print 1
print 2
goto label
would print 121!
(What I would like is infinite loop 1212...)
I've also tried taking token
address of print statement with
getTokenStartIndex() and setting
roots node with setTokenStartIndex
but that just looped whatever was first node over and over.
My question is, how does one implement goto statement in antlr ?
Maybe my approach is wrong, as I have overlooked something?
I would really appreciate any help.
ps. even more detail, it is related to pattern 25 - Language Implementation patterns, I'm trying to add on to examples from that pattern.
Also, I've searched quite a bit on the web, looks like it is very hard to find goto example
... anything that could help me implement basic goto statement in ANTLR?
Note that it isn't ANTLR that implements this. With ANTLR you merely describe the language you want to parse to get a lexer, parser and possibly a tree-walker. After that, it's up to you to manipulate the tree and evaluate it.
Here's a possible way. Please don't look too closely at the code. It's a quick hack: there's a bit of code-duplication and I'm am passing package protected variables around which isn't as it should be done. The grammar also dictates you to start your input source with a label, but this is just a small demo of how you could solve it.
You need the following files:
Goto.g - the combined grammar file
GotoWalker.g - the tree walker grammar file
Main.java - the main class including the Node-model classes of the language
test.goto - the test input source file
antlr-3.3.jar - the ANTLR JAR (could also be another 3.x version)
Goto.g
grammar Goto;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
#members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
parse
: block EOF -> block
;
block
: ID ':' stats b=block? {labels.put($ID.text, new CommonTree[]{$stats.tree, $b.tree});} -> ^(BLOCK stats $b?)
;
stats
: stat*
;
stat
: Print Number -> ^(Print Number)
| Goto ID -> ^(Goto ID)
;
Goto : 'goto';
Print : 'print';
Number : '0'..'9'+;
ID : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
GotoWalker.g
tree grammar GotoWalker;
options {
tokenVocab=Goto;
ASTLabelType=CommonTree;
}
tokens {
FILE;
BLOCK;
}
#members {
java.util.Map<String, CommonTree[]> labels = new java.util.HashMap<String, CommonTree[]>();
}
walk returns [Node n]
: block {$n = $block.n;}
;
block returns [Node n]
: ^(BLOCK stats b=block?) {$n = new BlockNode($stats.n, $b.n);}
;
stats returns [Node n]
#init{List<Node> nodes = new ArrayList<Node>();}
: (stat {nodes.add($stat.n);})* {$n = new StatsNode(nodes);}
;
stat returns [Node n]
: ^(Print Number) {$n = new PrintNode($Number.text);}
| ^(Goto ID) {$n = new GotoNode($ID.text, labels);}
;
Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
import java.util.*;
public class Main {
public static void main(String[] args) throws Exception {
GotoLexer lexer = new GotoLexer(new ANTLRFileStream("test.goto"));
GotoParser parser = new GotoParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(tree));
walker.labels = parser.labels;
Node root = walker.walk();
root.eval();
}
}
interface Node {
public static final Node VOID = new Node(){public Object eval(){throw new RuntimeException("VOID.eval()");}};
public static final Node BREAK = new Node(){public Object eval(){throw new RuntimeException("VOID.eval()");}};
Object eval();
}
class BlockNode implements Node {
Node stats;
Node child;
BlockNode(Node ns, Node ch) {
stats = ns;
child = ch;
}
public Object eval() {
Object o = stats.eval();
if(o != VOID) {
return o;
}
if(child != null) {
o = child.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class StatsNode implements Node {
List<Node> nodes;
StatsNode(List<Node> ns) {
nodes = ns;
}
public Object eval() {
for(Node n : nodes) {
Object o = n.eval();
if(o != VOID) {
return o;
}
}
return VOID;
}
}
class PrintNode implements Node {
String text;
PrintNode(String txt) {
text = txt;
}
public Object eval() {
System.out.println(text);
return VOID;
}
}
class GotoNode implements Node {
String label;
Map<String, CommonTree[]> labels;
GotoNode(String lbl, Map<String, CommonTree[]> lbls) {
label = lbl;
labels = lbls;
}
public Object eval() {
CommonTree[] toExecute = labels.get(label);
try {
Thread.sleep(1000L);
GotoWalker walker = new GotoWalker(new CommonTreeNodeStream(toExecute[0]));
walker.labels = this.labels;
Node root = walker.stats();
Object o = root.eval();
if(o != VOID) {
return o;
}
walker = new GotoWalker(new CommonTreeNodeStream(toExecute[1]));
walker.labels = this.labels;
root = walker.block();
o = root.eval();
if(o != VOID) {
return o;
}
} catch(Exception e) {
e.printStackTrace();
}
return BREAK;
}
}
test.goto
root:
print 1
A:
print 2
B:
print 3
goto A
C:
print 4
To run the demo, do the following:
*nix/MacOS
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
or:
Windows
java -cp antlr-3.3.jar org.antlr.Tool Goto.g
java -cp antlr-3.3.jar org.antlr.Tool GotoWalker.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which will print:
1
2
3
2
3
2
3
2
3
...
Note that the 2 and 3 are repeated until you terminate the app manually.

Resources