ANTLR's parser enters into "wrong" rule - parsing

I'm trying to create an interpreter for a simple programming language using ANTLR. So far it consists of print statements and numeric expressions.
I created a 'simpleExpr' parser rule to handle negative numbers. I tried other approaches too, but this is the only one that seems to work for me. However, for some reason my visitor enters this rule even where I would expect it to visit my 'number' rule. I don't think it's the visitor's fault, because even the tree drawn by ANTLR shows this behavior. That is odd, but it would be acceptable on its own. My real problem is that when I try to print the result of a simple addition, e.g. print(1+2);, the visitor does not evaluate it: it enters the 'number' rule instead of the 'Plus' rule.
My grammar:
grammar BatshG;
/*
* Parser Rules
*/
compileUnit: (expression | ( println ';') | ( print ';' ))+;
expression:
left=expression '/' right=simpleExpr #Divi
| left=expression '*' right=simpleExpr #Mult
| left=expression '-' right=simpleExpr #Minus
| left=expression '+' right=simpleExpr #Plus
| number=simpleExpr #Number
;
println: 'println' '(' argument=expression ')';
print: 'print' '(' argument=expression ')';
simpleExpr
: (MINUS)?
(FLOAT | INTEGER)
;
MINUS: '-';
INTEGER: [0-9] [0-9]*;
DIGIT : [0-9] ;
FRAC : '.' DIGIT+ ;
EXP : [eE] [-+]? DIGIT+ ;
FLOAT : DIGIT* FRAC EXP? ;
WS: [ \n\t\r]+ -> channel(HIDDEN);
If it helps, here is my visualized tree generated by ANTLR for
print(1+2);
Update:
The visitor class, if it counts:
public class BatshGVisitor : BatshGBaseVisitor<ResultValue>
{
public ResultValue Result { get; set; }
public StringBuilder OutputForPrint { get; set; }
public override ResultValue VisitCompileUnit([NotNull] BatshGParser.CompileUnitContext context)
{
OutputForPrint = new StringBuilder("");
var resultvalue = VisitChildren(context);
Result = new ResultValue() { ExpType = "string", ExpValue = resultvalue.ExpValue };
return Result;
}
public override ResultValue VisitPlus([NotNull] BatshGParser.PlusContext context)
{
var leftExp = VisitChildren(context.left);
var rigthExp = VisitChildren(context.right);
return new ResultValue()
{
ExpType = "number",
ExpValue = (double)leftExp.ExpValue + (double)rigthExp.ExpValue
};
}
//public override ResultValue VisitNumber([NotNull] BatshGParser.NumberContext context)
//{
// return new ResultValue()
// {
// ExpType = "number",
// ExpValue = Double.Parse(context.GetChild(0).GetText()
// + context.GetChild(1).GetText()
// + context.GetChild(2).GetText()
// , CultureInfo.InvariantCulture)
// };
//}
public override ResultValue VisitPrint([NotNull] BatshGParser.PrintContext context)
{
var viCh = VisitChildren(context.argument);
var viChVa = viCh.ExpValue;
string printInner = viChVa.ToString();
var toPrint = new ResultValue()
{
ExpType = viCh.ExpType,
ExpValue = printInner
};
OutputForPrint.Append(toPrint.ExpValue);
return toPrint;
}
public override ResultValue VisitSimpleExpr([NotNull] BatshGParser.SimpleExprContext context)
{
string numberToConvert = "";
if (context.ChildCount == 1)
{
numberToConvert = context.GetChild(0).GetText();
}
else if (context.GetChild(0).ToString() == "-")
{
if (context.ChildCount == 2)
{
numberToConvert = "-" + context.GetChild(1);
}
if (context.ChildCount == 4)
{
numberToConvert = context.GetChild(0).ToString() + context.GetChild(1).ToString() +
context.GetChild(2).ToString() + context.GetChild(3).ToString();
}
}
return new ResultValue()
{
ExpType = "number",
ExpValue = Double.Parse(numberToConvert, CultureInfo.InvariantCulture)
};
}
protected override ResultValue AggregateResult(ResultValue aggregate, ResultValue nextResult)
{
if (aggregate == null)
return new ResultValue()
{
ExpType = nextResult.ExpType,
ExpValue = nextResult.ExpValue
};
if (nextResult == null)
{
return aggregate;
}
return null;
}
}
What's the problem with my grammar?
Thank you!

Inside the visit method for print statements, you have this:
var viCh = VisitChildren(context.argument);
So let's say your input was print(1+2);. Then context.argument would be the PlusContext for 1+2, and the children of context.argument would be a NumberContext for 1, a terminal node for + and a SimpleExprContext for 2. By calling VisitChildren you visit those children directly, which is why VisitPlus never runs and the visitor goes straight to the numbers. The VisitChildren(context.left) and VisitChildren(context.right) calls inside VisitPlus follow the same pattern.
Generally, you rarely want to visit the children of some other node. You usually want to visit your own children, rather than skipping them and visiting the grandchildren directly. So what you should do instead is call Visit(context.argument);.
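To make the difference concrete, here is a minimal Java sketch. It is not ANTLR code: Node, Num, Plus and Evaluator are hand-rolled stand-ins invented for illustration, and the way child results are combined is a simplification. It only shows why visiting a node's children bypasses the method written for the node itself.

```java
import java.util.List;

// Hand-rolled stand-ins for a parse tree; these are NOT ANTLR classes.
abstract class Node {
    final List<Node> children;
    Node(List<Node> children) { this.children = children; }
    abstract Double accept(Evaluator v);
}

class Num extends Node {
    final double value;
    Num(double value) { super(List.of()); this.value = value; }
    Double accept(Evaluator v) { return v.visitNum(this); }
}

class Plus extends Node {
    Plus(Node left, Node right) { super(List.of(left, right)); }
    Double accept(Evaluator v) { return v.visitPlus(this); }
}

class Evaluator {
    // Dispatches to the method that matches the node's own type.
    Double visit(Node n) { return n.accept(this); }

    // Skips the node itself and only dispatches on its children,
    // keeping the last non-null child result (a simplified aggregation rule).
    Double visitChildren(Node n) {
        Double result = null;
        for (Node child : n.children) {
            Double next = visit(child);
            if (next != null) result = next;
        }
        return result;
    }

    Double visitNum(Num n) { return n.value; }

    Double visitPlus(Plus p) {
        return visit(p.children.get(0)) + visit(p.children.get(1));
    }
}

class VisitDemo {
    public static void main(String[] args) {
        Node sum = new Plus(new Num(1), new Num(2));
        Evaluator ev = new Evaluator();
        System.out.println(ev.visit(sum));          // 3.0: visitPlus runs
        System.out.println(ev.visitChildren(sum));  // 2.0: visitPlus is skipped
    }
}
```

In this model, visit(sum) reaches visitPlus and adds both operands, while visitChildren(sum) only ever sees the two numbers, which mirrors the behavior described in the question.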

Related

Listener-Parser in ANTLR4

ANTLR4 rule :
listExpr locals [Object in, Object out] : ( expr ',')* expr ;
Parser :
public static class ListExprContext extends ParserRuleContext {
public Object in;
public Object out;
public List<ExprContext> expr() {
return getRuleContexts(ExprContext.class);
}
....
}
Listener :
override def exitListExpr(ctx : BKOOLParser.ListExprContext) : Unit =
{
val listExpr = ctx.expr
val length = listExpr.length
ctx.out = length
}
I want to store the number of expr children that ListExpr has, but I get this error:
the result type of an implicit must be more specific than AnyRef
at line :
ctx.out = length
How can I fix it? Your help would be appreciated.

javacc java.lang.NullPointerException

I'm trying to write a MiniJava parser, but I'm having trouble finding a way to parse method declarations that have no formal parameters,
e.g. public int getNumber()
The code I have right now works for one or more parameters, but I'm not sure how to return an empty Formal object; the problem clearly lies with the line returning null.
Is there a way to skip the return statement altogether and return nothing?
public Formal nt_FormalList() :
{
Type t;
Token s;
LinkedList<Formal> fl = new LinkedList<Formal>();
Formal f;
}
{
t = nt_Type() s = <IDENTIFIER> (f = nt_FormalRest() {fl.add(f);})*
{ return new Formal(t, s.image); }
| {}
{ return null; }
}
.....
public class Formal {
public final Type t;
public final String i;
public Formal(Type at, String ai) {
t = at;
i = ai;
}
}
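I'd suggest returning a list of Formals from nt_FormalList, so that a method without parameters yields an empty list instead of null. The version below assumes a separate nt_Formal production that parses a single Type/IDENTIFIER pair: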
public List<Formal> nt_FormalList() :
{
LinkedList<Formal> fl = new LinkedList<Formal>();
Formal f;
}
{
[ f = nt_Formal() {fl.add(f);}
(<COMMA> f = nt_Formal() {fl.add(f);})*
]
{ return fl; }
}
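The same idea, an empty list rather than null when there are no parameters, can be shown in plain Java without JavaCC. This is only a sketch under assumptions: FormalParam and FormalListParser are hypothetical names, types are kept as plain strings, and the input is the text between the parentheses of the method declaration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Formal class in the question; the type is a plain string here.
class FormalParam {
    final String type;
    final String name;
    FormalParam(String type, String name) { this.type = type; this.name = name; }
}

class FormalListParser {
    // Parses the text between the parentheses of a method declaration,
    // e.g. "int a, boolean b", or "" for a method without parameters.
    static List<FormalParam> parse(String text) {
        List<FormalParam> formals = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return formals; // no parameters: an empty list, never null
        }
        for (String part : trimmed.split(",")) {
            String[] pieces = part.trim().split("\\s+");
            if (pieces.length != 2) {
                throw new IllegalArgumentException("Expected '<type> <name>' but got: " + part);
            }
            formals.add(new FormalParam(pieces[0], pieces[1]));
        }
        return formals;
    }

    public static void main(String[] args) {
        System.out.println(parse("").size());                  // 0
        System.out.println(parse("int a, boolean b").size());  // 2
    }
}
```

Callers can then iterate over the result unconditionally, which removes the null check that caused the NullPointerException.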

JFLEX AND CUP, cannot make it work correctly

I'm working with JFlex and CUP, trying to build an HTML parser,
but I can't make it work correctly:
in NetBeans the compile process never stops, it just keeps running;
I can't add more tokens to the parse tree correctly;
and I can't allow spaces in "TEXTO", because that breaks the entire tree.
lexicoh.jlex
package compiladorhtml;
import java_cup.runtime.*;
%%
%class Lexer
%line
%column
%cup
%{
private Symbol symbol(int type) {
return new Symbol(type, yyline, yycolumn);
}
private Symbol symbol(int type, Object value) {
return new Symbol(type, yyline, yycolumn, value);
}
%}
LineTerminator = \r|\n|\r\n
WhiteSpace = {LineTerminator} | [ \t\f]
texto = [a-zA-Z0-9_]*
%%
<YYINITIAL> {
"::" { System.out.print("<"); return symbol(sym.INI);}
"ENCA" { System.out.print("HEAD>"); return symbol(sym.HEAD);}
"/" { System.out.print("</"); return symbol(sym.FIN);}
{texto} { System.out.print(yytext()); return symbol(sym.TEXTO);}
{WhiteSpace} { /* just skip what was found, do nothing */ }
"&&" { System.out.print(""); return symbol(sym.FINAL); }
}
[^] { throw new Error("Illegal character <"+yytext()+">"); }
sintaticoh.cup
package compiladorhtml;
import java_cup.runtime.*;
parser code {:
public void report_error(String message, Object info) {
StringBuilder m = new StringBuilder("Error");
if (info instanceof java_cup.runtime.Symbol) {
java_cup.runtime.Symbol s = ((java_cup.runtime.Symbol) info);
if (s.left >= 0) {
m.append(" in line "+(s.left+1));
if (s.right >= 0)
m.append(", column "+(s.right+1));
}
}
m.append(" : "+message);
System.err.println(m);
}
public void report_fatal_error(String message, Object info) {
report_error(message, info);
System.exit(1);
}
:};
terminal INI, HEAD, TEXTO, FIN, FINAL;
non terminal Object expr_list, expr_part;
non terminal String expr;
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= expr:e;
expr ::= INI HEAD
| TEXTO
| FIN HEAD
| FINAL;
java Main
public static void main(String[] args) throws IOException, Exception {
//CreateFiles();
// Run this to check that it works; I already tried it and it works
File fichero = new File("fichero.txt");
PrintWriter writer;
try {
writer = new PrintWriter(fichero);
writer.print("::ENCA NOMBRE ENCABEZADO /ENCA &&");
writer.close();
} catch (FileNotFoundException ex) {
System.out.println(ex);
}
Lexer thisscanner = new Lexer(new FileReader("fichero.txt"));
parser thisparser = new parser(thisscanner);
thisparser.parse();
}
public static void CreateFiles() {
String filelex = "path\\lexicoh.jlex";
File file = new File(filelex);
jflex.Main.generate(file);
String opciones[] = new String[5];
opciones[0] = "-destdir";
opciones[1] = "path";
opciones[2] = "-parser";
opciones[3] = "parser";
opciones[4] = "path\\sintacticoh.cup";
try {
java_cup.Main.main(opciones);
} catch (Exception ex) {
Logger.getLogger(CompiladorHTML.class.getName()).log(Level.SEVERE, null, ex);
}
}
thanks
I think you should do this:
since {texto} is a string made up of letters and digits, pass the matched text along as the token's value by redefining the rule as follows (symbol is the helper method declared in the lexer's %{ ... %} block, so it is called without new):
{texto} { System.out.println(yytext()); return symbol(sym.TEXTO, yytext()); }
Then, if the program still doesn't stop, there may be a problem while reading the source file.

Using scanner to read phrases

Hey StackOverflow Community,
So, I have lines of information from a txt file that I need to parse.
Here are some example lines:
-> date & time AC Power Insolation Temperature Wind Speed
-> mm/dd/yyyy hh:mm.ss kw W/m^2 deg F mph
Calling scanner.nextLine() gives me a String with a whole line in it. I then pass it to a StringTokenizer, which separates it into individual Strings using whitespace as the separator.
So the first line would break up into:
date
&
time
AC
Power
Insolation
etc...
I need things like "date & time" kept together, and "AC Power" kept together. Is there any way I can specify this using a method already defined in StringTokenizer or Scanner? Or would I have to develop my own algorithm to do this?
Would you suggest I use some other way of parsing lines instead of Scanner? Or is Scanner sufficient for my needs?
ejay
Oh, this one was tricky. Maybe you could build a trie structure from your multi-word tokens. I was bored and wrote a little class that solves your problem. Warning: it's a bit hacky, but it was fun to implement.
The Trie class:
class Trie extends HashMap<String, Trie> {
private static final long serialVersionUID = 1L;
boolean end = false;
public void addToken(String strings) {
addToken(strings.split("\\s+"), 0);
}
private void addToken(String[] strings, int begin) {
if (begin == strings.length) {
end = true;
return;
}
String key = strings[begin];
Trie t = get(key);
if (t == null) {
t = new Trie();
put(key, t);
}
t.addToken(strings, begin + 1);
}
public List<String> tokenize(String data) {
String[] split = data.split("\\s+");
List<String> tokens = new ArrayList<String>();
int pos = 0;
while (pos < split.length) {
int tokenLength = getToken(split, pos, 0);
tokens.add(glue(split, pos, tokenLength));
pos += tokenLength;
}
return tokens;
}
public String glue(String[] parts, int pos, int length) {
StringBuilder sb = new StringBuilder();
sb.append(parts[pos]);
for (int i = pos + 1; i < pos + length; i++) {
sb.append(" ");
sb.append(parts[i]);
}
return sb.toString();
}
private int getToken(String[] tokens, int begin, int length) {
if (end) {
return length;
}
if (begin == tokens.length) {
return 1;
}
String key = tokens[begin];
Trie t = get(key);
if (t != null) {
return t.getToken(tokens, begin + 1, length + 1);
}
return 1;
}
}
and how to use it:
Trie t = new Trie();
t.addToken("AC Power");
t.addToken("date & time");
t.addToken("date & foo");
t.addToken("Speed & fun");
String data = "date & time AC Power Insolation Temperature Wind Speed";
List<String> tokens = t.tokenize(data);
for (String s : tokens) {
System.out.println(s);
}

mapping list of string into hierarchical structure of objects

This is not a homework problem. This question was asked to one of my friends in an interview test.
I have a list of lines read from a file as input. Each line has an identifier such as (A, B, NN, C, DD) at the start of the line. Depending on the identifier, I need to map the list of records into a single object A which contains a hierarchical structure of objects.
Description of Hierarchy :
Each A can have zero or more B types.
Each B identifier can have zero or more NN and C as children. Similarly, each C segment can have zero or more NN and DD children. And each DD can have zero or more NN as children.
Mapping classes and their hierarchy:
Every class has a value field that holds the String from the current line.
**A - will have list of B**
class A {
List<B> bList;
String value;
public A(String value) {
this.value = value;
}
public void addB(B b) {
if (bList == null) {
bList = new ArrayList<B>();
}
bList.add(b);
}
}
**B - will have list of NN and list of C**
class B {
List<C> cList;
List<NN> nnList;
String value;
public B(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
public void addC(C c) {
if (cList == null) {
cList = new ArrayList<C>();
}
cList.add(c);
}
}
**C - will have list of DDs and NNs**
class C {
List<DD> ddList;
List<NN> nnList;
String value;
public C(String value) {
this.value = value;
}
public void addDD(DD dd) {
if (ddList == null) {
ddList = new ArrayList<DD>();
}
ddList.add(dd);
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**DD - will have list of NNs**
class DD {
String value;
List<NN> nnList;
public DD(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**NN- will hold the line only**
class NN {
String value;
public NN(String value) {
this.value = value;
}
}
What I Did So Far :
The method public A parse(List<String> lines) reads the input list and returns the object A. Since there might be multiple B entries, I created a separate method, parseB, to parse each occurrence.
The parseB method loops from i = startIndex + 1 while i < lines.size() and checks the start of each line. An occurrence of "NN" is added to the current B object. If "C" is detected at the start, it calls another method, parseC. The loop breaks when "B" or "A" is detected at the start.
Similar logic is used in parseC_DD.
public class GTTest {
public A parse(List<String> lines) {
A a;
for (int i = 0; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("A")) {
a = new A(curLine);
continue;
}
if (curLine.startsWith("B")) {
i = parseB(lines, i); // returns index i to skip all the lines that are read inside parseB(...)
continue;
}
}
return a; // return mapped object
}
private int parseB(List<String> lines, int startIndex) {
int i;
B b = new B(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
b.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("C")) {
i = parseC(b, lines, i);
continue;
}
a.addB(b);
if (curLine.startsWith("B") || curLine.startsWith("A")) { //ending condition
System.out.println("B A "+curLine);
--i;
break;
}
}
return i; // return nextIndex to read
}
private int parseC(B b, List<String> lines, int startIndex) {
int i;
C c = new C(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
c.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
i = parseC_DD(c, lines, i);
continue;
}
b.addC(c);
if (curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
private int parseC_DD(C c, List<String> lines, int startIndex) {
int i;
DD d = new DD(lines.get(startIndex));
c.addDD(d);
for (i = startIndex; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
d.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
d=new DD(curLine);
continue;
}
c.addDD(d);
if (curLine.startsWith("NN") || curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("NN C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
public static void main(String[] args) {
GTTest gt = new GTTest();
List<String> list = new ArrayList<String>();
list.add("A1");
list.add("B1");
list.add("NN1");
list.add("NN2");
list.add("C1");
list.add("NNXX");
list.add("DD1");
list.add("DD2");
list.add("NN3");
list.add("NN4");
list.add("DD3");
list.add("NN5");
list.add("B2");
list.add("NN6");
list.add("C2");
list.add("DD4");
list.add("DD5");
list.add("NN7");
list.add("NN8");
list.add("DD6");
list.add("NN7");
list.add("C3");
list.add("DD7");
list.add("DD8");
A a = gt.parse(list);
//show values of a
}
}
My logic is not working properly. Is there any other approach you can figure out? Do you have any suggestions/improvements to my way?
Use a hierarchy of objects:
public interface Node {
Node getParent();
Node getLastChild();
boolean addChild(Node n);
void setValue(String value);
Deque<Node> getChildren();
}
private static abstract class NodeBase implements Node {
...
abstract boolean canInsert(Node n);
public String toString() {
return value;
}
...
}
public static class A extends NodeBase {
boolean canInsert(Node n) {
return n instanceof B;
}
}
public static class B extends NodeBase {
boolean canInsert(Node n) {
return n instanceof NN || n instanceof C;
}
}
...
public static class NN extends NodeBase {
boolean canInsert(Node n) {
return false;
}
}
Create a tree class:
public class MyTree {
Node root;
Node lastInserted = null;
public void insert(String label) {
Node n = NodeFactory.create(label);
if (lastInserted == null) {
root = n;
lastInserted = n;
return;
}
Node current = lastInserted;
while (!current.addChild(n)) {
current = current.getParent();
if (current == null) {
throw new RuntimeException("Impossible to insert " + n);
}
}
lastInserted = n;
}
...
}
And then print the tree:
public class MyTree {
...
public static void main(String[] args) {
List<String> input;
...
MyTree tree = new MyTree();
for (String line : input) {
tree.insert(line);
}
tree.print();
}
public void print() {
printSubTree(root, "");
}
private static void printSubTree(Node root, String offset) {
Deque<Node> children = root.getChildren();
Iterator<Node> i = children.descendingIterator();
System.out.println(offset + root);
while (i.hasNext()) {
printSubTree(i.next(), offset + " ");
}
}
}
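Since parts of the classes above are elided, here is a self-contained Java sketch of the same "walk up from the last inserted node until an ancestor accepts the new one" idea. It is an illustration under assumptions, not the answer's actual code: TreeNode, HierarchyBuilder and the ALLOWED table are made-up names, and the allowed-children table replaces the per-class canInsert methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class TreeNode {
    final String label;
    final TreeNode parent;
    final List<TreeNode> children = new ArrayList<>();
    TreeNode(String label, TreeNode parent) { this.label = label; this.parent = parent; }
}

class HierarchyBuilder {
    // Which kinds of child each kind of node accepts (from the question's hierarchy).
    private static final Map<String, Set<String>> ALLOWED = Map.of(
        "A", Set.of("B"),
        "B", Set.of("NN", "C"),
        "C", Set.of("NN", "DD"),
        "DD", Set.of("NN"),
        "NN", Set.of());

    // Maps a label such as "DD12" to its kind, "DD".
    static String kind(String label) {
        for (String k : List.of("NN", "DD", "A", "B", "C")) {
            if (label.startsWith(k)) return k;
        }
        throw new IllegalArgumentException("Unknown label: " + label);
    }

    static TreeNode build(List<String> lines) {
        TreeNode root = null;
        TreeNode last = null;
        for (String line : lines) {
            if (root == null) {          // the first line becomes the root
                root = new TreeNode(line, null);
                last = root;
                continue;
            }
            // Walk up from the last inserted node until some ancestor accepts this kind.
            TreeNode candidate = last;
            while (candidate != null
                    && !ALLOWED.get(kind(candidate.label)).contains(kind(line))) {
                candidate = candidate.parent;
            }
            if (candidate == null) {
                throw new IllegalArgumentException("Impossible to insert " + line);
            }
            TreeNode node = new TreeNode(line, candidate);
            candidate.children.add(node);
            last = node;
        }
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(TreeNode node, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(node.label).append('\n');
        for (TreeNode child : node.children) sb.append(render(child, indent + "  "));
        return sb.toString();
    }

    public static void main(String[] args) {
        TreeNode root = build(List.of("A1", "B1", "NN1", "C1", "DD1", "NN2", "B2"));
        System.out.print(render(root, ""));
    }
}
```

Each new line attaches to the deepest open node that accepts it, so an NN that follows a DD ends up under that DD rather than under the enclosing C or B.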
A Mealy automaton solution with 5 states:
wait for A,
seen A,
seen B,
seen C, and
seen DD.
The parse is done completely in one method. There is one current Node, which is the last Node seen, not counting the NN ones. Every Node except the root has a parent Node. In state seen X, the current Node represents an X (e.g. in state seen C, current can be C1 in the example above). The fiddliest state is seen DD, which has the most outgoing edges (B, C, DD, and NN).
public final class Parser {
private final static class Token { /* represents A1 etc. */ }
public final static class Node implements Iterable<Node> {
/* One Token + Node children, knows its parent */
}
private enum State { ExpectA, SeenA, SeenB, SeenC, SeenDD, }
public Node parse(String text) {
return parse(Token.parseStream(text));
}
private Node parse(Iterable<Token> tokens) {
State currentState = State.ExpectA;
Node current = null, root = null;
while(there are tokens) {
Token t = iterator.next();
switch(currentState) {
/* do stuff for all states */
/* example snippet for SeenC */
case SeenC:
if(t.Prefix.equals("B")) {
current.PN.PN.AddChildNode(new Node(t, current.PN.PN));
currentState = State.SeenB;
} else if(t.Prefix.equals("C")) {
/* ... remaining token kinds are handled in the same way ... */
}
break;
}
}
return root;
}
}
I'm not satisfied with those train wrecks that go up the hierarchy to insert a Node somewhere else (current.PN.PN). Explicit state classes would eventually make the private parse method more readable. Then the solution becomes more akin to the one provided by @AlekseyOtrubennikov. Maybe a straight LL approach yields more beautiful code. It may be best to just rephrase the grammar in BNF and delegate parser creation to a generator.
A straightforward LL parser, one production rule:
// "B" ("NN" || C)*
private Node rule_2(TokenStream ts, Node parent) {
// Literal "B"
Node B = literal(ts, "B", parent);
if(B == null) {
// error
return null;
}
while(true) {
// check for "NN"
Node nnLit = literal(ts, "NN", B);
if(nnLit != null)
B.AddChildNode(nnLit);
// check for C
Node c = rule_3(ts, B);
if(c != null)
B.AddChildNode(c);
// finished when both rules did not match anything
if(nnLit == null && c == null)
break;
}
return B;
}
TokenStream enhances Iterable<Token> by allowing lookahead into the stream. The parser is LL(1) because it must choose between the literal NN and descending into a nested rule in two cases (rule_2 being one of them). It looks nice; however, I'm missing some C# features here...
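The TokenStream class itself is not shown, so the following is only a guess at the minimum it needs: a one-token lookahead wrapper around an Iterator. PeekingStream and LookaheadDemo are hypothetical names, and plain strings stand in for Token.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// A one-token lookahead wrapper around any Iterator; the names are illustrative only.
class PeekingStream<T> {
    private final Iterator<T> source;
    private T buffered;
    private boolean hasBuffered;

    PeekingStream(Iterator<T> source) { this.source = source; }

    boolean hasNext() { return hasBuffered || source.hasNext(); }

    // Returns the next token without consuming it.
    T peek() {
        if (!hasBuffered) {
            if (!source.hasNext()) throw new NoSuchElementException();
            buffered = source.next();
            hasBuffered = true;
        }
        return buffered;
    }

    // Returns the next token and consumes it.
    T next() {
        T result = peek();
        hasBuffered = false;
        buffered = null;
        return result;
    }
}

class LookaheadDemo {
    // Consumes the leading run of "NN..." tokens and reports how many there were.
    static int countLeadingNN(PeekingStream<String> ts) {
        int count = 0;
        while (ts.hasNext() && ts.peek().startsWith("NN")) {
            ts.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        PeekingStream<String> ts =
            new PeekingStream<>(List.of("NN1", "NN2", "C1").iterator());
        System.out.println(countLeadingNN(ts)); // 2
        System.out.println(ts.peek());          // C1
    }
}
```

With peek available, a rule can inspect the next token and decide which alternative to take before consuming anything, which is all that an LL(1) decision requires.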
@Stefan and @Aleksey are correct: this is a simple parsing problem.
You can define your hierarchy constraints in Extended Backus-Naur Form:
A ::= { B }
B ::= { NN | C }
C ::= { NN | DD }
DD ::= { NN }
This description can be transformed into a state machine and implemented by hand. But there are plenty of tools that can do this for you effectively: parser generators.
I am posting my answer only to show that it's quite easy to solve such problems with Haskell (or some other functional language).
Here is a complete program that reads lines from stdin and prints the parsed tree to stdout.
-- We are using some standard libraries.
import Control.Applicative ((<$>), (<*>))
import Text.Parsec
import Data.Tree
-- This is an EBNF-like description of what to do.
-- You can almost read it like prose.
yourData = nodeA +>> eof
nodeA = node "A" nodeB
nodeB = node "B" (nodeC <|> nodeNN)
nodeC = node "C" (nodeNN <|> nodeDD)
nodeDD = node "DD" nodeNN
nodeNN = (`Node` []) <$> nodeLabel "NN"
node lbl children
  = Node <$> nodeLabel lbl <*> many children
nodeLabel xx = (xx++)
  <$> (string xx >> many digit)
  +>> newline
-- And this is some auxiliary code.
f +>> g = f >>= \x -> g >> return x
main = do
  txt <- getContents
  case parse yourData "" txt of
    Left err -> print err
    Right res -> putStrLn $ drawTree res
Executing it with your data in zz.txt will print this nice tree:
$ ./xxx < zz.txt
A1
+- B1
|  +- NN1
|  +- NN2
|  `- C1
|     +- NN2
|     +- DD1
|     +- DD2
|     |  +- NN3
|     |  `- NN4
|     `- DD3
|        `- NN5
`- B2
   +- NN6
   +- C2
   |  +- DD4
   |  +- DD5
   |  |  +- NN7
   |  |  `- NN8
   |  `- DD6
   |     `- NN9
   `- C3
      +- DD7
      `- DD8
And here is how it handles malformed input:
$ ./xxx
A1
B2
DD3
(line 3, column 1):
unexpected 'D'
expecting "B" or end of input
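For comparison, the EBNF from this answer can also be translated by hand into a recursive-descent parser in Java, with one method per rule. This is a sketch under assumptions rather than anyone's posted solution: Tree and EbnfParser are made-up names, the input is a list of lines, and each line's kind is decided by its prefix.

```java
import java.util.ArrayList;
import java.util.List;

class Tree {
    final String label;
    final List<Tree> children = new ArrayList<>();
    Tree(String label) { this.label = label; }
}

class EbnfParser {
    private final List<String> lines;
    private int pos = 0;

    EbnfParser(List<String> lines) { this.lines = lines; }

    // True when the next unread line starts with the given prefix.
    private boolean at(String prefix) {
        return pos < lines.size() && lines.get(pos).startsWith(prefix);
    }

    // A ::= { B }, followed by end of input
    Tree parseA() {
        if (!at("A")) throw error("A");
        Tree a = new Tree(lines.get(pos++));
        while (at("B")) a.children.add(parseB());
        if (pos < lines.size()) throw error("B or end of input");
        return a;
    }

    // B ::= { NN | C }
    private Tree parseB() {
        Tree b = new Tree(lines.get(pos++));
        while (at("NN") || at("C")) b.children.add(at("NN") ? parseNN() : parseC());
        return b;
    }

    // C ::= { NN | DD }
    private Tree parseC() {
        Tree c = new Tree(lines.get(pos++));
        while (at("NN") || at("DD")) c.children.add(at("NN") ? parseNN() : parseDD());
        return c;
    }

    // DD ::= { NN }
    private Tree parseDD() {
        Tree dd = new Tree(lines.get(pos++));
        while (at("NN")) dd.children.add(parseNN());
        return dd;
    }

    // NN is a leaf: it just holds its line.
    private Tree parseNN() {
        return new Tree(lines.get(pos++));
    }

    private IllegalArgumentException error(String expected) {
        String found = pos < lines.size() ? lines.get(pos) : "end of input";
        return new IllegalArgumentException(
            "line " + (pos + 1) + ": unexpected " + found + ", expecting " + expected);
    }

    public static void main(String[] args) {
        Tree root = new EbnfParser(List.of("A1", "B1", "NN1", "C1", "DD1", "NN2", "B2")).parseA();
        System.out.println(root.children.size()); // 2
    }
}
```

Each method consumes exactly the lines its rule covers, so the malformed input A1, B2, DD3 is rejected at line 3 in the same spirit as the Parsec error shown above.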

Resources