JFlex and CUP: cannot make it work correctly

I'm working with JFlex and CUP, trying to build an HTML parser, but I can't make it work correctly:
In NetBeans, the compile process never stops; it keeps running.
I can't add more tokens to the parse tree correctly.
I also can't allow spaces in TEXTO; doing so breaks the entire tree.
lexicoh.jlex
package compiladorhtml;
import java_cup.runtime.*;
%%
%class Lexer
%line
%column
%cup
%{
private Symbol symbol(int type) {
return new Symbol(type, yyline, yycolumn);
}
private Symbol symbol(int type, Object value) {
return new Symbol(type, yyline, yycolumn, value);
}
%}
LineTerminator = \r|\n|\r\n
WhiteSpace = {LineTerminator} | [ \t\f]
texto = [a-zA-Z0-9_]*
%%
<YYINITIAL> {
"::" { System.out.print("<"); return symbol(sym.INI);}
"ENCA" { System.out.print("HEAD>"); return symbol(sym.HEAD);}
"/" { System.out.print("</"); return symbol(sym.FIN);}
{texto} { System.out.print(yytext()); return symbol(sym.TEXTO);}
{WhiteSpace} { /* just skip what was found, do nothing */ }
"&&" { System.out.print(""); return symbol(sym.FINAL); }
}
[^] { throw new Error("Illegal character <"+yytext()+">"); }
sintaticoh.cup
package compiladorhtml;
import java_cup.runtime.*;
parser code {:
public void report_error(String message, Object info) {
StringBuilder m = new StringBuilder("Error");
if (info instanceof java_cup.runtime.Symbol) {
java_cup.runtime.Symbol s = ((java_cup.runtime.Symbol) info);
if (s.left >= 0) {
m.append(" in line "+(s.left+1));
if (s.right >= 0)
m.append(", column "+(s.right+1));
}
}
m.append(" : "+message);
System.err.println(m);
}
public void report_fatal_error(String message, Object info) {
report_error(message, info);
System.exit(1);
}
:};
terminal INI, HEAD, TEXTO, FIN, FINAL;
non terminal Object expr_list, expr_part;
non terminal String expr;
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= expr:e;
expr ::= INI HEAD
| TEXTO
| FIN HEAD
| FINAL;
java Main
public static void main(String[] args) throws IOException, Exception {
//CreateFiles();
//EJECUTAR PARA VER SI FUNCIONA, YA LO VI Y FUNCIONA
File fichero = new File("fichero.txt");
PrintWriter writer;
try {
writer = new PrintWriter(fichero);
writer.print("::ENCA NOMBRE ENCABEZADO /ENCA &&");
writer.close();
} catch (FileNotFoundException ex) {
System.out.println(ex);
}
Lexer thisscanner = new Lexer(new FileReader("fichero.txt"));
parser thisparser = new parser(thisscanner);
thisparser.parse();
}
public static void CreateFiles() {
String filelex = "path\\lexicoh.jlex";
File file = new File(filelex);
jflex.Main.generate(file);
String opciones[] = new String[5];
opciones[0] = "-destdir";
opciones[1] = "path";
opciones[2] = "-parser";
opciones[3] = "parser";
opciones[4] = "path\\sintacticoh.cup";
try {
java_cup.Main.main(opciones);
} catch (Exception ex) {
Logger.getLogger(CompiladorHTML.class.getName()).log(Level.SEVERE, null, ex);
}
}
thanks

I think you should do this:
Since {texto} is a string made up of letters and digits, pass the matched text along as the token's value by redefining the rule like this (symbol is the helper method declared in the lexer, so it is called without new):
{texto} { System.out.println(yytext()); return symbol(sym.TEXTO, yytext()); }
Then, if the program still doesn't stop, there may be a problem while reading the source file.

Related

C# ANTLR4 DefaultErrorStrategy or custom error listener does not catch unrecognized characters

It's quite strange, but DefaultErrorStrategy does not do anything for catching unrecognized characters from a stream. I tried a custom error strategy, a custom error listener and BailErrorStrategy - no luck here.
My grammar
grammar Polynomial;
parse : canonical EOF
;
canonical : polynomial+ #canonicalPolynom
| polynomial+ EQUAL polynomial+ #equality
;
polynomial : SIGN? '(' (polynomial)* ')' #parens
| monomial #monom
;
monomial : SIGN? coefficient? VAR ('^' INT)? #addend
| SIGN? coefficient #number
;
coefficient : INT | DEC;
INT : ('0'..'9')+;
DEC : INT '.' INT;
VAR : [a-z]+;
SIGN : '+' | '-';
EQUAL : '=';
WHITESPACE : (' '|'\t')+ -> skip;
and I'm giving it the input 23*44=12 or #1234.
I expect my parser to throw a mismatched-token error, or some kind of exception, for a character such as * or # that is not defined in my grammar.
Instead, my parser just skips * or # and traverses the tree as if they did not exist.
Here is my handler function, where I call the lexer, the parser and so on.
private static (IParseTree tree, string parseErrorMessage) TryParseExpression(string expression)
{
ICharStream stream = CharStreams.fromstring(expression);
ITokenSource lexer = new PolynomialLexer(stream);
ITokenStream tokens = new CommonTokenStream(lexer);
PolynomialParser parser = new PolynomialParser(tokens);
//parser.ErrorHandler = new PolynomialErrorStrategy(); -> I tried custom error strategy
//parser.RemoveErrorListeners();
//parser.AddErrorListener(new PolynomialErrorListener()); -> I tried custom error listener
parser.BuildParseTree = true;
try
{
var tree = parser.canonical();
return (tree, string.Empty);
}
catch (RecognitionException re)
{
return (null, re.Message);
}
catch (ParseCanceledException pce)
{
return (null, pce.Message);
}
}
I tried to add a custom error listener.
public class PolynomialErrorListener : BaseErrorListener
{
private const string Eof = "EOF";
public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg,
RecognitionException e)
{
if (msg.Contains(Eof))
{
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Missing an expression after '=' sign");
}
if (e is NoViableAltException || e is InputMismatchException)
{
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Probably, not closed operator");
}
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. {msg}");
}
private static string GetSyntaxErrorHeader(int errorPosition)
{
return $"Expression is invalid. Input is not valid at {--errorPosition} position";
}
}
After that, I tried to implement a custom error strategy.
public class PolynomialErrorStrategy : DefaultErrorStrategy
{
public override void ReportError(Parser recognizer, RecognitionException e)
{
throw e;
}
public override void Recover(Parser recognizer, RecognitionException e)
{
for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
context.exception = e;
}
throw new ParseCanceledException(e);
}
public override IToken RecoverInline(Parser recognizer)
{
InputMismatchException e = new InputMismatchException(recognizer);
for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
context.exception = e;
}
throw new ParseCanceledException(e);
}
protected override void ReportInputMismatch(Parser recognizer, InputMismatchException e)
{
string msg = "mismatched input " + GetTokenErrorDisplay(e.OffendingToken);
// msg += " expecting one of " + e.GetExpectedTokens().ToString(recognizer.());
RecognitionException ex = new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
throw ex;
}
protected override void ReportMissingToken(Parser recognizer)
{
BeginErrorCondition(recognizer);
IToken token = recognizer.CurrentToken;
IntervalSet expecting = GetExpectedTokens(recognizer);
string msg = "missing " + expecting.ToString() + " at " + GetTokenErrorDisplay(token);
throw new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
}
}
Is there a flag that I forgot to set on the parser, or is my grammar incorrect?
The funny thing is that I'm using the ANTLR plugin in my IDE, and when I test my grammar there the plugin correctly responds with line 1:2 token recognition error at: '*'
Full source code: https://github.com/EvgeniyZ/PolynomialCanonicForm
I'm using ANTLR 4.8-complete.jar
Edit
I tried adding this rule to the grammar:
parse : canonical EOF
;
Still no luck.
What happens if you do this:
parse
: canonical EOF
;
and also invoke this rule:
var tree = parser.parse();
By adding the EOF token (end of input), you are forcing the parser to consume all tokens, which should result in an error when the parser cannot handle them properly.
You wrote: "Funny thing that I'm using ANTLR plugin in my IDE and when I'm testing my grammar in here this plugin correctly responds with line 1:2 token recognition error at: '*'"
That is what the lexer emits on the stderr stream. The lexer just reports this warning and goes on its merry way. So the lexer simply ignores these characters, and they therefore never end up in the parser. If you add the following rule at the end of your lexer rules:
// Fallback rule: matches any single character if not matched by another lexer rule
UNKNOWN : . ;
then the * and # chars will be sent to the parser as UNKNOWN tokens and should then cause recognition errors.
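To make the difference concrete, here is a minimal sketch of the two behaviours. It is a hand-written toy tokenizer in Java, not ANTLR and not the generated PolynomialLexer; the class name ToyLexer and the fallback flag are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy tokenizer, NOT ANTLR: it only mimics the two behaviours described above.
class ToyLexer {
    // With fallback == false, unknown characters are reported on stderr and dropped,
    // so the token list looks as if they were never in the input.
    // With fallback == true, they are emitted as UNKNOWN tokens instead.
    static List<String> tokenize(String input, boolean fallback) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + input.substring(start, i) + ")");
                continue;
            }
            if (c == '=') {
                tokens.add("EQUAL(=)");
            } else if (c == '+' || c == '-') {
                tokens.add("SIGN(" + c + ")");
            } else if (c == ' ' || c == '\t') {
                // whitespace is skipped silently
            } else if (fallback) {
                tokens.add("UNKNOWN(" + c + ")");
            } else {
                System.err.println("token recognition error at: '" + c + "'");
            }
            i++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("23*44=12", false)); // the '*' never reaches the token list
        System.out.println(tokenize("23*44=12", true));  // the '*' shows up as UNKNOWN(*)
    }
}
```

Without the fallback, a parser reading the first token list has no way to know that a * was ever there, which matches what you are seeing.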

ANTLR's parser enters into "wrong" rule

I'm trying to create an interpreter for a simple programming language using ANTLR. So far it consists of print and numeric expressions.
I created a 'simpleExpr' parser rule to handle negative numbers. I tried other ways too, but this is the only one that seems to work for me. However, for some reason my visitor enters this rule even where I would expect it to visit my 'number' rule. I don't think it's the visitor's fault, because even the tree drawn by ANTLR shows this behavior. That is odd but acceptable; my real problem is that when I try to print the result of a simple addition, e.g. print(1+2);, the visitor does not do the addition but enters the 'number' rule instead of the 'Plus' rule.
My grammar:
grammar BatshG;
/*
* Parser Rules
*/
compileUnit: (expression | ( println ';') | ( print ';' ))+;
expression:
left=expression '/' right=simpleExpr #Divi
| left=expression '*' right=simpleExpr #Mult
| left=expression '-' right=simpleExpr #Minus
| left=expression '+' right=simpleExpr #Plus
| number=simpleExpr #Number
;
println: 'println' '(' argument=expression ')';
print: 'print' '(' argument=expression ')';
simpleExpr
: (MINUS)?
(FLOAT | INTEGER)
;
MINUS: '-';
INTEGER: [0-9] [0-9]*;
DIGIT : [0-9] ;
FRAC : '.' DIGIT+ ;
EXP : [eE] [-+]? DIGIT+ ;
FLOAT : DIGIT* FRAC EXP? ;
WS: [ \n\t\r]+ -> channel(HIDDEN);
If it helps, here is my visualized tree generated by ANTLR for
print(1+2);
Update:
The visitor class, if it counts:
public class BatshGVisitor : BatshGBaseVisitor<ResultValue>
{
public ResultValue Result { get; set; }
public StringBuilder OutputForPrint { get; set; }
public override ResultValue VisitCompileUnit([NotNull] BatshGParser.CompileUnitContext context)
{
OutputForPrint = new StringBuilder("");
var resultvalue = VisitChildren(context);
Result = new ResultValue() { ExpType = "string", ExpValue = resultvalue.ExpValue };
return Result;
}
public override ResultValue VisitPlus([NotNull] BatshGParser.PlusContext context)
{
var leftExp = VisitChildren(context.left);
var rigthExp = VisitChildren(context.right);
return new ResultValue()
{
ExpType = "number",
ExpValue = (double)leftExp.ExpValue + (double)rigthExp.ExpValue
};
}
//public override ResultValue VisitNumber([NotNull] BatshGParser.NumberContext context)
//{
// return new ResultValue()
// {
// ExpType = "number",
// ExpValue = Double.Parse(context.GetChild(0).GetText()
// + context.GetChild(1).GetText()
// + context.GetChild(2).GetText()
// , CultureInfo.InvariantCulture)
// };
//}
public override ResultValue VisitPrint([NotNull] BatshGParser.PrintContext context)
{
var viCh = VisitChildren(context.argument);
var viChVa = viCh.ExpValue;
string printInner = viChVa.ToString();
var toPrint = new ResultValue()
{
ExpType = viCh.ExpType,
ExpValue = printInner
};
OutputForPrint.Append(toPrint.ExpValue);
return toPrint;
}
public override ResultValue VisitSimpleExpr([NotNull] BatshGParser.SimpleExprContext context)
{
string numberToConvert = "";
if (context.ChildCount == 1)
{
numberToConvert = context.GetChild(0).GetText();
}
else if (context.GetChild(0).ToString() == "-")
{
if (context.ChildCount == 2)
{
numberToConvert = "-" + context.GetChild(1);
}
if (context.ChildCount == 4)
{
numberToConvert = context.GetChild(0).ToString() + context.GetChild(1).ToString() +
context.GetChild(2).ToString() + context.GetChild(3).ToString();
}
}
return new ResultValue()
{
ExpType = "number",
ExpValue = Double.Parse(numberToConvert, CultureInfo.InvariantCulture)
};
}
protected override ResultValue AggregateResult(ResultValue aggregate, ResultValue nextResult)
{
if (aggregate == null)
return new ResultValue()
{
ExpType = nextResult.ExpType,
ExpValue = nextResult.ExpValue
};
if (nextResult == null)
{
return aggregate;
}
return null;
}
}
What's the problem with my grammar?
Thank you!
Inside the visit method for print statements, you have this:
var viCh = VisitChildren(context.argument);
So let's say your input was print(1+2);. Then context.argument would be the PlusContext for 1+2, and the children of context.argument would be a NumberContext for 1, a terminal node for + and a SimpleExprContext for 2. By calling VisitChildren you visit those children directly, which is why VisitPlus never runs and the visitor goes straight to the numbers.
Generally, you rarely want to visit the children of some other node. You usually want to visit your own children rather than skip them and visit the grandchildren directly. So what you should do instead is call Visit(context.argument);.
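The difference can be shown with a small toy tree. This is a sketch in Java with made-up names (VisitDemo, Num, Plus), not the ANTLR runtime; visitChildren here only imitates the default visitor behaviour of returning the last child's result.

```java
import java.util.List;

// Toy tree and visitor, not the ANTLR runtime: all names are hypothetical.
class VisitDemo {
    interface Node { List<Node> children(); }

    record Num(double value) implements Node {
        public List<Node> children() { return List.of(); }
    }

    record Plus(Node left, Node right) implements Node {
        public List<Node> children() { return List.of(left, right); }
    }

    // Dispatches to the handler for the node itself (like Visit).
    static double visit(Node n) {
        if (n instanceof Plus p) {
            return visit(p.left()) + visit(p.right());
        }
        return ((Num) n).value();
    }

    // Skips the node's own handler and only visits its children (like VisitChildren),
    // keeping the result of the last child.
    static double visitChildren(Node n) {
        double result = 0.0;
        for (Node child : n.children()) {
            result = visit(child);
        }
        return result;
    }

    public static void main(String[] args) {
        Node sum = new Plus(new Num(1), new Num(2));
        System.out.println(visit(sum));         // the Plus handler runs: 3.0
        System.out.println(visitChildren(sum)); // the Plus handler is skipped: 2.0
    }
}
```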

how to indent for tab space after writing a statement

How can I get a tab of indentation, for beautification, after writing a statement in Xtext?
Here is my Xtext grammar:
Block:
'_block'
name=ID
'_endblock'
;
and UI template is
override complete_BLOCK(EObject model, RuleCall ruleCall, ContentAssistContext context, ICompletionProposalAcceptor acceptor) {
super.complete_BLOCK(model, ruleCall, context, acceptor)
acceptor.accept(createCompletionProposal("_block \n
_endblock","_block",null,context));
}
How do I indent for a tab space after writing a block statement?
To implement a formatter:
1. Open the mwe2 file.
2. Add formatter = {generateStub = true} to the language = StandardLanguage { section of the workflow.
3. Regenerate the language.
4. Open the MyDslFormatter Xtend class and implement it.
To call the formatter:
1. Select the section to format, or select nothing to format everything.
2. Right-click -> Source -> Format, or use the shortcut Cmd/Ctrl + Shift + F.
Here is a very naive, non-failsafe implementation of an auto edit strategy:
package org.xtext.example.mydsl1.ui;
import org.eclipse.jface.text.BadLocationException;
import org.eclipse.jface.text.DocumentCommand;
import org.eclipse.jface.text.IAutoEditStrategy;
import org.eclipse.jface.text.IDocument;
import org.eclipse.jface.text.IRegion;
import org.eclipse.xtext.ui.editor.autoedit.DefaultAutoEditStrategyProvider;
import com.google.inject.Inject;
import com.google.inject.Provider;
public class YourAutoEditStrategyProvider extends DefaultAutoEditStrategyProvider {
public static class BlockStrategy implements IAutoEditStrategy {
private static final String BLOCK = "_block";
protected int findEndOfWhiteSpace(IDocument document, int offset, int end) throws BadLocationException {
while (offset < end) {
char c= document.getChar(offset);
if (c != ' ' && c != '\t') {
return offset;
}
offset++;
}
return end;
}
@Override
public void customizeDocumentCommand(IDocument d, DocumentCommand c) {
if ("\n".equals(c.text)) {
if (d.getLength()> BLOCK.length()) {
try {
if ((BLOCK+" ").equals(d.get(c.offset-BLOCK.length()-1, BLOCK.length()+1)) || (BLOCK).equals(d.get(c.offset-BLOCK.length(), BLOCK.length()))) {
int p= (c.offset == d.getLength() ? c.offset - 1 : c.offset);
IRegion info= d.getLineInformationOfOffset(p);
int start= info.getOffset();
// find white spaces
int end= findEndOfWhiteSpace(d, start, c.offset);
int l = 0;
StringBuilder buf= new StringBuilder(c.text);
if (end > start) {
// append to input
buf.append(d.get(start, end - start));
l += (end - start);
}
buf.append("\t");
buf.append("\n");
buf.append(d.get(start, end - start));
c.text= buf.toString();
c.caretOffset = c.offset+2+l;
c.shiftsCaret=false;
}
} catch (BadLocationException e) {
e.printStackTrace();
}
}
}
}
}
@Inject
private Provider<BlockStrategy> blockStrategy;
@Override
protected void configure(IEditStrategyAcceptor acceptor) {
super.configure(acceptor);
acceptor.accept(blockStrategy.get(), IDocument.DEFAULT_CONTENT_TYPE);
}
}
And don't forget to bind it:
class MyDslUiModule extends AbstractMyDslUiModule {
override bindAbstractEditStrategyProvider() {
YourAutoEditStrategyProvider
}
}

How do I pretty-print productions and line numbers, using ANTLR4?

I'm trying to write a piece of code that will take an ANTLR4 parser and use it to generate ASTs for inputs similar to the ones given by the -tree option on grun (misc.TestRig). However, I'd additionally like for the output to include all the line number/offset information.
For example, instead of printing
(add (int 5) '+' (int 6))
I'd like to get
(add (int 5 [line 3, offset 6:7]) '+' (int 6 [line 3, offset 8:9]) [line 3, offset 5:10])
Or something similar.
There aren't a tremendous number of visitor examples for ANTLR4 yet, but I am pretty sure I can do most of this by copying the default implementation for toStringTree (used by grun). However, I do not see any information about the line numbers or offsets.
I expected to be able to write super simple code like this:
String visit(ParseTree t) {
return "(" + t.productionName + t.visitChildren() + t.lineNumber + ")";
}
but it doesn't seem to be this simple. I'm guessing I should be able to get line number information from the parser, but I haven't figured out how to do so. How can I grab this line number/offset information in my traversal?
To fill in the few blanks in the solution below, I used:
List<String> ruleNames = Arrays.asList(parser.getRuleNames());
parser.setBuildParseTree(true);
ParserRuleContext prc = parser.program();
ParseTree tree = prc;
to get the tree and the ruleNames. program is the name for the top production in my grammar.
The Trees.toStringTree method can be implemented using a ParseTreeListener. The following listener produces exactly the same output as Trees.toStringTree.
public class TreePrinterListener implements ParseTreeListener {
private final List<String> ruleNames;
private final StringBuilder builder = new StringBuilder();
public TreePrinterListener(Parser parser) {
this.ruleNames = Arrays.asList(parser.getRuleNames());
}
public TreePrinterListener(List<String> ruleNames) {
this.ruleNames = ruleNames;
}
@Override
public void visitTerminal(TerminalNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
@Override
public void visitErrorNode(ErrorNode node) {
if (builder.length() > 0) {
builder.append(' ');
}
builder.append(Utils.escapeWhitespace(Trees.getNodeText(node, ruleNames), false));
}
@Override
public void enterEveryRule(ParserRuleContext ctx) {
if (builder.length() > 0) {
builder.append(' ');
}
if (ctx.getChildCount() > 0) {
builder.append('(');
}
int ruleIndex = ctx.getRuleIndex();
String ruleName;
if (ruleIndex >= 0 && ruleIndex < ruleNames.size()) {
ruleName = ruleNames.get(ruleIndex);
}
else {
ruleName = Integer.toString(ruleIndex);
}
builder.append(ruleName);
}
@Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
builder.append(')');
}
}
@Override
public String toString() {
return builder.toString();
}
}
The class can be used as follows:
List<String> ruleNames = ...;
ParseTree tree = ...;
TreePrinterListener listener = new TreePrinterListener(ruleNames);
ParseTreeWalker.DEFAULT.walk(listener, tree);
String formatted = listener.toString();
The class can be modified to produce the information in your output by updating the exitEveryRule method:
@Override
public void exitEveryRule(ParserRuleContext ctx) {
if (ctx.getChildCount() > 0) {
Token positionToken = ctx.getStart();
if (positionToken != null) {
builder.append(" [line ");
builder.append(positionToken.getLine());
builder.append(", offset ");
builder.append(positionToken.getStartIndex());
builder.append(':');
builder.append(positionToken.getStopIndex());
builder.append("])");
}
else {
builder.append(')');
}
}
}
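For reference, the target output format from the question can be reproduced on a toy tree without ANTLR. The sketch below is plain Java with a hypothetical Node record carrying its own line and offsets; in a real ANTLR tree those values would come from the tokens of each context instead.

```java
import java.util.List;

// Toy parse tree, not ANTLR's ParseTree: all names here are hypothetical.
class PositionedTreeDemo {
    // A rule node has children; a leaf carries its text as the label and has no children.
    record Node(String label, int line, int start, int stop, List<Node> children) {
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Prints leaves as-is and rule nodes as "(name child... [line L, offset S:E])".
    static String format(Node n) {
        if (n.isLeaf()) {
            return n.label();
        }
        StringBuilder sb = new StringBuilder("(").append(n.label());
        for (Node child : n.children()) {
            sb.append(' ').append(format(child));
        }
        return sb.append(" [line ").append(n.line())
                 .append(", offset ").append(n.start()).append(':').append(n.stop())
                 .append("])").toString();
    }

    static Node sample() {
        Node five = new Node("5", 3, 6, 7, List.of());
        Node plus = new Node("'+'", 3, 7, 8, List.of());
        Node six = new Node("6", 3, 8, 9, List.of());
        return new Node("add", 3, 5, 10, List.of(
                new Node("int", 3, 6, 7, List.of(five)),
                plus,
                new Node("int", 3, 8, 9, List.of(six))));
    }

    public static void main(String[] args) {
        System.out.println(format(sample()));
    }
}
```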

mapping list of string into hierarchical structure of objects

This is not a homework problem. This question was asked of one of my friends in an interview test.
I have a list of lines read from a file as input. Each line has an identifier (A, B, NN, C or DD) at the start of the line. Depending on the identifier, I need to map the list of records into a single object A that contains a hierarchical structure of objects.
Description of the hierarchy:
Each A can have zero or more B children.
Each B can have zero or more NN and C children. Similarly, each C can have zero or more NN and DD children, and each DD can have zero or more NN children.
Mapping classes and their hierarchy:
Every class has a value field that holds the String value of the current line.
**A - will have list of B**
class A {
List<B> bList;
String value;
public A(String value) {
this.value = value;
}
public void addB(B b) {
if (bList == null) {
bList = new ArrayList<B>();
}
bList.add(b);
}
}
**B - will have list of NN and list of C**
class B {
List<C> cList;
List<NN> nnList;
String value;
public B(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
public void addC(C c) {
if (cList == null) {
cList = new ArrayList<C>();
}
cList.add(c);
}
}
**C - will have list of DDs and NNs**
class C {
List<DD> ddList;
List<NN> nnList;
String value;
public C(String value) {
this.value = value;
}
public void addDD(DD dd) {
if (ddList == null) {
ddList = new ArrayList<DD>();
}
ddList.add(dd);
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**DD - will have list of NNs**
class DD {
String value;
List<NN> nnList;
public DD(String value) {
this.value = value;
}
public void addNN(NN nn) {
if (nnList == null) {
nnList = new ArrayList<NN>();
}
nnList.add(nn);
}
}
**NN- will hold the line only**
class NN {
String value;
public NN(String value) {
this.value = value;
}
}
What I Did So Far :
The method public A parse(List<String> lines) reads the input list and returns the object A. Since there might be multiple B records, I created a separate method, parseB, to parse each occurrence.
The parseB method loops from i = startIndex + 1 while i < lines.size() and checks the start of each line. An "NN" line is added to the current B object. If "C" is detected at the start, it calls another method, parseC. The loop breaks when we detect "B" or "A" at the start.
Similar logic is used in parseC_DD.
public class GTTest {
public A parse(List<String> lines) {
A a;
for (int i = 0; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("A")) {
a = new A(curLine);
continue;
}
if (curLine.startsWith("B")) {
i = parseB(lines, i); // returns index i to skip all the lines that are read inside parseB(...)
continue;
}
}
return a; // return mapped object
}
private int parseB(List<String> lines, int startIndex) {
int i;
B b = new B(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
b.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("C")) {
i = parseC(b, lines, i);
continue;
}
a.addB(b);
if (curLine.startsWith("B") || curLine.startsWith("A")) { //ending condition
System.out.println("B A "+curLine);
--i;
break;
}
}
return i; // return nextIndex to read
}
private int parseC(B b, List<String> lines, int startIndex) {
int i;
C c = new C(lines.get(startIndex));
for (i = startIndex + 1; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
c.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
i = parseC_DD(c, lines, i);
continue;
}
b.addC(c);
if (curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
private int parseC_DD(C c, List<String> lines, int startIndex) {
int i;
DD d = new DD(lines.get(startIndex));
c.addDD(d);
for (i = startIndex; i < lines.size(); i++) {
String curLine = lines.get(i);
if (curLine.startsWith("NN")) {
d.addNN(new NN(curLine));
continue;
}
if (curLine.startsWith("DD")) {
d=new DD(curLine);
continue;
}
c.addDD(d);
if (curLine.startsWith("NN") || curLine.startsWith("C") || curLine.startsWith("A") || curLine.startsWith("B")) {
System.out.println("NN C A B "+curLine);
--i;
break;
}
}
return i;//return next index
}
public static void main(String[] args) {
GTTest gt = new GTTest();
List<String> list = new ArrayList<String>();
list.add("A1");
list.add("B1");
list.add("NN1");
list.add("NN2");
list.add("C1");
list.add("NNXX");
list.add("DD1");
list.add("DD2");
list.add("NN3");
list.add("NN4");
list.add("DD3");
list.add("NN5");
list.add("B2");
list.add("NN6");
list.add("C2");
list.add("DD4");
list.add("DD5");
list.add("NN7");
list.add("NN8");
list.add("DD6");
list.add("NN7");
list.add("C3");
list.add("DD7");
list.add("DD8");
A a = gt.parse(list);
//show values of a
}
}
My logic is not working properly. Is there any other approach you can figure out? Do you have any suggestions/improvements to my way?
Use a hierarchy of objects:
public interface Node {
Node getParent();
Node getLastChild();
boolean addChild(Node n);
void setValue(String value);
Deque<Node> getChildren();
}
private static abstract class NodeBase implements Node {
...
abstract boolean canInsert(Node n);
public String toString() {
return value;
}
...
}
public static class A extends NodeBase {
boolean canInsert(Node n) {
return n instanceof B;
}
}
public static class B extends NodeBase {
boolean canInsert(Node n) {
return n instanceof NN || n instanceof C;
}
}
...
public static class NN extends NodeBase {
boolean canInsert(Node n) {
return false;
}
}
Create a tree class:
public class MyTree {
Node root;
Node lastInserted = null;
public void insert(String label) {
Node n = NodeFactory.create(label);
if (lastInserted == null) {
root = n;
lastInserted = n;
return;
}
Node current = lastInserted;
while (!current.addChild(n)) {
current = current.getParent();
if (current == null) {
throw new RuntimeException("Impossible to insert " + n);
}
}
lastInserted = n;
}
...
}
And then print the tree:
public class MyTree {
...
public static void main(String[] args) {
List<String> input;
...
MyTree tree = new MyTree();
for (String line : input) {
tree.insert(line);
}
tree.print();
}
public void print() {
printSubTree(root, "");
}
private static void printSubTree(Node root, String offset) {
Deque<Node> children = root.getChildren();
Iterator<Node> i = children.descendingIterator();
System.out.println(offset + root);
while (i.hasNext()) {
printSubTree(i.next(), offset + " ");
}
}
}
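Since NodeBase and NodeFactory are elided above, here is a compact, self-contained sketch of the same "climb up until a parent accepts the node" idea. It is one possible reading, not the code from this answer: the ALLOWED table, typeOf and render are assumptions made for illustration, and a map lookup replaces the canInsert methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: a table-driven variant of the insert-and-climb approach.
class HierarchyTree {
    // Which child types each node type accepts (taken from the hierarchy in the question).
    static final Map<String, Set<String>> ALLOWED = Map.of(
            "A", Set.of("B"),
            "B", Set.of("NN", "C"),
            "C", Set.of("NN", "DD"),
            "DD", Set.of("NN"),
            "NN", Set.<String>of());

    static final class Node {
        final String type;
        final String value;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String type, String value, Node parent) {
            this.type = type;
            this.value = value;
            this.parent = parent;
        }
    }

    // Two-letter identifiers are tested first so "DD1" is never read as a one-letter type.
    static String typeOf(String line) {
        for (String t : List.of("NN", "DD", "A", "B", "C")) {
            if (line.startsWith(t)) {
                return t;
            }
        }
        throw new IllegalArgumentException("Unknown identifier: " + line);
    }

    static Node build(List<String> lines) {
        Node root = null;
        Node last = null;
        for (String line : lines) {
            String type = typeOf(line);
            if (root == null) {
                if (!type.equals("A")) {
                    throw new IllegalArgumentException("Input must start with A: " + line);
                }
                root = last = new Node(type, line, null);
                continue;
            }
            // Climb from the last inserted node until some ancestor accepts this type.
            Node current = last;
            while (current != null && !ALLOWED.get(current.type).contains(type)) {
                current = current.parent;
            }
            if (current == null) {
                throw new IllegalArgumentException("Impossible to insert " + line);
            }
            Node n = new Node(type, line, current);
            current.children.add(n);
            last = n;
        }
        return root;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node n, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(n.value).append('\n');
        for (Node c : n.children) {
            sb.append(render(c, indent + "  "));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> input = List.of("A1", "B1", "NN1", "C1", "DD1", "NN2", "B2");
        System.out.print(render(build(input), ""));
    }
}
```

Note that an NN line always attaches to the most recently opened node that accepts it, so an NN following a DD becomes a child of that DD rather than of the enclosing C.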
A Mealy automaton solution with 5 states:
wait for A,
seen A,
seen B,
seen C, and
seen DD.
The parse is done completely in one method. There is one current Node, which is the last Node seen other than the NN ones. Every Node except the root has a parent Node. In state seen X, the current Node represents an X (e.g. in state seen C, current can be C1 in the example above). The most fiddly state is seen DD, which has the most outgoing edges (B, C, DD, and NN).
public final class Parser {
    private final static class Token { /* represents A1 etc. */ }
    public final static class Node implements Iterable<Node> {
        /* One Token + Node children, knows its parent */
    }
    private enum State { ExpectA, SeenA, SeenB, SeenC, SeenDD, }
    public Node parse(String text) {
        return parse(Token.parseStream(text));
    }
    private Node parse(Iterable<Token> tokens) {
        State currentState = State.ExpectA;
        Node current = null, root = null;
        for (Token t : tokens) {
            switch (currentState) {
                /* do stuff for all states */
                /* example snippet for SeenC */
                case SeenC:
                    if (t.Prefix.equals("B")) {
                        // A new B belongs to the A two levels above the current C.
                        Node b = new Node(t, current.PN.PN);
                        current.PN.PN.AddChildNode(b);
                        current = b;
                        currentState = State.SeenB;
                    } else if (t.Prefix.equals("C")) {
                        /* ... */
                    }
                    break;
            }
        }
        return root;
    }
}
I'm not satisfied with those train wrecks that go up the hierarchy to insert a Node somewhere else (current.PN.PN). Eventually, explicit state classes would make the private parse method more readable. Then the solution becomes more akin to the one provided by @AlekseyOtrubennikov. Maybe a straight LL approach yields more beautiful code. It may be best to just rephrase the grammar as BNF and delegate parser creation.
A straightforward LL parser, one production rule:
// "B" ("NN" | C)*
private Node rule_2(TokenStream ts, Node parent) {
    // Literal "B"
    Node B = literal(ts, "B", parent);
    if (B == null) {
        // error
        return null;
    }
    while (true) {
        // check for "NN"
        Node nnLit = literal(ts, "NN", B);
        if (nnLit != null)
            B.AddChildNode(nnLit);
        // check for C; its parent is the B just parsed
        Node c = rule_3(ts, B);
        if (c != null)
            B.AddChildNode(c);
        // finished when both rules did not match anything
        if (nnLit == null && c == null)
            break;
    }
    return B;
}
TokenStream enhances Iterable<Token> by allowing lookahead into the stream. This is LL(1), because the parser must choose between the literal NN and a deeper rule in two cases (rule_2 being one of them). It looks nice; however, I'm missing some C# features here...
@Stefan and @Aleksey are correct: this is a simple parsing problem.
You can define your hierarchy constraints in Extended Backus-Naur Form:
A ::= { B }
B ::= { NN | C }
C ::= { NN | DD }
DD ::= { NN }
This description can be transformed into a state machine and implemented. But there are a lot of tools that can effectively do this for you: parser generators.
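For comparison with a hand-written implementation, the four rules map fairly directly onto a recursive-descent parser. The following Java sketch is an assumption-laden illustration (the class EbnfParser, its Node record and the open helper are made up here); each rule additionally consumes its own label line, which the EBNF above leaves implicit.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a recursive-descent parser, one method per EBNF rule.
class EbnfParser {
    record Node(String value, List<Node> children) {}

    private final List<String> lines;
    private int pos = 0;

    EbnfParser(List<String> lines) {
        this.lines = lines;
    }

    // One line of lookahead: does the next unread line start with this identifier?
    private boolean at(String prefix) {
        return pos < lines.size() && lines.get(pos).startsWith(prefix);
    }

    // Consumes the next line as a node of the expected kind, or fails.
    private Node open(String prefix) {
        if (!at(prefix)) {
            throw new IllegalStateException("Expected " + prefix + " at line " + (pos + 1));
        }
        return new Node(lines.get(pos++), new ArrayList<>());
    }

    // A ::= { B }, followed by end of input
    Node parseA() {
        Node a = open("A");
        while (at("B")) {
            a.children().add(parseB());
        }
        if (pos < lines.size()) {
            throw new IllegalStateException("Unexpected " + lines.get(pos) + " at line " + (pos + 1));
        }
        return a;
    }

    // B ::= { NN | C }
    private Node parseB() {
        Node b = open("B");
        while (true) {
            if (at("NN")) b.children().add(open("NN"));
            else if (at("C")) b.children().add(parseC());
            else return b;
        }
    }

    // C ::= { NN | DD }
    private Node parseC() {
        Node c = open("C");
        while (true) {
            if (at("NN")) c.children().add(open("NN"));
            else if (at("DD")) c.children().add(parseDD());
            else return c;
        }
    }

    // DD ::= { NN }
    private Node parseDD() {
        Node dd = open("DD");
        while (at("NN")) {
            dd.children().add(open("NN"));
        }
        return dd;
    }

    public static void main(String[] args) {
        Node a = new EbnfParser(List.of("A1", "B1", "NN1", "C1", "DD1", "NN2", "B2")).parseA();
        System.out.println(a);
    }
}
```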
I am posting my answer only to show that it's quite easy to solve such problems with Haskell (or some other functional language).
Here is a complete program that reads strings from stdin and prints the parsed tree to stdout.
-- We are using some standard libraries.
import Control.Applicative ((<$>), (<*>))
import Text.Parsec
import Data.Tree
-- This is an EBNF-like description of what to do.
-- You can almost read it like prose.
yourData = nodeA +>> eof
nodeA = node "A" nodeB
nodeB = node "B" (nodeC <|> nodeNN)
nodeC = node "C" (nodeNN <|> nodeDD)
nodeDD = node "DD" nodeNN
nodeNN = (`Node` []) <$> nodeLabel "NN"
node lbl children
    = Node <$> nodeLabel lbl <*> many children
nodeLabel xx = (xx++)
    <$> (string xx >> many digit)
    +>> newline
-- And this is some auxiliary code.
f +>> g = f >>= \x -> g >> return x
main = do
    txt <- getContents
    case parse yourData "" txt of
        Left err -> print err
        Right res -> putStrLn $ drawTree res
Executing it with your data in zz.txt will print this nice tree:
$ ./xxx < zz.txt
A1
+- B1
| +- NN1
| +- NN2
| `- C1
| +- NN2
| +- DD1
| +- DD2
| | +- NN3
| | `- NN4
| `- DD3
| `- NN5
`- B2
+- NN6
+- C2
| +- DD4
| +- DD5
| | +- NN7
| | `- NN8
| `- DD6
| `- NN9
`- C3
+- DD7
`- DD8
And here is how it handles malformed input:
$ ./xxx
A1
B2
DD3
(line 3, column 1):
unexpected 'D'
expecting "B" or end of input
