How to set a timeout for the parser when it takes too long?

I'm using ANTLR4 in C# with the following code sample:
AntlrInputStream antlrStream = new AntlrInputStream(text);
MyLexer myLexer = new(antlrStream);
CommonTokenStream myTokens = new CommonTokenStream(myLexer);
MyParser parser = new MyParser(myTokens)
{
    BuildParseTree = true,
};
IParseTree tree = parser.startRule();
The classes MyLexer/MyParser derive from the Lexer/Parser classes in Antlr4.Runtime and were auto-generated by ANTLR4.
In some rare cases, with specific input text, startRule() takes forever and never finishes. I want to be able to set some kind of "timeout" for the parsing and throw an exception when it is exceeded.
Any advice on the recommended way to do this?

I looked into this briefly a while back. You can essentially create a wrapper over the generated parser and override one of its methods. I use ANTLR with Kotlin, so excuse the Kotlin example below.
import org.antlr.v4.runtime.ParserRuleContext
import org.antlr.v4.runtime.TokenStream

class InterruptibleParser(input: TokenStream) : YourParser(input) {
    override fun enterRule(localctx: ParserRuleContext, state: Int, ruleIndex: Int) {
        // Bail out as soon as this thread has been interrupted.
        if (Thread.interrupted()) {
            throw InterruptedException()
        }
        super.enterRule(localctx, state, ruleIndex)
    }
}
I tried it with either enterRule, consume, or getContext -- I don't remember which one gets called frequently enough.
With the above in place, you can instantiate the parser, run it on its own thread, and interrupt that thread after a certain amount of time. It will likely make your parsing noticeably slower (maybe around 25% slower, if I'm remembering correctly). Anyway, hope this helps.
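For the timeout itself, here is a hedged Java sketch of the driver side of this approach (the question is C#, where the rough analogue would be a Task plus cooperative cancellation): the parse runs on a worker thread, and cancelling the Future interrupts that thread so the wrapper above can bail out. MyLexer, startRule and InterruptibleParser are placeholders for the generated/wrapper classes:
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;

class TimedParse {
    // Parses text on a worker thread and gives up if it runs longer than timeoutSeconds.
    static ParseTree parseWithTimeout(String text, long timeoutSeconds) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<ParseTree> future = executor.submit(() -> {
            MyLexer lexer = new MyLexer(CharStreams.fromString(text));   // placeholder generated lexer
            InterruptibleParser parser = new InterruptibleParser(new CommonTokenStream(lexer));
            return parser.startRule();                                   // placeholder start rule
        });
        try {
            return future.get(timeoutSeconds, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);   // interrupts the worker thread; the wrapper sees Thread.interrupted()
            throw new RuntimeException("Parsing timed out", e);
        } finally {
            executor.shutdown();
        }
    }
}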

Related

Should I use Exceptions while parsing complex user input

When looking for information on when and why to use exceptions, many people (also on this platform) make the point that you should not use exceptions when validating user input, because invalid input is not an exceptional thing to happen.
I now have a case where I have to parse a complex string of user input and map it to an object tree, basically like a parser does.
Example in pseudo code:
input:
----
hello[5]
+
foo["ok"]
----
results in something like this:
class Hello {
    int id = 5
}
class Add {}
class foo {
    string name = 'ok'
}
Now, in order to "validate" that input I have to parse it, and having separate code that parses the input for validation and code that creates the objects feels redundant.
Currently I'm using exceptions while parsing single tokens to collect all errors:
// one token is basically a single
try {
    foreach (token in tokens) {
        factory = getFactory(token) // throws ParseException
        addObject(factory.create(token)) // throws ParseException
    }
} catch (ParseException e) {
    // e.g. "Foo Token expects value to be string"
    addError(e)
}
Is this a bad use of exceptions?
An alternative would be to inject a validation class into every factory, or to mess around with return types (which feels a bit dirty).
If exceptions work for your use case, go for it.
The usual problem with exceptions is that they don't let you fix things up and continue, which makes it hard to implement parser error recovery. You can't really fix up a bad input (and you probably shouldn't, even in cases where you could), but error recovery lets you report more than one error from the same input, which is often considered convenient.
All of that depends on your needs and parsing strategy, so there's not a lot of information to go on here.
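If you want that "more than one error" behaviour without implementing full parser error recovery, one option that stays close to your pseudo code is to catch per token instead of around the whole loop. A minimal Java sketch, where ParseException, Token, Factory, getFactory and addObject are assumed stand-ins for your own types:
import java.util.ArrayList;
import java.util.List;

// Processes every token and records each failure instead of stopping at the first one.
List<ParseException> errors = new ArrayList<>();
for (Token token : tokens) {
    try {
        Factory factory = getFactory(token);   // throws ParseException
        addObject(factory.create(token));      // throws ParseException
    } catch (ParseException e) {
        errors.add(e);                         // record, then keep going with the next token
    }
}
// afterwards: report all collected errors, e.g. "Foo Token expects value to be string"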

OWLKnowledgeExplorerReasoner - getObjectLabel always ends in error Unreachable situation

I am trying to access information about the completion graph, but it always ends with the error uk.ac.manchester.cs.jfact.helpers.UnreachableSituationException: Unreachable situation! when I call getObjectLabel(rootNode, false/true). I tried it on every class expression from the ontology but always ended up with the same error message.
Set<OWLClassExpression> types = classSet2classExpSet(hybridSolver.ontology.classesInSignature().collect(toSet()));
for (OWLClassExpression e : types) {
    OWLKnowledgeExplorerReasoner.RootNode rootNode = loader.getReasoner().getRoot(e);
    System.out.println(loader.getReasoner().getObjectLabel(rootNode, false)); // problem: UnreachableSituation !!
    Node<OWLObjectProperty> propertyNode = (Node<OWLObjectProperty>) loader.getReasoner().getObjectNeighbours(rootNode, false);
    for (OWLObjectProperty p : propertyNode.getEntities()) {
        Collection<OWLKnowledgeExplorerReasoner.RootNode> rootNodes = loader.getReasoner().getObjectNeighbours(rootNode, p);
        ...
    }
}
The other method, getObjectNeighbours(rootNode, false), works fine.
Can somebody help? Is there any way to access the completion graph with the OWL API? Why might it end with this error?
The labels found for the nodes in question are not named classes (e.g., they are AND nodes). These cannot be translated back to an OWLClass, and there is currently no implementation for translating class expressions back.
Tweaking the code to remove the exceptions is doable, but for your example ontology you would always get back empty nodes, which isn't very informative.
I have removed the exception throwing in the latest version 5 branch; however, I doubt this is sufficient for your needs.
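In the meantime, a hedged workaround on the caller's side is to treat the exception as "no named label available" and skip those nodes; this won't give you more information, it just keeps the loop running. A sketch based on the question's own loop, assuming UnreachableSituationException is unchecked, as the stack trace suggests:
for (OWLClassExpression e : types) {
    OWLKnowledgeExplorerReasoner.RootNode rootNode = loader.getReasoner().getRoot(e);
    try {
        System.out.println(loader.getReasoner().getObjectLabel(rootNode, false));
    } catch (uk.ac.manchester.cs.jfact.helpers.UnreachableSituationException ex) {
        // The label is an unnamed (e.g. AND) node; there is nothing to translate back to an OWLClass.
        System.out.println("no named label for " + e);
    }
    // ... continue with getObjectNeighbours(...) as before
}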

Caching streams in Functional Reactive Programming

I have an application which is written entirely using the FRP paradigm, and I think I am having performance issues due to the way I am creating the streams. It is written in Haxe, but the problem is not language-specific.
For example, I have this function, which returns a stream that resolves every time the config file is updated for the given section:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    return configFileUpdated()
        .then(filterForSectionChanged(section))
        .then(readFile)
        .then(parseYaml);
}
In the reactive programming library I am using, promhx, each step of the chain should remember its last resolved value, but I think every time I call this function I am recreating the stream and reprocessing each step. This is a problem with the way I am using it rather than with the library.
Since this function is called everywhere, parsing the YAML every time it is needed is killing performance; according to profiling it takes up over 50% of the CPU time.
As a fix, I have done something like the following, using a Map stored as an instance variable to cache the streams:
function getConfigSection(section:String) : Stream<Map<String, String>> {
    var cachedStream = this._streamCache.get(section);
    if (cachedStream != null) {
        return cachedStream;
    }
    var stream = configFileUpdated()
        .filter(sectionFilter(section))
        .then(readFile)
        .then(parseYaml);
    this._streamCache.set(section, stream);
    return stream;
}
This might be a good solution to the problem, but it doesn't feel right to me. I am wondering if anyone can think of a cleaner solution, maybe one that uses a more functional approach (closures etc.) or even an extension I can add to the stream, like a cache function.
Another way I could do it is to create the streams beforehand and store them in fields that consumers can access. I don't like this approach because I don't want to make a field for every config section; I like being able to call a function with a specific section and get a stream back.
I'd love any ideas that could give me a fresh perspective!
Well, I think one answer is to just abstract away the caching like so:
class Test {
    static function main() {
        var sideeffects = 0;
        var cached = memoize(function (x) return x + sideeffects++);
        cached(1);
        trace(sideeffects);//1
        cached(1);
        trace(sideeffects);//1
        cached(3);
        trace(sideeffects);//2
        cached(3);
        trace(sideeffects);//2
    }
    @:generic static function memoize<In, Out>(f:In->Out):In->Out {
        var m = new Map<In, Out>();
        return function (input:In)
            return switch m[input] {
                case null: m[input] = f(input);
                case output: output;
            }
    }
}
You may be able to find a more "functional" implementation for memoize down the road. But the important thing is that it is a separate thing now and you can use it at will.
You may choose to memoize(parseYaml) so that toggling two states in the file actually becomes very cheap after both have been parsed once. You can also tweak memoize to manage the cache size according to whatever strategy proves the most valuable.
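On the cache-size point, here is a hedged sketch of the same memoize idea expressed in Java rather than Haxe/promhx, using a LinkedHashMap in access order as a small LRU cache; the names and size limit are illustrative only:
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class Memo {
    // Memoizes f, keeping only the maxEntries most recently used results.
    static <I, O> Function<I, O> memoizeLru(Function<I, O> f, int maxEntries) {
        Map<I, O> cache = new LinkedHashMap<I, O>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<I, O> eldest) {
                return size() > maxEntries;   // evict the least recently used entry
            }
        };
        return input -> cache.computeIfAbsent(input, f);
    }
}
A memoizeLru(parseYaml, 32) built this way keeps recently used sections cheap while bounding how much parsed config is held in memory.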

ANTLR Parse tree modification

I'm using ANTLR4 to create a parse tree for my grammar, and what I want to do is modify certain nodes in the tree. This will include removing certain nodes and inserting new ones. The purpose behind this is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about it?
While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener, or your generated MyBaseListener, can be used with a ParseTreeWalker to walk your parse tree.
From here you can remove nodes with ParserRuleContext.removeLastChild(); however, when doing this you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
    if ( t instanceof ErrorNode) {
        listener.visitErrorNode((ErrorNode)t);
        return;
    }
    else if ( t instanceof TerminalNode) {
        listener.visitTerminal((TerminalNode)t);
        return;
    }
    RuleNode r = (RuleNode)t;
    enterRule(listener, r);
    int n = r.getChildCount();
    for (int i = 0; i<n; i++) {
        walk(listener, r.getChild(i));
    }
    exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually use empty ParserRuleContext objects (this is because of the cached value of n in the method above). This prevents the ParseTreeWalker from throwing an NPE.
When adding nodes, make sure to set the mutable parent on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to be before the walk reaches the place where you want your changes to go, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes...).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx) {
    if (shouldRewrite(ctx)) {
        ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
        addChildTo(ctx, createNewParentFor(nodesReplaced));
    }
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous javascript. It was pretty painful.
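To make the pseudo code above a bit more concrete, here is a hedged Java sketch of an in-place replacement variant (shouldRewrite and buildReplacement are hypothetical helpers; MyBaseListener/MyParser stand for your generated classes). Swapping a child in place keeps the parent's cached child count valid, so the walker neither skips nodes nor hits a null:
import org.antlr.v4.runtime.ParserRuleContext;

public class RewriteListener extends MyBaseListener {
    @Override
    public void enterRewriteTarget(MyParser.RewriteTargetContext ctx) {
        if (!shouldRewrite(ctx)) {
            return;
        }
        ParserRuleContext replacement = buildReplacement(ctx);   // hypothetical: builds the optimized subtree
        ParserRuleContext parent = (ParserRuleContext) ctx.parent;
        replacement.parent = parent;                             // keep the mutable parent pointer consistent
        int index = parent.children.indexOf(ctx);
        parent.children.set(index, replacement);                 // in-place swap, so the cached n stays correct
        // Note: the walker still continues through the old subtree; the replacement is not visited in this pass.
    }
}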
Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you only call TerminalNode.getText() and concatenate the results in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly somewhat hackish, since you parse the string twice. On the other hand, the solution does not rely on ANTLR implementation details.
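A minimal Java sketch of such a visitor, assuming a naive single-space join (real grammars usually need smarter whitespace and comment handling, e.g. via the token stream):
import org.antlr.v4.runtime.tree.AbstractParseTreeVisitor;
import org.antlr.v4.runtime.tree.TerminalNode;

// Rebuilds a textual form of the tree; per-rule visit methods can be overridden to emit modified text.
class TreeToTextVisitor extends AbstractParseTreeVisitor<String> {
    @Override
    public String visitTerminal(TerminalNode node) {
        return node.getText();
    }

    @Override
    protected String defaultResult() {
        return "";
    }

    @Override
    protected String aggregateResult(String aggregate, String nextResult) {
        // Join child results with a space; original whitespace is not preserved.
        return aggregate.isEmpty() ? nextResult : aggregate + " " + nextResult;
    }
}
Calling new TreeToTextVisitor().visit(tree) then yields a string you can re-parse after applying your modifications.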
I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener in which I overrode the enter... methods. Inside these methods the ParserRuleContext.children list is available and can be manipulated.
class MyListener extends ...BaseListener {
    @Override
    public void enter...(...Context ctx) {
        super.enter...(ctx);
        ctx.children.add(...);
    }
}
new ParseTreeWalker().walk(new MyListener(), parseTree);

Irony AST generation throws NullReferenceException

I'm getting started with Irony (version Irony_2012_03_15), but I pretty quickly got stuck when trying to generate an AST. Below is a completely stripped-down language that throws the exception:
[Language("myLang", "0.1", "Bla Bla")]
public class MyLang : Grammar {
    public MyLang()
        : base(false) {
        var number = TerminalFactory.CreateCSharpNumber("number");
        var binExpr = new NonTerminal("binExpr", typeof(BinaryOperationNode));
        var binOp = new NonTerminal("BinOp");
        binExpr.Rule = number + binOp + number;
        binOp.Rule = ToTerm("+");
        RegisterOperators(1, "+");
        //MarkTransient(binOp);
        this.Root = binExpr;
        this.LanguageFlags = Parsing.LanguageFlags.CreateAst; // this line triggers the error
    }
}
As soon as I add that last line, it throws a NullReferenceException in the Grammar Explorer or when I want to parse a test input. The error is at AstBuilder.cs line 96:
parseNode.AstNode = config.DefaultNodeCreator();
DefaultNodeCreator is a delegate that has not been set.
I've tried setting things with MarkTransient etc., but no dice.
Can someone help me out here? I'm probably missing something obvious. I've looked for AST tutorials all over the web, but I can't seem to find an explanation of how this works.
Thanks in advance,
Gert-Jan
Once you set the LanguageFlags.CreateAst flag on the grammar, you must provide additional information about how to create the AST.
You're supposed to be able to set AstContext.Default*Type for the whole language, but that is currently bugged. For each term, do one of the following:
Set TermFlags.NoAstNode. Irony will then ignore this node and its children.
Set AstConfig.NodeCreator. This is a delegate that can do the right thing.
Set AstConfig.NodeType to the type of the AST node. This type should be accessible, implement IAstInit, and have a public, parameterless constructor. "Accessible" here means either public, or internal with the InternalsVisibleTo attribute.
To be honest, I was facing the same problem and did not understand Jay Bazuzi's answer, though it looks like a valid one (maybe it's outdated).
If there's anyone like me:
I just inherited my grammar from the Irony.Interpreter.InterpretedLanguageGrammar class, and it works. Also, for anyone trying to get the AST working, make sure your node classes are public. :-)
On top of Jay's and Erti-Chris's responses, this thread is also useful:
https://irony.codeplex.com/discussions/361018
The creator of Irony points out the relevant configuration code in InterpretedLanguageGrammar.BuildAst.
HTH

Resources