Should I use exceptions while parsing complex user input?

When looking for information on when and why to use exceptions, I found many people (also on this platform) arguing against using exceptions to validate user input, because invalid input is not an exceptional thing to happen.
I now have a case where I have to parse a complex string of user input and map it to an object tree, similar to a parser.
Example in pseudo code:
input:
----
hello[5]
+
foo["ok"]
----
results in something like this:
class Hello {
    int id = 5
}
class Add {}
class foo {
    string name = 'ok'
}
Now, in order to "validate" that input I have to parse it. Having one piece of code that parses the input for validation and a separate one that creates the objects feels redundant.
Currently I'm using exceptions while parsing single tokens to collect all errors.
// one token is basically a single
try {
    foreach (token in tokens) {
        factory = getFactory(token) // throws ParseException
        addObject(factory.create(token)) // throws ParseException
    }
} catch (ParseException e) {
    // e.g. "Foo Token expects value to be string"
    addError(e)
}
Is this a bad use of exceptions?
An alternative would be to inject a validation class into every factory or to mess around with return types (which feels a bit dirty).

If exceptions work for your use case, go for it.
The usual problem with exceptions is that they don't let you fix things up and continue, which makes it hard to implement parser error recovery. You can't really fix up a bad input, and you probably shouldn't even in cases where you could, but error recovery lets you report more than one error from the same input, which is often considered convenient.
All of that depends on your needs and parsing strategy, so there's not a lot of information to go on here.
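
To illustrate the point about reporting more than one error, here is a minimal Python sketch. It assumes a hypothetical `parse_token` function, `ParseError` class and token shape (none of them come from the question). The loop in the question wraps the whole foreach in a single try, so it would stop at the first failing token; catching per token, inside the loop, lets parsing continue with the next one.

```python
import re

class ParseError(Exception):
    """Hypothetical error type raised for a single bad token."""

# Hypothetical token shape: name[value], where value is an int or a quoted string.
TOKEN_RE = re.compile(r'^(\w+)\[(\d+|"[^"]*")\]$')

def parse_token(token):
    """Turn one token into a (name, value) pair or raise ParseError."""
    if token == "+":
        return ("Add", None)
    match = TOKEN_RE.match(token)
    if match is None:
        raise ParseError(f"cannot parse token {token!r}")
    name, raw = match.groups()
    value = int(raw) if raw.isdigit() else raw.strip('"')
    return (name, value)

def parse_all(tokens):
    """Parse every token, collecting objects and errors separately."""
    objects, errors = [], []
    for token in tokens:
        try:
            objects.append(parse_token(token))
        except ParseError as exc:
            # Record the problem and move on to the next token.
            errors.append(str(exc))
    return objects, errors

objects, errors = parse_all(['hello[5]', '+', 'foo[ok', 'bar[]'])
```

Here both malformed tokens end up in `errors`, while the two valid ones are still turned into objects. This only works because tokens are independent of each other; a grammar where one bad token changes how the following ones must be read needs real error recovery.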

Related

How to set timeout for parser when it takes too long?

I'm using ANTLR4 in C# with the following code sample:
AntlrInputStream antlrStream = new AntlrInputStream(text);
MyLexer myLexer = new(new AntlrInputStream());
myLexer.SetInputStream(antlrStream);
CommonTokenStream myTokens = new CommonTokenStream(myLexer);
parser = new MyParser(myTokens)
{
    BuildParseTree = true,
};
IParseTree tree = parser.startRule();
The classes MyLexer and MyParser are derived from the Lexer and Parser classes of Antlr4.Runtime and were auto-generated by ANTLR4.
In some rare cases, with specific text, startRule() takes forever and never finishes. I want to be able to set some kind of a "Timeout" for the parsing and throw an Exception.
Any advice what is the recommended way to do it?
I looked at this briefly a while back. You can essentially create a wrapper over the generated parser and override one of its methods. I use ANTLR with Kotlin, so excuse the example below.
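import org.antlr.v4.runtime.ParserRuleContext
import org.antlr.v4.runtime.TokenStream
// YourParser stands for the ANTLR-generated parser class
class InterruptibleParser(input: TokenStream) : YourParser(input) {
    // called on every rule entry, so the interrupt flag is checked often
    override fun enterRule(localctx: ParserRuleContext, state: Int, ruleIndex: Int) {
        if (Thread.interrupted()) {
            throw InterruptedException()
        }
        super.enterRule(localctx, state, ruleIndex)
    }
}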
I tried it with either enterRule, or consume, or getContext -- I don't remember which function gets called frequently enough.
With the above working, you can instantiate the parser and interrupt its thread after a certain amount of time. This will likely make parsing noticeably slower (maybe around 25% slower, if I remember correctly). Anyway, hope this helps.
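
The same idea can be sketched without ANTLR. Below is a minimal Python sketch, assuming a hypothetical hand-written recursive-descent parser (`ToyParser`, `DeadlineParser` and `ParseTimeout` are illustrative names, not ANTLR API). Instead of relying on a thread interrupt, the hook that runs on every rule entry compares the clock against a deadline and raises once it has passed.

```python
import time

class ParseTimeout(Exception):
    """Raised when parsing runs past its deadline (hypothetical name)."""

class ToyParser:
    """Toy parser for nested parentheses such as "(()())"."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def enter_rule(self):
        """Hook called at the start of every rule; does nothing by default."""

    def group(self):
        """group := '(' group* ')'; returns the number of groups parsed."""
        self.enter_rule()
        self._expect("(")
        count = 1
        while self.pos < len(self.text) and self.text[self.pos] == "(":
            count += self.group()
        self._expect(")")
        return count

    def _expect(self, char):
        if self.pos >= len(self.text) or self.text[self.pos] != char:
            raise ValueError(f"expected {char!r} at offset {self.pos}")
        self.pos += 1

class DeadlineParser(ToyParser):
    """Same parser, but every rule entry checks a wall-clock deadline."""

    def __init__(self, text, timeout_seconds):
        super().__init__(text)
        self.deadline = time.monotonic() + timeout_seconds

    def enter_rule(self):
        if time.monotonic() > self.deadline:
            raise ParseTimeout("parsing exceeded its time budget")
        super().enter_rule()
```

A caller would construct `DeadlineParser(text, timeout_seconds)` and catch `ParseTimeout` around the start rule. The check costs one clock read per rule entry, which is the kind of overhead the answer above warns about.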

ANTLR best practice for finding and catching parse errors

This question concerns how to get error messages out of an ANTLR4 parser in C# in Visual Studio. I feed the ANTLR parser a known bad input string, but I am not seeing any errors or parse exceptions thrown during the (bad) parse. Thus, my exception handler does not get a chance to create and store any error messages during the parse.
I am working with an ANTLR4 grammar that I know to be correct because I can see correct parse operation outputs in graphical form with an ANTLR extension to Visual Studio Code. I know the generated parser code is correct because I can compile it without errors, override the base visitor class, and print out various bits of information from the parse tree with my overridden VisitXXX methods.
At this point, I am running a very simple test case that feeds in a bad input string and looks for a nonzero count on my list of stored parse errors. I am confident of the error-handling code because it works in a similar situation on another grammar. But the error-handling code must catch a parse exception to generate an error message. (Maybe that's not the right way to catch/detect parse errors such as unexpected tokens or other errors in the input stream.)
Here is the code that I used to replace the default lexer and parser error listeners.
// install the custom ErrorListener into the parser object
sendLexer.RemoveErrorListeners();
sendLexer.AddErrorListener(MyErrorListener.Instance);
Parser.RemoveErrorListeners();
Parser.AddErrorListener(MyErrorListener.Instance);
I have attached a screenshot of the graphical output showing the presence of unexpected tokens in the input string.
Q1. Why don't the unexpected tokens cause parse exceptions that I can catch with my exception handler? Are all parse errors supposed to throw exceptions?
Q2. If catching parse exceptions is not the right way, could someone please suggest a strategy for me to follow to detect the unexpected token errors (or other errors that do not throw parse exceptions)?
Q3. Is there a best-practice way of catching or finding parse errors, such as generating errors from walking the parse tree, rather than hoping that ANTLR will throw a parse exception for every unexpected token? (I am wondering whether unexpected tokens are supposed to generate parse exceptions, as opposed to producing a legitimate parse tree that happens to contain unexpected tokens. If so, do they just show up as unexpected children in the parse tree?)
Thank you.
Screenshot showing unexpected tokens in the (deliberate) bad input string to trigger errors:
UPDATE:
Currently, the parser and unit tests are working. If I feed a bad input string into the parser, the default parser error listener prints a suitable error message. However, when I install a custom error listener, it never gets called: 1) a breakpoint placed in the error listener never gets hit, and 2) (as a consequence) no error message is collected or printed. I don't know why, given that I do see an error message when the custom error listener is not installed.
Here is my C# code for the unit test call to ParseText:
// the unit test
public void ModkeyComboThreeTest() {
    SendKeysHelper.ParseText("this input causes a parse error");
    Assert.AreEqual(0, ParseErrors.Count);
}
// the helper class that installs the custom error listener
public static class SendKeysHelper {
    public static List<string> ParseErrorList = new List<string>();
    public static MyErrorListener MyErrorListener;
    public static SendKeysParser ParseText(string text) {
        ParseErrors.Clear();
        try {
            var inputStream = new AntlrInputStream(text);
            var sendLexer = new SendKeysLexer(inputStream);
            var commonTokenStream = new CommonTokenStream(sendLexer);
            var sendKeysParser = new SendKeysParser(commonTokenStream);
            Parser = sendKeysParser;
            MyErrorListener = new MyErrorListener(ParseErrorList);
            Parser.RemoveErrorListeners();
            Parser.AddErrorListener(MyErrorListener);
            // parse the input from the starting rule
            var ctx = Parser.toprule();
            if (ParseErrorList.Count > 0) {
                Dprint($"Parse error count: {ParseErrorList.Count}");
            }
            ...
}
// the custom error listener class
public class MyErrorListener : BaseErrorListener, IAntlrErrorListener<int>{
    public List<string> ErrorList { get; private set; }
    // pass in the helper class error list to this constructor
    public MyErrorListener(List<string> errorList) {
        ErrorList = errorList;
    }
    public void SyntaxError(IRecognizer recognizer, int offendingSymbol,
        int line, int offset, string msg, RecognitionException e) {
        var errmsg = "Line " + line + ", 0-offset " + offset + ": " + msg;
        ErrorList.Add(errmsg);
    }
}
So, I'm still trying to answer my original question on how to get error information out of the failed parse. With no syntax errors on installation, 1) the default error message goes away (suggesting my custom error listener was installed), but 2) my custom error listener SyntaxError method does not get called to register an error.
Or, alternatively, I leave the default error listener in place and add my custom error listener as well. In the debugger, I can see both of them registered in the parser data structure. On an error, the default listener gets called, but my custom error listener does not get called (meaning that a breakpoint in the custom listener does not get hit). No syntax errors or operational errors in the unit tests, other than that my custom error listener does not appear to get called.
Maybe the reference to the custom listener is somehow corrupt or not working, even though I can see it in the parser data structure. Or maybe a base class version of my custom listener is being called instead. Very strange.
UPDATE
The helpful discussion/answer for this thread was deleted for some reason. It provided much useful information on writing custom error listeners and error strategies for ANTLR4.
I have opened a second question here ANTLR4 errors not being reported to custom lexer / parser error listeners that suggests an underlying cause for why I can't get error messages out of ANTLR4. But the second question does not address the main question of this post, which is about best practices. I hope the admin who deleted this thread undeletes it to make the best practice information visible again.
The parser ErrorListener SyntaxError method needs the override modifier to bypass the default method.
public class ParserErrorListener : BaseErrorListener
{
    public override void SyntaxError(
        TextWriter output, IRecognizer recognizer,
        IToken offendingSymbol, int line,
        int charPositionInLine, string msg,
        RecognitionException e)
    {
        string sourceName = recognizer.InputStream.SourceName;
        Console.WriteLine("line:{0} col:{1} src:{2} msg:{3}", line, charPositionInLine, sourceName, msg);
        Console.WriteLine("--------------------");
        Console.WriteLine(e);
        Console.WriteLine("--------------------");
    }
}
The lexer ErrorListener is a little different. While the parser BaseErrorListener implements IAntlrErrorListener of type IToken, the lexer requires an implementation of IAntlrErrorListener of type int. The SyntaxError method does not have an override modifier. Parameter offendingSymbol is an int instead of IToken.
public class LexerErrorListener : IAntlrErrorListener<int>
{
    public void SyntaxError(
        TextWriter output, IRecognizer recognizer,
        int offendingSymbol, int line,
        int charPositionInLine, string msg,
        RecognitionException e)
    {
        string sourceName = recognizer.InputStream.SourceName;
        Console.WriteLine("line:{0} col:{1} src:{2} msg:{3}", line, charPositionInLine, sourceName, msg);
        Console.WriteLine("--------------------");
        Console.WriteLine(e);
        Console.WriteLine("--------------------");
    }
}

How can I read input after the wrong type has been entered in D readf?

I am wondering how to continue using stdin in D after the program has read an unsuitable value. (for example, letters when it was expecting an int)
I wrote this to test it:
import std.stdio;
void main()
{
    int a;
    for(;;){
        try{
            stdin.readf(" %s", a);
            break;
        }catch(Exception e){
            writeln(e);
            writeln("Please enter a number.");
        }
    }
    writeln(a);
}
After entering an incorrect value such as 'b', the program would print out the message indefinitely. I also examined the exception, which indicated that it was trying to read the same characters again, so I made a version like this:
import std.stdio;
void main()
{
    int a;
    for(;;){
        try{
            stdin.readf(" %s", a);
            break;
        }catch(Exception e){
            writeln(e);
            writeln("Please enter a number.");
            char c;
            readf("%c", c);
        }
    }
    writeln(a);
}
Which still threw an exception when trying to read a, but not c. I also tried using stdin.clearerr(), which had no effect. Does anyone know how to solve this? Thanks.
My recommendation: don't use readf. It is so bad. Everyone goes to it at first since it is in the stdlib (and has been since 1979 lol, well scanf has... and imo i think scanf is better than readf! but i digress), and almost everyone has trouble with it. It is really picky about formats and whitespace consumption when it goes right, and when it goes wrong, it gives crappy error messages and leaves the input stream in an indeterminate state. And, on top of that, is still really limited in what data types it can actually read in and is horribly user-unfriendly, not even allowing things like working backspacing on most systems!
Slightly less bad than readf is to use readln then strip and to!int it once you check the line and give errors. Something like this:
import std.stdio;
import std.string; // for strip, cuts off whitespace
import std.algorithm.searching; // for all
import std.ascii; // for isDigit
import std.conv; // for to, does string to other type conversions
int readInt() {
    for(;;) {
        string line = stdin.readln();
        line = line.strip();
        // all!isDigit is true for an empty line, so check the length too
        if(line.length > 0 && all!isDigit(line))
            return to!int(line);
        else
            writeln("Please enter a number");
    }
    assert(0);
}
void main()
{
    int a = readInt();
    writeln(a);
}
I know that's a lot of import spam (and for a bunch of individual trivial functions too), and readln still sucks for the end user, but this little function is going to be so much nicer on your users and on yourself than trying to use readf. It will consistently consume one line at a time and give a nice message. Moreover, the same pattern can be extended to any other type of validation you need, and the call to readln can be replaced by a call to a more user-friendly function that allows editing and history and stuff later if you decide to go down that route.
If you must use readf anyway though, easiest way to make things sane again in your catch block is still to just call readln and discard its result. So then it just skips the whole line containing the error, allowing your user to start fresh. That'd also drop if they were doing "1 2" and wanted two ints to be read at once... but meh, I'd rather start them fresh anyway than try to pick up an errored line half way through.
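
The read-a-line-then-validate pattern is not specific to D. Here is a minimal Python sketch of the same loop, assuming a hypothetical `read_int` helper that takes any text stream (so it can be tried on an in-memory string instead of real stdin):

```python
import io

def read_int(stream, out):
    """Read lines from stream until one is a non-negative integer.

    Returns the integer, or None if the stream ends first.
    """
    for line in stream:
        text = line.strip()
        # isascii() guards against non-ASCII digits that isdigit() accepts
        if text.isascii() and text.isdigit():
            return int(text)
        out.write("Please enter a number\n")
    return None

fake_input = io.StringIO("b\n1 2\n 42 \n7\n")
messages = io.StringIO()
value = read_int(fake_input, messages)
```

Each rejected line is consumed whole, so the next attempt always starts on a fresh line, which is the property readf lacks after a failure.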

ANTLR Parse tree modification

I'm using ANTLR4 to create a parse tree for my grammar. What I want to do is modify certain nodes in the tree, which includes removing certain nodes and inserting new ones. The purpose behind this is optimization for the language I am writing. I have yet to find a solution to this problem. What would be the best way to go about this?
While there is currently no real support or tools for tree rewriting, it is very possible to do. It's not even that painful.
The ParseTreeListener or your MyBaseListener can be used with a ParseTreeWalker to walk your parse tree.
From here, you can remove nodes with ParserRuleContext.removeLastChild(), however when doing this, you have to watch out for ParseTreeWalker.walk:
public void walk(ParseTreeListener listener, ParseTree t) {
    if ( t instanceof ErrorNode) {
        listener.visitErrorNode((ErrorNode)t);
        return;
    }
    else if ( t instanceof TerminalNode) {
        listener.visitTerminal((TerminalNode)t);
        return;
    }
    RuleNode r = (RuleNode)t;
    enterRule(listener, r);
    int n = r.getChildCount();
    for (int i = 0; i<n; i++) {
        walk(listener, r.getChild(i));
    }
    exitRule(listener, r);
}
You must replace removed nodes with something if the walker has already visited the parents of those nodes; I usually pick empty ParserRuleContext objects. This is needed because of the cached value of n in the method above, and it prevents the ParseTreeWalker from throwing an NPE.
When adding nodes, make sure to set the mutable parent on the ParserRuleContext to the new parent. Also, because of the cached n in the method above, a good strategy is to detect where the changes need to be before the walk reaches the place where you want them to go, so the ParseTreeWalker will walk over them in the same pass (otherwise you might need multiple passes).
Your pseudo code should look like this:
public void enterRewriteTarget(@NotNull MyParser.RewriteTargetContext ctx){
    if(shouldRewrite(ctx)){
        ArrayList<ParseTree> nodesReplaced = replaceNodes(ctx);
        addChildTo(ctx, createNewParentFor(nodesReplaced));
    }
}
I've used this method to write a transpiler that compiled a synchronous internal language into asynchronous JavaScript. It was pretty painful.
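
The effect of the cached child count in the walk method above can be reproduced in a few lines. This is a minimal Python sketch with a stand-in `Node` class and a simplified `walk` (both hypothetical, not ANTLR's classes): removing a not-yet-visited sibling during the walk breaks it, while replacing that sibling with an empty placeholder keeps the walk intact. In Python the failure shows up as an IndexError rather than the NPE seen in Java.

```python
class Node:
    """Minimal stand-in for a parse-tree node (not ANTLR's classes)."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = list(children or [])

def walk(node, on_enter):
    """Walk like the method above: the child count is read once, up front."""
    on_enter(node)
    n = len(node.children)  # cached before any child is visited
    for i in range(n):
        walk(node.children[i], on_enter)

def build_tree():
    return Node("root", [Node("a", [Node("x")]), Node("drop"), Node("b")])

def remove_sibling(root):
    """Listener that deletes the "drop" node while the walker is inside "a"."""
    def on_enter(node):
        if node.name == "a":
            root.children = [c for c in root.children if c.name != "drop"]
    return on_enter

def replace_sibling(root):
    """Listener that swaps "drop" for an empty placeholder instead."""
    def on_enter(node):
        if node.name == "a":
            root.children = [
                Node("empty") if c.name == "drop" else c for c in root.children
            ]
    return on_enter
```

With `remove_sibling`, the root's child list shrinks to two entries while the walker still expects three. With `replace_sibling`, the count stays the same and the walker simply visits the placeholder.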
Another approach would be to write a ParseTreeVisitor that converts the tree back to a string. (This can be trivial in some cases, because you are only calling TerminalNode.getText() and concatenating in aggregateResult(..).)
You then add the modifications to this visitor so that the resulting string representation contains the modifications you try to achieve.
Then parse the string and you get a parse tree with the desired modifications.
This is certainly hackish in some ways, since you parse the string twice. On the other hand the solution does not rely on antlr implementation details.
I needed something similar for simple transformations. I ended up using a ParseTreeWalker and a custom ...BaseListener in which I overrode the enter... methods. Inside these methods, ParserRuleContext.children is available and can be manipulated.
class MyListener extends ...BaseListener {
    @Override
    public void enter...(...Context ctx) {
        super.enter...(ctx);
        ctx.children.add(...);
    }
}
new ParseTreeWalker().walk(new MyListener(), parseTree);

How to get try / catch to work in erlang

I'm pretty new to Erlang and I'm trying to get a basic try / catch statement to work. I'm using webmachine to process some requests, and all I really want to do is parse some JSON data and return it. In the event that the JSON data is invalid, I just want to return an error message. Here is the code I have so far.
(the JSON data is invalid)
to_text(ReqData, Context) ->
    Body = "{\"firstName\": \"John\"\"lastName\": \"Smith\"}",
    try decode(Body) of
        _ -> {"Success! Json decoded!",ReqData,Context}
    catch
        _ -> {"Error! Json is invalid",ReqData,Context}
    end.
decode(Body) ->
    {struct, MJ} = mochijson:decode(Body).
The code compiles, but when I run it and send a request for the text, I get the following error back.
error,{error,{case_clause,{{const,"lastName"},
": \"Smith\"}",
{decoder,utf8,null,1,31,comma}}},
[{mochijson,decode_object,3},
{mochijson,json_decode,2},
{webmachine_demo_resource,test,1},
{webmachine_demo_resource,to_text,2},
{webmachine_demo_resource,to_html,2},
{webmachine_resource,resource_call,3},
{webmachine_resource,do,3},
{webmachine_decision_core,resource_call,1}]}}
What exactly am I doing wrong? The documentation says the "catch" statement handles all errors, or do I have to do something to catch a specific error that is thrown by mochijson:decode?
Please any leads or advice would be helpful. Thanks.
The catch-clause _ -> ... only catches exceptions of the 'throw' class. To catch other kinds of exceptions, you need to write a pattern of the form Class:Term -> ... (i.e., the default Class is throw). In your case:
catch
    _:_ -> {"Error! Json is invalid", ReqData, Context}
end
When you do this, you should always ask yourself why you're catching every possible exception. If it's because you're calling third-party code whose behavior you can't predict, it's usually OK. If you're calling your own code, remember that you're basically throwing away all information about the failure, possibly making debugging a lot more difficult. If you can narrow it down to catching only particular expected cases and let any other exceptions fall through (so you see where the real failure occurred), then do so.
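
The advice about narrowing the catch applies in most languages. Here is a minimal Python sketch, assuming a hypothetical `to_text` helper built on the standard `json` module (not webmachine or mochijson): only the expected decoding failure is caught, so any other exception still propagates with its traceback.

```python
import json

def to_text(body):
    """Return a status message for a JSON request body."""
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        # Expected failure: malformed JSON. Keep the position for debugging.
        return f"Error! Json is invalid (line {exc.lineno}, column {exc.colno})"
    return "Success! Json decoded!"

bad = to_text('{"firstName": "John""lastName": "Smith"}')
good = to_text('{"firstName": "John", "lastName": "Smith"}')
```

Passing something that is not a string at all, such as `None`, raises a TypeError that is deliberately not caught here, because it points to a bug in the caller rather than to bad user input.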

Resources