ANTLR best practice for finding and catching parse errors

This question concerns how to get error messages out of an ANTLR4 parser in C# in Visual Studio. I feed the ANTLR parser a known bad input string, but I am not seeing any errors or parse exceptions thrown during the (bad) parse. Thus, my exception handler does not get a chance to create and store any error messages during the parse.
I am working with an ANTLR4 grammar that I know to be correct because I can see correct parse operation outputs in graphical form with an ANTLR extension to Visual Studio Code. I know the generated parser code is correct because I can compile it without errors, override the base visitor class, and print out various bits of information from the parse tree with my overridden VisitXXX methods.
At this point, I am running a very simple test case that feeds in a bad input string and looks for a nonzero count on my list of stored parse errors. I am confident of the error-handling code because it works in a similar situation on another grammar. But the error-handling code must catch a parse exception to generate an error message. (Maybe that's not the right way to catch/detect parse errors such as unexpected tokens or other errors in the input stream.)
Here is the code that I used to replace the default lexer and parser error listeners.
// install the custom ErrorListener into the parser object
sendLexer.RemoveErrorListeners();
sendLexer.AddErrorListener(MyErrorListener.Instance);
Parser.RemoveErrorListeners();
Parser.AddErrorListener(MyErrorListener.Instance);
I have attached a screenshot of the graphical output showing the presence of unexpected tokens in the input string.
Q1. Why don't the unexpected tokens cause parse exceptions that I can catch with my exception handler? Are all parse errors supposed to throw exceptions?
Q2. If catching parse exceptions is not the right way, could someone please suggest a strategy for me to follow to detect the unexpected token errors (or other errors that do not throw parse exceptions)?
Q3. Is there a best-practice way of catching or finding parse errors, such as generating errors from walking the parse tree, rather than hoping that ANTLR will throw a parse exception for every unexpected token? (I am wondering whether unexpected tokens are supposed to generate parse exceptions, as opposed to producing a legitimate parse tree that happens to contain unexpected tokens. If so, do they just show up as unexpected children in the parse tree?)
Thank you.
Screenshot showing unexpected tokens in the (deliberate) bad input string to trigger errors:
UPDATE:
The parser and unit tests are working now. If I feed a bad input string into the parser, the default parser error listener prints a suitable error message. However, when I install a custom error listener, it never gets called: 1) a breakpoint placed in the error listener never gets hit, and 2) as a consequence, no error message is collected or printed. I don't know why it isn't called, given that I see an error message when the custom error listener is not installed.
Here is my C# code for the unit test call to ParseText:
// the unit test
public void ModkeyComboThreeTest() {
    SendKeysHelper.ParseText("this input causes a parse error");
    Assert.AreEqual(0, ParseErrors.Count);
}
// the helper class that installs the custom error listener
public static class SendKeysHelper {
    public static List<string> ParseErrorList = new List<string>();
    public static MyErrorListener MyErrorListener;
    public static SendKeysParser ParseText(string text) {
        ParseErrors.Clear();
        try {
            var inputStream = new AntlrInputStream(text);
            var sendLexer = new SendKeysLexer(inputStream);
            var commonTokenStream = new CommonTokenStream(sendLexer);
            var sendKeysParser = new SendKeysParser(commonTokenStream);
            Parser = sendKeysParser;
            MyErrorListener = new MyErrorListener(ParseErrorList);
            Parser.RemoveErrorListeners();
            Parser.AddErrorListener(MyErrorListener);
            // parse the input from the starting rule
            var ctx = Parser.toprule();
            if (ParseErrorList.Count > 0) {
                Dprint($"Parse error count: {ParseErrorList.Count}");
            }
            ...
}
// the custom error listener class
public class MyErrorListener : BaseErrorListener, IAntlrErrorListener<int>{
    public List<string> ErrorList { get; private set; }
    // pass in the helper class error list to this constructor
    public MyErrorListener(List<string> errorList) {
        ErrorList = errorList;
    }
    public void SyntaxError(IRecognizer recognizer, int offendingSymbol,
        int line, int offset, string msg, RecognitionException e) {
        var errmsg = "Line " + line + ", 0-offset " + offset + ": " + msg;
        ErrorList.Add(errmsg);
    }
}
So, I'm still trying to answer my original question on how to get error information out of the failed parse. With no syntax errors on installation, 1) the default error message goes away (suggesting my custom error listener was installed), but 2) my custom error listener SyntaxError method does not get called to register an error.
Or, alternatively, I leave the default error listener in place and add my custom error listener as well. In the debugger, I can see both of them registered in the parser data structure. On an error, the default listener gets called, but my custom error listener does not get called (meaning that a breakpoint in the custom listener does not get hit). No syntax errors or operational errors in the unit tests, other than that my custom error listener does not appear to get called.
Maybe the reference to the custom listener is somehow corrupt or not working, even though I can see it in the parser data structure. Or maybe a base class version of my custom listener is being called instead. Very strange.
UPDATE
The helpful discussion/answer for this thread was deleted for some reason. It provided much useful information on writing custom error listeners and error strategies for ANTLR4.
I have opened a second question, "ANTLR4 errors not being reported to custom lexer / parser error listeners", that suggests an underlying cause for why I can't get error messages out of ANTLR4. But the second question does not address the main question of this post, which is about best practices. I hope the admin who deleted this thread undeletes it to make the best practice information visible again.

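The parser ErrorListener's SyntaxError method needs the override modifier so that it replaces the default (empty) implementation in BaseErrorListener. Without it, and with offendingSymbol declared as int rather than IToken, the method is a separate overload that the parser never calls.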
public class ParserErrorListener : BaseErrorListener
{
    public override void SyntaxError(
        TextWriter output, IRecognizer recognizer,
        IToken offendingSymbol, int line,
        int charPositionInLine, string msg,
        RecognitionException e)
    {
        string sourceName = recognizer.InputStream.SourceName;
        Console.WriteLine("line:{0} col:{1} src:{2} msg:{3}", line, charPositionInLine, sourceName, msg);
        Console.WriteLine("--------------------");
        Console.WriteLine(e);
        Console.WriteLine("--------------------");
    }
}
The lexer ErrorListener is a little different. While the parser BaseErrorListener implements IAntlrErrorListener of type IToken, the lexer requires an implementation of IAntlrErrorListener of type int. The SyntaxError method does not have an override modifier. Parameter offendingSymbol is an int instead of IToken.
public class LexerErrorListener : IAntlrErrorListener<int>
{
    public void SyntaxError(
        TextWriter output, IRecognizer recognizer,
        int offendingSymbol, int line,
        int charPositionInLine, string msg,
        RecognitionException e)
    {
        string sourceName = recognizer.InputStream.SourceName;
        Console.WriteLine("line:{0} col:{1} src:{2} msg:{3}", line, charPositionInLine, sourceName, msg);
        Console.WriteLine("--------------------");
        Console.WriteLine(e);
        Console.WriteLine("--------------------");
    }
}
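The underlying pitfall can be reproduced without ANTLR. The following is only a sketch in Java, standing in for the C# above: BaseListener, BrokenListener, FixedListener, and report are hypothetical names, not part of any ANTLR runtime. A method whose parameter type differs from the base method is a separate overload, so a caller that dispatches through the base type never reaches it.

```java
import java.util.ArrayList;
import java.util.List;

public class OverridePitfall {
    // Hypothetical stand-in for a base listener whose default method does nothing.
    static class BaseListener {
        public void syntaxError(Object offendingSymbol, int line, String msg) { }
    }

    // The parameter type differs (Integer vs Object), so this is an overload,
    // not an override. Calls made through BaseListener never reach it.
    static class BrokenListener extends BaseListener {
        final List<String> errors = new ArrayList<>();
        public void syntaxError(Integer offendingSymbol, int line, String msg) {
            errors.add("line " + line + ": " + msg);
        }
    }

    // Same signature as the base method; @Override makes the compiler verify that.
    static class FixedListener extends BaseListener {
        final List<String> errors = new ArrayList<>();
        @Override
        public void syntaxError(Object offendingSymbol, int line, String msg) {
            errors.add("line " + line + ": " + msg);
        }
    }

    // Hypothetical stand-in for a parser reporting an error through the base type.
    static void report(BaseListener listener) {
        listener.syntaxError("badToken", 1, "unexpected token");
    }

    public static void main(String[] args) {
        BrokenListener broken = new BrokenListener();
        FixedListener fixed = new FixedListener();
        report(broken);
        report(fixed);
        System.out.println("broken collected: " + broken.errors.size());
        System.out.println("fixed collected: " + fixed.errors.size());
    }
}
```

In this sketch the broken listener collects nothing while the fixed one collects one message, which matches the symptom described in the question: the listener is registered, yet its method is never hit.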

Related

Should I use Exceptions while parsing complex user input

When looking for information on when and why to use exceptions, I found many people (also on this platform) arguing against using exceptions to validate user input, because invalid input is not an exceptional thing to happen.
I now have a case where I have to parse a complex string of user input and map it to an object tree, similar to a parser.
Example in pseudo code:
input:
----
hello[5]
+
foo["ok"]
----
results in something like that:
class Hello {
int id = 5
}
class Add {}
class foo {
string name = 'ok'
}
Now, in order to "validate" that input I have to parse it. Having separate code that parses the input for validation and code that creates the objects feels redundant.
Currently I'm using Exceptions while parsing single tokens to collect all Errors.
// one token is basically a single
try {
    foreach (token in tokens) {
        factory = getFactory(token) // throws ParseException
        addObject(factory.create(token)) // throws ParseException
    }
} catch (ParseException e) {
    // e.g. "Foo Token expects value to be string"
    addError(e)
}
Is this a bad use of exceptions?
An alternative would be to inject a validation class into every factory or to mess around with return types (which feels a bit dirty).
If exceptions work for your use case, go for it.
The usual problem with exceptions is that they don't let you fix things up and continue, which makes it hard to implement parser error recovery. You can't really fix up a bad input, and you probably shouldn't even in cases where you could, but error recovery lets you report more than one error from the same input, which is often considered convenient.
All of that depends on your needs and parsing strategy, so there's not a lot of information to go on here.
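One way to report more than one error from the same input is to record each problem and move on to the next token instead of throwing. The following is a minimal sketch under assumptions that are not from the question: a hypothetical token format of name[value], a hypothetical rule that hello expects an integer and foo expects a quoted string, and a hypothetical CollectingParser class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CollectingParser {
    final List<String> objects = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    // Parses every token; a bad token is recorded and skipped, not thrown.
    void parse(List<String> tokens) {
        for (String token : tokens) {
            String error = parseToken(token);
            if (error != null) {
                errors.add(error);
            }
        }
    }

    // Returns null on success, otherwise a message describing the problem.
    private String parseToken(String token) {
        if (token.equals("+")) {
            objects.add("Add");
            return null;
        }
        int open = token.indexOf('[');
        if (open < 1 || !token.endsWith("]")) {
            return "Malformed token: " + token;
        }
        String name = token.substring(0, open);
        String value = token.substring(open + 1, token.length() - 1);
        switch (name) {
            case "hello":
                if (!value.matches("\\d+")) {
                    return "hello expects an integer value: " + token;
                }
                objects.add("Hello(id=" + value + ")");
                return null;
            case "foo":
                if (!value.matches("\"[^\"]*\"")) {
                    return "foo expects a string value: " + token;
                }
                objects.add("Foo(name=" + value + ")");
                return null;
            default:
                return "Unknown token: " + token;
        }
    }

    public static void main(String[] args) {
        CollectingParser parser = new CollectingParser();
        parser.parse(Arrays.asList("hello[5]", "+", "foo[7]", "bar[1]", "foo[\"ok\"]"));
        System.out.println(parser.objects);
        System.out.println(parser.errors);
    }
}
```

With the sample input in main, the sketch builds three objects and records two errors, so the user sees both problems in one pass. Whether returning a message is nicer than catching an exception inside the loop is a matter of taste; the point is only that the loop continues after a bad token.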

"Guid should contain 32 digits" serilog error with sql server sink

I am getting this error occasionally with the MSSQLServer sink. I can't see what's wrong with this guid. Any ideas? I've verified in every place I can find the data type of the source guid is "Guid" not a string. I'm just a bit mystified.
Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).Couldn't store <"7526f485-ec2d-4ec8-bd73-12a7d1c49a5d"> in UserId Column. Expected type is Guid.
The guid in this example is:
7526f485-ec2d-4ec8-bd73-12a7d1c49a5d
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
seems to match the template to me?
Further details:
This is an occasional issue, but when it arises it arises a lot. It seems to be tied to specific Guids. Most Guids are fine, but a small subset have this issue. Our app logs thousands of messages a day, but these messages are not logged (because of the issue) so it is difficult for me to track down exactly where the specific logs that are causing this error come from. However, we use a centralized logging method that is run something like this. This test passes for me, but it mirrors the setup and code we use for logging generally, which normally succeeds. As I said, this is an intermittent issue:
[Fact]
public void Foobar()
{
    // arrange
    var columnOptions = new ColumnOptions
    {
        AdditionalColumns = new Collection<SqlColumn>
        {
            new SqlColumn {DataType = SqlDbType.UniqueIdentifier, ColumnName = "UserId"},
        },
    };
    columnOptions.Store.Remove(StandardColumn.MessageTemplate);
    columnOptions.Store.Remove(StandardColumn.Properties);
    columnOptions.Store.Remove(StandardColumn.LogEvent);
    columnOptions.Properties.ExcludeAdditionalProperties = true;
    var badGuid = new Guid("7526f485-ec2d-4ec8-bd73-12a7d1c49a5d");
    var connectionString = "Server=(localdb)\\MSSQLLocalDB;Database=SomeDb;Trusted_Connection=True;MultipleActiveResultSets=true";
    var logConfiguration = new LoggerConfiguration()
        .MinimumLevel.Information()
        .Enrich.FromLogContext()
        .WriteTo.MSSqlServer(connectionString, "Logs",
            restrictedToMinimumLevel: LogEventLevel.Information, autoCreateSqlTable: false,
            columnOptions: columnOptions)
        .WriteTo.Console(restrictedToMinimumLevel: LogEventLevel.Information);
    Log.Logger = logConfiguration.CreateLogger();
    // Suspect the issue is with this line
    LogContext.PushProperty("UserId", badGuid);
    // Best practice would be to do something like this:
    // using (LogContext.PushProperty("UserId", badGuid))
    // {
    Log.Logger.Information(new FormatException("Foobar"),"This is a test");
    // }
    Log.CloseAndFlush();
}
One thing I have noticed since constructing this test code is that the "PushProperty" for the UserId property is not captured and disposed. Since behaviour is "undefined" in this case, I am inclined to fix it anyway and see if the problem goes away.
full stack:
2020-04-20T08:38:17.5145399Z Exception while emitting periodic batch from Serilog.Sinks.MSSqlServer.MSSqlServerSink: System.ArgumentException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).Couldn't store <"7526f485-ec2d-4ec8-bd73-12a7d1c49a5d"> in UserId Column. Expected type is Guid.
---> System.FormatException: Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
at System.Guid.GuidResult.SetFailure(Boolean overflow, String failureMessageID)
at System.Guid.TryParseExactD(ReadOnlySpan`1 guidString, GuidResult& result)
at System.Guid.TryParseGuid(ReadOnlySpan`1 guidString, GuidResult& result)
at System.Guid..ctor(String g)
at System.Data.Common.ObjectStorage.Set(Int32 recordNo, Object value)
at System.Data.DataColumn.set_Item(Int32 record, Object value)
--- End of inner exception stack trace ---
at System.Data.DataColumn.set_Item(Int32 record, Object value)
at System.Data.DataRow.set_Item(DataColumn column, Object value)
at Serilog.Sinks.MSSqlServer.MSSqlServerSink.FillDataTable(IEnumerable`1 events)
at Serilog.Sinks.MSSqlServer.MSSqlServerSink.EmitBatchAsync(IEnumerable`1 events)
at Serilog.Sinks.PeriodicBatching.PeriodicBatchingSink.OnTick()
RESOLUTION
This issue was caused because someone created a log message with a placeholder that had the same name as our custom data column, but was passing in a string version of a guid instead of one typed as a guid.
Very simple example:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
var badGuidConverted = Guid.Parse(badGuid); // just proving the guid is actually valid.
var goodGuid = Guid.NewGuid();
using (LogContext.PushProperty("UserId",goodGuid))
{
    Log.Logger.Information("This is a problem with my other user {userid} that will crash serilog. This message will never end up in the database.", badGuid);
}
The quick fix is to edit the message template to change the placeholder from {userid} to something else.
Since our code was centralized around the place where the PushProperty occurs, I put some checks in there to monitor for this and throw a more useful error message in the future when someone does this again.
I don't see anything obvious in the specific code above that would cause the issue. The fact that you call PushProperty before setting up Serilog would be something I would change (i.e. set up Serilog first, then call PushProperty) but that doesn't seem to be the root cause of the issue you're having.
My guess is that you have some code paths that log the UserId as a string instead of a Guid. Serilog is expecting a Guid value type, so if you give it a string representation of a Guid it won't work and will give you that type of exception.
Maybe somewhere in the codebase you're calling .ToString on the UserId before logging? Or perhaps using string interpolation e.g. Log.Information("User is {UserId}", $"{UserId}");?
For example:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
LogContext.PushProperty("UserId", badGuid);
Log.Information(new FormatException("Foobar"), "This is a test");
Or even just logging a message with the UserId property directly:
var badGuid = "7526f485-ec2d-4ec8-bd73-12a7d1c49a5d";
Log.Information("The {UserId} is doing work", badGuid);
Both snippets above would throw the same exception you're having, because they use string values rather than real Guid values.

Reactor StepVerifier test fails with blockFirst()

Here is code to check duplicate names in database
public Mono<Void> validateDuplicateName(String name) throws RuntimeException {
    Flux<Customer> customerFlux = customerRepository.findByNameIgnoreCase(name);
    customerFlux.take(1).flatMap( customer -> {
        return Mono.error( new RuntimeException ("ABC99") );
    }).blockFirst();
    return Mono.empty();
}
Below is the test script to test the validateDuplicateName method
when(customerRepositoryMocked.findByNameIgnoreCase(Mockito.anyString())).thenReturn(Flux.just(customerMocked));
StepVerifier.create(customerValidator.validateDuplicateName(Mockito.anyString()))
    .expectErrorMatches( exception -> exception instanceof RuntimeException )
    .verify();
But the test fails with the below error
java.lang.RuntimeException: ABC99
..
..
Suppressed: java.lang.Exception: #block terminated with an error
at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:93)
at reactor.core.publisher.Flux.blockFirst(Flux.java:2013)
... 34 more
Can I please get some help?
Your StepVerifier can actually only test the Mono.empty() that is systematically returned by the method. In the meantime, you use blockFirst, which will throw any exception emitted by the publisher, short-circuiting the whole assertion.
Remember StepVerifier lets you assert what you expect will be asynchronously emitted by the sequence (including errors, in the form of onError signals). If the sequence can't even be created because the method creating it (validateDuplicateName) throws, then the StepVerifier is helpless.
But the real question is why on earth would you block inside a method that has a Mono return type AND has a perfectly fine Mono source handy? Your return Mono must be derived from that customerFlux.
You could use then() to switch to a Mono<Void>: this ignores the source's elements, but correctly propagates an error.
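The difference between a method that throws while building its result and one that returns a result carrying the error can be illustrated with the JDK's CompletableFuture. This is an analogy only, not Reactor code: CompletableFuture stands in for Mono, and findByName, validateBlocking, and validateDeferred are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class DeferredErrorSketch {
    // Hypothetical lookup standing in for the repository call in the question.
    static CompletableFuture<Optional<String>> findByName(String name) {
        Optional<String> match = name.equalsIgnoreCase("taken")
                ? Optional.of(name)
                : Optional.empty();
        return CompletableFuture.completedFuture(match);
    }

    // Blocks inside the method: the error escapes as a thrown exception,
    // so the caller never receives a future to inspect.
    static CompletableFuture<Void> validateBlocking(String name) {
        findByName(name).thenAccept(found -> {
            if (found.isPresent()) {
                throw new IllegalStateException("ABC99");
            }
        }).join();
        return CompletableFuture.completedFuture(null);
    }

    // Returns the derived future: the error travels inside the result.
    static CompletableFuture<Void> validateDeferred(String name) {
        return findByName(name).thenAccept(found -> {
            if (found.isPresent()) {
                throw new IllegalStateException("ABC99");
            }
        });
    }

    public static void main(String[] args) {
        try {
            validateBlocking("taken");
        } catch (CompletionException e) {
            System.out.println("thrown by the method: " + e.getCause().getMessage());
        }
        CompletableFuture<Void> result = validateDeferred("taken");
        System.out.println("failed future returned: " + result.isCompletedExceptionally());
    }
}
```

The deferred variant corresponds to the advice above: derive the returned value from the source pipeline (in Reactor, by ending the chain with then() and returning it) so that a test harness can observe the error as a signal.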

JNA: invalid memory access with callback function parameter (struct)

To lone travelers stumbling upon this: see comments for the answer.
...
Writing a Java wrapper for a native library. A device generates data samples and stores them as structs. Two native ways of accessing them: either you request one with a getSample(&sampleStruct) or you set a callback function. Now, here is what does work:
The polling method does fill the JNA Structure
The callback function is called after being set
In fact, I am currently getting the sample right from the callback function
The problem: trying to do anything with the callback argument, which should be a struct, causes an "invalid memory access". Declaring the argument as the Structure does this, so I declared it as a Pointer. Trying a Pointer.getInt(0) causes invalid memory access. So then I declared the argument as an int, and an int is delivered; in fact, it looks very much like the first field of the struct I am trying to get! So does it mean that the struct was at that address but disappeared before Java had time to access it?
This is what I am doing now:
public class SampleCallback implements Callback{
    SampleStruct sample;
    public int callback(Pointer refToSample) throws IOException{
        lib.INSTANCE.GetSample(sample); // works no problem
        adapter.handleSample(sample);
        return 1;
    } ...
But neither of these does:
public int callback(SampleStruct sample) throws IOException{
    adapter.handleSample(sample);
    return 1;
}
...
public int callback(Pointer refToSample) throws IOException{
    SampleStruct sample = new SampleStruct();
    sample.timestamp = refToSample.getInt(0);
    ...
    adapter.handleSample(sample);
    return 1;
}
Also, this does in fact deliver the timestamp,
public int callback(int timestamp) throws IOException{
    System.out.println("It is " + timestamp + "o'clock");
    return 1;
}
but I would really prefer the whole struct.
This is clearly not going to be a popular topic and I do have a working solution, so the description is not exactly full. Will copy anything else that might be helpful if requested. Gratitude prematurely extended.

Why isn't my own Exception derived type recognised by Elmah?

I've just implemented Elmah for logging in my MVC3 app, and of course all is well, except that when I use signalling to log a custom exception, Elmah seems to 'see' the InnerException property of my custom exception, but not the custom exception itself.
When I use the code below to signal the exception, instead of seeing, "CtsDataException: Error" in my error log, as I would expect, I see, "DbEntityValidation: Validation failed for one or more entities.", the inner exception and its message. If I open the log item, I see that my custom exception has correctly been logged, so it looks like the 'exception descriptor' is wrong, not the actual log entry.
What am I doing wrong?
PS, my custom exception is as such:
public class CtsDataException: Exception
{
    public CtsDataException(string message, Exception innerException): base(message, innerException)
    {
        ValidationResults = new List<CtsDbValidationResult>();
        var vex = innerException as DbEntityValidationException;
        if (vex != null)
        {
            ValidationResults = vex.EntityValidationErrors.Select(e => new CtsDbValidationResult(e)).ToList();
        }
    }
    public IEnumerable<CtsDbValidationResult> ValidationResults { get; set; }
}
The signalling code looks like this:
protected void HandleDbEntityValidationException(DbEntityValidationException vex, string message)
{
    var ctsEx = new CtsDataException(message, vex);
    ErrorSignal.FromCurrentContext().Raise(ctsEx);
}
HandleDbEntityValidationException is on my base controller. It is invoked in derived controllers like this:
catch (DbEntityValidationException vx)
{
    var msg = string.Format("Error updating employee '{0}'", entity.RefNum);
    HandleDbEntityValidationException(vx, msg);
}
I've done some of my own testing and come to the conclusion that it's not ELMAH choosing to report only the InnerException. If you take a look at the details for the error and then click Original ASP.NET error page, you'll see that the original yellow screen that occurred will list the Exception Details as the InnerException and not the primary custom exception thrown. The stack trace further shows the original custom exception that was thrown instead. This is the information ELMAH is using in logging the error.
My testing consisted of creating a CustomException class that did nothing more than inherit from Exception. I then simply called:
throw new CustomException("error!", new NullReferenceException());
... and what got reported was the NullReferenceException with CustomException only appearing in the Stack Trace.
My theory of choice in this scenario is that ELMAH is choosing to display the exception that was thrown rather than the exceptions used to wrap it. If ELMAH had a way to include a message with the log, I think this "wrap in a more descriptive but ultimately irrelevant exception that is never thrown" malarkey could end.
I realize that I'm a few years late to actually answer the original question on time, but for others trying to achieve what I think was the OP's intention, I came across a bit of smartness in the LibLog package for Hangfire (slightly adapted):
var _errorType = Type.GetType("Elmah.Error, Elmah");
dynamic error = Activator.CreateInstance(_errorType, originalException);
error.Message = "Your custom message";
error.Type = "Error"; // Or type of original exception or ...
error.Time = DateTime.Now;
Elmah.ErrorLog.GetDefault(null).Log(error);
