Handling Antlr Syntax Errors or how to give a better message on unexpected token - parsing

We have the following sub-part of an Antlr grammar:
signed_int
: SIGN? INT
;
INT : '0'..'9'+
;
When someone enters a numeric value everything is fine, but if they
mistakenly type something like 1O (one and capital o) we get a cryptic
error message like:
error 1 : Missing token at offset 14
near [Index: 0 (Start: 0-Stop: 0) ='<missing COLON>' type<24> Line: 26 LinePos:14]
: syntax error...
What is a good way to handle this type of error? I thought of
defining a catch-all SYMBOL token type, but this led to too many
parser-building errors. I will continue looking into Antlr error handling, but I
thought I would post this here to look for some insights.

You should override the reportError methods in the lexer and the parser.
You can do it by adding this code to your lexer file:
@Override
public void reportError(RecognitionException e) {
throw new RuntimeException(e);
}
Then create a method matches in the parser that checks whether the input string matches the specified grammar, and override reportError there as well:
public static boolean matches(String input) {
try {
regExLexer lexer = new regExLexer(new ANTLRStringStream(input));
regExParser parser = new regExParser(new CommonTokenStream(lexer));
parser.goal();
return true;
} catch (RuntimeException e) {
return false;
}
catch (Exception e) {
return false;
}
catch (OutOfMemoryError e) {
return false;
}
}
@Override
public void reportError(RecognitionException e) {
throw new RuntimeException(e);
}
Then in your code, call Parser.matches(input) to check whether the given input matches the grammar. The method returns true if it matches and false otherwise, so when it returns false you can show users any customized error message.
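A boolean result discards the reason for the failure. As a minimal sketch of a variant that keeps it, the helper below returns an optional message instead. The names FailFastSketch and check are hypothetical, and the parse action passed in is only a stand-in for the generated parser call (parser.goal() in the answer above); nothing here is ANTLR API.

```java
import java.util.Optional;
import java.util.function.Consumer;

// Plain-Java illustration only; FailFastSketch and check are hypothetical names.
final class FailFastSketch {
    /**
     * Runs a parse action that throws a RuntimeException on bad input.
     * Returns an empty Optional on success, or a user-facing message on failure.
     */
    static Optional<String> check(String input, Consumer<String> parse) {
        try {
            parse.accept(input);
            return Optional.empty();
        } catch (RuntimeException e) {
            // Keep the original failure text so the caller can show or log it.
            return Optional.of("'" + input + "' is not a valid value: " + e.getMessage());
        }
    }
}
```

For example, FailFastSketch.check("1O", s -> Integer.parseInt(s)) yields a message that names the offending input, while the same call with "12" yields an empty Optional. Integer.parseInt is used here only because it throws on malformed input.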

In ANTLR 4, you could try to use an ANTLRErrorStrategy, by overriding some of the reporting methods in DefaultErrorStrategy.

Related

Dart The function 'errorMessage' isn't defined

I am new to Dart and I am learning it from YouTube courses made in 2018. The programs created in those videos do not work for me, and I get the error below in all my programs. Can anyone explain why my programs show errors while the same programs run properly in the videos? Is it due to an update in Dart, or is there another reason? Please help me fix this issue. Thanks!
The function 'errorMessage' isn't defined.
Try importing the library that defines 'errorMessage', correcting the name to the name of an existing function, or defining a function named 'errorMessage'.
class CustomException implements Exception {
String errorMessage() {
return ("Invalid Amount");
}
}
void AmountException(int amount) {
if (amount <= 0) {
throw new CustomException();
}
}
void main() {
try {
AmountException(0);
} catch (e) {
print(errorMessage());
}
}
You are not calling the errorMessage() method on the exception. Another problem is that your catch clause handles all types of exceptions. Since Exception does not have an errorMessage() method, you cannot call it on the caught object.
You should therefore specify the type of exception you want to catch, which allows you to call the errorMessage() method on the caught exception:
class CustomException implements Exception {
String errorMessage() {
return ("Invalid Amount");
}
}
void AmountException(int amount) {
if (amount <= 0) {
throw new CustomException();
}
}
void main() {
try {
AmountException(0);
} on CustomException catch (e) {
print(e.errorMessage());
}
}

C# ANTLR4 DefaultErrorStrategy or custom error listener does not catch unrecognized characters

It's quite strange, but DefaultErrorStrategy does nothing to catch unrecognized characters in the input stream. I tried a custom error strategy, a custom error listener and BailErrorStrategy, with no luck.
My grammar
grammar Polynomial;
parse : canonical EOF
;
canonical : polynomial+ #canonicalPolynom
| polynomial+ EQUAL polynomial+ #equality
;
polynomial : SIGN? '(' (polynomial)* ')' #parens
| monomial #monom
;
monomial : SIGN? coefficient? VAR ('^' INT)? #addend
| SIGN? coefficient #number
;
coefficient : INT | DEC;
INT : ('0'..'9')+;
DEC : INT '.' INT;
VAR : [a-z]+;
SIGN : '+' | '-';
EQUAL : '=';
WHITESPACE : (' '|'\t')+ -> skip;
and I'm giving it the input 23*44=12 or #1234.
I expect my parser to throw a mismatched token error or some other kind of exception for a character such as * or # that is not defined in my grammar.
Instead, my parser just skips * or # and traverses the tree as if they did not exist.
Here is my handler function, where I call the lexer, the parser and so on.
private static (IParseTree tree, string parseErrorMessage) TryParseExpression(string expression)
{
ICharStream stream = CharStreams.fromstring(expression);
ITokenSource lexer = new PolynomialLexer(stream);
ITokenStream tokens = new CommonTokenStream(lexer);
PolynomialParser parser = new PolynomialParser(tokens);
//parser.ErrorHandler = new PolynomialErrorStrategy(); -> I tried custom error strategy
//parser.RemoveErrorListeners();
//parser.AddErrorListener(new PolynomialErrorListener()); -> I tried custom error listener
parser.BuildParseTree = true;
try
{
var tree = parser.canonical();
return (tree, string.Empty);
}
catch (RecognitionException re)
{
return (null, re.Message);
}
catch (ParseCanceledException pce)
{
return (null, pce.Message);
}
}
I tried to add a custom error listener.
public class PolynomialErrorListener : BaseErrorListener
{
private const string Eof = "EOF";
public override void SyntaxError(TextWriter output, IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg,
RecognitionException e)
{
if (msg.Contains(Eof))
{
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Missing an expression after '=' sign");
}
if (e is NoViableAltException || e is InputMismatchException)
{
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. Probably, not closed operator");
}
throw new ParseCanceledException($"{GetSyntaxErrorHeader(charPositionInLine)}. {msg}");
}
private static string GetSyntaxErrorHeader(int errorPosition)
{
return $"Expression is invalid. Input is not valid at {--errorPosition} position";
}
}
After that, I tried to implement a custom error strategy.
public class PolynomialErrorStrategy : DefaultErrorStrategy
{
public override void ReportError(Parser recognizer, RecognitionException e)
{
throw e;
}
public override void Recover(Parser recognizer, RecognitionException e)
{
for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
context.exception = e;
}
throw new ParseCanceledException(e);
}
public override IToken RecoverInline(Parser recognizer)
{
InputMismatchException e = new InputMismatchException(recognizer);
for (ParserRuleContext context = recognizer.Context; context != null; context = (ParserRuleContext) context.Parent) {
context.exception = e;
}
throw new ParseCanceledException(e);
}
protected override void ReportInputMismatch(Parser recognizer, InputMismatchException e)
{
string msg = "mismatched input " + GetTokenErrorDisplay(e.OffendingToken);
// msg += " expecting one of " + e.GetExpectedTokens().ToString(recognizer.());
RecognitionException ex = new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
throw ex;
}
protected override void ReportMissingToken(Parser recognizer)
{
BeginErrorCondition(recognizer);
IToken token = recognizer.CurrentToken;
IntervalSet expecting = GetExpectedTokens(recognizer);
string msg = "missing " + expecting.ToString() + " at " + GetTokenErrorDisplay(token);
throw new RecognitionException(msg, recognizer, recognizer.InputStream, recognizer.Context);
}
}
Is there a flag that I forgot to set on the parser, or is my grammar incorrect?
The funny thing is that I'm using an ANTLR plugin in my IDE, and when I test my grammar there, the plugin correctly responds with line 1:2 token recognition error at: '*'
Full source code: https://github.com/EvgeniyZ/PolynomialCanonicForm
I'm using ANTLR 4.8-complete.jar
Edit
I tried adding this rule to the grammar:
parse : canonical EOF
;
Still no luck here
What happens if you do this:
parse
: canonical EOF
;
and also invoke this rule:
var tree = parser.parse();
By adding the EOF token (end of input), you are forcing the parser to consume all tokens, which should result in an error when the parser cannot handle them properly.
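The effect of the trailing EOF can be sketched with a tiny hand-written recognizer. This is plain Java, not ANTLR-generated code, and the class EofSketch and its toy rule sum : INT ('+' INT)* are assumptions made only for illustration. Without the end-of-input check, the rule succeeds on a valid prefix and silently leaves the rest unread.

```java
import java.util.List;

// Plain-Java illustration only; EofSketch and its rules are hypothetical.
final class EofSketch {
    private final List<String> tokens;
    private int pos = 0;

    EofSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private static boolean isInt(String t) {
        return t.matches("[0-9]+");
    }

    /** sum : INT ('+' INT)* ; stops quietly at the first token that does not fit. */
    boolean sum() {
        if (pos >= tokens.size() || !isInt(tokens.get(pos))) {
            return false;
        }
        pos++;
        while (pos + 1 < tokens.size()
                && tokens.get(pos).equals("+")
                && isInt(tokens.get(pos + 1))) {
            pos += 2;
        }
        return true;
    }

    /** parse : sum EOF ; additionally requires that every token was consumed. */
    boolean parse() {
        return sum() && pos == tokens.size();
    }
}
```

For the tokens 1 + 2 = 3, sum() accepts the prefix 1 + 2 and returns true, while parse() returns false because = 3 is left over.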
Funny thing that I'm using ANTLR plugin in my IDE and when I'm testing my grammar in here this plugin correctly responds with line 1:2 token recognition error at: '*'
That is what the lexer emits on the stderr stream. The lexer just reports this warning and goes on its merry way, so these characters are ignored and therefore never end up in the parser. If you add the following line at the end of your lexer:
// Fallback rule: matches any single character if not matched by another lexer rule
UNKNOWN : . ;
then the * and # chars will be sent to the parser as UNKNOWN tokens and should then cause recognition errors.
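To make the difference concrete, here is a minimal hand-written tokenizer in plain Java. It is not ANTLR-generated code; the class FallbackLexerSketch and the keepUnknown switch are assumptions made for illustration. With the switch off, unmatched characters vanish before the parser sees them. With it on, they arrive as UNKNOWN tokens that a parser would have to reject.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; FallbackLexerSketch is a hypothetical name.
final class FallbackLexerSketch {
    static final class Token {
        final String type;
        final String text;

        Token(String type, String text) {
            this.type = type;
            this.text = text;
        }

        @Override
        public String toString() {
            return type + "(" + text + ")";
        }
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    /**
     * Splits the input into INT, SIGN and EQUAL tokens and skips whitespace.
     * If keepUnknown is false, any other character is dropped; if true, it
     * becomes a one-character UNKNOWN token (the fallback rule).
     */
    static List<Token> tokenize(String input, boolean keepUnknown) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++;
            } else if (isDigit(c)) {
                int start = i;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token("INT", input.substring(start, i)));
            } else if (c == '+' || c == '-') {
                tokens.add(new Token("SIGN", String.valueOf(c)));
                i++;
            } else if (c == '=') {
                tokens.add(new Token("EQUAL", "="));
                i++;
            } else {
                if (keepUnknown) {
                    tokens.add(new Token("UNKNOWN", String.valueOf(c)));
                }
                i++;
            }
        }
        return tokens;
    }
}
```

For the input 23*44=12, tokenize(input, false) gives [INT(23), INT(44), EQUAL(=), INT(12)], which looks like valid input to a parser, whereas tokenize(input, true) gives [INT(23), UNKNOWN(*), INT(44), EQUAL(=), INT(12)].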

Stopping Fitnesse (Slim) on any exception

We've found the "Fail Fast" principle crucial for improving maintainability of our large Fitnesse-based battery of tests. Slim's StopTestException is our saviour.
However, it's very cumbersome and counterproductive to catch and convert any possible exception to those custom StopExceptions. And this approach doesn't work outside of fixtures. Is there a way to tell fitnesse (preferably using Slim test system) to stop test on any error / exception?
Update: corresponding feature request https://github.com/unclebob/fitnesse/issues/935
Most of the exceptions coming from fixtures can be conveniently converted to StopTestException by implementing the FixtureInteraction interface, e.g.:
public class StopOnException extends DefaultInteraction {
@Override
public Object newInstance(Constructor<?> constructor, Object... initargs) throws InvocationTargetException, InstantiationException, IllegalAccessException {
try {
return super.newInstance(constructor, initargs);
} catch (Throwable e) {
throw new StopTestException("Instantiation failed", e);
}
}
@Override
public Object methodInvoke(Method method, Object instance, Object... convertedArgs) throws InvocationTargetException, IllegalAccessException {
try {
return super.methodInvoke(method, instance, convertedArgs);
} catch (Throwable e) {
throw new StopTestException(e.getMessage(), e);
}
}
public static class StopTestException extends RuntimeException {
public StopTestException(String s, Throwable e) {
super(s, e);
}
}
}

How to walk the parse tree to check for syntax errors in ANTLR

I have written a fairly simple language in ANTLR. Before actually interpreting the code written by a user, I wish to parse the code and check for syntax errors. If any are found, I wish to output the cause of the error and exit. How can I check the code for syntax errors and output the corresponding error? Please note that for my purposes, error statements similar to those generated by the ANTLR tool are more than sufficient. For example:
line 3:0 missing ';'
There is an ErrorListener that you can use to get more information.
For example:
...
FormulaParser parser = new FormulaParser(tokens);
parser.IsCompletion = options.IsForCompletion;
ErrorListener errListener = new ErrorListener();
parser.AddErrorListener(errListener);
IParseTree tree = parser.formula();
The only thing you need to do is attach the ErrorListener to the parser.
Here is the code of the ErrorListener.
/// <summary>
/// Error listener recording all errors that Antlr parser raises during parsing.
/// </summary>
internal class ErrorListener : BaseErrorListener
{
private const string Eof = "the end of formula";
public ErrorListener()
{
ErrorMessages = new List<ErrorInfo>();
}
public bool ErrorOccured { get; private set; }
public List<ErrorInfo> ErrorMessages { get; private set; }
public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line, int charPositionInLine, string msg, RecognitionException e)
{
ErrorOccured = true;
if (e == null || e.GetType() != typeof(NoViableAltException))
{
ErrorMessages.Add(new ErrorInfo()
{
Message = ConvertMessage(msg),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
return;
}
ErrorMessages.Add(new ErrorInfo()
{
Message = string.Format("{0}{1}", ConvertToken(offendingSymbol.Text), " unexpected"),
StartIndex = offendingSymbol.StartIndex,
Column = offendingSymbol.Column + 1,
Line = offendingSymbol.Line,
Length = offendingSymbol.Text.Length
});
}
public override void ReportAmbiguity(Antlr4.Runtime.Parser recognizer, DFA dfa, int startIndex, int stopIndex, bool exact, BitSet ambigAlts, ATNConfigSet configs)
{
ErrorOccured = true;
ErrorMessages.Add(new ErrorInfo()
{
Message = "Ambiguity", Column = startIndex, StartIndex = startIndex
});
base.ReportAmbiguity(recognizer, dfa, startIndex, stopIndex, exact, ambigAlts, configs);
}
private string ConvertToken(string token)
{
return string.Equals(token, "<EOF>", StringComparison.InvariantCultureIgnoreCase)
? Eof
: token;
}
private string ConvertMessage(string message)
{
StringBuilder builder = new StringBuilder(message);
builder.Replace("<EOF>", Eof);
return builder.ToString();
}
}
It is a dummy listener, but you can see what it does, and that you can tell whether an error is a syntax error or an ambiguity error. After parsing, you can ask the errorListener directly whether any error occurred.

Modify tokenizer in ANTLR

In ANTLR, how can I output the tokens one per line, as if Enter were pressed after each one? I tried it with a class in a file named hello.java, like this:
public class Hello{
public static void main(String args[]){
System.out.println("Hello World ...");
}
}
Now, it is time to parse the tokens
final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
try {
lexer.setCharStream(new ANTLRReaderStream(in)); // in is a file
} catch (IOException e) {
e.printStackTrace();
}
final CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
tokens.LT(10); // force load
Antlr3JavaParser parser = new Antlr3JavaParser(tokens);
System.out.println(tokens);
It gives me output like this:
publicclassHello{publicstaticvoidmain(Stringarggs[]){System.out.println("Hello World ...");}}
How can I make the output look like this?
public
class
Hello
{
public
static ... until the end...
I've tried using StringBuilder, but it's not working.
Thanks for the help.
Instead of just printing the token stream, you have to iterate over it to get the desired result.
Modify your code like this:
final Antlr3JavaLexer lexer = new Antlr3JavaLexer();
try {
lexer.setCharStream(new ANTLRReaderStream(in)); // in is a file
} catch (IOException e) {
e.printStackTrace();
}
final CommonTokenStream tokens = new CommonTokenStream();
tokens.setTokenSource(lexer);
//tokens.LT(10); // force load - not needed
Antlr3JavaParser parser = new Antlr3JavaParser(tokens);
// Iterate over tokenstream
for (Object tk: tokens.getTokens())
{
CommonToken commontk = (CommonToken) tk;
if (commontk.getText() != null && !commontk.getText().trim().isEmpty())
{
System.out.println(commontk.getText());
}
}
After this, you will get this result:
public
class
Hello
{
public
static ... etc...
Hope this will solve your issue.
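If the result is needed as a single string rather than printed line by line (the question mentions trying StringBuilder), the same filtering can be sketched with a joining collector. This assumes the token texts have already been copied into a list of strings; the class TokenLines and the method onePerLine are hypothetical names, and nothing here is ANTLR API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration only; TokenLines and onePerLine are hypothetical names.
final class TokenLines {
    /** Joins the non-blank token texts with the platform line separator. */
    static String onePerLine(List<String> tokenTexts) {
        return tokenTexts.stream()
                // Skip null texts and whitespace-only tokens, as in the loop above.
                .filter(t -> t != null && !t.trim().isEmpty())
                .collect(Collectors.joining(System.lineSeparator()));
    }
}
```

Calling System.out.println(TokenLines.onePerLine(texts)) then prints each remaining token on its own line.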

Resources