How to continue parsing after an error in Antlr 4.4 - parsing

Does anyone know if there is an ErrorStrategy in ANTLR 4.4 that continues parsing after an error is found? I need to show all the errors found in the program, but ANTLR stops parsing after the first error is found. I'm using the DefaultErrorStrategy.
This is my input. I should get errors on lines 2, 3 and 6, but it only reports the error on line 2.
class Program {
bool
bool test
int prueba ;
int prueba ;
int test;
bool prueba
}
The error is:
line 2:1 mismatched input 'bool' expecting {'boolean', 'int', 'void', '}'}
bool
^^^^

Antlr 4's parser error strategy is to drop tokens from the input until it can detect a sane state and then it continues parsing. Looking at your example, while trying to recover maybe it never reaches a sane state before EOF.

It seems ANTLR can't recover if there are two contiguous errors in the program. If the errors are not contiguous it works fine. Thank you very much for your help.
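To make the "drop tokens until a sane state" idea concrete, here is a minimal sketch of panic-mode recovery in a tiny hand-written parser. It is not ANTLR's DefaultErrorStrategy, and ANTLR's real behaviour may differ. The toy grammar (`TYPE NAME ;` declarations) and the names `tokenize` and `parse_declarations` are assumptions made up for illustration.

```python
# Toy panic-mode recovery. NOT ANTLR's DefaultErrorStrategy, only an illustration.
TYPES = {"boolean", "int", "void"}

def tokenize(source):
    """Return (text, line) pairs; ';' is split off as its own token."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for word in line.replace(";", " ; ").split():
            tokens.append((word, line_no))
    return tokens

def parse_declarations(tokens):
    """Parse `TYPE NAME ;` repeatedly, collecting every error instead of stopping."""
    steps = [
        ("type", lambda t: t in TYPES),
        ("name", lambda t: t not in TYPES and t != ";"),
        ("';'", lambda t: t == ";"),
    ]
    errors = []
    pos = 0
    while pos < len(tokens):
        start_line = tokens[pos][1]
        for what, ok in steps:
            if pos < len(tokens) and ok(tokens[pos][0]):
                pos += 1
                continue
            found, line = tokens[pos] if pos < len(tokens) else ("<EOF>", start_line)
            errors.append(f"line {line}: found {found!r}, expecting {what}")
            # Panic mode: drop tokens up to and including the next ';'.
            while pos < len(tokens) and tokens[pos][0] != ";":
                pos += 1
            pos += 1
            break
    return errors

source = "bool\nbool test\nint prueba ;\nint test;\nbool prueba\n"
for message in parse_declarations(tokenize(source)):
    print(message)
```

With this input the sketch reports the errors on lines 1 and 5 only. The bad declaration on line 2 is thrown away while the parser skips ahead to the next `;`, which is one way two contiguous errors can end up reported as a single one.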

Related

how to report error for undefined grammar defined using antlr

I am currently trying to improve and fix bugs in an existing grammar which someone else has created.
We have our own language, for which we have created an editor. We are using the Eclipse IDE.
Some grammar examples like
calc : choice INTEGER INTEGER
choice : add|sub|div|mul
INTEGER : ('0'..'9')+
So in my editor, if I type
calc add 2 aaa
The ANTLR error parser recognizes it as an error, since it is expecting an integer and we typed a string, and it reports an error message such as
extraneous input 'aaa' expecting {'{', INTEGER}
(I have my class extends BaseErrorListener, where I create markers for these errors )
Similarly, I have such grammar defined for my editor.
Now the question is: in all these cases it identifies that something is wrong in the syntax and reports errors, but what about input which is not part of the grammar at all?
If I type any garbage value such as
abc add 2 3
or
just_type_junk_in_editor
it does not throw any error, since 'abc' or 'just_type_junk_in_editor' is not in my grammar.
So is there a way to make the ANTLR parser report an error for keywords which are not part of the grammar?
Without having seen the full grammar, I think your problem is the missing EOF token in your main rule. ANTLR4 consumes as much input as it can, but if nothing more matches in the main rule, it ignores the rest, which explains why you don't see an error. By adding EOF you tell ANTLR4 that all input must be matched:
calc: choice INTEGER INTEGER EOF;
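The effect of the trailing EOF can be sketched with a toy matcher. This is only an analogy, not ANTLR, and it assumes a hypothetical start rule of the form `program: statement*` where each statement is `<op> INTEGER INTEGER`, which may not be how the real grammar looks. The names `parse_program` and `require_eof` are made up for the example.

```python
# Toy model of a start rule with and without a trailing EOF. Not ANTLR itself.
OPS = {"add", "sub", "div", "mul"}

def parse_program(text, require_eof):
    """Match zero or more `<op> INTEGER INTEGER` statements; return error strings."""
    tokens = text.split()
    pos = 0
    errors = []
    while pos < len(tokens) and tokens[pos] in OPS:
        for offset in (1, 2):
            arg = tokens[pos + offset] if pos + offset < len(tokens) else "<EOF>"
            if not arg.isdigit():
                errors.append(f"found {arg!r}, expecting INTEGER")
        pos += 3
    # Without EOF the rule is satisfied here and leftover input is simply ignored.
    if require_eof and pos < len(tokens):
        errors.append(f"found {tokens[pos]!r}, expecting <EOF>")
    return errors

print(parse_program("just_type_junk_in_editor", require_eof=False))
print(parse_program("just_type_junk_in_editor", require_eof=True))
```

Without the EOF check the junk input matches zero statements and is accepted silently. With it, the leftover token is reported.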

Avoiding usage of fail in parsers from Parsers library

From what I was told using fail is not recommended and it will later be removed.
What should be properly used instead of fail in the following Parsers/Trifecta example?
parserNaturalNoLeadZero :: Parser Integer
parserNaturalNoLeadZero = do
  digits <- some digit
  if length digits > 1 && head digits == '0'
    then fail "Leading Zeros"
    else return $ read digits
As the documentation tells you, a new MonadFail class is being introduced to fulfill that role.
But, for stuff like parsers, the sensible choice is usually empty, which has been around for much longer.
Parsec: unexpected, fail, empty
Trifecta: unexpected, fail, empty
The only difference is the error message they produce.
Use unexpected on an unexpected token. unexpected "token" will result in an error message like "unexpected: 'token'".
Annotate parsers with the high-level constructs they represent using (<?>).
This is normally used at the end of a set of alternatives where we want to return an error message in terms of a higher-level construct rather than returning all possible characters.
parseExpr = ... <?> "expression"
parseId = ... <?> "identifier"
parseTy = ... <?> "type"
empty doesn't produce any error message. It can still be useful to backtrack and let another branch succeed or take care of reporting a meaningful error.
Use fail for other kinds of errors; libraries can't assume much about what goes into it, so they'll probably treat its argument as a raw message.

Error message on match fail in Rebol Parse

PEG-based parser generators usually provide limited error reporting on invalid inputs. From what I read, the parse dialect of rebol is inspired by PEG grammars extended with regular expressions.
For example, typing the following in JavaScript:
d8> function () {}
gives the following error, because no identifier was provided in declaring a global function:
(d8):1: SyntaxError: Unexpected token (
function () {}
         ^
The parser is able to pinpoint exactly the position during parsing where an expected token is missing. The character position of the expected token is used to position the arrow in the error message.
Does the parse dialect in Rebol provide built-in facilities to report the line and column of errors on invalid inputs?
Otherwise, are there examples out there of custom-rolled parse rules that provide such error reporting?
I've done very advanced Rebol parsers which manage live and mission-critical TCP servers, and doing proper error reporting was a requirement. So this is important!
Probably one of the most unique aspects of Rebol's PARSE is that you can include direct evaluation within the rules. So you can set variables to track the parse position, or the error messages, etc. (It's very easy because the nature of Rebol is that mixing code and data as the same thing is a core idea.)
So here's the way I did it. Before each match rule is attempted, I save the parse position into "here" (by writing here:) and then also save an error into a variable using code execution (by putting (error: {some error string}) in parentheses so that the parse dialect runs it). If the match rule succeeds, we don't need to use the error or position...and we just go on to the next rule. But if it fails we will have the last state we set to report after the failure.
Thus the pattern in the parse dialect is simply:
; use PARSE dialect handling of "set-word!" instances to save parse
; position into variable named "here"
here:
; escape out of the parse dialect using parentheses, and into the DO
; dialect to run arbitrary code. Here we run code that saves an error
; message string into a variable named "error"
(error: "<some error message relating to rule that follows>")
; back into the PARSE dialect again, express whatever your rule is,
; and if it fails then we will have the above to use in error reporting
what: (ever your) [rule | {is}]
That's basically what you need to do. Here is an example for phone numbers:
digit: charset "0123456789"
phone-number-rule: [
    here:
    (error: "invalid area code")
    ["514" | "800" | "888" | "916" | "877"]
    here:
    (error: "expecting dash")
    "-"
    here:
    (error: "expecting 3 digits")
    3 digit
    here:
    (error: "expecting dash")
    "-"
    here:
    (error: "expecting 4 digits")
    4 digit
    (error: none)
]
Then you can see it in action. Notice that we set error to none if we reach the end of the parse rules. PARSE will return false if there is still more input to process, so if we notice there is no error set but PARSE returns false anyway... we failed because there was too much extra input:
input: "800-22r2-3333"
if not parse input phone-number-rule [
    if none? error [
        error: "too much data for phone number"
    ]
]
either error [
    column: length? copy/part input here newline
    print rejoin ["error at position:" space column]
    print error
    print input
    print rejoin [head insert/dup "" space column "^^"]
    print newline
][
    print {all good}
]
The above will print the following:
error at position: 4
expecting 3 digits
800-22r2-3333
    ^
Obviously, you could do much more potent stuff, since whatever you put in parens will be evaluated just like normal Rebol source code. It's really flexible. I even have parsers which update progress bars while loading huge datasets... :-)
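For readers who don't use Rebol, the same bookkeeping (remember the position and a pending error message before each rule, and report them if the rule fails) can be sketched in Python. This is a loose port of the phone number example, not Rebol's PARSE. The names `parse_phone`, `literal` and `digits` are made up for the sketch.

```python
# Python sketch of the "save position + pending error before each rule" pattern.
AREA_CODES = ("514", "800", "888", "916", "877")

def parse_phone(text):
    """Return (ok, position, message); position is a 0-based offset into text."""
    here = 0

    def literal(options):
        # Number of characters consumed if one option matches at `here`, else None.
        for option in options:
            if text.startswith(option, here):
                return len(option)
        return None

    def digits(count):
        chunk = text[here:here + count]
        if len(chunk) == count and all(c in "0123456789" for c in chunk):
            return count
        return None

    steps = [
        ("invalid area code", lambda: literal(AREA_CODES)),
        ("expecting dash", lambda: literal(("-",))),
        ("expecting 3 digits", lambda: digits(3)),
        ("expecting dash", lambda: literal(("-",))),
        ("expecting 4 digits", lambda: digits(4)),
    ]
    for message, rule in steps:
        consumed = rule()
        if consumed is None:
            return False, here, message   # report the last saved state
        here += consumed                  # rule matched, move the saved position
    if here != len(text):
        return False, here, "too much data for phone number"
    return True, here, None

ok, column, error = parse_phone("800-22r2-3333")
if not ok:
    print("error at position:", column)
    print(error)
    print("800-22r2-3333")
    print(" " * column + "^")
```

The design choice is the same as in the Rebol version: the rules themselves stay simple, and the only extra state is the last position and message saved before the rule that failed.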
Here is a simple example of finding the position during parsing a string which could be used to do what you ask.
Let us say that our code is only valid if it contains a and b characters, anything else would be illegal input.
code-rule: [
    some [
        "a" |
        "b"
    ]
    [ end | mark: (print [ "Failed at position" index? mark ]) ]
]
Let's check that with some valid code
>> parse "aaaabbabb" code-rule
== true
Now we can try again with some invalid input
>> parse "aaaabbXabb" code-rule
Failed at position 7
== false
This is a rather simplified example language, but it should be easy to extend to a more complex example.

Hex constant = malformed number?

I have a Lua script, where I'm trying to use hex numbers (0x..). If I run this script in the console, with the official Windows binaries, it works fine. But if I run it in my application (simple dofile), I get
malformed number near '0x1F'
It doesn't matter what the hex is, I always get that error, as if it wouldn't support them. The library I'm using is Lua 5.1.4, and I've tried 2 different ones (the first one being one I've compiled myself), so that shouldn't be the problem.
Does anyone have a clue what might be wrong here?
Edit:
It's not the script. No matter what I do, a simple "foo = 0xf" already triggers the error, even if there's nothing else in the file.
Update:
tonumber("0xf")
This returns nil, while
tonumber("15")
works fine. There's definitely something wrong with hex in my libs...
If hex literals aren't working for you (though they should), you can always use hex from lua by doing tonumber("fe",16)
Why do functions have to be different in different compilers, ...why?
Alright, the problem was that Lua converts numbers to double by default. For this it uses the function "strtod", which takes 2 arguments: the string, and a pointer to a char pointer. After the call, that char pointer is supposed to point to the first character after the parsed number, which for a hex number would mean the 'x' after the '0'. If this isn't the case, Lua assumes an error and gives us this nice little error message.
I've compiled Lua using DMC, because I need the lib to be in OMF, and I assume the others used DMC as well. But apparently DMC's strtod works differently, since the pointer always points to the start of the string if it's a hex number... or rather any invalid number.
I've now added a little hack, which checks for the x, if conversion to double failed. Not pretty, but it works fine for now.
int luaO_str2d (const char *s, lua_Number *result) {
  char *endptr;
  *result = lua_str2number(s, &endptr);
  /* Hack for DMC: strtod left endptr at the start of the string */
  if (endptr == s) {
    if (*(s+1) == 'x' || *(s+1) == 'X')
      endptr++;  /* point at the 'x', where Lua expects strtod to stop */
    else
      return 0;  /* conversion failed */
  }
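The end-pointer logic described above can be modelled in a few lines of Python. This is only a sketch: `strtod_c89`, `strtod_dmc_like` and `str2d` are hypothetical stand-ins, not the real C functions. The step that re-parses the string as base 16 once the end position lands on an 'x' is my understanding of what Lua 5.1 does next, so treat it as an assumption.

```python
# Model of the strtod end-pointer check; all function names are hypothetical.
import re

_DECIMAL = re.compile(r"\s*[+-]?(\d+\.?\d*([eE][+-]?\d+)?|\.\d+([eE][+-]?\d+)?)")

def strtod_c89(s):
    """Parse the longest decimal prefix. Returns (value, end); end == 0 means no parse."""
    m = _DECIMAL.match(s)
    if not m:
        return 0.0, 0
    return float(m.group(0)), m.end()

def strtod_dmc_like(s):
    """Model of the behaviour reported for DMC: a hex literal is rejected outright."""
    if s[:2].lower() == "0x":
        return 0.0, 0
    return strtod_c89(s)

def str2d(s, strtod, with_hack=False):
    """Model of the string-to-number check. Returns the number, or None on failure."""
    value, end = strtod(s)
    if end == 0:
        if with_hack and s[1:2] in ("x", "X"):
            end = 1                    # pretend strtod stopped at the 'x'
        else:
            return None                # conversion failed
    if s[end:end + 1] in ("x", "X"):   # maybe a hexadecimal constant (assumed step)
        try:
            return float(int(s, 16))
        except ValueError:
            return None
    return value if s[end:].strip() == "" else None

print(str2d("0x1F", strtod_c89))
print(str2d("0x1F", strtod_dmc_like))
print(str2d("0x1F", strtod_dmc_like, with_hack=True))
```

In this model the first call succeeds, the second fails the way the "malformed number" case does, and the third succeeds again once the hack moves the end position onto the 'x'.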
I faced this bug with lua5.2. Lua 5.1 works fine.

Problem with yecc grammar

I'm currently writing a small parser in Erlang, using yecc, and have encountered some problems. The problems occur when I'm parsing rules with 'lbrack' in them. The following rule is an illustration of my problem:
program -> 'char' 'ident' 'lbrack' 'int_constant' 'rbrack' 'semi'
It compiles ok, but when I'm trying to parse the following tokens:
[{char,1},
{ident,1,1,t},
{lbrack,1},
{int_constant,1,10},
{rbrack,1},
{semi,1}]
the parser crashes with
{error,
 {1,parser,["syntax error before: ","lbrack"]}}
I tried with the following yecc file, yt.yrl:
Nonterminals
program.
Terminals
char ident lbrack int_constant rbrack semi.
Rootsymbol
program.
program -> 'char' 'ident' 'lbrack' 'int_constant' 'rbrack' 'semi'.
with your input and it worked fine. It didn't return anything, well '$undefined', but that is as it should be as my example doesn't return anything. Note that none of your terminal symbols need to be quoted as they are just normal atoms with "ordinary" names.

Resources