Anomalous Antlr4 parsing

I am working on parsing a grammar using Antlr4 and running into a problem that I cannot understand. In a nutshell, the Antlr4 parser fails to fully parse a test string with my original grammar, but when I add a superfluous rule, the parse completes. I am providing a simplified version of my grammar to illustrate the issue.
grammar my;
st: 'H' hd | EOF ;
hd: 'D' d | 'C' c | st ;
d: hd ;
c: 'D' c | hd ;
s1: 'D' s1 | c ;
// p: hd ;
SKP: [ \t\r\n]+ -> skip ;
When provided with the input string:
H C D C C D
Antlr4 parser reports the error:
line 2:0 no viable alternative at input 'DCCD'
and the command grun my st -gui shows only a partial parse tree.
However, if the commented-out rule (p: hd) is included in the grammar, Antlr4 parses the string completely and produces the full parse tree.
Note that the nonterminal p is not in the original grammar and cannot be reached from the start symbol st. As such, the added production is superfluous and should not affect the parsing of the grammar.
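As a sanity check that the input really is in the language of the simplified grammar, here is a minimal sketch in Python. It is a plain backtracking recognizer, not ANTLR's ALL(*) prediction, that transcribes the four reachable rules (s1 and p are left out) and computes every position at which each rule can end. Treating EOF as an ordinary final token named "EOF" and the name derives are assumptions of the sketch.

```python
from functools import lru_cache


def derives(tokens):
    """True if rule st can consume the whole token list, which ends in 'EOF'."""
    tokens = tuple(tokens)

    def at(i, expected):
        return i < len(tokens) and tokens[i] == expected

    # Each rule returns the set of positions where a match starting at i can end.
    @lru_cache(maxsize=None)
    def st(i):  # st: 'H' hd | EOF ;
        ends = set()
        if at(i, "H"):
            ends |= hd(i + 1)
        if at(i, "EOF"):
            ends.add(i + 1)
        return frozenset(ends)

    @lru_cache(maxsize=None)
    def hd(i):  # hd: 'D' d | 'C' c | st ;
        ends = set(st(i))
        if at(i, "D"):
            ends |= d(i + 1)
        if at(i, "C"):
            ends |= c(i + 1)
        return frozenset(ends)

    def d(i):  # d: hd ;
        return hd(i)

    @lru_cache(maxsize=None)
    def c(i):  # c: 'D' c | hd ;
        ends = set(hd(i))
        if at(i, "D"):
            ends |= c(i + 1)
        return frozenset(ends)

    return len(tokens) in st(0)


print(derives("H C D C C D".split() + ["EOF"]))
```

This prints True for H C D C C D. The grammar is also ambiguous: inside c, a 'D' can be matched either by c: 'D' c or by c: hd followed by hd: 'D' d, so the parser has to choose between alternatives that can both succeed.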

Related

antlr4.7.2, right recursion and ambiguity

The parser that Antlr 4.7.2 generates from the following ambiguous, right-recursive context-free grammar fails to parse fed:
grammar ambrd7;
s : c s | b s | 'd' | 'e' 'd'; // fails on 'fed'
c : 'f' 'e' ;
b : 'f' ;
WS: [ \t\r\n]+ -> skip ;
The TestRig tool for testing antlr parsers emits the error message no viable alternative at input 'fed', whereas mathematically this grammar generates fed from s through two different derivations:
s --> c s --> 'f' 'e' s --> 'f' 'e' 'd'
s --> b s --> 'f' s --> 'f' 'e' 'd'
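Both derivations can be confirmed mechanically. Here is a minimal Python sketch: an exhaustive, span-based counter in the spirit of CYK (not how ANTLR parses) that counts the distinct parse trees of s. Because every literal in ambrd7 is a single letter, it treats each character as one token. The function name count_parses is hypothetical.

```python
from functools import lru_cache


def count_parses(tokens):
    """Count distinct parse trees of rule s over the whole token sequence."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def c(i, j):  # c : 'f' 'e' ;
        return 1 if tokens[i:j] == ("f", "e") else 0

    @lru_cache(maxsize=None)
    def b(i, j):  # b : 'f' ;
        return 1 if tokens[i:j] == ("f",) else 0

    @lru_cache(maxsize=None)
    def s(i, j):  # s : c s | b s | 'd' | 'e' 'd' ;
        total = 0
        if tokens[i:j] == ("d",):
            total += 1
        if tokens[i:j] == ("e", "d"):
            total += 1
        # Alternatives c s and b s: try every split point k, keeping s non-empty.
        for k in range(i + 1, j):
            total += c(i, k) * s(k, j) + b(i, k) * s(k, j)
        return total

    return s(0, len(tokens))


print(count_parses("fed"))
```

count_parses("fed") returns 2, one tree per derivation above, so fed is in the language and the grammar is ambiguous on it.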
Does anyone know why the parser fails to parse fed?
Thanks,
Eric
That's really odd. In vscode this grammar works as expected.
The only explanation I can think of for the failure you see is that you are using an outdated test rig.

xtext not accepting string constant - expecting RULE_ID

I have tried to cut down my problem to the simplest problem I can in xtext - I would like to use the following grammar:
M: lines += T*;
T:
DT
| BDT
| N
;
BDT:
name = ('a' | 'b' | 'c')
;
DT:
'd' name=ID
('(' (ts += BDT (','ts += BDT)*) ')')?
;
N:
'n' name=ID ':' type=[T]
;
I intend to parse expressions of the form d f(a,b,b), for example, which works fine. I would also like to parse n g:f, which also works, but n g:a, where a is one of the names from the BDT rule, does not. The error given is "Missing RULE_ID at 'a'".
I'd like to allow the grammar to parse n g:a for example, and I'd be very grateful if anyone could point out where I'm going wrong here on this very simple grammar.
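Lexing is done context-free: the lexer does not know which parser rule is active, so a keyword such as a is always lexed as a keyword and can never be an ID. You can address this through parser rules.
You can introduce a datatype rule
MyID: ID | "a" | ... | "c";
and use it wherever you use ID. For the cross-reference this means writing type=[T|MyID], because a plain [T] is parsed as an ID.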
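The following is a minimal Python model of this point. It is a simplification, not the lexer Xtext actually generates, and lex, is_id, is_my_id and parse_n are hypothetical names. The lexer classifies each word without knowing the parser state, so a always remains a keyword. Only a parser-level rule such as MyID can accept it in a position where a name is expected.

```python
import re

# Keyword tokens of the example grammar: 'd', 'n', the BDT names and punctuation.
KEYWORDS = {"d", "n", "a", "b", "c", "(", ")", ",", ":"}


def lex(text):
    """Context-free lexing: the lexer never asks which parser rule is active."""
    tokens = []
    for word in re.findall(r"[A-Za-z_][A-Za-z0-9_]*|[(),:]", text):
        if word in KEYWORDS:
            tokens.append(("KEYWORD", word))  # a keyword always wins over ID
        else:
            tokens.append(("ID", word))
    return tokens


def is_id(tok):
    """The ID terminal: a keyword token is never accepted."""
    return tok[0] == "ID"


def is_my_id(tok):
    """Datatype rule MyID: ID | 'a' | 'b' | 'c'."""
    return tok[0] == "ID" or (tok[0] == "KEYWORD" and tok[1] in {"a", "b", "c"})


def parse_n(text, ref_rule):
    """N: 'n' name=ID ':' type=[T|<ref_rule>]; True if the input matches."""
    toks = lex(text)
    return (len(toks) == 4
            and toks[0] == ("KEYWORD", "n")
            and is_id(toks[1])
            and toks[2] == ("KEYWORD", ":")
            and ref_rule(toks[3]))


print(parse_n("n g:a", is_id), parse_n("n g:a", is_my_id))
```

With is_id, which plays the role of the plain [T] reference, n g:a is rejected. Swapping in is_my_id accepts it, and n g:f keeps working with either rule.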

ANTLR4 can't parse Integer if a parser rules has an own numeric literal

I am struggling a bit with trying to define integers in my grammar.
Let's say I have this small grammar:
grammar Hello;
r : 'hello' INTEGER;
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
If I then type in
hello 5
it parses correctly.
However, if I have an additional parser rule (even if it's unused) which defines a token '5',
then I can't parse the previous example anymore.
So this grammar:
grammar Hello;
r : 'hello' INTEGER;
unusedRule: 'hi' '5';
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
with
hello 5
won't parse anymore. It gives me the following error:
Hello::r:1:6: mismatched input '5' expecting INTEGER
How is that possible and how can I work around this?
When you define a parser rule like
unusedRule: 'hi' '5';
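Antlr creates implicit lexer tokens for the literals. Because they are created automatically, you have no control over where they sit in the precedence order of the lexer rules. In practice they end up ahead of your own rules. The input 5 is matched with the same length by both the implicit '5' token and INTEGER, so it is lexed as '5', and the parser never sees an INTEGER.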
Consequently, the best policy is to never use literals in parser rules; always explicitly define your tokens.
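Here is a minimal Python model of that precedence problem. It assumes ANTLR's lexer tie-breaking: the longest match wins, and on equal length the rule listed first wins. It also assumes the implicit literal tokens are listed before INTEGER. The names T__0 to T__2 imitate ANTLR's naming for implicit tokens, and tokenize is a hypothetical helper.

```python
import re

# Assumed priority order: implicit literal tokens first, then explicit lexer rules.
RULES = [
    ("T__0", re.compile(re.escape("hello"))),
    ("T__1", re.compile(re.escape("hi"))),
    ("T__2", re.compile(re.escape("5"))),
    ("INTEGER", re.compile(r"[0-9]+")),
    ("WS", re.compile(r"[ \t\r\n]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule in RULES wins."""
    pos, out = 0, []
    while pos < len(text):
        best = None  # (match length, rule name)
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best.
            if m and (best is None or len(m.group()) > best[0]):
                best = (len(m.group()), name)
        if best is None:
            raise ValueError(f"token recognition error at position {pos}")
        length, name = best
        if name != "WS":  # WS -> skip
            out.append((name, text[pos:pos + length]))
        pos += length
    return out


print(tokenize("hello 5"))
```

tokenize("hello 5") yields the implicit '5' token rather than INTEGER, which matches the "mismatched input '5' expecting INTEGER" error. By contrast, hello 55 still lexes as INTEGER because the longer match wins.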

Operator precedence with LR(0) parser

A typical BNF defining arithmetic operations:
E :- E + T
| T
T :- T * F
| F
F :- ( E )
| number
Is there any way to re-write this grammar so it could be implemented with an LR(0) parser, while still retaining the precedence and left-associativity of the operators?
I'm thinking it should be possible by introducing some sort of disambiguation non-terminals, but I can't figure out how to do it.
Thanks!
A language can only have an LR(0) grammar if it's prefix-free, meaning that no string in the language is a prefix of another. In this case, the language you're describing isn't prefix-free. For example, the string number + number is a prefix of number + number + number.
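A common workaround would be to "endmark" your language by requiring all generated strings to end in a special "done" character, for example a semicolon. This makes the language prefix-free, which is the precondition for an LR(0) grammar to exist. Be aware, though, that the natural endmarked grammar below is not itself LR(0). After a T has been read, reducing by E → T conflicts with shifting *. The grammar therefore needs one token of lookahead (it is SLR(1)) and must be transformed further before a pure LR(0) parser can be built from it: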
S → E;
E → E + T | T
T → T * F | F
F → number | (E)
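Whether a concrete grammar is LR(0) can be checked mechanically: build the LR(0) item sets and look for any state that contains a completed item alongside another item. Below is a minimal Python sketch that hard-codes the endmarked grammar. The augmented start production S' → S and all helper names are assumptions of the sketch.

```python
# Endmarked expression grammar, augmented with S' -> S.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("E", ";")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("T", "*", "F")),
    ("T", ("F",)),
    ("F", ("number",)),
    ("F", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """LR(0) closure; an item is (production index, dot position)."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for j, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (j, 0) not in result:
                    result.add((j, 0))
                    work.append((j, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` in every item that allows it."""
    moved = {(p, d + 1) for p, d in state
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)


def lr0_states():
    """Build the canonical collection of LR(0) item sets."""
    start = closure({(0, 0)})
    states, work = [start], [start]
    while work:
        state = work.pop()
        symbols = {GRAMMAR[p][1][d] for p, d in state if d < len(GRAMMAR[p][1])}
        for sym in symbols:
            nxt = goto(state, sym)
            if nxt not in states:
                states.append(nxt)
                work.append(nxt)
    return states


def lr0_conflicts():
    """States where a completed item coexists with any other item."""
    return [s for s in lr0_states()
            if len(s) > 1 and any(d == len(GRAMMAR[p][1]) for p, d in s)]


def show(item):
    p, d = item
    lhs, rhs = GRAMMAR[p]
    return f"{lhs} -> {' '.join(rhs[:d] + ('.',) + rhs[d:])}"


if __name__ == "__main__":
    for state in lr0_conflicts():
        print(sorted(show(item) for item in state))
```

On this grammar lr0_conflicts() returns two states, {E → T ., T → T . * F} and {E → E + T ., T → T . * F}. In both, a reduce competes with a shift of *. An SLR(1) or LALR(1) generator resolves these conflicts with one token of lookahead.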

Problems with left-recursion

I have a little grammar containing a few commands which have to be used with Numbers and some of these commands return Numbers as well.
My grammar snippet looks like this:
Command:
name Numbers
| Numbers "test"
;
name:
"abs"
| "acos"
;
Numbers:
NUMBER
| numberReturn
;
numberReturn:
name Numbers
;
terminal NUMBER:
('0'..'9')+("."("0".."9")+)?
;
After I inserted the "Numbers 'test'" alternative into rule Command, the compiler started complaining about non-LL(*) decisions and told me to work around them (left-factoring, syntactic predicates, backtracking). My problem is that I don't see which input makes this decision non-LL(*), nor how to left-factor my grammar (I don't want to turn on backtracking).
EDIT:
A few examples of what this grammar should match:
abs 3;
acos abs 4; //interpreted as "acos (abs 4)"
acos 3 test; //(acos 3) test
Best regards
Raven
The grammar you are trying to achieve is left-recursive, which means the parser cannot tell, without parentheses, whether you mean (acos 10) test or acos (10 test). However, you can give the parser hints about the intended grouping, such as parenthesized expressions.
This would be a valid Xtext grammar, with parenthesized test expressions:
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model
: operations += UnaryOperation*
;
UnaryOperation returns Expression
: 'abs' exp = Primary
| 'acos' exp = Primary
| '(' exp = Primary 'test' ')'
;
Primary returns Expression
: NumberLiteral
| UnaryOperation
;
NumberLiteral
: value = INT
;
The parser will correctly recognize expressions such as:
(acos abs (20 test) test)
acos abs 20
acos 20
(20 test)
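Every decision in that grammar is made on a single token. 'abs', 'acos' or '(' selects the UnaryOperation alternative, and a number versus anything else selects the Primary alternative. The minimal hand-written recursive-descent sketch in Python below mirrors the three rules to show this. It is not what Xtext generates, and the nested-tuple output and the parse/ParseError names are hypothetical.

```python
import re


class ParseError(Exception):
    pass


def parse(text):
    """Parse a Model (UnaryOperation*) into nested tuples using one token of lookahead."""
    tokens = re.findall(r"\d+|[()]|[A-Za-z_]+", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            want = expected if expected is not None else "a token"
            raise ParseError(f"expected {want!r}, got {tok!r}")
        pos += 1
        return tok

    def unary_operation():
        # UnaryOperation: 'abs' Primary | 'acos' Primary | '(' Primary 'test' ')'
        tok = peek()
        if tok in ("abs", "acos"):
            advance()
            return (tok, primary())
        if tok == "(":
            advance()
            inner = primary()
            advance("test")
            advance(")")
            return ("test", inner)
        raise ParseError(f"unexpected {tok!r}")

    def primary():
        # Primary: NumberLiteral | UnaryOperation
        tok = peek()
        if tok is not None and tok.isdigit():
            advance()
            return int(tok)
        return unary_operation()

    operations = []
    while peek() is not None:
        operations.append(unary_operation())
    return operations


print(parse("(acos abs (20 test) test)"))
```

An unparenthesized input such as acos 3 test is rejected by this grammar. The test suffix has to be written as (acos 3 test), which parses as ("test", ("acos", 3)).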
These articles may be helpful for you:
https://dslmeinte.wordpress.com/tag/unary-operator/
http://blog.efftinge.de/2010/08/parsing-expressions-with-xtext.html
