Antlr Grammar Token not being recognized - parsing

Hello, I need help with an ANTLR4 grammar. I have been trying to create a parser for a Datalog grammar; this is just a small snippet of the whole code. Whatever I try to parse is recognized as uppercase or lowercase. The predicate token is not being recognized.
For example, the input abc should be parsed as
abc -> predicate
But it is being parsed as
a-> Lrr
b-> Lrr
c-> Lrr
The rest of my code is parsed in the same way. How do I fix it?
grammar D;
predicate : Lrr | predicate varChars ;
varChars : Lrr | Urr;
Lrr : LOWERCASE;
Urr: UPPERCASE;
fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
Where am I going wrong? Please help.
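One possible direction, as a minimal sketch: Lrr and Urr each match exactly one character, so the lexer emits one token per letter before the parser ever sees the input. Moving the repetition into a lexer rule lets the lexer produce abc as a single token. The rule name PREDICATE, and the assumption that a predicate name starts with a lowercase letter, are illustrative and not taken from the original grammar.

```antlr
grammar D;

// Parser rule: a predicate is now a single token.
predicate : PREDICATE ;

// Hypothetical lexer rule: one lowercase letter followed by any letters.
// The lexer takes the longest match, so "abc" becomes one PREDICATE token.
PREDICATE : LOWERCASE (LOWERCASE | UPPERCASE)* ;

fragment LOWERCASE : [a-z] ;
fragment UPPERCASE : [A-Z] ;
```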

Related

ANTLR4 can't parse INTEGER if a parser rule has its own numeric literal

I am struggling a bit with trying to define integers in my grammar.
Let's say I have this small grammar:
grammar Hello;
r : 'hello' INTEGER;
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
If I then type in
hello 5
it parses correctly.
However, if I have an additional parser rule (even if it's unused) which defines a token '5',
then I can't parse the previous example anymore.
So this grammar:
grammar Hello;
r : 'hello' INTEGER;
unusedRule: 'hi' '5';
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
with
hello 5
won't parse anymore. It gives me the following error:
Hello::r:1:6: mismatched input '5' expecting INTEGER
How is that possible and how can I work around this?
When you define a parser rule like
unusedRule: 'hi' '5';
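ANTLR creates implicit lexer tokens for the literals 'hi' and '5'. Since they are created automatically in the lexer, you have no control over where they sit in the precedence evaluation of lexer rules. Here the input 5 matches both the implicit '5' token and INTEGER with the same length, and the implicit token wins, so the parser receives '5' where it expects INTEGER.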
Consequently, the best policy is to never use literals in parser rules; always explicitly define your tokens.
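As a sketch of that policy applied to the grammar from the question (the token names HELLO, HI and FIVE and the helper rule integer are made up for illustration): once '5' is an explicit token, its position relative to INTEGER is visible in the grammar. Declared first, FIVE still wins the tie for the input 5, so the parser has to accept either token wherever an integer is allowed.

```antlr
grammar Hello;

r          : HELLO integer ;
unusedRule : HI FIVE ;

// Hypothetical helper rule: accepts the special-cased FIVE as well as INTEGER.
integer    : INTEGER | FIVE ;

HELLO   : 'hello' ;
HI      : 'hi' ;
FIVE    : '5' ;        // declared before INTEGER, so it wins for the input "5"
INTEGER : [0-9]+ ;     // still wins for longer input such as "55"
WS      : [ \t\r\n]+ -> skip ;
```

If nothing in the grammar really needs a dedicated '5' token, dropping FIVE and using INTEGER everywhere is the simpler option.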

ANTLR4: Two channels, one for CSV-formatted data, one for key/value-formatted data -- does not work

The lexer grammar below contains two sets of rules: (1) rules for tokenizing CSV-formatted input, and (2) rules for tokenizing key/value-formatted input. For (1) I put the tokens on channel(0). For (2) I put the tokens on channel(1). Do you see any problems with my lexer grammar?
Also below is a parser grammar, which likewise contains two sets of rules: (1) rules for structuring CSV tokens into a parse tree, and (2) rules for structuring key/value tokens into a parse tree. Do you see any problems with my parser grammar?
When I apply ANTLR to the grammar files, compile, and then run the test rig (with the -gui flag) using this CSV input:
FirstName, LastName, Street, City, State, ZipCode
Mark,, 4460 Stuart Street, Marion Center, PA, 15759
the parse tree is completely wrong - the tree contains no data. I have no idea why the parse tree is wrong. Any suggestions? I have tested each part separately (removed the key/value rules from the lexer and parser grammars and ran it with CSV input, removed the CSV rules from the lexer and parser grammars and ran it with key/value input) and it works fine.
Lexer Grammar
lexer grammar MyLexer;
COMMA : ',' -> channel(0) ;
NL : ('\r')?'\n' -> channel(0) ;
WS : [ \t\r\n]+ -> skip, channel(0) ;
STRING : (~[,\r\n])+ -> channel(0) ;
KEY : ('FirstName' | 'LastName') -> channel(1) ;
EQ : '=' -> channel(1) ;
NL2 : ('\r')?'\n' -> channel(1) ;
WS2 : [ \t\r\n]+ -> skip, channel(1) ;
VALUE : (~[=\r\n])+ -> channel(1) ;
Parser Grammar
parser grammar MyParser;
options { tokenVocab=MyLexer; }
csv : (header rows)+ EOF ;
header : field (COMMA field)* NL ;
rows : (row)* ;
row : field (COMMA field)* NL ;
field : STRING | ;
keyValue : pairs EOF ;
pairs : (pair)+ ;
pair : key EQ value NL2;
key : KEY ;
value : VALUE ;
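The longest token match wins, and if two matches are the same length, the rule defined first wins. Channels play no part in this: every lexer rule competes on every input. That means:
STRING is defined before KEY and EQ and matches everything they match, so you will never get tokens of those types. VALUE also matches commas, so on a CSV line it produces a longer match than STRING, and the whole line becomes a single VALUE token on channel 1, which the parser never sees.
The ANTLR parser needs random access to the token stream, which rules out context-sensitive lexing.
I suggest putting the two sets of lexer rules into separate lexer grammars. It may get tricky to use them with a common parser grammar; if so, split the parser grammar as well.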
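A rough sketch of that split, assuming two hypothetical files named CsvLexer.g4 and KeyValueLexer.g4. The rules are taken from the question; the channel commands are dropped because tokens go to channel 0 by default, and WS is narrowed to spaces and tabs (my change) so it cannot swallow line breaks that NL should match.

```antlr
// CsvLexer.g4 (hypothetical file name)
lexer grammar CsvLexer;

COMMA  : ',' ;
NL     : '\r'? '\n' ;
WS     : [ \t]+ -> skip ;
STRING : ~[,\r\n]+ ;

// KeyValueLexer.g4 (hypothetical file name, a separate file)
lexer grammar KeyValueLexer;

KEY   : 'FirstName' | 'LastName' ;   // defined before VALUE, so it wins ties
EQ    : '=' ;
NL    : '\r'? '\n' ;
WS    : [ \t]+ -> skip ;
VALUE : ~[=\r\n]+ ;
```

Each parser grammar would then point at its own lexer through tokenVocab. Note that within the key/value lexer a value that happens to be exactly FirstName or LastName would still be tokenized as KEY, so this sketch only removes the interference between the two formats.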

ANTLR4: Unrecognized constant value in a lexer command

I am learning how to use the "more" lexer command. I typed in the lexer grammar shown in the ANTLR book, page 281:
lexer grammar Lexer_To_Test_More_Command ;
LQUOTE : '"' -> more, mode(STR) ;
WS : [ \t\r\n]+ -> skip ;
mode STR ;
STRING : '"' -> mode(DEFAULT_MODE) ;
TEXT : . -> more ;
Then I created this simple parser to use the lexer:
grammar Parser_To_Test_More_Command ;
import Lexer_To_Test_More_Command ;
test: STRING EOF ;
Then I opened a DOS window and entered this command:
antlr4 Parser_To_Test_More_Command.g4
That generated this warning message:
warning(155): Parser_To_Test_More_Command.g4:3:29: rule LQUOTE
contains a lexer command with an unrecognized constant value; lexer
interpreters may produce incorrect output
Am I doing something wrong in the lexer or parser?
Combined grammars (which are grammars that start with just grammar, instead of parser grammar or lexer grammar) cannot use lexer modes. Instead of using the import feature¹, you should use the tokenVocab feature like this:
Lexer_To_Test_More_Command.g4:
lexer grammar Lexer_To_Test_More_Command;
// lexer rules and modes here
Parser_To_Test_More_Command.g4:
parser grammar Parser_To_Test_More_Command;
options {
tokenVocab = Lexer_To_Test_More_Command;
}
// parser rules here
¹ I actually recommend avoiding the import statement altogether in ANTLR. The method I described above is almost always preferable.
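Filled in with the rules from the question, the two files might look like the sketch below. The rules themselves are copied unchanged; only the grammar headers and the tokenVocab option differ from the original.

```antlr
// Lexer_To_Test_More_Command.g4
lexer grammar Lexer_To_Test_More_Command;

LQUOTE : '"' -> more, mode(STR) ;
WS     : [ \t\r\n]+ -> skip ;

mode STR;
STRING : '"' -> mode(DEFAULT_MODE) ;
TEXT   : . -> more ;

// Parser_To_Test_More_Command.g4 (a separate file)
parser grammar Parser_To_Test_More_Command;
options { tokenVocab = Lexer_To_Test_More_Command; }

test : STRING EOF ;
```

With this layout the lexer grammar generally has to be processed first, so that its .tokens file exists when the parser grammar is generated.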

Antlr grammar, implicit token definition in parser rule

A weird thing is going on. I defined the grammar and this is an excerpt.
name
: Letter
| Digit name
| Letter name
;
numeral
: Digit
| Digit numeral
;
fragment
Digit
: [0-9]
;
fragment
Letter
: [a-zA-Z]
;
So why does it show warnings for just two lines (Letter and Digit name) where I referenced a fragment, while the others below are completely fine?
Lexer rules you mark as fragments can only be used by other lexer rules, not by parser rules. Fragment rules never become a token of their own.
Be sure you understand the difference: What does "fragment" mean in ANTLR?
EDIT
Also, I now see that you're doing too much in the parser. The rules name and numeral should really be lexer rules:
Name
: ( Digit | Letter)* Letter
;
Numeral
: Digit+
;
in which case you don't need to account for a Space rule in any of your parser rules (this is about your last question which was just removed).
Just in case you are using an older version of ANTLR (before v4):
[0-9]
and
[a-zA-Z]
are not valid character sets in older ANTLR versions.
Replace them with
'0'..'9'
and
('a'..'z' | 'A'..'Z')
and your issues should go away.

Incorrect Parsing of simple arithmetic grammar in ANTLR

I recently started studying ANTLR. Below is the grammar for the arithmetic expression.
The problem is that when I call the expression rule from the term rule, the input is parsed incorrectly, even for (9+8). It somehow ignores the right parenthesis.
When I call the add rule from term instead of the expression rule, it works fine.
As in:
term:
INTEGER
| '(' add ')'
;
Can anyone tell me why this happens? The two versions are more or less the same.
Grammar that gives incorrect results:
term
:
INTEGER
| '(' expression ')'
;
mult
:
term ('*' term)*
;
add
:
mult ('+' mult)*
;
expression
:
add
;
When I parse "(8+9)" with a parser generated from your grammar, starting with the expression rule, I get the following parse tree:
In other words: it works just fine.
Perhaps you're using ANTLRWorks' (or ANTLR IDE's) interpreter to test your grammar? In that case: don't use the interpreter, it's buggy. Use ANTLRWorks' debugger instead (the image is exported from ANTLRWorks' debugger).
