What's wrong with the following Lexer import?

What's wrong with the following Lexer import? - parsing

I keep getting an error that I can't seem to understand in my separated parser + grammar files as follows:
The errors occurs here:
# TestParser.g4
parser grammar TestParser;
options {tokenVocab=TestLexer;}
root: NUM EOF;
~~~~ <-- error
# TestLexer.g4
lexer grammar TestLexer;
NUM : [0-9]+;
What issue am I failing to see?

Related

antlr4 not parsing according to grammar

I'm trying to parse 'for loop' according to this (partial) grammar:
grammar GaleugParserNew;
/*
* PARSER RULES
*/
relational
: '>'
| '<'
;
varChange
: '++'
| '--'
;
values
: ID
| DIGIT
;
for_stat
: FOR '(' ID '=' values ';' values relational values ';' ID varChange ')' '{' '}'
;
/*
* LEXER RULES
*/
FOR : 'for' ;
ID : [a-zA-Z_] [a-zA-Z_0-9]* ;
DIGIT : [0-9]+ ;
SPACE : [ \t\r\n] -> skip ;
When I try to generate the gui of how it's parsed, it's not following the grammar I provided above. This is what it produces:
I've encountered this problem before, what I did then was simply exit cmd, open it again and compile everything and somehow that worked then. It's not working now though.
I'm not really very knowledgeable about antlr4 so I'm not sure where to look to solve this problem.

Must be a problem of the IDE you are using. The grammar is fine and produces this parse tree in Visual Studio Code:
I guess the IDE is using the wrong parser or lexer (maybe from a different work file?). Print the lexer tokens to see if they are what you expect. Hint: avoid defining implicit lexer tokens (like '(', '}' etc.), which will allow to give the tokens good names.

ANTLR4 can't parse Integer if a parser rules has an own numeric literal

I am struggling a bit with trying to define integers in my grammar.
Let's say I have this small grammar:
grammar Hello;
r : 'hello' INTEGER;
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
If I then type in
hello 5
it parses correctly.
However, if I have an additional parser rule (even if it's unused) which defines a token '5',
then I can't parse the previous example anymore.
So this grammar:
grammar Hello;
r : 'hello' INTEGER;
unusedRule: 'hi' '5';
INTEGER : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
with
hello 5
won't parse anymore. It gives me the following error:
Hello::r:1:6: mismatched input '5' expecting INTEGER
How is that possible and how can I work around this?

When you define a parser rule like
unusedRule: 'hi' '5';
Antlr creates implicit lexer tokens for the subterms. Since they are automatically created in the lexer, you have no control over where the sit in the precedence evaluation of Lexer rules.
Consequently, the best policy is to never use literals in parser rules; always explicitly define your tokens.

Antlr Error when adding a Mode for Lexers

I'm trying Lexing Modes for the first time.
I have a lexer grammar with a mode that I'm importing into my "main" grammar.
I get this error when generating the java classes for the Grammar's lexer
'rule DESCRIPTION_FIELD contains a lexer command with an unrecognized constant value; lexer interpreters may produce incorrect output'
I followed this article
My Lexer grammar is the following :
lexer grammar TestLexerGrammar;
DESCRIPTION_FIELD
:
'DESCRIPTION:'-> pushMode(FREETEXTMODE)
;
mode FREETEXTMODE;
FREE_TEXT_FIELD_FORMAT
:
STR+
;
fragment
STR
:
(
LETTER
| DIGIT
)
;
my main grammar:
grammar Grammar;
import TestLexerGrammar;
descriptionElement
:
DESCRIPTION_FIELD freeTextFields
;
freeTextFields
:
FREE_TEXT_FIELD_FORMAT+
;
so in the generated GrammarLexer.java I get an error : " FREETEXTMODE cannot be resolved to a variable "
Is this a wrong approach? and is there a possible way to trigger changing mode through a parsing rule?

You can not use mode in grammars with import statement. There are related issues on github: Problems with lexical modes inside an imported grammar and No error/incorrect code generation when importing lexer grammar with modes into a combined grammar.
So, you should repair your main grammar and remove import statement by the following way:
parser grammar Grammar;
options { tokenVocab=TestLexerGrammar; }

Why does this grammar fail to parse this input?

I'm defining a grammar for a small language and Antlr4. The idea is in that language, there's a keyword "function" which can be used to either define a function or as a type specifier when defining parameters. I would like to be able to do something like this:
function aFunctionHere(int a, function callback) ....
However, it seems Antlr doesn't like that I use "function" in two different places. As far as I can tell, the grammar isn't even ambiguous.
In the following grammar, if I remove LINE 1, the generated parser parses the sample input without a problem. Also, if I change the token string in either LINE 2 or LINE 3, so that they are not equal, the parser works.
The error I get with the grammar as-is:
line 1:0 mismatched input 'function' expecting <INVALID>
What does "expecting <INVALID>" mean?
The (stripped down) grammar:
grammar test;
begin : function ;
function: FUNCTION IDENTIFIER '(' parameterlist? ')' ;
parameterlist: parameter (',' parameter)+ ;
parameter: BaseParamType IDENTIFIER ;
// Lexer stuff
BaseParamType:
INT_TYPE
| FUNCTION_TYPE // <---- LINE 1
;
FUNCTION : 'function'; // <---- LINE 2
INT_TYPE : 'int';
FUNCTION_TYPE : 'function'; // <---- LINE 3
IDENTIFIER : [a-zA-Z_$]+[a-zA-Z_$0-9]*;
WS : [ \t\r\n]+ -> skip ;
The input I'm using:
function abc(int c, int d, int a)
The program to test the generated parser:
from antlr4 import *
from testLexer import testLexer as Lexer
from testParser import testParser as Parser
from antlr4.tree.Trees import Trees
def main(argv):
input = FileStream(argv[1] if len(argv)>1 else "test.in")
lexer = Lexer(input)
tokens = CommonTokenStream(lexer)
parser = Parser(tokens)
tree = parser.begin()
print Trees.toStringTree(tree, None, parser)
if __name__ == '__main__':
import sys
main(sys.argv)

Just use one name for the token function.
A token is just a token. Looking at function in isolation, it is not possible to decide whether it is a FUNCTION or a FUNCTION_TYPE. Since FUNCTION, comes first in the file, that's what the lexer used. That makes it impossible to match FUNCTION_TYPE, so that becomes an invalid token type.
The parser will figure out the syntactic role of the token function. So there would be no point using two different lexical descriptors for the same token, even if it would be possible.
In the grammar in the OP, BaseParamType is also a lexical type, which will absorb all uses of the token function, preventing FUNCTION from being recognized in the production for function. Changing its name to baseParamType, which effectively changes it to a parser non-terminal, will allow the parser to work, although I suppose it may alter the parse tree in undesirable ways.
I understand the objection that the parser "should know" which lexical tokens are possible in context, given the nature of Antlr's predictive parsing strategy. I'm far from an Antlr expert so I won't pretend to explain why it doesn't seem to work, but with the majority of parser generators -- and all the ones I commonly use -- lexical analysis is effectively performed as a prior pass to parsing, so the conversion of textual input into a stream of tokens is done prior to the parser establishing context. (Most lexical generators, including Antlr, have mechanisms with which the user can build lexical context, but IMHO these mechanisms reduce grammar readability and should only be used if strictly necessary.)
Here's the grammar file which I tested:
grammar test;
begin : function ;
function: FUNCTION IDENTIFIER '(' parameterlist? ')' ;
parameterlist: parameter (',' parameter)+ ;
parameter: baseParamType IDENTIFIER ;
// Lexer stuff
baseParamType:
INT_TYPE
| FUNCTION //
;
FUNCTION : 'function';
INT_TYPE : 'int';
IDENTIFIER : [a-zA-Z_$]+[a-zA-Z_$0-9]*;
WS : [ \t\r\n]+ -> skip ;

ANTLR4: Unrecognized constant value in a lexer command

I am learning how to use the "more" lexer command. I typed in the lexer grammar shown in the ANTLR book, page 281:
lexer grammar Lexer_To_Test_More_Command ;
LQUOTE : '"' -> more, mode(STR) ;
WS : [ \t\r\n]+ -> skip ;
mode STR ;
STRING : '"' -> mode(DEFAULT_MODE) ;
TEXT : . -> more ;
Then I created this simple parser to use the lexer:
grammar Parser_To_Test_More_Command ;
import Lexer_To_Test_More_Command ;
test: STRING EOF ;
Then I opened a DOS window and entered this command:
antlr4 Parser_To_Test_More_Command.g4
That generated this warning message:
warning(155): Parser_To_Test_More_Command.g4:3:29: rule LQUOTE
contains a lexer command with an unrecognized constant value; lexer
interpreters may produce incorrect output
Am I doing something wrong in the lexer or parser?

Combined grammars (which are grammars that start with just grammar, instead of parser grammar or lexer grammar) cannot use lexer modes. Instead of using the import feature¹, you should use the tokenVocab feature like this:
Lexer_To_Test_More_Command.g4:
lexer grammar Lexer_To_Test_More_Command;
// lexer rules and modes here
Parser_To_Test_More_Command.g4:
parser grammar Parser_To_Test_More_Command;
options {
tokenVocab = Lexer_To_Test_More_Command;
}
// parser rules here
¹ I actually recommend avoiding the import statement altogether in ANTLR. The method I described above is almost always preferable.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

What's wrong with the following Lexer import? - parsing

Related

antlr4 not parsing according to grammar

ANTLR4 can't parse Integer if a parser rules has an own numeric literal

Antlr Error when adding a Mode for Lexers

Why does this grammar fail to parse this input?

ANTLR4: Unrecognized constant value in a lexer command

Categories

Resources