implicit declaration in parsing rule warning antlrworks 2 - antlrworks

I always get implicit declaration in parsing rule warning, when coding all examples from antlr v4 inside antlrworks 2. for my simple rule like :
type
: 'Integer'
| 'Character'
| 'Real'
| 'String'
| 'Short'
| 'Long'
| 'Double'
| 'Signed'
| 'Unsigned'
| 'Boolean'
| structTag
| enumTag
| declarator
;
Can anybody give me solution that warning, at last solution for example above.
thank

The warning is to inform you that you will have no way in code to know if your type is an identifier, character, real, etc., because you have not assigned named token types to the corresponding tokens. You can resolve this warning by creating named lexer rules for each of your tokens:
INTEGER : 'Integer';
CHARACTER : 'Character';
You do not have to change the type rule after adding these new definitions, but after adding the definitions you will be able to check if a token type is INTEGER or CHARACTER as part of your parser result handling code.

Related

Obscure Antlr Error when Parsing Data Type

I am trying to parse a variable type for a toy language meant to teach Antlr fundamentals. I wish to parse at the rule var using the below code.
// Parser
var : TYPE ID;
// Lexer
TYPE: SIGNED PTR? DIMENSIONS?
| UNSIGNED PTR? DIMENSIONS?
| UNSIGNABLE PTR? DIMENSIONS?;
fragment DIMENSIONS : '[' ((NAT | ':') ',')* (NAT | ':')? ']';
fragment SIGNED : 'I16' | 'I32' | 'I64' | 'F32' | 'CHAR';
fragment UNSIGNED : 'U_I16' | 'U_I32' | 'U_I64' | 'U_F32' | 'U_CHAR';
fragment UNSIGNABLE : 'VOID' | 'STR' | 'BOOL' | 'CPLX';
PTR : 'PTR';
NAT : [0-9]+;
ID : [A-Z][A-Z0-9_]*;
However, when I test my program with the example declaration I32 HELLO_9, I receive the following error.
line 1:0 missing TYPE at 'I32'
PTR and DIMENSIONS should be marked as optional, so I am unsure as to why my lexer will not identify the I32 token for the SIGNED fragment. As a secondary question, I wonder how it is ever possible for professional programmers to create sophisticated projects with Antlr. I have experimented with Haskell parsing libraries in the past and it appears (from my subjective view) that Antlr is more prone to producing obscure errors. My perception is probably just a consequence of my inexperience, and I would be thankful to hear the opinions of a more suave programmer.
Given your grammar, I can't reproduce this. If I add SPACE : [ \t\r\n] -> skip; to it, the following code:
TLexer lexer = new TLexer(CharStreams.fromString("I32 HELLO_9"));
TParser parser = new TParser(new CommonTokenStream(lexer));
ParseTree root = parser.var();
System.out.println(root.toStringTree(parser));
produces no warnings/errors and prints:
(var I32 HELLO_9)
representing the parse tree:
The real problem is something #rici mentioned, or it is hidden by the fact that you've minimized your real grammar and this minimized form does not produce the error your real grammar does.

Small Shift/Reduce conflict in CUP

I'm having a minor problem in trying to figure out how to resolve a conflict in my CUP parser project. I understand why the error is occurring, VariableDeclStar's first terminal can be ID, as well as Type, which brings up the conflict, however I cannot figure out how to resolve the conflict in a way that would preserve Type and Variable as separate states. Any help or tips would be appreciated.
VariableDecl ::= Variable SEMICOLON {::};
Variable ::= Type ID {::};
Type ::= INT {::}
| DOUBLE {::}
| BOOLEAN {::}
| STRING {::}
| Type LEFTBRACKET RIGHTBRACKET {::}
| ID {::};
VariableDeclStar::= VariableDecl VariableDeclStar {::}
| {::};
https://i.gyazo.com/0ac3fbf4ebc2d3968f1c2a78c292bc0d.png

Can I do something to avoid the need to backtrack in this grammar?

I am trying to implement an interpreter for a programming language, and ended up stumbling upon a case where I would need to backtrack, but my parser generator (ply, a lex&yacc clone written in Python) does not allow that
Here's the rules involved:
'var_access_start : super'
'var_access_start : NAME'
'var_access_name : DOT NAME'
'var_access_idx : OPSQR expression CLSQR'
'''callargs : callargs COMMA expression
| expression
| '''
'var_access_metcall : DOT NAME LPAREN callargs RPAREN'
'''var_access_token : var_access_name
| var_access_idx
| var_access_metcall'''
'''var_access_tokens : var_access_tokens var_access_token
| var_access_token'''
'''fornew_var_access_tokens : var_access_tokens var_access_name
| var_access_tokens var_access_idx
| var_access_name
| var_access_idx'''
'type_varref : var_access_start fornew_var_access_tokens'
'hard_varref : var_access_start var_access_tokens'
'easy_varref : var_access_start'
'varref : easy_varref'
'varref : hard_varref'
'typereference : NAME'
'typereference : type_varref'
'''expression : new typereference LPAREN callargs RPAREN'''
'var_decl_empty : NAME'
'var_decl_value : NAME EQUALS expression'
'''var_decl : var_decl_empty
| var_decl_value'''
'''var_decls : var_decls COMMA var_decl
| var_decl'''
'statement : var var_decls SEMIC'
The error occurs with statements of the form
var x = new SomeGuy.SomeOtherGuy();
where SomeGuy.SomeOtherGuy would be a valid variable that stores a type (types are first class objects) - and that type has a constructor with no arguments
What happens when parsing that expression is that the parser constructs a
var_access_start = SomeGuy
var_access_metcall = . SomeOtherGuy ( )
and then finds a semicolon and ends in an error state - I would clearly like the parser to backtrack, and try constructing an expression = new typereference(SomeGuy .SomeOtherGuy) LPAREN empty_list RPAREN and then things would work because the ; would match the var statement syntax all right
However, given that PLY does not support backtracking and I definitely do not have enough experience in parser generators to actually implement it myself - is there any change I can make to my grammar to work around the issue?
I have considered using -> instead of . as the "method call" operator, but I would rather not change the language just to appease the parser.
Also, I have methods as a form of "variable reference" so you can do
myObject.someMethod().aChildOfTheResult[0].doSomeOtherThing(1,2,3).helloWorld()
but if the grammar can be reworked to achieve the same effect, that would also work for me
Thanks!
I assume that your language includes expressions other than the ones you've included in the excerpt. I'm also going to assume that new, super and var are actually terminals.
The following is only a rough outline. For readability, I'm using bison syntax with quoted literals, but I don't think you'll have any trouble converting.
You say that "types are first-class values" but your syntax explicitly precludes using a method call to return a type. In fact, it also seems to preclude a method call returning a function, but that seems odd since it would imply that methods are not first-class values, even though types are. So I've simplified the grammar by allowing expressions like:
new foo.returns_method_which_returns_type()()()
It's easy enough to add the restrictions back in, but it makes the exposition harder to follow.
The basic idea is that to avoid forcing the parser to make a premature decision; once new is encountered, it is only possible to distinguish between a method call and a constructor call from the lookahead token. So we need to make sure that the same reductions are used up to that point, which means that when the open parenthesis is encountered, we must still retain both possibilities.
primary: NAME
| "super"
;
postfixed: primary
| postfixed '.' NAME
| postfixed '[' expression ']'
| postfixed '(' call_args ')' /* PRODUCTION 1 */
;
expression: postfixed
| "new" postfixed '(' call_args ')' /* PRODUCTION 2 */
/* | other stuff not relevant here */
;
/* Your callargs allows (,,,3). This one doesn't */
call_args : /* EMPTY */
| expression_list
;
expression_list: expression
| expression_list ',' expression
;
/* Another slightly simplified production */
var_decl: NAME
| NAME '=' expression
;
var_decl_list: var_decl
| var_decl_list ',' var_decl
;
statement: "var" var_decl_list ';'
/* | other stuff not relevant here */
;
Now, take a look at PRODUCTION 1 and PRODUCTION 2, which are very similar. (Marked with comments.) These are basically the ambiguity for which you sought backtracking. However, in this grammar, there is no issue, since once a new has been encountered, the reduction of PRODUCTION 2 can only be performed when the lookahead token is , or ;, while PRODUCTION 1 can only be performed with lookahead tokens ., ( and [.
(Grammar tested with bison, just to make sure there are no conflicts.)

ANTLR doesn't find the defined start rule

I'm facing a strange ANTLR issue with a that should just output an AST.
grammar ltxt.g;
options
{
language=CSharp3;
}
prog : start
;
start : '{Start 'loopname'}'statement'{Ende 'loopname'}'
| statement
;
loopname : (('a'..'z')|('A'..'Z')|('1'..'9'))*;
statement : '<%' table_ref '>'
| start;
table_ref : '{'format'}'ID;
format : FSTRING
| FSTRING OFSTRING{0,5}
;
FSTRING : '#F'
| '#D'
| '#U'
| '#K'
;
OFSTRING: 'F'
| 'D'
| 'U'
| 'K'
//| 1..65536
;
ID : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
WS : ( ' '
| '\t'
| '\r'
| '\n'
) {$channel=HIDDEN;}
;
When I try to code-gen this I get
error(100):LTXT.g:1:13:syntax error: antlr: MismatchedTokenException(74!=52). I didn't declare any 74 or 52.
also I do not get a Synatx diagram, since "rule "start"" cannot be found as a start state...
I know that this isn't pretty, but I thought it would work at least :)
Best,
wishi
There are four errors that I see.
A grammar name can't contain a period. That's the syntax error you're getting. The 74!=52 error message is a hint telling you that ANTLR found token id 74 when it was expecting token id 52, which in this case just translates to "it found one thing when it expected something else."
The grammar name ("ltxt") and the file name before the extension ("LTXT") need to match exactly.
The grammar won't produce an AST unless you specify output=AST; in the options section.
format's second alternative (FSTRING OFSTRING{0,5}) won't do what I think you think it's going to do. ANTLR doesn't support an arbitrary number of matches such as "match zero to five OFSTRINGs". You'll need to redefine the rule using semantic predicates that count occurrences for you. They aren't hard to use, but they're one of the trickier parts of ANTLR.
I hope that helps get you started.

Antlr non-LL(*) decision

I'm having some trouble creating part of my ANTLR grammar for my programming language.
I'm getting the error when the second part of the type declaration occurs:
public type
: ID ('.' ID)* ('?')? -> ^(R__Type ID ID* ('?')?)
| '(' type (',' type)* ')' ('?')? -> ^(R__Type type* ('?')?)
;
I'm trying to either match:
A line like System.String (works fine)
A tuple such as (System.String, System.Int32)
The error occurs slightly higher up the tree, and states:
[fatal] rule statement has non-LL(*) decision due to recursive rule invocations reachable from alts 1,2. Resolve by left-factoring or using syntactic predicates or using backtrack=true option.
What am I doing wrong?
Right, I managed to fix this a little earlier up the tree, by editing the rule that deals with variable declarations:
'my' ID (':' type '=' constant | ':' type | '=' constant) -> ^(R__VarDecl ID type? constant?)
So that it works like:
'my' ID
(
':' type ('=' constant)?
| '=' constant
) -> ^(R__VarDecl ID type? constant?)
I got the idea from the example of syntactic predicates here:
https://wincent.com/wiki/ANTLR_predicates
Luckily I didn't need a predicate in the end!

Resources