Antlr PLSQL grammar not parsing package name correctly?

Here is the Antlr grammar file for PLSQL.
However, it is not completely correct. For example, I have a package that starts with:
CREATE OR REPLACE PACKAGE BODY SCHEMA_NAME.PACKAGE_NAME AS
The parser generated from the preceding grammar file does not parse SCHEMA_NAME.PACKAGE_NAME as the package_name (package_name is an Antlr parser rule).
To fix this, I have changed this
package_name
: id
;
To this
package_name
: id ('.' id)?
;
I have also tried
package_name
: id ('.' id_expression)?
;
but neither of them worked. Antlr still won't include the trailing .PACKAGE_NAME part in the package name.
Why is this not working? How can I fix this?

This package_name definition:
package_name
: id ('.' id_expression)?
;
works with the following PL/SQL statement:
CREATE OR REPLACE PACKAGE BODY SCHEMA_NAME.PACKAGE_NAME AS END;
I suppose you forgot the END and the semicolon at the end.
Also, if you find a bug, please open an issue or a pull request in the official ANTLR grammars repository, because this PL/SQL grammar is not complete, as has already been mentioned in the comments.

Related

Parsing Dart | ANTLR | Handle a comma at the end of parameter list

My apologies for the bad title, but I couldn't express it in better words.
I'm writing a parser using ANTLR to calculate complexities in Dart code.
Things seemed to work fine until I tried to parse a file with the following method signature:
Stream<SomeState> mapEventToState(SomeEvent event,) async* {
//someCode to map the State to Event
}
Here, mapEventToState(SomeEvent event,) causes a problem because of the trailing comma.
The parser reports two parameters because of the trailing comma (whereas in reality there is just one) and pulls part of the method body into the parameter list, which makes the rest of the code unparseable for ANTLR.
Ending a parameter list with a comma is normal in Flutter.
The grammar corresponding to it is:
initializedVariableDeclaration
: declaredIdentifier ('=' expression)? (',' initializedIdentifier)*
;
initializedIdentifier
: identifier ('=' expression)?
;
initializedIdentifierList
: initializedIdentifier (',' initializedIdentifier)*
;
The full grammar can be checked at https://github.com/antlr/grammars-v4/blob/master/dart2/Dart2.g4
What should I change in the grammar so that I don't face this issue and the parser understands that functionName(Param param1, Param param2,) is the same as functionName(Param param1, Param param2)?

The Dart project maintains a reference ANTLR grammar for the Dart language (mostly as a tool for ourselves, to ensure new language features can be parsed).
It might be useful as a reference.
The "dart2" grammar you are linking to in the ANTLR repository is probably severely outdated. It was not created by a Dart team member, and if it doesn't handle trailing commas in argument lists, it was probably never complete for Dart 2.0. Use with caution.

I do not believe that the rule you mentioned (initializedVariableDeclaration) is the grammar corresponding to the problem. That's for an ordinary variable declaration (with an initializer).
I believe you actually want to change formalParameterList. The Dart grammar is provided by the language specification, and we can compare the grammar listed there to the grammar from the ANTLR repository.
The ANTLR file has:
formalParameterList
: '(' ')'
| '(' normalFormalParameters ')'
...
whereas the Dart 2.10 specification has, from section 9.2 (Formal Parameters):
<formalParameterList> ::= ‘(’ ‘)’
| ‘(’ <normalFormalParameters> ‘,’? ‘)’
...
You should file an issue against ANTLR or create a pull request to fix it.
That file also does not appear to have been substantially updated since May 2019 and seems to be missing some notable changes to the Dart language since that time (e.g. spread collections (spreadElement), collection-if (ifElement), and collection-for (forElement) from Dart 2.3, and the changes for null safety).
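The effect of the optional trailing comma in formalParameterList can be illustrated with a small hand-written parser. This is only a sketch in Python, not ANTLR output: the function name parse_formal_parameters is hypothetical, and it assumes every parameter is a plain "Type name" pair, which is far simpler than real Dart parameters.

```python
import re

def parse_formal_parameters(source):
    """Parse '(' ')' or '(' param (',' param)* ','? ')' into (type, name) pairs.

    Simplification: a parameter is exactly two identifiers. Characters the
    regex does not recognize are silently ignored.
    """
    tokens = re.findall(r"[A-Za-z_]\w*|[(),]", source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at token {pos}")
        pos += 1

    def identifier():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError(f"expected identifier at token {pos}")
        pos += 1
        return tok

    expect("(")
    params = []
    if peek() != ")":
        params.append((identifier(), identifier()))
        while peek() == ",":
            pos += 1
            if peek() == ")":
                # This is the ','? from the specification: a comma directly
                # before ')' ends the list instead of starting a parameter.
                break
            params.append((identifier(), identifier()))
    expect(")")
    return params
```

With this check in place, "(Param param1, Param param2,)" and "(Param param1, Param param2)" produce the same two-element list, while "(,)" is still rejected, matching the two alternatives quoted from the specification.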

How to prefix numbers with optional letter?

I am building on an initial Xtext project built using Gradle.
ext.xtextVersion = '2.20.0'
I have the following Xtext grammar:
grammar com.exampe.Rule with org.eclipse.xtext.common.Terminals hidden(WS, ML_COMMENT, SL_COMMENT)
import "http://www.eclipse.org/emf/2002/Ecore" as ecore
generate rule "http://www.example.com/Rule"
Rule:
{Number} (other?='o')? number=INT
;
This does NOT parse o19.
Then the rule is changed to the following:
Rule:
{Number} (other?='*')? number=INT
;
This DOES parse *19.
I did not find any mention of letters being treated differently from symbols.
What is going wrong here? How can I get o19 parsed?

o19 is matched by the terminal rule ID, which you imported by inheriting from org.eclipse.xtext.common.Terminals. In Xtext, the lexer runs independently of the parser (it is context insensitive) and tokenizes the text into keywords and terminal rule calls.
You have to add a terminal rule for such cases.
terminal PREFIXED_INT:
'o' INT;
But I don't know whether it's a good idea in terms of readability if you keep the ID rule as well. Readers of your code might be misled.
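Why o19 never reaches the parser as the keyword 'o' followed by an INT can be seen in a simplified model of a context-insensitive lexer. The Python sketch below is an illustration only, not Xtext's generated lexer: it assumes a plain longest-match rule with keywords winning ties, and the table TOKEN_SPEC and the function tokenize are hypothetical names.

```python
import re

# Hypothetical token table: the two keywords from the question plus
# patterns modelled on the ID and INT terminals of the common Terminals grammar.
TOKEN_SPEC = [
    ("KEYWORD", re.compile(r"o|\*")),
    ("ID", re.compile(r"\^?[A-Za-z_][A-Za-z_0-9]*")),
    ("INT", re.compile(r"[0-9]+")),
]

def tokenize(text):
    """Split text into (kind, lexeme) pairs without any knowledge of the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is hidden, as in the grammar
            continue
        best = None
        for kind, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            # Keep the longest match; earlier entries win ties.
            if match and (best is None or len(match.group()) > len(best[1])):
                best = (kind, match.group())
        if best is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens
```

In this model, tokenize("o19") yields a single ID token, so the parser never sees the keyword, whereas tokenize("*19") yields the keyword followed by an INT because no longer token can start with '*'.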

ANTLR: Different token with trailing bracket

I am working on an ANTLRv4 grammar for BUGS - my repo is here, the link points to a particular commit so shouldn't go out of date.
Minimum code example below.
I would like the input rule to go along the t route if the input is T(, but along the id route if the input is T, for the grammar below.
grammar temp;
input: t | id;
t: T '(';
id: ID;
T: 'T' {_input.LA(1)==(}?;
ID: [a-zA-Z][a-zA-Z0-9._]*;
My ANTLRv4 specification of the BUGS grammar was heavily inspired by the FLEX+BISON lexing and parsing grammar incorporated in the JAGS 4.3.0 source code, in the files src/lib/compiler/parser.yy and src/lib/compiler/scanner.ll.
They accomplish it by using trailing context in the lexer, e.g. r/s. The way to do it in ANTLR is given here, but I cannot get it to work.
I need it to work this way because another part of the grammar depends on this mechanism - relevant code fragment here.
You can recreate my particular issue by cloning my repo and running make - this will give a list of the lexed tokens and an error in the parsing stage. In the token list, the letter T is lexed as token 'T' rather than as ID, as I'd like it to be.
I feel there is a much more natural/correct way to do it in ANTLR, but I'm new to this and cannot figure it out.
PS If you have an idea how to better name this question please edit it.

If I understand the problem correctly, the following code will work fine:
grammar temp;
input: t | id;
t: T '(';
id: ID | T;
T: 'T';
LPAREN: '(';
ID: [a-zA-Z][a-zA-Z0-9._]*;
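The idea in this grammar is to let the lexer always emit T for a lone 'T' and to move the decision into the parser, where id also accepts a T token. A rough Python model of that behaviour follows; the functions lex and parse_input are hypothetical, and the sketch ignores whitespace handling and error recovery.

```python
import re

WORD = re.compile(r"[a-zA-Z][a-zA-Z0-9._]*")

def lex(text):
    """Tokenize like the grammar above: T for a lone 'T', ID for other words."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == "(":
            tokens.append(("LPAREN", "("))
            pos += 1
            continue
        match = WORD.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        word = match.group()
        # The lexer needs no lookahead: 'T' is always the T token.
        tokens.append(("T" if word == "T" else "ID", word))
        pos = match.end()
    return tokens

def parse_input(text):
    """Return which alternative of 'input: t | id' matches the whole text."""
    kinds = [kind for kind, _ in lex(text)]
    if kinds == ["T", "LPAREN"]:
        return "t"  # t: T '('
    if kinds in (["ID"], ["T"]):
        return "id"  # id: ID | T
    raise SyntaxError(f"no alternative matches {text!r}")
```

Here parse_input("T(") takes the t route and parse_input("T") takes the id route, even though both inputs begin with the same T token.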

Ambiguous ANTLR parser rule

I have a very simple example text which I want to parse with ANTLR, and yet I'm getting wrong results due to an ambiguous definition of the rule.
Here is the grammar:
grammar SimpleExampleGrammar;
prog : event EOF;
event : DEFINE EVT_HEADER eventName=eventNameRule;
eventNameRule : DIGIT+;
DEFINE : '#define';
EVT_HEADER : 'EVT_';
DIGIT : [0-9a-zA-Z_];
WS : ('' | ' ' | '\r' | '\n' | '\t') -> channel(HIDDEN);
First text example:
#define EVT_EX1
Second text example:
#define EVT_EX1
#define EVT_EX2
So, the first example is parsed correctly.
However, the second example doesn't work, as the eventNameRule matches the next "#define ..." and the parse tree is incorrect.
I'd appreciate any help with changing the grammar to parse this correctly.
Thanks,
Busi

Besides the missing loop specifier, you also have a problem in your WS rule. The first alternative, the empty string '', matches anything. Remove it. And, by the way, give your DIGIT rule a different name. It matches more than just digits.

As Adrian pointed out, my main mistake was that the initial rule (prog) used "event" instead of "event+". Changing that solves the issue.
Thanks Adrian.

Skipping tokens in yacc

I want to have a grammar rule like below in my yacc file:
insert_statement: INSERT INTO NAME (any_token)* ';'
On an error, yacc can skip all tokens up to a given token as follows:
stat: error ';'
Is there any mechanism in yacc to skip any number of tokens when there is no error?
Thanks

After some time I solved my problem in the following way, and I mention it here in case it helps someone:
Add a token definition to lex that covers the characters allowed in a skipped token:
<*>[A-Za-z0-9_:.-]* { return SKIPPINGTOKS; }
(this would identify any token like a, 1, hello, hello123 etc.)
Then add rules such as the following to yacc as required:
insert_statement: INSERT INTO NAME skipping_portion ';'
skipping_portion: SKIPPINGTOKS | skipping_portion SKIPPINGTOKS
Hope this may help someone...

I think you would want to do something like this. It skips any and all tokens that are not the semicolon.
insert_statement: INSERT INTO NAME discardable_tokens_or_epsilon ';' ;
discardable_tokens_or_epsilon: discardable_tokens
| epsilon
;
discardable_tokens: discardable_tokens discardable_token
| discardable_token
;
discardable_token: FOO
| BAR
| BLETCH
...et cetera... anything other than a semicolon
;
epsilon: ;
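What the rules above accept can be modelled in a few lines of Python: after INSERT INTO NAME, every token that is not a semicolon is discarded until the terminating ';' is reached. This is only a sketch of the accepted language, not of how yacc works internally; the function parse_insert is hypothetical, and it assumes the input is already split into string tokens and that NAME is any identifier.

```python
def parse_insert(tokens):
    """Accept INSERT INTO NAME <tokens other than ';'>* ';'.

    Returns the table name and the list of discarded tokens.
    """
    if len(tokens) < 3 or tokens[0] != "INSERT" or tokens[1] != "INTO":
        raise SyntaxError("expected INSERT INTO NAME")
    name = tokens[2]
    if not name.isidentifier():
        raise SyntaxError(f"expected a name, got {name!r}")
    skipped = []
    pos = 3
    # discardable_tokens_or_epsilon: zero or more tokens other than ';'
    while pos < len(tokens) and tokens[pos] != ";":
        skipped.append(tokens[pos])
        pos += 1
    if pos >= len(tokens):
        raise SyntaxError("missing ';'")
    return name, skipped
```

For example, parse_insert("INSERT INTO t VALUES ( 1 , 2 ) ;".split()) returns the name "t" together with the six discarded tokens, and an empty skipped list is returned when ';' follows the name directly, which corresponds to the epsilon alternative.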

Simply don't specify a production rule containing the tokens you'd like to skip.
