How to make this production more 'modular' - parsing

I have the following expression group where everything is thrown into the same expr rule:
grammar MyGrammar;
expr
: '(' expr ')'
// BoolExpressions -- cannot move these out or else get Left-Recursion
| expr ('=' | '!=') expr
| expr 'AND' expr
| expr 'OR' expr
| ATOM
;
ATOM: [a-z]+ | [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
It works, but I would like to extract the boolExpression stuff so that I can use that separately, as some other rules I have must use a boolean expression rather than any expression. However, as soon as I do that I get a left-recursion error.
What would be a good way to break this up, so that I can separate the BooleanExpression stuff? Ideally, I would like it to "look like this":
grammar MyGrammar;
expr
: '(' expr ')'
| boolExpr
| ATOM
;
boolExpr
: expr ('=' | '!=') expr
| expr 'AND' expr
| expr 'OR' expr
;
ATOM: [a-z]+ | [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
// error(119): The following sets of rules are
// mutually left-recursive [expr, boolExpr]

It doesn’t quite get you a single boolExpr, but you should consider labeled alternatives:
grammar MyGrammar;
expr
: '(' expr ')' # parenExpr
// BoolExpressions -- cannot move these out or else get Left-Recursion
// (ANTLR 4 requires labeling either all alternatives of a rule or none)
| expr ('=' | '!=') expr # compareExpr
| expr 'AND' expr # andExpr
| expr 'OR' expr # orExpr
| ATOM # atomExpr
;
ATOM: [a-z]+ | [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
This generates a separate *Context class for each labeled alternative, which significantly reduces the complexity of the contexts your listeners and visitors have to deal with (there will be more of them, though, obviously). Element labels are also scoped to each alternative, so you can do something like:
grammar MyGrammar;
expr
: '(' expr ')' # parenExpr
// BoolExpressions -- cannot move these out or else get Left-Recursion
// (ANTLR 4 requires labeling either all alternatives of a rule or none)
| lhs=expr ('=' | '!=') rhs=expr # compareExpr
| lhs=expr 'AND' rhs=expr # andExpr
| lhs=expr 'OR' rhs=expr # orExpr
| ATOM # atomExpr
;
ATOM: [a-z]+ | [0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
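To illustrate what one node type per alternative buys you, here is a minimal hand-written sketch in Python. It is not ANTLR-generated code: the names Atom, Binary, parse and is_bool_expr are hypothetical, and the tokenizer is more permissive than the ATOM rule. It parses the same expression language and tags each node with the kind of alternative that produced it, so code that needs "a boolean expression rather than any expression" can check the node kind after parsing.

```python
import re
from dataclasses import dataclass


@dataclass
class Atom:
    text: str


@dataclass
class Binary:
    kind: str    # "compare", "and" or "or": stands in for the alternative label
    op: str
    lhs: object
    rhs: object


# Loosest-binding level first; comparison binds tightest, as in the grammar.
LEVELS = [("or", {"OR"}), ("and", {"AND"}), ("compare", {"=", "!="})]


def parse(src):
    tokens = re.findall(r"!=|[()=]|[A-Za-z]+|[0-9]+", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def level(i):
        if i == len(LEVELS):
            return primary()
        kind, ops = LEVELS[i]
        node = level(i + 1)
        while peek() in ops:          # the loop gives left associativity
            op = take()
            node = Binary(kind, op, node, level(i + 1))
        return node

    def primary():
        tok = take()
        if tok == "(":
            node = level(0)
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isalnum() and tok not in ("AND", "OR"):
            return Atom(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = level(0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def is_bool_expr(node):
    # Parentheses are transparent, so "(a = b)" still counts as boolean.
    return isinstance(node, Binary)
```

With ANTLR the equivalent check would be made on the generated per-alternative context classes inside a listener or visitor; the sketch only shows the idea of keeping one permissive expr rule and deciding "is this boolean?" from the node kind.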

Related

How to solve the following grammar ambiguity

I am trying to parse a SQL statement that allows for both a BETWEEN expr1 AND expr2 and also expr1 AND expr2. An example would be:
SELECT * FROM tbl WHERE
col1 BETWEEN 1 AND 5
AND col3 = 10;
What would be a good way to disambiguate this, as my grammar is currently like the following:
grammar DBParser;
statement:expr EOF;
expr
: '(' expr ')'
| expr '=' expr
| expr 'BETWEEN' expr 'AND' expr
| expr 'AND' expr
| ATOM
;
ATOM: [a-zA-Z0-9]+;
WHITESPACE: [ \t\r\n] -> skip;
And with the input (col1 BETWEEN 1 AND 5) AND (col3 = 10);:

Resolving Shift/reduce conflicts in GNU Bison

I have the following grammar rules:
%precedence KW2
%left "or"
%left "and"
%left "==" "!=" ">=" ">" "<=" "<"
%left "-" "+"
%left "/" "*"
%start statement1
%%
param
: id
| id ":" expr // Conflict is caused by this line
| id "=" expr
;
param_list
: param_list "," param
| param
;
defparam
: param_list "," "/"
| param_list "," "/" ","
;
param_arg_list
: defparam param_list
| param_list
;
statement1
: KEYWORD1 "(" param_arg_list ")" ":" expr {}
expression1
: KEYWORD2 param_arg_list ":" expr %prec KW2 {} // This causes shift/reduce conflicts
expr
: id
| expr "+" expr
| expr "-" expr
| expr "*" expr
| expr "/" expr
| expr "==" expr
| expr "!=" expr
| expr "<" expr
| expr "<=" expr
| expr ">" expr
| expr ">=" expr
| expr "and" expr
| expr "or" expr
| expression1
id
: TK_NAME {}
.output
State 33
12 param: id . [":", ",", ")"]
13 | id . ":" expr
14 | id . "=" expr
":" shift, and go to state 55
"=" shift, and go to state 56
":" [reduce using rule 12 (param)]
$default reduce using rule 12 (param)
The problem is that expression1 does not need the id ":" expr alternative of param; if I remove that alternative, the conflicts disappear. But I cannot remove it, because statement1 requires it.
I wanted to use param_arg_list for both statement1 and expression1 because that simplifies the grammar by not repeating the same rules again and again.
My question: is there any other way to resolve the conflict?

How to make certain rules mandatory in Antlr

I wrote the following grammar which should check for a conditional expression.
The examples below show what I want to achieve using this grammar:
test invalid
test = 1 valid
test = 1 and another_test>=0.2 valid
test = 1 kasd y = 1 invalid (two conditions MUST be separated by AND/OR)
a = 1 or (b=1 and c) invalid (there cannot be a lonely character like 'c'. It should always be a triplet. i.e, literal operator literal)
grammar expression;
expr
: literal_value
| expr ( '='|'<>'| '<' | '<=' | '>' | '>=' ) expr
| expr K_AND expr
| expr K_OR expr
| function_name '(' ( expr ( ',' expr )* | '*' )? ')'
| '(' expr ')'
;
literal_value
: NUMERIC_LITERAL
| STRING_LITERAL
| IDENTIFIER
;
keyword
: K_AND
| K_OR
;
name
: any_name
;
function_name
: any_name
;
database_name
: any_name
;
table_name
: any_name
;
column_name
: any_name
;
any_name
: IDENTIFIER
| keyword
| STRING_LITERAL
| '(' any_name ')'
;
K_AND : A N D;
K_OR : O R;
IDENTIFIER
: '"' (~'"' | '""')* '"'
| '`' (~'`' | '``')* '`'
| '[' ~']'* ']'
| [a-zA-Z_] [a-zA-Z_0-9]*
;
NUMERIC_LITERAL
: DIGIT+ ( '.' DIGIT* )? ( E [-+]? DIGIT+ )?
| '.' DIGIT+ ( E [-+]? DIGIT+ )?
;
STRING_LITERAL
: '\'' ( ~'\'' | '\'\'' )* '\''
;
fragment DIGIT : [0-9];
fragment A : [aA];
fragment B : [bB];
fragment C : [cC];
fragment D : [dD];
fragment E : [eE];
fragment F : [fF];
fragment G : [gG];
fragment H : [hH];
fragment I : [iI];
fragment J : [jJ];
fragment K : [kK];
fragment L : [lL];
fragment M : [mM];
fragment N : [nN];
fragment O : [oO];
fragment P : [pP];
fragment Q : [qQ];
fragment R : [rR];
fragment S : [sS];
fragment T : [tT];
fragment U : [uU];
fragment V : [vV];
fragment W : [wW];
fragment X : [xX];
fragment Y : [yY];
fragment Z : [zZ];
WS: [ \n\t\r]+ -> skip;
So my question is, how can I get the grammar to work for the examples mentioned above? Can we make certain words mandatory between two triplets (literal operator literal)? In a sense I'm just trying to get a parser to validate the where clause condition, but only simple conditions and functions are permitted. I also want to have a visitor in Java that retrieves values such as functions, parentheses, and literals. How can I achieve that?
Yes and no.
You can change your grammar to only allow expressions that are comparisons and logical operations on the same:
expr
: term ( '='|'<>'| '<' | '<=' | '>' | '>=' ) term
| expr K_AND expr
| expr K_OR expr
| '(' expr ')'
;
term
: literal_value
| function_name '(' ( expr ( ',' expr )* | '*' )? ')'
;
The issue comes if you want to allow boolean variables or functions -- you need to classify the functions/vars in your lexer and have a different terminal for each, which is tricky and error prone.
Instead, it is generally better to NOT do this kind of checking in the parser -- have your parser be permissive and accept anything expression-like, and generate an expression tree for it. Then have a separate pass over the tree (called a type checker) that checks the types of the operands of operations and the arguments to functions.
This latter approach (with a separate type checker) generally ends up being much simpler, clearer, more flexible, and gives better error messages (rather than just 'syntax error').
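A minimal sketch of such a separate pass, in Python rather than Java to keep it short. The node classes Literal and BinOp and the function names are hypothetical stand-ins for whatever tree your parser produces, and function calls are left out. The checker classifies every node as either a plain value or a condition and collects readable error messages instead of failing with a syntax error.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: object   # number, string, or identifier name


@dataclass
class BinOp:
    op: str
    lhs: object
    rhs: object


COMPARISONS = {"=", "<>", "<", "<=", ">", ">="}
LOGICAL = {"AND", "OR"}


def type_of(node, errors):
    """Return 'value' or 'bool' for node, appending any problems to errors."""
    if isinstance(node, Literal):
        return "value"
    left = type_of(node.lhs, errors)
    right = type_of(node.rhs, errors)
    if node.op in COMPARISONS:
        if left != "value" or right != "value":
            errors.append(f"operands of {node.op!r} must be plain values")
        return "bool"
    if node.op in LOGICAL:
        for side, kind in (("left", left), ("right", right)):
            if kind != "bool":
                errors.append(f"{side} operand of {node.op} must be a condition")
        return "bool"
    raise ValueError(f"unknown operator {node.op!r}")


def check_condition(tree):
    """Check a whole where-clause tree; an empty list means it is valid."""
    errors = []
    if type_of(tree, errors) != "bool":
        errors.append("top-level expression must be a condition")
    return errors
```

For the examples in the question, a lone test is rejected at the top level, test = 1 passes, and a = 1 or (b=1 and c) is rejected with a message that points at the right operand of AND.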

Context Free Grammar in ANTLR throwing error for if-statement

I wrote a grammar in ANTLR for a Java-like if statement as follows:
if_statement
: 'if' expression
(statement | '{' statement+ '}')
('elif' expression (statement | '{' statement+ '}'))*
('else' (statement | '{' statement+ '}'))?
;
I've implemented the "statement" and "expression" correctly, but the if_statement is giving me the following error:
Decision can match input such as "'elif'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
|---> ('elif' expression (statement | '{' statement+ '}'))*
warning(200): /OptDB/src/OptDB/XL.g:38:9:
Decision can match input such as "'else'" using multiple alternatives: 1, 2
As a result, alternative(s) 2 were disabled for that input
|---> ('else' (statement | '{' statement+ '}'))?
It seems like there are problems with the "elif" and "else" block.
Basically, we can have 0 or more "elif" blocks, so I wrapped them with *
Also we can have 0 or 1 "else" block, so I wrapped it with ?.
What seems to cause the error?
========================================================================
I'll also put the implementations of "expression" and "statements":
statement
: assignment_statement
| if_statement
| while_statement
| for_statement
| function_call_statement
;
term
: IDENTIFIER
| '(' expression ')'
| INTEGER
| STRING_LITERAL
| CHAR_LITERAL
| IDENTIFIER '(' actualParameters ')'
;
negation
: 'not'* term
;
unary
: ('+' | '-')* negation
;
mult
: unary (('*' | '/' | 'mod') unary)*
;
add
: mult (('+' | '-') mult)*
;
relation
: add (('=' | '/=' | '<' | '<=' | '>=' | '>') add)*
;
expression
: relation (('and' | 'or') relation)*
;
actualParameters
: expression (',' expression)*
;
Because your grammar allows a statement block that is not grouped by {...}, you've got yourself a classic dangling else ambiguity.
Short explanation. The input:
if expr1 if expr2 ... else ...
could be parsed as:
Parse 1
if expr1
    if expr2
        ...
    else
        ...
but also as this:
Parse 2
if expr1
    if expr2
        ...
else
    ...
To eliminate the ambiguity, either change:
(statement | '{' statement+ '}')
into:
'{' statement+ '}'
// or
'{' statement* '}'
so that the braces make it clear which if the else belongs to, or add a predicate to force the parser to choose Parse 1:
if_statement
: 'if' expression statement_block
(('elif')=> 'elif' expression statement_block)*
(('else')=> 'else' statement_block)?
;
statement_block
: '{' statement* '}'
| statement
;
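The effect of forcing Parse 1 can be sketched with a tiny hand-written recursive-descent parser in Python. This is only an illustration, not ANTLR output: parse_statement is a hypothetical name, conditions are single tokens, and anything other than if counts as a simple statement. The innermost open if greedily claims a following else, which is the same choice the predicate makes.

```python
def parse_statement(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos)."""
    if tokens[pos] != "if":
        # Any other token stands for a simple statement.
        return tokens[pos], pos + 1
    cond = tokens[pos + 1]            # single-token condition, for brevity
    then_branch, pos = parse_statement(tokens, pos + 2)
    else_branch = None
    # Greedy choice: if an 'else' follows, the innermost open 'if' takes it.
    if pos < len(tokens) and tokens[pos] == "else":
        else_branch, pos = parse_statement(tokens, pos + 1)
    return ("if", cond, then_branch, else_branch), pos


tree, _ = parse_statement("if c1 if c2 s1 else s2".split())
# The else is attached to the inner if (Parse 1); the outer if has no else.
```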

Optimizing Bison Grammar

I have this grammar of a C# like language, and I want to make a parser for it, but when I put the grammar it tells me about Shift/Reduce conflicts. I tried to fix some but I can't seem to find another way to improve this grammar. Any help would be greatly appreciated :D Here's the grammar:
Program: Decl
| Program Decl
;
Decl: VariableDecl
| FunctionDecl
| ClassDecl
| InterfaceDecl
;
VariableDecl: Variable SEMICOLON
;
Variable: Type IDENTIFIER
;
Type: TOKINT
| TOKDOUBLE
| TOKBOOL
| TOKSTRING
| IDENTIFIER
| Type BRACKETS
;
FunctionDecl: Type IDENTIFIER OPARENS Formals CPARENS StmtBlock
| TOKVOID IDENTIFIER OPARENS Formals CPARENS StmtBlock
;
Formals: VariablePlus
| /* epsilon */
;
VariablePlus: Variable
| VariablePlus COMMA Variable
;
ClassDecl: TOKCLASS IDENTIFIER OptExtends OptImplements OBRACE ListaField CBRACE
;
OptExtends: TOKEXTENDS IDENTIFIER
| /* epsilon */
;
OptImplements: TOKIMPLEMENTS ListaIdent
| /* epsilon */
;
ListaIdent: ListaIdent COMMA IDENTIFIER
| IDENTIFIER
;
ListaField: ListaField Field
| /* epsilon */
;
Field: VariableDecl
| FunctionDecl
;
InterfaceDecl: TOKINTERFACE IDENTIFIER OBRACE ListaProto CBRACE
;
ListaProto: ListaProto Prototype
| /* epsilon */
;
Prototype: Type IDENTIFIER OPARENS Formals CPARENS SEMICOLON
| TOKVOID IDENTIFIER OPARENS Formals CPARENS SEMICOLON
;
StmtBlock: OBRACE ListaOptG CBRACE
;
ListaOptG: /* epsilon */
| VariableDecl ListaOptG
| Stmt ListaOptG
;
Stmt: OptExpr SEMICOLON
| IfStmt
| WhileStmt
| ForStmt
| BreakStmt
| ReturnStmt
| PrintStmt
| StmtBlock
;
OptExpr: Expr
| /* epsilon */
;
IfStmt: TOKIF OPARENS Expr CPARENS Stmt OptElse
;
OptElse: TOKELSE Stmt
| /* epsilon */
;
WhileStmt: TOKWHILE OPARENS Expr CPARENS Stmt
;
ForStmt: TOKFOR OPARENS OptExpr SEMICOLON Expr SEMICOLON OptExpr CPARENS Stmt
;
ReturnStmt: TOKRETURN OptExpr SEMICOLON
;
BreakStmt: TOKBREAK SEMICOLON
;
PrintStmt: TOKPRINT OPARENS ListaExprPlus CPARENS SEMICOLON
;
ListaExprPlus: Expr
| ListaExprPlus COMMA Expr
;
Expr: LValue LOCATION Expr
| Constant
| LValue
| TOKTHIS
| Call
| OPARENS Expr CPARENS
| Expr PLUS Expr
| Expr MINUS Expr
| Expr TIMES Expr
| Expr DIVIDED Expr
| Expr MODULO Expr
| MINUS Expr
| Expr LESSTHAN Expr
| Expr LESSEQUALTHAN Expr
| Expr GREATERTHAN Expr
| Expr GREATEREQUALTHAN Expr
| Expr EQUALS Expr
| Expr NOTEQUALS Expr
| Expr AND Expr
| Expr OR Expr
| NOT Expr
| TOKNEW OPARENS IDENTIFIER CPARENS
| TOKNEWARRAY OPARENS Expr COMMA Type CPARENS
| TOKREADINTEGER OPARENS CPARENS
| TOKREADLINE OPARENS CPARENS
| TOKMALLOC OPARENS Expr CPARENS
;
LValue: IDENTIFIER
| Expr PERIOD IDENTIFIER
| Expr OBRACKET Expr CBRACKET
;
Call: IDENTIFIER OPARENS Actuals CPARENS
| Expr PERIOD IDENTIFIER OPARENS Actuals CPARENS
| Expr PERIOD LibCall OPARENS Actuals CPARENS
;
LibCall: TOKGETBYTE OPARENS Expr CPARENS
| TOKSETBYTE OPARENS Expr COMMA Expr CPARENS
;
Actuals: ListaExprPlus
| /* epsilon */
;
Constant: INTCONSTANT
| DOUBLECONSTANT
| BOOLCONSTANT
| STRINGCONSTANT
| TOKNULL
;
The old Bison version on my school's server says you have 241 shift/reduce conflicts. One is the dangling if/else statement. Putting "OptElse" does NOT solve it. You should just write out the IfStmt and an IfElseStmt and then use %nonassoc and %prec options in bison to fix it.
Your expressions are the issue of almost all of the other 240 conflicts. What you need to do is either force precedence rules (messy and a terrible idea) or break your arithmetic expressions into stuff like:
AddSubtractExpr: AddSubtractExpr PLUS MultDivExpr | ....
;
MultDivExpr: MultDivExpr TIMES Factor | ....
;
Factor: Variable | LPAREN Expr RPAREN | call | ...
;
Since Bison produces a bottom up parser, something like this will give you correct order of operations. If you have a copy of the first edition of the Dragon Book, you should look at the grammar in Appendix A. I believe the 2nd edition also has similar rules for simple expressions.
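To see why one rule per precedence level gives the right order of operations, here is a small Python sketch with one function per level. The function names mirror the rule names above, but everything else is an assumption: integers only, no error recovery. A hand-written top-down parser cannot use left recursion directly, so each left-recursive rule becomes a loop; Bison handles the left-recursive form natively.

```python
import re


def evaluate(src):
    tokens = re.findall(r"\d+|[-+*/()]", src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def add_sub():      # AddSubtractExpr: lowest precedence
        value = mult_div()
        while peek() in ("+", "-"):
            if advance() == "+":
                value += mult_div()
            else:
                value -= mult_div()
        return value

    def mult_div():     # MultDivExpr: binds tighter than + and -
        value = factor()
        while peek() in ("*", "/"):
            if advance() == "*":
                value *= factor()
            else:
                value /= factor()
        return value

    def factor():       # Factor: a number or a parenthesized expression
        tok = advance()
        if tok == "(":
            value = add_sub()
            advance()   # consume the closing ')'
            return value
        return int(tok)

    return add_sub()
```

Because add_sub can only combine results that mult_div has already finished, a multiplication is always grouped before the surrounding addition, and the loops make each operator left-associative.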
Conflicts (shift/reduce or reduce/reduce) mean that your grammar is not LALR(1), so Bison can't handle it directly without help. There are a number of immediately obvious problems:
Expression ambiguity -- there's no precedence in the grammar, so things like a + b * c are ambiguous. You can fix this by adding precedence rules, or by splitting the Expr rule into separate AdditiveExpr, MultiplicativeExpr, ConditionalExpr etc. rules.
Dangling else ambiguity -- if (a) if (b) x; else y; -- the else could be matched with either if. You can either ignore this if the default shift is correct (it usually is for this specific case, but ignoring errors is always dangerous) or split the Stmt rule.
There are many books on grammars and parsing that will help with this.
