Parse error using yecc in Erlang: "syntax error before" - parsing

I am currently writing a parser with yecc in Erlang.
Nonterminals expression.
Terminals '{' '}' '+' '*' 'atom' 'app' 'integer' 'if0' 'fun' 'rec'.
Rootsymbol expression.
expression -> '{' '+' expression expression '}' : {'AddExpression', '$3','$4'}.
expression -> '{' 'if0' expression expression expression '}' : {'if0', '$3', '$4', '$5'}.
expression -> '{' '*' expression expression '}' : {'MultExpression', '$3','$4'}.
expression -> '{' 'app' expression expression '}' : {'AppExpression', '$3','$4'}.
expression -> '{' 'fun' '{' expression '}' expression '}': {'FunExpression', '$4', '$6'}.
expression -> '{' 'rec' '{' expression expression '}' expression '}' : {'RecExpression', '$4', '$5', '$7'}.
expression -> atom : '$1'.
expression -> integer : '$1'.
I also have an erlang project that tokenizes the the input before parsing:
tok(X) ->
element(2, erl_scan:string(X)).
get_Value(X)->
element(2, parse(tok(X))).
These cases are accepted:
interp:get_Value("{+ {+ 4 6} 6}").
interp:get_Value("{+ 4 2}").
These return:
{'AddExpression' {'AddExpression' {integer, 1,6} {integer,1,6}}{integer,1,6}}
and
{'AddExpression' {integer,1,4} {integer,1,2}}
But this test case:
interp:get_Value("{if0 3 4 5}").
Returns:
{1,string_parser,["syntax error before: ","if0"]}

In the grammar rules what you are showing are the category of the terminal tokens and not their values. So you can match against an atom but not against a specific atom. If you are using the Erlang tokenizer then the token generated for "if0" will be {atom,Line,if0} while in you grammar you want a {if0,Line} token. This is what the "Pre-processing" section of the yecc documentation is trying to explain.
You will need a special tokenizer for this. A simple way of handling this if you want to use the Erlang tokenizer is have a pre-processing pass which scans the token list and converts {atom,Line,if0} tokens to {if0,Line} tokens.

Related

Ambiguous call expression in ANTLR4 grammar

I have a simple grammar (for demonstration)
grammar Test;
program
: expression* EOF
;
expression
: Identifier
| expression '(' expression? ')'
| '(' expression ')'
;
Identifier
: [a-zA-Z_] [a-zA-Z_0-9?]*
;
WS
: [ \r\t\n]+ -> channel(HIDDEN)
;
Obviously the second and third alternatives in the expression rule are ambiguous. I want to resolve this ambiguity by permitting the second alternative only if an expression is immediately followed by a '('.
So the following
bar(foo)
should match the second alternative while
bar
(foo)
should match the 1st and 3rd alternatives (even if the token between them is in the HIDDEN channel).
How can I do that? I have seen these ambiguities, between call expressions and parenthesized expressions, present in languages that have no (or have optional) expression terminator tokens (or rules) - example
The solution to this is to temporary "unhide" whitespace in your second alternative. Have a look at this question for how this can be done.
With that solution your code could look somthing like this
expression
: Identifier
| {enableWS();} expression '(' {disableWS();} expression? ')'
| '(' expression ')'
;
That way the second alternative matches the input WS-sensitive and will therefore only be matched if the identifier is directly followed by the bracket.
See here for the implementation of the MultiChannelTokenStream that is mentioned in the linked question.

Solving shift/reduce conflict in expression grammar

I am new to bison and I am trying to make a grammar parsing expressions.
I am facing a shift/reduce conflight right now I am not able to solve.
The grammar is the following:
%left "[" "("
%left "+"
%%
expression_list : expression_list "," expression
| expression
| /*empty*/
;
expression : "(" expression ")"
| STRING_LITERAL
| INTEGER_LITERAL
| DOUBLE_LITERAL
| expression "(" expression_list ")" /*function call*/
| expression "[" expression "]" /*index access*/
| expression "+" expression
;
This is my grammar, but I am facing a shift/reduce conflict with those two rules "(" expression ")" and expression "(" expression_list ")".
How can I resolve this conflict?
EDIT: I know I could solve this using precedence climbing, but I would like to not do so, because this is only a small part of the expression grammar, and the size of the expression grammar would explode using precedence climbing.
There is no shift-reduce conflict in the grammar as presented, so I suppose that it is just an excerpt of the full grammar. In particular, there will be precisely the shift/reduce conflict mentioned if the real grammar includes:
%start program
%%
program: %empty
| program expression
In that case, you will run into an ambiguity because given, for example, a(b), the parser cannot tell whether it is a single call-expression or two consecutive expressions, first a single variable, and second a parenthesized expression. To avoid this problem you need to have some token which separates expression (statements).
There are some other issues:
expression_list : expression_list "," expression
| expression
| /*empty*/
;
That allows an expression list to be ,foo (as in f(,foo)), which is likely not desirable. Better would be
arguments: %empty
| expr_list
expr_list: expr
| expr_list ',' expr
And the precedences are probably backwards. Usually one wants postfix operators like call and index to bind more tightly than arithmetic operators, so they should come at the end. Otherwise a+b(7) is (a+b)(7), which is unconventional.

How to define logical operator with parenthesis in ANTLR grammar

I am defining a grammar in ANTLR that will express an expression which includes logical operator and parenthesis together.
Here is the grammar
grammar simpleGrammar;
/* This will be the entry point of the parser. */
parse
:
expression EOF
;
expression
:
expression binOp expression | ID | unOp (expression) | '(' expression ')'
;
binOp
:
('AND' | 'OR')
;
unOp
:
'NOT'
;
ID :
('a'..'z' | 'A'..'Z')+
;
The defined grammar can able to express parse tree without parenthesis but when I input an example with parenthesis for example, (Apple OR Bananana)AND Orange
It is showing MismatchedTokenException
So, It will be really appreciated if someone explains how to define the grammar in order to express the parenthesis.
You forgot to tell ANTLR what to do with whitespace. For example:
WS : [ \t\r\n] -> skip;
Add this and you grammar will work.
As a side note, your grammar has the same precedence for the AND and OR operators. And these operators have higher precedence than NOT. As this goes against conventional rules, I'd advise you to write your expression rule like this instead:
expression
: '(' expression ')' # parenExp
| 'NOT' expression # notExpr
| expression 'AND' expression # andExpr
| expression 'OR' expression # orExpr
| ID # atomExpr
;

Why if I add lambda left recursion occurs?

I am trying to write if syntax by using flex bison and in parser I have a problem
here is a grammar for if syntax in cpp
program : //start rule
|statements;
block:
TOKEN_BEGIN statements ';'TOKEN_END;
reexpression:
| TOKEN_OPERATOR expression;
expression: //x+2*a*3
TOKEN_ID reexpression
| TOKEN_NUMBER reexpression;
assignment:
TOKEN_ID'='expression
statement:
assignment;
statements:
statement';'
| block
| if_statement;
else_statement:
TOKEN_ELSE statements ;
else_if_statement:
TOKEN_ELSE_IF '(' expression ')' statements;
if_statement:
TOKEN_IF '(' expression ')' statements else_if_statement else_statement;
I can't understand why if I replace these three rules , left recursion happen I just add lambda to These rules
else_statement:
|TOKEN_ELSE statements ;
else_if_statement:
|TOKEN_ELSE_IF '(' expression ')' statements;
if_statement:
TOKEN_IF '(' expression ')' statements else_if_statement else_statement;
please help me understand.
There's no lambda or left-recursion involved.
When you add epsilon to the if rules (making the else optional), you get conflicts, because the resulting grammar is ambiguous. This is the classic dangling else ambiguity where when you have TWO ifs with a single else, the else can bind to either if.
IF ( expr1 ) IF ( expr2 ) block1 ELSE block2

yacc shift/reduce conflict when parse C ++ template arguments

I have a YACC grammar for parsing expressions in C++. Here is lite version:
// yacc.y
%token IDENT
%%
expr:
call_expr
| expr '<' call_expr
| expr '>' call_expr
;
call_expr:
IDENT
| '(' expr ')'
| IDENT '<' args '>' '(' args ')'
;
args:
IDENT
| args ',' IDENT
;
%%
When I want to support function call with template arguments, I got a shift/reduce conflict.
When we got input IDENT '<' IDENT, yacc doesn't know whether we should shift or reduce.
I want IDENT '<' args '>' '(' args ')' got higher precedence level than expr '<' call_expr, so I can parse the following exprs.
x < y
f<x>(a,b)
f<x,y>(a,b) < g<x,y>(c,d)
I see C++/C# both support this syntax. Is there any way to solve this problem with yacc?
How do I modify the .y file?
Thank you!
You want the -v option to yacc/bison. It will give you a .output file with all the info about the generated shift/reduce parser. With your grammar, bison gives you:
State 1 conflicts: 1 shift/reduce
:
state 1
4 call_expr: IDENT .
6 | IDENT . '<' args '>' '(' args ')'
'<' shift, and go to state 5
'<' [reduce using rule 4 (call_expr)]
$default reduce using rule 4 (call_expr)
which shows you where the problem is. After seeing an IDENT, when the next token is <, it doesn't know if it should reduce that call_expr (to ultimately match the rule expr: expr '<' call_expr) or if it should shift to match rule 6.
Parsing this with only 1 token lookahead is hard, as you have two distinct meanings of the token < (less-than or open angle bracket), and which is meant depends on the later tokens.
This case is actually even worse, as it is ambiguous, as an input like
a < b > ( c )
is probably a template call with both args lists being singletons, but it could also be
( a < b ) > ( c )
so just unfactoring the grammar won't help. Your best bet is to use a more powerful parsing method, like bison's %glr-parser option or btyacc

Resources