My objective is to create a parser for a small language. It is currently giving me one shift/reduce error.
My CFG is ambiguous somewhere, but I can't figure out where:
prog: PROGRAM beg {$$ = "program" $2;}
| PROGRAM stmt beg {$$ = "program" $2 $3;}
beg: BEG stmt END {$$ = "begin" $2 "end";}
| BEG END {$$ = "begin" "end";}
stmt: beg {$$ = $1;}
| if_stmt {$$ = $1;}/*
| IF expr THEN stmt {$$ = $1 $2 $3 $4;}*/
| WHILE expr beg {$$ = "while" $2 $3;}
| VAR COLEQUALS arithexpr SEMI {$$ = $1 ":=" $3 ";";}
| VAR COLON INTEGER SEMI {$$ = $1 ":" "integer" ";";} /*Declaring an integer */
| VAR COLON REAL SEMI {$$ = $1 ":" "real" ";";} /* Declaring a real */
if_stmt: IF expr THEN stmt {$$ = "if" $2 "then" $4;}
| IF expr THEN stmt ELSE stmt {$$ = "if" $2 "then" $4 "else" $6;}
expr: NOT VAR {$$ = "!" $2;}
| VAR GREATERTHAN arithexpr {$$ = $1 ">" $3;}
| VAR LESSTHAN arithexpr {$$ = $1 "<" $3;}
| VAR GREATERTHANEQUALTO arithexpr {$$ = $1 ">=" $3;}
| VAR LESSTHANEQUALTO arithexpr {$$ = $1 "<=" $3;}
| VAR EQUALS arithexpr {$$ = $1 "==" $3;}
| VAR NOTEQUALS arithexpr {$$ = $1 "!=" $3;}
| arithexpr AND arithexpr {$$ = $1 "&&" $3;}
| arithexpr OR arithexpr {$$ = $1 "||" $3;}
arithexpr: arithexpr PLUS term {$$ = $1 + $3;}
| arithexpr MINUS term {$$ = $1 - $3;}
| term {$$ = $1;}
term: term TIMES factor {$$ = $1 * $3;}
| term DIVIDE factor {$$ = $1 / $3;}
| factor {$$ = $1;}
factor: VAL {$$ = $1;}
The "error" comes from the ambiguity in the else part of if_stmt: stmt can be an if_stmt, and it is not clear which if an else part belongs to. For example, if you write:
if y1 then if y2 then x=1 else x=2
Then the else-part could either belong to the first if or the second one.
This question has been asked in many variations; just search for "if then else shift reduce".
For diagnosis (to confirm that you have also run into the if-then-else shift/reduce problem), you can tell bison to produce an output file with
bison -r all myparser.y
which will produce a file myparser.output, in which you can find for your case:
State 50 conflicts: 1 shift/reduce
....
state 50
11 if_stmt: IF expr THEN stmt . [ELSE, BEG, END]
12 | IF expr THEN stmt . ELSE stmt
ELSE shift, and go to state 60
ELSE [reduce using rule 11 (if_stmt)]
$default reduce using rule 11 (if_stmt)
state 51
...
One solution for this would be to introduce a block statement and only allow blocks as the statements in the if and else parts:
stmt: ...
| blk_stmt
blk_stmt: BEGIN stmt END
if_stmt: IF expr THEN blk_stmt
| IF expr THEN blk_stmt ELSE blk_stmt
For a modified C-like language, this would mean that only
if x1 then {if x2 then {y=1}} else {y=2}
is possible (with { representing the BEGIN token and } representing the END token), thus resolving the ambiguity.
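Applied to the grammar in the question, where the nonterminal beg already plays the role of a block, the same idea might look like the sketch below. It assumes the token and rule names from the question (IF, THEN, ELSE, BEG, END, beg, stmt, expr) and omits the semantic actions; treat it as an illustration rather than a drop-in replacement.

```yacc
/* Sketch only: reuses the question's existing block rule `beg` as the
   body of both branches, so an ELSE can only ever follow a closed block. */
beg:     BEG stmt END
       | BEG END
       ;

if_stmt: IF expr THEN beg
       | IF expr THEN beg ELSE beg
       ;
```

With this shape, a nested if has to be wrapped in BEG ... END, so the parser never has to choose which if an ELSE belongs to.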
Currently, my parser file looks like this:
%{
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int yylex();
void yyerror (const char *s);
%}
%union {
long num;
char* str;
}
%start line
%token print
%token exit_cmd
%token <str> identifier
%token <str> string
%token <num> number
%%
line: assignment {;}
| exit_stmt {;}
| print_stmt {;}
| line assignment {;}
| line exit_stmt {;}
| line print_stmt {;}
;
assignment: identifier '=' number {printf("Assigning var %s to value %ld\n", $1, $3);}
| identifier '=' string {printf("Assigning var %s to value %s\n", $1, $3);}
;
exit_stmt: exit_cmd {exit(0);}
;
print_stmt: print print_expr {;}
;
print_expr: string {printf("%s\n", $1);}
| number {printf("%ld\n", $1);}
;
%%
int main(void)
{
return yyparse();
}
void yyerror (const char *s) {fprintf(stderr, "%s\n", s);}
Giving the input myvar = 3 produces the output Assigning var myvar = 3 to value 3, as expected. However, modifying the code to include an equation grammar rule breaks such assignments.
Equation grammar:
equation: number '+' number {$$ = $1 + $3;}
| number '-' number {$$ = $1 - $3;}
| number '*' number {$$ = $1 * $3;}
| number '/' number {$$ = $1 / $3;}
| number '^' number {$$ = pow($1, $3);}
| equation '+' number {$$ = $1 + $3;}
| equation '-' number {$$ = $1 - $3;}
| equation '*' number {$$ = $1 * $3;}
| equation '/' number {$$ = $1 / $3;}
| equation '^' number {$$ = pow($1, $3);}
;
The assignment grammar is modified accordingly:
assignment: identifier '=' number {printf("Assigning var %s to value %ld\n", $1, $3);}
| identifier '=' equation {printf("Assigning var %s to value %ld\n", $1, $3);}
| identifier '=' string {printf("Assigning var %s to value %s\n", $1, $3);}
;
The equation rule is also given the type num in the parser's first section:
%type <num> equation
Giving the same input, var = 3, freezes the program.
I know this is a long question but can anyone please explain what is going on here?
Also, here's the lexer in case you wanna take a look.
It doesn't "freeze the program". The program is just waiting for more input.
In your first grammar, var = 3 is a complete statement which cannot be extended. But in your second grammar, it could be the beginning of var = 3 + 4, for example. So the parser needs to read another token after the 3. If you want input lines to be terminated by a newline, you will need to modify your scanner to send the newline character as a token, and then modify your grammar to expect a newline token at the end of every statement. If you intend to allow statements to be spread out over several lines, you'll need to be aware of that fact while typing input.
There are several problems with your grammar, and also with your scanner. (Flex doesn't implement non-greedy repetition, for example.) Please look at the examples in the bison and flex manuals.
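A minimal sketch of the newline-terminated variant described above, using the rule names from the question. It assumes the scanner has been changed to return '\n' as a token (for instance with a flex rule such as `\n { return '\n'; }`); the nonterminal statement is a hypothetical helper that is not in the original grammar.

```yacc
/* Every statement must now be followed by a '\n' token, so the parser
   knows that `var = 3` is complete as soon as the newline arrives. */
line:      statement '\n'
         | line statement '\n'
         ;

/* `statement` is a helper introduced for this sketch. */
statement: assignment
         | exit_stmt
         | print_stmt
         ;
```

The parser still reads one token past the 3, but that token is now the newline the user types anyway, so the assignment action runs as soon as the line is entered.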
I'm trying to write a little interpreter with GNU bison.
I wanted to ask if anyone could explain the difference between the directives %right and %left, and where my mistake is in the code below.
%token <flo> FLO
%token <name> NAME
%right '='
%left '+' '-'
%left '*' '/' '%'
%left '&' '|' 'x'
%left NEG NOT LOGIC_NOT
%left '^'
%left ARG
%type <flo> exp
%%
language: /* nothing */
| language statment
statment: '\n'
| exp
| error { yyerrok; }
;
exp: FLO { $$ = $1; }
| NAME '(' ')' { $$ = ycall($1); }
| NAME '(' exp ')' { $$ = ycall($1, $3); }
| NAME '(' exp ',' exp ')' { $$ = ycall($1, $3, $5); }
| NAME '=' exp { $$ = 1; ysetvar($1, $3); }
| NAME %prec VAR { $$ = ygetvar($1); }
| '_' exp %prec ARG { $$ = ygetarg($2, args); }
| '(' exp ')' { $$ = $2; }
/* 1 Operand */
| '-' exp %prec NEG { $$ = - $2; }
| '~' exp %prec NOT { $$ = ~ static_cast<int>($2); }
| '!' exp %prec LOGIC_NOT { $$ = ! static_cast<int>($2); }
/* 2 Operands */
| exp '+' exp { $$ = $1 + $3; }
| exp '-' exp { $$ = $1 - $3; }
| exp '*' exp { $$ = $1 * $3; }
| exp '/' exp { $$ = $1 / $3; }
| exp '%' exp { $$ = static_cast<int>($1) % static_cast<int>($3); }
| exp '^' exp { $$ = pow($1, $3); }
| exp '&' exp { $$ = static_cast<int>($1) & static_cast<int>($3); }
| exp '|' exp { $$ = static_cast<int>($1) | static_cast<int>($3); }
| exp 'x' exp { $$ = static_cast<int>($1) ^ static_cast<int>($3); }
;
Look at the y.output file produced by yacc or bison with the -v argument. The first conflict is in state 5:
State 5
7 exp: NAME . '(' ')'
8 | NAME . '(' exp ')'
9 | NAME . '(' exp ',' exp ')'
10 | NAME . '=' exp
11 | NAME .
'=' shift, and go to state 14
'(' shift, and go to state 15
'(' [reduce using rule 11 (exp)]
$default reduce using rule 11 (exp)
In this case the conflict arises when there's a '(' after a NAME. This is an ambiguity in your grammar: it might be a call expression, or it might be a simple NAME expression followed by a parenthesized expression, because you have no separator between statements in your language.
The second conflict is:
State 13
4 statment: exp .
17 exp: exp . '+' exp
18 | exp . '-' exp
19 | exp . '*' exp
20 | exp . '/' exp
21 | exp . '%' exp
22 | exp . '^' exp
23 | exp . '&' exp
24 | exp . '|' exp
25 | exp . 'x' exp
'+' shift, and go to state 21
'-' shift, and go to state 22
'*' shift, and go to state 23
'/' shift, and go to state 24
'%' shift, and go to state 25
'&' shift, and go to state 26
'|' shift, and go to state 27
'x' shift, and go to state 28
'^' shift, and go to state 29
'-' [reduce using rule 4 (statment)]
$default reduce using rule 4 (statment)
which is essentially the same problem, this time with a '-': the input NAME - NAME might be a single binary subtraction statement, or it might be two statements, a NAME followed by a unary negation.
If you add a separator between statements (such as ;), both of these conflicts would go away.
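As a rough sketch of that suggestion, the statement rules from the question could be changed as follows. The choice of ';' as the terminator is an assumption for illustration (the original grammar used '\n' as an empty statement), and the exp rules are left as they were.

```yacc
language: /* nothing */
        | language statment
        ;

/* Each expression statement now ends with ';', so after a NAME the parser
   can tell `f (x);` (a call) apart from `f; (x);` (two statements). */
statment: ';'                      /* empty statement */
        | exp ';'
        | error ';' { yyerrok; }   /* resynchronise at the next ';' */
        ;
```

Because '(' and a leading '-' can no longer directly follow a finished statement, the reductions that caused the two conflicts are no longer possible on those lookaheads.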
Let's assume I have grammar like this:
expr : expr '-' expr { $$ = $1 - $3; }
| "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
| NUMBER { $$ = $1; };
How can I use the rule
expr : expr '-' expr { $$ = $1 - $3; }
inside
expr : "Function" '(' expr ',' expr ')' { $$ = ($3 - $5) * 2; }
Because the implementation of $1 - $3 is repeated, it would be much better if I could reuse the subtraction already implemented in rule one and only add the multiplication by 2. This is just a basic example, but I have a very big grammar with a lot of repeated calculations.
My parser.y file had no conflicts, but introducing the actions to construct my syntax tree resulted in 12 new shift/reduce conflicts. Do you have any idea why?
Below are my parser.y and the compilation log.
Parser.y:
%{
#include <stdio.h>
#include "main.h"
#include "iks_ast.h"
%}
%union {
struct item_t *symbol;
struct node *tree;
}
%error-verbose
/* Language token declarations */
%token TK_PR_INT
%token TK_PR_FLOAT
%token TK_PR_BOOL
%token TK_PR_CHAR
%token TK_PR_STRING
%token TK_PR_IF
%token TK_PR_THEN
%token TK_PR_ELSE
%token TK_PR_WHILE
%token TK_PR_DO
%token TK_PR_INPUT
%token TK_PR_OUTPUT
%token TK_PR_RETURN
%token TK_OC_LE
%token TK_OC_GE
%token TK_OC_EQ
%token TK_OC_NE
%token TK_OC_AND
%token TK_OC_OR
%token<symbol> TK_LIT_INT
%token<symbol> TK_LIT_FLOAT
%token<symbol> TK_LIT_FALSE
%token<symbol> TK_LIT_TRUE
%token<symbol> TK_LIT_CHAR
%token<symbol> TK_LIT_STRING
%token<symbol> TK_IDENTIFICADOR
%token TOKEN_ERRO
%left TK_OC_OR TK_OC_AND
%left '<' '>' TK_OC_LE TK_OC_GE TK_OC_EQ TK_OC_NE
%left '+' '-'
%left '*' '/'
%nonassoc LOWER_THAN_ELSE
%nonassoc TK_PR_ELSE
%start programa
%type<symbol> decl_var
%type<symbol> cabecalho
%type<tree> programa
%type<tree> def_funcao
%type<tree> expressao
%type<tree> controle_fluxo
%type<tree> comando
%type<tree> chamada_funcao
%type<tree> entrada
%type<tree> saida
%type<tree> lista_expressoes
%type<tree> lista_expressoes_nao_vazia
%type<tree> retorna
%type<tree> bloco_comando
%type<tree> seq_comando
%type<tree> atribuicao
%type<tree> vetor_indexado
%%
programa: decl_global programa {$$ = $2;}
| def_funcao programa {$$ = create_node(IKS_AST_PROGRAMA); $$ = insert_child($$,$1); $1 = AST_link($1,$2);}
| {$$=NULL;}
;
decl_global: decl_var ';'
| decl_vetor ';'
| decl_var {error("Faltando o ';' no final do comando.", $1->line); return IKS_SYNTAX_ERRO;}
;
decl_local: decl_var ';' decl_local
|
;
/* Variable and type declarations */
decl_var
: tipo_var TK_IDENTIFICADOR {$$ = $2;}
;
decl_vetor
: tipo_var TK_IDENTIFICADOR '[' TK_LIT_INT ']'
;
tipo_var: TK_PR_INT
| TK_PR_FLOAT
| TK_PR_BOOL
| TK_PR_CHAR
| TK_PR_STRING
;
/* Function declaration */
def_funcao: cabecalho decl_local bloco_comando {$$ = create_node(IKS_AST_FUNCAO); $$ = insert_child($$,$3);}
| cabecalho decl_local bloco_comando ';' {error("Declaração de função com ';' no final do comando.\n",$1->line); return IKS_SYNTAX_ERRO;}
;
chamada_funcao
: TK_IDENTIFICADOR '(' lista_expressoes ')' {$$ = AST_ident_exp(IKS_AST_CHAMADA_DE_FUNCAO,$1,$3);}
;
cabecalho: decl_var '(' lista_parametros ')' {$$ = $1;}
;
lista_parametros: lista_parametros_nao_vazia
|
;
lista_parametros_nao_vazia: parametro ',' lista_parametros_nao_vazia
| parametro
;
parametro: decl_var
;
comando: bloco_comando {$$ = $1;}
| controle_fluxo {$$ = $1;}
| atribuicao {$$ = $1;}
| entrada {$$ = $1;}
| saida {$$ = $1;}
| retorna {$$ = $1;}
| decl_var ';' {$$ = NULL;}
| chamada_funcao {$$ = $1;}
| ';' {$$ = NULL;}
;
bloco_comando: '{' seq_comando '}' {$$ = create_node(IKS_AST_BLOCO); $$ = insert_child($$,$2);}
;
seq_comando: seq_comando comando {$$ = AST_link($1,$2); }
| /* empty */ {// not sure whether this is needed
$$ = NULL;}
;
/* Variable assignments */
atribuicao: TK_IDENTIFICADOR '=' expressao {$$ = AST_ident_exp(IKS_AST_ATRIBUICAO,$1,$3);}
| vetor_indexado '=' expressao {$$ = create_node(IKS_AST_ATRIBUICAO); $$ = insert_child($$,$1); $$ = insert_child($$,$3); }
;
vetor_indexado
: TK_IDENTIFICADOR '[' expressao ']' { $$ = AST_ident_exp(IKS_AST_VETOR_INDEXADO,$1,$3);}
;
/* Input and output */
entrada
: TK_PR_INPUT TK_IDENTIFICADOR {$$ = create_node(IKS_AST_INPUT); $$ = AST_input($$,$2);}
;
saida
: TK_PR_OUTPUT lista_expressoes_nao_vazia {$$ = create_node(IKS_AST_OUTPUT); $$ = insert_child($$,$2);}
;
lista_expressoes_nao_vazia: expressao ',' lista_expressoes_nao_vazia {$$ = AST_link($1,$3);}
| expressao {$$ = $1;}
;
retorna: TK_PR_RETURN expressao ';' {$$ = create_node(IKS_AST_RETURN); $$ = insert_child($$,$2);}
;
/* Control flow */
controle_fluxo
: TK_PR_IF '(' expressao ')' TK_PR_THEN comando %prec LOWER_THAN_ELSE {$$ = AST_if($3,$6,NULL);}
| TK_PR_IF '(' expressao ')' TK_PR_THEN comando TK_PR_ELSE comando {$$ = AST_if($3,$6,$8);}
| TK_PR_WHILE '(' expressao ')' TK_PR_DO comando {$$ = AST_while(IKS_AST_WHILE_DO,$3,$6);}
| TK_PR_DO comando TK_PR_WHILE '(' expressao ')' {$$ = AST_while(IKS_AST_DO_WHILE,$2,$5);}
;
expressao: TK_IDENTIFICADOR {$$ = AST_ident_literal(IKS_AST_IDENTIFICADOR,$1);}
| TK_IDENTIFICADOR '[' expressao ']' {$$ = AST_ident_exp(IKS_AST_VETOR_INDEXADO,$1,$3);}
| TK_LIT_INT {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| TK_LIT_FLOAT {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| TK_LIT_FALSE {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| TK_LIT_TRUE {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| TK_LIT_CHAR {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| TK_LIT_STRING {$$ = AST_ident_literal(IKS_AST_LITERAL,$1);}
| expressao '+' expressao {$$ = create_node(IKS_AST_ARIM_SOMA); $$ = AST_expression($$,$1,$3); }
| expressao '-' expressao {$$ = create_node(IKS_AST_ARIM_SUBTRACAO); $$ = AST_expression($$,$1,$3); }
| expressao '*' expressao {$$ = create_node(IKS_AST_ARIM_MULTIPLICACAO); $$ = AST_expression($$,$1,$3); }
| expressao '/' expressao {$$ = create_node(IKS_AST_ARIM_DIVISAO); $$ = AST_expression($$,$1,$3); }
| expressao '<' expressao {$$ = create_node(IKS_AST_LOGICO_COMP_L); $$ = AST_expression($$,$1,$3); }
| expressao '>' expressao {$$ = create_node(IKS_AST_LOGICO_COMP_G); $$ = AST_expression($$,$1,$3); }
| '+' expressao {$$ = $2;}
| '-' expressao {$$ = create_node(IKS_AST_ARIM_INVERSAO); $$ = AST_expression($$,$2,NULL);}
| '(' expressao ')' {$$ = $2;}
| expressao TK_OC_LE expressao {$$ = create_node(IKS_AST_LOGICO_COMP_LE); $$ = AST_expression($$,$1,$3);}
| expressao TK_OC_GE expressao {$$ = create_node(IKS_AST_LOGICO_COMP_GE); $$ = AST_expression($$,$1,$3);}
| expressao TK_OC_EQ expressao {$$ = create_node(IKS_AST_LOGICO_COMP_IGUAL); $$ = AST_expression($$,$1,$3);}
| expressao TK_OC_NE expressao {$$ = create_node(IKS_AST_LOGICO_COMP_DIF); $$ = AST_expression($$,$1,$3);}
| expressao TK_OC_AND expressao {$$ = create_node(IKS_AST_LOGICO_E); $$ = AST_expression($$,$1,$3);}
| expressao TK_OC_OR expressao {$$ = create_node(IKS_AST_LOGICO_OU); $$ = AST_expression($$,$1,$3);}
| '!' expressao {$$ = create_node(IKS_AST_LOGICO_COMP_NEGACAO); $$ = AST_expression($$,$2,NULL);}
| chamada_funcao {$$ = $1;}
;
lista_expressoes: lista_expressoes_nao_vazia {$$ = $1;}
| {$$ = NULL;}
;
%%
error(char *s, int line){
printf("Erro na linha %d: %s", line,s);
}
Log:
[ 10%] [BISON][parser] Building parser with bison 3.0.2
parser.y: warning: 12 shift/reduce conflicts [-Wconflicts-sr]
[ 20%] [FLEX][scanner] Building scanner with flex 2.5.35
Scanning dependencies of target main
[ 30%] Building C object CMakeFiles/main.dir/scanner.c.o
scanner.l:11:1: warning: data definition has no type or storage class [enabled by default]
scanner.l: In function ‘yylex’:
scanner.l:84:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:85:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:87:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:89:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:91:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:93:16: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l:95:17: warning: assignment makes pointer from integer without a cast [enabled by default]
scanner.l: In function ‘install_id’:
scanner.l:145:4: warning: return makes integer from pointer without a cast [enabled by default]
[ 40%] Building C object CMakeFiles/main.dir/parser.c.o
[ 50%] Building C object CMakeFiles/main.dir/src/main.c.o
[ 60%] Building C object CMakeFiles/main.dir/src/comp_tree.c.o
Linking C executable main
[100%] Built target main
The shift-reduce conflicts are all the result of a single production and have nothing to do with the semantic actions.
The production is:
expressao : '!' expressao ;
And the problem is that ! does not appear in the precedence list.
Also, your grammar probably doesn't work the way you expect it to, because you don't have a specific precedence declaration to distinguish the unary + and - operators from their binary versions. As a result -a*b will parse as -(a*b) rather than (-a)*b. Of course, for integer arithmetic, these are the same, but it would be cleaner to get the syntax tree correct. You could fix both of those problems at once by adding
%right '!'
after all the %left declarations, and then adding %prec '!' to the end of the unary + and - productions.
I don't know why the problem manifested when you added semantic actions. Perhaps you also added the production for !.
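Concretely, the change described above might look like the sketch below. The declarations and actions are copied from the question; only the %right '!' line and the two %prec '!' annotations are new. It has not been run against the original project.

```yacc
%left TK_OC_OR TK_OC_AND
%left '<' '>' TK_OC_LE TK_OC_GE TK_OC_EQ TK_OC_NE
%left '+' '-'
%left '*' '/'
%right '!'   /* new: declared after the binary operators, so it has higher precedence than all of them */

%%

/* Only the unary productions of expressao are shown. */
expressao: '+' expressao %prec '!' {$$ = $2;}
         | '-' expressao %prec '!' {$$ = create_node(IKS_AST_ARIM_INVERSAO); $$ = AST_expression($$,$2,NULL);}
         | '!' expressao           {$$ = create_node(IKS_AST_LOGICO_COMP_NEGACAO); $$ = AST_expression($$,$2,NULL);}
         ;
```

With these declarations, the unary rules outrank every binary operator, so in -a*b the unary minus is reduced before the multiplication, giving (-a)*b.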
Exploring parsing libraries in Haskell I came across this project: haskell-parser-examples. Running some examples I found a problem with the operator precedence. It works fine when using Parsec:
$ echo "3*2+1" | dist/build/lambda-parsec/lambda-parsec
Op Add (Op Mul (Num 3) (Num 2)) (Num 1)
Num 7
But not with Happy/Alex:
$ echo "3*2+1" | dist/build/lambda-happy-alex/lambda-happy-alex
Op Mul (Num 3) (Op Add (Num 2) (Num 1))
Num 9
Even though the operator precedence seems well-defined. Excerpt from the parser:
%left '+' '-'
%left '*' '/'
%%
Exprs : Expr { $1 }
| Exprs Expr { App $1 $2 }
Expr : Exprs { $1 }
| let var '=' Expr in Expr end { App (Abs $2 $6) $4 }
| '\\' var '->' Expr { Abs $2 $4 }
| Expr op Expr { Op (opEnc $2) $1 $3 }
| '(' Expr ')' { $2 }
| int { Num $1 }
Any hint? (I opened a bug report some time ago, but no response).
[Using ghc 7.6.3, alex 3.1.3, happy 1.19.4]
This appears to be a bug in haskell-parser-examples' usage of token precedence. Happy's operator precedence only affects the rules that use the tokens directly. In the parser we want to apply precedence to the Expr rule, but the only applicable rule,
| Expr op Expr { Op (opEnc $2) $1 $3 }
doesn't use tokens itself, instead relying on opEnc to expand them. If opEnc is inlined into Expr,
| Expr '*' Expr { Op Mul $1 $3 }
| Expr '+' Expr { Op Add $1 $3 }
| Expr '-' Expr { Op Sub $1 $3 }
it should work properly.