Making YACC output an AST (token tree) - parsing

Is it possible to make YACC (or I'm my case MPPG) output an Abstract Syntax Tree (AST).
All the stuff I'm reading suggests its simple to make YACC do this, but I'm struggling to see how you know when to move up a node in the tree as your building it.

Expanding on Hao's point and from the manual, you want to do something like the following:
Assuming you have your abstract syntax tree with function node which creates an object in the tree:
expr : expr '+' expr
{
$$ = node( '+', $1, $3 );
}
This code translates to "When parsing expression with a plus, take the left and right descendants $1/$3 and use them as arguments to node. Save the output to $$ (the return value) of the expression.
$$ (from the manual):
To return a value, the action normally
sets the pseudovariable ``$$'' to some
value.

Have you looked at the manual (search for "parse tree" to find the spot)? It suggests putting node creation in an action with your left and right descendants being $1 and $3, or whatever they may be. In this case, yacc would be moving up the tree on your behalf rather than your doing it manually.

The other answers propose to modify the grammar, this isn't doable when playing with a C++ grammer (several hundred of rules..)
Fortunately, we can do it automatically, by redefining the debug macros.
In this code, we are redefining YY_SYMBOL_PRINT actived with YYDEBUG :
%{
typedef struct tree_t {
struct tree_t **links;
int nb_links;
char* type; // the grammar rule
};
#define YYDEBUG 1
//int yydebug = 1;
tree_t *_C_treeRoot;
%}
%union tree_t
%start program
%token IDENTIFIER
%token CONSTANT
%left '+' '-'
%left '*' '/'
%right '^'
%%
progam: exprs { _C_treeRoot = &$1.t; }
|
| hack
;
exprs:
expr ';'
| exprs expr ';'
;
number:
IDENTIFIER
| '-' IDENTIFIER
| CONSTANT
| '-' CONSTANT
;
expr:
number
| '(' expr ')'
| expr '+' expr
| expr '-' expr
| expr '*' expr
| expr '/' expr
| expr '^' expr
;
hack:
{
// called at each reduction in YYDEBUG mode
#undef YY_SYMBOL_PRINT
#define YY_SYMBOL_PRINT(A,B,C,D) \
do { \
int n = yyr2[yyn]; \
int i; \
yyval.t.nb_links = n; \
yyval.t.links = malloc(sizeof *yyval.t.links * yyval.t.nb_links);\
yyval.t.str = NULL; \
yyval.t.type = yytname[yyr1[yyn]]; \
for (i = 0; i < n; i++) { \
yyval.t.links[i] = malloc(sizeof (YYSTYPE)); \
memcpy(yyval.t.links[i], &yyvsp[(i + 1) - n], sizeof(YYSTYPE)); \
} \
} while (0)
}
;
%%
#include "lexer.c"
int yyerror(char *s) {
printf("ERROR : %s [ligne %d]\n",s, num_ligne);
return 0;
}
int doParse(char *buffer)
{
mon_yybuffer = buffer;
tmp_buffer_ptr = buffer;
tree_t *_C_treeRoot = NULL;
num_ligne = 1;
mon_yyptr = 0;
int ret = !yyparse();
/////////****
here access and print the tree from _C_treeRoot
***///////////
}
char *tokenStrings[300] = {NULL};
char *charTokenStrings[512];
void initYaccTokenStrings()
{
int k;
for (k = 0; k < 256; k++)
{
charTokenStrings[2*k] = (char)k;
charTokenStrings[2*k+1] = 0;
tokenStrings[k] = &charTokenStrings[2*k];
}
tokenStrings[CONSTANT] = "CONSTANT";
tokenStrings[IDENTIFIER] = "IDENTIFIER";
extern char space_string[256];
for (k = 0; k < 256; k++)
{
space_string[k] = ' ';
}
}
the leafs are created just before the RETURN in the FLEX lexer

Related

FLEX/YACC program not behaving as expected : can't grab int value from sequence of ints

I am trying to build a parser that takes a list of strings in the following format and performs either an addition or multiplication of all of its elements :
prod 5-6_
sum _
sum 5_
sum 5-6-7_
$
Should print the following to the screen :
prod = 30
sum = 0
sum = 5
sum = 18
What I am actually getting as output is this :
prod = 0
sum = 0
sum = 5
sum = 5
My lex file looks like this :
%{
#include <iostream>
#include "y.tab.h"
using namespace std;
extern "C" int yylex();
%}
%option yylineno
digit [0-9]
integer {digit}+
operator "sum"|"prod"
%%
{integer} { return number; }
{operator} { return oper; }
"-" { return '-'; }
"_" { return '_'; }
"$" { return '$'; }
\n { ; }
[\t ]+ { ; }
. { cout << "unknown char" << endl; }
%%
and my yacc file looks like this :
%token oper
%token number
%token '-'
%token '_'
%token '$'
%start valid
%{
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;
#define YYSTYPE int
extern FILE *yyin;
extern char yytext[];
extern "C" int yylex();
int yyparse();
extern int yyerror(char *);
char op;
%}
%%
valid : expr_seq endfile {}
| {}
;
expr_seq : expr {}
| expr_seq expr {}
;
expr : op sequence nl {if (op == '+') cout << "sum = " ; else cout << "prod = ";}
| op nl {if (op == '+') cout << "sum = 0"; else cout <<"prod = 1";}
;
op : oper { if (yytext[0] == 's') op = '+'; else op = '*';}
;
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
nl : '_' { cout << endl;}
;
endfile : '$' {}
;
%%
int main(int argc, char *argv[])
{
++argv, --argc;
if(argc > 0) yyin = fopen(argv[0], "r");
else yyin = stdin;
yyparse();
return 0;
}
int yyerror(char * msg)
{
extern int yylineno;
cerr << msg << "on line # " << yylineno << endl;
return 0;
}
My reasoning for the yacc logic is as follows :
a file is valid only if it contains a sequence of expressions followed by the endfile symbol.
a sequence of expressions is a single expression or several expressions.
an expression is either an operator followed by a new line, OR an operator, followed by a list of numbers, followed by a new line symbol.
an operator is either 'sum' or 'prod'
a list of numbers is either a number or several numbers separated by the '-' symbol.
From my perspective this should work, but for some reason it doesn't interpret the sequence of numbers properly after the first element. Any tips would be helpful.
Thanks
You must not use yytext in your yacc actions. yytext is only valid during a scanner action, and the parser often reads ahead to the next token. (In fact, yacc always reads the next token. Bison sometimes doesn't, but it's not always easily predictable.)
You can associate a semantic value with every token (and non-terminal), and you can reference these semantic values using $1, $2, etc. in your yacc actions. You can even associate semantic values of different types to different grammar symbols. And if you use bison -- and you probably are using bison -- you can give grammar symbols names to make it easier to refer to their semantic values.
This is all explained in depth, with examples, in the bison manual.
The solution that worked was simply to change the following lines :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
to this :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + atoi(yytext); else $$ = $1 * atoi(yytext);}
;

Parser to verify declarations of type int and float in C language

I'm trying to write a parser to verify the following declarations of type int and float in C language.
variables declarations, pointer variable declarations, array of any dimensions
float a , b , r = 5, area = r * r , * b;
int a , b , c , ** p ;
int x , mat [2][3];
This is my lex file
%{
#include "y.tab.h"
extern int yylval;
%}
%%
"int" return INT;
"float" return FLOAT;
[0-9]+ return NUM;
[_|a-z|A-Z]([_|a-z|A-Z|0-9])*{1,255} return NAME;
[+\-*/] return op;
[ \t\n];
. return yytext[0];
%%
This is my yacc file
%{
#include<stdio.h>
int yylex() ;
int yyerror();
%}
%token NUM NAME op INT FLOAT
%%
stmt_list: stmt | stmt_list stmt;
stmt: type id_list ';' { printf("Valid Declaration\n"); };
type: INT | FLOAT;
id_list: id ',' id_list | id ;
id: NAME'='expr | expr;
expr: expr op expr | POINT expr | expr MATRIX | '(' expr')' | NAME;
MATRIX: '[' NUM ']' | '[' NUM ']' MATRIX ;
POINT: '*' | '*'POINT;
%%
int main(){
yyparse();
return 0;
}
int yyerror(){
printf("Invalid Declaration\n");
return -1;
}
Even if I enter "int a;" as input, I get "Invalid Declaration". I'm not able to figure out what I'm doing wrong.

Conflicts in Parser for Propositional logic with IF-THEN-ELSE ternary operator

I want to implement the Parser for proposition logic which has the following operators in decreasing order of precedence:
NOT p
p AND q
p OR q
IF p THEN q
p IFF q
IF p THEN q ELSE r
The main issue is with the IF-THEN-ELSE operator. Without it, I am able to write the grammar properly. Presently my yacc file looks like
%term
PARSEPROG | AND | NOT | OR | IF | THEN | ELSE | IFF | LPAREN | RPAREN | ATOM of string | SEMICOLON | EOF
%nonterm
start of Absyn.program | EXP of Absyn.declaration
%start start
%eop EOF SEMICOLON
%pos int
%verbose
%right ELSE
%right IFF
%right THEN
%left AND OR
%left NOT
%name Fol
%noshift EOF
%%
start : PARSEPROG EXP (Absyn.PROGRAM(EXP))
EXP: ATOM ( Absyn.LITERAL(ATOM) )
| LPAREN EXP RPAREN (EXP)
| EXP AND EXP ( Absyn.CONJ(EXP1, EXP2) )
| EXP OR EXP ( Absyn.DISJ(EXP1, EXP2) )
| IF EXP THEN EXP ELSE EXP ( Absyn.IFTHENELSE(EXP1, EXP2, EXP3) )
| IF EXP THEN EXP ( Absyn.IMPLI(EXP1, EXP2) )
| EXP IFF EXP ( Absyn.BIIMPLI(EXP1, EXP2) )
| NOT EXP ( Absyn.NEGATION(EXP) )
But I don't seem to get the correct idea how to eliminate reduce-shift conflicts. Some examples of correct parsing are:
IF a THEN IF b THEN c________a->(b->c)
IF a THEN IF b THEN c ELSE d IFF e OR f_______IFTHENELSE(a,b->c,d<=>e/\f)
Any help/pointers will be really helpful. Thanks.
Making my Yacc sit up and beg
I'm more convinced than ever that the correct approach here is a GLR grammar, if at all possible. However, inspired by #Kaz, I produced the following yacc/bison grammar with an LALR(1) grammar (not even using precedence declarations).
Of course, it cheats, since the problem cannot be solved with an LALR(1) grammar. At appropriate intervals, it walks the constructed tree of IF THEN and IF THEN ELSE expressions, and moves the ELSE clauses as required.
Nodes which need to be re-examined for possible motion are given the AST nodetype IFSEQ and the ELSE clauses are attached with the traditional tightest match grammar, using a classic matched-if/unmatched-if grammar. A fully-matched IF THEN ELSE clause does not need to be rearranged; the tree rewrite will apply to the expression associated with the first ELSE whose right-hand operand is unmatched (if there is one). Keeping the fully-matched prefix of an IF expression separate from the tail which needs to be rearranged required almost-duplicating some rules; the almost-duplicated rules differ in that their actions directly produce TERNARY nodes instead if IFSEQ nodes.
In order to correctly answer the question, it would also be necessary to rearrange some IFF nodes, since the IFF binds more weakly than the THEN clause and more tightly than the ELSE clause. I think this means:
IF p THEN q IFF IF r THEN s ==> ((p → q) ↔ (r → s))
IF p THEN q IFF r ELSE s IFF t ==> (p ? (q ↔ r) : (s ↔ t))
IF p THEN q IFF IF r THEN s ELSE t IFF u ==> (p ? (q ↔ (r → s)) : (t ↔ u))
although I'm not sure that is what is being asked for (particularly the last one) and I really don't think it's a good idea. In the grammar below, if you want IFF to apply to an IF p THEN q subexpression, you will have to use parentheses; IF p THEN q IFF r produces p → (q ↔ r) and p IFF IF q THEN r is a syntax error.
Frankly, I think this whole thing would be easier using arrows for conditionals and biconditionals (as in the glosses above), and using IF THEN ELSE only for ternary selector expressions (written above with C-style ? : syntax, which is another possibility). That will generate far fewer surprises. But it's not my language.
One solution for the biconditional operator with floating precedence would be to parse in two passes. The first pass would only identify the IF p THEN q operators without an attached ELSE, using a mechanism similar to the one proposed here, and change them to p -> q by deleting the IF and changing the spelling of THEN. Other operators would not be parsed and parentheses would be retained. It would then feed to resulting token stream into a second LALR parser with a more traditional grammar style. I might get around to coding that only because I think that two-pass bison parsers are occasionally useful and there are few examples floating around.
Here's the tree-rewriting parser. I apologise for the length:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(const char* msg);
int yylex(void);
typedef struct Node Node;
enum AstType { ATOM, NEG, CONJ, DISJ, IMPL, BICOND, TERNARY,
IFSEQ
};
struct Node {
enum AstType type;
union {
const char* atom;
Node* child[3];
};
};
Node* node(enum AstType type, Node* op1, Node* op2, Node* op3);
Node* atom(const char* name);
void node_free(Node*);
void node_print(Node*, FILE*);
typedef struct ElseStack ElseStack;
struct ElseStack {
Node* action;
ElseStack* next;
};
ElseStack* build_else_stack(Node*, ElseStack*);
ElseStack* shift_elses(Node*, ElseStack*);
%}
%union {
const char* name;
struct Node* node;
}
%token <name> T_ID
%token T_AND "and"
T_ELSE "else"
T_IF "if"
T_IFF "iff"
T_NOT "not"
T_OR "or"
T_THEN "then"
%type <node> term conj disj bicond cond mat unmat tail expr
%%
prog : %empty | prog stmt;
stmt : expr '\n' { node_print($1, stdout); putchar('\n'); node_free($1); }
| '\n'
| error '\n'
term : T_ID { $$ = atom($1); }
| "not" term { $$ = node(NEG, $2, NULL, NULL); }
| '(' expr ')' { $$ = $2; }
conj : term
| conj "and" term { $$ = node(CONJ, $1, $3, NULL); }
disj : conj
| disj "or" conj { $$ = node(DISJ, $1, $3, NULL); }
bicond: disj
| disj "iff" bicond { $$ = node(BICOND, $1, $3, NULL); }
mat : bicond
| "if" expr "then" mat "else" mat
{ $$ = node(IFSEQ, $2, $4, $6); }
unmat: "if" expr "then" mat
{ $$ = node(IFSEQ, $2, $4, NULL); }
| "if" expr "then" unmat
{ $$ = node(IFSEQ, $2, $4, NULL); }
| "if" expr "then" mat "else" unmat
{ $$ = node(IFSEQ, $2, $4, $6); }
tail : "if" expr "then" mat
{ $$ = node(IFSEQ, $2, $4, NULL); }
| "if" expr "then" unmat
{ $$ = node(IFSEQ, $2, $4, NULL); }
cond : bicond
| tail { shift_elses($$, build_else_stack($$, NULL)); }
| "if" expr "then" mat "else" cond
{ $$ = node(TERNARY, $2, $4, $6); }
expr : cond
%%
/* Walk the IFSEQ nodes in the tree, pushing any
* else clause found onto the else stack, which it
* returns.
*/
ElseStack* build_else_stack(Node* ifs, ElseStack* stack) {
if (ifs && ifs->type != IFSEQ) {
stack = build_else_stack(ifs->child[1], stack);
if (ifs->child[2]) {
ElseStack* top = malloc(sizeof *top);
*top = (ElseStack) { ifs->child[2], stack };
stack = build_else_stack(ifs->child[2], top);
}
}
return stack;
}
/* Walk the IFSEQ nodes in the tree, attaching elses from
* the else stack.
* Pops the else stack as it goes, freeing popped
* objects, and returns the new top of the stack.
*/
ElseStack* shift_elses(Node* n, ElseStack* stack) {
if (n && n->type == IFSEQ) {
if (stack) {
ElseStack* top = stack;
stack = shift_elses(n->child[2],
shift_elses(n->child[1], stack->next));
n->type = TERNARY;
n->child[2] = top;
free(top);
}
else {
shift_elses(n->child[2],
shift_elses(n->child[1], NULL));
n->type = IMPL;
n->child[2] = NULL;
}
}
return stack;
}
Node* node(enum AstType type, Node* op1, Node* op2, Node* op3) {
Node* rv = malloc(sizeof *rv);
*rv = (Node){type, .child = {op1, op2, op3}};
return rv;
}
Node* atom(const char* name) {
Node* rv = malloc(sizeof *rv);
*rv = (Node){ATOM, .atom = name};
return rv;
}
void node_free(Node* n) {
if (n) {
if (n->type == ATOM) free((char*)n->atom);
else for (int i = 0; i < 3; ++i) node_free(n->child[i]);
free(n);
}
}
const char* typename(enum AstType type) {
switch (type) {
case ATOM: return "ATOM";
case NEG: return "NOT" ;
case CONJ: return "CONJ";
case DISJ: return "DISJ";
case IMPL: return "IMPL";
case BICOND: return "BICOND";
case TERNARY: return "TERNARY" ;
case IFSEQ: return "IF_SEQ";
}
return "**BAD NODE TYPE**";
}
void node_print(Node* n, FILE* out) {
if (n) {
if (n->type == ATOM)
fputs(n->atom, out);
else {
fprintf(out, "(%s", typename(n->type));
for (int i = 0; i < 3 && n->child[i]; ++i) {
fputc(' ', out); node_print(n->child[i], out);
}
fputc(')', out);
}
}
}
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}
int main(int argc, char** argv) {
return yyparse();
}
The lexer is almost trivial. (This one uses lower-case keywords because my fingers prefer that, but it's trivial to change.)
%{
#include "ifelse.tab.h"
%}
%option noinput nounput noyywrap nodefault
%%
and { return T_AND; }
else { return T_ELSE; }
if { return T_IF; }
iff { return T_IFF; }
not { return T_NOT; }
or { return T_OR; }
then { return T_THEN; }
[[:alpha:]]+ { yylval.name = strdup(yytext);
return T_ID; }
([[:space:]]{-}[\n])+ ;
\n { return '\n'; }
. { return *yytext;}
As written, the parser/lexer reads a line at a time, and prints the AST for each line (so multiline expressions aren't allowed). I hope it's clear how to change it.
A relatively easy way to deal with this requirement is to create a grammar which over-generates, and then reject the syntax we don't want using semantics.
Concretely, we use a grammar like this:
expr : expr AND expr
| expr OR expr
| expr IFF expr
| IF expr THEN expr
| expr ELSE expr /* generates some sentences we don't want! */
| '(' expr ')'
| ATOM
;
Note that ELSE is just an ordinary low precedence operator: any expression can be followed by ELSE and another expression. But in the semantic rule, we implement a check that the left side of ELSE is an IF expression. If not, then we raise an error.
This approach is not only easy to implement, but easy to document for the end-users and consequently easy to understand and use. The end user can accept the simple theory that ELSE is just another binary operator with a very low precedence, along with a rule which rejects it when it's not combined with IF/THEN.
Here is a test run from a complete program I wrote (using classic Yacc, in C):
$ echo 'a AND b OR c' | ./ifelse
OR(AND(a, b), c)
$ echo 'a OR b AND c' | ./ifelse
OR(a, AND(b, c))
$ echo 'IF a THEN b' | ./ifelse
IF(a, b)
Ordinary single IF/ELSE does what we want:
$ echo 'IF a THEN b ELSE c' | ./ifelse
IFELSE(a, b, c)
The key thing that you're after:
$ echo 'IF a THEN IF x THEN y ELSE c' | ./ifelse
IFELSE(a, IF(x, y), c)
correctly, the ELSE goes with the outer IF. Here is the error case with bad ELSE:
$ echo 'a OR b ELSE c' | ./ifelse
error: ELSE must pair with IF
<invalid>
Here is parentheses to force the usual "else with closest if" behavior:
$ echo 'IF a THEN (IF x THEN y ELSE c)' | ./ifelse
IF(a, IFELSE(x, y, c))
The program shows what parse it is using by building an AST and then walking it to print it in prefix F(X, Y) syntax. (For which as a Lisp programmer, I had to hold back the gagging reflex a little bit).
The AST structure is also what allows the ELSE rule to detect whether its left argument is an expression of the correct kind.
Note: You might want the following to be handled, but it isn't:
$ echo 'IF a THEN IF x THEN y ELSE z ELSE w' | ./ifelse
error: ELSE must pair with IF
<invalid>
The issue here is that the ELSE w is being paired with an IFELSE expression.
A more sophisticated approach is possible that might be interesting to explore. The parser can treat ELSE as an ordinary binary operator and generate the AST that way. Then a whole separate walk can check the tree for valid ELSE usage and transform it as necessary. Or perhaps we can play here with the associativity of ELSE and treat cascading ELSE in the parser action in some suitable way.
The complete source code, which I saved in a file called ifelse.y and built using:
$ yacc ifelse.y
$ gcc -o ifelse y.tab.c
is here:
%{
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
typedef struct astnode {
int op;
struct astnode *left, *right;
char *lexeme;
} astnode;
void yyerror(const char *s)
{
fprintf(stderr, "error: %s\n", s);
}
void *xmalloc(size_t size)
{
void *p = malloc(size);
if (p)
return p;
yyerror("out of memory");
abort();
}
char *xstrdup(char *in)
{
size_t sz = strlen(in) + 1;
char *out = xmalloc(sz);
return strcpy(out, in);
}
astnode *astnode_cons(int op, astnode *left, astnode *right, char *lexeme)
{
astnode *a = xmalloc(sizeof *a);
a->op = op;
a->left = left;
a->right = right;
a->lexeme = lexeme;
return a;
}
int yylex(void);
astnode *ast;
%}
%union {
astnode *node;
char *lexeme;
int none;
}
%token<none> '(' ')'
%token<lexeme> ATOM
%left<none> ELSE
%left<none> IF THEN
%right<none> IFF
%left<none> OR
%left<none> AND
%type<node> top expr
%%
top : expr { ast = $1; }
expr : expr AND expr
{ $$ = astnode_cons(AND, $1, $3, 0); }
| expr OR expr
{ $$ = astnode_cons(OR, $1, $3, 0); }
| expr IFF expr
{ $$ = astnode_cons(IFF, $1, $3, 0); }
| IF expr THEN expr
{ $$ = astnode_cons(IF, $2, $4, 0); }
| expr ELSE expr
{ if ($1->op != IF)
{ yyerror("ELSE must pair with IF");
$$ = 0; }
else
{ $$ = astnode_cons(ELSE, $1, $3, 0); } }
| '(' expr ')'
{ $$ = $2; }
| ATOM
{ $$ = astnode_cons(ATOM, 0, 0, $1); }
;
%%
int yylex(void)
{
int ch;
char tok[64], *te = tok + sizeof(tok), *tp = tok;
while ((ch = getchar()) != EOF) {
if (isalnum((unsigned char) ch)) {
if (tp >= te - 1)
yyerror("token overflow");
*tp++ = ch;
} else if (isspace(ch)) {
if (tp > tok)
break;
} else if (ch == '(' || ch == ')') {
if (tp == tok)
return ch;
ungetc(ch, stdin);
break;
} else {
yyerror("invalid character");
}
}
if (tp > tok) {
yylval.none = 0;
*tp++ = 0;
if (strcmp(tok, "AND") == 0)
return AND;
if (strcmp(tok, "OR") == 0)
return OR;
if (strcmp(tok, "IFF") == 0)
return IFF;
if (strcmp(tok, "IF") == 0)
return IF;
if (strcmp(tok, "THEN") == 0)
return THEN;
if (strcmp(tok, "ELSE") == 0)
return ELSE;
yylval.lexeme = xstrdup(tok);
return ATOM;
}
return 0;
}
void ast_print(astnode *a)
{
if (a == 0) {
fputs("<invalid>", stdout);
return;
}
switch (a->op) {
case ATOM:
fputs(a->lexeme, stdout);
break;
case AND:
case OR:
case IF:
case IFF:
switch (a->op) {
case AND:
fputs("AND(", stdout);
break;
case OR:
fputs("OR(", stdout);
break;
case IF:
fputs("IF(", stdout);
break;
case IFF:
fputs("IFF(", stdout);
break;
}
ast_print(a->left);
fputs(", ", stdout);
ast_print(a->right);
putc(')', stdout);
break;
case ELSE:
fputs("IFELSE(", stdout);
ast_print(a->left->left);
fputs(", ", stdout);
ast_print(a->left->right);
fputs(", ", stdout);
ast_print(a->right);
putc(')', stdout);
break;
}
}
int main(void)
{
yyparse();
ast_print(ast);
puts("");
return 0;
}

Bison syntax error

I recently started learning bison and I already hit a wall. The manual sections are a little bit ambiguous, so I guess an error was to be expected. The code below is the first tutorial from the official manual - The Reverse Polish Notation Calculator, saved in a single file - rpcalc.y.
/* Reverse polish notation calculator */
%{
#include <stdio.h>
#include <math.h>
#include <ctype.h>
int yylex (void);
void yyerror (char const *);
%}
%define api.value.type {double}
%token NUM
%% /* Grammar rules and actions follow. */
input:
%empty
| input line
;
line:
'\n'
| exp '\n' {printf ("%.10g\n", $1);}
;
exp:
NUM {$$ = $1; }
| exp exp '+' {$$ = $1 + $2; }
| exp exp '-' {$$ = $1 - $2; }
| exp exp '*' {$$ = $1 * $2; }
| exp exp '/' {$$ = $1 / $2; }
| exp exp '^' {$$ = pow ($1, $2); }
| exp 'n' {$$ = -$1; }
;
%%
/* The lexical analyzer */
int yylex (void)
{
int c;
/* Skip white space */
while((c = getchar()) == ' ' || c == '\t')
continue;
/* Process numbers */
if(c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", $yylval);
return NUM;
}
/* Return end-of-imput */
if (c == EOF)
return 0;
/* Return a single char */
return c;
}
int main (void)
{
return yyparse ();
}
void yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
Executing bison rpcalc.y in cmd returns the following error:
rpcalc.y:11.24-31: syntax error, unexpected {...}
What seems to be the problem?
The fault is caused by you using features that are new to the 3.0 version of bison, whereas you have an older version of bison installed. If you are unable to upgrade to version 3.0, it is an easy change to convert the grammar to using the features of earlier versions of bison.
The %define api.value.type {double} can be changed to a %type command, and the %empty command removed. The resulting bison program would be:
/* Reverse polish notation calculator */
%{
#include <stdio.h>
#include <math.h>
#include <ctype.h>
int yylex (void);
void yyerror (char const *);
%}
%type <double> exp
%token <double> NUM
%% /* Grammar rules and actions follow. */
input:
| input line
;
line:
'\n'
| exp '\n' {printf ("%.10g\n", $1);}
;
exp:
NUM {$$ = $1; }
| exp exp '+' {$$ = $1 + $2; }
| exp exp '-' {$$ = $1 - $2; }
| exp exp '*' {$$ = $1 * $2; }
| exp exp '/' {$$ = $1 / $2; }
| exp exp '^' {$$ = pow ($1, $2); }
| exp 'n' {$$ = -$1; }
;
%%
/* The lexical analyzer */
int yylex (void)
{
int c;
/* Skip white space */
while((c = getchar()) == ' ' || c == '\t')
continue;
/* Process numbers */
if(c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", $yylval);
return NUM;
}
/* Return end-of-imput */
if (c == EOF)
return 0;
/* Return a single char */
return c;
}
int main (void)
{
return yyparse ();
}
void yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
This runs in a wider range of bison versions.

Error while retrieving value from symbol table

I'm trying to implement a simple calculator using Flex and Bison. I'm running into problems in the Bison stage, wherein I can't figure out the way in which the value of the variable can be retrieved from the symbol table and assigned to $$.
The lex file:
%{
#include <iostream>
#include <string.h>
#include "calc.tab.h"
using namespace std;
void Print();
int count = 0;
%}
%%
[ \t\n]+ ;
"print" {Print();}
"exit" {
exit(EXIT_SUCCESS);
}
[0-9]+ {
yylval.FLOAT = atof(yytext);
return (NUMBER);
count++;
}
[a-z][_a-zA-Z0-9]* {
yylval.NAME = yytext;
return (ID);
}
. {
return (int)yytext[0];
}
%%
void Print()
{
cout << "Printing ST..\n";
}
int yywrap()
{
return 0;
}
The Bison file:
%{
#include <iostream>
#include <string.h>
#include "table.h"
extern int count;
int yylex();
int yyerror(const char *);
int UpdateSymTable(float, char *, float);
using namespace std;
%}
%union
{
float FLOAT;
char *NAME;
}
%token NUMBER
%token ID
%type <FLOAT> NUMBER
%type <NAME> ID
%type <FLOAT> expr
%type <FLOAT> E
%left '*'
%left '/'
%left '+'
%left '-'
%right '='
%%
E: expr {cout << $$ << "\n";}
expr: NUMBER {$$ = $1;}
| expr '+' expr {$$ = $1 + $3;}
| expr '-' expr {$$ = $1 - $3;}
| expr '*' expr {$$ = $1 * $3;}
| expr '/' expr {$$ = $1 / $3;}
| ID '=' expr {
int index = UpdateSymTable($$, $1, $3);
$$ = st[index].number = $3; //The problem is here
}
%%
int yyerror(const char *msg)
{
cout << "Error: "<<msg<<"\n";
}
int UpdateSymTable(float doll_doll, char *doll_one, float doll_three)
{
int number1 = -1;
for(int i=0;i<count;i++)
{
if(!strcmp(doll_one, st[i].name) == 0)
{
strcpy(st[i].name, doll_one);
st[i].number = doll_three;
number1 = i;
}
else if(strcmp(doll_one, st[i].name) == 0)
{
number1 = i;
}
}
return number1;
}
int main()
{
yyparse();
}
The symbol table:
struct st
{
float number;
char name[25];
}st[25];
The output I'm getting is:
a = 20
c = a+3
20
Error: syntax error
I would really appreciate it if someone told me what is going wrong. I'm trying since a long time, and I haven't been able to resolve the error.
The syntax error is the result of your grammar only accepting a single expr rather than a sequence of exprs. See, for example, this question.
One of the problems with your symbol table lookup is that you incorrectly return the value yytext as your semantic value, instead of making a copy. See, for example, this question.
However, your UpdateSymTable functions has quite a few problems, starting with the fact that the names you chose for parameters are meaningless, and furthermore the first parameter ("doll_doll") is never used. I don't know what you intended to test with !strcmp(doll_one, st[i].name) == 0 but whatever it was, there must be a simpler way of expressing it. In any case, the logic is incorrect. I'd suggest writing some simple test programs (without bison and flex) to let you debug the symbol table handling. And/or talk to your lab advisor, assuming you have one.
Finally, (of what I noticed) your precedence relations are not correct. First, they are reversed: the operator which binds least tightly (assignment) should come first. Second, it is not the case that + has precedence over - , or vice versa; the two operators have the same precedence. Similarly with * and /. You could try reading the precedence chapter of the bison manual if you don't have lecture notes or other information.

Resources