flex and bison issue for real numbers

I have been using flex and bison for making a small calculator. My files are the following:
bisonFile.y
%{
#include <stdio.h>
%}
/* declare tokens */
%token NUMBER
%token ADD SUB MUL DIV ABS
%token EOL
%%
calclist: /* nothing */
| calclist exp EOL { printf("= %d\n", $2); }
;
exp: factor
| exp ADD factor { $$ = $1 + $3; }
| exp SUB factor { $$ = $1 - $3; }
;
factor: term
| factor MUL term { $$ = $1 * $3; }
| factor DIV term { $$ = $1 / $3; }
;
term: NUMBER
| ABS term { $$ = $2 >= 0? $2 : - $2; }
;
%%
main(int argc, char **argv)
{
yyparse();
}
yyerror(char *s)
{
fprintf(stderr, "error: %s\n", s);
}
flexFile.l
%{
# include "f5.tab.h"
int yylval;
%}
/* token recognition and printing */
%{
int yylval;
%}
%option noyywrap
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[0-9]+("."[0-9]+)? { yylval = atoi(yytext); return NUMBER; } //part added
\n { return EOL; }
[ \t] { /* ignore whitespace */ }
. { printf("Mystery character %c\n", *yytext); }
%%
My program works fine with integers, and it also recognizes real numbers, but when I print the result of an operation it always comes back as an integer. Why is that?
Thanks

Your use of atoi in the scanner action converts the string to an integer.
Using atof will convert it to a floating point number.
If you want to separate the two, you'll need to change the matching rule for integers, and add one for floating point.
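To make the difference concrete, here is a minimal C sketch of the two conversions applied to the same lexeme. The helper names are hypothetical and exist only for illustration; the third helper shows that the type of the destination variable matters as well, which is why an int yylval would still lose the fraction.

```c
#include <stdlib.h>

/* Mirrors the original action: atoi() stops at the '.', so the fraction is lost. */
static int lexeme_as_int(const char *text)
{
    return atoi(text);
}

/* Mirrors the suggested fix: atof() parses the whole decimal literal. */
static double lexeme_as_double(const char *text)
{
    return atof(text);
}

/* Even with atof(), storing the result in an int truncates it again. */
static int lexeme_stored_in_int(const char *text)
{
    int stored = (int)atof(text);
    return stored;
}
```

For the lexeme "2.5", the first and third helpers yield 2, while the second yields 2.5.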

Change "%d" → "%f" in the file “bisonFile.y”. This uses a floating point format for printing the result. The fixed line should read:
| calclist exp EOL { printf("= %f\n", $2); }
In the file “flexFile.l”, remove both definitions of int yylval. Bison already emits
YYSTYPE yylval;
in the generated parser. YYSTYPE is the type of the semantic values. Because you want a floating-point calculator, it should be double. Note that YYSTYPE defaults to int. To change that, YYSTYPE must be defined when compiling the C sources generated by bison and flex (see below).
Finally, as already stated by MIS, replace atoi() → atof(). The edited line in flexFile.l should read:
[0-9]+("."[0-9]+)? { yylval = atof(yytext); return NUMBER; }
For a novice the dependencies between flex and bison sources might be confusing. A minimal Makefile documents how the example can be compiled. Line 2 sets the semantic type for the scanner and the parser consistently:
calc: calc.o l.o
calc.o l.o: CFLAGS+=-DYYSTYPE=double
l.o: l.c f5.tab.h
calc.c f5.tab.h: bisonFile.y
	bison -o calc.c --defines=f5.tab.h $^
l.c: flexFile.l f5.tab.h
	flex -o $@ $<
clean::
	$(RM) calc calc.o calc.c f5.tab.h l.o l.c
That’ll do the trick.
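As a rough illustration of why the -DYYSTYPE=double flag matters, the following C sketch imitates the fallback in simplified form: the semantic type is int unless YYSTYPE was defined beforehand. The divide helper is hypothetical, and the real generated parser uses a typedef rather than this macro.

```c
/* Simplified stand-in for the generated parser's fallback:
   if nothing defined YYSTYPE earlier (e.g. -DYYSTYPE=double), use int. */
#ifndef YYSTYPE
#define YYSTYPE int
#endif

/* Hypothetical helper: the same source computes in whatever type YYSTYPE is. */
static YYSTYPE divide(YYSTYPE a, YYSTYPE b)
{
    return a / b;
}
```

Compiled as-is, divide(7, 2) is integer division. Compiled with -DYYSTYPE=double, the same call divides in floating point.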

Related

How do I use %empty correctly?

I'm new to bison/flex and I am trying to recognize patterns of 1-3 words of input.
My .l recognizes WORD as any series of lowercase characters, treats % and $ characters as tokens of their own ascii value, ignores whitespace, and recognizes everything else as an ERR token.
MISC [\%\$]
%%
[a-z]+ { yylval.WORD = yytext; return WORD; }
[ \t\r\n]+ {} //ignore whitespace
{MISC} return (int) yytext[0];
. return ERR; //unrecognized input
My .y tries to recognize sequences of 1-3 WORDs separated by the characters % and $. I want it so that even if I input just 1 WORD, I still get a complete statement. I don't include any rules for the ERR token, so that an unrecognized character in the input triggers a syntax error in the parser.
%{
#include <stdio.h>
int yylex();
void yyerror(char* s){
fprintf(stderr, "%s\n", s);
};
%}
%define api.value.type union
%token <char*> WORD
%nterm <char*> word1 word2 word3
%token ERR
%%
statement: word1 word2 word3 { printf("%s, %s, %s\n", $1, $2, $3); return 0; }
;
word1: WORD { $$ = $1; }
;
word2: %empty { $$ = "nothing"; }
| '%' WORD { $$ = $2; }
;
word3: %empty { $$ = "nothing";}
| '$' WORD { $$ = $2; }
;
%%
My main.c loops on yyparse(). Ideally I'm trying to parse only 1 line of input per iteration.
#include <unistd.h>
#include <stdio.h>
int yyparse();
int yylex();
extern FILE* yyin;
extern void yyrestart();
int main() {
while(1) {
printf("input: ");
if (yyparse() == 0) {
printf("success\n");
};
yyrestart(yyin);
}
return 0;
}
However, I am getting unexpected output, and I can't explain what's causing it:
In: word word -> Out: word word, nothing, nothing
In: word word word -> Out: word word, nothing, nothing
In: word % word $ word -> Out: word % word $ word, word $ word, word
Additionally, if I input only 1 WORD, my command line hangs until it receives additional input. This additional input could be anything. Even characters that would normally be recognized as an ERR token and invoke a syntax error somehow get by.
In: word. -> Out: word., nothing, nothing
I want my parser to be able to run even if I give it only 1 word of input. I thought that with the inclusion of %empty for subrules word2 and word3 I would get this behavior, but I am not sure what I am doing wrong.

FLEX/YACC program not behaving as expected : can't grab int value from sequence of ints

I am trying to build a parser that takes a list of strings in the following format and performs either an addition or multiplication of all of its elements :
prod 5-6_
sum _
sum 5_
sum 5-6-7_
$
Should print the following to the screen :
prod = 30
sum = 0
sum = 5
sum = 18
What I am actually getting as output is this :
prod = 0
sum = 0
sum = 5
sum = 5
My lex file looks like this :
%{
#include <iostream>
#include "y.tab.h"
using namespace std;
extern "C" int yylex();
%}
%option yylineno
digit [0-9]
integer {digit}+
operator "sum"|"prod"
%%
{integer} { return number; }
{operator} { return oper; }
"-" { return '-'; }
"_" { return '_'; }
"$" { return '$'; }
\n { ; }
[\t ]+ { ; }
. { cout << "unknown char" << endl; }
%%
and my yacc file looks like this :
%token oper
%token number
%token '-'
%token '_'
%token '$'
%start valid
%{
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;
#define YYSTYPE int
extern FILE *yyin;
extern char yytext[];
extern "C" int yylex();
int yyparse();
extern int yyerror(char *);
char op;
%}
%%
valid : expr_seq endfile {}
| {}
;
expr_seq : expr {}
| expr_seq expr {}
;
expr : op sequence nl {if (op == '+') cout << "sum = " ; else cout << "prod = ";}
| op nl {if (op == '+') cout << "sum = 0"; else cout <<"prod = 1";}
;
op : oper { if (yytext[0] == 's') op = '+'; else op = '*';}
;
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
nl : '_' { cout << endl;}
;
endfile : '$' {}
;
%%
int main(int argc, char *argv[])
{
++argv, --argc;
if(argc > 0) yyin = fopen(argv[0], "r");
else yyin = stdin;
yyparse();
return 0;
}
int yyerror(char * msg)
{
extern int yylineno;
cerr << msg << "on line # " << yylineno << endl;
return 0;
}
My reasoning for the yacc logic is as follows :
a file is valid only if it contains a sequence of expressions followed by the endfile symbol.
a sequence of expressions is a single expression or several expressions.
an expression is either an operator followed by a new line, OR an operator, followed by a list of numbers, followed by a new line symbol.
an operator is either 'sum' or 'prod'
a list of numbers is either a number or several numbers separated by the '-' symbol.
From my perspective this should work, but for some reason it doesn't interpret the sequence of numbers properly after the first element. Any tips would be helpful.
Thanks
You must not use yytext in your yacc actions. yytext is only valid during a scanner action, and the parser often reads ahead to the next token. (In fact, yacc always reads the next token. Bison sometimes doesn't, but it's not always easily predictable.)
You can associate a semantic value with every token (and non-terminal), and you can reference these semantic values using $1, $2, etc. in your yacc actions. You can even associate semantic values of different types to different grammar symbols. And if you use bison -- and you probably are using bison -- you can give grammar symbols names to make it easier to refer to their semantic values.
This is all explained in depth, with examples, in the bison manual.
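The lookahead problem can be reproduced without yacc at all. The sketch below is a hand-written stand-in, not generated code: text_buf plays the role of yytext, semantic_value plays the role of yylval, and all names are hypothetical. It scans a number, then scans one more token as lookahead, and reports what each variable holds afterwards.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

enum { TOK_END = 0, TOK_NUMBER = 256 };

static char text_buf[32];    /* stand-in for yytext: reused for every token */
static int  semantic_value;  /* stand-in for yylval: set by the scanner */

/* Scan one token from *input and advance the pointer past it. */
static int next_token(const char **input)
{
    const char *p = *input;
    size_t n = 0;

    while (*p == ' ' || *p == '\t')
        p++;
    if (*p == '\0') {
        text_buf[0] = '\0';
        *input = p;
        return TOK_END;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n < sizeof text_buf - 1)
            text_buf[n++] = *p++;
        text_buf[n] = '\0';
        semantic_value = atoi(text_buf);  /* convert while the text is still valid */
        *input = p;
        return TOK_NUMBER;
    }
    text_buf[0] = *p++;
    text_buf[1] = '\0';
    *input = p;
    return (unsigned char)text_buf[0];
}

/* Read a number, then one lookahead token. Returns the value saved at scan
   time and copies whatever the text buffer holds afterwards into text_seen. */
static int value_after_lookahead(const char *input, char *text_seen, size_t size)
{
    int saved;

    if (next_token(&input) != TOK_NUMBER)
        return -1;
    saved = semantic_value;
    next_token(&input);                   /* the parser's lookahead */
    strncpy(text_seen, text_buf, size - 1);
    text_seen[size - 1] = '\0';
    return saved;
}
```

For the input "5-6", the saved value is still 5, but the text buffer already holds "-". That is the situation a yacc action is in when it calls atoi(yytext) after the parser has read ahead.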
The solution that worked was simply to change the following lines :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
to this :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + atoi(yytext); else $$ = $1 * atoi(yytext);}
;

Why do I need to rewrite a grammar?

I'm trying to study compiler construction on my own. I'm reading a book and this is one of the exercises (I want to stress that this is not homework, I'm doing this on my own).
The following grammar represents simple arithmetic expressions in
LISP-like prefix notation
lexp -> number | ( op lexp-seq )
op -> + | - | *
lexp-seq -> lexp-seq lexp | lexp
For example, the expression (* (-2) 3 4) has a value of -24. Write
a Yacc/Bison specification for a program that will compute and print
the value of expressions in this syntax. (Hint: this will require
rewriting the grammar, as well as the use of a mechanism for passing
the operator to an lexp-seq.)
I have solved it. The solution is provided below. However I have questions about my solution as well as the problem itself. Here they are:
I don't modify a grammar in my solution and it seems to be working perfectly. There are no conflicts when Yacc/Bison spec is converted to a .c file. So why is the author saying that I need to rewrite a grammar?
My solution is using a stack as a mechanism for passing the operator to an lexp-seq. Can someone suggest a different method, the one that will not use a stack?
Here is my solution to the problem (I'm not posting code for stack manipulation as the assumption is that the reader is familiar with how stacks work)
%{
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include "linkedstack.h"
int yylex();
int yyerror();
node *operatorStack;
%}
%token NUMBER
%%
command : lexp { printf("%d\n", $1); };
lexp : NUMBER { $$ = $1; }
| '(' op lexp_seq ')'
{
int operator;
operatorStack = pop(operatorStack, &operator);
switch(operator) {
default:
yyerror("Unknown operator");
exit(1);
break;
case '+':
case '*':
$$ = $3;
break;
case '-':
$$ = -$3;
break;
}
}
;
op : '+' { operatorStack = push(operatorStack, '+'); }
| '-' { operatorStack = push(operatorStack, '-'); }
| '*' { operatorStack = push(operatorStack, '*'); }
;
lexp_seq : lexp_seq lexp
{
switch(operatorStack->data) {
default:
yyerror("Unrecognized operator");
exit(1);
break;
case '+':
$$ = $1 + $2;
break;
case '-':
$$ = $1 - $2;
break;
case '*':
$$ = $1 * $2;
break;
}
}
| lexp { $$ = $1; }
;
%%
int main(int argc, char** argv) {
int retVal;
init(operatorStack);
if (2 == argc && (0 == strcmp("-g", argv[1])))
yydebug = 1;
retVal = yyparse();
destroy(operatorStack);
return retVal;
}
int yylex() {
int c;
/* eliminate blanks*/
while((c = getchar()) == ' ');
if (isdigit(c)) {
ungetc(c, stdin);
scanf("%d", &yylval);
return (NUMBER);
}
/* makes the parse stop */
if (c == '\n') return 0;
return (c);
}
int yyerror(char * s) {
fprintf(stderr, "%s\n", s);
return 0;
} /* allows for printing of an error message */
Using a stack here is unnecessary if you rewrite the grammar.
One way is to use a different non-terminal for each operator:
command : lexp '\n' { printf("%d\n", $1); }
lexp : NUMBER
| '(' op_exp ')' { $$ = $2; }
op_exp : plus_exp | times_exp | minus_exp
plus_exp: '+' lexp { $$ = $2; }
| plus_exp lexp { $$ = $1 + $2; }
times_exp: '*' lexp { $$ = $2; }
| times_exp lexp { $$ = $1 * $2; }
minus_exp: '-' lexp { $$ = -$2; }
| minus_exp lexp { $$ = $1 - $2; }
I don't know if that is what your book's author had in mind. There are certainly other possible implementations.
In a real lisp-like language, you would need to do this quite differently, because the first object in an lexp could be a higher-order value (i.e. a function), which might even be the result of a function call, so you can't encode the operations into the syntax (and you can't necessarily partially evaluate the expression as you parse new arguments, either).
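For comparison, here is what "passing the operator to the sequence" looks like outside yacc. This is a hand-written recursive-descent evaluator in C, which is a different technique from the Bison grammar above and is only meant to show the intended semantics. It follows the rewritten grammar's convention for '-' (the first operand is negated, later operands are subtracted). The function names are hypothetical and error reporting is omitted.

```c
#include <stdlib.h>

static void skip_blanks(const char **p)
{
    while (**p == ' ')
        (*p)++;
}

/* Fold one more operand into the running result for the given operator. */
static int apply_op(char op, int acc, int operand)
{
    switch (op) {
    case '+': return acc + operand;
    case '-': return acc - operand;
    default:  return acc * operand;  /* '*' */
    }
}

/* lexp -> number | '(' op lexp-seq ')'
   *p is advanced past whatever was read. */
static int eval_lexp(const char **p)
{
    char op;
    int result;

    skip_blanks(p);
    if (**p != '(') {
        char *end;
        result = (int)strtol(*p, &end, 10);
        if (end == *p && **p != '\0')
            end++;                   /* not a number: skip it so parsing advances */
        *p = end;
        return result;
    }
    (*p)++;                          /* consume '(' */
    skip_blanks(p);
    op = **p;                        /* the operator is held in a local variable */
    if (op != '\0')
        (*p)++;
    result = eval_lexp(p);           /* first operand */
    if (op == '-')
        result = -result;
    for (skip_blanks(p); **p != ')' && **p != '\0'; skip_blanks(p))
        result = apply_op(op, result, eval_lexp(p));
    if (**p == ')')
        (*p)++;
    return result;
}
```

Because each nested expression gets its own call frame, the operator lives in a local variable and no explicit stack is needed. The per-operator non-terminals in the grammar above achieve the same effect by encoding the operator in the parser state.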

Bison syntax error

I recently started learning bison and I already hit a wall. The manual sections are a little bit ambiguous, so I guess an error was to be expected. The code below is the first tutorial from the official manual - The Reverse Polish Notation Calculator, saved in a single file - rpcalc.y.
/* Reverse polish notation calculator */
%{
#include <stdio.h>
#include <math.h>
#include <ctype.h>
int yylex (void);
void yyerror (char const *);
%}
%define api.value.type {double}
%token NUM
%% /* Grammar rules and actions follow. */
input:
%empty
| input line
;
line:
'\n'
| exp '\n' {printf ("%.10g\n", $1);}
;
exp:
NUM {$$ = $1; }
| exp exp '+' {$$ = $1 + $2; }
| exp exp '-' {$$ = $1 - $2; }
| exp exp '*' {$$ = $1 * $2; }
| exp exp '/' {$$ = $1 / $2; }
| exp exp '^' {$$ = pow ($1, $2); }
| exp 'n' {$$ = -$1; }
;
%%
/* The lexical analyzer */
int yylex (void)
{
int c;
/* Skip white space */
while((c = getchar()) == ' ' || c == '\t')
continue;
/* Process numbers */
if(c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", &yylval);
return NUM;
}
/* Return end-of-input */
if (c == EOF)
return 0;
/* Return a single char */
return c;
}
int main (void)
{
return yyparse ();
}
void yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
Executing bison rpcalc.y in cmd returns the following error:
rpcalc.y:11.24-31: syntax error, unexpected {...}
What seems to be the problem?
The fault is caused by you using features that are new to the 3.0 version of bison, whereas you have an older version of bison installed. If you are unable to upgrade to version 3.0, it is an easy change to convert the grammar to using the features of earlier versions of bison.
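The %define api.value.type {double} line can be replaced by #define YYSTYPE double in the prologue, which is how older versions of the manual wrote this example, and the %empty directive removed. (A bare %type <double> would not work here, because type tags such as <double> name members of a %union, and this grammar has none.) The resulting bison program would be:
/* Reverse polish notation calculator */
%{
#define YYSTYPE double
#include <stdio.h>
#include <math.h>
#include <ctype.h>
int yylex (void);
void yyerror (char const *);
%}
%token NUM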
%% /* Grammar rules and actions follow. */
input:
| input line
;
line:
'\n'
| exp '\n' {printf ("%.10g\n", $1);}
;
exp:
NUM {$$ = $1; }
| exp exp '+' {$$ = $1 + $2; }
| exp exp '-' {$$ = $1 - $2; }
| exp exp '*' {$$ = $1 * $2; }
| exp exp '/' {$$ = $1 / $2; }
| exp exp '^' {$$ = pow ($1, $2); }
| exp 'n' {$$ = -$1; }
;
%%
/* The lexical analyzer */
int yylex (void)
{
int c;
/* Skip white space */
while((c = getchar()) == ' ' || c == '\t')
continue;
/* Process numbers */
if(c == '.' || isdigit (c))
{
ungetc (c, stdin);
scanf ("%lf", &yylval);
return NUM;
}
/* Return end-of-input */
if (c == EOF)
return 0;
/* Return a single char */
return c;
}
int main (void)
{
return yyparse ();
}
void yyerror (char const *s)
{
fprintf (stderr, "%s\n", s);
}
This runs in a wider range of bison versions.

Error while retrieving value from symbol table

I'm trying to implement a simple calculator using Flex and Bison. I'm running into problems in the Bison stage, wherein I can't figure out the way in which the value of the variable can be retrieved from the symbol table and assigned to $$.
The lex file:
%{
#include <iostream>
#include <string.h>
#include "calc.tab.h"
using namespace std;
void Print();
int count = 0;
%}
%%
[ \t\n]+ ;
"print" {Print();}
"exit" {
exit(EXIT_SUCCESS);
}
[0-9]+ {
yylval.FLOAT = atof(yytext);
return (NUMBER);
count++;
}
[a-z][_a-zA-Z0-9]* {
yylval.NAME = yytext;
return (ID);
}
. {
return (int)yytext[0];
}
%%
void Print()
{
cout << "Printing ST..\n";
}
int yywrap()
{
return 0;
}
The Bison file:
%{
#include <iostream>
#include <string.h>
#include "table.h"
extern int count;
int yylex();
int yyerror(const char *);
int UpdateSymTable(float, char *, float);
using namespace std;
%}
%union
{
float FLOAT;
char *NAME;
}
%token NUMBER
%token ID
%type <FLOAT> NUMBER
%type <NAME> ID
%type <FLOAT> expr
%type <FLOAT> E
%left '*'
%left '/'
%left '+'
%left '-'
%right '='
%%
E: expr {cout << $$ << "\n";}
expr: NUMBER {$$ = $1;}
| expr '+' expr {$$ = $1 + $3;}
| expr '-' expr {$$ = $1 - $3;}
| expr '*' expr {$$ = $1 * $3;}
| expr '/' expr {$$ = $1 / $3;}
| ID '=' expr {
int index = UpdateSymTable($$, $1, $3);
$$ = st[index].number = $3; //The problem is here
}
%%
int yyerror(const char *msg)
{
cout << "Error: "<<msg<<"\n";
}
int UpdateSymTable(float doll_doll, char *doll_one, float doll_three)
{
int number1 = -1;
for(int i=0;i<count;i++)
{
if(!strcmp(doll_one, st[i].name) == 0)
{
strcpy(st[i].name, doll_one);
st[i].number = doll_three;
number1 = i;
}
else if(strcmp(doll_one, st[i].name) == 0)
{
number1 = i;
}
}
return number1;
}
int main()
{
yyparse();
}
The symbol table:
struct st
{
float number;
char name[25];
}st[25];
The output I'm getting is:
a = 20
c = a+3
20
Error: syntax error
I would really appreciate it if someone could tell me what is going wrong. I've been trying for a long time and haven't been able to resolve the error.
The syntax error is the result of your grammar only accepting a single expr rather than a sequence of exprs. See, for example, this question.
One of the problems with your symbol table lookup is that you incorrectly return the value yytext as your semantic value, instead of making a copy. See, for example, this question.
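A minimal sketch of making such a copy follows. The helper name copy_lexeme is hypothetical, and the cast on malloc is there only so the same code also compiles as C++, which the files in the question use.

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of the lexeme, or NULL if allocation fails.
   The caller owns the copy and must free() it when done. */
static char *copy_lexeme(const char *text)
{
    size_t size = strlen(text) + 1;
    char *copy = (char *)malloc(size);

    if (copy != NULL)
        memcpy(copy, text, size);
    return copy;
}
```

In the scanner action, this would be used as yylval.NAME = copy_lexeme(yytext); so that the name survives after the scanner overwrites its buffer with the next token.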
However, your UpdateSymTable function has quite a few problems, starting with the fact that the names you chose for the parameters are meaningless, and furthermore the first parameter ("doll_doll") is never used. I don't know what you intended to test with !strcmp(doll_one, st[i].name) == 0, but whatever it was, there must be a simpler way of expressing it. In any case, the logic is incorrect: a new name is never added to the table, and count is never incremented because the count++ in the scanner comes after the return. I'd suggest writing some simple test programs (without bison and flex) to let you debug the symbol table handling. And/or talk to your lab advisor, assuming you have one.
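As a starting point for such a standalone test, here is one possible shape for the table code in plain C. It is a sketch under assumptions rather than a drop-in replacement: the function names, the symbol_count variable and the size constants are hypothetical, and the table keeps the fixed 25-entry layout from the question.

```c
#include <string.h>

#define MAX_SYMBOLS 25
#define MAX_NAME    25

struct symbol {
    float number;
    char  name[MAX_NAME];
};

static struct symbol st[MAX_SYMBOLS];
static int symbol_count = 0;  /* number of entries in use */

/* Return the index of name in the table, or -1 if it is not present. */
static int find_symbol(const char *name)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(name, st[i].name) == 0)
            return i;
    return -1;
}

/* Set name to value, adding a new entry if needed.
   Returns the entry's index, or -1 if the table is full or the name is too long. */
static int update_symbol(const char *name, float value)
{
    int index = find_symbol(name);

    if (index < 0) {
        if (symbol_count == MAX_SYMBOLS || strlen(name) >= MAX_NAME)
            return -1;
        index = symbol_count++;
        strcpy(st[index].name, name);
    }
    st[index].number = value;
    return index;
}
```

Separating lookup from update keeps each function small enough to test by hand. The grammar action for ID '=' expr would call update_symbol, and a rule that reads a variable's value would call find_symbol.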
Finally, (of what I noticed) your precedence relations are not correct. First, they are reversed: the operator which binds least tightly (assignment) should come first. Second, it is not the case that + has precedence over - , or vice versa; the two operators have the same precedence. Similarly with * and /. You could try reading the precedence chapter of the bison manual if you don't have lecture notes or other information.

Resources