I'm new to bison/flex and I am trying to recognize patterns of 1-3 words of input.
My .l recognizes WORD as any series of lowercase characters, treats % and $ characters as tokens of their own ascii value, ignores whitespace, and recognizes everything else as an ERR token.
MISC [\%\$]
%%
[a-z]+ { yylval.WORD = yytext; return WORD; }
[ \t\r\n]+ {} //ignore whitespace
{MISC} return (int) yytext[0];
. return ERR; //unrecognized input
My .y tries to recognize sequences of 1-3 WORDs separated by the characters % and $. I want it so that even if I input just 1 WORD, I still get a complete statement. I don't include any rules for the ERR token, so that an unrecognized character in the input causes a syntax error in the parser.
%{
#include <stdio.h>
int yylex();
void yyerror(char* s){
fprintf(stderr, "%s\n", s);
};
%}
%define api.value.type union
%token <char*> WORD
%nterm <char*> word1 word2 word3
%token ERR
%%
statement: word1 word2 word3 { printf("%s, %s, %s\n", $1, $2, $3); return 0; }
;
word1: WORD { $$ = $1; }
;
word2: %empty { $$ = "nothing"; }
| '%' WORD { $$ = $2; }
;
word3: %empty { $$ = "nothing";}
| '$' WORD { $$ = $2; }
;
%%
My main.c loops on yyparse(). Ideally I'm trying to parse only 1 line of input per iteration.
#include <unistd.h>
#include <stdio.h>
int yyparse();
int yylex();
extern FILE* yyin;
extern void yyrestart();
int main() {
while(1) {
printf("input: ");
if (yyparse() == 0) {
printf("success\n");
};
yyrestart(yyin);
}
return 0;
}
However, I am getting unexpected output, and I can't explain what's causing it:
In: word word -> Out: word word, nothing, nothing
In: word word word -> Out: word word, nothing, nothing
In: word % word $ word -> Out: word % word $ word, word $ word, word
Additionally, if I input only 1 WORD, my command line hangs until it receives additional input. This additional input could be anything. Even characters that would normally be recognized as an ERR token and invoke a syntax error somehow get by.
In: word. -> Out: word., nothing, nothing
I want my parser to be able to run even if I give it only 1 word of input. I thought that with the inclusion of %empty for subrules word2 and word3 I would get this behavior, but I am not sure what I am doing wrong.
I am trying to build a parser that takes a list of strings in the following format and performs either an addition or multiplication of all of its elements :
prod 5-6_
sum _
sum 5_
sum 5-6-7_
$
Should print the following to the screen :
prod = 30
sum = 0
sum = 5
sum = 18
What I am actually getting as output is this :
prod = 0
sum = 0
sum = 5
sum = 5
My lex file looks like this :
%{
#include <iostream>
#include "y.tab.h"
using namespace std;
extern "C" int yylex();
%}
%option yylineno
digit [0-9]
integer {digit}+
operator "sum"|"prod"
%%
{integer} { return number; }
{operator} { return oper; }
"-" { return '-'; }
"_" { return '_'; }
"$" { return '$'; }
\n { ; }
[\t ]+ { ; }
. { cout << "unknown char" << endl; }
%%
and my yacc file looks like this :
%token oper
%token number
%token '-'
%token '_'
%token '$'
%start valid
%{
#include <iostream>
#include <string>
#include <cstdio>
#include <cstdlib>
using namespace std;
#define YYSTYPE int
extern FILE *yyin;
extern char yytext[];
extern "C" int yylex();
int yyparse();
extern int yyerror(char *);
char op;
%}
%%
valid : expr_seq endfile {}
| {}
;
expr_seq : expr {}
| expr_seq expr {}
;
expr : op sequence nl {if (op == '+') cout << "sum = " ; else cout << "prod = ";}
| op nl {if (op == '+') cout << "sum = 0"; else cout <<"prod = 1";}
;
op : oper { if (yytext[0] == 's') op = '+'; else op = '*';}
;
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
nl : '_' { cout << endl;}
;
endfile : '$' {}
;
%%
int main(int argc, char *argv[])
{
++argv, --argc;
if(argc > 0) yyin = fopen(argv[0], "r");
else yyin = stdin;
yyparse();
return 0;
}
int yyerror(char * msg)
{
extern int yylineno;
cerr << msg << " on line # " << yylineno << endl;
return 0;
}
My reasoning for the yacc logic is as follows :
a file is valid only if it contains a sequence of expressions followed by the endfile symbol.
a sequence of expressions is a single expression or several expressions.
an expression is either an operator followed by a new line, OR an operator, followed by a list of numbers, followed by a new line symbol.
an operator is either 'sum' or 'prod'
a list of numbers is either a number or several numbers separated by the '-' symbol.
From my perspective this should work, but for some reason it doesn't interpret the sequence of numbers properly after the first element. Any tips would be helpful.
Thanks
You must not use yytext in your yacc actions. yytext is only valid during a scanner action, and the parser often reads ahead to the next token. (In fact, yacc always reads the next token. Bison sometimes doesn't, but it's not always easily predictable.)
You can associate a semantic value with every token (and non-terminal), and you can reference these semantic values using $1, $2, etc. in your yacc actions. You can even associate semantic values of different types to different grammar symbols. And if you use bison -- and you probably are using bison -- you can give grammar symbols names to make it easier to refer to their semantic values.
This is all explained in depth, with examples, in the bison manual.
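To see why the text buffer is unreliable once the parser reads ahead, here is a minimal sketch in plain C, without flex or bison. The names `text_buf`, `token_value`, `next_token`, `fold_with_values` and `fold_with_text` are hypothetical stand-ins (for yytext, yylval, yylex and two styles of action), not part of any generated code. It assumes the parser always fetches one lookahead token before acting.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-ins for yytext and yylval. */
static char text_buf[32];   /* overwritten by every scan, like yytext */
static int  token_value;    /* set by the scanner for numbers, like yylval */

enum { TOK_END = 0, TOK_NUMBER = 256 };

/* Scan one token: a run of digits is a NUMBER, any other char is itself. */
static int next_token(const char **cursor)
{
    const char *p = *cursor;
    size_t n = 0;

    if (*p == '\0') {
        text_buf[0] = '\0';
        return TOK_END;
    }
    if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p) && n + 1 < sizeof text_buf)
            text_buf[n++] = *p++;
        text_buf[n] = '\0';
        token_value = atoi(text_buf);   /* value is captured at scan time */
        *cursor = p;
        return TOK_NUMBER;
    }
    text_buf[0] = *p;
    text_buf[1] = '\0';
    *cursor = p + 1;
    return (unsigned char)text_buf[0];
}

/* Fold "5-6-7" using the value saved with each token. */
int fold_with_values(const char *input, char op)
{
    int acc = (op == '+') ? 0 : 1;
    int tok = next_token(&input);

    while (tok == TOK_NUMBER) {
        int value = token_value;        /* copy before reading ahead */
        tok = next_token(&input);       /* lookahead overwrites text_buf */
        acc = (op == '+') ? acc + value : acc * value;
        if (tok == '-')
            tok = next_token(&input);
    }
    return acc;
}

/* Same fold, but re-reading the shared text buffer after the lookahead. */
int fold_with_text(const char *input, char op)
{
    int acc = (op == '+') ? 0 : 1;
    int tok = next_token(&input);

    while (tok == TOK_NUMBER) {
        int value;
        tok = next_token(&input);       /* lookahead overwrites text_buf */
        value = atoi(text_buf);         /* reads the lookahead's text */
        acc = (op == '+') ? acc + value : acc * value;
        if (tok == '-')
            tok = next_token(&input);
    }
    return acc;
}
```

With this sketch, `fold_with_values("5-6", '*')` gives 30, while `fold_with_text("5-6", '*')` gives 0, because by the time the text is converted the buffer already holds the `-` that follows the number.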
The solution that worked was simply to change the following lines :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + $3; else $$ = $1 * $3;}
;
to this :
sequence : number { $$ = atoi(yytext);}
| sequence '-' number { if (op == '+') $$ = $1 + atoi(yytext); else $$ = $1 * atoi(yytext);}
;
I have been using flex and bison for making a small calculator. My files are the following:
bisonFile.y
%{
#include <stdio.h>
%}
/* declare tokens */
%token NUMBER
%token ADD SUB MUL DIV ABS
%token EOL
%%
calclist: /* nothing */
| calclist exp EOL { printf("= %d\n", $2); }
;
exp: factor
| exp ADD factor { $$ = $1 + $3; }
| exp SUB factor { $$ = $1 - $3; }
;
factor: term
| factor MUL term { $$ = $1 * $3; }
| factor DIV term { $$ = $1 / $3; }
;
term: NUMBER
| ABS term { $$ = $2 >= 0? $2 : - $2; }
;
%%
main(int argc, char **argv)
{
yyparse();
}
yyerror(char *s)
{
fprintf(stderr, "error: %s\n", s);
}
flexFile.l
%{
# include "f5.tab.h"
int yylval;
%}
/* reconocimiento de tokens e impresion */
%{
int yylval;
%}
%option noyywrap
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[0-9]+("."[0-9]+)? { yylval = atoi(yytext); return NUMBER; } //part added
\n { return EOL; }
[ \t] { /* ignore whitespace */ }
. { printf("Mystery character %c\n", *yytext); }
%%
My program works fine with integer numbers, and it also recognizes real numbers, but the problem is that when I print the result of an operation it always returns the answer as an integer. Why is that?
Thanks
Your use of atoi in the production converts the string to an integer.
Using atof will convert it to a floating point number.
If you want to separate the two, you'll need to change the matching rule for integers, and add one for floating point.
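The difference is easy to check outside the scanner. A small sketch, with hypothetical helper names, showing what each conversion does to the same lexeme, and what happens if the result of atof is then stored in an int-typed value:

```c
#include <stdlib.h>

/* Hypothetical helpers: convert a matched lexeme as a scanner action would. */
int lexeme_to_int(const char *lexeme)
{
    return atoi(lexeme);            /* stops at the '.', fraction is lost */
}

double lexeme_to_double(const char *lexeme)
{
    return atof(lexeme);            /* keeps the fractional part */
}

/* atof alone is not enough if the semantic value type is still int. */
int store_in_int_value(const char *lexeme)
{
    int value = (int)atof(lexeme);  /* truncated again on assignment */
    return value;
}
```

For the lexeme "3.75", the first helper yields 3, the second 3.75, and the third 3 again, which is why the type of the semantic value matters as much as the conversion function.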
Change "%d" → "%f" in the file “bisonFile.y”. This uses a floating point format for printing the result. The fixed line should read:
| calclist exp EOL { printf("= %f\n", $2); }
In the file “flexFile.l” remove both definitions int yylval. bison outputs
YYSTYPE yylval;
automatically. YYSTYPE is the type of the semantic values. Because you want a floating point calculator, this should be double. Note that YYSTYPE defaults to int. To change that, YYSTYPE must be defined when compiling the C sources generated by bison and flex (see below).
Finally, as already stated by MIS, replace atoi() → atof(). The edited line in flexFile.l should read:
[0-9]+("."[0-9]+)? { yylval = atof(yytext); return NUMBER; }
For a novice the dependencies between flex and bison sources might be confusing. A minimal Makefile documents how the example can be compiled. Line 2 sets the semantic type for the scanner and the parser consistently:
calc: calc.o l.o
calc.o l.o: CFLAGS+=-DYYSTYPE=double
l.o: l.c f5.tab.h
calc.c f5.tab.h: bisonFile.y
	bison -o $@ --defines=f5.tab.h $^
l.c: flexFile.l f5.tab.h
	flex -o $@ $^
clean::
$(RM) calc calc.o calc.c f5.tab.h l.o l.c
That’ll do the trick.
I'm trying to implement a simple calculator using Flex and Bison. I'm running into problems in the Bison stage, wherein I can't figure out the way in which the value of the variable can be retrieved from the symbol table and assigned to $$.
The lex file:
%{
#include <iostream>
#include <string.h>
#include "calc.tab.h"
using namespace std;
void Print();
int count = 0;
%}
%%
[ \t\n]+ ;
"print" {Print();}
"exit" {
exit(EXIT_SUCCESS);
}
[0-9]+ {
yylval.FLOAT = atof(yytext);
return (NUMBER);
count++;
}
[a-z][_a-zA-Z0-9]* {
yylval.NAME = yytext;
return (ID);
}
. {
return (int)yytext[0];
}
%%
void Print()
{
cout << "Printing ST..\n";
}
int yywrap()
{
return 0;
}
The Bison file:
%{
#include <iostream>
#include <string.h>
#include "table.h"
extern int count;
int yylex();
int yyerror(const char *);
int UpdateSymTable(float, char *, float);
using namespace std;
%}
%union
{
float FLOAT;
char *NAME;
}
%token NUMBER
%token ID
%type <FLOAT> NUMBER
%type <NAME> ID
%type <FLOAT> expr
%type <FLOAT> E
%left '*'
%left '/'
%left '+'
%left '-'
%right '='
%%
E: expr {cout << $$ << "\n";}
expr: NUMBER {$$ = $1;}
| expr '+' expr {$$ = $1 + $3;}
| expr '-' expr {$$ = $1 - $3;}
| expr '*' expr {$$ = $1 * $3;}
| expr '/' expr {$$ = $1 / $3;}
| ID '=' expr {
int index = UpdateSymTable($$, $1, $3);
$$ = st[index].number = $3; //The problem is here
}
%%
int yyerror(const char *msg)
{
cout << "Error: "<<msg<<"\n";
}
int UpdateSymTable(float doll_doll, char *doll_one, float doll_three)
{
int number1 = -1;
for(int i=0;i<count;i++)
{
if(!strcmp(doll_one, st[i].name) == 0)
{
strcpy(st[i].name, doll_one);
st[i].number = doll_three;
number1 = i;
}
else if(strcmp(doll_one, st[i].name) == 0)
{
number1 = i;
}
}
return number1;
}
int main()
{
yyparse();
}
The symbol table:
struct st
{
float number;
char name[25];
}st[25];
The output I'm getting is:
a = 20
c = a+3
20
Error: syntax error
I would really appreciate it if someone told me what is going wrong. I've been trying for a long time, and I haven't been able to resolve the error.
The syntax error is the result of your grammar only accepting a single expr rather than a sequence of exprs. See, for example, this question.
One of the problems with your symbol table lookup is that you incorrectly return the value yytext as your semantic value, instead of making a copy. See, for example, this question.
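The aliasing problem can be reproduced in a few lines of plain C. This is only a sketch: `scan_buf` and `scan_lexeme` are hypothetical stand-ins for the scanner's internal buffer and for a scan step, and `copy_lexeme` does by hand what `strdup(yytext)` would do in a flex action.

```c
#include <stdlib.h>
#include <string.h>

static char scan_buf[64];   /* stand-in for the buffer behind yytext */

/* Pretend to scan a lexeme: the buffer is reused on every call. */
const char *scan_lexeme(const char *lexeme)
{
    strncpy(scan_buf, lexeme, sizeof scan_buf - 1);
    scan_buf[sizeof scan_buf - 1] = '\0';
    return scan_buf;
}

/* Make an owned copy of a lexeme; the caller frees it. NULL on failure. */
char *copy_lexeme(const char *lexeme)
{
    size_t len = strlen(lexeme) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, lexeme, len);
    return copy;
}

/* Returns 1 if a saved pointer still reads "a" after the next scan. */
int alias_survives(void)
{
    const char *alias = scan_lexeme("a");
    scan_lexeme("c");               /* overwrites what alias points to */
    return strcmp(alias, "a") == 0;
}

/* Returns 1 if a saved copy still reads "a" after the next scan. */
int copy_survives(void)
{
    char *copy = copy_lexeme(scan_lexeme("a"));
    int ok;

    if (copy == NULL)
        return 0;
    scan_lexeme("c");
    ok = strcmp(copy, "a") == 0;
    free(copy);
    return ok;
}
```

Here `alias_survives()` returns 0 and `copy_survives()` returns 1. The copy has to be freed once the name has been stored in (or rejected by) the symbol table.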
However, your UpdateSymTable function has quite a few problems, starting with the fact that the names you chose for parameters are meaningless, and furthermore the first parameter ("doll_doll") is never used. I don't know what you intended to test with !strcmp(doll_one, st[i].name) == 0 but whatever it was, there must be a simpler way of expressing it. In any case, the logic is incorrect. I'd suggest writing some simple test programs (without bison and flex) to let you debug the symbol table handling. And/or talk to your lab advisor, assuming you have one.
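As a starting point for such a test program, here is one possible shape for the table, separated into a lookup and an update. It is a sketch rather than a drop-in replacement: the names `find_symbol`, `set_symbol`, `table` and `symbol_count` are made up for illustration, and the sizes simply mirror the 25-entry table from the question.

```c
#include <string.h>

#define MAX_SYMBOLS 25
#define MAX_NAME    25

struct symbol {
    float number;
    char  name[MAX_NAME];
};

static struct symbol table[MAX_SYMBOLS];
static int symbol_count = 0;        /* number of entries in use */

/* Return the index of name in the table, or -1 if it is not there. */
int find_symbol(const char *name)
{
    for (int i = 0; i < symbol_count; i++)
        if (strcmp(name, table[i].name) == 0)
            return i;
    return -1;
}

/* Set name to value, adding it if new.
   Returns its index, or -1 if the table is full or the name is too long. */
int set_symbol(const char *name, float value)
{
    int i = find_symbol(name);

    if (i < 0) {
        if (symbol_count == MAX_SYMBOLS || strlen(name) >= MAX_NAME)
            return -1;
        i = symbol_count++;
        strcpy(table[i].name, name);    /* length checked just above */
    }
    table[i].number = value;
    return i;
}
```

An assignment action would then call `set_symbol` and check for -1, and a rule for reading a variable would call `find_symbol` and report an undefined name when it returns -1.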
Finally, (of what I noticed) your precedence relations are not correct. First, they are reversed: the operator which binds least tightly (assignment) should come first. Second, it is not the case that + has precedence over - , or vice versa; the two operators have the same precedence. Similarly with * and /. You could try reading the precedence chapter of the bison manual if you don't have lecture notes or other information.
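To make the "same level" point concrete, here is a hand-written precedence-climbing evaluator in C. It is not what bison generates; it is only a sketch of the relations that the precedence declarations are meant to express, with + and - sharing one level, * and / sharing a tighter one, and all four left-associative. Assignment is left out, and the function names are hypothetical.

```c
#include <ctype.h>

/* Binding strength: higher binds tighter; same number means same level. */
static int precedence(char op)
{
    switch (op) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;   /* not a binary operator */
    }
}

static const char *skip_spaces(const char *p)
{
    while (*p == ' ')
        p++;
    return p;
}

/* Read an unsigned decimal integer and advance the cursor past it. */
static double parse_number(const char **p)
{
    double value = 0;
    *p = skip_spaces(*p);
    while (isdigit((unsigned char)**p)) {
        value = value * 10 + (**p - '0');
        (*p)++;
    }
    return value;
}

/* Precedence climbing: handle operators whose level is at least min_prec. */
static double parse_expr(const char **p, int min_prec)
{
    double lhs = parse_number(p);

    for (;;) {
        char op;
        int prec;
        double rhs;

        *p = skip_spaces(*p);
        op = **p;
        prec = precedence(op);
        if (prec == 0 || prec < min_prec)
            return lhs;
        (*p)++;
        rhs = parse_expr(p, prec + 1);  /* +1 gives left associativity */
        switch (op) {
        case '+': lhs += rhs; break;
        case '-': lhs -= rhs; break;
        case '*': lhs *= rhs; break;
        default:  lhs /= rhs; break;    /* '/' */
        }
    }
}

double evaluate(const char *text)
{
    return parse_expr(&text, 1);
}
```

With these levels, `evaluate("8-3+2")` is 7 rather than 3, and `evaluate("8/2*3")` is 12 rather than 8/6, because operators on the same level group from the left.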
I am trying to parse a file like this: (too simple for my actual purpose, but for the beginning, this is ok)
#Book{key2,
Author="Some2VALUE" ,
Title="VALUE2"
}
The lexer is:
[A-Za-z"][^\\\" \n\(\),=\{\}#~_]* { yylval.sval = strdup(yytext); return KEY; }
#[A-Za-z][A-Za-z]+ {yylval.sval = strdup(yytext + 1); return ENTRYTYPE;}
[ \t\n] ; /* ignore whitespace */
[{}=,] { return *yytext; }
. { fprintf(stderr, "Unrecognized character %c in input\n", *yytext); }
And then parsing this with:
%union
{
char *sval;
};
%token <sval> ENTRYTYPE
%type <sval> VALUE
%token <sval> KEY
%start Input
%%
Input: Entry
| Input Entry ; /* input is one or more entries */
Entry:
ENTRYTYPE '{' KEY ','{
b_entry.type = $1;
b_entry.id = $3;
b_entry.table = g_hash_table_new_full(g_str_hash, g_str_equal, free, free);}
KeyVals '}' {
parse_entry(&b_entry);
g_hash_table_destroy(b_entry.table);
free(b_entry.type); free(b_entry.id);
b_entry.table = NULL;
b_entry.type = b_entry.id = NULL;}
;
KeyVals:
/* empty */
| KeyVals KeyVal ; /* zero or more keyvals */
VALUE:
/*empty*/
| KEY
| VALUE KEY
;
KeyVal:
/*empty*/
KEY '=' VALUE ',' { g_hash_table_replace(b_entry.table, $1, $3); }
| KEY '=' VALUE { g_hash_table_replace(b_entry.table, $1, $3); }
| error '\n' {yyerrok;}
;
There are a few problems, so I need to generalize both the lexer and the parser:
1) It cannot read a multi-word value: if the RHS is Author="Some Value", it only shows "Some, i.e. the space is not handled. I don't know how to fix this.
2) If I enclose the RHS in {} rather than "", it gives a syntax error.
I'm looking for help with these two situations.
The main issue is that your tokens are not appropriate. You should try to recognize the tokens of your example as follows:
#Book ENTRYTYPE
{ '{'
key2 KEY
, ','
Author KEY
= '='
"Some2VALUE" VALUE
, ','
Title KEY
= '='
"VALUE2" VALUE
} '}'
The VALUE token could for example be defined as follows:
%x value
%%
"\"" {BEGIN(value);}
<value>"\"" {BEGIN(INITIAL); return VALUE;}
<value>"\\\"" { /* escaped " */ }
<value>[^"] { /* Non-escaped char */ }
Or in a single expression as
"\""([^"]|("\\\""))*"\""
This is assuming that only " needs to be escaped with a \. I'm not sure how BibTeX defines how to escape a ", if possible at all.
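The same rule can be tried out in plain C before committing it to the lexer. The function below is a hypothetical helper, not flex output, and it makes the same assumption as above, namely that a backslash followed by a quote is the only escape. It returns how many characters a quoted value occupies, spaces included.

```c
#include <stddef.h>

/* Length of the quoted string at the start of s, counting both quotes.
   Returns 0 if s does not start with a quote or the string never closes.
   Assumes \" is the only escape sequence. */
size_t quoted_length(const char *s)
{
    size_t i;

    if (s[0] != '"')
        return 0;
    for (i = 1; s[i] != '\0'; i++) {
        if (s[i] == '\\' && s[i + 1] == '"')
            i++;                    /* skip the escaped quote */
        else if (s[i] == '"')
            return i + 1;           /* closing quote found */
    }
    return 0;                       /* ran off the end: unterminated */
}
```

Because everything up to the closing quote belongs to one token, a value such as "Some Value" comes back whole instead of stopping at the space.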
During parsing, if I encounter an include token I want to instruct YACC to open the file specified as input and to begin parsing it. Once this parsing is finished, I want to instruct YACC to return to the original file and continue parsing directly after the include expression. I will restrict the include depth level to one.
The flex manual covers how to do this using yypush_buffer_state() and yypop_buffer_state().
Here is the section of the official manual on using multiple input buffers. There is some sample code.
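The idea behind the buffer stack can be sketched without flex at all. The code below is not the flex API: it uses in-memory strings in place of files, and `push_source` and `next_char` are hypothetical names. It only illustrates the push-on-include, pop-at-end-of-input behaviour, with the depth capped at one include as the question requires.

```c
#define MAX_DEPTH 2     /* top-level input plus one include */

static const char *sources[MAX_DEPTH];  /* read cursor of each open source */
static int depth = 0;                   /* number of open sources */

/* Start reading from text. Returns 0, or -1 if nesting is too deep. */
int push_source(const char *text)
{
    if (depth == MAX_DEPTH)
        return -1;
    sources[depth++] = text;
    return 0;
}

/* Next character from the innermost source. When a source is exhausted
   it is popped and reading resumes in the one below it.
   Returns -1 once every source is exhausted. */
int next_char(void)
{
    while (depth > 0) {
        const char *p = sources[depth - 1];
        if (*p != '\0') {
            sources[depth - 1] = p + 1;
            return (unsigned char)*p;
        }
        depth--;                        /* end of this source: pop it */
    }
    return -1;
}
```

Pushing "ab", reading one character, then pushing "cd" yields the sequence a, c, d, b: the outer input resumes exactly where it was interrupted.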
It's normal to communicate between the lexical and syntactic phases of your processor.
So, recognize the syntax for an include directive in your parser (or, to make things easier, just recognize it in the lexer) and do the switching at the lexical level.
For example, here is a simple language that recognizes standard input lines containing ab or cd or .file. When it sees .someString it opens someString as an include file and then goes back to reading standard input.
%{
#include <stdio.h>
#include <stdlib.h>
void start_include(char *); int yylex(void); void yyerror(void *);
#define YYSTYPE char *
%}
%%
all: all others | others;
others: include_rule | rule_1 | rule_2 | 'Z' { YYACCEPT; };
include_rule: '.' '\n' { start_include($1); };
rule_1: 'a' 'b' '\n' { printf("random rule 1\n"); };
rule_2: 'c' 'd' '\n' { printf("random rule 2\n"); };
%%
FILE * f = NULL;
void start_include(char *s) {
if ((f = fopen(s, "r")) == NULL)
abort();
}
int yylex(void) {
int c;
static char s[100];
if (f == NULL)
f = stdin;
c = getc(f);
if (c == EOF) {
f = stdin;
c = getc(f);
}
if (c == '.') {
scanf(" %s", s);
yylval = s;
} else if (c == EOF)
return 'Z';
return c;
}
And when we run it...
$ cat > toplevel
ab
.myinclude
ab
$ cat > myinclude
cd
cd
$ yacc ip.y && cc -Wall y.tab.c -ly && ./a.out < toplevel
random rule 1
random rule 2
random rule 2
random rule 1
$