I am currently working through sample code from the O'Reilly press book entitled "Flex and Bison". I am using the GNU C compiler for Windows with Flex and Bison binary install for Windows which is launched using gcc rather than the Linux cc command.
The problem is that the code if copied directly from the book does not compile and I have had to hack it a bit to get it to work.
Example from book
Example 1-1. Word count fb1-1.l
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
I compiled the code in the flowing way using Windows command line:
flex file.l
gcc lex.yy.c -o a.exe
It crashed stating yywrap() was not found. I added that and then it worked but did not complete the printf in the main function as it just hung waiting for more input!
Here is my solution that works but feels like a hack and that I am not in full understanding of the process.
/* just like Unix wc */
%{
*#include <string.h>*
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
*"." { return ;}*
. { chars++; }
%%
*int yywrap(void)
{
return 1;
}*
int main(void)
{
yylex();
printf("num lines is %8d, num words is %8d, num chars is %8d\n", lines, words, chars);
return 0;
}
I had to add a new rule to return out of yylex() which was not in the book, add yywrap()- not really knowing why and add string.h which was not present!. My main question is are there significant differences between flex for Windows and Unix and is it possible to run the original code with my gcc compiler and gnu flex without the said hacks?
I do not understand what have you achieved with this:
#include <string.h>
"." { return; }
But what I know for sure is that if you are running FLEX without specified input file you have to mark the end of input. Otherwise FLEX will wait for input. What I would suggest:
%{
#include <stdio.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
WORD [a-zA-Z]+
%%
{WORD} {
words++;
chars += strlen(yytext);
}
\n {
lines++;
/* chars++; why this? there was no columns here - it's a new line */
}
\s {
/* count the spaces */
chars++;
}
\t {
/* count the tabs */
chars += 4 /* or 8 */;
}
. {
printf("Error (unknown symbol):\t%c\n", yytext[0]);
chars++;
}
%%
int main()
{
/* iterate until end of input and even if errors - continue */
while(yylex()){ }
printf("lines:\t%8d\nwords:\t%8d\nchars:\t%8d\n", lines, words, chars);
return 0;
}
Build with:
flex input.l
output will be lex.yy.c
Then build:
gcc -o scanner.exe lex.yy.c -lfl
Create a txt file with input. Run following:
scanner.exe <in.txt>out.txt
Less sign means redirect input from file in.txt while greater sign means redirect output to out.txt Cause file has EOF at the end of file FLEX will properly stop.
Related
How to match a pattern R only if it is preceded by another pattern S without reading S (to give S-matched input back to lex) ?
file.l :
%%
\\foo {
yytext++; // To remove the starting backslash
printf("%s\n", yytext);
}
\\ printf("backslash!\n");
.
%%
int main() {
yylex();
}
In the above example i want to accept foo only when it is preceded by a backslash \.
But in my current implementation i am eating the \ which is matched below.
To check run as :
lex file.l
gcc -lfl lex.yy.c
./a.out
Edit 1:
I tried using unput as suggested by #rici. But as i implemented there are strings having both patterns which do not detect both patterns.
Like bar\foo
%%
\\foo {
yytext++; // To remove the starting backslash
printf("foo\n");
unput('\\');
}
bar\\ printf("bar\n");
.
%%
int main() {
yylex();
}
Edit 2:
Got the answer here. It uses flex start condition.
Edit 3:
%x backslash
%%
<backslash>foo {
printf("foo\n");
BEGIN(INITIAL);
}
bar\\ {
BEGIN(backslash);
printf("bar\n");
}
.
%%
int main() {
yylex();
}
I am just learning flex and I have written a flex program to detect a given word is verb or not. I will take input from a text file.I want to improve the code. I want to detect if there is any ill formed or unfinished string in the code.Unfinished means it starts using the start symbol (" " or /* ) but doesn't have any ending one and ill formed means,for example ( "I am" a boy") or (/* this is a */ comment */) like these ones. I want to detect them in my code. How will I do that? My sample code is as follows:
%%
[\t]+
is |
am |
are |
was |
were {printf("%s: is a verb",yytext);}
[a-zA-Z]+ {printf("%s: is a verb",yytext);}
["][^"]*["] {printf("'%s': is a string\n", yytext); }
. |\n
%%
int main(int argc, char *argv[]){
yyin = fopen(argv[1], "r");
yylex();
fclose(yyin);
}
This is similar in solution to the multi-line comment problem answered previously.. I quote from that:
The flex manual section on using <<EOF>> is quite
helpful as it has exactly your case as an example, and their code can
also be copied verbatim into your flex program.
As it explains, when using <<EOF>> you cannot place it in a normal
regular expression pattern. It can only be proceeded by a the name of a state. In your code you are using a state to indicate you are
inside a string. This state is called STRING_MULTI. All you have
to do is put that in front of the <<EOF>> marker and give it an
action to do.
The special action function yyterminate() tells flex that you have
recognised the <<EOF>> and that it marks the end-of-input for your
program.
Combining the stings and comments into one flex program gives you:
%option noyywrap
%x COMMENT_MULTI STRING_MULTI
%%
[\n\t\r ]+ {
/* ignore whitespace */ }
<INITIAL>"/*" {
/* begin of multi-line comment */
yymore();
BEGIN(COMMENT_MULTI);
}
<INITIAL>["] { yymore(); BEGIN(STRING_MULTI);}
<STRING_MULTI>[^"]+ {yymore(); }
<STRING_MULTI>["] {printf("String was : %s\n",yytext); BEGIN(INITIAL); }
<STRING_MULTI><<EOF>> {printf("Unterminated String: %s\n",yytext); yyterminate();}
<COMMENT_MULTI>"*/" {
/* end of multi-line comment */
printf("'%s': was a multi-line comment\n", yytext);
BEGIN(INITIAL);
}
<COMMENT_MULTI>. {
yymore();
}
<COMMENT_MULTI>\n {
yymore();
}
<COMMENT_MULTI><<EOF>> {printf("Unterminated Comment: %s\n", yytext); yyterminate();}
%%
int main(int argc, char *argv[]){
yylex();
}
I have a problem with flex. It doesn't recognize the or operator in this rule:
[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
If I split it into three rules then it is recognized:
[0-9A-Za-z]+{CORRECT} {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
To explain myself better the pattern I am trying to recognize is:
CORRECT [1-9]*_[1-9]*0
And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.
Full flex code:
%option noyywrap
%{
#include <stdio.h>
int num_lines=1;
%}
CORRECT [1-9]*_[1-9]*0
%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }
[0-9A-Za-z]+{CORRECT} { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext); }
"\n" { num_lines++; }
" "
"\t"
"\r"
. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
%%
int main(int argc,char **argv)
{
++argv,--argc;
if(argc>0)
yyin=fopen(argv[0],"r");
else
yyin=stdin;
yylex();
}
Whitespace is significant in a lex pattern. a | b is not the same as a|b. In the troublesome pattern, you have whitespace that I don't think you intended.
That said, in my opinion, your 3-pattern solution is easier to read and maintain.
I'm trying to present to the screen an error output when the user enters an incomplete
define , e.g :
#define A // this is wrong , error message should appear
#define A 5 // this is the correct form , no error message would be presented
but it doesn't work , here's the code :
%{
#include <stdio.h>
#include <string.h>
%}
%s FULLCOMMENT
%s LINECOMMENT
%s DEFINE
%s INCLUDE
%s PRINT
%s PSEUDO_C_CODE
STRING [^ \n]*
%%
<FULLCOMMENT>"*/" { BEGIN INITIAL; }
<FULLCOMMENT>. { /* Do Nothing */ }
<INCLUDE>"<stdio.h>"|"<stdlib.h>"|"<string.h>" { BEGIN INITIAL; }
<INCLUDE>. { printf("error\n"); return 0 ; }
<DEFINE>[ \t] { printf("eat a space within define\n"); }
<DEFINE>{STRING} { printf("eat string %s\n" , yytext);}
<DEFINE>\n { printf("eat a break line within define\n"); BEGIN INITIAL; }
"#include" { BEGIN INCLUDE; }
"#define" { printf("you gonna to define\n"); BEGIN DEFINE; }
"#define"+. { printf("error\n"); }
%%
int yywrap(void) { return 1; } // Callback at end of file
int main(void) {
yylex();
return 0 ;
}
Where did I go wrong ?
Here is the output :
a#ubuntu:~/Desktop$ ./a.out
#define A 4
error
A 4
#define A
error
A
The rule "#define"+. is longer and gets precedence over the earlier #define even for correct input. You could say just this :-
"#define"[:space:]*$ {printf("error\n"); }
Consider using the -dvT options with flex to get detailed debug output. Also I am not sure if you need such extensive use of states for anything except maybe comments. But you would know better.
So for a project that I'm working on, I am using Lex and Yacc to parse a FTP configuration file. The configuration files look something like this:
global {
num_daemons = 10
etc = /etc/ftpd
};
host "ftp-1.foobar.com" {
ftproot = /var/ftp/server1
max_out_bandwidth = 20.7
};
host "ftp-2.foobar.com" {
ftproot = /var/ftp/server2
exclude = /var/ftp/server2/private
};
host "ftp-3.foobar.com" {
ftproot = /var/ftp/server3
};
Now, my question is, how do I obtain this information in a usable way? Let's say I wanted to put things like the address after the host token into a struct. How would I do that? Also, how would I simply print out the values that I've parsed to the command line? Also, to run it, do I just cat the config file and pipe in the compiled c program? Thanks in advance for any help!
Here is my code:
%{
// tokens.l
#include <stdio.h>
#include <stdlib.h>
#include "y.tab.h"
int yyparse();
%}
%option noyywrap
%x OPTION
%x OPTID
%%
<INITIAL>global { return GLOBAL; }
<INITIAL>host { return HOST; }
<INITIAL>"[a-zA-z1-9./-]+" { return NAME; }
<INITIAL>\{ { return CURLY_OPEN; BEGIN OPTION; }
<INITIAL>\n { return EOLN; }
<INITIAL><<EOF>> { return EOFTOK; }
<OPTION>[a-zA-z1-9./-_]+ { return ID_NAME; BEGIN OPTID; }
<OPTION>[\t] {}
<OPTION>[\};] { return OPTION_CLOSE; BEGIN INITIAL;}
<OPTID>[a-zA-z1-9./-]+ { return ID_STRING; BEGIN OPTION; }
<OPTID>[0-9.]+ { return ID_NUM; BEGIN OPTION; }
<OPTID>[\n] { return EOLN; }
%%
int main(int argc, char **argv) {
// Where I am confused..
}
and my yacc file:
%{
// parse.y
#include <stdio.h>
#include <stdlib.h>
int yyerror(char *);
int yylex(void);
%}
%token ERROR EOLN EOFTOK
%token OPTION_CLOSE GLOBAL HOST NAME ID_NAME ID_STRING ID_NUM CURLY_OPEN
%%
input
: lines EOFTOK { YYACCEPT; }
;
lines
:
| lines line
;
line
: option
| opident
| OPTION_CLOSE
;
option
: GLOBAL CURLY_OPEN
| HOST NAME CURLY_OPEN
;
opident
: ID_NAME '=' ID_STRING
| ID_NAME '=' ID_NUM
;
%%
int yyerror(char *msg) {}
You would generally have variables which were accessible and set up before calling the parser, like a linked list of key/value pairs:
typedef struct sNode {
char *key;
char *val;
struct sNode *next;
} tNode;
tNode *lookupHead = NULL;
Then, in your Yacc code, something like:
opident
: ID_NAME '=' ID_STRING { addLookupStr (lookupHead, $1, $3); }
| ID_NAME '=' ID_NUM { other function call here }
;
This would basically execute that code as the rules are found (replacing the $ variables with the item in the rule, $1 is the value for the ID_NAME token, $2 is the =, and so on).
The function would be something like:
void addLookupStr (char *newkey, char *newval) {
// Check for duplicate keys, then attempt to add. All premature returns
// should also be logging errors and setting error flags as needed.
tNode *curr = lookupHead;
while (curr != NULL) {
if (strcmp (curr->key, newkey) == 0)
return;
curr = curr->next;
}
if ((curr = malloc (sizeof (tNode))) == NULL)
return;
if ((curr->key = strdup (newkey)) == NULL) {
free (curr);
return;
}
if ((curr->val = strdup (newval)) == NULL) {
free (curr->newkey);
free (curr);
return;
}
// All possibly-failing ops complete, insert at head of list.
curr->next = lookupHead;
lookupHead = curr;
}