How to present an error when #define is incomplete in Flex?

I'm trying to print an error message when the user enters an incomplete
#define, e.g.:
#define A // this is wrong, an error message should appear
#define A 5 // this is the correct form, no error message should be printed
But it doesn't work. Here's the code:
#include <stdio.h>
#include <string.h>
STRING [^ \n]*
<FULLCOMMENT>. { /* Do Nothing */ }
<INCLUDE>"<stdio.h>"|"<stdlib.h>"|"<string.h>" { BEGIN INITIAL; }
<INCLUDE>. { printf("error\n"); return 0 ; }
<DEFINE>[ \t] { printf("eat a space within define\n"); }
<DEFINE>{STRING} { printf("eat string %s\n" , yytext);}
<DEFINE>\n { printf("eat a break line within define\n"); BEGIN INITIAL; }
"#include" { BEGIN INCLUDE; }
"#define" { printf("you gonna to define\n"); BEGIN DEFINE; }
"#define"+. { printf("error\n"); }
int yywrap(void) { return 1; } // Callback at end of file
int main(void) {
return 0 ;
Where did I go wrong?
Here is the output:
a#ubuntu:~/Desktop$ ./a.out
#define A 4
A 4
#define A

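The rule "#define"+. matches one character more than the plain "#define" rule, and flex always prefers the longest match, so it takes precedence over the earlier #define rule even for correct input. To flag a #define with nothing after it, you could use just this:
"#define"[ \t]*$ { printf("error\n"); }
Note that a class expression such as [:space:] only works inside a bracket expression, i.e. [[:space:]]; written bare, it is an ordinary character class matching ':', 's', 'p', 'a', 'c' and 'e'. [ \t] is used here because [[:space:]] would also match the newline.
Consider running flex with the -d, -v and -T options to get detailed debug output. Also, I am not sure you need such extensive use of start states for anything except maybe comments, but you would know better.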
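For comparison, here is a minimal sketch in plain C of the check the question is after: a #define line counts as complete only if it has both a name and a value. The function name define_is_complete is hypothetical, and the sketch assumes one directive per string and does not strip trailing comments.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if `line` looks like "#define NAME VALUE",
   and 0 if it is not a #define or if the name or value is missing.
   Assumption: trailing comments are not stripped, so they count as a value. */
int define_is_complete(const char *line) {
    static const char kw[] = "#define";
    size_t n = sizeof kw - 1;
    if (strncmp(line, kw, n) != 0) return 0;
    const char *p = line + n;
    /* At least one blank must separate the keyword from the name. */
    if (*p != ' ' && *p != '\t') return 0;
    while (*p == ' ' || *p == '\t') p++;
    /* The macro name: any run of non-blank characters. */
    if (*p == '\0' || *p == '\n') return 0;
    while (*p != '\0' && *p != '\n' && *p != ' ' && *p != '\t') p++;
    /* Skip blanks; something other than end of line must follow. */
    while (*p == ' ' || *p == '\t') p++;
    return *p != '\0' && *p != '\n';
}
```

Calling define_is_complete("#define A") returns 0, while define_is_complete("#define A 5") returns 1.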


Output produced for the given input using the bottom up parsing

I tried solving this question and got option (c) as the answer, but a few textbooks give option (b). I am confused about which answer is correct. Please help me out!
GAAAAT is the correct answer; it is the output produced by a parser which honours the order of the actions in the translation rules (some of which occur in mid-rule).
Yacc/bison is one such parser, which makes it very easy to verify:
%{
#include <ctype.h>
#include <stdio.h>
void yyerror(const char* msg) {
  fprintf(stderr, "%s\n", msg);
}
int yylex(void) {
  int ch;
  /* Skip anything that is not a letter or digit (e.g. the trailing newline). */
  while ((ch = getchar()) != EOF) {
    if (isalnum(ch)) return ch;
  }
  return 0;
}
%}
%%
S: 'p' { putchar('G'); } P
P: 'q' { putchar('A'); } Q
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' { putchar('A'); } P
Q: %empty { putchar('O'); }
%%
int main(void) {
  return yyparse();
}
$ bison -o gate.c gate.y
$ gcc -std=c99 -Wall -o gate gate.c
$ ./gate<<<pqsqsr
If we modify the grammar to put all of the actions at the end of their respective rule, we obtain answer (b). (Other than the grammar, everything is the same as the previous example, so I'm only showing the new translation rules.)
S: 'p' P { putchar('G'); }
P: 'q' Q { putchar('A'); }
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' P { putchar('A'); }
Q: %empty { putchar('O'); }
$ bison -o gate_no_mra.c gate_no_mra.y
$ gcc -std=c99 -Wall -o gate_no_mra gate_no_mra.c
$ ./gate_no_mra<<<pqsqsr
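The two action orderings can also be reproduced without bison. The following is a minimal sketch in C that walks the same grammar with a hand-written recursive-descent routine rather than an LR parser; for this grammar the actions fire in the same order. The names Ctx, emit, parse_P, parse_Q and translate are hypothetical, syntax errors are not reported, and the caller is assumed to supply an output buffer of at least strlen(input) + 2 bytes.

```c
#include <string.h>

/* Hypothetical state: input cursor, output cursor, and action placement. */
typedef struct {
    const char *in;  /* remaining input */
    char *out;       /* next free slot in the output buffer */
    int mid_rule;    /* 1: act right after the terminal; 0: act at end of rule */
} Ctx;

static void emit(Ctx *c, char ch) { *c->out++ = ch; }

static void parse_P(Ctx *c);

/* Q: 's' P | %empty */
static void parse_Q(Ctx *c) {
    if (*c->in == 's') {
        c->in++;
        if (c->mid_rule) emit(c, 'A');
        parse_P(c);
        if (!c->mid_rule) emit(c, 'A');
    } else {
        emit(c, 'O');
    }
}

/* P: 'q' Q | 'r' | %empty */
static void parse_P(Ctx *c) {
    if (*c->in == 'q') {
        c->in++;
        if (c->mid_rule) emit(c, 'A');
        parse_Q(c);
        if (!c->mid_rule) emit(c, 'A');
    } else if (*c->in == 'r') {
        c->in++;
        emit(c, 'T');
    } else {
        emit(c, 'E');
    }
}

/* S: 'p' P. Writes the translation of `input` into `out` (NUL-terminated). */
void translate(const char *input, int mid_rule, char *out) {
    Ctx c = { input, out, mid_rule };
    if (*c.in == 'p') {
        c.in++;
        if (mid_rule) emit(&c, 'G');
        parse_P(&c);
        if (!mid_rule) emit(&c, 'G');
    }
    *c.out = '\0';
}
```

With mid_rule set to 1, translate("pqsqsr", 1, buf) fills buf with GAAAAT; with mid_rule set to 0 the same input gives TAAAAG, since every action then waits until its whole rule has been recognised.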

Difference "Flex and Bison" code in windows and linux

I am currently working through sample code from the O'Reilly book "Flex and Bison". I am using the GNU C compiler for Windows, invoked as gcc rather than the Linux cc command, together with a binary install of Flex and Bison for Windows.
The problem is that the code, if copied directly from the book, does not compile, and I have had to hack it a bit to get it to work.
Example from book
Example 1-1. Word count fb1-1.l
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
  yylex();
  printf("%8d%8d%8d\n", lines, words, chars);
}
I compiled the code in the following way using the Windows command line:
flex file.l
gcc lex.yy.c -o a.exe
The build failed, stating that yywrap() was not found. I added it and the program then worked, but it never reached the printf in the main function; it just hung waiting for more input!
Here is my solution, which works but feels like a hack, and I do not fully understand the process.
/* just like Unix wc */
%{
#include <string.h> /* added */
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
"." { return ; } /* added */
. { chars++; }
%%
int yywrap(void) /* added */
{
  return 1;
}
int main(void)
{
  yylex();
  printf("num lines is %8d, num words is %8d, num chars is %8d\n", lines, words, chars);
  return 0;
}
I had to add a new rule to return from yylex(), which was not in the book, add yywrap() without really knowing why, and add string.h, which was not present. My main question is: are there significant differences between flex for Windows and Unix, and is it possible to run the original code with my gcc compiler and GNU flex without these hacks?
I do not understand what you have achieved with this:
#include <string.h>
"." { return; }
What I do know for sure is that if you run a flex scanner without an input file, you have to mark the end of input yourself (Ctrl-Z then Enter on a Windows console, Ctrl-D on Unix); otherwise the scanner keeps waiting for input. What I would suggest:
%{
#include <stdio.h>
#include <string.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
WORD [a-zA-Z]+
%%
{WORD} {
  words++;
  chars += strlen(yytext);
}
\n {
  /* chars++; why this? there was no columns here - it's a new line */
  lines++;
}
" " {
  /* count the spaces */
  chars++;
}
\t {
  /* count the tabs */
  chars += 4 /* or 8 */;
}
. {
  printf("Error (unknown symbol):\t%c\n", yytext[0]);
}
%%
int main()
{
  /* iterate until end of input and even if errors - continue */
  while(yylex()){ }
  printf("lines:\t%8d\nwords:\t%8d\nchars:\t%8d\n", lines, words, chars);
  return 0;
}
Build with:
flex input.l
The output will be lex.yy.c. Then build:
gcc -o scanner.exe lex.yy.c -lfl
Create a text file with the input and run the following:
scanner.exe <in.txt >out.txt
The less-than sign redirects input from the file in.txt, while the greater-than sign redirects output to out.txt. Because the input file has an end, the scanner sees end-of-file and stops properly.
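To make the counting rules of the book's original scanner concrete, here is a minimal sketch of the same logic in plain C, working on a string instead of stdin. The names Counts and count_text are hypothetical; the sketch assumes the C locale, where isalpha matches exactly [a-zA-Z].

```c
#include <ctype.h>

/* Hypothetical result type mirroring the scanner's three counters. */
typedef struct {
    int lines;
    int words;
    int chars;
} Counts;

/* Mirrors the three rules of the book's scanner:
     [a-zA-Z]+  -> one word, chars += length of the run
     \n         -> one line, one char
     .          -> one char                                   */
Counts count_text(const char *s) {
    Counts c = {0, 0, 0};
    while (*s != '\0') {
        if (isalpha((unsigned char)*s)) {
            c.words++;
            /* Consume the whole run of letters, like the longest match. */
            while (isalpha((unsigned char)*s)) {
                c.chars++;
                s++;
            }
        } else {
            if (*s == '\n') c.lines++;
            c.chars++;
            s++;
        }
    }
    return c;
}
```

For the input "hello world\n" this yields 1 line, 2 words and 12 characters. The loop ends at the string terminator, which plays the role that end-of-file plays for the generated scanner.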

Detecting ill formed strings and comments in flex

I am just learning flex, and I have written a flex program that detects whether a given word is a verb. It takes its input from a text file. I want to improve the code so that it detects any ill-formed or unfinished strings and comments in the input. Unfinished means it begins with a start symbol (" or /*) but has no matching end symbol; ill-formed means, for example, ("I am" a boy") or (/* this is a */ comment */). How can I detect these in my code? My sample code is as follows:
is |
am |
are |
was |
were {printf("%s: is a verb",yytext);}
[a-zA-Z]+ {printf("%s: is a verb",yytext);}
["][^"]*["] {printf("'%s': is a string\n", yytext); }
. |\n
int main(int argc, char *argv[]){
yyin = fopen(argv[1], "r");
This is similar in solution to the multi-line comment problem answered previously. I quote from that answer:
The flex manual section on using <<EOF>> is quite
helpful, as it has exactly your case as an example, and their code can
also be copied verbatim into your flex program.
As it explains, when using <<EOF>> you cannot place it in a normal
regular expression pattern. It can only be preceded by the name of a state. In your code you are using a state to indicate you are
inside a string. This state is called STRING_MULTI. All you have
to do is put that in front of the <<EOF>> marker and give it an
action to do.
The special action function yyterminate() tells flex that you have
recognised the <<EOF>> and that it marks the end-of-input for your
program.
Combining the strings and comments into one flex program gives you:
%option noyywrap
%x COMMENT_MULTI STRING_MULTI
%%
[\n\t\r ]+ {
/* ignore whitespace */ }
<INITIAL>"/*" {
/* begin of multi-line comment */
yymore(); BEGIN(COMMENT_MULTI); }
<INITIAL>["] { yymore(); BEGIN(STRING_MULTI);}
<STRING_MULTI>[^"]+ {yymore(); }
<STRING_MULTI>["] {printf("String was : %s\n",yytext); BEGIN(INITIAL); }
<STRING_MULTI><<EOF>> {printf("Unterminated String: %s\n",yytext); yyterminate();}
<COMMENT_MULTI>"*/" {
/* end of multi-line comment */
printf("'%s': was a multi-line comment\n", yytext);
BEGIN(INITIAL); }
<COMMENT_MULTI>.|\n { yymore(); }
<COMMENT_MULTI><<EOF>> {printf("Unterminated Comment: %s\n", yytext); yyterminate();}
%%
int main(int argc, char *argv[]){
yylex();
return 0;
}
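The idea behind the state-qualified <<EOF>> rules (reaching the end of input while still inside a string or comment means it was never closed) can be sketched in plain C as a small hand-written scanner. The names ScanResult and check_delimiters are hypothetical. Like the flex rules, this sketch only reports unterminated constructs; a stray closing */ with no opener is not flagged.

```c
/* Hypothetical result codes for the sketch below. */
typedef enum {
    SCAN_OK,
    SCAN_UNTERMINATED_STRING,
    SCAN_UNTERMINATED_COMMENT
} ScanResult;

/* Walks the text once. Hitting the end of the text while inside a string
   or a comment corresponds to the <STATE><<EOF>> rules in the flex program. */
ScanResult check_delimiters(const char *s) {
    while (*s != '\0') {
        if (*s == '"') {
            s++;                                  /* enter the string */
            while (*s != '\0' && *s != '"') s++;
            if (*s == '\0') return SCAN_UNTERMINATED_STRING;
            s++;                                  /* closing quote */
        } else if (s[0] == '/' && s[1] == '*') {
            s += 2;                               /* enter the comment */
            while (*s != '\0' && !(s[0] == '*' && s[1] == '/')) s++;
            if (*s == '\0') return SCAN_UNTERMINATED_COMMENT;
            s += 2;                               /* closing star-slash */
        } else {
            s++;
        }
    }
    return SCAN_OK;
}
```

For the question's example "I am" a boy", the third quote opens a string that never closes, so the function returns SCAN_UNTERMINATED_STRING.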

Flex (lexical analyzer) not recognizing or operator

I have a problem with flex. It doesn't recognize the or operator in this rule:
[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
If I split it into three rules then it is recognized:
[0-9A-Za-z]+{CORRECT} {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
To explain myself better the pattern I am trying to recognize is:
CORRECT [1-9]*_[1-9]*0
And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.
Full flex code:
%option noyywrap
%{
#include <stdio.h>
int num_lines=1;
%}
CORRECT [1-9]*_[1-9]*0
%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }
[0-9A-Za-z]+{CORRECT} { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext); }
"\n" { num_lines++; }
" "
. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
%%
int main(int argc,char **argv)
{
  yylex();
  return 0;
}
Whitespace is significant in a lex pattern: unquoted whitespace ends the pattern, and everything after it is taken as the action. So a | b is not the same as a|b. In the troublesome pattern, you have whitespace that I don't think you intended.
That said, in my opinion, your 3-pattern solution is easier to read and maintain.
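As a cross-check of what the three error rules are guarding against, here is a minimal sketch in C that accepts a token only if the whole of it matches CORRECT, i.e. [1-9]*_[1-9]*0 with nothing before or after. The function name matches_correct is hypothetical.

```c
/* Hypothetical helper: returns 1 if the entire token matches
   [1-9]*_[1-9]*0, and 0 otherwise. */
int matches_correct(const char *tok) {
    const char *p = tok;
    while (*p >= '1' && *p <= '9') p++;   /* [1-9]* */
    if (*p != '_') return 0;              /* _      */
    p++;
    while (*p >= '1' && *p <= '9') p++;   /* [1-9]* */
    if (*p != '0') return 0;              /* 0      */
    p++;
    return *p == '\0';                    /* nothing may follow */
}
```

Tokens such as 12_340 and _0 are accepted, while a12_0 and 12_0x are rejected because of the extra characters around the pattern.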

How does a Lex & Yacc parser output values?

For a project that I'm working on, I am using Lex and Yacc to parse an FTP configuration file. The configuration files look something like this:
global {
num_daemons = 10
etc = /etc/ftpd
}
host "" {
ftproot = /var/ftp/server1
max_out_bandwidth = 20.7
}
host "" {
ftproot = /var/ftp/server2
exclude = /var/ftp/server2/private
}
host "" {
ftproot = /var/ftp/server3
}
Now, my question is: how do I obtain this information in a usable way? Let's say I wanted to put things like the address after the host token into a struct. How would I do that? Also, how would I simply print the values that I've parsed to the command line? And to run it, do I just cat the config file and pipe it into the compiled C program? Thanks in advance for any help!
Here is my code:
// tokens.l
#include <stdio.h>
#include <stdlib.h>
#include ""
int yyparse();
%option noyywrap
<INITIAL>global { return GLOBAL; }
<INITIAL>host { return HOST; }
<INITIAL>"[a-zA-z1-9./-]+" { return NAME; }
<INITIAL>\n { return EOLN; }
<INITIAL><<EOF>> { return EOFTOK; }
<OPTION>[a-zA-z1-9./-_]+ { return ID_NAME; BEGIN OPTID; }
<OPTION>[\t] {}
<OPTID>[a-zA-z1-9./-]+ { return ID_STRING; BEGIN OPTION; }
<OPTID>[0-9.]+ { return ID_NUM; BEGIN OPTION; }
<OPTID>[\n] { return EOLN; }
int main(int argc, char **argv) {
// Where I am confused..
and my yacc file:
// parse.y
#include <stdio.h>
#include <stdlib.h>
int yyerror(char *);
int yylex(void);
: lines EOFTOK { YYACCEPT; }
| lines line
: option
| opident
int yyerror(char *msg) {}
You would generally have variables which were accessible and set up before calling the parser, like a linked list of key/value pairs:
typedef struct sNode {
char *key;
char *val;
struct sNode *next;
} tNode;
tNode *lookupHead = NULL;
Then, in your Yacc code, something like:
: ID_NAME '=' ID_STRING { addLookupStr ($1, $3); }
| ID_NAME '=' ID_NUM { other function call here }
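This executes that code as the rules are matched, with each $ variable replaced by the value of the corresponding item in the rule: $1 is the value of the ID_NAME token, $2 is the =, and so on. For $1 and $3 to carry the matched text, the lexer has to store it in yylval (for example as a copy of yytext) before returning the token, with the semantic value type declared accordingly.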
The function would be something like:
void addLookupStr (char *newkey, char *newval) {
    // Check for duplicate keys, then attempt to add. All premature returns
    // should also be logging errors and setting error flags as needed.
    tNode *curr = lookupHead;
    while (curr != NULL) {
        if (strcmp (curr->key, newkey) == 0)
            return;
        curr = curr->next;
    }
    if ((curr = malloc (sizeof (tNode))) == NULL)
        return;
    if ((curr->key = strdup (newkey)) == NULL) {
        free (curr);
        return;
    }
    if ((curr->val = strdup (newval)) == NULL) {
        free (curr->key);
        free (curr);
        return;
    }
    // All possibly-failing ops complete, insert at head of list.
    curr->next = lookupHead;
    lookupHead = curr;
}
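Once the parser has filled such a list, reading the values back or printing them is a plain list walk. The following is a self-contained sketch along those lines. The names addPair, findVal, printPairs and copyStr are hypothetical, the list head is passed explicitly instead of being a global, and copyStr stands in for strdup, which is POSIX rather than ISO C.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct sNode {
    char *key;
    char *val;
    struct sNode *next;
} tNode;

/* Portable stand-in for strdup. Returns NULL on allocation failure. */
static char *copyStr(const char *s) {
    char *p = malloc(strlen(s) + 1);
    if (p != NULL) strcpy(p, s);
    return p;
}

/* Adds key/val at the head of the list.
   Returns 0 on success, -1 on a duplicate key or allocation failure. */
int addPair(tNode **head, const char *key, const char *val) {
    for (const tNode *n = *head; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0) return -1;
    tNode *node = malloc(sizeof *node);
    if (node == NULL) return -1;
    node->key = copyStr(key);
    node->val = copyStr(val);
    if (node->key == NULL || node->val == NULL) {
        free(node->key);   /* free(NULL) is a no-op */
        free(node->val);
        free(node);
        return -1;
    }
    node->next = *head;
    *head = node;
    return 0;
}

/* Returns the value stored for key, or NULL if the key is absent. */
const char *findVal(const tNode *head, const char *key) {
    for (const tNode *n = head; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0) return n->val;
    return NULL;
}

/* Prints every pair as "key = value", one per line. */
void printPairs(const tNode *head, FILE *out) {
    for (const tNode *n = head; n != NULL; n = n->next)
        fprintf(out, "%s = %s\n", n->key, n->val);
}
```

After the parse finishes, a call such as printPairs(list, stdout) would dump everything that was collected, and findVal(list, "ftproot") would fetch a single setting.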
