Flex: match if preceded by a character/pattern - flex-lexer

How can I match a pattern R only when it is preceded by another pattern S, without consuming S (that is, giving the S-matched input back to lex)?
file.l:
%%
\\foo {
yytext++; // To remove the starting backslash
printf("%s\n", yytext);
}
\\ printf("backslash!\n");
.
%%
int main() {
yylex();
}
In the above example I want to accept foo only when it is preceded by a backslash \.
But in my current implementation the \\foo rule consumes the \, which should instead be matched by the \\ rule below it.
To check, run:
lex file.l
gcc lex.yy.c -lfl
./a.out
Edit 1:
I tried using unput as suggested by @rici. But with my implementation, input that contains both patterns, such as bar\foo, does not produce a match for both of them:
%%
\\foo {
yytext++; // To remove the starting backslash
printf("foo\n");
unput('\\');
}
bar\\ printf("bar\n");
.
%%
int main() {
yylex();
}
Edit 2:
Got the answer here. It uses a flex start condition.
Edit 3:
%x backslash
%%
<backslash>foo {
printf("foo\n");
BEGIN(INITIAL);
}
bar\\ {
BEGIN(backslash);
printf("bar\n");
}
.
%%
int main() {
yylex();
}

Related

Output produced for the given input using the bottom up parsing

I tried solving this question and my answer comes out to be option (c), but a few textbooks give option (b). I am confused about which answer is correct. Please help me out!
GAAAAT is the correct answer; it is the output produced by a parser which honours the order of the actions in the translation rules (some of which occur in mid-rule).
Yacc/bison is one such parser, which makes it very easy to verify:
%{
#include <ctype.h>
#include <stdio.h>
void yyerror(const char* msg) {
fprintf(stderr, "%s\n", msg);
}
int yylex(void) {
int ch;
while ((ch = getchar()) != EOF) {
if (isalnum(ch)) return ch;
}
return 0;
}
%}
%%
S: 'p' { putchar('G'); } P
P: 'q' { putchar('A'); } Q
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' { putchar('A'); } P
Q: %empty { putchar('O'); }
%%
int main(void) {
yyparse();
putchar('\n');
}
$ bison -o gate.c gate.y
$ gcc -std=c99 -Wall -o gate gate.c
$ ./gate<<<pqsqsr
GAAAAT
If we modify the grammar to put all of the actions at the end of their respective rule, we obtain answer (b). (Other than the grammar, everything is the same as the previous example, so I'm only showing the new translation rules.)
S: 'p' P { putchar('G'); }
P: 'q' Q { putchar('A'); }
P: 'r' { putchar('T'); }
P: %empty { putchar('E'); }
Q: 's' P { putchar('A'); }
Q: %empty { putchar('O'); }
$ bison -o gate_no_mra.c gate_no_mra.y
$ gcc -std=c99 -Wall -o gate_no_mra gate_no_mra.c
$ ./gate_no_mra<<<pqsqsr
TAAAAG

Difference "Flex and Bison" code in windows and linux

I am currently working through sample code from the O'Reilly press book entitled "Flex and Bison". I am using the GNU C compiler for Windows with Flex and Bison binary install for Windows which is launched using gcc rather than the Linux cc command.
The problem is that the code if copied directly from the book does not compile and I have had to hack it a bit to get it to work.
Example from book
Example 1-1. Word count fb1-1.l
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
I compiled the code in the following way using the Windows command line:
flex file.l
gcc lex.yy.c -o a.exe
The build failed, stating that yywrap() was not found. I added that and then it worked, but it never reached the printf in the main function; it just hung waiting for more input!
Here is my solution, which works but feels like a hack, and I do not fully understand the process.
/* just like Unix wc */
%{
#include <string.h>  /* added */
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
"." { return ; /* added */ }
. { chars++; }
%%
/* added */
int yywrap(void)
{
return 1;
}
int main(void)
{
yylex();
printf("num lines is %8d, num words is %8d, num chars is %8d\n", lines, words, chars);
return 0;
}
I had to add a new rule to return from yylex(), which was not in the book; add yywrap(), without really knowing why; and include string.h, which was not present. My main question: are there significant differences between flex for Windows and flex for Unix, and is it possible to run the original code with my gcc compiler and GNU flex without these hacks?
I do not understand what you have achieved with this:
#include <string.h>
"." { return; }
But what I know for sure is that if you run a flex scanner without specifying an input file, you have to mark the end of input yourself (Ctrl+Z followed by Enter in a Windows console, Ctrl+D on Unix). Otherwise the scanner will keep waiting for input. What I would suggest:
%{
#include <stdio.h>
#include <string.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
WORD [a-zA-Z]+
%%
{WORD} {
words++;
chars += strlen(yytext);
}
\n {
lines++;
/* chars++; why this? there was no columns here - it's a new line */
}
" " {
/* count the spaces */
chars++;
}
\t {
/* count the tabs */
chars += 4 /* or 8 */;
}
. {
printf("Error (unknown symbol):\t%c\n", yytext[0]);
chars++;
}
%%
int main()
{
/* iterate until end of input and even if errors - continue */
while(yylex()){ }
printf("lines:\t%8d\nwords:\t%8d\nchars:\t%8d\n", lines, words, chars);
return 0;
}
Build with:
flex input.l
The output will be lex.yy.c.
Then build:
gcc -o scanner.exe lex.yy.c -lfl
Create a text file with the input, then run the following:
scanner.exe < in.txt > out.txt
The less-than sign redirects input from the file in.txt, while the greater-than sign redirects output to out.txt. Because reading the file reaches EOF at its end, the scanner will stop properly.

Detecting ill formed strings and comments in flex

I am just learning flex and I have written a flex program to detect whether a given word is a verb or not. It takes its input from a text file. I want to improve the code so that it detects any ill-formed or unfinished string or comment in the input. Unfinished means that it starts with the opening symbol (" or /*) but has no closing one; ill-formed means, for example, ("I am" a boy") or (/* this is a */ comment */). How can I detect these? My sample code is as follows:
%%
[\t]+
is |
am |
are |
was |
were {printf("%s: is a verb\n",yytext);}
[a-zA-Z]+ {printf("%s: is not a verb\n",yytext);}
["][^"]*["] {printf("'%s': is a string\n", yytext); }
. |\n
%%
int main(int argc, char *argv[]){
yyin = fopen(argv[1], "r");
yylex();
fclose(yyin);
}
The solution is similar to that of the multi-line comment problem answered previously. I quote from that answer:
The flex manual section on using <<EOF>> is quite
helpful as it has exactly your case as an example, and their code can
also be copied verbatim into your flex program.
As it explains, when using <<EOF>> you cannot place it in a normal
regular expression pattern. It can only be preceded by the name of a state. In your code you are using a state to indicate you are
inside a string. This state is called STRING_MULTI. All you have
to do is put that in front of the <<EOF>> marker and give it an
action to do.
The special action function yyterminate() tells flex that you have
recognised the <<EOF>> and that it marks the end-of-input for your
program.
Combining the strings and comments into one flex program gives you:
%option noyywrap
%x COMMENT_MULTI STRING_MULTI
%%
[\n\t\r ]+ {
/* ignore whitespace */ }
<INITIAL>"/*" {
/* begin of multi-line comment */
yymore();
BEGIN(COMMENT_MULTI);
}
<INITIAL>["] { yymore(); BEGIN(STRING_MULTI);}
<STRING_MULTI>[^"]+ {yymore(); }
<STRING_MULTI>["] {printf("String was : %s\n",yytext); BEGIN(INITIAL); }
<STRING_MULTI><<EOF>> {printf("Unterminated String: %s\n",yytext); yyterminate();}
<COMMENT_MULTI>"*/" {
/* end of multi-line comment */
printf("'%s': was a multi-line comment\n", yytext);
BEGIN(INITIAL);
}
<COMMENT_MULTI>. {
yymore();
}
<COMMENT_MULTI>\n {
yymore();
}
<COMMENT_MULTI><<EOF>> {printf("Unterminated Comment: %s\n", yytext); yyterminate();}
%%
int main(int argc, char *argv[]){
yylex();
}

Flex (lexical analyzer) not recognizing or operator

I have a problem with flex. It doesn't recognize the or operator in this rule:
[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
If I split it into three rules then it is recognized:
[0-9A-Za-z]+{CORRECT} {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
To explain myself better the pattern I am trying to recognize is:
CORRECT [1-9]*_[1-9]*0
And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.
Full flex code:
%option noyywrap
%{
#include <stdio.h>
int num_lines=1;
%}
CORRECT [1-9]*_[1-9]*0
%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }
[0-9A-Za-z]+{CORRECT} { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext); }
"\n" { num_lines++; }
" "
"\t"
"\r"
. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
%%
int main(int argc,char **argv)
{
++argv,--argc;
if(argc>0)
yyin=fopen(argv[0],"r");
else
yyin=stdin;
yylex();
}
Whitespace is significant in a lex pattern. a | b is not the same as a|b. In the troublesome pattern, you have whitespace that I don't think you intended.
That said, in my opinion, your 3-pattern solution is easier to read and maintain.

How to present error when #define is incomplete in Flex?

I'm trying to print an error message when the user enters an incomplete #define, e.g.:
#define A // this is wrong, an error message should appear
#define A 5 // this is the correct form, no error message should be printed
But it doesn't work. Here's the code:
%{
#include <stdio.h>
#include <string.h>
%}
%s FULLCOMMENT
%s LINECOMMENT
%s DEFINE
%s INCLUDE
%s PRINT
%s PSEUDO_C_CODE
STRING [^ \n]*
%%
<FULLCOMMENT>"*/" { BEGIN INITIAL; }
<FULLCOMMENT>. { /* Do Nothing */ }
<INCLUDE>"<stdio.h>"|"<stdlib.h>"|"<string.h>" { BEGIN INITIAL; }
<INCLUDE>. { printf("error\n"); return 0 ; }
<DEFINE>[ \t] { printf("eat a space within define\n"); }
<DEFINE>{STRING} { printf("eat string %s\n" , yytext);}
<DEFINE>\n { printf("eat a break line within define\n"); BEGIN INITIAL; }
"#include" { BEGIN INCLUDE; }
"#define" { printf("you gonna to define\n"); BEGIN DEFINE; }
"#define"+. { printf("error\n"); }
%%
int yywrap(void) { return 1; } // Callback at end of file
int main(void) {
yylex();
return 0 ;
}
Where did I go wrong?
Here is the output:
a#ubuntu:~/Desktop$ ./a.out
#define A 4
error
A 4
#define A
error
A
The rule "#define"+. matches one character more than the earlier "#define" rule, so it is the longer match and takes precedence even for correct input. You could instead write just this:
"#define"[[:space:]]*$ {printf("error\n"); }
Consider using the -dvT options with flex to get detailed debug output. Also, I am not sure you need such extensive use of states for anything except maybe comments. But you would know better.
