Detecting ill-formed strings and comments in flex

I am just learning flex and have written a flex program that detects whether a given word is a verb. The input comes from a text file. I want to improve the code so that it also detects ill-formed or unfinished strings and comments. Unfinished means the text starts with an opening symbol (" or /*) but has no closing one. Ill-formed means, for example, ("I am" a boy") or (/* this is a */ comment */). How can I detect these in my code? My sample code is as follows:
%%
[\t]+
is |
am |
are |
was |
were {printf("%s: is a verb",yytext);}
[a-zA-Z]+ {printf("%s: is not a verb",yytext);}
["][^"]*["] {printf("'%s': is a string\n", yytext); }
.|\n { /* ignore anything else */ }
%%
int main(int argc, char *argv[]){
yyin = fopen(argv[1], "r");
yylex();
fclose(yyin);
}

The solution is similar to the one for the multi-line comment problem answered previously. I quote from that answer:
The flex manual section on using <<EOF>> is quite
helpful as it has exactly your case as an example, and their code can
also be copied verbatim into your flex program.
As it explains, when using <<EOF>> you cannot place it in a normal
regular expression pattern. It can only be preceded by the name of a state. In your code you are using a state to indicate you are
inside a string. This state is called STRING_MULTI. All you have
to do is put that in front of the <<EOF>> marker and give it an
action to do.
The special action function yyterminate() tells flex that you have
recognised the <<EOF>> and that it marks the end-of-input for your
program.
Combining the strings and comments into one flex program gives you:
%option noyywrap
%x COMMENT_MULTI STRING_MULTI
%%
[\n\t\r ]+ {
/* ignore whitespace */ }
<INITIAL>"/*" {
/* begin of multi-line comment */
yymore();
BEGIN(COMMENT_MULTI);
}
<INITIAL>["] { yymore(); BEGIN(STRING_MULTI);}
<STRING_MULTI>[^"]+ {yymore(); }
<STRING_MULTI>["] {printf("String was : %s\n",yytext); BEGIN(INITIAL); }
<STRING_MULTI><<EOF>> {printf("Unterminated String: %s\n",yytext); yyterminate();}
<COMMENT_MULTI>"*/" {
/* end of multi-line comment */
printf("'%s': was a multi-line comment\n", yytext);
BEGIN(INITIAL);
}
<COMMENT_MULTI>. {
yymore();
}
<COMMENT_MULTI>\n {
yymore();
}
<COMMENT_MULTI><<EOF>> {printf("Unterminated Comment: %s\n", yytext); yyterminate();}
%%
int main(int argc, char *argv[]){
yylex();
}
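For comparison, the same idea can be sketched in plain C without flex: keep track of which state the scanner is in and check that state when the input ends. This is only an illustration under stated assumptions; the names scan_text, scan_result and scan_state are hypothetical and do not come from flex.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum scan_result {
    SCAN_OK,
    SCAN_UNTERMINATED_STRING,
    SCAN_UNTERMINATED_COMMENT
};

/* Mirrors the start conditions INITIAL, STRING_MULTI and COMMENT_MULTI. */
enum scan_state { ST_INITIAL, ST_STRING, ST_COMMENT };

/* Walk the text once and report which state is active at end of input. */
enum scan_result scan_text(const char *s)
{
    enum scan_state state = ST_INITIAL;
    size_t i = 0;

    while (s[i] != '\0') {
        switch (state) {
        case ST_INITIAL:
            if (s[i] == '"') {
                state = ST_STRING;          /* opening quote */
                i++;
            } else if (s[i] == '/' && s[i + 1] == '*') {
                state = ST_COMMENT;         /* opening comment marker */
                i += 2;
            } else {
                i++;
            }
            break;
        case ST_STRING:
            if (s[i] == '"')
                state = ST_INITIAL;         /* closing quote */
            i++;
            break;
        case ST_COMMENT:
            if (s[i] == '*' && s[i + 1] == '/') {
                state = ST_INITIAL;         /* closing comment marker */
                i += 2;
            } else {
                i++;
            }
            break;
        }
    }

    /* Plays the role of the <STATE><<EOF>> rules. */
    if (state == ST_STRING)
        return SCAN_UNTERMINATED_STRING;
    if (state == ST_COMMENT)
        return SCAN_UNTERMINATED_COMMENT;
    return SCAN_OK;
}
```

With this sketch, "I am" a boy" is reported as an unterminated string, because the last quote opens a string that never closes. A stray closing marker, as in /* this is a */ comment */, is not flagged; detecting that would need an extra check for "*/" in the initial state.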

Related

Flex: match if preceded by a character/pattern

How can I match a pattern R only if it is preceded by another pattern S, without consuming S (that is, giving the S-matched input back to lex)?
file.l :
%%
\\foo {
yytext++; // To remove the starting backslash
printf("%s\n", yytext);
}
\\ printf("backslash!\n");
.
%%
int main() {
yylex();
}
In the above example I want to accept foo only when it is preceded by a backslash \.
But my current implementation consumes the \, which should instead be matched by the rule below it.
To check, run:
lex file.l
gcc lex.yy.c -lfl
./a.out
Edit 1:
I tried using unput as suggested by @rici. But with my implementation, input that contains both patterns, such as bar\foo, does not trigger both rules:
%%
\\foo {
yytext++; // To remove the starting backslash
printf("foo\n");
unput('\\');
}
bar\\ printf("bar\n");
.
%%
int main() {
yylex();
}
Edit 2:
Got the answer here. It uses a flex start condition.
Edit 3:
%x backslash
%%
<backslash>foo {
printf("foo\n");
BEGIN(INITIAL);
}
bar\\ {
BEGIN(backslash);
printf("bar\n");
}
.
%%
int main() {
yylex();
}
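The start-condition idea in Edit 3 (remember that the previous match was bar\ and accept foo only then) can be mimicked in plain C with a flag. This is a rough sketch and not flex output; the function name count_foo_after_bar_backslash is hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Count occurrences of "foo" that are accepted only after "bar\" was seen. */
int count_foo_after_bar_backslash(const char *s)
{
    bool after_backslash = false;   /* plays the role of the <backslash> state */
    int hits = 0;

    while (*s != '\0') {
        if (after_backslash && strncmp(s, "foo", 3) == 0) {
            hits++;                 /* like the <backslash>foo rule */
            after_backslash = false;
            s += 3;
        } else if (!after_backslash && strncmp(s, "bar\\", 4) == 0) {
            after_backslash = true; /* like BEGIN(backslash) */
            s += 4;
        } else {
            s++;                    /* any other character is skipped */
        }
    }
    return hits;
}
```

As in the flex version, the flag stays set until foo is seen, so other characters between bar\ and foo do not reset it. A plain foo with no preceding bar\ is never counted.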

Difference "Flex and Bison" code in windows and linux

I am currently working through sample code from the O'Reilly press book entitled "Flex and Bison". I am using the GNU C compiler for Windows with Flex and Bison binary install for Windows which is launched using gcc rather than the Linux cc command.
The problem is that the code if copied directly from the book does not compile and I have had to hack it a bit to get it to work.
Example from book
Example 1-1. Word count fb1-1.l
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
I compiled the code in the following way using the Windows command line:
flex file.l
gcc lex.yy.c -o a.exe
It failed, stating that yywrap() was not found. I added that and then it worked, but it did not reach the printf in the main function because it just hung waiting for more input!
Here is my solution, which works but feels like a hack, and I do not fully understand the process.
/* just like Unix wc */
%{
#include <string.h>
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
"." { return ;}
. { chars++; }
%%
int yywrap(void)
{
return 1;
}
int main(void)
{
yylex();
printf("num lines is %8d, num words is %8d, num chars is %8d\n", lines, words, chars);
return 0;
}
I had to add a new rule to return out of yylex(), which was not in the book, add yywrap() without really knowing why, and add string.h, which was not present. My main question is: are there significant differences between flex for Windows and Unix, and is it possible to run the original code with my gcc compiler and GNU flex without these hacks?
I do not understand what you have achieved with this:
#include <string.h>
"." { return; }
But what I know for sure is that if you run the flex scanner without a specified input file, you have to mark the end of input. Otherwise it will wait for more input. What I would suggest:
%{
#include <stdio.h>
#include <string.h> /* for strlen() */
int chars = 0;
int words = 0;
int lines = 0;
%}
WORD [a-zA-Z]+
%%
{WORD} {
words++;
chars += strlen(yytext);
}
\n {
lines++;
/* chars++; why this? there was no columns here - it's a new line */
}
" " {
/* count the spaces */
chars++;
}
\t {
/* count the tabs */
chars += 4 /* or 8 */;
}
. {
printf("Error (unknown symbol):\t%c\n", yytext[0]);
chars++;
}
%%
int main()
{
/* iterate until end of input and even if errors - continue */
while(yylex()){ }
printf("lines:\t%8d\nwords:\t%8d\nchars:\t%8d\n", lines, words, chars);
return 0;
}
Build with:
flex input.l
The output will be lex.yy.c.
Then build:
gcc -o scanner.exe lex.yy.c -lfl
Create a text file with the input and run the following:
scanner.exe < in.txt > out.txt
The less-than sign redirects input from the file in.txt, while the greater-than sign redirects output to out.txt. Because the redirected file ends with an end-of-file condition, the scanner will stop properly.
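For comparison, the counting rules of the book example (a run of letters is one word, a newline counts as a line and a character, anything else counts as one character) can be sketched in plain C without flex. The names struct counts and count_text are hypothetical, and the sketch reads from a string rather than from standard input.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical container for the three totals. */
struct counts {
    int lines;
    int words;
    int chars;
};

/* Apply the three rules of the book's scanner to a NUL-terminated string. */
struct counts count_text(const char *s)
{
    struct counts c = {0, 0, 0};
    size_t i = 0;

    while (s[i] != '\0') {
        if (isalpha((unsigned char)s[i])) {
            /* [a-zA-Z]+ : consume the whole run of letters as one word */
            c.words++;
            while (isalpha((unsigned char)s[i])) {
                c.chars++;
                i++;
            }
        } else {
            /* \n counts a line; \n and any other character count one char */
            if (s[i] == '\n')
                c.lines++;
            c.chars++;
            i++;
        }
    }
    return c;
}
```

The loop ends at the string terminator, which plays the part that end-of-file plays for the generated scanner: without an end marker there is nothing to stop the counting.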

Lua 'require' but files are only in memory

Setting: I'm using Lua from a C/C++ environment.
I have several lua files on disk. Those are read into memory and some more memory-only lua files become available during runtime. Think e.g. of an editor, with additional unsaved lua files.
So, I have a list<identifier, lua_file_content> in memory. Some of these files contain require statements. When I try to load all these files into a Lua instance (currently via lua_dostring) I get "attempt to call global require (a nil value)".
Is it possible to provide a require function that replaces the old one and uses only the provided in-memory files (those files are on the C side)?
Is there another way of allowing require in these files without having the required files on disk?
An example would be to load the lua stdlib from memory only without altering it. (This is actually my test case.)
Instead of replacing require, why not add a function to package.loaders? The code is nearly the same.
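// Assumed for this sketch: module name -> Lua source kept in memory
// (needs <map> and <string>).
static std::map<std::string, std::string> mymodules;
int my_loader(lua_State* state) {
// get the module name (first argument passed by require)
const char* name = luaL_checkstring(state, 1);
// find if you have such module loaded
auto it = mymodules.find(name);
if (it != mymodules.end())
{
// compile the in-memory source; on failure propagate the error message
if (luaL_loadbuffer(state, it->second.data(), it->second.size(), name) != 0)
return lua_error(state);
// the chunk is now at the top of the stack
return 1;
}
// didn't find anything
return 0;
}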
// When you load the lua state, insert this into package.loaders
http://www.lua.org/manual/5.1/manual.html#pdf-package.loaders
A pretty straightforward C++ function that would mimic require could be the following (a sketch; mymodules is assumed to be a std::map<std::string, std::string> from module name to Lua source):
int my_require(lua_State* state) {
// get the module name
const char* name = luaL_checkstring(state, 1);
// find if you have such module loaded
auto it = mymodules.find(name);
if (it == mymodules.end())
return luaL_error(state, "module '%s' not found", name);
if (luaL_loadbuffer(state, it->second.data(), it->second.size(), name) != 0)
return lua_error(state);
// the chunk is now at the top of the stack; run it and keep one result
lua_call(state, 0, 1);
return 1;
}
Expose this function to Lua as require and you're good to go.
I'd also like to add that to completely mimic require's behaviour, you'd probably need to take care of package.loaded, to avoid loading the code twice.
There is no package.loaders in Lua 5.2.
It is called package.searchers now.
#include <stdio.h>
#include <string>
#include <lua.hpp>
std::string module_script;
int MyLoader(lua_State *L)
{
const char *name = luaL_checkstring(L, 1); // Module name
// std::string result = SearchScript(name); // Search your database.
std::string result = module_script; // Just for demo.
if( luaL_loadbuffer(L, result.c_str(), result.size(), name) )
{
printf("%s", lua_tostring(L, -1));
lua_pop(L, 1);
}
return 1;
}
void SetLoader(lua_State* L)
{
lua_register(L, "my_loader", MyLoader);
std::string str;
// str += "table.insert(package.loaders, 2, my_loader) \n"; // Older than lua v5.2
str += "table.insert(package.searchers, 2, my_loader) \n";
luaL_dostring(L, str.c_str());
}
void SetModule()
{
std::string str;
str += "print([[It is add.lua]]) \n";
str += "return { func = function() print([[message from add.lua]]) end } \n";
module_script=str;
}
void LoadMainScript(lua_State* L)
{
std::string str;
str += "dev = require [[add]] \n";
str += "print([[It is main.lua]]) \n";
str += "dev.func() \n";
if ( luaL_loadbuffer(L, str.c_str(), str.size(), "main") )
{
printf("%s", lua_tostring(L, -1));
lua_pop(L, 1);
return;
}
}
int main()
{
lua_State* L = luaL_newstate();
luaL_openlibs(L);
SetModule(); // Store the module source in memory. Lua has not loaded it yet.
SetLoader(L);
LoadMainScript(L);
lua_pcall(L,0,0,0);
lua_close(L);
return 0;
}

Flex (lexical analyzer) not recognizing or operator

I have a problem with flex. It doesn't recognize the or operator in this rule:
[0-9A-Za-z]+{CORRECT} | {CORRECT}[0-9A-Za-z]+ [0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
If I split it into three rules then it is recognized:
[0-9A-Za-z]+{CORRECT} {...}
{CORRECT}[0-9A-Za-z]+ { ...}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ {...}
To explain myself better, the pattern I am trying to recognize is:
CORRECT [1-9]*_[1-9]*0
And in order for flex to recognize the CORRECT pattern only when it is not surrounded by other characters I have to add these three rules.
Full flex code:
%option noyywrap
%{
#include <stdio.h>
int num_lines=1;
%}
CORRECT [1-9]*_[1-9]*0
%%
{CORRECT} { printf("CORRECT TOKEN:%s\n",yytext); }
[0-9A-Za-z]+{CORRECT} { printf("ERROR %d:Unidentified symbol: %s\n",num_lines,yytext);}
{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
[0-9A-Za-z]+{CORRECT}[0-9A-Za-z]+ { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext); }
"\n" { num_lines++; }
" "
"\t"
"\r"
. { printf("ERROR %d:Unidentified symbol: %s \n",num_lines,yytext);}
%%
int main(int argc,char **argv)
{
++argv,--argc;
if(argc>0)
yyin=fopen(argv[0],"r");
else
yyin=stdin;
yylex();
}
Whitespace is significant in a lex pattern. a | b is not the same as a|b: unquoted whitespace ends the pattern, and the rest of the line is taken as the action. In the troublesome pattern, you have whitespace that I don't think you intended.
That said, in my opinion, your 3-pattern solution is easier to read and maintain.
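The goal stated in the question (accept CORRECT only when no other characters surround it) can also be checked outside flex. The following is a minimal sketch that swaps in the POSIX regex API for flex, assuming a POSIX system that provides regex.h; the function name is_correct_token is hypothetical.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true only when the whole word matches [1-9]*_[1-9]*0. */
bool is_correct_token(const char *word)
{
    regex_t re;
    bool ok;

    /* ^ and $ anchor the match, so extra leading or trailing
       characters make the word fail instead of being ignored. */
    if (regcomp(&re, "^[1-9]*_[1-9]*0$", REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    ok = (regexec(&re, word, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Here the anchors do the job that the three extra error rules do in the flex file: a word such as a12_340 or 12_340x is rejected as a whole rather than having a valid token carved out of its middle.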

How to present error when #define is incomplete in Flex?

I'm trying to print an error message when the user enters an incomplete
define, e.g.:
#define A // this is wrong, an error message should appear
#define A 5 // this is the correct form, no error message should be printed
but it doesn't work. Here's the code:
%{
#include <stdio.h>
#include <string.h>
%}
%s FULLCOMMENT
%s LINECOMMENT
%s DEFINE
%s INCLUDE
%s PRINT
%s PSEUDO_C_CODE
STRING [^ \n]*
%%
<FULLCOMMENT>"*/" { BEGIN INITIAL; }
<FULLCOMMENT>. { /* Do Nothing */ }
<INCLUDE>"<stdio.h>"|"<stdlib.h>"|"<string.h>" { BEGIN INITIAL; }
<INCLUDE>. { printf("error\n"); return 0 ; }
<DEFINE>[ \t] { printf("eat a space within define\n"); }
<DEFINE>{STRING} { printf("eat string %s\n" , yytext);}
<DEFINE>\n { printf("eat a break line within define\n"); BEGIN INITIAL; }
"#include" { BEGIN INCLUDE; }
"#define" { printf("you gonna to define\n"); BEGIN DEFINE; }
"#define"+. { printf("error\n"); }
%%
int yywrap(void) { return 1; } // Callback at end of file
int main(void) {
yylex();
return 0 ;
}
Where did I go wrong?
Here is the output:
a@ubuntu:~/Desktop$ ./a.out
#define A 4
error
A 4
#define A
error
A
The rule "#define"+. matches one character more than the earlier "#define" rule, and because flex prefers the longest match, it wins even for correct input. You could use just this instead:
"#define"[[:space:]]*$ {printf("error\n"); }
Consider using the -dvT options with flex to get detailed debug output. Also, I am not sure you need such extensive use of states for anything except maybe comments, but you would know better.
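The check the question asks for (a #define with a name but no value is an error) can be sketched in plain C on a single input line. This is an illustration only: the function name is_incomplete_define is hypothetical, the rule that a value is required comes from the question rather than from the C preprocessor, and trailing // comments are not handled.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Return true when the line is a #define that lacks a name or a value. */
bool is_incomplete_define(const char *line)
{
    static const char kw[] = "#define";
    const size_t n = sizeof kw - 1;
    const char *p;
    const char *name;
    bool has_name;
    bool has_value;

    if (strncmp(line, kw, n) != 0)
        return false;                       /* not a #define line */

    p = line + n;
    if (*p != '\0' && !isspace((unsigned char)*p))
        return false;                       /* e.g. "#defineX" is another token */

    while (*p == ' ' || *p == '\t')         /* blanks before the name */
        p++;
    name = p;
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                                /* the macro name */
    has_name = (p != name);

    while (*p == ' ' || *p == '\t')         /* blanks before the value */
        p++;
    has_value = (*p != '\0' && *p != '\n');

    return !has_name || !has_value;
}
```

Looking at the whole line at once avoids the problem in the flex file, where a rule that only sees "#define" plus one character cannot know whether a value follows later on the line.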
