Flex reentrant start with user-specific state - flex-lexer

Flex sets the YY_STATE to INITIAL by default when yyscan_t is called.
I'm trying to make a reentrant scanner that can start with user-specific state instead of INITIAL.
Here is the case
/* comment start //not passed into flex
in comment //first line passed into flex
end of comment*/ //second line passed into flex
For some reasons these 2 lines are separately fed into the reentrant scanner and the YY_STATE the line belongs to are known. What I need is to pass the comment state into reentrant flex and switch YY_STATE to COMMENT before start lexing in comment\n.
My workaround are adding a dummy token in head of a line and passing the state as yyextra into flex. Once the dummy token is recognized, switch to the specific state. Hence flex begins lexing the line with specific YY_STATE. However, adding a dummy token at the beginning of each line is time-consuming.
Here is the way I used to call reentrant flex:
yyscan_t scanner;
YY_BUFFER_STATE buffer;
yylex_init(&scanner);
buffer = yy_scan_string(inputStr, scanner);
yyset_extra(someStructure, scanner);
yylex(scanner);
yy_delete_buffer(buffer, scanner);
yylex_destroy(scanner);
Is it possible to set YY_STATE before yylex(scanner) is called ?

If you are only calling yylex once for each input line, then you could just add an extra argument to yylex which provides the start condition to switch to, and set the start condition at the top of yylex.
But there's no simple way to refer to start conditions from outside of the flex file, nor is there a convenient way to extract the current start condition from the yystate_t object. The fact that you claim to have this information available suggests that you are storing it somewhere when you change start states, so you could restore the start state from that same place when you start up yylex. The simplest place to store the information would be the yyextra object, so that's the basis of this sample code:
File begin.int.h
/* This is the internal header file, which defines the extra data structure
* and, in this case, the tokens.
*/
#ifndef BEGIN_INT_H
#define BEGIN_INT_H
struct Extra {
int start;
};
enum Tokens { WORD = 256 };
#endif
File begin.h
/* This is the external header, which includes the header produced by
* flex. That header cannot itself be included in the flex-generated code,
* and it depends on the internal header. So the order of includes here is
* (sadly) important.
*/
#ifndef BEGIN_H_
#define BEGIN_H_
#include "begin.int.h"
#include "begin.lex.h"
#endif
File: begin.l
/* Very simple lexer, whose only purpose is to drop comments. */
%option noinput nounput noyywrap nodefault 8bit
%option reentrant
%option extra-type="struct Extra*"
%{
#include "begin.int.h"
/* This macro ensures that start condition changes are saved */
#define MY_BEGIN(s) BEGIN(yyextra->start = s)
%}
%x IN_COMMENT
%%
/* See note below */
BEGIN (yyextra->start);
"/*" MY_BEGIN(IN_COMMENT);
[[:alnum:]]+ return WORD;
[[:space:]]+ ;
. return yytext[0];
<IN_COMMENT>{
"*/" MY_BEGIN(INITIAL);
.|[^*]+ ;
}
Note:
Any indented code after the first %% and before the first pattern is inserted at the beginning of yylex; the only thing that executes before it is the one-time initialization of the yystate_t object, if necessary.
File begin.main.c
/* Simple driver which creates and destroys a scanner object for every line
* of input. Note, however, that it reuses the extra data object, which holds
* persistent information (in this case, the current start condition).
*/
#include <stdio.h>
#include "begin.h"
int main ( int argc, char * argv[] ) {
char* buffer = NULL;
size_t buflen = 0;
struct Extra my_extra = {0};
for (;;) {
ssize_t nr = getline(&buffer, &buflen, stdin);
if (nr < 0) break;
if (nr == 0) continue;
yyscan_t scanner;
yylex_init_extra(&my_extra, &scanner);
/* Ensure there are two NUL bytes for yy_scan_buffer */
if (buflen < nr + 2) {
buffer = realloc(buffer, nr + 2);
buflen = nr + 2;
}
buffer[nr + 1] = 0;
YY_BUFFER_STATE b = yy_scan_buffer(buffer, nr + 2, scanner);
for (;;) {
int token = yylex(scanner);
if (token == 0) break;
printf("%d: '%s'\n", token, yyget_text(scanner));
}
yy_delete_buffer(b, scanner);
yylex_destroy(scanner);
}
return 0;
}
Build:
flex -o begin.lex.c --header-file begin.lex.h begin.l
gcc -Wall -ggdb -o begin begin.lex.c begin.main.c

Related

How to add local variables to yylex function in flex lexer?

I was writing a lexer file that matches simple custom delimited strings of the form xyz$this is stringxyz. This is nearly how I did it:
%{
char delim[16];
uint8_t dlen;
%}
%%
.*$ {
dlen = yyleng-1;
strncpy(delim, yytext, dlen);
BEGIN(STRING);
}
<STRING>. {
if(yyleng >= dlen) {
if(strncmp(delim, yytext[yyleng-dlen], dlen) == 0) {
BEGIN(INITIAL);
return STR;
}
}
yymore();
}
%%
Now I wanted to convert this to reentrant lexer. But I don't know how to make delim and dlen as local variables inside yylex apart from modifying generated lexer. Someone please help me how should I do this.
I don't recommend to store these in yyextra because, these variables need not persist across multiple calls to yylex. Hence I would prefer an answer that guides me towards declaring these as local variables.
In the (f)lex file, any indented lines between the %% and the first rule are copied verbatim into yylex() prior to the first statement, precisely to allow you to declare and initialize local variables.
This behaviour is guaranteed by the Posix specification; it is not a flex extension: (emphasis added)
Any such input (beginning with a <blank>or within "%{" and "%}" delimiter lines) appearing at the beginning of the Rules section before any rules are specified shall be written to lex.yy.c after the declarations of variables for the yylex() function and before the first line of code in yylex(). Thus, user variables local to yylex() can be declared here, as well as application code to execute upon entry to yylex().
A similar statement is in the Flex manual section 5.2, Format of the Rules Section
The strategy you propose will work, certainly, but it's not very efficient. You might want to consider using input() to read characters one at a time, although that's not terribly efficient either. In any event, delim is unnecessary:
%%
int dlen;
[^$\n]{1,16}\$ {
dlen = yyleng-1;
yymore();
BEGIN(STRING);
}
<STRING>. {
if(yyleng > dlen * 2) {
if(memcmp(yytext, yytext + yyleng - dlen, dlen) == 0) {
/* Remove the delimiter from the reported value of yytext. */
yytext += dlen + 1;
yyleng -= 2 * dlen + 1;
yytext[yyleng] = 0;
return STR;
}
}
yymore();
}
%%

Does each call to `yylex()` generate a token or all the tokens for the input?

I am trying to understand how flex works under the hood.
In the following first example, it seems that main() calls yylex() only once, and yylex() generates all the tokens for the entire input.
In the second example, it seems that main() calls yylex() once per token generated, and yylex() generates a token per call.
Does each call to yylex() generate a token or all the tokens for the input?
Why is yylex() called different number of times in the two examples?
I heard that yylex() is like a coroutine, and each call to it will resume with the rest of the input left from last call and generate a token. In that sense, how does the first example calls yylex() just once and generate all the tokens in the input?
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
$ ./a.out
The boy stood on the burning deck
shelling peanuts by the peck
^D
2 12 63
$
and
/* recognize tokens for the calculator and print them out */
%{
enum yytokentype {
NUMBER = 258,
ADD = 259,
SUB = 260,
MUL = 261,
DIV = 262,
ABS = 263,
EOL = 264
};
int yylval;
%}
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
\n { return EOL; }
[ \t] { /* ignore whitespace */ }
. { printf("Mystery character %c\n", *yytext); }
%%
main(int argc, char **argv)
{
int tok;
while(tok = yylex()) {
printf("%d", tok);
if(tok == NUMBER) printf(" = %d\n", yylval);
else printf("\n");
}
}
$ ./a.out
a / 34 + |45
Mystery character a
262
258 = 34
259
263
258 = 45
264
^D
$
Flex doesn't decide when the scanner will return (except for the default EOF rule). The scanner which it builds performs lexical actions in a loop until some action returns. So it is entirely up to you how you want to structure your scanner.
However, the classic yyparse/yylex processing model consists of the parser calling yylex() every time it needs a new token. So it expects yylex() to return immediately once it finds a token.
In your first code example, there is no parser and the scanner action is limited to printing out the token. While the example is perfectly correct, relying on the scanner loop to repeatedly execute actions, I'd prefer the second model even if you don't (yet) intend to add a parser, because it will make it easier to decouple token handling from token generation.
That doesn't mean that every lexical action will contain a return statement, though. Some lexical patterns correspond to non-tokens (comments and whitespace, for example), and the corresponding action will most likely do nothing (other than possibly recording input position) so that the scanner will continue to search fir a token to return.
(F)lex scanners are not easy to make into coroutines, so if a coroutine is really required (for example, to incrementally parse a asynchronous input), then another tool might be preferred.
Bison does offer the possiblity to generate a "push parser" in which the scanner calls the parser every time it finds a token, rather than returning to the parser. But neither the "push" nor the traditional "pull" model have anything to do with coroutines, IMHO, and the use of the word to describe parser/scanner interaction strikes me as imprecise and unuseful (although I have a lot of respect for the author you might be quoting.)

Any suggestions about how to implement a BASIC language parser/interpreter?

I've been trying to implement a BASIC language interpreter (in C/C++) but I haven't found any book or (thorough) article which explains the process of parsing the language constructs. Some commands are rather complex and hard to parse, especially conditionals and loops, such as IF-THEN-ELSE and FOR-STEP-NEXT, because they can mix variables with constants and entire expressions and code and everything else, for example:
10 IF X = Y + Z THEN GOTO 20 ELSE GOSUB P
20 FOR A = 10 TO B STEP -C : PRINT C$ : PRINT WHATEVER
30 NEXT A
It seems like a nightmare to be able to parse something like that and make it work. And to make things worse, programs written in BASIC can easily be a tangled mess. That's why I need some advice, read some book or whatever to make my mind clear about this subject. What can you suggest?
You've picked a great project - writing interpreters can be lots of fun!
But first, what do we even mean by an interpreter? There are different types of interpreters.
There is the pure interpreter, where you simply interpret each language element as you find it. These are the easiest to write, and the slowest.
A step up, would be to convert each language element into some sort of internal form, and then interpret that. Still pretty easy to write.
The next step, would be to actually parse the language, and generate a syntax tree, and then interpret that. This is somewhat harder to write, but once you've done it a few times, it becomes pretty easy.
Once you have a syntax tree, you can fairly easily generate code for a custom stack virtual machine. A much harder project is to generate code for an existing virtual machine, such as the JVM or CLR.
In programming, like most engineering endeavors, careful planning greatly helps, especially with complicated projects.
So the first step is to decide which type of interpreter you wish to write. If you have not read any of a number of compiler books (e.g., I always recommend Niklaus Wirth's "Compiler Construction" as one of the best introductions to the subject, and is now freely available on the web in PDF form), I would recommend that you go with the pure interpreter.
But you still need to do some additional planning. You need to rigorously define what it is you are going to be interpreting. EBNF is great for this. For a gentile introduction EBNF, read the first three parts of a Simple Compiler at http://www.semware.com/html/compiler.html It is written at the high school level, and should be easy to digest. Yes, I tried it on my kids first :-)
Once you have defined what it is you want to be interpreting, you are ready to write your interpreter.
Abstractly, you're simple interpreter will be divided into a scanner (technically, a lexical analyzer), a parser, and an evaluator. In the simple pure interpolator case, the parser and evaluator will be combined.
Scanners are easy to write, and easy to test, so we won't spend any time on them. See the aforementioned link for info on crafting a simple scanner.
Lets (for example) define your goto statement:
gotostmt -> 'goto' integer
integer -> [0-9]+
This tells us that when we see the token 'goto' (as delivered by the scanner), the only thing that can follow is an integer. And an integer is simply a string a digits.
In pseudo code, we might handle this as so:
(token - is the current token, which is the current element just returned via the scanner)
loop
if token == "goto"
goto_stmt()
elseif token == "gosub"
gosub_stmt()
elseif token == .....
endloop
proc goto_stmt()
expect("goto") -- redundant, but used to skip over goto
if is_numeric(token)
--now, somehow set the instruction pointer at the requested line
else
error("expecting a line number, found '%s'\n", token)
end
end
proc expect(s)
if s == token
getsym()
return true
end
error("Expecting '%s', found: '%s'\n", curr_token, s)
end
See how simple it is? Really, the only hard thing to figure out in a simple interpreter is the handling of expressions. A good recipe for handling those is at: http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm Combined with the aforementioned references, you should have enough to handle the sort of expressions you would encounter in BASIC.
Ok, time for a concrete example. This is from a larger 'pure interpreter', that handles a enhanced version of Tiny BASIC (but big enough to run Tiny Star Trek :-) )
/*------------------------------------------------------------------------
Simple example, pure interpreter, only supports 'goto'
------------------------------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <setjmp.h>
#include <ctype.h>
enum {False=0, True=1, Max_Lines=300, Max_Len=130};
char *text[Max_Lines+1]; /* array of program lines */
int textp; /* used by scanner - ptr in current line */
char tok[Max_Len+1]; /* the current token */
int cur_line; /* the current line number */
int ch; /* current character */
int num; /* populated if token is an integer */
jmp_buf restart;
int error(const char *fmt, ...) {
va_list ap;
char buf[200];
va_start(ap, fmt);
vsprintf(buf, fmt, ap);
va_end(ap);
printf("%s\n", buf);
longjmp(restart, 1);
return 0;
}
int is_eol(void) {
return ch == '\0' || ch == '\n';
}
void get_ch(void) {
ch = text[cur_line][textp];
if (!is_eol())
textp++;
}
void getsym(void) {
char *cp = tok;
while (ch <= ' ') {
if (is_eol()) {
*cp = '\0';
return;
}
get_ch();
}
if (isalpha(ch)) {
for (; !is_eol() && isalpha(ch); get_ch()) {
*cp++ = (char)ch;
}
*cp = '\0';
} else if (isdigit(ch)) {
for (; !is_eol() && isdigit(ch); get_ch()) {
*cp++ = (char)ch;
}
*cp = '\0';
num = atoi(tok);
} else
error("What? '%c'", ch);
}
void init_getsym(const int n) {
cur_line = n;
textp = 0;
ch = ' ';
getsym();
}
void skip_to_eol(void) {
tok[0] = '\0';
while (!is_eol())
get_ch();
}
int accept(const char s[]) {
if (strcmp(tok, s) == 0) {
getsym();
return True;
}
return False;
}
int expect(const char s[]) {
return accept(s) ? True : error("Expecting '%s', found: %s", s, tok);
}
int valid_line_num(void) {
if (num > 0 && num <= Max_Lines)
return True;
return error("Line number must be between 1 and %d", Max_Lines);
}
void goto_line(void) {
if (valid_line_num())
init_getsym(num);
}
void goto_stmt(void) {
if (isdigit(tok[0]))
goto_line();
else
error("Expecting line number, found: '%s'", tok);
}
void do_cmd(void) {
for (;;) {
while (tok[0] == '\0') {
if (cur_line == 0 || cur_line >= Max_Lines)
return;
init_getsym(cur_line + 1);
}
if (accept("bye")) {
printf("That's all folks!\n");
exit(0);
} else if (accept("run")) {
init_getsym(1);
} else if (accept("goto")) {
goto_stmt();
} else {
error("Unknown token '%s' at line %d", tok, cur_line); return;
}
}
}
int main() {
int i;
for (i = 0; i <= Max_Lines; i++) {
text[i] = calloc(sizeof(char), (Max_Len + 1));
}
setjmp(restart);
for (;;) {
printf("> ");
while (fgets(text[0], Max_Len, stdin) == NULL)
;
if (text[0][0] != '\0') {
init_getsym(0);
if (isdigit(tok[0])) {
if (valid_line_num())
strcpy(text[num], &text[0][textp]);
} else
do_cmd();
}
}
}
Hopefully, that will be enough to get you started. Have fun!
I will certainly get beaten by telling this ...but...:
First, I am actually working on a standalone library ( as a hobby ) that is made of:
a tokenizer, building linear (flat list) of tokens from the source text and following the same sequence as the text ( lexems created from the text flow ).
A parser by hands (syntax analyse; pseudo-compiler )
There is no "pseudo-code" nor "virtual CPU/machine".
Instructions(such as 'return', 'if' 'for' 'while'... then arithemtic expressions ) are represented by a base c++-struct/class and is the object itself. The base object, I name it atom, have a virtual method called "eval", among other common members, that is the "execution/branch" also by itself. So no matter I have an 'if' statement with its possible branchings ( single statement or bloc of statements/instructions ) as true or false condition, it will be called from the base virtual atom::eval() ... and so on for everything that is an atom.
Even 'objects' such as variables are 'atom'. 'eval()' will simply return its value from a variant container held by the atom itself ( pointer, refering to the 'local' variant instance (the instance variant iself) held the 'atom' or to another variant held by an atom that is created in a given 'bloc/stack'. So 'atom' are 'inplace' instructions/objects.
As of now, as an example, chunk of not really meaningful 'code' as below just works:
r = 5!; // 5! : (factorial of 5 )
Response = 1 + 4 - 6 * --r * ((3+5)*(3-4) * 78);
if (Response != 1){ /* '<>' also is not equal op. */
return r^3;
}
else{
return 0;
}
Expressions ( arithemtics ) are built into binary tree expression:
A = b+c; =>
=
/ \
A +
/ \
b c
So the 'instruction'/statement for expression like above is the tree-entry atom that in the above case, is the '=' (binary) operator.
The tree is built with atom::r0,r1,r2 :
atom 'A' :
r0
|
A
/ \
r1 r2
Regarding 'full-duplex' mecanism between c++ runtime and the 'script' library, I've made class_adaptor and adaptor<> :
ex.:
template<typename R, typename ...Args> adaptor_t<T,R, Args...>& import_method(const lstring& mname, R (T::*prop)(Args...)) { ... }
template<typename R, typename ...Args> adaptor_t<T,R, Args...>& import_property(const lstring& mname, R (T::*prop)(Args...)) { ... }
Second: I know there are plenty of tools and libs out there such as lua, boost::bind<*>, QML, JSON, etc... But in my situation, I need to create my very own [edit] 'independant' [/edit] lib for "live scripting". I was scared that my 'interpreter' could take a huge amount of RAM, but I am surprised that it is not as big as using QML,jscript or even lua :-)
Thank you :-)
Don't bother with hacking a parser together by hand. Use a parser generator. lex + yacc is the classic lexer/parser generator combination, but a Google search will reveal plenty of others.

What this cast and assignment is all about?

I am reading Richard Stevens' Advance Programming in unix environment.
There is a code in thread synchronization category (chapter - 11).
This is code showing is showing how to avoid race conditions for many shared structure of same type.
This code is showing two mutex for synch.- one for a list fh (a list which keep track of all the foo structures) & f_next field and another for the structure foo
The code is:
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#define NHASH 29
#define HASH(fp) (((unsigned long)fp)%NHASH)
struct foo *fh[NHASH];
pthread_mutex_t hashlock = PTHREAD_MUTEX_INITIALIZER;
struct foo {
int f_count;
pthread_mutex_t f_lock;
struct foo *f_next; /* protected by hashlock */
int f_id;
/* ... more stuff here ... */
};
struct foo * foo_alloc(void) /* allocate the object */
{
struct foo *fp;
int idx;
if ((fp = malloc(sizeof(struct foo))) != NULL) {
fp->f_count = 1;
if (pthread_mutex_init(&fp->f_lock, NULL) != 0) {
free(fp);
return(NULL);
}
idx = HASH(fp);
pthread_mutex_lock(&hashlock);
///////////////////// HERE -----------------
fp->f_next = fh[idx];
fh[idx] = fp->f_next;
//////////////////// UPTO HERE -------------
pthread_mutex_lock(&fp->f_lock);
pthread_mutex_unlock(&hashlock);
/* ... continue initialization ... */
pthread_mutex_unlock(&fp->f_lock);
}
return(fp);
}
void foo_hold(struct foo *fp) /* add a reference to the object */
.......
The doubt is
1) What is HASH(fp) pre-processor doing?
I know that it is typecasting what is fp store and then taking its modulo. But, in the function foo_alloc we are just passing the address of newly allocated foo structure.
Why we are doing this I know that this will give me a integer between 0 and 28 - to store in array fh. But why are we taking modulo of an address. Why there is so much randomization?
2) Suppose i accept that, now after this what these two lines are doing (also highlighted in the code) :
fp->f_next = fh[idx];
fh[idx] = fp->f_next;
I hope initially fh[idx] has any garbage value which i assigned to the f_next field of foo and in the next line what is happening , again the same assignment but in opposite order.
struct foo *fh[NHASH] is a hash table, and use the HASH macro as the hash function.
1) HASH(fp) calculates the index to decide where the in the fh to store fp, and it uses the address of the fp and uses the address as key to calculate the index. We can easily typecast the address to the long type.
2) Use the linked list to avoid the hash collisions called separate chaining, and I think the following cod is right, and you can check it in the book :
fp->f_next = fh[idx];
fh[idx] = fp;
insert the fp element to the header of the linked list fh[idx], and the initial value of the fh[idx] is null.

Evaluating Mathematical Expressions using Lua

In my previous question I was looking for a way of evaulating complex mathematical expressions in C, most of the suggestions required implementing some type of parser.
However one answer, suggested using Lua for evaluating the expression. I am interested in this approach but I don't know anything about Lua.
Can some one with experience in Lua shed some light?
Specifically what I'd like to know is
Which API if any does Lua provide that can evaluate mathematical expressions passed in as a string? If there is no API to do such a thing, may be some one can shed some light on the linked answer as it seemed like a good approach :)
Thanks
The type of expression I'd like to evaluate is given some user input such as
y = x^2 + 1/x - cos(x)
evaluate y for a range of values of x
It is straightforward to set up a Lua interpreter instance, and pass it expressions to be evaluated, getting back a function to call that evaluates the expression. You can even let the user have variables...
Here's the sample code I cooked up and edited into my other answer. It is probably better placed on a question tagged Lua in any case, so I'm adding it here as well. I compiled this and tried it for a few cases, but it certainly should not be trusted in production code without some attention to error handling and so forth. All the usual caveats apply here.
I compiled and tested this on Windows using Lua 5.1.4 from Lua for Windows. On other platforms, you'll have to find Lua from your usual source, or from www.lua.org.
Update: This sample uses simple and direct techniques to hide the full power and complexity of the Lua API behind as simple as possible an interface. It is probably useful as-is, but could be improved in a number of ways.
I would encourage readers to look into the much more production-ready ae library by lhf for code that takes advantage of the API to avoid some of the quick and dirty string manipulation I've used. His library also promotes the math library into the global name space so that the user can say sin(x) or 2 * pi without having to say math.sin and so forth.
Public interface to LE
Here is the file le.h:
/* Public API for the LE library.
*/
int le_init();
int le_loadexpr(char *expr, char **pmsg);
double le_eval(int cookie, char **pmsg);
void le_unref(int cookie);
void le_setvar(char *name, double value);
double le_getvar(char *name);
Sample code using LE
Here is the file t-le.c, demonstrating a simple use of this library. It takes its single command-line argument, loads it as an expression, and evaluates it with the global variable x changing from 0.0 to 1.0 in 11 steps:
#include <stdio.h>
#include "le.h"
int main(int argc, char **argv)
{
int cookie;
int i;
char *msg = NULL;
if (!le_init()) {
printf("can't init LE\n");
return 1;
}
if (argc<2) {
printf("Usage: t-le \"expression\"\n");
return 1;
}
cookie = le_loadexpr(argv[1], &msg);
if (msg) {
printf("can't load: %s\n", msg);
free(msg);
return 1;
}
printf(" x %s\n"
"------ --------\n", argv[1]);
for (i=0; i<11; ++i) {
double x = i/10.;
double y;
le_setvar("x",x);
y = le_eval(cookie, &msg);
if (msg) {
printf("can't eval: %s\n", msg);
free(msg);
return 1;
}
printf("%6.2f %.3f\n", x,y);
}
}
Here is some output from t-le:
E:...>t-le "math.sin(math.pi * x)"
x math.sin(math.pi * x)
------ --------
0.00 0.000
0.10 0.309
0.20 0.588
0.30 0.809
0.40 0.951
0.50 1.000
0.60 0.951
0.70 0.809
0.80 0.588
0.90 0.309
1.00 0.000
E:...>
Implementation of LE
Here is le.c, implementing the Lua Expression evaluator:
#include <lua.h>
#include <lauxlib.h>
#include <stdlib.h>
#include <string.h>
static lua_State *L = NULL;
/* Initialize the LE library by creating a Lua state.
*
* The new Lua interpreter state has the "usual" standard libraries
* open.
*/
int le_init()
{
L = luaL_newstate();
if (L)
luaL_openlibs(L);
return !!L;
}
/* Load an expression, returning a cookie that can be used later to
* select this expression for evaluation by le_eval(). Note that
* le_unref() must eventually be called to free the expression.
*
* The cookie is a lua_ref() reference to a function that evaluates the
* expression when called. Any variables in the expression are assumed
* to refer to the global environment, which is _G in the interpreter.
* A refinement might be to isolate the function envioronment from the
* globals.
*
* The implementation rewrites the expr as "return "..expr so that the
* anonymous function actually produced by lua_load() looks like:
*
* function() return expr end
*
*
* If there is an error and the pmsg parameter is non-NULL, the char *
* it points to is filled with an error message. The message is
* allocated by strdup() so the caller is responsible for freeing the
* storage.
*
* Returns a valid cookie or the constant LUA_NOREF (-2).
*/
int le_loadexpr(char *expr, char **pmsg)
{
int err;
char *buf;
if (!L) {
if (pmsg)
*pmsg = strdup("LE library not initialized");
return LUA_NOREF;
}
buf = malloc(strlen(expr)+8);
if (!buf) {
if (pmsg)
*pmsg = strdup("Insufficient memory");
return LUA_NOREF;
}
strcpy(buf, "return ");
strcat(buf, expr);
err = luaL_loadstring(L,buf);
free(buf);
if (err) {
if (pmsg)
*pmsg = strdup(lua_tostring(L,-1));
lua_pop(L,1);
return LUA_NOREF;
}
if (pmsg)
*pmsg = NULL;
return luaL_ref(L, LUA_REGISTRYINDEX);
}
/* Evaluate the loaded expression.
*
* If there is an error and the pmsg parameter is non-NULL, the char *
* it points to is filled with an error message. The message is
* allocated by strdup() so the caller is responsible for freeing the
* storage.
*
* Returns the result or 0 on error.
*/
double le_eval(int cookie, char **pmsg)
{
int err;
double ret;
if (!L) {
if (pmsg)
*pmsg = strdup("LE library not initialized");
return 0;
}
lua_rawgeti(L, LUA_REGISTRYINDEX, cookie);
err = lua_pcall(L,0,1,0);
if (err) {
if (pmsg)
*pmsg = strdup(lua_tostring(L,-1));
lua_pop(L,1);
return 0;
}
if (pmsg)
*pmsg = NULL;
ret = (double)lua_tonumber(L,-1);
lua_pop(L,1);
return ret;
}
/* Free the loaded expression.
*/
void le_unref(int cookie)
{
if (!L)
return;
luaL_unref(L, LUA_REGISTRYINDEX, cookie);
}
/* Set a variable for use in an expression.
*/
void le_setvar(char *name, double value)
{
if (!L)
return;
lua_pushnumber(L,value);
lua_setglobal(L,name);
}
/* Retrieve the current value of a variable.
*/
double le_getvar(char *name)
{
double ret;
if (!L)
return 0;
lua_getglobal(L,name);
ret = (double)lua_tonumber(L,-1);
lua_pop(L,1);
return ret;
}
Remarks
The above sample consists of 189 lines of code total, including a spattering of comments, blank lines, and the demonstration. Not bad for a quick function evaluator that knows how to evaluate reasonably arbitrary expressions of one variable, and has rich library of standard math functions at its beck and call.
You have a Turing-complete language underneath it all, and it would be an easy extension to allow the user to define complete functions as well as to evaluate simple expressions.
Since you're lazy, like most programmers, here's a link to a simple example that you can use to parse some arbitrary code using Lua. From there, it should be simple to create your expression parser.
This is for Lua users that are looking for a Lua equivalent of "eval".
The magic word used to be loadstring but it is now, since Lua 5.2, an upgraded version of load.
i=0
f = load("i = i + 1") -- f is a function
f() ; print(i) -- will produce 1
f() ; print(i) -- will produce 2
Another example, that delivers a value :
f=load('return 2+3')
print(f()) -- print 5
As a quick-and-dirty way to do, you can consider the following equivalent of eval(s), where s is a string to evaluate :
load(s)()
As always, eval mechanisms should be avoided when possible since they are expensive and produce a code difficult to read.
I personally use this mechanism with LuaTex/LuaLatex to make math operations in Latex.
The Lua documentation contains a section titled The Application Programming Interface which describes how to call Lua from your C program. The documentation for Lua is very good and you may even be able to find an example of what you want to do in there.
It's a big world in there, so whether you choose your own parsing solution or an embeddable interpreter like Lua, you're going to have some work to do!
function calc(operation)
return load("return " .. operation)()
end

Resources