Match rule only at the begining of a file

Match rule only at the begining of a file - flex-lexer

Problem
I'm writing a sort of a script language interpreter.
I would like it to be able to handle (ignore) things like shebang, utf-bom, or other such thing that can appear on the beginning of a file.
The problem is that I cannot be sure that my growing grammar won't at some point have a rule that could match one of those things. (It's unlikely but you don't get reliable programs by just ignoring unlikely problems.)
Therefore, I would like to do it properly and ignore those things only if they are at the beginning of a file.
Let's focus on a shebang in the example.
I've written some simple grammar that illustrates the problems I'm facing.
Lexer:
%%
#!.+ { printf("shebang: \"%s\"\n", yytext + 2); }
[[:alnum:]_]+ { printf("id: %s\n", yytext); return 1; }
#[^#]*# { printf("thingy: \"%s\"\n", yytext); return 2; }
[[:space:]] ;
. { printf("error: '%c'\n", yytext[0]); }
%%
int main() { while (yylex()); return 0; }
int yywrap() { return 1; }
Input file:
#!my-program
# some multiline
thingy #
aaa bbb
ccc#!not a shebang#ddd
eee
Expected output:
shebang: "my-program"
thingy: "# some multiline
thingy #"
id: aaa
id: bbb
id: ccc
thingy: "#!not a shebang#"
id: ddd
id: eee
Actual output:
thingy: "#!my-program
#"
id: some
id: multiline
id: thingy
thingy: "#
aaa bbb
ccc#"
error: '!'
id: not
id: a
id: shebang
error: '#'
id: ddd
id: eee
My (bad?) solution
I figured that this is a good case to use start conditions.
I managed to use them to write a lexer that does work, however, it's rather ugly:
%s MAIN
%%
<INITIAL>#!.+ { printf("shebang: \"%s\"\n", yytext + 2); BEGIN(MAIN); }
<INITIAL>""/(?s:.) { BEGIN(MAIN); }
[[:alnum:]_]+ { printf("id: %s\n", yytext); return 1; }
<MAIN>#[^#]*# { printf("thingy: \"%s\"\n", yytext); return 2; }
[[:space:]] ;
. { printf("error: '%c'\n", yytext[0]); }
%%
int main() { while (yylex()); return 0; }
int yywrap() { return 1; }
Notice that I had to specify the start condition MAIN before the rule #[^#]*#.
It's because it would otherwise collide with the shebang rule #!.+.
Unfortunately, the INITIAL start condition is inclusive, which means I had to specifically exclude from it any rule that would cause problems. I have to remember about it every time I write a new rule (AKA I'll forget about it).
Is there some way to make the INITIAL exclusive or choose a different start condition to be the default?

Here's a simpler solution, assuming you're using Flex (as per your flex-lexer tag):
%option noinput nounput noyywrap nodefault yylineno
%{
#define YY_USER_INIT BEGIN(STARTUP);
%}
%x STARTUP
%%
<STARTUP>#!.* { BEGIN(INITIAL); printf("Shebang: \"%s\"\n", yytext+2); }
<STARTUP>.|\n { BEGIN(INITIAL); yyless(0); }
/* Rest is INITIAL */
[[:alnum:]_]+ { printf("id: %s\n", yytext); return 1; }
#[^#]*# { printf("thingy: \"%s\"\n", yytext); return 2; }
[[:space:]] ;
. { printf("error: '%c'\n", yytext[0]); }
Test:
rici$ flex -o shebang.c shebang.l
rici$ gcc -Wall -o shebang shebang.c -lfl
rici$ ./shebang <<"EOF"
> #!my-program
> # some multiline
> thingy #
> aaa bbb
> ccc#!not a shebang#ddd
> eee
> EOF
Shebang: "my-program"
thingy: "# some multiline
thingy #"
id: aaa
id: bbb
id: ccc
thingy: "#!not a shebang#"
id: ddd
id: eee
Notes:
The %option line:
prevents "Unused function" warnings;
removes the need for yywrap;
shows an error if there's some possible input which doesn't match any pattern;
counts input lines in the global yylineno
The macro YY_USER_INIT is executed precisely once, when the scanner starts up. It executes before any of Flex's initialization code; fortunately, Flex's initialization code does not change the start condition if it's already been set.
yyless(0) causes the current token to be rescanned. (The argument doesn't have to be 0; it truncates the current token to that length and efficiently puts the rest back into the input stream.)
The library -lfl includes yywrap() (although in this case, it's not used), and a simple main() definition rather similar to the one in your example.
(1) and (2) are Flex extensions. (3) and (4) should be available in any lex which conforms to Posix, with the exception that the Posix lex libary is linked with -ll.

There is an indirect way to select a different start condition.
The start condition is an integer variable (e.g. INITIAL is 0).
You can get its current value using a macro YY_START and if it equals INITIAL, change it to another value, effectively replacing it.
%x BEGINNING
%s MAIN
%%
%{
if (YY_START == INITIAL)
BEGIN(BEGINNING);
%}
<BEGINNING>#!.+ { printf("shebang: \"%s\"\n", yytext + 2); BEGIN(MAIN); }
<BEGINNING>""/(?s:.) { BEGIN(MAIN); }
[[:alnum:]_]+ { printf("id: %s\n", yytext); return 1; }
#[^#]*# { printf("thingy: \"%s\"\n", yytext); return 2; }
[[:space:]] ;
. { printf("error: '%c'\n", yytext[0]); }
%%
int main() { while (yylex()); return 0; }
int yywrap() { return 1; }
The disadvantages of this solution are:
The code block will execute every time you call yylex (not only the first time, when it's actually needed). The overhead is small enough to be ignored, though.
Flex will still generate the whole state machine as if INITIAL was used. I have no idea if this creates a lot of code or if it would matter in a big parser.

Related

How to add local variables to yylex function in flex lexer?

I was writing a lexer file that matches simple custom delimited strings of the form xyz$this is stringxyz. This is nearly how I did it:
%{
char delim[16];
uint8_t dlen;
%}
%%
.*$ {
dlen = yyleng-1;
strncpy(delim, yytext, dlen);
BEGIN(STRING);
}
<STRING>. {
if(yyleng >= dlen) {
if(strncmp(delim, yytext[yyleng-dlen], dlen) == 0) {
BEGIN(INITIAL);
return STR;
}
}
yymore();
}
%%
Now I wanted to convert this to reentrant lexer. But I don't know how to make delim and dlen as local variables inside yylex apart from modifying generated lexer. Someone please help me how should I do this.
I don't recommend to store these in yyextra because, these variables need not persist across multiple calls to yylex. Hence I would prefer an answer that guides me towards declaring these as local variables.

In the (f)lex file, any indented lines between the %% and the first rule are copied verbatim into yylex() prior to the first statement, precisely to allow you to declare and initialize local variables.
This behaviour is guaranteed by the Posix specification; it is not a flex extension: (emphasis added)
Any such input (beginning with a <blank>or within "%{" and "%}" delimiter lines) appearing at the beginning of the Rules section before any rules are specified shall be written to lex.yy.c after the declarations of variables for the yylex() function and before the first line of code in yylex(). Thus, user variables local to yylex() can be declared here, as well as application code to execute upon entry to yylex().
A similar statement is in the Flex manual section 5.2, Format of the Rules Section
The strategy you propose will work, certainly, but it's not very efficient. You might want to consider using input() to read characters one at a time, although that's not terribly efficient either. In any event, delim is unnecessary:
%%
int dlen;
[^$\n]{1,16}\$ {
dlen = yyleng-1;
yymore();
BEGIN(STRING);
}
<STRING>. {
if(yyleng > dlen * 2) {
if(memcmp(yytext, yytext + yyleng - dlen, dlen) == 0) {
/* Remove the delimiter from the reported value of yytext. */
yytext += dlen + 1;
yyleng -= 2 * dlen + 1;
yytext[yyleng] = 0;
return STR;
}
}
yymore();
}
%%

Processing character returned by yyless within a start condition in yacc

For the code snippet below, the "ASSN: =" block for {EQ} is not triggered for an input of "CC=gcc\n" - I don't understand why this is, the equals character is being passed, as it is being processed by the next rule for {CHAR}.
How can I ensure that the {EQ} rule for is processed when the equals character is 'pushed' back by yyless?
The byacc code is pretty much empty with a single dummy rule, but with the relevant %token lines.
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include "y.tab.h"
extern YYSTYPE yylval;
%}
%x ASSIGNMENT
%option noyywrap
DIGIT [0-9]
ALPHA [A-Za-z]
SPACE [ ]
TAB [\t]
WS [ \t]+
NEWLINE (\n|\r|\r\n)
IDENT [A-Za-z_][A-Za-z_0-9]+
EQ =
CHAR [^\r\n]+
%%
<*>"#"{CHAR}{NEWLINE}
({IDENT}{EQ})|({IDENT}{WS}{EQ}) {
yylval.strval = strndup(yytext,
strlen(yytext)-1);
printf("NORM: %s\n", yylval.strval);
yyless(strlen(yytext)-1);
BEGIN(ASSIGNMENT);
return TOK_IDENT;
}
<ASSIGNMENT>{
{EQ} {
printf("ASSN: =\n");
return TOK_ASSIGN;
}
{CHAR} {
printf("ASSN: %s\n", yytext);
return TOK_STRING;
}
{NEWLINE} {
BEGIN(INITIAL);
}
}
{WS}
{NEWLINE}
. {
printf("DOT : %s\n", yytext);
}
<*><<EOF>> {
printf("EOF\n");
return 0;
}
%%
int main()
{
printf("Start\n\n");
int ret;
while( (ret = yylex()) ) {
printf("LEX : %u\n", ret);
}
printf("\nEnd\n");
}
Example output:
Start
NORM: CC
LEX : 257
ASSN: =gcc
LEX : 259
EOF
End

My issue was that flex matches the longest rule first, so {CHAR} was always winning over {EQ}. I solved this by introducing another Start Condition to consume the {EQ}{WS}? before passing to

Does each call to `yylex()` generate a token or all the tokens for the input?

I am trying to understand how flex works under the hood.
In the following first example, it seems that main() calls yylex() only once, and yylex() generates all the tokens for the entire input.
In the second example, it seems that main() calls yylex() once per token generated, and yylex() generates a token per call.
Does each call to yylex() generate a token or all the tokens for the input?
Why is yylex() called different number of times in the two examples?
I heard that yylex() is like a coroutine, and each call to it will resume with the rest of the input left from last call and generate a token. In that sense, how does the first example calls yylex() just once and generate all the tokens in the input?
/* just like Unix wc */
%{
int chars = 0;
int words = 0;
int lines = 0;
%}
%%
[a-zA-Z]+ { words++; chars += strlen(yytext); }
\n { chars++; lines++; }
. { chars++; }
%%
main(int argc, char **argv)
{
yylex();
printf("%8d%8d%8d\n", lines, words, chars);
}
$ ./a.out
The boy stood on the burning deck
shelling peanuts by the peck
^D
2 12 63
$
and
/* recognize tokens for the calculator and print them out */
%{
enum yytokentype {
NUMBER = 258,
ADD = 259,
SUB = 260,
MUL = 261,
DIV = 262,
ABS = 263,
EOL = 264
};
int yylval;
%}
%%
"+" { return ADD; }
"-" { return SUB; }
"*" { return MUL; }
"/" { return DIV; }
"|" { return ABS; }
[0-9]+ { yylval = atoi(yytext); return NUMBER; }
\n { return EOL; }
[ \t] { /* ignore whitespace */ }
. { printf("Mystery character %c\n", *yytext); }
%%
main(int argc, char **argv)
{
int tok;
while(tok = yylex()) {
printf("%d", tok);
if(tok == NUMBER) printf(" = %d\n", yylval);
else printf("\n");
}
}
$ ./a.out
a / 34 + |45
Mystery character a
262
258 = 34
259
263
258 = 45
264
^D
$

Flex doesn't decide when the scanner will return (except for the default EOF rule). The scanner which it builds performs lexical actions in a loop until some action returns. So it is entirely up to you how you want to structure your scanner.
However, the classic yyparse/yylex processing model consists of the parser calling yylex() every time it needs a new token. So it expects yylex() to return immediately once it finds a token.
In your first code example, there is no parser and the scanner action is limited to printing out the token. While the example is perfectly correct, relying on the scanner loop to repeatedly execute actions, I'd prefer the second model even if you don't (yet) intend to add a parser, because it will make it easier to decouple token handling from token generation.
That doesn't mean that every lexical action will contain a return statement, though. Some lexical patterns correspond to non-tokens (comments and whitespace, for example), and the corresponding action will most likely do nothing (other than possibly recording input position) so that the scanner will continue to search fir a token to return.
(F)lex scanners are not easy to make into coroutines, so if a coroutine is really required (for example, to incrementally parse a asynchronous input), then another tool might be preferred.
Bison does offer the possiblity to generate a "push parser" in which the scanner calls the parser every time it finds a token, rather than returning to the parser. But neither the "push" nor the traditional "pull" model have anything to do with coroutines, IMHO, and the use of the word to describe parser/scanner interaction strikes me as imprecise and unuseful (although I have a lot of respect for the author you might be quoting.)

Non-greedy multi-line and single-line matching

I'm trying to modify a flex+bison generator to allow the inclusion of code snippets denoted by surrounding '{{' and '}}'. Unlike the multi-line comment case, I must capture all of the content.
My attempts either fail in the case where the '{{' and the '}}' are on the same line or they are painfully slow.
My first attempt was something like this:
%{
#include <stdio.h>
// sscce implementation of a growing string buffer
char codeBlock[4096];
int codeOffset;
const char* curFilename = "file.l";
extern int yylineno;
void add_code_line(const char* yytext)
{
codeOffset += sprintf(codeBlock + codeOffset, "#line %u \"%s\"\n\t%s\n", yylineno, curFilename, yytext);
}
%}
%option stack
%option yylineno
%x CODE_FRAG
%%
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>[^\n]* { add_code_line(yytext); }
<CODE_FRAG>\n
\n
.
Note: the "codeBlock" implementation is a contrivance for the purpose of an SSCCE only. It's not what I'm actually using.
This works for a simple test case:
{{ from line 1
from line 2
}}
{{
from line 7
}}
Outputs
// code
#line 1 "file.l"
from line 1
#line 2 "file.l"
from line 2
// code
#line 7 "file.l"
from line 7
But it can't handle
{{ hello }}
The two solutions I can think of are:
/* capture character-by-character */
<CODE_FRAG>. { add_code_character(yytext[0]); }
And
<INITIAL>"{{".*?"}}" { int n = strlen(yytext); yytext + (n - 2) = 0; add_code(yytext + 2); }
The former seems likely to be slow, and the latter just feels wrong.
Any ideas?
--- EDIT ---
The following appears to achieve the result desired, but I'm not sure if it's a "good" Flex way to do this:
"{{"[ \n]* { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}" { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>.*?/"}}" { add_code_line(yytext); }
<CODE_FRAG>.*? { add_code_line(yytext); }
<CODE_FRAG>\n

Flex doesn't implement non-greedy matches. So .*? won't work the way you expect it to in flex. (It will be an optional .*, which is indistinguishable from .*)
Here's a regular expression which will match from {{ as far as possible without a }}:
"{{"([}]?[^}])*
That might not be what you want, since it won't allow nested {{...}} within your code blocks. However, you didn't mention that as a requirement and none of your examples functions that way.
The above regular expression does not match the closing }}, which appears to be what you want since it lets you call add_code(yytext+2) without modifying the temporary buffer. However, you do need to deal with the }} in your action. See below.
The regular expression above will match to the end of the file if there is no matching }}. You probably want to deal with that as an error; the simplest way is to check if EOF is encountered while you are trying to ignore the }}
"{{"([}]?[^}])* { add_code(yytext+2);
if (input() == EOF || input() == EOF) {
/* Produce an error, unclosed {{ */
}
}

Any suggestions about how to implement a BASIC language parser/interpreter?

I've been trying to implement a BASIC language interpreter (in C/C++) but I haven't found any book or (thorough) article which explains the process of parsing the language constructs. Some commands are rather complex and hard to parse, especially conditionals and loops, such as IF-THEN-ELSE and FOR-STEP-NEXT, because they can mix variables with constants and entire expressions and code and everything else, for example:
10 IF X = Y + Z THEN GOTO 20 ELSE GOSUB P
20 FOR A = 10 TO B STEP -C : PRINT C$ : PRINT WHATEVER
30 NEXT A
It seems like a nightmare to be able to parse something like that and make it work. And to make things worse, programs written in BASIC can easily be a tangled mess. That's why I need some advice, read some book or whatever to make my mind clear about this subject. What can you suggest?

You've picked a great project - writing interpreters can be lots of fun!
But first, what do we even mean by an interpreter? There are different types of interpreters.
There is the pure interpreter, where you simply interpret each language element as you find it. These are the easiest to write, and the slowest.
A step up, would be to convert each language element into some sort of internal form, and then interpret that. Still pretty easy to write.
The next step, would be to actually parse the language, and generate a syntax tree, and then interpret that. This is somewhat harder to write, but once you've done it a few times, it becomes pretty easy.
Once you have a syntax tree, you can fairly easily generate code for a custom stack virtual machine. A much harder project is to generate code for an existing virtual machine, such as the JVM or CLR.
In programming, like most engineering endeavors, careful planning greatly helps, especially with complicated projects.
So the first step is to decide which type of interpreter you wish to write. If you have not read any of a number of compiler books (e.g., I always recommend Niklaus Wirth's "Compiler Construction" as one of the best introductions to the subject, and is now freely available on the web in PDF form), I would recommend that you go with the pure interpreter.
But you still need to do some additional planning. You need to rigorously define what it is you are going to be interpreting. EBNF is great for this. For a gentile introduction EBNF, read the first three parts of a Simple Compiler at http://www.semware.com/html/compiler.html It is written at the high school level, and should be easy to digest. Yes, I tried it on my kids first :-)
Once you have defined what it is you want to be interpreting, you are ready to write your interpreter.
Abstractly, you're simple interpreter will be divided into a scanner (technically, a lexical analyzer), a parser, and an evaluator. In the simple pure interpolator case, the parser and evaluator will be combined.
Scanners are easy to write, and easy to test, so we won't spend any time on them. See the aforementioned link for info on crafting a simple scanner.
Lets (for example) define your goto statement:
gotostmt -> 'goto' integer
integer -> [0-9]+
This tells us that when we see the token 'goto' (as delivered by the scanner), the only thing that can follow is an integer. And an integer is simply a string a digits.
In pseudo code, we might handle this as so:
(token - is the current token, which is the current element just returned via the scanner)
loop
if token == "goto"
goto_stmt()
elseif token == "gosub"
gosub_stmt()
elseif token == .....
endloop
proc goto_stmt()
expect("goto") -- redundant, but used to skip over goto
if is_numeric(token)
--now, somehow set the instruction pointer at the requested line
else
error("expecting a line number, found '%s'\n", token)
end
end
proc expect(s)
if s == token
getsym()
return true
end
error("Expecting '%s', found: '%s'\n", curr_token, s)
end
See how simple it is? Really, the only hard thing to figure out in a simple interpreter is the handling of expressions. A good recipe for handling those is at: http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm Combined with the aforementioned references, you should have enough to handle the sort of expressions you would encounter in BASIC.
Ok, time for a concrete example. This is from a larger 'pure interpreter', that handles a enhanced version of Tiny BASIC (but big enough to run Tiny Star Trek :-) )
/*------------------------------------------------------------------------
Simple example, pure interpreter, only supports 'goto'
------------------------------------------------------------------------*/
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <setjmp.h>
#include <ctype.h>
enum {False=0, True=1, Max_Lines=300, Max_Len=130};
char *text[Max_Lines+1]; /* array of program lines */
int textp; /* used by scanner - ptr in current line */
char tok[Max_Len+1]; /* the current token */
int cur_line; /* the current line number */
int ch; /* current character */
int num; /* populated if token is an integer */
jmp_buf restart;
int error(const char *fmt, ...) {
va_list ap;
char buf[200];
va_start(ap, fmt);
vsprintf(buf, fmt, ap);
va_end(ap);
printf("%s\n", buf);
longjmp(restart, 1);
return 0;
}
int is_eol(void) {
return ch == '\0' || ch == '\n';
}
void get_ch(void) {
ch = text[cur_line][textp];
if (!is_eol())
textp++;
}
void getsym(void) {
char *cp = tok;
while (ch <= ' ') {
if (is_eol()) {
*cp = '\0';
return;
}
get_ch();
}
if (isalpha(ch)) {
for (; !is_eol() && isalpha(ch); get_ch()) {
*cp++ = (char)ch;
}
*cp = '\0';
} else if (isdigit(ch)) {
for (; !is_eol() && isdigit(ch); get_ch()) {
*cp++ = (char)ch;
}
*cp = '\0';
num = atoi(tok);
} else
error("What? '%c'", ch);
}
void init_getsym(const int n) {
cur_line = n;
textp = 0;
ch = ' ';
getsym();
}
void skip_to_eol(void) {
tok[0] = '\0';
while (!is_eol())
get_ch();
}
int accept(const char s[]) {
if (strcmp(tok, s) == 0) {
getsym();
return True;
}
return False;
}
int expect(const char s[]) {
return accept(s) ? True : error("Expecting '%s', found: %s", s, tok);
}
int valid_line_num(void) {
if (num > 0 && num <= Max_Lines)
return True;
return error("Line number must be between 1 and %d", Max_Lines);
}
void goto_line(void) {
if (valid_line_num())
init_getsym(num);
}
void goto_stmt(void) {
if (isdigit(tok[0]))
goto_line();
else
error("Expecting line number, found: '%s'", tok);
}
void do_cmd(void) {
for (;;) {
while (tok[0] == '\0') {
if (cur_line == 0 || cur_line >= Max_Lines)
return;
init_getsym(cur_line + 1);
}
if (accept("bye")) {
printf("That's all folks!\n");
exit(0);
} else if (accept("run")) {
init_getsym(1);
} else if (accept("goto")) {
goto_stmt();
} else {
error("Unknown token '%s' at line %d", tok, cur_line); return;
}
}
}
int main() {
int i;
for (i = 0; i <= Max_Lines; i++) {
text[i] = calloc(sizeof(char), (Max_Len + 1));
}
setjmp(restart);
for (;;) {
printf("> ");
while (fgets(text[0], Max_Len, stdin) == NULL)
;
if (text[0][0] != '\0') {
init_getsym(0);
if (isdigit(tok[0])) {
if (valid_line_num())
strcpy(text[num], &text[0][textp]);
} else
do_cmd();
}
}
}
Hopefully, that will be enough to get you started. Have fun!

I will certainly get beaten by telling this ...but...:
First, I am actually working on a standalone library ( as a hobby ) that is made of:
a tokenizer, building linear (flat list) of tokens from the source text and following the same sequence as the text ( lexems created from the text flow ).
A parser by hands (syntax analyse; pseudo-compiler )
There is no "pseudo-code" nor "virtual CPU/machine".
Instructions(such as 'return', 'if' 'for' 'while'... then arithemtic expressions ) are represented by a base c++-struct/class and is the object itself. The base object, I name it atom, have a virtual method called "eval", among other common members, that is the "execution/branch" also by itself. So no matter I have an 'if' statement with its possible branchings ( single statement or bloc of statements/instructions ) as true or false condition, it will be called from the base virtual atom::eval() ... and so on for everything that is an atom.
Even 'objects' such as variables are 'atom'. 'eval()' will simply return its value from a variant container held by the atom itself ( pointer, refering to the 'local' variant instance (the instance variant iself) held the 'atom' or to another variant held by an atom that is created in a given 'bloc/stack'. So 'atom' are 'inplace' instructions/objects.
As of now, as an example, chunk of not really meaningful 'code' as below just works:
r = 5!; // 5! : (factorial of 5 )
Response = 1 + 4 - 6 * --r * ((3+5)*(3-4) * 78);
if (Response != 1){ /* '<>' also is not equal op. */
return r^3;
}
else{
return 0;
}
Expressions ( arithemtics ) are built into binary tree expression:
A = b+c; =>
=
/ \
A +
/ \
b c
So the 'instruction'/statement for expression like above is the tree-entry atom that in the above case, is the '=' (binary) operator.
The tree is built with atom::r0,r1,r2 :
atom 'A' :
r0
|
A
/ \
r1 r2
Regarding 'full-duplex' mecanism between c++ runtime and the 'script' library, I've made class_adaptor and adaptor<> :
ex.:
template<typename R, typename ...Args> adaptor_t<T,R, Args...>& import_method(const lstring& mname, R (T::*prop)(Args...)) { ... }
template<typename R, typename ...Args> adaptor_t<T,R, Args...>& import_property(const lstring& mname, R (T::*prop)(Args...)) { ... }
Second: I know there are plenty of tools and libs out there such as lua, boost::bind<*>, QML, JSON, etc... But in my situation, I need to create my very own [edit] 'independant' [/edit] lib for "live scripting". I was scared that my 'interpreter' could take a huge amount of RAM, but I am surprised that it is not as big as using QML,jscript or even lua :-)
Thank you :-)

Don't bother with hacking a parser together by hand. Use a parser generator. lex + yacc is the classic lexer/parser generator combination, but a Google search will reveal plenty of others.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Match rule only at the begining of a file - flex-lexer

Related

How to add local variables to yylex function in flex lexer?

Processing character returned by yyless within a start condition in yacc

Does each call to `yylex()` generate a token or all the tokens for the input?

Non-greedy multi-line and single-line matching

Any suggestions about how to implement a BASIC language parser/interpreter?

Categories

Resources