What does code block in flex-lexer rule section do? - flex-lexer

I’m learning flex and have a question about a code block in the rules section.
The flex manual (http://westes.github.io/flex/manual/Comments-in-the-Input.html#Comments-in-the-Input) has an example with a code block in the rules section:
%{
/* code block */
%}
/* Definitions Section */
%x STATE_X
%%
/* Rules Section */
ruleA /* after regex */ { /* code block */ } /* after code block */
/* Rules Section (indented) */
<STATE_X>{
ruleC ECHO;
ruleD ECHO;
%{
/* code block */
%}
}
%%
/* User Code Section */
As you can see, there’s a second code block between the two %% markers. I have two questions:
When will this code execute?
What’s the difference between this and YY_USER_ACTION?

flex manual
A code block in the rules section has unpredictable results unless:
It occurs before the first pattern, or
It contains nothing other than white space or comments.
This particular code block consists only of white space and a comment. So the question of when it executes is pretty zen. (In the "sound of one hand clapping" sense.) It does nothing. When? Well, whenever. Nothing is hard to observe.
YY_USER_ACTION happens just after the pattern is recognised, before the rule action (even if that action is empty). If you don't define YY_USER_ACTION, it also does nothing so I suppose there is no difference from a comment. But normally it's defined to do something, and it is inserted in every rule, not just one place. So that's completely different.
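To make "inserted in every rule" concrete, here is a minimal plain-C model of the dispatch, not the real generated scanner. Generated scanners define YY_RULE_SETUP in terms of YY_USER_ACTION and place it at the start of each rule's case; everything else here (dispatch, chars_matched, yytext_model, the rule numbers) is a hypothetical stand-in for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for state a real scanner would provide. */
static size_t chars_matched = 0;
static const char *yytext_model = "";

/* Runs for every matched rule, before that rule's own action. */
#define YY_USER_ACTION chars_matched += strlen(yytext_model);

/* Generated scanners define YY_RULE_SETUP in terms of YY_USER_ACTION. */
#define YY_RULE_SETUP YY_USER_ACTION

/* Simplified model of the action switch; rule numbers are made up. */
static int dispatch(int rule, const char *matched)
{
    yytext_model = matched;
    switch (rule) {
    case 1:
        YY_RULE_SETUP
        return 1;   /* rule with a real action */
    case 2:
        YY_RULE_SETUP
        break;      /* rule with an empty action: the macro still ran */
    }
    return 0;
}
```

Calling dispatch(1, "abc") and then dispatch(2, "de") leaves chars_matched at 5, because the macro body ran for both rules, including the one whose action is empty.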

Related

How to use myname2.lex in the flex examples?

I see the following example in flex examples. I am able to compile it. But I am not sure what input I should give to it. Could anybody let me know? Thanks.
/*
* myname2.lex : A sample Flex program
* that does token replacement.
*/
%{
#include <stdio.h>
%}
%x STRING
%%
\" ECHO; BEGIN(STRING);
<STRING>[^\"\n]* ECHO;
<STRING>\" ECHO; BEGIN(INITIAL);
%NAME { printf("%s",getenv("LOGNAME")); }
%HOST { printf("%s",getenv("HOST")); }
%HOSTTYPE { printf("%s",getenv("HOSTTYPE"));}
%HOME { printf("%s",getenv("HOME")); }
The rules %NAME, %HOST, %HOSTTYPE and %HOME match those exact strings respectively. So you could enter those and see their corresponding actions execute.
You could also enter one of them surrounded by quotes (e.g. "%HOST") and observe that its action will not be executed because the whole thing was seen as a string literal.

Flex-lexer: Write state defines to a different file

I want to use the start states of flex inside functions (and external files). Therefore I need the state definitions to be inside an external header file.
Is there any way of letting the definitions be written to an external file?
The code below shows an example of using the states inside functions defined inside the l-file
lexer.l
%{
void changeState(){
YY_START = MY_STATE;
}
%}
%x MY_STATE
%%
[ rules ]
%%
The following should work:
lexer.l
%x MY_STATE
%%
[ rules ]
%%
void changeState(){
BEGIN(MY_STATE);
}
Don't forget that the upper section is really only for declarations. Definitions should go in the last section. That way, they are placed after the #define section.

Use yylex() to get the list of token types from an input string

I have a CLI that was made using Bison and Flex which has grown large and complicated, and I'm trying to get the complete sequence of tokens (yytokentype or the corresponding yytranslate Bison symbol numbers) for a given input string to the parser.
Ideally, every time yyerror() is called I want to store the sequence of tokens that were identified during parse. I don't need to know the yylval's, states, actions, etc, just the token list resulting from the string input to the buffer.
If a straightforward way of doing this doesn't exist, then just a stand-alone way of going from string --> yytokentypes will work.
The below code just has debugging printouts, which I'll change to storing it in the place I want as soon as I figure out how to get the tokens.
// When an error condition is reached, yylex() to get the yytokentypes
void yyerror(const char *s)
{
std::cerr<<"LEX\n";
int tok; // yytokentype
do
{
tok = yylex();
std::cerr<<tok<<",";
}while(tok);
std::cerr<<"LEX\n";
}
A simpler solution is to just change the name of the lexer using the YY_DECL macro and then add a definition of yylex at the end:
%{
// ...
#include "parser.tab.h"
#define YY_DECL static int wrapped_lexer(void)
%}
%%
/* rules */
%%
int yylex(void) {
int token = wrapped_lexer();
/* do something with the token */
return token;
}
Having said that, unless the source code is read-once for some reason, it's probably faster on the whole to rescan the input only if an error is encountered rather than saving the token list in case an error is encountered. Lexing is really pretty fast, and in many use cases, syntactically correct inputs are more common than erroneous ones.
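If you do want to save the tokens, the "do something with the token" step in the wrapper could look roughly like the sketch below. It is only an illustration: wrapped_lexer here is a stand-in that returns made-up token numbers instead of the flex-generated scanner, and token_history, TOKEN_HISTORY_MAX and reset_token_history are hypothetical names.

```c
#include <stddef.h>

#define TOKEN_HISTORY_MAX 64   /* hypothetical fixed capacity */

static int token_history[TOKEN_HISTORY_MAX];
static size_t token_count = 0;

/* Stand-in for the scanner that YY_DECL renamed; it feeds made-up
   token numbers and returns 0 at end of input, as yylex does. */
static int wrapped_lexer(void)
{
    static const int fake_tokens[] = { 258, 259, 260, 0 };
    static size_t next = 0;
    int tok = fake_tokens[next];
    if (tok != 0)
        next++;
    return tok;
}

/* The function the parser calls: record the token, then pass it on. */
int yylex(void)
{
    int token = wrapped_lexer();
    if (token_count < TOKEN_HISTORY_MAX)   /* tokens past capacity are not recorded */
        token_history[token_count++] = token;
    return token;
}

/* Call before parsing a new input string. */
void reset_token_history(void)
{
    token_count = 0;
}
```

With this in place, yyerror() could read token_history[0] through token_history[token_count - 1] instead of calling yylex() again.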
OK I figured a way to do this without having to re-tokenize the input string. Flex allows you to define YY_DECL, which by default is found in the generated lexer file to produce the yylex() declaration:
#ifndef YY_DECL
//some other stuff
#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
The macro is then used in place of the function header:
/** The main scanner function which does all the work.
*/
YY_DECL
{
// Body of yylex() which returns the yytokentype
}
A tricky thing that I'm able to do is re-define yylex() via YY_DECL to capture every token before it gets returned to the caller. This allows me to store the yytokentype for every call without changing the parser's behavior one bit. Below I'm just printing it out here for testing:
#define YY_DECL \
int yylex2(void); \
int yylex (void) \
{ \
int ret; \
ret = yylex2(); \
std::cerr<<"yylex2 returns: "<<ret<<"\n"; \
return ret; \
} \
int yylex2(void)
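The same macro trick can be tried in plain C without flex. In this sketch all names (SCANNER_DECL, outer_scan, inner_scan, calls_logged) are hypothetical; in a flex file they would correspond to YY_DECL, yylex and yylex2. The point it illustrates is that the macro has to end with a function header, because the generated file follows it directly with the scanner body.

```c
/* Counts how many times the wrapper ran; stands in for token logging. */
static int calls_logged = 0;

/* Declares the inner function, defines the wrapper around it, and ends
   with the inner function's header so that a body can follow. */
#define SCANNER_DECL                 \
    static int inner_scan(void);     \
    static int outer_scan(void)      \
    {                                \
        int ret = inner_scan();      \
        calls_logged++;              \
        return ret;                  \
    }                                \
    static int inner_scan(void)

/* The generated file writes the macro followed by the scanner body. */
SCANNER_DECL
{
    return 42;   /* stand-in for the real scanner body */
}
```

Callers only ever see outer_scan(), so the wrapper observes every return value without any change to the body.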

context of function portTASK_FUNCTION in source code of FreeRTOS (void)pvParameters

While tracing the source code of tasks.c in FreeRTOS, I came across a function named portTASK_FUNCTION. Its code is as below:
static portTASK_FUNCTION( prvIdleTask, pvParameters )
{
/* Stop warnings. */
( void ) pvParameters; //<--what for??
for( ;; )
{
/* do something */
}
}
I don't understand what ( void ) pvParameters means. I hope someone can help, thanks.
By the way, the types of this function's arguments are not declared, so why does it work?
The comment above the line explains what it is for:
/* Stop warnings. */
The function has an unused parameter, pvParameters, and compilers can warn about that. The statement ( void ) pvParameters; is written to shut the compiler up. It does nothing at run time, and the optimizer will remove it.
portTASK_FUNCTION is NOT a function, it's a macro. The first link I get when I google it is http://www.freertos.org/implementing-a-FreeRTOS-task.html. In this case prvIdleTask is the function. In all but the one obscure case mentioned on that page, the portTASK_FUNCTION macro is obsolete (not required), but it is used in the main kernel code for portability.
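Both points can be seen in a small stand-alone sketch. The macro definition below is an assumption about its usual shape (ports may define it differently, so check the portmacro.h for your port), and prvDemoTask and demo_runs are hypothetical names. The macro is what supplies the void * type, which is why the parameter looks undeclared at the point of use.

```c
#include <stddef.h>

/* Assumed shape of the macro; check portmacro.h for your port. */
#define portTASK_FUNCTION(vFunction, pvParameters) \
    void vFunction(void *pvParameters)

static int demo_runs = 0;

/* Expands to: static void prvDemoTask(void *pvParameters) */
static portTASK_FUNCTION(prvDemoTask, pvParameters)
{
    /* Evaluate the parameter and discard it, so the compiler does not
       warn that it is unused. No code is generated for this. */
    (void) pvParameters;

    demo_runs++;   /* a real task would loop forever here */
}
```

Calling prvDemoTask(NULL) compiles and runs like any ordinary function that takes a void * argument.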

Aliasing frequently used patterns in Lex

I have one regexp which is used in several rules. Can I define an alias for it, so that the regexp definition is kept in one place and I can just use the alias across the code?
Example:
[A-Za-z0-9].[A-Za-z0-9_-]* (expression) NAME (alias)
...
%%
NAME[=]NAME {
//Do something.
}
%%
Yes. The definition goes in the definitions section of your lex input file (before the %%), and you use it in a regular expression by putting the name inside curly braces ({…}). For example:
name [A-Za-z0-9][A-Za-z0-9_-]*
%%
{name}[=]{name} { /* Do something */ }
