How to use myname2.lex in the flex examples? - flex-lexer

I see the following example in the flex examples. I am able to compile it, but I am not sure what input I should give it. Could anybody let me know? Thanks.
/*
 * myname2.lex : A sample Flex program
 *               that does token replacement.
 */
%{
#include <stdio.h>
#include <stdlib.h>   /* getenv() */
%}

%x STRING

%%
\"                ECHO; BEGIN(STRING);
<STRING>[^\"\n]*  ECHO;
<STRING>\"        ECHO; BEGIN(INITIAL);

%NAME     { printf("%s", getenv("LOGNAME")); }
%HOST     { printf("%s", getenv("HOST")); }
%HOSTTYPE { printf("%s", getenv("HOSTTYPE")); }
%HOME     { printf("%s", getenv("HOME")); }

The rules %NAME, %HOST, %HOSTTYPE and %HOME match those exact strings respectively. So you could enter those and see their corresponding actions execute.
You could also enter one of them surrounded by quotes (e.g. "%HOST") and observe that its action will not be executed because the whole thing was seen as a string literal.
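For example (an illustration of my own, and the exact output depends on your environment variables), feeding the scanner an input containing

My login is %NAME and my home directory is %HOME.
A quoted "%NAME" is left untouched.

would, with LOGNAME=alice and HOME=/home/alice, print

My login is alice and my home directory is /home/alice.
A quoted "%NAME" is left untouched.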

Related

What does code block in flex-lexer rule section do?

I’m learning flex and ran into a question about code blocks in the rules section.
In flex’s manual http://westes.github.io/flex/manual/Comments-in-the-Input.html#Comments-in-the-Input, there’s a code block in the rules section:
%{
/* code block */
%}

/* Definitions Section */
%x STATE_X

%%
    /* Rules Section */
ruleA /* after regex */ { /* code block */ } /* after code block */

    /* Rules Section (indented) */
<STATE_X>{
ruleC ECHO;
ruleD ECHO;
%{
/* code block */
%}
}
%%
/* User Code Section */
You can see there’s a second code block between the two %% markers. I have two questions:
when will this code execute?
what’s the difference between this and YY_USER_ACTION?
From the flex manual:
A code block in the rules section has unpredictable results unless:
It occurs before the first pattern, or
It contains nothing other than white space or comments.
This particular code block consists only of white space and a comment. So the question of when it executes is pretty zen. (In the "sound of one hand clapping" sense.) It does nothing. When? Well, whenever. Nothing is hard to observe.
YY_USER_ACTION happens just after the pattern is recognised, before the rule action (even if that action is empty). If you don't define YY_USER_ACTION, it also does nothing so I suppose there is no difference from a comment. But normally it's defined to do something, and it is inserted in every rule, not just one place. So that's completely different.
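For illustration (this sketch is mine, not from the answer above), a YY_USER_ACTION that simply logs every match would go in the definitions section and run before each rule's action:

%{
#include <stdio.h>
/* Sketch: runs after each pattern match, before the rule's action. */
#define YY_USER_ACTION fprintf(stderr, "matched %d chars: '%s'\n", (int)yyleng, yytext);
%}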

Use yylex() to get the list of token types from an input string

I have a CLI that was made using Bison and Flex which has grown large and complicated, and I'm trying to get the complete sequence of tokens (yytokentype or the corresponding yytranslate Bison symbol numbers) for a given input string to the parser.
Ideally, every time yyerror() is called I want to store the sequence of tokens that were identified during the parse. I don't need to know the yylval values, states, actions, etc., just the token list resulting from the string input to the buffer.
If a straightforward way of doing this doesn't exist, then just a stand-alone way of going from string --> yytokentypes will work.
The below code just has debugging printouts, which I'll change to storing it in the place I want as soon as I figure out how to get the tokens.
// When an error condition is reached, call yylex() to get the yytokentypes
void yyerror(const char *s)
{
    std::cerr << "LEX\n";
    int tok;  // yytokentype
    do
    {
        tok = yylex();
        std::cerr << tok << ",";
    } while (tok);
    std::cerr << "LEX\n";
}
A simpler solution is to just change the name of the lexer using the YY_DECL macro and then add a definition of yylex at the end:
%{
// ...
#include "parser.tab.h"
#define YY_DECL static int wrapped_lexer(void)
%}

%%
/* rules */
%%

int yylex(void) {
    int token = wrapped_lexer();
    /* do something with the token */
    return token;
}
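For instance, "do something with the token" could be as little as appending it to a buffer that yyerror() later dumps. A sketch of that wrapper (token_log, MAX_TOKENS and the fixed-size array are my own additions; it goes in the user code section in place of the plain yylex() above):

#define MAX_TOKENS 4096
static int token_log[MAX_TOKENS];   /* tokens seen so far, in order */
static int token_count = 0;

int yylex(void) {
    int token = wrapped_lexer();
    if (token_count < MAX_TOKENS)
        token_log[token_count++] = token;   /* record it for yyerror() */
    return token;
}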
Having said that, unless the source code is read-once for some reason, it's probably faster on the whole to rescan the input only if an error is encountered, rather than saving the token list in case an error is encountered. Lexing is really pretty fast, and in many use cases syntactically correct inputs are more common than erroneous ones.
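If you do rescan on error, flex's standard buffer API is enough. A sketch, assuming you kept a copy of the input text (saved_input and dump_tokens are names I made up; yy_scan_string() and yy_delete_buffer() are standard flex calls, so this belongs in the .l file or somewhere the generated declarations are visible):

/* Re-run the lexer over a saved copy of the input after a parse error. */
void dump_tokens(const char *saved_input)
{
    YY_BUFFER_STATE buf = yy_scan_string(saved_input);  /* becomes the current buffer */
    int tok;
    while ((tok = yylex()) != 0)
        fprintf(stderr, "%d,", tok);
    yy_delete_buffer(buf);
}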
OK, I figured out a way to do this without having to re-tokenize the input string. Flex allows you to define YY_DECL, which by default is defined in the generated lexer file to produce the yylex() declaration:
#ifndef YY_DECL
//some other stuff
#define YY_DECL int yylex (void)
#endif /* !YY_DECL */
And it is then used in the generated file as the header of the scanner function:
/** The main scanner function which does all the work.
*/
YY_DECL
{
// Body of yylex() which returns the yytokentype
}
The tricky thing I'm able to do is redefine yylex() via YY_DECL to capture every token before it gets returned to the caller. Flex emits YY_DECL { ... } as the scanner definition, so the generated scanner body becomes yylex2() below, while my wrapper yylex() sees and can store the yytokentype of every call without changing the parser's behavior one bit. Below I'm just printing it out for testing:
#define YY_DECL \
    int yylex2(void); \
    int yylex(void) \
    { \
        int ret; \
        ret = yylex2(); \
        std::cerr << "yylex2 returns: " << ret << "\n"; \
        return ret; \
    } \
    int yylex2(void)   /* the generated scanner body attaches to this declaration */

flex 2.5.35 gives error when ctrl-M used in lex file

I have a simple lex file.
%{
#include <stdio.h>
%}
space_char [ \t\^M]
space {space_char}+
%%
%%
int yywrap(void) {
    return 1;
}

int main(void) {
    yylex();
    return 0;
}
When I compile this file with flex 2.5.35, it gives the following errors:
lex.l:5: bad character:
lex.l:5: name defined twice
But with flex 2.5.4, it runs fine.
I understand this error is due to the special character ctrl-M (carriage return). I want to know whether flex 2.5.35 doesn't support special characters like ctrl-L and ctrl-M, and if so, what the alternative way is. Please note, I am restricted to using 2.5.35 only.
Thanks.
As in C, you can use \r for the carriage return character.
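Using the question's definitions, that would be:

space_char [ \t\r]
space      {space_char}+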

Aliasing frequently used patterns in Lex

I have one regexp which is used in several rules. Can I define an alias for it, to keep the regexp definition in one place and just use it across the code?
Example:
[A-Za-z0-9].[A-Za-z0-9_-]*   (expression)
NAME                         (alias)
...
%%
NAME[=]NAME {
    // Do something.
}
%%
It goes in the definitions section of your lex input file (before the %%) and you use it in a regular expression by putting the name inside curly braces ({…}). For example:
name [A-Za-z0-9][A-Za-z0-9_-]*
%%
{name}[=]{name} { /* Do something */ }
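For completeness, a self-contained sketch built around that answer (the printf message, the catch-all rule, %option noyywrap and main() are my additions):

%option noyywrap
name [A-Za-z0-9][A-Za-z0-9_-]*

%%
{name}[=]{name}   { printf("assignment: %s\n", yytext); }
.|\n               ;   /* ignore everything else */
%%

int main(void) {
    yylex();
    return 0;
}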

Complete URL encoding

Anyone know of a tool to completely encode a string to URL encoding? The best-known example is something that converts the space character to %20. I want to do this for every single character. What's a good tool for this on Linux?
Thanks, everyone, for downvoting; if I cared what language, I would have specified one. I couldn't find anything useful in the other post linked below, so I wrote this. It's good enough for me; it might be good enough for you.
#include <stdio.h>

// Treats all args as one big string. Inserts implicit spaces (%20) between args.
int main(int argc, char *argv[])
{
    if (argc == 1)
    {
        printf("Need something to encode.");
        return 1;
    }
    int count = 0;
    while (++count < argc)
    {
        if (count > 1)
            printf("%%20");  // implicit space separating this arg from the previous one
        char *input = argv[count];
        while (*input != '\0')
        {
            // %02X always prints two hex digits; the cast avoids sign extension on high-bit chars
            printf("%%%02X", (unsigned char)*input);
            input++;
        }
    }
    printf("\n");
    return 0;
}
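A hypothetical session (the output name urlencode is just whatever you pass to the compiler):

$ cc -o urlencode urlencode.c
$ ./urlencode hello world
%68%65%6C%6C%6F%20%77%6F%72%6C%64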
Take a look at this SO question:
How to urlencode data for curl command?
Which programming language? You can even do something client-side...
I modified this from the other link:
perl -p -e 's/(.)/sprintf("%%%02X", ord($1))/seg'
It works nicely enough. Run it and type in what you want to convert (or pipe text through it), and it'll output everything %-encoded.
