Flex dropping predefined macro - flex-lexer

I have the following flex source:
%{
#if !defined(__linux__) && !defined(__unix__)
/* Maybe on windows */
#endif
int num_chars = 0;
%}
%%
. ++num_chars;
%%
int main()
{
yylex();
printf("%d chars\n", num_chars);
return 0;
}
int yywrap()
{
return 1;
}
I generate a C file by the command flex flextest.l and compile the result with gcc -o fltest lex.yy.c
To my surprise, I get the following output:
flextest.l:2:37: error: operator "defined" requires an identifier
#if !defined(__linux__) && !defined(__unix__)
After further checking, the issue seems to be that flex has actually replaced __unix__ with the empty string, as shown by:
$ grep __linux_ lex.yy.c
#if !defined(__linux__) && !defined()
Why does this happen, and is it possible to avoid it?

It's actually m4 (the macro processor which is used by current versions of flex) which is expanding __unix__ to the empty string. The Gnu implementation of m4 defines certain symbols to empty macros so that they can be tested with ifdef.
Of course, it's (better said, it was) a bug in flex. Flex shouldn't allow m4 to expand macros within user content copied from the scanner definition file, and the current version of flex correctly arranges for the text included from the scanner description file to be quoted so that it will pass through m4 unmodified even if it happens to include a string which could be interpreted by m4 as a macro expansion.
The bug is certainly present in v2.5.39 and v2.6.1 of flex. I didn't test all previous versions, but I suppose it was introduced when flex was modified to use m4, which was v2.5.30 according to the NEWS file.
This particular quoting issue was fixed in v2.6.2 but the current version of flex (2.6.4) contains various other bug fixes, so I'd recommend you upgrade to the latest version.
If you really need a version which could work with both the buggy and the more recent versions of flex, you could use one of the two following hacks:
Find some other way to write __unix__. One possibility is the following
#define C(x,y) x##y
#define UNIX_ C(__un,ix__)
#if !defined(__linux__) && !UNIX_
That hack won't work with defined, since defined(UNIX_) tests whether UNIX_ is defined, not whether what it expands to is defined. But normally built-in symbols like __unix__ are actually defined to be 1, if they are defined, and the #if directive treats any identifier which is not #define'd as though it were 0, which means that you can usually leave use x instead of defined(x). (However, it will produce different results if there were a #define x 0 in effect, so it's not quite a perfect substitute.)
Flex, like many m4 applications, redefines m4's quote marks to be [[ and ]]. Both the buggy flex and the corrected versions substitute these quote marks with a rather elaborate sequence which effectively quotes the quote marks. However, the buggy version does not otherwise quote user-defined text, so macro substitutions will be performed in user text. (As mentioned, this is why __unix__ becomes the empty string.
In flex versions in which user-defined text is not quoted, it is possible to invoke the m4 macro which redefines quote marks. These new quote marks can then be used to quote the #if line, preventing macro substitution of __unix__. However, the quote definition must be restored, or it will completely wreck macro processing of the rest of the file. That's a bit tricky because it is impossible to write [[. (Flex will substitute it with a different string.)
The following seems to do the trick. Note that the macro invocations are placed inside C comments. The changequote macros will expand to an empty string, if they are expanded. But in flex versions since v2.6.2, user-supplied text is quoted, so the changequote macros will not be expanded. Putting them inside comments hides them from the C compiler.
%{
/*m4_changequote(<<,>>)<<*/
#if !defined(__linux__) && !defined(__unix__)
/*>>m4_changequote(<<[>><<[>>,<<]>><<]>>)*/
/* Maybe on windows */
#endif
(The m4 macro which changes quote marks is changequote but flex invokes m4 with the -P flag which changes builtins like changequote to m4_changequote. In the second call to changequote, the two [ which make up the [[ sign are individually quoted with the temporary << quote marks, which hides them from the code in flex which modifies use of [[.)
I don't know how reliable this hack is but it worked on the versions of flex which I had kicking around on my machine, including 2.5.4 (pre-M4) 2.5.39 (buggy), 2.6.1 (buggy), 2.6.2 (somewhat debugged) and 2.6.4 (more debugged).

Related

How to get automake to recognize the non-default filename produced by flex when %option prefix= is set

In general, automake's capability to properly invoke flex and bison when building parsers and scanners is incredibly useful. However, I'm running up against an issue that I can't seem to resolve.
I have a lex file, trigraphs.l, which performs the text replacement of C trigraph sequences. There will be multiple lexers included in the final executable, which will be a pre-preprocessor that prepares the source file for the C preprocessor proper, so a subsequent lexer will take care of joining logical lines that span multiple physical lines, another one will strip out comments, etc. Per the C standard, there is a specific order these transformations are supposed to take place in, so to avoid issues I'm using three separate lexers that will run in the proper sequence.
Anyway, to avoid symbol name conflicts I'm using %option prefix="blah," so my trigraph scanner is as follows:
%option noyywrap
%option prefix="trigraphs_"
%%
"??<" { printf("{"); }
"??>" { printf("}"); }
"??(" { printf("["); }
"??)" { printf("]"); }
"??=" { printf("#"); }
"??/" { printf("\\"); }
"??'" { printf("^"); }
"??!" { printf("|"); }
"??-" { printf("~"); }
. { printf("%c", *yytext); }
%%
The issue seems to be that automake->ylwrap is expecting an output from flex with the traditional lex.yy.c filename, to rename to trigraphs.c. However, because the %option prefix also changes the output filename (to lex.trigraphs_.c), it doesn't get renamed to the trigraphs.c that the Makefile is expecting. This causes the obvious compilation error when the Makefile wants to compile trigraphs.c and it doesn't exist.
One solution that occurred to me is to use %option outfile="lex.yy.c" to sidestep around that. It seems to work so far; however, it feels a bit "hackish" so I wonder if there's a more idiomatic or canonical way to handle this. When searching for solutions others had found, I did come across this question on StackExchange, but it's eight years old now so it's possible that there have been developments in the interim.
There is no more idiomatic way to handle this because Automake rules know nothing about flex. Automake expects to work with project that are compatible with POSIX lex, and expects any lex implementation to follow the POSIX convention of producing a lex.yy.c file.

Flex-lexer: Is using `%option debug` the same as defining `FLEX_DEBUG`

When running flex on my *.l file with %option debug i see #define FLEX_DEBUG in the generated scanner file.
Is there any difference between using %option debug and just defining FLEX_DEBUG by passing -DFLEX_DEBUG to gcc?
There is no difference with the current version of flex. The only thing that %option debug does is insert #define FLEX_DEBUG into the output file. This is also what is done if -d or --debug is specified on the flex command-line.
However, there is one important difference: The FLEX_DEBUG macro is not documented, so there is no guarantee that the behaviour will not change with future versions. (This is different from the yacc/bison YYDEBUG macro, which is documented.)

Validate URL in Informix 4GL program

In my Informix 4GL program, I have an input field where the user can insert a URL and the feed is later being sent over to the web via a script.
How can I validate the URL at the time of input, to ensure that it's a live link? Can I make a call and see if I get back any errors?
I4GL checking the URL
There is no built-in function to do that (URLs didn't exist when I4GL was invented, amongst other things).
If you can devise a C method to do that, you can arrange to call that method through the C interface. You'll write the method in native C, and then write an I4GL-callable C interface function using the normal rules. When you build the program with I4GL c-code, you'll link the extra C functions too. If you build the program with I4GL-RDS (p-code), you'll need to build a custom runner with the extra function(s) exposed. All of this is standard technique for I4GL.
In general terms, the C interface code you'll need will look vaguely like this:
#include <fglsys.h>
// Standard interface for I4GL-callable C functions
extern int i4gl_validate_url(int nargs);
// Using obsolescent interface functions
int i4gl_validate_url(int nargs)
{
if (nargs != 1)
fgl_fatal(__FILE__, __LINE__, -1318);
char url[4096];
popstring(url, sizeof(url));
int r = validate_url(url); // Your C function
retint(r);
return 1;
}
You can and should check the manuals but that code, using the 'old style' function names, should compile correctly. The code can be called in I4GL like this:
DEFINE url CHAR(256)
DEFINE rc INTEGER
LET url = "http://www.google.com/"
LET rc = i4gl_validate_url(url)
IF rc != 0 THEN
ERROR "Invalid URL"
ELSE
MESSAGE "URL is OK"
END IF
Or along those general lines. Exactly what values you return depends on your decisions about how to return a status from validate_url(). If need so be, you can return multiple values from the interface function (e.g. error number and text of error message). Etc. This is about the simplest possible design for calling some C code to validate a URL from within an I4GL program.
Modern C interface functions
The function names in the interface library were all changed in the mid-00's, though the old names still exist as macros. The old names were:
popstring(char *buffer, int buflen)
retint(int retval)
fgl_fatal(const char *file, int line, int errnum)
You can find the revised documentation at IBM Informix 4GL v7.50.xC3: Publication library in PDF in the 4GL Reference Manual, and you need Appendix C "Using C with IBM Informix 4GL".
The new names start ibm_lib4gl_:
ibm_libi4gl_popMInt()
ibm_libi4gl_popString()
As to the error reporting function, there is one — it exists — but I don't have access to documentation for it any more. It'll be in the fglsys.h header. It takes an error number as one argument; there's the file name and a line number as the other arguments. And it will, presumably, be ibm_lib4gl_… and there'll be probably be Fatal or perhaps fatal (or maybe Err or err) in the rest of the name.
I4GL running a script that checks the URL
Wouldn't it be easier to write a shell script to get the status code? That might work if I can return the status code or any existing results back to the program into a variable? Can I do that?
Quite possibly. If you want the contents of the URL as a string, though, you'll might end up wanting to call C. It is certainly worth thinking about whether calling a shell script from within I4GL is doable. If so, it will be a lot simpler (RUN "script", IIRC, where the literal string would probably be replaced by a built-up string containing the command and the URL). I believe there are file I/O functions in I4GL now, too, so if you can get the script to write a file (trivial), you can read the data from the file without needing custom C. For a long time, you needed custom C to do that.
I just need to validate the URL before storing it into the database. I was thinking about:
#!/bin/bash
read -p "URL to check: " url
if curl --output /dev/null --silent --head --fail "$url"; then
printf '%s\n' "$url exist"
else
printf '%s\n' "$url does not exist"
fi
but I just need the output instead of /dev/null to be into a variable. I believe the only option is to dump the output into a temp file and read from there.
Instead of having I4GL run the code to validate the URL, have I4GL run a script to validate the URL. Use the exit status of the script and dump the output of curl into /dev/null.
FUNCTION check_url(url)
DEFINE url VARCHAR(255)
DEFINE command_line VARCHAR(255)
DEFINE exit_status INTEGER
LET command_line = "check_url ", url
RUN command_line RETURNING exit_status
RETURN exit_status
END FUNCTION {check_url}
Your calling code can analyze exit_status to see whether it worked. A value of 0 indicates success; non-zero indicates a problem of some sort, which can be deemed 'URL does not work'.
Make sure the check_url script (a) exits with status zero on success and non-zero on any sort of failure, and (b) doesn't write anything to standard output (or standard error) by default. The writing to standard error or output will screw up screen layouts, etc, and you do not want that. (You can obviously have options to the script that enable standard output, or you can invoke the script with options to suppress standard output and standard error, or redirect the outputs to /dev/null; however, when used by the I4GL program, it should be silent.)
Your 'script' (check_url) could be as simple as:
#!/bin/bash
exec curl --output /dev/null --silent --head --fail "${1:-http://www.example.com/"
This passes the first argument to curl, or the non-existent example.com URL if no argument is given, and replaces itself with curl, which generates a zero/non-zero exit status as required. You might add 2>/dev/null to the end of the command line to ensure that error messages are not seen. (Note that it will be hell debugging this if anything goes wrong; make sure you've got provision for debugging.)
The exec is a minor optimization; you could omit it with almost no difference in result. (I could devise a scheme that would probably spot the difference; it involves signalling the curl process, though — kill -9 9999 or similar, where the 9999 is the PID of the curl process — and isn't of practical significance.)
Given that the script is just one line of code that invokes another program, it would be possible to embed all that in the I4GL program. However, having an external shell script (or Perl script, or …) has merits of flexibility; you can edit it to log attempts, for example, without changing the I4GL code at all. One more file to distribute, but better flexibility — keep a separate script, even though it could all be embedded in the I4GL.
As Jonathan said "URLs didn't exist when I4GL was invented, amongst other things". What you will find is that the products that have grown to superceed Informix-4gl such as FourJs Genero will cater for new technologies and other things invented after I4GL.
Using FourJs Genero, the code below will do what you are after using the Informix 4gl syntax you are familiar with
IMPORT com
MAIN
-- Should succeed and display 1
DISPLAY validate_url("http://www.google.com")
DISPLAY validate_url("http://www.4js.com/online_documentation/fjs-fgl-manual-html/index.html#c_fgl_nf.html") -- link to some of the features added to I4GL by Genero
-- Should fail and display 0
DISPLAY validate_url("http://www.google.com/testing")
DISPLAY validate_url("http://www.google2.com")
END MAIN
FUNCTION validate_url(url)
DEFINE url STRING
DEFINE req com.HttpRequest
DEFINE resp com.HttpResponse
-- Returns TRUE if http request to a URL returns 200
TRY
LET req = com.HttpRequest.create(url)
CALL req.doRequest()
LET resp = req.getResponse()
IF resp.getStatusCode() = 200 THEN
RETURN TRUE
END IF
-- May want to handle other HTTP status codes
CATCH
-- May want to capture case if not connected to internet etc
END TRY
RETURN FALSE
END FUNCTION

What's the macro to distinguish ifort from other fortran compilers?

I'm working with Fortran code that has to work with various Fortran compilers (and is interacting with both C++ and Java code). Currently, we have it working with gfortran and g95, but I'm researching what it would take to get it working with ifort, and the first problem I'm having is figuring out how to determine in the source code whether it's using ifort or not.
For example, I currently have this piece of code:
#if defined(__GFORTRAN__)
// Macro to add name-mangling bits to fortran symbols. Currently for gfortran only
#define MODFUNCNAME(mod,fname) __ ## mod ## _MOD_ ## fname
#else
// Macro to add name-mangling bits to fortran symbols. Currently for g95 only
#define MODFUNCNAME(mod,fname) mod ## _MP_ ## fname
#endif // if __GFORTRAN__
What's the macro for ifort? I tried IFORT, but that wasn't right, and further guessing doesn't seem productive. I also tried reading the man page, using ifort -help, and Google.
You're after __INTEL_COMPILER, as defined in http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/fortran/win/compiler_f/bldaps_for/common/bldaps_use_presym.htm
According to their docs, they define __INTEL_COMPILER=910 . The 910 may be a version number, but you can probably just #ifdef on it.
I should note that ifort doesn't allow macros unless you explicity turn it on with the /fpp flag.

#line and jump to line

Do any editors honer C #line directives with regards to goto line features?
Context:
I'm working on a code generator and need to jump to a line of the output but the line is specified relative to the the #line directives I'm adding.
I can drop them but then finding the input line is even a worse pain
If the editor is scriptable it should be possible to write a script to do the navigation. There might even be a Vim or Emacs script that already does something similar.
FWIW when I writing a lot of Bison/Flexx I wrote a Zeus Lua macro script that attempted to do something similar (i.e. move from input file to the corresponding line of the output file by search for the #line marker).
For any one that might be interested here is that particular macro script.
#line directives are normally inserted by the precompiler, not into source code, so editors won't usually honor that if the file extension is .c.
However, the normal file extension for post-compiled files is .i or .gch, so you might try using that and see what happens.
I've used the following in a header file occasionally to produce clickable items in
the VC6 and recent VS(2003+) compiler ouptut window.
Basically, this exploits the fact that items output in the compiler output
are essentially being parsed for "PATH(LINENUM): message".
This presumes on the Microsoft compiler's treatment of "pragma remind".
This isn't quite exactly what you asked... but it might be generally helpful
in arriving at something you can get the compiler to emit that some editors might honor.
// The following definitions will allow you to insert
// clickable items in the output stream of the Microsoft compiler.
// The error and warning variants will be reported by the
// IDE as actual warnings and errors... which means you can make
// them occur in the task list.
// In theory, the coding standards could be checked to some extent
// in this way and reminders that show up as warnings or even
// errors inserted...
#define strify0(X) #X
#define strify(X) strify0(X)
#define remind(S) message(__FILE__ "(" strify( __LINE__ ) ") : " S)
// example usage
#pragma remind("warning: fake warning")
#pragma remind("error: fake error")
I haven't tried it in a while but it should still work.
Use sed or a similar tool to translate the #lines to something else not interpreted by the compiler, so you get C error messages on the real line, but have a reference to the original input file nearby.

Resources