What are the prototypes for these flex functions in a re-entrant parser?

I'm converting a working flex/bison parser to run re-entrantly. The parser has the ability to accept include command-file.txt directives, which was implemented on the flex side of things like this:
^include              { BEGIN INCL; }
<INCL>{ws}+           { /* Ignore */ }
<INCL>[^ \t\n\r\f]+   { /* Swallow everything up to whitespace or an EOL character.
                         * When state returns to initial, the whitespace
                         * and/or EOL will be taken care of. */
                        yyin = fopen( yytext, "r" );
                        if ( ! yyin ) {
                            char filename[1024];
                            sprintf( filename, "/home/scripts/%s", yytext );
                            yyin = fopen( filename, "r" );
                            if ( ! yyin ) {
                                char buf[256];
                                sprintf( buf, "Couldn't open \"%s\".", yytext );
                                yyerror( buf );
                            }
                        }
                        yypush_buffer_state( yy_create_buffer( yyin, YY_BUF_SIZE ) );
                        BEGIN 0;
                      }
<<EOF>>               {
                        yypop_buffer_state();
                        if ( ! YY_CURRENT_BUFFER ) {
                            yyterminate();
                        }
                      }
This works nicely. Now that I've added %option reentrant and %option bison-bridge, I get these errors:
lexer.l:119: error: too few arguments to function `yy_create_buffer'
lexer.l:119: error: too few arguments to function `yypush_buffer_state'
lexer.l:123: error: too few arguments to function `yypop_buffer_state'
What are the proper ways to invoke these functions/macros in a re-entrant parser?

The reentrant interfaces are documented (briefly) in the flex manual.
All interfaces take one extra argument of type yyscan_t, which comes at the end of the argument list. Examples (pulled from a flex-generated file):
YY_BUFFER_STATE yy_create_buffer (FILE *file,int size ,yyscan_t yyscanner );
void yy_delete_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
void yy_flush_buffer (YY_BUFFER_STATE b ,yyscan_t yyscanner );
void yypush_buffer_state (YY_BUFFER_STATE new_buffer ,yyscan_t yyscanner );
void yypop_buffer_state (yyscan_t yyscanner );
yylex follows the same pattern, so you can use yyscanner inside an action to refer to the argument provided to yylex.
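Applied to the include-handling action from the question, the buffer calls simply pick up a trailing yyscanner argument. A sketch (inside a reentrant action flex defines yyscanner for you, and yyin is still usable because it becomes a per-scanner macro; note that yyerror may also need extra parameters once the bison parser is pure):

<INCL>[^ \t\n\r\f]+   {
                        yyin = fopen( yytext, "r" );
                        /* ... fallback path and error handling as before ... */
                        yypush_buffer_state( yy_create_buffer( yyin, YY_BUF_SIZE, yyscanner ), yyscanner );
                        BEGIN 0;
                      }
<<EOF>>               {
                        yypop_buffer_state( yyscanner );
                        if ( ! YY_CURRENT_BUFFER ) {
                            yyterminate();
                        }
                      }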

Related

Variadic Dispatch Function

I have an interface wherein the types of the parameters mostly encode their own meanings. I have a function that takes one of these parameters. I'm trying to make a function that takes a set of these parameters and performs the function on each one in order.
#include <iostream>
#include <vector>

enum param_type { typeA, typeB };

template <param_type PT> struct Container {
    int value;
    Container(int v) : value(v) {}
};

int f(Container<typeA> param) {
    std::cout << "Got typeA with value " << param.value << std::endl;
    return param.value;
}

int f(Container<typeB> param) {
    std::cout << "Got typeB with value " << param.value << std::endl;
    return param.value;
}
My current solution uses a recursive variadic template to delegate the work.
void g() {}

template <typename T, typename... R>
void g(T param, R... rest) {
    f(param);
    g(rest...);
}
I would like to use a packed parameter expansion, but I can't seem to get that to work without also using the return values. (In my particular case the functions are void.)
template <typename... T> // TODO: Use concepts once they exist.
void h(T... params) {
    // f(params);...
    // f(params)...; // Fails to compile.
    // {f(params)...};
    std::vector<int> v = {f(params)...}; // Works
}
Example usage:
int main() {
    auto a = Container<typeA>(5);
    auto b = Container<typeB>(10);
    g(a, b);
    h(a, b);
    return 0;
}
Is there an elegant syntax for this expansion in C++?
In C++17: use a fold expression with the comma operator.
template <typename... Args>
void g(Args... args)
{
    ((void)f(args), ...);
}
Before C++17: comma with 0 and then expand into the braced initializer list of an int array. The extra 0 is there to ensure that a zero-sized array is not created.
template <typename... Args>
void g(Args... args)
{
    int arr[]{0, ((void)f(args), 0)...};
    (void)arr; // suppress unused variable warning
}
In both cases, the function call expression is cast to void to avoid accidentally invoking a user-defined operator,.

Vala string processing corrupts memory. Why and how to avoid?

I'm not sure whether I'm misusing Vala or GLib.Regex, because I'm new to both. I've created a minimal example that reproduces the error. From the following code, I'd expect it to print a INPUTX b six times, prefixed alternately with source and result:
public class Test
{
    public static void run( string src )
    {
        var regex = new Regex( "INPUT[0-9]" );
        for( int i = 0; i < 3; ++i )
        {
            stdout.printf( @"-- source: $src\n" );
            src = regex.replace( src, -1, 0, "value" );
            stdout.printf( @"-- result: $src\n\n" );
        }
    }

    public static void main()
    {
        Test.run( "a INPUTX b" );
    }
}
I wrote this code based on the example in the docs. However, after compiling with valac Test.vala --pkg glib-2.0 and running, I get:
-- source: a INPUTX b
-- result: a INPUTX b
-- source: -- source:
-- result: N�
-- source: -- source:
-- result: PN�
What am I doing wrong?
After looking into the generated C code, I concluded that this is rather a Vala-related issue: Vala puts a g_free at the end of the loop's body, which frees the memory that is returned by g_regex_replace and still referenced by src. But why does Vala do that?
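To make that finding concrete before getting to the why, here is a simplified C sketch of what the generated loop effectively does (invented names, heavily condensed; the real valac output is more verbose):

#include <glib.h>
#include <stdio.h>

static void run (const gchar *src_param)
{
    const gchar *src = src_param;   /* unowned parameter: no copy is taken */
    GRegex *regex = g_regex_new ("INPUT[0-9]", 0, 0, NULL);

    for (int i = 0; i < 3; i++) {
        fprintf (stdout, "-- source: %s\n", src);

        gchar *_tmp = g_regex_replace (regex, src, -1, 0, "value", 0, NULL);
        src = _tmp;                 /* src now aliases the freshly allocated string */

        fprintf (stdout, "-- result: %s\n\n", src);

        g_free (_tmp);              /* the owned temporary is freed here, so src
                                       points to freed memory on the next pass */
    }

    g_regex_unref (regex);
}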
The reason is that, as the Vala documentation on ownership notes, "arguments are, by default, unowned". Hence, when we assign the string object returned by regex.replace to the unowned string src, that reference is "not recorded in the object", and the Vala compiler considers it safe to dispose of, although it's not quite clear why this happens at the end of the loop's body in particular.
So the straightforward solution is to declare the src argument as owned:
public static void run( owned string src )
Consider this (nonsense) code:
string foo (string s)
{
    return s;
}

void run (string src)
{
    var regex = new Regex( "INPUT[0-9]" );
    for( int i = 0; i < 3; ++i )
    {
        stdout.printf( @"-- source: $src\n" );
        //src = regex.replace( src, -1, 0, "value" );
        src = foo (src);
        stdout.printf( @"-- result: $src\n\n" );
    }
}

void main ()
{
    run( "a INPUTX b" );
}
The Vala compiler (rightfully) complains:
test.vala:13.2-13.16: error: Invalid assignment from owned expression to unowned variable
src = foo (src);
^^^^^^^^^^^^^^^
So there must be something different for methods from vapi files, since it allows the call to Regex.replace ().
I smell a bug somewhere (either in the compiler or the vapi), but I'm not sure.

want to skip all line comments except two in antlr4 grammar

I want to extend the IDL.g4 grammar a bit so that I can distinguish the two comments //#top-level false and //#top-level true; all other comments I just want to skip like before.
I have tried to add top_level, TOP_LEVEL_TRUE and TOP_LEVEL_FALSE like this, because I thought antlr4 gave precedence to lexer rules coming first.
top_level
    : TOP_LEVEL_TRUE
    | TOP_LEVEL_FALSE
    ;

TOP_LEVEL_TRUE
    : '//#top-level true'
    ;

TOP_LEVEL_FALSE
    : '//#top-level false'
    ;

LINE_COMMENT
    : '//' ~('\n'|'\r')* '\r'? '\n' -> channel(HIDDEN)
    ;
But the listener enterTop_level(...) is never called; all comments seem to be eaten by LINE_COMMENT. How should I organize the lexer and parser rules?
And one more question: I also want to be notified when the end of the input file is reached. How do I do that? I have tried a finalize() function in the listener class, but it never gets called.
Updated with a complete example:
I use this grammar file: IDL.g4, as I said above. Then I update it by putting the parser rule top_level just below the event_header rule. The lexer rules are put just above the ID rule.
Here is my Listener.java file
class Listener extends IDLBaseListener {
    @Override
    public void enterTop_level(IDLParser.Top_levelContext ctx) {
        System.out.println("Found top-level");
    }
}
and here is a main program: IDLCheck.java
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import java.io.FileInputStream;
import java.io.InputStream;

public class IDLCheck {
    public void process(String[] args) throws Exception {
        InputStream is = new FileInputStream("sample.idl");
        ANTLRInputStream input = new ANTLRInputStream(is);
        IDLLexer lexer = new IDLLexer(input);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        IDLParser parser = new IDLParser(tokens);
        parser.setBuildParseTree(true);
        RuleContext tree = parser.specification();
        Listener listener = new Listener();
        ParseTreeWalker walker = new ParseTreeWalker();
        walker.walk(listener, tree);
    }

    public static void main(String[] args) throws Exception {
        new IDLCheck().process(args);
    }
}
and a input file: sample.idl
module CommonTypes {
    struct WChannel {
        int w;
        float d;
    }; //#top-level false
    struct EPlanID {
        int kind;
        short index;
    }; //#top-level TRUE
};
I expect to see the output "Found top-level" twice, but I see nothing.
Finally I found a solution. I just added newline characters to the TOP_LEVEL_FALSE and TOP_LEVEL_TRUE lexer rules, and I also added the top_level parser rule to the definition rule, because I only expect top_level to appear after a struct or union. This is an rti.com-specific extension to the IDL format, and this modification seems good enough for me.
definition
    : type_decl SEMICOLON top_level?
    | const_decl SEMICOLON
    ...

TOP_LEVEL_TRUE
    : '//#top-level true' '\r'? '\n'
    ;

TOP_LEVEL_FALSE
    : '//#top-level false' '\r'? '\n'
    ;

Is "Oops: ; return error;" a valid method declaration in C?

#include <stdio.h>
#include <AssertMacros.h>

int main( int argc, char* argv[] )
{
    int error = 1;
    verify_noerr( error );
    require_noerr( error, Oops );  // <---- Is Oops a callback method?
    printf("You shouldn't be here!\n");
Oops: ;            // <--v____ Is this a method declaration?
    return error;  // <--^ Why the ':' followed by the ';'?
}
This code is from iOS documentation from 2006. I realize that in C the default return type for a method with no declared return type is int. But is this really a method that is leaning on that principle? And why the colon followed by a semicolon? My last thought was that it's a C block, but Wikipedia says otherwise.
I'm stumped.
This:
Oops: ;
is a label, which can be the target of a goto.
I'm guessing that require_noerr is a macro that expands to a goto to the given label if error is an error code.
You'd use this system to exit from a function when an error occurred. It allows for cleanup code between the label and the end of the function (which a simple if (error) return; doesn't).
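For instance, a minimal sketch of the pattern (hypothetical resources, not taken from the AssertMacros.h macros):

#include <stdio.h>
#include <stdlib.h>

int process(const char *path)
{
    int error = 0;

    FILE *f = fopen(path, "r");
    if (!f) { error = 1; goto Oops; }

    char *buf = malloc(1024);
    if (!buf) { error = 2; goto CloseFile; }

    /* ... work that may also 'goto CloseFile' on failure ... */

    free(buf);
CloseFile:
    fclose(f);   /* cleanup runs on both the success and failure paths */
Oops:
    return error;
}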
This is called a label in C programming.
In C code you can use goto to jump to this label:
goto Oops;

Values in $1, $2 .. variables always NULL

I am trying to create a parser with Bison (GNU bison 2.4.1) and flex (2.5.35) on my Ubuntu OS. I have something like this:
sql.h:
typedef struct word
{
    char *val;
    int length;
} WORD;

struct yword
{
    struct word v;
    int o;
    ...
};
sql1.y
%{
..
#include "sql.h"
..
%}

%union yystype
{
    struct tree *t;
    struct yword b;
    ...
}

%token <b> NAME

%%
...
table:
    NAME { add_table(root, $1.v); }
    ;
...
The trouble is that whatever string I give it, when it comes to resolving this rule, v always holds the values (NULL, 0), even if the input string contains a table name. (I chose to skip the other details/snippets, but can provide more if it helps resolve this.) I wrote the grammar, which is complete and correct, but I can't get it to build the parse tree because of this problem.
Any input would be much appreciated.
Your trouble seems related to missing or buggy code in the lexical analyzer. Check your lexical analyzer first: if it does not return the token properly, the parser cannot handle the values correctly.
Write a basic test that prints the token value. Never mind the C style; what matters is the principle:
main() {
    int token;
    while( (token = yylex()) ) {
        switch( token ) {
        case NAME:
            printf("name '%s'\n", yylval.b.v.val );
            break;
        ...
        }
    }
}
If you run some input and that does not work, check whether the lexical analyzer sets yylval before it returns NAME; if it does not, it is normal that val is empty.
If in your flex file you have a pattern such as:
[a-z]+ { return NAME; }
it is incorrect: you have to set the value. With the %union above, the string lives under the b.v member, and the matched length is yyleng, so something like:
[a-z]+  {
            yylval.b.v.val = strdup(yytext);
            yylval.b.v.length = yyleng;
            return NAME;
        }
