Parse c-clang index.h file with with clang itself - clang

I am trying to parse c-clang index.h file with ClangSharp (just for testing purposes of ClangSharp parser on C#) and I found that it misses parsing of functions because of CINDEX_LINKAGE macro in the function declaration.
If I remove it, parser will correctly find FunctionDecl and parse it without errors.
I cannot understand how this macro preventing functions from being parsed. Does someone know how to workaround this?

Issue was in the #include line itself. By default, clang header includes setup to search in the directory on one level up, but clang itself by some reason does not understand such
include format.

Related

Clang libtooling: how to print compiler macro definitions

I have a LibTooling based utility and I would like to output a list of macro definitions for debug purposes. One can print compiler macro definitions with clang/gcc -dM -E -, but it does not seem to work if I pass -dM -E or -dD to ClangTool. Is it possible to do so with LibTooling API or CLI options in any way? It does not matter if it will include macros defined in a parsed source code or not.
I've looked at other similar questions, and as far as I can tell they all are about macros expanded in a parsed source code. That's not exactly what I need.
The answer is blindingly obvious—in retrospect. clang::Preprocessor::getPredefines() returns just that—a list of compiler predefines. The preprocessor object can be obtained e.g. from clang::CompilerInstance, as an argument in clang::DiagnosticConsumer::BeginSourceFile(), etc.
Just for the sake of completeness, this functionality is not available via libclang API, but I could do that using the fact that all predefines are present in the beginning of a translation unit. So after parsing I just listed all top-level cursors of CursorKind.MACRO_DEFINITION that are not in any real location (location.file is None using python bindings notation)

AST of a project by Clang

I use the Clang python binding to extract the AST of c/c++ files. It works perfectly for a simple program I wrote. The problem is when I want to employ it for a big project like openssl. I can run clang for any single file of the project, but clang seems to miss some headers of the project, and just gives me the AST of a few functions of the file, not all of the functions. I set the include folder by -I, but still getting part of the functions.
This is my code:
import clang.cindex as cl
cl.Config.set_library_path(clang_lib_dir)
index = cl.Index.create()
lib = 'Path to include folder'
args = ['-I{}'.format(lib)]
translation_unit = index.parse(source_file, args=args)
my_get_info(translation_unit.cursor)
I receive too many header files not found errors.
UPDATE
I used Make to compile openssl by clang? I can pass -emit-ast option to clang to dump the ast of each file, but I cannot read it now by the clang python binding.
Any clues how I can save the the serialized representation of the translation units so that I will be able to read it by index.read()?
Thank you!
You would "simply" need to provide the right args. But be aware of two possible issues.
Different files may require different arguments for parsing. The easiest solution is to obtain compilation database and then extract compile commands from it. If you go this way be aware that you would need to filter out the arguments a bit and remove things like -c FooBar.cpp (potentially some others), otherwise you may get something like ASTReadError.
Another issue is that the include paths (-I ...) may be relative to the source directory. I.e., if a file main.cpp compiled from a directory /opt/project/ with -I include/path argument, then before calling index.parse(source_file, args=args) you need to step in (chdir) into the /opt/project, and when you are done you will probably need to go back to the original working directory. So the code may look like this (pseudocode):
cwd = getcwd()
chdir('/opt/project')
translation_unit = index.parse(source_file, args=args)
chdir(cwd)
I hope it helps.

Add #include's to the headers of a program using llvm clang

I need to add headers to an already existing program by transforming it with LLVM and Clang.
I have used clang's rewriter to accomplish a similar thing in the changing function names and arguments, etc.
But the header files aren't present in clang's AST. I already know we need to use PPCallbacks (https://clang.llvm.org/doxygen/classclang_1_1PPCallbacks.html) but I am in dire need of some examples on how to make it work with the rewriter if at all possible.
Alternatively, adding a #include statement just before the first
using namespace <namespace>;
Also works. I would like to know an example of this as well.
Any help would be appreciated.
There is a bit of confusion in your question. You need to understand in details how the preprocessor works. Be aware that most of C++ compilation happens after the preprocessing phase (so most C++ static analyzers work after that phase).
In other words, the C++ specification (and also the C specification) defines first what is preprocessing, and then what is the syntax and the semantics of the preprocessed form.
In other words, when compiling foo.cc your compiler see the preprocessed form foo.ii that you could obtain with clang++ -C -E foo.cc > foo.ii
In the 1980s the preprocessor /lib/cpp was a separate program forked by the compiler (and some temporary foo.ii was sitting on the disk and removed at end of compilation). Today, it is -for performance reasons- some initial processing done inside the compiler. But you could reason as if it was still separate.
Either you want to alter the Clang compiler, and it deals (like every other C++ compiler or C++ static analyzer) mostly with the preprocessed form. Then you don't want to add new #include-s, but you want to alter the flow of AST given to the compiler (after preprocessing), and that is a different question: you then want to add some AST between existing AST elements (independently of any preprocessor directives).
Or you want to automatically change the C++ source code. The hard part is determining what you want to change and at what place. I suppose that you have used complex stuff to determine that a #include <vector> has to be inserted after line 34 of file foo.cc. Once you've got that information (and getting it is the hard thing), doing the insertion is pretty trivial. For example, you could read every C++ source line, and insert your line when you have read enough lines.

flex yy_fatal_error exist just like that. I want handler back to application

flex yy_fatal_error exist just like that. But I want handler back to my application. How to avoid exist call? from yy_fatal_error. whether this problem addressed in any version? your suggestion is highly appreciated. help me on this issues.
You can override the function, by #defineing your own. Note that in the generated code there is
/* Report a fatal error. */
#ifndef YY_FATAL_ERROR
#define YY_FATAL_ERROR(msg) yy_fatal_error( msg )
#endif
If you #define the macro YY_FATAL_ERROR(msg) to call your own function, the lexer will call that function rather than the one from the template.
However, the lexer template is written to assume that this function does not return. You can make it do that by using setjmp and longjmp to prepare a predictable place to return in your application and jumping back (from your own yy_fatal_error function) to that when a "fatal" error is used.
vi like emacs does this for instance, because it uses lexers for syntax highlighting. If a fatal error is generated by the lexer, you would not want the editor to stop.
Here are a few links discussing setjmp and longjmp:
Practical usage of setjmp and longjmp in C
setjmp and longjmp - understanding with examples

Flex input buffer reset after error

I'm using flex & bison to parse a custom language and I'm in the situation described here: http://www.gnu.org/software/bison/manual/html_node/How-Can-I-Reset-the-Parser.html.
To be more precise
I invoke yyparse several times, and on correct input it works
properly; but when a parse error is found, all the other calls fail
too. How can I reset the error flag of yyparse?
My parser and scanner run inside a separate thread, but there is only one thread working with the input file. In my understanding I don't need to write a reentrant scanner since there is only one thread working with the input file. In that page the problem is clearly explained but the solution is not clear to me.
It says:
Therefore, whenever you change yyin, you must tell the Lex-generated
scanner to discard its current buffer and switch to the new one. This
depends upon your implementation of Lex; see its documentation for
more. For Flex, it suffices to call ‘YY_FLUSH_BUFFER’ after each
change to yyin. If your Flex-generated scanner needs to read from
several input streams to handle features like include files, you might
consider using Flex functions like ‘yy_switch_to_buffer’ that
manipulate multiple input buffers
My parser thread calls yyparse in order to build my AST. What is not clear to me is when and where I have to call yy_flush_buffer to fix the problem. In my understanding the scanner code (generated by Flex) is called by the parser code (generated by Bison). The Bison generated code is generated by the grammar. As a result the parser code is not under my direct control. This means I cannot include the call to yy_flush_buffer into the parser code since it would be overwritten every time I generate the parser code by the grammar. It means that I should put the yy_flush_buffer in the grammr file somewhere. But where?
I fixed the problem by doing:
...
FILE *f = fopen(_filename, "r");
yyrestart(f);
yyparse();
...
I leave the question since it could be useful for other people.

Resources