Is the clang serialized AST portable? - clang

I'm using the python bindings to walk the clang AST...
When I encounter an error, I would like to dump the AST to a file, so that I can later load it from that file and debug the walker.
This works fine if I dump and then load using the TranslationUnit.save() and Index.read() bindings on the same platform. However, it does not work if I move the AST file between platforms (Linux -> Windows or Windows -> Linux).
Is this expected?
Is there a way to make the AST files "portable"?

The files are not "portable" if your sources contain C++ template code that MSVC accepts but Clang typically diagnoses as invalid.
From the Clang documentation:
Third, MSVC accepts some C++ code that Clang will typically diagnose as invalid. When these constructs are present in widely included system headers, Clang attempts to recover and continue compiling the user’s program. Most parsing and semantic compatibility tweaks are controlled by -fms-compatibility and -fdelayed-template-parsing, and they are a work in progress.
Compare your saved AST files to check whether they contain the same nodes (raw pointer addresses may differ). If they do not, try parsing with the flags mentioned above, -fms-compatibility and -fdelayed-template-parsing.
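One way to do that comparison is to have each platform write a plain-text dump of the nodes and then diff the two text files, since text moves between platforms without trouble. The sketch below is only an illustration: the helper names (flatten_ast, diff_asts, parse_root_cursor) are made up for this example, and it assumes the clang.cindex Python bindings with a working libclang.

```python
import difflib


def flatten_ast(cursor, depth=0):
    """Yield one 'kind spelling' line per node, indented by depth.

    Only the node kind and spelling are recorded, so pointer addresses
    never enter the comparison.
    """
    yield f"{'  ' * depth}{cursor.kind.name} {cursor.spelling}"
    for child in cursor.get_children():
        yield from flatten_ast(child, depth + 1)


def diff_asts(cursor_a, cursor_b):
    """Return unified-diff lines between two ASTs (empty list if equal)."""
    a = list(flatten_ast(cursor_a))
    b = list(flatten_ast(cursor_b))
    return list(difflib.unified_diff(a, b, "a", "b", lineterm=""))


def parse_root_cursor(source_path, extra_args=()):
    """Parse a source file with libclang and return its root cursor.

    The import is done lazily so the helpers above work without libclang.
    """
    from clang.cindex import Index

    tu = Index.create().parse(source_path, args=list(extra_args))
    return tu.cursor
```

On each platform you could write "\n".join(flatten_ast(parse_root_cursor("walker_input.cpp", ["-fms-compatibility", "-fdelayed-template-parsing"]))) to a text file (the file name here is hypothetical) and diff the results. Expect the root node's spelling to differ anyway, because it carries the platform-specific file path.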

Related

Is there a way to use sqrt when using clang and web assembly target

I'm compiling c++ to web assembly using clang --target=wasm32 --no-standard-libraries. Is there a way to convince clang to generate sqrt? It's not finding <math.h> with this target.
Have you already tried compiling without the --no-standard-libraries flag? If you remove it, clang will probably find math.h (because it is part of the standard library).
This is because wasm32-unknown-unknown is a completely barebones target in Clang and doesn't have any standard library: no math.h, no I/O functions, not even memcpy.
However, you can usually get away with using --target wasm32-wasi + WASI SDK instead: https://github.com/WebAssembly/wasi-sdk
It includes the whole standard library, including even functions for interacting with the filesystem via the WASI standard in compatible environments.
If your code doesn't depend on filesystem / clock / other I/O, then you can safely use WASI-SDK to get math.h, memcpy, malloc and other standard functions, and the resulting WebAssembly will work in non-WASI environments as well.

How to get bitcode llvm after linking?

I am trying to get LLVM IR for a file which is linked with some static libraries.
I tried to link using llvm-link. It just copies the .bc files into one file (not like native linking).
clang -L$(T_LIB_PATH) -lpthread -emit-llvm gives an error: emit-llvm can not be used with linking. When passing the -c option, it gives a warning that the linking options were not used.
My main goal is to get a .bc file with all symbols and references resolved. How can I achieve that with clang version 3.4?
You may have a look at wllvm. It is a wrapper around the compiler that lets you build a project and extract the LLVM bitcode of the whole program.
You need to use wllvm and wllvm++ for C and C++, respectively (after setting some environment variables).
Some symbols come from source code via LLVM IR. IR is short for intermediate representation. Those symbols are easy to handle, just stop in the middle of the build process.
Some others come from a library and probably were generated by some other compiler, one that never makes any IR, and in any case the compiler was run by some other people at some other location. You can't go back in time and make those people build IR for you, even if their compiler has the right options. All you can do is obtain the source code for the libraries and build your entire application from source.

clang -module-file-info doesn't generate any output

I'm trying to move a cross-compiled CMake project to Clang Modules to see whether the compile time reduction is worth it. However, it seems that Clang is generating lots of duplicate modules in its ModuleCache.
I'd like to figure out why (maybe some CMake config, etc), so I'm trying to run clang -module-file-info on the generated module files.
However, clang's output is just empty whenever I provide a proper module file. Am I doing anything wrong? Is there anything special that I need to take care of?
The files all have a reasonable size (from a few kB to a few MB), look fine in a Hex editor (start with CPCH, have some recognizable strings, etc) and whenever I specify a wrong file (or a file compiled with a different version of clang) I get the appropriate errors.
I've tried with clang 7.0.1 as well as 8.0.0.
I also tried --verbose but that didn't show any problems either.
To answer my own question:
clang doesn't print the stats on the command line; by default it writes them to a file in the current directory.

LLVM: intermediate bytecode vs binary

I'm confused about one aspect of LLVM:
For all the languages it supports, does it support compiling both to the intermediate code AND to straight binary?
For instance, if I write something in C, can LLVM (or Clang?) compile to either binary (like GCC) or intermediate code?
Or can only some languages be converted to intermediate? I guess it goes without saying that this intermediate requires some type of LLVM runtime? I never really hear about the runtime, though.
LLVM is a framework for manipulating LLVM IR (the "bytecode" you're alluding to) and lowering it to target-specific binaries (for example x86 machine code). Clang is a front-end for C/C++ (and Objective C) that translates these source languages into LLVM IR.
With this in mind, answering your questions:
For all the languages it supports, does it support compiling both to
the intermediate code AND to straight binary?
LLVM can compile IR (intermediate code) to binary (or to assembly text).
For instance, if I write something in C, can LLVM (or Clang?) compile
to either binary (like GCC) or intermediate code?
Yes. Clang can compile your code to a binary directly (using LLVM as a backend), or just emit LLVM IR if you want that.
Or can only some languages be converted to intermediate? I guess it
goes without saying that this intermediate requires some type of LLVM
runtime?
Theoretically, once you have LLVM IR, the LLVM library can convert it to binary. Some languages require a runtime (say Java, or Python), so any compiler from these languages to LLVM IR will have to provide a runtime in one way or another. LLVM has some support for connecting to such runtimes (for example, GC hooks) but carries no "runtime of its own". The only "runtime" project related to LLVM is compiler-rt, which provides fast implementations of some language/compiler builtins and intrinsics. It's mainly used for C/C++/Objective-C. It is a separate sub-project under the LLVM umbrella rather than part of the core LLVM libraries, though full toolchains based on Clang often use it.
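To make the "IR or binary" choice concrete, here is a small sketch that drives clang from Python. The function names and the example file hello.c are made up for illustration. The flags themselves (-S -emit-llvm to write textual IR, -c to compile to an object file, and clang accepting a .ll file as input) are standard clang options.

```python
import shutil
import subprocess
from pathlib import Path


def clang_commands(source):
    """Return clang invocations for textual IR, an object file, and a binary."""
    stem = Path(source).stem
    return {
        # Stop after the front-end and write human-readable LLVM IR.
        "ir": ["clang", "-S", "-emit-llvm", source, "-o", f"{stem}.ll"],
        # Feed the IR back in and lower it to native object code.
        "object_from_ir": ["clang", "-c", f"{stem}.ll", "-o", f"{stem}.o"],
        # Go from source to a linked executable in one step.
        "executable": ["clang", source, "-o", stem],
    }


def run_all(source):
    """Run the commands in order; return False if clang is not on PATH."""
    if shutil.which("clang") is None:
        return False
    for cmd in clang_commands(source).values():
        subprocess.run(cmd, check=True)
    return True
```

Calling run_all("hello.c") would leave hello.ll, hello.o and a hello executable next to the source, assuming clang is installed and hello.c compiles.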

Native code execution by JVM/CLR

How does JVM/CLR execute JIT compiled native code? Is it by some code injection or by copying code to executable memory? What are the system calls that allows dynamic code execution?
I can explain how we do it in CACAO VM (a research JIT-only JVM). First, the machine code for a method is generated into some heap-allocated memory block. After compilation, the final code length is known, and a chunk of executable memory is allocated using mmap and the PROT_EXEC flag. Then, the machine code is copied into the mmapped area. After that, many architectures require some machine-specific cache flushing mechanism; PowerPC 64, for example, needs an explicit cache-flushing function. Notably, on i386 and x86_64, there is nothing to do. After this step, the processor is ready to execute the newly generated code. Alternatively, already allocated memory pages can be marked executable with mprotect. Note that mmap/mprotect are Unix facilities.
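The mmap route described above can be tried from Python on Linux x86-64. This is a minimal sketch, not CACAO's code: the function name and the six-byte routine (mov eax, 42; ret) are chosen for illustration, and the function returns None on other platforms or when the system refuses a writable and executable mapping.

```python
import ctypes
import mmap
import platform
import sys

# x86-64 machine code for: mov eax, 42 ; ret
CODE = bytes([0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3])


def run_generated_code(code=CODE):
    """Copy machine code into an executable mapping and call it.

    Returns the routine's integer result, or None when the host is not
    Linux on x86-64 or does not allow a read/write/execute mapping.
    """
    if not sys.platform.startswith("linux"):
        return None
    if platform.machine() not in ("x86_64", "AMD64"):
        return None
    try:
        # Anonymous mapping (fileno -1) that is readable, writable, executable.
        buf = mmap.mmap(
            -1,
            mmap.PAGESIZE,
            prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
        )
    except OSError:
        return None
    # Copy the generated code into the mapping.
    buf.write(code)
    # Obtain the mapping's address and wrap it as a C function: int f(void).
    anchor = ctypes.c_char.from_buffer(buf)
    func = ctypes.CFUNCTYPE(ctypes.c_int)(ctypes.addressof(anchor))
    # No cache flush is needed on x86-64 before jumping to the new code.
    return func()


if __name__ == "__main__":
    print(run_generated_code())
```

A production JIT would more likely write the code into a read/write mapping and switch it to read/execute with mprotect afterwards, so that no page is writable and executable at the same time.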
I don't know specifically how Java does it, but in general you'd insert "trap" opcodes into the interpreter's instruction stream. There are two opcodes in the JVM spec that seem tailor-made for this purpose.
If you want to know for sure, there's no better answer than the source: http://download.java.net/jdk6/source/
The Common Language Runtime keeps a method table for each type. Each entry points either to native code or to a native stub that JIT-compiles the managed method and then fixes up the method table entry with a pointer to the newly created native code.
MSDN has a more in-depth explanation in the MethodDesc section.
This blog entry by Dave Notario explains how the CLR JIT compiler works.
