LLVM IR for a large C project - clang

I'm new to LLVM, clang etc.
My primary need is to convert C code to LLVM IR (.ll) and then translate that into an assembly-like language. Until now I was working with small, single-file C programs, and to convert those to LLVM IR I used -
$ clang -emit-llvm -S multiply.c -o multiply.ll
To do this conversion with a large C project, I tried gllvm - https://github.com/SRI-CSL/gllvm
and it was really easy & convenient, but I am facing this issue with it - https://github.com/PLSysSec/sys/issues/16
which is essentially a bug in the llvm-hs repo; the suggested fix is to pull the commit that fixes it and build from source. I'm trying to avoid doing this (or is that actually the better way?)
So I moved onto another solution of using the gold linker - http://gbalats.github.io/2015/12/10/compiling-autotooled-projects-to-LLVM-bitcode.html
I am now stuck on using this gold plugin to get .bc or .ll files. I want to use the plugin to convert a large C project into .ll or .bc files, but the steps only show how to use the linker to get an optimised executable. Are there any command-line options to stop at the .bc stage?
Any help would be appreciated.

The llvm-link tool combines multiple .bc files into a single .bc. It does not support native object files at all, nor does it work well with normal linker flags (it does not pretend to be ld).
If you do want changes to the LLVM IR due to your native libraries on the system, the gold plugin for LLVM also supports an emit-llvm flag. Run a clang or gcc link with -Wl,-plugin-opt=emit-llvm when linking with gold.
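As a sketch of that flow (assuming clang, the gold linker, and the LLVMgold plugin are all installed; file names here are illustrative):

```shell
# Compile with -flto so the object files contain LLVM bitcode
clang -flto -c foo.c -o foo.o
clang -flto -c bar.c -o bar.o
# Link with gold, asking the plugin to emit merged bitcode instead of a native binary
clang -flto -fuse-ld=gold -Wl,-plugin-opt=emit-llvm foo.o bar.o -o program
# 'program' is now an LLVM bitcode file; disassemble it to textual IR if needed
llvm-dis program -o program.ll
```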

Related

How to get LLVM bitcode after linking?

I am trying to get LLVM IR for a file which is linked with some static libraries.
I tried to link using llvm-link. It just copies the .bc files into one file (not like native linking).
clang -L$(T_LIB_PATH) -lpthread -emit-llvm gives an error: emit-llvm cannot be used with linking. When passing the -c option, it warns that the linking options were not used.
My main goal is to get a .bc file with all symbols and references resolved. How can I achieve that with clang version 3.4?
You may have a look at wllvm. It is a wrapper around the compiler that lets you build a project and extract the LLVM bitcode of the whole program.
You need to use wllvm and wllvm++ for C and C++, respectively (after setting some environment variables).
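A minimal sketch of that workflow, assuming wllvm is installed (e.g. via pip) and `myprogram` is a placeholder for whatever binary your build produces:

```shell
export LLVM_COMPILER=clang   # tell wllvm which underlying compiler to use
CC=wllvm ./configure         # build the project as usual ...
make                         # ... with wllvm standing in for the C compiler
extract-bc myprogram         # writes myprogram.bc next to the binary
llvm-dis myprogram.bc        # optional: produces human-readable myprogram.ll
```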
Some symbols come from source code via LLVM IR (IR is short for intermediate representation). Those symbols are easy to handle: just stop in the middle of the build process.
Some others come from a library and were probably generated by some other compiler, one that never produces any IR; in any case, that compiler was run by other people at some other location. You can't go back in time and make those people build IR for you, even if their compiler has the right options. All you can do is obtain the source code for the libraries and build your entire application from source.

clang -module-file-info doesn't generate any output

I'm trying to move a cross-compiled CMake project to Clang Modules to see whether the compile-time reduction is worth it. However, it seems that Clang is generating lots of duplicate modules in its ModuleCache.
I'd like to figure out why (maybe some CMake config, etc), so I'm trying to run clang -module-file-info on the generated module files.
However, clang's output is just empty whenever I provide a proper module file. Am I doing anything wrong? Is there anything special that I need to take care of?
The files all have a reasonable size (from a few kB to a few MB), look fine in a Hex editor (start with CPCH, have some recognizable strings, etc) and whenever I specify a wrong file (or a file compiled with a different version of clang) I get the appropriate errors.
I've tried with clang 7.0.1 as well as 8.0.0.
I also tried --verbose but that didn't show any problems either.
To answer my own question:
clang doesn't print the stats on the command line; by default it writes them to a file in the current directory.

LLVM compilation process and LLD

I've been trying to make the switch to LLVM, since I'd like to get more into the whole 'software-dev' scene, and it seems like right now, LLVM is the future. I built LLVM/Clang/LLD/compiler-rt/libcxx from source several times now, both with GNU/GCC and LLVM/Clang.
The problem appears when I try to use the newly compiled compilers. From what I can see, clang is using GNU ld rather than LLVM's lld. Is this true?
LLD seems to be a very limited program from the lld -help output, but from what I have read, it is as full featured as ld. I cannot find documentation on how to use it anywhere -- does anyone know where I can find some kind of comprehensive manual on it?
Thank you.
Pass -fuse-ld=lld to clang to make it use lld for linking. By now, it's in very good shape.
You can pass -v or -### to clang to make it print which linker command it runs or would run.
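For example (a sketch; the source file is a stand-in created just for the demonstration):

```shell
# Create a trivial source file to link
cat > hello.c <<'EOF'
int main(void) { return 0; }
EOF
clang -fuse-ld=lld hello.c -o hello  # link with lld instead of the default ld
clang -v hello.c -o hello            # -v: print the linker command as it runs
clang -### hello.c -o hello          # -###: print the commands without running them
```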
There's no manual for the moment, and depending on the platform it may work well enough for you. That said, if lld were "production ready" we'd have switched clang to using it by default on the various platforms. It's not there yet, so I wouldn't suggest you use it for your day-to-day development.
The LLVM team says it is production ready, given that FreeBSD can compile and link a lot of things with LLD.
The documentation on the LLD project can be found on http://lld.llvm.org/.
It states:
LLD is a drop-in replacement for the GNU linkers.
That accepts the same command line arguments and linker scripts as GNU.
So you can use the same arguments as GNU ld.
I know this question is old, but there is a newer solution to it:
To use the ld.lld linker when building any llvm target, just pass -DLLVM_ENABLE_LLD=ON in the commandline to cmake.
//Use lld as C and C++ linker.
LLVM_ENABLE_LLD:BOOL=TRUE
For other cmake projects, pass: -DCMAKE_LINKER=/etc/bin/ld.lld

Make an LLVM ModulePass available on clang command line

I have a ModulePass that's working with the opt tool, but I'm having trouble figuring out how to make it available to clang at the command line. My current workflow for using my pass is:
clang -c -emit-llvm [c-source code files]
llvm-link [llvm bitcode files]
opt -load [PassName].so -[pass-name] [linked llvm file]
llc [resulting bitcode file]
gcc [resulting assembler file] -o [target]
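A concrete instance of that workflow might look like this (file and pass names are illustrative placeholders, not from the question):

```shell
clang -c -emit-llvm foo.c bar.c               # produces foo.bc, bar.bc
llvm-link foo.bc bar.bc -o linked.bc          # one module for whole-program work
opt -load ./LLVMMyPass.so -my-pass linked.bc -o transformed.bc
llc transformed.bc -o transformed.s           # lower bitcode to target assembly
gcc transformed.s -o target                   # assemble and link natively
```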
I would like to get my pass integrated with the clang command line so that it could be invoked as part of the build of existing software (e.g. c++ standard library) without having to remake the whole build system for each thing I compile. I've seen hints about how to do this, but I haven't been able to put the pieces together into a working setup.
Run an LLVM Pass Automatically with Clang describes exactly what I want, but the method appears to be deprecated in LLVM 3.4 (PassManagerBuilder has been moved to the legacy namespace).
LLVM - Run Own Pass automatically with clang seems to address the basic issue, but I was hoping I could do this without having to modify clang (which seems to be what's suggested there).
What is the best way to make a new pass available from clang using LLVM 3.4?
Clang still uses PassManagerBuilder as of 3.5 (see the PassManagerBuilderWrapper class in BackendUtil.cpp). So I believe extending it with RegisterStandardPasses, as in my blog post, is still the only way to add a pass to Clang's pass manager.
It's frustratingly difficult to find any information about how deprecated the "old" pass manager infrastructure is. But since Clang is still using it, it can't be that deprecated. :)

LLVM jit and native

I don't understand how the LLVM JIT relates to normal, non-JIT compilation, and the documentation isn't good.
For example suppose I use the clang front end:
Case 1: I compile a C file to native code with clang/llvm. This flow, as I understand it, is like the gcc flow - I get my x86 executable and it runs.
Case 2: I compile into some kind of LLVM IR that runs on the LLVM JIT. In this case, does the executable contain the LLVM runtime to execute the IR on the JIT, or how does it work?
What is the difference between these two and are they correct? Does LLVM flow include support for both JIT and non JIT? When do I want to use JIT - does it make sense at all for a language like C?
You have to understand that LLVM is a library that helps you build compilers. Clang is merely a frontend for this library.
Clang translates C/C++ code into LLVM IR and hands it over to LLVM, which compiles it into native code.
LLVM is also able to generate native code directly in memory, which can then be called as a normal function. So cases 1 and 2 share LLVM's optimization and code generation.
So how does one use LLVM as a JIT compiler? You build an application which generates some LLVM IR (in memory), then use the LLVM library to generate native code (still in memory). LLVM hands you back a pointer which you can call afterwards. No clang involved.
You can, however, use clang to translate some C code into LLVM IR and load this into your JIT context to use the functions.
Real World examples:
Unladen Swallow Python VM
Rubinius Ruby VM
There is also the Kaleidoscope tutorial which shows how to implement a simple language with JIT compiler.
First, you get LLVM bitcode (LLVM IR):
clang -emit-llvm -c -o test.bc test.c
Second, you use LLVM JIT:
lli test.bc
That runs the program.
Then, if you wish to get native code, you use the LLVM backend:
llc test.bc
Then assemble the output:
as test.s
The steps I use to compile and run the JIT'ed code are taken from a mailing-list message in the LLVM community:
[LLVMdev] MCJIT and Kaleidoscope Tutorial
Header file:
// foo.h
extern void foo(void);
and the implementation of a simple foo() function:
//foo.c
#include <stdio.h>
void foo(void) {
puts("Hello, I'm a shared library");
}
And the main function:
//main.c
#include <stdio.h>
#include "foo.h"
int main(void) {
puts("This is a shared library test...");
foo();
return 0;
}
Build the shared library using foo.c:
gcc foo.c -shared -o libfoo.so -fPIC
Generate the LLVM bitcode for the main.c file:
clang -Wall -c -emit-llvm -O3 main.c -o main.bc
And run the LLVM bitcode through jit (and MCJIT) to get the desired output:
lli -load=./libfoo.so main.bc
lli -use-mcjit -load=./libfoo.so main.bc
You can also pipe the clang output into lli:
clang -Wall -c -emit-llvm -O3 main.c -o - | lli -load=./libfoo.so
Output
This is a shared library test...
Hello, I'm a shared library
Source obtained from
Shared libraries with GCC on Linux
Most compilers have a front end, some middle representation of some sort, and a back end. When you take your C program and use clang to compile it into a non-JIT x86 program that you can just run, you have still gone from front end to middle to back end. The same goes for gcc: it goes from a front end through a middle representation to a back end. GCC's middle representation is not wide open and usable as-is like LLVM's.
Now, one thing that is fun/interesting about LLVM, which you cannot do with others (or at least not with gcc): you can take all of your source modules, compile them to LLVM bitcode, merge them into one big bitcode file, and then optimize the whole thing. Instead of the per-file or per-function optimization you get with other compilers, with LLVM you can get any level of partial to complete program optimization you like. Then you can take that bitcode and use llc to export it to the target's assembler. I normally do embedded work, so I have my own startup code that I wrap around that, but in theory you should be able to take that assembler file, compile and link it with gcc, and run it: gcc myfile.s -o myfile. I imagine there is a way to get the LLVM tools to do this without binutils or gcc, but I have not taken the time to find it.
I like LLVM because it is always a cross compiler; unlike with gcc, you don't have to compile a new toolchain for each target and deal with each target's nuances. What I am saying is that I don't know that I have any use for the JIT side of things: I use LLVM as a cross compiler and as a native compiler.
So your first case is front, middle, and end, with the process hidden from you: you start with source and get a binary, done. The second case, if I understand it right, is the front and the middle, stopping at some file that represents the middle. Then the middle-to-end step (for the specific target processor) can happen just in time, at runtime. The difference is that the back end in case two, the real-time execution of the middle language, is likely different from the back end of case one.