Clang compiler stages - clang

Clang is built on the LLVM infrastructure. The Clang front end takes C/C++ source code and generates LLVM IR, but which component runs the optimizer and does the code generation?
Since the optimizer pass libraries have to be strategically arranged and called in a particular order to generate optimized code, where is that order specified, and which component generates the target code? Is this part of the Clang front-end program, or is there another program that does the optimization and code generation?

There are actually two clangs, so to speak.
One is a front end: it just does parsing, builds an Abstract Syntax Tree (AST), and applies various semantic checks. It can also do some static analysis and other helpful things. One can access the front end using the -cc1 option, e.g.: clang -cc1 -ast-dump
The second one is a driver: it orchestrates the whole compilation, invoking the front end to emit LLVM IR, applying optimizations, and then performing all the other magic such as building object files and linking the various components together. This is what usually happens when one runs clang main.c or a similar command.
Please consider looking at the help provided by both clangs:
clang -help
clang -help-hidden
clang -cc1 -help
clang -cc1 -help-hidden
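To see how the driver splits a compilation into jobs, including the -cc1 front-end invocation, you can ask it to print the commands it would run instead of executing them. A minimal sketch, assuming a clang installation and a source file main.c:

```shell
# Print the jobs the driver would run (the -cc1 invocation, assembler,
# linker, etc.) without actually executing them.
clang -### main.c

# Or run the full pipeline while logging each sub-command as it executes.
clang -v main.c -o main
```

The -### output makes it easy to see that the "driver clang" is really just dispatching to the "front-end clang" and the rest of the toolchain.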

Related

Is it possible to create LLVM Pass for OpenCL Kernel?

I would like to create an LLVM Pass to optimize OpenCL kernel for NVIDIA Cards. I wonder if it is possible.
I have tried the following:
clang -Xclang -load -Xclang lib/simplePass.so main.c
It did not work; it could not alter the kernel code.
Separate compilation, then linking.
That also did not work; it gave me an error that get_global_id is undefined.
Using an offline compiler, then clCreateProgramWithBinary.
I followed Apple's example. It works with an Intel GPU, but I was not able to use an LLVM pass. When I tried to use one, it gave me the error:
LLVM ERROR: Sized aggregate specification in datalayout string
When I tried to adapt it to Xubuntu, it did not work.
Is there any other method I can try? I know I can use SPIR-V IR, but Nvidia does not currently support OpenCL 2.2.
Thank you for your time.

How can I find the flag dependency or conflict in LLVM?

As far as I know, GCC has a website to figure out the relationships between the different flags used during optimization (GCC example website). For example, -fpartial-inlining is only useful when -findirect-inlining is turned on.
I think the same thing happens in clang; in other words, the different passes may have this kind of dependency/conflict relationship in LLVM (Clang).
But after checking the documentation provided by the developers, I find it only describes the functionality of these passes (LLVM PASS DOC).
So my question can be divided into 2 parts I think:
Does this kind of dependency exist among LLVM passes, or is there no such dependency/conflict?
If there is, how can I find it?
You can find which passes clang uses at each optimization level by compiling any C or C++ code with clang and trying to figure out the dependencies. For example:
clang -O2 --target=riscv32 -mllvm -debug-pass=Structure example.c
(You can also use -debug-pass=Arguments instead of -debug-pass=Structure, depending on which you find more readable.)
This will show which passes clang uses at optimization level 2 for the riscv32 target. If you don't specify a target, it defaults to your host machine's target; keep in mind that some of the passes used change between targets at the same optimization level.

LLVM: intermediate bytecode vs binary

I'm confused about one aspect of LLVM:
For all the languages it supports, does it support compiling both to the intermediate code AND to straight binary?
For instance, if I write something in C, can LLVM (or Clang?) compile to either binary (like GCC) or intermediate code?
Or can only some languages be converted to the intermediate code? I guess it goes without saying that this intermediate code requires some type of LLVM runtime? I never really hear about the runtime, though.
LLVM is a framework for manipulating LLVM IR (the "bytecode" you're alluding to) and lowering it to target-specific binaries (for example x86 machine code). Clang is a front-end for C/C++ (and Objective C) that translates these source languages into LLVM IR.
With this in mind, answering your questions:
For all the languages it supports, does it support compiling both to
the intermediate code AND to straight binary?
LLVM can compile IR (intermediate code) to binary (or to assembly text).
For instance, if I write something in C, can LLVM (or Clang?) compile
to either binary (like GCC) or intermediate code?
Yes. Clang can compile your code to a binary directly (using LLVM as a backend), or just emit LLVM IR if you want that.
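As a concrete sketch (assuming a source file test.c), the same translation unit can be taken to textual IR, to bitcode, or straight to a native binary:

```shell
clang -S -emit-llvm test.c -o test.ll   # textual LLVM IR
clang -c -emit-llvm test.c -o test.bc   # LLVM bitcode (binary IR encoding)
clang test.c -o test                    # native executable, LLVM as the backend
```

The .ll and .bc files are two encodings of the same IR; llvm-dis and llvm-as convert between them.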
Or can only some languages be converted to intermediate? I guess it
goes without saying that this intermediate requires some type of LLVM
runtime?
Theoretically, once you have LLVM IR, the LLVM library can convert it to binary. Some languages require a runtime (say, Java or Python), so any compiler from those languages to LLVM IR will have to provide that runtime in one way or another. LLVM has some support for connecting to such runtimes (for example, GC hooks) but carries no runtime of its own. The only "runtime" project related to LLVM is compiler-rt, which provides fast implementations of some language/compiler builtins and intrinsics; it's mainly used for C/C++/Objective-C. It's a separate subproject rather than part of the core LLVM libraries, though full toolchains based on Clang often use it.

Make an LLVM ModulePass available on clang command line

I have a ModulePass that's working with the opt tool, but I'm having trouble figuring out how to make it available to clang at the command line. My current workflow for using my pass is:
clang -c -emit-llvm [c-source code files]
llvm-link [llvm bitcode files]
opt -load [PassName].so -[pass-name] [linked llvm file]
llc [resulting bitcode file]
gcc [resulting assembler file] -o [target]
I would like to get my pass integrated with the clang command line so that it could be invoked as part of the build of existing software (e.g. c++ standard library) without having to remake the whole build system for each thing I compile. I've seen hints about how to do this, but I haven't been able to put the pieces together into a working setup.
Run an LLVM Pass Automatically with Clang describes exactly what I want, but the method appears to be deprecated in LLVM 3.4 (PassManagerBuilder has been moved to the legacy namespace).
LLVM - Run Own Pass automatically with clang seems to address the basic issue, but I was hoping I could do this without having to modify clang (which seems to be what's suggested there).
What is the best way to make a new pass available from clang using LLVM 3.4?
Clang still uses PassManagerBuilder as of 3.5 (see the PassManagerBuilderWrapper class in BackendUtil.cpp). So I believe extending it with RegisterStandardPasses, as in my blog post, is still the only way to add a pass to Clang's pass manager.
It's frustratingly difficult to find any information about how deprecated the "old" pass manager infrastructure is. But since Clang is still using it, it can't be that deprecated. :)

LLVM jit and native

I don't understand how LLVM JIT relates to normal no JIT compilation and the documentation isn't good.
For example suppose I use the clang front end:
Case 1: I compile a C file to native code with clang/LLVM. This flow, I understand, is like the gcc flow: I get my x86 executable and it runs.
Case 2: I compile into some kind of LLVM IR that runs on the LLVM JIT. In this case, does the executable contain the LLVM runtime to execute the IR with the JIT, or how does it work?
What is the difference between these two, and are they correct? Does the LLVM flow include support for both JIT and non-JIT? When would I want to use the JIT? Does it make sense at all for a language like C?
You have to understand that LLVM is a library that helps you build compilers. Clang is merely a frontend for this library.
Clang translates C/C++ code into LLVM IR and hands it over to LLVM, which compiles it into native code.
LLVM is also able to generate native code directly in memory, which can then be called as a normal function. So cases 1 and 2 share LLVM's optimization and code generation.
So how does one use LLVM as a JIT compiler? You build an application which generates some LLVM IR (in memory), then use the LLVM library to generate native code (still in memory). LLVM hands you back a pointer which you can call afterwards. No clang involved.
You can, however, use clang to translate some C code into LLVM IR and load this into your JIT context to use the functions.
Real World examples:
Unladen Swallow Python VM
Rubinius Ruby VM
There is also the Kaleidoscope tutorial which shows how to implement a simple language with JIT compiler.
First, you get LLVM bitcode (LLVM IR):
clang -emit-llvm -c -o test.bc test.c
Second, you use LLVM JIT:
lli test.bc
That runs the program.
Then, if you wish to get native code, you use the LLVM backend:
llc test.bc -o test.s
Then assemble and link the assembly output:
as test.s -o test.o
gcc test.o -o test
I took the steps to compile and run the JIT'ed code from a mail message on the LLVM mailing list:
[LLVMdev] MCJIT and Kaleidoscope Tutorial
Header file:
// foo.h
extern void foo(void);
and the implementation of a simple foo() function:
//foo.c
#include <stdio.h>

void foo(void) {
    puts("Hello, I'm a shared library");
}
And the main function:
//main.c
#include <stdio.h>
#include "foo.h"

int main(void) {
    puts("This is a shared library test...");
    foo();
    return 0;
}
Build the shared library using foo.c:
gcc foo.c -shared -o libfoo.so -fPIC
Generate the LLVM bitcode for the main.c file:
clang -Wall -c -emit-llvm -O3 main.c -o main.bc
And run the LLVM bitcode through jit (and MCJIT) to get the desired output:
lli -load=./libfoo.so main.bc
lli -use-mcjit -load=./libfoo.so main.bc
You can also pipe the clang output into lli:
clang -Wall -c -emit-llvm -O3 main.c -o - | lli -load=./libfoo.so
Output
This is a shared library test...
Hello, I'm a shared library
Source obtained from
Shared libraries with GCC on Linux
Most compilers have a front end, some middle code/structure of some sort, and a back end. When you take your C program and compile it with clang such that you end up with a non-JIT x86 program you can just run, you have still gone from front end to middle to back end. The same goes for gcc: it goes from a front end to a middle thing and a back end. GCC's middle thing is just not wide open and usable as-is the way LLVM's is.
Now, one fun/interesting thing about LLVM that you cannot do with others, or at least with gcc, is that you can take all of your source modules, compile them to LLVM bitcode, merge them into one big bitcode file, and then optimize the whole thing. Instead of the per-file or per-function optimization you get with other compilers, with LLVM you can get any level of partial to complete program optimization you like. Then you can take that bitcode and use llc to export it to the target's assembly. I normally do embedded work, so I have my own startup code that I wrap around it, but in theory you should be able to take that assembly file, compile and link it with gcc, and run it: gcc myfile.s -o myfile. I imagine there is a way to get the LLVM tools to do this without binutils or gcc, but I have not taken the time.
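The whole-program flow described above can be sketched as follows (assuming two hypothetical sources, a.c and b.c):

```shell
clang -c -emit-llvm a.c -o a.bc           # compile each module to bitcode
clang -c -emit-llvm b.c -o b.bc
llvm-link a.bc b.bc -o whole.bc           # merge into one bitcode file
opt -O2 whole.bc -o whole.opt.bc          # optimize across the whole program
llc -relocation-model=pic whole.opt.bc -o whole.s   # lower to target assembly
                                          # (PIC so a default gcc link succeeds)
gcc whole.s -o whole                      # assemble and link with gcc
```

Cross-module inlining and other interprocedural optimizations can then see the entire program at once, which per-file compilation cannot offer.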
I like LLVM because it is always a cross compiler; unlike with gcc, you don't have to build a new one for each target and deal with the nuances of each target. What I am saying is that I don't know that I have any use for the JIT; I use LLVM as a cross compiler and as a native compiler.
So your first case is the front, middle, and back end, with the process hidden from you: you start with source and get a binary, done. The second case, if I understand right, is the front end and the middle, stopping with some file that represents the middle. Then the middle-to-back-end step (for the specific target processor) can happen just in time, at runtime. The difference there is the back end: the runtime execution of the middle language in case two is likely different from the back end of case one.
