compile binary with afl-clang-fast with custom LLVM IR passes

How can I use AFL's clang wrapper (afl-clang-fast) for compile-time instrumentation together with custom LLVM IR passes?
I want to extract the custom IR passes from https://github.com/obfuscator-llvm/obfuscator/
and use them when compiling with afl-clang-fast, to obtain a binary that has both AFL instrumentation AND the obfuscations specified by OLLVM. Could I, hypothetically, use -Xclang to load the compiled transformation passes?
In the end I just want to fuzz a binary without the obfuscation passes and compare that against fuzzing a binary with the obfuscation passes. Any general ideas on how to begin would be appreciated.

Figured it out: I was able to just run the modified (OLLVM) clang, with the obfuscation passes enabled, on the IR generated after the AFL instrumentation pass.
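The approach in that answer can be sketched as a small Python build helper. It only assembles command lines (printing them, and running them if dry_run=False). It assumes that afl-clang-fast runs its instrumentation pass before bitcode is emitted, that OLLVM's clang accepts the -mllvm -fla / -sub / -bcf switches documented by the obfuscator project, and that both compilers use compatible LLVM versions (an older OLLVM release may not be able to read bitcode produced by a newer LLVM). The OLLVM and afl-llvm-rt.o paths are hypothetical placeholders.

```python
"""Build the same C file twice: AFL only, and AFL + OLLVM obfuscation.

Sketch only: the tool paths below are hypothetical placeholders.
"""
import shlex
import subprocess

AFL_CC = "afl-clang-fast"
OLLVM_CC = "/opt/ollvm/bin/clang"            # hypothetical OLLVM clang location
AFL_RT = "/usr/local/lib/afl/afl-llvm-rt.o"  # hypothetical AFL runtime object
# OLLVM switches: control-flow flattening, instruction substitution,
# bogus control flow.
OBF_FLAGS = ["-mllvm", "-fla", "-mllvm", "-sub", "-mllvm", "-bcf"]


def baseline_commands(src: str, out: str) -> list[list[str]]:
    """AFL-instrumented build without obfuscation (the fuzzing baseline)."""
    return [[AFL_CC, "-O2", src, "-o", out]]


def obfuscated_commands(src: str, out: str) -> list[list[str]]:
    """AFL instrumentation first, then OLLVM obfuscation on the emitted IR."""
    bc = f"{out}.afl.bc"
    obj = f"{out}.o"
    return [
        # 1. afl-clang-fast instruments the code and stops at LLVM bitcode.
        [AFL_CC, "-O2", "-c", "-emit-llvm", src, "-o", bc],
        # 2. OLLVM's clang obfuscates the already-instrumented bitcode.
        [OLLVM_CC, "-O2", "-c", *OBF_FLAGS, bc, "-o", obj],
        # 3. Link AFL's runtime, which the instrumentation calls into.
        [OLLVM_CC, obj, AFL_RT, "-o", out],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(baseline_commands("target.c", "target_plain"))
    run(obfuscated_commands("target.c", "target_obf"))
```

Fuzzing target_plain and target_obf with the same seeds and time budget then gives the comparison the question is after.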

Related

How can I find the flag dependency or conflict in LLVM?

As far as I know, GCC documents the relationships between its optimization flags in the "Options That Control Optimization" section of its manual. For example, -fpartial-inlining only has an effect when inlining itself is turned on (by -finline-functions or -finline-small-functions).
I think the same thing would happen in clang; in other words, I think different passes may have this kind of dependency/conflict relationship in LLVM (clang).
But after checking all the documentation provided by the developers (the LLVM "Passes" reference), I found that it only describes what each pass does.
So I think my question can be divided into two parts:
Do such dependencies/conflicts exist between LLVM passes, or are there none?
If there are, how can I find them?

You can see which passes clang runs at each optimization level by compiling any C or C++ file with pass debugging output enabled, and use that to try to figure out dependencies. For example:
clang -O2 --target=riscv32 -mllvm -debug-pass=Structure example.c
(You can also use -debug-pass=Arguments instead of -debug-pass=Structure; which one is easier to read is a matter of taste.)
This shows which passes clang uses at -O2 for the riscv32 target. If you don't give a target, clang defaults to your host machine's target; keep in mind that the passes used at the same optimization level can differ between targets. Also note that -debug-pass only reports passes managed by the legacy pass manager; on newer clang releases, which run the optimization pipeline with the new pass manager by default, it may list only the code-generation passes.
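To use that output to look for dependencies, one option is to capture the -debug-pass=Arguments listing at two optimization levels (it is typically printed to stderr, so something like 2> o1.txt captures it) and compare them. The Python sketch below parses the "Pass Arguments:" lines and reports which passes appear at only one level. The sample strings are made up, and a pass appearing or disappearing between levels is only a hint of a dependency, not proof of one.

```python
"""Compare the passes clang reports at two optimization levels."""


def parse_pass_arguments(text: str) -> list[str]:
    """Return pass names from every 'Pass Arguments:' line, in order.

    Each pass manager prints its own line, e.g.
    'Pass Arguments:  -tti -tbaa -simplifycfg'.
    """
    passes: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pass Arguments:"):
            args = line[len("Pass Arguments:"):].split()
            passes.extend(arg.lstrip("-") for arg in args)
    return passes


def diff_levels(lower: str, higher: str) -> tuple[list[str], list[str]]:
    """Return (only_in_higher, only_in_lower), keeping first-seen order."""
    low = parse_pass_arguments(lower)
    high = parse_pass_arguments(higher)
    low_set, high_set = set(low), set(high)
    only_high = [p for p in dict.fromkeys(high) if p not in low_set]
    only_low = [p for p in dict.fromkeys(low) if p not in high_set]
    return only_high, only_low


if __name__ == "__main__":
    o1 = "Pass Arguments:  -tti -tbaa -simplifycfg\nPass Arguments:  -domtree -instcombine\n"
    o2 = "Pass Arguments:  -tti -tbaa -simplifycfg -inline\nPass Arguments:  -domtree -instcombine -gvn\n"
    print(diff_levels(o1, o2))  # (['inline', 'gvn'], [])
```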

How to get bitcode llvm after linking?

I am trying to get LLVM IR for a program that is linked with some static libraries.
I tried to link using llvm-link, but it just combines the .bc files into one file (unlike native linking).
clang -L$(T_LIB_PATH) -lpthread -emit-llvm gives an error: -emit-llvm cannot be used when linking. When I pass the -c option, it warns that the linker options were not used.
My main goal is to get a .bc file with all symbols and references resolved. How can I achieve that with clang 3.4?

You may want to have a look at wllvm. It is a wrapper around the compiler that lets you build a project and then extract the LLVM bitcode of the whole program.
You need to use wllvm and wllvm++ for C and C++, respectively (after setting some environment variables).
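A minimal sketch of that workflow as a Python helper: wllvm reads the LLVM_COMPILER environment variable to decide which clang to wrap, the project is built with wllvm/wllvm++ as its compilers, and extract-bc then collects the bitcode recorded for the final binary and links it into a single module (by default <program>.bc). The program name and the make invocation are placeholders; only code that was actually compiled through wllvm has bitcode to extract.

```python
"""Sketch: build a project with wllvm and extract whole-program bitcode."""
import os
import subprocess


def wllvm_plan(program: str, build_cmd: list[str]) -> tuple[dict[str, str], list[list[str]]]:
    """Return (extra environment, commands) for a wllvm build.

    build_cmd must use wllvm/wllvm++ as the C/C++ compilers,
    e.g. ["make", "CC=wllvm", "CXX=wllvm++"].
    """
    env = {"LLVM_COMPILER": "clang"}   # tells wllvm to wrap clang
    commands = [
        build_cmd,                      # normal build, compilers wrapped
        ["extract-bc", program],        # link recorded bitcode -> <program>.bc
    ]
    return env, commands


def run_plan(env: dict[str, str], commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands; run them with the extra environment unless dry_run."""
    full_env = {**os.environ, **env}
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True, env=full_env)


if __name__ == "__main__":
    run_plan(*wllvm_plan("myprog", ["make", "CC=wllvm", "CXX=wllvm++"]))
```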

Some symbols come from source code via LLVM IR (IR is short for intermediate representation). Those symbols are easy to handle: just stop in the middle of the build process, once the IR has been produced.
Some others come from a library and probably were generated by some other compiler, one that never makes any IR, and in any case the compiler was run by some other people at some other location. You can't go back in time and make those people build IR for you, even if their compiler has the right options. All you can do is obtain the source code for the libraries and build your entire application from source.
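One practical way to apply this is to check each input before trying to link it at the IR level. LLVM bitcode files start with the bytes 'B' 'C' 0xC0 0xDE (or with the bitcode-wrapper magic 0x0B17C0DE), whereas native objects start with a format-specific magic such as ELF's 0x7F 'E' 'L' 'F'. The Python sketch below classifies a file by those leading bytes. It only looks at the container: an ar archive (a typical static library) is reported as such, and its members would need checking separately, since they may be bitcode (for example when built with -flto) or native code.

```python
"""Classify link inputs as LLVM bitcode or native code by their magic bytes."""

RAW_BITCODE = b"BC\xc0\xde"            # 'B' 'C' 0xC0DE
WRAPPED_BITCODE = b"\xde\xc0\x17\x0b"  # wrapper magic 0x0B17C0DE, little-endian
ELF = b"\x7fELF"
MACHO_64 = b"\xcf\xfa\xed\xfe"         # 0xFEEDFACF, little-endian 64-bit Mach-O
AR_ARCHIVE = b"!<arch>\n"              # static library (.a)


def classify(header: bytes) -> str:
    """Return a coarse kind for a file given its first bytes."""
    if header.startswith((RAW_BITCODE, WRAPPED_BITCODE)):
        return "bitcode"
    if header.startswith(AR_ARCHIVE):
        return "archive"
    if header.startswith((ELF, MACHO_64)):
        return "native"
    return "unknown"


def classify_file(path: str) -> str:
    """Read just enough of the file to recognise its magic number."""
    with open(path, "rb") as f:
        return classify(f.read(8))


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(f"{name}: {classify_file(name)}")
```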

Obtain compiler flags in skylark

I'd like to convert a CMake-based C++ library to bazel.
As part of the current CMake project, I'm using a libclang-based code generator that parses C++ headers and generates C++ code from the parsed AST. In order to do that, I need the actual compiler flags used to build the cc_library the header is part of. The flags are passed to the code generation tool so it can use clang's preprocessor.
Is there any way I could access the compiler flags used to build a dependency from a Skylark rule or a genrule? I'm particularly interested in the include paths and defines.

We're working on it. Well, not right now, but will soon. You might want to subscribe to the corresponding issue, and maybe describe your requirements there so we take them into account when designing the API.

Clang compiler stages

The Clang compiler is built on the LLVM infrastructure. The Clang front-end takes C/C++ source code and generates LLVM IR, but who does the job of running the optimizer and the code generation?
Since the optimizer pass libraries have to be placed strategically and called in a particular order to generate optimized code, where is that order specified? And who generates the target code? Is this part of the Clang front-end program, or is there another program that does the optimization and code generation?

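There are actually two clangs, so to speak.
One is the front-end (clang -cc1): it does the parsing, builds an Abstract Syntax Tree (AST), and applies various semantic checks. It can also do some static analysis and other helpful things. It then generates LLVM IR from the AST and, unless told to stop earlier, runs the LLVM optimization pipeline and the LLVM code generator itself to produce assembly or an object file. One can access the front-end directly using the -cc1 option, e.g.: clang -cc1 -ast-dump main.c
The second one is the driver: this is what runs when one calls clang main.c or a similar command. It works out which jobs are needed and with which flags: it invokes clang -cc1 for each source file, an assembler where needed, and finally the system linker to link the various components together. The order of the optimization passes is not hard-coded in the driver; it comes from LLVM's pipeline-building code (PassManagerBuilder in older releases, PassBuilder in newer ones), which clang -cc1 calls.
Consider looking at the help provided by both clangs: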
clang -help
clang -help-hidden
clang -cc1 -help
clang -cc1 -help-hidden
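Another way to see this split is clang's -### option, which makes the driver print the jobs it would run instead of running them (the listing goes to stderr). The Python sketch below parses such a listing and labels each job. The SAMPLE text is shortened and made up for illustration; real command lines are much longer and vary by clang version and platform.

```python
"""Label the jobs printed by `clang -### main.c`."""
import shlex


def parse_driver_jobs(listing: str) -> list[tuple[str, list[str]]]:
    """Return (kind, argv) for each job line in a `clang -###` listing.

    Job lines start with a space and a quoted program path; banner lines
    such as 'clang version ...' or 'Target: ...' are skipped.
    """
    jobs = []
    for line in listing.splitlines():
        if not line.startswith(' "'):
            continue
        argv = shlex.split(line)
        if "-cc1" in argv:
            kind = "front-end (-cc1)"
        elif "-cc1as" in argv:
            kind = "integrated assembler (-cc1as)"
        else:
            kind = "external tool (e.g. linker)"
        jobs.append((kind, argv))
    return jobs


SAMPLE = (
    "clang version 14.0.0\n"
    "Target: x86_64-pc-linux-gnu\n"
    ' "/usr/bin/clang-14" "-cc1" "-triple" "x86_64-pc-linux-gnu" "-emit-obj"'
    ' "-O2" "-o" "/tmp/main-1.o" "-x" "c" "main.c"\n'
    ' "/usr/bin/ld" "-o" "a.out" "/tmp/main-1.o" "-lc"\n'
)

if __name__ == "__main__":
    for kind, argv in parse_driver_jobs(SAMPLE):
        print(f"{kind}: {argv[0]}")
```

In a real listing, the -emit-obj flag on the -cc1 job indicates that the front-end process itself produces the object file (IR generation, optimization and code generation), while the driver only adds the separate link job.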

LLVM: intermediate bytecode vs binary

I'm confused about one aspect of LLVM:
For all the languages it supports, does it support compiling both to the intermediate code AND to straight binary?
For instance, if I write something in C, can LLVM (or Clang?) compile to either binary (like GCC) or intermediate code?
Or can only some languages be converted to intermediate code? I guess it goes without saying that this intermediate code requires some kind of LLVM runtime? I never really hear about the runtime, though.

LLVM is a framework for manipulating LLVM IR (the "bytecode" you're alluding to) and lowering it to target-specific binaries (for example x86 machine code). Clang is a front-end for C/C++ (and Objective C) that translates these source languages into LLVM IR.
With this in mind, answering your questions:
For all the languages it supports, does it support compiling both to the intermediate code AND to straight binary?
LLVM can compile IR (intermediate code) to binary (or to assembly text).
For instance, if I write something in C, can LLVM (or Clang?) compile to either binary (like GCC) or intermediate code?
Yes. Clang can compile your code to a binary directly (using LLVM as a backend), or just emit LLVM IR if you want that.
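For C, the IR route can be laid out stage by stage. The Python sketch below only builds the command lists and assumes clang, opt and llc are on PATH. It passes -Xclang -disable-O0-optnone because clang marks functions optnone at its default -O0, which would otherwise stop opt from optimizing them, and asks llc for position-independent code so the object also links on toolchains that build PIE executables by default. In everyday use, clang hello.c -o hello performs all of these stages in one step.

```python
"""Take a C file through LLVM IR to a native executable, one tool per stage."""
import shlex
import subprocess


def ir_pipeline(src: str, stem: str, opt_level: str = "-O2") -> list[list[str]]:
    ll, opt_ll, obj = f"{stem}.ll", f"{stem}.opt.ll", f"{stem}.o"
    return [
        # C -> textual LLVM IR; keep functions optimizable by opt later.
        ["clang", "-S", "-emit-llvm", "-Xclang", "-disable-O0-optnone", src, "-o", ll],
        # IR -> optimized IR (target-independent middle end).
        ["opt", opt_level, "-S", ll, "-o", opt_ll],
        # IR -> native object file (LLVM back end), position-independent.
        ["llc", "-filetype=obj", "-relocation-model=pic", opt_ll, "-o", obj],
        # Link against the C library like any other object file.
        ["clang", obj, "-o", stem],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(ir_pipeline("hello.c", "hello"))
```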
Or can only some languages be converted to intermediate? I guess it goes without saying that this intermediate requires some type of LLVM runtime?
Theoretically, once you have LLVM IR, the LLVM library can convert it to binary. Some languages require a runtime (say Java, or Python), so any compiler from these languages to LLVM IR will have to provide a runtime in one way or another. LLVM has some support for connecting to such runtimes (for example, GC hooks) but carries no "runtime of its own". The main "runtime" project related to LLVM is compiler-rt, which provides fast implementations of some language/compiler builtins and intrinsics, as well as the sanitizer and profiling runtimes. It's mainly used for C/C++/Objective C. It's a separate LLVM sub-project rather than part of the core LLVM libraries, though full toolchains based on Clang often use it.
