I want to use DS-5 Streamline profiler to profile my code. In the documentation its mentioned that to be able to see call stacks, we need to compile code with compiler option -fno-omit-frame-pointer. This option is there in gcc.
Is there an equivalent option for clang also?
-fno-omit-frame-pointer is not working for me with clang.
I have also tried setting the compiler optimization level to 0, but still I am not getting call stacks in streamline.
It looks like DS-5 is an ARM thing, so this might not be relevant, but I ran into a similar issue trying to get good stack traces out of a clang-compiled executable using Linux's perf profiler.
The problem ended up being that, on x86-64 Linux at least, Clang requires both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer in order to get the same behavior that gcc gives with only -fno-omit-frame-pointer. See this LLVM bug: "Need both -fno-omit-frame-pointer and -mno-omit-leaf-frame-pointer to get a fp on linux on a leaf function"
Related
I am trying to build clang, however the build size is quite large. As clang supports non-C family languages as well ( e.g. Java, Fortran ), is there a way to turn that off during the build. I just want to have support for C and C++ and don't care about other languages.
Is there a CMake option that needs to be set to do that??
Thanks a lot!
Best Regards,
Nitish
As others have commented, clang is a C/C++ front end only, and there's no Java/Fortran front end to disable.
However, there are others ways to reduce clang build size:
Choosing a suitable build configuration
The default build configuration for LLVM/clang is Debug. Building for Debug (not specifying a build configuration) results with huge executables, and build folder may take > 20GB. This is primarily due to debug information.
If you're not developing clang, and don't need debug information, you may build for MinSizeRel, which is a release build that is optimized for size.
Tweaking build settings
If you are planning to debug clang or do light clang developement, another option is building with a minimal debug information - the -gmlt option keeps line debug information only which allows source stepping, and results with much more compact object files, compared to full debug information (-g).
Disabling build components
You may disable some components from building, such as tests and examples:
-DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
Putting it together:
cmake -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
For compact debug build:
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS=-gmlt -DLLVM_INCLUDE_TESTS=Off -DLLVM_INCLUDE_EXAMPLES=Off
Hope this helps!
The answer is easy: clang is C/C++ frontend, it does not support neither Java nor Fortran, therefore there is no such option - there is nothing to turn off.
I'm not sure how much it would help, but you could optimize your compilation of clang for size. Disabling debug symbols as others have said should also help. Set CFLAGS="-Os" CXXFLAGS="-Os" as environment variables when you build clang.
This is from GCC 4.8.5
-Os Optimize for size. -Os enables all -O2 optimizations that do not typically increase code size. It also performs further optimizations designed to reduce code size.
-Os disables the following optimization flags: -falign-functions -falign-jumps -falign-loops -falign-labels -freorder-blocks
-freorder-blocks-and-partition -fprefetch-loop-arrays -ftree-vect-loop-version
Do you guys know why the AddressSanitizer would be taking a whole different set of libraries.
For instance, I was trying to recreate strcmp, when I was comparing my output with the standard strcmp from string.h but what I realized is that compiling it normally with gcc it outputs the difference, but with the -fsanitize=address flag added it gives me 1, 0, -1 outputs.
both gcc and clang behave the same way
I am on a OSX 10.11.6, btw.
Is this behavior unique to MACOS or other systems have similar effects?
Btw, from what I was reading, the strcmp of the GNU C library outputs the difference and the Apple version only has outputs of 1, -1 and 0.
So this is even more puzzling to me, because the gcc/clang in MACOS seems to be using the gnu libc by default, and somehow shifting to the apple's version of libc when using the -fsanitize=address flag.
If anyone can explain this to me I would very grateful.
btw, just in case, this is my configuration of gcc:
➜ gcc --version
Configured with:
--prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.12.sdk/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.38)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
-fsanitize=address forces your binary to link against Asan runtime which overloads a lot of standard functions (including strcmp). Overloading is done to check input arguments to these functions. Asan implementations are generally standard-compliant but don't follow all the nits for a particular platform so this may be the reason for differences that you see.
I've been trying to make the switch to LLVM, since I'd like to get more into the whole 'software-dev' scene, and it seems like right now, LLVM is the future. I built LLVM/Clang/LLD/compiler-rt/libcxx from source several times now, both with GNU/GCC and LLVM/Clang.
The problem appears when I try to use the newly compiled compilers. From what I can see, clang is using GNU ld rather than LLVM's lld. Is this true?
LLD seems to be a very limited program from the lld -help output, but from what I have read, it is as full featured as ld. I cannot find documentation on how to use it anywhere -- does anyone know where I can find some kind of comprehensive manual on it?
Thank you.
Pass -fuse-ld=lld to clang to make it use lld for linking. By now, it's in very good shape.
You can pass -v or -### to clang to make it print which linker command it runs or would run.
There's no manual for the moment and depending on platform may work well enough for you. That said, if lld were "production ready" we'd have switched clang to using it by default for the various platforms. It's not there yet so I wouldn't suggest you use it for your day to day development.
The LLVM team say that is production ready because FreeBSD can compile and link a lot of things with LLD.
The documentation on the LLD project can be found on http://lld.llvm.org/.
It's written :
LLD is a drop-in replacement for the GNU linkers.
That accepts the same command line arguments and linker scripts as GNU.
So you can use same arguments than GNU LD.
I know this question is old, but there is a newer solution to it:
To use the ld.lld linker when building any llvm target, just pass -DLLVM_ENABLE_LLD=ON in the commandline to cmake.
//Use lld as C and C++ linker.
LLVM_ENABLE_LLD:BOOL=TRUE
For other cmake projects, pass: -DCMAKE_LINKER=/etc/bin/ld.lld
I am relatively new to Xcode. We are testing an app that displays incoming data and it needs to be as fast as possible. With other platforms I need to change from "debug" to "release" in order for optimizations to kick in and debug code to be removed, which can have a profound effect on speed. What are the equivalent things I need to do in Xcode to build in fast/release mode?
(I am googling this and see lots of hits that seem to be in the general vicinity but I might be a little thrown off by the terminology, I might need it dumbed down a bit :))
Thanks for the help.
The first step is to set the Optimization Level for release as described above. There are lots of options here. From the clang LLVM compiler man page (man cc) -- (note that -Os is the default for Release):
Code Generation Options
-O0 -O1 -O2 -O3 -Ofast -Os -Oz -O -O4
Specify which optimization level to use:
-O0 Means "no optimization": this level compiles the fastest and
generates the most debuggable code.
-O1 Somewhere between -O0 and -O2.
-O2 Moderate level of optimization which enables most
optimizations.
-O3 Like -O2, except that it enables optimizations that take longer
to perform or that may generate larger code (in an attempt to
make the program run faster).
-Ofast
Enables all the optimizations from -O3 along with other
aggressive optimizations that may violate strict compliance
with language standards.
-Os Like -O2 with extra optimizations to reduce code size.
-Oz Like -Os (and thus -O2), but reduces code size further.
-O Equivalent to -O2.
-O4 and higher
Currently equivalent to -O3
You will notice the 'Ofast' option -- very fast, somewhat risky.
A second step is to consider whether to enable "Unroll Loops". I've read that this can in some code lead to a 15% speed increase (at the expense of debugging, but not an issue for Release builds).
Next, consider whether you want to Build and use an Optimization Profile. See Apple for details, but the gist is that:
Profile Guided Optimization (PGO) is a means to improve compiler
optimization of an app. PGO utilizes a specially instrumented build of
the app to generate profile information about the most commonly used
code paths and methods. The compiler then uses this profile
information to focus optimization efforts on the most frequently used
code, taking advantage of the extra information about how the program
typically behaves to do a better job of optimization.
You define the profile and whether you use it under Build Settings -> Apple LLVM 6.0 - Code Generation -> Use Optimization Profile.
First have a look at this part in Xcode (screenshot of Xcode 5 but same on Xcode 6)
You should also prefer PNG to Jpeg (as Jpeg requires more calculation - but are generally smaller in terms of size so better for network...)
Finally, Use multi-threading.
Those are (to mu humble opinion) the first steps to look at.
Edit the scheme to use release configuration.
I don't understand how LLVM JIT relates to normal no JIT compilation and the documentation isn't good.
For example suppose I use the clang front end:
Case 1: I compile C file to native with clang/llvm. This flow I understand is like gcc flow - I get my x86 executable and that runs.
Case 2: I compile into some kind of LLVM IR that runs on LLVM JIT. In this case the executable contains the LLVM runtime to execute the IR on JIT, or how does it work?
What is the difference between these two and are they correct? Does LLVM flow include support for both JIT and non JIT? When do I want to use JIT - does it make sense at all for a language like C?
You have to understand that LLVM is a library that helps you build compilers. Clang is merely a frontend for this library.
Clang translates C/C++ code into LLVM IR and hands it over to LLVM, which compiles it into native code.
LLVM is also able to generate native code directly in memory, which then can be called as a normal function. So case 1. and 2. share LLVM's optimization and code generation.
So how does one use LLVM as a JIT compiler? You build an application which generates some LLVM IR (in memory), then use the LLVM library to generate native code (still in memory). LLVM hands you back a pointer which you can call afterwards. No clang involved.
You can, however, use clang to translate some C code into LLVM IR and load this into your JIT context to use the functions.
Real World examples:
Unladen Swallow Python VM
Rubinius Ruby VM
There is also the Kaleidoscope tutorial which shows how to implement a simple language with JIT compiler.
First, you get LLVM bytecode (LLVM IR):
clang -emit-llvm -S -o test.bc test.c
Second, you use LLVM JIT:
lli test.bc
That runs the program.
Then, if you wish to get native, you use LLVM backend:
llc test.bc
From the assembly output:
as test.S
I am taking the steps to compile and run the JIT'ed code from a mail message in LLVM community.
[LLVMdev] MCJIT and Kaleidoscope Tutorial
Header file:
// foo.h
extern void foo(void);
and the function for a simple foo() function:
//foo.c
#include <stdio.h>
void foo(void) {
puts("Hello, I'm a shared library");
}
And the main function:
//main.c
#include <stdio.h>
#include "foo.h"
int main(void) {
puts("This is a shared library test...");
foo();
return 0;
}
Build the shared library using foo.c:
gcc foo.c -shared -o libfoo.so -fPIC
Generate the LLVM bitcode for the main.c file:
clang -Wall -c -emit-llvm -O3 main.c -o main.bc
And run the LLVM bitcode through jit (and MCJIT) to get the desired output:
lli -load=./libfoo.so main.bc
lli -use-mcjit -load=./libfoo.so main.bc
You can also pipe the clang output into lli:
clang -Wall -c -emit-llvm -O3 main.c -o - | lli -load=./libfoo.so
Output
This is a shared library test...
Hello, I'm a shared library
Source obtained from
Shared libraries with GCC on Linux
Most compilers have a front end, some middle code/structure of some sort, and the backend. When you take your C program and use clang and compile such that you end up with a non-JIT x86 program that you can just run, you have still gone from frontend to middle to backend. Same goes for gcc, gcc goes from frontend to a middle thing and a backend. Gccs middle thing is not wide open and usable as is like LLVM's.
Now one thing that is fun/interesting about llvm, that you cannot do with others, or at least gcc, is that you can take all of your source code modules, compile them to llvms bytecode, merge them into one big bytecode file, then optimize the whole thing, instead of per file or per function optimization you get with other compilers, with llvm you can get any level of partial to compilete program optimization you like. then you can take that bytecode and use llc to export it to the targets assembler. I normally do embedded so I have my own startup code that I wrap around that but in theory you should be able to take that assembler file and with gcc compile and link it and run it. gcc myfile.s -o myfile. I imagine there is a way to get the llvm tools to do this and not have to use binutils or gcc, but I have not taken the time.
I like llvm because it is always a cross compiler, unlike gcc you dont have to compile a new one for each target and deal with nuances for each target. I dont know that I have any use for the JIT thing is what I am saying I use it as a cross compiler and as a native compiler.
So your first case is the front, middle, end and the process is hidden from you you start with source and get a binary, done. The second case is if I understand right the front and the middle and stop with some file that represents the middle. Then the middle to end (the specific target processor) can happen just in time at runtime. The difference there is the backend, the real time execution of the middle language of case two, is likely different than the backend of case one.