Setup
I have a simple helloworld program:
// content of main.c
#include <stdio.h>
#include <limits.h>
int main() {
for (int i = 0; i < INT_MAX; ++i) {
printf("simply helloworld!\n");
}
return 0;
}
I compile a baseline version with clang 13.0.0 using clang -flto=thin -fvisibility=hidden -fuse-ld=lld main.c
To experiment with CFI, I compile another version using clang -flto=thin -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fsanitize-trap=cfi -fvisibility=hidden -fuse-ld=lld main.c
Expectation
I am expecting negligible performance overhead, as I am only calling into a shared library that I expect to run the same code for both binaries. The disassembly of the main function looks the same in both binaries.
Reality
The baseline version completes execution in ~27s, while the CFI version completes in ~32s. Using perf stat -e instructions <binary>, I can see that the CFI version executes ~100,000,000,000 more instructions. With perf record followed by perf diff, I can see that the difference lies primarily in two functions, _pthread_cleanup_push_defer and _pthread_cleanup_pop_restore, which only the CFI version runs. Using gdb, I can see that these functions are called deeper in printf's call stack.
Question
How do I begin to explain the performance difference between these two binaries? Why does the same simple call to printf end up running different code in the two binaries?
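One way to begin is to check whether the two binaries map the same shared objects at run time. This is a hypothesis to test, not a confirmed diagnosis: cross-DSO CFI needs runtime support that clang links into the program, and if that pulls in an extra library (for example libpthread, on glibc versions that still ship it separately), glibc's stdio may take a different, thread-aware path, which would be consistent with seeing _pthread_cleanup_push_defer in the profile. The sketch below is Linux-specific (it reads /proc/self/maps) and the function name is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothesis check: print every mapping of this process whose path
   contains ".so", so the baseline and CFI builds can be diffed.
   Returns the number of matching lines (one shared object usually has
   several mappings), or -1 if /proc/self/maps cannot be opened. */
int print_shared_object_mappings(void) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return -1;

    char line[4096];
    int count = 0;
    while (fgets(line, sizeof line, maps)) {
        if (strstr(line, ".so")) {
            fputs(line, stdout);
            ++count;
        }
    }
    fclose(maps);
    return count;
}
```

Calling it at the start of main in both builds and diffing the output shows whether the CFI build maps any libraries that the baseline does not.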
Related
The following short C example uses the C standard library and therefore requires the WASI SDK:
#include <stdio.h>
int main(void)
{
puts("Hello");
return 0;
}
Compiling the code directly to wasm with clang works without problems:
clang --target=wasm32-unknown-wasi -s -o example.wasm example.c
My understanding of the LLVM tool chain is that I could achieve the same result with either
clang -> LLVM IR (.ll) -> LLVM native object files (.o) -> convert to wasm
clang -> LLVM native object files (.o) -> convert to wasm
I am able to use the second approach with a simple C program that does not call the standard library, but with the example above I get an undefined symbol error:
clang --target=wasm32-unknown-wasi -c example.c
wasm-ld example.o -o example.wasm --no-entry --export-all
wasm-ld: error: example.o: undefined symbol: puts
I do not know whether I am using the wrong clang parameters (and therefore not emitting enough information) or whether the error is in the wasm-ld command.
I would be happy if someone could give me more insight into the toolchain. Thanks!
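For comparison, a file that does link with the bare wasm-ld command above is one that references no symbol outside itself, like the sketch below (the function add_ints is hypothetical). puts, by contrast, is defined in the WASI SDK's libc. As far as I understand, the clang driver adds that library (and a startup object) to the wasm-ld command line when clang performs the final link itself; running the working one-step command with -v prints the exact wasm-ld invocation, which can be compared against the manual one.

```c
/* Self-contained: no libc calls, so example.o has no undefined symbols
   and "wasm-ld example.o -o example.wasm --no-entry --export-all" can
   resolve everything without a sysroot. */
__attribute__((visibility("default")))
int add_ints(int a, int b) {
    return a + b;
}
```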
I am facing a weird problem when compiling a Python extension featuring OpenMP with Clang.
Minimal Example
I managed to boil down my actual problem to the following code:
The Python extension could not be much simpler, while still featuring OpenMP.
Apart from the function bar, this is mostly standard boilerplate:
#include <Python.h>
static PyObject * bar(PyObject *self, PyObject *Py_UNUSED(ignored))
{
#pragma omp parallel sections
{
#pragma omp section
{float x=42.0; x+=1;}
}
Py_RETURN_NONE;
}
static PyMethodDef foo_methods[] = {
{"bar", (PyCFunction) bar, METH_NOARGS, NULL},
{NULL, NULL, 0, NULL}
};
static struct PyModuleDef moduledef = {
PyModuleDef_HEAD_INIT, "foo", NULL, -1,
foo_methods, NULL, NULL, NULL, NULL
};
PyMODINIT_FUNC PyInit_foo(void)
{
return PyModule_Create(&moduledef);
}
With the above being foo.c, I compile and load this with:
clang -fPIC -fopenmp -I/usr/include/python3.7m -c foo.c -o foo.o
clang -shared foo.o -o foo.so -lgomp
python3 -c "import foo"
The last line, i.e., the import of the module, throws the following error:
ImportError: /home/wrzlprmft/…/foo.so: undefined symbol: __kmpc_for_static_fini
What I found out so far
This does not happen when replacing Clang with GCC.
This does not happen with regular shared libraries (not involving Python).
Using Setuptools to compile the extension does not help.
(In fact, my compile commands are a reduction of what Setuptools does to find out whether it uses any non-essential compiler extensions that cause this.)
All of this happens on Ubuntu 19.10 with Python 3.7, Clang 9.0.0-2, and GCC 9.2.1.
I can also replicate the problem on current Arch Linux with Python 3.8 and Clang 9.0.1.
This worked until a year ago, probably longer.
Using Python 3.6 does not help.
Using Clang 3.8, 4.0, 6.0, 7, or 8 does not help.
Here, somebody reports a similar problem when trying to compile TensorFlow.
It is still unsolved.
Question
What is going wrong here and how can I fix this?
Right now I do not even know whether this is my error or a problem in Clang, OpenMP, or Python.
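A way to narrow this down is to build just the OpenMP part as an ordinary shared library and look at which runtime symbols it imports. As far as I know, clang lowers OpenMP constructs to calls into LLVM's own runtime (libomp, whose entry points are named `__kmpc_*`), whereas `-lgomp` links GCC's runtime, which does not define those symbols; that would match the error above, but treat it as a hypothesis to verify. The function below is a hypothetical stand-in for bar, and the file name omp_check.c is a placeholder:

```c
/* Counts the threads that execute the parallel region.
   Compiled without -fopenmp the pragma is ignored and this returns 1;
   with -fopenmp, clang emits calls into the OpenMP runtime here. */
int count_threads(void) {
    int n = 0;
    #pragma omp parallel reduction(+:n)
    {
        n += 1;
    }
    return n;
}
```

Build it with `clang -fPIC -fopenmp -shared omp_check.c -o libomp_check.so` and list its undefined dynamic symbols with `nm -D --undefined-only libomp_check.so`. If `__kmpc_*` names appear, the extension must be linked against a runtime that provides them (with clang, typically by also passing `-fopenmp` at link time instead of `-lgomp`).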
I am trying to compile two *.c files to LLVM bitcode via clang, link them together using llvm-link, and make a single *.wasm file out of it. I built LLVM on my machine via the Makefile provided by https://github.com/yurydelendik/wasmception
This works fine until I use memcpy in the C code. Then llvm-link fails with the following error:
Intrinsic has incorrect argument type!
void (i8*, i8*, i32, i1)* @llvm.memcpy.p0i8.p0i8.i32
The following is a minimal example to reproduce the issue:
one.c
#define EXPORT __attribute__((visibility("default")))
#include <string.h>
char* some_str();
EXPORT void do_something() {
char* cpy_src = some_str();
char other_str[15];
memcpy(other_str, cpy_src, strlen(cpy_src));
}
two.c
char* some_str() {
return "Hello World";
}
Execute the following commands:
$ clang --target=wasm32-unknown-unknown-wasm --sysroot=../wasmception/sysroot -S -emit-llvm -nostartfiles -fvisibility=hidden one.c -o one.bc
[...]
$ clang --target=wasm32-unknown-unknown-wasm --sysroot=../wasmception/sysroot -S -emit-llvm -nostartfiles -fvisibility=hidden two.c -o two.bc
[...]
Note that no optimization is done because that would eliminate the unnecessary memcpy call here. As I said, this is a minimal example out of context to show the error.
$ llvm-link one.bc two.bc -o res.bc -v
Loading 'one.bc'
Linking in 'one.bc'
Loading 'two.bc'
Linking in 'two.bc'
Intrinsic has incorrect argument type!
void (i8*, i8*, i32, i1)* @llvm.memcpy.p0i8.p0i8.i32
llvm-link: error: linked module is broken!
When I comment out the memcpy call in the example file, the error is gone. Of course this is not an option in the real project I am working at.
Am I doing something wrong? Is it a bad idea in general to use memcpy in a WebAssembly context? Can this be a bug in LLVM/Clang?
Reading through these GitHub issues, it seems that the memcpy intrinsic is not currently supported by the WebAssembly backend:
https://github.com/WebAssembly/design/issues/236
https://github.com/WebAssembly/design/issues/1003
As a workaround, you could instruct clang to disable intrinsic expansion using -fno-builtin, so that the generated code will call the actual memcpy function.
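When applying this workaround to the real project, note that one.c copies strlen(cpy_src) bytes, which omits the terminating '\0' and would overflow other_str for strings of 15 or more characters. If a NUL-terminated copy is what is wanted (an assumption about the real code), a bounded variant could look like this; the helper name copy_str is hypothetical:

```c
#include <string.h>

/* Copies src into dst, whose capacity is dst_size bytes, truncating if
   necessary and always NUL-terminating. Returns the number of characters
   copied, excluding the terminator. */
size_t copy_str(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0)
        return 0;
    size_t len = strlen(src);
    if (len >= dst_size)
        len = dst_size - 1;  /* leave room for '\0' */
    memcpy(dst, src, len);   /* a real library call under -fno-builtin */
    dst[len] = '\0';
    return len;
}
```

In do_something this would become `copy_str(other_str, sizeof other_str, cpy_src);`.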
I'm trying to get line information for an instruction.
I have
const CallInst* callInst = dyn_cast<const CallInst>(&*I);
MDNode *N = callInst->getMetadata("dbg");
N is evidently NULL, even though I compiled the input IR with "clang -g -S -emit-llvm".
Does anyone know why this might be the case?
Probably your instruction doesn't correspond to any statement of the source program and thus has no debug metadata.
For example, it may have been generated by an optimization, since passing -emit-llvm not only emits LLVM IR but also applies a bundle of optimizations to your program first.
To exclude the influence of optimizations and see the pure code produced by the front-end, run clang -g -S -emit-llvm -mllvm -disable-llvm-optzns and make sure your instruction has the required metadata.
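A minimal input for checking this is a function that makes a plain call in the source. Compiled with clang -g -S -emit-llvm, the resulting call instruction should end with a `!dbg !N` attachment unless an optimization synthesized or replaced it. The functions square and call_square are hypothetical:

```c
/* The call to square() on the line in call_square() is a source-level
   statement, so its IR call instruction should carry !dbg metadata. */
int square(int x) {
    return x * x;
}

int call_square(int y) {
    return square(y) + 1;
}
```

Grepping the generated .ll file for `call i32 @square` shows whether the front-end attached the location; if it did, a NULL result from getMetadata("dbg") points at the instruction being visited rather than at the compile flags.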
Summary: When I set the -mcmodel=large flag when compiling with clang, my application segfaults when accessing thread-local storage. This does not happen when compiling with gcc. Is this a bug in clang or something I am doing wrong?
Details:
The following code segment crashes when compiled with clang with the -mcmodel flag set, but runs fine when compiled with gcc:
#include <stdio.h>
#include <pthread.h>
__thread int tlsTest;
int main(int argc, char **argv) {
printf("&tlsTest is %p\n", (void *)&tlsTest);
tlsTest = argc;
printf("tlsTest is %d\n", tlsTest);
return 0;
}
When I compile with: clang test.c -pthread -mcmodel=large the result is:
&tlsTest is 0x7fd24262c6fc
Segmentation fault (core dumped)
But with: gcc test.c -pthread -mcmodel=large the result is:
&tlsTest is 0x7f1cf785c6fc
tlsTest is 1
The program also works fine when compiled with: clang test.c -pthread
I read the following link about mcmodel but I'm not sure how this relates to the segfault that I've observed. Note that the problem occurs for -mcmodel=medium also, but not for -mcmodel=small.
Is this a bug with clang/llvm or is it a different interpretation of the standard or some unimplemented feature?
Also, my system is Ubuntu 12.04. My version of gcc is 4.6.3, and the version of clang/llvm that I tested is a recent snapshot of the 3.3 development branch; I also tested with clang 3.2.
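The output above suggests that computing &tlsTest works under clang -mcmodel=large and that the crash happens at the direct access that follows. A diagnostic sketch (not a fix; the function names are hypothetical) splits the two kinds of access so each can be run and disassembled separately, for example with clang -S -mcmodel=large, and compared against gcc's output:

```c
__thread int tlsTest;

/* Takes the TLS address once, then accesses the variable through an
   ordinary pointer. */
int store_via_pointer(int value) {
    int *p = &tlsTest;
    *p = value;
    return *p;
}

/* Direct TLS access, as in the original main(). */
int store_direct(int value) {
    tlsTest = value;
    return tlsTest;
}
```

If store_via_pointer succeeds while store_direct crashes in the -mcmodel=large build, the problem is narrowed to the code sequence clang emits for direct TLS accesses in that code model.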