I am modifying the OpenMP runtime source code, and I want to make sure that my changes are actually being used.
For example, the following code:
#include "omp.h"
int main()
{
int i=0;
#pragma omp parallel for schedule(dynamic,4)
for(i=0;i<1000;++i)
{
int x = 4+i;
}
}
should call __kmpc_dispatch_init_4(), and I have verified this by using the -emit-llvm option in clang.
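For example, something along these lines shows the call in the generated IR (the exact output file name does not matter):
llvm_build/bin/clang -fopenmp -S -emit-llvm sc.c -o sc.ll
grep __kmpc_dispatch_init sc.ll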
To double-check, I added the following print statement to the runtime:
void __kmpc_dispatch_init_4(ident_t *loc, kmp_int32 gtid,
                            enum sched_type schedule, kmp_int32 lb,
                            kmp_int32 ub, kmp_int32 st, kmp_int32 chunk) {
  KMP_DEBUG_ASSERT(__kmp_init_serial);
  printf("%s\n", "Hello OpenMP");
#if OMPT_SUPPORT && OMPT_OPTIONAL
  OMPT_STORE_RETURN_ADDRESS(gtid);
#endif
  __kmp_dispatch_init<kmp_int32>(loc, gtid, schedule, lb, ub, st, chunk, true);
}
But when I compile and run the code, I do not see this output in the terminal.
I am compiling the code like this:
llvm_build/bin/clang sc.c -L/usr/local/lib -fopenmp
After building the OpenMP source code, the output of sudo make install is:
Install the project...
-- Install configuration: "Release"
-- Up-to-date: /usr/local/lib/libomp.so
-- Up-to-date: /usr/local/include/omp.h
-- Up-to-date: /usr/local/include/omp-tools.h
-- Up-to-date: /usr/local/include/ompt.h
-- Up-to-date: /usr/local/lib/libomptarget.so
-- Up-to-date: /usr/local/lib/libomptarget.rtl.x86_64.so
To cross-check that this library is actually being used, I ran:
ldd a.out
And the output is:
linux-vdso.so.1 (0x00007fff25bb6000)
libomp.so => /usr/local/lib/libomp.so (0x00007f75a52c6000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f75a50a7000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f75a4cb6000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f75a4ab2000)
/lib64/ld-linux-x86-64.so.2 (0x00007f75a558a000)
So I am assuming it should be using the code that I have modified, but I am not able to see the output of the print statement.
I have also tried fprintf(stderr, ....), but that doesn't work either.
Thank you in advance.
First, you should flush the output using fflush(stdout) after the printf to force the line to be printed to the console.
Moreover, you should ensure that /usr/local/lib/libomp.so is the right one by checking the modification date of the file (with the stat command).
If the loaded file is not the right one, you can force it with the LD_LIBRARY_PATH, LIBRARY_PATH and LD_PRELOAD environment variables.
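For example (paths taken from the question above):
stat /usr/local/lib/libomp.so                   # check the modification date of the installed runtime
LD_PRELOAD=/usr/local/lib/libomp.so ./a.out     # force this exact runtime to be loaded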
If this is not sufficient, you can use the nm or objdump tools to check the dynamic symbols: the function should be undefined in your program and provided by the modified LLVM OpenMP runtime.
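For example, something like this (the symbol name comes from the question):
nm -D a.out | grep __kmpc_dispatch_init_4                      # should appear as U (undefined) in your program
nm -D /usr/local/lib/libomp.so | grep __kmpc_dispatch_init_4   # should be defined in the runtime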
PS: If none of this works, I strongly advise you to build your program and the OpenMP runtime library with debugging (and tracing) information, so that you can use a debugger such as gdb to track the function calls.
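For example, a minimal gdb session might look like this (assuming the program and the modified runtime were built with debug info):
gdb ./a.out
(gdb) break __kmpc_dispatch_init_4
(gdb) run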
Windows 10, Ryzen 3700x, gcc 8.1.0 (Posix, SEH-enabled)
I am building clang, llvm, and compiler-rt (the PGO tools) from source. I have downloaded the clang+llvm source for 14.0.0, and built it successfully with the following:
cmake -G "MinGW Makefiles" -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DLLVM_TARGETS_TO_BUILD=X86 ../llvm
After this, I can invoke clang and build projects, ranging from a simple "Hello, World" to a much more complex one. I am able to make use of -flto with the addition of -fuse-ld=lld.
However, if I attempt to do ANY sort of PGO build, it fails. Here is a minimal example that demonstrates the problem.
Andrew@Ryzen3700x MINGW64 ~/Desktop
$ cat test.c
#include <stdio.h>

int main() {
    printf("Hello, World!");
    return 0;
}
Andrew@Ryzen3700x MINGW64 ~/Desktop
$ clang -fprofile-instr-generate test.c
Andrew@Ryzen3700x MINGW64 ~/Desktop
$ ./a.exe
Hello, World!
Andrew@Ryzen3700x MINGW64 ~/Desktop
$ llvm-profdata merge -output=test.profdata default.profraw
warning: default.profraw: malformed instrumentation profile data
error: no profile can be merged
Andrew@Ryzen3700x MINGW64 ~/Desktop
$ llvm-profdata show default.profraw
error: default.profraw: malformed instrumentation profile data: function name is empty
I am aware of many answers to this question, and none of them seem to apply. My .profraw file is not empty.
I will note that when I installed LLVM/Clang directly (rather than building it on my own), the PGO portions DID work. However, after many hours I could not resolve the linking issues regarding -flto.
I have built libstdc++ with no modifications yet:
cd gccsrcdir/libstdc++-v3/build
../configure --prefix=$PWD/../install
make && make install
I am using Ubuntu 21.10 and I set the following environment variables:
export LIBRARY_PATH=gccsrcdir/libstdc++-v3/install/lib
export LD_LIBRARY_PATH=gccsrcdir/libstdc++-v3/install/lib
export CPLUS_INCLUDE_PATH=gccsrcdir/libstdc++-v3/install/include/c++/13.0.0
When I then use the system's GCC, I get no problems. When I use the system's Clang, it produces a symbol lookup error, even with no arguments:
clang++
clang++: symbol lookup error: /lib/x86_64-linux-gnu/libicuuc.so.67: undefined symbol: _ZSt15__once_callable, version GLIBCXX_3.4.11
In fact I only need to update LD_LIBRARY_PATH to arrive here. What am I doing wrong?
The symbol std::__once_callable is defined in your system libstdc++.so.6 (it has version GLIBCXX_3.4.11 in my build, which means it was added in GCC 4.4.0).
Your build of libstdc++.so.6 should define this symbol as well, but for some reason it does not. That is a problem: any binary which uses this symbol will fail at runtime when using your build of libstdc++.so.6 (which is what happens here, because you have pointed LD_LIBRARY_PATH at it).
Note: in your case it is the clang++ binary that fails to run; any flags you add to it (such as -femulated-tls) are irrelevant, since they only affect the binary that would have been generated if clang++ itself did not fail.
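One way to confirm which copy defines the symbol (nm comes with binutils; the paths follow the question and may differ on your system):
nm -D /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep __once_callable              # present in the system copy
nm -D gccsrcdir/libstdc++-v3/install/lib/libstdc++.so.6 | grep __once_callable     # likely missing from your build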
I just repeated your configure && make steps, and the library built this way also doesn't define this symbol.
I then repeated the configure && make, but starting from top-level GCC directory, and libstdc++.so.6 built that way does define the symbol.
Conclusion: libstdc++ is configured differently during "normal" GCC build.
The definition comes from mutex.o, which is built from ./libstdc++-v3/src/c++11/mutex.cc, and which has this chunk of code:
#ifdef _GLIBCXX_HAS_GTHREADS
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
#ifdef _GLIBCXX_HAVE_TLS
__thread void* __once_callable;
__thread void (*__once_call)();
...
So it sounds like either _GLIBCXX_HAS_GTHREADS or _GLIBCXX_HAVE_TLS is not defined when doing configure && make in the libstdc++-v3 directory directly.
Digging further, I see that libstdc++-v3 determines _GLIBCXX_HAS_GTHREADS by trying to compile a file that does #include "gthr.h", and that header is available in the GCC source tree (libgcc/gthr.h) but not in a "standard" installed GCC.
../libstdc++-v3/configure && grep _GLIBCXX_HAS_GTHREADS config.h
/* #undef _GLIBCXX_HAS_GTHREADS */
TL;DR: correctly configuring libstdc++.so is complicated, and you will be better off building the complete GCC.
Once you have a complete build, you will have a properly configured libstdc++-v3 directory, and can just rebuild in that directory:
grep _GLIBCXX_HAS_GTHREADS ./x86_64-pc-linux-gnu/libstdc++-v3/config.h
#define _GLIBCXX_HAS_GTHREADS 1
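For completeness, a rough sketch of the full-GCC route (the prefix and configure flags below are only an example; adjust to taste):
cd gccsrcdir
./contrib/download_prerequisites
mkdir build && cd build
../configure --prefix=$HOME/gcc-install --disable-multilib
make -j$(nproc) && make install
# the properly configured libstdc++-v3 build directory is then under build/x86_64-pc-linux-gnu/libstdc++-v3/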
Setup
I have a simple helloworld program:
// content of main.c
#include <stdio.h>
#include <limits.h>

int main() {
    for (int i = 0; i < INT_MAX; ++i) {
        printf("simply helloworld!\n");
    }
    return 0;
}
I compile a baseline version with clang 13.0.0 using clang -flto=thin -fvisibility=hidden -fuse-ld=lld main.c
To experiment with CFI, I compile another version using clang -flto=thin -fsanitize=cfi -fsanitize-cfi-cross-dso -fno-sanitize-cfi-canonical-jump-tables -fsanitize-trap=cfi -fvisibility=hidden -fuse-ld=lld main.c
Expectation
I am expecting negligible performance overhead, as I am only calling into a shared library that I expect will run the same code for both. The disassembly of the main function looks the same for both binaries.
Reality
The baseline version completes execution in ~27s, while the CFI version completes execution in ~32s. Using perf stat -e instructions <binary>, I can see that the CFI version runs ~100,000,000,000 more instructions. With perf record and then perf diff, I can see that the difference comes primarily from two functions, _pthread_cleanup_push_defer and _pthread_cleanup_pop_restore, that the CFI version runs. Using gdb, I can see that these functions are called as the call stack of printf gets deeper.
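For reference, the numbers above were collected roughly like this (the binary names below are just placeholders for the baseline and CFI builds):
perf stat -e instructions ./a.out.baseline
perf stat -e instructions ./a.out.cfi
perf record -o baseline.data ./a.out.baseline
perf record -o cfi.data ./a.out.cfi
perf diff baseline.data cfi.data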
Question
How do I begin to explain the performance difference between these two binaries? What makes a simple call to printf call two different versions of itself for two different binaries?
I am trying to compile two *.c files to LLVM bitcode via clang, link them together using llvm-link, and make a single *.wasm file out of it. I built LLVM on my machine via the Makefile provided by https://github.com/yurydelendik/wasmception
This works fine until I use memcpy in the C code. Then llvm-link stops with an error:
Intrinsic has incorrect argument type!
void (i8*, i8*, i32, i1)* @llvm.memcpy.p0i8.p0i8.i32
The following is a minimal example to reproduce the issue:
one.c
#define EXPORT __attribute__((visibility("default")))

#include <string.h>

char* some_str();

EXPORT void do_something() {
    char* cpy_src = some_str();
    char other_str[15];
    memcpy(other_str, cpy_src, strlen(cpy_src));
}
two.c
char* some_str() {
    return "Hello World";
}
Execute the following commands:
$ clang --target=wasm32-unknown-unknown-wasm --sysroot=../wasmception/sysroot -S -emit-llvm -nostartfiles -fvisibility=hidden one.c -o one.bc
[...]
$ clang --target=wasm32-unknown-unknown-wasm --sysroot=../wasmception/sysroot -S -emit-llvm -nostartfiles -fvisibility=hidden two.c -o two.bc
[...]
Note that no optimization is done because that would eliminate the unnecessary memcpy call here. As I said, this is a minimal example out of context to show the error.
$ llvm-link one.bc two.bc -o res.bc -v
Loading 'one.bc'
Linking in 'one.bc'
Loading 'two.bc'
Linking in 'two.bc'
Intrinsic has incorrect argument type!
void (i8*, i8*, i32, i1)* @llvm.memcpy.p0i8.p0i8.i32
llvm-link: error: linked module is broken!
When I comment out the memcpy call in the example file, the error is gone. Of course, this is not an option in the real project I am working on.
Am I doing something wrong? Is it a bad idea in general to use memcpy in a WebAssembly context? Can this be a bug in LLVM/Clang?
Reading through these GitHub issues, it seems the memcpy intrinsic is not currently supported by the WASM backend:
https://github.com/WebAssembly/design/issues/236
https://github.com/WebAssembly/design/issues/1003
As a workaround, you could instruct clang to disable intrinsic expansion using -fno-builtin, so that the generated code will call the actual memcpy function.
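For example, something like this (the same flags as in the question, with -fno-builtin added):
clang --target=wasm32-unknown-unknown-wasm --sysroot=../wasmception/sysroot -fno-builtin -S -emit-llvm -nostartfiles -fvisibility=hidden one.c -o one.bc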
I need to have Armadillo (current version is 5.100.1) available as a local library within $HOME (it is a cluster application, I can't install on every compute node, but $HOME is a shared folder). I'm using cmake to manage the application, and have been able to get cmake to link against local libraries in $HOME (e.g., boost) rather than elsewhere just fine. Armadillo needs BLAS and LAPACK, although it can use (and in fact advises using) OpenBLAS for both. However, I don't understand how to force Armadillo to use OpenBLAS, even when its own CMake-based ./configure confirms that it has found OpenBLAS. Here is the output from running ./configure in a pristine Armadillo folder:
$ ./configure
-- Configuring Armadillo 5.100.1
-- CMAKE_SYSTEM_NAME = Linux
-- CMAKE_CXX_COMPILER_ID = GNU
-- CMAKE_CXX_COMPILER_VERSION = 4.9.1
-- CMAKE_COMPILER_IS_GNUCXX = 1
-- Found MKL libraries: /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_rt.so
-- Found OpenBLAS: /home/rolf/lib/libopenblas.so
-- Found BLAS: /usr/lib64/libblas.so
-- Found LAPACK: /usr/lib64/liblapack.so
-- MKL_FOUND = YES
-- ACMLMP_FOUND = NO
-- ACML_FOUND = NO
-- OpenBLAS_FOUND = YES
-- ATLAS_FOUND = NO
-- BLAS_FOUND = YES
-- LAPACK_FOUND = YES
--
-- *** If the MKL or ACML libraries are installed in non-standard locations such as
-- *** /opt/intel/mkl, /opt/intel/composerxe/, /usr/local/intel/mkl
-- *** make sure the run-time linker can find them.
-- *** On Linux systems this can be done by editing /etc/ld.so.conf
-- *** or modifying the LD_LIBRARY_PATH environment variable.
--
-- *** On systems with SELinux enabled (eg. Fedora, RHEL),
-- *** you may need to change the SELinux type of all MKL/ACML libraries
-- *** to fix permission problems that may occur during run-time.
-- *** See README.txt for more information
--
-- Found ARPACK: /usr/lib64/libarpack.so
-- ARPACK_FOUND = YES
-- Could not find SuperLU
-- SuperLU_FOUND = NO
--
-- *** Armadillo wrapper library will use the following libraries:
-- *** ARMA_LIBS = /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_rt.so;/usr/lib64/libarpack.so
--
-- Detected gcc 4.8.3 or later. Added '-std=c++11' to compiler flags
-- Copying /home/rolf/work/pdefect/armadillo/include/ to /home/rolf/work/pdefect/armadillo/tmp/include/
-- Generating /home/rolf/work/pdefect/armadillo/tmp/include/config.hpp
-- Generating /home/rolf/work/pdefect/armadillo/examples/Makefile
-- CMAKE_CXX_FLAGS = -std=c++11 -O2
-- CMAKE_SHARED_LINKER_FLAGS = -Wl,--no-as-needed
-- CMAKE_REQUIRED_INCLUDES =
-- *** CMAKE_INSTALL_PREFIX was initalised by cmake to the default value of /usr/local
-- *** CMAKE_INSTALL_PREFIX changed to /usr
-- *** Detected 64 bit system
-- *** /usr/lib64/ exists, so destination directory for the run-time library changed to /usr/lib64/
-- *** Your system and/or compiler must search /usr/lib64/ during linking
-- CMAKE_INSTALL_PREFIX = /usr
-- INSTALL_LIB_DIR = /usr/lib64
-- INSTALL_INCLUDE_DIR = /usr/include
-- INSTALL_DATA_DIR = /usr/share
-- INSTALL_BIN_DIR = /usr/bin
-- Generating '/home/rolf/work/pdefect/armadillo/ArmadilloConfig.cmake'
-- Generating '/home/rolf/work/pdefect/armadillo/ArmadilloConfigVersion.cmake'
-- Generating '/home/rolf/work/pdefect/armadillo/InstallFiles/ArmadilloConfig.cmake'
-- Generating '/home/rolf/work/pdefect/armadillo/InstallFiles/ArmadilloConfigVersion.cmake'
-- Configuring done
-- Generating done
-- Build files have been written to: /home/rolf/work/pdefect/armadillo
So it succeeds in finding OpenBLAS in $HOME, but if I query the library's shared-library dependencies after running
$ cmake .
$ make
I see that it has linked to the login node's standard copies of BLAS and LAPACK, but made no use of OpenBLAS:
$ ldd libarmadillo.so.5.100.1
linux-vdso.so.1 => (0x00007fff05b9b000)
libmkl_rt.so => /opt/intel/composer_xe_2013_sp1.2.144/mkl/lib/intel64/libmkl_rt.so (0x00007f14d2558000)
libarpack.so.2 => /home/rolf/lib/libarpack.so.2 (0x00007f14d230a000)
libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00007f14d1fe9000)
libm.so.6 => /lib64/libm.so.6 (0x00007f14d1d64000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f14d1b4e000)
libc.so.6 => /lib64/libc.so.6 (0x00007f14d17ba000)
/lib64/ld-linux-x86-64.so.2 (0x0000003d83400000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f14d15b5000)
libblas.so.3 => /usr/lib64/libblas.so.3 (0x00007f14d135e000)
liblapack.so.3 => /usr/lib64/atlas/liblapack.so.3 (0x00007f14d0b3d000)
libgfortran.so.3 => /usr/lib64/libgfortran.so.3 (0x00007f14d084a000)
libf77blas.so.3 => /usr/lib64/atlas/libf77blas.so.3 (0x00007f14d062d000)
libcblas.so.3 => /usr/lib64/atlas/libcblas.so.3 (0x00007f14d040c000)
libatlas.so.3 => /usr/lib64/atlas/libatlas.so.3 (0x00007f14cfcfe000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f14cfae1000)
Unfortunately, libblas.so.3 and liblapack.so.3 are not available on the nodes:
$ ssh node01 'ldd /home/rolf/work/pdefect/armadillo/libarmadillo.so.5.100.1 | grep "not found" '
libblas.so.3 => not found
liblapack.so.3 => not found
How do I force Armadillo to compile and link against my local copy of OpenBLAS, and not the standard copies of BLAS and LAPACK in /usr/lib64? There is a note in the FAQ that states:
* For Linux-based systems the automatic installer can figure out that
OpenBLAS, MKL, ACML or ATLAS are installed, and will use them instead of
the standard LAPACK and BLAS libraries. See README.txt within the Armadillo
archive for more information.
but from the above results, this does not seem to be the case. Can anyone point me to what I'm doing wrong here?
You can tell Armadillo to directly use whatever BLAS and LAPACK you want, as well as their location. You need to define ARMA_DONT_USE_WRAPPER before including the armadillo header, and then link with whatever BLAS and LAPACK you have.
For example:
g++ code.cpp -o code -O3 -DARMA_DONT_USE_WRAPPER -L/home/abc/libs -lmyblas -lmylapack
Substitute /home/abc/libs with the directory which has your libraries. Change -lmyblas -lmylapack to whatever library/libraries implement the BLAS and LAPACK functions (for example: -lopenblas).
Bear in mind that the run-time linker will also need to find your libraries. You may need to set the LD_LIBRARY_PATH environment variable. For example:
export LD_LIBRARY_PATH=/home/abc/libs:${LD_LIBRARY_PATH}
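Putting it together, a rough end-to-end check could look like this (the paths and -lopenblas are placeholders for your actual setup):
g++ code.cpp -o code -O3 -DARMA_DONT_USE_WRAPPER -L/home/abc/libs -lopenblas
export LD_LIBRARY_PATH=/home/abc/libs:${LD_LIBRARY_PATH}
ldd ./code | grep openblas    # libopenblas.so should now resolve to /home/abc/libs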
Alternatively, you can just link statically during compilation (see the -static switch in g++).