Using Valgrind in a large, multi-module Fortran program

I'm currently finishing up a development project in the Quantum ESPRESSO (QE) code. It's a large (>100k LOC) Fortran package with many modules linked together. I'm trying to track down a painful memory bug and have turned to Valgrind. It appears to have found the problem but, unfortunately, the output below is less than clear about where exactly the error occurs, and it gives no useful line numbers.
==10233== Invalid write of size 8
==10233== at 0x100520518: __mbdvdw_module_MOD_mbdvdw_tgg_complex (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x100555A88: __mbdvdw_module_MOD_mbdvdw_check_quantity_dh (in /Users/tmarkovich/bin/pw.x)
==10233== Address 0x103131248 is 8 bytes after a block of size 1,728 alloc'd
==10233== at 0x1011814AB: malloc (in /usr/local/Cellar/valgrind/HEAD/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==10233== by 0x1005547B0: __mbdvdw_module_MOD_mbdvdw_check_quantity_dh (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x1003D56E4: v_of_rho_ (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x1000F540B: electrons_scf_ (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x1000F6E18: electrons_ (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x10032AA28: run_pwscf_ (in /Users/tmarkovich/bin/pw.x)
==10233== by 0x1000010BB: MAIN__ (pwscf.f90:30)
==10233== by 0x100B67C1F: main (pwscf.f90:14)
Beyond this, addr2line gives thoroughly unhelpful output:
▶ gaddr2line -e pw.x 0x100520518
??:0.
Note that at least some debugging symbols exist, because we see pwscf.f90:30 and mangled names like "__mbdvdw_module_MOD_mbdvdw_tgg_complex" in the backtrace, but it would be very useful to know which part of that function is causing the issue.
I compiled QE with gfortran 4.9 using the following flags:
FFLAGS = -Og -g -pg -fopenmp -fbacktrace -fcheck=all -finit-real=nan -ffpe-trap=zero,invalid,overflow
Is there any way to compile QE so that it generates full debugging symbols, giving more readable and informative output from Valgrind? I thought all I needed was the -g flag, but it appears I need more. Is there a way to get addr2line to give more informative output? What about Valgrind?
It's also worth noting that GDB exhibits exactly the same behaviour.
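One macOS-specific detail may be relevant here: Valgrind reads DWARF line info from a separate .dSYM bundle, which is not generated automatically at link time, so plain -g is not enough by itself. A sketch of what to try, reusing the path and the address from the report above:
dsymutil /Users/tmarkovich/bin/pw.x
valgrind --dsymutil=yes ./pw.x                       # or let Valgrind run dsymutil itself
atos -o /Users/tmarkovich/bin/pw.x 0x100520518       # macOS-native symbolicator; may need -l <load address> under ASLR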
Edit:
I made sure to put -g in both CFLAGS and LDFLAGS, so that my makefile looked like:
CFLAGS = -Og -g $(DFLAGS) $(IFLAGS)
F90FLAGS = $(FFLAGS) -x f95-cpp-input -fopenmp $(FDFLAGS) $(IFLAGS) $(MODFLAGS)
FFLAGS = -Og -g -pg -fopenmp -Wall -Wextra -Warray-temporaries -Wconversion -fbacktrace -ffree-line-length-0 -finit-real=nan -ffpe-trap=zero,invalid,overflow
LD = mpif90
LDFLAGS = -g -pthread -fopenmp
LD_LIBS =
This resulted in a compilation statement of:
mpif90 -Og -g -pg -fopenmp -Wall -Wextra -Warray-temporaries -Wconversion -fbacktrace -ffree-line-length-0 -finit-real=nan -ffpe-trap=zero,invalid,zero,overflow -x f95-cpp-input -fopenmp -D__GFORTRAN -D__STD_F95 -D__FFTW -D__MPI -D__PARA -D__SCALAPACK -D__OPENMP -I../include -I../iotk/src -I../ELPA/src -I. -c mbdvdw.f90
Another run of Valgrind with all of the above changes results in:
==30486== Invalid write of size 8
==30486== at 0x1002735ED: __mbdvdw_module_MOD_mbdvdw_tgg_complex (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x100290F58: __mbdvdw_module_MOD_mbdvdw_check_quantity_dh (in /Users/tmarkovich/bin/pw.x)
==30486== Address 0x1037989f0 is 0 bytes after a block of size 1,728 alloc'd
==30486== at 0x10092B4AB: malloc (in /usr/local/Cellar/valgrind/HEAD/lib/valgrind/vgpreload_memcheck-amd64-darwin.so)
==30486== by 0x10028FEE2: __mbdvdw_module_MOD_mbdvdw_check_quantity_dh (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x1001D4567: v_of_rho_ (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x10007C0BE: electrons_scf_ (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x10007D385: electrons_ (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x10018B30B: run_pwscf_ (in /Users/tmarkovich/bin/pw.x)
==30486== by 0x100001157: MAIN__ (pwscf.f90:30)
==30486== by 0x1004EC496: main (pwscf.f90:14)
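As an aside, both reports describe a heap write just past a 1,728-byte allocation, which is exactly the kind of off-by-one that gfortran's bounds checking reports with a file and line at runtime; note that -fcheck=all appears in the original FFLAGS but not in the edited ones. A minimal, self-contained sketch (not QE code; the array size is chosen to match the report):
cat > oob.f90 <<'EOF'
program oob
  implicit none
  real(8), allocatable :: a(:)
  allocate(a(216))   ! 216 * 8 bytes = the 1,728-byte block in the report
  a(217) = 1.0d0     ! one element past the end: an invalid write of size 8
end program oob
EOF
gfortran -g -fcheck=bounds oob.f90 -o oob && ./oob
# Expect: "Fortran runtime error: Index '217' of dimension 1 of array 'a' above upper bound of 216", with the offending source line.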

Related

How to use the installed (pre-compiled) Drake as an external with Bazel?

I am working on a C++ project with Drake, using Bazel as the build system. Previously, I used the Drake source code as the external, following the drake_bazel_external example, and everything worked fine.
Since I want to use the SNOPT solver in Drake, I switched to the pre-compiled Drake, following the drake_bazel_installed example. However, I get the following errors.
Compiling kuka/diffIK_controller.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 27 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox
In file included from bazel-out/k8-opt/bin/external/drake/_virtual_includes/.drake_headers/drake/common/default_scalars.h:3,
from bazel-out/k8-opt/bin/external/drake/_virtual_includes/.drake_headers/drake/systems/framework/leaf_system.h:14,
from ./kuka/diffIK_controller.h:3,
from kuka/diffIK_controller.cc:3:
bazel-out/k8-opt/bin/external/drake/_virtual_includes/.drake_headers/drake/common/autodiff.h:12:10: fatal error: Eigen/Core: No such file or directory
12 | #include <Eigen/Core>
| ^~~~~~~~~~~~
compilation terminated.
I also find that the apps in the drake_bazel_external cannot be compiled successfully by drake_bazel_installed setting. The error message is
ERROR: error loading package 'app': Label '@drake//tools/skylark:py.bzl' is invalid because 'tools/skylark' is not a package; perhaps you meant to put the colon here: '@drake//:tools/skylark/py.bzl'?
Update
The bug can be reproduced with both the http_archive-fetched Drake and the apt-installed Drake (the latest stable release, I think, since I installed it yesterday). I have isolated the relevant code to reproduce the bug in a GitHub repo. Currently, I can get the original apps in drake_bazel_installed to work.
Update
By adding
# solve the eigen not found bug
build --cxxopt=-I/usr/include/eigen3
to the .bazelrc file, I can solve the above problem. However, when I try to build a program that uses iiwa_status_receiver.h, I encounter a new problem.
ERROR: /home/chenwang/repro_drake_bazel_external/drake_bazel_installed/apps/BUILD.bazel:102:10: Compiling apps/connection_test.cc failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 32 arguments skipped)
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from apps/connection_test.cc:10:
bazel-out/k8-opt/bin/external/drake/_virtual_includes/.drake_headers/drake/manipulation/kuka_iiwa/iiwa_status_receiver.h:6:10: fatal error: drake/lcmt_iiwa_status.hpp: No such file or directory
6 | #include "drake/lcmt_iiwa_status.hpp"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
INFO: Elapsed time: 2.967s, Critical Path: 0.24s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully
This is also a missing-header-file problem. I have updated the GitHub repo to reproduce it.
This is a bug in Drake (filed as https://github.com/RobotLocomotion/drake/issues/17965 now).
To work around it, pass --cxxopt=-I/usr/include/eigen3 on all of your Bazel commands, e.g., by adding this line to your project's .bazelrc file:
build --cxxopt=-I/usr/include/eigen3
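A sketch of applying this, assuming the stock Debian/Ubuntu Eigen location; verify the path before hard-coding it:
ls /usr/include/eigen3/Eigen/Core                          # confirm where the headers actually live
echo 'build --cxxopt=-I/usr/include/eigen3' >> .bazelrc    # record the workaround once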
Edit: The nightly builds of the apt packages as of 20220923 should have this fixed as well.

How to use coverage analysis with Cython

I'm trying to run coverage analysis on some Cython code using pytest-cov and coveralls.io. I've got as far as building the extension modules with tracing enabled, and running the analysis with the help of the links below:
http://docs.cython.org/src/tutorial/profiling_tutorial.html
http://blog.behnel.de/posts/coverage-analysis-for-cython-modules.html
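Concretely, the setup those links describe amounts to compiling with the linetrace directive and the tracing macro; a sketch of what I am assuming here (directive and macro names per the Cython docs, package name illustrative):
# put "# cython: linetrace=True" at the top of each .pyx file, then build
# with the macro defined so the generated C actually records line hits:
CFLAGS="-DCYTHON_TRACE=1" python setup.py build_ext --inplace
py.test --cov=mypackage    # run the suite under pytest-cov as usual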
However, I'm getting some results that I can't explain. Many of the def/cdef/cpdef lines in the code show as not run, despite the code within them executing fine. The results aren't even consistent, as some of these lines seem OK.
Example report: https://coveralls.io/files/1871744040
I don't know if I'm calling something wrong, if this is a bug, or if I'm just not interpreting the results correctly.
In the example above, the get_cost method seems OK, but the __set__ method for the property above it shows as never called, despite the lines within that function having been executed.
Update: It seems the issue is with Cython extension classes. If the class is defined as a plain Python class rather than with cdef, the problem goes away. I guess there isn't full support for this yet.
If the Cython tracing facility does not work as intended, it should be possible to use gcov for coverage analysis of Cython code. This way one can verify whether a given line of the generated C code is executed.
With a simple main.pyx
import mymod
def main():
    mymod.test()
and mymod.pyx
def test():
    return 42
and then
cython --embed main.pyx
cython mymod.pyx
gcc -O1 -fPIC -fprofile-arcs -ftest-coverage -Wall -I/usr/include/python2.7 -c -o main.o main.c
gcc main.o -fprofile-arcs -lpython2.7 -lgcov -o main
gcc -O1 -fPIC -fprofile-arcs -ftest-coverage -Wall -I/usr/include/python2.7 -c -o mymod.o mymod.c
gcc -shared mymod.o -fprofile-arcs -lgcov -lpython2.7 -o mymod.so
an executable is created. After running ./main, the files main.gcda and mymod.gcda are written for gcov.
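From there, gcov produces per-line execution counts; note it reports on the generated C, not on the .pyx sources. A sketch:
gcov main.c mymod.c               # writes main.c.gcov and mymod.c.gcov
grep -n '#####' mymod.c.gcov      # '#####' marks generated lines that never ran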

690 MB memory overhead for OpenMP program compiled with ifort

I was running some tests using OpenMP and Fortran and noticed that a binary compiled with ifort 15 (15.0.0 20140723) has 690 MB of virtual-memory overhead.
My sample program is:
program sharedmemtest
  use omp_lib
  implicit none
  integer :: nroot1
  integer, parameter :: dp = selected_real_kind(14,200)
  real(dp), allocatable :: matrix_elementsy(:,:,:,:)
  !$OMP PARALLEL NUM_THREADS(10) SHARED(matrix_elementsy)
  nroot1 = 2
  if (OMP_GET_THREAD_NUM() == 0) then
    allocate(matrix_elementsy(nroot1,nroot1,nroot1,nroot1))
    print *, "after allocation"
    read(*,*)
  end if
  !$OMP BARRIER
  !$OMP END PARALLEL
end program
running
ifort -openmp test_openmp_minimal.f90 && ./a.out
shows a memory usage of
50694 user 20 0 694m 8516 1340 S 0.0 0.0 0:03.58 a.out
in top. Running
gfortran -fopenmp test_openmp_minimal.f90 && ./a.out
shows a memory usage of
50802 user 20 0 36616 956 740 S 0.0 0.0 0:00.98 a.out
Where does the 690 MB of overhead come from when compiling with ifort? Am I doing something wrong, or is this a bug in ifort?
For completeness: this is a minimal example taken from a much larger program. I am using gfortran 4.4 (4.4.7 20120313) for the comparison run.
I appreciate all comments and ideas.
I don't believe top is reliable here; I see no evidence that the binary built from your test allocates anywhere near that much memory.
Below I show the result of generating the binary normally, with the Intel libraries linked statically, and with everything linked statically. The fully static binary is in the ballpark of 2-3 megabytes.
It is possible that the OpenMP thread stacks, which I believe are allocated from the heap, are the source of the additional virtual memory here. Can you try this test with OMP_STACKSIZE=4K? I think the default is a few megabytes.
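A sketch of that experiment (Linux; the 64M value is just for contrast):
OMP_STACKSIZE=4K ./a.out &          # program blocks in read(*,*) after allocating
sleep 1; ps -o vsz= -p $!; kill $!  # print virtual size in KB, then clean up
OMP_STACKSIZE=64M ./a.out &
sleep 1; ps -o vsz= -p $!; kill $!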
Dynamic Executable
jhammond@cori11:/tmp> ifort -O3 -qopenmp smt.f90 -o smt
jhammond@cori11:/tmp> size smt
text data bss dec hex filename
748065 13984 296024 1058073 102519 smt
jhammond@cori11:/tmp> ldd smt
linux-vdso.so.1 => (0x00002aaaaaaab000)
libm.so.6 => /lib64/libm.so.6 (0x00002aaaaab0c000)
libiomp5.so => /opt/intel/parallel_studio_xe_2016.0.047/compilers_and_libraries_2016.0.109/linux/compiler/lib/intel64/libiomp5.so (0x00002aaaaad86000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002aaaab0c7000)
libc.so.6 => /lib64/libc.so.6 (0x00002aaaab2e4000)
libgcc_s.so.1 => /opt/gcc/5.1.0/snos/lib64/libgcc_s.so.1 (0x00002aaaab661000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002aaaab878000)
/lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
Dynamic Executable with Static Intel
jhammond@cori11:/tmp> ifort -O3 -qopenmp smt.f90 -static-intel -o smt
jhammond@cori11:/tmp> size smt
text data bss dec hex filename
1608953 41420 457016 2107389 2027fd smt
jhammond@cori11:/tmp> ls -l smt
-rwxr-x--- 1 jhammond jhammond 1872489 Jan 12 05:51 smt
Static Executable
jhammond@cori11:/tmp> ifort -O3 -qopenmp smt.f90 -static -o smt
jhammond@cori11:/tmp> size smt
text data bss dec hex filename
2262019 43120 487320 2792459 2a9c0b smt
jhammond@cori11:/tmp> ldd smt
not a dynamic executable
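To attribute the virtual-memory figure to specific mappings rather than trusting top's summary line, the process map helps; a sketch (Linux):
./a.out &                           # blocks in read(*,*) after allocating
sleep 1; pmap -x $! | tail -n 20    # large anonymous regions here are typically thread stacks and heap arenas
kill $!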

How is -fPIE passed along to llc?

I'm working on an LLVM backend for a new architecture, and we need position-independent executables. I can pass -fPIE on the clang command line, but I don't see any indication of it in the resulting LLVM IR. For example, if I run:
clang -v -emit-llvm -fPIC -O0 -S global_dat.c -o global_dat_x86_pic.ll
And then take a look at the resulting global_dat_x86_pic.ll file, I see the following near the bottom:
!0 = !{i32 1, !"PIC Level", i32 2}
Ok, makes sense.
However if I run:
clang -v -emit-llvm -fPIE -O0 -S global_dat.c -o global_dat_x86_pie.ll
I see that the two .ll files are identical. Near the bottom of global_dat_x86_pie.ll I see:
!0 = !{i32 1, !"PIC Level", i32 2}
which is identical to the case where I ran with -fPIC. There's no indication of "PIE Level" in the .ll file. If this .ll file were passed on to llc, how would llc know that -fPIE had been set on the clang command line?
I have run it in gdb and can see that in the second case, with -fPIE on the clang command line, Opts.PIELevel (in $LLVM_HOME/tools/clang/lib/Frontend/CompilerInvocation.cpp) is set to 2. (In fact, both Opts.PIELevel and Opts.PICLevel are set to 2 in that case, whereas when -fPIC is passed to clang, only Opts.PICLevel is set to 2.)
This depends on your default target triple, which I can't tell from your question. You can see what happens if you leave off the default architecture (or run a native clang on an architecture that does support PIE).
For example, bare x86-64 shows this:
$ clang -c hello.c -target x86_64 -fPIE -emit-llvm -###
If you run that command, you'll find "-pie-level" "2" in the output, which is how llc (well, its internal equivalent) knows about it.
The key here is that you'll have to arrange for your backend to do something with this flag. Certain platforms (like Darwin) simply ignore it, so if you happen to be experimenting on an OS X host, you won't see -pie-level in the output.
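A sketch of checking both channels, assuming an x86-64 Linux triple and a trivial hello.c: the cc1 flag described above, plus the "PIE Level" module flag that newer clang releases also record in the IR:
clang -c hello.c -target x86_64-linux-gnu -fPIE -### 2>&1 | grep -o '"-pie-level" "2"'
clang -S -emit-llvm -target x86_64-linux-gnu -fPIE hello.c -o - | grep 'PIE Level'
# the second command prints nothing on clang versions that predate the module flag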

Suppress Clang warnings related to third-party libraries

While compiling my code, I get multiple Clang warnings that originate in the third-party library Boost.
How do I suppress these warnings so that I only get warnings from my own code?
I'm using CMake to build my project and have tried different Clang options, namely
-isystem /usr/include/boost
and
--system-header-prefix boost/
in
set( CMAKE_CXX_FLAGS ...
But I still get the warnings related to Boost.
The affected files are always .cpp files, so CMAKE_CXX_FLAGS should take effect.
If I activate verbose makefiles with
set( CMAKE_VERBOSE_MAKEFILE 1 )
I get the following output.
cd /tmp/Sandbox && /usr/lib/clang-analyzer/scan-build/c++-analyzer -march=native -std=c++11 -DBOOST_HAS_INT128=1 -DBOOST_ASIO_HAS_STD_CHRONO -pthread -DBOOST_LOG_DYN_LINK -Wall -march=native -std=c++11 -DBOOST_HAS_INT128=1
-DBOOST_ASIO_HAS_STD_CHRONO -pthread -DBOOST_LOG_DYN_LINK -g -ftemplate-backtrace-limit=0 -fmodules-prune-interval=5 -fPIE -fcolor-diagnostics -fno-omit-frame-pointer
-fsanitize=bool,bounds,float-cast-overflow,float-divide-by-zero,function,integer-divide-by-zero,nonnull-attribute,null,object-size,return,returns-nonnull-attribute,shift,signed-integer-overflow,unreachable,unsigned-integer-overflow,vla-bound -fstandalone-debug -fsanitize=address,leak -fsanitize-coverage=1
-isystem /usr/include/boost -isystem boost --system-header-prefix boost/ -isystem /usr/include/boost/
-I/usr/include/gtkmm-3.0 -I/usr/lib/gtkmm-3.0/include -I/usr/include/atkmm-1.6 -I/usr/include/gtk-3.0/unix-print -I/usr/include/gdkmm-3.0 -I/usr/lib/gdkmm-3.0/include -I/usr/include/giomm-2.4 -I/usr/lib/giomm-2.4/include -I/usr/include/pangomm-1.4 -I/usr/lib/pangomm-1.4/include -I/usr/include/glibmm-2.4 -I/usr/lib/glibmm-2.4/include -I/usr/include/gtk-3.0 -I/usr/include/at-spi2-atk/2.0 -I/usr/include/at-spi-2.0 -I/usr/include/dbus-1.0 -I/usr/lib/dbus-1.0/include -I/usr/include/gio-unix-2.0 -I/usr/include/cairo -I/usr/include/pango-1.0 -I/usr/include/atk-1.0 -I/usr/include/cairomm-1.0 -I/usr/lib/cairomm-1.0/include -I/usr/include/pixman-1 -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/harfbuzz -I/usr/include/libdrm -I/usr/include/sigc++-2.0 -I/usr/lib/sigc++-2.0/include -I/usr/include/gdk-pixbuf-2.0 -I/usr/include/glib-2.0 -I/usr/lib/glib-2.0/include -I/usr/include/jsoncpp -isystem /usr/include/opencv
-o CMakeFiles/density.dir/density.cpp.o -c /tmp/Sandbox/density.cpp
In file included from /tmp/Sandbox/serialcom.hpp:24:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:35:
In file included from /usr/include/boost/lexical_cast/detail/converter_lexical.hpp:54:
/usr/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:169:17: warning: Returned pointer value points outside the original object (potential buffer overflow)
return finish;
^~~~~~~~~~~~~
Both the -isystem and the --system-header-prefix options are set. Since the previous options had no recognisable effect, I also tried different variants:
-isystem /usr/include/boost -isystem boost --system-header-prefix boost/ -isystem /usr/include/boost/
But the warnings are still there:
In file included from /tmp/Sandbox/serialcom.hpp:24:
In file included from /usr/include/boost/lexical_cast.hpp:32:
In file included from /usr/include/boost/lexical_cast/try_lexical_convert.hpp:35:
In file included from /usr/include/boost/lexical_cast/detail/converter_lexical.hpp:54:
/usr/include/boost/lexical_cast/detail/converter_lexical_streams.hpp:169:17: warning: Returned pointer value points outside the original object (potential buffer overflow)
return finish;
^~~~~~~~~~~~~
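For completeness, a minimal sketch of the two documented flag forms I have been trying, with the caveat that for #include <boost/...> the searched directory is /usr/include (normally already a system directory), not /usr/include/boost, and that the diagnostic above comes from scan-build's analyzer rather than from an ordinary -W warning:
clang++ -fsyntax-only -Wall -isystem /usr/include density.cpp
clang++ -fsyntax-only -Wall --system-header-prefix=boost/ density.cpp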
Thanks in advance!
