Corrupted stack backtrace on Solaris

Corrupted stack backtrace on Solaris - stack

Could someone explain why the following corrupted stack trace can occur?
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libxnet.so.1...done.
Loaded symbols for /usr/lib/libxnet.so.1
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libxml2.so.2...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /opt/csw/lib/libiconv.so.2...done.
Loaded symbols for /opt/csw/lib/libiconv.so.2
Reading symbols from /usr/lib/libcrypt_i.so.1...done.
Loaded symbols for /usr/lib/libcrypt_i.so.1
Reading symbols from /usr/lib/libpthread.so.1...
warning: Lowest section in /usr/lib/libpthread.so.1 is .dynamic at 00000074
done.
Loaded symbols for /usr/lib/libpthread.so.1
Reading symbols from /usr/lib/libm.so.2...done.
Loaded symbols for /usr/lib/libm.so.2
Reading symbols from /usr/lib/librt.so.1...done.
Loaded symbols for /usr/lib/librt.so.1
Reading symbols from /usr/lib/libc.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libz.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libgen.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libgen.so.1
Reading symbols from /usr/lib/libaio.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /usr/lib/libmd.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libmd.so.1
#0 0xfeb3487a in _malloc_unlocked () from /usr/lib/libc.so.1
(gdb) bt
#0 0xfeb3487a in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0x210b5a68 in ?? ()
#2 0xfec0e5d0 in signames () from /usr/lib/libc.so.1
#3 0xfec0d000 in _sys_cldlist () from /usr/lib/libc.so.1
#4 0x08046a28 in ?? ()
#5 0xfeb34704 in _malloc_unlocked () from /usr/lib/libc.so.1
#6 0x00002008 in ?? ()
#7 0x210b5a68 in ?? ()
#8 0x21151b70 in ?? ()
#9 0xfeeda3b0 in ?? () from /usr/lib/libxml2.so.2
#10 0x08046a3c in ?? ()
#11 0xfee03c42 in xmlBufferCreateSize () from /usr/lib/libxml2.so.2
Previous frame inner to this frame (corrupt stack?)
The core occurs from a process built on x86 machine.
If the backtrace is performed on the machine executing the process, the backtrace is perfect, with full
frame information.
However if I do the backtrace with the core on the build machine (a different machine), I the trace above.
One obvious thing I considered was different patch level on the OS
One has 5.10 Generic_138889-03(execution machine) and the other has 5.10 Generic_138889-02 (build machine)
So the rev number is off.
Would this be the reason? Or what else could it be?
Anything I can do to see full frame information to allow me to examine core memory in more detail?
Would appreciate any thoughts.
Thanks.

Make sure that you have on the build machine completely the same set of shared libraries as on the computer that is executing the process. If this is not the case copy all shared libraries that are used by your process from the working computer to a folder on the build machine, set LD_LIBRARY_PATH to this folder, start gdb and run bt again.
The full list of relevant shared libraries you can get with the info sharedlibraries command in gdb on the computer that is executing the process.

Related

hexagon-sim WARNING: DRIL in dril.cpp:6843: Unable to load symbols from exe file

Can you tell me how to solve it?
"hexagon-sim INFO: The rev_id used in the simulation is 0x00004060 (v60a_512)
hexagon-sim WARNING: uiLoadElfSymfile in dril.cpp:5902: Cannot find string table in ELF file
hexagon-sim WARNING: DRIL in dril.cpp:6843: Unable to load symbols from exe file "

OpenCV 4.2.0 with python3 and CUDA : Segfault on cv2.VideoCapture()

I encounter a problem with OpenCV that I have for several days now : it segfaults when calling the cv2.VideoCapture() function.
When launching my script (with GDB) :
extract-all_1 | Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
extract-all_1 | 0x00007f83857fe33b in bool pyopencv_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, ArgInfo const&) [clone .isra.1286] ()
extract-all_1 | from /usr/lib/python3/dist-packages/cv2/python-3.6/cv2.cpython-36m-x86_64-linux-gnu.so
extract-all_1 | (gdb) quit
When running my script without GDB, the container exits with code 139
I identified the problem occures when calling the "cv2.VideoCapture()" function :
def perform_video_extraction(video_path):
input_movie = cv2.VideoCapture(video_path)
nb_total_frames = int(input_movie.get(cv2.CAP_PROP_FRAME_COUNT))
[...]
Hints :
I process MP4 video files
I've tried compressing my videos that are >30fps to 25fps
I've tried with OpenCV 3.4.9, 4.1.0, 4.1.1, 4.1.2, 4.2.0 and 4.3.0 (pip install)
I've tried compiling OpenCV 4.2.0 and 4.3.0 from source
I've tried each version above successively with CUDA 10.0, 10.1 and 10.2 : each version for each case produces the same error
This segfault is not reproduced when using the CPU (non-cuda) version of OpenCV
Here is my Dockerfile (CUDA 10.2 with OpenCV 4.2.0 built from source) : https://pastebin.com/raw/a42wtcRG
Here is what the cmake summary build returns : https://pastebin.com/raw/SFPUakyL
My config :
Ubuntu 18.04
Nvidia Docker (CUDA 10.2, CUDNN 7, Ubuntu 18.04, devel)
Python 3.6
Have you any recommendation for debugging this problem ?
Thank you

I managed to debug the problem. Due to a stupid encoding issue.
Adding :
ENV LANG C.UTF-8
to my Dockerfile managed to make the container run (my original pastebin mentioned this line but after doublecheck, I didn't have it).
I was able to find out this idea because of this more accurate backtrace from GDB :
root#f42846d26d89:/opencv-4.2.0/build# gdb --args python3 -u /usr/app/scripts/extract.py
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/python3 -u /usr/app/scripts/extract.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[...]
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
getUnicodeString (str="", obj=<optimized out>) at /opencv-4.2.0/modules/python/src2/pycompat.hpp:69
69 if (PyBytes_Check(bytes))
(gdb) backtrace
#0 0x00007f2959a1433b in getUnicodeString (str="", obj=<optimized out>) at /opencv-4.2.0/modules/python/src2/pycompat.hpp:69
#1 0x00007f2959a1433b in pyopencv_to<std::__cxx11::basic_string<char> >(PyObject*, cv::String&, ArgInfo const&) (obj=<optimized out>, value="", info=...)
at /opencv-4.2.0/modules/python/src2/cv2.cpp:731
#2 0x00007f2959dd6a2d in pyopencv_cv_VideoCapture_VideoCapture(pyopencv_VideoCapture_t*, PyObject*, PyObject*) (self=0x7f2965344190, args=0x7f296307c3c8, kw=0x0)
at /opencv-4.2.0/build/modules/python_bindings_generator/pyopencv_generated_types_content.h:21272
#3 0x0000000000551b81 in ()
#4 0x00000000005aa6ec in _PyObject_FastCallKeywords ()
#5 0x000000000050abb3 in ()
#6 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#7 0x0000000000509d48 in ()
#8 0x000000000050aa7d in ()
#9 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#10 0x0000000000508245 in ()
#11 0x000000000050b403 in PyEval_EvalCode ()
#12 0x0000000000635222 in ()
#13 0x00000000006352d7 in PyRun_FileExFlags ()
#14 0x0000000000638a8f in PyRun_SimpleFileExFlags ()
#15 0x0000000000639631 in Py_Main ()
#16 0x00000000004b0f40 in main ()
(gdb) list
64 {
65 bool res = false;
66 if (PyUnicode_Check(obj))
67 {
68 PyObject * bytes = PyUnicode_AsUTF8String(obj);
69 if (PyBytes_Check(bytes))
70 {
71 const char * raw = PyBytes_AsString(bytes);
72 if (raw)
73 {
(gdb)
/opencv-4.2.0 being my install path
It seems like my filenames were not in a right encoding format.
Finally, I specify that pip installing the python binding directly works perfectly fine now this modification has been brought.

perl 5.26.2 crashes with SIGSEGV (exit code 139) in ubuntu container

I have a confusing perl behaviour. From time to time (several times a day) it crashes when running inside ubuntu 18.04 kubernetes container with SIGSEGV (exit code 139). A coredump reveals some strange error like below:
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/mettools/bin/perl...done.
[New LWP 2218]
[New LWP 1]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `perl /opt/mettools/bin/s4p-server-http.pl -a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI_getenv (name=0x7f34b8d9b226 "", name#entry=0x7f34b8d9b224 "TZ") at getenv.c:84
84 getenv.c: No such file or directory.
[Current thread is 1 (Thread 0x7f34b79ab700 (LWP 2218))]
(gdb) bt
#0 __GI_getenv (name=0x7f34b8d9b226 "", name#entry=0x7f34b8d9b224 "TZ") at getenv.c:84
#1 0x00007f34b8cba0db in tzset_internal (always=1) at tzset.c:378
#2 __tzset () at tzset.c:552
#3 0x00005650223b41ef in Perl_localtime64_r ()
#4 0x00005650223698c5 in Perl_pp_gmtime ()
#5 0x000056502230c5b6 in Perl_runops_standard ()
#6 0x000056502228586f in Perl_call_sv ()
#7 0x00007f34b9aba93b in S_jmpenv_run () from /opt/mettools/lib/5.26.2/x86_64-linux-thread-multi/auto/threads/threads.so
#8 0x00007f34b9abab9d in S_ithread_run () from /opt/mettools/lib/5.26.2/x86_64-linux-thread-multi/auto/threads/threads.so
#9 0x00007f34b97b86db in start_thread (arg=0x7f34b79ab700) at pthread_create.c:463
#10 0x00007f34b8d0788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Does anybody have an idea where to look at for that kind of error? The perl script is an API server running in the container. Thanks to kubernetes nature it restarts the container every time the perl crashes (5-6 failures a day), but of course it's annoying to see such a buggy system. :)
The system is:
ubuntu 18.04
perl 5.26.2 (from anaconda conda-forge channel)
base module for the server is HTTP::Server::Simple::CGI 0.52
I don't know if it relates or just a coincidence, but running the same docker container in a similar docker swarm infrastructure helps to increase the life time of the container up to several weeks and sometimes months. Pls share your thoughts!

After running CDE I can't build sample polymer app in dart editor

file:/C:/Users/Dave/dart/td2/build.dart
build.dart returned error code 255
Unhandled exception:
Uncaught Error: FileSystemException: Cannot open file, path = 'C:\Users\Dave\dart\td2\packages\args\args.dart' (OS Error: The system cannot find the file specified., errno = 2)
Stack Trace:
#0 _File.open.<anonymous closure> (dart:io/file_impl.dart:349)
#1 _RootZone.runUnary (dart:async/zone.dart:1151)
Dart editor was working fine with a polymer app that I was developing previously.
I have rebooted, reinstalled dart editor with no luck
Running Pub Get manually fails as well as it can't find files either. Evrrything was working prior to trying CDE and the designer.

I think running pub cache repair (run in command line interface) should be able to fix this.

fail to build Opencv in windows using code:block

Following this link. I am stuck in the last two steps during setting up my workstation.
As a compiler I use Code:Block MinGW; I have already generated the compiled opencv files, now I need to build the sln file in Windows. I use Code Block to build this OpenCV Project File in the path D:\OpenCV\Build\Opencv, where I put the generated bin file after using Cmake.
During the building, it stopped at 40%, saying;
Linking CXX executable ....\bin\opencv_perf_core.exe
c:/codeblock/codeblocks/mingw/bin/../lib/gcc/mingw32/4.4.1/../../../../mingw32/bin/ld.exe:
warning: auto-importing has been activated without
--enable-auto-import specified on the command line. This should work unless it involves constant data structures referencing symbols from
auto-imported DLLs. Cannot export _ZN12_GLOBAL__N_13ROp3allEv: symbol
not found Cannot export _ZN12_GLOBAL__N_17CmpType3allEv: symbol not
found collect2: ld returned 1 exit status mingw32-make.exe[2]: *
[bin/opencv_perf_core.exe] Error 1 mingw32-make.exe1: *
[modules/core/CMakeFiles/opencv_perf_core.dir/all] Error 2
mingw32-make.exe: * [all] Error 2 Info: resolving vtable for
cv::_OutputArray by linking to imp_ZTVN2cv12_OutputArrayE
(auto-import) Info: resolving vtable for cv::_InputArray by linking to
imp_ZTVN2cv11_InputArrayE (auto-import) Info: resolving vtable for cv::Exception by linking to imp_ZTVN2cv9ExceptionE (auto-import)
Creating library file: ....\bin\libopencv_perf_core.dll.a Process
terminated with status 2 (14 minutes, 29 seconds) 0 errors, 3 warnings
How can I solve this problem?

Unfortunately, there's not much you can do, according to http://code.opencv.org/issues/2523.
You will have to use a recent version of MinGW. It builds fine using the latest MinGW shipping with GCC 4.7.2.
This issue seems to have been introduced in OpenCV 2.4.3 as it is said version 2.4.2 builds fine.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart