I've been struggling with an OpenCV problem for several days now: it segfaults when calling the cv2.VideoCapture() function.
When launching my script (under GDB):
extract-all_1 | Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
extract-all_1 | 0x00007f83857fe33b in bool pyopencv_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(_object*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, ArgInfo const&) [clone .isra.1286] ()
extract-all_1 | from /usr/lib/python3/dist-packages/cv2/python-3.6/cv2.cpython-36m-x86_64-linux-gnu.so
extract-all_1 | (gdb) quit
When running my script without GDB, the container exits with code 139.
I identified that the problem occurs when calling the cv2.VideoCapture() function:
import cv2

def perform_video_extraction(video_path):
    # The segfault happens on this call
    input_movie = cv2.VideoCapture(video_path)
    nb_total_frames = int(input_movie.get(cv2.CAP_PROP_FRAME_COUNT))
    [...]
Hints:
I process MP4 video files
I've tried re-encoding my videos that are >30 fps down to 25 fps
I've tried with OpenCV 3.4.9, 4.1.0, 4.1.1, 4.1.2, 4.2.0 and 4.3.0 (pip install)
I've tried compiling OpenCV 4.2.0 and 4.3.0 from source
I've tried each version above with CUDA 10.0, 10.1 and 10.2: every combination produces the same error
The segfault does not occur with the CPU-only (non-CUDA) build of OpenCV
Here is my Dockerfile (CUDA 10.2 with OpenCV 4.2.0 built from source): https://pastebin.com/raw/a42wtcRG
Here is what the cmake build summary returns: https://pastebin.com/raw/SFPUakyL
My config:
Ubuntu 18.04
Nvidia Docker (CUDA 10.2, CUDNN 7, Ubuntu 18.04, devel)
Python 3.6
Do you have any recommendations for debugging this problem?
Thank you
I managed to debug the problem: it was due to a silly encoding issue.
Adding:
ENV LANG C.UTF-8
to my Dockerfile makes the container run (my original pastebin mentioned this line, but after double-checking, I didn't actually have it).
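To confirm the locale fix is actually active inside the container, a quick check along these lines can help (just a sketch; the exact values reported depend on the base image and Python version):

import locale
import sys

# With LANG set to C.UTF-8, both of these should report a UTF-8 encoding;
# without it, Python 3.6 in a bare container typically falls back to ASCII.
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())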
I got the idea from this more accurate backtrace produced by GDB:
root@f42846d26d89:/opencv-4.2.0/build# gdb --args python3 -u /usr/app/scripts/extract.py
GNU gdb (Ubuntu 8.1-0ubuntu3.2) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/python3 -u /usr/app/scripts/extract.py
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[...]
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
getUnicodeString (str="", obj=<optimized out>) at /opencv-4.2.0/modules/python/src2/pycompat.hpp:69
69 if (PyBytes_Check(bytes))
(gdb) backtrace
#0 0x00007f2959a1433b in getUnicodeString (str="", obj=<optimized out>) at /opencv-4.2.0/modules/python/src2/pycompat.hpp:69
#1 0x00007f2959a1433b in pyopencv_to<std::__cxx11::basic_string<char> >(PyObject*, cv::String&, ArgInfo const&) (obj=<optimized out>, value="", info=...)
at /opencv-4.2.0/modules/python/src2/cv2.cpp:731
#2 0x00007f2959dd6a2d in pyopencv_cv_VideoCapture_VideoCapture(pyopencv_VideoCapture_t*, PyObject*, PyObject*) (self=0x7f2965344190, args=0x7f296307c3c8, kw=0x0)
at /opencv-4.2.0/build/modules/python_bindings_generator/pyopencv_generated_types_content.h:21272
#3 0x0000000000551b81 in ()
#4 0x00000000005aa6ec in _PyObject_FastCallKeywords ()
#5 0x000000000050abb3 in ()
#6 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#7 0x0000000000509d48 in ()
#8 0x000000000050aa7d in ()
#9 0x000000000050c5b9 in _PyEval_EvalFrameDefault ()
#10 0x0000000000508245 in ()
#11 0x000000000050b403 in PyEval_EvalCode ()
#12 0x0000000000635222 in ()
#13 0x00000000006352d7 in PyRun_FileExFlags ()
#14 0x0000000000638a8f in PyRun_SimpleFileExFlags ()
#15 0x0000000000639631 in Py_Main ()
#16 0x00000000004b0f40 in main ()
(gdb) list
64 {
65 bool res = false;
66 if (PyUnicode_Check(obj))
67 {
68 PyObject * bytes = PyUnicode_AsUTF8String(obj);
69 if (PyBytes_Check(bytes))
70 {
71 const char * raw = PyBytes_AsString(bytes);
72 if (raw)
73 {
(gdb)
(/opencv-4.2.0 being my install path)
It seems my filenames were not in the right encoding: judging from the listing above, PyUnicode_AsUTF8String() fails and returns NULL for such strings, and the following PyBytes_Check() then dereferences that null pointer.
Finally, I should mention that pip-installing the Python bindings directly also works perfectly fine now that this change has been made.
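As an extra safeguard (not part of my original script, just a sketch), you can fail early with a readable error instead of a segfault by checking that the path encodes cleanly to UTF-8 before handing it to OpenCV:

import cv2

def open_video_checked(video_path):
    # Hypothetical helper: paths decoded with surrogate escapes from a
    # non-UTF-8 filesystem cannot be re-encoded to UTF-8, which is exactly
    # the case that crashed the binding here.
    try:
        video_path.encode("utf-8")
    except UnicodeEncodeError as exc:
        raise ValueError("video path is not valid UTF-8: %r" % (video_path,)) from exc
    return cv2.VideoCapture(video_path)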
I made a minimalistic Dart program that interfaces with a serial port:
import 'package:libserialport/libserialport.dart';
import 'dart:typed_data';
void main(List<String> arguments) {
final port = SerialPort("/dev/pts/4");
if (!port.openReadWrite()) {
print(SerialPort.lastError);
}
port.write(Uint8List.fromList("Lorem Ipsum".codeUnits));
final reader = SerialPortReader(port);
reader.stream.listen((data) {
print('received: $data');
});
}
But once I run:
dart run
I get the following error:
Unhandled exception:
Invalid argument(s): Failed to load dynamic library 'libserialport.so': libserialport.so: cannot open shared object file: No such file or directory
#0 _open (dart:ffi-patch/ffi_dynamic_library_patch.dart:12:43)
#1 new DynamicLibrary.open (dart:ffi-patch/ffi_dynamic_library_patch.dart:23:12)
#2 dylib
package:libserialport/src/dylib.dart:32
#3 _SerialPortImpl._init.<anonymous closure>
package:libserialport/src/port.dart:221
#4 Util.call
package:libserialport/src/util.dart:37
#5 _SerialPortImpl._init
package:libserialport/src/port.dart:221
#6 new _SerialPortImpl
package:libserialport/src/port.dart:211
#7 new SerialPort
package:libserialport/src/port.dart:72
#8 main
bin/serial.dart:5
#9 _delayEntrypointInvocation.<anonymous closure> (dart:isolate-patch/isolate_patch.dart:295:32)
#10 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:192:12)
Exited (255)
I run it on a Linux machine.
The problem, as you can see in the error message, is that it expects to find the libserialport.so library somewhere.
First and foremost, we need to locate the library and check whether it is installed:
ldconfig -p | grep libserial
If there is no output, the library is not installed and you can install it from your package manager. For Linux Mint and other Debian-based distros, run:
sudo apt-get install libserialport0
Then re-run the command:
ldconfig -p | grep libserial
If the command produces output after the installation, check whether it lists a plain /usr/lib/libserialport.so. In my case it did not:
libserialport.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libserialport.so.0
Sometimes a version number is appended after .so in a library's name. In that case we can symlink the library into /usr/lib like this:
sudo ln -s /usr/lib/x86_64-linux-gnu/libserialport.so.0 /usr/lib/libserialport.so
Another possible case is that the library is not located at /usr/lib/libserialport.so but in a subfolder inside /usr/lib.
The issue is that the dynamic library cannot be loaded from the system. The solution is:
Install the libserialport-dev package.
For Debian-based systems, use:
sudo apt install libserialport-dev
Note: the Dart package libserialport uses FFI to access the native API. Under the hood it relies on libserialport, a minimal, cross-platform shared library written in C. The package actually contains the C code; when we build or run the application it is supposed to be compiled into libserialport.so, but for some reason this doesn't happen when we use the package directly.
If you use the Flutter framework, use the flutter_libserialport package instead; it takes care of creating the .so file without any issue.
Chromium crashes during Puppeteer navigation with the following stack trace on my M1. I'm looking for some help from the community, as non-M1 machines don't seem to have any issue with our Puppeteer container.
[0613/204124.018517:ERROR:stack_trace_posix.cc(707)] Failed to parse the contents of /proc/self/maps
[0613/204124.746267:ERROR:stack_trace_posix.cc(707)] Failed to parse the contents of /proc/self/maps
[0613/204124.751355:ERROR:stack_trace_posix.cc(707)] Failed to parse the contents of /proc/self/maps
[0613/204124.981155:FATAL:nacl_helper_linux.cc(440)] Check failed: nacl_sandbox->IsSingleThreaded().
qemu: uncaught target signal 5 (Trace/breakpoint trap) - core dumped
[130:130:0613/204125.140482:FATAL:zygote_main_linux.cc(162)] Check failed: sandbox::ThreadHelpers::IsSingleThreaded().
#0 0x0040072b9339 <unknown>
#1 0x00400722ff23 <unknown>
#2 0x00400722d070 <unknown>
#3 0x00400722dc6e <unknown>
#4 0x004006dae926 <unknown>
#5 0x004006da973e <unknown>
#6 0x004006daa369 <unknown>
#7 0x004006dab0cb <unknown>
#8 0x004006da838e <unknown>
#9 0x004006da8d4e <unknown>
#10 0x0040036e1227 <unknown>
#11 0x00400faba0b3 <unknown>
#12 0x0040036e102a <unknown>
Crash keys:
"switch-7" = "--enable-crashpad"
"switch-6" = "--change-stack-guard-on-fork=enable"
"switch-5" = "--user-data-dir=/tmp/puppeteer_dev_chrome_profile-5BphEe"
"switch-4" = "--enable-crash-reporter=,"
"switch-3" = "--crashpad-handler-pid=117"
"switch-2" = "--enable-crashpad"
"switch-1" = "--no-sandbox"
"num-switches" = "8"
qemu: uncaught target signal 5 (Trace/breakpoint trap) - core dumped
[112:138:0613/204125.830241:ERROR:file_path_watcher_inotify.cc(329)] inotify_init() failed: Function not implemented (38)
[0613/204125.946536:ERROR:scoped_ptrace_attach.cc(27)] ptrace: Function not implemented (38)
Assertion failed: p_rcu_reader->depth != 0 (/qemu/include/qemu/rcu.h: rcu_read_unlock: 101)
TROUBLESHOOTING: https://github.com/puppeteer/puppeteer/blob/main/docs/troubleshooting.md
Error scraping url: <my - url>:
Error: Unable to launch chrome
I'm launching Puppeteer with the following options:
const args = [
'--no-first-run',
'--no-sandbox',
'--disable-setuid-sandbox',
'--single-process',
'--disable-dev-shm-usage',
'--ignore-certificate-errors',
'--ignore-urlfetcher-cert-requests',
'--disable-blink-features=AutomationControlled'
];
And I'm installing Chromium into my Ubuntu-based container via:
# Install Chrome for Ubuntu
RUN apt-get update \
&& apt-get install -y chromium-browser
ENV PUPPETEER_SKIP_CHROMIUM_DOWNLOAD true
ENV PUPPETEER_EXECUTABLE_PATH /usr/bin/chromium
I spent all day trying to fix this. In the end, running Docker with Colima solved the issue. The thing is, some software simply will not work on ARM and there is no point in fighting it. Colima lets you run everything via Rosetta, i.e. emulating x86 all the way.
First, install Colima with brew install colima.
Shut down Docker Desktop if it's running.
Start Colima with colima start --arch aarch64 --vm-type=vz (check the README for more info).
Now you have Docker running fully via Rosetta. When running images, pass --platform linux/amd64 (e.g. docker run --platform linux/amd64 <image>).
I'm seeing some confusing Perl behaviour. From time to time (several times a day) it crashes with SIGSEGV (exit code 139) when running inside an Ubuntu 18.04 Kubernetes container. A core dump reveals a strange error like the one below:
Type "apropos word" to search for commands related to "word"...
Reading symbols from /opt/mettools/bin/perl...done.
[New LWP 2218]
[New LWP 1]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `perl /opt/mettools/bin/s4p-server-http.pl -a'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 __GI_getenv (name=0x7f34b8d9b226 "", name@entry=0x7f34b8d9b224 "TZ") at getenv.c:84
84 getenv.c: No such file or directory.
[Current thread is 1 (Thread 0x7f34b79ab700 (LWP 2218))]
(gdb) bt
#0 __GI_getenv (name=0x7f34b8d9b226 "", name@entry=0x7f34b8d9b224 "TZ") at getenv.c:84
#1 0x00007f34b8cba0db in tzset_internal (always=1) at tzset.c:378
#2 __tzset () at tzset.c:552
#3 0x00005650223b41ef in Perl_localtime64_r ()
#4 0x00005650223698c5 in Perl_pp_gmtime ()
#5 0x000056502230c5b6 in Perl_runops_standard ()
#6 0x000056502228586f in Perl_call_sv ()
#7 0x00007f34b9aba93b in S_jmpenv_run () from /opt/mettools/lib/5.26.2/x86_64-linux-thread-multi/auto/threads/threads.so
#8 0x00007f34b9abab9d in S_ithread_run () from /opt/mettools/lib/5.26.2/x86_64-linux-thread-multi/auto/threads/threads.so
#9 0x00007f34b97b86db in start_thread (arg=0x7f34b79ab700) at pthread_create.c:463
#10 0x00007f34b8d0788f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Does anybody have an idea where to look for this kind of error? The Perl script is an API server running in the container. Thanks to Kubernetes, the container restarts every time Perl crashes (5-6 failures a day), but of course it's annoying to have such a buggy system. :)
The system is:
ubuntu 18.04
Perl 5.26.2 (from the Anaconda conda-forge channel)
the base module for the server is HTTP::Server::Simple::CGI 0.52
I don't know if it's related or just a coincidence, but running the same Docker container on a similar Docker Swarm infrastructure increases the container's lifetime to several weeks and sometimes months. Please share your thoughts!
I'm going crazy here. I am just trying to deploy a little test application on my server, and I keep getting a segmentation fault whenever the Ruby interpreter quits (i.e., after running rake assets:precompile, quitting the console, etc.). Just running
script/rails r -e production "puts 1"
will lead to
1
[BUG] Segmentation fault
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]
-- C level backtrace information -------------------------------------------
/usr/local/rvm/rubies/ruby-1.9.3-p0/lib/libruby.so.1.9 [0x7f92fd681f25]
/usr/local/rvm/rubies/ruby-1.9.3-p0/lib/libruby.so.1.9 [0x7f92fd55608c]
/usr/local/rvm/rubies/ruby-1.9.3-p0/lib/libruby.so.1.9(rb_bug+0xb8) [0x7f92fd556208] error.c:277
/usr/local/rvm/rubies/ruby-1.9.3-p0/lib/libruby.so.1.9 [0x7f92fd60db77]
/lib/libpthread.so.0 [0x7f92fd2e5a80]
/lib/libselinux.so.1 [0x7f92f0db831d]
/lib/libselinux.so.1 [0x7f92f0dab57f]
-- Other runtime information -----------------------------------------------
Segmentation fault
Note that the scripts/Rake tasks, whatever I run, do complete, but on exit → segmentation fault.
The weird thing is, this happens only on my server (Debian 5.0 (Lenny), rvm, Ruby 1.9.2 or 1.9.3) and only in production mode.
So naturally I thought: some production group gem is causing this, and I tried switching off the production group in my Gemfile,
as well as switching production <-> development gems. No change; every time Ruby quits → segmentation fault.
What can I do? What are some debugging tips to get to the root of this? The backtrace for this segmentation fault leaves me with absolutely no hint.
(I tried removing Ruby and recompiling, and I tried both 1.9.2-p290 and 1.9.3, with the same result.)
Okay, backtrace time:
#0 rb_string_value (ptr=0x5a8) at string.c:1406
1406 VALUE s = *ptr;
(gdb) where
#0 rb_string_value (ptr=0x5a8) at string.c:1406
#1 0x00007f3c5b619428 in rb_string_value_cstr (ptr=0x5a8) at string.c:1424
#2 0x00007f3c5b6708cc in rb_vm_bugreport () at vm_dump.c:826
#3 0x00007f3c5b549f1c in report_bug (file=<value optimized out>, line=<value optimized out>, fmt=0x7f3c5b69e88b "Segmentation fault", args=0x66cd40) at error.c:258
#4 0x00007f3c5b54a098 in rb_bug (fmt=0x7f3c5b69e88b "Segmentation fault") at error.c:277
#5 0x00007f3c5b5fe037 in sigsegv (sig=<value optimized out>, info=<value optimized out>, ctx=<value optimized out>) at signal.c:609
#6 <signal handler called>
#7 0x00007f3c4e6fa18d in fini_context_translations () at setrans_client.c:211
#8 0x00007f3c4e6ed5df in __do_global_dtors_aux () from /lib/libselinux.so.1
#9 0x0000000000400850 in setlocale@plt ()
#10 0x00007fffffffdec0 in ?? ()
#11 0x00007f3c4e6fb991 in _fini () from /lib/libselinux.so.1
#12 0x000000000000005f in ?? ()
#13 0x00007f3c5b933d94 in ?? () from /lib64/ld-linux-x86-64.so.2
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Okay, I got it!
It's actually bug #505920 ("python-clutter: ends with segmentation fault in libselinux"), which was fixed a while ago, but apparently the update didn't make it into Debian 5.0 (Lenny).
I took the liberty of installing the libselinux1 deb from Debian 6.0 (Squeeze). I'm not sure this is actually a good idea, but at least the problem is gone.
Could someone explain why the following corrupted stack trace can occur?
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libxnet.so.1...done.
Loaded symbols for /usr/lib/libxnet.so.1
Reading symbols from /usr/lib/libsocket.so.1...done.
Loaded symbols for /usr/lib/libsocket.so.1
Reading symbols from /usr/lib/libnsl.so.1...done.
Loaded symbols for /usr/lib/libnsl.so.1
Reading symbols from /usr/lib/libxml2.so.2...done.
Loaded symbols for /usr/lib/libxml2.so.2
Reading symbols from /opt/csw/lib/libiconv.so.2...done.
Loaded symbols for /opt/csw/lib/libiconv.so.2
Reading symbols from /usr/lib/libcrypt_i.so.1...done.
Loaded symbols for /usr/lib/libcrypt_i.so.1
Reading symbols from /usr/lib/libpthread.so.1...
warning: Lowest section in /usr/lib/libpthread.so.1 is .dynamic at 00000074
done.
Loaded symbols for /usr/lib/libpthread.so.1
Reading symbols from /usr/lib/libm.so.2...done.
Loaded symbols for /usr/lib/libm.so.2
Reading symbols from /usr/lib/librt.so.1...done.
Loaded symbols for /usr/lib/librt.so.1
Reading symbols from /usr/lib/libc.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libc.so.1
Reading symbols from /usr/lib/libz.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libz.so.1
Reading symbols from /usr/lib/libgen.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libgen.so.1
Reading symbols from /usr/lib/libaio.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libaio.so.1
Reading symbols from /usr/lib/libmd.so.1...done.
warning: rw_common (): unable to read at addr 0x0
warning: sol_thread_new_objfile: td_ta_new: Debugger service failed
Loaded symbols for /usr/lib/libmd.so.1
#0 0xfeb3487a in _malloc_unlocked () from /usr/lib/libc.so.1
(gdb) bt
#0 0xfeb3487a in _malloc_unlocked () from /usr/lib/libc.so.1
#1 0x210b5a68 in ?? ()
#2 0xfec0e5d0 in signames () from /usr/lib/libc.so.1
#3 0xfec0d000 in _sys_cldlist () from /usr/lib/libc.so.1
#4 0x08046a28 in ?? ()
#5 0xfeb34704 in _malloc_unlocked () from /usr/lib/libc.so.1
#6 0x00002008 in ?? ()
#7 0x210b5a68 in ?? ()
#8 0x21151b70 in ?? ()
#9 0xfeeda3b0 in ?? () from /usr/lib/libxml2.so.2
#10 0x08046a3c in ?? ()
#11 0xfee03c42 in xmlBufferCreateSize () from /usr/lib/libxml2.so.2
Previous frame inner to this frame (corrupt stack?)
The core comes from a process built on an x86 machine.
If the backtrace is performed on the machine executing the process, the backtrace is perfect, with full frame information.
However, if I do the backtrace with the core on the build machine (a different machine), I get the trace above.
One obvious thing I considered was a different OS patch level:
the execution machine has 5.10 Generic_138889-03 and the build machine has 5.10 Generic_138889-02,
so the revision numbers differ.
Would this be the reason? Or what else could it be?
Is there anything I can do to see full frame information, so that I can examine the core memory in more detail?
Would appreciate any thoughts.
Thanks.
Make sure that the build machine has exactly the same set of shared libraries as the computer that is executing the process. If that is not the case, copy all the shared libraries used by your process from the working computer to a folder on the build machine, set LD_LIBRARY_PATH to that folder, start gdb, and run bt again.
You can get the full list of relevant shared libraries with the info sharedlibrary command in gdb on the computer that is executing the process.
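For example, the session on the build machine could look roughly like this (a sketch with placeholder paths; set solib-search-path and info sharedlibrary are standard gdb commands and can be used instead of, or in addition to, LD_LIBRARY_PATH):

$ export LD_LIBRARY_PATH=/path/to/copied/libs
$ gdb /path/to/your/program /path/to/core
(gdb) set solib-search-path /path/to/copied/libs
(gdb) info sharedlibrary
(gdb) bt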