Failed using cuda-gdb walkthrough example - cuda-gdb

I am trying the walkthrough example in the CUDA-GDB manual, using exactly the same compilation command. I am using CUDA 4 on a Fermi M2090, and CUDA-GDB fails with the following message when I type "run" in the GDB environment:
/home/buildmeister/build/rel/gpgpu/toolkit/r4.1/debugger/cuda-gdb/7.2/gdb/cuda-tdep.c:1203: internal-error: cuda_get_bfd_abi_version: Assertion `CUDA_ELFOSABIV_16BIT <= abiv && abiv <= CUDA_ELFOSABIV_LATEST' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.

I have experienced the same problem (Kepler architecture, Ubuntu 13.04). After some research I found this link.
The problem occurs because your driver version is newer than your toolkit version, so the toolkit isn't able to recognize the driver. I solved it by installing CUDA Toolkit 5.5 (Release Candidate) together with the display driver from the same self-extracting package.
I did this because it is almost impossible to install CUDA Toolkit 5.0 on kernel 3.8+.
You can find the instructions here on my blog page.
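As a rough way to confirm the mismatch described above, the two versions can be compared programmatically. This is a minimal sketch, assuming the integers come from the CUDA runtime calls cudaDriverGetVersion() and cudaRuntimeGetVersion(), which encode a version as 1000 * major + 10 * minor (e.g. 5050 for 5.5). The names cuda_version, decode_cuda_version() and driver_newer_than_toolkit() are hypothetical, and the helpers take plain integers so the sketch builds without the CUDA toolkit:

```c
#include <assert.h>

/* CUDA reports versions as one integer: 1000 * major + 10 * minor,
   e.g. 4010 for CUDA 4.1 and 5050 for CUDA 5.5. */
typedef struct {
    int major;
    int minor;
} cuda_version;

/* Splits an encoded version into its major/minor parts. */
static cuda_version decode_cuda_version(int encoded) {
    assert(encoded >= 0);
    cuda_version v = { encoded / 1000, (encoded % 1000) / 10 };
    return v;
}

/* Nonzero when the driver's CUDA version is newer than the toolkit's,
   the situation the answer blames for the cuda-gdb assertion.
   In a real program the values would come from cudaDriverGetVersion()
   and cudaRuntimeGetVersion(); here they are plain parameters. */
static int driver_newer_than_toolkit(int driver_encoded, int toolkit_encoded) {
    return driver_encoded > toolkit_encoded;
}
```

For example, with a 5.5 driver and a 4.1 toolkit, driver_newer_than_toolkit(5050, 4010) returns 1.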

Related

How to tackle pipeline slow down due to `mypy --install-types`

Problem summary
When running mypy on my code, I keep getting many "Library stubs not installed" errors.
A few examples:
/opt/conda/envs/my_ci/lib/python3.9/site-packages/ray/core/generated/agent_manager_pb2.py:5: error: Library stubs not installed for "google.protobuf.internal.enum_type_wrapper" (or incompatible with Python 3.9)
/opt/conda/envs/my_ci/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py:8: error: Library stubs not installed for "six.moves" (or incompatible with Python 3.9)
/opt/conda/envs/my_ci/lib/python3.9/site-packages/ray/rllib/algorithms/algorithm.py:28: error: Library stubs not installed for "pkg_resources" (or incompatible with Python 3.9)
Currently I have to use the command:
mypy --install-types --non-interactive my_folder --config-file=mypy.ini
Although this solves the issue, installing the missing types takes at least 2 minutes, which is very long for our CI/CD pipeline.
Question
What are alternative ways of addressing missing library stubs? For example, could I 'split' the type installation step (or another, more time-consuming solution that could be baked into the Docker image) from the plain mypy run (which takes less time and runs as part of the GitLab CI/CD pipeline)?
I tried running the mypy --install-types command first and then running mypy, without success. Could it be that I am doing something wrong?
I would appreciate any help and ideas!

Problems updating pgfplots inside Docker with the TDS file structure

I have a Docker image with TeX Live installed (via apt, not tlmgr). My project contains a pgfplots plot that needs a newer pgfplots version. I'm looking for ways to update pgfplots, because I can't update it with tlmgr: the base installation was done via apt.
The initial error message when I try to compile with TeX Live 2014:
! Package pgfkeys Error: Choice '1.16' unknown in choice key '/pgfplots/compat/
anchors'. I am going to ignore this key.
See the pgfkeys package documentation for explanation.
Type H <return> for immediate help.
...
l.7 \pgfplotsset{compat=1.16}
?
! Emergency stop.
...
l.7 \pgfplotsset{compat=1.16}
I downloaded pgfplots.tds and followed these steps from the manual:
docker cp pgfplots.tds docker_container_name:/root/texmf/pgfplots
export TEXINPUTS=/root/texmf/pgfplots/tex//:
export TEXDOCS=/root/texmf/pgfplots/doc//:
export LUAINPUTS=/root/texmf/pgfplots//:
texhash
Of course, the exports and texhash were run inside the container, not on the host system.
After this, the error message is gone, but I have a new issue:
package pgfplots notification 'compat/show suggested version=true': you might b
enefit from \pgfplotsset{compat=1.18} (current compat level: 1.16).
! Illegal parameter number in definition of \pgfmaththisrow#.
<to be read again>
I searched online and found that this is caused by a broken pgfplots installation. In many articles the fix was simply to reinstall TeX Live, but I can't do that.
The issue should not be in the TeX code itself either: if I install TeX Live on my host system (the most recent Ubuntu release), the document compiles just fine.
Can somebody help me in fixing this or lead me to a better way of upgrading pgfplots?
Resolution:
pgfplots 1.18.1, and also 1.16, were too recent: they conflicted with the installed pgf package. I went further back and landed on \pgfplotsset{compat=1.14} with version 1.14 of pgfplots.tds.
This works fine now. I was probably pretty lucky that my plot looks and behaves the same with this version as with 1.18.
This approach probably won't work for you if you're more tied to version 1.18.

Installing MRPT on Fedora

Can anyone provide a detailed procedure for installing MRPT on Fedora 33 Scientific (one of the Fedora Labs, which has a KDE interface)? The MRPT installation instructions for Ubuntu mention something about cmake/cmake-gui. Checking the man pages, F33Sci has no such thing. It must be possible to accomplish this somehow, because the Fedora Robotics Lab includes MRPT. I've already tried "sudo dnf install mrpt", which results in "Error: Unable to find a match: mrpt". However, "dnf search mrpt" returns a bunch of items from mrpt-base... to mrpt-stereo-camera-calibration.
The version of MRPT that ships with Fedora is really outdated, so you would do well to build from source.
cmake-gui is not strictly required; it is only mentioned in the instructions to make things easier for those who prefer GUIs. You should be able to compile using the console commands here (that is, the standard CMake workflow).
Next, the CMake configuration step will warn you about missing libraries. Most are optional, but at least make sure to install eigen3, opencv and wxwidgets. Those should be installable with the standard Fedora package commands...

Building tensorflow 2.2.0 pip wheel file, for use in CentOS system (older libc)

Introduction:
I have to create a pip wheel of TensorFlow 2.2.0 with the CUDA libraries (specifically cudart.so) dynamically linked. To accomplish this, I am currently using the tensorflow-dev Docker image.
I am able to build the TF wheel file, and I can install and use it inside the build container.
Issue:
When I import the generated wheel on a CentOS server, I get the following error:
ImportError: /lib64/libm.so.6: version `GLIBC_2.27' not found (required by /home1/private/mavridis/Vineyard/tensorflowshared/test/lib64/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so)
From what I found, the issue is caused by the build container using a newer libc:
ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27
Compared to the older version on CentOS:
ldd --version
ldd (GNU libc) 2.17
Expected behavior:
I have already tried the 'vanilla' TensorFlow 2.2.0, installed using pip, with no issues:
pip install tensorflow==2.2.0
I expected my own build to also work.
So I assume there is some configuration option or Docker configuration that would let me use the Docker-built wheel file on a CentOS setup, just like the pip-installed version. As this wheel file is intended to be deployed to setups beyond my control, solutions involving alternate OSes and/or libc replacement are not applicable.
Build configuration:
During the build I use the following configuration / command line:
export TF_NEED_CUDA=1
export TF_USE_XLA=0
export TF_SET_ANDROID_WORKSPACE=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_ROCM=0
bazel build --config=opt --config=cuda --output_filter=DONT_MATCH_ANYTHING --linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart --linkopt=-static-libstdc++ //tensorflow/tools/pip_package:build_pip_package
Regarding the options used:
--output_filter=DONT_MATCH_ANYTHING : Silence warnings
--linkopt=-L/usr/local/cuda/lib64 --linkopt=-lcudart : Dynamic linking of cudart.so
--linkopt=-static-libstdc++ : Link libstdc++ statically, since libstdc++ also caused the libc error; this, however, is not possible for libm
I expected my own build to also work.
That expectation is (obviously) incorrect. The symbols your program or library requires from GLIBC depend on exactly which functions you call.
Consider the following program:
#include <stdlib.h>
int main() { exit(0); }
When compiled/linked on a GLIBC-2.30 system, this program only depends on GLIBC_2.2.5 (because it doesn't call any newer symbols).
Now change the program slightly:
int main() { gettid(); exit(0); }
Compile/link it again, and all of a sudden this program requires GLIBC_2.30 (because that's where gettid() was added to GLIBC) and will not work on any system with an older GLIBC.
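Since the requirement follows the symbols a binary references, one commonly used workaround (not something this answer proposes) is to reach a newer kernel feature through an older libc entry point. The sketch below, assuming Linux with glibc, obtains the thread ID through the generic syscall() wrapper and SYS_gettid instead of calling gettid(), so no reference to gettid at GLIBC_2.30 is emitted; current_tid() is a hypothetical helper name:

```c
#define _GNU_SOURCE        /* make sure syscall() is declared */
#include <assert.h>
#include <sys/syscall.h>   /* SYS_gettid */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* syscall(), getpid() */

/* Hypothetical helper: the caller's kernel thread ID, obtained through the
   generic syscall() wrapper, so the binary never references the
   gettid@GLIBC_2.30 symbol that calling gettid() directly would add. */
static pid_t current_tid(void) {
    long tid = syscall(SYS_gettid);
    assert(tid > 0);
    return (pid_t)tid;
}
```

Running readelf -Ws on a binary built from this should list syscall but not gettid; it is worth verifying that on your own toolchain rather than taking it for granted.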
So I assume there is some configuration option or Docker configuration
Sure: your Docker image must have a GLIBC that is not newer than what your target system has, i.e. GLIBC-2.17. Your current image contains GLIBC-2.27 (or newer).
You need a different Docker image, and you'll likely have to build it yourself, since GLIBC-2.17 is over 7 years old, and predates TensorFlow by many years.
Update:
What I don't understand is how come the pip tensorflow package (which I assumed was built with the Docker image I am using) works on CentOS?
It works by accident, just like my first program would work on CentOS, but the second one wouldn't.
In short, I wanted to generate a pip package that would work on 'any' Linux/libc version
That is an impossible goal: Linux predates GLIBC, and it is impossible to build a single package that will work on a Linux distribution which didn't include GLIBC and on a distribution that did.
You have to draw a line somewhere. The developers of tensorflow-dev docker image drew a line at GLIBC-2.27. Packages built on this image should work on any system with 2.27 or later, and might (but are not at all guaranteed to) work on older systems.
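Where that line falls on a given target system can be checked at run time. A minimal sketch, assuming glibc: gnu_get_libc_version() from <gnu/libc-version.h> returns the running library's version string, and the hypothetical version_at_least() compares it numerically (so 2.9 correctly counts as older than 2.12, which a plain string comparison would get wrong). This is only useful in a small probe program built against an old enough GLIBC, because the dynamic loader rejects a binary with a too-new requirement before any of its code runs:

```c
#include <assert.h>
#include <stdio.h>
#ifdef __GLIBC__
#include <gnu/libc-version.h>  /* gnu_get_libc_version(), glibc-only */
#endif

/* Returns 1 if a "major.minor[.patch]" string is at least major.minor,
   comparing numerically, and 0 otherwise (including unparsable input). */
static int version_at_least(const char *version, int major, int minor) {
    assert(version != NULL);
    int v_major = 0, v_minor = 0;
    if (sscanf(version, "%d.%d", &v_major, &v_minor) != 2)
        return 0;
    return v_major > major || (v_major == major && v_minor >= minor);
}

#ifdef __GLIBC__
/* Checks the glibc the process is actually running against. */
static int running_glibc_at_least(int major, int minor) {
    return version_at_least(gnu_get_libc_version(), major, minor);
}
#endif
```

On the CentOS system from the question, which reports 2.17, running_glibc_at_least(2, 27) would return 0 while running_glibc_at_least(2, 12) would return 1.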
just like the pip installed version.
You claim that the pip installed version has no "only GLIBC-xx or later" requirement, but that is not true. I am 99.9% sure that it requires at least GLIBC-2.14.
To find which GLIBC versions that package requires, run this command:
readelf -WV _pywrap_tensorflow_internal.so | grep GLIBC_
I assumed the pip-installed version was built using the publicly available tensorflow-devel docker image.
That is quite likely. And like I said, it happens to work on CentOS, but minute changes may make it not work anymore.
Update 2:
Running the readelf command as you suggested does show the most recent required versions to be:
- pip version: GLIBC_2.12
- mine: GLIBC_2.27
So from what I understand, the pip version uses an older version than even CentOS has, which explains why it works.
It doesn't "use" older version, it uses whatever version is available.
It requires a minimum version 2.12, while your build requires a minimum version 2.27.
How do they achieve this? Do they use a different image that has an older libc? If so, where can i get it? Or do they use the public image, but build with some bazel flag, that 'limits' symbols to the ones contained up to libc 2.12?
You are still not getting it.
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
So no: they (most likely) didn't use a different Docker image, and didn't use "magic" bazel flags. They just happened to not call any functions which require GLIBC version > 2.12, and you did.
P.S. You can find which symbol(s) are causing "bad" dependency in your build like so:
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.2[0-9]'
readelf -Ws _pywrap_tensorflow_internal.so | egrep 'GLIBC_2.1[89]'
This would produce output similar to (using my second program):
readelf -Ws a.out | egrep 'GLIBC_2.[23][0-9]'
2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@GLIBC_2.30 (2)
48: 0000000000000000 0 FUNC GLOBAL DEFAULT UND gettid@@GLIBC_2.30
The output above shows that the only symbol my binary requires from GLIBC 2.20 or above is gettid.
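The same check can be automated. A small sketch, assuming lines in the readelf -Ws format shown above (each versioned symbol carrying a GLIBC_x.y tag), that reports the highest GLIBC version mentioned; glibc_ver, parse_glibc_tag() and max_glibc_required() are hypothetical names, and reading the actual readelf output (for example via popen()) is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A GLIBC symbol-version tag such as GLIBC_2.30 (patch levels are ignored). */
typedef struct {
    int major;
    int minor;
} glibc_ver;

/* Looks for "GLIBC_<major>.<minor>" in one line of `readelf -Ws` output.
   Returns 1 and fills *out if found, 0 otherwise (e.g. GLIBC_PRIVATE). */
static int parse_glibc_tag(const char *line, glibc_ver *out) {
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "GLIBC_");
    if (p == NULL)
        return 0;
    return sscanf(p, "GLIBC_%d.%d", &out->major, &out->minor) == 2;
}

/* Returns the highest GLIBC version tagged in the given lines,
   or {0, 0} if none of them carries a tag. */
static glibc_ver max_glibc_required(const char *const *lines, size_t n) {
    glibc_ver best = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        glibc_ver v;
        if (!parse_glibc_tag(lines[i], &v))
            continue;
        if (v.major > best.major ||
            (v.major == best.major && v.minor > best.minor))
            best = v;
    }
    return best;
}
```

Comparing major and minor as integers matters here: GLIBC_2.9 is older than GLIBC_2.17 even though "2.9" sorts after "2.17" as text.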
To make a counter point to what Employed Russian wrote:
The version that your program requires depends on exactly which functions you call. In my example program, if I only call exit, my program requires version 2.2.5, but if I also call gettid, then my program requires version 2.30. Note: these two programs are built on the same system with the same flags.
I don't think that's quite accurate. My understanding, which is corroborated by https://github.com/wheybags/glibc_version_header, is that things work like so (quoting that project, emphasis mine):
Glibc uses something called symbol versioning. This means that when you use e.g. malloc in your program, the symbol the linker will actually link against is malloc@GLIBC_YOUR_INSTALLED_VERSION (actually, it will link to malloc from the most recent version of glibc that changed the implementation of malloc, but you get the idea).
So my guess (I have not checked) would be that the Tensorflow releases are built against an older glibc (perhaps by way of being built on an older release of their target Linux distro).
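Besides building on an older distro, individual symbols can be pinned to older versions with the GNU assembler's .symver directive, which, as far as I understand it, is what the headers generated by the glibc_version_header project consist of. A minimal sketch, assuming x86-64 Linux with glibc, where memcpy exists both as the old memcpy@GLIBC_2.2.5 and the default memcpy@@GLIBC_2.14; copy_bytes() is a hypothetical wrapper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* x86-64 glibc only: bind this file's references to memcpy to the old
   memcpy@GLIBC_2.2.5 instead of the default memcpy@@GLIBC_2.14, so using
   memcpy does not raise the binary's GLIBC requirement to 2.14.
   Elsewhere the directive is skipped and the default version is used. */
#if defined(__x86_64__) && defined(__GLIBC__)
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");
#endif

/* Hypothetical wrapper so the pinned memcpy has a call site. */
static void copy_bytes(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, n);
}
```

If the compiler still emits a call to memcpy, readelf -Ws on the result should tag it with GLIBC_2.2.5 rather than GLIBC_2.14. Pinning symbols one by one like this quickly becomes tedious for a project the size of TensorFlow.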

Can't upload to the NodeMCU (Lua)

I have a NodeMCU board running the Lua interpreter. I can access the serial connection via nodemcu-tool to input commands, but when I use nodemcu-tool to upload files or reset the filesystem, it returns:
[NodeMCU-Tool]~ Unable to establish connection
[NodeMCU-Tool]~ Timeout, no response detected - is NodeMCU online and the Lua interpreter ready ?
I might have an answer:
I ran into the same (or very, very similar) problem, on Mac OS X Mojave.
In the end, I resorted to completely uninstalling Node.js (this experience does not help convince me of Node.js, but that is another story) and starting from scratch.
Even that did not help, because I ran into trouble installing nodemcu-tool...
Previously I installed it as a global package, and that somehow worked, but it caused me to always sudo my nodemcu-tool invocations - not a good thing!
In any case, sudo-ing plus the command-line parameter "--connection_delay" (or, as a project setting, "connectionDelay") helped get me going.
Until I messed up and reinstalled everything from scratch. This time, the key difference from the installation instructions for nodemcu-tool was adding the '--unsafe-perm' parameter, like so:
sudo npm install --unsafe-perm nodemcu-tool -g
That was to be able to get past the repeated installation errors for the serialport package...
IMO, relying on unsafe permissions (for what exactly, anyway!?) is, well, UNSAFE! GRRRRR
To the OP, make sure that:
you have installed Node.js and nodemcu-tool properly (download stable installer etc), and
that you use the --connection_delay parameter in each and every nodemcu-tool invocation!
I had the same problem!
The solution was to reset the board:
Connect the board via USB and press FLASH + RST (the two buttons on the board)
Release FLASH
Release RST
Now you can upload your sketch.
If it doesn't work, try disconnecting all pins. In my case GPIO4 was soldered to an LED strip, and it was impossible to load the sketch until I disconnected it.
