tensorflow-gpu is not working with Blas GEMM launch failed - nvidia

I installed tensorflow-gpu to run my TensorFlow code on my GPU, but I can't get it to run: it keeps giving the error mentioned in the title. Below is my sample code, followed by the error stack trace:
import tensorflow as tf
import numpy as np

def check(W, X):
    return tf.matmul(W, X)

def main():
    W = tf.Variable(tf.truncated_normal([2, 3], stddev=0.01))
    X = tf.placeholder(tf.float32, [3, 2])
    check_handle = check(W, X)
    with tf.Session() as sess:
        tf.initialize_all_variables().run()
        num = sess.run(check_handle, feed_dict={X: np.reshape(np.arange(6), (3, 2))})
        print(num)

if __name__ == '__main__':
    main()
My GPU is a GeForce GTX 1080 Ti with 11 GB of VRAM, and there is nothing else significant running on it (just Chrome), as you can see in the nvidia-smi output:
Fri Aug 4 16:34:49 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.22 Driver Version: 381.22 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 0000:07:00.0 On | N/A |
| 30% 55C P0 79W / 250W | 711MiB / 11169MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 7650 G /usr/lib/xorg/Xorg 380MiB |
| 0 8233 G compiz 192MiB |
| 0 24226 G ...el-token=963C169BB38ADFD67B444D57A299CE0A 136MiB |
+-----------------------------------------------------------------------------+
Following is the error stack trace:
2017-08-04 15:44:21.585091: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585110: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585114: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585118: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.585122: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-08-04 15:44:21.853700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:07:00.0
Total memory: 10.91GiB
Free memory: 9.89GiB
2017-08-04 15:44:21.853724: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-08-04 15:44:21.853728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-08-04 15:44:21.853734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:07:00.0)
2017-08-04 15:44:24.948616: E tensorflow/stream_executor/cuda/cuda_blas.cc:365] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2017-08-04 15:44:24.948640: W tensorflow/stream_executor/stream.cc:1601] attempting to perform BLAS operation using StreamExecutor without BLAS support
2017-08-04 15:44:24.948805: W tensorflow/core/framework/op_kernel.cc:1158] Internal: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
Traceback (most recent call last):
File "test.py", line 51, in <module>
_, loss_out, res_out = sess.run([train_op, loss, res], feed_dict)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 789, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 997, in _run
feed_dict_string, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1132, in _do_run
target_list, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1152, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
[[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Caused by op u'layer1/MatMul', defined at:
File "test.py", line 18, in <module>
pre_activation = tf.matmul(input_ph, weights)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 1816, in matmul
a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 1217, in _mat_mul
transpose_b=transpose_b, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
self._traceback = _extract_stack()
InternalError (see above for traceback): Blas GEMM launch failed : a.shape=(1, 5), b.shape=(5, 10), m=1, n=10, k=5
[[Node: layer1/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_arg_Placeholder_0_0/_11, layer1/weights/read)]]
[[Node: layer2/MatMul/_17 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_158_layer2/MatMul", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
To add to it, my previous CPU-only installation of TensorFlow worked pretty well. Any help is appreciated. Thanks!
Note: I have CUDA 8.0 with cuDNN 5.1 installed, and their paths are added in my .bashrc.

I had a very similar problem. For me it coincided with an NVIDIA driver update, so I thought it was a problem with the driver, but changing the driver had no effect. What eventually worked for me was cleaning out the NVIDIA cache:
sudo rm -rf ~/.nv/
Found this suggestion in the NVIDIA developer forum:
https://devtalk.nvidia.com/default/topic/1007071/cuda-setup-and-installation/cuda-error-when-running-matrixmulcublas-sample-ubuntu-16-04/post/5169223/
I suspect that during the driver update there were still some compiled files from the old version that were not compatible, or that were even corrupted during the process. Assumptions aside, this solved the problem for me.

For me, the reason for this error was that my CUDA installation and all of its subdirectories and files required root privileges, so TensorFlow also needed root privileges to be able to use CUDA. Uninstalling TensorFlow and reinstalling it as the root user solved the problem for me.
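If you suspect the same permission problem, one quick check before reinstalling anything is to walk the CUDA library directory and report anything the current user cannot read. This is only a small sketch; the /usr/local/cuda-8.0/lib64 path is an assumption based on the CUDA 8.0 mentioned in the question, so adjust it to wherever your toolkit lives:
import os

cuda_lib_dir = "/usr/local/cuda-8.0/lib64"  # assumed location, adjust to your install

try:
    for name in sorted(os.listdir(cuda_lib_dir)):
        path = os.path.join(cuda_lib_dir, name)
        if not os.access(path, os.R_OK):
            print("Not readable by the current user: " + path)
except OSError as err:
    # If even listing the directory fails, the whole install is effectively root-only.
    print("Cannot list {}: {}".format(cuda_lib_dir, err))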

Installing the right NVIDIA driver and CUDA version for my NVIDIA graphics card (an NVIDIA RTX 2070 in my case) worked for me.
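Whichever of the fixes above applies to your setup, a quick way to confirm that the GPU path is healthy again is to pin a tiny matmul to the GPU with device placement logging enabled. This is just a minimal sanity-check sketch using the same TF 1.x API as the question, not part of any answer above; if cuBLAS initializes correctly it prints the 2x2 product instead of the Blas GEMM error:
import tensorflow as tf

# Force the matmul onto GPU 0 and log where each op actually runs.
with tf.device('/gpu:0'):
    a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
    b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    c = tf.matmul(a, b)

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    print(sess.run(c))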

Related

Why do the same tasks use different amounts of CPU on Linux kernel 4.9 and 5.4?

My application is a compute-intensive task (video encoding). When it runs on Linux kernel 4.9 (Ubuntu 16.04), the CPU usage is 3300%, but when it runs on Linux kernel 5.4 (Ubuntu 20.04), the CPU usage is just 2850%. I promise the processes do the same job.
So I wonder whether the Linux kernel did some CPU scheduling optimization or related work between 4.9 and 5.4? Could you give any advice on how to investigate the reason?
I am not sure whether the glibc version has an effect; for your information, the glibc version is 2.23 on Linux kernel 4.9 and 2.31 on Linux kernel 5.4.
CPU Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4401.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 14080K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Output of perf stat on Linux Kernel 4.9
Performance counter stats for process id '32504':
3146297.833447 cpu-clock (msec) # 32.906 CPUs utilized
1,718,778 context-switches # 0.546 K/sec
574,717 cpu-migrations # 0.183 K/sec
2,796,706 page-faults # 0.889 K/sec
6,193,409,215,015 cycles # 1.968 GHz (30.76%)
6,948,575,328,419 instructions # 1.12 insn per cycle (38.47%)
540,538,530,660 branches # 171.801 M/sec (38.47%)
33,087,740,169 branch-misses # 6.12% of all branches (38.50%)
1,966,141,393,632 L1-dcache-loads # 624.906 M/sec (38.49%)
184,477,765,497 L1-dcache-load-misses # 9.38% of all L1-dcache hits (38.47%)
8,324,742,443 LLC-loads # 2.646 M/sec (30.78%)
3,835,471,095 LLC-load-misses # 92.15% of all LL-cache hits (30.76%)
<not supported> L1-icache-loads
187,604,831,388 L1-icache-load-misses (30.78%)
1,965,198,121,190 dTLB-loads # 624.607 M/sec (30.81%)
438,496,889 dTLB-load-misses # 0.02% of all dTLB cache hits (30.79%)
7,139,892,384 iTLB-loads # 2.269 M/sec (30.79%)
260,660,265 iTLB-load-misses # 3.65% of all iTLB cache hits (30.77%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
95.615072142 seconds time elapsed
Output of perf stat on Linux Kernel 5.4
Performance counter stats for process id '3355137':
2,718,192.32 msec cpu-clock # 29.184 CPUs utilized
1,719,910 context-switches # 0.633 K/sec
448,685 cpu-migrations # 0.165 K/sec
3,884,586 page-faults # 0.001 M/sec
5,927,930,305,757 cycles # 2.181 GHz (30.77%)
6,848,723,995,972 instructions # 1.16 insn per cycle (38.47%)
536,856,379,853 branches # 197.505 M/sec (38.47%)
32,245,288,271 branch-misses # 6.01% of all branches (38.48%)
1,935,640,517,821 L1-dcache-loads # 712.106 M/sec (38.47%)
177,978,528,204 L1-dcache-load-misses # 9.19% of all L1-dcache hits (38.49%)
8,119,842,688 LLC-loads # 2.987 M/sec (30.77%)
3,625,986,107 LLC-load-misses # 44.66% of all LL-cache hits (30.75%)
<not supported> L1-icache-loads
184,001,558,310 L1-icache-load-misses (30.76%)
1,934,701,161,746 dTLB-loads # 711.760 M/sec (30.74%)
676,618,636 dTLB-load-misses # 0.03% of all dTLB cache hits (30.76%)
6,275,901,454 iTLB-loads # 2.309 M/sec (30.78%)
391,706,425 iTLB-load-misses # 6.24% of all iTLB cache hits (30.78%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
93.139551411 seconds time elapsed
UPDATE:
It is confirmed that the performance gain comes from Linux kernel 5.4, because the performance on Linux kernel 5.3 is the same as on Linux kernel 4.9.
It is confirmed that the performance gain is unrelated to libc, because on Linux kernel 5.10, whose libc is 2.23, the performance is the same as on Linux kernel 5.4, whose libc is 2.31.
It seems the performance gain comes from this fix:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de53fd7aedb100f03e5d2231cfce0e4993282425

nvprof Warning: The path to CUPTI and CUDA Injection libraries might not be set in LD_LIBRARY_PATH

I get the message in the subject when I try to run a program I developed with OpenACC through Nvidia's nvprof profiler like this:
nvprof ./SFS 4
If I run nvprof with -o [output_file] the warning message doesn't appear, but the output file is not created. What could be wrong here?
LD_LIBRARY_PATH is set in my .bashrc to /opt/nvidia/hpc_sdk/Linux_x86_64/20.7/cuda/11.0/lib64/ because I have found these files there (they have "cupti" and "inj" in their names, and I thought they are the ones needed):
lrwxrwxrwx 1 root root 19 Aug 4 05:27 libaccinj64.so -> libaccinj64.so.11.0
lrwxrwxrwx 1 root root 23 Aug 4 05:27 libaccinj64.so.11.0 -> libaccinj64.so.11.0.194
...
lrwxrwxrwx 1 root root 16 Aug 4 05:27 libcupti.so -> libcupti.so.11.0
lrwxrwxrwx 1 root root 20 Aug 4 05:27 libcupti.so.11.0 -> libcupti.so.2020.1.0
...
I am on an Ubuntu 18.04 workstation with an NVIDIA GeForce RTX 2070 and have CUDA version 11 installed.
The nvidia-smi command gives me this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66 Driver Version: 450.66 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:02:00.0 On | N/A |
| 30% 40C P2 58W / 185W | 693MiB / 7981MiB | 3% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
The compilers I have (NVIDIA and Portland) are from the latest NVIDIA HPC SDK, version 20.7-0.
I compile my programs with the -acc -Minfo=accel options; I am not sure how I could set -ta=, or whether it is needed at all.
P.S. I am also not sure whether running my code, with or without nvprof, uses the GPU at all, although I did set ACC_DEVICE_TYPE to nvidia.
Any advice would be very welcome.
Cheers
Which nvprof are you using? The one that ships with NV HPC 20.7 or your own install?
This looks very similar to an issue reported yesterday on the NVIDIA DevTalk user forums:
https://forums.developer.nvidia.com/t/new-20-7-version-where-is-the-detail-release-bugfix/146168/4
Granted, this was for Nsight Systems, but it may be the same issue. It appears to be a problem with the 2020.3 version of the profilers, which is the version we ship with the NV HPC 20.7 SDK. As I note there, the Nsight Systems 2020.4 release should have this fixed, so the workaround would be to download and install 2020.4, or to use a prior release.
https://developer.nvidia.com/nsight-systems
There does seem to be a temporary issue with the Nsight Systems download, which will hopefully be corrected before you see this note.
Also, nvprof is in the process of being deprecated, so you should consider moving to Nsight Systems and Nsight Compute.
https://developer.nvidia.com/blog/migrating-nvidia-nsight-tools-nvvp-nvprof/
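On the LD_LIBRARY_PATH part of the question, one way to confirm that the dynamic loader can actually resolve CUPTI from the path set in .bashrc is to try loading the library by name from Python. This is only an illustrative sketch: libcupti.so is the file name from the listing in the question, and LD_LIBRARY_PATH has to be exported before the interpreter starts, since the loader picks it up at process startup:
import ctypes
import os

print("LD_LIBRARY_PATH = {}".format(os.environ.get("LD_LIBRARY_PATH")))
try:
    # A library name without a slash makes dlopen search LD_LIBRARY_PATH, among other places.
    ctypes.CDLL("libcupti.so")
    print("libcupti.so resolved successfully")
except OSError as err:
    print("Could not load libcupti.so: {}".format(err))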

How to get steam to run on Ubuntu 20.04

Steam won't run =( Here's what I've tried:
I have a fresh install of Ubuntu 20.04 (via Ubuntu Server Live Installer + ubuntu-desktop package) with nvidia drivers:
$ nvidia-smi
Mon Jun 22 10:26:49 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64 Driver Version: 440.64 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2070 Off | 00000000:01:00.0 On | N/A |
| 28% 31C P8 22W / 175W | 303MiB / 7981MiB | 2% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1542 G /usr/lib/xorg/Xorg 53MiB |
| 0 7835 G /usr/lib/xorg/Xorg 124MiB |
| 0 8086 G /usr/bin/gnome-shell 111MiB |
+-----------------------------------------------------------------------------+
Attempt 1: .deb
Download deb from https://store.steampowered.com/about/
$ sudo dpkg -i steam_latest.deb
$ steam
Steam needs to install these additional packages:
libgl1-mesa-dri:i386, libgl1:i386, libc6:i386
Enter the sudo password to install them, and it installs 49 *:i386 packages.
The "Updating Steam..." window pops up, downloads and runs stuff for a bit, and then:
CRASH!
[2020-06-22 17:00:18] Installing update...
[2020-06-22 17:00:19] Cleaning up...
[2020-06-22 17:00:19] Update complete, launching...
[2020-06-22 17:00:19] Shutdown
Restarting Steam by request...
Traceback (most recent call last):
File "/usr/bin/steamdeps", line 484, in <module>
sys.exit(main())
File "/usr/bin/steamdeps", line 460, in main
if dep.is_available():
File "/usr/bin/steamdeps", line 96, in is_available
return is_provided(self.name)
File "/usr/bin/steamdeps", line 68, in is_provided
(name, version) = provider.split()
ValueError: too many values to unpack (expected 2)
Running Steam on ubuntu 20.04 64-bit
STEAM_RUNTIME has been set by the user to: /home/username/.local/share/Steam/ubuntu12_32/steam-runtime
Found newer runtime version for 64-bit libGLU.so.1. Host: 1.3.1 Runtime: 1.3.8004
Found newer runtime version for 64-bit libdbusmenu-glib.so.4. Host: 4.0.12 Runtime: 4.0.13
Found newer runtime version for 64-bit libvulkan.so.1. Host: 1.2.131 Runtime: 1.2.135
Forced use of runtime version for 64-bit libcurl.so.4. Host: 4.6.0 Runtime: 4.2.0
Found newer runtime version for 32-bit libvulkan.so.1. Host: 1.2.131 Runtime: 1.2.135
Steam client's requirements are satisfied
/home/username/.local/share/Steam/ubuntu12_32/steam
[2020-06-22 17:00:34] Startup - updater built Jun 4 2020 05:50:42
Installing breakpad exception handler for appid(steam)/version(1591251555)
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
SteamUpdateUI: An X Error occurred
X Error of failed request: GLXBadContext
SteamUpdateUI: An X Error occurred
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 152 (GLX)
Minor opcode of failed request: 3 (X_GLXCreateContext)
Value in failed request: 0x0
Serial number of failed request: 51
xerror_handler: X failed, continuing
Major opcode of failed request: 152 (GLX)
Minor opcode of failed request: 6 (X_GLXIsDirect)
Serial number of failed request: 52
xerror_handler: X failed, continuing
Installing breakpad exception handler for appid(steam)/version(1591251555)
[2020-06-22 17:00:34] Verifying installation...
[2020-06-22 17:00:35] Verification complete
Loaded SDL version 2.0.13-5893924
Gtk-Message: Failed to load module "gail"
Gtk-Message: Failed to load module "atk-bridge"
(steam:32777): Gtk-WARNING **: Unable to locate theme engine in module_path: "adwaita",
/usr/share/themes/Yaru/gtk-2.0/main.rc:775: error: unexpected identifier `direction', expected character `}'
(steam:32777): Gtk-WARNING **: Unable to locate theme engine in module_path: "adwaita",
/usr/share/themes/Yaru/gtk-2.0/hacks.rc:28: error: invalid string constant "normal_entry", expected valid string constant
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
Steam: An X Error occurred
X Error of failed request: GLXBadContext
Major opcode of failed request: 152
Serial number of failed request: 64
xerror_handler: X failed, continuing
Steam: An X Error occurred
X Error of failed request: BadValue (integer parameter out of range for operation)
Major opcode of failed request: 152
Value in failed request: 0x0
Serial number of failed request: 63
xerror_handler: X failed, continuing
Steam: An X Error occurred
X Error of failed request: BadMatch (invalid parameter attributes)
Major opcode of failed request: 152
Serial number of failed request: 65
xerror_handler: X failed, continuing
assert_20200622170034_1.dmp[32831]: Uploading dump (out-of-process)
/tmp/dumps/assert_20200622170034_1.dmp
/home/username/.local/share/Steam/steam.sh: line 750: 32777 Segmentation fault (core dumped) $STEAM_DEBUGGER "$STEAMROOT/$STEAMEXEPATH" "$#"
Subsequent attempts to run steam result in the update window flashing and then the same crash.
Attempt 2: via the multiverse repo, per linuxconfig.org
$ sudo add-apt-repository multiverse
'multiverse' distribution component is already enabled for all sources.
$ sudo apt update
$ sudo apt install steam
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following additional packages will be installed:
steam-launcher
The following NEW packages will be installed:
steam:i386 steam-launcher
0 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 2,980 kB of archives.
After this operation, 3,163 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://repo.steampowered.com/steam precise/steam amd64 steam-launcher all 1:1.0.0.62 [2,972 kB]
Get:2 http://repo.steampowered.com/steam precise/steam i386 steam i386 1:1.0.0.62 [8,052 B]
Fetched 2,980 kB in 1s (3,294 kB/s)
Selecting previously unselected package steam-launcher.
(Reading database ... 158744 files and directories currently installed.)
Preparing to unpack .../steam-launcher_1%3a1.0.0.62_all.deb ...
Unpacking steam-launcher (1:1.0.0.62) ...
Selecting previously unselected package steam:i386.
Preparing to unpack .../steam_1%3a1.0.0.62_i386.deb ...
Unpacking steam:i386 (1:1.0.0.62) ...
Setting up steam-launcher (1:1.0.0.62) ...
Setting up steam:i386 (1:1.0.0.62) ...
Processing triggers for mime-support (3.64ubuntu1) ...
Processing triggers for hicolor-icon-theme (0.17-2) ...
Processing triggers for gnome-menus (3.36.0-1ubuntu1) ...
Processing triggers for man-db (2.9.1-1) ...
Processing triggers for desktop-file-utils (0.24-1ubuntu3) ...
$ steam
CRASH! Same errors as the first method.
I recently had the same issue, but found a fix. I hope this works for you.
Here is what I did:
Install Steam from the Steam website: https://store.steampowered.com/about/
Run this line in a terminal: STEAM_RUNTIME=0 steam
You should get output telling you the missing dependencies:
Running Steam on ubuntu 20.04 64-bit
STEAM_RUNTIME is disabled by the user
Error: You are missing the following 32-bit libraries, and Steam may not run:
libXtst.so.6
libXrandr.so.2
libXrender.so.1
libgobject-2.0.so.0
libglib-2.0.so.0
libgio-2.0.so.0
libgtk-x11-2.0.so.0
libpulse.so.0
libgdk_pixbuf-2.0.so.0
libva.so.2
libbz2.so.1.0
libvdpau.so.1
libva.so.2
libva-x11.so.2
Can't find 'steam-runtime-check-requirements', continuing anyway
/home/timothy/.local/share/Steam/ubuntu12_32/steam
Once you have the list of missing dependencies, run this line in the terminal for every missing dependency: sudo apt install (dependency name)
EXAMPLE: sudo apt install libXtst.so.6
("libXtst.so.6" was part of the list of dependencies that the terminal gave me.)
Once you have installed all those dependencies, Steam should open up.
Let Steam install what it needs and log in; it should work.
If you have any issues just leave a reply.
Other forums/communities where I got most of the idea from:
https://steamcommunity.com/app/221410/discussions/0/530645446314818582/
As @Helper Shoes mentioned, it is highly probable that you are missing 32-bit libraries.
Installing the following libraries made it work for me:
$ sudo dpkg --add-architecture i386
$ sudo apt update
$ sudo apt install libxtst6:i386 libxrandr2:i386 libgtk2.0-0:i386 libsm6:i386 libpulse0:i386 ffmpeg:i386

GPU out of memory error just by declaring TF Keras Metrics

I was recently moving my code from my local machine to a GPU-enabled server, and I'm running into a strange OOM error. By elimination, the problem seems to be TF Keras metrics. My code has now been reduced to:
import tensorflow as tf

METRICS = [
    tf.keras.metrics.Precision(name='precision')
]
...and yet I'm still encountering an OOM error. No other process is running. By the way, I'm doing this inside a Docker container (tensorflow/tensorflow:latest-gpu-py3), which might be the issue, but I can't find the proper parameters to change.
Would really appreciate your help!
Versions: Docker 17.12.1-ce, TF 2.1.0, Keras 2.3.1
Docker command:
docker run --runtime=nvidia -it --rm -v tensorflow/tensorflow:latest-gpu-py3 bash
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21 Driver Version: 435.21 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1080 Off | 00000000:17:00.0 Off | N/A |
| 54% 68C P2 137W / 200W | 7931MiB / 8119MiB | 81% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce GTX 1080 Off | 00000000:65:00.0 Off | N/A |
| 33% 34C P8 10W / 200W | 115MiB / 8119MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Whole error output is below:
2020-04-20 11:05:08.088874: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-04-20 11:05:08.090195: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
2020-04-20 11:05:08.747745: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-04-20 11:05:08.751503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-20 11:05:08.751905: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-20 11:05:08.751936: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-20 11:05:08.751963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-20 11:05:08.753290: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-20 11:05:08.753551: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-20 11:05:08.754983: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-20 11:05:08.755747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-20 11:05:08.755786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-20 11:05:08.757022: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-04-20 11:05:08.757267: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-04-20 11:05:08.782042: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-04-20 11:05:08.783237: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5dd24e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-04-20 11:05:08.783274: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-04-20 11:05:08.996600: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5e37ca0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-04-20 11:05:08.996653: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce GTX 1080, Compute Capability 6.1
2020-04-20 11:05:08.996670: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): GeForce GTX 1080, Compute Capability 6.1
2020-04-20 11:05:08.998089: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:17:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-20 11:05:08.999119: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 1 with properties:
pciBusID: 0000:65:00.0 name: GeForce GTX 1080 computeCapability: 6.1
coreClock: 1.7715GHz coreCount: 20 deviceMemorySize: 7.93GiB deviceMemoryBandwidth: 298.32GiB/s
2020-04-20 11:05:08.999175: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-20 11:05:08.999200: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-20 11:05:08.999241: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-04-20 11:05:08.999270: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-04-20 11:05:08.999298: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-04-20 11:05:08.999327: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-04-20 11:05:08.999359: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-04-20 11:05:09.004066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0, 1
2020-04-20 11:05:09.004172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-04-20 11:05:09.561399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-20 11:05:09.561437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0 1
2020-04-20 11:05:09.561442: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N Y
2020-04-20 11:05:09.561446: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 1: Y N
2020-04-20 11:05:09.562474: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 37 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080, pci bus id: 0000:17:00.0, compute capability: 6.1)
2020-04-20 11:05:09.563399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7460 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080, pci bus id: 0000:65:00.0, compute capability: 6.1)
2020-04-20 11:05:09.570968: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 37.56M (39387136 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-04-20 11:05:09.572125: I tensorflow/stream_executor/cuda/cuda_driver.cc:801] failed to allocate 33.81M (35448576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
Traceback (most recent call last):
File "sample.py", line 4, in <module>
tf.keras.metrics.Precision(name='precision')
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/metrics.py", line 1186, in __init__
initializer=init_ops.zeros_initializer)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/metrics.py", line 276, in add_weight
aggregation=aggregation)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 446, in add_weight
caching_device=caching_device)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/base.py", line 744, in _add_variable_with_custom_getter
**kwargs_for_getter)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 142, in make_variable
shape=variable_shape if variable_shape else None)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 258, in __call__
return cls._variable_v1_call(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 219, in _variable_v1_call
shape=shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 197, in <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variable_scope.py", line 2596, in default_variable_creator
shape=shape)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 262, in __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1411, in __init__
distribute_strategy=distribute_strategy)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1557, in _init_from_args
graph_mode=self._in_graph_mode)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 232, in eager_safe_variable_handle
shape, dtype, shared_name, name, graph_mode, initial_value)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 164, in _variable_handle_from_shape_and_dtype
math_ops.logical_not(exists), [exists], name="EagerVariableNameReuse")
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_logging_ops.py", line 55, in _assert
_ops.raise_from_not_ok_status(e, name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: assertion failed: [0] [Op:Assert] name: EagerVariableNameReuse

cudnn error :: CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

I am trying to install the open-source software "openpose", for which I needed to install CUDA, cuDNN and the NVIDIA drivers. The output of nvidia-smi is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce 940MX Off | 00000000:01:00.0 Off | N/A |
| N/A 47C P8 N/A / N/A | 107MiB / 2004MiB | 7% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1513 G /usr/lib/xorg/Xorg 63MiB |
| 0 1698 G /usr/bin/gnome-shell 41MiB |
+-----------------------------------------------------------------------------+
And output of cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2 gives:
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
After successfully installing all of the above software and libraries, I finally ran openpose
with:
./build/examples/openpose/openpose.bin --video examples/media/video.avi
But the output was:
Starting OpenPose demo...
Configuring OpenPose...
Starting thread(s)...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
F0214 01:02:35.327615 3433 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED
*** Check failure stack trace: ***
# 0x7fabb8f390cd google::LogMessage::Fail()
# 0x7fabb8f3af33 google::LogMessage::SendToLog()
# 0x7fabb8f38c28 google::LogMessage::Flush()
# 0x7fabb8f3b999 google::LogMessageFatal::~LogMessageFatal()
# 0x7fabb89459d3 caffe::CuDNNConvolutionLayer<>::LayerSetUp()
# 0x7fabb8a42308 caffe::Net<>::Init()
# 0x7fabb8a441e0 caffe::Net<>::Net()
# 0x7fabbaa2ccaa op::NetCaffe::initializationOnThread()
# 0x7fabbaa500a1 op::addCaffeNetOnThread()
# 0x7fabbaa51518 op::PoseExtractorCaffe::netInitializationOnThread()
# 0x7fabbaa57163 op::PoseExtractorNet::initializationOnThread()
# 0x7fabbaa4be61 op::PoseExtractor::initializationOnThread()
# 0x7fabbaa46a51 op::WPoseExtractor<>::initializationOnThread()
# 0x7fabbaa8aff1 op::Worker<>::initializationOnThreadNoException()
# 0x7fabbaa8b120 op::SubThread<>::initializationOnThread()
# 0x7fabbaa8d2d8 op::Thread<>::initializationOnThread()
# 0x7fabbaa8d4a7 op::Thread<>::threadFunction()
# 0x7fabba32566f (unknown)
# 0x7fabb9a476db start_thread
# 0x7fabb9d8088f clone
Aborted
I have gone through a lot of online discussions but could not figure out how to resolve this.
I have been having the same problem with cuDNN.
Although not ideal, I have been running without cuDNN: in cmake-gui, uncheck USE_CUDNN and then compile. When running openpose I have also had to reduce -net_resolution.
For example: ./build/examples/openpose/openpose.bin -net_resolution 256x192
The greater the resolution, the slower the FPS, though.
