NVML: Driver/library version mismatch - nvidia

I don't know why nvidia-smi doesn't work. What do I need to do to fix it?
I think my library and driver versions match, but nvidia-smi doesn't recognize them.

I was facing the same problem, and I'm posting my solution here.
In my case the NVRM version was 440.100 and the driver version was 460.32.03. The driver had been updated by sudo apt install caffe-cuda; I didn't notice at the time, but I found it later in /var/log/apt/history.log.
Following my NVRM version, I ran sudo apt install nvidia-driver-440, but it installed 450.102 instead (I don't know why it picked a different version), and nvidia-smi now shows 450.102.04.
Anyhow, after rebooting my PC everything, including CUDA, is working fine now.
I didn't remove/purge anything related to the NVIDIA driver. Version 460.32.03 was uninstalled automatically by running sudo apt install nvidia-driver-440.
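In case it helps, here is a rough sketch of the checks described above (standard Ubuntu paths; adjust if your apt log has been rotated):
cat /proc/driver/nvidia/version          # kernel-side (NVRM) module version
dpkg -l | grep nvidia-driver             # installed userspace driver package
grep -i nvidia /var/log/apt/history.log  # what apt changed, and when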

For me, this solution from the NVIDIA forums solved the issue.
Run sudo apt purge nvidia* libnvidia*
Then sudo apt install nvidia-driver-520
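Putting the two together, roughly the whole sequence looks like this (quoting the globs keeps the shell from expanding them against files in the current directory; the reboot mirrors the other answers here):
sudo apt purge 'nvidia*' 'libnvidia*'
sudo apt install nvidia-driver-520
sudo reboot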

Related

Installing Specific Docker Version via Puppet

While deploying Docker using Puppet, I ran into an interesting issue.
Docker installs fine if I use: version => latest
Docker install fails if I use: version => '20.10.16'
My setup is as follows:
The Puppet master is Ubuntu 20.04.
The Puppet agent is Ubuntu 22.04 (the machine on which I am trying to install Docker).
I believe the Puppet docker module supports this setup.
The version lines I tried:
version => '20.10.16'
version => '20.10.16~3-0~ubuntu'
version => '20.10.16~3-0~ubuntu-jammy'
The error I get when I specify a specific version is as follows:
Error: Could not update: Execution of '/usr/bin/apt-get -q -y -o DPkg::Options::=--force-confold --force-yes install docker-ce=20.10.16' returned 100: Reading package lists...
Building dependency tree...
Reading state information...
Package docker-ce is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
However the following packages replace it:
docker-ce-cli
Does anyone have any idea what can be done so that it installs a specific version of Docker instead of the latest one?
When running apt-cache madison docker-ce it appears that the version number is 5:20.10.16~3-0~ubuntu-jammy.
Running apt install docker-ce=20.10.16~3-0~ubuntu-jammy returns the same error as yours, but apt install docker-ce=5:20.10.16~3-0~ubuntu-jammy works.
I suggest trying with the 5: in front of the version number.
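If you want to double-check the exact string before putting it into the Puppet manifest, something like this on the agent should confirm it without changing anything (the version shown is the one from this question; yours may differ):
apt-cache madison docker-ce                                            # lists full versions, including the epoch prefix
sudo apt-get install --dry-run docker-ce=5:20.10.16~3-0~ubuntu-jammy   # simulate only, no changes made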

Nvidia-smi command not found - nvidia drivers installed

~$ nvidia-smi
nvidia-smi: command not found
~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:07:04_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148
It worked before. I made changes to GRUB using the command
gksu gedit /etc/default/grub
but reverted them. (I don't know if that could have had an effect.)
Reason
The most common reason for any “command not found” error is that the software being invoked is not installed on the system. That is what happens here: the nvidia-smi tool itself is missing, so the error “nvidia-smi: command not found” is thrown.
Solution
The best way to resolve this error is to install the “nvidia-utils” package, which also contains the “nvidia-smi” tool. To install this package, run the following command in a terminal:
$ sudo apt install nvidia-utils-515
I got the detailed solution from this post: https://itslinuxfoss.com/fix-nvidia-smi-command-not-found-error/
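Note that the nvidia-utils version should match the driver branch on the system; a quick way to check (the 515 here is just the example from above, substitute your branch):
dpkg -l | grep nvidia-driver        # which driver branch is installed, if any
apt-cache search ^nvidia-utils      # list the available nvidia-utils-NNN packages
sudo apt install nvidia-utils-515   # pick the one matching your driver branch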

Where is docker-ee-selinux-17.06.1.ee.1-1.el7.noarch.rpm

Docker has just released the 17.06 version of docker-ee, but there is no selinux rpm among the trial packages in /rhel/7.3/x86_64/stable-17.06/Packages/, and yum install docker-ee-17.06.1.ee.1-1.el7.rhel.x86_64.rpm failed because selinux is required.
Does anybody know where to find it? Thanks.
Got it!
docker-ee 17.06 uses container-selinux, so run yum install container-selinux-2.19-2.1.el7.noarch.rpm.
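For completeness, assuming both RPMs have been downloaded into the current directory, the install order would be something like:
sudo yum install -y container-selinux-2.19-2.1.el7.noarch.rpm        # SELinux policy first
sudo yum install -y docker-ee-17.06.1.ee.1-1.el7.rhel.x86_64.rpm     # then the engine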

Install openCV with CUDA toolkit 8.0

I'm trying to install OpenCV 3.2.0 and the NVIDIA CUDA toolkit 8.0 on Ubuntu 16.04, but I can't configure them together. I get the following error when I try to make a project using both:
CMake Error at /usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
Could NOT find CUDA: Found unsuitable version "8.0", but required is exact
version "7.5" (found /usr/local/cuda)
Call Stack (most recent call first):
/usr/share/cmake-3.5/Modules/FindPackageHandleStandardArgs.cmake:386 (_FPHSA_FAILURE_MESSAGE)
/usr/share/cmake-3.5/Modules/FindCUDA.cmake:949 (find_package_handle_standard_args)
/usr/local/share/OpenCV/OpenCVConfig.cmake:86 (find_package)
/usr/local/share/OpenCV/OpenCVConfig.cmake:105 (find_host_package)
CMakeLists.txt:10 (find_package)
-- Configuring incomplete, errors occurred!
I have tried installing CUDA toolkit 7.5, but I believe it's not compatible with Ubuntu 16.04. I'm really clueless now; I hope someone can help with this.
Thanks
So I solved this issue by managing to install toolkit 7.5. Here is how I did it:
Updated the NVIDIA driver for my operating system
Downloaded CUDA toolkit 7.5 and extracted it to a folder
$ mkdir ~/Downloads/NVIDIA_TOOLKIT
$ cd ~/Downloads
$ ./cuda_7.5.18_linux.run -extract="$HOME/Downloads/NVIDIA_TOOLKIT"
Switch to a virtual console by pressing Ctrl + Alt + F1 and stop the lightdm service
$ sudo service lightdm stop
cd into the extraction folder and install the extracted toolkit and samples
$ cd ~/Downloads/NVIDIA_TOOLKIT
$ sudo ./cuda-linux64-rel-6.0.37-18176142.run
$ sudo ./cuda-samples-linux-6.0.37-18176142.run
Set the environment variables in your .bashrc file (append to the existing values so the defaults are not clobbered):
export PATH=/usr/local/cuda-7.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
Turn back on the lightdm service
$ sudo service lightdm start
Reboot and you should be able to use the nvcc compiler
For OpenCV you will have to downgrade your gcc/g++ compiler to 4.9, since CUDA 7.5 is not compatible with newer compiler versions
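A rough sketch of that last step, assuming gcc-4.9/g++-4.9 are installable from your repositories (package names may vary):
$ sudo apt-get install gcc-4.9 g++-4.9
# when configuring the project (or an OpenCV rebuild), point CMake at the older compilers
$ cmake -D CMAKE_C_COMPILER=/usr/bin/gcc-4.9 -D CMAKE_CXX_COMPILER=/usr/bin/g++-4.9 ..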

How to uninstall Cuda7.5 from ubuntu?

I have an NVIDIA Jetson TX1 board. I want to install Caffe on it. Based on the Caffe prerequisites I installed the CUDA toolkit from https://developer.nvidia.com/cuda-downloads. Later I found that this board has its own installation instructions. It needs 10 GB of space, which I do not have, since I have given some to the Caffe prerequisite installations.
Now I need to remove this CUDA toolkit completely.
I have not found a sure way so far. Can you please help me?
I am using Ubuntu 14.04 + NVIDIA Jetson TX1
If you installed CUDA 7.5 using the .run :
From the manual:
4.6. Uninstallation
To uninstall the CUDA Toolkit, run the uninstallation script provided in the bin directory of the toolkit. By default, it is located in /usr/local/cuda-7.5/bin:
$ sudo /usr/local/cuda-7.5/bin/uninstall_cuda_7.5.pl
To uninstall the NVIDIA Driver, run nvidia-uninstall:
$ sudo /usr/bin/nvidia-uninstall
If you installed CUDA 7.5 using the .deb package:
$ sudo apt-get purge cuda-7.5
(I think the package name is cuda-7.5, if it does not work, try with cuda-7-5 or just cuda)
Try:
sudo apt-get --purge -y remove 'cuda*'
sudo apt-get --purge -y remove 'nvidia*'
sudo reboot
This removes any installed CUDA and NVIDIA packages; you can then install any specific version you like from:
https://developer.nvidia.com/cuda-toolkit-archive.
To add to mhaghighat's answer, you can do this:
sudo apt purge -y '*cuda*'
sudo apt purge -y '*cudnn*'
reboot
Since you only asked about removing CUDA, I assume you don't need to reinstall the NVIDIA driver, so there is no need to remove that. Beware: purge is a powerful command; use it with caution.
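If you want to see what those wildcards will actually match before purging, a quick check first:
dpkg -l | grep -iE 'cuda|cudnn'       # list installed CUDA/cuDNN packages
sudo apt purge -y '*cuda*' '*cudnn*'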
