NVIDIA-SMI failed. Couldn't communicate with Nvidia driver [duplicate] - driver

I am running a cloud instance on a GPU node. I installed CUDA, and nvidia-smi showed the driver details and memory utilization. After a couple of days, I started getting this error:
"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running".
I installed the latest driver (Nvidia-375.39 for Tesla M40 GPUs), but I still face the same issue. Is there any way to
i) debug why nvidia-smi is not able to communicate with the driver?
ii) check if the driver is running properly?

This is an operating system issue. The solution will depend on your operating system. For example, if you are running Ubuntu 16 the solution might be something like this:
Uninstall / purge all Nvidia drivers
sudo apt-get remove --purge 'nvidia*' && sudo apt autoremove
Download Nvidia driver from Nvidia's website (.run file)

I ran into the same problem and solved it by changing a security option: reboot the system, enter the BIOS, set the Secure Boot option to disabled, then reboot again. After that it worked.
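For the two things asked in the question (why nvidia-smi cannot reach the driver, and whether the driver is running), a minimal shell sketch of some first checks might look like the following. It assumes a Linux host; the helper name nvidia_module_loaded is made up for illustration, and mokutil may not be installed on every system.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a module named "nvidia" is listed.
# The optional argument (default /proc/modules) only exists so the
# function can be pointed at another file.
nvidia_module_loaded() {
    grep -q '^nvidia ' "${1:-/proc/modules}" 2>/dev/null
}

if nvidia_module_loaded; then
    echo "nvidia kernel module: loaded"
    # The driver exposes its version here once the module is loaded.
    cat /proc/driver/nvidia/version 2>/dev/null || true
else
    echo "nvidia kernel module: NOT loaded"
fi

# Secure Boot can prevent an unsigned kernel module from loading.
if command -v mokutil >/dev/null 2>&1; then
    mokutil --sb-state 2>/dev/null || echo "Secure Boot state: unavailable"
fi

# Kernel log lines that mention the driver (may need root on some systems).
dmesg 2>/dev/null | grep -i -E 'nvidia|nvrm' || echo "no NVIDIA lines found in dmesg"
```

If the module is not loaded and Secure Boot is reported as enabled, that would be consistent with the Secure Boot answer above; a module that stopped loading after a kernel update points more toward reinstalling the driver.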

Related

Is it possible to install docker on Sparc machines?

I have an M5000 SPARC server on which I have installed Solaris 11.3 (SunOS RT5 5.11 11.3 sun4u sparc SUNW,SPARC-Enterprise). Is it possible to install docker-ce on this machine? I have tried some workarounds, such as using the Moby Project (open-source Docker), but none of them helped. What is the solution for dockerization on SPARC systems?
There is currently no support for Docker on Solaris. Oracle's container solution for Solaris has been Oracle Solaris Zones. According to the Docker community forums, there has been talk of supporting Docker on Solaris, but there hasn't been any recent update on the actual timeline or roadmap. Please check out the Docker Community Forum thread.

How did Docker know to emulate arm architecture?

This was a huge surprise for me:
Today, using Docker For Mac (18.03.1-ce-mac65), I ran a Debian Stretch image. Inside the image I mounted the latest Raspbian Stretch image (2018-04-18-raspbian-stretch-lite) using mount. I then used chroot to this mounted Raspbian filesystem.
This is where it got weird. I was able to use apt (without any special modifications) to install software into this mounted filesystem.
Running:
dpkg --print-architecture
returned: armhf
and the software I installed (vim) worked like a charm.
I was even able to compile a simple program using gcc and run it.
But, I need to know! How is this possible?
According to Docker:
Docker for Mac provides binfmt_misc multi architecture support, so you can run containers for different Linux architectures, such as arm, mips, ppc64le, and even s390x.
EDIT
On Linux, you can install qemu-user-static and then follow this git repo to get cross-architecture support!
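To see which foreign-architecture handlers the kernel has registered through binfmt_misc, something like the following could be run on a Linux host (or inside the Docker VM). This is only a sketch: list_binfmt_handlers is a hypothetical helper, and the handler names that show up (qemu-arm is a common one) depend on how they were registered.

```shell
#!/bin/sh
# Hypothetical helper: print the names of registered binfmt_misc handlers.
# The optional argument (default /proc/sys/fs/binfmt_misc) only exists so
# the function can be pointed at another directory.
list_binfmt_handlers() {
    dir="${1:-/proc/sys/fs/binfmt_misc}"
    for entry in "$dir"/*; do
        [ -f "$entry" ] || continue
        name=${entry##*/}
        case "$name" in
            # "register" and "status" are control files, not handlers.
            register|status) continue ;;
        esac
        echo "$name"
    done
    return 0
}

list_binfmt_handlers
```

Reading one of the listed entries shows the interpreter the kernel hands matching binaries to, which is how an ARM executable ends up being run by a QEMU user-mode emulator without any change to the chroot.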

Is it possible to install Docker container in Windows -32 bit systems?

I am getting an error saying Docker needs a 64-bit configuration. Is there any way I can install Docker on a 32-bit system? It's a Windows 7 machine with a 32-bit i3 processor.
According to the Docker documentation, you can install it only on a 64-bit machine.
But there is another option if you don't have a 64-bit machine. Just use the URL below to play with Docker (it's the official site); it works like a charm:
Docker play
Edit 1: Please note that this is for practice purposes only. Once your session is over, your work will be lost.
Edit 2: I have found a very interesting site for practicing Docker: kodekloud
The first step to getting this whole setup to work is installing Oracle’s VirtualBox on the host system. Once the installation is complete, installing docker-machine is as simple as running the following in an Administrative PowerShell session:
choco install docker-machine -y
docker-machine create --driver virtualbox default
docker-machine env | Invoke-Expression
For more details, you can follow this blog.
No, this is not possible with current versions of Docker.
This was possible with early versions of Docker (boot2docker 32-bit ISO), but the project is closed and thoroughly killed.

Docker hanging requiring reboot

We are running docker 1.7.1, build 786b29d on RHEL 6.7. Recently we have had multiple times when the docker daemon locked up and we had to reboot the machine to get it back.
A typical scenario is that a container that has been running fine for weeks suddenly starts throwing errors. Sometimes we can restart the container and all is well. But other times all docker commands hang, restarting the daemon fails, and I see this in ps:
4 Z root 4895 1 0 80 0 - 0 exit Aug23 ? 00:01:24 [docker]
Looking in the system log I've seen this:
device-mapper: ioctl: unable to remove open device docker-253:6-1048578-317bb6ad40cded3fbfd752d95551861c2e4ef08dffc1186853fea0e85da6b12b
INFO: task docker:16676 blocked for more than 120 seconds.
Not tainted 2.6.32-573.12.1.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
docker D 000000000000000b 0 16676 1 0x00000080
ffff88035ef13ea8 0000000000000082 ffff88035ef13e70 ffff88035ef13e6c
ffff88035ef13e28 ffff88062fc29a00 0000376c85170937 ffff8800283759c0
0000000000000400 00000001039d40c7 ffff8803000445f8 ffff88035ef13fd8
Call Trace:
[] __mutex_lock_slowpath+0x96/0x210
[] ? wake_up_process+0x15/0x20
[] mutex_lock+0x2b/0x50
[] sync_filesystems+0x26/0x150
[] sys_sync+0x17/0x40
[] system_call_fastpath+0x16/0x1b
The latest docker version is 1.12.1 and we are on 1.7.1. Can or should I install a new version? 1.7.1 is the version yum installs. If I did want a new version how would I install that (sorry if that is a dumb question, I am not a sys admin).
Googling, I found this on a Red Hat site: "Red Hat does not recommend running any version of Docker on any RHEL 6 releases." We have been running docker on RHEL 6 for a few years, so this confuses me. Upgrading to RHEL 7 is not really an option for us right now.
Can anyone shed any light on these issues? We need docker to work reliably without having to reboot often.
Docker 1.7.1 is really old by today's standards. There have been hundreds of bugs fixed, enhancements to driver stacks, security patches, and valuable features added in the versions since. It looks like you're having an issue with your storage stack, and there is a good chance this is fixed in a newer version.
Docker has stated that default versions in package management systems like yum and apt can be way out of date, and that you should use their repo. The best way to do this is add their Yum repo information to your system so you can install it like other packages. The instructions are here: Installation on Red Hat Enterprise Linux.
Note: This will allow you to install Docker, and the service will be called docker, but the package is docker-engine. This has confused some people in the past.
yum install docker-engine
Docker has also provided a script that does this to make things easier (run as admin/root):
curl -fsSL https://get.docker.com/ | sh
Don't use a RHEL6 based system.
RHEL6 uses a 2.6 kernel with backported fixes to keep Docker working. Docker would normally require a 3.10+ kernel. Docker dropped support for RHEL6 from v1.8 on so it's unlikely there will be any more packages for it.
If you must use RHEL6, don't use the default loopback devicemapper setup for storage. Set up an LVM thin pool for Docker to use.
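The kernel requirement mentioned above can be checked with a small script. This is only a sketch: kernel_meets_minimum is a hypothetical helper, and it compares just the major and minor numbers of the release string against 3.10.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given kernel release string
# (as printed by `uname -r`) is version 3.10 or newer.
kernel_meets_minimum() {
    release="$1"
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%.*}           # text up to the next dot
    minor=${minor%%[!0-9]*}     # drop any non-digit suffix
    if [ "$major" -gt 3 ]; then
        return 0
    fi
    if [ "$major" -eq 3 ] && [ "${minor:-0}" -ge 10 ]; then
        return 0
    fi
    return 1
}

if kernel_meets_minimum "$(uname -r)"; then
    echo "kernel $(uname -r): meets the 3.10+ requirement"
else
    echo "kernel $(uname -r): older than 3.10"
fi
```

For the kernel in the question, kernel_meets_minimum "2.6.32-573.12.1.el6.x86_64" fails, which matches the point that RHEL6 only runs Docker thanks to backported fixes.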

Getting Docker to recognize nvidia graphics card on mac

When I am in my container, I run
lspci | grep -i nvidia
and nothing shows.
When I run ./deviceQuery from the samples NVIDIA provides I get
no CUDA-capable device is detected
I know I have a nvidia driver on my mac. I just can't figure out how to get my docker container to realize that.
On OS X, Docker runs inside a separate VirtualBox VM, which does not expose the host GPU.
You'll first need to make the graphics card available in the VirtualBox VM. I'm not sure how to do that, but this looks like it might help:
https://www.virtualbox.org/manual/ch04.html#guestadd-video
Once you've got it mounted within the VM, then you can also share it with the container.
I haven't tried this myself, but this guy says that he can run native X11 apps on a Mac using a beta Docker client called Kitematic along with socat, XQuartz, and QGIS, and he seems to imply that NVIDIA driver issues were thus avoided. This looks worth a try!
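A quick way to confirm from inside the container whether any NVIDIA device is visible at all is to look for the device nodes that the Linux driver normally creates (/dev/nvidiactl, /dev/nvidia0, and so on). A minimal sketch, with count_nvidia_nodes as a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: count entries named nvidia* in a directory.
# The optional argument (default /dev) only exists so the function
# can be pointed at another directory.
count_nvidia_nodes() {
    dir="${1:-/dev}"
    count=0
    for node in "$dir"/nvidia*; do
        if [ -e "$node" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

found=$(count_nvidia_nodes)
if [ "$found" -eq 0 ]; then
    echo "no NVIDIA device nodes under /dev: the GPU is not exposed here"
else
    echo "$found NVIDIA device node(s) found under /dev"
fi
```

If this reports zero nodes inside the VM as well as inside the container, the GPU never reached the VM, so no container setting alone will make deviceQuery find it.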
