nvidia-smi executable file not found - docker

I have gone through three different issues in the nvidia-docker repo about this exact problem but couldn't figure out what's wrong.
I'm a heavy docker user, but I don't understand much of the terminology and solutions used in those issues.
When I run nvidia-smi as sudo or not, everything works great and I get the standard output.
My nvidia-docker-plugin is up and running, and I get these messages when I run nvidia-docker run --rm nvidia/cuda nvidia-smi:
nvidia-docker-plugin | 2017/11/04 09:14:18 Received mount request for volume 'nvidia_driver_387.22'
nvidia-docker-plugin | 2017/11/04 09:14:18 Received unmount request for volume 'nvidia_driver_387.22'
I also tried the deepo repository but can't get it to work: all my containers exit upon starting, and nvidia-docker run --rm nvidia/cuda nvidia-smi outputs the error:
container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH"
/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH".
What am I doing wrong?
I run Fedora 26, if it makes any difference.

On Ubuntu, you should install the nvidia-modprobe package; I understand it also exists in Fedora. For some reason, this dependency is neither declared as required nor documented.
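A quick way to check the dependency mentioned above is a guarded sketch like the following; it prints an install hint rather than installing anything (which would need root). The package name nvidia-modprobe is the same on Ubuntu (apt) and Fedora (dnf).

```shell
# Check for the undocumented nvidia-modprobe dependency and print an install
# hint for the current distro; nothing is installed by this snippet.
if command -v nvidia-modprobe >/dev/null 2>&1; then
  echo "nvidia-modprobe: present" > /tmp/modprobe_check.txt
else
  echo "nvidia-modprobe: missing; try 'sudo dnf install nvidia-modprobe' (Fedora) or 'sudo apt-get install nvidia-modprobe' (Ubuntu)" > /tmp/modprobe_check.txt
fi
cat /tmp/modprobe_check.txt
```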

I've just solved this.
Removing the volume related to nvidia-docker-plugin solved the issue.
For future readers: read the log messages from your nvidia-docker-plugin, look for the mount/unmount lines, and use the following command to remove the volume:
docker volume rm -f <volume_to_remove>, where volume_to_remove should be something like nvidia_driver_387.22 (which matched my case).
It seems the mapping for the nvidia-smi binary is set up when the volume is created, and removing and reattaching the volume fixes this.
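The fix above can be sketched as a short guarded script. The volume name is the one from the log lines earlier in this question; substitute whatever your nvidia-docker-plugin logs report. It is guarded so it degrades gracefully on a machine without docker.

```shell
# Remove the stale driver volume; nvidia-docker-plugin recreates it on the
# next run. VOL is taken from the plugin's mount/unmount log lines above.
VOL=nvidia_driver_387.22
LOG=/tmp/nvidia_vol_cleanup.log
: > "$LOG"
if command -v docker >/dev/null 2>&1; then
  docker volume rm -f "$VOL" >>"$LOG" 2>&1 || true
  echo "removed $VOL (if it existed); now rerun: nvidia-docker run --rm nvidia/cuda nvidia-smi" >>"$LOG"
else
  echo "docker not installed; would run: docker volume rm -f $VOL" >>"$LOG"
fi
cat "$LOG"
```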

Related

docker: Error response from daemon: OCI runtime create failed: invalid mount

I am trying to create an Anonymous Volume with docker but it throws this error:
docker: Error response from daemon: OCI runtime create failed: invalid mount {Destination:C:/Program Files/Git/app/node_modules Type:bind Source:/var/lib/docker/volumes/51c96f13f0232b1d052a91fdb0d8ed60881420ee214aa46ae85e16dfa4bbece0/_data Options:[rbind]}: mount destination C:/Program Files/Git/app/node_modules not absolute: unknown.
I came across this issue today while running Hyperledger Fabric on Windows; it seems to be a mounting issue. The error went away when I ran the command below in Git Bash:
export MSYS_NO_PATHCONV=1
First, open a command prompt (PowerShell/cmd) in your working directory (if feedback is your working directory, open the prompt there) and then, on Windows, use:
docker run -p (localhost port):(container port) -v %cd%:/app
That is, use %cd% instead of $(pwd); it worked for me. I think the issue is with Git Bash, which gave me the exact same error on the exact same problem just now, so use PowerShell instead.
I found a permanent fix for Windows. Docker volumes have a known issue on Windows (https://forums.docker.com/t/volume-mounts-in-windows-does-not-work/10693/7), so we can use:
docker run -p 8080:3000 -v /app/node_modules -v //d/Desktop/Docker/react-app/front-end:/app
If your files are on the C drive, use //c/Desktop/Docker/react-app/front-end:/app instead of //d/Desktop/Docker/react-app/front-end:/app.
And remember to use PowerShell, not Git Bash, which also has an issue here.
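A small sketch of the fix above that just prints the full command so the paths can be double-checked before running. The host path and the image name "my-react-app" are assumptions; adjust the drive letter, folders, and image to your setup.

```shell
# Print (not run) the bind-mount command with an anonymous volume shielding
# the container's node_modules. HOST_SRC and "my-react-app" are placeholders.
HOST_SRC="//d/Desktop/Docker/react-app/front-end"   # use //c/... if the code lives on C:
CMD="docker run -p 8080:3000 -v /app/node_modules -v $HOST_SRC:/app my-react-app"
echo "$CMD" | tee /tmp/react_run_cmd.txt
```

The anonymous volume (-v /app/node_modules) keeps the bind mount from hiding the node_modules directory built into the image.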
If you're using a React application, make sure you have a .env file with CHOKIDAR_USEPOLLING=true.
Also, be sure you are running the docker command in the same folder where the docker-compose.yml lives. Otherwise, the docker command may execute fine until it reaches the point where it loads the volume, and resolving the relative path can then throw the above error.

"ERRO[0002] error waiting for container: context canceled" issue with sshfs plugin

I have two Ubuntu 18.04 servers running. I installed sshfs plugin using the command -
docker plugin install --grant-all-permissions vieux/sshfs
Created a volume -
docker volume create --driver vieux/sshfs -o sshcmd=<user>#<ip>:/home/<user>/test -o password=<passsword> sshvolume
Now when I try to mount the volume using any method (e.g. docker run --rm -v sshvolume:/test busybox touch /test/boom), I get this error -
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"rootfs_linux.go:58: mounting \\\"/var/lib/docker/231072.231072/plugins/81f1f27082956d94e7f28a862687bec7d52cb25de49ecb43859d1a006710d0ec/propagated-mount/456f85ae480f26df582b897cb955d44e\\\" to rootfs \\\"/var/lib/docker/231072.231072/overlay2/ddf7de72af831594b09f7e09d5ff314877d6df22629cde6ae4a7e1e00b16f525/merged\\\" at \\\"/test\\\" caused \\\"stat /var/lib/docker/231072.231072/plugins/81f1f27082956d94e7f28a862687bec7d52cb25de49ecb43859d1a006710d0ec/propagated-mount/456f85ae480f26df582b897cb955d44e: permission denied\\\"\"": unknown.
ERRO[0002] error waiting for container: context canceled
Does anyone have any idea what might be going wrong here?
I couldn't find the exact reason behind this issue. But, here's how I fixed it (NOT A SOLUTION) -
I backed up all the necessary volumes, and config files.
Purged docker-ce, docker-ce-cli and containerd.io
Removed any leftover files/directories (use find / -iname docker to find them)
Rebooted
Reinstalled the previous packages
Restored all my backups
If you still face the issue, try inspecting the files you backed up previously. If anyone knows the exact reason, or has any idea why the error might've surfaced, please let me know.
Again, THIS IS NOT A VIABLE SOLUTION. If anything, it's a workaround.
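The backup step in the list above can be sketched as follows: tar a named volume's contents out through a throwaway busybox container. The volume name "sshvolume" and the backup directory are assumed names from this question, and the snippet is guarded so it degrades gracefully on a machine without a docker daemon.

```shell
# Back up a named volume to a tarball before purging Docker. "sshvolume" and
# BACKUP_DIR are example names; substitute your own volumes and paths.
VOL=sshvolume
BACKUP_DIR=/tmp/docker-volume-backups
mkdir -p "$BACKUP_DIR"
if command -v docker >/dev/null 2>&1; then
  docker run --rm -v "$VOL":/from -v "$BACKUP_DIR":/to busybox \
    tar czf "/to/$VOL.tar.gz" -C /from . 2>"$BACKUP_DIR/errors.log" || true
else
  echo "docker unavailable; backup skipped" > "$BACKUP_DIR/errors.log"
fi
ls "$BACKUP_DIR"
```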

docker: permissions error for mounted volume

I've built a docker image from a Dockerfile available on Github. I'm running it with the recommended command:
docker run -i --rm -v </my/local/path>:</path_inside_container>:ro -v </another/local/path>:</another_path_inside_container>:ro <image_name> <...more arguments...>
However, I immediately get the below error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"/path_inside_container\": permission denied": unknown.
I've looked online for a solution, but most posts involve adjusting the permissions of mounted volumes. The closest match I found is this, where the problem was with a script and was solved by setting execute permissions on the script with chmod. My case is different, but I tried setting the permissions of the local directory to 777 with chmod just in case; this did not work. The problem is with the directory created inside the container.
If anyone has a suggestion regarding what could be wrong, or can point me to resources that explain where/how the permissions for mounted volumes inside containers are set, it would be a big help. Thank you!

docker cannot start container: "permission denied"

Upon starting a docker container, I get the following error:
standard_init_linux.go:175: exec user process caused "permission denied"
sudo does not fix it, and I have all permissions.
docker-compose only shows the container crashing in the same way.
I use Linux, and the Dockerfile is on a CIFS share; starting from a locally mounted drive, everything works.
As hinted at here, the filesystem is mounted noexec, i.e. executing scripts or binaries from it is not allowed. You can test this by finding a shell script, checking that it has the execute bit set with ls -l, and then trying to run it. Looking at the mount parameters can also reveal the problem:
//nas.local/home on /cifs/h type cifs ( <lots of options omitted> , noexec)
Interestingly, the command that mounted the share did not explicitly request noexec; the mount came out this way anyway. Adding -o exec to the mount command and remounting fixed it.
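The check-and-remount described above can be sketched like this. The mount point /cifs/h is the one from the answer and will need adjusting, and the remount needs root.

```shell
# If the mount point appears with noexec in /proc/mounts, try remounting it
# with exec. MP is the example mount point from the answer above.
MP=/cifs/h
if grep -qs " $MP .*noexec" /proc/mounts; then
  sudo mount -o remount,exec "$MP" || true
  echo "remount attempted for $MP" > /tmp/remount_status.txt
else
  echo "$MP is not mounted noexec here" > /tmp/remount_status.txt
fi
cat /tmp/remount_status.txt
```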
I solved the issue by changing the top line in the train file.
It was:
#!/usr/bin/env python
I changed it to:
#!/usr/bin/env python3.5
to match the version of Python I had installed.
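A tiny reproduction of the shebang fix above: the interpreter named on the script's first line must exist in PATH, or exec fails. Here python3 stands in for the python3.5 mentioned in the answer, and the script path is a throwaway example.

```shell
# Write a minimal executable script whose shebang points at an interpreter
# that actually exists, then run it.
cat > /tmp/train_shebang_demo <<'EOF'
#!/usr/bin/env python3
print("train stub ok")
EOF
chmod +x /tmp/train_shebang_demo
/tmp/train_shebang_demo
```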

Cloudera quickstart docker: unable to run/start the container

I am using a Windows 10 machine with Docker for Windows and pulled the cloudera-quickstart:latest image. While trying to run it, I get the error below.
Can someone please suggest a fix?
docker: Error response from daemon: oci runtime error: container_linux.go:262: starting container process caused "exec: \"/usr/bin/docker-quickstart\": stat /usr/bin/docker-quickstart: no such file or directory"
my run command:
docker run --hostname=quickstart.cloudera --privileged=true -t -i cloudera/quickstart /usr/bin/docker-quickstart
The issue was that I had downloaded the image tarball separately and created the image with the commands below, which is not supported in Cloudera 5.10 and above.
tar xzf cloudera-quickstart-vm-*-docker.tar.gz
docker import - cloudera/quickstart:latest < cloudera-quickstart-vm-*-docker/*.tar
So I finally removed the docker image and then pulled it properly:
docker pull cloudera/quickstart:latest
Now docker is properly up and running.
If you downloaded the CDH v5.13 docker image, the issue is most likely due to the structure of the image archive; in my case, I found it to be clouder*.tar.gz > cloudera*.tar > cloudera*.tar! It seems the packaging was done wrong, and the official documentation doesn't capture this either :( In that case, just perform one more level of extraction to get to the correct cloudera*.tar archive. This post from the cloudera forum helped me.
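The double-wrapped layout described above can be reproduced with dummy files to show the extra extraction step; all file names here are throwaway examples, not the real Cloudera archive names.

```shell
# Build a .tar.gz that contains a .tar that contains the payload, then show
# that two extraction passes are needed to reach it.
D=/tmp/cdh_tar_demo
rm -rf "$D"; mkdir -p "$D"; cd "$D"
echo "image-bytes" > quickstart.tar                      # stand-in for the real image tar
tar cf  cloudera-inner.tar quickstart.tar                # inner .tar wrapping it
tar czf cloudera-quickstart.tar.gz cloudera-inner.tar    # outer .tar.gz
rm quickstart.tar cloudera-inner.tar
tar xzf cloudera-quickstart.tar.gz                       # first pass: yields another .tar
tar xf  cloudera-inner.tar                               # second pass: reaches the payload
ls quickstart.tar
```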

Resources