Docker 18.09 version of --gpus all

I'm trying to run a GPU-enabled container on a server with Docker 18.09.5 installed. It's a shared server, so I can't just upgrade the Docker version.
I have a private server with Docker 19.03.12, and the following works fine:
docker pull vistart/cuda
docker run --name somename --gpus all -it --shm-size=10g -v /dataloc:/mountedData vistart/cuda /bin/sh
nvidia-smi
yields the expected GPU stats.
When I try this on the server with docker 18.09:
docker pull vistart/cuda
docker run --name somename --gpus all -it --shm-size=10g -v /dataloc:/mountedData
yields:
unknown flag: --gpus
See 'docker run --help'.
docker run --name somename -it --shm-size=10g -v /dataloc:/mountedData
works, but then nvidia-smi yields:
/bin/sh: 1: nvidia-smi: not found
Is there some v18.09 version of --gpus all that would work?
I've tried with nvidia-docker:
nvidia-docker run --name somename -it --shm-size=10g -v /dataloc:/mountedData
and this yields:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:424: container init caused \"process_linux.go:407: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=#/sbin/ldconfig --device=all --compute --utility --require=cuda>=11.0 brand=tesla,driver>=384,driver<385 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=440,driver<441 brand=tesla,driver>=450,driver<451 --pid=3030 /local/var_local/nobackup/docker/overlay2/d096e63d0a34537f04cbafeb1b6c3315b4e6f0ff15e3e2cb30057f549dc75cb5/merged]\\\\nnvidia-container-cli: requirement error: unsatisfied condition: brand = tesla\\\\n\\\"\"": unknown.
Looks like the shared server is running CUDA 10.1, so it doesn't satisfy the cuda>=11.0 requirement...

From docker 19.03 onwards, you can use:
docker run --gpus all myimage
For previous versions, you would use nvidia-docker like this:
nvidia-docker run myimage
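With Docker 18.09 and the nvidia-docker2 package installed, the same thing can also be written with the --runtime=nvidia flag. As a sketch for the case above (not part of the original answer): the prestart-hook error comes from the image's NVIDIA_REQUIRE_CUDA constraint, and libnvidia-container documents NVIDIA_DISABLE_REQUIRE for skipping such constraints, so something like this may get past it on the CUDA 10.1 host:
docker run --name somename -it --shm-size=10g \
  --runtime=nvidia \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -e NVIDIA_DISABLE_REQUIRE=true \
  -v /dataloc:/mountedData vistart/cuda /bin/sh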

Related

Operation not permitted for TUNSETIFF

I am trying to open a TUN device using ioctl with the TUNSETIFF operation code, and I'm getting an "Operation not permitted" error.
Environment:
PRETTY_NAME="Ubuntu 22.04.1 LTS"
$ docker --version
Docker version 20.10.17, build 100c701
Python 3.10.6
I'm using the following command to run the container:
docker run --rm -it --network host --cap-add=NET_ADMIN --device=/dev/net/tun ubuntutest bash -c "tuntaptest.py"
I have tried the following options:
docker run --rm -it --network host --privileged
docker run --rm -it --network host --cap-add=SYS_ADMIN
Nothing has worked so far.
Code snapshot:
import fcntl
import struct

# ioctl request number and interface flags for the TUN/TAP driver
TUNSETIFF: int = 0x400454ca
IFF_TUN: int = 0x0001
IFF_NO_PI: int = 0x1000

# Open the clone device and request a TUN interface named tun0
tun = open('/dev/net/tun', 'r+b', buffering=0)
ifr: bytes = struct.pack('!16sH', bytes('tun0', 'utf-8'), IFF_TUN | IFF_NO_PI)
fcntl.ioctl(tun, TUNSETIFF, ifr)
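A quick sanity check, not from the original post: iproute2's ip tuntap subcommand needs the same CAP_NET_ADMIN that TUNSETIFF does, so running it inside the container (assuming ip is available in the ubuntutest image) separates a capability/namespace problem from a problem in the Python code:
# Run inside the container started with --cap-add=NET_ADMIN --device=/dev/net/tun
ip tuntap add dev tun0 mode tun    # fails with "Operation not permitted" if the capability is missing
ip link show tun0                  # lists the interface if creation succeeded
ip tuntap del dev tun0 mode tun    # clean up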

How to use CUDA_VISIBLE_DEVICES to set --gpus argument of docker run cmd?

The docker run docs show an example of how to specify several (but not all) GPUs:
docker run -it --rm --gpus '"device=0,2"' nvidia-smi
I'd like to set --gpus to use the devices indicated by the environment variable CUDA_VISIBLE_DEVICES.
I tried the obvious:
docker run --rm -it --env CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES --gpus '"device=$CUDA_VISIBLE_DEVICES"' some_repo:some_tag /bin/bash
But this gives the error:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: device error: $CUDA_VISIBLE_DEVICES: unknown device: unknown.
Note: currently CUDA_VISIBLE_DEVICES=0,1
I saw a GitHub issue about this, but the solution is a bit messy and didn't work for me.
What is a good way to use CUDA_VISIBLE_DEVICES to set the --gpus argument of docker run?
The single quotes in '"device=$CUDA_VISIBLE_DEVICES"' prevent the shell from expanding the variable. Try it without the single quotes:
docker run --rm -it \
--env CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
--gpus device=$CUDA_VISIBLE_DEVICES some_repo:some_tag /bin/bash
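If Docker rejects the comma-separated list without the inner double quotes (the docs' example passes it as "device=0,2" including the quotes), those quotes can be kept while still letting the shell expand the variable by escaping them. A variation on the same idea, not from the original answer:
docker run --rm -it \
  --env CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \
  --gpus "\"device=$CUDA_VISIBLE_DEVICES\"" \
  some_repo:some_tag /bin/bash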

Running a container with a Docker bind-mount causes container to return Node version and exit

I am trying to attach a directory of static assets to my Docker container after the image has been built. When I do something like this:
docker run -it app /bin/bash
The container runs perfectly fine. However, if I do something like this:
docker run -it app -v "${PWD}/assets:/path/to/empty/directory" /bin/bash
This also reproduces it:
docker run -it node:12.18-alpine3.12 -v "${PWD}/assets:/path/to/empty/directory" /bin/bash
It spits out the version of Node (v12.18.4) I am using and immediately dies. Where am I going wrong? I am using Docker with WSL 2 on Windows 10. Is it due to filesystem incompatibility?
Edit: whoops, it's spitting out the Node version and not the Alpine version.
To debug my issue I tried running a bare-bones alpine container:
docker run -it alpine:3.12 -v "${PWD}/assets:/usr/app" /bin/sh
Which gave a slightly more useful error message:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"-v\": executable file not found in $PATH": unknown.
From this I realized that Docker was trying to run -v as the container's starting command. Once I changed the order around, things started working.
TL;DR: The -v flag and its argument must be placed before the image name in a docker run command, i.e. the following works:
docker run -it -v "${PWD}/assets:/usr/app" alpine:3.12 /bin/sh
but this doesn't:
docker run -it alpine:3.12 -v "${PWD}/assets:/usr/app" /bin/sh
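A note on why the first command printed the Node version (my reading, not stated in the original answer): everything after the image name is passed to the image as the container command, and the node image's entrypoint turns a leading flag into an argument to node, so -v effectively became node -v. The same behaviour can be reproduced directly:
# Everything after the image name becomes the container command:
docker run --rm node:12.18-alpine3.12 -v                         # entrypoint runs `node -v`, prints the version
docker run --rm node:12.18-alpine3.12 node -e "console.log(1)"   # an explicit command works as expected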

Docker Windows: opening a Path starting at C: (docker: Error response from daemon: mkdir C:: permission denied.)

I use Docker on my Windows system and run the following command in the Windows Subsystem for Linux:
docker run --name selenium-container -d -p 4444:4444 -p 5900:5900 -v C:/Users/Alexa/OneDrive/Backend-web-architecture/github-repos/data-privacy-api:/dev/shm selenium/standalone-chrome-debug
This yields:
docker: Error response from daemon: mkdir C:: permission denied.
Since I was in the data-privacy-api folder, I also tried:
docker run --name selenium-container -d -p 4444:4444 -p 5900:5900 -v "$(pwd):/dev/shm selenium/standalone-chrome-debug"
However, this led to the following error:
"docker run" requires at least 1 argument.
See 'docker run --help'.
Usage: docker run [OPTIONS] IMAGE [COMMAND] [ARG...]
Run a command in a new container
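Two separate problems seem to be in play here (a suggestion, not from the original post): in the second attempt the closing double quote swallows the image name, so Docker sees no image at all, and from WSL the Windows drive is normally reachable as /mnt/c/... rather than C:/..., assuming Docker Desktop's WSL integration (or an equivalent drive mount) is set up. A corrected form of the second attempt:
# Quote only the volume spec and keep the image name outside the quotes
docker run --name selenium-container -d -p 4444:4444 -p 5900:5900 \
  -v "$(pwd):/dev/shm" selenium/standalone-chrome-debug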

Docker: tendermint container does not work

My OS is Windows 10 and my Docker version is 17.12.0-ce, build c97c6d6.
Here is my plan:
0. Get containers
docker pull tendermint/tendermint
docker pull tendermint/monitor
1. Init container
docker run --rm -p 46657:46657 --name tendermint_bc -v "C:/Users/user/sandbox/tendermind/tmdata:/tendermint" tendermint/tendermint init
2. Start container
docker run --rm -d -v "C:/Users/user/sandbox/tendermind/tmdata:/tendermint" tendermint/tendermint node --proxy_app=dummy
3. Start the tendermint monitor
docker run -it --rm --link=tm tendermint/monitor tendermint_bc:46657
When I start the tendermint container I see only a hash (the container ID), but the container is not listed by docker ps -a.
If I run docker logs tendermint_bc, the result is:
Error response from daemon: No such container: tendermint_bc
The same workflow works fine on Unix.
Thanks for any help.
In step 1, you are initializing Tendermint, but not running it. To run it, execute:
docker run --rm -p 46657:46657 --name tendermint_bc -v "C:/Users/user/sandbox/tendermind/tmdata:/tendermint" tendermint/tendermint node --proxy_app=dummy
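Once the node is running under that name, it can be checked like this (a small follow-up, not part of the original answer):
docker ps --filter name=tendermint_bc    # the container should now be listed
docker logs -f tendermint_bc             # and docker logs will find it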
