Docker compose: can't access GPU from compose (but can from run)

I've installed nvidia-container-runtime on my machine (Ubuntu 22.04), and can access the GPU through docker run.
docker run -it --rm --gpus all selenium/node-chrome:3.141.59 nvidia-smi
Mon Oct 24 00:32:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:0A:00.0 Off | N/A |
| 0% 41C P8 44W / 370W | 68MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when running with the following docker-compose.yml, nvidia-smi can't be found. Applications inside the container don't seem to be using the GPU either.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command:
      ["nvidia-smi"]
Running docker-compose up produces:
[+] Running 1/0
⠿ Container docker-compose-gpu-nvidia-1 Recreated 0.0s
Attaching to docker-compose-gpu-nvidia-1
Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
If I swap the selenium image to nvidia/cuda, docker-compose can see the GPU. Why is the GPU accessible in docker run but not docker-compose?

Specifying the driver & count fixed this.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    command:
      ["nvidia-smi"]
I'm not sure why this worked - the docs seem to indicate that omitting these will just use all available GPUs.
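As a point of comparison, here is a minimal sketch of the same service with the driver named explicitly and every GPU requested through count: all. It assumes a Compose version that implements the Compose Specification's device reservations, where count accepts either an integer or the value all. It has not been checked against the selenium image above.

```yaml
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # name the driver explicitly, as in the fix above
              count: all            # assumption: request every GPU instead of exactly one
              capabilities: [gpu]
    command: ["nvidia-smi"]
```

Note that count and device_ids are mutually exclusive, so only one of the two should be set on a device entry.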

Related

I start a Docker container with IBM MQ, but I can't connect to it

I start a Docker container with IBM MQ, but I can't connect to it. The container log is always identical:
[+] Running 1/0
- Container ibm-ibm-mq-1 Created 0.0s
Attaching to ibm-ibm-mq-1
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z CPU architecture: amd64
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Linux kernel version: 5.10.16.3-microsoft-standard-WSL2
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Container runtime: docker
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Base image: Red Hat Enterprise Linux 8.2 (Ootpa)
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Running as user ID 1001 with primary group 0
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Capabilities (bounding set): chown,dac_override,fowner,fsetid,kill,setgid,setuid,setpcap,net_bind_service,net_raw,sys_chroot,mknod,audit_write,setfcap
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z seccomp enforcing mode: filtering
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Process security attributes: none
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Detected 'ext4' volume mounted to /mnt/mqm
ibm-ibm-mq-1 | 2022-07-27T13:28:33.118Z Using queue manager name: QM1
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Created directory structure under /var/mqm
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Image created: 2020-05-27T11:03:04+00:00
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Image tag: ibm-mqadvanced-server-dev:9.1.5.0-r2-amd64
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ version: 9.1.5.0
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ level: p915-ifix-L200325.DE
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ license: Developer
The last line is always: MQ license: Developer
My docker-compose.yml file is below. I tried another version of IBM MQ and the result was the same.
version: "3.7"
services:
  ibm-mq:
    image: ibmcom/mq:9.1.5.0-r2
    networks:
      - mq-demo-network
    volumes:
      - "qm1data:/mnt/mqm"
    ports:
      - "1414:1414"
      - "9443:9443"
    environment:
      - LICENSE=accept
      - MQ_QMGR_NAME=QM1
volumes:
  qm1data:
networks:
  mq-demo-network:
I work on Windows 10. Yesterday I started the IBM MQ container on this machine once and everything was fine, but today something is wrong. I also tried it on another machine (Windows 10, identical Docker version) and it works fine there too.
This is an IBM MQ bug on some new AMD CPUs.
Fortunately, there is a workaround: set ICC_SHIFT=3 in the container environment.
https://github.com/ibm-messaging/mq-container/issues/462
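A sketch of how the workaround could be applied to the compose file from the question. The only change is the extra ICC_SHIFT entry under environment; everything else is copied from the original file.

```yaml
version: "3.7"
services:
  ibm-mq:
    image: ibmcom/mq:9.1.5.0-r2
    networks:
      - mq-demo-network
    volumes:
      - "qm1data:/mnt/mqm"
    ports:
      - "1414:1414"
      - "9443:9443"
    environment:
      - LICENSE=accept
      - MQ_QMGR_NAME=QM1
      - ICC_SHIFT=3          # workaround from the linked mq-container issue
volumes:
  qm1data:
networks:
  mq-demo-network:
```

Since only the environment changes, running docker-compose up again should recreate the container with the new variable.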

Problem with using GPU with Docker-compose

I want to run a container based on python:3.8.8-slim-buster that needs access to the GPU.
When I build it from this Dockerfile:
FROM python:3.8.8-slim-buster
CMD ["sleep", "infinity"]
and then run it with the --gpus all flag and exec nvidia-smi, I get a proper response:
Sat Jun 19 12:26:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 N/A / N/A | 301MiB / 1878MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
and when I use this docker-compose:
services:
  test:
    image: tensorflow/tensorflow:2.5.0-gpu
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
and exec nvidia-smi after starting it, I get the same response.
But when I replace the image in the docker-compose file with python:3.8.8-slim-buster, as in the Dockerfile, I get this response:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown
I appreciate any help figuring this out.
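One thing that might be worth trying, based only on the fix reported in the first thread above (naming the driver and a count explicitly), is the following variant of the compose file. This is an untested sketch and is not confirmed to resolve the problem for the python:3.8.8-slim-buster image.

```yaml
services:
  test:
    image: python:3.8.8-slim-buster
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia        # assumption: explicit driver, as in the first thread's fix
              count: 1              # assumption: reserve a single GPU
              capabilities: [gpu]
```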

docker-compose can't find nvidia driver

I am trying to run the Clara Train example, but when I execute startClaraTrainNoteBooks.sh, the container cannot find the NVIDIA driver.
I already know that the script uses docker-compose.yml, so I tested whether docker-compose can find the NVIDIA driver:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
Output:
USER@test:~$ docker-compose up
WARNING: Found orphan containers (hp_nvsmi_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Starting hp_test_1 ... done
Attaching to hp_test_1
test_1 | Mon Jun 7 09:01:44 2021
test_1 | +-----------------------------------------------------------------------------+
test_1 | | NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
test_1 | |-------------------------------+----------------------+----------------------+
test_1 | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
test_1 | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
test_1 | | | | MIG M. |
test_1 | |===============================+======================+======================|
test_1 | | 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
test_1 | | 0% 34C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
test_1 | | | | N/A |
test_1 | +-------------------------------+----------------------+----------------------+
test_1 |
test_1 | +-----------------------------------------------------------------------------+
test_1 | | Processes: |
test_1 | | GPU GI CI PID Type Process name GPU Memory |
test_1 | | ID ID Usage |
test_1 | |=============================================================================|
test_1 | +-----------------------------------------------------------------------------+
hp_test_1 exited with code 0
But the container started by startClaraTrainNoteBooks.sh cannot find it:
root@claratrain:/claraDevDay# nvidia-smi
root@claratrain:/claraDevDay#
In contrast, the container started by startDocker.sh can find the driver:
root@c7c2d5597eb8:/claraDevDay# nvidia-smi
Mon Jun 7 09:11:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@c7c2d5597eb8:/claraDevDay#
What should I do?
The docker-compose.yml needs to be rewritten like this, after which it works:
# SPDX-License-Identifier: Apache-2.0
version: "3.8"
services:
  claratrain:
    container_name: claradevday-pt
    hostname: claratrain
    ##### use vanilla clara train docker
    #image: nvcr.io/nvidia/clara-train-sdk:v4.0
    ##### to build image with GPU dashboard inside jupyter lab
    build:
      context: ./dockerWGPUDashboardPlugin/ # Project root
      dockerfile: ./Dockerfile # Relative to context
    image: clara-train-nvdashboard:v4.0
    depends_on:
      - tritonserver
    ports:
      - "3030:8888" # Jupyter lab port
      - "3031:5000" # AIAA port
    ipc: host
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}:/claraDevDay/
      - /raid/users/aharouni/data:/data/
    command: "jupyter lab /claraDevDay --ip 0.0.0.0 --allow-root --no-browser --config /claraDevDay/scripts/jupyter_notebook_config.py"
    # command: tail -f /dev/null
    # tty: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [ gpu ]
              # To specify certain GPU uncomment line below
              #device_ids: ['0,3']
  #############################################################
  tritonserver:
    image: nvcr.io/nvidia/tritonserver:21.02-py3
    container_name: aiaa-triton
    hostname: tritonserver
    restart: unless-stopped
    command: >
      sh -c "chmod 777 /triton_models &&
      /opt/tritonserver/bin/tritonserver \
      --model-store /triton_models \
      --model-control-mode="poll" \
      --repository-poll-secs=5 \
      --log-verbose ${TRITON_VERBOSE}"
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}/AIAA/workspace/triton_models:/triton_models
    # shm_size: 1gb
    # ulimits:
    #   memlock: -1
    #   stack: 67108864
    # logging:
    #   driver: json-file

" 'devices' properties is not allowed" while creating docker-compose with nvidia gpu

Description of the issue
Context information (for bug reports)
Output of docker-compose version
docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
Output of docker version
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Dec 18 12:21:44 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Thu Dec 10 13:23:49 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.4
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
Output of docker-compose config
(Make sure to add the relevant -f and other flags)
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Steps to reproduce the issue
1. Create a Dockerfile that simply pulls the nvidia cuda image and runs a command to check the GPU:
FROM nvidia/cuda:10.2-base
CMD nvidia-smi
2. It works like a charm when we build the image and run it without docker compose:
docker image build testserver/ -t testserverimage
docker run --gpus all -exec -it testserverimage
Shows the nvidia-gpu devices
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
3. Now create the docker-compose.yml:
version: "3.5"
services:
  testserver:
    image: nvidia/cuda:10.2-base
    build: './modelserver'
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
Observed result
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Expected result
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Stacktrace / full error message
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Additional information
OS version / distribution, docker-compose install method, etc.
OS Information:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Docker compose installation:
sudo apt install docker-compose
From the documentation at https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers :
Docker Compose v1.28.0+ allows to define GPU reservations using the device structure defined in the Compose Specification.
Your docker-compose version is 1.17.1, which does not recognize the devices key, so you need to upgrade docker-compose to at least 1.28.0.

How to connect to Docker bridge network from host?

For a test project, I want to have the following network connectivity
                   +---------------------------+
                   | redis-service-1 in docker |
                   +-----------+ B +-----------+
                               | R |
+--------+                     | I |
|  Host  |  --->               | D |
+--------+                     | G |
                               | E |
                   +-----------+   +-----------+
                   | redis-service-2 in docker |
                   +---------------------------+
I want my app running on the host to be able to connect to both Redis services, each running in its own Docker container, using the DNS names provided by Docker.
docker-compose.yml looks like the following:
version: '3.6'
services:
  redis-service-1:
    image: redis:5.0
    networks:
      - overlay
  redis-service-2:
    image: redis:5.0
    networks:
      - overlay
Ideally, after docker-compose up, I want to be able to ping both containers from the host as follows:
> redis-cli -h redis-service-1 ping
> redis-cli -h redis-service-2 ping
But I am unable to connect to these containers and Redis inside them from the host machine.
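One common approach, which is not taken from this thread, is to publish each container's Redis port on the host. Docker's embedded DNS resolves service names only for containers attached to the network, not for processes on the host. The sketch below assumes host ports 6379 and 6380 are free (both are arbitrary choices) and drops the overlay network entry, which the original file references but never defines, so that Compose's default network is used.

```yaml
version: '3.6'
services:
  redis-service-1:
    image: redis:5.0
    ports:
      - "6379:6379"   # host port 6379 -> container port 6379
  redis-service-2:
    image: redis:5.0
    ports:
      - "6380:6379"   # assumed free host port 6380 -> container port 6379
```

With this in place, redis-cli -h 127.0.0.1 -p 6379 ping and redis-cli -h 127.0.0.1 -p 6380 ping should reach the two services from the host. This addresses them by port rather than by the service names the question asks for.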

Resources