Description of the issue
Context information (for bug reports)
Output of docker-compose version
docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
Output of docker version
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Dec 18 12:21:44 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Thu Dec 10 13:23:49 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.4
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
Output of docker-compose config
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Steps to reproduce the issue
1. Create a Dockerfile that simply pulls the NVIDIA CUDA image and runs a command to check the NVIDIA GPU:
FROM nvidia/cuda:10.2-base
CMD nvidia-smi
2. This works like a charm when we build the image and run it without docker-compose:
docker image build testserver/ -t testserverimage
docker run --gpus all -exec -it testserverimage
This shows the NVIDIA GPU devices:
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
3. Now create the docker-compose.yml:
version: "3.5"
services:
  testserver:
    image: nvidia/cuda:10.2-base
    build: './modelserver'
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
Observed result
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Expected result
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Stacktrace / full error message
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Additional information
OS Information:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Docker compose installation:
sudo apt install docker-compose
In the documentation https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers :
Docker Compose v1.28.0+ allows to define GPU reservations using the device structure defined in the Compose Specification.
Your docker-compose version is 1.17.1, so you need to upgrade your docker-compose to, at least, 1.28.0.
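As a rough sketch of what this means in practice: once docker-compose is 1.28.0 or newer, a device reservation along the lines of the documentation example should pass validation. The service name and image come from the question. The `command` line and `count: 1` are assumptions added for illustration and are not in the original file.

```yaml
services:
  testserver:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi            # assumption: mirrors the CMD in the question's Dockerfile
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1             # assumption: reserve a single GPU
              capabilities: [gpu]
```

Running docker-compose config against a file like this is a quick way to check whether the installed version accepts the devices key.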
Related
I am running Node-RED with docker-compose. My GitLab CI file uses the docker/compose image, the pipeline runs, and I can see this output:
node-red | 11 Nov 11:28:51 - [info]
node-red |
node-red | Welcome to Node-RED
node-red | ===================
node-red |
node-red | 11 Nov 11:28:51 - [info] Node-RED version: v3.0.2
node-red | 11 Nov 11:28:51 - [info] Node.js version: v16.16.0
node-red | 11 Nov 11:28:51 - [info] Linux 5.15.49-linuxkit x64 LE
node-red | 11 Nov 11:28:52 - [info] Loading palette nodes
node-red | 11 Nov 11:28:53 - [info] Settings file : /data/settings.js
node-red | 11 Nov 11:28:53 - [info] Context store : 'default' [module=memory]
node-red | 11 Nov 11:28:53 - [info] User directory : /data
node-red | 11 Nov 11:28:53 - [warn] Projects disabled : editorTheme.projects.enabled=false
node-red | 11 Nov 11:28:53 - [info] Flows file : /data/flows.json
node-red | 11 Nov 11:28:53 - [warn]
node-red |
node-red | ---------------------------------------------------------------------
node-red | Your flow credentials file is encrypted using a system-generated key.
node-red |
node-red | If the system-generated key is lost for any reason, your credentials
node-red | file will not be recoverable, you will have to delete it and re-enter
node-red | your credentials.
node-red |
node-red | You should set your own key using the 'credentialSecret' option in
node-red | your settings file. Node-RED will then re-encrypt your credentials
node-red | file using your chosen key the next time you deploy a change.
node-red | ---------------------------------------------------------------------
node-red |
node-red | 11 Nov 11:28:53 - [info] Server now running at http://127.0.0.1:1880/
node-red | 11 Nov 11:28:53 - [warn] Encrypted credentials not found
node-red | 11 Nov 11:28:53 - [info] Starting flows
node-red | 11 Nov 11:28:53 - [info] Started flows
But when I try to open the localhost server to access Node-RED or the dashboard, I get the error "Failed to open page".
This is my docker-compose.yml
version: "3.7"
services:
  node-red:
    image: nodered/node-red:latest
    user: '1000'
    container_name: node-red
    environment:
      - TZ=Europe/Amsterdam
    ports:
      - "1880:1880"
This is my .gitlab-ci.yml:
yateen-docker:
  stage: build
  image:
    name: docker/compose
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u $CI_REGISTRY_USER $CI_REGISTRY --password-stdin
  script:
    - docker-compose up
  only:
    - main
Any help would be appreciated!
I tried to create the Node-RED container via docker-compose rather than with a plain docker run command. The Node-RED container is running, but I can't access the server page.
I've installed nvidia-container-runtime on my machine (Ubuntu 22.04), and can access the GPU through docker run.
docker run -it --rm --gpus all selenium/node-chrome:3.141.59 nvidia-smi
Mon Oct 24 00:32:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:0A:00.0 Off | N/A |
| 0% 41C P8 44W / 370W | 68MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when running with the following docker-compose.yml, nvidia-smi can't be found. Applications inside the container don't seem to be using the GPU either.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command:
      ["nvidia-smi"]
Running docker-compose up
[+] Running 1/0
⠿ Container docker-compose-gpu-nvidia-1 Recreated 0.0s
Attaching to docker-compose-gpu-nvidia-1
Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
If I swap the selenium image to nvidia/cuda, docker-compose can see the GPU. Why is the GPU accessible in docker run but not docker-compose?
Specifying the driver & count fixed this.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    command:
      ["nvidia-smi"]
I'm not sure why this worked - the docs seem to indicate that omitting these will just use all available GPUs.
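For comparison, here is a sketch of the explicit form that, going by the GPU support documentation, should behave the same as leaving count out. Treat that equivalence as an assumption rather than confirmed behavior, since the file above only started working once the driver and count were spelled out.

```yaml
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all           # assumption: "all" requests every GPU on the host
              capabilities: [gpu]
    command: ["nvidia-smi"]
```

If this variant fails while `count: 1` works, that would suggest the difference lies in the driver key rather than in the count.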
I start a Docker container with IBM MQ, but I can't connect to it. The container log is always identical:
[+] Running 1/0
- Container ibm-ibm-mq-1 Created 0.0s
Attaching to ibm-ibm-mq-1
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z CPU architecture: amd64
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Linux kernel version: 5.10.16.3-microsoft-standard-WSL2
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Container runtime: docker
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Base image: Red Hat Enterprise Linux 8.2 (Ootpa)
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Running as user ID 1001 with primary group 0
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Capabilities (bounding set): chown,dac_override,fowner,fsetid,kill,setgid,setuid,setpcap,net_bind_service,net_raw,sys_chroot,mknod,audit_write,setfcap
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z seccomp enforcing mode: filtering
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Process security attributes: none
ibm-ibm-mq-1 | 2022-07-27T13:28:33.061Z Detected 'ext4' volume mounted to /mnt/mqm
ibm-ibm-mq-1 | 2022-07-27T13:28:33.118Z Using queue manager name: QM1
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Created directory structure under /var/mqm
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Image created: 2020-05-27T11:03:04+00:00
ibm-ibm-mq-1 | 2022-07-27T13:28:33.122Z Image tag: ibm-mqadvanced-server-dev:9.1.5.0-r2-amd64
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ version: 9.1.5.0
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ level: p915-ifix-L200325.DE
ibm-ibm-mq-1 | 2022-07-27T13:28:33.129Z MQ license: Developer
The last line is always: MQ license: Developer
My docker-compose.yml file is below. I tried another version of IBM MQ and the result was the same.
version: "3.7"
services:
  ibm-mq:
    image: ibmcom/mq:9.1.5.0-r2
    networks:
      - mq-demo-network
    volumes:
      - "qm1data:/mnt/mqm"
    ports:
      - "1414:1414"
      - "9443:9443"
    environment:
      - LICENSE=accept
      - MQ_QMGR_NAME=QM1
volumes:
  qm1data:
networks:
  mq-demo-network:
I work on Windows 10. Yesterday I started the IBM MQ container on this machine once and everything was fine, but today something is wrong. I tried it on another machine (Windows 10, identical Docker version) and everything works there too.
This is an IBM MQ bug on some new AMD CPUs.
Fortunately, there is a workaround: set ICC_SHIFT=3 in the container environment.
https://github.com/ibm-messaging/mq-container/issues/462
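Applied to the compose file from the question, the workaround might look like the fragment below. This is a minimal sketch: only the ICC_SHIFT line is new, and the rest of the service definition is assumed to stay as posted.

```yaml
services:
  ibm-mq:
    image: ibmcom/mq:9.1.5.0-r2
    environment:
      - LICENSE=accept
      - MQ_QMGR_NAME=QM1
      - ICC_SHIFT=3                # workaround described in the linked issue
```

The container has to be recreated (for example with docker-compose up) for the new environment variable to take effect.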
I want to run a container based on python:3.8.8-slim-buster that needs access to the GPU.
When I build it from this Dockerfile:
FROM python:3.8.8-slim-buster
CMD ["sleep", "infinity"]
and then run it with the "--gpus all" flag and exec nvidia-smi, I get a proper response:
Sat Jun 19 12:26:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 N/A / N/A | 301MiB / 1878MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
and when I use this docker-compose:
services:
  test:
    image: tensorflow/tensorflow:2.5.0-gpu
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
and exec nvidia-smi after running it, I get the same response.
But when I replace the image in the docker-compose file with python:3.8.8-slim-buster, as in the Dockerfile, I get this response:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown
I appreciate any help figuring this out.
I am trying to run the Clara Train example, but when I execute startClaraTrainNoteBooks.sh, the container cannot find the NVIDIA driver.
I already know that the script runs docker-compose.yml, so I tested whether docker-compose can find the NVIDIA driver:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
Output:
USER#test:~$ docker-compose up
WARNING: Found orphan containers (hp_nvsmi_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Starting hp_test_1 ... done
Attaching to hp_test_1
test_1 | Mon Jun 7 09:01:44 2021
test_1 | +-----------------------------------------------------------------------------+
test_1 | | NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
test_1 | |-------------------------------+----------------------+----------------------+
test_1 | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
test_1 | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
test_1 | | | | MIG M. |
test_1 | |===============================+======================+======================|
test_1 | | 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
test_1 | | 0% 34C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
test_1 | | | | N/A |
test_1 | +-------------------------------+----------------------+----------------------+
test_1 |
test_1 | +-----------------------------------------------------------------------------+
test_1 | | Processes: |
test_1 | | GPU GI CI PID Type Process name GPU Memory |
test_1 | | ID ID Usage |
test_1 | |=============================================================================|
test_1 | +-----------------------------------------------------------------------------+
hp_test_1 exited with code 0
But the container started by startClaraTrainNoteBooks.sh cannot find it:
root#claratrain:/claraDevDay# nvidia-smi
root#claratrain:/claraDevDay#
Actually, startDocker.sh can find the driver.
root#c7c2d5597eb8:/claraDevDay# nvidia-smi
Mon Jun 7 09:11:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root#c7c2d5597eb8:/claraDevDay#
What should I do?
The docker-compose.yml file needs to be rewritten like this, and then it works:
# SPDX-License-Identifier: Apache-2.0
version: "3.8"
services:
  claratrain:
    container_name: claradevday-pt
    hostname: claratrain
    ##### use vanilla clara train docker
    #image: nvcr.io/nvidia/clara-train-sdk:v4.0
    ##### to build image with GPU dashboard inside jupyter lab
    build:
      context: ./dockerWGPUDashboardPlugin/ # Project root
      dockerfile: ./Dockerfile # Relative to context
    image: clara-train-nvdashboard:v4.0
    depends_on:
      - tritonserver
    ports:
      - "3030:8888" # Jupyter lab port
      - "3031:5000" # AIAA port
    ipc: host
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}:/claraDevDay/
      - /raid/users/aharouni/data:/data/
    command: "jupyter lab /claraDevDay --ip 0.0.0.0 --allow-root --no-browser --config /claraDevDay/scripts/jupyter_notebook_config.py"
    # command: tail -f /dev/null
    # tty: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [ gpu ]
              # To specify certain GPU uncomment line below
              #device_ids: ['0,3']
  #############################################################
  tritonserver:
    image: nvcr.io/nvidia/tritonserver:21.02-py3
    container_name: aiaa-triton
    hostname: tritonserver
    restart: unless-stopped
    command: >
      sh -c "chmod 777 /triton_models &&
      /opt/tritonserver/bin/tritonserver \
      --model-store /triton_models \
      --model-control-mode="poll" \
      --repository-poll-secs=5 \
      --log-verbose ${TRITON_VERBOSE}"
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}/AIAA/workspace/triton_models:/triton_models
    # shm_size: 1gb
    # ulimits:
    #   memlock: -1
    #   stack: 67108864
    # logging:
    #   driver: json-file