docker-compose can't find nvidia driver - docker

I am trying to run the Clara Train example, but when I execute startClaraTrainNoteBooks.sh, the container cannot find the NVIDIA driver.
I already know that the script runs docker-compose.yml, so I tested whether docker-compose can find the NVIDIA driver with this file:
services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
              device_ids: ['0']
Output:
USER#test:~$ docker-compose up
WARNING: Found orphan containers (hp_nvsmi_1) for this project. If you removed or renamed this service in your compose file, you can run this command with the --remove-orphans flag to clean it up.
Starting hp_test_1 ... done
Attaching to hp_test_1
test_1 | Mon Jun 7 09:01:44 2021
test_1 | +-----------------------------------------------------------------------------+
test_1 | | NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
test_1 | |-------------------------------+----------------------+----------------------+
test_1 | | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
test_1 | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
test_1 | | | | MIG M. |
test_1 | |===============================+======================+======================|
test_1 | | 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
test_1 | | 0% 34C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
test_1 | | | | N/A |
test_1 | +-------------------------------+----------------------+----------------------+
test_1 |
test_1 | +-----------------------------------------------------------------------------+
test_1 | | Processes: |
test_1 | | GPU GI CI PID Type Process name GPU Memory |
test_1 | | ID ID Usage |
test_1 | |=============================================================================|
test_1 | +-----------------------------------------------------------------------------+
hp_test_1 exited with code 0
But the container started by startClaraTrainNoteBooks.sh cannot find it:
root#claratrain:/claraDevDay# nvidia-smi
root#claratrain:/claraDevDay#
Actually, startDocker.sh can find the driver.
root#c7c2d5597eb8:/claraDevDay# nvidia-smi
Mon Jun 7 09:11:43 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 206... Off | 00000000:01:00.0 Off | N/A |
| 0% 35C P8 17W / 215W | 100MiB / 7979MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
root#c7c2d5597eb8:/claraDevDay#
What should I do?

The docker-compose.yml needs to be rewritten like this; with these changes it works:
# SPDX-License-Identifier: Apache-2.0
version: "3.8"
services:
  claratrain:
    container_name: claradevday-pt
    hostname: claratrain
    ##### use vanilla clara train docker
    #image: nvcr.io/nvidia/clara-train-sdk:v4.0
    ##### to build image with GPU dashboard inside jupyter lab
    build:
      context: ./dockerWGPUDashboardPlugin/ # Project root
      dockerfile: ./Dockerfile # Relative to context
    image: clara-train-nvdashboard:v4.0
    depends_on:
      - tritonserver
    ports:
      - "3030:8888" # Jupyter lab port
      - "3031:5000" # AIAA port
    ipc: host
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}:/claraDevDay/
      - /raid/users/aharouni/data:/data/
    command: "jupyter lab /claraDevDay --ip 0.0.0.0 --allow-root --no-browser --config /claraDevDay/scripts/jupyter_notebook_config.py"
    # command: tail -f /dev/null
    # tty: true
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [ gpu ]
              # To specify certain GPU uncomment line below
              #device_ids: ['0,3']
  #############################################################
  tritonserver:
    image: nvcr.io/nvidia/tritonserver:21.02-py3
    container_name: aiaa-triton
    hostname: tritonserver
    restart: unless-stopped
    command: >
      sh -c "chmod 777 /triton_models &&
      /opt/tritonserver/bin/tritonserver \
      --model-store /triton_models \
      --model-control-mode="poll" \
      --repository-poll-secs=5 \
      --log-verbose ${TRITON_VERBOSE}"
    volumes:
      - ${TRAIN_DEV_DAY_ROOT}/AIAA/workspace/triton_models:/triton_models
    # shm_size: 1gb
    # ulimits:
    #   memlock: -1
    #   stack: 67108864
    # logging:
    #   driver: json-file

Related

Docker compose: can't access GPU from compose (but can from run)

I've installed nvidia-container-runtime on my machine (Ubuntu 22.04), and can access the GPU through docker run.
docker run -it --rm --gpus all selenium/node-chrome:3.141.59 nvidia-smi
Mon Oct 24 00:32:32 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:0A:00.0 Off | N/A |
| 0% 41C P8 44W / 370W | 68MiB / 10240MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
However, when running with the following docker-compose.yml, nvidia-smi can't be found. Applications inside the container don't seem to be using the GPU either.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command:
      ["nvidia-smi"]
Running docker-compose up
[+] Running 1/0
⠿ Container docker-compose-gpu-nvidia-1 Recreated 0.0s
Attaching to docker-compose-gpu-nvidia-1
Error response from daemon: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
If I swap the selenium image to nvidia/cuda, docker-compose can see the GPU. Why is the GPU accessible in docker run but not docker-compose?
Specifying the driver & count fixed this.
version: "3.8"
services:
  nvidia:
    image: selenium/node-chrome:3.141.59
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    command:
      ["nvidia-smi"]
I'm not sure why this worked - the docs seem to indicate that omitting these will just use all available GPUs.

Problem with using GPU with Docker-compose

I want to run a container based on python:3.8.8-slim-buster that needs access to the GPU.
When I build it from this Dockerfile:
FROM python:3.8.8-slim-buster
CMD ["sleep", "infinity"]
and then run it with the "--gpus all" flag and exec nvidia-smi, I get a proper response:
Sat Jun 19 12:26:57 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 465.27 Driver Version: 465.27 CUDA Version: 11.3 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 Off | N/A |
| N/A 45C P8 N/A / N/A | 301MiB / 1878MiB | 14% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
and when I use this docker-compose:
services:
  test:
    image: tensorflow/tensorflow:2.5.0-gpu
    command: sleep infinity
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
and exec nvidia-smi after starting it, I get the same response.
But when I replace the image in the docker-compose file with python:3.8.8-slim-buster, as in the Dockerfile, I get this response:
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "nvidia-smi": executable file not found in $PATH: unknown
I appreciate any help figuring this out.
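The error above says the nvidia-smi executable is missing from $PATH, which is a separate question from whether any GPU device was attached to the container. A minimal sketch for checking both from inside a container that has Python, such as python:3.8.8-slim-buster. The function name `gpu_visibility` is hypothetical, and the device glob `/dev/nvidia[0-9]*` is an assumption about where the NVIDIA device nodes appear:

```python
import glob
import shutil


def gpu_visibility(binary="nvidia-smi", device_glob="/dev/nvidia[0-9]*"):
    """Report separately whether the CLI tool and the GPU device nodes are visible.

    device_glob is an assumed location for NVIDIA device nodes; adjust if needed.
    """
    return {
        # True only if the executable can be found on $PATH.
        "binary_on_path": shutil.which(binary) is not None,
        # Device nodes matching the assumed pattern, e.g. /dev/nvidia0.
        "device_nodes": sorted(glob.glob(device_glob)),
    }


if __name__ == "__main__":
    print(gpu_visibility())
```

If `device_nodes` is non-empty while `binary_on_path` is false, the container has been given a GPU but lacks the tool; if both are empty, the reservation itself is the thing to look at.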

" 'devices' properties is not allowed" while creating docker-compose with nvidia gpu

Description of the issue
Context information (for bug reports)
Output of docker-compose version
docker-compose version 1.17.1, build unknown
docker-py version: 2.5.1
CPython version: 2.7.17
OpenSSL version: OpenSSL 1.1.1 11 Sep 2018
Output of docker version
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Dec 18 12:21:44 2020
 OS/Arch:           linux/amd64
 Experimental:      false
Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Thu Dec 10 13:23:49 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.4
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:
Output of docker-compose config
(Make sure to add the relevant -f and other flags)
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Steps to reproduce the issue
1. Create a Dockerfile that simply pulls the nvidia/cuda image and runs a command to check the NVIDIA GPU:
FROM nvidia/cuda:10.2-base
CMD nvidia-smi
2. This works like a charm when we build the image and run it without docker-compose:
docker image build testserver/ -t testserverimage
docker run --gpus all -exec -it testserverimage
It shows the NVIDIA GPU devices:
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
3. Now create the docker-compose.yml:
version: "3.5"
services:
  testserver:
    image: nvidia/cuda:10.2-base
    build: './modelserver'
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
              driver: nvidia
Observed result
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Expected result
Sat Feb 20 13:10:46 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00001918:00:00.0 Off | 0 |
| N/A 52C P0 71W / 149W | 7897MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Stacktrace / full error message
ERROR: The Compose file './docker-compose.yml' is invalid because:
services.testserver.deploy.resources.reservations value Additional properties are not allowed ('devices' was unexpected)
Additional information
OS version / distribution, docker-compose install method, etc.
OS Information:
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
Docker compose installation:
sudo apt install docker-compose
In the documentation https://docs.docker.com/compose/gpu-support/#enabling-gpu-access-to-service-containers :
Docker Compose v1.28.0+ allows to define GPU reservations using the device structure defined in the Compose Specification.
Your docker-compose version is 1.17.1, so you need to upgrade docker-compose to at least 1.28.0.
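As a rough illustration of that comparison, here is a small sketch that parses the output of `docker-compose version` and checks it against the 1.28.0 minimum quoted above. The helper names `parse_compose_version` and `supports_gpu_reservations` are hypothetical, and the regular expression simply takes the first x.y.z number it finds:

```python
import re

MIN_COMPOSE = (1, 28, 0)  # minimum version quoted from the Docker docs above


def parse_compose_version(text):
    """Extract (major, minor, patch) from `docker-compose version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return tuple(int(part) for part in match.groups())


def supports_gpu_reservations(text, minimum=MIN_COMPOSE):
    """True if the reported version is at least the documented minimum."""
    return parse_compose_version(text) >= minimum


if __name__ == "__main__":
    reported = "docker-compose version 1.17.1, build unknown"
    print(parse_compose_version(reported), supports_gpu_reservations(reported))
```

Comparing tuples of integers rather than strings matters here: as strings, "1.17.1" and "1.28.0" happen to sort correctly, but "1.9.0" would wrongly sort after "1.28.0".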

Access to a Docker Gitlab instance from the network

I'm installing a GitLab instance with docker-compose on a server machine on my local network, and I'd like to access it from anywhere on my local network by visiting, for example, "https://my-hostname".
I followed this.
I'm running:
web:
  image: 'gitlab/gitlab-ce:latest'
  restart: always
  hostname: 'gitlab.example.com'
  environment:
    GITLAB_OMNIBUS_CONFIG: |
      external_url 'https://gitlab.example.com'
      # Add any other gitlab.rb configuration here, each on its own line
  ports:
    - '7780:80'
    - '7443:443'
    - '7722:22'
  volumes:
    - '/srv/gitlab/config:/etc/gitlab'
    - '/srv/gitlab/logs:/var/log/gitlab'
    - '/srv/gitlab/data:/var/opt/gitlab'
I have very (very) limited networking knowledge, so basically: how do I access my running GitLab instance? When I go to the local network IP of my host, my browser tells me that it can't connect.
Here is what I'm hoping to achieve:
LOCAL NETWORK
+--------------------------------------------------------------------------+
| |
| +--------------------+ |
| | My_Server | |
| | | |
| | +----------------+ | |
| | | | | "https://my-hostname" +-------------------+ |
| | | Docker: Gitlab | <------------------------+ My_Client | |
| | | | | +-------------------+ |
| | +----------------+ | |
| | | |
| +--------------------+ |
| |
+--------------------------------------------------------------------------+
The ports part of your configuration maps the host's ports to the container's ports.
So if you have
  ports:
    - '7780:80'
    - '7443:443'
    - '7722:22'
that forwards port 7780 on your host to port 80 in your container, and so forth. With this knowledge you should be able to reach the container's services at the host's local IP address on the mapped host port (for example port 7443 for HTTPS rather than the default 443), and then by hostname once local DNS resolves it.
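To make the mapping concrete, here is a small sketch that turns the short "HOST:CONTAINER" port entries into the URLs a client on the network would use. The function names and the address 192.168.1.50 are hypothetical, and only the short two-part port syntax is handled:

```python
def parse_port_mapping(mapping):
    """Split a short 'HOST:CONTAINER' entry into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


def host_urls(host, mappings, schemes=None):
    """Build URLs for the mapped web ports, keyed by the container-side port."""
    schemes = schemes or {80: "http", 443: "https"}
    urls = []
    for mapping in mappings:
        host_port, container_port = parse_port_mapping(mapping)
        scheme = schemes.get(container_port)
        if scheme:  # skip non-web ports such as 22 (SSH)
            urls.append(f"{scheme}://{host}:{host_port}")
    return urls


if __name__ == "__main__":
    # 192.168.1.50 stands in for the server's address on the local network.
    for url in host_urls("192.168.1.50", ["7780:80", "7443:443", "7722:22"]):
        print(url)
```

The client always connects to the host-side port (the number on the left); the container-side port only decides which service answers.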

Application templates and instances manager for docker deployment?

I'm looking into deploying applications with Docker containers in production on a few servers (not hundreds).
I can see some deployment managers, like docker-compose, that deploy according to a YAML service description file.
Official docker-compose.yml example file:
web:
  build: .
  ports:
    - "5000:5000"
  volumes:
    - .:/code
  links:
    - redis
redis:
  image: redis
I'm looking for a solution to manage and generate these YAML files and to communicate with deployment managers like docker-compose.
This solution should make it possible to manage application templates, deployed instances of them, their configuration, etc.
Illustration of it:
Docker
+-------------------+
docker-compose.yml | |
+---------------+ +-------+ | containers |
| APP manager |------->|Mysql_a| | +---------------+ |
| | |Mysql_b+-----------+ | |MySQL_a |Mysq| |
| MySQL Tpl | |Mysql_c| docker-compose | +---------------+ |
| Wordpress tpl | |Wp_a | | | |l_b |Mysql_c | |
| | +---+---+ | | +---------+-----+ |
| Mysql_a | | +------+ |Wp_a | | |
| Mysql_b +----------> | | | +---------+ | |
| Mysql_c | | | | | | |
| Wp_a | | | | | | |
+---------------+ | | | | | |
+---------------+ | +---------------+ |
+-------------------+
My first thought is Panamax, but is it appropriate? What other open source solutions exist?
