I'm using docker compose to run a container:
version: "3.9"
services:
  app:
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [ gpu ]
The container can benefit from the presence of a GPU, but it does not strictly need one. On a machine without a GPU, using the above docker-compose.yaml results in this error:
Error response from daemon: could not select device driver "" with capabilities: [[gpu]]
Is it possible to specify "use a GPU if one is available, else start the container without one"?
@herku, there are no conditional statements in docker compose. In 2018 the feature was declared out of scope: https://github.com/docker/compose/issues/5756
Anyway, you can check this answer for options to work around the problem:
https://stackoverflow.com/a/50393225/3730077
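Since Compose itself has no conditionals, one workaround is to keep the GPU reservation in a separate override file and let a small launcher decide whether to pass it. This is a minimal sketch, assuming a hypothetical override file named docker-compose.gpu.yml that contains only the deploy block from the question, and treating a working "nvidia-smi -L" as the signal that a GPU is available:

```python
import shutil
import subprocess
from typing import List

# Hypothetical file names (not from the question): the base file has no GPU
# reservation, the override adds only `deploy.resources.reservations.devices`.
BASE_FILE = "docker-compose.yml"
GPU_OVERRIDE = "docker-compose.gpu.yml"


def has_nvidia_gpu() -> bool:
    """Heuristic: the host counts as GPU-capable if `nvidia-smi -L` runs cleanly."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return False
    try:
        subprocess.run([exe, "-L"], check=True, capture_output=True, timeout=10)
    except (subprocess.SubprocessError, OSError):
        return False
    return True


def compose_command(gpu: bool) -> List[str]:
    """Build the `docker compose` argv, adding the GPU override only when wanted."""
    files = [BASE_FILE, GPU_OVERRIDE] if gpu else [BASE_FILE]
    cmd = ["docker", "compose"]
    for name in files:
        cmd += ["-f", name]  # later -f files are merged on top of earlier ones
    return cmd + ["up", "-d"]


if __name__ == "__main__":
    cmd = compose_command(has_nvidia_gpu())
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually start the stack
```

With the older standalone binary the same idea works with docker-compose -f ... -f ... instead of docker compose.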
Here is my docker-compose.yml file:
version: "3.3"
services:
  tutorial:
    image: fiware/tutorials.context-provider
    hostname: iot-sensors
    container_name: fiware-tutorial
    networks:
      - default
    expose:
      - "3000"
      - "3001"
    ports:
      - "3000:3000"
      - "3001:3001"
    environment:
      - "DEBUG=tutorial:*"
      - "PORT=3000"
      - "IOTA_HTTP_HOST=iot-agent"
      - "IOTA_HTTP_PORT=7896"
      - "DUMMY_DEVICES_PORT=3001"
      - "DUMMY_DEVICES_API_KEY=4jggokgpepnvsb2uv4s40d59ov"
And this is the result: docker-compose.yml execution
Why can't I run it on a Raspberry Pi 3 (OS: Debian 11 bullseye)? Please help!
Thank you very much for your time!
As the error message suggests, you are hitting an exec format error when trying to run the dockerization of fiware/tutorials.context-provider on a Raspberry Pi, because the compiled binaries are built for the amd64 architecture.
As can be seen from the answer to this question, that won't work on an ARM based machine, since Docker is a virtualisation platform, not an emulator.
Since no image based on your architecture is currently available, if you need it, you will have to build an ARM version yourself. The current code and Dockerfile can be found here: https://github.com/FIWARE/tutorials.NGSI-v2/tree/master/docker
So I would assume you will need to amend the dockerization and rebuild the binaries to overcome the exec format error; this seems to be a common issue with the Raspberry Pi.
However, I'm still unsure why creating an ARM dockerization is necessary, as all you are attempting to do is containerize and run code emulating dummy IoT devices on a Raspberry Pi. A Raspberry Pi can itself send a stream of data directly as a real device: it doesn't need a device emulator to be a device, it is one.
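To confirm the mismatch yourself, compare the host architecture with the one the image was built for. On the host, uname -m prints e.g. armv7l on a 32-bit Raspberry Pi OS, and docker image inspect --format '{{.Os}}/{{.Architecture}}' fiware/tutorials.context-provider shows the image's platform. A small sketch of that comparison; the mapping table below covers only common machine names and is an assumption to extend as needed:

```python
import platform
from typing import Optional

# `uname -m` machine names -> Docker platform strings (common cases only).
MACHINE_TO_PLATFORM = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "aarch64": "linux/arm64",
    "arm64": "linux/arm64",
    "armv7l": "linux/arm/v7",
    "armv6l": "linux/arm/v6",
}


def host_platform(machine: Optional[str] = None) -> str:
    """Return the Docker platform string for this host (or for `machine`)."""
    machine = (machine or platform.machine()).lower()
    try:
        return MACHINE_TO_PLATFORM[machine]
    except KeyError:
        raise ValueError(f"unknown machine type: {machine!r}") from None


def runs_natively(host: str, image: str) -> bool:
    """Rough check: compare only OS and architecture, ignoring the variant (v6/v7).

    It does not model cases such as arm64 kernels that can also run 32-bit arm
    images, so False means "needs a matching image", not "impossible".
    """
    return host.split("/")[:2] == image.split("/")[:2]


if __name__ == "__main__":
    try:
        print("host platform:", host_platform())
    except ValueError as err:
        print(err)
```

For the Pi 3 in the question, runs_natively("linux/arm/v7", "linux/amd64") is False, which is exactly the situation behind the exec format error.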
I can successfully bring up a CosmosDb Emulator instance within docker-compose, but the data I am trying to seed has more than 25 static containers, which is more than the default emulator allows. Per https://learn.microsoft.com/en-us/azure/cosmos-db/emulator-command-line-parameters#set-partitioncount you can set this partition count higher with a parameter, but I am unable to find a proper entrypoint into the compose that accepts that parameter.
I have found nothing in my searches that affords any insight into this as most people have either not used compose or not even used Docker for their Cosmos Emulator instance. Any insight would be appreciated.
Here is my docker-compose.yml for CosmosDb:
services:
  cosmosdb:
    container_name: "azurecosmosemulator"
    hostname: "azurecosmosemulator"
    image: 'mcr.microsoft.com/cosmosdb/windows/azure-cosmos-emulator'
    platform: windows
    tty: true
    mem_limit: 2GB
    ports:
      - '8081:8081'
      - '8900:8900'
      - '8901:8901'
      - '8902:8902'
      - '10250:10250'
      - '10251:10251'
      - '10252:10252'
      - '10253:10253'
      - '10254:10254'
      - '10255:10255'
      - '10256:10256'
      - '10350:10350'
    networks:
      default:
        ipv4_address: 172.16.238.246
    volumes:
      - '${hostDirectory}:C:\CosmosDB.Emulator\bind-mount'
I have attempted to add a command in there for starting the container, but it does not accept any arguments I have tried.
My answer to this was a workaround. Ultimately, running Windows and Linux containers side by side was a sizeable pain. Recently, Microsoft released a Linux container version of the emulator, which allowed me to provide an environment variable for the partition count and run the process far more efficiently.
Reference here: https://learn.microsoft.com/en-us/azure/cosmos-db/linux-emulator?tabs=ssl-netstd21
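As a sketch of what that looks like in compose terms, the snippet below builds a service definition for the Linux emulator with the partition count set through an environment variable. The image name, the variable name AZURE_COSMOS_EMULATOR_PARTITION_COUNT and the published ports are taken from Microsoft's Linux emulator documentation as I understand it; verify them against the reference above for your emulator version. The value 30 is only an example for "more than 25 containers":

```python
import json

# Image and variable name as documented for the Linux-based emulator;
# double-check them against the reference above for your emulator version.
LINUX_EMULATOR_IMAGE = "mcr.microsoft.com/cosmosdb/linux/azure-cosmos-emulator"
PARTITION_COUNT_VAR = "AZURE_COSMOS_EMULATOR_PARTITION_COUNT"


def cosmos_service(partition_count: int) -> dict:
    """Build a Compose service definition for the Linux emulator."""
    if partition_count < 1:
        raise ValueError("partition_count must be positive")
    # 8081 plus 10250-10255, the ports published in the Linux emulator docs.
    ports = ["8081:8081"] + [f"{p}:{p}" for p in range(10250, 10256)]
    return {
        "image": LINUX_EMULATOR_IMAGE,
        "environment": {PARTITION_COUNT_VAR: str(partition_count)},
        "ports": ports,
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output should also be accepted as a compose file.
    print(json.dumps({"services": {"cosmosdb": cosmos_service(30)}}, indent=2))
```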
I want to create a neural network in TensorFlow 2.x that trains on a GPU, and I want to set up all the necessary infrastructure inside a docker-compose network (assuming that this is actually possible for now). As far as I know, in order to train a TensorFlow model on a GPU, I need the CUDA toolkit and the NVIDIA driver. Installing these dependencies natively on my computer (OS: Ubuntu 18.04) is always quite a pain, as there are many version dependencies between TensorFlow, CUDA and the NVIDIA driver. So I was trying to find a way to create a docker-compose file that contains a service for TensorFlow, CUDA and the NVIDIA driver, but I am getting the following error:
# Start the services
sudo docker-compose -f docker-compose-test.yml up --build
Starting vw_image_cls_nvidia-driver_1 ... done
Starting vw_image_cls_nvidia-cuda_1 ... done
Recreating vw_image_cls_tensorflow_1 ... error
ERROR: for vw_image_cls_tensorflow_1 Cannot start service tensorflow: OCI runtime create failed: container_linux.go:346: starting container process caused "exec: \"import\": executable file not found in $PATH": unknown
ERROR: for tensorflow Cannot start service tensorflow: OCI runtime create failed: container_linux.go:346: starting container process caused "exec: \"import\": executable file not found in $PATH": unknown
ERROR: Encountered errors while bringing up the project.
My docker-compose file looks as follows:
# version 2.3 is required for NVIDIA runtime
version: '2.3'
services:
  nvidia-driver:
    # NVIDIA GPU driver used by the CUDA Toolkit
    image: nvidia/driver:440.33.01-ubuntu18.04
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      # Do we need this volume to make the driver accessible by other containers in the network?
      - nvidia_driver:/usr/local/nvidai/:ro # Taken from here: http://collabnix.com/deploying-application-in-the-gpu-accelerated-data-center-using-docker/
    networks:
      - net
  nvidia-cuda:
    depends_on:
      - nvidia-driver
    image: nvidia/cuda:10.1-base-ubuntu18.04
    volumes:
      # Do we need the driver volume here?
      - nvidia_driver:/usr/local/nvidai/:ro # Taken from here: http://collabnix.com/deploying-application-in-the-gpu-accelerated-data-center-using-docker/
      # Do we need to create an additional volume for this service to be accessible by the tensorflow service?
    devices:
      # Do we need to list the devices here, or only in the tensorflow service. Taken from here: http://collabnix.com/deploying-application-in-the-gpu-accelerated-data-center-using-docker/
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
    networks:
      - net
  tensorflow:
    image: tensorflow/tensorflow:2.0.1-gpu # Does this ship with cuda10.0 installed or do I need a separate container for it?
    runtime: nvidia
    restart: always
    privileged: true
    depends_on:
      - nvidia-cuda
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    volumes:
      # Volumes related to source code and config files
      - ./src:/src
      - ./configs:/configs
      # Do we need the driver volume here?
      - nvidia_driver:/usr/local/nvidai/:ro # Taken from here: http://collabnix.com/deploying-application-in-the-gpu-accelerated-data-center-using-docker/
      # Do we need an additional volume from the nvidia-cuda service?
    command: import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000]))); print("SUCCESS")
    devices:
      # Devices listed here: http://collabnix.com/deploying-application-in-the-gpu-accelerated-data-center-using-docker/
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia0
      - /dev/nvidia-uvm-tools
    networks:
      - net
volumes:
  nvidia_driver:
networks:
  net:
    driver: bridge
And my /etc/docker/daemon.json file looks as follows:
{"default-runtime":"nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
So, it seems like the error is somehow related to configuring the nvidia runtime, but more importantly, I am almost certain that I didn't set up my docker-compose file correctly. So, my questions are:
Is it actually possible to do what I am trying to do?
If yes, did I set up my docker-compose file correctly (see comments in docker-compose.yml)?
How do I fix the error message I received above?
Thank you very much for your help, I highly appreciate it.
I agree that installing all tensorflow-gpu dependencies is rather painful. Fortunately, it's rather easy with Docker, as you only need the NVIDIA Driver and the NVIDIA Container Toolkit (a sort of plugin). The rest (CUDA, cuDNN) is already included in the Tensorflow images, so you don't need them on the Docker host.
The driver can be deployed as a container too, but I do not recommend that for a workstation. It is meant to be used on servers where there is no GUI (X-server, etc). The containerized driver is covered at the end of this post; for now, let's see how to start tensorflow-gpu with docker-compose. The process is the same regardless of whether you have the driver in a container or not.
How to launch Tensorflow-GPU with docker-compose
Prerequisites:
docker & docker-compose
NVIDIA Container Toolkit & NVIDIA Driver
To enable GPU support for a container you need to create the container with NVIDIA Container Toolkit. There are two ways you can do that:
You can configure Docker to always use the nvidia container runtime. It is fine to do so, as it behaves just like the default runtime unless some NVIDIA-specific environment variables are present (more on that later). This is done by placing "default-runtime": "nvidia" into Docker's daemon.json:
/etc/docker/daemon.json:
{
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
},
"default-runtime": "nvidia"
}
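If daemon.json already contains other settings, replacing it wholesale with the snippet above would drop them. A minimal sketch that merges the nvidia runtime into whatever is already there, keeping other keys and any existing "nvidia" entry; the runtime path default mirrors the snippet above, and writing the file needs root:

```python
import json
from pathlib import Path

DAEMON_JSON = Path("/etc/docker/daemon.json")


def with_nvidia_default(config: dict,
                        runtime_path: str = "/usr/bin/nvidia-container-runtime") -> dict:
    """Return a copy of a daemon.json dict with 'nvidia' registered and set as default.

    Other keys (log driver, registries, ...) and an existing 'nvidia' entry are kept.
    """
    merged = dict(config)
    runtimes = dict(merged.get("runtimes", {}))
    runtimes.setdefault("nvidia", {"path": runtime_path, "runtimeArgs": []})
    merged["runtimes"] = runtimes
    merged["default-runtime"] = "nvidia"
    return merged


def update_daemon_json(path: Path = DAEMON_JSON) -> None:
    """Rewrite daemon.json in place (run as root, then restart the Docker daemon)."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(with_nvidia_default(current), indent=2) + "\n")


if __name__ == "__main__":
    # Dry run: print the merged config instead of writing it.
    try:
        current = json.loads(DAEMON_JSON.read_text()) if DAEMON_JSON.exists() else {}
    except (OSError, ValueError):
        current = {}
    print(json.dumps(with_nvidia_default(current), indent=2))
```

On systemd hosts the daemon is typically restarted with sudo systemctl restart docker so the new default runtime takes effect.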
You can select the runtime during container creation. With docker-compose it is only possible with format version 2.3.
Here is a sample docker-compose.yml to launch Tensorflow with GPU:
version: "2.3" # the only version where 'runtime' option is supported
services:
  test:
    image: tensorflow/tensorflow:2.3.0-gpu
    # Make Docker create the container with NVIDIA Container Toolkit
    # You don't need it if you set 'nvidia' as the default runtime in
    # daemon.json.
    runtime: nvidia
    # the lines below are here just to test that TF can see GPUs
    entrypoint:
      - /usr/local/bin/python
      - -c
    command:
      - "import tensorflow as tf; tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)"
By running this with docker-compose up you should see a line with the GPU specs in it. It appears at the end and looks like this:
test_1 | 2021-01-23 11:02:46.500189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 1624 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
And that is all you need to launch an official Tensorflow image with GPU.
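The error in the question (exec: "import": executable file not found in $PATH) comes from its command: line. No shell is involved, so the string is split into arguments and Docker tries to execute the first one, "import". That is why the sample above makes the Python interpreter the entrypoint and passes the code after -c. The sketch below uses Python's shlex.split as an approximation of that splitting (Compose's own parser may differ in edge cases) and runs the fixed form locally with a TensorFlow-free snippet:

```python
import shlex
import subprocess
import sys
from typing import List

# The `command:` value from the question's compose file, verbatim.
BROKEN = ('import tensorflow as tf; print(tf.reduce_sum('
          'tf.random.normal([1000, 1000]))); print("SUCCESS")')


def as_argv(command: str) -> List[str]:
    """Approximate how a string `command` becomes an argv list (no shell runs it)."""
    return shlex.split(command)


def python_c(code: str, interpreter: str = "python") -> List[str]:
    """Put the interpreter first so the code travels as a single -c argument."""
    return [interpreter, "-c", code]


if __name__ == "__main__":
    print(as_argv(BROKEN)[0])  # the "executable" Docker went looking for
    # Run the fixed form locally with a TensorFlow-free snippet.
    argv = python_c('print("SUCCESS")', interpreter=sys.executable)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    print(result.stdout.strip())
```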
NVIDIA Environment Variables and custom images
As I mentioned, NVIDIA Container Toolkit works as the default runtime unless some variables are present. These are listed and explained here. You only need to care about them if you build a custom image and want to enable GPU support in it. Official Tensorflow images with GPU inherit them from the CUDA images they use as a base, so you only need to start the image with the right runtime, as in the example above.
If you are interested in customising a Tensorflow image, I wrote another post on that.
Host Configuration for NVIDIA driver in container
As mentioned at the beginning, this is not something you want on a workstation. The process requires you to start the driver container when no other display driver is loaded (for example, over SSH). Furthermore, at the time of writing only Ubuntu 16.04, Ubuntu 18.04 and CentOS 7 were supported.
There is an official guide and below are extractions from it for Ubuntu 18.04.
Edit the 'root' option in the NVIDIA Container Toolkit settings:
sudo sed -i 's/^#root/root/' /etc/nvidia-container-runtime/config.toml
Disable the Nouveau driver modules:
sudo tee /etc/modules-load.d/ipmi.conf <<< "ipmi_msghandler" \
&& sudo tee /etc/modprobe.d/blacklist-nouveau.conf <<< "blacklist nouveau" \
&& sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf <<< "options nouveau modeset=0"
If you are using an AWS kernel, ensure that the i2c_core kernel module is enabled:
sudo tee -a /etc/modules-load.d/ipmi.conf <<< "i2c_core"
Update the initramfs:
sudo update-initramfs -u
Now it's time to reboot for the changes to take effect. After the reboot, check that no nouveau or nvidia modules are loaded. The commands below should return nothing:
lsmod | grep nouveau
lsmod | grep nvidia
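lsmod only formats the contents of /proc/modules, so the same check can be scripted, both for this post-reboot step (no matches expected) and later, once the driver container is up (nvidia, nvidia_modeset and nvidia_uvm expected). A minimal sketch; unlike lsmod | grep, it looks only at the module-name column, not at the "Used by" column:

```python
from pathlib import Path
from typing import List

PROC_MODULES = Path("/proc/modules")  # the file lsmod reads and formats


def loaded_modules(text: str) -> List[str]:
    """Module names are the first whitespace-separated field of each line."""
    return [line.split()[0] for line in text.splitlines() if line.strip()]


def modules_matching(names: List[str], needle: str) -> List[str]:
    """Mimic `lsmod | grep needle`, restricted to the module-name column."""
    return sorted(n for n in names if needle in n)


if __name__ == "__main__":
    if PROC_MODULES.exists():
        names = loaded_modules(PROC_MODULES.read_text())
        for needle in ("nouveau", "nvidia"):
            print(needle, "->", modules_matching(names, needle) or "none loaded")
```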
Starting driver in container
The guide offers a command to run the driver, but I prefer docker-compose. Save the following as driver.yml:
version: "3.0"
services:
  driver:
    image: nvidia/driver:450.80.02-ubuntu18.04
    privileged: true
    restart: unless-stopped
    volumes:
      - /run/nvidia:/run/nvidia:shared
      - /var/log:/var/log
    pid: "host"
    container_name: nvidia-driver
Use docker-compose -f driver.yml up -d to start the driver container. It will take a couple of minutes to compile the modules for your kernel. You can use docker logs nvidia-driver -f to follow the process; wait for the line 'Done, now waiting for signal' to appear. Alternatively, use lsmod | grep nvidia to see if the driver modules are loaded. When it's ready you should see something like this:
nvidia_modeset 1183744 0
nvidia_uvm 970752 0
nvidia 19722240 17 nvidia_uvm,nvidia_modeset
Docker Compose v1.27.0+
Since 2022, this also works with the version 3.x file format:
version: "3.6"
services:
  jupyter-8888:
    image: "tensorflow/tensorflow:latest-gpu-jupyter"
    env_file: "env-file"
    deploy:
      resources:
        reservations:
          devices:
            - driver: "nvidia"
              device_ids: ["0"]
              capabilities: [gpu]
    ports:
      - 8880:8888
    volumes:
      - workspace:/workspace
      - data:/data
If you want to specify different GPU IDs, e.g. 0 and 3:
device_ids: ['0', '3']
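If you generate compose files from a script, the deploy block above can be built programmatically. A minimal sketch; it encodes my reading of the Compose specification that count and device_ids are mutually exclusive and that leaving both out reserves all matching devices, so double-check this against the docs for your Compose version:

```python
import json
from typing import List, Optional, Union


def gpu_reservation(device_ids: Optional[List[Union[int, str]]] = None,
                    count: Optional[Union[int, str]] = None,
                    driver: str = "nvidia") -> dict:
    """Build the `deploy:` mapping that reserves GPUs for one service."""
    if device_ids is not None and count is not None:
        raise ValueError("use either device_ids or count, not both")
    device: dict = {"driver": driver, "capabilities": ["gpu"]}
    if device_ids is not None:
        device["device_ids"] = [str(i) for i in device_ids]  # IDs are strings
    elif count is not None:
        device["count"] = count  # an integer or "all"
    return {"resources": {"reservations": {"devices": [device]}}}


if __name__ == "__main__":
    print(json.dumps({"deploy": gpu_reservation(device_ids=[0, 3])}, indent=2))
```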
I managed to get it working by installing WSL2 on my Windows machine to use VS Code along with the Remote-Containers extension. Here is a collection of articles that helped a lot with installing WSL2 and using VS Code from within it:
https://learn.microsoft.com/en-us/windows/wsl/install-win10
https://ubuntu.com/blog/getting-started-with-cuda-on-ubuntu-on-wsl-2
https://code.visualstudio.com/docs/remote/containers
With the Remote-Containers extension of VS Code, you can then set up your devcontainer based on a docker-compose file (or just a Dockerfile, as I did), which is probably better explained in the third link above. One thing for myself to remember is that when defining the .devcontainer.json file you need to make sure to set:
// Optional arguments passed to ``docker run ... ``
"runArgs": [
"--gpus", "all"
]
Before VS Code, I used PyCharm for a long time, so switching to VS Code was quite a pain at first, but VS Code together with WSL2, Remote-Containers and the Pylance extension has made it quite easy to develop in a container with GPU support. As far as I know, PyCharm doesn't support debugging inside a container in WSL at the moment, because of:
https://intellij-support.jetbrains.com/hc/en-us/community/posts/360009752059-Using-docker-compose-interpreter-on-wsl-project-Windows-
https://youtrack.jetbrains.com/issue/WI-53325
I would like to run 2 Docker images with docker-compose.
One image should run with nvidia-docker and the other with docker.
I've seen the post "use nvidia-docker-compose launch a container, but exited soon",
but this is not working for me (not even when running only one image).
Any ideas would be great.
UPDATE: please check nvidia-docker 2 and its support for docker-compose first
https://github.com/NVIDIA/nvidia-docker/wiki/Frequently-Asked-Questions#do-you-support-docker-compose
(I'd first suggest adding the nvidia-docker tag).
If you look at the nvidia-docker-compose code here, it only generates a specific docker-compose file after querying the NVIDIA configuration on localhost:3476.
You can also write this docker-compose file by hand, as it turns out to be quite simple. Follow this example, replace 375.66 with your NVIDIA driver version and add as many /dev/nvidia[n] lines as you have graphics cards (I did not try putting services on separate GPUs, but go for it!):
services:
  exampleservice0:
    devices:
      - /dev/nvidia0
      - /dev/nvidia1
      - /dev/nvidiactl
      - /dev/nvidia-uvm
      - /dev/nvidia-uvm-tools
    environment:
      - EXAMPLE_ENV_VARIABLE=example
    image: company/image
    volumes:
      - ./disk:/disk
      - nvidia_driver_375.66:/usr/local/nvidia:ro
version: '2'
volumes:
  media: null
  nvidia_driver_375.66:
    external: true
Then just run this hand-made docker-compose file with a classic docker-compose command.
You can then probably mix in non-NVIDIA containers by leaving out the NVIDIA-specific settings in their services.
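Since the hand-made file needs one /dev/nvidia[n] entry per card plus the control nodes, the devices list can be generated from what actually exists under /dev. A small sketch that prints the devices section for the example service; it only handles the node names used in the example above (others, such as /dev/nvidia-modeset, are ignored), and the driver volume still has to be filled in by hand:

```python
import glob
import re

GPU_NODE = re.compile(r"/dev/nvidia(\d+)")
# Control nodes from the example compose file above, in the same order.
CONTROL_NODES = ["/dev/nvidiactl", "/dev/nvidia-uvm", "/dev/nvidia-uvm-tools"]


def nvidia_devices(paths):
    """Per-GPU nodes sorted numerically, then whichever control nodes exist."""
    present = set(paths)
    gpus = sorted((p for p in present if GPU_NODE.fullmatch(p)),
                  key=lambda p: int(GPU_NODE.fullmatch(p).group(1)))
    return gpus + [p for p in CONTROL_NODES if p in present]


if __name__ == "__main__":
    print("    devices:")
    for dev in nvidia_devices(glob.glob("/dev/nvidia*")):
        print(f"      - {dev}")
```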
In addition to the accepted answer, here's my approach, which is a bit shorter.
I needed to use the old docker-compose file format (2.3) because of the required runtime: nvidia (it won't necessarily work with version: 3; see this). Setting NVIDIA_VISIBLE_DEVICES=all will make all the GPUs visible.
version: '2.3'
services:
  your-service-name:
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    # ...your stuff
My example is available here.
Tested with NVIDIA Docker 2.5.0, Docker CE 19.03.13, NVIDIA-SMI 418.152.00 and CUDA 10.1 on Debian 10.