Is there a way to get GPU support without nvidia-docker? - docker

I am trying to get GPU support in my container without nvidia-docker.
I know that with nvidia-docker I just have to pass --runtime=nvidia, but my current circumstances do not allow using nvidia-docker.
I tried installing the NVIDIA driver, CUDA, and cuDNN inside my container, but it fails.
How can I use TensorFlow with the GPU in my container without nvidia-docker?

You can use x11docker.
Running a Docker image on X with GPU access is as simple as:
x11docker --gpu imagename

You'll be happy to know that recent Docker versions (19.03 and later) come with native support for NVIDIA GPUs. You'll need the --gpus flag to expose your NVIDIA devices to the container. See - How to use GPU in a docker container
Earlier, you had to install nvidia-docker, which was plain Docker with a thin layer of abstraction for NVIDIA GPUs. See - Nvidia Docker
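For illustration, assuming the NVIDIA driver and the NVIDIA Container Toolkit are installed on the host (the CUDA image tag here is just an example), a quick sanity check looks like this:
# Expose all host GPUs to the container and run nvidia-smi inside it
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi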

You cannot simply install NVIDIA drivers in a Docker container; the container must have access to the hardware. I'm not certain, but mounts might help you with that issue. See https://docs.docker.com/storage/
You can use Anaconda to install and use tensorflow-gpu.
Make sure you have the latest NVIDIA drivers installed.
Install Anaconda 2 or 3 from the official site:
https://www.anaconda.com/distribution/
Create a new environment and install tensorflow-gpu, cuDNN and cudatoolkit:
$ conda create -n tf-gpu tensorflow-gpu python cudnn cudatoolkit
You can also specify package versions, e.g.:
$ conda create -n tf-gpu tensorflow-gpu python=3.5 cudnn cudatoolkit=8
Please check that your hardware has the minimum compute capability required by the CUDA version you are/will be using.
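Once the environment is created, a quick way to check that TensorFlow sees the GPU (using the tf-gpu environment name from above):
$ conda activate tf-gpu
$ python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"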

If you can't pass --runtime=nvidia as a command-line option (e.g. with docker-compose), you can set the default runtime in the Docker daemon config file /etc/docker/daemon.json:
{
    "default-runtime": "nvidia"
}
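If the nvidia runtime is not already registered (the nvidia-docker2 / NVIDIA Container Toolkit packages normally do this for you), a fuller daemon.json looks roughly like this; restart the Docker daemon afterwards (e.g. sudo systemctl restart docker):
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}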

Related

Docker installing multiple versions of cuda on ubuntu 16.04

Firstly, I'm still a beginner in Docker.
I need to run multiple versions of TensorFlow, and each version requires a specific CUDA version.
My host operating system is Ubuntu 16.04.
I need to have multiple versions of CUDA on my OS, since I'm working on multiple projects that each require a different CUDA version. I tried to use conda and virtual environments to solve that problem; after a while I gave up and started to search for alternatives.
Apparently virtual machines can't access the GPU; only certain GPU types support NVIDIA's official virtualization.
I have an NVIDIA 1080 GPU. I installed a fresh Ubuntu 16.04 image and started working on Dockerfiles to create custom images for my projects.
I was trying to avoid Docker to keep complexity down; after I failed at installing and running multiple versions of CUDA, I turned to Docker. Apparently you can't access CUDA through Docker directly if you don't install the CUDA driver on the host machine.
I'm still not sure whether I can run Docker containers with a different CUDA version than the one installed on my PC.
If that is the case, NVIDIA messed up big time. Usually, if there is no need to use Docker, we avoid it to escape additional complexity; when we need to work with multiple environments, and conda and virtual environments fail, we head towards Docker. So if NVIDIA limits Docker containers to one CUDA version, they only intended to let developers work on one project with special dependencies per operating system.
Please confirm whether I can run containers that each have a specific CUDA version.
Moreover, I would greatly appreciate it if someone could point to a guide on how to use conda environments in Dockerfiles and how to run a conda env in a Docker container.
Having several CUDA versions is possible with Docker. Moreover, none of them needs to be on your host machine; you can have CUDA in a container, and IMO that's the best place for it.
To enable GPU support in a container and make use of CUDA in it, you need to have all of these installed:
Docker
(optionally but recommended) docker-compose
NVIDIA Container Toolkit
NVIDIA GPU Driver
Once you've installed these, you can simply grab one of the official tensorflow images (if the built-in Python version fits your needs), install your pip packages, and start working in minutes. CUDA is included in the container image; you don't need it on the host machine.
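As a sketch of that workflow (the image tags and pip packages here are only examples), a per-project Dockerfile can pin whichever TensorFlow/CUDA combination the project needs:
# Project A: TensorFlow 1.15, which bundles CUDA 10.0 inside the image
FROM tensorflow/tensorflow:1.15.5-gpu
# project-specific Python dependencies
RUN pip install --no-cache-dir pandas scikit-learn
# Project B would simply start from a different base image, e.g.
# FROM tensorflow/tensorflow:2.3.0-gpu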
Here's an example docker-compose.yml to start a container with tensorflow-gpu. All the container does is test whether any GPU devices are available.
version: "2.3" # the only version where 'runtime' option is supported
services:
test:
image: tensorflow/tensorflow:2.3.0-gpu
# Make Docker create the container with NVIDIA Container Toolkit
# You don't need it if you set 'nvidia' as the default runtime in
# daemon.json.
runtime: nvidia
# the lines below are here just to test that TF can see GPUs
entrypoint:
- /usr/local/bin/python
- -c
command:
- "import tensorflow as tf; tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)"
Running this with docker-compose up, you should see a line with the GPU specs in it. It looks like this and appears at the end:
test_1 | 2021-01-23 11:02:46.500189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/device:GPU:0 with 1624 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0, compute capability: 6.1)
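If you prefer to skip compose, a roughly equivalent one-off check with plain docker run (same image and test expression as above) is:
docker run --rm --runtime=nvidia tensorflow/tensorflow:2.3.0-gpu \
  python -c "import tensorflow as tf; tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)"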

How to expose all GPUs to Kubernetes without the command "--gpus all" in docker 19.03?

I want to install Kubernetes and Docker 19.03 with NVIDIA GPU support.
Before Docker 19.03, the default runtime of Docker needed to be set to nvidia.
Now that method is not supported; the recommended method is to pass "--gpus all" on the command line.
Is there any way to make "--gpus all" the default setting of Docker?
It is also acceptable to change the command Kubernetes uses to invoke Docker, but I have not found a solution.
BTW, I don't want to use NVIDIA's k8s-device-plugin because I want to control the GPUs myself.
I just need all GPUs to be exposed to Pods.
According to NVIDIA's documentation, you still need to install nvidia-docker 2.0 even though it is no longer the recommended method. After installing it, you can set the NVIDIA runtime as the default.
Kubernetes does not currently support the new "--gpus all" flag.
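On Ubuntu, with the NVIDIA package repository already configured, that amounts to roughly the following (the package name and config path are the standard ones; adjust for your distro):
# install the NVIDIA runtime for Docker
sudo apt-get update && sudo apt-get install -y nvidia-docker2
# add "default-runtime": "nvidia" to /etc/docker/daemon.json, then restart the daemon
sudo systemctl restart docker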

Looking for any Jetson Nano DockerHub example that uses the GPU

I'm getting started with my Jetson Nano and I'm looking for an example that I can launch by running docker run xxx, where xxx is some image on Docker Hub that uses the GPU.
I assume I'll have to pass in some --device flags, but is there even any kind of "hello world"-style sample ready to go that uses the GPU from Docker?
I'm hoping to just demonstrate that you can access the GPU from a Docker container on the Jetson Nano, mostly to make sure that my configuration is correct.
NVIDIA JetPack 4.2.1 enables easy Docker with GPU support on the Jetson Nano.
See here for detailed instructions on how to get Docker and Kubernetes running on the Jetson Nano with GPU support:
https://medium.com/jit-team/building-a-gpu-enabled-kubernets-cluster-for-machine-learning-with-nvidia-jetson-nano-7b67de74172a
It uses a simple Docker Hub hosted image for TensorFlow.
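Separately, as a minimal "hello world" sketch, NVIDIA's L4T base image can be run with the nvidia runtime that JetPack ships (the exact tag depends on your JetPack/L4T release, so treat it as an example):
sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.2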
You're not alone in wanting that, but you cannot do it at the moment. The NVIDIA Nano team are aware of the need, and the feature is expected later this year.
See https://devtalk.nvidia.com/default/topic/1050809/jetson-nano/docker-image-to-see-if-cuda-is-working-in-container-on-jetson-nano-/
At present you can run a Docker container with TensorFlow or PyTorch installed, but it will only use the CPU, not the GPU.

nvidia-cuda docker container os, different from host

In Nvidia's developer page (https://devblogs.nvidia.com/nvidia-docker-gpu-server-application-deployment-made-easy/)
It states that nvidia-docker provides "driver-agnostic CUDA images".
I would just like to clarify whether this is only driver-version specific, or whether it also applies to the OS.
For example:
Host = CentOS
Docker Image/Container = Ubuntu
Does using nvidia-docker provide a way to use CentOS's NVIDIA driver in the Ubuntu Docker container?
Currently, I keep two Dockerfiles, one for an Ubuntu host and one for a CentOS host, and I manually mount /dev/nvidia0 and copy the library files (or install the driver) inside the Docker image.
I've already asked NVIDIA this, but I'm still waiting for them to answer.
I'll be trying it myself too to find out, but I thought I'd try my luck in case anyone on SO already knows the answer.
Thank you in advance guys.
I've tested this and it does work.
"driver-agnostic CUDA images" is not only limitted to different versions of the driver but also across different OS (binary)
Thank you.
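As a quick check of this, running an Ubuntu-based CUDA image on a CentOS host and calling nvidia-smi should report the host's driver (the image tag is only an example, pick any available nvidia/cuda tag):
# the host's NVIDIA driver is made available inside the Ubuntu-based image
docker run --rm --runtime=nvidia nvidia/cuda:10.0-base nvidia-smi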

All images and containers disappeared after host kernel downgrade

Good day.
The host machine had kernel 3.16 installed. After installing kernel 3.14 via a deb package, I lost all Docker images and containers. The output of "docker images" and "docker ps -a" is empty. Is this normal behavior for Docker?
Thanks.
I will answer myself; it may be useful to someone.
Docker used the "aufs" storage driver on the old kernel, which requires the "aufs.ko" module to be loaded. Support for aufs was not enabled in the new kernel, so Docker began to use the "devicemapper" storage driver, and the aufs-backed images and containers are no longer visible.
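You can confirm which storage driver the daemon is currently using; in the situation described above it reports devicemapper instead of aufs:
$ docker info | grep "Storage Driver"
Storage Driver: devicemapper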
To actually fix it on Ubuntu, run:
sudo apt-get -y install linux-image-extra-$(uname -r)
This installs the aufs kernel module that Docker requires but which can be lost during kernel upgrades. Not sure why the package manager misses this dependency.
As Denis Pitikov points out, images and containers can disappear if the storage driver that created them (e.g. aufs) is no longer available.
When run on Ubuntu 14.04, the current Docker install script automatically installs the linux-image-extra-* package (suitable for your current kernel version). This includes the aufs kernel module.
On some systems, the linux-image-generic package may not be installed. On these systems, the next time you run a dist-upgrade, the kernel will be upgraded but the corresponding linux-image-extra-* will not be installed. When you reboot you won't have the aufs module, and your containers and images may have disappeared.
To fix it: first, check that you're running a generic kernel already:
$ uname -r
3.13.0-49-generic
If so, consider installing linux-image-generic:
$ apt-get install linux-image-generic
That will upgrade your kernel to the version required by that package and will install the -extra package too.
