Using Docker to Run TensorFlow 1.13 with a Newer GPU

I'm trying to train a neural network (https://github.com/AiviaCommunity/3D-RCAN) written in version TensorFlow 1.13.1. I have an RTX A6000 GPU (running on Windows), which is not compatible with older versions of TensorFlow, and attempting to train the network on newer versions has introduced other complications. Would Docker provide a means of training the network on TensorFlow 1.13.1 with my RTX A6000? The project does not include a Dockerfile, but I've been able to train the network on my other computer (also Windows) with an older GPU and could try to create a Dockerfile myself if that's a potential solution.
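On the library side this is what Docker is for: the official tensorflow/tensorflow:1.13.1-gpu-py3 image on Docker Hub already bundles TensorFlow 1.13.1 with its matching CUDA/cuDNN, so you would not even need to write a Dockerfile to try it. A minimal sketch, assuming Docker Desktop with WSL2 GPU support (or the NVIDIA Container Toolkit on Linux); the mount path is a placeholder, and whether the CUDA 10.0 build inside that image can actually drive an Ampere-class card like the RTX A6000 is exactly the compatibility question you would be testing:
docker pull tensorflow/tensorflow:1.13.1-gpu-py3
docker run --gpus all -it --rm -v /path/to/3D-RCAN:/workspace -w /workspace tensorflow/tensorflow:1.13.1-gpu-py3 bash
# inside the container, run the repo's training script against the bundled TensorFlow 1.13.1
If the training script needs extra Python packages, that is where a small Dockerfile starting FROM that image would come in.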

Related

How to make custom code in Python utilize the GPU while using PyTorch tensors and matrix functions

I've created a CNN from scratch using only PyTorch tensors and matrix-operation functions, in the hope of utilizing the GPU. To my surprise, the GPU stays at 0% utilization and my training doesn't seem to be any faster than running on my CPU.
(Screenshots of GPU utilization before and during training omitted.)
I've double-checked that CUDA is available and have already installed it.
Graphics card: NVIDIA GeForce 2070 SUPER
Processor: Intel i5-10400
Coding environment: Jupyter Notebook
CUDA & cuDNN version: 11.0
PyTorch version: 1.6.0
You have to move your model and data to the GPU using:
model.cuda()   # move the model's parameters to GPU memory
# and, inside the training loop, move each batch as well:
x = x.cuda()
y = y.cuda()
You seem to be doing this within the forward and backward calls. To make sure the model is actually running on the GPU, monitor GPU usage continuously with the shell command
watch -n 5 nvidia-smi

How do I pull Docker images with specific library versions installed in them?

I have an outdated neural-network training script for Python 2.7 that uses Keras 2.1 on top of TensorFlow 1.4. I want to train it on my NVIDIA GPU, and I have the CUDA 10.2 SDK installed on Linux. I thought Docker Hub was exactly for publishing frozen software packages that just work, but it seems there is no way to find a container with a specific software set.
I know Docker >= 19.03 has native GPU support and that the nvidia-docker utility provides a CUDA-agnostic layer, but the problem is that I cannot install keras-gpu and tensorflow-gpu together at the required versions, cannot find the wheels, and this legacy script does not work with other versions.
Where did you get the idea that Docker Hub hosts images with all possible library combinations?
If you cannot find an image that suits you, simply build your own.
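As a sketch of that route, assuming Docker >= 19.03 with the NVIDIA Container Toolkit: Docker Hub's tensorflow/tensorflow repository does publish frozen tags such as 1.4.1-gpu (Python 2.7, with the matching CUDA/cuDNN inside), and the required Keras can simply be layered on top with pip; train.py here stands in for the legacy script:
docker pull tensorflow/tensorflow:1.4.1-gpu
docker run --gpus all -it --rm -v "$PWD":/work -w /work tensorflow/tensorflow:1.4.1-gpu bash
# inside the container:
pip install "keras==2.1.*"   # add the Keras 2.1.x release the script expects
python train.py
Baking the pip install into a one-line Dockerfile (FROM tensorflow/tensorflow:1.4.1-gpu) turns the same steps into a reusable image.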

ML training process is not on GPU

I just moved from AWS to Google Cloud Platform because of its lower GPU prices. I followed the instructions on the website to create a Compute Engine instance with a K80 GPU and installed the latest TensorFlow, Keras, CUDA driver, and cuDNN; everything went well. However, when I try to train my model, training still runs on the CPU.
NVIDIA-SMI 387.26, driver version: 387.26
CUDA compilation tools, release 9.1, V9.1.85
TensorFlow version: 1.4.1
cuDNN: cudnn-9.1-linux-x64-v7
Ubuntu 16.04
Perhaps you installed the CPU-only version of TensorFlow?
Google Cloud's Compute Engine now also offers a VM image with all the needed software preinstalled, which is an easier/faster way to get started: https://cloud.google.com/deep-learning-vm/
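A quick way to check and fix this, as a sketch (in TF 1.x the CPU-only and GPU builds were separate pip packages, and each release was built against specific CUDA/cuDNN versions, so the package has to match the libraries listed above):
pip list | grep -i tensorflow       # see whether the CPU-only "tensorflow" package is installed
pip uninstall -y tensorflow         # remove it if so
pip install tensorflow-gpu==1.4.1   # GPU build
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
If the last command lists a /device:GPU:0 entry, Keras/TensorFlow will place the training ops on the K80.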

Can Tensorflow be installed alongside Theano?

I'm trying to install TensorFlow alongside Theano on an NVIDIA Tesla K80. I'm working with CUDA 7.5 and following the instructions given here.
Theano by itself works well, but as soon as I install TensorFlow (either from source following the instructions or via pip install), nvidia-smi as well as Theano stops working.
More specifically, nvidia-smi hangs indefinitely, whereas Theano simply refuses to run in GPU mode.
I'm also using the latest version of cuDNN (v4).
Does TensorFlow have known issues with causing nvidia-smi to hang or with being incompatible with Theano?
TensorFlow grabs all available GPUs by default. So if you start it before Theano, Theano won't have any GPUs available; if you start Theano first, TensorFlow will segfault when it can't get the GPU Theano is using.
To work around that, make the NVIDIA driver show each process only the device you want it to see, using this environment variable:
CUDA_VISIBLE_DEVICES=0 python ...
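For example, on a two-GPU machine each framework can be pinned to its own device (a sketch; the script names are hypothetical, and the exact THEANO_FLAGS device syntax depends on your Theano version/backend):
CUDA_VISIBLE_DEVICES=0 python train_tensorflow.py                        # TensorFlow only sees GPU 0
CUDA_VISIBLE_DEVICES=1 THEANO_FLAGS=device=gpu0 python train_theano.py   # Theano only sees GPU 1, exposed to it as device 0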

How to run GPGPU workloads inside a Docker image whose kernel and GPU driver version differ from the host's

I have a machine with several GPUs. My idea is to attach them to different Docker containers and use those containers for CUDA (or OpenCL) calculations.
My goal is to set up a Docker image with a fairly old Ubuntu and fairly old AMD video drivers (13.04). The reason is simple: upgrading to a newer driver version breaks my OpenCL program (due to buggy AMD Linux drivers).
So the question is: is it possible to run a Docker image with an old Ubuntu, an old kernel (3.14, for example), and an old AMD (fglrx) driver on a fresh Arch Linux setup with a recent 4.2 kernel and the newer AMD (fglrx) drivers from the repositories?
P.S. I tried this answer (with the NVIDIA cards), and unfortunately deviceQuery inside the Docker image doesn't see any CUDA devices (as happened to some commenters on the original answer)...
P.P.S. My setup:
CPU: Intel Xeon E5-2670
GPUs:
1 x Radeon HD 7970
$ lspci -nn | grep Rad
83:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798]
83:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT HDMI Audio [Radeon HD 7970 Series] [1002:aaa0]
2 x GeForce GTX Titan Black
With Docker you rely on operating-system-level virtualization, which means every container uses the host's kernel. If you want to run a different kernel per container, you'll probably have to use full system virtualization instead, e.g. KVM or VirtualBox. If your setup supports Intel VT-d, you can pass the GPU through as a PCIe device to the guest (the better term in that case is a virtual machine).
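A quick way to see this (just an illustration, not part of the original answer): even an old-Ubuntu container reports the host's kernel, because no separate guest kernel is booted.
uname -r                                # kernel on the Arch Linux host, e.g. 4.2.x
docker run --rm ubuntu:14.04 uname -r   # the "old Ubuntu" container prints the very same kernel version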
