ML training process is not on GPU

I just moved from AWS to Google Cloud Platform because of its lower GPU prices. I followed the instructions on the website, created a Compute Engine instance with a K80 GPU, and installed the latest TensorFlow, Keras, CUDA driver, and cuDNN; everything went well. However, when I try to train my model, the training process still runs on the CPU.
NVIDIA-SMI: 387.26, Driver Version: 387.26
CUDA compilation tools: release 9.1, V9.1.85
TensorFlow: 1.4.1
cuDNN: cudnn-9.1-linux-x64-v7
OS: Ubuntu 16.04

Perhaps you installed the CPU-only version of TensorFlow?
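One quick way to verify which build is active (a minimal sketch for TF 1.x; with the GPU build, installed e.g. via pip install tensorflow-gpu==1.4.1, the list should include a GPU entry):

# List the devices TensorFlow can see. The CPU-only build shows just a
# CPU entry; the GPU build should also list something like /device:GPU:0
# backed by the K80.
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())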
Google Cloud Compute Engine now offers VM OS images with all the needed software preinstalled, for an easier/faster way to get started: https://cloud.google.com/deep-learning-vm/
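For reference, such an instance can be created from one of those images with something along these lines (a sketch; the image family/project names and flags follow the deep-learning-vm documentation and should be double-checked there):

gcloud compute instances create my-tf-gpu-vm \
    --zone=us-central1-a \
    --image-family=tf-latest-gpu \
    --image-project=deeplearning-platform-release \
    --accelerator=type=nvidia-tesla-k80,count=1 \
    --maintenance-policy=TERMINATE \
    --metadata=install-nvidia-driver=True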

Related

How to develop a model on a CPU before migrating to an IPU

I am building a multivariate LSTM to model longitudinal data with PyTorch.
I have installed Graphcore's PyTorch distribution (3.1, which includes Poplar and PopART) and the tools from Docker. Rather than installing an IPU immediately, can I develop the model on the CPU to start with, before adding or migrating to an IPU? When I issue any gc-* command it reports that no IPU is available, which I know is true!
I generally prefer to run on bare metal [Ubuntu 20.04 LTS, AMD 1950X Threadripper] rather than via VMs. Do I need a Graphcore account to do this, so I can sign the licence agreement etc.? I guess that is implied in the Docker application.
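Nothing in a plain PyTorch model depends on the IPU until you wrap it, so CPU-first development works fine. A minimal sketch (the model and dimensions are invented for illustration; the commented PopTorch lines assume the poptorch.Options / wrapper API described in the SDK docs):

import torch
import torch.nn as nn

class LongitudinalLSTM(nn.Module):
    """Multivariate LSTM over (batch, time, features) sequences."""
    def __init__(self, n_features=8, hidden=64, n_out=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_out)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # predict from the last time step

model = LongitudinalLSTM()
x = torch.randn(4, 20, 8)  # toy batch: 4 subjects, 20 visits, 8 variables
print(model(x).shape)      # develop and debug entirely on the CPU

# Once an IPU (or the software IPU Model emulator) is available, wrap the
# same module with PopTorch instead of rewriting it:
#   import poptorch
#   opts = poptorch.Options()   # opts.useIpuModel(True) targets the emulator
#                               # (assumption: available in SDK 3.1)
#   ipu_model = poptorch.inferenceModel(model, opts)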

Using Docker TensorFlow 1.13 with Newer GPU

I'm trying to train a neural network (https://github.com/AiviaCommunity/3D-RCAN) written for TensorFlow 1.13.1. I have an RTX A6000 GPU (running on Windows), which is not compatible with older versions of TensorFlow, and attempting to train the network on newer versions has introduced other complications. Would Docker provide a means of training the network on TensorFlow 1.13.1 with my RTX A6000? The project does not include a Dockerfile, but I've been able to train the network on my other computer (also Windows) with an older GPU, and I could try to create a Dockerfile myself if that's a potential solution.
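Docker can plausibly help here, since a container ships its own CUDA runtime and only the host driver has to be new enough. The catch is that the stock TF 1.13.1 images bundle CUDA 10.0, which predates the A6000's Ampere architecture, so kernels may have to be JIT-compiled from PTX at startup, or may fail outright. A sketch of what to try (unverified on an A6000; on Windows this requires Docker Desktop with the WSL 2 GPU backend):

docker run --gpus all -it --rm -v "%cd%":/workspace -w /workspace tensorflow/tensorflow:1.13.1-gpu-py3 bash

If that fails on the A6000, NVIDIA's NGC "tf1" images (TensorFlow 1.15 built against a CUDA release that does know Ampere, e.g. nvcr.io/nvidia/tensorflow:21.02-tf1-py3) are the closest fallback, though 1.15 is not a drop-in match for 1.13.1.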

CPU and GPU Stress test for Nvidia Jetson Xavier (Ubuntu 18.04, Jetpack 4.6)

How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, JetPack 4.6)?
The only code I have found is https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has tough package-incompatibility issues and only works for the CPU.
Can anyone contribute another piece of code, or solve the issues with the one mentioned? Python code is preferred.
Solution found. For the CPU stress test, the link above works; it needs the numba package installed. For the GPU stress test, the samples in the CUDA folder that ships on NVIDIA Jetson machines can be used simply and efficiently. The samples live in /usr/local/cuda/samples. Choose one and compile it with sudo make; the compiled test binary will land in /usr/local/cuda/samples/bin/aarch64/linux/release (aarch64 may differ on other architectures). Run the test and watch the load with sudo jtop in another terminal, as in the sketch below.
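To combine the two from a single Python script (a sketch; matrixMul is just one arbitrary choice of compiled sample, and the busy loop stands in for the numba-based CPU test):

# Spin every CPU core while looping a compiled CUDA sample on the GPU.
# Monitor both loads with sudo jtop; stop with Ctrl+C.
import multiprocessing as mp
import subprocess

SAMPLE = "/usr/local/cuda/samples/bin/aarch64/linux/release/matrixMul"

def burn_cpu():
    x = 0
    while True:                 # busy loop pins one core at 100%
        x = (x + 1) % 1000003

if __name__ == "__main__":
    workers = [mp.Process(target=burn_cpu, daemon=True)
               for _ in range(mp.cpu_count())]
    for w in workers:
        w.start()
    while True:                 # keep the GPU busy by re-running the sample
        subprocess.run([SAMPLE], check=True)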

How do I pull Docker images with specific library versions installed in them?

I have an outdated Python 2.7 neural-network training script which uses Keras 2.1 on top of TensorFlow 1.4; I want it to train on my NVIDIA GPU, and I have CUDA SDK 10.2 installed on Linux. I thought Docker Hub was exactly the place for publishing frozen software stacks that just work, but it seems there is no way to find a container with a specific software set.
I know Docker >= 19.03 has native GPU support, and that the nvidia-docker utility has a CUDA-agnostic layer; but the problem is that I cannot install both keras-gpu and tensorflow-gpu at the required versions, I cannot find wheels, and this legacy script does not work with other versions.
Where did you get the idea that Docker Hub hosts images with all possible library combinations?
If you cannot find an image that suits you, simply build your own.
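For this particular stack, a minimal Dockerfile sketch (assuming the historical tensorflow/tensorflow:1.4.1-gpu tag, which was Python 2.7 based, is still on Docker Hub; the exact Keras pin is illustrative). Note that the image brings its own CUDA libraries, so the host's CUDA 10.2 SDK is irrelevant; only the driver and the NVIDIA container runtime matter:

# Dockerfile: TensorFlow 1.4.1 GPU build on Python 2.7
FROM tensorflow/tensorflow:1.4.1-gpu
# Pin a Keras 2.1.x release (2.1.6 chosen for illustration)
RUN pip install keras==2.1.6

Build with docker build -t legacy-keras . and run with docker run --gpus all -it --rm -v "$PWD":/workspace legacy-keras python /workspace/train.py (train.py standing in for whatever your script is called).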

Can Tensorflow be installed alongside Theano?

I'm trying to install TensorFlow alongside Theano on an NVIDIA Tesla K80. I'm working with CUDA 7.5 and following the instructions given here.
Theano by itself works well, but as soon as I install TensorFlow, whether from source following the instructions or via pip install, nvidia-smi as well as Theano stops working.
More specifically, nvidia-smi hangs indefinitely, whereas Theano just refuses to run in GPU mode.
I'm also using the latest version of cuDNN, v4.
Does TensorFlow have known issues with causing nvidia-smi to hang and with being incompatible with Theano?
TensorFlow grabs all the available GPUs (and their memory) by default. So if you start it before Theano, Theano won't have any GPU available; if you start Theano first, TensorFlow will segfault when it can't get the GPU Theano is using.
To work around that, make the NVIDIA driver show TensorFlow only the device you want it to see, via this environment variable:
CUDA_VISIBLE_DEVICES=0 python ...
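A related workaround (an addition beyond the original answer) is to stop TensorFlow from preallocating the whole GPU, so Theano can still claim memory on the same device:

import tensorflow as tf

# Allocate GPU memory on demand instead of grabbing it all up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)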
