Can TensorFlow be installed alongside Theano?

I'm trying to install TensorFlow alongside Theano on an NVIDIA Tesla K80. I'm working with CUDA 7.5 and following the instructions given here.
Theano by itself works well, but as soon as I install TensorFlow, whether built from source following the instructions or installed with pip install, both nvidia-smi and Theano stop working.
More specifically, nvidia-smi hangs indefinitely, whereas Theano just refuses to run in GPU mode.
I'm also using the latest version of cuDNN, v4.
Does TensorFlow have any known issues with causing nvidia-smi to hang or with being incompatible with Theano?

TensorFlow grabs all available GPUs by default. So if you start it before Theano, Theano won't have any GPU available. If you start Theano first, TensorFlow will segfault when it can't get the GPU that Theano is using.
To work around that, use an environment variable to make the NVIDIA driver show TensorFlow only the device you want it to see:
CUDA_VISIBLE_DEVICES=0 python ...
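For example, a minimal sketch (assuming one process per framework on a machine with at least two GPUs) that pins a process to its own device; the variable must be set before any CUDA-using library is imported:

import os

# Must be set before TensorFlow (or Theano) initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # this process sees only GPU 0

import tensorflow as tf  # imported only after the variable is set

# The Theano process would do the same with "1", so the two never collide.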

Related

Using Docker TensorFlow 1.13 with Newer GPU

I'm trying to train a neural network (https://github.com/AiviaCommunity/3D-RCAN) written for TensorFlow 1.13.1. I have an RTX A6000 GPU (running on Windows), which is not compatible with older versions of TensorFlow, and attempting to train the network on newer versions has introduced other complications. Would Docker provide a means of training the network on TensorFlow 1.13.1 with my RTX A6000? The project does not include a Dockerfile, but I've been able to train the network on my other computer (also Windows) with an older GPU, and I could try to create a Dockerfile myself if that's a potential solution.

CPU and GPU Stress test for Nvidia Jetson Xavier (Ubuntu 18.04, Jetpack 4.6)

How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, Jetpack 4.6)?
The only code I found is
https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has tough package incompatibility issues and only stresses the CPU.
Can anyone contribute another code sample or fix the issues with the one mentioned? Python code is preferred.
Solution found. For the CPU stress test, the link above works; it needs the numba package to be installed. For the GPU stress test, the CUDA samples that ship with NVIDIA Jetson machines can be used simply and efficiently. The samples are in /usr/local/cuda/samples. Choose one and compile it using sudo make. The compiled test binary will be placed in /usr/local/cuda/samples/bin/aarch64/linux/release (aarch64 may differ on other architectures). Run the test and check the performance using sudo jtop in another terminal.
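Since Python was preferred, here is a minimal combined sketch, assuming the numba package with CUDA support is installed on the Jetson; the array and loop sizes are arbitrary illustration values:

import math
import multiprocessing as mp

import numpy as np
from numba import cuda

@cuda.jit
def gpu_burn(data, iters):
    # Each thread repeatedly does floating-point work on one element.
    i = cuda.grid(1)
    if i < data.size:
        x = data[i]
        for _ in range(iters):
            x = math.sin(x) * math.cos(x) + 1.0001
        data[i] = x

def cpu_burn():
    # Busy-loop floating-point work to load one CPU core.
    x = 0.0001
    while True:
        x = math.sin(x) * math.cos(x) + 1.0001

if __name__ == "__main__":
    # One busy worker per CPU core.
    for _ in range(mp.cpu_count()):
        mp.Process(target=cpu_burn, daemon=True).start()
    # Keep the GPU busy with repeated kernel launches; watch with sudo jtop.
    data = cuda.to_device(np.random.rand(1 << 20).astype(np.float32))
    threads = 256
    blocks = (data.size + threads - 1) // threads
    while True:
        gpu_burn[blocks, threads](data, 10000)
        cuda.synchronize()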

How to install multiple Tensorflow versions?

I'm trying to run the code from this repository: https://github.com/danielgordon10/thor-iqa-cvpr-2018
It has the following requirements:
Python 3.5
CUDA 8 or 9
cuDNN
Tensorflow 1.4 or 1.5
Ubuntu 16.04, 18.04
an installation of darknet
My system satisfies none of these. I don't want to reinstall tf/cuda/cudnn on my machine (especially since I'd have to do that every time I try to run deep-learning code with different TensorFlow requirements).
I'm looking for a way to install the requirements and run the code regardless of the host.
To my knowledge that is exactly what Docker is for.
Looking into this, there are Docker images from NVIDIA, for example one called "nvidia/cuda:9.1-cudnn7-runtime". Based on the name, I assumed that any image built with this as its base comes with CUDA installed. That does not seem to be the whole story: if I try to build darknet, it fails with the error that "cuda_runtime.h" is missing (the -runtime images ship only the runtime libraries; the development headers come with the corresponding -devel images).
So my question basically boils down to: how do I keep multiple different versions of CUDA and TensorFlow on the same machine? Ideally with Docker (or similar), so I won't have to repeat the process too many times.
It feels like I'm missing and/or misunderstanding something obvious, because I can't imagine it can be so hard to run TensorFlow code across different versions without reinstalling things from scratch all the time.
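For illustration, a minimal sketch of that approach using the Docker SDK for Python (pip install docker), assuming the NVIDIA container runtime (nvidia-docker2) is installed on the host and that the tensorflow/tensorflow:1.4.0-gpu image matches the repository's requirements:

import docker

client = docker.from_env()
# Run a throwaway container with the old TensorFlow/CUDA stack; the host
# only needs the NVIDIA driver, not CUDA 8/9 or TensorFlow 1.4 itself.
logs = client.containers.run(
    "tensorflow/tensorflow:1.4.0-gpu",
    ["python", "-c", "import tensorflow as tf; print(tf.__version__)"],
    runtime="nvidia",  # provided by nvidia-docker2
    remove=True,
)
print(logs.decode())

Each project can then pin its own image tag, so different CUDA/TensorFlow combinations never touch the host installation.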

AWS SageMaker MXNet USE_CUDA=1

I am using an AWS ml.p2.xlarge SageMaker instance with the conda_amazonei_mxnet_p36 kernel, after installing the MXNet CUDA build:
!pip install mxnet-cu101
When I try to run the following code
mx_tfidf = mx.nd.sparse.array(tfidf_matrix, ctx=mx.gpu())
I get the following error:
MXNetError: [19:54:53] src/storage/storage.cc:119:
Compile with USE_CUDA=1 to enable GPU usage
Please help me resolve the issue.
Please consider using one of the other Jupyter kernels, conda_mxnet_p27 or conda_mxnet_p36. The kernel you are using, conda_amazonei_mxnet_p36, is primarily designed for local testing of the Amazon Elastic Inference hardware accelerator (exposed as the mx.eia() hardware context in MXNet) and presumably doesn't come with the mx.gpu() context enabled.
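As a quick sanity check on whichever kernel you pick, something like this sketch (assuming the mxnet-cu101 build is importable) confirms whether the GPU context actually works:

import mxnet as mx

print(mx.context.num_gpus())           # should print >= 1 on a p2.xlarge
a = mx.nd.ones((2, 2), ctx=mx.gpu(0))  # raises MXNetError if the build lacks CUDA
print(a)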

ML training process is not on GPU

I just moved from AWS to Google Cloud Platform because of its lower GPU prices. I followed the instructions on the website to create a Compute Engine instance with a K80 GPU, and installed the latest versions of TensorFlow, Keras, the CUDA driver, and cuDNN; everything went very well. However, when I try to train my model, the training still runs on the CPU.
NVIDIA-SMI 387.26 Driver Version: 387.26
Cuda compilation tools, release 9.1, V9.1.85
TensorFlow version: 1.4.1
cuDNN: cudnn-9.1-linux-x64-v7
Ubuntu 16.04
Perhaps you installed the CPU version of TensorFlow?
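One way to check, as a sketch using the TF 1.x API to match the 1.4.1 install above:

from tensorflow.python.client import device_lib

# Lists every device TensorFlow can see; with a working GPU build there
# should be an entry with device_type "GPU" alongside the CPU.
print(device_lib.list_local_devices())

If no GPU entry appears, the CPU-only package is installed; pip install tensorflow-gpu==1.4.1 is the matching GPU build.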
Also, Google Cloud's Compute Engine now has a VM OS image with all the needed software preinstalled, which is an easier/faster way to get started: https://cloud.google.com/deep-learning-vm/
