CPU and GPU Stress test for Nvidia Jetson Xavier (Ubuntu 18.04, Jetpack 4.6)

How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, Jetpack 4.6)?
The only code I have found is
https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has tough package incompatibility issues and only works for the CPU.
Can anyone provide other code, or solve the issue with the one mentioned? Python code is preferred.

Solution found. For the CPU stress test, the link above works; it needs the numba package to be installed. For the GPU stress test, the CUDA samples shipped with Nvidia Jetson machines can be used simply and efficiently. The samples are in /usr/local/cuda/samples. Choose one and compile it with sudo make. The compiled binary will be in /usr/local/cuda/samples/bin/aarch64/linux/release (the aarch64 component differs on other architectures). Run it and monitor performance with sudo jtop in another terminal.
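Since Python code was asked for, here is a minimal sketch of a CPU-only load generator that uses just the standard library, as a possible alternative when the numba dependency is a problem. It is not the code from the linked repository; the function names burn and stress_cpu are made up for illustration. The GPU side would still come from a compiled CUDA sample running in parallel.

```python
import multiprocessing as mp
import time


def burn(seconds):
    """Spin on floating-point work for roughly `seconds`; return the outer-loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 1.0001
    while time.monotonic() < deadline:
        # Inner loop keeps the core busy between clock checks.
        for _ in range(10_000):
            x = (x * x) % 1000.0 + 1.0001
        iterations += 1
    return iterations


def stress_cpu(seconds, workers=None):
    """Run `burn` in `workers` processes at once (default: one per CPU core)."""
    workers = workers or mp.cpu_count()
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)
```

To use it, call stress_cpu(60) from under an if __name__ == "__main__": guard in a script, start a CUDA sample from the release folder in a second terminal, and watch both loads in jtop.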

Related

How to install multiple Tensorflow versions?

I'm trying to run the code from this repository: https://github.com/danielgordon10/thor-iqa-cvpr-2018
It has the following requirements
Python 3.5
CUDA 8 or 9
cuDNN
Tensorflow 1.4 or 1.5
Ubuntu 16.04, 18.04
an installation of darknet
My system satisfies none of these. I don't want to reinstall tf/cuda/cudnn on my machine (especially if I have to do that every time I try to run deep learning code with different tensorflow requirements).
I'm looking for a way to install the requirements and run the code regardless of the host.
To my knowledge that is exactly what Docker is for.
Looking into this, there are Docker images from nvidia, for example one called "nvidia/cuda:9.1-cudnn7-runtime". Based on the name I assumed that any image built with this as the base comes with cuda installed. This does not seem to be the case: if I try to install darknet, it fails with the error that "cuda_runtime.h" is missing.
So what my question basically boils down to is: how do I keep multiple different versions of cuda and tensorflow on the same machine? Ideally with docker (or similar), so I won't have to repeat the process too many times.
It feels like I'm missing and/or don't understand something obvious, because I can't imagine that it can be so hard to run tensorflow code with different versions without reinstalling things from scratch all the time.

How do I pull Docker images with specific library versions installed in them?

I have an outdated neural network training python2.7 script which uses keras 2.1 on top of tensorflow 1.4. I want to train it on my nVidia GPU, and I have CUDA SDK 10.2 installed on Linux. I thought Docker Hub was exactly for publishing frozen software packages which just work, but it seems there is no way to find a container with a specific software set.
I know docker >=19.03 has native gpu support, and that the nvidia-docker utility has a cuda-agnostic layer; but the problem is that I cannot install both keras-gpu and tensorflow-gpu in the required versions, I cannot find wheels, and this legacy script does not work with other versions.
Where did you get the idea that Docker Hub hosts images with all possible library combinations?
If you cannot find an image that suits you, simply build your own.

GPU Memory in TensorFlow container with NVIDIA SLI

I have constructed a machine-learning computer with two RTX 2070 SUPER NVIDIA GPUs connected with SLI Bridge, Windows OS (SLI verified in NVIDIA Control Panel).
I have benchmarked the system using http://ai-benchmark.com/alpha and got impressive results.
In order to take the best advantage of libraries that use the GPU for scientific tasks (cuDF) I have created a TensorFlow Linux container:
https://www.tensorflow.org/install/docker
using “latest-gpu-py3-jupyter” tag.
I have then connected PyCharm to this container and configured its interpreter as an interpreter of the same project (I mounted the host project folder in the container).
When I run the same benchmark on the container, I get the error:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[50,56,56,144] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
[[node MobilenetV2/expanded_conv_2/depthwise/BatchNorm/FusedBatchNorm (defined at usr/local/lib/python3.6/dist-packages/ai_benchmark/utils.py:238) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
This error relates to the exhaustion of GPU memory inside the container.
Why does the GPU on the Windows host handle the computation successfully while the GPU in the Linux container exhausts its memory?
What makes this difference? Is it related to memory allocation in the container?
Here is an awesome link from docker.com that explains why your desired workflow won't work. It wouldn't work with RAPIDS cudf either. Docker Desktop runs on Hyper-V, which isolates the hardware and doesn't expose the GPU the way the Linux drivers expect. Also, nvidia-docker is Linux only.
I can let you know that RAPIDS (cudf) currently doesn't support this implementation either. Windows, however, does work better with a Linux host. For both tensorflow and cudf, I strongly recommend that you use (or dual boot) one of the recommended OSes as your host OS, found here: https://rapids.ai/start.html#prerequisites. If you need Windows in your workflow, you can run it on top of your Linux host.
There is a chance that a future WSL version will allow you to run RAPIDS on Windows, letting you craft a Windows-based solution on your own.
Hope this helps!
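One detail in the traceback quoted in the question is worth a closer look: the message names device:CPU:0 and "allocator cpu", which suggests the failed allocation was in host memory rather than on a GPU. That would be consistent with the container not seeing the GPU at all. The sketch below only parses the quoted message; the helper names oom_device and tensor_bytes are made up, and the 4 bytes per element figure assumes "type float" means 32-bit floats.

```python
import re

# Error text copied from the question above.
OOM_MESSAGE = (
    "OOM when allocating tensor with shape[50,56,56,144] and type float "
    "on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu"
)


def oom_device(message):
    """Return (device_type, device_index, allocator) named in an OOM message, or None."""
    match = re.search(r"device:(\w+):(\d+) by allocator (\w+)", message)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), match.group(3)


def tensor_bytes(message, bytes_per_element=4):
    """Return the size of the tensor named in the message (assumes float32 by default)."""
    match = re.search(r"shape\[([\d,]+)\]", message)
    if match is None:
        return None
    size = bytes_per_element
    for dim in match.group(1).split(","):
        size *= int(dim)
    return size


print(oom_device(OOM_MESSAGE))    # ('CPU', 0, 'cpu')
print(tensor_bytes(OOM_MESSAGE))  # 90316800 bytes, roughly 86 MiB
```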

Compile Tensorflow from source with Docker to get CPU speed up

I am looking for a way to set up or modify an existing Docker image for tensorflow so that the SSE4, AVX, AVX2, and FMA instructions can be used for CPU speed up. So far I have found how to install from source using bazel: How to Compile Tensorflow... and CPU instructions not compiled.... Neither of these explains how to do this within Docker. So I think what I am looking for is what needs to be added to an existing Docker image that installs without these options, so that I get a compiled version of tensorflow with the CPU options enabled. The existing Docker images do not do this because they are meant to run on as many machines as possible. I am using Ubuntu 14.04 on a Linux PC. I am new to docker, but I have installed tensorflow and have it working without the CPU warnings I get when I use the docker images. I may not need this for speed, but I have seen posts that claim the speed up can be significant. I searched for existing docker images that do this and could not find anything. I need this to work with the GPU, so it needs to be compatible with nvidia-docker.
I just found this docker support for bazel and it might provide an answer; however, I do not understand it well enough to know for sure. I believe it is saying that you cannot build tensorflow with bazel inside a Dockerfile, and that you have to build a Dockerfile using bazel. Is my understanding correct, and is this the only way to get a docker image with tensorflow compiled from source? If so, I could still use help in how to do it and still get the other dependencies that I would get if using an existing docker image for tensorflow.
Dockerfiles that build with CPU support can be found here.
Hope that helps! Spent many a late night here on Stack Overflow and Github Issues and stuff. Now it's my turn to give back! :)
The GPU stuff in particular is really hairy - especially when enabling the XLA/JIT/AOT stuff as well as the Graph Transform Tools.
Lots of hacks embedded in my Dockerfiles. Feel free to review and ask me questions!
The contributing guidelines mention building TensorFlow from source with Docker to run the unit tests:
Refer to the CPU-only developer Dockerfile and GPU developer Dockerfile for the required packages. Alternatively, use the said Docker images, e.g., tensorflow/tensorflow:nightly-devel and tensorflow/tensorflow:nightly-devel-gpu for development to avoid installing the packages directly on your system.

Can Tensorflow be installed alongside Theano?

I'm trying to install tensorflow alongside Theano on a Nvidia Tesla K80. I'm working with Cuda 7.5 and following the instructions given here
Theano by itself works well, but as soon as I install tensorflow, either from source following the instructions or using pip install, both nvidia-smi and Theano stop working.
More specifically, nvidia-smi hangs indefinitely whereas Theano just refuses to run in GPU mode.
I'm also using the latest version of cudnn v4.
Does Tensorflow have known issues with causing nvidia-smi to hang and being incompatible with Theano?
TensorFlow picks all the available GPUs. So if you start it before Theano, Theano won't have any GPUs available by default. If you start Theano first, TensorFlow will segfault when it can't get the GPU Theano uses.
To work around that, make the NVIDIA driver show TensorFlow only the device that you want it to see, using this environment variable:
CUDA_VISIBLE_DEVICES=0 python ...
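The same restriction can be applied from a Python launcher instead of the shell. This is a minimal sketch using only the standard library; the helper names env_with_visible_gpus and run_on_gpus are made up for illustration, and any script paths passed to them are placeholders.

```python
import os
import subprocess
import sys


def env_with_visible_gpus(gpu_ids, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set to `gpu_ids`."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def run_on_gpus(script, gpu_ids):
    """Run `script` with the current interpreter, exposing only the listed GPUs."""
    return subprocess.run(
        [sys.executable, script],
        env=env_with_visible_gpus(gpu_ids),
        check=True,
    )
```

Since a Tesla K80 board exposes two GPU devices, one option would be to start the TensorFlow script with run_on_gpus(script, [0]) and the Theano script with run_on_gpus(script, [1]), so each framework only sees its own device. The variable has to be set before the framework initializes CUDA, which is why it is passed to a fresh process here.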
