I am using an AWS SageMaker ml.p2.xlarge instance with the conda_amazonei_mxnet_p36 kernel, after installing the CUDA build of MXNet:
!pip install mxnet-cu101
When I try to run the following code:
mx_tfidf = mx.nd.sparse.array(tfidf_matrix, ctx=mx.gpu())
I get the following error:
MXNetError: [19:54:53] src/storage/storage.cc:119:
Compile with USE_CUDA=1 to enable GPU usage
Please help me resolve this issue.
Please consider using one of the other Jupyter kernels, conda_mxnet_p27 or conda_mxnet_p36. The kernel you are using, conda_amazonei_mxnet_p36, is primarily designed for local testing against the Amazon Elastic Inference hardware accelerator (exposed as the mx.eia() context in MXNet), and its MXNet build presumably does not come with the mx.gpu() context enabled.
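After switching to one of those kernels (and installing mxnet-cu101 there as well), a quick check along these lines should confirm that a CUDA-enabled MXNet build is active. This is a minimal sketch; the random SciPy matrix only stands in for your own tfidf_matrix:

    import mxnet as mx
    import scipy.sparse as sp

    # Should report at least 1 on a p2.xlarge once a CUDA-enabled MXNet is active
    print(mx.context.num_gpus())

    # Placeholder sparse input standing in for your tfidf_matrix
    tfidf_matrix = sp.random(100, 50, density=0.1, format='csr', dtype='float32')

    # This should now succeed instead of raising "Compile with USE_CUDA=1"
    mx_tfidf = mx.nd.sparse.array(tfidf_matrix, ctx=mx.gpu())
    print(mx_tfidf.context)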
How can I run a simultaneous CPU and GPU stress test on a Jetson Xavier machine (Ubuntu 18.04, JetPack 4.6)?
The only code I have found is https://github.com/JTHibbard/Xavier_AGX_Stress_Test, which has tough package incompatibility issues and only exercises the CPU.
Can anyone contribute other code, or resolve the issues with the one mentioned? Python code is preferred.
Solution found. For the CPU stress test, the link above works; it needs the numba package to be installed. For the GPU stress test, the CUDA samples that ship with NVIDIA Jetson machines can be used simply and efficiently. The samples are in /usr/local/cuda/samples. Choose one and compile it with sudo make. The compiled test binary will be placed in /usr/local/cuda/samples/bin/aarch64/linux/release (aarch64 may differ on other architectures). Run the test and monitor performance with sudo jtop in another terminal. A Python-only alternative is sketched below.
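If you prefer to stay in Python, a combined CPU and GPU load can also be generated with numba alone. This is a rough sketch, assuming numba is installed with CUDA support on the Jetson; the array size and iteration counts are arbitrary and only meant to keep both sides busy while you watch jtop:

    import math
    import numpy as np
    from numba import cuda, njit, prange

    @njit(parallel=True)
    def cpu_burn(n):
        # Spread trigonometric work across all CPU cores
        s = 0.0
        for i in prange(n):
            s += math.sin(0.001 * i) * math.cos(0.001 * i)
        return s

    @cuda.jit
    def gpu_burn(a, iters):
        # Each GPU thread repeatedly transforms one array element
        i = cuda.grid(1)
        if i < a.size:
            x = a[i]
            for _ in range(iters):
                x = math.sin(x) * math.cos(x) + 1.0
            a[i] = x

    data = cuda.to_device(np.random.rand(1_000_000).astype(np.float32))
    threads = 256
    blocks = (data.size + threads - 1) // threads

    for _ in range(100):
        gpu_burn[blocks, threads](data, 1000)  # kernel launches are asynchronous
        cpu_burn(10_000_000)                   # CPU works while the GPU queue drains
    cuda.synchronize()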
I have an outdated Python 2.7 neural-network training script that uses Keras 2.1 on top of TensorFlow 1.4, and I want to train it on my NVIDIA GPU; I have CUDA SDK 10.2 installed on Linux. I thought Docker Hub was exactly for publishing frozen software stacks that just work, but it seems there is no way to find a container with a specific software set.
I know Docker >= 19.03 has native GPU support, and that the nvidia-docker utility provides a CUDA-agnostic layer; but the problem is that I cannot install keras-gpu and tensorflow-gpu in the required versions, I cannot find wheels, and this legacy script does not work with other versions.
Where did you get the idea that Docker Hub hosts images with all possible library combinations?
If you cannot find an image that suits you, simply build your own.
I would like to run a model exported from Google AutoML Vision on an NVIDIA Jetson Nano. Since it is the easiest route, I wanted to use the pre-built containers to make predictions, following the official Edge containers tutorial.
The problem is that the pre-built CPU container stored in Google Container Registry (gcr.io/automl-vision-ondevice/gcloud-container-1.12.0:latest) is built for the amd64 architecture, while the NVIDIA Jetson Nano uses arm64 (Ubuntu 18.04). That is why 'docker run ...' returns:
docker image error: standard_init_linux.go:211: exec user process caused "exec format error"
What can I do? Should I build a container similar to the pre-built one but compatible with the arm64 architecture?
There are two ideas that could help you achieve your goal:
[Idea 1] You could export the model in *.tflite format and run detection on the device (a minimal sketch follows this list).
[Idea 2] Deploy the model as an API service on Google AutoML Vision and call it from Python or any other supported language.
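For Idea 1, the exported model.tflite can be run directly on the Nano with the TensorFlow Lite interpreter, for example along these lines. This is a minimal sketch: the model path and the dummy input are placeholders, and any pre- or post-processing your model needs is omitted:

    import numpy as np
    import tflite_runtime.interpreter as tflite  # or: from tensorflow import lite as tflite

    interpreter = tflite.Interpreter(model_path='model.tflite')
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    # Dummy image matching the model's expected input shape and dtype
    image = np.zeros(input_details[0]['shape'], dtype=input_details[0]['dtype'])

    interpreter.set_tensor(input_details[0]['index'], image)
    interpreter.invoke()
    predictions = interpreter.get_tensor(output_details[0]['index'])
    print(predictions)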
I just moved from AWS to Google Cloud Platform because of its lower GPU prices. I followed the instructions on the website to create a Compute Engine instance with a K80 GPU, and installed the latest TensorFlow, Keras, CUDA driver, and cuDNN; everything went very well. However, when I try to train my model, the training still runs on the CPU.
NVIDIA driver: 387.26 (from nvidia-smi)
CUDA compilation tools: release 9.1, V9.1.85
TensorFlow: 1.4.1
cuDNN: cudnn-9.1-linux-x64-v7
OS: Ubuntu 16.04
Perhaps you installed the CPU-only version of TensorFlow?
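One way to check from inside Python which devices TensorFlow can actually see (a quick sketch; if no GPU device is listed, reinstalling with pip install tensorflow-gpu==1.4.1 is the usual fix):

    import tensorflow as tf
    from tensorflow.python.client import device_lib

    print(tf.__version__)                    # installed TensorFlow version, e.g. 1.4.1
    print(device_lib.list_local_devices())   # a working GPU setup lists a GPU device alongside the CPU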
Also, Google Cloud's Compute Engine now offers a VM image with all the needed software pre-installed, for an easier and faster way to get started: https://cloud.google.com/deep-learning-vm/
I'm trying to install TensorFlow alongside Theano on an NVIDIA Tesla K80. I'm working with CUDA 7.5 and following the instructions given here.
Theano by itself works well, but as soon as I install TensorFlow, either from source following the instructions or via pip install, both nvidia-smi and Theano stop working.
More specifically, nvidia-smi hangs indefinitely whereas Theano just refuses to run in GPU mode.
I'm also using the latest version of cuDNN (v4).
Does TensorFlow have known issues with causing nvidia-smi to hang and with being incompatible with Theano?
TensorFlow grabs all the available GPUs by default. So if you start it before Theano, Theano won't have any GPU available; if you start Theano first, TensorFlow will segfault when it can't get the GPU Theano is using.
To work around that, make the NVIDIA driver show each process only the device you want it to see, using this environment variable:
CUDA_VISIBLE_DEVICES=0 python ...
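The same restriction can be set from inside Python, as long as it happens before the framework is imported (a small sketch; the device indices are just examples):

    import os

    # Must be set before TensorFlow (or Theano) is imported in this process
    os.environ['CUDA_VISIBLE_DEVICES'] = '0'   # this process only sees GPU 0
    # in the Theano process you would use e.g. '1' instead

    import tensorflow as tf

    # Optionally also keep TensorFlow from pre-allocating all memory on that device
    config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=config)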