TensorFlow with GPU on Google Cloud - machine-learning

I have a model on Google Machine Learning using TensorFlow, and it works fine.
Now I want to run some predictions using the GPU.
I saw this link, but it only talks about training with a GPU, not prediction. There is nothing about GPUs in the prediction section.
Does anyone know if it is possible to run prediction on Google Machine Learning Engine with a GPU? Or, if I train with a GPU, will my predictions automatically run on a GPU?
I'm using the following command line:
gcloud ml-engine predict --model ${MODEL_NAME} --json-instances request.json
This command works, but it uses the CPU.
Additional information: my model is deployed in the us-east1 region, and scaling is set to automatic.

You cannot choose to use a GPU for prediction in ML Engine. It is unclear whether they use GPUs by default -- I would link to documentation, but there is none available.
I am sure, however, that they are not using TPUs. Currently, Google only uses TPUs for internal services, although they have created a TPU cloud exclusively for researchers to experiment with: https://yourstory.com/2017/05/google-cloud-tpus-machine-learning-models-training/
If you want more control over how your prediction is run, for the same price you can configure a Google Compute Engine instance with a high-powered Tesla K80 GPU. Your TensorFlow model will work there too, and it is straightforward to set up.
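For reference, spinning up such an instance could look roughly like the sketch below; the instance name, zone, machine type, and Deep Learning VM image family are illustrative assumptions rather than a required setup:
# Sketch: create a GCE instance with one Tesla K80 attached (name, zone, and image are placeholders).
gcloud compute instances create tf-predict-gpu \
  --zone us-east1-c \
  --machine-type n1-standard-4 \
  --accelerator type=nvidia-tesla-k80,count=1 \
  --image-family tf-latest-gpu \
  --image-project deeplearning-platform-release \
  --maintenance-policy TERMINATE \
  --metadata install-nvidia-driver=True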
My suggestion would be to run benchmark predictions on your GCE instance and then compare them to ML Engine. If ML Engine is faster than GCE, then Google is probably using GPUs for prediction. Their goal is surely to offer GPUs and TPUs within ML Engine in the future, but demand has been overloading the HPC cloud lately.

Online predictions on GCP ML Engine use single-core CPUs by default, which have high latency. If it suits your requirements, you can use the quad-core CPU machine type, which serves predictions faster. To use it, you must specify the machine type for predictions when creating a version of your model on ML Engine. Link to the documentation: https://cloud.google.com/ml-engine/docs/tensorflow/online-predict.
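For example, selecting the quad-core machine type when creating a version could look something like the sketch below; the model name, version name, bucket path, and runtime version are placeholders, and mls1-c4-m2 is assumed to be the quad-core online-prediction machine type (depending on your SDK version, the flag may only be available in the beta command group):
# Sketch: deploy a model version on a quad-core online-prediction machine (names are placeholders).
gcloud beta ai-platform versions create v1_quadcore \
  --model my_model \
  --origin gs://my-bucket/model-dir \
  --runtime-version 1.15 \
  --framework tensorflow \
  --python-version 3.7 \
  --machine-type mls1-c4-m2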

We support GPUs now. Documentation here!
Example:
gcloud beta ai-platform versions create version_name \
--model model_name \
--origin gs://model-directory-uri \
--runtime-version 2.1 \
--python-version 3.7 \
--framework tensorflow \
--machine-type n1-standard-4 \
--accelerator count=1,type=nvidia-tesla-t4 \
--config config.yaml
If you use one of the Compute Engine (N1) machine types for your model version, you can optionally add GPUs to accelerate each prediction node. The supported GPU types are:
NVIDIA Tesla K80
NVIDIA Tesla P4
NVIDIA Tesla P100
NVIDIA Tesla T4
NVIDIA Tesla V100
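Once such a version is deployed, requesting online predictions works the same way as before; for instance, reusing the model and version names from the example above:
# Send an online prediction request to the GPU-backed version.
gcloud ai-platform predict \
  --model model_name \
  --version version_name \
  --json-instances request.json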

This site has some information:
https://cloud.google.com/ml-engine/docs/how-tos/getting-started-training-prediction
However, it is a completely different way to train and predict. They provide the means for training and predicting on their service infrastructure. You just build the model with your TensorFlow program and then use their hardware through their cloud SDK, so it shouldn't matter to you whether it runs on a CPU or a GPU.

Related

How to build a custom Docker image for Amazon Elastic Inference for SageMaker inference in PyTorch

I am trying to build a custom Docker image for SageMaker Elastic Inference.
I have read these documentation pages:
https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html#ei-intro-endpoint
https://docs.aws.amazon.com/sagemaker/latest/dg/ei-endpoints.html#ei-endpoints-boto3
https://github.com/aws/sagemaker-pytorch-inference-toolkit#building-your-image
I do not see an example showing how to build a customized Docker container.
If you know how to build a customized Docker container for SageMaker Elastic Inference, could you help me and show me how to do it?
The Docker container will be used in a CloudFormation template to build the SageMaker endpoint:
Type: AWS::SageMaker::Model
Properties:
  Containers:
    - ContainerDefinition
  EnableNetworkIsolation: Boolean
  ExecutionRoleArn: String
  InferenceExecutionConfig:
    InferenceExecutionConfig
  ModelName: String
  PrimaryContainer:
    ContainerDefinition
  Tags:
    - Tag
  VpcConfig:
    VpcConfig
Thanks,
There are better alternatives that may be cheaper and more performant than Amazon Elastic Inference, such as the GPU-accelerated ml.g4dn.xlarge and AWS Inferentia instances.
Can you evaluate these options for your workload and consider using them instead of Amazon Elastic Inference?
You can use the Amazon SageMaker Inference Recommender to benchmark and compare the performance of these options for your workload. Here is a sample price-performance comparison of various hardware acceleration options on Amazon SageMaker, based on US-East-1 pricing:
Instance name   | On-Demand hourly rate | vCPU | Memory | Storage         | Network performance
ml.g4dn.xlarge  | $0.736                | 4    | 16 GiB | 125 GB NVMe SSD | Up to 25 Gigabit
ml.inf1.xlarge  | $0.297                | 4    | 8 GiB  | EBS only        | Up to 25 Gigabit
ml.eia2.xlarge* | $0.476                | n/a  | n/a    | n/a             | n/a
In terms of a custom Docker image for EI:
Is there a reason you need to bring your own container (BYOC)? Can you not make use of one of SageMaker's prebuilt containers?
You need to make sure your DL framework is EI-compatible. For example, if you read https://docs.aws.amazon.com/elastic-inference/latest/developerguide/ei-tensorflow.html, you will find that pip install -U ei_for_tf*.whl is required. That means the TensorFlow build is different from the generic one.
If you use the AWS Deep Learning AMI or the AWS Deep Learning Containers images, the package is pre-installed.
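If you do go the BYOC route, one possible approach is to extend an EI-enabled AWS base image and push the result to Amazon ECR so CloudFormation can reference it. The sketch below is only an outline: the base image URI, account ID, region, repository name, handler file, and its destination path are all placeholders you would replace with values from the AWS documentation for your framework and region.
# Sketch: build a custom image on top of an EI-enabled AWS base image and push it to ECR.
# BASE_IMAGE, ACCOUNT, REGION, and REPO are placeholders.
BASE_IMAGE="<ei-enabled-framework-image-uri-from-aws-docs>"
ACCOUNT=123456789012
REGION=us-east-1
REPO=my-ei-inference

cat > Dockerfile <<EOF
FROM ${BASE_IMAGE}
# If the base image does not already ship the EI-enabled framework build,
# install it here (see the ei_for_tf wheel mentioned above for TensorFlow).
# Copy your inference handler; the destination path depends on your serving stack.
COPY inference.py /opt/ml/code/inference.py
EOF

aws ecr get-login-password --region ${REGION} | \
  docker login --username AWS --password-stdin ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com
docker build -t ${REPO}:latest .
docker tag ${REPO}:latest ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest
docker push ${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${REPO}:latest
The resulting image URI can then go into the ContainerDefinition of the AWS::SageMaker::Model resource shown in the question.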

How does AI Platform (ML Engine) allocate resources to jobs?

I'm trying out a few experiments using Google's AI Platform and have a few questions regarding that.
Basically, my project is structured as per the docs, with a trainer task and a separate batch prediction task. I want to understand how AI Platform allocates resources to the tasks I execute. Comparing it with the current SOTA solutions like Spark, TensorFlow, and PyTorch is where my doubts arise.
These engines/libraries have distributed workers with dedicated coordination systems and separate distributed implementations of all the machine learning algorithms. Since my tasks are written using scikit-learn, how do these computations parallelize across the cluster that is provisioned by AI Platform, given that scikit-learn doesn't have any such distributed computing capabilities?
I am following the docs here. The command I'm using is:
gcloud ai-platform jobs submit training $JOB_NAME \
--job-dir $JOB_DIR \
--package-path $TRAINING_PACKAGE_PATH \
--module-name $MAIN_TRAINER_MODULE \
--region $REGION \
--runtime-version=$RUNTIME_VERSION \
--python-version=$PYTHON_VERSION \
--scale-tier $SCALE_TIER
Any help/ clarifications would be appreciated!
Alas, AI Platform Training can't automatically distribute your scikit-learn tasks. It basically just sets up the cluster, deploys your package to each node, and runs it.
You might want to try a distributed backend such as Dask for scaling out the task -- it has a drop-in replacement for Joblib that can run scikit-learn pipelines on a cluster.
I found one tutorial here: https://matthewrocklin.com/blog/work/2017/02/07/dask-sklearn-simple
Hope that helps!

TensorFlow Docker Images

When using the general TensorFlow docker images, they won't be optimized for the exact target architecture.
a) Are there studies for the performance penalty for using these general docker images vs. compiling for the specific architecture?
b) When using an orchestration system such as KubeFlow/Mesos across a heterogeneous cluster, what are the best practices for mapping nodes to the optimized TensorFlow compilation (e.g., installing it on each node, having multiple Docker images, ...)?
Thanks for your feedback!
For performance, you can have a look at Brendan Gregg's container performance analysis.
The overhead is quite low, because dockerizing is closer to a chroot than to virtualization: containers share the host kernel.
The best practice for a heterogeneous cluster is to have a set of images.
For each image, you can run containers with different configurations by passing environment variables.
If the configuration is going to be the same, you can use the autoscaling features of Kubernetes, for example.
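As a rough sketch of the "set of images" idea, you could build one image per CPU feature set and tag it accordingly; the registry name, tags, and the BAZEL_COPTS build argument below are an assumed convention (your Dockerfile would need to forward it to the TensorFlow build), not an official one:
# Sketch: build architecture-specific TensorFlow images, one tag per CPU feature set.
docker build -t my-registry/tensorflow:avx-fma \
  --build-arg BAZEL_COPTS="--copt=-mavx --copt=-mfma" .
docker build -t my-registry/tensorflow:sse42 \
  --build-arg BAZEL_COPTS="--copt=-msse4.2" .
# At deploy time, pin each variant to matching nodes, e.g. with a Kubernetes
# nodeSelector keyed on a node label such as cpu-features=avx.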

Google cloud platform setup ERROR: (gcloud.beta.ml) Invalid choice: 'init-project'

I am using Cloud Shell in Google Cloud Platform. I am trying to get things set up for machine learning. The commands that I have used so far are:
curl https://storage.googleapis.com/cloud-ml/scripts/setup_cloud_shell.sh | bash
export PATH=${HOME}/.local/bin:${PATH}
curl https://storage.googleapis.com/cloud-ml/scripts/check_environment.py | python
gcloud beta ml init-project
The first three lines work fine, but for the last command I get
ERROR: (gcloud.beta.ml) Invalid choice: 'init-project'.
Usage: gcloud beta ml [optional flags] <group>
group may be language | speech | video | vision
For detailed information on this command and its flags, run:
gcloud beta ml --help
this error. Does anyone know what I can do to solve this problem?
Thank you.
First off, let me note that you don't need to run the BETA command as the gcloud ml variant is also available.
As the error message indicates, 'init-project' is not a valid choice, you should instead use one of the following groups: language, speech, video, vision, each of which allows you to make calls to the corresponding API. For instance, you could run the following:
$ gcloud ml vision detect-faces IMAGE_PATH
and detect faces within the indicated image.
That said, from your comments it appears that you are not interested in any of the above. If you are looking to train your own TensorFlow models on Google Cloud Platform, you should take a look at the docs relating to Cloud ML Engine. The page that dsesto pointed you to is a good start. I would advise that you also try out the examples in this GitHub repository, particularly the census one. Once there, you'll also see that the gcloud command group used for training models on the cloud (as well as deploying them and using them for prediction jobs) is actually gcloud ml-engine, not gcloud ml.
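For example, submitting a training job with that command group looks roughly like the following; the job name, package path, module name, bucket, region, and runtime version are placeholders in the style of the census sample:
# Sketch: submit a training job via the ml-engine command group (all names are placeholders).
gcloud ml-engine jobs submit training my_first_job \
  --package-path trainer/ \
  --module-name trainer.task \
  --staging-bucket gs://my-bucket \
  --region us-central1 \
  --runtime-version 1.4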

Google cloud ML slow cpu operations

I am training a TensorFlow model with, unfortunately, many CPU operations. On my local machine, I compiled TensorFlow from source with support for SSE4.2/AVX/FMA to make training run faster. When I train on gcloud via their ML Engine service, I see a 10x slowdown compared to local. I suspect that TensorFlow on gcloud ML Engine wasn't compiled with CPU optimizations, and I was wondering what the ways around this are.
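For reference, the local build described above typically looks something like this; the exact flags and the pip package path are an assumed example that varies by TensorFlow release, and this is not necessarily how the ML Engine binaries are built:
# Sketch: compile TensorFlow from source with SSE4.2/AVX/FMA enabled (run ./configure first).
bazel build -c opt \
  --copt=-msse4.2 --copt=-mavx --copt=-mfma \
  //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl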
