How to get a specific machine type for ML Engine online prediction? - google-cloud-ml-engine

Is there an option to request a faster node for online prediction in ML Engine?
For example, when training I can configure any of these machines for my job:
standard,
large_model,
complex_model_s,
complex_model_m,
complex_model_l,
standard_gpu,
complex_model_m_gpu,
complex_model_l_gpu,
standard_p100,
complex_model_m_p100
See the description of the available clusters and machines for training here and here.
I am struggling to find out whether it is possible to control what kind of machine serves my online predictions.

We are currently adding that capability and will let you know when it's publicly available.

ML Engine offers a 4-core instance type in addition to the default serving instance type for online prediction. However, the feature is still in alpha and is only available to a select list of accounts that opted in as "Trusted Testers". Please contact cloudml-feedback@google.com if you need help setting up the prediction service with a faster node.
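As a rough illustration only: once the capability is public, selecting a serving machine will presumably happen when a model version is created. Here is a minimal sketch using the ML Engine REST API through the Python client; the machineType field and the mls1-c4-m2 value are assumptions based on the alpha 4-core tier described above, as are the project, model, and bucket names.

from googleapiclient import discovery

ml = discovery.build("ml", "v1")

body = {
    "name": "v_fast",
    "deploymentUri": "gs://my-bucket/model/",  # assumed export path
    "runtimeVersion": "1.8",
    "machineType": "mls1-c4-m2",  # assumed 4-core serving tier (alpha)
}

request = ml.projects().models().versions().create(
    parent="projects/my-project/models/my_model",
    body=body,
)
print(request.execute())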

Related

Does AutoML accept external models?

I used random search and got the best hyperparameters for my model; can I pass that model to AutoML?
Does AutoML do the random search for the best hyperparameters by itself, or is there something I need to pass?
I presume you're referring to Google Cloud AutoML. It is a cloud-based machine learning (ML) platform that offers a no-code approach to building data-driven solutions. AutoML was designed to build custom models for both newcomers and experienced machine learning engineers.
For newcomers, you could use Vertex AI (fully automated) to build an ML model.
For experienced ML engineers, you could also use AutoML Tabular to build a custom model, with the ability to select a model and input the selected hyperparameters.
You can read more details here.
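To make the workflow concrete, here is a minimal sketch (project, bucket, dataset, and column names are assumptions) of launching an AutoML Tabular training job with the Vertex AI Python SDK. Note that AutoML performs the model and hyperparameter search itself, so you do not pass in a pre-tuned model.

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Register the training data as a managed tabular dataset.
dataset = aiplatform.TabularDataset.create(
    display_name="my-dataset",
    gcs_source=["gs://my-bucket/data.csv"],
)

# AutoML searches architectures and hyperparameters internally.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name="automl-train",
    optimization_prediction_type="classification",
)

model = job.run(
    dataset=dataset,
    target_column="label",         # assumed label column
    budget_milli_node_hours=1000,  # 1 node hour
)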

Using scikit-learn on Databricks

Scikit-learn algorithms are single-node implementations. Does this mean that they are not an appropriate choice for building machine learning models on a Databricks cluster, since they cannot take advantage of the cluster's computing resources?
They are not appropriate, in the sense that, as you say, they cannot take advantage of the cluster computing resources, which Databricks is arguably all about. The raison d'être of Databricks is Apache Spark, and specifically for ML tasks, its ML library Spark MLlib.
This does not mean that you cannot use scikit-learn in Databricks (you'll find that a Databricks cluster comes with scikit-learn installed by default), only that it is usable for problems that do not actually require a cluster. If you want to exploit the cluster's resource capabilities for ML, you need to turn to Spark MLlib.
I think desertnaut hit the nail on the head here. Scikit-learn algorithms are designed for single-node, non-distributed processing, while all the MLlib stuff is designed to leverage cluster compute and parallel processing resources. Take a look at the link below for sample code for standard regression and classification tasks.
https://spark.apache.org/docs/latest/ml-classification-regression.html
In addition, here are some code samples for different clustering tasks.
https://spark.apache.org/docs/latest/ml-clustering.html
That should probably cover most of the things you will be doing.
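For a flavor of what those docs contain, here is a minimal PySpark MLlib classification sketch (toy data invented for illustration; assumes a running Spark cluster such as Databricks):

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy training data: (label, features)
train = spark.createDataFrame(
    [(0.0, Vectors.dense(0.0, 1.1)),
     (1.0, Vectors.dense(2.0, 1.0)),
     (1.0, Vectors.dense(2.2, 1.3))],
    ["label", "features"],
)

lr = LogisticRegression(maxIter=10, regParam=0.01)
model = lr.fit(train)          # training is distributed across the cluster
model.transform(train).show()  # predictions come back as a DataFrame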
I believe that it depends on the task at hand. I see two general scenarios:
Your data is big and does not fit into memory. Go with Spark MLlib and its distributed algorithms.
Your data is not that big and you want to utilize sheer computing power. The typical use case is hyperparameter search.
Databricks allows for distributing such workloads from the driver node to the executors with hyperopt and its SparkTrials (random + Bayesian search).
Some docs here:
http://hyperopt.github.io/hyperopt/scaleout/spark/
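A minimal sketch of that pattern (the model and search space are chosen only for illustration; assumes scikit-learn, hyperopt, and a Spark cluster):

from hyperopt import SparkTrials, fmin, hp, tpe
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

def objective(params):
    clf = RandomForestClassifier(
        n_estimators=int(params["n_estimators"]),
        max_depth=int(params["max_depth"]),
    )
    # hyperopt minimizes, so return negative accuracy
    return -cross_val_score(clf, X, y, cv=3).mean()

space = {
    "n_estimators": hp.quniform("n_estimators", 10, 200, 10),
    "max_depth": hp.quniform("max_depth", 2, 10, 1),
}

# Each trial trains on an executor; the search runs trials in parallel.
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=32, trials=SparkTrials(parallelism=8))
print(best)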
However, there are many more attempts to make scikit-learn on Spark work. You can supposedly distribute the workloads through UDFs, using joblib, or otherwise; one such attempt is sketched below. I am investigating the issue myself, and will update the answer later.
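For example, the joblib-spark backend lets scikit-learn's internal joblib parallelism fan out over Spark executors. A minimal sketch, assuming the joblibspark package is installed:

from joblib import parallel_backend
from joblibspark import register_spark
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

register_spark()  # register the Spark backend with joblib

X, y = load_iris(return_X_y=True)
with parallel_backend("spark", n_jobs=4):
    # the 4 cross-validation fits run on Spark executors
    scores = cross_val_score(SVC(), X, y, cv=4)
print(scores)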

Kubernetes Machine Learning Model Serving

Is there a suggested way to serve hundreds of machine learning models in Kubernetes?
Solutions like KFServing seem more suitable for cases where there is a single trained model, or a few versions of it, and this model serves all requests. For instance, a typeahead model that is universal across all users.
But is there a suggested way to serve hundreds or thousands of such models? For example, a typeahead model trained specifically on each user's data.
The most naive way to achieve something like that would be for each typeahead serving container to maintain a local in-memory cache of models. But then scaling to multiple pods would be a problem, because each cache is local to its pod, so each request would need to be routed to the pod that has the right model loaded.
Also, maintaining a registry of which pod has loaded which model, and updating it on model eviction, seems like a lot of work.
You can use Catwalk, Grab's machine learning model serving platform.
Grab has a tremendous amount of data that we can leverage to solve complex problems such as fraudulent user activity, and to provide our customers personalized experiences on our products. One of the tools we are using to make sense of this data is machine learning (ML). That is how Catwalk is created: an easy-to-use, self-serve, machine learning model serving platform for everyone at Grab.
More information about Catwalk can be found here: Catwalk.
You can serve multiple Machine Learning models using TensorFlow and Google Cloud.
The reason the field of machine learning is experiencing such an epic boom is because of its real potential to revolutionize industries and change lives for the better. Once machine learning models have been trained, the next step is to deploy these models into usage, making them accessible to those who need them, be they hospitals, self-driving car manufacturers, high-tech farms, banks, airlines, or everyday smartphone users. In production, the stakes are high and one cannot afford to have a server crash, connection slow down, etc. As our customers increase their demand for our machine learning services, we want to seamlessly meet that demand, be it at 3AM or 3PM. Similarly, if there is a decrease in demand we want to scale down the committed resources so as to save cost, because as we all know, cloud resources are very expensive.
More information can be found here: machine-learning-serving.
Also you can use Seldon.
Seldon Core is an open source platform for deploying machine learning models on a Kubernetes cluster.
Features:
deploying machine learning models in the cloud or on-premise
gaining metrics to ensure proper governance and compliance for your running machine learning models
creating inference graphs made up of multiple components
providing a consistent serving layer for models built using heterogeneous ML toolkits
Useful documentation: Kubernetes-Machine-Learning.
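With Seldon, each model (or group of models) is wrapped in its own microservice, which maps naturally onto the many-models case above. A minimal sketch of the Python model-server contract (the class name and model path are assumptions; Seldon wraps such a class in a REST/gRPC service deployed on Kubernetes):

import joblib

class UserTypeahead:
    def __init__(self):
        # Load one user's model; in a multi-model setup the path could be
        # parameterized per deployment, e.g. via an environment variable.
        self.model = joblib.load("/mnt/models/user_model.joblib")

    def predict(self, X, features_names=None):
        # Seldon calls predict() for each inference request.
        return self.model.predict(X)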

Image Classification in Azure Machine Learning

I'm preparing for the Azure Machine Learning exam, and here is a question that confuses me.
You are designing an Azure Machine Learning workflow. You have a dataset that contains two million large digital photographs. You plan to detect the presence of trees in the photographs. You need to ensure that your model supports the following:
Solution: You create a Machine Learning experiment that implements the Multiclass Decision Jungle module. Does this meet the goal?
Solution: You create a Machine Learning experiment that implements the Multiclass Neural Network module. Does this meet the goal?
The answer to the first question is No while for the second it is Yes, but I cannot understand why the Multiclass Decision Jungle doesn't meet the goal, since it is a classifier. Can someone explain the reason to me?
I suppose that this is part of a series of questions that present the same scenario, and there are definitely some constraints in the scenario.
Moreover, if you have a look at the Azure documentation:
However, recent research has shown that deep neural networks (DNN) with many layers can be very effective in complex tasks such as image or speech recognition. The successive layers are used to model increasing levels of semantic depth.
Thus, Azure recommends using neural networks for image classification. Remember that the goal of the exam is to test your capacity to design data science solutions using Azure, so it is better to use their official documentation as a reference.
And comparing to the other solutions:
You create an Azure notebook that supports the Microsoft Cognitive Toolkit.
You create a Machine Learning experiment that implements the Multiclass Decision Jungle module.
You create an endpoint to the Computer Vision API.
You create a Machine Learning experiment that implements the Multiclass Neural Network module.
There are only two Azure ML Studio modules among these, and as the question is about constructing a workflow, I guess we can only choose between them. (CNTK is actually the best solution, as it allows constructing a deep neural network with ReLU whereas AML Studio doesn't, and an API call is not about data science at all.)
Finally, I do agree with the other contributors that the question is absurd. Hope this helps.
This question is indeed part of a series of questions that present the same scenario with multiple options. Both of the solutions approach the problem as a multi-class classification problem, which is correct. However, the key element here is dimensionality.
Your inputs (images) are highly dimensional, which calls for a deep learning approach in order to be effective. A decision jungle won't be able to learn effectively in such a high-dimensional feature space, whereas a NN has a higher chance of doing so.
I hope it helps.

Data prediction from previous data history using AI/ML

I am looking for solutions where I can automatically approve or disapprove different supplier invoices based on historical data.
Let's say, I got an invoice from an HP laptop supplier and based on the previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction based on the historical data already available, using artificial intelligence, machine learning, or any other cloud service.
This isn't a direct question, but you can start by looking into various methods of classification. There is a huge amount of material available online. Try reading about K-Nearest Neighbors, Naive Bayes, K-Means, etc. to get an idea of how algorithms in the machine learning domain work. Once you start understanding what is written in the documentation, start implementing them (a small sketch follows below). You will face a lot of problems, which you can search for online, and I'm sure you will find most of them answered on this portal.
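As a hedged starting point, here is how the invoice history could be framed as a binary classification problem with scikit-learn; the feature names and data are invented purely for illustration:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical historical invoices: amount, supplier rating, days late,
# and whether each invoice was ultimately approved (1) or rejected (0).
history = pd.DataFrame({
    "amount":         [1200, 450, 9800, 300, 7600, 520],
    "supplier_score": [4.5, 3.9, 2.1, 4.8, 2.5, 4.1],
    "days_late":      [0, 2, 14, 0, 9, 1],
    "approved":       [1, 1, 0, 1, 0, 1],
})

X = history.drop(columns="approved")
y = history["approved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
print(clf.predict(X_test))  # predicted approve (1) / reject (0)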
