I built a machine learning pipeline that trains a model and deploys it as a web service. I put everything on GitHub except the training dataset, because GitHub limits file size to 100 MB. After training, I save the model and the other necessary files as .pkl files. The model file alone is ~300 MB, so I can't upload it to GitHub either. I connected my repo to Heroku and tried to send a request, but then I realized that the deployed app has neither the model nor the training dataset, so it can't serve the request.
Is there a best practice for deploying a machine learning model given GitHub's file size limits?
Please advise
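GitHub hosts Git repositories, and Git is a version control system for source code. Technically, your repository should not contain training data or trained models.
Most real-world machine learning systems keep trained models in file or object storage, for instance S3, and the service downloads the model from there when it starts.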
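As a rough illustration of that idea, here is a minimal sketch of a web service fetching its pickled model from external storage on first use and caching it on local disk. The names `load_model`, `fetch_via_url`, and `cache_path` are made up for this example. It uses a plain URL download from the standard library; for a private S3 bucket you would probably pass a different `fetch` function, for example one that wraps boto3's `download_file`.

```python
import pickle
import urllib.request
from pathlib import Path


def fetch_via_url(url, destination):
    """Download `url` to `destination` using only the standard library."""
    urllib.request.urlretrieve(url, str(destination))


def load_model(model_url, cache_path, fetch=fetch_via_url):
    """Return the unpickled model, downloading it first if it is not cached.

    `fetch` is a hypothetical hook: any callable taking (url, destination).
    """
    cache_path = Path(cache_path)
    if not cache_path.exists():
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        fetch(model_url, cache_path)
    with cache_path.open("rb") as handle:
        # Only unpickle files you produced yourself: pickle can execute code.
        return pickle.load(handle)
```

With this layout the repository only holds code and the storage location of the model, so the 100 MB limit no longer matters.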
Is it possible to upload a pre-trained machine learning model that was trained in a different environment to Databricks and serve it? Or is that impossible on Databricks?
The best way to use a trained model in another environment is MLflow. You can save several models with different versions and load them in any Databricks environment. I advise you to consult the MLflow documentation.
I have built an XGBoost classifier and a RandomForest classifier for an audio classification project. I want to deploy these models, which are saved in pickle (.pkl) format, on AWS SageMaker. From what I have seen, there aren't many resources available online. Can anyone guide me through the steps and, if possible, also provide the code? I already have the models built and only need to deploy them on SageMaker.
By saying that you want to deploy to SageMaker, I assume you mean a SageMaker endpoint.
The answer is the SageMaker Inference Toolkit. It is basically about teaching SageMaker how to load your model and run inference with it. More details here: https://github.com/aws/sagemaker-inference-toolkit and here is an example implementation: https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/multi_model_bring_your_own
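To give an idea of what "teaching SageMaker how to load and do inference" looks like, here is a minimal sketch of the four steps a handler usually covers: load the model, decode the request, predict, and encode the response. The function names follow the convention used by SageMaker's prebuilt framework containers; the file name `model.pkl` and the `"instances"` request key are assumptions for this example, and the exact classes and methods to override in the toolkit should be taken from the linked repositories.

```python
import json
import os
import pickle


def model_fn(model_dir):
    """Load the pickled model from the directory the model archive is unpacked into."""
    # "model.pkl" is a hypothetical file name; use whatever you packaged.
    with open(os.path.join(model_dir, "model.pkl"), "rb") as handle:
        return pickle.load(handle)


def input_fn(request_body, content_type="application/json"):
    """Turn the raw request payload into a list of feature rows."""
    if content_type != "application/json":
        raise ValueError("Unsupported content type: " + content_type)
    return json.loads(request_body)["instances"]  # assumed request layout


def predict_fn(rows, model):
    """Run inference with any object exposing a scikit-learn style predict()."""
    return model.predict(rows)


def output_fn(predictions, accept="application/json"):
    """Serialize predictions to JSON, converting array-likes to plain lists."""
    if hasattr(predictions, "tolist"):
        predictions = predictions.tolist()
    return json.dumps({"predictions": list(predictions)})
```

Both XGBoost and RandomForest models saved with pickle expose `predict()`, so the same handler shape should work for either, provided the container has the matching library versions installed.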
Is there any guideline or best practice for storing machine learning models? We can store them as binary files. However, a machine learning model is more than the model artifact (it also involves data, code, hyperparameters, and metrics). I wonder whether there is any established practice around integrating Artifactory with a CI/CD process (using it to manage ML model artifacts and metadata, and supporting both automated and human-in-the-loop model promotion)?
The only article touching this topic very lightly is:
https://towardsdatascience.com/who-moved-my-binaries-7c4d797cd783
Is there a suggested way to serve hundreds of machine learning models in Kubernetes?
Solutions like KFServing seem to be more suitable for cases where there is a single trained model, or a few versions of it, and this model serves all requests. For instance, a typeahead model that is universal across all users.
But is there a suggested way to serve hundreds or thousands of such models? For example, a typeahead model trained specifically on each user's data.
The most naive way to achieve this would be for each typeahead serving container to maintain a local in-memory cache of models. But then scaling to multiple pods would be a problem, because each cache is local to its pod. So each request would need to be routed to the pod that has the right model loaded.
Maintaining a registry that records which pod has loaded which model, and updating it on every model eviction, also seems like a lot of work.
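To make the naive design concrete, here is a minimal sketch of the per-pod cache with least-recently-used eviction, plus one possible way to avoid the registry: deriving the pod from a hash of the user id so that routing is deterministic. The hash-based routing is an illustrative alternative that I am adding here, not something any of the tools mentioned does for you, and `load_model`, `ModelCache`, and `pod_for_user` are hypothetical names.

```python
import hashlib
from collections import OrderedDict


class ModelCache:
    """Keeps at most `capacity` per-user models in memory, evicting the least recently used."""

    def __init__(self, load_model, capacity):
        self._load_model = load_model  # hypothetical callable: user_id -> model
        self._capacity = capacity
        self._models = OrderedDict()

    def get(self, user_id):
        if user_id in self._models:
            self._models.move_to_end(user_id)  # mark as most recently used
            return self._models[user_id]
        model = self._load_model(user_id)
        self._models[user_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)  # evict the least recently used
        return model


def pod_for_user(user_id, pod_count):
    """Map a user to a pod index deterministically, so no registry lookup is needed."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pod_count
```

Note that plain modulo hashing reassigns most users whenever `pod_count` changes, so every scale-up or scale-down would cause a burst of cache misses; a consistent hashing scheme would soften that.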
You can look at Catwalk, the model serving platform built at Grab.
Grab has a tremendous amount of data that we can leverage to solve
complex problems such as fraudulent user activity, and to provide our
customers personalized experiences on our products. One of the tools
we are using to make sense of this data is machine learning (ML).
That is how Catwalk is created: an easy-to-use, self-serve, machine
learning model serving platform for everyone at Grab.
You can find more information about Catwalk here: Catwalk.
You can serve multiple Machine Learning models using TensorFlow and Google Cloud.
The reason the field of machine learning is experiencing such an epic
boom is because of its real potential to revolutionize industries and
change lives for the better. Once machine learning models have been
trained, the next step is to deploy these models into usage, making
them accessible to those who need them — be they hospitals,
self-driving car manufacturers, high-tech farms, banks, airlines, or
everyday smartphone users. In production, the stakes are high and one
cannot afford to have a server crash, connection slow down, etc. As
our customers increase their demand for our machine learning services,
we want to seamlessly meet that demand, be it at 3AM or 3PM.
Similarly, if there is a decrease in demand we want to scale down the
committed resources so as to save cost, because as we all know, cloud
resources are very expensive.
You can find more information here: machine-learning-serving.
You can also use Seldon.
Seldon Core is an open source platform for deploying machine learning models on a Kubernetes cluster.
Features:
deploying machine learning models in the cloud or on-premise.
gaining metrics ensuring proper governance and compliance for your running machine learning models.
creating inference graphs made up of multiple components.
providing a consistent serving layer for models built using heterogeneous ML toolkits.
Useful documentation: Kubernetes-Machine-Learning.
I am looking to host 5 deep learning models where data preprocessing/postprocessing is required.
It seems straightforward to host each model using TF Serving (with Kubernetes managing the containers), but if that is the case, where should the data pre- and post-processing take place?
I'm not sure there's a single definitive answer to this question, but I've had good luck deploying models at scale bundling the data pre- and post-processing code into fairly vanilla Go or Python (e.g., Flask) applications that are connected to my persistent storage for other operations.
For instance, to take the movie recommendation example, on the predict route it's pretty performant to pull the 100 films a user has watched from the database, dump them into a NumPy array of the appropriate size and encoding, dispatch to the TensorFlow serving container, and then do the minimal post-processing (like pulling the movie name, description, cast from a different part of the persistent storage layer) before returning.
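A minimal sketch of that predict route, stripped of the web framework, might look like the following. It assumes the model takes a multi-hot vector over the film catalog and returns one score per film, which is my assumption and not something stated above; the `titles` list stands in for the database lookup, plain lists stand in for the NumPy array, and `encode_history` and `recommend` are hypothetical names. The request body uses the `instances` / `predictions` layout of TensorFlow Serving's REST predict API, whose URL normally has the form http://<host>:8501/v1/models/<model_name>:predict.

```python
import json
import urllib.request


def post_json(url, payload):
    """POST a JSON payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def encode_history(watched_ids, catalog_size):
    """Pre-processing: multi-hot encode watched film ids into a fixed-size vector."""
    vector = [0.0] * catalog_size
    for film_id in watched_ids:
        vector[film_id] = 1.0
    return vector


def recommend(watched_ids, titles, serving_url, top_k=3, send=post_json):
    """Encode the history, call the model server, and map scores back to titles."""
    features = encode_history(watched_ids, len(titles))
    response = send(serving_url, {"instances": [features]})
    scores = response["predictions"][0]
    # Post-processing: rank unseen films by score and return their titles.
    seen = set(watched_ids)
    ranked = sorted(
        (i for i in range(len(titles)) if i not in seen),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_k]]
```

Keeping `send` as a parameter makes the route easy to test without a running model server, and the same function body can sit inside a Flask view or a Go handler equivalent.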
In addition to josephkibe's answer, you can:
Implement the processing in the model itself (see signatures for Keras models and input receivers for Estimators in the SavedModel guide).
Install Seldon Core. It is a complete serving framework that handles building images and networking. It builds the service as a graph of pods with different APIs; one kind of pod is a transformer, which pre- and post-processes data.