When should you use AWS SageMaker Ground Truth (SMGT) vs AWS SageMaker Augmented AI (A2I)?

I'm working on building an annotation workflow for my data science team. I see there are overlapping AWS products, and it's not clear which one to use for which purpose.

Related

Can a Vertex AI custom prediction model (docker) read from Cloud Storage FUSE?

According to the documentation, you can use Cloud Storage FUSE during training. However, I cannot find in the documentation whether the filesystem is also available during inference.
My goal is to load various pre-trained models from GCS. I do not want to package these models (Hugging Face) into the Docker image, so that I can make changes to the models without rebuilding and pushing the entire image.
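Independently of whether Cloud Storage FUSE is mounted at serving time, one workaround is to download the model artifacts from GCS with the regular google-cloud-storage client when the prediction container starts. Below is a minimal sketch, assuming a hypothetical bucket my-models and a Hugging Face model stored under a prefix; the names are placeholders, not part of the question.

```python
# Hypothetical sketch: pull model files from GCS at container startup instead of
# baking them into the Docker image. Bucket and prefix names are placeholders.
import os
from google.cloud import storage
from transformers import AutoModel, AutoTokenizer

BUCKET = "my-models"        # placeholder bucket name
PREFIX = "bert-finetuned/"  # placeholder "directory" in the bucket
LOCAL_DIR = "/tmp/model"

def download_model():
    os.makedirs(LOCAL_DIR, exist_ok=True)
    client = storage.Client()
    for blob in client.list_blobs(BUCKET, prefix=PREFIX):
        if blob.name.endswith("/"):
            continue  # skip directory placeholder objects
        # Recreate the file layout locally so from_pretrained() finds everything.
        target = os.path.join(LOCAL_DIR, os.path.relpath(blob.name, PREFIX))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)

download_model()
tokenizer = AutoTokenizer.from_pretrained(LOCAL_DIR)
model = AutoModel.from_pretrained(LOCAL_DIR)
```

This keeps the image generic: swapping models only requires uploading new files to the bucket and restarting the serving containers.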

How to deploy machine learning model saved as pickle file on AWS SageMaker

I have built an XGBoost classifier and a Random Forest classifier for an audio classification project. I want to deploy these models, which are saved in pickle (.pkl) format, on AWS SageMaker. From what I have observed, there aren't a lot of resources available online. Can anyone guide me through the steps and, if possible, also provide the code? I already have the models built and I'm just left with deploying them on SageMaker.
By saying that you want to deploy to SageMaker, I assume you mean a SageMaker endpoint.
The answer is the SageMaker Inference Toolkit. It is essentially about teaching SageMaker how to load your model and run inference. More details here: https://github.com/aws/sagemaker-inference-toolkit and here is an example implementation: https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/multi_model_bring_your_own
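If you serve the pickled model with one of the prebuilt framework containers (for example the scikit-learn container via the SageMaker Python SDK), the contract boils down to an inference script implementing model_fn / input_fn / predict_fn / output_fn. A minimal sketch, assuming the pickle is packaged as model.pkl inside your model.tar.gz (file and payload names are placeholders):

```python
# inference.py -- hypothetical sketch for a SageMaker endpoint serving a pickled
# scikit-learn / XGBoost model. File and field names are placeholders.
import json
import os
import pickle

import numpy as np

def model_fn(model_dir):
    """Called once when the endpoint starts: load the pickled model from model_dir."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
        return pickle.load(f)

def input_fn(request_body, request_content_type):
    """Deserialize the request payload into a feature array."""
    if request_content_type == "application/json":
        return np.array(json.loads(request_body)["instances"])
    raise ValueError(f"Unsupported content type: {request_content_type}")

def predict_fn(input_data, model):
    """Run inference with the loaded model."""
    return model.predict(input_data)

def output_fn(prediction, accept):
    """Serialize predictions back to the client."""
    return json.dumps({"predictions": prediction.tolist()})
```

You would then point a framework model (for example SKLearnModel in the SageMaker Python SDK) at this script and at the S3 location of model.tar.gz, and call deploy() to create the endpoint.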

Migrate from running ML training and testing locally to Google Cloud

I currently have a simple Machine Learning infrastructure running locally and I want to migrate this all onto Google Cloud. I simply fetch the data I need from a database, build my model and then test the model on test data. This is all done in PyCharm locally.
I want to simply migrate this and have the possibility for all this to be done on Google Cloud, while having the flexibility to make local changes that can apply when run on the cloud as well. There are many Google Cloud resources relating to this and so I am looking for best practices people follow on running such a procedure.
Thanks and please let me know if there are any clarifications needed.
I highly suggest you take a look at this machine learning workflow in the cloud, which consists of:
Data Ingestion and Collection.
Storing the data.
Processing data.
ML training.
ML deployment.
Data Ingestion and Collection
There are multiple resources you can use if you would like to ingest data into Google Cloud Platform. The simplest solutions I can recommend are Google Compute Engine or an App Engine app (for example, a form where a user fills in some data).
Nonetheless, if you would like to ingest data in real time, you can also use Cloud Pub/Sub.
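For the real-time option, publishing a record to a Pub/Sub topic is a few lines with the google-cloud-pubsub client. A minimal sketch; the project and topic IDs are placeholders:

```python
# Hypothetical sketch: publish incoming records to a Pub/Sub topic for
# real-time ingestion. Project and topic IDs are placeholders.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "raw-events")

record = {"user_id": 42, "feature": 3.14}
future = publisher.publish(topic_path, json.dumps(record).encode("utf-8"))
print(f"Published message id: {future.result()}")
```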
Storing the data
As you mentioned, you are retrieving all the information from a database. If you are used to working with SQL, I highly suggest you go with Cloud SQL. Not only does it provide a good interface when building your instance, it also lets you access it securely and very quickly.
If that is not the case, you can also use Google Cloud Storage or BigQuery; of those two, I would pick BigQuery, since it can also work with streaming data.
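As a quick illustration of pulling training data out of BigQuery into a pandas DataFrame (the project, dataset, and table names are placeholders):

```python
# Hypothetical sketch: read training data from BigQuery into pandas.
# Project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT *
    FROM `my-project.my_dataset.training_data`
    LIMIT 100000
"""
df = client.query(query).to_dataframe()
print(df.shape)
```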
Processing data
For processing the data before feeding it to the model, you can use any of the following (a small Beam/Dataflow sketch follows the list):
Cloud DataFlow: Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness -- no more complex workarounds or compromises needed.
Cloud Dataproc: Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark and Apache Hadoop clusters in a simpler, more cost-efficient way.
Cloud Dataprep: Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis, reporting, and machine learning.
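To give a flavour of what a Dataflow job can look like, here is a minimal Apache Beam pipeline sketch. The bucket paths and parsing logic are placeholders; the same code runs locally with the default DirectRunner and on Cloud Dataflow when submitted with the DataflowRunner options.

```python
# Hypothetical sketch: a tiny Apache Beam pipeline that cleans CSV rows.
# Runs locally by default; pass --runner=DataflowRunner (plus project,
# region and temp_location options) to execute it on Cloud Dataflow.
import apache_beam as beam

def parse_row(line):
    # Placeholder parsing/cleaning logic for one CSV line.
    user_id, value = line.split(",")
    return {"user_id": user_id, "value": float(value)}

with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/raw/data.csv",
                                         skip_header_lines=1)
        | "Parse" >> beam.Map(parse_row)
        | "FilterBad" >> beam.Filter(lambda row: row["value"] >= 0)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/clean/data")
    )
```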
ML training & ML deployment
For training/deploying your ML model I would suggest using AI Platform.
AI Platform makes it easy for machine learning developers, data scientists, and data engineers to take their ML projects from ideation to production and deployment, quickly and cost-effectively.
If you have to work with huge datasets, the best practice is to run the model as a TensorFlow job on AI Platform so you can use a training cluster.
Finally, for deploying your models with AI Platform, you can take a look here.
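Once a model version is deployed, online predictions can be requested through the Google API client library. A minimal sketch, assuming hypothetical project, model, and version names:

```python
# Hypothetical sketch: request online predictions from a model version
# deployed on AI Platform. Project, model and version names are placeholders.
from googleapiclient import discovery

service = discovery.build("ml", "v1")
name = "projects/my-project/models/my_model/versions/v1"

body = {"instances": [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]]}
response = service.projects().predict(name=name, body=body).execute()

if "error" in response:
    raise RuntimeError(response["error"])
print(response["predictions"])
```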

Is there a way to use external, compiled packages for data processing in Google's AI Platform?

I would like to set up a prediction task, but the data preprocessing step requires using tools outside of Python's data science ecosystem, though Python has APIs to work with those tools (e.g. a compiled Java NLP toolkit). I first thought about creating a Docker container to have an environment with those tools available, but a commenter has said that this is not currently supported. Is there perhaps some other way to make such tools available to the Python prediction class needed for AI Platform? I don't really have a clear sense of what's happening on the backend with AI Platform, and how much ability a user has to modify or set that up.
That is not possible today. Is there any specific use case you are targeting that is not satisfied today?
Cloud AI Platform offers multiple prediction frameworks (TensorFlow, scikit-learn, XGBoost, PyTorch, custom prediction routines) in multiple versions.
After looking into the requirements, you can use the new AI Platform feature, custom prediction routines: https://cloud.google.com/ml-engine/docs/tensorflow/custom-prediction-routine-keras
To deploy a custom prediction routine to serve predictions from your trained model, do the following (a minimal predictor sketch follows these steps):
Create a custom predictor to handle requests.
Package your predictor and your preprocessing module; this is where you can install your custom libraries.
Upload your model artifacts and your custom code to Cloud Storage.
Deploy your custom prediction routine to AI Platform.
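For the first step, the predictor is a class exposing a predict() method and a from_path() class method that AI Platform calls to load your artifacts. A minimal sketch; the artifact file names and the preprocessor are placeholders, and the preprocessing step is where you would call out to your external tools packaged alongside this module:

```python
# predictor.py -- hypothetical sketch of an AI Platform custom prediction routine.
# Artifact file names and the preprocessing logic are placeholders.
import os
import pickle

class MyPredictor(object):
    def __init__(self, model, preprocessor):
        self._model = model
        self._preprocessor = preprocessor

    def predict(self, instances, **kwargs):
        """Preprocess raw instances, then run the trained model on them."""
        inputs = [self._preprocessor.transform(instance) for instance in instances]
        return self._model.predict(inputs).tolist()

    @classmethod
    def from_path(cls, model_dir):
        """Called by AI Platform to load the artifacts uploaded to Cloud Storage."""
        with open(os.path.join(model_dir, "model.pkl"), "rb") as f:
            model = pickle.load(f)
        with open(os.path.join(model_dir, "preprocessor.pkl"), "rb") as f:
            preprocessor = pickle.load(f)
        return cls(model, preprocessor)
```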

Can Google Cloud Vision API be trained using your image data?

IBM Watson has a capability where you can train the classifiers on Watson using your own images, but I am unable to find a similar capability in the Google Cloud Vision API. What I want is to upload 10-15 classes of images and, on the basis of the uploaded images, have it classify any images loaded after that. IBM Bluemix (Watson) has this capability, but their pricing is significantly higher than Google's. I am open to other services as well, if their prices are below Google's.
As far as I know, the Google Cloud Vision API cannot be trained with your custom data. However, there is a service called vize.ai, where you can define your custom classes and upload the images; the training is free, and the prices for API usage are below Google's and IBM's.
Disclaimer: I'm vize.it co-founder
Edit: Link changed
You can train your own models using Cloud AutoML Vision. There are 2 different ways to do this:
Cloud-hosted models.
Edge exportable models.
With some work you can train a model for free using TensorFlow - see the model training section.
However, they have released an already trained model, so if you're lucky and what you want to classify already overlaps with their model, then no training is needed.
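If you go the do-it-yourself TensorFlow route, the usual pattern is transfer learning on top of a pretrained backbone. A minimal Keras sketch for a handful of custom classes; the directory layout, class count, and hyperparameters are placeholders:

```python
# Hypothetical sketch: retrain a pretrained image backbone on your own classes
# with Keras transfer learning. Paths and the class count are placeholders.
import tensorflow as tf

NUM_CLASSES = 12
IMG_SIZE = (224, 224)

# Expects data/train/<class_name>/*.jpg
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False  # keep the pretrained features frozen

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```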
Azure has started offering this now; search for "Azure Custom Vision". It is still a preview service, but it has good accuracy, at least for our workload, which is images of preschool children.
