Can I use Google Cloud data labeling service to do text classification? - data-annotations

I have a large dataset with lots of news articles stored in Google Cloud Storage. I want to train a sentiment classifier (positive, negative, neutral). Does Google have a data labeling service that I can use to create the training data? If yes, where can I find the API documentation?
It looks like Google Cloud AutoML Vision supports human labeling for image classification here: https://cloud.google.com/vision/automl/docs/human-labeling. However, I couldn't find an equivalent for text.

Google Cloud has an alpha version of the Data Labeling API, which supports human annotation for text classification, sentiment analysis and other use cases (image, video, etc.). You will need to email cloudml-data-customer@google.com to get whitelisted so that you can access the API and its documentation.

Related

Can Google Cloud Vision API label faces?

I am currently using google cloud-vision api for a project. I want to assign a unique ID to a face, so that it automatically detects which IDs any image contains. This way I can know which person is in the image.
Can cloud-vision distinguish faces and return some unique ID for a face?
No, and as Armin has already mentioned, the Google Vision API doesn't support facial recognition or face verification; it only performs face detection on an image. What you can actually do is use TensorFlow to accomplish what you want. Let me explain:
A typical face recognition system (pipeline) consists of a couple of phases:
Face detection: which you can do using the Google Vision API.
Facial feature extraction: which you can do by using TensorFlow to extract facial features and get a face embedding for each face detected in step 1. Extracting the facial features can be done with a pre-trained model that was trained on a large dataset such as VGGFace2 or CASIA-WebFace.
Face recognition (identification or verification): which you can achieve by using
TensorFlow to read the face embeddings (which were computed and saved in step 2) from disk (they could also be fetched from a database, depending on where you saved them)
Support Vector Machines (SVM) in Python to do multi-class classification.
(IMO) The most important things in a face recognition system are correctly detecting faces and correctly extracting facial features. The third step is just a classification problem and it can be done in many ways; for example, you can also use the Euclidean distance between the facial embeddings to decide whether two faces are similar or not (verification).
For the second and the third step you can take a look at FaceNet (https://github.com/davidsandberg/facenet), which is a great example of how you can develop your own facial recognition system based on TensorFlow.
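To make the third step more concrete, here is a minimal sketch in Python, assuming you already have face embeddings saved from step 2; the file names, the embedding size and the distance threshold are illustrative placeholders:

```python
# Minimal sketch of step 3 (recognition), assuming 128-d face embeddings
# from step 2 are already saved to disk. File names, the embedding size and
# the distance threshold below are placeholders.
import numpy as np
from sklearn.svm import SVC

known_embeddings = np.load("known_embeddings.npy")  # shape: (n_faces, 128)
known_labels = np.load("known_labels.npy")          # one person ID per embedding
new_embedding = np.load("new_embedding.npy")        # shape: (128,)

# Option A: verification via Euclidean distance between embeddings.
distances = np.linalg.norm(known_embeddings - new_embedding, axis=1)
closest_id = known_labels[np.argmin(distances)]
is_match = distances.min() < 1.0  # threshold must be tuned on your own data

# Option B: identification with a multi-class SVM trained on the embeddings.
clf = SVC(kernel="linear", probability=True)
clf.fit(known_embeddings, known_labels)
predicted_id = clf.predict(new_embedding.reshape(1, -1))[0]

print(closest_id, is_match, predicted_id)
```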
The Vision API service offers a Face Detection feature that can be used to detect multiple faces within an image along with the associated key facial attributes, such as emotional state or wearing headwear. Based on this, you can get the bounding polygon around the face, the landmarks, roll angle, detection confidence, among other properties; however, it is important to note that this feature doesn't support facial recognition, which means that it cannot be used to retrieve unique IDs for the detected faces.
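For reference, a minimal sketch of calling Face Detection from the Python client library; the image path is a placeholder and the exact fields you inspect will depend on your needs:

```python
# Minimal sketch of Face Detection with the Cloud Vision Python client.
# It detects faces and their attributes only; it does not return identities.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:  # "photo.jpg" is a placeholder path
    image = vision.Image(content=f.read())

response = client.face_detection(image=image)
for face in response.face_annotations:
    print(face.detection_confidence, face.joy_likelihood, face.bounding_poly)
```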
In case this feature doesn't cover your current needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation, or take a look at the Issue Tracker tool, in order to raise a Vision API feature request and notify Google about this desired functionality.

Using Google ML Engine with BigQuery?

I'm currently designing a data warehouse in BigQuery. I'm planning to store user data like past purchases or abandoned carts.
This seems to be perfect to manually analyze trends and to get insights. But what if I want to leverage Machine Learning, e.g. to suggest products to a group of users?
I have looked into Google ML Engine and TensorFlow, and it seems like the TensorFlow model would need to query BigQuery first. In some scenarios, this could mean that TensorFlow would need to query all or most of the data that is stored in BigQuery.
This feels a bit off, so I'm wondering if this is really how things are supposed to happen. Otherwise, I assume that my ML model would have to work with stale data?
So I would agree with you: using BigQuery as a data warehouse for your ML is expensive. It would be cheaper and much more efficient to use Google Cloud Storage to store all the data you wish to process. Once everything is processed and generated, you may then wish to push that data to BigQuery, or to another source like Spanner, or even keep it in Cloud Storage.
That being said, Google has now created a beta product, BigQuery ML. It allows users to create and execute machine learning models in BigQuery via SQL queries. I believe it uses Python and TensorFlow under the hood, and I believe it would be the best solution given that you have a lightweight ML workload.
Since it is still in beta as of now, I don't know how well its performance compares to Google ML Engine and TensorFlow.
Depending on what kind of model you want to train and how you want to serve the model, you can choose one of the following options:
You can export your data to Google Cloud Storage as CSV and then read the files in Cloud ML Engine. This will let you use the power of TensorFlow, and you can then use Cloud ML Engine's serving system to send traffic to your model.
On the downside, this means that you have to export all of your BigQuery data to GCS, and every time you decide to make any change to the data you need to go back to BigQuery and export again. Also, if the data you want to predict on is in BigQuery, you have to export that as well and send it to Cloud ML Engine using a separate system.
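As an illustration of this first option, here is a minimal sketch of exporting a BigQuery table to CSV files on GCS with the Python client; the project, dataset, table and bucket names are placeholders:

```python
# Minimal sketch of exporting a BigQuery table to sharded CSV files on GCS
# so a TensorFlow / Cloud ML Engine job can read them. Project, dataset,
# table and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
destination_uri = "gs://my-bucket/exports/training_data-*.csv"

extract_job = client.extract_table(
    "my-project.my_dataset.training_data",  # source table
    destination_uri,                        # sharded CSV output on GCS
)
extract_job.result()  # waits for the export job to finish
```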
If you want to explore and interactively train logistic or linear regression models on your data, you can use BigQuery ML. This will allow you to slice and dice your data in BigQuery and experiment with different parts of your data and various preprocessing options. You can also use all the power of SQL. BigQuery ML also allows you to use the model after training, within BigQuery (you can use SQL to feed data into the model).
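As an illustration of this second option, a minimal sketch of training a logistic regression model with BigQuery ML from the Python client; the dataset, table and column names are placeholders:

```python
# Minimal sketch of training a logistic regression model with BigQuery ML
# via the Python client. Dataset, table and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

query = """
CREATE OR REPLACE MODEL `my_dataset.purchase_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  label,       -- the column to predict (or set input_label_cols in OPTIONS)
  feature_1,
  feature_2
FROM `my_dataset.training_data`
"""
client.query(query).result()  # runs the training job inside BigQuery
```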
In many cases, using the full power of TensorFlow (i.e., using DNNs) is not necessary. This is especially true for structured data. On the other hand, most of your time will be spent on preprocessing and cleaning the data, which is much easier to do in SQL in BigQuery.
So you have two options here. Choose based on your needs.
P.S.: You can also try using the BigQuery Reader in TensorFlow. I don't recommend it, as it is very slow, but if your data is not huge it may work for you.

Annotated images classification

I've got a bunch of images (~3000) which have been manually classified (approved/rejected) based on some business criteria. I've processed these images with Google Cloud Platform, obtaining annotations and SafeSearch results, for example (CSV format):
file name; approved/rejected; adult; spoof; medical; violence; annotations
A.jpg;approved;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;boat|0.9,vehicle|0.8
B.jpg;rejected;VERY_UNLIKELY;VERY_UNLIKELY;VERY_UNLIKELY;UNLIKELY;text|0.9,font|0.8
I want to use machine learning to be able to predict if a new image should be approved or rejected (second column in the csv file).
Which algorithm should I use?
How should I format the data, especially the annotations column? Should I first obtain all the available annotation types and use each of them as a feature with its numerical value (0 if it doesn't apply)? Or would it be better to just process the annotation column as text?
I would suggest you try convolutional neural networks.
Maybe the fastest way to test whether your idea will work or not (a potential problem is the number of images you have, which is quite low) is to use transfer learning with TensorFlow. There are great tutorials made by Magnus Erik Hvass Pedersen, who published them on YouTube.
I suggest you go through all the videos, but the important ones are #7 and #8.
Transfer learning allows you to reuse the models built at Google to classify images, but with your own data and your own labels.
Using this approach you will be able to see whether it is suitable for your problem. Then you can dive into convolutional neural networks and create the pipeline that will work best for your problem.
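As a rough starting point, here is a minimal transfer-learning sketch with tf.keras that reuses an ImageNet-pretrained network and trains only a small approved/rejected head on your ~3000 images; the directory layout, base model and hyperparameters are assumptions:

```python
# Minimal transfer-learning sketch with tf.keras (TensorFlow 2.x). It reuses
# an ImageNet-pretrained MobileNetV2 as a frozen feature extractor and trains
# only a small approved/rejected head. Directory layout and hyperparameters
# are assumptions.
import tensorflow as tf

train_ds = tf.keras.utils.image_dataset_from_directory(
    "images/train",            # expects images/train/approved and images/train/rejected
    image_size=(224, 224),
    batch_size=32,
)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False         # freeze the pre-trained weights

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # scale pixels to [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),      # approved vs. rejected
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```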

Can Google Cloud Vision API be trained using your image data?

IBM Watson has a capability where you can train classifiers on Watson using your own images, but I am unable to find a similar capability in the Google Cloud Vision API. What I want is to upload 10-15 classes of images and, on the basis of the uploaded images, classify any images loaded after that. IBM Bluemix (Watson) has this capability but its pricing is significantly higher than Google's. I am open to other services as well, if their prices are below Google's.
As far as I know, the Google Cloud Vision API cannot be trained with your custom data. However, there is a service called vize.ai, where you can define your custom classes and upload the images; the training is free and the prices for API usage are below Google's and IBM's.
Disclaimer: I'm vize.it co-founder
Edit: Link changed
You can train your own models using Cloud AutoML Vision. There are 2 different ways to do this:
Cloud-hosted models.
Edge exportable models.
With some work you can train a model for free using TensorFlow - see the model training section.
However, they have released an already trained model, so if you're lucky and what you want to classify already overlaps with their model, then no training is needed.
Azure has started offering this now; search for "Azure Custom Vision". It is still a preview service, but with good accuracy, at least for our workload, which is images of preschool children.

Does google prediction api work for image recogniton

I read the official documentation for the API, but I wanted to make sure that it can perform object recognition in images. More specifically, my idea is to provide a lot of images of parking lots together with the number of parking spots currently available in each. I want to get a model that predicts how many spots are available given an image of the parking lot.
Does anybody have previous experience with using the API for a similar goal?
No, I don't think the Google Prediction API will work for image recognition, because the Prediction API only understands numeric and string data.
For image recognition, the Google Vision API is the best option. I don't think it is able to recognize individual humans or persons, but it can recognize places like the Eiffel Tower.
It can even read text written in an image.
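For illustration, a minimal sketch of label, landmark and text detection with the Vision API Python client; the image path is a placeholder:

```python
# Minimal sketch of label, landmark and text detection with the Cloud Vision
# Python client; "parking_lot.jpg" is a placeholder path.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("parking_lot.jpg", "rb") as f:
    image = vision.Image(content=f.read())

labels = client.label_detection(image=image).label_annotations
landmarks = client.landmark_detection(image=image).landmark_annotations
texts = client.text_detection(image=image).text_annotations

print([label.description for label in labels])        # e.g. "parking lot", "car"
print([lm.description for lm in landmarks])           # e.g. "Eiffel Tower"
print(texts[0].description if texts else "no text")   # full OCR text, if any
```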
