Confusion matrix not displayed in Google Cloud AutoML Vision

I'm not observing the confusion matrix in the "EVALUATE" tab of the web UI when following the basic quickstart:
https://cloud.google.com/vision/automl/docs/quickstart?refresh=1
The confusion matrix should be displayed according to this documentation:
https://cloud.google.com/vision/automl/docs/evaluate

This appears to happen because multi-label classification was enabled when the image dataset was imported. Because each image can carry several labels, the confusion matrix is not well-defined.
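To make the distinction concrete, here is a minimal sketch in plain Python (not the AutoML API). The label names and predictions are made up for illustration. With one label per image, every image falls into exactly one (true, predicted) cell. With several labels per image there is no single cell to count, so the closest well-defined quantity is a separate set of true/false positive and negative counts for each label.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels):
    """Count (true, predicted) pairs for single-label classification."""
    return Counter(zip(true_labels, predicted_labels))

def per_label_counts(true_sets, predicted_sets, label):
    """Return (tp, fp, fn, tn) for one label in a multi-label setting."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(true_sets, predicted_sets):
        in_truth, in_pred = label in truth, label in pred
        if in_truth and in_pred:
            tp += 1
        elif in_pred:
            fp += 1
        elif in_truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Single-label: every image contributes exactly one cell of the matrix.
single = confusion_matrix(["rose", "tulip", "rose"], ["rose", "rose", "rose"])

# Multi-label (hypothetical labels): an image can carry several labels,
# so only per-label counts are defined.
truth = [{"rose", "red"}, {"tulip"}, {"rose"}]
preds = [{"rose"}, {"tulip", "red"}, {"rose", "red"}]
red_counts = per_label_counts(truth, preds, "red")
```

If a confusion matrix is needed, re-importing the dataset with multi-label classification disabled is the direction this explanation suggests.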

Related

Features in Images Dataset

A dataset for a machine learning model is known to contain several features. Does a dataset that consists only of pictures also contain features?
Since the pictures can't be opened in an Excel file, do they contain features?
My project is on PLANT DISEASE DETECTION USING DEEP LEARNING and my professor is asking about the features in the dataset.
I don't know what to say.
I don't know if this is the right place to ask such a general ML question (that would be Cross Validated, I guess). That being said:
So do they contain features?
A feature depends on you and what information you would want to retrieve from it. This means to a certain extent, everything "contains" a feature.
Picture data can always be mapped or transformed into an observation-variable dataset in which each observation is one picture and the variables (features) form a 1D array describing the variation within each area of the image. The number of features is arbitrary. A longer vector can capture more detail, although length alone does not guarantee a better model.
Of course, this only answers the theoretical how-to that you asked about. In practice you'll need some tool to do that, but I am sure you'll find one.
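As a rough illustration of that mapping, the sketch below turns a tiny grayscale image (a nested list of made-up pixel values) into a 1D feature vector by averaging each tile. The function name and the tile size are assumptions chosen for the example. A real project would use an image library and far richer features.

```python
def block_mean_features(image, block):
    """Average each block x block tile of a grayscale image into one feature."""
    height, width = len(image), len(image[0])
    features = []
    for top in range(0, height, block):
        for left in range(0, width, block):
            tile = [image[r][c]
                    for r in range(top, min(top + block, height))
                    for c in range(left, min(left + block, width))]
            features.append(sum(tile) / len(tile))
    return features

# A 4x4 toy image: one observation (row) of the feature table.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [10, 10, 100, 100],
    [10, 10, 100, 100],
]
features = block_mean_features(image, 2)  # four features, one per 2x2 tile
```

Each picture becomes one row of numbers, which is the kind of table that could be opened in a spreadsheet.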
Hope it helped.
If the dataset contains only pictures, the features are hidden in those pictures. You need to extract them automatically, for example with a CNN (Convolutional Neural Network).
Suppose this is your original image
If you visualize the layers of your CNN (the feature maps from the output of the very first layer for example)
Bright areas are the “activated” regions, meaning the filter detected the pattern it was looking for. This filter seems to encode an eye and nose detector.
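What "the filter detected the pattern" means can be sketched without any deep learning library. The code below slides one hand-written 3x3 filter over a toy image and keeps only the positive responses (ReLU). The filter here is a fixed vertical-edge detector chosen for illustration. In a trained CNN the filter weights are learned rather than written by hand.

```python
def feature_map(image, kernel):
    """Slide the kernel over the image (valid cross-correlation), then apply ReLU."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    output = []
    for r in range(rows):
        row = []
        for c in range(cols):
            response = sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(k) for j in range(k))
            row.append(max(0, response))  # ReLU: keep only positive activations
        output.append(row)
    return output

# Toy image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Responds where brightness increases from left to right.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
activations = feature_map(image, vertical_edge)
```

The non-zero entries sit over the dark-to-bright boundary and the flat bright region gives zero, which is the same effect as the bright "activated" areas in a visualized feature map.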
Keep reading about CNNs here: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2.
I took these images from that Medium story.

Affinity Propagation for Image Clustering

The link here describes a method for image classification using affinity propagation. I'm confused about how they obtained the feature vectors, i.e., what data structure represents the images (arrays, for example)?
Additionally, how would I accomplish this given that I can't use Places365 as it's custom data (audio spectrograms)?
Finally, how would I plot the images as they've done in the diagram?
The images are passed through a neural network. The activations of a neural network layer for an image are that image's feature vector. See https://keras.io/applications/ for examples.
Spectrograms can be treated like images.
Sometimes even when domain is very different, the neural network features can extract useful information that can help you with clustering/classification tasks.

Can Google Cloud Vision API label faces?

I am currently using google cloud-vision api for a project. I want to assign a unique ID to a face, so that it automatically detects which IDs any image contains. This way I can know which person is in the image.
Can cloud-vision distinguish faces and return some unique ID for a face?
No, and as Armin has already mentioned, the Google Vision API doesn't support facial recognition or face verification. It only performs face detection on an image. What you can do is use TensorFlow to accomplish what you want. Let me explain:
A typical face recognition system (pipeline) consists of a few phases:
1. Face detection: you can do this with the Google Vision API.
2. Facial feature extraction: you can use TensorFlow to extract facial features and get a face embedding for each face detected in step 1. The features can be extracted with a pre-trained model that was trained on a large dataset (such as VGGFace2 or CASIA-WebFace).
3. Face recognition (identification or verification): you can achieve this by using
   - TensorFlow to read the face embeddings (computed and saved in step 2) from disk (they could also be fetched from a database, depending on where you saved them)
   - Support Vector Machines (SVM) in Python to do multi-class classification.
(IMO) The most important parts of a face recognition system are detecting faces correctly and extracting facial features correctly. The third step is just a classification problem and can be solved in many ways. For example, you can also use the Euclidean distance between face embeddings to decide whether two faces are similar (identification).
For the second and third steps, take a look at FaceNet (https://github.com/davidsandberg/facenet), which is a great example of how you can develop your own facial recognition system based on TensorFlow.
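The Euclidean-distance variant of the third step can be sketched in a few lines. This is only an illustration: the embeddings below are made-up 3-dimensional vectors (real face embeddings are much longer), and the `identify` helper, the IDs and the threshold are all hypothetical values that would need tuning on real data.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(embedding, known_faces, threshold):
    """Return the ID of the closest known face, or None if nothing is close enough."""
    best_id, best_distance = None, float("inf")
    for face_id, reference in known_faces.items():
        distance = euclidean_distance(embedding, reference)
        if distance < best_distance:
            best_id, best_distance = face_id, distance
    return best_id if best_distance <= threshold else None

# Hypothetical embeddings saved in step 2, keyed by the unique ID you assign.
known_faces = {"person_a": [0.1, 0.9, 0.2], "person_b": [0.8, 0.1, 0.5]}

match = identify([0.12, 0.88, 0.25], known_faces, threshold=0.5)
stranger = identify([5.0, 5.0, 5.0], known_faces, threshold=0.5)
```

Here `match` resolves to the nearest stored ID, while `stranger` is rejected because no stored embedding lies within the threshold.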
The Vision API service offers a Face Detection feature that can be used to detect multiple faces within an image, along with associated key facial attributes such as emotional state or wearing headwear. Based on this, you can get the bounding polygon around the face, the landmarks, roll angle, and detection confidence, among other properties. However, it is important to note that this feature doesn't support facial recognition, which means that it cannot be used to retrieve unique IDs for the detected faces.
If this feature doesn't cover your current needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation, or take a look at the Issue Tracker tool to raise a Vision API feature request and notify Google about the desired functionality.

Implementing Face Recognition using Local Descriptors (Unsupervised Learning)

I'm trying to implement a face recognition algorithm using Python. I want to be able to receive a directory of images and compute pair-wise distances between them, where short distances should hopefully correspond to images of the same person. The ultimate goal is to cluster images and perform some basic face identification tasks (unsupervised learning).
Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some int d) and then figure out a metric in which two faces belonging to the same person will indeed have a short distance between them.
I have a face detection algorithm which detects the face, crops the image and performs some basic pre-processing, so the images I'm feeding to the algorithm are gray and equalized (see below).
For the "face signature" part, I've tried two approaches which I read about in several publications:
Taking the histogram of the LBP (Local Binary Pattern) of the entire (processed) image
Calculating SIFT descriptors at 7 facial landmark points (right of mouth, left of mouth, etc.), which I identify per image using an external application. The signature is the concatenation of the square root of the descriptors (this results in a much higher dimension, but for now performance is not a problem).
For the comparison of two signatures, I'm using OpenCV's compareHist function (see here), trying out several different distance metrics (Chi Square, Euclidean, etc).
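For reference, the first approach (LBP histogram plus a histogram distance) can be sketched in plain Python as below. This is not the asker's code and not OpenCV: it uses the basic 3x3 LBP operator on nested lists and a symmetric chi-square variant, which is not necessarily the exact formula `compareHist` applies. The two 3x3 inputs are toy data.

```python
def lbp_histogram(image):
    """Normalized 256-bin histogram of basic 3x3 LBP codes over interior pixels."""
    # Neighbours visited clockwise, starting at the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    histogram = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit  # neighbour at least as bright as the centre
            histogram[code] += 1
    total = sum(histogram)
    return [count / total for count in histogram]

def chi_square(h1, h2):
    """Symmetric chi-square distance: sum((a - b)^2 / (a + b)) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # uniform patch
peak = [[1, 1, 1], [1, 9, 1], [1, 1, 1]]   # bright centre pixel
hist_flat = lbp_histogram(flat)
hist_peak = lbp_histogram(peak)
same = chi_square(hist_flat, hist_flat)
different = chi_square(hist_flat, hist_peak)
```

Running a check like this on patches whose answer is known in advance is one way to verify that the signature and the distance behave as expected before applying them to faces.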
I know that face recognition is a hard task, let alone without any training, so I'm not expecting great results. But everything I'm getting so far seems completely random. For example, when calculating distances from the image on the far right to the rest of the images, I find that she is most similar to 4 Bill Clintons (...!).
I have read in this great presentation that it's popular to carry out a "metric learning" procedure on a test set, which should significantly improve results. However it does say in the presentation and elsewhere that "regular" distance measures should also get OK results, so before I try this out I want to understand why what I'm doing gets me nothing.
In conclusion, my questions, which I'd love to get any sort of help on:
One improvement I thought of would be to perform LBP only on the actual face, and not on the corners and everything else that might add noise to the signature. How can I mask out the parts which are not the face before calculating LBP? I'm using OpenCV for this part too.
I'm fairly new to computer vision; how would I go about "debugging" my algorithm to figure out where things go wrong? Is this possible?
In the unsupervised setting, is there any other approach (which is not local descriptors + computing distances) that could work, for the task of clustering faces?
Is there anything else in the OpenCV module that maybe I haven't thought of that might be helpful? It seems like all the algorithms there require training and are not useful in my case - the algorithm needs to work on images which are completely new.
Thanks in advance.
What you are looking for is unsupervised feature extraction - take a bunch of unlabeled images and find the most important features describing these images.
The state-of-the-art methods for unsupervised feature extraction are all based on (convolutional) neural networks. Have a look at autoencoders (http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity) or Restricted Boltzmann Machines (RBMs).
You could also take an existing face recognition network such as DeepFace (https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf), keep only its feature layers, and use the distance between those features to group similar faces together.
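Once each face has a feature vector, the grouping step itself can be very simple. The sketch below is a stand-in and not a method from any of the linked papers: it joins any two vectors that lie closer than a threshold (single-linkage grouping via union-find). The 2-dimensional features and the threshold are made up for illustration.

```python
import math

def group_by_distance(vectors, threshold):
    """Single-linkage grouping: vectors closer than threshold share a group label."""
    parent = list(range(len(vectors)))

    def find(i):
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if math.dist(vectors[i], vectors[j]) < threshold:
                parent[find(j)] = find(i)  # merge the two groups
    return [find(i) for i in range(len(vectors))]

# Hypothetical feature vectors for four face images.
features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
groups = group_by_distance(features, threshold=1.0)
```

Images that receive the same group label are treated as the same person. A dedicated clustering algorithm would replace this step in practice.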
I'm afraid that OpenCV is not well suited to this task; you might want to check out Caffe, Theano, TensorFlow or Keras.

Using Weka on Images

I am new to Weka, and from the examples on how to use it, I have only seen text problems. Can I use images in Weka with the machine learning classifiers?
You can do pixel classification directly using the Trainable Weka Segmentation plugin (formerly the Advanced Weka Segmentation plugin) from Fiji/ImageJ.
The plugin is designed for segmentation via interactive learning. This means the user is expected to select a set of features (edge detectors, texture filters, etc.), choose the number of classes (by default there are 2) and interactively draw (with the ROI tools) samples of all classes. After training the classifier based on those samples, the whole image pixels will be classified and the segmentation result will be displayed overlaying the original image. The idea is to repeat this process (drawing + training) until obtaining a satisfying segmentation.
The plugin provides as well a set of tools to save/load the samples in ARFF format and save/load the classifier in .model format, so it's completely compatible with the latest version of WEKA.
If what you want to do is image classification, you might be able to reuse some of the plugin's methods as well.
You can use an open-source image processing application such as ImageJ or Fiji to extract features from your images and use them in Weka.
Fiji has a plugin called Advanced Weka Segmentation, which should be very useful for applying Weka classifiers to images.
Weka's machine learning classifiers work with numerical and categorical features. Before using Weka with images, you need to extract features from your images.
Depending on your needs, simple features such as the average or the maximum may be enough. Otherwise, you may need other feature extraction algorithms for your images.
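As a small sketch of that workflow, the code below computes two simple numeric features per image and writes them out as ARFF text that Weka can load. The pixel values, class names, relation name and the `to_arff` helper are all hypothetical. Real images would be read with an image library first.

```python
def to_arff(relation, samples, class_names):
    """Build ARFF text from (pixels, label) samples using mean and maximum features."""
    lines = [
        f"@RELATION {relation}",
        "@ATTRIBUTE mean NUMERIC",
        "@ATTRIBUTE maximum NUMERIC",
        "@ATTRIBUTE class {" + ",".join(class_names) + "}",
        "@DATA",
    ]
    for pixels, label in samples:
        mean = sum(pixels) / len(pixels)
        lines.append(f"{mean},{max(pixels)},{label}")
    return "\n".join(lines)

# Each sample is a flattened grayscale image plus its (categorical) class label.
samples = [([0, 10, 20, 30], "dark"), ([200, 220, 240, 255], "bright")]
arff_text = to_arff("images", samples, ["dark", "bright"])
```

Saving `arff_text` to a file with an `.arff` extension gives Weka one row per image, with numeric feature columns and a nominal class column.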
Below are some feature extraction algorithms listed on Wikipedia:
Low-level
Edge detection
Corner detection
Blob detection
Ridge detection
Scale-invariant feature transform
I suggest reading an optical character recognition (OCR) survey to understand how they are used. OCR is a fairly simple example to start with. Standard data sets and algorithms exist for OCR, so it is very instructive to learn about.
