I am currently using the Google Cloud Vision API for a project. I want to assign a unique ID to a face, so that the API automatically detects which IDs any image contains. This way I can know which person is in the image.
Can cloud-vision distinguish faces and return some unique ID for a face?
No, and as Armin has already mentioned, the Google Vision API doesn't support facial recognition or face verification; it only performs face detection on an image. What you can actually do is use TensorFlow to build what you want. Let me explain:
A typical face recognition system (pipeline) consists of a couple of phases:
Face detection: which you can do using the Google Vision API
Facial feature extraction: which you can do using TensorFlow to extract facial features and get a face embedding for each face detected in step 1. The facial features can be extracted with a pre-trained model trained on a large dataset such as VGGFace2 or CASIA-WebFace.
Face recognition (identification or verification): which you can achieve by using
TensorFlow to read the face embeddings (which were computed and saved in step 2) from disk (they could also be fetched from a database, depending on where you saved them), and
a Support Vector Machine (SVM) in Python to do multi-class classification.
(IMO) The most important things in a face recognition system are correctly detecting faces and correctly extracting facial features. The third step is just a classification problem and can be done in many ways; for example, you can also use the Euclidean distance between face embeddings to decide whether two faces are similar or not (identification).
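As an illustration of that third step, here is a minimal sketch assuming the embeddings and person labels from step 2 were already saved to disk as NumPy arrays (the file names and the distance threshold are placeholders, not part of any particular library):

```python
import numpy as np
from sklearn.svm import SVC

# Embeddings and person labels saved in step 2 (one row per detected face).
embeddings = np.load("embeddings.npy")   # shape: (n_faces, embedding_dim)
labels = np.load("labels.npy")           # shape: (n_faces,)

# Identification: multi-class classification over the known people.
clf = SVC(kernel="linear", probability=True)
clf.fit(embeddings, labels)

def identify(face_embedding):
    """Return the most likely person ID for a new face embedding."""
    return clf.predict(face_embedding.reshape(1, -1))[0]

# Verification: decide whether two faces belong to the same person by
# thresholding the Euclidean distance between their embeddings.
def same_person(emb_a, emb_b, threshold=1.0):  # the threshold depends on the embedding model
    return np.linalg.norm(emb_a - emb_b) < threshold
```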
For the second and third steps you can take a look at FaceNet (https://github.com/davidsandberg/facenet), which is a great example of how you can develop your own facial recognition system based on TensorFlow.
The Vision API service offers a Face Detection feature that can be used to detect multiple faces within an image, along with the associated key facial attributes such as emotional state or whether the person is wearing headwear. Based on this, you can get the bounding polygon around the face, the landmarks, the roll angle and the detection confidence, among other properties; however, it is important to note that this feature doesn't support facial recognition, which means that it cannot be used to retrieve unique IDs for the faces detected.
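For reference, a minimal face detection request with the Python client library might look like the sketch below (the file name is a placeholder, and the exact Image constructor can differ slightly between client library versions); note that the response contains geometry and likelihoods only, never an identity:

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.face_detection(image=image)

for face in response.face_annotations:
    # Bounding polygon, landmarks, roll angle and confidence are returned,
    # but there is no person ID of any kind.
    print(face.detection_confidence, face.roll_angle)
    print([(v.x, v.y) for v in face.bounding_poly.vertices])
```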
In case this feature doesn't cover your current needs, you can use the Send Feedback button, located at the lower left and upper right corners of the service's public documentation, as well as take a look at the Issue Tracker tool, in order to raise a Vision API feature request and notify Google about this desired functionality.
Related
I need to build an image classification model using TensorFlow, but in my dataset I have more than 10000 classes and only 5 images per class.
I understand that 5 is too small a number of images and ideally there should be "at least" 100 images for each class, but at this point I don't understand how some "face recognition" models can work.
For instance, all modern smartphones provide a "face recognition" feature that can identify the phone's owner among all the faces in the world, and the setup is very easy: it just needs a quick shot (3 to 5 seconds) of the owner's face.
So why can this work, while image classification models instead require a high number of images to achieve an acceptable accuracy?
Are these models built using a different technology behind the scenes?
Would it be possible to build an "image classification" model using the same technology that smartphones use for "face recognition"?
Smartphone face recognition: What your smartphone's face recognition system does is identify a certain set of key features, say S, on your face. So given a new face, it will either say "Yes, this face matches S" or "No, this face does not match S". As you can see, all you need is a few samples of your face to identify this good set S. When the system sees a new face, all it has to do is extract these key features from the new face, compare them with S, and finally say "Yes" or "No". It does not have to say whether it is your face, your father's face or your mother's face, etc. All it has to say is "yes, it matches" or "no, it does not match".
Image classification: However, image classification is a totally different task, where each image has to be assigned to a class. To identify whether an image is a cat, the system has to extract certain key features which distinguish it from other animals. So if you have 100 such different animals, you need 100 such sets of distinguishable key features. This is the reason you need a large number of samples for each class, so that the image classification system can identify such a key feature set for each class.
How you identify the key features is a totally different ball game. It can be done either with classical image processing techniques (like SIFT, SURF, etc.) or with deep learning techniques (like CNNs, autoencoders, etc.).
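To make the "matches / does not match" idea concrete, here is a rough sketch; extract_features, enroll and the threshold are all hypothetical placeholders standing in for whatever feature extractor and tuning is actually used:

```python
import numpy as np

def extract_features(face_image):
    # Hypothetical: returns a fixed-length feature vector for a face crop
    # (classical descriptors or a CNN embedding).
    raise NotImplementedError

def enroll(enrollment_images):
    # Build the owner's key feature set S from a few samples taken during setup.
    return np.mean([extract_features(img) for img in enrollment_images], axis=0)

def matches_owner(new_face_image, S, threshold=0.6):  # arbitrary threshold
    distance = np.linalg.norm(extract_features(new_face_image) - S)
    return distance < threshold  # "Yes it matches" / "No it does not match"
```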
In the application I am developing, I have about 5000 product label images (one label per product).
One functionality of my application is that the user can take a picture using their camera and get possible match(es) against the product labels registered in the system.
Since initially my system only has one sample per product, I decided to go with traditional computer vision techniques. I managed to implement this using feature extraction and descriptor matching (using OpenCV SIFT and FLANN techniques, referring to this: https://github.com/kipr/opencv/blob/master/samples/cpp/matching_to_many_images.cpp).
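For reference, a rough Python equivalent of that matching approach could look like this (label_image_paths, the FLANN parameters and the ratio threshold are illustrative, not tuned; on older OpenCV builds SIFT lives under cv2.xfeatures2d.SIFT_create):

```python
import cv2

sift = cv2.SIFT_create()

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    return desc

# Pre-compute descriptors for every registered label image.
label_image_paths = []  # paths to the ~5000 label images
label_descriptors = {p: descriptors(p) for p in label_image_paths}

# FLANN matcher with a KD-tree index (common parameters for SIFT).
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

def best_matches(query_path, top_n=5):
    query_desc = descriptors(query_path)
    scores = {}
    for path, desc in label_descriptors.items():
        pairs = flann.knnMatch(query_desc, desc, k=2)
        # Lowe's ratio test keeps only distinctive matches.
        good = [p[0] for p in pairs if len(p) == 2 and p[0].distance < 0.7 * p[1].distance]
        scores[path] = len(good)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```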
Now I am thinking about how to improve the accuracy by combining this with CNN or deep learning techniques, since as users approve matches, the system gradually gains more label samples per product.
Is it possible to build a hybrid image matching system combining Computer Vision techniques and CNN/Deep Learning techniques?
Are there any similar solutions already available as services?
You should learn more about Distance Metric Learning (DML). There is a lot of information on the internet, but briefly:
You must get embeddings (a vector representation) for each image in your base (e.g. take the feature vector from the last convolutional layer of one of the modern CNNs: Inception, VGG, ResNet, DenseNet).
Then, when you get a new image, you should create the vector representation of that image and find the closest vector in your base (by Euclidean distance, for example), as in the sketch below.
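A bare-bones sketch of that lookup, assuming you already have an embed() function (e.g. the pooled output of a pretrained CNN) and have embedded every image in your base:

```python
import numpy as np

def nearest_image(new_image, base_vectors, base_ids, embed):
    """base_vectors: (n_images, d) embeddings of the base; base_ids: matching identifiers."""
    query = embed(new_image)                                   # vector for the incoming image
    distances = np.linalg.norm(base_vectors - query, axis=1)   # Euclidean distance to each base vector
    return base_ids[int(np.argmin(distances))]                 # closest image in the base
```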
This topic is quite complicated, so study it carefully :)
Good luck!
The problem statement:
Given two images, such as the two images of Brad Pitt below, figure out whether they contain the same person or not. The difficulty is that we have only one reference image for each person and want to figure out whether any other incoming image contains the same person or not.
Some research:
There are a few different methods of solving this task; these are:
Using color histograms
Keypoint oriented methods
Using deep convolutional neural networks or other ML techniques
The histogram methods involve calculating histograms based on color, defining some sort of metric between them, and then deciding upon a threshold. One metric that I have tried is the Earth Mover's Distance. However, this method is lacking in accuracy.
The best approach therefore should be some sort of mix of the 2nd and 3rd methods, plus some preprocessing.
For preprocessing, the obvious steps to perform are:
Run a face detector such as Viola-Jones and separate the regions containing faces
Convert the said faces to grayscale
Run eye, mouth and nose detection algorithms, perhaps using OpenCV's Haar cascades
Align the face images according to the found landmarks
All of this is done using OpenCV.
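For completeness, the preprocessing chain above could be sketched with OpenCV's bundled Haar cascades roughly as follows (the cascade names are the stock files shipped with opencv-python; the alignment step is only hinted at):

```python
import cv2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def preprocess(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    crops = []
    for (x, y, w, h) in faces:
        face = cv2.equalizeHist(gray[y:y + h, x:x + w])   # grayscale, equalized face crop
        eyes = eye_cascade.detectMultiScale(face)         # landmarks to drive the alignment
        # ... rotate/align the crop based on the eye positions here ...
        crops.append(face)
    return crops
```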
Extracting features such as SIFT and MSER yields an accuracy of between 73% and 76%. After some additional research I've come across this paper using Fisherfaces. The fact that OpenCV now has the ability to create Fisherface recognizers and train them is great; it works fantastically, achieving the accuracy promised by the paper on the Yale datasets.
The complication of the problem is that in my case I don't have a database with several images of the same person to train the recognizer on. All I have is a single image corresponding to a single person, and given another image I want to understand whether this is the same person or not.
So what I am interested in knowing is:
Has anyone tried anything of the sort? What are some papers/methods/libraries that I should look into?
Do you have any suggestions on how to tackle the problem?
Since you have only one image, you can give this method using dlib a try. I have used 3-4 images per person and it gives good results.
Detect the face (sample_face)
Get the face descriptor (a 128-D vector) using dlib's compute_face_descriptor (check the link)
Get the new picture in which you want to recognise the face,
detect the face and compute its descriptor (let's call it test_face).
Compute the Euclidean distance between the test_face descriptor and every sample_face descriptor,
and assign test_face the class (person name) with the smallest Euclidean distance.
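A minimal sketch of those steps (the model file names are the standard pretrained models distributed with dlib; the paths are placeholders):

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
shape_predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
face_rec = dlib.face_recognition_model_v1("dlib_face_recognition_resnet_model_v1.dat")

def face_descriptor(path):
    img = dlib.load_rgb_image(path)
    det = detector(img, 1)[0]                  # assume one face per image
    shape = shape_predictor(img, det)
    return np.array(face_rec.compute_face_descriptor(img, shape))  # 128-D vector

def recognise(test_path, sample_faces):
    """sample_faces: {person_name: descriptor} built from your reference images."""
    test_face = face_descriptor(test_path)
    # Assign the class with the smallest Euclidean distance.
    return min(sample_faces, key=lambda name: np.linalg.norm(sample_faces[name] - test_face))
```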
Give this a whirl; you can play with face alignment once you start getting good results.
This is one of the hot topics in the computer vision area, and there are many kinds of solutions available to handle what you have described.
But I suggest looking at OpenFace, which has very high accuracy. There is an implementation of that project on GitHub.
Thanks
You need to understand that machine learning doesn't work that way: intensive training is carried out before your model can give good results.
With a single image of a person you just cannot predict that it's the same person, because you need to train your model on different images of the person under different light intensities, angles and many other varying scenarios.
Still, I would suggest trying this link:
http://hanzratech.in/2015/02/03/face-recognition-using-opencv.html
You may at least find some match for the image.
So what I am interested in knowing is: Has anyone tried anything of the sort?
Yes. This is 2017 and facial recognition has been researched for decades.
What are some papers/methods/libraries that I should look into?
Anything Google throws at you when searching for "single image/sample face recognition".
Do you have any suggestions on how to tackle problem?
See above
Extracting features such as SIFT and MSER generate accuracy of between 73-76%.
I doubt humans, whose facial recognition ability is unmatched, would perform much better with only 1 image as a reference. I mean, I couldn't tell for sure if that's Brad Pitt or just a look-alike, and I have seen him in hundreds of pictures and hours of movies...
I'm trying to implement a face recognition algorithm using Python. I want to be able to receive a directory of images and compute pair-wise distances between them, where short distances should hopefully correspond to images belonging to the same person. The ultimate goal is to cluster images and perform some basic face identification tasks (unsupervised learning).
Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some int d) and then figure out a metric in which two faces belonging to the same person will indeed have a short distance between them.
I have a face detection algorithm which detects the face, crops the image and performs some basic pre-processing, so the images I'm feeding to the algorithm are grayscale and equalized (see below).
For the "face signature" part, I've tried two approaches which I read about in several publications:
Taking the histogram of the LBP (Local Binary Pattern) of the entire (processed) image
Calculating SIFT descriptors at 7 facial landmark points (right of mouth, left of mouth, etc.), which I identify per image using an external application. The signature is the concatenation of the square root of the descriptors (this results in a much higher dimension, but for now performance is not a problem).
For the comparison of two signatures, I'm using OpenCV's compareHist function (see here), trying out several different distance metrics (chi-square, Euclidean, etc.).
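For reference, this is roughly how an LBP-histogram signature and the chi-square comparison could be put together (scikit-image for the LBP, OpenCV for compareHist; the P/R parameters are arbitrary here):

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_signature(gray_face, P=8, R=1):
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    n_bins = P + 2                                  # number of "uniform" pattern bins
    hist, _ = np.histogram(lbp.ravel(), bins=n_bins, range=(0, n_bins))
    hist = hist.astype(np.float32)
    return hist / (hist.sum() + 1e-7)               # normalised histogram

def signature_distance(sig_a, sig_b):
    # Chi-square distance between two signatures (lower = more similar).
    return cv2.compareHist(sig_a, sig_b, cv2.HISTCMP_CHISQR)
```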
I know that face recognition is a hard task, let alone without any training, so I'm not expecting great results. But all I'm getting so far seems completely random. For example, when calculating distances from the image on the far right against the rest of the images, I get that she is most similar to 4 Bill Clintons (...!).
I have read in this great presentation that it's popular to carry out a "metric learning" procedure on a test set, which should significantly improve results. However, it does say in the presentation and elsewhere that "regular" distance measures should also give OK results, so before I try this out I want to understand why what I'm doing gets me nothing.
In conclusion, my questions, which I'd love to get any sort of help on:
One improvement I thought of would be to perform LBP only on the actual face, and not on the corners and everything else that might introduce noise into the signature. How can I mask out the parts which are not the face before calculating the LBP? I'm using OpenCV for this part too.
I'm fairly new to computer vision; how would I go about "debugging" my algorithm to figure out where things go wrong? Is this possible?
In the unsupervised setting, is there any other approach (which is not local descriptors + computing distances) that could work, for the task of clustering faces?
Is there anything else in the OpenCV module that maybe I haven't thought of that might be helpful? It seems like all the algorithms there require training and are not useful in my case - the algorithm needs to work on images which are completely new.
Thanks in advance.
What you are looking for is unsupervised feature extraction - take a bunch of unlabeled images and find the most important features describing these images.
The state-of-the-art methods for unsupervised feature extraction are all based on (convolutional) neural networks. Have a look at autoencoders (http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity) or Restricted Boltzmann Machines (RBMs).
You could also take an existing face model such as DeepFace (https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf), take only its feature layers and use the distance between these features to group similar faces together.
I'm afraid that OpenCV is not well suited for this task; you might want to check out Caffe, Theano, TensorFlow or Keras.
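As an illustration of the "take only feature layers" idea, here is a sketch with a pretrained Keras model (VGG16 is used purely as a stand-in, since the DeepFace weights are not publicly packaged):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing import image

# Pretrained network without the classification head; global average pooling
# turns each image into a single feature vector.
model = VGG16(weights="imagenet", include_top=False, pooling="avg")

def embed(path):
    img = image.load_img(path, target_size=(224, 224))
    x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
    return model.predict(x)[0]

def distance(path_a, path_b):
    # Smaller distance = more similar faces; threshold or cluster on this.
    return np.linalg.norm(embed(path_a) - embed(path_b))
```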
I have to make an application which recognizes road signs. I saw that in the OpenCV folder there are some XML files for facial recognition, but I do not know what the numbers in the XML represent or how those values were obtained. I need to understand this so that I can create my own XML files for road sign recognition.
I do not know much about OpenCV; anyhow, I have completed my final-year project on face recognition using neural networks. Basically, I used an algorithm to extract the facial portion from a given image. Thereafter I fed that new image (containing only the face) to a neural network that I developed using Matlab. After rigorous improvements it was a success, and by using the Simulation feature of Matlab it was possible to precisely identify the individual.
Therefore I strongly recommend that you follow the same technique in carrying out this task.
I managed to find some interesting articles related to this topic: here, here, here and here.
What you need is two steps:
detection step
recognition step
For the detection, I suggest you use the cascade classifier that is included with OpenCV. It's robust and quicker than the Haar trainer. In this step you train a classifier on the traffic signs to be detected. I found this tutorial that may help you prepare your training material.
In this step you detect your signs. It may also detect some additional false objects in the image; you can eliminate these undesired objects with some post-processing based on aspect ratio or color, or even by adding some negative images.
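A rough sketch of the detection step plus a simple aspect-ratio filter (the cascade file is the one you would train yourself, and the ratio bounds are placeholders to be tuned per sign shape):

```python
import cv2

sign_cascade = cv2.CascadeClassifier("trained_sign_cascade.xml")  # your trained cascade

def detect_signs(path):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    candidates = sign_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    signs = []
    for (x, y, w, h) in candidates:
        ratio = w / float(h)
        # Drop false positives whose shape is far from the expected sign shape.
        if 0.8 < ratio < 1.2:
            signs.append((x, y, w, h))
    return signs
```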
For the recognition, I suggest you use exactly the OpenCV tutorial dedicated to face recognition; here you don't need a lot of modification.