The problem statement:
Given two images, such as two photos of Brad Pitt, figure out whether they contain the same person. The difficulty is that we have only one reference image for each person and want to determine whether any other incoming image contains the same person.
Some research:
There are a few different methods for solving this task:
Using color histograms
Keypoint oriented methods
Using deep convolutional neural networks or other ML techniques
Histogram methods compute color histograms, define some sort of distance metric between them, and then decide on a threshold. One metric I have tried is the Earth Mover's Distance; however, this approach lacks accuracy.
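A minimal sketch of the histogram approach, assuming 8-bit grayscale pixels given as a flat list and equally spaced bins. In one dimension the Earth Mover's Distance reduces to the L1 distance between the cumulative histograms, which is what emd_1d computes; OpenCV's cv2.EMD handles the general multi-dimensional case. The function names, the bin count, and the threshold in is_same are illustrative assumptions, not values from the question.

```python
def gray_histogram(pixels, bins=16):
    """Normalized histogram of 8-bit gray values (0..255) using `bins` equal-width bins."""
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def emd_1d(h1, h2):
    """Earth Mover's Distance between two normalized 1-D histograms.

    With equally spaced bins and |i - j| as the ground distance, the EMD
    equals the L1 distance between the two cumulative histograms.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    distance, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b          # surplus mass pushed on to the next bin
        distance += abs(carried)  # moving it one bin costs |surplus|
    return distance


def is_same(h1, h2, threshold=1.0):
    """Hypothetical decision rule: match when the EMD is below a tuned threshold."""
    return emd_1d(h1, h2) < threshold
```

For example, moving all the mass from the first of three bins to the last costs 2 bin-steps, so emd_1d([1, 0, 0], [0, 0, 1]) is 2.0.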
The best approach should therefore be some mix of the second and third methods, plus some preprocessing.
The obvious preprocessing steps are:
Run a face detector such as Viola-Jones and crop out the regions containing faces
Convert those faces to grayscale
Run eye, mouth, and nose detectors, perhaps using OpenCV's Haar cascades
Align the face images according to the detected landmarks
All of this is done with OpenCV.
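The alignment step can be sketched as follows, assuming the eye centers have already been found (for example with a Haar cascade, as above). The function builds a 2x3 affine matrix that levels the eyes and normalizes their distance; converted to a NumPy float array, a matrix in this layout is what cv2.warpAffine expects. The helper names and the choice of the eye midpoint as rotation center are assumptions of this sketch.

```python
import math


def eye_alignment_matrix(left_eye, right_eye, desired_eye_distance):
    """2x3 affine matrix that rotates and scales a face about the midpoint
    between the eyes so that the eyes end up level and `desired_eye_distance`
    pixels apart. Points are (x, y) in image coordinates (y pointing down)."""
    (lx, ly), (rx, ry) = left_eye, right_eye
    angle = math.atan2(ry - ly, rx - lx)                  # tilt of the eye line
    scale = desired_eye_distance / math.hypot(rx - lx, ry - ly)
    mx, my = (lx + rx) / 2.0, (ly + ry) / 2.0            # rotation center
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # p' = scale * R(-angle) * (p - m) + m, written as [A | t]
    return [[a, b, mx - a * mx - b * my],
            [-b, a, my + b * mx - a * my]]


def apply_affine(m, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

For example, a face whose eyes sit at (30, 40) and (70, 60) is mapped so that both eyes lie on y = 50, exactly 40 pixels apart when desired_eye_distance is 40.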
Extracting features such as SIFT and MSER yields an accuracy of 73-76%. After some additional research I came across a paper using Fisherfaces. OpenCV can now create and train Fisherface recognizers, which is great: it works very well and achieves the accuracy reported in the paper on the Yale datasets.
The complication is that in my case I don't have a database with several images of each person to train the recognizer on. All I have is a single image of a single person, and given another image I want to determine whether it shows the same person.
So what I would like to know is:
Has anyone tried anything of the sort? What papers, methods, or libraries should I look into?
Do you have any suggestions on how to tackle the problem?

Since you have only one image, you could try this method using dlib. I have used 3-4 images per person and it gives good results.
Detect the face (sample_face).
Get the face descriptor (a 128-D vector) using dlib's compute_face_descriptor.
Get the new picture in which you want to recognize the face.
Detect the face and compute its descriptor (let's call it test_face).
Compute the Euclidean distance between the test_face descriptor and every sample_face descriptor.
Assign test_face the class (person name) with the smallest Euclidean distance.
Give this a try; once you start getting good results, you can experiment with face alignment.
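The nearest-neighbor steps above can be sketched in plain Python. The descriptors are assumed to be precomputed (in practice, the 128-D vectors returned by dlib's compute_face_descriptor); short toy vectors stand in for them here. The distance threshold is an addition to the answer's procedure so that an unseen person can be rejected instead of being assigned the closest class; 0.6 is the value commonly quoted for dlib's face recognition model, but treat it as a starting point to tune on your own data.

```python
import math


def identify(test_descriptor, gallery, threshold=0.6):
    """Nearest-neighbor identification on face descriptors.

    gallery maps a person's name to a list of descriptors (e.g. the 128-D
    vectors from dlib's compute_face_descriptor). Returns (name, distance),
    or (None, distance) when even the closest sample is farther than
    `threshold`, i.e. the face is treated as unknown.
    """
    best_name, best_dist = None, math.inf
    for name, descriptors in gallery.items():
        for d in descriptors:
            dist = math.dist(test_descriptor, d)   # Euclidean distance
            if dist < best_dist:
                best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist
```

With a single reference image per person, each gallery list simply holds one descriptor.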

This is one of the hot topics in computer vision. As you have written, many kinds of solutions are available.
But I suggest looking at OpenFace, which has very high accuracy. There is an implementation of that project on GitHub.

You need to understand that machine learning doesn't work that way: intensive training has to be carried out before a model can give good results.
With a single image of a person you simply cannot predict that it's the same person, because you need to train your model on different images of that person under different lighting, angles, and many other varying conditions.
Still, I would suggest trying this link:
You may at least find some matches for the image.

Yes. This is 2017 and facial recognition has been researched for decades.
Extracting features such as SIFT and MSER generate accuracy of between 73-76%.
I doubt that humans, whose facial recognition is unmatched, perform much better with only one reference image. I couldn't tell for sure whether that's Brad Pitt or just a look-alike, and I have seen him in hundreds of pictures and hours of movies...


Implementing Face Recognition using Local Descriptors (Unsupervised Learning)

I'm trying to implement a face recognition algorithm using Python. I want to be able to take a directory of images and compute pairwise distances between them, where short distances should hopefully correspond to images of the same person. The ultimate goal is to cluster images and perform some basic face identification tasks (unsupervised learning).
Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some integer d) and then find a metric in which two faces belonging to the same person will indeed have a short distance between them.
I have a face detection algorithm that detects the face, crops the image, and performs some basic preprocessing, so the images I'm feeding to the algorithm are grayscale and histogram-equalized.
For the "face signature" part, I've tried two approaches which I read about in several publications:
Taking the histogram of the LBP (Local Binary Pattern) of the entire (processed) image
Calculating SIFT descriptors at 7 facial landmark points (right of mouth, left of mouth, etc.), which I identify per image using an external application. The signature is the concatenation of the square root of the descriptors (this results in a much higher dimension, but for now performance is not a problem).
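A minimal sketch of the LBP signature, assuming the face is a 2-D list of gray values. It uses the basic 3x3 operator (8 neighbors, 256 codes) and an optional boolean mask, so the corners outside an inscribed ellipse can be excluded as the question later asks; with OpenCV, cv2.calcHist accepts a mask argument for the same purpose. Many LBP face descriptors also split the face into a grid and concatenate per-cell histograms, which this sketch omits. Function names are illustrative.

```python
def lbp_codes(img):
    """Basic 8-neighbor LBP code for every interior pixel of a 2-D gray image (list of rows)."""
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = {}
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            center = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(offsets):
                if img[y + dy][x + dx] >= center:   # neighbor at least as bright: set bit
                    code |= 1 << bit
            codes[(y, x)] = code
    return codes


def ellipse_mask(height, width):
    """True inside the ellipse inscribed in the image, False in the corners."""
    cy, cx = (height - 1) / 2, (width - 1) / 2
    ry, rx = height / 2, width / 2
    return [[((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0 for x in range(width)]
            for y in range(height)]


def lbp_histogram(img, mask=None):
    """Normalized 256-bin LBP histogram, counting only pixels where mask is True."""
    hist = [0] * 256
    for (y, x), code in lbp_codes(img).items():
        if mask is None or mask[y][x]:
            hist[code] += 1
    total = sum(hist)
    return [v / total for v in hist] if total else [0.0] * 256
```

A perfectly flat patch gives code 255 everywhere (every neighbor is at least as bright as the center), while an isolated bright pixel gives code 0.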
For the comparison of two signatures, I'm using OpenCV's compareHist function, trying out several different distance metrics (Chi-Square, Euclidean, etc.).
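For reference, a symmetric chi-square distance can be sketched as below. Note that OpenCV's HISTCMP_CHISQR divides by the first histogram only, so its result depends on argument order; the symmetric form used here divides by the sum of both bins. The helper names and the eps value are assumptions of this sketch.

```python
def chi_square(h1, h2, eps=1e-10):
    """Symmetric chi-square distance between two histograms: sum (a - b)^2 / (a + b).
    eps avoids division by zero for bins that are empty in both histograms."""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))


def nearest(query, signatures):
    """Index of the signature closest to `query` under chi-square distance."""
    return min(range(len(signatures)), key=lambda i: chi_square(query, signatures[i]))
```

A simple sanity check when debugging is to confirm that a second photo of the same person comes out nearer than photos of other people for at least a few hand-picked cases.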
I know that face recognition is a hard task, let alone without any training, so I'm not expecting great results. But everything I'm getting so far seems completely random. For example, when calculating distances from the image on the far right to the rest of the images, I find that she is most similar to four images of Bill Clinton (...!).
I have read in this great presentation that it's popular to carry out a "metric learning" procedure on a training set, which should significantly improve results. However, the presentation and other sources also say that "regular" distance measures should get OK results, so before I try this out I want to understand why my approach gets me nothing.
In conclusion, my questions, which I'd love to get any sort of help on:
One improvement I thought of would be to compute LBP only on the actual face, and not on the corners and everything else that might introduce noise into the signature. How can I mask out the parts that are not the face before calculating LBP? I'm using OpenCV for this part too.
I'm fairly new to computer vision. How would I go about "debugging" my algorithm to figure out where things go wrong? Is this possible?
In the unsupervised setting, is there any other approach (which is not local descriptors + computing distances) that could work, for the task of clustering faces?
Is there anything else in the OpenCV module that maybe I haven't thought of that might be helpful? It seems like all the algorithms there require training and are not useful in my case - the algorithm needs to work on images which are completely new.
Thanks in advance.
What you are looking for is unsupervised feature extraction - take a bunch of unlabeled images and find the most important features describing these images.
The state-of-the-art methods for unsupervised feature extraction are all based on (convolutional) neural networks. Have a look at autoencoders or Restricted Boltzmann Machines (RBMs).
You could also take an existing face recognition network such as DeepFace, keep only its feature layers, and use the distances between those features to group similar faces together.
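Grouping faces by signature distance can be sketched with single-linkage clustering: any two faces closer than a threshold are put in the same group, using a union-find structure. The signatures are assumed to be precomputed feature vectors (from LBP, SIFT, or a network's feature layer), and the threshold is a hypothetical value that would need tuning. Single linkage can chain distinct people together through intermediate faces, so it is a starting point rather than a robust clustering method.

```python
import math
from itertools import combinations


def cluster_by_threshold(signatures, threshold):
    """Single-linkage grouping: faces closer than `threshold` end up in one cluster.
    Returns a list of clusters, each a list of indices into `signatures`."""
    parent = list(range(len(signatures)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(signatures)), 2):
        if math.dist(signatures[i], signatures[j]) <= threshold:
            parent[find(i)] = find(j)       # merge the two groups

    groups = {}
    for i in range(len(signatures)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```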
I'm afraid that OpenCV is not well suited for this task, you might want to check Caffe, Theano, TensorFlow or Keras.

Template matching, object recognition and feature matching, or what is the solution?

Problem: I have a photo of an object (a manufactured part, like the attached photo). Using my Android phone camera, I want to verify whether the object in the camera preview matches the template, in other words, whether it is the same part as the template.
I can have the user move the camera so that the camera preview shows a view similar to the template; however, there will be different noise levels and/or lighting, and maybe a different background.
Question: What do you recommend for solving this problem? I was thinking of Canny edge extraction and then matching the camera frames against the Canny edges extracted from the template. Is this a good idea? If yes, could you please tell me how to implement it? Any resources or samples? (I can do the Canny edge extraction but couldn't find a way to do the matching.)
If not, what do you recommend?
Things I have tried:
Feature extraction and matching: I used a few different extractor and matcher implementations from OpenCV, and my app works and draws the detected feature points, matches, etc. However, being a beginner in image processing, I cannot make sense of the results or tell what counts as a match. Any ideas, help, or good resources?
Template matching: I used OpenCV template matching, but the performance was horrible, and I decided that this cannot be the solution.
I tried object recognition with my phone on your test image and the results were positive.
Detector used: ORB (binary detector).
Descriptor used: ORB.
Matching technique: brute-force matching.
Image size: 640x480.
I was able to detect around 500 feature points. That number of keypoints is roughly sufficient, but it might produce false matches when you have more images of similar-looking objects, so you need to refine your matching to avoid false matches.
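One common way to decide "what is a match" is Lowe's ratio test followed by a minimum count of surviving matches. The sketch below implements it in plain Python on binary descriptors given as bytes (ORB descriptors are 32 bytes each); with OpenCV the equivalent starting point is cv2.BFMatcher(cv2.NORM_HAMMING) with knnMatch(des1, des2, k=2). The 0.75 ratio is a value often used in OpenCV tutorials (Lowe's SIFT paper used 0.8), and min_good is a hypothetical threshold. A further check, not shown, is to fit a homography to the surviving matches with cv2.findHomography and RANSAC and count the inliers.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two binary descriptors (e.g. 32-byte ORB)."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def ratio_test(query, train, ratio=0.75):
    """Lowe-style ratio test: keep a match only if the best train descriptor is
    clearly closer than the second best. Returns (query_idx, train_idx, distance)."""
    good = []
    for qi, q in enumerate(query):
        ranked = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            good.append((qi, ranked[0][1], ranked[0][0]))
    return good


def looks_like_template(query, train, min_good=15):
    """Hypothetical decision rule: enough unambiguous matches means same part."""
    return len(ratio_test(query, train)) >= min_good
```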
Result of object recognition on two different scales.
Regarding your difficulty in understanding object recognition: what exactly did you not understand (which specific topic)?
I recommend going through these two books:
Learning OpenCV by Adrian Kaehler and Gary Bradski
OpenCV 2 Computer Vision Application Programming Cookbook by Robert Laganière (chapters 8 and 9).
From what I understand, Canny edge detection might not be an optimal solution. In my view, after some basic preprocessing of the test image, you should find its SIFT features and compare them with the SIFT features of the template. SIFT is very versatile and should work here too.
You can also try OpenSURF features; they are faster than SIFT, but I haven't had the opportunity to work with them enough to comment on their accuracy.

Face recognition with a small number of samples

Can anyone advise me on a way to build an effective face classifier that can classify many different faces (~1000)?
I have only 1-5 examples of each face.
I know about OpenCV's face classifier, but it works poorly for my task (many classes, few samples).
It works fine for classifying a single face with a small number of samples, but I think 1,000 separate classifiers is not a good idea.
I have read a few articles about face recognition, but the methods in them require a lot of samples of each class to work.
PS: Sorry for my writing mistakes. English is not my native language.
To give you a proper answer, I'd need to know some details of your task and your data. Face recognition is a non-trivial problem, and there is no general solution for all kinds of image acquisition.
First of all, you should determine how many sources of variation (pose, emotions, illumination, occlusions, or time lapse) you have in your training and testing sets. Then you should choose an appropriate algorithm and, very importantly, preprocessing steps according to those types of variation.
If you don't have any significant variations and your training set is small, it is a good idea to consider one of the Discrete Orthogonal Moments as a feature extraction method. They have a very strong ability to extract features without redundancy. Some of them (Hahn, Racah moments) can also work in two modes: local and global feature extraction. The topic is relatively new, and there are still few articles about it, although these moments are expected to become a very powerful tool in image recognition. They can be computed in near real-time by using recurrence relations. For more information, have a look here and here.
If the pose of the individuals varies significantly, you may first try pose correction with an Active Appearance Model.
If there are lots of occlusions (glasses, hats), using one of the local feature extractors may help.
If there is a significant time lapse between training and probe images, local facial features may change with age; in that case it's a good option to try one of the algorithms that use graphs for face representation, so as to preserve the face topology.
I believe that none of the above are implemented in OpenCV, but you can find MATLAB implementations for some of them.
I'm not a native speaker either, so sorry for the grammar.
Your problem is quite unusual in its own way. Since there are only a few images per class, the model we train should either have an excellent architecture that can extract better features from a single image, or we need a different approach to the task.
I have four suggestions to share for now:
Preprocess the data to create a bigger dataset, and then ideally train a neural network on it. Here, we can apply transformations like:
- image rotation
- image shearing
- image scaling
- image blurring
- image stretching
- image translation
and create at least 200 images per class. The OpenCV documentation describes many more methods you can use to increase the size of your dataset. Once you have done this, you can apply transfer learning, which is a better approach than training a neural network from scratch.
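The geometric augmentations in the list (rotation, shearing, scaling, translation) can all be expressed as one random affine transform per generated image. This sketch only produces the 2x3 matrices; each one could then be applied with cv2.warpAffine(img, np.float32(M), (width, height)). The parameter ranges and function names are illustrative assumptions, and blurring and non-uniform stretching are not covered.

```python
import math
import random


def random_affine(width, height, max_angle=15.0, scale_range=(0.9, 1.1),
                  max_shear=0.1, max_shift=0.05, rng=random):
    """Random 2x3 affine matrix: rotation, scaling and x-shear about the image
    center, followed by a translation. All ranges are illustrative defaults."""
    angle = math.radians(rng.uniform(-max_angle, max_angle))
    s = rng.uniform(*scale_range)
    shear = rng.uniform(-max_shear, max_shear)
    tx = rng.uniform(-max_shift, max_shift) * width
    ty = rng.uniform(-max_shift, max_shift) * height
    c, si = math.cos(angle), math.sin(angle)
    # Linear part A = s * R(angle) * [[1, shear], [0, 1]]
    a, b = s * c, s * (c * shear - si)
    d, e = s * si, s * (si * shear + c)
    cx, cy = width / 2.0, height / 2.0
    # p' = A (p - center) + center + (tx, ty)
    return [[a, b, cx + tx - (a * cx + b * cy)],
            [d, e, cy + ty - (d * cx + e * cy)]]


def augment_params(n, width, height, seed=0):
    """n random matrices, reproducible via a seeded generator."""
    rng = random.Random(seed)
    return [random_affine(width, height, rng=rng) for _ in range(n)]
```

Since rotation and shear both have determinant 1, the area change of each generated image is just the square of the sampled scale.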
Transfer learning is a method where we take a network that is already pre-trained on a large number of classes and train it further on our own custom classes. Since we have very little data here, I would prefer transfer learning. I have written a blog post on how to approach this with transfer learning once you have the required amount of data; it is linked here. Face recognition is itself a classification task, where each person is a separate class, so following the instructions in the blog post may help you create your own powerful classifier.
Another suggestion: after creating the dataset, encode the images properly. An encoding helps preserve the features in an image and can help you train better networks. VLAD, Fisher vectors, and Bag of Words are a few encoding techniques. You can find repositories online that have already implemented these on the ORL database. Once you have encoded the images, train the network on the encodings; you should see better performance.
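A minimal Bag of Words encoding can be sketched as follows: each local descriptor is assigned to its nearest codeword, and the image becomes a normalized histogram of codeword counts. The codebook is assumed to be given; it is usually learned beforehand, for example with k-means over descriptors from the training images. VLAD and Fisher vectors extend this idea by encoding residuals or higher-order statistics relative to the codewords rather than plain counts. The function name is illustrative.

```python
import math


def bow_encode(descriptors, codebook):
    """Hard-assign each local descriptor to its nearest codeword and return
    an L1-normalized histogram with one bin per codeword."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1
    n = len(descriptors)
    return [h / n for h in hist] if n else hist
```

Every image then has a fixed-length vector regardless of how many descriptors it produced, which is what a standard classifier needs.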
Also check out Siamese networks here, which I feel are meant for exactly this purpose. A Siamese network passes two images through twin branches that share the same weights and learns an embedding in which images of the same person end up close together, which helps achieve better classification accuracy with few samples. The Git repository is here.
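The idea behind a Siamese network can be illustrated with the contrastive loss of Hadsell, Chopra and LeCun (2006), one common training objective for such networks: embeddings of the same person are pulled together, while embeddings of different people are pushed apart until they are at least a margin away. The sketch computes the loss for a single pair of precomputed embeddings; the network producing them and the margin of 1.0 are assumptions. A real implementation would compute this inside a deep learning framework so that it can be backpropagated.

```python
import math


def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings.
    same=True:  penalize any distance (pull the pair together)
    same=False: penalize only distances below `margin` (push the pair apart)"""
    d = math.dist(emb_a, emb_b)
    if same:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

At test time only the embedding branch is used, and two faces are compared by their embedding distance, much like the dlib descriptor approach.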
Another standard approach would be to use SVMs or random forests, since there is little data. If you still prefer neural networks, the methods above should serve your purpose. If you intend to go with encodings, I would suggest random forests, as they learn well and are flexible too.
Hopefully this answer helps you proceed in the right direction.
You might want to take a look at OpenFace, a Python and Torch implementation of face recognition with deep neural networks:

Image classification/recognition open source library

I have a set of reference images (200) and a set of photos of those images (tens of thousands). I have to classify each photo in a semi-automated way. Which algorithm and open source library would you advise me to use for this task? The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
To give a little more context, the reference images are branded packages, and the photos are of the same packages, but with all kinds of noises: reflections from the flash, low light, imperfect perspective, etc. The photos are already (manually) segmented: only the package is visible.
Back in my days with image recognition (like 15 years ago) I would have probably tried to train a neural network with the reference images, but I wonder if now there are better ways to do this.
I recommend that you use Python, with the NumPy/SciPy libraries for your numerical work. Some helpful libraries for handling images are Mahotas and scikits.image (now scikit-image).
In addition, you will want to use scikits.learn (now scikit-learn), whose SVM classifiers are built on LIBSVM, a very standard SVM implementation.
The hard part is choosing your descriptor. The descriptor is the feature vector you compute from each image and use to compute a similarity distance to the set of reference images. Good things to try are Histograms of Oriented Gradients, SIFT features, and color histograms; play around with various ways of binning different parts of the image and concatenating those descriptors together.
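As a sketch of the "bin parts of the image and concatenate" idea, the first function below splits an RGB image (a 2-D list of (r, g, b) tuples) into a grid, computes a coarse color histogram per cell, and concatenates them; the second ranks reference images from most to least similar by L1 distance, which matches the ordered list the question asks for. The grid size, bin count, and distance are assumptions; HOG or SIFT descriptors would plug into the same ranking step.

```python
def grid_color_descriptor(img, grid=2, bins=4):
    """Concatenated per-cell RGB histograms (bins**3 bins per cell), each L1-normalized."""
    h, w = len(img), len(img[0])
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [img[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            hist = [0.0] * (bins ** 3)
            for r, g, b in cell:
                # quantize each 0..255 channel into `bins` levels
                idx = ((r * bins // 256) * bins + g * bins // 256) * bins + b * bins // 256
                hist[idx] += 1
            n = len(cell) or 1
            desc.extend(v / n for v in hist)
    return desc


def rank_references(photo, references):
    """Reference names sorted from most to least similar to `photo` (L1 distance)."""
    def l1(u, v):
        return sum(abs(p - q) for p, q in zip(u, v))
    return sorted(references, key=lambda name: l1(photo, references[name]))
```

The per-cell layout keeps some spatial information, so two packages with the same colors in different places still get different descriptors.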
Next, set aside some of your data for training. For these data, you have to manually label them according to the true reference image they belong to. You can feed these labels into built-in functions in scikits.learn and it can train a multiclass SVM to recognize your images.
After that, you may want to look at MPI4Py, an implementation of MPI in Python, to take advantage of multiprocessors when doing the large descriptor computation and classification of the tens of thousands of remaining images.
The task you describe is very difficult and solving it with high accuracy could easily lead to a research-level publication in the field of computer vision. I hope I've given you some starting points: searching any of the above concepts on Google will hit on useful research papers and more details about how to use the various libraries.
The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
One way people do this is with the so-called "Earth mover's distance". Briefly, one imagines each pixel in an image as a stack of rocks with height corresponding to the pixel value and defines the distance between two images as the minimal amount of work needed to transfer one arrangement of rocks into the other.
Algorithms for this are a current research topic. Here's some MATLAB code for one. It looks like they have a Java version as well. Here's a link to the original paper and C code:
Try RapidMiner (one of the most widely used data-mining platforms) with IMMI (Image Mining Extension, AGPL licence).
It currently implements several similarity measurement methods (not only trivial pixel-by-pixel comparison). The similarity measures can be used as input for a learning algorithm (e.g. neural network, KNN, SVM, ...), which can be trained to give better performance. Some information about the methods is given in this paper:
Nowadays, deep learning frameworks like Torch, TensorFlow, Theano, and Keras are among the best open-source tools for object classification and recognition tasks.

machine learning - svm feature fusion technique

For my final thesis I am trying to build a 3D face recognition system by combining color and depth information. The first step I did was to realign the data head to a given model head using the iterative closest point algorithm. For the detection step I was thinking about using libsvm, but I don't understand how to combine the depth and color information into one feature vector. They are dependent pieces of information (each point consists of color (RGB), depth, and scan quality). What do you suggest? Something like weighting?
Last night I read an article about SURF/SIFT features, and I would like to use them. Could it work? The concept would be: extract these features from the color image and the depth image (range image), and use each feature as a single feature vector for the SVM.
Concatenation is indeed a possibility. However, as you are working on 3d face recognition you should have some strategy as to how you go about it. Rotation and translation of faces will be hard to recognize using a "straightforward" approach.
You should decide whether you attempt to perform a detection of the face as a whole, or of sub-features. You could attempt to detect rotation by finding some core features (eyes, nose, etc).
Also, remember that SVMs are inherently binary (i.e. they separate two classes). Depending on your exact application, you will very likely have to employ some multi-class strategy (one-against-all or one-against-one).
I would recommend doing some literature research to see how others have attacked the problem (a google search will be a good start).
It sounds simple, but you can simply concatenate the two vectors into one. Many researchers do this.
What you arrived at is an important open problem. Yes, there are some ways to handle it, as mentioned here by Eamorr. For example you can concatenate and do PCA (or some non linear dimensionality reduction method). But it is kind of hard to defend the practicality of doing so, considering that PCA takes O(n^3) time in the number of features. This alone might be unreasonable for data in vision that may have thousands of features.
As mentioned by others, the easiest approach is to simply combine the two sets of features into one.
A linear SVM is characterized by the normal to the maximum-margin hyperplane, whose components specify the weights (importance) of the features: features with higher absolute weights have a larger impact on the decision function. Thus a linear SVM assigns a weight to each feature on its own.
For this to work, you need to normalize all the attributes to the same scale (say, transform all features into the range [-1,1] or [0,1]).
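A minimal sketch of the normalize-then-concatenate approach, assuming the color and depth features have already been extracted into per-sample lists. The minimum and maximum are fitted on the training samples, and the same ranges must then be reused for test samples; constant features are mapped to the middle of the target range. The function names are illustrative; LIBSVM ships an svm-scale tool for the same task.

```python
def fit_min_max(rows):
    """Per-feature (column) minimum and maximum over the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def scale_row(row, lo, hi, a=-1.0, b=1.0):
    """Linearly map each feature from [lo, hi] to [a, b]."""
    out = []
    for v, mn, mx in zip(row, lo, hi):
        if mx == mn:
            out.append((a + b) / 2.0)   # constant feature: put it in the middle
        else:
            out.append(a + (b - a) * (v - mn) / (mx - mn))
    return out


def fuse(color_rows, depth_rows, color_range=None, depth_range=None):
    """Scale color and depth features separately, then concatenate them.
    Pass the ranges fitted on training data when transforming test data."""
    color_range = color_range or fit_min_max(color_rows)
    depth_range = depth_range or fit_min_max(depth_rows)
    return [scale_row(c, *color_range) + scale_row(d, *depth_range)
            for c, d in zip(color_rows, depth_rows)]
```

After fusion, every feature lies in the same range, so the SVM's learned weights rather than the raw units decide how much color versus depth contributes.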
