Different frameworks to do face matching - opencv

I am trying to match two faces and return whether they match or not as the result.
To do this, I did some research and found the face-compare package (https://pypi.org/project/face-compare/), which is based on FaceNet; it lets me do this and works very well. Now I want to compare the accuracy of this solution with other solutions in order to choose the best one. Does anyone have ideas for other solutions (open source or commercial) that could help me with this benchmark?

The FaceNet work should be a good start. The network does good feature matching for facial data. Even though the face-compare library uses the same model, it would be worth fine-tuning the FaceNet model on another dataset and evaluating it against the output from face-compare.
Apart from that, different variants of the siamese architecture can be tried for feature matching. If you want to compare the matching quality, try computing the triplet loss value for a set of images.
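For the benchmark itself, here is a minimal sketch of how an embedding-based match score and the triplet loss can be computed, assuming you already have embeddings from FaceNet or any other model (the random vectors below are stand-ins):

import numpy as np

def match_score(emb_a, emb_b):
    # Euclidean distance between L2-normalised embeddings; smaller = more similar
    a = emb_a / np.linalg.norm(emb_a)
    b = emb_b / np.linalg.norm(emb_b)
    return np.linalg.norm(a - b)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # standard triplet loss: pull the anchor/positive pair together,
    # push the anchor/negative pair apart by at least `margin`
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

# stand-ins for real embeddings produced by the model under test
emb_a, emb_p, emb_n = (np.random.rand(128) for _ in range(3))
print(match_score(emb_a, emb_p))   # compare against a threshold tuned on validation data
print(triplet_loss(emb_a, emb_p, emb_n))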

Related

Are there any ways to build an ML model using CBIR and SIFT for image comparison in my case?

I have this project I'm working on. A part of the project involves multiple test runs during which screenshots of an application window are taken. Now, we have to ensure that screenshots taken between consecutive runs match (barring some allowable changes). These changes could be things like filenames, dates, different logos, etc. within the application window that we're taking a screenshot of.
I had the bright idea to automate the process of doing this checking. Essentially my idea was this. If I could somehow mathematically quantify the difference between a screenshot from the (N-1)th run and the Nth run, I could create a binary labelled dataset that mapped feature vectors of some sort to a label (0 for pass, or 1 for fail if the images do not adequately match up). The reason for all of this was so that my labelled data would help the model understand what scale of changes is acceptable, because there are so many kinds that are acceptable.
Now let's say I have access to lots of data that I have meticulously labelled, in the thousands. So far I have tried using SIFT in OpenCV with keypoint matching to determine a similarity score between images. But this isn't an intelligent, learning process. Is there some way I could take some information from SIFT and use it as my x-value in my dataset?
Here are my questions:
What would be the information I need as my x-value? It needs to be something that represents the difference between two images. So maybe the difference between feature vectors from SIFT? What do I do when those vectors have slightly different dimensions?
Am I on the right track in thinking about using SIFT? Should I look elsewhere, and if so, where?
Thanks for your time!
The approach that is being suggested in the question goes like this -
Find SIFT features of two consecutive images.
Use those to somehow quantify the similarity between two images (sounds reasonable)
Use this metric to first classify the images into similar and non-similar.
Use this dataset to train an NN to do the same job.
I am not completely convinced that this is a good approach. Let's say that you created the initial classifier with SIFT features. You are then using this data to train an NN. But this data will definitely have a lot of wrong labels, because if it didn't have a lot of wrong labels, what would stop you from using your original SIFT-based classifier as your final solution?
So if your SIFT-based classification is good, why even train an NN? On the other hand, if it's bad, you are giving a lot of wrongly labelled data to the NN for training. I think the latter is probably a bad idea. I say probably because there is a possibility that the wrong labels just encourage the NN to generalize better, but that would require a lot of data, I imagine.
Another way to look at this is, let's say that your initial classifier is 90% accurate. That's probably the upper limit of the performance for the NN that you are looking at when talking about training it with this data.
You said that the issue you have with your first approach is that 'it's not an intelligent, learning process'. I think it's wrong to assume that the former approach is always inferior to the latter. SIFT is a powerful tool that can solve a lot of problems without all the 'black-boxness' of an NN. If this problem can be solved with sufficient accuracy using SIFT, going after a learning-based approach is not the way to go, because, again, a learning-based approach isn't necessarily superior.
However, if the SIFT approach isn't giving you good enough results, definitely start thinking of NN stuff, but at that point, using the "bad" method to label the data is probably a bad idea.
Also, in relation to this, I think you could be underestimating the amount of data that is needed. You mentioned data in the thousands, but honestly, that's not a lot. You would need a lot more, I think.
One way I would think about doing this instead -
Do SIFT keypoint detection on a sample reference image.
Manually filter out keypoints that do not belong to the things in the image that are invariant. That is, keep only keypoints at locations in the image that are guaranteed (or very likely) to always be present.
When you get a new image, compute the keypoints and do matching with the reference image.
Set some threshold of the ratio of good matches to the total number of matches.
Depending on your application, this might give you good enough results.
If not, and if you really want your solution to be NN based, I would say you need to manually label the dataset as opposed to using SIFT.
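To illustrate the reference-image idea, here is a rough sketch with OpenCV's SIFT, Lowe's ratio test, and a good-match-ratio threshold (the file names and the 0.5 cut-off are placeholders to tune for your screenshots):

import cv2

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)   # image with vetted, invariant regions
new = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)  # screenshot from the new run

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_new, des_new = sift.detectAndCompute(new, None)

# kNN matching plus Lowe's ratio test to keep only unambiguous matches
matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des_ref, des_new, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# classify by the fraction of reference keypoints that found a good match
ratio = len(good) / max(len(kp_ref), 1)
print("PASS" if ratio > 0.5 else "FAIL", ratio)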

How to give a logical reason for choosing a model

I used machine learning to classify depression-related sentences, and it was LinearSVC that performed best. In addition to LinearSVC, I experimented with MultinomialNB and LogisticRegression, and I chose the model with the highest accuracy among the three. What I want is to be able to reason in advance about which model will fit, like the ml_map provided by scikit-learn. Where can I get this information? I searched a few papers, but couldn't find anything more detailed than the statement that SVMs are suitable for text classification. How do I study to get prior knowledge like this ml_map?
Try working with different example datasets of different data types, using different algorithms. There are hundreds to be explored. Once you get a good grasp of how they work, it will become clearer. And do not forget to try googling something like 'advantages of algorithm X'; it helps a lot.
And here are my thoughts; I think I used to ask such questions before, and I hope this helps if you are struggling: the more you work with different machine learning models on a specific problem, the sooner you will realize that data and feature engineering play a more important part than the algorithms themselves. The road map provided by scikit-learn gives you a good view of which group of algorithms to use for certain types of data, and that is a good start. The boundaries between them, however, are rather subtle. In other words, one problem can be solved by different approaches depending on how you organize and engineer your data.
To sum up, in order to achieve good out-of-sample (i.e., good generalization) performance while solving a problem, it is essential to look at the training/testing process with different setting combinations and to be mindful of your data (for example, answer this question: does it cover most samples of the distribution in the wild, or just a portion of it?).
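To make that concrete for the text-classification case in the question, here is a minimal sketch of benchmarking the three mentioned models under the same cross-validated setup (the toy corpus is a placeholder for your own labelled sentences):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# placeholder data; substitute your own labelled sentences
texts = ["i feel hopeless", "what a great day", "nothing matters anymore", "life is good"] * 25
labels = [1, 0, 1, 0] * 25

for model in (LinearSVC(), MultinomialNB(), LogisticRegression(max_iter=1000)):
    pipe = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipe, texts, labels, cv=5)
    print(type(model).__name__, scores.mean())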

Recognize "generic" objects

I'm working on a project for visually impaired people that converts the visual world to audio.
We prefer to create a prototype that doesn't need an internet connection. So we chose to work with OpenCV. After reading (a lot of) tutorials and documentation we were able to train OpenCV in recognizing specific objects.
For example: we trained OpenCV to recognize a certain chair and a door. That works fine.
But, we also tried to train OpenCV on a "generic" level. It should be possible to recognize (almost) all chairs. We did that by training OpenCV with a lot of positive and negative images as explained here: http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html
The actual result wasn't what we expected (it could not recognize any chair). I know there are a lot of different parameters to take into account (maybe we did something wrong with those) and we experimented a lot. But our time (and unfortunately our knowledge of OpenCV) is limited.
We are looking for some advice on how to train OpenCV to recognize generic objects.
Where do we start?
Is OpenCV even suited to do that?
Thank you for your time!
OpenCV is the library to use. But object recognition is tricky. Often when people say they are doing "object recognition" they are not; they are processing one image, or at best a series of related images, to separate it into object and background.
To recognise a "chair" - everything from an armchair to a dining chair to a throne - would be almost impossible. I'd want at least stereo images to have a chance of detecting flat surfaces. I don't doubt that with a lot of work you can get quite a good result, maybe just recognising dining-style chairs, but it's skilled work; it's not just a case of feeding a few parameters to a hierarchical classifier.
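For reference, running an already-trained cascade in OpenCV looks roughly like this (chair_cascade.xml stands in for whatever file opencv_traincascade produced; scaleFactor and minNeighbors usually need per-object tuning):

import cv2

cascade = cv2.CascadeClassifier("chair_cascade.xml")  # placeholder trained cascade
img = cv2.imread("scene.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# these two parameters strongly affect the false-positive rate
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in objects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)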

feature matching/detection on brain images

This question is for those who have tried feature detection/matching methods on brain images - it is a broad one, and perhaps a bad one:
How could you tell if the method you used was "good enough?"
What does a successful matching/detection test look like for your data?
EDIT:
As of now, I am not trying to detect any distinct features in particular.
I'm using OpenCV's ORB, SIFT, SURF, etc. detection methods and seeing what features they identify.
Sometimes, however, the orientation of the brain changes entirely from one set of images to the next, so if I compare two images from these sets, the detection methods won't yield any effective results (i.e. the matching will be distinctly, completely off). But if I compare images that look similar, but not identical, the detection seems to work all right. The point is, detection seems to work for frames that were taken around the same time, but not over a long interval. I wonder if others have come across this, and whether they have found that detection methods are still useful despite it.
First of all, you should specify what kind of features, and for which purpose, the experiment is going to be performed.
Feature extraction is highly subjective in nature; it all depends on what type of problem you are trying to handle. There is no generic feature extraction scheme that works in all cases.
For example, if the features point to some tumor classification or lesion, then of course there are different software packages you can use to extract and define your features.
There are different methods to detect the relevant features, depending on the application:
SURF (Speeded-Up Robust Features)
PLOFS: a fast wrapper approach with subset evaluation
ICA or PCA
This paper is a very good review of brain MRI feature extraction for tissue classification:
https://pdfs.semanticscholar.org/fabf/a96897dcb59ad9f04b5ff92bd15e1bd159ef.pdf
And I found this paper very good for understanding the differences between feature extraction techniques:
https://www.sciencedirect.com/science/article/pii/S1877050918301297
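On the "good enough" question specifically: one common quantitative proxy is the inlier ratio after fitting a geometric transform to the matches with RANSAC, which also tolerates the orientation changes mentioned in the question. A rough OpenCV sketch (file names and thresholds are placeholders):

import cv2
import numpy as np

img1 = cv2.imread("slice_a.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("slice_b.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance for ORB's binary descriptors, plus a ratio test
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC separates geometrically consistent matches (inliers) from noise
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if mask is not None:
        print("inlier ratio:", mask.sum() / len(good))  # higher = more trustworthy matching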

Face recognition with a small number of samples

Can anyone advise me on a way to build an effective face classifier that can classify many different faces (~1000)?
I have only 1-5 examples of each face.
I know about the OpenCV face classifier, but it works badly for my task (many classes, few samples).
It works all right for classifying one face with a small number of samples, but I think that 1,000 separate classifiers is not a good idea.
I read a few articles about face recognition, but the methods in these articles require a lot of samples of each class to work.
PS Sorry for my writing mistakes. English is not my native language.
Actually, to give you a proper answer, I'd be happy to know some details of your task and your data. Face recognition is a non-trivial problem and there is no general solution for all sorts of image acquisition.
First of all, you should define how many sources of variation (posing, emotions, illumination, occlusions or time-lapse) you have in your sample and testing sets. Then you should choose an appropriate algorithm and, very importantly, preprocessing steps according to the types.
If you don't have any significant variations, then for a small training set it is a good idea to consider one of the Discrete Orthogonal Moments as a feature extraction method. They have a very strong ability to extract features without redundancy. Some of them (Hahn, Racah moments) can also work in two modes - local and global feature extraction. The topic is relatively new, and there are still few articles about it, although they are thought to become a very powerful tool in image recognition. They can be computed in near real-time by using recurrence relationships. For more information, have a look here and here.
If the pose of the individuals varies significantly, you may first try to perform pose correction with an Active Appearance Model.
If there are lots of occlusions (glasses, hats) then using one of the local feature extractors may help.
If there is a significant time lapse between the training and probe images, the local features of the faces may change with age, so it's a good option to try one of the algorithms that use graphs for face representation, in order to keep the face topology.
I believe that none of the above are implemented in OpenCV, but for some of them you can find MATLAB implementations.
I'm not a native speaker either, so sorry for the grammar.
Coming to your problem: it is quite unique. As you said, there are only a few images per class, so the model we train should either have an excellent architecture that can extract strong features from an image by itself, or there should be a different approach that can achieve this task.
I have four things which I can share as of now :
Do data pre-processing, then create a bigger dataset and ideally train a neural network on it. Here, we can do pre-processing like:
- image rotation
- image shearing
- image scaling
- image blurring
- image stretching
- image translation
and create at least 200 images per class. Please check out the OpenCV documentation, which provides many more methods for increasing the size of your dataset.
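A small sketch of a few of those augmentations with plain OpenCV (the angle, shift, and kernel values are arbitrary examples):

import cv2
import numpy as np

img = cv2.imread("face.jpg")  # placeholder input image
h, w = img.shape[:2]

# rotation (and, via the scale argument, scaling) around the image centre
M_rot = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.1)
rotated = cv2.warpAffine(img, M_rot, (w, h))

# translation by (tx, ty) = (10, 20) pixels
M_shift = np.float32([[1, 0, 10], [0, 1, 20]])
shifted = cv2.warpAffine(img, M_shift, (w, h))

# blurring
blurred = cv2.GaussianBlur(img, (5, 5), 0)

for name, out in [("rot", rotated), ("shift", shifted), ("blur", blurred)]:
    cv2.imwrite("face_" + name + ".jpg", out)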
Once you have enough data, we can apply transfer learning, which is a better approach than training a neural network from scratch. Transfer learning is a method where we train a network on our own custom classes, starting from a network already pre-trained on thousands of classes. Since our data here is very limited, I would prefer transfer learning. I have written a blog on how you can approach this using transfer learning once you have the required amount of data; it is linked here. Face recognition is itself a classification task, where each person is a separate class. So follow the instructions given in the blog; maybe it will help you create your own powerful classifier.
Another suggestion: after creating the dataset, encode the images properly. This encoding helps preserve the features in an image and can help you train better networks. VLAD, Fisher vectors, and Bag of Words are a few encoding techniques. You can find repositories online that have already implemented these on the ORL database. Once you encode the images, train the network on the encodings; you will see a better performance.
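For the Bag of Words idea, here is a condensed sketch with ORB descriptors and k-means (the vocabulary size and image list are placeholders; VLAD and Fisher encodings follow the same extract-cluster-encode pattern):

import cv2
import numpy as np
from sklearn.cluster import KMeans

orb = cv2.ORB_create()

def descriptors(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, des = orb.detectAndCompute(img, None)
    return des

paths = ["face1.jpg", "face2.jpg"]  # placeholder training images
all_des = np.vstack([descriptors(p) for p in paths]).astype(np.float32)

# visual vocabulary: cluster local descriptors into k "words"
k = 64
kmeans = KMeans(n_clusters=k, n_init=10).fit(all_des)

def bow_encode(path):
    # histogram over visual words = one fixed-length vector per image
    words = kmeans.predict(descriptors(path).astype(np.float32))
    hist = np.bincount(words, minlength=k).astype(np.float32)
    return hist / max(hist.sum(), 1.0)

print(bow_encode("face1.jpg"))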
Also check out the Siamese network (here), which I feel is meant for exactly this purpose. It compares two images with similar characteristics through twin networks and thereby achieves better classification accuracy. The Git repository is here.
Another standard approach would be to use an SVM or random forests, since the data is limited. If you still prefer neural networks, the above methods will serve the purpose. If you intend to go with encodings, I would suggest random forests, as they learn well from little data and are flexible too.
Hopefully this answer helps you proceed in the right direction.
You might want to take a look at OpenFace, a Python and Torch implementation of face recognition with deep neural networks: https://cmusatyalab.github.io/openface/
