I want to know the best method of face detection because I'm working on a face-emotion prediction application.
Before analyzing the facial expression of a face, whether static or moving, the face has to be detected or tracked to extract the relevant information. Several
detection methods exist, but which is best in my case?
A fast and easy way to get started with face detection is OpenCV's Haar detection methods (a slightly modified version of the Viola-Jones face detection algorithm, IIRC). It ships pre-trained Haar cascade classifiers for entire faces and for individual face components, e.g. eyes, nose, etc. You can also train your own if you feel so inclined. Haar features also have the advantage of being very fast, so they're quite usable with video (which it sounds like you'll be using). Also, having the individual face components classified may simplify your emotion detection/prediction algorithm.
You can find the OpenCV documentation detailing Haar feature-based object recognition at http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html#viola01
and an example of performing face detection at http://code.opencv.org/projects/opencv/repository/revisions/master/entry/samples/cpp/dbt_face_detection.cpp
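For a concrete starting point, here is a minimal sketch of Haar cascade face detection using OpenCV's Python bindings (the cv2.data.haarcascades path assumes the opencv-python package; adjust it for your installation):

    import cv2

    # Pre-trained frontal-face cascade shipped with OpenCV; this path
    # assumes the opencv-python package (adjust for your install).
    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    cap = cv2.VideoCapture(0)  # webcam; pass a filename for recorded video
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors trade off speed vs. false positives.
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=5, minSize=(60, 60))
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()

The same CascadeClassifier can be pointed at the eye or nose cascades if you want the individual components as well.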
As for the emotion detection, that's an open research question, so anything you try will likely be fairly involved. If you're into that sort of thing, some good papers to look over might be http://www.utdallas.edu/dept/eecs/research/researchlabs/msp-lab/publications/Busso_2004.pdf and http://humansensing.cs.cmu.edu/papers/Automated.pdf
Related
I aim to detect and track vehicles in real time with OpenCV, using recorded on-road footage. I realize most algorithms/papers use Haar-like features or HOG as feature descriptors, and in most of the vehicle-detection literature there is little investigation of interest-point-based approaches.
I mean, OpenCV has so many nice edge/corner-based detectors like FAST, ORB, BRISK, ... why not use them in conjunction with a nice descriptor and do some matching/classification afterwards?
And when is it actually better to use such a detector/descriptor strategy for object detection compared with a traditional Haar-cascade or HOG/SVM approach? Is it for robustness or performance reasons?
Regards
Obi
I'm trying to implement a face recognition algorithm in Python. I want to be able to take a directory of images and compute pair-wise distances between them, where short distances should hopefully correspond to images of the same person. The ultimate goal is to cluster the images and perform some basic face identification tasks (unsupervised learning).
Because of the unsupervised setting, my approach to the problem is to calculate a "face signature" (a vector in R^d for some integer d) and then figure out a metric under which two faces belonging to the same person will indeed be a short distance apart.
I have a face detection algorithm that detects the face, crops the image, and performs some basic pre-processing, so the images I'm feeding to the algorithm are grayscale and equalized.
For the "face signature" part, I've tried two approaches which I read about in several publications:
Taking the histogram of the LBP (Local Binary Pattern) of the entire (processed) image
Calculating SIFT descriptors at 7 facial landmark points (right of mouth, left of mouth, etc.), which I identify per image using an external application. The signature is the concatenation of the square root of the descriptors (this results in a much higher dimension, but for now performance is not a problem).
For the comparison of two signatures, I'm using OpenCV's compareHist function, trying out several different distance metrics (chi-square, Euclidean, etc.).
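For reference, the comparison step looks roughly like this (a sketch of my setup; I compute the LBP with scikit-image, and the parameter values are just the ones I've been trying):

    import cv2
    import numpy as np
    from skimage.feature import local_binary_pattern

    def lbp_signature(gray_face, P=8, R=1, bins=59):
        # Uniform (non-rotation-invariant) LBP over the whole
        # pre-processed face crop, then a normalized histogram.
        lbp = local_binary_pattern(gray_face, P, R, method="nri_uniform")
        hist, _ = np.histogram(lbp.ravel(), bins=bins, range=(0, bins))
        hist = hist.astype(np.float32)
        return hist / (hist.sum() + 1e-7)

    def distance(sig_a, sig_b):
        # Chi-square distance; lower should mean "same person".
        return cv2.compareHist(sig_a, sig_b, cv2.HISTCMP_CHISQR)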
I know that face recognition is a hard task, let alone without any training, so I'm not expecting great results. But everything I'm getting so far seems completely random. For example, when calculating distances between the image on the far right and the rest of the images, she comes out most similar to 4 Bill Clintons (...!).
I have read in a great presentation that it's popular to carry out a "metric learning" procedure on a test set, which should significantly improve results. However, the presentation and other sources say that "regular" distance measures should also give OK results, so before I try that I want to understand why what I'm doing gets me nothing.
In conclusion, my questions, which I'd love to get any sort of help on:
One improvement I thought of would be to perform LBP only on the actual face, and not on the corners and everything else that might add noise to the signature. How can I mask out the parts which are not the face before calculating LBP? I'm using OpenCV for this part too.
I'm fairly new to computer vision; how would I go about "debugging" my algorithm to figure out where things go wrong? Is this possible?
In the unsupervised setting, is there any other approach (which is not local descriptors + computing distances) that could work, for the task of clustering faces?
Is there anything else in the OpenCV module that maybe I haven't thought of that might be helpful? It seems like all the algorithms there require training and are not useful in my case - the algorithm needs to work on images which are completely new.
Thanks in advance.
What you are looking for is unsupervised feature extraction: take a bunch of unlabeled images and find the most important features describing them.
The state-of-the-art methods for unsupervised feature extraction are all based on (convolutional) neural networks. Have a look at autoencoders (http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity) or Restricted Boltzmann Machines (RBMs).
You could also take an existing face recognition network such as DeepFace (https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf), keep only its feature layers, and use the distances between those feature vectors to group similar faces together.
I'm afraid OpenCV is not well suited for this task; you might want to check out Caffe, Theano, TensorFlow, or Keras.
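As a rough sketch of the feature-layer idea, here it is with a generic pretrained Keras network instead of a face-specific one like DeepFace (results will be weaker, but the mechanics are the same; the cluster count is a placeholder):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering
    from tensorflow.keras.applications import VGG16
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.preprocessing import image

    # Pretrained network with the classification head removed; global
    # average pooling turns each image into a 512-d feature vector.
    model = VGG16(weights="imagenet", include_top=False, pooling="avg")

    def signature(path):
        img = image.load_img(path, target_size=(224, 224))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), 0))
        return model.predict(x)[0]

    # paths = [...]                      # your cropped face images
    # sigs = np.stack([signature(p) for p in paths])
    # labels = AgglomerativeClustering(n_clusters=5).fit_predict(sigs)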
I'm doing a hand motion recognition project for my final assignment. The core of my code is a Hidden Markov Model. Some papers say that we first need to detect the object and perform feature extraction, then use the HMM to recognize the motion.
I'm using OpenCV. I've done the hand detection using a Haar classifier, and I've prepared the HMM code in C++, but I'm missing a few things:
I don't know how to integrate the Haar classifier with the HMM.
How do I perform feature extraction from the detected hand (Haar classifier)?
I know we should first train the HMM for motion recognition, but I don't know how to train on motion data. What kind of data should I use? How do I prepare it? Where can I find it, or how can I collect it?
When I search on Google, some people say that HMM motion recognition is similar to HMM speech recognition, but I'm confused about which part is similar.
Someone please tell me if I'm doing this wrong, and give me suggestions on what I should do.
Please teach me, master.
To my understanding:
1) Haar cascades are used to detect static objects, which means they work within a single frame of video.
2) HMMs are used to recognize temporal patterns, which means they work across frames.
So what you want to do is first track the hand, extract features from it, and then train the gesture movements into an HMM.
As for the features, the most naive choice is the "pixel by pixel" feature: you just put all the pixel intensities together. After this, dimensionality reduction is needed, say with PCA.
After that, one way of using an HMM is to discretize the features into clusters, train the model on the discretized state sequences, and then predict the probability that a given sequence of features belongs to each gesture class.
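A sketch of that last step, assuming the Python hmmlearn and scikit-learn libraries (one HMM per gesture class; the feature extraction and PCA are assumed to have happened already):

    import numpy as np
    from sklearn.cluster import KMeans
    from hmmlearn import hmm

    # frames: (n_frames, d) PCA-reduced features, concatenated over many
    # recordings of ONE gesture; lengths: frame count of each recording.
    def train_gesture_model(frames, lengths, n_symbols=16, n_states=4):
        # Discretize the continuous features into a symbol alphabet.
        kmeans = KMeans(n_clusters=n_symbols).fit(frames)
        symbols = kmeans.predict(frames).reshape(-1, 1)
        # CategoricalHMM is called MultinomialHMM in older hmmlearn versions.
        model = hmm.CategoricalHMM(n_components=n_states, n_iter=50)
        model.fit(symbols, lengths)
        return kmeans, model

    # At recognition time, turn the new recording into symbols and score
    # it against every gesture's model; pick the highest log-likelihood:
    # score = model.score(kmeans.predict(new_frames).reshape(-1, 1))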
Note
This is not a standard gesture recognition procedure, but it is a simple, workable baseline for your final project.
There don't seem to be any implementations of HOG training in OpenCV, and there are few sources explaining how HOG training works. From what I've gathered, HOG training can be done in real time. But what are the requirements for training? How does the training process actually work?
As with most computer vision algorithms, Google Scholar is your friend :) I would suggest reading a few papers on how it works. Here is one of the most referenced papers on HOG for you to start with.
Another tip when researching in computer vision is to note the authors of the papers you find interesting and try to find their websites. They tend to have implementations of their algorithms, as well as rules of thumb on how to use them. Also, look up the references cited in the paper about your algorithm; this can be very helpful in acquiring the background knowledge to truly understand how the algorithm works and why.
Your terminology is a bit mixed up. HOG is a feature descriptor. You can train a classifier using HOG, which can in turn be used for object detection. OpenCV includes a people detector that uses HOG features and an SVM classifier. It also includes CascadeClassifier, which can use HOG, and which is typically used for face detection.
There is a program in OpenCV called opencv_traincascade, which lets you train a cascade object detector and gives you the option to use HOG features. There is a function in the Computer Vision System Toolbox for MATLAB called trainCascadeObjectDetector, which does the same thing.
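To illustrate the detection side, here is a minimal sketch of the built-in HOG + SVM people detector via the Python bindings (the image path is a placeholder; training your own detector would go through opencv_traincascade or an external SVM pipeline):

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    img = cv2.imread("street.jpg")  # placeholder path
    # winStride and scale trade off speed against detection quality.
    rects, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
    for (x, y, w, h) in rects:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 2)
    cv2.imshow("people", img)
    cv2.waitKey(0)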
I know that there are many detection techniques in OpenCV, such as SURF, STAR, ORB, etc., but those techniques are for feature detection on new video feeds, not for dealing with specific instances of objects that require prior learning. OpenCV's documentation isn't easy to flip through, and I've yet to find anything besides Haar, which I know is used mainly for face detection.
So are there any other techniques besides Haar? The Haar technique dates back to research from over ten years ago, so ideally I'm hoping there have been advances since then that have made it into OpenCV.
If you are looking for OpenCV machine learning type algorithms, check out this link.
For a state of the art on-the-fly object detection algorithm, have a look at OpenTLD. It uses bounding boxes and random forests to learn about an object over time. Check out the demo video here.
Also check out the matching_to_many_images.cpp sample from OpenCV. It uses feature descriptors to match objects, much like Google Goggles works. A related example is the bagofwords_classification.cpp sample, which may be what you are looking for in this case: it uses feature detectors (SURF, SIFT, etc.) to find keypoints and then classifies objects by comparing the relative positions of the features to a learned database of features. Have a look also at this tutorial from MIT.
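To give a taste of the descriptor-matching approach those samples use, here is a minimal ORB matching sketch via the Python bindings (image paths are placeholders):

    import cv2

    img1 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)  # learned object
    img2 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # new frame

    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Hamming distance suits binary ORB descriptors; Lowe's ratio test
    # prunes ambiguous matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = []
    for pair in matcher.knnMatch(des1, des2, k=2):
        if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
            good.append(pair[0])
    print(len(good), "good matches")

Many good matches concentrated in one region of the scene is a strong hint that the learned object is present there.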
The latentsvmdetect.cpp may also be a good starting point for you.
Hope that helps!