best algorithm for face detection and pose estimation - machine-learning

I am looking for algorithms/publications on face detection. There are plenty in the web. But my scenario is somewhat specialized. I want to detect faces accurately in images taken by wearable devices (e.g. narrative clips), so there will be motion blur, and image quality will not be that good. I want to detect faces that are within 15 feet of the camera accurately. Next goal is to estimate the pose, primarily to find out if the person is looking toward the camera ( or better looking at the camera owner).
Any suggestion?

My go to for this would either be a deep-learning framework using convolutional layers for pixel classification, or K-means/ K-Nearest Neighbour algorithm.
This does depend on your data, however. From your post I am assuming that your data isn't labelled? meaning you are unable to feed in the 'truth' to the algorithm for classification.
you could perhaps use a CNN (convolutional neural network) for pixel classification (image segmentation) which should identify the location of a person. given this, perhaps you could run a 'local' CNN i a region close to the face identified to classify the region the body is located in as a certain pose.
This would probably be my first take on the problem but would depend on the exact structure of your data, and the structure of your labels (if you have any).
I have to say it does sound like a fun project!

I found OpenCV's Haar Cascades for Face Detection pretty accurate and robust for motion blur and "live" face recognition.
I'm saying that because I used them for implementing an Eye-Tracker in C++ with a laptop webcam (whose resolution was not excellent and motion blur was naturally always present).
They work in multiresolution and are therefore able to detect faces of any size, but you can easily tune them for your distance of interest.
They might not be your final optimal solution, but since they are already implemented and come with the OpenCV package, they could constitute a good starting point.

Related

How to detect hand palm and its orientation (like facing outwards)?

I am working on a hand detection project. There are many good project on web to do this, but what I need is a specific hand pose detection. It needs a totally open palm and the whole palm face to outwards, like the image below:
The first hand faces to inwards, so it will not be detected, and the right one faces to outwards, it will be detected. Now I can detect hand with OpenCV. but how to tell the hand orientation?
Problem of matching with the forehand belongs to the texture classification, it's a classic pattern recognition problem. I suggest you to try one of the following methods:
Gabor filters: it is good to detect the orientation and pixel intensities (as forehand has different features), opencv has getGaborKernel function, the very important params of this function is theta (orientation) and lambd: (frequencies). To make it simple you can apply this process on a cropped zone of palm (as you have already detected it, it would be easy to crop for example the thumb, or a rectangular zone around the gravity center..etc). Then you can convolute it with a small database of images of the same zone to get the a rate of matching, or you can use the SVM classifier, where you have to train your SVM on a set of images by constructing the training matrix needed for SVM (check this question), this paper
Local Binary Patterns (LBP): it's an important feature descriptor used for texture matching, you can apply it on whole palm image or on a cropped zone or finger of image, it's easy to use in opencv, a lot of tutorials with codes are available for this method. I recommend you to read this paper talking about Invariant Texture Classification
with Local Binary Patterns. here is a good tutorial
Haralick Texture: I've read that it works perfectly when a set of features quantifies the entire image (Global Feature Descriptors). it's not implemented in opencv but easy to be implemented, check this useful tutorial
Training Models: I've already suggested a SVM classifier, to be coupled with some descriptor, that can works perfectly.
Opencv has an interesting FaceRecognizer class for face recognition, it could be an interesting idea to use it replacing the face images by the palm ones, (do resizing and rotation to get an unique pose of palm), this class has three methods can be used, one of them is Local Binary Patterns Histograms, which is recommended for texture recognition. and why not to try the other models (Eigenfaces and Fisherfaces ) , check this tutorial
well if you go for a MacGyver way you can notice that the left hand has bones sticking out in a certain direction, while the right hand has all finger lines and a few lines in the hand palms.
These lines are always sort of the same, so you could try to detect them with opencv edge detection or hough lines. Due to the dark color of the lines, you might even be able to threshold them out of it. Then gather the information from those lines, like angles, regressions, see which features you can collect and train a simple decision tree.
That was assuming you do not have enough data, if you have then you go into deeplearning, just take a basic inceptionV3 model and retrain the last dense layer to classify between two classes with a softmax, or to predict the probablity if the hand being up/down with sigmoid. Check this link, Tensorflow got your back on the training of this one, pure already ready code to execute.
Questions? Ask away
Take a look at what leap frog has done with the oculus rift. I'm not sure what they're using internally to segment hand poses, but there is another paper that produces hand poses effectively. If you have a stereo camera setup, you can use this paper's methods: https://arxiv.org/pdf/1610.07214.pdf.
The only promising solutions I've seen for mono camera train on large datasets.
use Haar-Cascade classifier,
you can get the classifier model file then use it here.
Just search for 'Haarcascade detection of Palm in Google' or use below code.
import cv2
cam=cv2.VideoCapture(0)
ccfr2=cv2.CascadeClassifier('haar-cascade-files-master/palm.xml')
while True:
retval,image=cam.read()
grey=cv2.cvtColor(image,cv2.COLOR_BGR2GRAY)
palm=ccfr2.detectMultiScale(grey,scaleFactor=1.05,minNeighbors=3)
for x,y,w,h in palm:
image=cv2.rectangle(image,(x,y),(x+w,y+h),(256,256,256),2)
cv2.imshow("Window",image)
if cv2.waitKey(1) & 0xFF==ord('q'):
cv2.destroyAllWindows()
break
del(cam)
Best of Luck for your experience using HaarCascade.

What is the best solution for rotation invariant detector?

I'd like to create an object detector based on cascade classifier, the only problem is that LBP and Haar features are not rotation invariant. The first thing that comes to my mind is to rotate the training sample at different angles, but I doubt that the resulting classifier would have good quality, moreover, the object could have stretched proportions. There are many rotation invariant detectors, for example, iPhone recognizes faces in real time in any orientation, so I wonder how do they achieve this? I would prefer to use OpenCV for this.
Check out the object detection framework available at https://github.com/nenadmarkus/pico.
The framework enables you to learn a custom object detector (for example, for finding frontal, upright faces) and then use it at runtime for rotation invariant detection.
This is achieved by scanning the image with a rotated version of the object detector at a number of different orientations. Note that this can be done without cascade retraining or image resampling, and it should work in real-time on modern machines (the provided face detection demo does).
The details are given in the paper available at http://arxiv.org/abs/1305.4537.
Fourier descriptors are rotation invariants (and translation as well as scaling invariants); the idea then would be to train whatever classifier your confortable with on the Fourier Descriptor result (PCA on Fourier descriptor, associated with a SVM seems to be a logical choice).
See Fourier Descriptors (Wolfram)
for matching logos I think this is what you need: http://www.ijera.com/papers/Vol2_issue5/JW2517421747.pdf
What about some simple solution....
Object Detection using SURF

Free Face Detection Algorithm for Video

I'm working on an application which needs to detect the location of a face in a video stream, using a web cam placed at desk height (and slightly off to the side of the user).
I've already implemented a version of OpenCV (using their Haar detection) and it works ok... the problem is that it tends to lose the position of the face if the user turns their head to the side (or looks up).
Since the webcam is sitting on the desk, it is tilted up at a 30 degree angle. The OpenCV detection algorithm is trained using fully frontal images, but not up-angle images like the ones I'm using. I know OpenCV also has a profile Haar file that can be used.. but from my research it seems that the results are quite mixed on profile detection. In addition, I don't really have control over the background or lighting of the image... so this sometimes also effects the efficacy of the OpenCV detection algorithm.
So, I guess what I'm asking is... are there other face detection algorithms (that are hopefully free, as this is part of my university research) that are better for detecting faces for this type of setup? It seems like some of the built-in webcams (for Macs and PCs) actually have fairly robust algorithms for detecting faces (and then overlaying cheesy cartoon images over the faces)... but they seem to work well regardless of background or lighting. Do you have any recommendations?
Thanks.
For research purposes, you can use the Haar cascades in OpenCV, things are different if you want to go commercial (in which case you need to consider LBP cascades instead). Just be sure to quote the Viola-Jones paper in your references.
To improve the results of face detection, you have several paths:
individual image detection: you can send rotated images to a frontal cascade to account for some variability without training your own cascade
individual image detection but more work) : train your own cascade in operating conditions closer to the ones of your app
stability in video streams (as in webcams & co.) : this is achieved by adding a layer of tracking around the face detection. Depending on your knowledge about this topic, you can use your own filter, have fun with OpenCV's particle or Kalman filter, implement a simple first or second order low pass filter on the face position or a PID tracker on the detected face...
Any of these tracking filters will enhance a lot your results when processing video streams.
Use CLM-framework for accurate realtime face detection and face landmark detection.
Example of the system in action: http://youtu.be/V7rV0uy7heQ
You may find it useful.

Finding path obstacles in a 2D image

what approach would you recommend for finding obstacles in a 2D image?
Here are some key points I came up with till now:
I doubt I can use object recognition based on "database of obstacles" search, since I don't know what might the obstruction look like.
I assume color recognition might be problematic if the path does not differ a lot from the object itself.
Possibly, adding one more camera and computing a 3D image (like a Kinect does) would work, but that would not run as smooth as I require.
To illustrate the problem; robot can ride either left or right side of the pavement. In the following picture, left side is the correct choice:
If you know what the path looks like, this is largely a classification problem. Acquire a bunch of images of path at different distances, illumination, etc. and manually label the ground in each image. Use this labeled data to train a classifier that classifies each pixel as either "road" or "not road." Depending upon the texture of the road, this could be as simple as classifying each pixels' RGB (or HSV) values or using OpenCv's built-in histogram back-projection (i.e. cv::CalcBackProjectPatch()).
I suggest beginning with manual thresholds, moving to histogram-based matching, and only using a full-fledged machine learning classifier (such as a Naive Bayes Classifier or a SVM) if the simpler techniques fail. Once the entire image is classified, all pixels that are identified as "not road" are obstacles. By classifying the road instead of the obstacles, we completely avoided building a "database of objects".
Somewhat out of the scope of the question, the easiest solution is to add additional sensors ("throw more hardware at the problem!") and directly measure the three-dimensional position of obstacles. In order of preference:
Microsoft Kinect: Cheap, easy, and effective. Due to ambient IR light, it only works indoors.
Scanning Laser Rangefinder: Extremely accurate, easy to setup, and works outside. Also very expensive (~$1200-10,000 depending upon maximum range and sample rate).
Stereo Camera: Not as good as a Kinect, but it works outside. If you cannot afford a pre-made stereo camera (~$1800), you can make a decent custom stereo camera using USB webcams.
Note that professional stereo vision cameras can be very fast by using custom hardware (Stereo On-Chip, STOC). Software-based stereo is also reasonably fast (10-20 Hz) on a modern computer.

OpenCV - Haar classifier for long objects with different angles

I have used Haar classifier with OpenCV before succesfully. Unfortunately it seems to work only on square objects and fixed angles (i.e. faces). However I need to find "long" (rectangular) objects which have different angles (see sample input image).
Is there a way to train Haar classifier to find such objects? All I can find are tutorials for face recognition. Any other alternative approches?
Haar classifiers are known to work with rigid object only. You need a classifier for each of the view. For example, the side-face classifier in OpenCV doesn't work as good as front-face classifer(due to the reason being, side face has more variation in yaw-pitch-roll than front face).
There is no perfect way of answering your question.
However, in your case whatever you are trying to classify (microbes I suppose) are overlapping on each other. Its a complex issue. But, you can isolate the region where microbes occur (not isolate each microbe like a face).
You can refer fingerprint segmentation techniques that are known to enhance the ridges on a fingerprint (here in your case its microbe edges) from the background and isolate the image.
Check "ridgesegmentation.m" in the following page:
http://www.csse.uwa.edu.au/~pk/Research/MatlabFns/index.html

Resources