Tracking Human Blob Associated with Face Detection - image-processing

I realize this is a non-trivial task, but what's the most practical method of detecting a face, and then tracking the body associated with that face in a video with moving background?
Detecting a face is fairly simple with OpenCV's trained Haar cascade for the face. Unfortunately, OpenCV's trained Haar cascades for the human body are so inaccurate as to be effectively unusable, so my thought was to use the face detector to determine roughly where the "person" is, and then use something like OpenTLD to dynamically "learn" what the person's body looks like and track it across frames. This should have the benefit of handling a moving background, which most of the motion-tracking code in OpenCV currently doesn't seem to do. The main downside is that OpenTLD is still quite new, and all the public implementations I've tested are very buggy and difficult to use.
Does this seem like a practical approach? Are there better ways?
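For reference, here is a rough sketch of the pipeline I have in mind, using OpenCV's bundled frontal-face cascade to seed a generic tracker. I've used a KCF tracker from opencv-contrib as a stand-in for OpenTLD, the input path is hypothetical, and the body-box expansion factors are arbitrary guesses:

```python
import cv2

# OpenCV's bundled frontal-face Haar cascade.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("input.mp4")  # hypothetical input video
tracker = None

while True:
    ok, frame = cap.read()
    if not ok:
        break

    if tracker is None:
        # Re-detect a face whenever we have no active tracker.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) > 0:
            x, y, w, h = faces[0]
            # Guess a body box from the face box (expansion factors are arbitrary).
            body = (int(max(x - w, 0)), int(y), int(w * 3), int(h * 6))
            # Factory name differs across OpenCV versions
            # (cv2.TrackerKCF_create vs cv2.legacy.TrackerKCF_create).
            tracker = cv2.TrackerKCF_create()
            tracker.init(frame, body)
    else:
        ok, box = tracker.update(frame)
        if ok:
            x, y, w, h = [int(v) for v in box]
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        else:
            tracker = None  # lost the target; re-detect on later frames

    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) == 27:
        break
```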

If you are interested in human body detection/tracking rather than face detection, you should check out the people detection sample in OpenCV. OpenCV contains a people detector based on HOG descriptors and a linear SVM classifier, which is actually one of the most successful people detection algorithms available.
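With the Python bindings the stock detector is only a few lines; the parameter values below are common defaults rather than anything tuned, and the image path is a placeholder:

```python
import cv2

# OpenCV's built-in HOG + linear-SVM people detector.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

img = cv2.imread("frame.jpg")  # hypothetical input frame
rects, weights = hog.detectMultiScale(img, winStride=(8, 8),
                                      padding=(8, 8), scale=1.05)

# Draw the detected people.
for (x, y, w, h) in rects:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```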

Related

best algorithm for face detection and pose estimation

I am looking for algorithms/publications on face detection. There are plenty on the web, but my scenario is somewhat specialized. I want to detect faces accurately in images taken by wearable devices (e.g. Narrative Clips), so there will be motion blur and the image quality will not be that good. I want to accurately detect faces that are within 15 feet of the camera. The next goal is to estimate the pose, primarily to find out whether the person is looking toward the camera (or, better, at the camera owner).
Any suggestion?
My go-to for this would be either a deep-learning framework using convolutional layers for pixel classification, or a K-means/K-Nearest-Neighbour algorithm.
This does depend on your data, however. From your post I am assuming that your data isn't labelled, meaning you are unable to feed the 'truth' to the algorithm for classification.
You could perhaps use a CNN (convolutional neural network) for pixel classification (image segmentation), which should identify the location of a person. Given this, you could then run a 'local' CNN on a region close to the detected face to classify the pose of the region where the body is located.
This would probably be my first take on the problem but would depend on the exact structure of your data, and the structure of your labels (if you have any).
I have to say it does sound like a fun project!
I found OpenCV's Haar cascades for face detection pretty accurate and robust to motion blur for "live" face recognition.
I'm saying that because I used them for implementing an Eye-Tracker in C++ with a laptop webcam (whose resolution was not excellent and motion blur was naturally always present).
They work at multiple resolutions and are therefore able to detect faces of any size, but you can easily tune them for your distance of interest.
They might not be your final optimal solution, but since they are already implemented and come with the OpenCV package, they could constitute a good starting point.
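As an illustration of that tuning, detectMultiScale lets you bound the face sizes you care about. The pixel sizes below are placeholders you would derive from your camera's resolution and the 15-foot range, and the input image path is hypothetical:

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)  # placeholder frame

# Restricting minSize/maxSize skips scales you don't care about,
# which both speeds up detection and cuts false positives.
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,     # step between scales in the image pyramid
    minNeighbors=5,      # higher = fewer false positives
    minSize=(40, 40),    # smallest face you expect (e.g. ~15 ft away)
    maxSize=(300, 300))  # largest face you expect (close to the camera)
```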

OpenCV: Refine Cascade Face Detection

I'm making a system for emotion detection on an Android mobile phone. I'm using OpenCV's cascades (LBP or Haar) to find the face, eye and mouth areas, etc. What I have observed so far is that the accuracy isn't stable. There are situations where I can't find an eye, or I get "additional faces" in the background due to a very slight change of light. What I wanted to ask is:
1) Is the Haar cascade more accurate than LBP?
2) Is there any good method for increasing detection accuracy? For example, finding the face/eyes on a binarized image, or using an edge detection filter, saturation, or anything else?
You can try the Microsoft API for face emotion detection. I am trying it in my project and the results are good; try this link:
https://www.microsoft.com/cognitive-services/en-us/emotion-api
Sometimes Haar or LBP will not give a good enough result for a face detection system. If you want better accuracy, I think you can try STASM,
which is based on OpenCV and uses Haar to detect the face and landmarks. You could also try YOLO face detection.
If you want to build your own face detection system based on Haar or LBP and get good results from it, you will probably need to use the LBP cascade to find candidate faces quickly and then train a CNN model to produce the final result; this lets the system detect faces in real time. As far as I know, SeetaFace uses this approach for real-time face detection.
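A minimal sketch of that two-stage idea, assuming an LBP cascade as the fast first stage and a CNN verifier of your own as the second. The verify_face function below is a hypothetical placeholder, not an existing API, and the cascade XML path is assumed to be local:

```python
import cv2

# Stage 1: fast LBP cascade proposes candidate face regions.
# Path to an LBP cascade XML from OpenCV's data/lbpcascades directory
# (adjust to wherever the file lives on your system).
lbp_cascade = cv2.CascadeClassifier("lbpcascade_frontalface.xml")

def verify_face(crop):
    """Hypothetical second stage: run your trained CNN on the crop
    and return True if it confirms a face. Not implemented here."""
    raise NotImplementedError

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    candidates = lbp_cascade.detectMultiScale(gray, scaleFactor=1.1,
                                              minNeighbors=3)
    # Stage 2: keep only the candidates the CNN confirms.
    return [(x, y, w, h) for (x, y, w, h) in candidates
            if verify_face(frame[y:y + h, x:x + w])]
```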

Real time tracking of hand

I am trying to detect and track a hand in real time using OpenCV. I thought Haar cascade classifiers would yield a fair result. After training with 10k positive and 20k negative images, I obtained a classifier XML file. Unfortunately, it detects the hand only in certain positions, which suggests it works well only for rigid objects. So I am now thinking of adopting another algorithm that can track the hand once it has been detected by the Haar classifier.
My question is: if I make sure that the Haar classifier detects the hand in a certain frame and position, what method would yield robust tracking of the hand from there?
I searched the web a bit, and I understand I could go for optical flow of the detected hand, a Kalman filter, or a particle filter, but I have also come across their respective disadvantages.
Also, if I incorporate stereo vision, would that help, since I could possibly reconstruct the hand in 3D?
You concluded rightly about Haar features - they aren't that useful when it comes to non-rigid objects.
Take a look at the following papers which use skin colour to detect hands.
Interaction between hands and wearable cameras
Markerless inspection of augmented reality objects
and this paper that uses KLT features to track the hand after the first detection:
Fast 2D hand tracking with flocks of features and multi-cue integration
I would say that a stereo camera will not help your cause much, as 3D reconstruction of non-rigid objects isn't straightforward and would require a whole lot of innovation and development. However, you can take a look at the papers in the hand pose estimation section of this page if you wish to pursue 3D tracking.
EDIT: Also take a look at this recent paper, which seems to get good results.
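For the KLT-style approach, a bare-bones version using OpenCV's pyramidal Lucas-Kanade tracker might look like the sketch below. It assumes you already have a hand bounding box from your Haar detector; the function name is mine and the parameter values are just typical defaults:

```python
import cv2
import numpy as np

def track_klt(cap, hand_box):
    """Track KLT corners seeded inside hand_box = (x, y, w, h)."""
    ok, frame = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Seed features only inside the detected hand region.
    x, y, w, h = hand_box
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=5,
                                  mask=mask)

    while pts is not None and len(pts) > 0:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        new_pts, status, _ = cv2.calcOpticalFlowPyrLK(
            prev_gray, gray, pts, None, winSize=(15, 15), maxLevel=2)
        # Keep only the points that were tracked successfully.
        pts = new_pts[status.ravel() == 1].reshape(-1, 1, 2)
        prev_gray = gray
        if len(pts) > 0:
            # The median of the surviving points is a crude hand position.
            cx, cy = np.median(pts.reshape(-1, 2), axis=0)
            cv2.circle(frame, (int(cx), int(cy)), 5, (0, 0, 255), -1)
        cv2.imshow("hand", frame)
        if cv2.waitKey(1) == 27:
            break
```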
Zhang et al.'s Real-time Compressive Tracking does a reasonable job of tracking an object, once it has been detected by some other method, provided that the motion is not too fast. They have an OpenCV implementation (but it would need a bit of work to reuse).
This research paper describes a method to track hands, without using gloves by using a stereo camera setup.
There have been similar questions on Stack Overflow...
Have a look at my answer and those of others: https://stackoverflow.com/a/17375647/1463143
You can certainly get better results by avoiding Haar training and detection for deformable entities.
The CamShift algorithm is generally fast and accurate if you want to track the hand as a single entity. The OpenCV documentation contains a good, easy-to-understand demo program that you can easily modify.
If you need to track fingers etc., however, further modeling will be needed.
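Here is a condensed version of that demo, adapted to a hand ROI. The initial window coordinates are made up and would come from your detector, the webcam index is assumed to be 0, and the hue-histogram approach works best when the hand's colour stands out from the background:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture(0)
ok, frame = cap.read()

# Initial hand window (x, y, w, h) -- assumed to come from your detector.
track_window = (200, 150, 100, 120)
x, y, w, h = track_window

# Build a hue histogram of the region to track.
roi = frame[y:y + h, x:x + w]
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
mask = cv2.inRange(hsv_roi, (0, 60, 32), (180, 255, 255))
roi_hist = cv2.calcHist([hsv_roi], [0], mask, [180], [0, 180])
cv2.normalize(roi_hist, roi_hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves less than 1 px.
term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], roi_hist, [0, 180], 1)
    rot_rect, track_window = cv2.CamShift(back_proj, track_window, term_crit)
    pts = cv2.boxPoints(rot_rect).astype(np.int32)
    cv2.polylines(frame, [pts], True, (0, 255, 0), 2)
    cv2.imshow("camshift", frame)
    if cv2.waitKey(1) == 27:
        break
```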

Haar Cascade vs HOG Detection

I have been working around with OpenCV for few days now and I have a project where I should detect cars and humans from the sky.
So here are my inputs:
A moving camera in the sky (embedded on a quadcopter) which is gonna capture frames.
A set of objects I should detect (humans and cars)
And here are my output:
A detection of those objects outlined by a rectangle or some contours
Based on that, my question is as follows: which of Haar cascades and HOG detection would you recommend for this, and why? Or is there anything else?
Many thanks for your answers
HOG is usually better than Haar for human detection. I only have experience with this, so I thought I'd give some input on that. However, the limitation of HOG is that the human must be within a "perfect" area of the frame: too close and it won't detect the human; too far and it won't either.
I have had better luck with HOG than Haar. Haar gave me too many false positives.
I have been trying to use Haar cascades to detect humans, and they turn out to give too many false positives. I think Haar is only suitable for face or eye detection.
Since your camera is in the sky, the human appears small in the image and the whole body shape is visible, so HOG would be a better choice.
You would also need to change the scale factor and minimum neighbours in the Haar cascade, which are not the same for every image. So it's better to use HOG.

Free Face Detection Algorithm for Video

I'm working on an application which needs to detect the location of a face in a video stream, using a web cam placed at desk height (and slightly off to the side of the user).
I've already implemented a version of OpenCV (using their Haar detection) and it works ok... the problem is that it tends to lose the position of the face if the user turns their head to the side (or looks up).
Since the webcam is sitting on the desk, it is tilted up at a 30 degree angle. The OpenCV detection algorithm is trained using fully frontal images, but not up-angle images like the ones I'm using. I know OpenCV also has a profile Haar file that can be used, but from my research it seems that the results are quite mixed for profile detection. In addition, I don't really have control over the background or lighting of the image, so this sometimes also affects the efficacy of the OpenCV detection algorithm.
So, I guess what I'm asking is... are there other face detection algorithms (that are hopefully free, as this is part of my university research) that are better for detecting faces for this type of setup? It seems like some of the built-in webcams (for Macs and PCs) actually have fairly robust algorithms for detecting faces (and then overlaying cheesy cartoon images over the faces)... but they seem to work well regardless of background or lighting. Do you have any recommendations?
Thanks.
For research purposes you can use the Haar cascades in OpenCV; things are different if you want to go commercial (in which case you need to consider LBP cascades instead). Just be sure to cite the Viola-Jones paper in your references.
To improve the results of face detection, you have several paths:
individual image detection: you can send rotated images to a frontal cascade to account for some variability without training your own cascade
individual image detection (but more work): train your own cascade in operating conditions closer to those of your app
stability in video streams (as in webcams & co.): this is achieved by adding a layer of tracking around the face detection. Depending on your knowledge of this topic, you can use your own filter, have fun with OpenCV's particle or Kalman filter, implement a simple first- or second-order low-pass filter on the face position, or use a PID tracker on the detected face (see the sketch after this list)...
Any of these tracking filters will greatly enhance your results when processing video streams.
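As a concrete example of the simplest option, a first-order (exponential) low-pass filter on the detected box is only a few lines. The class name is mine and alpha is a made-up smoothing constant you would tune:

```python
class LowPassBox:
    """First-order low-pass filter on a detected (x, y, w, h) box."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha   # 0 < alpha <= 1; smaller = smoother but laggier
        self.state = None

    def update(self, box):
        if box is None:           # no detection this frame: hold the last estimate
            return self.state
        if self.state is None:    # first detection initialises the filter
            self.state = tuple(float(v) for v in box)
        else:
            self.state = tuple(self.alpha * n + (1 - self.alpha) * s
                               for n, s in zip(box, self.state))
        return self.state
```

Feed it the raw detectMultiScale output every frame; the smoothed box jitters far less than the raw detections.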
Use the CLM-framework for accurate real-time face detection and face landmark detection.
Example of the system in action: http://youtu.be/V7rV0uy7heQ
You may find it useful.
