I'm working on an application that needs to detect the location of a face in a video stream, using a webcam placed at desk height (and slightly off to the side of the user).
I've already implemented a version using OpenCV (with its Haar detection) and it works OK... the problem is that it tends to lose the position of the face if the user turns their head to the side (or looks up).
Since the webcam is sitting on the desk, it is tilted up at a 30 degree angle. The OpenCV detection algorithm is trained on fully frontal images, not on up-angle images like the ones I'm using. I know OpenCV also has a profile Haar file that can be used... but from my research it seems that the results of profile detection are quite mixed. In addition, I don't really have control over the background or lighting of the image, so this sometimes also affects the efficacy of the OpenCV detection algorithm.
So, I guess what I'm asking is: are there other face detection algorithms (hopefully free, as this is part of my university research) that are better at detecting faces in this type of setup? It seems like some of the built-in webcams (for Macs and PCs) actually have fairly robust algorithms for detecting faces (and then overlaying cheesy cartoon images on them), and they seem to work well regardless of background or lighting. Do you have any recommendations?
Thanks.
For research purposes, you can use the Haar cascades in OpenCV; things are different if you want to go commercial (in which case you need to consider LBP cascades instead). Just be sure to cite the Viola-Jones paper in your references.
To improve the results of face detection, you have several paths:
individual image detection: you can feed rotated copies of the image to a frontal cascade to account for some variability without training your own cascade
individual image detection (but more work): train your own cascade under operating conditions closer to those of your app
stability in video streams (as with webcams & co.): this is achieved by adding a tracking layer around the face detection. Depending on your knowledge of this topic, you can write your own filter, use OpenCV's particle or Kalman filter, implement a simple first- or second-order low-pass filter on the face position, or run a PID tracker on the detected face...
Any of these tracking filters will improve your results a lot when processing video streams.
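A minimal sketch of the simplest option, a first-order low-pass filter on the detected face centre, in Python. The class name FaceSmoother, the smoothing factor alpha and the max_missed rule (keep reporting the last smoothed position for a few frames when the detector loses the face, e.g. when the user turns their head) are illustrative assumptions, not part of OpenCV:

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


class FaceSmoother:
    """First-order low-pass filter on a face centre (hypothetical helper)."""

    def __init__(self, alpha: float = 0.3, max_missed: int = 10) -> None:
        # alpha in (0, 1]: higher values follow the detector more closely,
        # lower values smooth more but lag behind fast head motion.
        self.alpha = alpha
        self.max_missed = max_missed  # assumed "coasting" budget, in frames
        self.state: Optional[Point] = None
        self.missed = 0

    def update(self, detection: Optional[Point]) -> Optional[Point]:
        if detection is None:
            # No face this frame: keep the last estimate for a while, then give up.
            self.missed += 1
            if self.missed > self.max_missed:
                self.state = None
            return self.state
        self.missed = 0
        if self.state is None:
            self.state = detection  # first detection initialises the filter
        else:
            x, y = self.state
            dx, dy = detection
            # x_new = x_old + alpha * (measurement - x_old)
            self.state = (x + self.alpha * (dx - x), y + self.alpha * (dy - y))
        return self.state


# Example: two detections, then the face is lost for three frames.
smoother = FaceSmoother(alpha=0.5, max_missed=2)
track = [smoother.update(d) for d in [(100.0, 100.0), (110.0, 100.0), None, None, None]]
```

A higher alpha follows the detector more closely; a lower one smooths more but lags. If you also want a velocity estimate, OpenCV's cv::KalmanFilter can take the place of this filter.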
Use CLM-framework for accurate realtime face detection and face landmark detection.
Example of the system in action: http://youtu.be/V7rV0uy7heQ
You may find it useful.
I am looking for algorithms/publications on face detection. There are plenty on the web, but my scenario is somewhat specialized. I want to accurately detect faces in images taken by wearable devices (e.g. Narrative Clips), so there will be motion blur, and the image quality will not be very good. I want to accurately detect faces that are within 15 feet of the camera. The next goal is to estimate the pose, primarily to find out whether the person is looking toward the camera (or, better, at the camera owner).
Any suggestion?
My go-to for this would be either a deep-learning framework using convolutional layers for pixel classification, or a k-means / k-nearest-neighbour algorithm.
This does depend on your data, however. From your post, I am assuming that your data isn't labelled, meaning you are unable to feed in the 'truth' to the algorithm for classification. Is that right?
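For the unlabelled case, here is a bare-bones k-means (Lloyd's algorithm) sketch in Python. What the feature vectors are (per-pixel colour, patch descriptors, ...) is left open and is an assumption; the deterministic "first k points" initialisation is only there to keep the example reproducible:

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def kmeans(points: Sequence[Vector], k: int, iterations: int = 20) -> Tuple[List[Vector], List[int]]:
    """Plain Lloyd's k-means; returns (centroids, cluster label per point)."""
    # Deterministic initialisation with the first k points (k-means++ is better in practice).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster stays where it was
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Toy 2-D features forming two obvious groups.
points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, labels = kmeans(points, 2)
```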
You could perhaps use a CNN (convolutional neural network) for pixel classification (image segmentation), which should identify the location of a person. Given this, you could perhaps run a 'local' CNN in a region close to the identified face to classify the region the body is located in as a certain pose.
This would probably be my first take on the problem but would depend on the exact structure of your data, and the structure of your labels (if you have any).
I have to say it does sound like a fun project!
I found OpenCV's Haar cascades for face detection pretty accurate and robust to motion blur, even for "live" detection.
I'm saying that because I used them for implementing an Eye-Tracker in C++ with a laptop webcam (whose resolution was not excellent and motion blur was naturally always present).
They work at multiple resolutions and are therefore able to detect faces of any size, but you can easily tune them to your distance of interest.
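As a rough sketch of that tuning, the pinhole model w_px = f_px · W / Z gives the expected face width in pixels at a given distance, which you could pass as minSize/maxSize to CascadeClassifier::detectMultiScale. The 640-pixel frame, 60° horizontal field of view and 15 cm face width below are assumptions, not measurements:

```python
import math

FEET_TO_METRES = 0.3048


def focal_length_px(image_width_px: int, hfov_deg: float) -> float:
    """Pinhole focal length in pixels from the horizontal field of view."""
    return (image_width_px / 2) / math.tan(math.radians(hfov_deg) / 2)


def face_width_px(distance_m: float, image_width_px: int = 640,
                  hfov_deg: float = 60.0, face_width_m: float = 0.15) -> float:
    """Expected face width in the image: w_px = f_px * W / Z.

    The default frame width, field of view and face width are assumptions;
    calibrate your own camera for real numbers.
    """
    return focal_length_px(image_width_px, hfov_deg) * face_width_m / distance_m


# The farthest face of interest gives the smallest size (a candidate minSize),
# the nearest one the largest (a candidate maxSize).
far = face_width_px(15 * FEET_TO_METRES)
near = face_width_px(1 * FEET_TO_METRES)
print(f"smallest face ~{far:.0f} px, largest face ~{near:.0f} px")
```

With these assumed numbers a face at 15 feet is only about 18 pixels wide, smaller than the 24×24 training window of haarcascade_frontalface_default.xml, so at that range you would need a higher-resolution frame (or to upscale a region of interest).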
They might not be your final optimal solution, but since they are already implemented and come with the OpenCV package, they could constitute a good starting point.
I've built an imaging system with a webcam and feature matching so that, as I move the camera around, I can track the camera's motion. I am doing something similar to what is done here, except with the webcam frames as the input.
It works really well for "good" images, but when taking images in really low light, lots of noise appears (high camera gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very few features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection because it operates in frequency space, whereas feature detectors operate spatially. Another benefit is that it is very fast compared with feature detection methods. I have a sub-pixel accurate implementation available in the OpenCV trunk, located here.
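To illustrate the principle (not the OpenCV code), here is a toy phase correlation in pure Python. It uses a naive DFT, only recovers integer shifts, and assumes the second frame is a circular shift of the first; OpenCV's cv::phaseCorrelate adds windowing and sub-pixel peak estimation on top of the same idea:

```python
import cmath
import math
import random
from typing import List, Tuple

Image = List[List[float]]


def dft2(img, inverse: bool = False):
    """Naive 2-D DFT, O(N^4): fine for a toy 8x8 example, use an FFT in practice."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = []
    for u in range(rows):
        row = []
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = sign * 2 * math.pi * (u * y / rows + v * x / cols)
                    acc += img[y][x] * cmath.exp(1j * angle)
            row.append(acc / (rows * cols) if inverse else acc)
        out.append(row)
    return out


def phase_correlate(a: Image, b: Image) -> Tuple[int, int]:
    """Integer (dy, dx) such that b is (approximately) a shifted by (dy, dx)."""
    rows, cols = len(a), len(a[0])
    fa, fb = dft2(a), dft2(b)
    # Normalised cross-power spectrum: drop magnitudes, keep the phase difference.
    cross = []
    for u in range(rows):
        row = []
        for v in range(cols):
            c = fb[u][v] * fa[u][v].conjugate()
            row.append(c / abs(c) if abs(c) > 1e-12 else 0j)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    # Ideally a single spike at (dy, dx); take the location of the largest value.
    dy, dx = max(((y, x) for y in range(rows) for x in range(cols)),
                 key=lambda p: corr[p[0]][p[1]].real)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    if dy > rows // 2:
        dy -= rows
    if dx > cols // 2:
        dx -= cols
    return dy, dx


def circular_shift(img: Image, dy: int, dx: int) -> Image:
    """Test helper: out[y][x] = img[y - dy][x - dx] with wrap-around."""
    rows, cols = len(img), len(img[0])
    return [[img[(y - dy) % rows][(x - dx) % cols] for x in range(cols)] for y in range(rows)]


_rng = random.Random(0)
frame = [[_rng.randint(0, 255) for _ in range(8)] for _ in range(8)]
print(phase_correlate(frame, circular_shift(frame, 2, 3)))
```

Because the spectrum magnitudes are normalised away, multiplying the second frame by a constant gain leaves the result unchanged.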
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with it. Think of it like trying to detect translation in a snow storm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
Can you adjust the camera settings to work better in low-light conditions? Have you fully opened the iris? Can you live with lower framerates? Setting a longer exposure time will allow the camera to gather more light, thus giving you more features at the cost of adding motion blur. Or, if low light is your default environment, you probably want something designed for it, like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friend :)
Histogram equalization may help improve the image contrast. However, it can sometimes just amplify the noise. OpenCV has a global histogram equalization function called equalizeHist. For a more localized approach, you'll want to look at Contrast Limited Adaptive Histogram Equalization, or CLAHE for short. Here is a good article on it. This page has some nice examples and some code.
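For reference, global equalization boils down to mapping each grey level through the image's cumulative histogram. A minimal Python sketch of the classic formula (OpenCV's equalizeHist follows the same idea, though its rounding details may differ), treating the image as a flat list of 8-bit values:

```python
from typing import List, Sequence


def equalize_hist(pixels: Sequence[int]) -> List[int]:
    """Global histogram equalisation of 8-bit grey values (classic CDF formula)."""
    if not pixels:
        return []
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the grey levels.
    cdf, running = [0] * 256, 0
    for value, count in enumerate(hist):
        running += count
        cdf[value] = running
    cdf_min = next(c for c in cdf if c > 0)  # CDF at the darkest level present
    denom = len(pixels) - cdf_min
    if denom == 0:
        return list(pixels)  # a single grey level: nothing to stretch
    # Map each level so the output CDF is roughly linear over 0..255 (integer rounding).
    lut = [max(0, (cdf[v] - cdf_min) * 255 + denom // 2) // denom for v in range(256)]
    return [lut[p] for p in pixels]
```

The noise problem is easy to see with equalize_hist([100, 101, 102, 103]): four grey levels one step apart come out as 0, 85, 170 and 255, so small noise differences in a dark, flat region get stretched just as much. CLAHE (cv::createCLAHE in OpenCV) limits this by clipping the histogram in each tile.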
I am working on a project to detect and track fingers. Though I find there are quite a lot of resources on this task, I haven't found an effective one yet :(.
So far I have thought of the following methods to detect hands:
1. Haar training. But first, we don't have a trained set (XML) like the one for face detection. Second, if we do the training ourselves, we don't have enough samples (I am still a college student).
2. Skin color detection in HSV space. I have tried this one, but the result is very noisy, so it doesn't help me continue with the further detection of fingertips.
3. Use Handvu. But I have heard that this library is hard to set up and use on Windows...
So, in a word, can anyone give me any suggestions on how to detect hands effectively? (After that I may consider detecting fingertips.)
Thanks!!
Here is a pretty in-depth paper on finger segmentation using Zernike moments. Here is a good paper on using Zernike moments for image recognition, which serves as background for the first paper.
Can you explain more about your experimental setup? Are you trying to track fingers against a cluttered background, or a plain cardboard sheet?
Haar-like features perform very well for face detection (the Viola-Jones paper being a classic example); however, I would not recommend them for your task. Although they can be computed quickly using the integral image, they only work well within a CASCADED AdaBoost classification framework, which has to be trained on many samples (something you said you lack).
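The "computed quickly using the integral image" part is what makes Haar-like features attractive: once the integral image is built, any rectangle sum costs four lookups. A small Python sketch (the function names and the left-minus-right feature layout are illustrative):

```python
from typing import List, Sequence


def integral_image(img: Sequence[Sequence[int]]) -> List[List[int]]:
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    rows, cols = len(img), len(img[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for y in range(rows):
        row_sum = 0
        for x in range(cols):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w x h rectangle with top-left corner (x, y), in four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return box_sum(ii, x, y, half, h) - box_sum(ii, x + half, y, half, h)
```

The cost of each feature is constant regardless of its size; the expensive part, as noted above, is training the boosted cascade that decides which of these features to evaluate.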
For skin colour detection, it depends on your setup. As a first step, you could try background subtraction: simply learn the distribution (histogram) of pixels for the foreground (i.e. the hand) and for the background, and use these to do image segmentation.
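A minimal Python sketch of that histogram idea, using hue only and comparing the two likelihoods per pixel. The 16 bins, the add-one smoothing, the equal priors and the sample colours are all assumptions for illustration; in OpenCV you would typically build and apply such histograms with cv::calcHist and cv::calcBackProject:

```python
import colorsys
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
BINS = 16  # number of hue bins (an arbitrary choice)


def hue_bin(pixel: RGB) -> int:
    r, g, b = (c / 255.0 for c in pixel)
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)  # h in [0, 1)
    return min(int(h * BINS), BINS - 1)


def learn_histogram(samples: Sequence[RGB]) -> List[float]:
    """Normalised hue histogram; add-one smoothing keeps every bin above zero."""
    counts = [1.0] * BINS
    for p in samples:
        counts[hue_bin(p)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def segment(pixels: Sequence[RGB], fg_hist: List[float], bg_hist: List[float]) -> List[bool]:
    """True where a pixel's hue is more likely under the hand model (equal priors assumed)."""
    return [fg_hist[hue_bin(p)] > bg_hist[hue_bin(p)] for p in pixels]


# Made-up labelled samples: skin-like tones for the hand, bluish tones for the background.
fg_hist = learn_histogram([(220, 160, 130), (200, 140, 110), (230, 170, 140)])
bg_hist = learn_histogram([(40, 60, 200), (30, 80, 180), (60, 60, 220)])
```

Hue is meaningless for grey, very dark or washed-out pixels (colorsys returns a hue of 0 for pure grey), so in practice you would also discard pixels with low saturation or value before classifying.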
I don't know what Handvu is.
Zernike moments are also very good shape descriptors that are rotation invariant and can be made to be both scale and translation invariant.
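To make the rotation invariance concrete, here is a direct (slow) Python sketch of a Zernike moment over a square image mapped onto the unit disk. The magnitude |Z_nm| is what stays unchanged under rotation; translation and scale invariance would additionally require centring and normalising the shape first, which this sketch does not do:

```python
import cmath
import math
from typing import Sequence


def radial_poly(n: int, m: int, rho: float) -> float:
    """Zernike radial polynomial R_n^m(rho)."""
    m = abs(m)
    if m > n or (n - m) % 2:
        raise ValueError("need |m| <= n and n - |m| even")
    return sum(
        (-1) ** k * math.factorial(n - k)
        / (math.factorial(k) * math.factorial((n + m) // 2 - k) * math.factorial((n - m) // 2 - k))
        * rho ** (n - 2 * k)
        for k in range((n - m) // 2 + 1)
    )


def zernike_moment(img: Sequence[Sequence[float]], n: int, m: int) -> complex:
    """Z_nm = (n+1)/pi * sum f(x, y) R_nm(rho) exp(-i m theta) dA over the unit disk."""
    size = len(img)
    pixel_area = (2.0 / size) ** 2
    total = 0j
    for y in range(size):
        for x in range(size):
            # Pixel centre mapped into [-1, 1] x [-1, 1].
            xn = (2 * x + 1 - size) / size
            yn = (2 * y + 1 - size) / size
            rho = math.hypot(xn, yn)
            if rho > 1.0 or img[y][x] == 0:
                continue  # outside the disk, or empty pixel
            theta = math.atan2(yn, xn)
            total += img[y][x] * radial_poly(n, m, rho) * cmath.exp(-1j * m * theta)
    return (n + 1) / math.pi * total * pixel_area


# A small asymmetric binary shape and the same shape rotated by 90 degrees.
blob = [[0] * 8 for _ in range(8)]
for y in range(1, 4):
    for x in range(2, 6):
        blob[y][x] = 1
blob[4][2] = 1  # extra pixel so the shape has no rotational symmetry
rotated = [[blob[7 - x][y] for x in range(8)] for y in range(8)]
```

Rotation only changes the phase of Z_nm (by m times the rotation angle), so comparing magnitudes of a few low-order moments gives a compact rotation-invariant shape descriptor.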
I hope this helps!
What approach would you recommend for finding obstacles in a 2D image?
Here are some key points I have come up with so far:
I doubt I can use object recognition based on a "database of obstacles" search, since I don't know what the obstruction might look like.
I assume color recognition might be problematic if the path does not differ a lot from the object itself.
Possibly, adding one more camera and computing a 3D image (like a Kinect does) would work, but that would not run as smoothly as I require.
To illustrate the problem: the robot can ride on either the left or the right side of the pavement. In the following picture, the left side is the correct choice:
If you know what the path looks like, this is largely a classification problem. Acquire a bunch of images of the path at different distances, illumination, etc. and manually label the ground in each image. Use this labeled data to train a classifier that classifies each pixel as either "road" or "not road." Depending upon the texture of the road, this could be as simple as classifying each pixel's RGB (or HSV) values or using OpenCV's built-in histogram back-projection (e.g. cvCalcBackProjectPatch() in the C API, or cv::calcBackProject() in C++).
I suggest beginning with manual thresholds, moving to histogram-based matching, and only using a full-fledged machine learning classifier (such as a naive Bayes classifier or an SVM) if the simpler techniques fail. Once the entire image is classified, all pixels that are identified as "not road" are obstacles. By classifying the road instead of the obstacles, we completely avoided building a "database of objects".
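A minimal Python sketch of the histogram stage on grey values: build a histogram from manually labelled road pixels, back-project it onto a new image, and call everything below a threshold an obstacle. The 32 bins, the 0.2 threshold and the left/right comparison helper (aimed at the pavement example in the question) are assumptions; cv::calcBackProject is the real OpenCV function for the back-projection step:

```python
from typing import List, Sequence, Tuple

BINS = 32  # intensity bins (an arbitrary choice)


def road_histogram(road_samples: Sequence[int]) -> List[float]:
    """Histogram of labelled road pixels (0..255), scaled so the fullest bin is 1.0."""
    counts = [0] * BINS
    for v in road_samples:
        counts[v * BINS // 256] += 1
    peak = max(counts)
    return [c / peak for c in counts]


def back_project(image: Sequence[Sequence[int]], hist: Sequence[float]) -> List[List[float]]:
    """Replace each pixel by how 'road-like' its bin is (histogram back-projection)."""
    return [[hist[v * BINS // 256] for v in row] for row in image]


def obstacle_mask(image: Sequence[Sequence[int]], hist: Sequence[float],
                  threshold: float = 0.2) -> List[List[bool]]:
    """Everything that does not look like road is treated as an obstacle."""
    return [[p < threshold for p in row] for row in back_project(image, hist)]


def obstacle_fraction_by_side(mask: Sequence[Sequence[bool]]) -> Tuple[float, float]:
    """Hypothetical helper: fraction of obstacle pixels in the left and right halves."""
    width = len(mask[0])
    left = [v for row in mask for v in row[: width // 2]]
    right = [v for row in mask for v in row[width // 2:]]
    return sum(left) / len(left), sum(right) / len(right)
```

The side with the smaller obstacle fraction would be the candidate path; a real system would probably weight pixels near the robot more heavily.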
Somewhat out of the scope of the question, the easiest solution is to add additional sensors ("throw more hardware at the problem!") and directly measure the three-dimensional position of obstacles. In order of preference:
Microsoft Kinect: Cheap, easy, and effective. Due to ambient IR light, it only works indoors.
Scanning Laser Rangefinder: Extremely accurate, easy to set up, and works outside. Also very expensive (~$1200-10,000 depending upon maximum range and sample rate).
Stereo Camera: Not as good as a Kinect, but it works outside. If you cannot afford a pre-made stereo camera (~$1800), you can make a decent custom stereo camera using USB webcams.
Note that professional stereo vision cameras can be very fast by using custom hardware (Stereo On-Chip, STOC). Software-based stereo is also reasonably fast (10-20 Hz) on a modern computer.
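For a rectified stereo pair, depth follows from disparity as Z = f·B/d, and a one-pixel disparity error costs roughly Z²/(f·B) in depth. A small Python sketch; the 700-pixel focal length and 10 cm baseline in the example are assumed values for a pair of USB webcams:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo for a rectified pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means 'at infinity')")
    return focal_px * baseline_m / disparity_px


def depth_step(focal_px: float, baseline_m: float, depth_m: float,
               disparity_step_px: float = 1.0) -> float:
    """Approximate depth change for a disparity error: dZ ~ Z^2 / (f * B) * dd."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_step_px


# Assumed webcam rig: f = 700 px, B = 0.1 m.
z = depth_from_disparity(700, 0.1, 35)
print(f"depth {z:.2f} m, +/- {depth_step(700, 0.1, z):.3f} m per pixel of disparity")
```

With those numbers a point 2 m away has a disparity of 35 pixels and a depth step of about 6 cm per pixel of disparity; the error grows with the square of the distance, which is why a wider baseline helps for a home-made rig.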
How can we detect rapid motion and objects simultaneously? Let me give an example.
Suppose there is a soccer match video, and I want to detect the position of each and every player with maximum accuracy. I was thinking about human detection, but in a soccer match video human detection adds nothing special, because we can consider the humans as objects. Maybe we can do this with blob detection, but there are many problems with blobs, like:
1) I want to separate each and every player, so if players collide, blob detection will not help; it will be a problem to identify each player separately.
2) The second problem will be the stadium lights.
So is there any particular algorithm, method or library to do this?
I've seen some research papers but I'm not satisfied, so please suggest anything related to this: articles, algorithms, libraries, methods, research papers, etc. And please, everyone, share your views on this.
For fast and reliable human detection, Dalal and Triggs' Histograms of Oriented Gradients (HOG) detector is generally accepted as very good. Have you tried playing with that?
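The core of HOG is a per-cell histogram of gradient orientations weighted by gradient magnitude. A simplified Python sketch for a single cell (9 unsigned bins as in Dalal and Triggs, but without their bin interpolation or block normalisation); in practice OpenCV's cv::HOGDescriptor with HOGDescriptor::getDefaultPeopleDetector() gives you a ready-made pedestrian detector:

```python
import math
from typing import List, Sequence


def hog_cell_histogram(cell: Sequence[Sequence[float]], bins: int = 9) -> List[float]:
    """Orientation histogram for one HOG cell (unsigned gradients, 0-180 degrees).

    Uses centred [-1, 0, 1] differences on interior pixels and adds each pixel's
    gradient magnitude to the bin containing its orientation.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, len(cell) - 1):
        for x in range(1, len(cell[0]) - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A vertical edge puts all of its weight in the 0° bin and a horizontal edge in the bin around 90°; the full descriptor concatenates normalised blocks of such cell histograms and feeds them to a linear SVM.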
Since you mentioned rapid motion changes, are you worried about fast camera motion or fast player/ball motion?
You can do 2D or 3D video stabilization to fix camera motion (try the excellent Deshaker plugin for VirtualDub).
For fast player motion, background subtraction or other blob detection will definitely help. You can use that to get a rough kinematic estimate and use that as an estimate of your blur kernel. This can then be used to deblur the image chip containing the player.
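A rough Python sketch of going from a blob's motion to a blur kernel. It assumes constant velocity between frames and a straight-line blur; the helper names, and scaling the per-frame displacement by exposure time × frame rate, are assumptions rather than a specific published method. The resulting kernel would then go into a non-blind deconvolution such as Wiener or Richardson-Lucy:

```python
import math
from typing import List, Tuple


def blur_displacement(vx: float, vy: float, exposure_s: float, fps: float) -> Tuple[float, float]:
    """Pixels travelled while the shutter is open.

    vx, vy are in pixels per frame (e.g. the frame-to-frame displacement of a
    tracked blob's centroid); constant velocity is assumed.
    """
    open_fraction = exposure_s * fps  # fraction of the frame interval spent exposing
    return vx * open_fraction, vy * open_fraction


def linear_motion_kernel(dx: float, dy: float, samples: int = 101) -> List[List[float]]:
    """Normalised point-spread function for straight-line motion by (dx, dy) pixels."""
    if samples < 2:
        raise ValueError("need at least two samples along the motion segment")
    half = math.ceil(max(abs(dx), abs(dy)) / 2)
    size = 2 * half + 1  # odd size so the motion segment is centred
    kernel = [[0.0] * size for _ in range(size)]
    for i in range(samples):
        t = i / (samples - 1) - 0.5  # runs from -0.5 to 0.5 along the segment
        col = int(math.floor(half + t * dx + 0.5))
        row = int(math.floor(half + t * dy + 0.5))
        kernel[row][col] += 1.0 / samples  # each sample carries equal energy
    return kernel


# A player moving 20 px/frame at 25 fps with a 1/100 s exposure smears about 5 px.
dx, dy = blur_displacement(20, 0, 0.01, 25)
psf = linear_motion_kernel(dx, dy)
```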
You can do additional processing to establish identity based upon OCRing jersey numbers, etc.
You mentioned concern about lights on the stadium. Is the main issue that they will cast shadows? That can be dealt with by the HOG detector. Blob detection to get the blur kernel should still work fine with the shadow.
If you have control over the camera, you may want to reduce exposure times to reduce blur. Denoising techniques can be used to reduce the CCD noise that occurs in extreme low light, and dense optical flow approaches can align the frames and boost the signal back up to something reasonable by adding the denoised frames.
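The reason adding aligned frames helps: with independent noise, averaging N frames cuts the noise standard deviation by about √N. A small synthetic Python sketch, assuming the frames have already been registered (e.g. with dense optical flow such as cv::calcOpticalFlowFarneback) and that the noise is independent Gaussian:

```python
import random
import statistics
from typing import List, Sequence


def stack_frames(frames: Sequence[Sequence[float]]) -> List[float]:
    """Pixel-wise mean of frames that have ALREADY been aligned (flat pixel lists)."""
    n = len(frames)
    return [sum(values) / n for values in zip(*frames)]


# Synthetic check: a dim static scene plus independent Gaussian sensor noise (sigma = 10).
rng = random.Random(42)
truth = [rng.uniform(0.0, 50.0) for _ in range(2000)]
frames = [[t + rng.gauss(0.0, 10.0) for t in truth] for _ in range(16)]

single_err = statistics.pstdev([f - t for f, t in zip(frames[0], truth)])
stacked_err = statistics.pstdev([s - t for s, t in zip(stack_frames(frames), truth)])
print(f"noise: one frame {single_err:.1f}, 16 stacked frames {stacked_err:.1f}")
```

With 16 frames the residual noise should drop to roughly a quarter of the single-frame level; any misalignment left over from the flow step shows up as blur instead.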