HOG Person Detector: Why are background objects detected as people? - opencv

I am working on a project which involves detection of people in various frames. The detector is able to detect most of the people in the frame sequence.
But it sometimes detects stationary background objects as people. I would really like to know why this happens and how the detector's inner workings lead to these false positives.
And what can be done to remove these false positives?
A sample of false positive detection:

As the authors of this paper imply in the title, "How Far are We from Solving Pedestrian Detection?", we have not yet solved the problem of visual pedestrian detection in real scenarios; in fact, some think it will never be completely solved.
Detecting people in urban scenarios may rank among the most difficult tasks in computer vision. Scenes are cluttered with chaotic, random, and unpredictable elements; pedestrians may be occluded, hidden in shadow, or in environments so dark that a camera cannot see them. Visual pedestrian detection remains one of the field's most important challenges to date.
And you aren't even using the best method in the state of the art: as you can see in the graphic below, it's been a long time since HOG was the best-performing algorithm for this task.
(image taken from "Pedestrian Detection: An Evaluation of the State of the Art" by Piotr Dollar, Christian Wojek, Bernt Schiele, and Pietro Perona)
That paper is already a bit outdated, but you can see that even the best-performing algorithms still do not perform brilliantly on image datasets, let alone in real scenarios.
So, to answer your question: what can you do to improve its performance? It depends. If there are assumptions you can make in your specific scenario that simplify the problem, then you may be able to eliminate some false positives. Another way to improve results, and what every Advanced Driver Assistance System (ADAS) does, is to fuse information from different sensors to help the visual system. Most use LIDAR and RADAR to feed the camera with places to look at, which helps the algorithm in both accuracy and speed.
So, as you can see, it is very application dependent. If your application is meant to work in a simple scenario, then a background subtraction algorithm will help remove false detections. You can also bootstrap your classifier with the wrongly detected data to improve its performance.
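For instance, here is a minimal sketch (assuming Python and OpenCV; the video path and the 20% foreground ratio are illustrative, not recommendations) of how background subtraction can gate the HOG detections, keeping only boxes that overlap moving foreground:

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    bg_sub = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=False)

    cap = cv2.VideoCapture("video.mp4")  # hypothetical input
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        fg_mask = bg_sub.apply(frame)  # non-zero where the scene is moving
        rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        for (x, y, w, h) in rects:
            roi = fg_mask[y:y + h, x:x + w]
            # keep a detection only if enough of it is moving foreground;
            # the 0.2 ratio is an assumption to tune per scene
            if roi.size and cv2.countNonZero(roi) / roi.size > 0.2:
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)

Stationary poles and signs never enter the background model's foreground mask, so their detections get dropped; the obvious trade-off is that a person who stands still for a long time will fade into the background too.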
But know one thing: there is no 100% in computer vision, no matter how hard you try. It is always a balance between accepting false positives and system robustness.
Cheers.
EDIT: To answer the question in the title, why are background objects detected as people? HOG is all about evaluating the edges of an image, and you are probably feeding the HOG features to an SVM, right? The vertical pole detected in the image you provide shares some visual properties with humans, such as its strong vertical edges. That is why these algorithms fail a lot on traffic signs and other vertical elements, as you can see in my master's thesis on this topic: Visual Pedestrian Detection using Integral Channels for ADAS
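As a concrete illustration of that HOG-into-SVM pipeline: OpenCV's built-in people detector exposes the SVM confidence of each window, and weak hits (poles often score near the decision boundary) can be filtered out. This is only a sketch; the file name and the 0.5 cutoff are assumptions to tune, not recommended values:

    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

    img = cv2.imread("frame.png")  # hypothetical frame with a false positive
    rects, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)
    for (x, y, w, h), score in zip(rects, weights.ravel()):
        # score reflects the window's distance from the SVM hyperplane;
        # low-scoring windows are typically the pole/sign false positives
        color = (0, 255, 0) if score > 0.5 else (0, 0, 255)
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)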

Related

Recognize objects while falling - viewpoint variation

My problem statement is to recognize 10 classes of variations (in color and size) of the same object (a bottle cap) while it is falling, taking into account that the camera sees different viewpoints of the object. I have split this into sub-tasks:
1) Trained a deep learning model to classify only the flat surface of the object - successful in this attempt.
Flat Faces of sample 2 class
2) Instead of taking the fall into account, trained a model on possible perspective changes - not successful.
Perspective changes of sample 2 class
What approaches can recognize the object even under perspective changes? I am not constrained to a single-camera solution, and I am open to ideas for approaching this problem of variable perspectives.
Any help could be really appreciated, Thanks in advance!
The answer I want to give you is: CapsNets
You should definitely check out the paper, where you will be introduced to some shortcomings of CNNs and how the authors tried to fix them.
That said, I find it hard to believe that your architecture cannot solve the problem when the perspective changes. Is your dataset extremely small? I'd expect the neural network to learn filters for the ridged edges, which can be seen from all perspectives.
If you're not limited to one camera, you could train a "normal" classifier, feed it multiple images in production, and average the predictions. Or you could build an architecture that takes in multiple perspectives at once. You have to try for yourself what works best.
Also, never underestimate the power of old-school image preprocessing. If you have 3 different perspectives, you could take the one that comes closest to the "flat" perspective. This is probably as easy as using the image with the largest colored area, where img.sum() is the highest.
Another idea is to figure out the color through explicit programming, which should be fairly easy, and then feed the network a grayscale image. Maybe your network is confused by the strong correlation with color and ignores the shape altogether.
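A minimal sketch of both preprocessing ideas (assuming Python/OpenCV; the file names, the background threshold, and the colored_area helper are all illustrative): pick the view with the largest colored area as the "flattest" one, read the color off explicitly, and hand the network a grayscale crop:

    import cv2
    import numpy as np

    def colored_area(img, bg_thresh=30):
        # count pixels that differ noticeably from a dark/plain background
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        return int(np.count_nonzero(gray > bg_thresh))

    views = [cv2.imread(p) for p in ("cam0.png", "cam1.png", "cam2.png")]
    flattest = max(views, key=colored_area)  # view closest to the flat face

    mean_bgr = cv2.mean(flattest)[:3]  # explicit color estimate for the cap
    gray_input = cv2.cvtColor(flattest, cv2.COLOR_BGR2GRAY)  # shape-only input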

OpenCV HOG person detection is fooled by vertical lines?

I've been testing openCV on an RPi using Python. The video is coming from a USB grabber from a CCTV camera.
I tested it in a room with 'ideal' stick figures and it worked great, tracking and zooming automatically.
However when testing in the real world, the first test location has a corrugated roof in view and the vertical lines of the roof always get detected as a person.
I was very surprised by this, as the HOG detection seemed quite robust against bushes, trees and other optically jumbled scenes. A series of vertical lines seems to catch it out every time.
Why might this be?
Do I need to look at trying to re-train it? I would imagine this would be quite a task!
Has anyone else found this issue?
Maybe I should try and pre-filter the vertical lines out of the image?
Having a person tracker that can't cope with fences or roofs is a bit of a limitation!
Having false positives after just a single training session is common and should be expected. You should now record all these false positives and use them for hard negative training. That is, you would add these false positives to the negative training set. Once you perform hard negative training, your model should perform much better and the number of false positives will drop.
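A sketch of that collection step (assuming Python/OpenCV and footage known to contain no people; the paths are illustrative): every detection on such footage is by definition a false positive, so crop it and save it as a negative for the next training round:

    import os
    import cv2

    hog = cv2.HOGDescriptor()
    hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
    os.makedirs("hard_negatives", exist_ok=True)

    cap = cv2.VideoCapture("empty_scene.mp4")  # footage with no people in it
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        rects, _ = hog.detectMultiScale(frame, winStride=(8, 8))
        for (x, y, w, h) in rects:
            # resize to the standard 64x128 HOG window so the crop drops
            # straight into the negative training set
            crop = cv2.resize(frame[y:y + h, x:x + w], (64, 128))
            cv2.imwrite("hard_negatives/neg_%05d.png" % idx, crop)
            idx += 1

Note that this assumes you are training your own HOG+SVM model; the default OpenCV people detector shown here stands in for whatever detector you are retraining.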
Understanding why the fence and other edges show up as false positives is a bit complicated and is better explained by the many articles on HOG and the original paper by Dalal and Triggs, which I would highly recommend.

Real time tracking of hand

I am trying to detect and track a hand in real time using OpenCV. I thought Haar cascade classifiers would yield a fair result. After training with 10k positive and 20k negative images respectively, I obtained a classifier XML file. Unfortunately, it detects the hand only in certain positions, suggesting that it works best only for rigid objects. So I am now thinking of adopting another algorithm to track the hand once it has been detected by the Haar classifier.
My question is: if I make sure the Haar classifier detects the hand in a certain frame and position, what method would yield robust tracking of the hand from there on?
I searched the web a bit and understood I can go for optical flow of the detected hand, or a Kalman filter or particle filter, but I have also come across their respective disadvantages.
Also, if I incorporate stereo vision, would it help me, as I could possibly reconstruct the hand in 3D?
You concluded rightly about Haar features - they aren't that useful when it comes to non-rigid objects.
Take a look at the following papers which use skin colour to detect hands.
Interaction between hands and wearable cameras
Markerless inspection of augmented reality objects
and this paper that uses KLT features to track the hand after the first detection:
Fast 2D hand tracking with flocks of features and multi-cue integration
I would say that a stereo camera will not help your cause much, as 3D reconstruction of non-rigid objects isn't straightforward and would require a whole lot of innovation and development. However, you can take a look at the papers in the hand pose estimation section of this page if you wish to pursue 3D tracking.
EDIT: Also take a look at this recent paper, which seems to get good results.
Zhang et al.'s Real-time Compressive Tracking does a reasonable job of tracking an object, once it has been detected by some other method, provided that the motion is not too fast. They have an OpenCV implementation (but it would need a bit of work to reuse).
This research paper describes a method to track hands without gloves, using a stereo camera setup.
There have been similar questions on Stack Overflow...
Have a look at my answer and those of others: https://stackoverflow.com/a/17375647/1463143
You can certainly get better results by avoiding Haar training and detection for deformable entities.
The CamShift algorithm is generally fast and accurate if you want to track the hand as a single entity. The OpenCV documentation contains a good, easy-to-understand demo program that you can easily modify.
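A condensed sketch along the lines of that demo (assuming Python/OpenCV; the seed box and the color mask bounds are illustrative placeholders for your Haar detection and a proper skin model):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    x, y, w, h = 200, 150, 80, 80  # hypothetical hand detection
    track_window = (x, y, w, h)

    # hue histogram of the detected hand region
    roi = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(roi, (0, 60, 32), (180, 255, 255))  # crude color mask
    hist = cv2.calcHist([roi], [0], mask, [180], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    term = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        back = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        rot_box, track_window = cv2.CamShift(back, track_window, term)
        pts = cv2.boxPoints(rot_box).astype(np.intp)
        cv2.polylines(frame, [pts], True, (0, 255, 0), 2)

Because it tracks a color histogram rather than a shape, CamShift keeps up with the hand's deformation, which is exactly where the Haar cascade struggled.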
If you need to track fingers etc., however, further modeling will be needed.

How can HOG be used to detect individual body parts

Information:
I would like to use OpenCV's HOG detection to identify objects that can be seen in a variety of orientations. The only problem is, I can't seem to find a reasonable feature detector or classifier that detects in a rotation- and scale-invariant way (as is needed for objects such as forearms).
Prior Work:
Let's focus on forearms for this discussion. A forearm can have multiple orientations, its primary distinguishing features probably being its contour edges. Images may contain forearms pointing in any direction, hence the complexity. So far I have done some in-depth research on using HOG descriptors to solve this problem, but I am finding that the variety of poses in my positive training set produces very low detection scores in actual images. I suspect the issue is that the gradients produced by the positive images do not yield very consistent histograms. I have reviewed many research papers on the topic trying to resolve or improve this, including the original by Dalal & Triggs: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf. It also seems that the assumptions made for detecting whole humans do not necessarily apply to detecting individual body parts (in particular, the assumption that all humans are standing upright suggests HOG is not a good route for rotation-invariant detection such as that of forearms).
Note:
If possible, I would like to steer clear of any non-free solutions such as those pertaining to SIFT, SURF, or Haar.
Question:
What is a good solution for detecting objects in an image in a rotation- and scale-invariant way? In particular for this example, what would be a good solution for detecting all orientations of forearms in an image?
I use HOG to detect human heads and shoulders. To train on a particular part, you have to give its location. If you use OpenCV, you can clip samples so they contain only the part you want to train on, and make sure all training samples share the same size. For example, I clip images to contain only the head and shoulders and resize them all to 64x64. Other open-source code may require you to pass the location as an input parameter, which is essentially the same thing.
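A sketch of that preparation step (assuming Python/OpenCV; the annotation list is illustrative): clip each labeled head-and-shoulder box, resize it to a common 64x64 window, and compute one fixed-length HOG vector per sample:

    import cv2

    win_size = (64, 64)
    # constructor args: winSize, blockSize, blockStride, cellSize, nbins
    hog = cv2.HOGDescriptor(win_size, (16, 16), (8, 8), (8, 8), 9)

    annotations = [("img0.png", (120, 40, 90, 90))]  # hypothetical boxes
    samples = []
    for path, (x, y, w, h) in annotations:
        img = cv2.imread(path)
        crop = cv2.resize(img[y:y + h, x:x + w], win_size)
        samples.append(hog.compute(crop).ravel())  # same length for every sample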
Are you trying the discriminatively trained deformable part model? http://www.cs.berkeley.edu/~rbg/latent/
You may find answers there.

Feature Detection in Noisy Images

I've built an imaging system with a webcam and feature matching such that, as I move the camera around, I can track the camera's motion. I am doing something similar to here, except with the webcam frames as the input.
It works really well for "good" images, but when taking images in really low light, lots of noise appears (high camera gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very few features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection because it operates in frequency space, whereas feature detectors operate spatially. Another benefit is that it is very fast when compared with feature detection methods. A sub-pixel-accurate implementation of mine is available in the OpenCV trunk, located here.
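A minimal sketch of that call (assuming Python/OpenCV and two illustrative frame files): cv2.phaseCorrelate wants single-channel floating-point input, and a Hanning window helps suppress edge effects:

    import cv2
    import numpy as np

    a = cv2.imread("frame_a.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    b = cv2.imread("frame_b.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # window size is (width, height), hence the reversed shape
    window = cv2.createHanningWindow(a.shape[::-1], cv2.CV_32F)
    (dx, dy), response = cv2.phaseCorrelate(a, b, window)
    print("shift: (%.2f, %.2f) px, peak response: %.3f" % (dx, dy, response))

The returned peak response roughly indicates how trustworthy the shift estimate is, which is useful in exactly the "greenout" situations described below.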
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with it. Think of it like trying to detect translation in a snow storm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
Can you adjust the camera settings to work better in low-light conditions? Have you fully opened the iris? Can you live with lower frame rates? Setting a longer exposure time will allow the camera to gather more light, giving you more features at the cost of added motion blur. Or, if low light is your default environment, you probably want something designed for it, like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friend :)
Histogram equalization may be of interest for improving the image contrast, but sometimes it just enhances the noise. OpenCV has a global histogram-equalization function called equalizeHist. For a more localized implementation, you'll want to look at Contrast Limited Adaptive Histogram Equalization, or CLAHE for short. Here is a good article on it. This page has some nice examples, and some code.
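A short sketch comparing the two (assuming Python/OpenCV; the file name is illustrative, and clipLimit and tileGridSize are the usual knobs to tune):

    import cv2

    gray = cv2.imread("lowlight.png", cv2.IMREAD_GRAYSCALE)

    global_eq = cv2.equalizeHist(gray)  # global equalization, may amplify noise

    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    local_eq = clahe.apply(gray)  # contrast-limited, more local adaptation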
