HOG Person Detector: False Positive detections on background subtracted images - opencv

I am working on a project which requires detection of people in a scene.
Initially after running the HOG detector on the original frames a particular background object was being detected as a person on all the frames, giving me 3021 false positive detections.
So I took the logical step to remove the static background by applying a background subtracter (BackgroundSubtractorMOG2) to all the frames.
The resulting frames looked like this:
Then these mask images were added (using bitwise_and) to the original image so the white pixels are replaced the pixels constituting the person.
Sample:
Then I ran the HOG detector on these images which gave the results like this:
As you can see there are a lot of false positive detections for some reason. I thought doing background subtraction will give me better results than using HOG on the original images.
Can someone please tell me why there are so many false positives in this method? And what can be done to improve the detection on background subtracted images?

The problem is that you changed the nature of your image by removing the background. So, the HOG detector was trained with normal images, without artificial black pixels, and now you are feeding it artificially altered images, so it is normal that it will perform in an weird way (still don't understand that detection at the top of the image though..)
If you want to use HOG detector on top of the background subtraction, you should train the HOG classifier with features taken from the background subtracted images.
One thing you can try (if this doesn't kill the performance of you application), is to use HOG detector on both images, with and without background, and accept only detections that overlap significantly on both, this may remove some false positives from both images.
PS: HOG was specially designed to work on raw images by detecting strong edges and test them against an SVM model. By removing background, we are creating artificial edges that kinda defeat the purpose of using HOG. But I think you can use it to remove false detections by doing what I suggested in the previous paragraph.

Related

Background Subtraction in OpenCV

I am trying to subtract two images using absdiff function ,to extract moving object, it works good but sometimes background appears in front of foreground.
This actually happens when the background and foreground colors are similar,Is there any solution to overcome this problem?
It may be description of the problem above not enough; so I attach images in the following
link .
Thanks..
You can use some pre-processing techniques like edge detection and some contrast stretching algorithm, which will give you some extra information for subtracting the image. Since color is same but new object should have texture feature like edge; if the edge gets preserved properly then when performing image subtraction you will obtain the object.
Process flow:
Use edge detection algorithm.
Contrast stretching algorithm(like histogram stretching).
Use the detected edge top of the contrast stretched image.
Now use the image subtraction algorithm from OpenCV.
There isn't enough information to formulate a complete solution to your problem but there are some tips I can offer:
First, prefilter the input and background images using a strong
median (or gaussian) filter. This will make your results much more
robust to image noise and confusion from minor, non-essential detail
(like the horizontal lines of your background image). Unless you want
to detect a single moving strand of hair, you don't need to process
the raw pixels.
Next, take the advice offered in the comments to test all 3 color
channels as opposed to going straight to grayscale.
Then create a grayscale image from the the max of the 3 absdiffs done
on each channel.
Then perform your closing and opening procedure.
I don't know your requirements so I can't take them into account. If accuracy is of the utmost importance. I'd use the median filter on input image over gaussian. If speed is an issue I'd scale down the input images for processing by at least half, then scale the result up again. If the camera is in a fixed position and you have a pre-calibrated background, then the current naive difference method should work. If the system has to determine movement from a real world environment over an extended period of time (moving shadows, plants, vehicles, weather, etc) then a rolling average (or gaussian) background model will work better. If the camera is moving you will need to do a lot more processing, probably some optical flow and/or fourier transform tests. All of these things need to be considered to provide the best solution for the application.

Haar classifiers for object with varied poses

I'm trying to perform image recognition of large birds, and since the camera is moving, my usual tactic of background removal and shape recognition won't be effective. Since all adult birds of the species are remarkably similar in coloration, I was thinking that Haar classifiers may be effective, and found some resources to try and train my own classifier.
Negative images should be significantly larger than the positive images, and present similar environments but not contain the positive image. However, I haven't been able to find too many details on what makes for a good positive training image. I found some references to trying to keep a similar aspect ratio in all positive training images, but for something like a bird that can change dramatically in different poses (wings open, wings closed), is it crucial? Is it better to train multiple classifiers for each pose and orientation of the target to be recognized? Is it better to instead try to identify a subset of the object that is relatively consistent (like the bird's very distinctive black and white head)?
If I use a sub-feature like the head, does it matter if it is "mirrored" in some views? Should I artificially make my positive images face the same direction, and when running the classifier on an image, run it twice, once mirrored? What are the considerations made when designing the positive image set for a classifier?

A suitable workflow to detect and classify blurs in images? [duplicate]

I had asked this on photo stackexchange but thought it might be relevant here as well, since I want to implement this programatically in my implementation.
I am trying to implement a blur detection algorithm for my imaging pipeline. The blur that I want to detect is both -
1) Camera Shake: Pictures captured using hand which moves/shakes when shutter speed is less.
2) Lens focussing errors - (Depth of Field) issues, like focussing on a incorrect object causing some blur.
3) Motion blur: Fast moving objects in the scene, captured using a not high enough shutter speed. E.g. A moving car a night might show a trail of its headlight/tail light in the image as a blur.
How can one detect this blur and quantify it in some way to make some decision based on that computed 'blur metric'?
What is the theory behind blur detection?
I am looking of good reading material using which I can implement some algorithm for this in C/Matlab.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective as they depend on the intent on the photographer. Without knowing what the photographer wanted to focus on, it's impossible to achieve this. And even if you do know what you wanted to focus on, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
There are more, but I feel that that should be enough to get you started. If you require further info on either of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea about what has been tried in the past.
There is a great paper called : "analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus) , which does a comparison about 30 different techniques.
Out of all the different techniques, the "Laplacian" based methods seem to have the best performance. Most image processing programs like : MATLAB or OPENCV have already implemented this method . Below is an example using OpenCV : http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, if an image contains portrait photography, the image in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such a spatially varying blur in an image is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (cvpr2017).
the authors look at multi resolution DCT coefficients at every pixel. These DCT coefficients are divided into low, medium, and high frequency bands, out of which only the high frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients
A subset of these coefficients are selected. the number of selected coefficients is a tunable parameter which is application specific.
The selected subset of coefficients are then sent through a max pooling block to retain the highest activation within all the scales. This gives the blur map as the output, which is then sent through a post processing step to refine the map.
This blur map can be used to quantify the sharpness in various regions of the image. In order to get a single global metric to quantify the bluriness of the entire image, the mean of this blur map or the histogram of this blur map can be used
Here are some examples results on how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The github link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The python implementation of this algorithm can be found on pypi which can easily be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2
if __name__ == '__main__':
img = cv2.imread('image_name', 0)
blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4, scale_start=2, num_iterations_RF_filter=3)
cv2.imshow('ori_img', img)
cv2.imshow('blur_map', blur_map)
cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this github link: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. In this ROI, I then use DCT coefficients as feature extractors and train a simple multi-layer perceptron. On testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/) I got an average accuracy of 94%
Just to add on the focussing errors, these may be detected by comparing the psf of the captured blurry images (wider) with reference ones (sharper). Deconvolution techniques may help correcting them but leaving artificial errors (shadows, rippling, ...). A light field camera can help refocusing to any depth planes since it captures the angular information besides the traditional spatial ones of the scene.

Reduce false detection in Pedestrian Detection

I am using OpenCV sample code “peopledetect.cpp” to detect pedestrians.
The code uses HoG for feature extraction and SVM for classification. Please find the reference paper used here.
The camera is mounted on the wall at a height of 10 feet and 45o down. There is no restriction on the pedestrian movement within the frame.
I am satisfied with the true positive rate (correctly detecting pedestrians) but false positive rate is very high.
Some of the false detections I observed are moving car, tree, and wall among others.
Can anyone suggest me how to improve the existing code to reduce false detection rate.
Any reference to blogs/codes is very helpful.
You could apply a background subtraction algorithm on your video stream. I had some success on a similar project using BackgroundSubtractorMOG2.
Another trick I used is to eliminate all "moving pixels" that are too small or with a wrong aspect ratio. I did this by doing a blob/contour analysis of the background subtraction output image. You need to be careful with the aspect ratio to make sure you support overlapping pedestrians.
Note that the model you're using (not sure which) is probably trained on a front faced pedestrian and not with a 45 degrees angle down. This will obviously affect your accuracy.

Detection of Blur in Images/Video sequences

I had asked this on photo stackexchange but thought it might be relevant here as well, since I want to implement this programatically in my implementation.
I am trying to implement a blur detection algorithm for my imaging pipeline. The blur that I want to detect is both -
1) Camera Shake: Pictures captured using hand which moves/shakes when shutter speed is less.
2) Lens focussing errors - (Depth of Field) issues, like focussing on a incorrect object causing some blur.
3) Motion blur: Fast moving objects in the scene, captured using a not high enough shutter speed. E.g. A moving car a night might show a trail of its headlight/tail light in the image as a blur.
How can one detect this blur and quantify it in some way to make some decision based on that computed 'blur metric'?
What is the theory behind blur detection?
I am looking of good reading material using which I can implement some algorithm for this in C/Matlab.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective as they depend on the intent on the photographer. Without knowing what the photographer wanted to focus on, it's impossible to achieve this. And even if you do know what you wanted to focus on, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
There are more, but I feel that that should be enough to get you started. If you require further info on either of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea about what has been tried in the past.
There is a great paper called : "analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus) , which does a comparison about 30 different techniques.
Out of all the different techniques, the "Laplacian" based methods seem to have the best performance. Most image processing programs like : MATLAB or OPENCV have already implemented this method . Below is an example using OpenCV : http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, if an image contains portrait photography, the image in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such a spatially varying blur in an image is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (cvpr2017).
the authors look at multi resolution DCT coefficients at every pixel. These DCT coefficients are divided into low, medium, and high frequency bands, out of which only the high frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients
A subset of these coefficients are selected. the number of selected coefficients is a tunable parameter which is application specific.
The selected subset of coefficients are then sent through a max pooling block to retain the highest activation within all the scales. This gives the blur map as the output, which is then sent through a post processing step to refine the map.
This blur map can be used to quantify the sharpness in various regions of the image. In order to get a single global metric to quantify the bluriness of the entire image, the mean of this blur map or the histogram of this blur map can be used
Here are some examples results on how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The github link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The python implementation of this algorithm can be found on pypi which can easily be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2
if __name__ == '__main__':
img = cv2.imread('image_name', 0)
blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4, scale_start=2, num_iterations_RF_filter=3)
cv2.imshow('ori_img', img)
cv2.imshow('blur_map', blur_map)
cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this github link: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. In this ROI, I then use DCT coefficients as feature extractors and train a simple multi-layer perceptron. On testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/) I got an average accuracy of 94%
Just to add on the focussing errors, these may be detected by comparing the psf of the captured blurry images (wider) with reference ones (sharper). Deconvolution techniques may help correcting them but leaving artificial errors (shadows, rippling, ...). A light field camera can help refocusing to any depth planes since it captures the angular information besides the traditional spatial ones of the scene.

Resources