Histogram of Oriented Gradients in multi-scale (mean-shift?) - image-processing

I am working on HOG descriptors and I am pretty much done with most of the parts, except the fusion of the detection windows.
What I have done so far: I build a scale-space pyramid of the image, and at each scale I slide the detection window (64x128) over the image to detect humans. In each image, a person is detected by more than one window.
So the question is how to fuse all these windows (assume they belong to one person) into one window. Dalal suggests using a robust mode detection algorithm, such as mean shift. But I have multiple scales... Should I first estimate the true location of the detection windows found at the lower levels of the scale space in order to do that?
Any help is appreciated.
Thanks in advance.

My interpretation is that mean shift would give you in effect what you are suggesting.
Essentially, you first estimate the probability distribution of the person's location at the coarsest scale, based on the strengths of the detector outputs. This gives you a robust estimate of the mode.
You can then iteratively refine the estimate using the finer scales around the maximum, i.e. the mode.
The idea is very similar to that used in pyramidal LK tracking, for example. You can also do ensemble processing and/or use particle filters.
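To make the fusion step concrete, here is a minimal sketch of score-weighted mean shift over detections from all scales at once. It assumes each window centre has already been mapped back to original-image coordinates and that scores are positive. The function name `fuse_detections`, the bandwidth values and the merge threshold are illustrative choices, not something taken from Dalal's work or from any library.

```python
import numpy as np

def fuse_detections(dets, bandwidth=(16.0, 32.0, 0.3), n_iter=50, tol=1e-3):
    """Fuse overlapping detections with score-weighted mean shift.

    dets: rows of (x, y, scale, score). x, y are window centres in
    original-image coordinates, scale is the pyramid factor.
    Returns one (x, y, scale) row per mode found.
    """
    dets = np.asarray(dets, dtype=float)
    # Work in (x, y, log scale) so that scale ratios become distances.
    pts = np.column_stack([dets[:, 0], dets[:, 1], np.log(dets[:, 2])])
    w = np.maximum(dets[:, 3], 1e-6)   # assumption: scores act as positive weights
    h = np.asarray(bandwidth, dtype=float)

    modes = pts.copy()                 # start one estimate at every detection
    for _ in range(n_iter):
        # Bandwidth-normalised differences between estimates and data points.
        d = (modes[:, None, :] - pts[None, :, :]) / h
        k = w[None, :] * np.exp(-0.5 * np.sum(d ** 2, axis=2))
        new = (k @ pts) / k.sum(axis=1, keepdims=True)
        shift = np.abs(new - modes).max()
        modes = new
        if shift < tol:
            break

    # Collapse estimates that converged to the same mode.
    fused = []
    for m in modes:
        if not any(np.all(np.abs(m - f) / h < 0.5) for f in fused):
            fused.append(m)
    fused = np.array(fused)
    fused[:, 2] = np.exp(fused[:, 2])  # back from log scale to scale
    return fused

# Two people: three windows around (100, 200), two around (400, 220).
dets = [(100, 200, 1.0, 1.0), (104, 198, 1.2, 0.8), (98, 203, 1.0, 0.9),
        (400, 220, 2.0, 1.2), (405, 224, 2.0, 0.7)]
fused = fuse_detections(dets)
print(fused)
```

Each returned row can be turned back into a window by multiplying the 64x128 base size by the fused scale. The bandwidth controls how far apart two detections may be before they count as separate people.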

Related

Tips on building a program detecting pupil in images

I am working on a project that aims to build a program which automatically gives a reasonably accurate detection of the pupil region in eye pictures. I am currently using SimpleCV in Python, since Python is easier to experiment with. Since I just started, the eye pictures I am working with are fairly standardized. However, the size of the iris and pupil, as well as the color of the iris, can vary, and the position of the eye can shift a little between pictures. Here's a picture from Wikipedia that is similar to the pictures I am using:
"MyStrangeIris.JPG" by Epicstessie is licensed under CC BY-SA 3.0
I have tried simple thresholding. Since different eyes have different iris colors, a fixed threshold does not work on all pictures.
In addition, I tried SimpleCV's built-in Sobel and Canny edge detection, but it does not work well, especially for eyes with darker irises. I also doubt that Sobel or Canny alone can solve the problem, given that there is sometimes noise on the edge of the pupil (e.g., reflections of eyelashes).
I have entry-level knowledge of image processing and machine learning. Right now, I am thinking about three possibilities:
Do a regression on the threshold value based on some variables
Make a specific mask only for edge detection of the pupil
Classify each pixel (this looks like a lot of work to build the training set)
Am I on the right track? I would like to reach out to anyone with more experience with this type of problem. Any tips/suggestions are more than welcome. Thanks!
I think that for a start you should put machine learning aside. There is so much more to try in "regular" computer vision.
You need to try to describe a model for your problem. A good way to do this is to sit and think about how you, as a person, detect an iris. For example, I can think of:
It is near the center of the image.
It is a brown/green/blue circle with a distinct black center, surrounded by a mostly white ellipse.
There is skin color around the white ellipse.
It can't be too small or too large (depends on your images).
After you build your model, try to find good ways to detect these features. It is hard to point to specific techniques, but you can start with: HSV color space, correlation, Hough transform, morphological operations.
Only after you feel you have exhausted all conventional tools should you start thinking about feature extraction and machine learning.
And by the way, because you are not the first person to try to detect an iris, you can look at other projects for ideas.
I have written a small MATLAB script for the image you linked. It uses the Hough transform for circle detection, which is also implemented in OpenCV, so porting should not be a problem. I just want to know whether I am on the right track.
My result and code are as follows:
clc
clear all
close all
im = imresize(imread('irisdet.JPG'),0.5);
gray = rgb2gray(im);
Rmin = 50; Rmax = 100;
[centersDark, radiiDark] = imfindcircles(gray,[Rmin Rmax],'ObjectPolarity','dark');
figure,imshow(im,[])
viscircles(centersDark, radiiDark,'EdgeColor','b');
Input Image:
Result of Algorithm:
Thank You
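For readers without MATLAB, the idea behind the circle search above can be sketched with a plain NumPy voting accumulator. This is not what `imfindcircles` or OpenCV do internally; it is a brute-force circular Hough transform on a synthetic "pupil", it ignores object polarity, and the function name `find_circle` and all parameter values are made up for illustration.

```python
import numpy as np

def find_circle(gray, radii, n_angles=180):
    """Brute-force circular Hough transform; returns (cx, cy, r) of the best circle."""
    gray = np.asarray(gray, dtype=float)
    h, w = gray.shape
    # Crude edge map: any pixel with a non-zero intensity gradient.
    gy, gx = np.gradient(gray)
    ys, xs = np.nonzero(np.hypot(gx, gy) > 0)
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    best = (-1, 0, 0, 0)               # (votes, cx, cy, r)
    for r in radii:
        acc = np.zeros((h, w), dtype=np.int32)
        # Every edge pixel votes for all centres at distance r from it.
        cx = np.rint(xs[:, None] - r * np.cos(thetas)[None, :]).astype(int)
        cy = np.rint(ys[:, None] - r * np.sin(thetas)[None, :]).astype(int)
        ok = (cx >= 0) & (cx < w) & (cy >= 0) & (cy < h)
        np.add.at(acc, (cy[ok], cx[ok]), 1)
        iy, ix = np.unravel_index(acc.argmax(), acc.shape)
        if acc[iy, ix] > best[0]:
            best = (int(acc[iy, ix]), int(ix), int(iy), int(r))
    return best[1], best[2], best[3]

# Synthetic test image: dark disc of radius 20 centred at (60, 50).
yy, xx = np.mgrid[0:100, 0:120]
img = np.where((xx - 60) ** 2 + (yy - 50) ** 2 <= 20 ** 2, 20.0, 200.0)
cx, cy, r = find_circle(img, range(15, 26))
print(cx, cy, r)
```

On real eye images you would replace the crude edge map with a proper edge detector and restrict the radius range, much as the `[Rmin Rmax]` argument does in the MATLAB code.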
Not sure about iris classification, but I've done written digit recognition from photos. I would recommend tuning up the contrast and saturation, then use a k-nearest neighbour algorithm to classify your images. Depending on your training set, you can get as high as 90% accuracy.
I think you are on the right track. Do image preprocessing to make classification easier, then train an algorithm of your choice. You would want to treat each image as one input vector though, instead of classifying each pixel!
I think you can try Active Shape Modelling or, if you want really feature-rich modelling and do not care about the time it takes to execute the algorithm, Active Appearance Modelling. You might want to look into these papers for a better understanding:
Active Shape Models: Their Training and Application
Statistical Models of Appearance for Computer Vision - In Depth

Can Haar Cascade be too accurate to be useful in this situation?

I'm making a program to detect shapes from an R/C plane for a competition. I have no real images of the targets, but I do have computer-generated examples of them in the rules.
My question is, can I train my program to detect real world objects based on computer generated shapes or should I find a different method to complete this task?
I would like to know before I foolishly generate 5k samples and find them useless in the end.
EDIT: I also don't know the exact color of the objects. If I feed the program samples of varying color, will it be a problem?
Thanks in advance!!
Edit2: Here's what groups from my school detected in previous years
As you can see, the detected images are not nearly as flawless as what would appear in real life. If you can suggest a better method, that would help.
If you think the real images will have unique colors with simple geometric shapes, then you could try creating a normalized hue histogram and using it to train an SVM classifier. The benefit of a hue histogram is that it is rotation and scale invariant.
Keep a few precautions in mind:
Don't forget to remove illumination effects.
White and black pixels sometimes cause problems in the hue histogram calculation, so exclude them by considering only those pixels which have S>0 and V>0 in the S and V channels of the HSV image.
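A minimal sketch of such a hue histogram, assuming 8-bit RGB input and using the standard-library `colorsys` conversion (slow, but dependency-free). The function name `hue_histogram`, the bin count and the threshold parameters are illustrative, and the SVM training step is left out.

```python
import colorsys
import numpy as np

def hue_histogram(rgb, bins=18, s_min=0.0, v_min=0.0):
    """Normalised hue histogram over pixels with S > s_min and V > v_min."""
    pixels = np.asarray(rgb, dtype=float).reshape(-1, 3) / 255.0
    hsv = np.array([colorsys.rgb_to_hsv(*p) for p in pixels])
    # Drop white/grey (S == 0) and black (V == 0) pixels: their hue is meaningless.
    keep = (hsv[:, 1] > s_min) & (hsv[:, 2] > v_min)
    hist, _ = np.histogram(hsv[keep, 0], bins=bins, range=(0.0, 1.0))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)

# Red rectangle on a white background.
img = np.full((20, 30, 3), 255, dtype=np.uint8)
img[5:15, 5:20] = (255, 0, 0)

h_orig = hue_histogram(img)
h_rot = hue_histogram(np.rot90(img))                          # rotated by 90 degrees
h_big = hue_histogram(img.repeat(2, axis=0).repeat(2, axis=1))  # upscaled 2x
print(h_orig[0], np.allclose(h_orig, h_rot), np.allclose(h_orig, h_big))
```

Because the histogram is normalised and ignores pixel positions, rotating or resizing the shape leaves it unchanged. Raising `s_min` and `v_min` slightly above zero may help with near-white and near-black pixels in real photos.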
I would rather suggest using real-world images, because performance depends largely on the training data (in my personal experience). Also, why not try training an SVM (support vector machine) on SIFT/SURF descriptors, since SIFT/SURF are scale as well as rotation invariant?

A suitable workflow to detect and classify blurs in images? [duplicate]

I had asked this on Photo Stack Exchange but thought it might be relevant here as well, since I want to implement this programmatically.
I am trying to implement a blur detection algorithm for my imaging pipeline. The kinds of blur that I want to detect are:
1) Camera shake: Pictures captured by hand, where the hand moves/shakes while the shutter speed is slow.
2) Lens focusing errors: Depth-of-field issues, like focusing on the wrong object, causing some blur.
3) Motion blur: Fast-moving objects in the scene, captured with a shutter speed that is not fast enough. E.g., a moving car at night might show a trail of its headlights/tail lights in the image as a blur.
How can one detect this blur and quantify it in some way, so that a decision can be made based on the computed 'blur metric'?
What is the theory behind blur detection?
I am looking for good reading material that I can use to implement an algorithm for this in C/MATLAB.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective, as they depend on the intent of the photographer. Without knowing what the photographer wanted to focus on, it's impossible to detect them. And even if you do know what the intended focus was, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
There are more, but I feel that this should be enough to get you started. If you require further info on any of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea of what has been tried in the past.
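As a rough illustration of the frequency-domain idea, here is a sketch that measures how much of an image's spectral energy lies at high spatial frequencies. Note the swap: it uses NumPy's FFT on the pixel data rather than the JPEG DCT coefficients mentioned above, and the function name `high_freq_ratio` and the cutoff value are assumptions made for this example.

```python
import numpy as np

def high_freq_ratio(gray, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` cycles/pixel (Nyquist is 0.5)."""
    gray = np.asarray(gray, dtype=float)
    # Subtract the mean so the DC term does not dominate the total energy.
    spec = np.abs(np.fft.fft2(gray - gray.mean())) ** 2
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    high = np.hypot(fx, fy) > cutoff
    total = spec.sum()
    return float(spec[high].sum() / total) if total > 0 else 0.0

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
# Circular 5x5 box blur built from shifted copies (illustration only).
blurred = sum(np.roll(sharp, (dy, dx), axis=(0, 1))
              for dy in range(-2, 3) for dx in range(-2, 3)) / 25.0

ratio_sharp = high_freq_ratio(sharp)
ratio_blurred = high_freq_ratio(blurred)
print(ratio_sharp, ratio_blurred)
```

A low ratio suggests a lack of fine detail. The threshold separating "sharp" from "blurry" depends on image content and has to be tuned on your own data.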
There is a great paper called "Analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus), which compares about 30 different techniques.
Out of all the different techniques, the Laplacian-based methods seem to have the best performance. Most image processing packages, such as MATLAB or OpenCV, already implement this method. Here is an example using OpenCV: http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
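The Laplacian-based measure is simple enough to sketch in a few lines. With OpenCV it is commonly written as `cv2.Laplacian(gray, cv2.CV_64F).var()`; the NumPy version below uses a 4-neighbour Laplacian on interior pixels so that it runs without OpenCV. The function name `laplacian_variance` and the synthetic test images are illustrative.

```python
import numpy as np

def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian; higher values mean a sharper image."""
    g = np.asarray(gray, dtype=float)
    # Discrete Laplacian evaluated on interior pixels only (no border handling).
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

rng = np.random.default_rng(0)
sharp = rng.random((64, 64))
# Circular 3x3 box blur built from shifted copies (illustration only).
blurred = sum(np.roll(sharp, (dy, dx), axis=(0, 1))
              for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0

score_sharp = laplacian_variance(sharp)
score_blurred = laplacian_variance(blurred)
print(score_sharp, score_blurred)
```

The score is only meaningful relative to a threshold chosen for your own images, since it also depends on scene content and noise.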
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, in portrait photography the subject in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such spatially varying blur is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (CVPR 2017).
The authors look at multi-resolution DCT coefficients at every pixel. These DCT coefficients are divided into low, medium, and high frequency bands, out of which only the high-frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients.
A subset of these coefficients is selected. The number of selected coefficients is a tunable, application-specific parameter.
The selected subset of coefficients is then sent through a max pooling block to retain the highest activation across all the scales. This gives the blur map as the output, which is then sent through a post-processing step to refine the map.
This blur map can be used to quantify the sharpness in various regions of the image. To get a single global metric that quantifies the blurriness of the entire image, the mean or the histogram of this blur map can be used.
Here are some example results showing how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The GitHub link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The Python implementation of this algorithm is available on PyPI and can be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2

if __name__ == '__main__':
    # Flag 0 loads the image as grayscale.
    img = cv2.imread('image_name', 0)
    blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4, scale_start=2, num_iterations_RF_filter=3)
    cv2.imshow('ori_img', img)
    cv2.imshow('blur_map', blur_map)
    cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this GitHub repository: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. In this ROI, I then use DCT coefficients as features and train a simple multi-layer perceptron. On testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/), I got an average accuracy of 94%.
Just to add on the focusing errors: these may be detected by comparing the PSF of the captured blurry images (wider) with that of reference ones (sharper). Deconvolution techniques may help correct them, but they leave artifacts (shadows, rippling, ...). A light field camera can help with refocusing to any depth plane, since it captures the angular information of the scene in addition to the traditional spatial information.

How can HOG be used to detect individual body parts

Information:
I would like to use OpenCV's HOG detection to identify objects that can be seen in a variety of orientations. The only problem is that I can't seem to find a reasonable feature detector or classifier to detect them in a rotation- and scale-invariant way (as is needed for objects such as forearms).
Prior Work:
Let's focus on forearms for this discussion. A forearm can have multiple orientations, its primary distinctive features probably being its contour edges. Forearms can point in any direction in an image, hence the complexity. So far I have done some in-depth research on using HOG descriptors to solve this problem, but I am finding that the variety of forearm poses in my positive training set produces very low detection scores in actual images. I suspect the issue is that the gradients of the positive images do not produce very consistent results when accumulated into the histogram. I have reviewed many research papers on the topic trying to resolve or improve this, including the original from Dalal & Triggs [Link]: http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf It also seems that the assumptions made for detecting whole humans do not necessarily apply to detecting individual body parts (in particular, the assumption that all humans are standing upright seems to suggest that HOG is not a good route for rotation-invariant detection like that of forearms).
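The suspected inconsistency is easy to reproduce on a toy example. The sketch below computes a single magnitude-weighted orientation histogram (one HOG-style cell, not OpenCV's `HOGDescriptor`) for an idealised vertical "forearm" and for the same patch rotated by 90 degrees. The function name `orientation_histogram` and the synthetic patch are assumptions made for illustration.

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations (0-180 deg)."""
    gy, gx = np.gradient(np.asarray(patch, dtype=float))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, 180.0), weights=mag)
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist

# Idealised forearm: a bright vertical bar spanning the whole patch height.
vertical = np.zeros((32, 32))
vertical[:, 12:20] = 1.0
horizontal = np.rot90(vertical)        # the same forearm, rotated by 90 degrees

h_vert = orientation_histogram(vertical)
h_horiz = orientation_histogram(horizontal)
similarity = float(h_vert @ h_horiz)   # cosine similarity of the two histograms
print(np.argmax(h_vert), np.argmax(h_horiz), similarity)
```

The two histograms put their mass in different bins, so a linear classifier trained on one orientation sees an essentially unrelated feature vector for the other. This is consistent with the low scores described above when all poses are mixed into one positive set.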
Note:
If possible, I would like to steer clear of any non-free solutions such as those pertaining to SIFT, SURF, or Haar.
Question:
What is a good solution to detecting rotation and scale invariant objects in an image? Particularly for this example, what would be a good solution to detecting all orientations of forearms in an image?
I use HOG to detect human heads and shoulders. To train on a particular part, you have to give its location. If you use OpenCV, you can crop samples so that they contain only the part you want to train on, and make sure all training samples share the same size. For example, I crop images to contain only the head and shoulders and resize all of them to 64x64. Other open-source code may require you to pass the location as an input parameter, which is essentially the same.
Have you tried the discriminatively trained deformable part model? http://www.cs.berkeley.edu/~rbg/latent/
You may find answers there.

