I want to detect regions on image, kind of clusterize it. The main purpose is to say "this is sky", "this is forest", and a few other patterns (about 10-15).
Now I have prototype for nature recognition, which uses HSV color distance.
So, sky pattern has its approximate HSV range, and for this photo it's detected such way:
But except of sky, forest, water, snow I'd like to recognize clothes, furniture etc.
How to detect sweater on this photo (it has specific texture of tiny 3-5 pixel squares):
Except of HSV, I know clustering (e.g. k-means), also worked with FFT for image convolution (they say FFT can be used for feature detection: lines, patterns).
I've heard about image descriptors (SIFT ?), but didn't work with anything related to them, so don't have intuitive understanding.
Important: This should be fast and have low-memory usage, maybe with no-ideal quality. :) When detecting features in real-time, I don't have ability to store large database of tagged images.
Which methods would you suggest using?
Thanks in advance.
Related
Given a logo image as a reference image, how to detect/recognize it in a cluttered natural image?
The logo may be quite small in the image, it can appear in clothes, hats, shoes, background wall etc. I have tried SIFT feature for matching without any other preprocessing, and the result is good for cases in which the size of the logo in images is big and the logo is clear. However, it fails for some cases where the scene is quite cluttered and the proportion of the logo size is quite small compared with the whole image. It seems that SIFT feature is sensitive to perspective distortions.
Anyone know some better features or ideas for logo detection/recognition in natural images? For example, training a classifier to locate candidate regions first, and then apply directly SIFT matching for further recognition. However, training a model needs many data, especially it needs manually annotating logo regions in images, and it needs re-training (needs to collect and annotate new image) if I want to apply it for new logos.
So, any suggestions for this? Detailed workflow/code/reference will be highly appreciated, thanks!
There are many algorithms from shape matching to haar classifiers. The best algorithm very depend on kind of logo.
If you want to continue with feature registration, i recommend:
For detection of small logos, use tiles. Split whole image to smaller (overlapping) tiles and perform usual detection. It will use "locality" of searched features.
Try ASIFT for affine invariant detection.
Use many template images for reference feature extraction, with different lightning , different background images (black, white, gray)
Context:
I have the RGB-D video from a Kinect, which is aimed straight down at a table. There is a library of around 12 objects I need to identify, alone or several at a time. I have been working with SURF extraction and detection from the RGB image, preprocessing by downscaling to 320x240, grayscale, stretching the contrast and balancing the histogram before applying SURF. I built a lasso tool to choose among detected keypoints in a still of the video image. Then those keypoints are used to build object descriptors which are used to identify objects in the live video feed.
Problem:
SURF examples show successful identification of objects with a decent amount of text-like feature detail eg. logos and patterns. The objects I need to identify are relatively plain but have distinctive geometry. The SURF features found in my stills are sometimes consistent but mostly unimportant surface features. For instance, say I have a wooden cube. SURF detects a few bits of grain on one face, then fails on other faces. I need to detect (something like) that there are four corners at equal distances and right angles. None of my objects has much of a pattern but all have distinctive symmetric geometry and color. Think cellphone, lollipop, knife, bowling pin. My thought was that I could build object descriptors for each significantly different-looking orientation of the object, eg. two descriptors for a bowling pin: one standing up and one laying down. For a cellphone, one laying on the front and one on the back. My recognizer needs rotational invariance and some degree of scale invariance in case objects are stacked. Ability to deal with some occlusion is preferable (SURF behaves well enough) but not the most important characteristic. Skew invariance would be preferable and SURF does well with paper printouts of my objects held by hand at a skew.
Questions:
Am I using the wrong SURF parameters to find features at the wrong scale? Is there a better algorithm for this kind of object identification? Is there something as readily usable as SURF that uses the depth data from the Kinect along with or instead of the RGB data?
I was doing something similar for a project, and ended up using a super simple method for object recognition, which was using OpenCV blob detection, and recognizing objects based on their areas. Obviously, there needs to be enough variance for this method to work.
You can see my results here: http://portfolio.jackkalish.com/Secondhand-Stories
I know there are other methods out there, one possible solution for you could be approxPolyDP, which is described here:
How to detect simple geometric shapes using OpenCV
Would love to hear about your progress on this!
I had asked this on photo stackexchange but thought it might be relevant here as well, since I want to implement this programatically in my implementation.
I am trying to implement a blur detection algorithm for my imaging pipeline. The blur that I want to detect is both -
1) Camera Shake: Pictures captured using hand which moves/shakes when shutter speed is less.
2) Lens focussing errors - (Depth of Field) issues, like focussing on a incorrect object causing some blur.
3) Motion blur: Fast moving objects in the scene, captured using a not high enough shutter speed. E.g. A moving car a night might show a trail of its headlight/tail light in the image as a blur.
How can one detect this blur and quantify it in some way to make some decision based on that computed 'blur metric'?
What is the theory behind blur detection?
I am looking of good reading material using which I can implement some algorithm for this in C/Matlab.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective as they depend on the intent on the photographer. Without knowing what the photographer wanted to focus on, it's impossible to achieve this. And even if you do know what you wanted to focus on, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
There are more, but I feel that that should be enough to get you started. If you require further info on either of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea about what has been tried in the past.
There is a great paper called : "analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus) , which does a comparison about 30 different techniques.
Out of all the different techniques, the "Laplacian" based methods seem to have the best performance. Most image processing programs like : MATLAB or OPENCV have already implemented this method . Below is an example using OpenCV : http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, if an image contains portrait photography, the image in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such a spatially varying blur in an image is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (cvpr2017).
the authors look at multi resolution DCT coefficients at every pixel. These DCT coefficients are divided into low, medium, and high frequency bands, out of which only the high frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients
A subset of these coefficients are selected. the number of selected coefficients is a tunable parameter which is application specific.
The selected subset of coefficients are then sent through a max pooling block to retain the highest activation within all the scales. This gives the blur map as the output, which is then sent through a post processing step to refine the map.
This blur map can be used to quantify the sharpness in various regions of the image. In order to get a single global metric to quantify the bluriness of the entire image, the mean of this blur map or the histogram of this blur map can be used
Here are some examples results on how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The github link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The python implementation of this algorithm can be found on pypi which can easily be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2
if __name__ == '__main__':
img = cv2.imread('image_name', 0)
blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4, scale_start=2, num_iterations_RF_filter=3)
cv2.imshow('ori_img', img)
cv2.imshow('blur_map', blur_map)
cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this github link: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. In this ROI, I then use DCT coefficients as feature extractors and train a simple multi-layer perceptron. On testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/) I got an average accuracy of 94%
Just to add on the focussing errors, these may be detected by comparing the psf of the captured blurry images (wider) with reference ones (sharper). Deconvolution techniques may help correcting them but leaving artificial errors (shadows, rippling, ...). A light field camera can help refocusing to any depth planes since it captures the angular information besides the traditional spatial ones of the scene.
I had asked this on photo stackexchange but thought it might be relevant here as well, since I want to implement this programatically in my implementation.
I am trying to implement a blur detection algorithm for my imaging pipeline. The blur that I want to detect is both -
1) Camera Shake: Pictures captured using hand which moves/shakes when shutter speed is less.
2) Lens focussing errors - (Depth of Field) issues, like focussing on a incorrect object causing some blur.
3) Motion blur: Fast moving objects in the scene, captured using a not high enough shutter speed. E.g. A moving car a night might show a trail of its headlight/tail light in the image as a blur.
How can one detect this blur and quantify it in some way to make some decision based on that computed 'blur metric'?
What is the theory behind blur detection?
I am looking of good reading material using which I can implement some algorithm for this in C/Matlab.
thank you.
-AD.
Motion blur and camera shake are kind of the same thing when you think about the cause: relative motion of the camera and the object. You mention slow shutter speed -- it is a culprit in both cases.
Focus misses are subjective as they depend on the intent on the photographer. Without knowing what the photographer wanted to focus on, it's impossible to achieve this. And even if you do know what you wanted to focus on, it still wouldn't be trivial.
With that dose of realism aside, let me reassure you that blur detection is actually a very active research field, and there are already a few metrics that you can try out on your images. Here are some that I've used recently:
Edge width. Basically, perform edge detection on your image (using Canny or otherwise) and then measure the width of the edges. Blurry images will have wider edges that are more spread out. Sharper images will have thinner edges. Google for "A no-reference perceptual blur metric" by Marziliano -- it's a famous paper that describes this approach well enough for a full implementation. If you're dealing with motion blur, then the edges will be blurred (wide) in the direction of the motion.
Presence of fine detail. Have a look at my answer to this question (the edited part).
Frequency domain approaches. Taking the histogram of the DCT coefficients of the image (assuming you're working with JPEG) would give you an idea of how much fine detail the image has. This is how you grab the DCT coefficients from a JPEG file directly. If the count for the non-DC terms is low, it is likely that the image is blurry. This is the simplest way -- there are more sophisticated approaches in the frequency domain.
There are more, but I feel that that should be enough to get you started. If you require further info on either of those points, fire up Google Scholar and look around. In particular, check out the references of Marziliano's paper to get an idea about what has been tried in the past.
There is a great paper called : "analysis of focus measure operators for shape-from-focus" (https://www.researchgate.net/publication/234073157_Analysis_of_focus_measure_operators_in_shape-from-focus) , which does a comparison about 30 different techniques.
Out of all the different techniques, the "Laplacian" based methods seem to have the best performance. Most image processing programs like : MATLAB or OPENCV have already implemented this method . Below is an example using OpenCV : http://www.pyimagesearch.com/2015/09/07/blur-detection-with-opencv/
One important point to note here is that an image can have some blurry areas and some sharp areas. For example, if an image contains portrait photography, the image in the foreground is sharp whereas the background is blurry. In sports photography, the object in focus is sharp and the background usually has motion blur. One way to detect such a spatially varying blur in an image is to run a frequency domain analysis at every location in the image. One of the papers which addresses this topic is "Spatially-Varying Blur Detection Based on Multiscale Fused and Sorted Transform Coefficients of Gradient Magnitudes" (cvpr2017).
the authors look at multi resolution DCT coefficients at every pixel. These DCT coefficients are divided into low, medium, and high frequency bands, out of which only the high frequency coefficients are selected.
The DCT coefficients are then fused together and sorted to form the multiscale-fused and sorted high-frequency transform coefficients
A subset of these coefficients are selected. the number of selected coefficients is a tunable parameter which is application specific.
The selected subset of coefficients are then sent through a max pooling block to retain the highest activation within all the scales. This gives the blur map as the output, which is then sent through a post processing step to refine the map.
This blur map can be used to quantify the sharpness in various regions of the image. In order to get a single global metric to quantify the bluriness of the entire image, the mean of this blur map or the histogram of this blur map can be used
Here are some examples results on how the algorithm performs:
The sharp regions in the image have a high intensity in the blur_map, whereas blurry regions have a low intensity.
The github link to the project is: https://github.com/Utkarsh-Deshmukh/Spatially-Varying-Blur-Detection-python
The python implementation of this algorithm can be found on pypi which can easily be installed as shown below:
pip install blur_detector
A sample code snippet to generate the blur map is as follows:
import blur_detector
import cv2
if __name__ == '__main__':
img = cv2.imread('image_name', 0)
blur_map = blur_detector.detectBlur(img, downsampling_factor=4, num_scales=4, scale_start=2, num_iterations_RF_filter=3)
cv2.imshow('ori_img', img)
cv2.imshow('blur_map', blur_map)
cv2.waitKey(0)
For detecting blurry images, you can tweak the approach and add "Region of Interest estimation".
In this github link: https://github.com/Utkarsh-Deshmukh/Blurry-Image-Detector , I have used local entropy filters to estimate a region of interest. In this ROI, I then use DCT coefficients as feature extractors and train a simple multi-layer perceptron. On testing this approach on 20000 images in the "BSD-B" dataset (http://cg.postech.ac.kr/research/realblur/) I got an average accuracy of 94%
Just to add on the focussing errors, these may be detected by comparing the psf of the captured blurry images (wider) with reference ones (sharper). Deconvolution techniques may help correcting them but leaving artificial errors (shadows, rippling, ...). A light field camera can help refocusing to any depth planes since it captures the angular information besides the traditional spatial ones of the scene.
I am currently helping a friend working on a geo-physical project, I'm not by any means a image processing pro, but its fun to play
around with these kinds of problems. =)
The aim is to estimate the height of small rocks sticking out of water, from surface to top.
The experimental equipment will be a ~10MP camera mounted on a distance meter with a built in laser pointer.
The "operator" will point this at a rock, press a trigger which will register a distance along of a photo of the rock, which
will be in the center of the image.
The eqipment can be assumed to always be held at a fixed distance above the water.
As I see it there are a number of problems to overcome:
Lighting conditions
Depending on the time of day etc., the rock might be brighter then the water or opposite.
Sometimes the rock will have a color very close to the water.
The position of the shade will move throughout the day.
Depending on how rough the water is, there might sometimes be a reflection of the rock in the water.
Diversity
The rock is not evenly shaped.
Depending on the rock type, growth of lichen etc., changes the look of the rock.
Fortunateness, there is no shortage of test data. Pictures of rocks in water is easy to come by. Here are some sample images:
I've run a edge detector on the images, and esp. in the fourth picture the poor contrast makes it hard to see the edges:
Any ideas would be greatly appreciated!
I don't think that edge detection is best approach to detect the rocks. Other objects, like the mountains or even the reflections in the water will result in edges.
I suggest that you try a pixel classification approach to segment the rocks from the background of the image:
For each pixel in the image, extract a set of image descriptors from a NxN neighborhood centered at that pixel.
Select a set of images and manually label the pixels as rock or background.
Use the labeled pixels and the respective image descriptors to train a classifier (eg. a Naive Bayes classifier)
Since the rocks tends to have similar texture, I would use texture image descriptors to train the classifier. You could try, for example, to extract a few statistical measures from each color chanel (R,G,B) like the mean and standard deviation of the intensity values.
Pixel classification might work here, but will never yield a 100% accuracy. The variance in the data is really big, rocks have different colours (which are also "corrupted" with lighting) and different texture. So, one must account for global information as well.
The problem you deal with is foreground extraction. There are two approaches I am aware of.
Energy minimization via graph cuts, see e.g. http://en.wikipedia.org/wiki/GrabCut (there are links to the paper and OpenCV implementation). Some initialization ("seeds") should be done (either by a user or by some prior knowledge like the rock is in the center while water is on the periphery). Another variant of input is an approximate bounding rectangle. It is implemented in MS Office 2010 foreground extraction tool.
The energy function of possible foreground/background labellings enforces foreground to be similar to the foreground seeds, and a smooth boundary. So, the minimum of the energy corresponds to the good foreground mask. Note that with pixel classification approach one should pre-label a lot of images to learn from, then segmentation is done automatically, while with this approach one should select seeds on each query image (or they are chosen implicitly).
Active contours a.k.a. snakes also requre some user interaction. They are more like Photoshop Magic Wand tool. They also try to find a smooth boundary, but do not consider the inner area.
Both methods might have problems with the reflections (pixel classification will definitely have). If it is the case, you may try to find an approximate vertical symmetry, and delete the lower part, if any. You can also ask a user to mark the reflaction as a background while collecting stats for graph cuts.
Color segmentation to find the rock, together with edge detection to find the top.
To find the water level I would try and find all the water-rock boundaries, and the horizon (if possible) then fit a plane to the surface of the water.
That way you don't need to worry about reflections of the rock.
Easier if you know the pitch angle between the camera and the water and if the camera is is leveled horizontally (roll).
ps. This is a lot harder than I thought - you don't know the distance to all the rocks so fitting a plane is difficult.
It occurs that the reflection is actually the ideal way of finding the level, look for symetric path edges in the rock edge detection and pick the vertex?