According to the paper in the link, to reduce temporal camera noise reduction, three-frame temporal erosion is applied (Page 6, last paragraph). A brief description is replacing the image I(t) with the minimum of I(t-1), I(t) and I(t+1). I am not clear how it is implemented. Can somebody suggest me how to implement in a program?
Thanks
you should maintain some stack or list that contains the 3 latest frames I(t-1;:), I(t;:), I(t+1,:) for times (t-1,t,t+1). Then, compute the current image estimate Clean(t;x,y) for each pixel (x,y) as the minimum for the pixel along time:
Clean(t;x,y) = min{ I(t-1;x,y), I(t;x,y), I(t+1;x,y) }
This is an application of the definition of erosion (put the structuring element, take the min value of the pixels that belong to the structuring element), except that the structuring element is elongated in time.
Related
I have a basic question regarding pattern learning, or pattern representation. Assume I have a complex pattern of this form, could you please provide me with some research directions or concepts that I can follow to learn how to represent (mathematically describe) these forms of patterns? in general the pattern does not have a closed contour nor it can be represented with analytical objects like boxes, circles etc.
By mathematically describe I'm assuming you mean derive from the image a vector of values that represents the content of the image. In computer vision/image processing we call this an "image descriptor".
There are several image descriptors that could be applied to pixel based data of the form you showed, which appear to be 1 value per pixel i.e. greyscale images.
One approach is to perform "spatial gridding" where you divide the image up into a regular grid of a constant size e.g. a 4x4 grid. You then average the pixel values within each cell of the grid. Then concatenate these values to form a 16 element vector - this coarsely describes the pixel distribution of the image.
Another approach would be to use "image moments" which are 2D statistical moments. Use this equation:
where f(x,y) is they pixel value at coordinates (x,y). W and H are the image width and height. The mu_x and mu_y indicate the average x and y. The values i and j select the order of moment you want to compute. Various orders of moment can be combined in different ways for example in the "Hu moments" we can compute 7 numbers using combinations of image moments:
The cool thing about the Hu moments is you can scale, rotate, flip etc the image and you still get the same 7 values which makes this a robust ("affine invariant") image descriptor.
Hope this helps as a general direction to read more in.
The application of Konolige's block matching algorithm is not sufficiantly explained in the OpenCV documentation. The parameters of CvStereoBMState influence the accuracy of the disparities calculated by cv::StereoBM. However, those parameters are not documented. I will list those parameters below and describe, what I understand. Maybe someone can add a description of the parameters, which are unclear.
preFilterType: Determines, which filter is applied on the image before the disparities are calculated. Can be CV_STEREO_BM_XSOBEL (Sobel filter) or CV_STEREO_BM_NORMALIZED_RESPONSE (maybe differences to mean intensity???)
preFilterSize: Window size of the prefilter (width = height of the window, negative value)
preFilterCap: Clips the output to [-preFilterCap, preFilterCap]. What happens to the values outside the interval?
SADWindowSize: Size of the compared windows in the left and in the right image, where the sums of absolute differences are calculated to find corresponding pixels.
minDisparity: The smallest disparity, which is taken into account. Default is zero, should be set to a negative value, if negative disparities are possible (depends on the angle between the cameras views and the distance of the measured object to the cameras).
numberOfDisparities: The disparity search range [minDisparity, minDisparity+numberOfDisparities].
textureThreshold: Calculate the disparity only at locations, where the texture is larger than (or at least equal to?) this threshold. How is texture defined??? Variance in the surrounding window???
uniquenessRatio: Cited from calib3d.hpp: "accept the computed disparity d* only ifSAD(d) >= SAD(d*)(1 + uniquenessRatio/100.) for any d != d+/-1 within the search range."
speckleRange: Unsure.
trySmallerWindows: ???
roi1, roi2: Calculate the disparities only in these regions??? Unsure.
speckleWindowSize: Unsure.
disp12MaxDiff: Unsure, but a comment in calib3d.hpp says, that a left-right check is performed. Guess: Pixels are matched from the left image to the right image and from the right image back to the left image. The disparities are only valid, if the distance between the original left pixel and the back-matched pixel is smaller than disp12MaxDiff.
speckleWindowSize and speckleRange are parameters for the function cv::filterSpeckles. Take a look at OpenCV's documentation.
cv::filterSpeckles is used to post-process the disparity map. It replaces blobs of similar disparities (the difference of two adjacent values does not exceed speckleRange) whose size is less or equal speckleWindowSize (the number of pixels forming the blob) by the invalid disparity value (either short -16 or float -1.f).
The parameters are better described in the Python tutorial on depth map from stereo images. The parameters seem to be the same.
texture_threshold: filters out areas that don't have enough texture
for reliable matching
Speckle range and size: Block-based matchers
often produce "speckles" near the boundaries of objects, where the
matching window catches the foreground on one side and the background
on the other. In this scene it appears that the matcher is also
finding small spurious matches in the projected texture on the table.
To get rid of these artifacts we post-process the disparity image with
a speckle filter controlled by the speckle_size and speckle_range
parameters. speckle_size is the number of pixels below which a
disparity blob is dismissed as "speckle." speckle_range controls how
close in value disparities must be to be considered part of the same
blob.
Number of disparities: How many pixels to slide the window over.
The larger it is, the larger the range of visible depths, but more
computation is required.
min_disparity: the offset from the x-position
of the left pixel at which to begin searching.
uniqueness_ratio:
Another post-filtering step. If the best matching disparity is not
sufficiently better than every other disparity in the search range,
the pixel is filtered out. You can try tweaking this if
texture_threshold and the speckle filtering are still letting through
spurious matches.
prefilter_size and prefilter_cap: The pre-filtering
phase, which normalizes image brightness and enhances texture in
preparation for block matching. Normally you should not need to adjust
these.
Also check out this ROS tutorial on choosing stereo parameters.
I am looking for a "very" simple way to check if an image bitmap is blur. I do not need accurate and complicate algorithm which involves fft, wavelet, etc. Just a very simple idea even if it is not accurate.
I've thought to compute the average euclidian distance between pixel (x,y) and pixel (x+1,y) considering their RGB components and then using a threshold but it works very bad. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here's are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
(source)
(source)
(source)
(source)
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
(source)
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
(source)
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry...How blurry and to which amount, is something that depends upon your application. Broadly speaking, the blurriness or sharpness of images can be measured in various ways. As a first easy attempt I would check for the energy of the image, defined as the normalised summation of the squared pixel values:
1 2
E = --- Σ I, where I the image and N the number of pixels (defined for grayscale)
N
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with gaussian noise
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E = 1265 E = 88
or bl
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images, for color you do this for each channel separately.
Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The
first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The
second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels.The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These
octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are:What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2 anddet(HessianApprox) = DxxDyy − (0.81Dxy)^2Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image has thus the sum of all the original pixels for value.
Then, you just need to remark that the box blur is equal to the sum of all the pixels inside a given rectangle (not originating in the top-lefmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you made a code for normal Gaussian 2D convolution, just use the box filter as a Gaussian kernel and the input image will be the same original image not integral image. The results from this method will be same with the one you asked.
I read on Wikipedia and see that if we need to perform spatial filtering on an image, we have to have a filter, for example 3x3, what I don't understand here is how can we choose the value for the filter? Let say that the original image is grey scale so its intensity goes from 0 to 255 (8 bits).
Another question is that if the image is 9x9, how can we apply the filter to boundary pixels of that image? If we choose to pad the image so the filter can work with all boundary pixels, what would be the value for new padded pixels?
Thank you very much
The value of the filter depends on what you want to achieve by filtering. There are a lot of filter design to perform a specific task. For example the simplest filter f=[-1 1 -1] kind of perform image derivation by performing first degree differencing on each pixel in horizontal direction (x-derivative) while f' perform the same thing in the vertical (y-derivative). The values -1,1,-1 are choose for such purpose. The same goes for 3*3 filters. In general the choose of the values come from a 2D(bi directional) designing of finite impulse response (FIR) and infinite impulse response (IIR) filters.
You should keep in mind that the value of filter operation on the boarders are not that much accurate. Filtering operation for boarder pixel are done interpolating out-of range pixel by a process called boarder interpolation.In OpenCV and similar image processing/computer vision libraries there are ways to do it. For example as the following in opencv
Various border types, image boundaries are denoted with '|'
BORDER_REPLICATE: aaaaaa|abcdefgh|hhhhhhh
BORDER_REFLECT: fedcba|abcdefgh|hgfedcb
BORDER_REFLECT_101: gfedcb|abcdefgh|gfedcba
BORDER_WRAP: cdefgh|abcdefgh|abcdefg
BORDER_CONSTANT: iiiiii|abcdefgh|iiiiiii with some specified 'i'
Thus according to you choose you pad the boarder pixels.