Different derivatives in image space and frequency space - image-processing

I'm trying to compare between an image derivative obtained in the image space and the one obtained in the frequency space.
What I'm actually comparing is the magnitude of derivatives.
This is the process for image space:
calculate the x-derivative in image space by convolving the image with [1 -1] and get convDerX.
calculate the y-derivative in image space by convolving the image with [1;-1] and get convDerY.
let convMagnitude = sqrt(convDerX.^2 + convDerY.^2)
This is the process for frequency space:
calculate the Fourier transform of the image and derive by x, then invert back to image space to get fourDerX.
calculate the Fourier transform of the image and derive by y, then invert back to image space to get fourDerY.
let fourMagnitude = sqrt(fourDerX.^2 + fourDerY.^2)
The two magnitudes are very similar, and show the edges, but they're not identical.
Why does this happen? Is it just due to discretization errors or is there something deeper here?

In frequency space point value may depend on very image-distant points. So when you do such kind of FT manipulations it is like using multi-points scheme in image space, and for image-space convolution derivatives it is uncommon due to numerical instability. It is a stability-precision trade-off, so for typical non-rocket-science-problems 1st order of precision [-1,1] scheme or 2nd order [-0.5, 0, 0.5] is enough. Sure result for different schemes will differ a bit, as soon as we are in discrete world.

Related

Workflow to clean badly scanned sheet music

I am looking for a workflow that would clean (and possibly straighten) old and badly scanned images of musical scores (like the one below).
I've tried to use denoise, hough filters, imagemagick geometry filters, and I am struggling to identify the series of filters that remove the scanner noise/bias.
Just some quick ideas:
Remove grayscale noise: Do a low pass filter (darks), since the music is darker than a lot of the noise. Remaining noise is mostly vertical lines.
Rotate image: Sum grayscale values for each column of the image. You'll get a vector with the total pixel lightness in that column. Use gradient descent or search on the rotation of the image (within some bounds like +/-15deg rotation) to maximize the variance of that vector. Idea here is that the vertical noise lines indicate vertical alignment, and so we want the columns of the image to align with these noise lines (= maximized variance).
Remove vertical line noise: After rotation, take median value of each column. The greater the distance (squared difference) a pixel is from that median darkness, the more confident we are it is its true color (e.g. a pure white or black pixel when vertical noise was gray). Since noise is non-white, you could try blending this distance by the whiteness of the median for an alternative confidence metric. Ideally, I think here you'd train some 7x7x2 convolution filter (2 channels being pixel value and distance from median) to estimate true value of the pixel. That would be the most minimal machine learning approach, not using some full-fledged NN. However, given your lack of training data, we'll have to come up with our own heuristic for what the true pixel value is. You likely will need to play around with it, but here's what I think might work:
Set some threshold of confidence; above that threshold we take the value as is. Below the threshold, set to white (the binary expected pixel value for the entire page).
For all values below threshold, take the max confidence value within a +/-2 pixels L1 distance (e.g. 5x5 convolution) as that pixel's value. Seems like features are separated by at least 2 pixels, but for lower resolutions that window size may need to be adjusted. Since white pixels may end up being more confident overall, you could experiment with prioritizing darker pixels (increase their confidence somehow).
Clamp the image contrast and maybe run another low pass filter.

How to transform filter when using FFT to do 2d convolution?

I want to use FFT to accelerate 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size is different with image so I can not doing dot product after FFT. So how to transform the filter before doing FFT so that its size can be matched with image?
I use the convention that N is kernel size.
Knowing the convolution is not defined (mathematically) on the edges (N//2 at each end of each dimension), you would loose N pixels in totals on each axis.
You need to make room for convolution : pad the image with enough "neutral values" so that the edge cases (junk values inserted there) disappear.
This would involve making your image a 307x307px image (with suitable padding values, see next paragraph), which after convolution gives back a 300x300 image.
Popular image processing libraries have this already embedded : when you ask for a convolution, you have extra arguments specifying the "mode".
Which values can we pad with ?
Stolen with no shame from Numpy's pad documentation
'constant' : Pads with a constant value.
'edge' : Pads with the edge values of array.
'linear_ramp' : Pads with the linear ramp between end_value and the arraydge value.
'maximum' :
Pads with the maximum value of all or part of the
vector along each axis.
'mean'
Pads with the mean value of all or part of the
vector along each axis.
'median'
Pads with the median value of all or part of the
vector along each axis.
'minimum'
Pads with the minimum value of all or part of the
vector along each axis.
'reflect'
Pads with the reflection of the vector mirrored on
the first and last values of the vector along each
axis.
'symmetric'
Pads with the reflection of the vector mirrored
along the edge of the array.
'wrap'
Pads with the wrap of the vector along the axis.
The first values are used to pad the end and the
end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
it depends on the algorithm you use for the FFT, because most of them need to work with images of dyadic dimensions (power of 2).
Here is what you have to do:
Padding image: center your image into a bigger one with dyadic dimensions
Padding kernel: center you convolution kernel into an image with same dimensions as step 1.
FFT on the image from step 1
FFT on the kernel from step 2
Complex multiplication (Fourier space) of results from steps 3 and 4.
Inverse FFT on the resulting image on step 5
Unpadding on the resulting image from step 6
Put all 4 blocs into the right order.
If the algorithm you use does not need dyadic dimensions, then steps 1 is useless and 2 has to be a simple padding with the image dimensions.

Simple way to check if an image bitmap is blur

I am looking for a "very" simple way to check if an image bitmap is blur. I do not need accurate and complicate algorithm which involves fft, wavelet, etc. Just a very simple idea even if it is not accurate.
I've thought to compute the average euclidian distance between pixel (x,y) and pixel (x+1,y) considering their RGB components and then using a threshold but it works very bad. Any other idea?
Don't calculate the average differences between adjacent pixels.
Even when a photograph is perfectly in focus, it can still contain large areas of uniform colour, like the sky for example. These will push down the average difference and mask the details you're interested in. What you really want to find is the maximum difference value.
Also, to speed things up, I wouldn't bother checking every pixel in the image. You should get reasonable results by checking along a grid of horizontal and vertical lines spaced, say, 10 pixels apart.
Here are the results of some tests with PHP's GD graphics functions using an image from Wikimedia Commons (Bokeh_Ipomea.jpg). The Sharpness values are simply the maximum pixel difference values as a percentage of 255 (I only looked in the green channel; you should probably convert to greyscale first). The numbers underneath show how long it took to process the image.
If you want them, here are the source images I used:
original
slightly blurred
blurred
Update:
There's a problem with this algorithm in that it relies on the image having a fairly high level of contrast as well as sharp focused edges. It can be improved by finding the maximum pixel difference (maxdiff), and finding the overall range of pixel values in a small area centred on this location (range). The sharpness is then calculated as follows:
sharpness = (maxdiff / (offset + range)) * (1.0 + offset / 255) * 100%
where offset is a parameter that reduces the effects of very small edges so that background noise does not affect the results significantly. (I used a value of 15.)
This produces fairly good results. Anything with a sharpness of less than 40% is probably out of focus. Here's are some examples (the locations of the maximum pixel difference and the 9×9 local search areas are also shown for reference):
(source)
(source)
(source)
(source)
The results still aren't perfect, though. Subjects that are inherently blurry will always result in a low sharpness value:
(source)
Bokeh effects can produce sharp edges from point sources of light, even when they are completely out of focus:
(source)
You commented that you want to be able to reject user-submitted photos that are out of focus. Since this technique isn't perfect, I would suggest that you instead notify the user if an image appears blurry instead of rejecting it altogether.
I suppose that, philosophically speaking, all natural images are blurry...How blurry and to which amount, is something that depends upon your application. Broadly speaking, the blurriness or sharpness of images can be measured in various ways. As a first easy attempt I would check for the energy of the image, defined as the normalised summation of the squared pixel values:
1 2
E = --- Σ I, where I the image and N the number of pixels (defined for grayscale)
N
First you may apply a Laplacian of Gaussian (LoG) filter to detect the "energetic" areas of the image and then check the energy. The blurry image should show considerably lower energy.
See an example in MATLAB using a typical grayscale lena image:
This is the original image
This is the blurry image, blurred with gaussian noise
This is the LoG image of the original
And this is the LoG image of the blurry one
If you just compute the energy of the two LoG images you get:
E = 1265 E = 88
or bl
which is a huge amount of difference...
Then you just have to select a threshold to judge which amount of energy is good for your application...
calculate the average L1-distance of adjacent pixels:
N1=1/(2*N_pixel) * sum( abs(p(x,y)-p(x-1,y)) + abs(p(x,y)-p(x,y-1)) )
then the average L2 distance:
N2= 1/(2*N_pixel) * sum( (p(x,y)-p(x-1,y))^2 + (p(x,y)-p(x,y-1))^2 )
then the ratio N2 / (N1*N1) is a measure of blurriness. This is for grayscale images, for color you do this for each channel separately.

How to apply box filter on integral image? (SURF)

Assuming that I have a grayscale (8-bit) image and assume that I have an integral image created from that same image.
Image resolution is 720x576. According to SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The
first octave uses filters with 9x9, 15x15, 21x21 and 27x27 pixels. The
second octave uses filters with 15x15, 27x27, 39x39 and 51x51 pixels.The third octave uses filters with 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large and I guess 720x576 is big enough (right??!!), a fourth octave is added, 51x51, 99x99, 147x147 and 195x195. These
octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are:What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) to apply filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9Dxy)^2 anddet(HessianApprox) = DxxDyy − (0.81Dxy)^2Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second order derivatives).
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brains.
The drawback to difference of Gaussians (DoG) is the computation time that is too expensive to be applied to large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, apply a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-rightmost pixel of the integral image has thus the sum of all the original pixels for value.
Then, you just need to remark that the box blur is equal to the sum of all the pixels inside a given rectangle (not originating in the top-lefmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D is a shortcut for IntegralImagePixelAt(A) (B, C, D respectively).
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2.
Finally, the determinant is given by: Det = DxxDyy - 0.81*Dxy^2.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you made a code for normal Gaussian 2D convolution, just use the box filter as a Gaussian kernel and the input image will be the same original image not integral image. The results from this method will be same with the one you asked.

Optimal sigma for Gaussian filtering of an image?

When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, i have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the center of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb. If k is the size of kernel than sigma=(k-1)/6 . This is because the length for 99 percentile of gaussian pdf is 6sigma.
You have to find a min/max of a function G such that G(X,sigma) where X is a set of your observations (in your case, your image grayscale values) , This function can be anything that maintain the "order" of the intensities of the iamge, for example, this can be done with the 1st derivative of the image (as G),
fil = fspecial('sobel');
im = imfilter(I,fil);
imagesc(im);
colormap = gray;
this gives you the result of first derivative of an image, now you want to find max sigma by
maximzing G(X,sigma), that means that you are trying a few sigmas (let say, in increasing order) until you reach a sigma that makes G maximal. This can also be done with second derivative.
Given the central value of the kernel equals 1 the dimension that guarantees to have the outermost value less than a limit (e.g 1/100) is as follows:
double limit = 1.0 / 100.0;
size = static_cast<int>(2 * std::ceil(sqrt(-2.0 * sigma * sigma * log(limit))));
if (size % 2 == 0)
{
size++;
}

Resources