How to transform filter when using FFT to do 2d convolution? - image-processing

I want to use the FFT to accelerate a 2D convolution. The filter is 15 x 15 and the image is 300 x 300. The filter's size differs from the image's, so I cannot take the pointwise product after the FFT. How should I transform the filter before the FFT so that its size matches the image?

I use the convention that N is the kernel size.
Since the convolution is not defined (mathematically) on the edges, you would lose N-1 pixels in total on each axis (N//2 at each end of each dimension).
You need to make room for the convolution: pad the image with enough "neutral values" so that the edge cases (junk values inserted there) disappear.
For a 15 x 15 kernel this means making your image a 314 x 314 image (7 pixels of suitable padding on each side, see next paragraph), which after convolution gives back a 300 x 300 image.
Popular image processing libraries already have this built in: when you ask for a convolution, there are extra arguments specifying the "mode".
Which values can we pad with?
Stolen with no shame from NumPy's pad documentation:
'constant': pads with a constant value.
'edge': pads with the edge values of the array.
'linear_ramp': pads with the linear ramp between end_value and the array edge value.
'maximum': pads with the maximum value of all or part of the vector along each axis.
'mean': pads with the mean value of all or part of the vector along each axis.
'median': pads with the median value of all or part of the vector along each axis.
'minimum': pads with the minimum value of all or part of the vector along each axis.
'reflect': pads with the reflection of the vector mirrored on the first and last values of the vector along each axis.
'symmetric': pads with the reflection of the vector mirrored along the edge of the array.
'wrap': pads with the wrap of the vector along the axis. The first values are used to pad the end and the end values are used to pad the beginning.
It's up to you, really, but the rule of thumb is "choose neutral values for the task at hand".
(For instance, padding with 0 when doing averaging makes little sense, because 0 is not neutral in an average of positive values)
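For instance, here is a minimal NumPy sketch of the padding step (the image, the kernel size and the padding mode below are placeholders; pick whichever mode is neutral for your task):

import numpy as np

image = np.random.rand(300, 300)   # stand-in for your 300x300 image
N = 15                             # kernel size

pad = N // 2                       # 7 pixels on each side
padded = np.pad(image, pad, mode='reflect')   # any of the modes listed above
print(padded.shape)                # (314, 314): a 'valid' 15x15 convolution gives back 300x300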

It depends on the algorithm you use for the FFT, because many implementations need to work with images of dyadic dimensions (powers of 2).
Here is what you have to do:
1. Pad the image: center your image inside a bigger one with dyadic dimensions.
2. Pad the kernel: center your convolution kernel inside an image with the same dimensions as in step 1.
3. FFT of the image from step 1.
4. FFT of the kernel from step 2.
5. Complex multiplication (in Fourier space) of the results from steps 3 and 4.
6. Inverse FFT of the result of step 5.
7. Unpad the resulting image from step 6.
8. Put all 4 blocks back into the right order.
If the algorithm you use does not need dyadic dimensions, then step 1 is unnecessary and step 2 becomes a simple padding to the image dimensions, as sketched below.
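Here is a sketch of that pipeline with NumPy, assuming an FFT that accepts arbitrary sizes (np.fft does), so the padding is simply to the full linear-convolution size; it also crops at the end instead of re-ordering quadrants, which is equivalent up to where the kernel is centred. The names and sizes are placeholders:

import numpy as np

def fft_convolve2d(image, kernel):
    # 'Same'-size convolution of image with kernel via the FFT.
    ih, iw = image.shape
    kh, kw = kernel.shape
    # Pad both to the full linear-convolution size to avoid circular wrap-around.
    fh, fw = ih + kh - 1, iw + kw - 1
    F_image  = np.fft.rfft2(image,  s=(fh, fw))            # step 3: FFT of the padded image
    F_kernel = np.fft.rfft2(kernel, s=(fh, fw))            # step 4: FFT of the padded kernel
    full = np.fft.irfft2(F_image * F_kernel, s=(fh, fw))   # steps 5-6: multiply, inverse FFT
    # Step 7: crop ("unpad") back to the original image size.
    top, left = kh // 2, kw // 2
    return full[top:top + ih, left:left + iw]

image  = np.random.rand(300, 300)
kernel = np.random.rand(15, 15)
out = fft_convolve2d(image, kernel)   # matches a spatial 'same' convolution up to rounding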

Related

Hessian matrix, how to combine Ixx & Iyy together?

"Before extracting the lines, you need to detect potential points on them. Apply a Gaussian filter first and use the Sobel filters as derivative operators. Threshold the determinant of the Hessian and then apply non-maximum suppression in 3 × 3 neighborhoods. Ignore pixels for which any of the filters falls even partially out of the image boundaries."
I understand that I should apply a Gaussian filter to the image first to eliminate noise, then apply Sobel_x and Sobel_y twice each, which gives Ixx and Iyy in the Hessian and highlights horizontal and vertical lines in the image. But how am I supposed to get Ixxyy? How can I combine these two images to produce Ixxyy as the bottom-right entry of the Hessian matrix?
The two off-diagonal elements of the Hessian matrix are d^2/dxdy. That is, they are the first derivative along y applied to the first derivative along x.
If the top-left element is obtained by Sobel_x( Sobel_x( image )), and the bottom-right element is Sobel_y( Sobel_y( image )), then the two other elements are both Sobel_y( Sobel_x( image )) or, equivalently, Sobel_x( Sobel_y( image )) (note that these two should be identical).
Do take into account that negative values are important here, and you should thus be careful to compute the Sobel filter in a way that preserves those negative values—don't store them in an unsigned integer array!
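A small sketch of this with scipy.ndimage (the image and the Gaussian sigma are placeholders; the point is that each Hessian entry is just repeated Sobel passes on a float array):

import numpy as np
from scipy import ndimage

image = np.random.rand(256, 256)     # stand-in for your input image, as float
smooth = ndimage.gaussian_filter(image, sigma=2)   # Gaussian first, as the exercise says

Ix  = ndimage.sobel(smooth, axis=1)  # first derivative along x
Iy  = ndimage.sobel(smooth, axis=0)  # first derivative along y
Ixx = ndimage.sobel(Ix, axis=1)      # d^2/dx^2
Iyy = ndimage.sobel(Iy, axis=0)      # d^2/dy^2
Ixy = ndimage.sobel(Ix, axis=0)      # d^2/dxdy -- same as sobel(Iy, axis=1)

det_H = Ixx * Iyy - Ixy * Ixy        # determinant of the Hessian, to be thresholded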

How do you handle negative pixel values after filtering?

I have a 8-bit image and I want to filter it with a matrix for edge detection. My kernel matrix is
0 1 0
1 -4 1
0 1 0
For some indices it gives me a negative value. What am I supposed to do with them?
Your kernel is a Laplace filter. Applying it to an image yields a finite difference approximation to the Laplacian operator. The Laplace operator is not an edge detector by itself.
But you can use it as a building block for an edge detector: you need to detect the zero crossings to find edges (this is the Marr-Hildreth edge detector). To find zero crossings, you need to have negative values.
You can also use the Laplace filtered image to sharpen your image. If you subtract it from the original image, the result will be an image with sharper edges and a much crisper feel. For this, negative values are important too.
For both these applications, clamping the result of the operation, as suggested in the other answer, is wrong. That clamping sets all negative values to 0. This means there are no more zero crossings to find, so you can't find edges, and for the sharpening it means that one side of each edge will not be sharpened.
So, the best thing to do with the result of the Laplace filter is preserve the values as they are. Use a signed 16-bit integer type to store your results (I actually prefer using floating-point types, it simplifies a lot of things).
On the other hand, if you want to display the result of the Laplace filter on a screen, you will have to do something sensible with the pixel values. A common choice is to add 128 to each pixel. This shifts zero to a mid-grey value, and shows negative values as darker and positive values as lighter. After adding 128, values above 255 and below 0 can be clipped. You can also stretch the values further if you want to avoid clipping, for example laplace / 2 + 128.
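A minimal sketch of both uses with scipy.ndimage (the image is a random placeholder; the laplace / 2 + 128 scaling is only for display, as noted above):

import numpy as np
from scipy import ndimage

laplace_kernel = np.array([[0,  1, 0],
                           [1, -4, 1],
                           [0,  1, 0]], dtype=float)

image = np.random.randint(0, 256, (256, 256)).astype(float)  # stand-in 8-bit image, cast to float
laplace = ndimage.convolve(image, laplace_kernel)             # negative values are preserved

sharpened = image - laplace                                   # subtract the Laplacian to sharpen

display = np.clip(laplace / 2 + 128, 0, 255).astype(np.uint8) # for the screen only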
Out of range values are extremely common in JPEG. One handles them by clamping.
If X < 0 then X := 0 ;
If X > 255 then X := 255 ;

Reconstruct image from eigenvectors obtained from solving the eigenfunction of Hamiltonian operator in matrix form

I have an Image I
I am trying to do Automatic Object Extraction using Quantum Mechanics
Each pixel in an image is considered as a potential field, V(x,y) and hence each wave (eigen) function represents a meaningful region.
The 2D time-independent Schrödinger equation: -(ħ²/2m) ∇²ψ(x, y) + V(x, y) ψ(x, y) = E ψ(x, y)
Multiplying both sides by
We get,
Rewriting the Laplacian using a finite-difference approach, ∇²ψ_i ≈ Σ_{j ∈ Ni} ψ_j − |Ni| ψ_i,
where Ni is the set of neighbours of pixel i, and |Ni| is the cardinality of Ni, i.e. the number of elements in Ni.
Combining the above two equations, we get:
where M is the number of elements in
Now, the left-hand side of the equation is a measure of how similar the labels in a neighbourhood are, i.e. a measure of spatial coherence.
Now, for applying this to images, the potential V is given as the pixel intensities.
The right hand side is a measure of how close the pixel values in a segment are to a constant value E.
Now, the wave functions can be calculated numerically by solving for the eigenvectors of the Hamiltonian operator in matrix form, whose entries take one value for i = j, another for j ∈ Ni, and are 0 elsewhere.
Now, in this paper it is said that we first have to find the maximum and minimum eigenvalues, and then calculate the eigenvectors whose eigenvalues are closest to a number of values regularly spaced between the minimum and maximum eigenvalues; that number is 300.
I have calculated the 300 eigenvectors.
And then the absolute square of the eigenvectors are thresholded to obtain the segments.
Fine up to this part.
Now, how do I reconstruct the eigenvectors into a 2D image so as to get the potential segments in the image?
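If the Hamiltonian was built by flattening the pixels in row-major order (an assumption; eigvecs below is just a placeholder for your 300 computed eigenvectors), each eigenvector can simply be reshaped back to the image dimensions before thresholding:

import numpy as np

rows, cols = 300, 300                      # your image dimensions (placeholder)
eigvecs = np.random.rand(rows * cols, 300) # one eigenvector per column (placeholder)

k = 0                                      # pick one eigenvector
psi = eigvecs[:, k].reshape(rows, cols)    # back to a 2D "wave function" image
prob = np.abs(psi) ** 2                    # absolute square
segment = prob > 0.5 * prob.max()          # threshold it (the 0.5 here is arbitrary)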

Different derivatives in image space and frequency space

I'm trying to compare between an image derivative obtained in the image space and the one obtained in the frequency space.
What I'm actually comparing is the magnitude of derivatives.
This is the process for image space:
calculate the x-derivative in image space by convolving the image with [1 -1] and get convDerX.
calculate the y-derivative in image space by convolving the image with [1;-1] and get convDerY.
let convMagnitude = sqrt(convDerX.^2 + convDerY.^2)
This is the process for frequency space:
calculate the Fourier transform of the image, differentiate with respect to x in the frequency domain, then invert back to image space to get fourDerX.
calculate the Fourier transform of the image, differentiate with respect to y in the frequency domain, then invert back to image space to get fourDerY.
let fourMagnitude = sqrt(fourDerX.^2 + fourDerY.^2)
The two magnitudes are very similar, and show the edges, but they're not identical.
Why does this happen? Is it just due to discretization errors or is there something deeper here?
In frequency space, a point's value may depend on points that are very distant in the image. So this kind of Fourier-transform manipulation is like using a many-point scheme in image space, which is uncommon for image-space convolution derivatives because of numerical instability. It is a stability vs. precision trade-off, so for typical (non-rocket-science) problems a first-order [-1, 1] scheme or a second-order [-0.5, 0, 0.5] scheme is enough. Of course the results of different schemes will differ a bit, since we are in a discrete world.
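A small NumPy sketch of the two pipelines, assuming circular boundary handling so the comparison is like-for-like (the image is a random placeholder):

import numpy as np

image = np.random.rand(256, 256)

# Image space: circular convolution with [1, -1] along x and [1; -1] along y.
convDerX = image - np.roll(image, 1, axis=1)
convDerY = image - np.roll(image, 1, axis=0)
convMagnitude = np.sqrt(convDerX**2 + convDerY**2)

# Frequency space: multiply the spectrum by i*omega and invert.
F = np.fft.fft2(image)
wx = 2j * np.pi * np.fft.fftfreq(image.shape[1])
wy = 2j * np.pi * np.fft.fftfreq(image.shape[0])
fourDerX = np.real(np.fft.ifft2(F * wx[np.newaxis, :]))
fourDerY = np.real(np.fft.ifft2(F * wy[:, np.newaxis]))
fourMagnitude = np.sqrt(fourDerX**2 + fourDerY**2)

# Close but not identical: the spectral derivative effectively uses every sample
# in the row/column, while [1, -1] uses only two neighbouring points.
print(np.abs(convMagnitude - fourMagnitude).mean())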

How to apply box filter on integral image? (SURF)

Assume that I have a grayscale (8-bit) image and an integral image created from it. The image resolution is 720x576. According to the SURF algorithm, each octave is composed of 4 box filters, which are defined by the number of pixels on their side. The first octave uses filters of 9x9, 15x15, 21x21 and 27x27 pixels. The second octave uses filters of 15x15, 27x27, 39x39 and 51x51 pixels. The third octave uses filters of 27x27, 51x51, 75x75 and 99x99 pixels. If the image is sufficiently large, and I guess 720x576 is big enough (right??!!), a fourth octave is added: 51x51, 99x99, 147x147 and 195x195. These octaves partially overlap one another to improve the quality of the interpolated results.
// so, we have:
//
// 9x9 15x15 21x21 27x27
// 15x15 27x27 39x39 51x51
// 27x27 51x51 75x75 99x99
// 51x51 99x99 147x147 195x195
The questions are: What are the values in each of these filters? Should I hardcode these values, or should I calculate them? How exactly (numerically) do I apply the filters to the integral image?
Also, for calculating the Hessian determinant I found two approximations:
det(HessianApprox) = DxxDyy − (0.9 Dxy)^2 and det(HessianApprox) = DxxDyy − 0.81 Dxy^2. Which one is correct?
(Dxx, Dyy, and Dxy are Gaussian second-order derivatives.)
I had to go back to the original paper to find the precise answers to your questions.
Some background first
SURF leverages a common Image Analysis approach for regions-of-interest detection that is called blob detection.
The typical approach for blob detection is a difference of Gaussians.
There are several reasons for this, the first one being to mimic what happens in the visual cortex of the human brain.
The drawback of the difference of Gaussians (DoG) is its computation time, which is too expensive for large image areas.
In order to bypass this issue, SURF takes a simple approach. A DoG is simply the computation of two Gaussian averages (or equivalently, applying a Gaussian blur) followed by taking their difference.
A quick-and-dirty approximation (not so dirty for small regions) is to approximate the Gaussian blur by a box blur.
A box blur is the average value of all the images values in a given rectangle. It can be computed efficiently via integral images.
Using integral images
Inside an integral image, each pixel value is the sum of all the pixels that were above it and on its left in the original image.
The top-left pixel value in the integral image is thus 0, and the bottom-right pixel of the integral image thus holds the sum of all the original pixels.
Then, you just need to note that the box blur is equal to the sum of all the pixels inside a given rectangle (not necessarily originating at the top-leftmost pixel of the image) and apply the following simple geometric reasoning.
If you have a rectangle with corners ABCD (top left, top right, bottom left, bottom right), then the value of the box filter is given by:
boxFilter(ABCD) = A + D - B - C,
where A, B, C, D are shorthand for IntegralImagePixelAt(A) (and B, C, D respectively).
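A minimal sketch of that lookup (here the integral image is built with inclusive cumulative sums, so the top/left corners fall just outside the rectangle, handled by the guards below; the image size and coordinates are placeholders):

import numpy as np

def integral_image(img):
    # Each entry is the sum of all pixels above and to the left of it, inclusive.
    return img.cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, r0, c0, r1, c1):
    # Sum of img[r0:r1+1, c0:c1+1] using four lookups: A + D - B - C.
    D = ii[r1, c1]
    A = ii[r0 - 1, c0 - 1] if (r0 > 0 and c0 > 0) else 0
    B = ii[r0 - 1, c1] if r0 > 0 else 0
    C = ii[r1, c0 - 1] if c0 > 0 else 0
    return A + D - B - C

img = np.random.randint(0, 256, (576, 720)).astype(np.int64)  # stand-in 720x576 image
ii = integral_image(img)
avg = box_sum(ii, 93, 193, 107, 207) / (15 * 15)   # 15x15 box average centred at (100, 200)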
Integral images in SURF
SURF is not using box blurs of sizes 9x9, etc. directly.
What it uses instead is several orders of Gaussian derivatives, or Haar-like features.
Let's take an example. Suppose you are to compute the 9x9 filters output. This corresponds to a given sigma, hence a fixed scale/octave.
The sigma being fixed, you center your 9x9 window on the pixel of interest. Then, you compute the output of the 2nd order Gaussian derivative in each direction (horizontal, vertical, diagonal). The Fig. 1 in the paper gives you an illustration of the vertical and diagonal filters.
The Hessian determinant
There is a factor to take into account for the scale differences. Let's believe the paper that the determinant is equal to:
Det = DxxDyy - (0.9 * Dxy)^2
which, expanding the square, is exactly the same as:
Det = DxxDyy - 0.81 * Dxy^2
so the two approximations you found are identical.
Look at page 17 of this document
http://www.sci.utah.edu/~fletcher/CS7960/slides/Scott.pdf
If you have written code for a normal Gaussian 2D convolution, just use the box filter as the kernel; the input is then the original image, not the integral image. The results from this method will be the same as the ones you asked about.
