Where to center the kernel when using FFTW for image convolution?

Where to center the kernel when using FFTW for image convolution? - image-processing

I am trying to use FFTW for image convolution.
At first just to test if the system is working properly, I performed the fft, then the inverse fft, and could get the exact same image returned.
Then a small step forward, I used the identity kernel(i.e., kernel[0][0] = 1 whereas all the other components equal 0). I took the component-wise product between the image and kernel(both in the frequency domain), then did the inverse fft. Theoretically I should be able to get the identical image back. But the result I got is very not even close to the original image. I am suspecting this has something to do with where I center my kernel before I fft it into frequency domain(since I put the "1" at kernel[0][0], it basically means that I centered the positive part at the top left). Could anyone enlighten me about what goes wrong here?

For each dimension, the indexes of samples should be from -n/2 ... 0 ... n/2 -1, so if the dimension is odd, center around the middle. If the dimension is even, center so that before the new 0 you have one sample more than after the new 0.
E.g. -4, -3, -2, -1, 0, 1, 2, 3 for a width/height of 8 or -3, -2, -1, 0, 1, 2, 3 for a width/height of 7.
The FFT is relative to the middle, in its scale there are negative points.
In the memory the points are 0...n-1, but the FFT treats them as -ceil(n/2)...floor(n/2), where 0 is -ceil(n/2) and n-1 is floor(n/2)
The identity matrix is a matrix of zeros with 1 in the 0,0 location (the center - according to above numbering). (In the spatial domain.)
In the frequency domain the identity matrix should be a constant (all real values 1 or 1/(N*M) and all imaginary values 0).
If you do not receive this result, then the identify matrix might need padding differently (to the left and down instead of around all sides) - this may depend on the FFT implementation.
Center each dimension separately (this is an index centering, no change in actual memory).
You will probably need to pad the image (after centering) to a whole power of 2 in each dimension (2^n * 2^m where n doesn't have to equal m).
Pad relative to FFT's 0,0 location (to center, not corner) by copying existing pixels into a new larger image, using center-based-indexes in both source and destination images (e.g. (0,0) to (0,0), (0,1) to (0,1), (1,-2) to (1,-2))
Assuming your FFT uses regular floating point cells and not complex cells, the complex image has to be of size 2*ceil(2/n) * 2*ceil(2/m) even if you don't need a whole power of 2 (since it has half the samples, but the samples are complex).
If your image has more than one color channel, you will first have to reshape it, so that the channel are the most significant in the sub-pixel ordering, instead of the least significant. You can reshape and pad in one go to save time and space.
Don't forget the FFTSHIFT after the IFFT. (To swap the quadrants.)
The result of the IFFT is 0...n-1. You have to take pixels floor(n/2)+1..n-1 and move them before 0...floor(n/2).
This is done by copying pixels to a new image, copying floor(n/2)+1 to memory-location 0, floor(n/2)+2 to memory-location 1, ..., n-1 to memory-location floor(n/2), then 0 to memory-location ceil(n/2), 1 to memory-location ceil(n/2)+1, ..., floor(n/2) to memory-location n-1.
When you multiply in the frequency domain, remember that the samples are complex (one cell real then one cell imaginary) so you have to use a complex multiplication.
The result might need dividing by N^2*M^2 where N is the size of n after padding (and likewise for M and m). - You can tell this by (a. looking at the frequency domain's values of the identity matrix, b. comparing result to input.)

I think that your understanding of the Identity kernel may be off. An Identity kernel should have the 1 at the center of the 2D kernal not at the 0, 0 position.
example for a 3 x 3, you have yours setup as follows:
1, 0, 0
0, 0, 0
0, 0, 0
It should be
0, 0, 0
0, 1, 0
0, 0, 0
Check this out also
What is the "do-nothing" convolution kernel
also look here, at the bottom of page 3.
http://www.fmwconcepts.com/imagemagick/digital_image_filtering.pdf

I took the component-wise product between the image and kernel in frequency domain, then did the inverse fft. Theoretically I should be able to get the identical image back.
I don't think that doing a forward transform with a non-fft kernel, and then an inverse fft transform should lead to any expectation of getting the original image back, but perhaps I'm just misunderstanding what you were trying to say there...

Related

How to calculate partial derivatives of error function with respect to values in matrix

I am building a project that is a basic neural network that takes in a 2x2 image with the goal to classify the image as either a forward slash (1-class) or back slash (0-class) shape. The data for the input is a flat numpy array. 1 represents a black pixel 0 represents a white pixel.
0-class: [1, 0, 0, 1]
1-class: [0, 1, 1, 0]
If I start my filter as a random 4x1 matrix, how can I use gradient descent to come to either perfect matrix [1,-1,-1,1] or [-1,1,1,-1] to classify the datapoints.
Side note: Even when multiplied with the "perfect" answer matrix then summed, the label output would be -2 and 2. Would my data labels need to be -2 and 2? What if I want my classes labeled as 0 and 1?

Can I convert any matrix to an image?

I have a hypothetical question about image processing:
Supposing we have a grayscale image of size 2x2 which can be represented by an integer matrix (intensity values) with the same dimensions:
(050, 150)
(100, 250)
After applying some mathematical functions (it can be any mathematical function) the values were changed, for example:
(550, 825)
(990, 1120)
Is there any way that I can represent this matrix as an image again (considering that the pixels intensity range is 0-255)?
One option which I can think about is to 'normalize' these values by finding the lower value and decreasing it from each value:
(0, 275)
(440, 570)
Then, finding the higher value and consider it as the 255, for example:
(0, 48)
(77, 255)
I'm not sure if this approach makes sense (or is efficient to represent the original image).
Anyway, this question is just a conceptual doubt, I'm not trying to implement it, so I haven't any code to show.

Is there any way that I can represent this matrix as an image again ( considering that the pixels intensity range is 0-255 ) ?
Oh yes, we can.
The issue is with a colorspace-mapping.
Not just the translation from an unknown range of < A, B >, but also within a certain and reasonable context of the two different colorspace-ranges, the latter ( the target ) of which is the said (int) < 0, 255 > bound.
Given many 2x2 matrices get produced by some unknown process, their colorspace-transcoding ought keep some rationale, that if all were put side by side, the transcoding used should be "non-local" ( having some global anchor for globally equalised normalisation of individual colorspace-transcoding values ) so as not to "devastate" any phenomenon, that was observed in the original colorspace on 4096 x 4096 imagery source, but was "torn" appart, by just locally-normalised 2 x 2 transcoding ( this will lead to incoherrent target colorspaces and the globally observable visual phenomenon will not be visible in a set of target 2x2 sub-views right due to incompatible colorspaces transcoding -- a new kind o non-linear disorder will be introduced due to globally discoordinated colorspace-transcoding and the initial information value of the original will be lost )

composition of 3D rigid transformations

This question belongs to the topic - 'Structure from Motion'.
Suppose, there are 3 images. There are point correspondences between image 1-2 and image 2-3, but there's no common point between image 1 and 3. I got the RT (rotation and translation matrix) of image 2, RT12, with respect to image 1(considering image 1 RT as [I|0], that means, rotation is identity, translation is zero). Lets split RT12 into R12 and T12.
Similarly, I got RT23 considering image 2 RT as [I|0]. So, now I have R23 and T23, who are related to image 2, but not image 1. Now I want to find R13 and T13.
For a synthetic dataset, the equation R13=R23*R12 is giving correct R(verified, as I actually have the R13 precalculated). Similary T13 should be T2+T1. But the translation computed this way is bad. Since I have actual results with me, I could verify that Rotation was estimated nicely, but not translation. Any ideas?

This is a simple matrix block-multiplication problem, but you have to remember that you are actually considering 4x4 matrices (associated to rigid transformations in the 3D homogeneous world) and not 3x4 matrices.
Your 3x4 RT matrices actually correspond to the first three rows of 4x4 matrices A, whose last rows are [0, 0, 0, 1]:
RT23 -> A23=[R23, T23; 0, 0, 0, 1]
RT12 -> A12=[R12, T12; 0, 0, 0, 1]
Then, if you do the matrix block-multiplication on the 4x4 matrices (A13 = A23 * A12), you'll quickly find out that:
R13 = R23 * R12
T13 = R23 * T12 + T23

How to normalize Difference of Gaussian Image pixels with negative values?

In the context of image processing for edge detection or in my case a basic SIFT implementation:
When taking the 'difference' of 2 Gaussian blurred images, you are bound to get pixels whose difference is negative (they are originally between 0 - 255, when subtracting they are possibly between -255 - 255). What is the normal approach to 'fixing' this? I don't see taking the absolute value to be very correct in this situation.

There are two different approaches depending on what you want to do with the output.
The first is to offset the output by 128, so that your calculation range of -128 to 127 maps to 0 to 255.
The second is to clamp negative values so that they all equal zero.

Convolution theory vs implementation

I study convolution in image processing as it is a part of the curriculum, I understand the theory and the formula but I am confused about its implementation.
The formula is:
What I understand
The convolution kernel is flipped both horizontally and vertically then the values in the kernel are multiplied by the corresponding pixel values, the results are summed, divided by "row x column" to get the average, and then finally this result is the value of the pixel at the center of the kernel location.
Confusion in implementation
When I run the example convolution program from my course material and insert as input a 3x3 convolution kernel where:
1st row: (0, 1, 0)
2nd row: (0, 0, 0)
3rd row: (0, 0, 0)
The processed image is shifted down by one pixel, where I expected it to shift upwards by one pixel. This result indicates that no horizontal or vertical flipping is done before calculating (as if it is doing correlation).
I thought there might be a fault in the program so I looked around and found that Adobe Flex 3 and Gimp are doing this as well.
I don't understand, is there something that I missed to notice?
Appreciate any help or feedback.

I guess the programs you tried implement correlation instead of convolution.
I've tried your filter in Mathematica using the ImageFilter function, the result is shifted upwards as expected:
result:
I've also tried it in Octave (an open source Matlab clone):
imfilter([1,1,1,1,1;
2,2,2,2,2;
3,3,3,3,3;
4,4,4,4,4;
5,5,5,5,5],
[0,1,0;
0,0,0;
0,0,0],"conv")
("conv" means convolution - imfilter's default is correlation). Result:
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
0 0 0 0 0
Note that the last row is different. That's because different implementations use different padding (by default). Mathematica uses constant padding for ImageConvolve, no padding for ListConvolve. Octave's imfilter uses zero padding.
Also note that (as belisarius mentioned) the result of a convolution can be smaller, same size or larger than the source image. (I've read the terms "valid", "same size" and "full" convolution in the Matlab and IPPI documentation, but I'm not sure if that's standard terminology). The idea is that the summation can either be performed
only over the source image pixels where the kernel is completely inside the image. In that case, the result is smaller than the source image.
over every source pixel. In that case, the result has the same size as the source image. This requires padding at the borders
over every pixel where any part of the kernel is inside the source image. In that case, the result image is larger than the source image. This also requires padding at the borders.

Please note that:
Results in:
So, the "shifting" is not real, as the dimensions are affected.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Where to center the kernel when using FFTW for image convolution? - image-processing

Related

How to calculate partial derivatives of error function with respect to values in matrix

Can I convert any matrix to an image?

composition of 3D rigid transformations

How to normalize Difference of Gaussian Image pixels with negative values?

Convolution theory vs implementation

Categories

Resources