I am trying to implement a 2D FFT using 1D FFTs. I have a matrix of size 4x4 (row major)
My algorithm is:
FFT on all 16 points
bit reversal
transpose
FFT on 16 points
bit reversal
transpose
Is this correct?
No - the algorithm is:
do 1D FFT on each row (real to complex)
do 1D FFT on each column resulting from (1) (complex to complex)
So it's 4 x 1D (horizontal) FFTs followed by 4 x 1D (vertical) FFTs, for a total of 8 x 1D FFTs.
Related
I want to implement Ncuts algorithm for an image of size 1248 x 378 x 1 but the adjacency matrix will be (1248 x 378 ) x (1248 x 378 ) which needs about 800 gb of RAM. Even if i most of it is zero, still it needs too much memory. I do need this matrix though to compute the normalized cut. Is there any way that i can find the eigenvalues without actually calculate the whole matrix?
If most of the matrix is zero,, then don't use a dense format.
Instead use a sparse matrix.
The dimension of the image is 64 x 128. That is 8192 magnitude and gradient values. After the binning stage, we are left with 1152 values as we converted 64 pixels into 9 bins based on their orientation. Can you please explain to me how after L2 normalization we get 3780 vectors?
Assumption: You have the gradients of the 64 x 128 patch.
Calculate Histogram of Gradients in 8x8 cells
This is where it starts to get interesting. The image is divided into 8x8 cells and a HOG is calculated for each 8x8 cells. One of the reasons why we use 8x8 cells is that it provides a compact representation. An 8x8 image patch contains 8x8x3 = 192 pixel values (color image). The gradient of this patch contains 2 values (magnitude and direction) per pixel which adds up to 8x8x2 = 128 values. These 128 numbers are represented using a 9-bin histogram which can be stored as an array of 9 numbers. This makes it more compact and calculating histograms over a patch makes this representation more robust to noise.
The histogram is essentially a vector of 9 bins corresponding to angles 0, 20, 40, 60 ... 180 corresponding to unsigned gradients.
16 x 16 Block Normalization
After creating the histogram based on the gradient of the image, we want our descriptor to be independent of lighting variations. Hence, we normalize the histogram. The vector norm for a RGB color [128, 64, 32] is sqrt(128*128 + 64*64 + 32*32) = 146.64, which is the infamous L2-norm. Dividing each element of this vector by 146.64 gives us a normalized vector [0.87, 0.43, 0.22]. If we were to multiply each element of this vector by 2, the normalized vector will remain the same as before.
Although simply normalizing the 9x1 histogram is intriguing, normalizing a bigger sized block of 16 x 16 is better. A 16 x 16 block has 4 histograms, which can be concatenated to form a 36 x 1 element vector and it can be normalized the same way as the 3 x 1 vector in the example. The window is then moved by 8 pixels and a normalized 36 x 1 vector is calculated over this window and the process is repeated (see the animation: Courtesy)
Calculate the HOG feature vector
This is where your question comes in.
To calculate the final feature vector for the entire image patch, the 36 x 1 vectors are concatenated into on giant vector. Let us calculate the size:
How many positions of the 16 x 16 blocks do we have? There are 7 horizontal and 15 vertical positions, which gives - 105 positions.
Each 16 x 16 block is represented by a 36 x 1 vector. So when we concatenate them all into one giant vector we obtain a 36 x 105 = 3780 dimensional vector.
For more details, look at the tutorial where I learned.
Hope it helps!
Suppose that a given 3-bit image(L=8) of size 64*64 pixels (M*N=4096) has the intensity distribution shown as below. How to obtain histogram equalization transformation function
and then compute the equalized histogram of the image?
Rk nk
0 800
1 520
2 970
3 660
4 330
5 450
6 260
7 106
"Histogram Equalization is the process of obtaining transformation function automatically. So you need not have to worry about shape and nature of transformation function"
So in Histogram equalization, transformation function is calculated using cumulative frequency approach and this process is automatic. From the histogram of the image, we determine the cumulative histogram, c, rescaling the values as we go so that they occupy an 8-bit range. In this way, c becomes a look-up table that can be subsequently applied to the image in order to carry out equalization.
rk nk c sk = c/MN (L-1)sk rounded value
0 800 800 0.195 1.365 1
1 520 1320 0.322 2.254 2
2 970 2290 0.559 3.913 4
3 660 2950 0.720 5.04 5
4 330 3280 0.801 5.601 6
5 450 3730 0.911 6.377 6
6 260 3990 0.974 6.818 7
7 106 4096 1.000 7.0 7
Now the equalized histogram is therefore
rk nk
0 0
1 800
2 520
3 0
4 970
5 660
6 330 + 450 = 780
7 260 + 106 = 366
The algorithm for equalization can be given as
Compute a scaling factor, α= 255 / number of pixels
Calculate histogram of the image
Create a look up table c with
c[0] = α * histogram[0]
for all remaining grey levels, i, do
c[i] = c[i-1] + α * histogram[i]
end for
for all pixel coordinates, x and y, do
g(x, y) = c[f(x, y)]
end for
But there is a problem with histogram equalization and that is mainly because it is a completely automatic technique, with no parameters to set. At times, it can improve our ability to interpret an image dramatically. However, it is difficult to predict how beneficial equalization will be for any given image; in fact, it may not be of any use at all. This is because the improvement in contrast is optimal statistically, rather than perceptually. In images with narrow histograms and relatively few grey levels, a massive increase in contrast due to histogram equalisation can have the adverse effect of reducing perceived image quality. In particular, sampling or quantisation artefacts and image noise may become more prominent.
The alternative to obtaining the transformation (mapping) function automatically is Histogram Specification. In histogram specification instead of requiring a flat histogram, we specify a particular shape explicitly. We might wish to do this in cases where it is desirable for a set of related images to have the same histogram- in order, perhaps, that a particular operation produces the same results for all images.
Histogram specification can be visualised as a two-stage process. First, we transform the input image by equalisation into a temporary image with a flat histogram. Then we transform this equalised, temporary image into an output image possessing the desired histogram. The mapping function for the second stage is easily obtained. Since a rescaled version of the cumulative histogram can be used to transform a histogram with any shape into a flat histogram, it follows that the inverse of the cumulative histogram will perform the inverse transformation from a fiat histogram to one with a specified shape.
For more details about histogram equalization and mapping functions with C and C++ code
https://programming-technique.blogspot.com/2013/01/histogram-equalization-using-c-image.html
I'm currently trying to compute the fft of an image via fftw_plan_dft_2d.
To use this function, I'm linearizing the image data into an in array and calling the function mentioned above (and detailed below)
ftw_plan fftw_plan_dft_2d(int n0, int n1,
fftw_complex *in, fftw_complex *out,
int sign, unsigned flags);
The func modifies a complex array, out, with a size equal to the number of pixels in the original image.
Do you know if this is the proper way of computing the 2D FFT of an image? If so, what does the data within out represent? IE Where are the high and low frequency values in the array?
Thanks,
djs22
A 2D FFT is equivalent to applying a 1D FFT to each row of the image in one pass, followed by 1D FFTs on all the columns of the output from the first pass.
The output of a 2D FFT is just like the output of a 1D FFT, except that you have complex magnitudes in x, y dimensions rather just a single dimension. Spatial frequency increases with the x and y index as expected.
There's a section in the FFTW manual (here) which covers the organisation of the real-to-complex 2D FFT output data, assuming that's what you're using.
It is.
Try to compute 2 plans:
plan1 = fftw_plan_dft_2d(image->rows, image->cols, in, fft, FFTW_FORWARD, FFTW_ESTIMATE);
plan2 = fftw_plan_dft_2d(image->rows, image->cols, fft, ifft, FFTW_BACKWARD, FFTW_ESTIMATE);
You'll obtain the original data in ifft.
Hope it helps :)
im trying to implement a gaussian blur with the use of FFT and could find here the following recipe.
This means that you can take the
Fourier transform of the image and the
filter, multiply the (complex)
results, and then take the inverse
Fourier transform.
I've got a kernel K, a 7x7 Matrix
and a Image I, a 512x512 Matrix.
I do not understand how to multiply K by I.
Is the only way to do that by making K as big as I (512x512) ?
Yes, you do need to make K as big as I by padding it with zeros. Also, after padding, but before you take the FFT of the kernel, you need to translate it with wraparound, such that the center of the kernel (the peak of the Gaussian) is at (0,0). Otherwise, your filtered image will be translated. Alternatively, you can translate the resulting filtered image once you are done.
Another point: for small kernels not using the FFT may actually be faster. A 2D Gaussian kernel is separable, meaning that you can separate it into two 1D kernels for x and y. Then instead of a 2D convolution, you can do two 1D convolutions in x and y directions in the spatial domain. For smaller kernels that may end up being faster than doing the convolution in the frequency domain using the FFT.
If you are comfortable with pixel shader and if FFT is not your main goal here, but convolution with gaussian blur kernel IS,- then i can recommend my tutorial on what convolution is
regards.