how use Discrete cosine transform(DCT) in opencv - opencv

dct don't the conversion properly in opencv.
imf = np.float32(block)
dct = cv2.dct(imf)
[[154,123,123,123,123,123,123,136],
[192,180,136,154,154,154,136,110],
[254,198,154,154,180,154,123,123],
[239,180,136,180,180,166,123,123],
[180,154,136,167,166,149,136,136],
[128,136,123,136,154,180,198,154],
[123,105,110,149,136,136,180,166],
[110,136,123,123,123,136,154,136]]
this block of an image,when converting with code shown above
[162.3 ,40.6, 20.0...
[30.5 ,108.4...
this should be the result,
[1186.3 , 40.6, 20.0...
[30.5, 108.4 ....
but I found this Result. for sample block, https://www.math.cuhk.edu.hk/~lmlui/dct.pdf

The DCT is working fine. The difference between what you got and what you expect is because that particular example given actually does the DFT on M instead of on the original image, I. In this case, as the paper shows, M = I - 128. The only difference in your example is that you don't subtract off that piece, so the values are all larger. In a cosine or Fourier transform, the first coefficient (the "DC offset" as it is sometimes called) has a higher value because your image values are just greater. But that's why all the other coefficients are the same. If you take an image and you simply add some or subtract some from the entire image equally, the coefficients of the transform will be the same, except the very first one.
From the standard definition of the DCT:
You can see here that for the first coefficient with k = 0, that inside the cosine function, you just get 0, and cos(0) = 1. Thus, X_0 as it's shown in this picture is just the sum of all the x_n values. Generally this value may be scaled by something relating to N so that it's something like an average. When doing so, it relates back to the X_0 term being a "DC offset" which you'll see described as the "mean value of the signal," or in other words, how far the signal is from 0. This is super useful to have as one of the cosine/Fourier transform coefficients as it then can completely describe a signal; all the other coefficients describe the frequency content and so they say nothing about how far the values are from 0, but the first coefficient, the DC offset, does tell you the shift!

Related

What is a mathematical relation of diameter and sigma arguments in bilateral filter function?

While learning an image denoising technique based on bilateral filter, I encountered this tutorial which provides with full lists of arguments used to run OpenCV's bilateralFilter function. What I see, it's slightly confusing, because there is no explanation about a mathematical rule to alter the diameter value by manipulating both the sigma arguments. So, if picking some specific arguments to pass into that function, I realize hardly what diameter corresponds with a particular couple of sigma values.
Does there exist a dependency between both deviations and the diameter? If my inference is correct, what equation (may be, introduced in OpenCV documentation) is to be referred if applying bilateral filter in a program-based solution?
According to the documentation, the bilateralFilter function in OpenCV takes a parameter d, the neighborhood diameter, as well as a parameter sigmaSpace, the spatial sigma. They can be selected separately, but if d "is non-positive, it is computed from sigmaSpace." For more details we need to look at the source code:
if( d <= 0 )
radius = cvRound(sigma_space*1.5);
else
radius = d/2;
radius = MAX(radius, 1);
d = radius*2 + 1;
That is, if d is not positive, then it is taken as 3 times sigmaSpace. d is also always forced to be odd, so that there is a central pixel in the neighborhood.
Note that the other sigma, sigmaColor, is unrelated to the spatial size of the filter.
In general, if one chooses a sigmaSpace that is too large for the given d, then the Gaussian kernel will be cut off in a way that makes it not appear like a Gaussian, and loose its nice filtering properties (see for example here for an explanation). If it is taken too small for the given d, then many pixels in the neighborhood will always have a near-zero weight, meaning that computational work is wasted. The default value is rather small (one typically uses a radius of 3 times sigma for Gaussian filtering), but is still quite reasonable given the computational cost of the bilateral filter (a smaller neighborhood is cheaper).
These two value (d and sigma) are totally unrelated to each other. Sigma determines the values of the pixels of the kernel, but d determines the size of the kernel.
For example consider this Gaussian filter with sigma=1:
It's a filter kernel and and as you can see the pixel values of the kernel only depends on sigma (the 3*3 matrix in the middle is equal in both kernel), but reducing the size of the kernel (or reducing the diameter) will make the outer pixels ineffective without effecting the values of the middle pixels.
And now if you change the sigma, (with k=3) the kernel is still 3*3 but the pixels' values would be different.

Feature Scaling with Octave

I want to do feature scaling datasets by using means and standard deviations, and my code is below; but apparently it is not a univerisal code, since it seems only work with one dataset. Thus I am wondering what is wrong with my code, any help will be appreciated! Thanks!
X is the dataset I am currently using.
mu = mean(X);
sigma = std(X);
m = size(X, 1);
mu_matrix = ones(m, 1) * mu;
sigma_matrix = ones(m, 1) * sigma;
featureNormalize = (X-mu_matrix)/sigma;
Thank you for clarifying what you think the code should be doing in the comments.
My answer will effectively answer why what you think is happening is not what is happening.
First let's talk about the mean and std functions. When their input is a vector (whether this is vertically or horizontally aligned), then this will return a single number which is the mean or standard deviation of that vector respectively, as you might expect.
However, when the input is a matrix, then you need to know what it does differently. Unless you specify the direction (dimension) in which you should be calculating means / std, then it will calculate means along the rows, i.e. returning a single number for each column. Therefore, the end-result of this operation will be a horizontal vector.
Therefore, both mu and sigma will be horizontal vectors in your code.
Now let's move on to the 'matrix multiplication' operator (i.e. *).
When using the matrix multiplication operator, if you multiply a horizontal vector with a vertical vector (i.e. the usual matrix multiplication operation), your output is a single number (i.e. a scalar). However, if you reverse the orientations, as in, you multiply a vertical vector by a horizontal one, you will in fact be calculating a 'Kronecker product' instead. Since the output of the * operation is completely defined by the rows of the first input, and the columns of the second input, whether you're getting a matrix multiplication or a kronecker product is implicit and entirely dependent on the orientation of your inputs.
Therefore, in your case, the line mu_matrix = ones(m, 1) * mu; is not in fact appending a vector of ones, like you say. It is in fact performing the kronecker product between a vertical vector of ones, and the horizontal vector that is your mu, effectively creating an m-by-n matrix with mu repeated vertically for m rows.
Therefore, at the end of this operation, as the variable naming would suggest, mu_matrix is in fact a matrix (same with sigma_matrix), having the same size as X.
Your final step is X- mu_sigma, which gives you at each element, the difference between x and mu at that element. Then you "divide" with the sigma matrix.
Here is why I asked if you were sure you should be using ./ instead of /.
/ is the matrix division operator. With / You are effectively performing matrix multiplication by an inverse matrix, since D / S is mathematically equivalent to D * inv(S). It seems to me you should be using ./ instead, to simply divide each element by the standard deviation of that column (which is why you had to repeat the horizontal vector over m rows in sigma_matrix, so that you could use it for 'elementwise division'), since what you are trying to do is to normalise each row (i.e. observation) of a particular column, by the standard deviation that is specific to that column (i.e. feature).

Pass histogram equalized image to second pass hist eq?

Suppose that a digital image is subjected to histogram equalization. The problem is the following: Show that a second pass histogram equalization (on the histogram equalized image) will produce exactly the same result as the first pass?
Here is the solution :- problem 3.7
I am not able to understand the following part of the answer: because every pixel (and no others) with value r_k is mapped to s_k, n_{s_k}=n_{r_k}.
This fact stems from the assumptions made on the transformation T representing the histogram equalization operation:
a. T is single-valued and monotonically increasing;
b. T(r) is in [0, 1] for every r in [0,1].
See the corresponding chapter in "Digital Image Processing" by Gonzalez and Woods. In particular, it can be verified that these two assumptions hold for the discrete histogram normalization (equations see Problem 3.7) - at least if none of the intensity values is 0, because then T is strictly monotonic implying a one-to-one mapping; this in turn implies the fact that only every pixel with value r_k is mapped to s_k and no other intensity values. This is done in the solution of Problem 3.10 in the PDF you provided. If one (or more) intensity value(s) is (are) 0, this might not be the case anymore. However, in the continuous domain (in which histogram equalization is usually derived), both assumptions hold, as is shown in the corresponding Chapter in the book by Gonzalez and Woods.

Rotating 1d FFT to get 2D FFT?

I have a blurry image with a sharp edge and I want to use the profile of that sharp edge to estimate the point spread function (PSF) of the imaging system (assuming that it is symmetric). The profile of the edge gives me the "edge spread function" (ESF) and the derivative of that gives me the "line spread function" (LSF). I am trying to follow these directions that I found in an old paper on how to convert from the LSF to the PSF:
"If we form the one-dimensional Fourier transform of the LSF and rotate the resulting curve about its vertical axis, the surface thus generated proves to be the two-dimensional fourier transform of the PSF. Hence it is merely necessary to take a two-dimensional inverse Fourier transform to obtain the PSF"
I can't seem to get this to work. The 2D FFT of a PSF-like function (for example a 2d gaussian) has lots of alternative positive and negative values, but if I rotate a 1D FFT, I get concentric rings of positive or negative values and the inverse transform looks nothing like a point-spread function. Am I missing a step or misunderstanding something? Any help would be appreciated! Thanks!
Edit: Here is some code showing my attempt to follow the procedure described
;generate x array
x=findgen(1000)/999*50-25
;generate gaussian test function in 1D
;P[0] = peak value
;P[1] = centroid
;P[2] = sigma
;P[3] = base level
P=[1.0,0.0,4.0,0.0]
test1d=gaussian_1d(x,P)
;Take the FFT of the test function
fft1d=fft(test1d)
;create an array with the frequency values for the FFT array, following the conventions used by IDL
;This piece of code to find freq is straight from IDL documentation: http://www.exelisvis.com/docs/FFT.html
N=n_elements(fft1d)
T=x[1]-x[0] ;T = sampling interval
fftx=(findgen((N-1)/2)+1)
is_N_even=(N MOD 2) EQ 0
if (is_N_even) then $
freq=[0.0,fftx,N/2,-N/2+fftx]/(N*T) $
else $
freq=[0.0,fftx,-(N/2+1)+fftx]/(N*T)
;Create a 1000x1000 array where each element holds the distance from the center
dim=1000
center=[(dim-1)/2.0,(dim-1)/2.0]
xarray=cmreplicate(findgen(dim),dim)
yarray=transpose(cmreplicate(findgen(dim),dim))
rarray=sqrt((xarray-center[0])^2+(yarray-center[1])^2)
rarray=rarray/max(rarray)*max(freq) ;scale rarray so max value is equal to highest freq in 1D FFT
;rotate the 1d FFT about zero to get a 2d array by interpolating the 1D function to the frequency values in the 2d array
fft2d=rarray*0.0
fft2d(findgen(n_elements(rarray)))=interpol(fft1d,freq,rarray(findgen(n_elements(rarray))))
;Take the inverse fourier transform of the 2d array
psf=fft(fft2d,/inverse)
;shift the PSF to be centered in the image
psf=shift(psf,500,500)
window,0,xsize=1000,ysize=1000
tvscl,abs(psf) ;visualize the absolute value of the result from the inverse 2d FFT
I don't know IDL, but I think your problem here is that you're taking the FFT of signals that are centered, where by default the function expects 0-frequency components at the beginning of the array.
A quick search for the proper way to do this in IDL indicates the CENTER keyword is what you're looking for.
CENTER
Set this keyword to shift the zero-frequency component to the center of the spectrum. In the forward direction, the resulting Fourier transform has the zero-frequency component shifted to the center of the array. In the reverse direction, the input is assumed to be a centered Fourier transform, and the coefficients are shifted back before performing the inverse transform.
Without letting the FFT routine know where the center of your signal is, it will seem shifted by N/2. In the converse domain this is a strong phase shift that will appear as if values are alternating positive and negative.
Ok, looks like I have solved the problem. The main issue seems to be that I needed to use the absolute value of the FFT results, rather than the complex array that is returned by default. Using the /CENTER keyword also helped make the indexing of the FFT result much simpler than IDL's default. Here is the working version of the code:
;generate x array
x=findgen(1000)/999*50-25
;generate lorentzian test function in 1D
;P[0] = peak value
;P[1] = centroid
;P[2] = fwhm
;P[3] = base level
P=[1.0,0.0,2,0.0]
test1d=lorentzian_1d(x,P)
;Take the FFT of the test function
fft1d=abs(fft(test1d,/center))
;Create an array of frequencies corresponding to the FFT result
N=n_elements(fft1d)
T=x[1]-x[0] ;T = sampling interval
freq=findgen(N)/(N*T)-N/(2*N*T)
;Create an array where each element holds the distance from the center
dim=1000
center=[(dim-1)/2.0,(dim-1)/2.0]
xarray=cmreplicate(findgen(dim),dim)
yarray=transpose(cmreplicate(findgen(dim),dim))
rarray=sqrt((xarray-center[0])^2+(yarray-center[1])^2)
rarray=rarray/max(rarray)*max(freq) ;scale rarray so max value is equal to highest freq in 1D FFT
;rotate the 1d FFT about zero to get a 2d array by interpolating the 1D function to the frequency values in the 2d array
fft2d=rarray*0.0
fft2d(findgen(n_elements(rarray)))=interpol(fft1d,freq,rarray(findgen(n_elements(rarray))))
;Take the inverse fourier transform of the 2d array
psf=abs(fft(fft2d,/inverse,/center))
;shift the PSF to be centered in the image
psf=shift(psf,dim/2.0,dim/2.0)
psf=psf/max(psf)
window,0,xsize=1000,ysize=1000
tvscl,real_part(psf) ;visualize the resulting PSF
;Test the performance by integrating the PSF in one dimension to recover the LSF
psftotal=total(psf,1)
plot,x*sqrt(2),psftotal/max(psftotal),thick=2,linestyle=2
oplot,x,test1d

Optimal sigma for Gaussian filtering of an image?

When applying a Gaussian blur to an image, typically the sigma is a parameter (examples include Matlab and ImageJ).
How does one know what sigma should be? Is there a mathematical way to figure out an optimal sigma? In my case, i have some objects in images that are bright compared to the background, and I need to find them computationally. I am going to apply a Gaussian filter to make the center of these objects even brighter, which hopefully facilitates finding them. How can I determine the optimal sigma for this?
There's no formula to determine it for you; the optimal sigma will depend on image factors - primarily the resolution of the image and the size of your objects in it (in pixels).
Also, note that Gaussian filters aren't actually meant to brighten anything; you might want to look into contrast maximization techniques - sounds like something as simple as histogram stretching could work well for you.
edit: More explanation - sigma basically controls how "fat" your kernel function is going to be; higher sigma values blur over a wider radius. Since you're working with images, bigger sigma also forces you to use a larger kernel matrix to capture enough of the function's energy. For your specific case, you want your kernel to be big enough to cover most of the object (so that it's blurred enough), but not so large that it starts overlapping multiple neighboring objects at a time - so actually, object separation is also a factor along with size.
Since you mentioned MATLAB - you can take a look at various gaussian kernels with different parameters using the fspecial('gaussian', hsize, sigma) function, where hsize is the size of the kernel and sigma is, well, sigma. Try varying the parameters to see how it changes.
I use this convention as a rule of thumb. If k is the size of kernel than sigma=(k-1)/6 . This is because the length for 99 percentile of gaussian pdf is 6sigma.
You have to find a min/max of a function G such that G(X,sigma) where X is a set of your observations (in your case, your image grayscale values) , This function can be anything that maintain the "order" of the intensities of the iamge, for example, this can be done with the 1st derivative of the image (as G),
fil = fspecial('sobel');
im = imfilter(I,fil);
imagesc(im);
colormap = gray;
this gives you the result of first derivative of an image, now you want to find max sigma by
maximzing G(X,sigma), that means that you are trying a few sigmas (let say, in increasing order) until you reach a sigma that makes G maximal. This can also be done with second derivative.
Given the central value of the kernel equals 1 the dimension that guarantees to have the outermost value less than a limit (e.g 1/100) is as follows:
double limit = 1.0 / 100.0;
size = static_cast<int>(2 * std::ceil(sqrt(-2.0 * sigma * sigma * log(limit))));
if (size % 2 == 0)
{
size++;
}

Resources