What is the meaning of pixel 'value' when interpreting the inverse Fourier transform? - image-processing

I’m working on an image signal project in C++ using the DFT and IDFT.
I majored in physics and have lots of experience with the 1-D Fourier transform.
However, the 2-D DFT is really not intuitive to me.
I have studied a lot and now have a little understanding of what the 2-D DFT is.
THIS IS WHAT I REALLY WANT TO KNOW.
In 1-D, assume you have two functions with frequencies 30 and 60 (ignoring units).
Then I can have a sine function for each frequency, 30 and 60, in the spatial domain.
When I take the DFT of each sine function, I get a value at 30 or 60 in the frequency domain.
*** If I reduce the value at frequency f = 30, then I get a lower amplitude in the spatial domain; that is, in A·sin(2π·30·x) the coefficient A is reduced.
Alright then.
Now suppose I have a 100×100-pixel image and take its 2-D DFT.
Then I also have a 2-D frequency domain (magnitude only).
*** What happens to the pixels in the spatial domain when I reduce the value of a specific frequency?
Suppose we have two frequencies, (10, 10) and (20, 20), in the frequency domain (u, v).
This means the image in the spatial domain is composed of these two sinusoidal functions.
Just as in 1-D, when I reduce the value of a specific frequency, I think it should reduce the amplitude of the corresponding 2-D sinusoid, right?
Then how can I interpret a pixel?
*** How can I interpret a pixel in terms of sinusoidal functions?
This question arises because a colleague and I are working on a project
where we insert a specific frequency, e.g. (30, 30), into the frequency domain of an original image.
Afterwards, when I take the IDFT, I get the image I want.
But my colleague is trying to insert the frequency not in the frequency domain but in the spatial domain, by manipulating pixel values directly, which I can’t understand…
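To make the question concrete, here is a small NumPy sketch (hypothetical values, not the asker's actual project code): an image built from exactly the two sinusoids at (10, 10) and (20, 20), where halving the DFT coefficient at (10, 10) (and its conjugate partner, so the inverse stays real) halves that wave's amplitude at every pixel:

```python
import numpy as np

# Build a 100x100 image from two 2-D sinusoids at (u, v) = (10, 10) and (20, 20).
N = 100
y, x = np.mgrid[0:N, 0:N]
img = 3.0 * np.cos(2 * np.pi * (10 * x + 10 * y) / N) \
    + 1.5 * np.cos(2 * np.pi * (20 * x + 20 * y) / N)

F = np.fft.fft2(img)

# Halve the coefficient at (10, 10) -- and its conjugate partner at (-10, -10),
# so that the inverse transform stays real-valued.
F[10, 10] *= 0.5
F[-10, -10] *= 0.5

img_back = np.real(np.fft.ifft2(F))

# The (10, 10) wave's amplitude dropped from 3.0 to 1.5; the (20, 20) wave is
# untouched. Each pixel is simply the sum of the two waves at that (x, y).
expected = 1.5 * np.cos(2 * np.pi * (10 * x + 10 * y) / N) \
         + 1.5 * np.cos(2 * np.pi * (20 * x + 20 * y) / N)
print(np.allclose(img_back, expected))  # True
```

So a pixel value is just the point-wise sum of all the 2-D sinusoids evaluated at that position; scaling one frequency-domain value rescales one of those summands everywhere at once, which is why inserting a frequency "in the spatial domain" means adding the corresponding sinusoid to every pixel.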

Related

What is the derivative of FFT outputs with respect to time?

I am quite new to digital signal processing. I am trying to implement an anti-cogging algorithm in my PMSM control algorithm. I am following this [documentation].
I collected velocity data as a function of angle, and I translated the velocity data to the frequency domain with an FFT. But the last step, Acceleration-Based Waveform Analysis, calculates the derivative of the FFT outputs with respect to time. The outputs are in the frequency domain, so how can I calculate their derivative with respect to time, and why does the method do this calculation?
"Derivative of FFT outputs with respect to time" doesn't make any sense, so even though the notation used in the paper seems to say just that, I think we can understand it better by reading the text, which says "the accelerations are found by taking the time derivative of the FFT fitted speeds".
That makes perfect sense: The FFT of the velocity array is used to interpolate between samples, providing velocity as a continuous function of position. That continuous function is then differentiated at the appropriate position (j in the paper) to find the acceleration at every position i. I didn't read closely enough to find out how i and j are related.
In implementation, every FFT output for frequency f would be multiplied by i·2πf (that is, the frequency times sqrt(-1), not i the position index) to produce the FFT of the acceleration function, and then the FFT basis functions would be evaluated in their continuous form (using Math.sin and Math.cos) to produce an acceleration at any desired point.
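As a sketch of that step (assuming NumPy rather than the paper's own code, and a simple test tone in place of real velocity data): each FFT bin is multiplied by i·2πf, and the inverse transform gives the sampled derivative:

```python
import numpy as np

# Sample one period of a "velocity" signal whose analytic derivative is known.
n = 256
t = np.linspace(0, 1, n, endpoint=False)
v = np.sin(2 * np.pi * 3 * t)                        # velocity
a_true = 2 * np.pi * 3 * np.cos(2 * np.pi * 3 * t)   # d/dt of v

V = np.fft.fft(v)
freqs = np.fft.fftfreq(n, d=1.0 / n)                 # bin frequencies (cycles/unit time)
A = 1j * 2 * np.pi * freqs * V                       # multiply each bin by i*2*pi*f

a = np.real(np.fft.ifft(A))                          # sampled derivative
print(np.allclose(a, a_true))  # True
```

Evaluating the same weighted basis functions with continuous sin/cos instead of an inverse FFT gives the acceleration at arbitrary (non-sample) positions, which is what the paper's interpolation step needs.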

Why use the magnitude method to get the processed image?

Hi guys, I’ve been thinking about this question:
I know that we use the Fourier transform to get into the frequency domain to process an image.
I read the textbook; it said that when we are done processing the image in the Fourier domain, we have to invert it back to get the processed image.
And the textbook taught us to take the real part of the inverse.
However, when I go through the OpenCV tutorial, whether in the OpenCV or the NumPy version, they eventually use magnitude (for OpenCV) or np.abs (for NumPy).
For OpenCV, the inverse returns two channels, which contain the real and imaginary components. When I took the real part of the inverse, I got a totally weird image.
Could somebody explain the meaning behind all of this:
Why use magnitude or abs to get the processed image?
What’s wrong with the textbook instruction (take the real part of the inverse)?
The textbook is right, the tutorial is wrong.
A real-valued image has a complex conjugate symmetry in the Fourier domain. This means that the FFT of the image will have a specific symmetry. Any processing that you do must preserve this symmetry if you want the inverse transform to remain real-valued. If you do this processing wrong, then the inverse transform will be complex-valued, and probably non-sensical.
If you preserve the symmetry in the Fourier domain properly, then the imaginary component of the inverse transform will be nearly zero (likely different from zero because of numerical imprecision). Discarding this imaginary component is the correct thing to do. Computing the magnitude will yield the same result, except all negative values will become positive (note some filters are meant to produce negative values, such as derivative filters), and at an increased computational cost.
For example, a convolution is a multiplication in the Fourier domain. The filter in the Fourier domain must be real-valued and symmetric around the origin. Often people will confuse where the origin is in the Fourier domain, and multiply by a filter that seems symmetric but is actually shifted with respect to the origin, making it not symmetric. This shift introduces a phase change in the inverse transform (see the shift property of the Fourier transform). The magnitude of the inverse transform is not affected by the phase change, so taking the magnitude of this inverse transform yields an output that sort of looks OK, except when one expects to see negative values in the filter result. It would have been better to correctly understand the FFT algorithm, create a properly symmetric filter in the Fourier domain, and simply keep the real part of the inverse transform.
Nonetheless, some filters are specifically designed to break the symmetry and yield a complex-valued filter output. For example, the Gabor filter has an even (symmetric) component and an odd (anti-symmetric) component. The even component yields a real-valued output, the odd component an imaginary-valued output. In this case, it is the magnitude of the complex value that is of interest. Likewise, a quadrature filter is specifically meant to produce a complex-valued output, the analytic signal (or its multi-dimensional extension, the monogenic signal). Here both the magnitude and the phase are of interest, for example as used in the phase congruency method of edge detection.
Looking at the linked tutorial, it is the line
fshift[crow-30:crow+30, ccol-30:ccol+30] = 0
which generates the Fourier-domain filter and applies it to the image (it is equivalent to multiplying by a filter with 1s and 0s). This tutorial correctly computes the origin of the Fourier domain (though for Python 3 you would use crow,ccol = rows//2 , cols//2 to get the integer division). But the filter above is not symmetric around that origin. In Python, crow-30:crow+30 indicates 30 pixels to the left of the origin, and only 29 pixels to the right (the right bound is not included!). The correct filter would be:
fshift[crow-30:crow+30+1, ccol-30:ccol+30+1] = 0
With this filter, the inverse transform is purely real (imaginary component has values in the order of 1e-13, which is numerical errors). Thus, it is now possible (and correct) to replace img_back = np.abs(img_back) with img_back = np.real(img_back).
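A quick NumPy check of this point (using a hypothetical random image, not the tutorial's): with the off-by-one corrected so the mask is symmetric about the origin, the imaginary part of the inverse is pure numerical noise:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((101, 101))          # any real-valued image (odd size for clarity)
rows, cols = img.shape
crow, ccol = rows // 2, cols // 2     # origin of the shifted Fourier domain

F = np.fft.fftshift(np.fft.fft2(img))

# Symmetric high-pass mask: zero a (2*30+1)-pixel square centred on the origin,
# i.e. 30 bins on BOTH sides of DC, preserving conjugate symmetry.
F[crow - 30:crow + 30 + 1, ccol - 30:ccol + 30 + 1] = 0

img_back = np.fft.ifft2(np.fft.ifftshift(F))

# The imaginary component is numerical round-off, so np.real() is the right
# way to finish -- and it preserves legitimate negative filter outputs.
print(np.max(np.abs(img_back.imag)) < 1e-10)  # True
```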

Sinusoids with frequencies that are random variables - What does the FFT impulse look like?

I'm currently working on a program in C++ in which I am computing the time varying FFT of a wav file. I have a question regarding plotting the results of an FFT.
Say, for example, I have a 70 Hz signal that is produced by some instrument with certain harmonics. Even though I say this signal is 70 Hz, it's a real signal, and I assume there will be some randomness with which that 70 Hz signal varies. Say I sample it for 1 second at a sample rate of 20 kHz. I realize the sample period probably doesn't need to be 1 second, but bear with me.
Because I now have 20000 samples, when I compute the FFT I will have 20000 (or 19999) frequency bins. Let's also assume that my sample rate, in conjunction with some windowing technique, minimizes spectral leakage.
My question then: will the FFT still produce a relatively ideal impulse at 70 Hz? Or will there appear to be spectral leakage caused by the randomness in the original signal? In other words, what does the FFT of a sinusoid whose frequency is a random variable look like?
Some of the more common modulation schemes will add sidebands that carry the information in the modulation. Depending on the amount and type of modulation with respect to the length of the FFT, the sidebands can either appear separate from the FFT peak, or just "fatten" a single peak.
Your spectrum will appear broadened, and this happens in the real world. Look e.g. at the Voigt profile, which is a Lorentzian (the result of an ideal exponential decay) convolved with a Gaussian of a certain width, the width being determined by stochastic fluctuations, e.g. the Doppler effect on molecules in a gas that is being probed by a narrow-band laser.
You will not get an 'ideal' frequency peak either way. The limit for the resolution of the FFT is one frequency bin (the frequency resolution being given by the inverse of the time-vector length), but even that (as #xvan pointed out) is in general broadened by the window function. If your window is nonexistent, i.e. it is in fact a square window of the length of the time vector, then you'll get spectral peaks that are convolved with a sinc function, and thus broadened.
The best way to visualize this is to make a long vector and plot a spectrogram (often shown for audio signals) with enough resolution that you can see the individual variation. The FFT of the overall signal is then the projection of the moving peaks onto the vertical axis of the spectrogram. The FFT of a given time vector does not have any time resolution, but sums up all frequencies that occur during the time you FFT. So the spectrogram (often people simply use the STFT, short-time Fourier transform) has at any given time the 'full' resolution, i.e. the narrow lineshape that you expect. The FFT of the full time vector shows the algebraic sum of all your lineshapes and therefore appears broadened.
To sum it up there are two separate effects:
a) broadening from the window function (as the commenters 1 and 2 pointed out)
b) broadening from the effect of frequency fluctuation that you are trying to simulate and that happens in real life (e.g. you sitting on a swing while receiving a radio signal).
Finally, note the significance of #xvan's comment: phi = phi(t). If the phase angle is time dependent, then it has a derivative that is not zero. (1/2π)·dphi/dt is a frequency shift, so your instantaneous frequency becomes f0 + (1/2π)·dphi/dt.
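The "fattened" peak can be seen directly in a minimal NumPy simulation (assumed parameters: random-walk phase noise on a 70 Hz tone, sampled at 20 kHz for 1 s, so the bins are 1 Hz apart): the steady tone puts all its energy into one bin, while the jittery tone spreads it over its neighbours:

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 20000                  # sample rate (Hz)
n = fs                      # 1 second of data -> 1 Hz bin spacing
t = np.arange(n) / fs

f0 = 70.0
# Random-walk phase drift: instantaneous frequency is f0 + (1/2pi) dphi/dt.
phase_noise = np.cumsum(rng.normal(0.0, 0.02, n))
steady = np.sin(2 * np.pi * f0 * t)
jittery = np.sin(2 * np.pi * f0 * t + phase_noise)

spec_s = np.abs(np.fft.rfft(steady))
spec_j = np.abs(np.fft.rfft(jittery))

# Bin 70 is exactly 70 Hz. The steady tone concentrates its energy there;
# the jittery tone's peak is lower and the adjacent bins are raised.
print(spec_s[70], spec_j[70])
```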

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

I have some sampled geographical trajectories to analyze, and I calculated a histogram of the data in the spatial and temporal dimensions, which yielded a time-domain feature for each spatial element. I want to perform a discrete FFT to transform the time-domain feature into a frequency-domain feature (which I think may be more robust), and then apply some classification or clustering algorithm.
But I'm not sure which descriptor to use as the frequency-domain feature, since there are the amplitude spectrum, power spectrum and phase spectrum of a signal; I've read some references but am still confused about their significance. And what distance (similarity) function should be used when running learning algorithms on the frequency-domain feature vectors (Euclidean distance? Cosine distance? Gaussian function? Chi-squared kernel or something else?)
I hope someone can give me a clue or some material that I can refer to, thanks~
Edit
Thanks to #DrKoch, I chose a spatial element with the largest L-1 norm and plotted its log power spectrum in Python, and it did show some prominent peaks; below are my code and the figure.
import numpy as np
import matplotlib.pyplot as plt
sp = np.fft.fft(signal)
freq = np.fft.fftfreq(signal.shape[-1], d = 1.) # time slot of histogram is 1 hour
plt.plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
And I have several trivial questions to ask to make sure I totally understand your suggestion:
In your second suggestion, you said "ignore all these values."
Do you mean the horizontal line represents the threshold and all values below it should be set to zero?
"you may search for the two, three largest peaks and use their location and probably widths as 'Features' for further classification."
I'm a little bit confused about the meaning of "location" and "width": does "location" refer to the log value of the power spectrum (y-axis) and "width" to the frequency (x-axis)? If so, how do I combine them into a feature vector, and how do I compare two feature vectors with "a similar frequency and similar widths"?
Edit
I replaced np.fft.fft with np.fft.rfft to calculate the positive part, and plotted both the power spectrum and the log power spectrum.
code:
f, axarr = plt.subplots(2, sharex = True)
axarr[0].plot(freq, np.abs(sp) ** 2)
axarr[1].plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
figure:
Please correct me if I'm wrong:
I think I should keep the last four peaks in the first figure, using power = np.abs(sp) ** 2 and power[power < threshold] = 0, because the log power spectrum reduces the differences between components. And then use the log spectrum of the new power as the feature vector to feed classifiers.
I also saw some references suggest applying a window function (e.g. a Hamming window) before doing the FFT to avoid spectral leakage. My raw data is sampled every 5 ~ 15 seconds and I've applied a histogram on the sampling times; is that method equivalent to applying a window function, or do I still need to apply one to the histogram data?
Generally you should extract just a small number of "features" from the complete FFT spectrum.
First: use the log power spectrum.
Complex numbers and phase are useless in these circumstances, because they depend on where you start/stop your data acquisition (among many other things).
Second: you will see a "noise level", i.e. most values are below a certain threshold; ignore all these values.
Third: if you are lucky, i.e. your data has some harmonic content (cycles, repetitions), you will see a few prominent peaks.
If there are clear peaks, it is even easier to detect the noise: everything between the peaks should be considered noise.
Now you may search for the two or three largest peaks and use their locations, and possibly their widths, as "features" for further classification.
Location is the x-value of the peak, i.e. the 'frequency'. It says something about how "fast" your cycles are in the input data.
If your cycles don't have a constant frequency during the measuring interval (or you use a window before calculating the FFT), the peak will be broader than one bin. So the width of the peak says something about the 'stability' of your cycles.
Based on this: two patterns are similar if the biggest peaks of both have a similar frequency and similar widths, and so on.
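As an illustration of "location" as a feature (a hypothetical helper, not code from the answer): take the power spectrum, zero everything below a crude noise floor, and return the bins of the k largest remaining peaks:

```python
import numpy as np

def top_peaks(signal, k=3):
    """Return the bin indices (relative frequencies) of the k largest
    spectral peaks -- the 'location' features suggested above."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    # Crude noise floor: treat everything below the median as noise.
    power[power < np.median(power)] = 0.0
    # Bins of the k largest remaining values, in ascending frequency order.
    return np.sort(np.argsort(power)[-k:])

# A toy "measurement": harmonics at f0, 2*f0, 3*f0 with f0 = 8 cycles.
n = 256
t = np.arange(n)
sig = (1.0 * np.sin(2 * np.pi * 8 * t / n)
       + 0.5 * np.sin(2 * np.pi * 16 * t / n)
       + 0.25 * np.sin(2 * np.pi * 24 * t / n))
print(top_peaks(sig))  # bins 8, 16 and 24
```

Two measurements can then be compared by the distance between their peak-location (and peak-width) vectors rather than between full spectra.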
EDIT
Very interesting to see a logarithmic power spectrum of one of your examples.
Now it's clear that your input contains a single harmonic (periodic, oscillating) component with a frequency (repetition rate, cycle duration) of about f0 = 0.04.
(This is a relative frequency, proportional to your sampling frequency, the inverse of the time between individual measurement points.)
It is not a pure sine wave, but some "interesting" waveform. Such waveforms produce peaks at 1*f0, 2*f0, 3*f0 and so on.
(So using an FFT for further analysis turns out to be a very good idea.)
At this point you should produce spectra of several measurements and see what makes measurements similar and how different measurements differ. What are the "important" features that distinguish your measurements? Things to look out for:
Absolute amplitude: height of the prominent (leftmost, highest) peaks.
Pitch (main cycle rate, speed of changes): the position of the first peak, and the distance between consecutive peaks.
Exact waveform: relative amplitudes of the first few peaks.
If your most important feature is absolute amplitude, you're better off calculating the RMS (root mean square) level of your input signal.
If pitch is important, you're better off calculating the ACF (auto-correlation function) of your input signal.
Don't focus on the rightmost peaks; these come from the high-frequency components in your input and tend to vary as much as the noise floor.
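A small sketch of the ACF idea (with an assumed square-wave input, not your data): the lag of the first off-zero autocorrelation maximum gives the pitch period directly, independent of the waveform's harmonic content:

```python
import numpy as np

n, period = 1024, 64
t = np.arange(n)
# Square wave with period 64 samples (offset avoids exact zero crossings).
x = np.sign(np.sin(2 * np.pi * (t + 0.5) / period))

acf = np.correlate(x, x, mode="full")[n - 1:]   # lags 0 .. n-1
# Search away from lag 0, which is always the global maximum.
lag = int(np.argmax(acf[period // 2: 2 * period])) + period // 2
print(lag)  # 64, the pitch period in samples
```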
Windows
For a high-quality analysis it is important to apply a window to the input data before applying the FFT. This reduces the influence of the "jump" between the end of your input vector and the beginning of your input vector, because the FFT considers the input as a single cycle.
There are several popular windows which mark different choices of an unavoidable trade-off: Precision of a single peak vs. level of sidelobes:
You chose a "rectangular window" (equivalent to no window at all; just start/stop your measurement). This gives excellent precision for your peaks, which now have a width of just one bin. Your sidelobes (the small peaks left and right of your main peaks) are at about -13 dB, tolerable given your input data. In your case this is an excellent choice.
A Hanning window is a single raised-cosine wave. It makes your peaks slightly broader but reduces sidelobe levels.
The Hamming window (a cosine wave slightly raised above 0.0) produces even broader peaks, but suppresses sidelobes to -42 dB. This is a good choice if you expect further weak (but important) components between your main peaks, or generally if you have complicated signals like speech, music and so on.
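The trade-off can be measured in a few lines (assumed setup: a 256-point tone at 20.5 cycles, i.e. exactly between two bins, the worst case for leakage): compare how much spectral energy each window leaks outside the main lobe:

```python
import numpy as np

n = 256
t = np.arange(n)
tone = np.sin(2 * np.pi * 20.5 * t / n)   # 20.5 cycles: not bin-centred

def leakage(window):
    """Fraction of spectral energy outside the main lobe (+/- 3 bins)."""
    spec = np.abs(np.fft.rfft(tone * window)) ** 2
    k = int(spec.argmax())
    main = spec[k - 3:k + 4].sum()
    return 1.0 - main / spec.sum()

rect, hann, hamm = np.ones(n), np.hanning(n), np.hamming(n)
for name, w in [("rectangular", rect), ("hanning", hann), ("hamming", hamm)]:
    print(f"{name:12s} leakage fraction: {leakage(w):.2e}")
```

The rectangular window leaks by far the most in this off-bin case, which is why windowing matters whenever cycle frequencies do not line up exactly with FFT bins.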
Edit: Scaling
Correct scaling of a spectrum is a complicated thing, because the values of the FFT lines depend on many things, like sampling rate, length of the FFT, the window, and even implementation details of the FFT algorithm (several different accepted conventions exist).
After all, the FFT should show the underlying conservation of energy. The RMS of the input signal should be the same as the RMS (Energy) of the spectrum.
On the other hand: if used for classification, it is enough to maintain relative amplitudes. As long as the parameters mentioned above do not change, the result can be used for classification without further scaling.
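The conservation-of-energy point can be checked in a few lines (assuming NumPy's unnormalised FFT convention, where Parseval's relation reads sum |x|² = sum |X|² / N):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)

X = np.fft.fft(x)

# Parseval for NumPy's convention: the RMS level of the input signal can be
# recovered exactly from the spectrum, regardless of the scaling chosen.
rms_time = np.sqrt(np.mean(x ** 2))
rms_freq = np.sqrt(np.sum(np.abs(X) ** 2) / len(x) ** 2)
print(np.isclose(rms_time, rms_freq))  # True
```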

FFT for n Points (non power of 2 )

I need to know a way to make FFT (DFT) work with just n points, where n is not a power of 2.
I want to analyze and modify the sound spectrum, in particular of wave files, which commonly have 44100 sampling points. But my FFT does not work: it only works with lengths of the form 2^n.
So what can I do, besides padding the vector with zeros up to the next power of 2?
Any way to modify the FFT algorithm?
Thanks!
You can use the FFTW library or the code generators of the Spiral project. They implement the FFT for lengths with small prime factors, and break down a large prime factor p by reducing it to an FFT of size (p-1), which is even, etc.
However, just for signal analysis it is questionable why you want to analyze exactly one second of sound and not smaller units. Also, you may want to use a windowing procedure to avoid the jumps at the ends of the segment.
Aside from padding the array as you suggest, or using some other library function, you can construct a Fourier transform with arbitrary length and spacing in the frequency domain (also for non-integer sample spacings).
This is a well-known result and is based on the chirp-z transform (or Bluestein's FFT). Another good reference is by Rabiner and can be found at the above link.
In summary, with this approach you don't have to write the FFT yourself, you can simply use an existing high-performance FFT and then apply the convolution theorem to a suitably scaled and conditioned version of your signal.
The performance will still be O(n·log n), multiplied by some implementation-dependent scaling factor.
The FFT is just a faster method of computing the DFT for certain length vectors; and a DFT can be computed for any length of input vector. You can also zero-pad your input vector to a length supported by your FFT library, which may be faster.
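For what it's worth, modern FFT libraries already handle arbitrary lengths directly; for instance, NumPy's FFT (used here as an assumed stand-in for the asker's own C++ routine) accepts a 44100-point input as-is:

```python
import numpy as np

# 44100 is 2^2 * 3^2 * 5^2 * 7^2 -- all small prime factors, so a mixed-radix
# FFT handles it directly; no zero-padding to 65536 is required.
n = 44100                          # one second at CD sample rate
t = np.arange(n) / n
x = np.sin(2 * np.pi * 441 * t)    # toy tone: 441 cycles, an exact bin

X = np.fft.rfft(x)
peak = int(np.abs(X).argmax())
print(peak)  # 441
```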
If you want to modify your sound file, you may need to use the overlap-add or overlap-save fast convolution filtering after determining the length of the impulse response of your frequency domain modification.
