Normalization Constant for Power Spectral Density - signal-processing

To my understanding, the Power Spectral Density (PSD) should stay roughly constant with the total time sampled (i.e. the number of points N sampled); however, I am having trouble obtaining this result.
As I know from the Discrete Fourier Transform (DFT), the amplitude normalization is 1/N (e.g. Amplitude Spectrum = DFT/N). However, various sources define the PSD as (DFT * DFT-conjugate / N).
How can this be? If the Amplitude Spectrum carries a 1/N normalization constant, shouldn't the PSD carry a 1/N^2 normalization constant (since the DFT is proportional to N, and so is its conjugate)?
More specifically, I am trying to calculate the PSD of a continuous electric-field wave using Eq. 9 of this paper. However, I can't make sense of its constants in front of the DFT, since the factors of N cancel out, leaving behind only the sum of the window function squared. I tested this result and found that the PSD does not stay constant with the sampling size.
In summary, I am having trouble because my PSD varies with the total time of the signal sampled. Any help would be great, thanks!

I've found that the PSD of a time series does increase linearly with the number of points sampled, N; however, an appropriately fitted function (or some sort of averaging) allows the PSD to remain constant with N. One would then take the PSD at a point on this fitted function.
This is a direct result of conserving the area under the curve, i.e. Plancherel's theorem.
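For what it's worth, here is a minimal sketch of a periodogram with the conventional 1/(fs * sum(w^2)) normalization; the helper name and the white-noise test signal are my own choices, not from the question. Dividing by fs * sum(w^2) (which equals fs * N for a rectangular window) supplies the factor of N that seems missing from the 1/N^2 amplitude-spectrum intuition; the other factor of N is absorbed by the bin width fs/N when the PSD is integrated over frequency.

import numpy as np

def periodogram_psd(x, fs, window=None):
    # one-sided periodogram, units of power per Hz
    N = len(x)
    w = np.ones(N) if window is None else window
    X = np.fft.rfft(x * w)
    psd = np.abs(X) ** 2 / (fs * np.sum(w ** 2))
    psd[1:-1] *= 2                       # fold the negative frequencies in
    return np.fft.rfftfreq(N, 1.0 / fs), psd

# unit-variance white noise: the integrated PSD should stay near 1
# no matter how long we sample
fs = 1000.0
for N in (1024, 4096, 16384):
    f, p = periodogram_psd(np.random.randn(N), fs)
    print(N, p.sum() * (f[1] - f[0]))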

Related

What is the meaning of a pixel 'value' when interpreting it as an inverse Fourier transform?

I'm working on an image signal project in C++ using the DFT and IDFT.
I majored in physics and have lots of experience dealing with the 1D Fourier transform.
HOWEVER, the 2D DFT is really not intuitive.
I studied a lot and now have a little understanding of what the 2D DFT is.
THIS IS WHAT I REALLY WANT TO KNOW.
In 1D, assume you have two components with frequencies 30 and 60 (ignore units).
Then I can have a sum of sine functions with frequencies 30 and 60 (spatial domain).
When I take the DFT of each sine function, I get values at 30 and 60 in the frequency domain.
*** If I reduce the value at a frequency (f = 30), then I get a lower amplitude in the spatial domain; that is, in A*sin(2*pi*30*x) the coefficient A is reduced.
alright then.
Now suppose I have an image of 100x100 pixels and take the 2D DFT.
Then I also have a 2D frequency domain (magnitude only).
*** What happens to the pixels in the spatial domain when I reduce the value of a specific frequency?
Suppose we have two frequencies, (10,10) and (20,20), in the frequency domain (u,v).
This means the image in the spatial domain is composed of these two sinusoidal components.
Just as in 1D, when I reduce the value at a specific frequency, I think it should reduce the amplitude of the corresponding 2D sinusoid, right?
Then how can I interpret a pixel?
*** How can I interpret a pixel value in terms of the sinusoidal components?
This question arises because a colleague and I are working on a project
where we insert a specific frequency like (30,30) into the frequency domain of an original image.
Then, after the IDFT, I get the image I want.
But my colleague is trying to insert the frequency not in the frequency domain but in the spatial domain, dealing with pixel values directly, which I can't understand…
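The two routes are in fact the same operation. Here is a minimal numpy sketch (the bin indices, amplitude and image size are invented for illustration): writing a conjugate-symmetric pair of bins in the frequency domain and inverse-transforming gives exactly the image you get by adding a 2D cosine to the pixels, which may be what your colleague is doing.

import numpy as np

N = 100
yy, xx = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
u0, v0, A = 30, 30, 50.0

# route 1: place the component in the frequency domain, then IDFT
F = np.zeros((N, N), dtype=complex)
F[u0, v0] = A * N * N / 2      # the chosen bin...
F[-u0, -v0] = A * N * N / 2    # ...and its conjugate partner, keeping the image real
img_from_freq = np.fft.ifft2(F).real

# route 2: add the same component directly to the pixels
img_from_pixels = A * np.cos(2 * np.pi * (u0 * yy + v0 * xx) / N)

print(np.allclose(img_from_freq, img_from_pixels))   # True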

Fast Fourier Transform with negative amplitude

I'm using WaveSurfer to analyse bird-song data and I am getting negative amplitudes in the FFT analysis; I cannot find the reason why.
I have been using WaveSurfer to analyse my paper's data on bird song. I open my data with the spectrogram, and in the spectrum section it is possible to read frequency and amplitude (using FFT analysis). My amplitude is negative, but I cannot find a way to justify that. I don't know why that is, and I have significant results in my data, meaning that everything needs to be justified. The software's forum does not work, and there is literally no answer to my question on the internet. I have even emailed the creators asking for help. A screenshot of the window is attached.
dB are on a logarithmic scale. The log of a small enough positive FFT magnitude can be negative.
e.g. 20*log10(0.1) = -20
We express values in dB in order to say something like: it's n times as large. A dB quantity is the log10 of a ratio, it is used to make a comparison between two measurements, for example output vs. input. Sometimes there is a single measurement compared with a fixed value. In that case we do not use the dB unit but some variants, e.g. if the reference is 1mW, we use dBm. The logarithmic scale facilitates calculations of attenuation and gain along a processing chain. A negative value indicates the ratio is less than 1, and the signal is attenuated (or less than the other), conversely a positive value indicates the signal is amplified (or greater than the other).
A dB (deci+Bel) is 0.1 B, so -60.1dB is -6.01B.
Power ratio
Originally the dB scale is related to power. Assuming the value is a power, you have to find the number whose log10 is -6.01, a matter of raising 10 to both sides:
log(x) = -6.01 (after conversion to B)
10^(log(x)) = 10^-6.01
x = 9.7e-07 ≈ 1/1,000,000
So your measurement is a power 1e6 times smaller than another power. Which one? It's not told, but it's likely the spectral line with the maximum value.
Voltage ratio
When dB relate to a ratio of voltages, the number is doubled for the same ratio. The reason is:
It is assumed power is increased (or decreased) by managing to increase (decrease) some voltage. A change in the same proportion occurs for the current according to Ohm's law.
Power is voltage x current, therefore to change the power in the ratio k, it's sufficient to change the voltage in the ratio sqrt(k). Log (sqrt(k)) = 1/2*log(k).
E.g.
Power increase from 10W to 20W, a ratio of 2, is log(2) = 0.3B or 3dB.
Voltage increase from 5V to 10V: also a ratio of 2, but 0.3B x 2, or 6dB.
Power quantities vs. root-power quantities
The rule actually separates power quantities (like power, energy or acoustic intensity) from root-power quantities (like voltage, sound pressure and electric field).
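As a quick sanity check on the two rules, a sketch (the helper names are mine):

import numpy as np

def db_power(ratio):        # power quantities: W, J, acoustic intensity
    return 10 * np.log10(ratio)

def db_root_power(ratio):   # root-power quantities: V, Pa, E-field
    return 20 * np.log10(ratio)

print(db_power(20 / 10))      # 3.01 dB for the 10 W -> 20 W example
print(db_root_power(10 / 5))  # 6.02 dB for the 5 V -> 10 V example
print(db_power(9.77e-7))      # about -60.1 dB, the reading discussed above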

Sinusoids with frequencies that are random variables - What does the FFT impulse look like?

I'm currently working on a program in C++ in which I am computing the time varying FFT of a wav file. I have a question regarding plotting the results of an FFT.
Say for example I have a 70 Hz signal that is produced by some instrument with certain harmonics. Even though I say this signal is 70 Hz, it's a real signal, and I assume there will be some randomness in how that 70 Hz varies. Say I sample it for 1 second at a sample rate of 20 kHz. I realize the sample period probably doesn't need to be 1 second, but bear with me.
Because I now have 20000 samples, when I compute the FFT I will have 20000 (or 19999) frequency bins. Let's also assume that my sample rate, in conjunction with some windowing technique, minimizes spectral leakage.
My question then: will the FFT still produce a relatively ideal impulse at 70 Hz? Or will there 'appear to be' spectral leakage caused by the randomness of the original signal? In other words, what does the FFT of a sinusoid look like when its frequency is a random variable?
Some of the more common modulation schemes will add sidebands that carry the information in the modulation. Depending on the amount and type of modulation with respect to the length of the FFT, the sidebands can either appear separate from the FFT peak, or just "fatten" a single peak.
Your spectrum will appear broadened, and this happens in the real world. Look e.g. at the Voigt profile, which is a Lorentzian (the result of an ideal exponential decay) convolved with a Gaussian of a certain width, the width being determined by stochastic fluctuations, e.g. the Doppler effect on molecules in a gas that is being probed by a narrow-band laser.
You will not get an 'ideal' frequency peak either way. The resolution limit of the FFT is one frequency bin (the frequency resolution being given by the inverse of the length of the time vector), but even that (as #xvan pointed out) is in general broadened by the window function. If your window is 'nonexistent', i.e. it is in fact a rectangular window the length of the time vector, then you'll get spectral peaks that are convolved with a sinc function, and thus broadened.
The best way to visualize this is to make a long vector and plot a spectrogram (often shown for audio signals) with enough resolution that you can see the individual variation. The FFT of the overall signal is then the projection of the moving peaks onto the vertical axis of the spectrogram. The FFT of a given time vector has no time resolution; it sums up all frequencies that occur during the time you transform. So the spectrogram (people often simply use the STFT, the short-time Fourier transform) has the 'full' resolution at any given time, i.e. the narrow lineshape that you expect, while the FFT of the full time vector shows the sum of all your lineshapes and therefore appears broadened.
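A sketch of that picture with scipy; the wander rate and depth of the tone are invented for illustration:

import numpy as np
from scipy.signal import spectrogram

fs = 20000
t = np.arange(10 * fs) / fs                    # 10 s record
f_inst = 70 + 5 * np.sin(2 * np.pi * 0.5 * t)  # tone wandering between 65 and 75 Hz
x = np.sin(2 * np.pi * np.cumsum(f_inst) / fs)

f, tt, Sxx = spectrogram(x, fs, nperseg=8192)
# each column of Sxx is a short-time spectrum with a narrow peak; the FFT
# of the whole record is roughly these columns summed, hence broadened
print(f[np.argmax(Sxx, axis=0)])               # the peak frequency drifting in time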
To sum it up there are two separate effects:
a) broadening from the window function (as the commenters 1 and 2 pointed out)
b) broadening from the effect of frequency fluctuation that you are trying to simulate and that happens in real life (e.g. you sitting on a swing while receiving a radio signal).
Finally, note the significance of #xvan's comment: phi = phi(t). If the phase angle is time dependent, then it has a derivative that is not zero, and dphi/dt acts as a frequency shift: your instantaneous frequency becomes f0 + (1/2pi)*dphi/dt (the 1/2pi converting radians per second to Hz).
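To see the broadening directly, here is a small simulation matching the 1 s / 20 kHz setup of the question; the random-walk jitter model is an assumption of mine, not something from the post:

import numpy as np

fs = 20000
t = np.arange(fs) / fs                                   # 1 s at 20 kHz

# instantaneous frequency: 70 Hz plus a slow random walk
f_inst = 70 + 0.01 * np.cumsum(np.random.randn(t.size))
jittered = np.sin(2 * np.pi * np.cumsum(f_inst) / fs)
fixed = np.sin(2 * np.pi * 70 * t)

for name, x in (("fixed", fixed), ("jittered", jittered)):
    X = np.abs(np.fft.rfft(x * np.hanning(x.size)))
    print(name, np.sum(X > X.max() / 2), "bins above half maximum")

The jittered tone occupies visibly more bins above half maximum than the fixed one, i.e. the peak is 'fattened' exactly as described.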

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

I have some sampled geographical trajectories to analyze, and I calculated the histogram of the data in the spatial and temporal dimensions, which yielded a time-domain feature for each spatial element. I want to perform a discrete FFT to transform the time-domain feature into a frequency-domain feature (which I think may be more robust), and then apply some classification or clustering algorithms.
But I'm not sure which descriptor to use as the frequency-domain feature, since a signal has an amplitude spectrum, a power spectrum and a phase spectrum; I've read some references but am still confused about their significance. And what distance (similarity) function should be used as the measure when running learning algorithms on the frequency-domain feature vectors (Euclidean distance? Cosine distance? Gaussian function? Chi-square kernel or something else?)
I hope someone can give me a clue or some material that I can refer to, thanks~
Edit
Thanks to #DrKoch, I chose the spatial element with the largest L1 norm and plotted its log power spectrum in Python, and it did show some prominent peaks. Below are my code and the figure.
import numpy as np
import matplotlib.pyplot as plt
sp = np.fft.fft(signal)
freq = np.fft.fftfreq(signal.shape[-1], d = 1.) # time slot of histogram is 1 hour
plt.plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
And I have several trivial questions to ask to make sure I totally understand your suggestion:
In your second suggestion, you said "ignore all these values."
Do you mean the horizontal line represents the threshold, and all values below it should be set to zero?
"you may search for the two, three largest peaks and use their location and probably widths as 'Features' for further classification."
I'm a little bit confused about the meaning of "location" and "width": does "location" refer to the log power value (y-axis) and "width" to the frequency (x-axis)? If so, how do I combine them into a feature vector, and how do I compare two feature vectors that have "a similar frequency and a similar width"?
Edit
I replaced np.fft.fft with np.fft.rfft to calculate only the positive half of the spectrum, and plotted both the power spectrum and the log power spectrum.
code:
freq = np.fft.rfftfreq(signal.shape[-1], d = 1.) # match the length of the rfft output
f, axarr = plt.subplots(2, sharex = True) # subplots, not subplot
axarr[0].plot(freq, np.abs(sp) ** 2)
axarr[1].plot(freq, np.log10(np.abs(sp) ** 2))
plt.show()
figure: (plot omitted)
Please correct me if I'm wrong:
I think I should keep the last four peaks of the first figure, using power = np.abs(sp) ** 2 and power[power < threshold] = 0, because the log power spectrum reduces the differences between components. Then I would use the log spectrum of the thresholded power as the feature vector to feed the classifiers.
I also see some references suggest applying a window function (e.g. a Hamming window) before the FFT to avoid spectral leakage. My raw data is sampled every 5 ~ 15 seconds, and I've applied a histogram over the sampling times; is that method equivalent to applying a window function, or do I still need to apply one to the histogram data?
Generally you should extract just a small number of "Features" out of the complete FFT spectrum.
First: Use the log power spec.
Complex values and phase are useless in these circumstances, because they depend on where you start/stop your data acquisition (among many other things).
Second: you will see a "noise level", i.e. most values are below a certain threshold; ignore all these values.
Third: if you are lucky, i.e. your data has some harmonic content (cycles, repetitions), you will see a few prominent peaks.
If there are clear peaks, it is even easier to detect the noise: Everything between the peaks should be considered noise.
Now you may search for the two or three largest peaks and use their locations, and probably their widths, as "features" for further classification.
Location is the x-value of the peak, i.e. the "frequency". It says something about how "fast" your cycles are in the input data.
If your cycles don't have a constant frequency during the measuring interval (or you use a window before calculating the FFT), the peak will be broader than one bin. So the width of the peak says something about the "stability" of your cycles.
Based on this: two patterns are similar if the biggest peaks of both have a similar frequency and a similar width, and so on.
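A sketch of this peak picking with scipy; the synthetic test signal, the one-decade margin over the median noise level, and the half-maximum width definition are all my own choices:

import numpy as np
from scipy.signal import find_peaks, peak_widths

t = np.arange(1024)
signal = (np.sin(2 * np.pi * 0.04 * t)
          + 0.5 * np.sin(2 * np.pi * 0.08 * t)
          + 0.1 * np.random.randn(t.size))

log_psd = np.log10(np.abs(np.fft.rfft(signal)) ** 2)
noise_level = np.median(log_psd)                # a simple threshold estimate

peaks, props = find_peaks(log_psd, height=noise_level + 1.0)
widths = peak_widths(log_psd, peaks, rel_height=0.5)[0]

# keep the largest peaks; each (location, width) pair is one feature
order = np.argsort(props["peak_heights"])[::-1][:3]
print(np.column_stack([peaks[order], widths[order]]))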
EDIT
Very interesting to see a logarithmic power spectrum of one of your examples.
Now it's clear that your input contains a single harmonic (periodic, oscillating) component with a frequency (repetition rate, cycle duration) of about f0 = 0.04.
(This is a relative frequency, proportional to your sampling frequency, the inverse of the time between individual measurement points.)
It is not a pure sine wave, but some "interesting" waveform. Such waveforms produce peaks at 1*f0, 2*f0, 3*f0 and so on.
(So using an FFT for further analysis turns out to be a very good idea.)
At this point you should produce spectra of several measurements and see what makes measurements similar and how different measurements differ. What are the "important" features that distinguish your measurements? Things to look out for:
Absolute amplitude: Height of the prominent (leftmost, highest) peaks.
Pitch (main cycle rate, speed of changes): this is the position of the first peak, or the distance between consecutive peaks.
Exact Waveform: Relative amplitude of the first few peaks.
If your most important feature is absolute amplitude, you're better off calculating the RMS (root mean square) level of your input signal.
If pitch is important, you're better off calculating the ACF (auto-correlation function) of your input signal; see the sketch below.
Don't focus on the rightmost peaks: these come from the high-frequency components in your input and tend to vary as much as the noise floor.
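A minimal sketch of both fallbacks; the test signal and the 10-lag guard past lag zero are arbitrary choices of mine:

import numpy as np

t = np.arange(1024)
x = np.sin(2 * np.pi * 0.04 * t) + 0.1 * np.random.randn(t.size)

rms = np.sqrt(np.mean(x ** 2))                 # absolute-amplitude feature

acf = np.correlate(x, x, mode="full")[x.size - 1:]
acf /= acf[0]                                  # normalize so lag 0 is 1
lag = np.argmax(acf[10:]) + 10                 # first strong peak past lag 0
print(rms, lag)                                # pitch period: 25 samples here (1/0.04)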
Windows
For a high-quality analysis it is important to apply a window to the input data before applying the FFT. This reduces the influence of the "jump" between the end of your input vector and its beginning, because the FFT treats the input as a single cycle.
There are several popular windows, which mark different choices in an unavoidable trade-off: precision of a single peak vs. level of the sidelobes:
You chose a "rectangular window" (equivalent to no window at all; just start/stop your measurement). This gives excellent precision for your peaks, which now have a width of just one sample. Your sidelobes (the small peaks left and right of your main peaks) are at about -13 dB, which is tolerable given your input data. In your case this is an excellent choice.
A Hann (Hanning) window is a single cosine arch. It makes your peaks slightly broader but reduces the sidelobe levels.
The Hamming window (a cosine arch slightly raised above 0.0) produces even broader peaks, but suppresses sidelobes to about -42 dB. This is a good choice if you expect further weak (but important) components between your main peaks, or generally if you have complicated signals like speech, music and so on.
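Those sidelobe figures are easy to verify numerically. A sketch that measures the peak sidelobe level of each window from a heavily zero-padded FFT:

import numpy as np

for name, w in (("rectangular", np.ones(64)),
                ("hann", np.hanning(64)),
                ("hamming", np.hamming(64))):
    W = 20 * np.log10(np.abs(np.fft.rfft(w, 65536)) + 1e-16)
    W -= W.max()                            # 0 dB at the main-lobe peak
    first_min = np.argmax(np.diff(W) > 0)   # the main lobe ends at the first local minimum
    print(name, round(W[first_min:].max(), 1), "dB")

This prints roughly -13 dB, -31 dB and -43 dB for the rectangular, Hann and Hamming windows respectively.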
Edit: Scaling
Correct scaling of a spectrum is a complicated thing, because the values of the FFT lines depend on many things: the sampling rate, the length of the FFT, the window, and even implementation details of the FFT algorithm (several different conventions are in accepted use).
After all, the FFT should reflect the underlying conservation of energy (Parseval's theorem): the RMS of the input signal should match the suitably normalized RMS (energy) of the spectrum.
On the other hand, if the spectrum is used for classification it is enough to maintain relative amplitudes. As long as the parameters mentioned above do not change, the result can be used for classification without further scaling.
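The energy-conservation statement, for numpy's unnormalized FFT convention, as a one-line check:

import numpy as np

x = np.random.randn(4096)
X = np.fft.fft(x)

# Parseval: sum |x[n]|^2 == (1/N) * sum |X[k]|^2
print(np.sum(x ** 2), np.sum(np.abs(X) ** 2) / x.size)   # the two agree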

What is "energy" in image processing?

I've read several image processing books and websites, but I'm still not sure of the true definition of the term "energy" in image processing. I've found several definitions, but sometimes they just don't match.
When we say "energy" in Image processing, what are we implying?
Energy is a measure of the localized change in the image.
Energy goes by a bunch of different names and appears in a lot of different contexts, but it tends to refer to the same thing: the rate of change in the color/brightness/magnitude of the pixels over local areas. This is especially true of the edges of the things inside the image; because of the nature of compression, these areas are the hardest to compress, so it's a solid guess that they are the most important; they are often edges or quick gradients. These are the different contexts, but they refer to the same thing.
The seam carving algorithm uses an energy measure (the gradient magnitude) to find the seams that would be least noticed if removed. JPEG represents a local cluster of pixels relative to the energy of the first one. The Snake algorithm uses energy to find the local contoured edge of an object in the image. So there are a lot of different definitions, but they all refer to the sort of "oomph" of the image, whether that's the sum of the squares of local pixel brightness, the hard-to-compress bits in a JPEG, the edges in Canny edge detection, or the gradient magnitude:
The important bit is that energy is where the stuff is.
The energy of an image, more broadly, is the distance in some quality between the pixels of some locality.
We can take the sum of the LAB dE2000 color distances within a properly weighted 2D Gaussian kernel. Here the distances are summed together, the locality is defined by the Gaussian kernel, the quality is color, and the distance is the LAB Delta formula from the year 2000. (Errata: previously this claimed the E stood for Euclidean; the distance for standard Delta E is Euclidean, but the 94 and 00 formulas are not strictly Euclidean, and the 'E' stands for Empfindung, German for "sensation".) We could also add up a local 3x3 kernel of differences in brightness, or squared brightness, etc. We need to measure the localized change of the image.
In this example, locality is defined by a 2D Gaussian kernel and the color distance by the LAB dE2000 formula.
If you took an image, moved all the pixels around, and sorted them by color for some reason, you would reduce the energy of the image. You could take a collection of 50% black pixels and 50% white pixels and arrange them as random noise for maximal energy, or put them on two sides of the image for minimal energy. Likewise, if you had 100% white pixels, the energy would be 0 no matter how you arranged them.
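That black-and-white thought experiment is easy to reproduce; a sketch using the gradient magnitude as the energy measure (one choice among the many discussed here):

import numpy as np

rng = np.random.default_rng(0)
pixels = np.repeat([0.0, 1.0], 5000)           # 50% black, 50% white

split = pixels.reshape(100, 100)               # sorted: two solid halves
noise = rng.permutation(pixels).reshape(100, 100)

def gradient_energy(img):
    gy, gx = np.gradient(img)
    return np.sum(gx ** 2 + gy ** 2)

print(gradient_energy(split), gradient_energy(noise))   # split << noise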
It depends on the context, but in general, in signal processing, "energy" corresponds to the mean squared value of the signal (typically measured with respect to the global mean value). This concept is usually associated with Parseval's theorem, which allows us to think of the total energy as distributed along "frequencies" (so one can say, for example, that an image has most of its energy concentrated in the low frequencies).
Another, related, use is in image transforms: for example, the DCT (the basis of the JPEG compression method) transforms a block of pixels (an 8x8 image) into a matrix of transformed coefficients. For typical images it turns out that, while the original 8x8 block has its energy evenly distributed among the 64 pixels, the transformed block has its energy concentrated in the upper-left "pixels" (which, again, correspond to "low frequencies", in an analogous sense).
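A sketch of that energy compaction on a made-up smooth block; the ramp image is my stand-in for "typical" image content:

import numpy as np
from scipy.fft import dctn

block = np.outer(np.linspace(0, 1, 8), np.linspace(0, 1, 8))

C = dctn(block, norm="ortho")           # orthonormal DCT preserves total energy
total = np.sum(C ** 2)                  # equals np.sum(block ** 2)
print(np.sum(C[:2, :2] ** 2) / total)   # nearly all energy sits in the upper-left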
Energy is a fairly loose term used to describe any user-defined function (in the image domain).
The motivation for using the term "energy" is that typical object detection/segmentation tasks are posed as energy-minimization problems: we define an energy that captures the solution we desire and perform gradient descent to compute its lowest value, resulting in a solution for the image segmentation.
There is more than one definition of "energy" in image processing, so it depends on the context of where it was used.
Energy is used to describe a measure of "information" when formulating an operation under a probability framework such as MAP (maximum a posteriori) estimation in conjunction with Markov random fields. Sometimes the energy is a negative measure to be minimised, and sometimes it is a positive measure to be maximized.
If you consider that (for natural images captured by cameras) light is a form of energy, you may call the value of a pixel on some channel an energy.
However, I think that by "energy" the books are referring to the spectral density. From Wikipedia:
The energy spectral density describes how the energy (or variance) of a signal or a time series is distributed with frequency
http://en.wikipedia.org/wiki/Spectral_density
Going back to my chemistry: energy and entropy are closely related terms, and entropy and randomness are also closely related. So in image processing, energy might be similar to randomness. For example, a picture of a plain wall has low energy, while a picture of a city taken from a helicopter might have high energy.
Image "energy" should be inversely proportional to the Shannon entropy of the image. But, as already said, image "energy" is a loosely used term; it is better to use the term "compressibility" instead. That is, high image "energy" should correspond to high image compressibility.
http://lcni.uoregon.edu/~mark/Stat_mech/thermodynamic_entropy_and_information.html
Energy is like the "information present in the image". Compression of images causes energy loss. I guess it's something like that.
Energy is defined based on a normalized histogram of the image. Energy shows how the gray levels are distributed: when the number of gray levels is low, the energy is high.
The Snake algorithm is an image processing technique used to determine the contour of an object. The snake is nothing but a vector of (x,y) points with some constraints; its final goal is to surround the object and describe its shape (contour), and then to track or represent the object by its shape.
The algorithm has two kinds of energies, internal and external.
Internal energy (the snake energy, IE) is a user-defined energy which acts on the snake (internally) to impose smoothness constraints on it. Without such a force, the snake would end up with the exact shape of the object; this is not desirable, because the exact shape of an object is very difficult to obtain, due to lighting conditions, image quality, noise, etc.
The external energy (EE) arises from the data (the image intensities): it is nothing but the absolute difference of the intensities in the x and y directions (the intensity gradient), multiplied by -1 so that it can be summed with the internal energy, because the total energy must be minimized. The total energy over all the snake points should be minimized. Ideally this happens at edges, because the gradient, and hence EE before the sign flip, is maximal there; since it is multiplied by -1, the total energy of the snake around the nearest object is minimized, and the algorithm thus converges to a solution, which is hopefully the true contour of the studied object.
Because this algorithm relies on EE, which is high not only at edges but also at noisy points, the snake algorithm sometimes does not converge to an optimal solution; that is why it is an approximate, greedy algorithm.
I found this in an image processing book:
Energy: S_N = sum over b = 0 .. L-1 of [P(b)]^2, with P(b) = N(b) / M,
where M is the total number of pixels in a neighborhood window centered at (j,k), and N(b) is the number of pixels of amplitude b in the same window.
It may aid understanding to compare this with the entropy:
Entropy: S_E = - sum over b = 0 .. L-1 of P(b) * log2(P(b))
Source: pp. 538-539, Digital Image Processing by William K. Pratt (4th edition).
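Both quantities drop straight out of the normalized histogram; a sketch with an arbitrary window size and L = 256 gray levels:

import numpy as np

def energy_entropy(window, L=256):
    counts, _ = np.histogram(window, bins=L, range=(0, L))
    P = counts / window.size                  # P(b) = N(b) / M
    energy = np.sum(P ** 2)                   # S_N
    nz = P[P > 0]
    entropy = -np.sum(nz * np.log2(nz))       # S_E
    return energy, entropy

window = np.random.randint(0, 256, (9, 9))
print(energy_entropy(window))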
For my current imaging project, which renders a diffuse light source, I'd like to consider energy as light energy, or radiation energy. The question I had initially: does an RGB "pixel value" represent light energy? It could be tested using a light-intensity meter and generating successive screens with gray pixel values (n,n,n) for n = 0..255. According to MATLAB's forum, the radiated energy of one grayscale pixel is always proportional to its pixel value, but from pixel to pixel it will vary slightly.
There is another assumption regarding energy: while performing forward ray tracing, I obtain a ray count at each sampled position that is hit. This ray count is, or preferably should be, proportional to the radiation energy that would hit the target at that position. In order to compare it to actual photographs, I'd have to normalize the ray count to some pixel-value range (?). I enclose an example below; the energy source is a diffuse light emitter inside a dark cylinder.
Energy in signal processing is the integral of the squared signal within the signal boundaries. The analogy for two-dimensional signals is to square the pixel values and sum over all the pixels.
Image energy can be calculated in MATLAB (Image Processing Toolbox); note that graycoprops expects a gray-level co-occurrence matrix from graycomatrix, not the image itself:
glcm = graycomatrix(i1);
image_energy = graycoprops(glcm, {'Energy'})
