I have accelerometer data, i.e. AccX, AccY and AccZ.
I am looking for an algorithm to compute the power spectral density (PSD) from this data. I know the following:
F = fft(s);
where "s" is the input signal and fft is the fast Fourier transform.
PSD = (1/length(s)) * F .* conj(F);
What I need to know is whether s should be the acceleration time series or the position time series.

It depends on what you are interested in. If you want the power spectral density of the acceleration time series, then s must be the acceleration time series itself and not the position time series.
Note, however, that the PSD estimate from the simple algorithm you wrote (called the "periodogram") is often insufficient to get a realistic estimate of the true PSD: it is noisy, and its variance does not decrease as you add more samples.
The topic is huge and the literature extensive. You can start from Wikipedia or, if you want a suggestion for a good (but rather tough) book, Percival and Walden. To give more detailed information, one would need to know in much more detail what you have to do from a physical point of view.
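As a rough illustration of the point about the periodogram, here is a minimal Python sketch (NumPy assumed) of the one-sided periodogram and of one common improvement, averaging the periodograms of non-overlapping segments (Bartlett's method). The function names and the scaling convention (power per Hz) are my own choices for illustration, not something taken from the question.

```python
import numpy as np

def periodogram(s, fs):
    """One-sided periodogram PSD estimate of a real signal s sampled at fs Hz."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    F = np.fft.rfft(s)
    psd = (np.abs(F) ** 2) / (fs * n)
    # Double every bin except DC (and Nyquist when n is even) so that the
    # one-sided estimate also accounts for the negative frequencies.
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd

def averaged_periodogram(s, fs, seg_len):
    """Average the periodograms of non-overlapping segments (Bartlett's method)."""
    s = np.asarray(s, dtype=float)
    n_seg = len(s) // seg_len
    segs = s[: n_seg * seg_len].reshape(n_seg, seg_len)
    psds = [periodogram(seg, fs)[1] for seg in segs]
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return freqs, np.mean(psds, axis=0)
```

Averaging trades frequency resolution (shorter segments, wider bins) for a less noisy estimate. For an accelerometer, s would be one axis of the acceleration time series, e.g. AccX.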


How to decorrelate accelerometer data

Is it possible to decorrelate accelerometer data in real-time? If so, how is it done?
My application is receiving (X,Y,Z) accelerometer data in real-time (sample rate is 6.75Hz). The sensor is moving in a periodic motion but the motion is not necessarily along only one axis. The 3 signals x(t), y(t) and z(t) are therefore slightly correlated and I would like to know if I can find a rotation matrix (in real time) which can be used to rotate the measured (x,y,z) into a new vector (x*,y*,z*) so that the entire motion is along the z-axis?
I would like to implement the algorithm in C.
What you're trying to do is generally called "principal component analysis". The Wikipedia article on it is pretty good.
For static data you generally use the eigenvectors of the covariance matrix as your new coordinate basis.
PCA in real time is doable, but not super easy. See, for example: http://www.bio-conferences.org/articles/bioconf/pdf/2011/01/bioconf_skills_00055.pdf
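For the static (batch) case, the eigenvector approach can be sketched in a few lines. This is a Python illustration with NumPy rather than the C the question asks for, and the function name and the choice to put the largest-variance axis last (so it becomes the new z-axis) are assumptions made for this example.

```python
import numpy as np

def pca_rotation(samples):
    """Return (R, variances) for an (n, 3) array of accelerometer samples.

    The rows of R are the principal axes, ordered by increasing variance,
    so the dominant motion ends up on the last (z*) axis.
    """
    samples = np.asarray(samples, dtype=float)
    centered = samples - samples.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(cov)
    R = eigvecs.T.copy()
    # Flip one axis if needed so that R is a proper rotation (det = +1)
    if np.linalg.det(R) < 0:
        R[0] = -R[0]
    return R, eigvals
```

A block of samples is then rotated with `(samples - samples.mean(axis=0)) @ R.T`. A real-time variant would have to update the covariance estimate incrementally as samples arrive, which this sketch does not attempt.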
I'd like to first of all emphasize that Matt Timmermans' answer has done exactly what people are actually doing when classifying accelerometer data from clinical studies (a project I worked on).
Then: you're observing a sampled signal. In general, if you have a sensor that gives you samples at a rate of 6.75 Hz, the highest signal frequency you can detect is 6.75 Hz / 2 = 3.375 Hz. Everything with a higher frequency will inherently be aliased back and look like something with a frequency f with 0 <= f <= 3.375 Hz. If you've not considered this, please go and read up on the Nyquist–Shannon sampling theorem. Especially: shield your sensor (however you do that, e.g. by employing dampeners) from all input above that limit, otherwise your measurements might be worth very little or even nothing. If your sensor does this internally (that's absolutely possible, there are enough accelerometers with analog low-pass filters), this has been taken care of. However, document those characteristics of your sensor.
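To make the aliasing arithmetic concrete, here is a small Python helper (a hypothetical name, written for this illustration) that computes where a real sinusoid of frequency f shows up after sampling at rate fs.

```python
def aliased_frequency(f, fs):
    """Apparent frequency (Hz) of a real sinusoid at f Hz when sampled at fs Hz."""
    f_mod = f % fs
    # Frequencies fold back around fs / 2 (the Nyquist frequency)
    return min(f_mod, fs - f_mod)
```

With fs = 6.75 Hz, a 1 Hz vibration is seen at 1 Hz, but a 5 Hz vibration is indistinguishable from one at 1.75 Hz.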
Now, your case is a little bit easier because you know pretty well that your whole observation is going to be periodic, and it's measured along three orthogonal axes.
In this case, just doing three discrete Fourier transforms at once, extracting the "strongest" spectral component over all three channels, and finding the phase of that spectral component (which is just the complex argument of that DFT bin) in the two others would give you something that you can map to a periodic movement around a specific axis in 3D space. If you want to, remove these values (set the bins to 0), search for the strongest component again, and so on.
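A minimal Python sketch of that procedure, assuming NumPy and an (n, 3) array of samples; the function name and the amplitude scaling are illustrative choices, not part of the original answer.

```python
import numpy as np

def strongest_component(xyz, fs):
    """Find the strongest non-DC spectral component over three channels.

    xyz: array of shape (n, 3); fs: sampling rate in Hz.
    Returns (frequency in Hz, per-axis amplitudes, per-axis phases in radians).
    """
    xyz = np.asarray(xyz, dtype=float)
    n = xyz.shape[0]
    # One DFT per channel; removing the mean suppresses the DC bin (e.g. gravity)
    spec = np.fft.rfft(xyz - xyz.mean(axis=0), axis=0)
    mag = np.abs(spec)
    k, _ = np.unravel_index(np.argmax(mag), mag.shape)
    freq = k * fs / n
    amps = 2.0 * mag[k] / n       # amplitude scaling valid away from DC and Nyquist
    phases = np.angle(spec[k])    # complex argument of the same bin in each channel
    return freq, amps, phases
```

The per-axis amplitudes and the phase differences between axes describe the shape and orientation of the periodic motion at that frequency. To look for a second component, one could zero bin k in `spec` and repeat the search.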
Discrete Fourier transforms can be computed at staggering speed nowadays. At 6.75 Hz, no PC in this world will ever get into trouble when you try this while receiving further samples. It's a hilariously low sampling rate.
Another, more elegant approach (read: you need fewer samples to compute this) would be a parametric estimator; in your case, direction-of-arrival estimation from the world of multi-antenna RF technology would, as far as I can think, map directly to detecting the rotational axis. The classical algorithms here are MUSIC and ESPRIT, and for your case (a limited, known number of oscillating parts), ESPRIT might be the better choice.

What FFT descriptors should be used as feature to implement classification or clustering algorithm?

I have some sampled geographical trajectories to analyze, and I calculated the histogram of the data in the spatial and temporal dimensions, which yielded a time-domain feature for each spatial element. I want to perform a discrete FFT to transform the time-domain feature into a frequency-domain feature (which I think may be more robust), and then run some classification or clustering algorithms.
But I'm not sure which descriptor to use as the frequency-domain feature, since a signal has an amplitude spectrum, a power spectrum and a phase spectrum, and I've read some references but am still confused about their significance. And what distance (similarity) function should be used when running learning algorithms on frequency-domain feature vectors (Euclidean distance? Cosine distance? Gaussian function? Chi-kernel or something else)?
Hope someone can give me a clue or some material that I can refer to, thanks~
Thanks to @DrKoch, I chose the spatial element with the largest L-1 norm and plotted its log power spectrum in Python, and it did show some prominent peaks. Below are my code and the figure:
import numpy as np
import matplotlib.pyplot as plt
sp = np.fft.fft(signal)
freq = np.fft.fftfreq(signal.shape[-1], d=1.)  # the time slot of the histogram is 1 hour
plt.plot(freq, np.log10(np.abs(sp) ** 2))
And I have several trivial questions to ask to make sure I totally understand your suggestion:
In your second suggestion, you said "ignore all these values."
Do you mean that the horizontal line represents the threshold and all values below it should be set to zero?
"you may search for the two, three largest peaks and use their location and probably widths as 'Features' for further classification."
I'm a little bit confused about the meaning of "location" and "width". Does "location" refer to the log value of the power spectrum (y-axis) and "width" to the frequency (x-axis)? If so, how do I combine them into a feature vector and compare two feature vectors for "a similar frequency and a similar width"?
I replaced np.fft.fft with np.fft.rfft to calculate the positive part and plot both power spectrum and log power spectrum.
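sp = np.fft.rfft(signal)
freq = np.fft.rfftfreq(signal.shape[-1], d=1.)  # rfft returns only the non-negative frequencies
f, axarr = plt.subplots(2, sharex=True)
axarr[0].plot(freq, np.abs(sp) ** 2)
axarr[1].plot(freq, np.log10(np.abs(sp) ** 2))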
Please correct me if I'm wrong:
I think I should keep the last four peaks in the first figure, with power = np.abs(sp) ** 2 and power[power < threshold] = 0, because the log power spectrum reduces the differences among the components. I would then use the log spectrum of the new power as the feature vector to feed the classifiers.
I have also seen some references suggest applying a window function (e.g. a Hamming window) before the FFT to reduce spectral leakage. My raw data is sampled every 5 ~ 15 seconds and I've applied a histogram over the sampling time. Is that equivalent to applying a window function, or do I still need to apply one to the histogram data?
Generally you should extract just a small number of "features" out of the complete FFT spectrum.
First: use the log power spectrum.
Complex numbers and phase are useless in these circumstances, because they depend on where you start/stop your data acquisition (among many other things).
Second: you will see a "noise level", i.e. most values are below a certain threshold. Ignore all these values.
Third: if you are lucky, i.e. your data has some harmonic content (cycles, repetitions), you will see a few prominent peaks.
If there are clear peaks, it is even easier to detect the noise: everything between the peaks should be considered noise.
Now you may search for the two or three largest peaks and use their location and possibly their widths as "features" for further classification.
Location is the x-value of the peak, i.e. the "frequency". It says something about how "fast" the cycles in your input data are.
If your cycles don't have a constant frequency during the measuring interval (or you apply a window before calculating the FFT), the peak will be broader than one bin. So the width of the peak says something about the "stability" of your cycles.
Based on this: two patterns are similar if the biggest peaks of both have a similar frequency and a similar width, and so on.
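One possible way to turn "location" and "width" into numbers is sketched below in Python with NumPy. The function name, the choice of local maxima as peak candidates, and the half-power (about 3 dB down on the log spectrum) definition of width are all assumptions made for this illustration.

```python
import numpy as np

def peak_features(signal, n_peaks=3, d=1.0):
    """Return [(frequency, width), ...] for the n_peaks largest spectral peaks.

    d is the sample spacing; width is measured at half the peak power.
    """
    signal = np.asarray(signal, dtype=float)
    spec = np.fft.rfft(signal - signal.mean())
    power = np.abs(spec) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=d)
    bin_width = freqs[1] - freqs[0]
    # Candidate peaks: interior bins that are local maxima of the power spectrum
    idx = [k for k in range(1, len(power) - 1)
           if power[k] > power[k - 1] and power[k] >= power[k + 1]]
    idx.sort(key=lambda k: power[k], reverse=True)
    features = []
    for k in idx[:n_peaks]:
        half = power[k] / 2.0
        lo = k
        while lo > 0 and power[lo - 1] >= half:
            lo -= 1
        hi = k
        while hi < len(power) - 1 and power[hi + 1] >= half:
            hi += 1
        features.append((freqs[k], (hi - lo + 1) * bin_width))
    return features
```

The resulting short list of (frequency, width) pairs can be flattened into a fixed-length feature vector and compared with an ordinary distance such as the Euclidean one, provided the features are scaled sensibly first.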
Very interesting to see a logarithmic power spectrum of one of your examples.
Now it's clear that your input contains a single harmonic (periodic, oscillating) component with a frequency (repetition rate, inverse of the cycle duration) of about f0=0.04.
(This is a relative frequency, proportional to your sampling frequency, which is the inverse of the time between individual measurement points.)
It is not a pure sine wave, but some "interesting" waveform. Such waveforms produce peaks at 1*f0, 2*f0, 3*f0 and so on.
(So using an FFT for further analysis turns out to be a very good idea.)
At this point you should produce spectra of several measurements and see what makes measurements similar and how different measurements differ. What are the "important" features that distinguish your measurements? Things to look out for:
Absolute amplitude: height of the prominent (leftmost, highest) peaks.
Pitch (main cycle rate, speed of changes): this is the position of the first peak, or the distance between consecutive peaks.
Exact waveform: relative amplitude of the first few peaks.
If your most important feature is absolute amplitude, you're better off calculating the RMS (root mean square) level of your input signal.
If pitch is important, you're better off calculating the ACF (auto-correlation function) of your input signal.
Don't focus on the rightmost peaks; these come from the high-frequency components in your input and tend to vary as much as the noise floor.
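The two alternatives mentioned above, the RMS level and a pitch estimate from the ACF, can be sketched as follows (Python with NumPy). The function names and the simple "first negative lag, then highest peak" rule are my own choices for illustration and would need tuning on real data.

```python
import numpy as np

def rms(x):
    """Root-mean-square level of the signal."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(np.mean(x ** 2))

def acf_period(x):
    """Estimate the main cycle length in samples from the autocorrelation.

    Returns the lag of the highest ACF value after the zero-lag lobe,
    or None if the ACF never becomes negative.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Skip the zero-lag lobe: start searching where the ACF first goes negative
    negative = np.where(acf < 0)[0]
    if len(negative) == 0:
        return None
    start = negative[0]
    return int(start + np.argmax(acf[start:]))
```

The pitch in relative frequency units is then 1 / acf_period(x).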
For a high-quality analysis it is important to apply a window to the input data before applying the FFT. This reduces the influence of the "jump" between the end of your input vector and the beginning of your input vector, because the FFT treats the input as a single cycle of a periodic signal.
There are several popular windows which mark different choices in an unavoidable trade-off: precision of a single peak vs. level of sidelobes:
You chose a "rectangular window" (equivalent to no window at all, just start/stop your measurement). This gives excellent precision of your peaks, which now have a width of just one sample. Your sidelobes (the small peaks left and right of your main peaks) are at -21dB, very tolerable given your input data. In your case this is an excellent choice.
A Hann window (often called "Hanning") is a single cycle of a cosine wave, raised so that it starts and ends at zero. It makes your peaks slightly broader but reduces side-lobe levels.
The Hamming window (a cosine wave, slightly raised above 0.0 at the ends) also broadens the peaks, and suppresses the highest side-lobes to about -42 dB. This is a good choice if you expect further weak (but important) components between your main peaks, or generally if you have complicated signals like speech, music and so on.
Edit: Scaling
Correct scaling of a spectrum is a complicated thing, because the values of the FFT lines depend on many things like sampling rate, length of the FFT, window, and even implementation details of the FFT algorithm (several different accepted conventions exist).
In the end, the FFT should reflect the underlying conservation of energy (Parseval's theorem): with consistent scaling, the energy of the input signal equals the energy of the spectrum.
On the other hand: if used for classification, it is enough to maintain relative amplitudes. As long as the parameters mentioned above do not change, the result can be used for classification without further scaling.
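The energy conservation mentioned above can be checked numerically. The sketch below (Python with NumPy, function name chosen for illustration) relies on NumPy's default convention, in which the forward FFT is unscaled, so the spectral energy has to be divided by the number of samples.

```python
import numpy as np

def energies(x):
    """Return (time-domain energy, frequency-domain energy) of a signal."""
    x = np.asarray(x, dtype=float)
    X = np.fft.fft(x)
    time_energy = np.sum(x ** 2)
    # With an unscaled forward FFT, Parseval's theorem needs a factor 1/N
    freq_energy = np.sum(np.abs(X) ** 2) / len(x)
    return time_energy, freq_energy
```

Other libraries may put the 1/N (or 1/sqrt(N)) factor elsewhere, which is one reason absolute spectrum values differ between implementations.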

STFT/FFT work flow order

I am trying to implement FFT, and I am OK with the code etc, but the general order of things is confusing me.
Am I right in thinking that this is the correct order of things to do?
Input -> Overlap input -> Windowing -> FFT -> Phase calculations/Overlap compensation -> Output
I'm getting results close to my input frequency, but they are consistently off by some factor that I can't work out, i.e. 440Hz is always 407Hz, 430Hz is always 420Hz.
The main bit that is confusing me is the initial overlap, as I have been looking at some open source FFT code and that is the part where I can never quite work out what's going on. From looking at those, I seem to be getting the idea that overlapping is supposed to happen before windowing, but to me, logically, wouldn't that mess with the windowing?
Any advice would be great
The FFT is a fast algorithm for computing the discrete Fourier transform (DFT), which is the discrete counterpart of the continuous Fourier transform.
The FFT produces a 1D vector of complex numbers. This complex vector is often used to calculate a 2D matrix of Frequency Magnitude versus Frequency, and represented as a 2D graph, like this one:
A single FFT is used when you want to understand the frequency spectrum of a signal. For example, from the above FFT graph we can say that most of the energy in this female soprano's G5 note is concentrated in the 784 Hz and 1572 Hz frequencies.
STFT or "Short-Time Fourier Transform" uses a sliding-frame FFT to produce a 2D matrix of Frequency versus Time, often represented as a graph called a Spectrogram, like this one:
The STFT is used when you want to know at what time a particular frequency event occurs in the signal. For example, from the above graph we can say that a large portion of the energy in this vocal phrase occurred between 0.05 and 0.15 seconds, in the frequency range of 100 Hz to 1500 Hz.
The workflow for the FFT is:
Sample the signal -> Window the entire sample frame -> FFT -> Calculate magnitude and phase -> Output something, usually a 2D graph
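For the STFT, the same steps are applied per frame: the overlap comes from how the frames are cut out of the input (each frame starts one hop after the previous one), and the window is then applied to every frame separately, so overlapping does not interfere with windowing. A minimal Python sketch with NumPy follows; the function name, the Hann window and the frame/hop parameters are assumptions for illustration.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Short-time Fourier transform: returns an array of shape (n_frames, frame_len // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # 1) cut overlapping frames, 2) window each frame, 3) FFT each frame
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    return np.fft.rfft(frames * window, axis=1)
```

Bin k of each frame corresponds to the frequency k * fs / frame_len, so without phase-based refinement a detected peak can be off by up to half a bin width, fs / (2 * frame_len). A consistent larger offset often points to a mismatch between the assumed and the actual sample rate or frame length in that conversion.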
If your time-domain data is available in text form and if you can post it here, we can try to help you analyze it, or you can analyze it yourself with this online FFT: Sooeet FFT calculator
If you apply a window before the FFT, your computation becomes a kind of STFT.
There are some ready-made STFT routines, such as 'Spectrogram'.
If you write the code with the FFT yourself, overlapping is inevitable, but you can use some optimization methods to minimize ghost effects. Also, a practical approach to windowing may be to choose the window's bandwidth according to the frequency extent of the data. Clearly, for high-frequency data you need to select small windows, which is very time-consuming.
I am not good enough in Matlab to write this code adhesively:)
Good Luck

How to select frequencies from DFT

Assume a sequence of numbers (wave-like data). I then perform the DFT (or FFT). The next step I want to achieve is to find the frequencies that correspond to the real frequencies present in the data. As we know, the DFT output has real and imaginary parts a[i] and b[i]. If we look at the spectrum, sqrt(a[i]^2+b[i]^2), then its maximum corresponds to a frequency that is present in the data. The question is how to find all such frequencies from the DFT. The problem arises when there are many other peaks that can be falsely selected.
I had a similar problem when doing spectral analysis processing of data when I was writing my honours thesis.
You are right: To find dominant frequencies you generally only need to look at the magnitude of the complex value in the DFT.
Unfortunately, you pretty much have to write some sort of intelligent algorithm which will identify the peaks (frequencies). The way the algorithm works is highly dependent on what the DFT looks like for your application. My DFTs all had similar characteristics, so it wasn't too difficult to put together a heuristic algorithm. If your DFT can take on any form, then you will probably get a lot of false positives and/or false negatives.
The way I did it was to identify regions in the DFT with high magnitude (peaks) which were surrounded by low magnitude (troughs). You can define the minimum difference between peaks and troughs (the sensitivity) as a constant times the standard deviation of the data. Additionally, you can say that any peaks that fall below a certain magnitude (threshold) are ignored altogether, as they are just noise.
Of course, the above technique will only really work if you have relatively well defined frequencies in your data. If your DFT is highly random, then you will need to take extra care to set the sensitivity and threshold carefully.
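The peak/trough heuristic described above might look roughly like this in Python (standard library only). The function and parameter names are hypothetical, and walking downhill to the nearest trough on each side is just one way to define "surrounded by low magnitude".

```python
import statistics

def find_peaks(mag, sensitivity_factor=1.0, threshold=0.0):
    """Indices of peaks in a list of DFT magnitudes.

    A peak must exceed `threshold` and rise above both neighbouring troughs
    by at least sensitivity_factor * (standard deviation of mag).
    """
    sensitivity = sensitivity_factor * statistics.pstdev(mag)
    peaks = []
    for i in range(1, len(mag) - 1):
        if not (mag[i] > mag[i - 1] and mag[i] >= mag[i + 1]):
            continue  # not a local maximum
        if mag[i] < threshold:
            continue  # too small, treated as noise
        # Walk downhill on both sides to find the adjacent troughs
        left = i
        while left > 0 and mag[left - 1] <= mag[left]:
            left -= 1
        right = i
        while right < len(mag) - 1 and mag[right + 1] <= mag[right]:
            right += 1
        if mag[i] - max(mag[left], mag[right]) >= sensitivity:
            peaks.append(i)
    return peaks
```

For example, `find_peaks([0.1, 0.2, 0.1, 5.0, 0.2, 0.1, 0.3, 0.1, 3.0, 0.1])` keeps the two large peaks and rejects the small ripples; raising `threshold` to 4.0 keeps only the larger one.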
Don't forget that the magnitude of the DFT of real-valued data is symmetric, so you only need to look at half of it.
Once you have identified the frequencies in your DFT, don't forget to convert them into the units you want. If you have n samples taken with time discretisation dt and a peak at data point 5 (for example), where the first data point is 1, then the frequency is (5-1)/(n*dt) cycles per time unit; multiply by 2*Pi to get radians per time unit. (The first data point is the zero-frequency, or DC, term.)
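The unit conversion is easy to get wrong by one bin, so here it is spelled out as a small Python helper (hypothetical names, using the 1-based indexing of the example above).

```python
import math

def bin_to_frequency(index_1based, n, dt):
    """Frequency in cycles per time unit of a DFT bin, counting the first bin as 1."""
    k = index_1based - 1          # bin 1 is the DC (zero-frequency) term
    return k / (n * dt)

def bin_to_angular_frequency(index_1based, n, dt):
    """Same frequency expressed in radians per time unit."""
    return 2.0 * math.pi * bin_to_frequency(index_1based, n, dt)
```

With n = 100 samples and dt = 0.01 s, a peak at data point 5 corresponds to 4 Hz.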

What is the Hamming window for?

I'm working with some code that does a Fourier transform (to calculate the cepstrum of an audio sample). Before it computes the Fourier transform, it applies a Hamming window to the sample:
for (int i = 0; i < SEGMENTATION_LENGTH; i++) {
    // Scale each sample by the Hamming window coefficient for position i
    timeDomain[i] = (float) ((0.53836 - (0.46164 * Math.cos(TWOPI * (double) i / (double) (SEGMENTATION_LENGTH - 1)))) * frameBuffer[i]);
}
Why is it doing this? I can't find any reason for it to do this in the code, or online.
This is an old question, but I thought the answer could be improved.
Imagine the signal you want to Fourier transform is a pure sine wave. In the frequency domain, you would expect it to have a sharp spike only at the frequency of the sine. However, if you took the Fourier transform, your nice sharp spike would be replaced by something like this:
Why is that? Real sine waves extend to infinity in both directions. Computers can't do computations with an infinite number of data points, so all signals are "cut off" at either end. This causes the ripple on either side of the peak that you see. The Hamming window reduces this ripple, giving you a more accurate idea of the original signal's frequency spectrum.
More theory, for the interested: when you cut your signal off at either end, you are implicitly multiplying your signal by a square window. The Fourier transform of a square window is the image above, known as a sinc function. Whenever you do a Fourier transform on a computer, like it or not, you're always choosing some window. The square window is the implicit default, but not a very good choice. There are a variety of windows that people have come up with, depending on the characteristics you want to optimize. The Hamming window is one of the standard ones.
Whenever you do a finite Fourier transform, you're implicitly applying it to an infinitely repeating signal. So, for instance, if the start and end of your finite sample don't match then that will look just like a discontinuity in the signal, and show up as lots of high-frequency nonsense in the Fourier transform, which you don't really want. And if your sample happens to be a beautiful sinusoid but an integer number of periods don't happen to fit exactly into the finite sample, your FT will show appreciable energy in all sorts of places nowhere near the real frequency. You don't want any of that.
Windowing the data makes sure that the ends match up while keeping everything reasonably smooth; this greatly reduces the sort of "spectral leakage" described in the previous paragraph.
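The effect is easy to reproduce numerically. The Python sketch below (NumPy assumed; the helper name and the 10-bin guard distance are arbitrary choices for this illustration) builds a sinusoid with a non-integer number of periods in the frame and measures how much of its energy shows up far from the peak, with and without a Hamming window.

```python
import numpy as np

def far_leakage_db(x, peak_guard=10):
    """Largest spectral magnitude more than peak_guard bins away from the peak,
    in dB relative to the peak itself."""
    mag = np.abs(np.fft.rfft(x))
    k = int(np.argmax(mag))
    far = np.concatenate([mag[: max(k - peak_guard, 0)], mag[k + peak_guard + 1:]])
    return 20.0 * np.log10(far.max() / mag[k])

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 20.5 * t / n)             # 20.5 periods: the ends do not match up
rect_db = far_leakage_db(x)                       # no window (rectangular)
hamming_db = far_leakage_db(x * np.hamming(n))    # Hamming-windowed
```

Here `hamming_db` comes out well below `rect_db`, i.e. the windowed spectrum has far less energy away from the true frequency, at the cost of a somewhat wider main peak.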
From what I know about sound and some quick research, it appears that the Hamming window is there to minimize the signal's side lobes (unwanted radiation), thus improving the quality or harmonics of the sound.
I also understand that this type of window function works well with the DTFT.
You will find some good technical explanations on a Stanford researcher's page or on Wikipedia, and also in a paper by Harris if you are ready for maths :D.
The FT of a finite-length segment of a sinusoid convolves the Fourier transform of the window with the sinusoid's frequency peak, since a property of the Fourier transform is that element-wise multiplication in one domain is convolution in the other. The FT of a rectangular window (which is what any unmodified finite length of samples in an FFT implies) is the messy-looking sinc function, which splatters any signal that is not exactly periodic in the window over the entire frequency spectrum.
The FT of a Hamming shaped window concentrates this "splatter" much nearer to the frequency peak after the convolution (than a Sinc function), resulting in a fatter but smoother frequency peak, but a lot less splatter across frequencies far from the frequency peak. This results in not only a cleaner looking spectrum, but also less interference from far away frequencies on any signal of interest.
This interpretation (as opposed to the "infinitely repeating" interpretation) makes it clearer why windows shaped differently from the Hamming window may give you better results with even less "leakage". In particular, a Hamming window will reduce the size of the first sinc side lobe of "leakage" right next to the frequency peak in exchange for actually more "leakage" (or convolution splatter) far from the frequency of interest. Other windows may be more appropriate if you want a different trade-off. The Harris paper linked in another answer above gives several examples of these different windows.
