Is a "rolling" FFT possible and could it be of use? - signal-processing

Lately I have been experimenting with audio and FFTs, specifically the Minim library in Processing (basically Java, not that it's particularly important for this question). What I have come to understand is that with a buffer/sample size N and sample rate K, after performing a forward FFT, I will get N frequency bins (only N/2 carry usable data, and in fact Minim only returns N/2 bins) linearly spaced, representing the spectrum from 0 to K/2 Hz.
With Minim (as well as other typical FFT implementations) you wait to gather N samples, and then perform the forward transformation, then wait for N more samples, and so on. In order to get a reasonable frame-rate (for audio visualizations, beat detection, etc.), I must use a small sample size relative to the sampling frequency.
The problem with this, though, is that a small sample size results in a very low resolution for the low end of the spectrum when I compute logarithmically spaced averages (Since a bass octave is much narrower than a high pitched octave).
I was wondering if a possible way to squeeze out more apparent resolution would be to perform FFTs more often than every N samples, on a slightly larger sample size than I am currently using (e.g. with an input buffer of size 2048, every 100 samples, add those samples to the input buffer, remove the oldest 100 samples, and perform an FFT). It seems like this would possibly create a rolling-average type of effect (which I can live with), but I'm not too sure.
What would be the pros and cons of this approach? Are there any other ways I could increase my apparent resolution while still being able to do real-time visualization and analysis?
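To put rough numbers on the low-end resolution problem described in the question, here is a minimal sketch that counts how many linearly spaced bins land inside one octave. The function name `bins_in_band` and the 44.1 kHz / 1024-sample settings are assumptions chosen for illustration, not anything taken from Minim.

```python
def bins_in_band(n_fft, sample_rate, f_lo, f_hi):
    """Count FFT bins whose centre frequency lies in [f_lo, f_hi)."""
    spacing = sample_rate / n_fft      # bins are spaced K/N Hz apart
    usable = range(n_fft // 2 + 1)     # bins from 0 Hz up to K/2
    return sum(1 for k in usable if f_lo <= k * spacing < f_hi)

# Hypothetical settings: 44.1 kHz audio, 1024-sample buffer.
bass = bins_in_band(1024, 44100, 55.0, 110.0)        # one bass octave
treble = bins_in_band(1024, 44100, 7040.0, 14080.0)  # one treble octave
print(bass, treble)
```

With these settings the bass octave gets a single bin while the treble octave gets 163, which is why logarithmically spaced averages look so coarse at the low end.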

That approach goes by the name short-time Fourier transform. You will find answers to most of your questions on Wikipedia: https://en.wikipedia.org/wiki/Short-time_Fourier_transform
It works great in practice, and you can even get better frequency resolution out of it than you would expect from a rolling window by using the phase difference between successive FFTs.
Here is one article that does pitch shifting of audio signals. How to get the higher frequency resolution is well explained there: http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/
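The phase-difference idea mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `hann_bin` and `refined_frequency` are made up, a single bin is computed by a direct sum instead of a full FFT, and the signal is assumed to be one steady tone near bin k.

```python
import cmath
import math

def hann_bin(frame, k):
    """DFT bin k of a Hann-windowed frame (direct sum, one bin only)."""
    n = len(frame)
    total = 0j
    for i, x in enumerate(frame):
        w = 0.5 - 0.5 * math.cos(2 * math.pi * i / n)
        total += w * x * cmath.exp(-2j * math.pi * k * i / n)
    return total

def refined_frequency(samples, start, n, hop, k, sample_rate):
    """Estimate the frequency near bin k from two frames `hop` samples apart."""
    p1 = cmath.phase(hann_bin(samples[start:start + n], k))
    p2 = cmath.phase(hann_bin(samples[start + hop:start + hop + n], k))
    expected = 2 * math.pi * k * hop / n   # phase advance of the bin centre
    deviation = (p2 - p1) - expected
    # Wrap the deviation into [-pi, pi).
    deviation = (deviation + math.pi) % (2 * math.pi) - math.pi
    # Convert the leftover phase into a fractional bin offset, then to Hz.
    return (k + deviation * n / (2 * math.pi * hop)) * sample_rate / n

fs, n, hop = 8000, 1024, 256
tone = [math.sin(2 * math.pi * 440.0 * i / fs) for i in range(n + hop)]
k = round(440.0 * n / fs)   # nearest bin, centred at 437.5 Hz
estimate = refined_frequency(tone, 0, n, hop, k, fs)
print(estimate)
```

The bin spacing here is about 7.8 Hz, yet the estimate lands much closer to 440 Hz than the bin centre does. The wrap step only works while the true frequency is within N/(2*hop) bins of the bin centre, so smaller hops give more headroom.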

We use the approach you describe, which we call overlapping, to make sure all the rows of a spectral waterfall are filled in. Overlap can be used to provide spectra that are spaced as closely as a single sample interval.
The primary disadvantage is the extra processing to produce all those spectra.
On the positive side, while the time resolution of each spectrum is still constrained by FFT size, looking at closely spaced adjacent spectra seems to provide a kind of visual interpolation that, I think, lets you see the data with higher precision.
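The overlapping scheme from the question (a 2048-sample buffer refreshed every 100 samples) could be sketched like this. The generator name `rolling_frames` is hypothetical; each yielded frame would then be windowed and passed to whatever FFT routine is in use.

```python
from collections import deque

def rolling_frames(stream, frame_size, hop):
    """Yield the latest `frame_size` samples every `hop` new samples."""
    buf = deque(maxlen=frame_size)   # the oldest samples fall off automatically
    since_last = 0
    for sample in stream:
        buf.append(sample)
        since_last += 1
        if len(buf) == frame_size and since_last >= hop:
            since_last = 0
            yield list(buf)

# Stand-in for an audio stream: the sample index itself.
frames = list(rolling_frames(range(2348), 2048, 100))
print(len(frames), frames[1][0])
```

The cost is one FFT per hop instead of one per buffer, so a hop of 100 on a 2048-sample buffer is roughly 20 times the work of non-overlapping frames.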

One common way this is done is to use multiple lengths of windowed FFTs on the same data, short FFTs for good time resolution, much longer FFTs for better frequency resolution of lower frequencies. Then the problem for visualization becomes picking the best FFT result out of several possible at each plot point (such as the highest contrast sub-block, etc.) and blending them attractively.
Most modern processors (in PCs and mobile phones, etc.) can easily do multiple lengths (dozens) of FFTs still in real-time for audio.

Related

Optimal value of sampling frequency for guitar notes detection

I am running FFT algorithm to detect the music note played on a guitar.
The frequencies that I am interested are in the range 65.41Hz (C2) to 1864.7Hz (A#6).
If I set the sampling frequency of the input to 16KHz, the output of FFT would yield N points from 0Hz to 16KHz linearly. All the input I am interested would be in the first N/8 points approximately. The other N*7/8 points are of no use to me. They actually are decreasing my resolution.
From Nyquist's theory (https://en.wikipedia.org/wiki/Nyquist_frequency), the sampling frequency that is needed is just twice the maximum frequency one desires. In my case, this would be about 4KHz.
Is 4KHz really the ideal sampling frequency for a guitar tuning app?
Intuitively, one would feel a better sampling frequency would give you more accurate results. However, in this case, it seems having a lesser sampling frequency is better for improving the resolution. Regards.
You are confusing the pitch of a guitar note with spectral frequency. A guitar generates lots of overtones and harmonics at much higher frequencies than the pitch of a played note. Those higher harmonics and overtones, more than the possibly weak fundamental frequency in some cases, are what the human ear hears and interprets as the lower perceived pitch.
Any of the overtones and harmonics around or above 2 kHz that are not completely low pass filtered out before sampling at 4 kHz will cause aliasing and thus corruption of your sampled data and its spectrum.
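A small worked example of that folding, as a sketch: the helper name `alias_frequency` is made up, and it only answers where a pure tone would show up after sampling without any anti-aliasing filter.

```python
def alias_frequency(f, sample_rate):
    """Apparent frequency of a tone at f Hz after sampling at sample_rate."""
    folded = f % sample_rate
    # Anything above the Nyquist frequency mirrors back below it.
    return sample_rate - folded if folded > sample_rate / 2 else folded

# A 3 kHz guitar overtone sampled at 4 kHz shows up at 1 kHz.
print(alias_frequency(3000.0, 4000.0))
```

So an unfiltered 3 kHz overtone would sit right on top of legitimate content at 1 kHz in the sampled spectrum.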
If you want to create an accurate tuner, use a pitch estimation algorithm, not an FFT peak frequency bin estimator. And depending on which pitch estimation method you choose, a higher density of samples per unit time might allow finer accuracy or greater reliability under background noise or more prompt responsiveness.
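The answer does not name a specific pitch estimation algorithm. As one possible illustration, here is a bare-bones autocorrelation estimator; the function name and the test tone are assumptions, and a real tuner would add interpolation between lags and some protection against octave errors.

```python
import math

def autocorrelation_pitch(samples, sample_rate, f_min, f_max):
    """Return the pitch (Hz) whose period maximises the autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_score = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

fs = 8000
# Test tone with a 110 Hz pitch but no energy at 110 Hz itself:
# only the 2nd, 3rd and 4th harmonics are present.
tone = [sum(math.sin(2 * math.pi * 110.0 * h * i / fs) for h in (2, 3, 4))
        for i in range(1024)]
pitch = autocorrelation_pitch(tone, fs, 60.0, 500.0)
print(pitch)
```

An FFT peak picker would report 220, 330 or 440 Hz for this tone, while the autocorrelation finds the common period near 110 Hz. The result is quantised to whole-sample lags, which is one reason a higher sample rate can help accuracy.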
Is 4KHz really the ideal sampling frequency for a guitar tuning app?
You've been misreading Nyquist's theorem if you ask it like that.
It states that every sampling frequency above twice your maximum signal frequency will allow you to perfectly reconstruct your original signal. So there's no "ideal" frequency, just a set of frequencies that are sufficient. What is ideal hence depends on a lot of other things: mainly, what your digitizer really supports (hint: most sound cards can do 44.1kHz, but not 4kHz), what kind of margin you want to have for filters etc. to work on, and what kind of processing power you can spend (hint: modern smart phones, PCs and even pocket calculators don't really have a hard time processing a couple hundred kHz in real time).
Also note that @hotpaw2 is right: the harmonics are important, and are multiples of the base tone frequency.
However, in this case, it seems having a lesser sampling frequency is better for improving the resolution.
No. No matter where that comes from, it's wrong. Information theory's first and foremost result is that, based upon more information, you can't make worse estimates. An oversampled signal is simply more information on the same signal.
Yes, if all you are interested in is frequencies up to 2 kHz then you only need a sampling frequency of 4 kHz. This should include an anti-aliasing filter in front of the ADC or any downconverter to prevent any higher frequency components from aliasing into a lower frequency.
If all you are interested in is specific frequencies (one or two) then you may want to look at the Goertzel algorithm which is more efficient than an FFT for a single frequency. Also, the chirp-Z transform can be used to effectively get a zoomed FFT (resulting in a higher resolution over a smaller bandwidth without the computational complexity of an FFT with the same resolution). You may want to check out this CZT tutorial
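For reference, a minimal sketch of the Goertzel recurrence for a single frequency. The function name `goertzel_power` and the test signal are assumptions made for illustration.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Squared magnitude of the DFT of `samples` at target_hz."""
    omega = 2 * math.pi * target_hz / sample_rate
    coeff = 2 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

fs, n = 8000, 800   # 440 Hz and 1000 Hz both fall exactly on a bin
tone = [math.sin(2 * math.pi * 440.0 * i / fs) for i in range(n)]
p_on = goertzel_power(tone, fs, 440.0)
p_off = goertzel_power(tone, fs, 1000.0)
print(p_on, p_off)
```

For a unit-amplitude sine that sits exactly on a bin, the power at that bin is (n/2)^2, while the power at other bin frequencies is essentially zero. Each extra frequency of interest costs one more pass of the loop.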

How to decorrelate accelerometer data

Is it possible to decorrelate accelerometer data in real-time? If so, how is it done?
Background:
My application is receiving (X,Y,Z) accelerometer data in real-time (sample rate is 6.75Hz). The sensor is moving in a periodic motion but the motion is not necessarily along only one axis. The 3 signals x(t), y(t) and z(t) are therefore slightly correlated and I would like to know if I can find a rotation matrix (in real time) which can be used to rotate the measured (x,y,z) into a new vector (x*,y*,z*) so that the entire motion is along the z-axis?
I would like to implement the algorithm in C.
Thanks.
What you're trying to do is generally called "principal component analysis". The Wikipedia article is pretty good:
https://en.wikipedia.org/wiki/Principal_component_analysis
For static data you generally use the eigenvectors of the covariance matrix as your new coordinate basis.
PCA in real time is doable, but not super easy. See, for example: http://www.bio-conferences.org/articles/bioconf/pdf/2011/01/bioconf_skills_00055.pdf
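For the static-data case, the covariance-eigenvector step can be sketched as below. This is an illustration only: the helper names are made up, power iteration is swapped in as a simple way to get just the dominant eigenvector, and the test motion is synthetic. The same arithmetic carries over to C without much trouble.

```python
import math

def covariance_matrix(points):
    """3x3 sample covariance of a list of (x, y, z) tuples."""
    n = len(points)
    mean = [sum(p[a] for p in points) / n for a in range(3)]
    return [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
             for b in range(3)] for a in range(3)]

def principal_axis(cov, iterations=100):
    """Dominant eigenvector of a symmetric matrix by power iteration."""
    v = [1.0, 1.0, 1.0]   # assumed not orthogonal to the dominant axis
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Synthetic data: 1 Hz motion along d with a small wobble along e,
# sampled at 6.75 Hz as in the question.
d = (1 / 3, 2 / 3, 2 / 3)
e = (2 / 3, -2 / 3, 1 / 3)   # perpendicular to d
points = []
for i in range(200):
    t = i / 6.75
    s, c = math.sin(2 * math.pi * t), 0.05 * math.cos(2 * math.pi * t)
    points.append(tuple(s * d[a] + c * e[a] for a in range(3)))

axis = principal_axis(covariance_matrix(points))
alignment = abs(sum(axis[a] * d[a] for a in range(3)))
print(axis, alignment)
```

Projecting each sample onto `axis` gives the component along the main direction of motion, which is the z* the question asks for. The sign of the axis is arbitrary, hence the absolute value in the alignment check.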
I'd like to first of all emphasize that Matt Timmermans' answer has done exactly what people are actually doing when classifying accelerometer data from clinical studies (a project I worked on).
Then: you're observing a sampled signal. In general, if you have a sensor that gives you samples at a rate of 6.75Hz, the highest frequency of a signal you can detect is 6.75Hz/2 = 3.375Hz. Everything that has a frequency higher than that will inherently be aliased back and look like it was something with a frequency f with 0<=f<3.375Hz. If you've not considered this, please go and read up on the Nyquist–Shannon sampling theorem. Especially: shield your sensors (however you do that, e.g. by employing dampeners) from all input above that limit, otherwise your measurements might be worth very little or even nothing. If your sensor does this internally (that's absolutely possible, there are enough accelerometers with analog low pass filters), this has been taken care of. However, document those characteristics of your sensor.
Now, your case is a little bit easier because you know pretty well that your whole observation is going to be periodic, and it's measured along three orthogonal axes.
In this case, just doing three discrete Fourier transforms at once, extracting the "strongest" spectral component over all three channels, and finding the phase of that spectral component (which is just the complex argument of that DFT bin) in the two others would give you something that you can map to a periodic movement around a specific axis in 3D space. If you want to, remove these values (set the bins to 0), search for the strongest component again, and so on.
Discrete Fourier transforms can be done at staggering speed nowadays. At 6.75Hz, no PC in this world will ever get into trouble when you try this while you receive further samples. It's a hilariously low sampling rate.
Another, more elegant approach (read: you need fewer samples to compute this) would be using a parametric estimator; in your case, a direction-of-arrival estimator from the world of RF technology with multiple antennas would, as far as I can think, map directly to detection of the rotational axis. The classical algorithms here are MUSIC and ESPRIT, and for your case (limited, known number of oscillating parts), ESPRIT might be the better choice.

STFT/FFT work flow order

I am trying to implement FFT, and I am OK with the code etc, but the general order of things is confusing me.
Am I right in thinking that this is the correct order of things to do?
Input -> Overlap input -> Windowing -> FFT -> Phase calculations/Overlap compensation -> Output
I'm getting results close to my input frequency, but they are consistently off by some factor that I can't work out, i.e. 440Hz is always 407Hz, 430Hz is always 420Hz.
The main bit that is confusing me is the initial overlap, as I have been looking at some open source FFT code and that is the part where I can never quite work out what's going on. I seem to be getting the idea from looking at those that overlapping is supposed to happen before windowing, but to me logically, wouldn't that mess with the windowing?
Any advice would be great
Thanks
The FFT is a discrete version of the continuous Fourier Transform.
The FFT produces a 1D vector of complex numbers. This complex vector is often used to calculate a 2D matrix of Frequency Magnitude versus Frequency, and represented as a 2D graph, like this one:
A single FFT is used when you want to understand the frequency spectrum of a signal. For example, from the above FFT graph we can say that most of the energy in this female soprano's G5 note is concentrated in the 784 Hz and 1572 Hz frequencies.
STFT or "Short-Time Fourier Transform" uses a sliding-frame FFT to produce a 2D matrix of Frequency versus Time, often represented as a graph called a Spectrogram, like this one:
The STFT is used when you want to know at what time a particular frequency event occurs in the signal. For example, from the above graph we can say that a large portion of the energy in this vocal phrase occurred between 0.05 and 0.15 seconds, in the frequency range of 100 Hz to 1500 Hz.
The workflow for the FFT is:
Sample the signal -> Window the entire sample frame -> FFT -> Calculate magnitude and phase -> Output something, usually a 2D graph
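As a rough sketch of that workflow (not the code from the question), the steps might look like this. The names `fft` and `spectrum_peak` and the 440 Hz test tone are assumptions. Note the last step, which converts a bin index to Hz as k * sample_rate / N; a wrong sample rate or frame size in that conversion produces results that are consistently off by a fixed factor.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def spectrum_peak(samples, sample_rate):
    """Window one frame, transform it, and report the strongest bin."""
    n = len(samples)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))  # Hann window
                for i, s in enumerate(samples)]
    bins = fft(windowed)
    magnitudes = [abs(b) for b in bins[:n // 2]]      # keep the usable half
    phases = [cmath.phase(b) for b in bins[:n // 2]]
    k = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return k * sample_rate / n, magnitudes[k], phases[k]

fs, n = 8000, 1024
frame = [math.sin(2 * math.pi * 440.0 * i / fs) for i in range(n)]
peak_hz, peak_mag, peak_phase = spectrum_peak(frame, fs)
print(peak_hz)
```

The reported peak is the centre of the nearest bin, so it can differ from the true frequency by up to half the bin spacing (about 3.9 Hz with these settings). For an STFT the same function would simply be applied to each overlapping frame, with the window applied to every frame after it has been cut out.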
If your time-domain data is available in text form and if you can post it here, we can try to help you analyze it, or you can analyze it yourself with this online FFT: Sooeet FFT calculator
If you apply a window before the FFT, your computation will be a kind of STFT.
There are ready-made STFT routines such as 'Spectrogram'.
If you build the STFT from FFTs yourself, overlapping is inevitable, but you can use some optimization methods to minimize ghost effects. Also, a practical approach to windowing may be to choose the window's bandwidth according to the frequency extent of the data. Clearly, for high-frequency data you need to select small windows, which is very time-consuming.
I am not good enough at Matlab to write this code properly :)
Good Luck

FFT for n Points (non power of 2 )

I need to know a way to make FFT (DFT) work with just n points, where n is not a power of 2.
I want to analyze and modify the sound spectrum, in particular of wave files, which commonly have 44100 sample points. But my FFT does not work with that; it only works with point counts of the form 2^n.
So what can I do? Beside fill up the vector with zeros to the next power of 2 ?!
Any way to modify the FFT algorithm?
Thanks!
You can use the FFTW library or the code generators of the Spiral project. They implement the FFT for sizes with small prime factors, and break down large prime factors p by reducing each one to an FFT of size (p-1), which is even, etc.
However, just for signal analysis it is questionable why you want to analyze exactly one second of sound and not smaller units. Also, you may want to use a windowing procedure to avoid the jumps at the ends of the segment.
Aside from padding the array as you suggest, or using some other library function, you can construct a Fourier transform with arbitrary length and spacing in the frequency domain (also for non-integer sample spacings).
This is a well-known result and is based on the Chirp-z transform (or Bluestein's FFT). Another good reference is given by Rabiner and can be found at the above link.
In summary, with this approach you don't have to write the FFT yourself, you can simply use an existing high-performance FFT and then apply the convolution theorem to a suitably scaled and conditioned version of your signal.
The performance will still be O(n*log n), multiplied by some implementation-dependent scaling factor.
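A compact sketch of that idea, assuming only a power-of-two FFT is available: Bluestein's trick rewrites the length-n DFT as a convolution with a chirp, and the convolution is done with padded power-of-two FFTs. The function names are made up for this example, and `naive_dft` is included only as a reference to check against.

```python
import cmath
import math

def fft_pow2(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft_pow2(x[0::2]), fft_pow2(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def bluestein_dft(x):
    """DFT of arbitrary length n using only power-of-two FFTs."""
    n = len(x)
    m = 1
    while m < 2 * n - 1:   # padded length for the circular convolution
        m *= 2
    # chirp[j] = exp(-i*pi*j^2/n); j*j is reduced mod 2n to keep angles small.
    chirp = [cmath.exp(-1j * math.pi * ((j * j) % (2 * n)) / n) for j in range(n)]
    a = [x[j] * chirp[j] for j in range(n)] + [0j] * (m - n)
    b = [0j] * m
    b[0] = chirp[0].conjugate()
    for j in range(1, n):
        b[j] = b[m - j] = chirp[j].conjugate()   # negative lags wrap around
    prod = [p * q for p, q in zip(fft_pow2(a), fft_pow2(b))]
    # Inverse FFT via the conjugate trick.
    conv = [v.conjugate() / m for v in fft_pow2([p.conjugate() for p in prod])]
    return [chirp[k] * conv[k] for k in range(n)]

def naive_dft(x):
    """Direct O(n^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * j * k / n) for j in range(n))
            for k in range(n)]

data = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0]   # length 7, not a power of two
fast, slow = bluestein_dft(data), naive_dft(data)
max_error = max(abs(p - q) for p, q in zip(fast, slow))
print(max_error)
```

The padded length m is at most about four times n, which is where the implementation-dependent scaling factor comes from. For production use a library such as FFTW handles arbitrary lengths directly.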
The FFT is just a faster method of computing the DFT for certain length vectors; and a DFT can be computed for any length of input vector. You can also zero-pad your input vector to a length supported by your FFT library, which may be faster.
If you want to modify your sound file, you may need to use the overlap-add or overlap-save fast convolution filtering after determining the length of the impulse response of your frequency domain modification.

How to select frequencies from DFT

Assume a sequence of numbers (wave-like data). I then perform the DFT (or FFT). The next thing I want to do is find the frequencies in the output that correspond to the real frequencies present in the data. As we know, the DFT output has real and imaginary parts a[i] and b[i]. If we look at the spectrum, sqrt(a[i]^2+b[i]^2), then the maximum in it corresponds to a frequency that is present in the data. The question is how to find all frequencies from the DFT. The problem arises when there are many other peaks that can be falsely selected.
I had a similar problem when doing spectral analysis processing of data when I was writing my honours thesis.
You are right: To find dominant frequencies you generally only need to look at the magnitude of the complex value in the DFT.
Unfortunately, you pretty much have to write some sort of intelligent algorithm which will identify the peaks (frequencies). The way the algorithm works is highly dependent on what the DFT looks like for your application. My DFTs all had similar characteristics, so it wasn't too difficult to put together a heuristic algorithm. If your DFT can take on any form, then you will probably get a lot of false positives and/or false negatives.
The way I did it was to identify regions in the DFT with high magnitude (peaks) which were surrounded by low magnitude (troughs). You can define the minimum difference between peaks and troughs (the sensitivity) as a constant times the standard deviation of the data. Additionally, you can say that any peaks that fall below a certain magnitude (threshold) are ignored altogether, as they are just noise.
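The peak/trough heuristic described above might look something like the following. This is a guess at the general shape rather than the author's actual code; the names `find_peaks`, `sensitivity` and `threshold` are assumptions.

```python
import statistics

def find_peaks(magnitudes, sensitivity=1.0, threshold=0.0):
    """Indices of peaks that stand out from neighbouring troughs.

    A peak counts only if the data then drops by at least
    sensitivity * (standard deviation) and the peak is >= threshold.
    """
    delta = sensitivity * statistics.pstdev(magnitudes)
    peaks = []
    cur_max, max_pos = float("-inf"), None
    cur_min = float("inf")
    looking_for_max = True
    for i, v in enumerate(magnitudes):
        if v > cur_max:
            cur_max, max_pos = v, i
        if v < cur_min:
            cur_min = v
        if looking_for_max:
            if v < cur_max - delta:          # dropped far enough: peak confirmed
                if cur_max >= threshold:
                    peaks.append(max_pos)
                cur_min = v
                looking_for_max = False
        elif v > cur_min + delta:            # risen far enough: new peak begins
            cur_max, max_pos = v, i
            looking_for_max = True
    return peaks

spectrum = [0, 1, 0, 0.1, 0, 5, 0.2, 0, 3, 0, 0.3, 0]
print(find_peaks(spectrum))
print(find_peaks(spectrum, threshold=4.0))
```

On this toy spectrum the bump of height 1 is rejected because it does not rise above its surroundings by one standard deviation, and raising the threshold to 4 also drops the peak of height 3.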
Of course, the above technique will only really work if you have relatively well defined frequencies in your data. If your DFT is highly random, then you will need to take extra care to set the sensitivity and threshold carefully.
Don't forget that the magnitude of your data is symmetric, so you only need to look at half of it.
Once you have identified the frequencies in your DFT, don't forget to convert them into the units you want. From memory, if you have n samples taken with time discretisation dt, and you have a peak at data point 5 (for example), where the first data point is 1, then the frequency is (5-1)/(n*dt) = 4/(n*dt) cycles per time unit; multiply by 2*Pi to get radians per time unit. (I haven't done this in a while, so check that formula against a signal of known frequency.)
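The index-to-frequency conversion can be written as a one-line helper. The name `bin_frequency` is made up, and the 1-based indexing follows the convention used in the paragraph above.

```python
import math

def bin_frequency(index, n, dt, one_based=True):
    """Frequency in cycles per time unit for a DFT data point."""
    k = index - 1 if one_based else index   # the first data point is 0 Hz
    return k / (n * dt)

# 100 samples taken every 0.01 s: data point 5 corresponds to 4 Hz.
f = bin_frequency(5, 100, 0.01)
omega = 2 * math.pi * f   # same frequency in radians per time unit
print(f, omega)
```

This only applies to the first half of the data points; the second half mirrors the first, as noted above.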

Resources