I'm doing pitch detection using a combination of an ACF and an AMDF.
First I was using ACF in the time domain like this:
Get a buffer of 2048 samples
Window it (Hamming window)
sum=Sum(Buffer[i]*Buffer[i+lag]) for all i < 2048 - lag
acf = sum / 2048
And repeat the last 2 steps for all lags to be considered. (actually doing interpolation for non-integer lags)
Now I found that you can use FFT to calculate the ACF:
Get a buffer of 2048 samples
Window it (Hamming window)
fftBuf=fft(buffer)
buffer[i]=real(fftBuf[i])^2+imag(fftBuf[i])^2
fftBuf=fft(buffer) //ifft=fft for real signals
acfBuf = real(fftBuf) / 2048
Then acfBuf[lag] is the ACF value at that lag.
I expected the results to be the same, or at least similar, but they are not.
E.g. for a 65.4 Hz sine wave (note C2) I get ~0.2 at the corresponding lag of 674.25 using the time-domain approach and ~536.795 using the FFT.
What did I miss? Or are the two not the same?
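For comparison, here is a minimal sketch (not from the original post) that computes the ACF both ways with NumPy. It rests on three assumptions: a 44.1 kHz sample rate (inferred from the lag of 674.25 for 65.4 Hz), zero-padding to twice the buffer length so the FFT-based circular correlation does not wrap around, and a true inverse FFT, which already includes the 1/N factor that a second forward FFT lacks. The function names are made up for illustration.

```python
import numpy as np

def acf_time(x, max_lag):
    # Direct sum of x[i] * x[i + lag] over the valid range, divided by N.
    n = len(x)
    return np.array([np.dot(x[:n - lag], x[lag:]) / n for lag in range(max_lag)])

def acf_fft(x, max_lag):
    # Zero-pad to 2N so the circular correlation equals the linear one.
    n = len(x)
    spectrum = np.fft.rfft(x, 2 * n)
    power = spectrum.real ** 2 + spectrum.imag ** 2
    # irfft applies the 1/(2N) scaling of the inverse transform itself.
    acf = np.fft.irfft(power, 2 * n)
    return acf[:max_lag] / n

fs = 44100.0          # assumed sample rate
n = 2048
t = np.arange(n) / fs
x = np.hamming(n) * np.sin(2 * np.pi * 65.4 * t)

acf_a = acf_time(x, 1024)
acf_b = acf_fft(x, 1024)
max_difference = np.max(np.abs(acf_a - acf_b))
```

Under these assumptions the two results agree to rounding error, which suggests checking the padding and the scaling of the second transform when the numbers disagree.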
I have retrieved a signal from my Abaqus simulation for verification purposes. The true signal should be a perfect sinusoid at 300 kHz, and I performed an FFT on the sampled signal using scipy.fftpack.fft.
But I got a strange spectrum as shown below (sorry that I am too lazy to scale the x-axis of the spectrum to the correct frequency). In the same figure, I sliced the signal into pieces and plotted in the time domain. I also repeated the same process for a pure sine wave.
This totally surprises me. As indicated in the code below, the sampling frequency is 16.66x the frequency of the signal. At the moment, I think it is due to very small errors in the sampling period. In theory, Abaqus should sample at regular time intervals. As you can see, there is some small error, so the dots in my signal appear thicker than those of the perfect signal. But does such a small error make such a striking difference in the frequency spectrum? If not, why does the frequency spectrum look like that?
FYI1: This is the magnified fft spectrum of my signal:
FYI2: This is the python code that was used to produce the above figures
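import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft

# mysignal is assumed to be loaded beforehand from the .npy file mentioned in FYI3

def myfft(x, k, label):
    # plot the magnitude of the first k FFT bins
    plt.plot(np.abs(fft(x))[0:k], label = label)
    plt.legend()

# my signal, drawn in slices of 200 samples
plt.subplot(4,1,1)
for i in range(149800//200):
    plt.plot(mysignal[200*i:200*(i+1)], 'bo')
plt.subplot(4,1,2)
myfft(mysignal,150000//2, 'fft of my signal')

# a perfect sine wave with the same parameters, for comparison
plt.subplot(4,1,3)
[Fs,f, sample] = [5e6,300000, 150000]
x = np.arange(sample)
y = np.sin(2 * np.pi * f * x / Fs)
for i in range(149800//200):
    plt.plot(y[200*i:200*(i+1)], 'bo')
plt.subplot(4,1,4)
myfft(y,150000//2, 'fft of a perfect signal')
plt.subplots_adjust(top = 2, right = 2)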
FYI3: Here is my signal in .npy and .txt format. The signal is pretty long. It has 150001 points. The .txt one is the raw file from Abaqus. The .npy format is what I used to produce the above plot - (1) the time vector is removed and (2) the data is in half precision and normalized.
Any standard FFT algorithm you use operates on the assumption that the signal you provide is uniformly sampled. Uniform in this context means equally spaced in time. Your signal is clearly not uniformly sampled, so the FFT does not "see" a perfect sine but a distorted version. As a consequence, you see all these additional spectral components, which the FFT computes to map your distorted signal to the frequency domain. You have two options now: resample your signal so that it is uniformly sampled and use your off-the-shelf FFT, or use a non-uniform FFT to get your spectrum. Here is one library you could use to calculate your non-uniform FFT.
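The resampling option could look roughly like the sketch below. It is only an illustration under stated assumptions: the time vector is available (the question says it was removed from the .npy file), linear interpolation with np.interp is accurate enough at about 16 samples per period, and the jittered test signal is synthetic, not the Abaqus data. The helper name resample_uniform is made up.

```python
import numpy as np

def resample_uniform(t, x, fs):
    # Interpolate samples taken at irregular times t onto a regular grid at rate fs.
    t_uniform = np.arange(t[0], t[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, t, x)

rng = np.random.default_rng(0)
fs, f0, n = 5e6, 300e3, 4096

# Synthetic stand-in for the simulation output: sample instants jittered by
# up to 20% of the sampling period, which keeps them strictly increasing.
t_ideal = np.arange(n) / fs
t_jittered = t_ideal + rng.uniform(-0.2, 0.2, n) / fs
x = np.sin(2 * np.pi * f0 * t_jittered)

t_u, x_u = resample_uniform(t_jittered, x, fs)

# Ordinary FFT on the uniform grid, with a Hann window to limit leakage.
spectrum = np.abs(np.fft.rfft(x_u * np.hanning(len(x_u))))
freqs = np.fft.rfftfreq(len(x_u), 1.0 / fs)
peak_hz = freqs[np.argmax(spectrum)]
```

A higher-order interpolator may be preferable when the signal is sampled closer to the Nyquist rate.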
What do the values of an FFT output mean?
I'm using AudioKit's FFT algorithm (framework written for Swift) and when I FFT the AudioNode (the microphone sound), it gives me a variable containing the FFT data. It's a variable of 512 positions (0 to 511).
When I do it, it gives me very small results, like 0.00004231 or even 2.41233e-7, sometimes 2.41233e-12. What do these values mean?
What I think:
index 0: 0 - x Hz
1: x - 2x Hz
2: 2x- 3x Hz
::
::
and so on...
Each array element holds the amplitude value for that range.
Am I right? If not, please explain it to me. It will help me a lot.
The Fourier transform maps a signal from the time domain to the frequency domain. As such, each FFT sample measures how strong a given frequency is in the original signal.
For instance, fft[2] indicates how strong the frequency belonging to bin 2 is in the original signal. For a standard N-point FFT at sample rate Fs, bin k sits at k * Fs / N Hz, so bin 2 is 2 Hz only when Fs / N happens to be 1 Hz. Keep in mind there might be some scaling in the fft array returned by AudioKit, so please check the actual frequency range covered by those 512 samples.
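To illustrate how bins map to frequencies, here is a small NumPy sketch. It does not use AudioKit, and the sample rate, FFT size and the 2/N amplitude scaling are assumptions chosen for the example. AudioKit may scale its output differently.

```python
import numpy as np

fs = 44100.0                 # assumed sample rate
n = 1024                     # assumed FFT size
t = np.arange(n) / fs

# A test tone placed exactly on bin 10, with amplitude 0.5.
tone_hz = 10 * fs / n
x = 0.5 * np.sin(2 * np.pi * tone_hz * t)

spectrum = np.fft.rfft(x)
# Scale so that a sine of amplitude A shows up as A in its bin.
magnitudes = np.abs(spectrum) * 2 / n
# Centre frequency of every bin: k * fs / n.
bin_freqs = np.fft.rfftfreq(n, d=1.0 / fs)

peak_bin = int(np.argmax(magnitudes))
peak_freq = bin_freqs[peak_bin]
```

Each bin is centred on a multiple of fs / n rather than covering a range that starts at it, and quiet microphone input gives correspondingly small magnitudes.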
I've been experimenting with a few different techniques that I could find for frequency shifting (specifically, I want to shift high-frequency signals to a lower frequency). At the moment I'm trying to use this technique -
take the original signal, x(t), multiply it by: cos(2 PI dF t), sin(2 PI dF t)
R(t) = x(t) cos(2 PI dF t)
I(t) = x(t) sin(2 PI dF t)
where dF is the delta frequency to be shifted.
Now you have two time series signals: R(t) and I(t).
Conduct complex Fourier transform using R(t) as real and I(t) as imaginary parts. The results will be frequency shifted spectrum.
I have interpreted this into the following code -
for(j=0;j<(BUFFERSIZE/2);j++)
{
    Partfunc = (((double)j)/2048);
    PreFFTShift[j+x] = PingData[j]*(cos(2*M_PI*Shift*(Partfunc)));
    PreFFTShift[j+1+x] = PingData[j]*(sin(2*M_PI*Shift*(Partfunc)));
    x++;
}
//INITIALIZE FFT
status = arm_cfft_radix4_init_f32(&S, fftSize, ifftFlag, doBitReverse);
//FFT on FFTData
arm_cfft_radix4_f32(&S, PreFFTShift);
This builds an array with interleaved real and imaginary data and then runs the FFT on it. I then invert the FFT, but the output I'm getting is pretty garbled. Results seem huge in comparison to what I think they should be, and although there are a few traces of a frequency-shifted signal, it's hard to tell, as the result seems mostly pretty noisy.
I've also attempted simply revolving the array values of a standard FFT of my original signal to get a freq shift, but to no avail. Is there a better method for doing this?
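For reference, the quoted mixing recipe can be sketched in NumPy as below. This is an illustration, not a fix for the C code above. The sample rate, tone and shift are made-up values that land on exact FFT bins, and the sine term is negated so the spectrum moves down by dF (with a positive sine it moves up).

```python
import numpy as np

fs = 48000.0      # assumed sample rate
n = 4800          # 10 Hz per bin, so all frequencies below fall on exact bins
t = np.arange(n) / fs   # time in seconds: sample index divided by fs

f0 = 10000.0      # test tone
dF = 4000.0       # amount to shift down by

x = np.cos(2 * np.pi * f0 * t)

# R(t) + j I(t) = x(t) * exp(-j 2 pi dF t)
r = x * np.cos(2 * np.pi * dF * t)
i = -x * np.sin(2 * np.pi * dF * t)
z = r + 1j * i

spectrum = np.abs(np.fft.fft(z))
freqs = np.fft.fftfreq(n, 1.0 / fs)

# The positive-frequency component of x moves from f0 to f0 - dF.
positive = freqs > 0
peak_hz = freqs[positive][np.argmax(spectrum[positive])]
```

Because x is real, its negative-frequency image is shifted as well (here to -14 kHz), so taking only the real part of the result would contain both 6 kHz and 14 kHz. A clean single-sideband shift usually starts from the analytic signal instead.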
Have you tried something like:
Use a Hanning window for each framed data
Once you have your windowed frame of audio data, you do an FFT on it
Do some kind of transformation in the frequency domain (you can use Flanagan - phase vocoder)
Now you need to go back to the time domain with an IFFT
Apply Hanning window in the IFFT data
Use overlap-add at each new frame of time-domain data into the output stream
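The framing steps above can be sketched as follows. This is only the window / FFT / IFFT / window / overlap-add skeleton. The frequency-domain step is a placeholder that leaves the spectrum unchanged, so it is not a phase vocoder. The frame size, hop size and the normalisation by the summed squared window are assumptions chosen for this example.

```python
import numpy as np

def stft_process(x, frame=1024, hop=256, modify=lambda spec: spec):
    # modify is a hypothetical hook for the frequency-domain transformation.
    window = np.hanning(frame)
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for start in range(0, len(x) - frame + 1, hop):
        segment = x[start:start + frame] * window       # analysis window
        spec = modify(np.fft.rfft(segment))              # FFT + transformation
        y = np.fft.irfft(spec, frame) * window           # IFFT + synthesis window
        out[start:start + frame] += y                    # overlap-add
        norm[start:start + frame] += window ** 2
    # Compensate for the overlapping squared windows where they are non-zero.
    covered = norm > 1e-8
    out[covered] /= norm[covered]
    return out

fs = 8000.0
t = np.arange(8192) / fs
x = np.sin(2 * np.pi * 250.0 * t)
y = stft_process(x)
```

With the identity placeholder the output reproduces the input away from the edges, which is a useful sanity check before plugging in a real spectral modification.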
My results:
I created two concatenated sinusoids (250 Hz and 400 Hz) and moved them one octave up.
The blue waveform is the original and the red one is the shifted version; you can see a fade-in/fade-out caused by the overlap-add and the Hann window.
If you want the frequency shift to sound more "natural", you will have to maintain the ratios between all the initial frequency bins, where the amount of shift will depend on the FFT bin, thus requiring lots of interpolation. The Phase Vocoder algorithm will use multiple FFTs to reduce phase distortion in the result.
I've got a 4096 samples long 44.1 kHz audio-clip. After applying the FFT to it I get 4096 frequency bands.
Each band would then span 10.77 Hz (44100 / 4096).
I've been told the 2nd half of the frequencies is conjugate symmetric to the first half.
Considering this, is my calculation above correct, or did I miss something important?
That's pretty much correct - for most common complex-to-complex FFTs with purely real inputs (i.e. all imaginary parts zero) the first N/2 output bins (0..2047 in your case) are typically the only bins that you will be interested in. The first bin is DC (0 Hz), and bin N/2 corresponds to Nyquist (Fs/2 = 22.05 kHz), which is not normally of interest. Bins above N/2 are just complex conjugate "mirror images" of the bottom N/2-1 bins.
See this answer for more details.
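The bin spacing and the mirror symmetry can be checked numerically. This is a short NumPy sketch using random data as a stand-in for the audio clip:

```python
import numpy as np

fs = 44100.0
n = 4096

rng = np.random.default_rng(0)
x = rng.standard_normal(n)          # stand-in for the real-valued audio clip

X = np.fft.fft(x)                   # complex-to-complex FFT, n output bins
bin_width = fs / n                  # spacing between bin centres in Hz

# For real input, X[k] equals the complex conjugate of X[n - k] for k = 1..n-1.
mirror_ok = np.allclose(X[1:], np.conj(X[:0:-1]))

# A real-input FFT returns only the non-redundant bins 0..n/2 (DC to Nyquist).
X_half = np.fft.rfft(x)
unique_bins = len(X_half)
```

Here bin_width comes out at about 10.77 Hz and unique_bins at n/2 + 1, counting both DC and the Nyquist bin.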
I'm just doing a power spectral density analysis of a signal in time domain. I'm following the fft method described in :
http://www.mathworks.com/support/tech-notes/1700/1702.html
It gives the real physical unit for the PSD. However, the unit is "power"; does that mean "V^2/Hz"?
If I take 10*log10(power) or 10*log10(V^2/Hz), do I get a unit of "dB/Hz"?
Then how can I convert it to dBm/MHz?
It depends on the unit of your time series. Often we think of this as just "amplitude", but if your time series is a series of voltage amplitudes vs. time, then your PSD estimate will be in Volts^2/Hz. This is because the PSD is the Fourier transform of the autocorrelation of your original signal: the autocorrelation has units of Volts^2, and running it through the Fourier transform decomposes these units over frequency instead of time, resulting in units of Volts^2/Hz. This is commonly referred to as Watts/Hz, but that conversion is only physically meaningful if a load resistance R is known, since W = V^2/R.
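As a sanity check on the units, here is a minimal sketch of an FFT-based one-sided PSD estimate in V^2/Hz. The scaling follows the usual periodogram convention, which is an assumption here and not necessarily identical to the linked tech note. The sample rate and the noise signal are made up. Integrating the PSD over frequency should give back the mean-square voltage in V^2.

```python
import numpy as np

def psd_one_sided(x, fs):
    # Periodogram-style PSD of a real signal, in (units of x)^2 per Hz.
    n = len(x)
    spec = np.fft.rfft(x)
    psd = np.abs(spec) ** 2 / (fs * n)
    # Fold the negative frequencies into the positive ones (not DC or Nyquist).
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(n, 1.0 / fs)
    return freqs, psd

fs = 1000.0                                  # assumed sample rate in Hz
rng = np.random.default_rng(0)
volts = rng.normal(0.0, 2.0, 4096)           # synthetic voltage samples

freqs, psd = psd_one_sided(volts, fs)
df = fs / len(volts)                         # width of one frequency bin in Hz
power_from_psd = np.sum(psd) * df            # V^2/Hz times Hz gives V^2
mean_square = np.mean(volts ** 2)            # V^2
```

The two power figures agree by Parseval's theorem, which is what makes V^2/Hz the natural unit for the estimate.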
10*log10(power) will result in a unit of dB/Hz, but remember that decibels are always a comparison between two power levels; you are quantifying a ratio of powers. A better definition of decibels is 10*log10(P1/P0), as explained here. If you simply plug a PSD bin estimate into this equation, you are setting your PSD bin to P1 and implicitly comparing it to a P0 value of 1. This may be what you want, and it may not be. For visualization purposes, this is fairly typical, but if you have a standard reference power you should be comparing to, you should use that for P0 instead.
Assuming that you are attempting to plot a dB Power Spectral Density estimate, to convert from Hz to MHz, you simply rescale the x-axis of your frequency graph. Remember that a MHz is just 1 million Hz, so the only difference is that 240000 Hz = 0.24 MHz.
EDIT
The point brought up by mtrw is a very valid one; if you are dealing with large amounts of data and are averaging FFT vectors, I highly suggest the Multitaper method; it's a much more statistically sound method of sacrificing frequency resolution for greater confidence on your PSD estimate.
If you have a PSD in W/Hz, e.g. 100 W/Hz, then you have 50 dBm/Hz. dB/Hz is often used vaguely and generically instead of dBm/Hz. Audacity uses dB as shorthand for dBFS (not dBFS/Hz, because it is computing a DFT, and discrete frequencies use a power spectrum and not a density). A digital signal that reaches 50% of the maximum level has an amplitude of −6 dBFS, which is 6 dB below full scale. This corresponds to removing the MSB, hence the 6 dB/bit figure (because 50% of maximum level is 25% of maximum power; 1/4 = −6 dB).
dBm is the logarithmic ratio of the power with respect to 1mW, you divide the power by 1mW to get a unitless ratio, and then take the logarithm to get dB units, which in this case makes more sense to be clarified as dBm.
dBc/Hz is the ratio with respect to the carrier power, which is a ratio of two dBm/Hz values, meaning you subtract them and you get dBc/Hz; you get the same result if you divide the two linear power levels in W and then convert the ratio to dB (or more appropriately dBc).
dB-Hz is a logarithmic measure of bandwidth with respect to 1 Hz, and dBJ is a measure of spectral density as a logarithmic ratio to 1 joule, seeing as W/Hz is indeed J.
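The dBm definition and the 100 W/Hz example can be checked with a few lines of Python. The function names are made up, and the step from dBm/Hz to dBm/MHz assumes the density is flat across the 1 MHz band, so the power in that band is simply 10^6 times the per-Hz value.

```python
import math

def watts_to_dbm(p_watts):
    # Ratio to 1 mW, expressed in decibels.
    return 10.0 * math.log10(p_watts / 1e-3)

def dbm_per_hz_to_dbm_per_mhz(dbm_per_hz):
    # A flat density integrated over 1 MHz is 1e6 times larger: add 60 dB.
    return dbm_per_hz + 10.0 * math.log10(1e6)

def amplitude_ratio_to_db(ratio):
    # Amplitude ratios use 20*log10 because power goes with amplitude squared.
    return 20.0 * math.log10(ratio)

psd_dbm_hz = watts_to_dbm(100.0)                     # 100 W/Hz, about 50 dBm/Hz
psd_dbm_mhz = dbm_per_hz_to_dbm_per_mhz(psd_dbm_hz)  # about 110 dBm/MHz
half_scale_dbfs = amplitude_ratio_to_db(0.5)         # about -6.02 dBFS
```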
Power spectral density is a density function, so you need to integrate it to get the actual quantity, like a line integral of a V/m electric field, or a probability density of probability per x. This does not make sense for discrete quantities; there the power spectrum is used instead, akin to a probability mass function. If you see dB (which should be used for the discrete frequency domain) instead of dBm/Hz then it's wrong, but if you see it instead of dBm then it's right, as long as it's made clear what the reference is.