I'm using Apple's vDSP APIs to calculate the FFT of audio. However, my results (in amp[]) aren't symmetrical around N/2, which, from my understanding of FFTs on real inputs, they should be.
In the code below, frame is an array of 128 floats containing the audio samples.
int numSamples = 128;
vDSP_Length log2n = log2f(numSamples);
FFTSetup fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);
int nOver2 = numSamples/2;
COMPLEX_SPLIT A;
A.realp = (float *) malloc(nOver2*sizeof(float));
A.imagp = (float *) malloc(nOver2*sizeof(float));
vDSP_ctoz((COMPLEX*)frame, 2, &A, 1, nOver2);
//Perform FFT using fftSetup and A
//Results are returned in A
vDSP_fft_zrip(fftSetup, &A, 1, log2n, FFT_FORWARD);
//Convert COMPLEX_SPLIT A result to float array to be returned
float amp[numSamples];
amp[0] = A.realp[0]/(numSamples*2);
for(int i=1;i<numSamples;i++) {
amp[i]=A.realp[i]*A.realp[i]+A.imagp[i]*A.imagp[i];
printf("%f ",amp[i]);
}
If I put the same float array into an online FFT calculator I do get a symmetrical output. Am I doing something wrong above?
For some reason, most values in amp[] are around 0 to 1e-5, but I also get one huge value of about 1e23. I'm not doing any windowing here, just trying to get a basic FFT working initially.
I've attached a picture of the two FFT outputs, using the same data. You can see they are similar up to 64, although not by a constant scaling factor, so I'm not sure how they differ. Above 64 they are completely different.
Because the mathematical output of a real-to-complex FFT is symmetrical, there is no value in returning the second half. There is also no space for it in the array that is passed to vDSP_fft_zrip. So vDSP_fft_zrip returns only the first half (except for the special N/2 point, discussed below). The second half is usually not needed explicitly and, if it is, you can compute it easily from the first half.
The output of vDSP_fft_zrip, when used for a forward (real-to-complex) transformation, has the H[0] output (which is purely real; its imaginary part is zero) in A.realp[0]. The H[N/2] output (which is also purely real) is stored in A.imagp[0]. The remaining values H[i], for 0 < i < N/2, are stored normally in A.realp[i] and A.imagp[i].
Documentation explaining this is here, in the section “Data Packing for Real FFTs”.
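To make the packed layout concrete, here is a minimal Python sketch of how the full, symmetric spectrum could be rebuilt from the half that is returned. It does not call vDSP: `dft`, `pack_like_zrip` and `unpack_full_spectrum` are hypothetical helper names, the packing is simulated from a naive reference DFT, and the extra factor of 2 that Apple's documentation describes for the real forward transform is left out.

```python
import cmath

def dft(x):
    """Naive O(N^2) DFT, used here only as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def pack_like_zrip(spectrum):
    """Mimic the packed layout described above (no vDSP scaling applied)."""
    half = len(spectrum) // 2
    realp = [spectrum[k].real for k in range(half)]
    imagp = [spectrum[k].imag for k in range(half)]
    imagp[0] = spectrum[half].real  # H[N/2] is stored in imagp[0]
    return realp, imagp

def unpack_full_spectrum(realp, imagp):
    """Rebuild all N bins from the N/2 packed values."""
    half = len(realp)
    n = 2 * half
    full = [0j] * n
    full[0] = complex(realp[0], 0.0)     # H[0], purely real
    full[half] = complex(imagp[0], 0.0)  # H[N/2], purely real
    for k in range(1, half):
        full[k] = complex(realp[k], imagp[k])
        full[n - k] = full[k].conjugate()  # second half by conjugate symmetry
    return full

frame = [0.0, 1.0, 0.5, -0.25, 0.75, -1.0, 0.25, 0.5]
reference = dft(frame)
realp, imagp = pack_like_zrip(reference)
full = unpack_full_spectrum(realp, imagp)
power = [abs(h) ** 2 for h in full]  # symmetric around N/2
```

Note that the loop only ever reads indices below N/2 of realp and imagp, which is all that the packed arrays contain.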
To get symmetric results from strictly real input to a basic FFT, your complex data input and output arrays have to be the same length as your FFT. You seem to be allocating and copying only half your data into the FFT input, which could be feeding non-real memory garbage to the FFT.
I have a 2048 point FFT IP. How may I use it to calculate 512 point FFT ?
There are different ways to accomplish this, but the simplest is to replicate the input data 4 times, to obtain a signal of 2048 samples. Note that the DFT (which is what the FFT computes) can be seen as assuming the input signal being replicated infinitely. Thus, we are just providing a larger "view" of this infinitely long periodic signal.
The resulting FFT will have 512 non-zero values, with zeros in between. Each of the non-zero values will also be four times as large as the 512-point FFT would have produced, because there are four times as many input samples (that is, if the normalization is as commonly applied, with no normalization in the forward transform and 1/N normalization in the inverse transform).
Here is a proof of principle in MATLAB:
data = randn(1,512);
ft = fft(data); % 512-point FFT
data = repmat(data,1,4);
ft2 = fft(data); % 2048-point FFT
ft2 = ft2(1:4:end) / 4; % 512-point FFT
assert(all(ft2==ft))
(Very surprising that the values were exactly equal, no differences due to numerical precision appeared in this case!)
An alternative to the correct solution provided by Cris Luengo, one which does not require any rescaling, is to pad the data with zeros to the required length of 2048 samples. You then get your result by reading every 2048/512 = 4th output (i.e. output[0], output[4], output[8], ... in a 0-based indexing system).
Since you mention making use of a hardware module, this could be implemented in hardware by connecting the first 512 input pins and grounding all other inputs, and reading every 4th output pin (ignoring all other output pins).
Note that this works because the FFT of the zero-padded signal is an interpolation in the frequency domain of the original signal's FFT. In this case you do not need the interpolated values, so you can just ignore them. Here's an example computing a 4-point FFT using a 16-point module (I've reduced the size of the FFT for brevity, but kept the same ratio of 4 between the two):
x = [1,2,3,4]
fft(x)
ans> 10.+0.j,
-2.+2.j,
-2.+0.j,
-2.-2.j
x = [1,2,3,4,0,0,0,0,0,0,0,0,0,0,0,0]
fft(x)
ans> 10.+0.j, 6.499-6.582j, -0.414-7.242j, -4.051-2.438j,
-2.+2.j, 1.808+1.804j, 2.414-1.242j, -0.257-2.3395j,
-2.+0.j, -0.257+2.339j, 2.414+1.2426j, 1.808-1.8042j,
-2.-2.j, -4.051+2.438j, -0.414+7.2426j, 6.499+6.5822j
As you can see in the second output, the first column (which corresponds to outputs 0, 4, 8 and 12) is identical to the desired output from the first, smaller-sized FFT.
I'm extracting SURF descriptors from an image using the following simple lines:
Ptr<DescriptorExtractor> descriptor = DescriptorExtractor::create("SURF");
descriptor->compute(im1, kp, desc1);
Now, when I "watch" the variable desc1.data, it contains integer values in the range 0 to 255.
However, when I investigate the values using the code:
for (int j=0;j<desc1.cols; j++){
float a=desc1.at<float>(0,j);
it contains values between -1 and 1. How is that possible? Shouldn't SURF return integer values, as SIFT does?
I am not sure what happens in OpenCV, but as far as the paper goes, this is what SURF does. The SURF descriptor divides a small image patch into 4x4 sub-regions and computes wavelet responses over each sub-region in a clever fashion. Basically, each sub-region yields a 4-tuple descriptor <sum(dx), sum(dy), sum(|dx|), sum(|dy|)>, where dx and dy are the wavelet responses in that sub-region. The full descriptor is constructed by concatenating the tuples of all 16 sub-regions and normalizing the magnitude, which results in a 64-dimensional descriptor. It is clear from this description that the normalized sum(dx) and sum(dy) values lie between -1 and 1, while sum(|dx|) and sum(|dy|) lie between 0 and 1. In the 128-dimensional descriptor, the sums of dx and |dx| are computed separately for dy < 0 and dy >= 0 (and likewise the sums of dy and |dy| are split by the sign of dx), which doubles the size of the 64-dimensional descriptor.
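The value ranges described above can be checked with a small Python sketch. This is not OpenCV's implementation: `surf_like_descriptor` is a hypothetical helper, the wavelet responses are random stand-ins on an assumed 20x20 grid, and the Gaussian weighting used by the real descriptor is omitted.

```python
import math
import random

def surf_like_descriptor(dx, dy, grid=4):
    """Build a 4-tuple per sub-region, concatenate, and normalize to unit length."""
    size = len(dx)
    step = size // grid
    values = []
    for gy in range(grid):
        for gx in range(grid):
            cells = [(y, x)
                     for y in range(gy * step, (gy + 1) * step)
                     for x in range(gx * step, (gx + 1) * step)]
            values.append(sum(dx[y][x] for y, x in cells))       # sum(dx)
            values.append(sum(dy[y][x] for y, x in cells))       # sum(dy)
            values.append(sum(abs(dx[y][x]) for y, x in cells))  # sum(|dx|)
            values.append(sum(abs(dy[y][x]) for y, x in cells))  # sum(|dy|)
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values] if norm > 0 else values

random.seed(0)  # stand-in responses, not real wavelet outputs
dx = [[random.uniform(-1, 1) for _ in range(20)] for _ in range(20)]
dy = [[random.uniform(-1, 1) for _ in range(20)] for _ in range(20)]
descriptor = surf_like_descriptor(dx, dy)
```

Because the vector is normalized to unit length, every entry is a float with magnitude at most 1, which matches what desc1.at<float>() reports.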
I am a complete signal processing newbie, and I apologize in advance for asking a clueless question.
Is it possible to reuse an existing 1D FFT algorithm to compute the 2D inverse FFT?
Yes. In practical terms, a 2D FFT is 1-D FFTs columnwise then rowwise (or vice versa). This is exactly what I've done in the past.
Linear Algebra
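From a linear algebra perspective, consider the 1D DFT as a unitary linear transform F.
The 2D FFT of a square matrix X is simply (with .' denoting the plain, non-conjugating transpose; F is symmetric, so this is the same as F*X*F)
F*X*F.'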
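Making an IFFT from an FFT
If you have no 1D IFFT then make one from an FFT: IFFT(x) == conj( FFT( conj( x ) ) ) for a unitary FFT (with the common unnormalized forward transform, also divide by N). This follows from its unitarity: the inverse of a unitary F is its conjugate transpose, and because the DFT matrix is symmetric, that is just conj(F), so inv(F)*x == conj(F)*x == conj( F*conj(x) ).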
Note: for the composition of 2D IFFT from 1D FFTs, there are 4 levels of conjugation. The middle two undo each other and can be skipped.
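As a rough illustration of the conjugation trick, here is a Python sketch. The function `fft_forward` is a naive, unnormalized DFT standing in for whatever existing 1D FFT you have (an assumption, not a specific library), so the result is divided by N.

```python
import cmath

def fft_forward(x):
    """Stand-in for an existing forward-only, unnormalized 1D FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def ifft_via_fft(x):
    """Inverse transform built from the forward one: conj(FFT(conj(x))) / N."""
    n = len(x)
    y = fft_forward([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]

signal = [1 + 0j, 2 - 1j, 0.5j, -3 + 0j]
roundtrip = ifft_via_fft(fft_forward(signal))  # should reproduce signal
```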
Scaling Factors
For the fft to be unitary, it should preserve norms. Many libraries and tools neglect this and incur a sqrt(N) scale factor on the forward transform which they undo on the inverse.
Have a look at this solution in Java that I wrote. FFT is tricky, but I wrote this as simply as I could for understanding purposes. I won't post the Complex class, but it's pretty standard: it just holds a double for the real component and a double for the imaginary component, and has a host of mathematical operations on complex numbers.
For the direction parameter, pass in -1 for forward and 1 for reverse. That's all the inverse relationship comes down to in this implementation. And of course, to invert a 2D transform you simply apply the 1D inverse to the rows and then to the columns (or vice versa). Note that the code halves the sums and differences at every recursion level in both directions, so each pass scales by 1/N; a forward pass followed by a reverse pass returns the input divided by N unless you multiply by N once.
//performs the FFT on a single dimension in the desired direction through recursion
//(input length is assumed to be a power of two)
private static Complex[] RecursiveFFT(Complex[] input, double direction)
{
    int length = input.length;
    int half_length = input.length / 2;
    Complex[] result = new Complex[length];
    if(length==1)
    {
        result[0] = input[0];
    }
    else
    {
        Complex[] sum = new Complex[half_length];
        Complex[] diff = new Complex[half_length];
        //twiddle step exp(direction * 2*pi*i / length)
        Complex temp = new Complex(0.0, direction*(2*Math.PI)/length).GetExponential();
        Complex c1 = new Complex(1,0); //running twiddle factor
        Complex c2 = new Complex(2,0); //per-level scale factor
        for(int index=0;index<half_length;index++)
        {
            sum[index] = input[index].Add(input[index+half_length]).Divide(c2);
            diff[index] = input[index].Subtract(input[index+half_length]).Multiply(c1).Divide(c2);
            c1 = c1.Multiply(temp);
        }
        //the sums yield the even-indexed outputs, the differences the odd-indexed ones
        Complex[] even = RecursiveFFT(sum,direction);
        Complex[] odd = RecursiveFFT(diff,direction);
        for(int index=0;index<half_length;index++)
        {
            result[index*2] = even[index];
            result[index*2 + 1] = odd[index];
        }
    }
    return result;
}
Yes. The transform of a two-dimensional matrix is simply the composition of the individual transforms of all of the rows and, after all the rows are transformed, the individual transforms of all the columns.
However, there are a number of performance issues in an FFT. In particular, transforming the columns of an array is likely to encounter cache thrashing problems. And performing individual transforms is less efficient than using SIMD parallelism on machines that support it. So it is usually better to write a two-dimensional implementation with performance details in mind than it is to compose a two-dimensional FFT out of one-dimensional FFTs.
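The rows-then-columns composition can be sketched in a few lines of Python. This is only a demonstration of the principle and ignores the performance concerns just mentioned; `fft_1d` is a naive DFT standing in for an existing 1D FFT, and `fft_2d` and `ifft_2d` are hypothetical names.

```python
import cmath

def fft_1d(x):
    """Stand-in for an existing unnormalized 1D FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft_2d(matrix):
    """Transform every row, then every column of the row-transformed data."""
    rows_done = [fft_1d(row) for row in matrix]
    cols_done = [fft_1d(list(col)) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]  # transpose back

def ifft_2d(matrix):
    """2D inverse from the forward transform via conjugation and 1/(rows*cols)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    conj_in = [[complex(v).conjugate() for v in row] for row in matrix]
    out = fft_2d(conj_in)
    return [[v.conjugate() / (n_rows * n_cols) for v in row] for row in out]

small = fft_2d([[1, 2], [3, 4]])
image = [[1.0, 2.0, 0.5, -1.0], [0.0, 3.0, -2.0, 4.0], [1.5, -0.5, 2.5, 0.25]]
restored = ifft_2d(fft_2d(image))  # should reproduce image
```

Only the outermost conjugations are applied here, which is the same shortcut as skipping the middle two of the four conjugation levels.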
While using the FFT sample code from Apple's documentation, what do N, log2n, n and nOver2 actually mean?
Does N refer to the window size of the FFT or to the total number of samples in a given audio file?
How do I calculate N from an audio file?
How are they related to the audio sampling rate, e.g. 44.1 kHz?
What would be the FFT frame size in this code?
Code:
/* Set the size of FFT. */
log2n = N;
n = 1 << log2n;
stride = 1;
nOver2 = n / 2;
printf("1D real FFT of length log2 ( %d ) = %d\n\n", n, log2n);
/* Allocate memory for the input operands and check its availability,
* use the vector version to get 16-byte alignment. */
A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));
originalReal = (float *) malloc(n * sizeof(float));
obtainedReal = (float *) malloc(n * sizeof(float));
N or n typically refers to the number of elements. log2n is the base-two logarithm of n. (The base-two logarithm of 32 is 5.) nOver2 is n/2, n divided by two.
In the context of an FFT, n is the number of samples being fed into the FFT.
n is usually determined by a variety of constraints. You want more samples to provide a better quality result, but you do not want so many samples that processing takes up a lot of computer time or that the result is not available until so late that the user notices a lag. Usually, it is not the length of an audio file that determines the size. Rather, you design a “window” that you will use for processing, then you read samples from the audio file into a buffer big enough to hold your window, then you process the buffer, then you repeat with more samples from the file. Repetitions continue until the entire file is processed.
A higher audio sampling rate means there will be more samples in a given period of time. E.g., if you want to keep your window under 1/30th of a second, then at a 44.1 kHz sampling rate the window must hold fewer than 44.1•1000/30 = 1470 samples. A higher sampling rate means you have more work to do, so you may need to adjust your window size to keep the processing within limits.
That code uses N for log2n, which is unfortunate, since it may confuse people. Otherwise, the code is as I described above, and the FFT frame size is n.
There can be some confusion about FFT size or length when a mix of real data and complex data is involved. Typically, for a real-to-complex FFT, the number of real elements is said to be the length. When doing a complex-to-complex FFT, the number of complex elements is the length.
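The relationships between these quantities can be worked through numerically. The sketch below is only an illustration in Python: the 1/30 s limit is the example figure from above, and picking the largest power of two that fits is an assumption about how one might choose n for a radix-2 FFT.

```python
sample_rate_hz = 44100
windows_per_second = 30  # keep each window under 1/30 s

max_samples = sample_rate_hz // windows_per_second  # 1470 samples in 1/30 s
log2n = max_samples.bit_length() - 1                # largest power of two that fits
n = 1 << log2n                                      # FFT frame size in real samples
n_over_2 = n // 2                                   # length of realp / imagp
window_ms = 1000.0 * n / sample_rate_hz             # duration covered by one frame
bin_width_hz = sample_rate_hz / n                   # spacing between FFT bins

print(max_samples, log2n, n, n_over_2)
```

With these numbers the frame is 1024 samples (log2n = 10), covering a little over 23 ms of audio per FFT.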
'N' is the number of samples, i.e., your vector size. Correspondingly, 'log2N' is the base-2 logarithm of 'N', and 'nOver2' is half of 'N'.
To answer the other questions, one must know what you want to do with the FFT. This document, even though it is written with a specific system in mind, can serve as a survey of the relations between, and the meanings of, the parameters in a (D)FFT.
I have an array of 240 data points sampled at 600 Hz, representing 400 ms. I need to resample this data to 512 data points sampled at 1024 Hz, representing 500 ms. I assume that since I'm starting with 400 ms of data, the last 100 ms will just need to be padded with 0s.
Is there a best approach to take to accomplish this?
If you want to avoid interpolation then you need to upsample to a 76.8 kHz sample rate (i.e. insert 127 0s after every input sample), low pass filter, then decimate (drop 74 out of every 75 samples).
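The numbers in this answer follow from the greatest common divisor of the two sample rates. A short Python sketch of the arithmetic (the variable names are illustrative only):

```python
from math import gcd

fs_in, fs_out = 600, 1024

g = gcd(fs_in, fs_out)          # 8
up = fs_out // g                # upsampling factor: 128
down = fs_in // g               # decimation factor: 75
intermediate_hz = fs_in * up    # common intermediate rate: 76800 Hz

zeros_inserted = up - 1         # 127 zeros after every input sample
samples_dropped = down - 1      # drop 74 out of every 75 samples
```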
You can use windowed Sinc interpolation, which will give you the same result as upsampling and downsampling using a linear phase FIR low-pass filter with a windowed Sinc impulse response. When using a FIR filter, one normally has to pad a signal with zeros the length of the FIR filter kernel on both sides.
Added:
Another possibility is to zero pad 240 samples with 60 zeros, apply a non-power-of-2 FFT of length 300, "center" zero pad the FFT result with 212 complex zeros to make it 512 long, but with the identical spectrum, and do an IFFT of length 512 to get the resampled result.
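Here is a minimal Python sketch of that FFT-based approach, using a naive DFT so that it needs no libraries (a real implementation would use an FFT). `dft` and `fft_resample` are hypothetical names, the 10 Hz test tone is made up, and splitting the Nyquist bin in half between the positive and negative sides is an added detail, not part of the description above, that keeps the output real.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(N^2) transform: sign=-1 forward, sign=+1 unnormalized inverse."""
    n = len(x)
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    return [sum(x[t] * w[(k * t) % n] for t in range(n)) for k in range(n)]

def fft_resample(samples, padded_len, out_len):
    """Zero-pad in time, zero-pad the spectrum in the middle, transform back.

    Assumes padded_len is even and out_len > padded_len.
    """
    x = list(samples) + [0.0] * (padded_len - len(samples))
    spectrum = dft(x)
    half = padded_len // 2
    wide = [0j] * out_len
    wide[:half] = spectrum[:half]                      # DC and positive frequencies
    wide[out_len - (half - 1):] = spectrum[half + 1:]  # negative frequencies
    wide[half] = spectrum[half] / 2                    # split the Nyquist bin
    wide[out_len - half] = spectrum[half] / 2
    y = dft(wide, sign=+1)
    return [v.real / padded_len for v in y]            # 1/N keeps the amplitude

# 240 samples at 600 Hz (400 ms) of a made-up 10 Hz tone
samples = [math.sin(2 * math.pi * 10 * n / 600) for n in range(240)]
resampled = fft_resample(samples, 300, 512)  # 512 samples at 1024 Hz (500 ms)
```

Output sample 128*k falls at the same instant as input sample 75*k, so those values should agree, which makes a convenient sanity check.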
Yes to endolith's response, if you want to interpolate x[n] by simply computing the FFT, zero-stuff, and then IFFT, you'll get errors if x[n] is not periodic. See this reference: http://www.embedded.com/design/other/4212939/Time-domain-interpolation-using-the-Fast-Fourier-Transform-
FFT based resampling/upsampling is pretty easy...
If you can use python, scipy.signal.resample should work.
For C/C++, there is a simple fftw trick to upsample if you have real (as opposed to complex) data.
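// needs #include <fftw3.h> and #include <string.h>
// assumed to be defined by the caller:
//   int nfft      = the original data length
//   int upnfft    = the new data length (upnfft > nfft)
//   double * data = the original data (nfft samples)
// allocate
fftw_complex * tmp_fd = (fftw_complex*)fftw_malloc((upnfft/2+1)*sizeof(fftw_complex));
double * result = (double*)fftw_malloc(upnfft*sizeof(double));
// create fftw plans
fftw_plan fft_plan = fftw_plan_dft_r2c_1d(nfft, data, tmp_fd, FFTW_ESTIMATE);
fftw_plan ifft_plan = fftw_plan_dft_c2r_1d(upnfft, tmp_fd, result, FFTW_ESTIMATE);
// zero out tmp_fd so the bins above nfft/2 stay zero
memset(tmp_fd, 0, (upnfft/2+1)*sizeof(fftw_complex));
// execute the plans (forward then reverse)
fftw_execute_dft_r2c(fft_plan, data, tmp_fd);
fftw_execute_dft_c2r(ifft_plan, tmp_fd, result);
// FFTW transforms are unnormalized: scale by 1/nfft to keep the original amplitude
for (int i = 0; i < upnfft; i++) result[i] /= nfft;
// cleanup (release result with fftw_free once you are done with it)
fftw_destroy_plan(fft_plan);
fftw_destroy_plan(ifft_plan);
fftw_free(tmp_fd);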