I am designing a fractional delay filter; my order-5 Lagrange coefficients h(n) have 6 taps in the time domain. I have tested convolving h(n) with x(n), a 5000-sample signal, in MATLAB, and the result looks correct. When I tried the FFT/IFFT method instead, the output was totally wrong. My FFT is computed over 8192 points in the frequency domain, which is the nearest power of 2 for my 5000 signal samples. For the IFFT portion, I convert the 8192 frequency-domain points back to a 5000-sample time-domain signal. So the problem is: why does this work with convolution, but not with FFT multiplication? Does converting my 6-tap h(n) to 8192 points in the frequency domain cause this problem?
I have also tried the overlap-save method, which performs the FFT and multiplication on smaller chunks of x(n), doing it 5 times separately. The result seems slightly better than before, and at least I can see the waveform pattern, but it is still slightly distorted. So, any idea what goes wrong, and what the solution is? Thank you.
The reason I am implementing the circular convolution in the frequency domain instead of the time domain is that I am trying to merge the Lagrange filter with another low-pass filter in the frequency domain, so the implementation can be more efficient. I do believe filtering in the frequency domain will be much faster than convolution in the time domain. The LP filter has 120 taps in the time domain. Due to memory constraints, the raw data including the padding is limited to 1024 samples in length, and likewise the FFT bins.
Because my Lagrange filter has only 6 taps, hugely different from 1024, I suspect that taking the FFT of the 6 taps over 1024 frequency-domain bins may cause the error. Here is my MATLAB code for the Lagrange filter only. This is just test code, not implementation code; it's a bit messy, sorry about that. I'd really appreciate any advice on this problem. Thank you.
t = 1:5000;
fs = 2.5*(10^12);
A = 70000;
x = A*sin(2*pi*10.*t.*(10^6).*t./fs);      % test chirp
delay = 0.4;                               % fractional delay, in samples
N = 5;                                     % Lagrange order (N+1 = 6 taps)
n = 0:N;
h = ones(1, N+1);
for k = 0:N                                % build the Lagrange coefficients
    index = find(n ~= k);
    h(index) = h(index) * (delay-k) ./ (n(index)-k);
end
pad = zeros(1, length(h)-1);               % prefix carried between blocks
out = [];
H = fft([h zeros(1, 1024-length(h))]);     % 6 taps zero-padded to 1024 bins
for i = 0:1:ceil(length(x)/(1024-length(h)+1))-1
    if (i ~= ceil(length(x)/(1024-length(h)+1))-1)
        a = x(1, i*(1024-length(h)+1)+1 : (i+1)*(1024-length(h)+1));
    else                                   % last block: zero-pad to full size
        temp = x(1, i*(1024-length(h)+1)+1 : length(x));
        a = [temp zeros(1, 1024-length(h)+1-length(temp))];
    end
    xx = [pad a];                          % 5-sample prefix + 1019 new samples
    X = fft(xx, 1024);
    Y = H .* X;
    y = abs(ifft(Y, 1024));                % note: magnitude of the IFFT
    out = [out y(1, length(h):length(y))]; % drop the first 5 (wrapped) samples
    pad = y(1, length(a)+1:length(y));     % carry this block's output tail
end
Some comments:
The nearest power of two is actually 4096. Do you expect the remaining 904 samples to contribute much? I would guess that they are significant only if you are looking for relatively low-frequency features.
How did you pad your signal out to 8192 samples? Padding your sample out to 8192 implies that approximately 40% of your data is "fictional". If you used zeros to lengthen your dataset, you likely injected a step change at the pad point - which implies a lot of high-frequency content.
A short code snippet demonstrating your methods couldn't hurt.
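For reference, here is a minimal sketch (mine, reusing the x and h defined in the code above) of the whole filtering done in a single FFT: with a transform length of at least 5000 + 6 - 1 there is no circular wrap-around, and note that it keeps the real part of the IFFT rather than the magnitude:
nfft = 8192;                        % >= 5000 + 6 - 1, so no circular wrap
Y = fft(x, nfft) .* fft(h, nfft);   % fft() zero-pads both inputs to nfft
y = real(ifft(Y));                  % real(), not abs(): the signal goes negative
y = y(1:length(x)+length(h)-1);     % keep the linear-convolution part
max(abs(y - conv(x, h)))            % should be ~0 (round-off only)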
Related
I have a 2048-point FFT IP. How may I use it to calculate a 512-point FFT?
There are different ways to accomplish this, but the simplest is to replicate the input data 4 times, to obtain a signal of 2048 samples. Note that the DFT (which is what the FFT computes) can be seen as assuming the input signal being replicated infinitely. Thus, we are just providing a larger "view" of this infinitely long periodic signal.
The resulting FFT will have 512 non-zero values, with zeros in between. Each of the non-zero values will also be four times as large as the 512-point FFT would have produced, because there are four times as many input samples (that is, if the normalization is as commonly applied, with no normalization in the forward transform and 1/N normalization in the inverse transform).
Here is a proof of principle in MATLAB:
data = randn(1,512);
ft = fft(data); % 512-point FFT
data = repmat(data,1,4);
ft2 = fft(data); % 2048-point FFT
ft2 = ft2(1:4:end) / 4; % 512-point FFT
assert(all(ft2==ft))
(Very surprising that the values were exactly equal, no differences due to numerical precision appeared in this case!)
An alternative to the correct solution provided by Cris Luengo, which does not require any rescaling, is to pad the data with zeros to the required length of 2048 samples. You then get your result by reading every 2048/512 = 4th output (i.e. output[0], output[4], output[8], ... in a 0-based indexing system).
Since you mention making use of a hardware module, this could be implemented in hardware by connecting the first 512 input pins and grounding all other inputs, and reading every 4th output pin (ignoring all other output pins).
Note that this works because the FFT of the zero-padded signal is an interpolation in the frequency domain of the original signal's FFT. In this case you do not need the interpolated values, so you can just ignore them. Here's an example computing a 4-point FFT using a 16-point module (I've reduced the size of the FFT for brevity, but kept the same ratio of 4 between the two):
x = [1,2,3,4]
fft(x)
ans> 10.+0.j,
-2.+2.j,
-2.+0.j,
-2.-2.j
x = [1,2,3,4,0,0,0,0,0,0,0,0,0,0,0,0]
fft(x)
ans> 10.+0.j, 6.499-6.582j, -0.414-7.242j, -4.051-2.438j,
-2.+2.j, 1.808+1.804j, 2.414-1.242j, -0.257-2.339j,
-2.+0.j, -0.257+2.339j, 2.414+1.242j, 1.808-1.804j,
-2.-2.j, -4.051+2.438j, -0.414+7.242j, 6.499+6.582j
As you can see in the second output, the first column (which corresponds to outputs 0, 4, 8 and 12) is identical to the desired output from the first, smaller-sized FFT.
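The same identity is easy to check in MATLAB (a quick sketch; the variable names are mine):
x = randn(1, 512);
X1 = fft(x);                          % direct 512-point FFT
X2 = fft([x zeros(1, 2048-512)]);     % zero-padded 2048-point FFT
max(abs(X2(1:4:end) - X1))            % every 4th bin matches, ~0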
According to what I have read on the internet, the normal range of the fundamental frequency of the female voice is 165 to 255 Hz.
I am using Praat, and also a Python library called Parselmouth, to get the fundamental frequency values of a female voice in an audio file (.wav). However, I got some values that are over 255 Hz (e.g. 400+ Hz, 500 Hz).
Is it normal to get big values like this?
It is possible, but unlikely, if you are trying to capture the fundamental frequency (F0) of a speaking voice. It sounds likely that you are capturing a more easily resonating overtone (e.g. F1 or F2) instead.
My experiments with Praat give me the impression that with good parameters it will reliably extract F0.
What you'll want to do is verify that by comparing the pitch curve with a spectrogram. Here's an example of a fit made by Praat (female speaker) [image: pitch curve overlaid on a spectrogram]:
You can see from the image that
Most prominent frequency seems to be F2
Around 200 Hz seems likely to be F0, since there's only noise below that (compared to before/after the segment)
Praat has calculated a good estimate of F0 for the voiced speech segments
If, after a visual inspection, it seems that you are getting wrong results, you can try to tweak the parameters. Window length greatly affects the frequency resolution.
If you can't capture frequencies this low, you should try increasing the window length - the intuition is that it gives the algorithm a better chance at finding slowly changing periodic features in the data.
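To make the window-length point concrete, here is a rough MATLAB sketch (assuming a 44.1 kHz recording; the numbers are only illustrative):
fs = 44100;        % sample rate of the recording (assumed)
wlen = 1024;       % a short analysis window
df = fs / wlen     % ~43 Hz per bin: too coarse near a 200 Hz F0
wlen = 8192;       % a longer window
df = fs / wlen     % ~5.4 Hz per bin: enough to separate F0 candidates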
I'd be grateful if people could help me find an efficient way (probably low memory algorithm) to tackle the following problem.
I need to find the stationary distribution x of a transition matrix P. The transition matrix is an extremely large, extremely sparse matrix, constructed such that all the columns sum to 1. Since the stationary distribution is given by the equation Px = x, then x is simply the eigenvector of P associated with eigenvalue 1.
I'm currently using GNU Octave to generate the transition matrix, find the stationary distribution, and plot the results. I'm using the function eigs(), which calculates both eigenvalues and eigenvectors, and can return just the one eigenvector whose eigenvalue is closest to a given value (I actually had to specify a value slightly greater than 1, to prevent an error). Construction of the transition matrix (as a sparse matrix) is fairly quick, but finding the eigenvector gets increasingly slow as I increase the size, and I'm running out of memory before I can examine even moderately sized problems.
My current code is
[v l] = eigs(P, 1, 1.01);   % one eigenvector, eigenvalue closest to 1.01
x = v / sum(v);             % normalise so the entries sum to 1
Given that I know that 1 is the eigenvalue, I'm wondering if there is either a better method to calculate the eigenvector, or a way that makes more efficient use of memory, given that I don't really need an intermediate large dense matrix. I naively tried
n = size(P,1); % number of states
Q = P - speye(n,n);
x = Q\zeros(n,1); % solve (P-I)x = 0
which fails, since Q is singular (by definition).
I would be very grateful if anyone has any ideas on how I should approach this, as it's a calculation I have to perform a great number of times, and I'd like to try it on larger and more complex models if possible.
As background to this problem, I'm solving for the equilibrium distribution of the number of infectives in a cattle herd in a stochastic SIR model. Unfortunately the transition matrix is very large for even moderately sized herds. For example: in an SIR model with an average of 20 individuals (95% of the time the population is between 12 and 28 individuals), P is 21169 by 21169 with 20340 non-zero values (i.e. about 0.005% dense), and uses up 321 KB (a full matrix of that size would be 3.3 GB), while for around 50 individuals P uses 3 MB. x itself should be pretty small. I suspect that eigs() creates a dense matrix somewhere, which is causing me to run out of memory, so I should be okay if I can avoid using full matrices.
Power iteration is a standard way to find the dominant eigenvalue of a matrix. You pick a random vector v, then hit it with P repeatedly until you stop seeing it change very much. You want to periodically divide v by sqrt(v^T v) to normalise it.
The rate of convergence here is governed by the separation between the largest and second-largest eigenvalues. Each iteration takes just a sparse matrix-vector multiply.
There are fancier-pants ways to do this ("PageRank" is one good thing to search for here) that improve speed for really huge sparse matrices, but I don't know that they're necessary or useful here.
Your approach seems like a good one. However, what you're calling x is the null space of Q. null(Q) would work if it supported sparse matrices, but it doesn't. There's a bunch of stuff on the web about finding the null space of a sparse matrix. For example:
http://www.mathworks.co.uk/matlabcentral/newsreader/view_thread/249467
http://www.mathworks.com/matlabcentral/fileexchange/42922-null-space-for-sparse-matrix/content/nulls.m
http://www.mathworks.com/matlabcentral/fileexchange/11120-null-space-of-a-sparse-matrix
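One related workaround (a sketch of my own, not from the links above): since the scale of x is free, you can replace one equation of the singular system with the normalisation constraint sum(x) = 1 and solve directly. Assuming the chain is irreducible, so the stationary distribution is unique, the modified system is nonsingular:
n = size(P, 1);
Q = P - speye(n);
Q(1, :) = ones(1, n);   % replace the first equation with sum(x) = 1
b = zeros(n, 1);
b(1) = 1;
x = Q \ b;              % stationary distribution, already normalised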
It seems the best solution is to use the Power Iteration method, as suggested by tmyklebu.
The method is to iterate x = Px; x /= sum(x), until x converges. I'm assuming convergence when the L1 norm of the difference between successive iterations is less than 1e-5, as that seems to give good results.
Convergence can take a while, since the largest two eigenvalues are fairly close (the number of iterations needed to converge can vary considerably, from around 200 to 2000 depending on the model used and population sizes, but it gets there in the end). However, the memory requirements are low, and it's very easy to implement.
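For concreteness, a minimal Octave sketch of that loop (the variable names and the iteration cap are mine):
x = ones(size(P, 1), 1) / size(P, 1);  % start from the uniform distribution
for iter = 1:10000                     % arbitrary cap on iterations
    xnew = P * x;                      % one step of the chain
    xnew = xnew / sum(xnew);           % keep the entries summing to 1
    if norm(xnew - x, 1) < 1e-5        % the L1 convergence test from above
        break;
    end
    x = xnew;
end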
I'm able to read a wav file and its values. I need to find the positions and values of its peaks and pits. At first I tried smoothing it with a (x[i-1] + x[i] + x[i+1]) / 3 formula and then searching the array for turning points (array[i-1] > array[i] while the direction is 'up', that style of solution), but because of noise, and for the sake of later calculations in the project, I'm trying to find a better approach. For the past couple of days I've been researching the FFT. As I understand it, the FFT translates the audio into a series of sines and cosines. After the FFT operation, the given values are the coefficients ak and bk of a0 + ak * cos(k*x) + bk * sin(k*x), for increasing k and x, as in this picture:
http://zone.ni.com/images/reference/en-XX/help/371361E-01/loc_eps_sigadd3freqcomp.gif
My question is: does the FFT help me find peaks and pits in audio? Does anybody have experience with this kind of problem?
It depends on exactly what you are trying to do, which you haven't really made clear. "Finding the peaks and pits" is one thing, but since there might be various reasons for doing this, there might be various methods. It sounds like you already tried the straightforward thing of looking for local maxima and minima. Here are some tips:
1. You do not need the FFT.
2. Audio data usually swings above and below zero (there are exceptions, including 8-bit wavs, which are unsigned, but these are exceptions), so you must be aware of positive and negative values. Generally, large positive and large negative values carry large amounts of energy, though, so you want to count those as the same.
3. Due to #2, if you want to average, you might want to take the average of the absolute value, or more commonly, the average of the square. Once you find the average of the squares, take the square root of that value; this gives the RMS, which is related to the power of the signal, so you might do something like this if you are trying to indicate signal loudness or intensity, or to approximate an analog meter. The average of absolutes may be more robust against extreme values, but is less commonly used.
4. Another approach is to simply look for the peak of the absolute value over some number of samples. This is commonly done when drawing waveforms, and for digital "peak" meters. It makes less sense to look at the minimum of the absolute value.
5. Once you've done something like the above, you may want to compute the log of the value you've found in order to display the signal in dB, but make sure you use the right formula. 10 * log_10( amplitude ) is not it. Rule of thumb: when computing logs from amplitude you will usually see a 20, not a 10. If you want to compute dBFS (the amount of "headroom" before clipping, which is the standard measurement for digital meters), the formula is -20 * log_10( |amplitude| ), where amplitude is normalized to +/- 1. Watch out for amplitude = 0, which gives infinite headroom in dB. (See the sketch after this list.)
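Here is a minimal MATLAB sketch of tips 3-5 (assuming x holds samples already normalized to +/- 1):
w = x(1:441);                 % ~10 ms of audio at 44.1 kHz
rmsval = sqrt(mean(w.^2));    % tip 3: root-mean-square level
pk = max(abs(w));             % tip 4: absolute peak over the window
headroom = -20 * log10(pk);   % tip 5: headroom in dBFS (watch out for pk == 0)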
If I understand you correctly, you just want to estimate the relative loudness/quietness of an audio digital sample at a given point.
For this estimation, you don't need to use the FFT. However, your method of averaging the signal does not produce the appropriate picture either.
The digital signal is the value of the audio wave at a given moment. You need to find the overall amplitude of the signal at that given moment. You can see it, roughly, as the local maximum value over an interval around the moment you want to measure. You may keep a moving max of the signal and use that as your amplitude estimate.
In a 16-bit sound sample, the signal magnitude can go from 0 up to 32767. At a 44.1 kHz sample rate, you can find peaks and pits of around 0.01 s by finding the max value of 441 samples around a given moment t:
max = 1;                           // start at 1 so the later log10 is finite
for (i = 0; i < 441; i++)
    if (array[t*44100 + i] > max)
        max = array[t*44100 + i];
Then, to represent it on a 0-to-1 scale (not really 0, because we used a minimum of 1), you compute:
amplitude = max / 32767;
or you might represent it on a relative dB logarithmic scale (here you see why we used 1 for the minimum value):
dB = 20 * log10(amplitude);
All you need to do is take dy/dx, which you can get approximately by scanning through the wave, subtracting the previous value from the current one, and looking at where the difference goes to zero or changes from positive to negative.
In this code I made it really brief and unintelligent for the sake of brevity; of course you could handle cases of dy being zero better, find the 'centre' of a long flat peak, that kind of thing. But if all you need is basic peaks and troughs, this will find them.
lastY = 0;
bool goingUp = true;
for (i = 0; i < wave.length; i++) {
    y = wave[i];
    dy = y - lastY;
    bool stillGoingUp = (dy > 0);
    if (stillGoingUp != goingUp) {
        // changed direction - note the value of i (place) and y (height)
        goingUp = stillGoingUp;
    }
    lastY = y;
}
I'm not a DSP expert, but I understand that there are two ways to apply a discrete time-domain filter to a discrete time-domain waveform. The first is to convolve them in the time domain, and the second is to take the FFT of both, multiply the two complex spectrums, and take the IFFT of the result. One key difference between these methods is that the second approach is subject to circular convolution.
As an example, if the filter and waveforms are both N points long, the first approach (i.e. convolution) produces a result that is N+N-1 points long, where the first half of this response is the filter filling up and the 2nd half is the filter emptying. To get a steady-state response, the filter needs to have fewer points than the waveform to be filtered.
Continuing this example with the second approach, and assuming the discrete time-domain waveform data is all real (not complex), the FFTs of the filter and the waveform are both N points long. Multiplying the two spectrums and IFFT'ing the result produces a time-domain result that is also N points long. Here the responses where the filter fills up and empties overlap each other in the time domain, and there's no steady-state response. This is the effect of circular convolution. To avoid this, typically the filter would be smaller than the waveform, and both would be zero-padded to allow space for the convolution result to expand in time after the IFFT of the product of the two spectrums.
My question is, I often see work in the literature from well-established experts/companies where they have a discrete (real) time-domain waveform (N points), they FFT it, multiply it by some filter (also N points), and IFFT the result for subsequent processing. My naive thinking is this result should contain no steady-state response and thus should contain artifacts from the filter filling/emptying that would lead to errors in interpreting the resulting data, but I must be missing something. Under what circumstances can this be a valid approach?
Any insight would be greatly appreciated.
The basic problem is not about zero padding vs the assumed periodicity, but that Fourier analysis decomposes the signal into sine waves which, at the most basic level, are assumed to be infinite in extent. Both approaches are correct in that the IFFT using the full FFT will return the exact input waveform, and both approaches are incorrect in that using less than the full spectrum can lead to effects at the edges (that usually extend a few wavelengths). The only difference is in the details of what you assume fills in the rest of infinity, not in whether you are making an assumption.
Back to your first paragraph: Usually, in DSP, the biggest problem I run into with FFTs is that they are non-causal, and for this reason I often prefer to stay in the time domain, using, for example, FIR and IIR filters.
Update:
In the question statement, the OP correctly points out some of the problems that can arise when using FFTs to filter signals, for example edge effects, which can be particularly problematic when the filter is comparable in length (in the time domain) to the sampled waveform. It's important to note, though, that not all filtering is done using FFTs; in the paper cited by the OP, they are not using FFT filters, and the problems that would arise with an FFT filter implementation do not arise with their approach.
Consider, for example, a filter that implements a simple average over 128 sample points, using two different implementations.
FFT: In the FFT/convolution approach one would have a sample of, say, 256 points and convolve this with a wfm that is constant for the first half and goes to zero in the second half. The question here is (even after this system has run a few cycles): what determines the value of the first point of the result? The FFT assumes that the wfm is circular (i.e. infinitely periodic), so either the first point of the result is determined by the last 127 (i.e. future) samples of the wfm (skipping over the middle of the wfm), or by 127 zeros if you zero-pad. Neither is correct.
FIR: Another approach is to implement the average with an FIR filter. For example, here one could use the average of the values in a 128 register FIFO queue. That is, as each sample point comes in, 1) put it in the queue, 2) dequeue the oldest item, 3) average all of the 128 items remaining in the queue; and this is your result for this sample point. This approach runs continuously, handling one point at a time, and returning the filtered result after each sample, and has none of the problems that occur from the FFT as it's applied to finite sample chunks. Each result is just the average of the current sample and the 127 samples that came before it.
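As a sketch, that running average is a one-liner in MATLAB; note that filter() seeds the 127 samples of "pre-history" with zeros, which is its own edge assumption (x here is an assumed input vector):
M = 128;
b = ones(1, M) / M;    % 128-tap averaging FIR
y = filter(b, 1, x);   % y(n) is the mean of x(n-127) .. x(n)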
The paper that the OP cites takes an approach much more similar to the FIR filter than to the FFT filter (note, though, that the filter in the paper is more complicated, and the whole paper is basically an analysis of this filter). See, for example, this free book, which describes how to analyze and apply different filters, and note also that the Laplace approach to the analysis of FIR and IIR filters is quite similar to what's found in the cited paper.
Here's an example of convolution without zero padding for the DFT (circular convolution) vs linear convolution. This is the convolution of a length M=32 sequence with a length L=128 sequence (using Numpy/Matplotlib):
from numpy import convolve, real
from numpy.fft import fft, ifft
from numpy.random import rand
from matplotlib.pyplot import plot, grid

f = rand(32); g = rand(128)
h1 = convolve(f, g)
h2 = real(ifft(fft(f, 128)*fft(g)))
plot(h1); plot(h2,'r')
grid()
The first M-1 points are different, and it's short by M-1 points since it wasn't zero padded. These differences are a problem if you're doing block convolution, but techniques such as overlap and save or overlap and add are used to overcome this problem. Otherwise if you're just computing a one-off filtering operation, the valid result will start at index M-1 and end at index L-1, with a length of L-M+1.
As to the paper cited, I looked at their MATLAB code in appendix A. I think they made a mistake in applying the Hfinal transfer function to the negative frequencies without first conjugating it. Otherwise, you can see in their graphs that the clock jitter is a periodic signal, so using circular convolution is fine for a steady-state analysis.
Edit: Regarding conjugating the transfer function, the PLLs have a real-valued impulse response, and every real-valued signal has a conjugate symmetric spectrum. In the code you can see that they're just using Hfinal[N-i] to get the negative frequencies without taking the conjugate. I've plotted their transfer function from -50 MHz to 50 MHz:
from numpy import arange, pi, real, imag
from matplotlib.pyplot import subplot, plot, ylabel

N = 150000 # number of samples. Need >50k to get a good spectrum.
res = 100e6/N # resolution of single freq point
f = res * arange(-N/2, N/2) # set the frequency sweep [-50MHz,50MHz), N points
s = 2j*pi*f # set the xfer function to complex radians
f1 = 22e6 # define 3dB corner frequency for H1
zeta1 = 0.54 # define peaking for H1
f2 = 7e6 # define 3dB corner frequency for H2
zeta2 = 0.54 # define peaking for H2
f3 = 1.0e6 # define 3dB corner frequency for H3
# w1 = natural frequency
w1 = 2*pi*f1/((1 + 2*zeta1**2 + ((1 + 2*zeta1**2)**2 + 1)**0.5)**0.5)
# H1 transfer function
H1 = ((2*zeta1*w1*s + w1**2)/(s**2 + 2*zeta1*w1*s + w1**2))
# w2 = natural frequency
w2 = 2*pi*f2/((1 + 2*zeta2**2 + ((1 + 2*zeta2**2)**2 + 1)**0.5)**0.5)
# H2 transfer function
H2 = ((2*zeta2*w2*s + w2**2)/(s**2 + 2*zeta2*w2*s + w2**2))
w3 = 2*pi*f3 # w3 = 3dB point for a single pole high pass function.
H3 = s/(s+w3) # the H3 xfer function is a high pass
Ht = 2*(H1-H2)*H3 # Final transfer based on the difference functions
subplot(311); plot(f, abs(Ht)); ylabel("abs")
subplot(312); plot(f, real(Ht)); ylabel("real")
subplot(313); plot(f, imag(Ht)); ylabel("imag")
As you can see, the real component has even symmetry and the imaginary component has odd symmetry. In their code they only calculated the positive frequencies for a loglog plot (reasonable enough). However, for calculating the inverse transform they used the values for the positive frequencies for the negative frequencies by indexing Hfinal[N-i] but forgot to conjugate it.
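The conjugate-symmetry claim itself is easy to verify with a quick MATLAB check (my own):
x = randn(1, 8);                         % any real-valued signal
X = fft(x);
max(abs(X(2:end) - conj(X(end:-1:2))))   % ~0: X(k) = conj(X(N-k+2))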
I can shed some light on the reason why "windowing" is applied before the FFT.
As already pointed out, the FFT assumes that we have an infinite signal. When we take a sample over a finite time T, this is mathematically the equivalent of multiplying the signal with a rectangular function.
Multiplying in the time domain becomes convolution in the frequency domain. The frequency response of a rectangle is the sinc function, i.e. sin(x)/x. The x in the denominator is the kicker, because it means the function dies down only as O(1/N).
If you have frequency components which are exactly multiples of 1/T, this does not matter, as the sinc function is zero at all sample points except the one frequency where it is 1.
However, if you have a sine which falls between two bins, you will see the sinc function sampled at the frequency points. It looks like a magnified version of the sinc function, and the "ghost" signals caused by the convolution die down with 1/N, or 6 dB/octave. If you have a signal 60 dB above the noise floor, you will not see the noise for 1000 frequencies left and right of your main signal; it will be swamped by the "skirts" of the sinc function.
If you use a different time window you get a different frequency response; a cosine window, for example, dies down with 1/x^2. There are specialized windows for different measurements. The Hanning window is often used as a general-purpose window.
The point is that the rectangular window you get when not applying any "windowing function" creates far worse artifacts than a well-chosen window. That is, by "distorting" the time samples we get a much better picture in the frequency domain, one which more closely resembles "reality", or rather the "reality" we expect and want to see.
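A small MATLAB sketch of the effect (my own; the tone is deliberately placed between bins, and the Hann window is written out directly so no toolbox is needed):
N = 1024;
n = 0:N-1;
x = sin(2*pi*10.5*n/N);         % 10.5 cycles: lands between two bins
w = 0.5 - 0.5*cos(2*pi*n/N);    % Hann window
X1 = abs(fft(x));               % rectangular window: wide skirts
X2 = abs(fft(x .* w));          % Hann window: skirts fall off much faster
semilogy(0:N/2-1, X1(1:N/2) + eps); hold on;
semilogy(0:N/2-1, X2(1:N/2) + eps);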
Although there will be artifacts from assuming that a rectangular window of data is periodic at the FFT aperture width, which is one interpretation of what circular convolution does without sufficient zero padding, the differences may or may not be large enough to swamp the data analysis in question.