Effect of LP order and sampling frequency on formants of speech - signal-processing

I have been trying to understand linear prediction of speech. I learned that the order p of the LP predictor should be (Fs/1000)+2. For an 8 kHz speech signal, the LP spectrum will range from 0 to 4 kHz, and there will be 4 to 5 peaks, each corresponding to a formant. Does that mean that for a 16 kHz speech signal, the LP spectrum will range from 0 to 8 kHz and there will be 8 formants, because of 8 peaks?
My second question is whether the formant frequencies for an 8 kHz speech signal will be limited to 4 kHz, while for 16 kHz we will get higher formant frequencies. What am I missing in my understanding?

The number of poles equals the order of the linear predictor. However, some of these poles will lie on the real axis (frequency zero) and are not considered formant candidates; they model the spectral slope of the voice, which is why the order is often higher than twice the number of desired formants. Poles with an imaginary part always have a complex conjugate partner. One formant corresponds to each of these pairs.
The maximum formant frequency will be F_s/2.
You should have a play with Praat and its formant estimation, changing the maximum formant frequency and the number of formants. You really have to choose the maximum formant (sampling rate) and number of formants (order) correctly, or else Praat will estimate only a single formant in between the first and second, or second and third, formant. Formant estimation by LPC is not very robust to bad choices of these parameters, in my experience.
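The pole-to-formant mapping described in this answer can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `formants_from_lpc`, the synthetic pole radius, and the example resonance frequencies are hypothetical, and the LPC polynomial is built directly from known poles rather than estimated from speech.

```python
import numpy as np

def formants_from_lpc(a, fs, tol=1e-6):
    """Formant candidates (Hz) from LPC coefficients a = [1, a1, ..., ap]."""
    poles = np.roots(a)
    # Real poles (imaginary part ~ 0) are dropped; of each complex-conjugate
    # pair, only the pole with positive imaginary part is kept.
    upper = poles[np.imag(poles) > tol]
    return np.sort(np.angle(upper) * fs / (2.0 * np.pi))

fs = 8000
order = fs // 1000 + 2  # rule of thumb from the question: 10 for 8 kHz

# Hypothetical resonances: four conjugate pairs plus two real poles.
true_formants = [500.0, 1500.0, 2500.0, 3500.0]
radius = 0.95
pair_poles = [radius * np.exp(2j * np.pi * f / fs) for f in true_formants]
all_poles = pair_poles + [np.conj(p) for p in pair_poles] + [0.9, 0.5]

a = np.real(np.poly(all_poles))  # order-10 polynomial, 11 coefficients
formants = formants_from_lpc(a, fs)
print(order, formants)
```

With an order of 10, at most five conjugate pairs can exist, and every recovered frequency lies below fs/2. Here two of the ten poles are real, so only four formant candidates appear.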

Related

Calculating SNR using PSD of captured signal and noise

I have captured both a transmitted signal and when there is no transmission (i.e. noise only). I would like to calculate the SNR of the signal. I would like to make sure the following GNURadio flowgraph is not wrong:
In summary, after the PSD of each is calculated, the "Integrate with Decimate over 2048" block sums up the power over the 2048 FFT bins. Then, the noise FFT sum is subtracted from the signal FFT sum. This is divided by the noise FFT sum and converted to dB.
This is the output of my flowgraph:
As calculated by my flowgraph, the power values are:
signal only, raw power: ~0.329
noise only, raw power: 0.000007
SNR in dB: ~46.6dB
I'm using a LoRa node to transmit the signal of interest; the modulation details are here: https://www.thethingsnetwork.org/docs/lorawan/#modulation-and-data-rate
The signal occupies the captured bandwidth (125k) and is sampled at 1 million samples per second.
Your flowgraph should give you the correct SNR value under the following conditions:
the signal and noise sources are uncorrelated
the "noise only" captured by the lower branch has the same characteristics (especially the same average power) as the noise included in the "signal + noise" captured by the upper branch
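Under those two conditions, the calculation reduces to subtracting the noise-only power from the signal-plus-noise power and dividing by the noise power. A minimal sketch follows; the function name `snr_db` is hypothetical, and the inputs are the rounded power values quoted in the question, so the result need not match the flowgraph's displayed value exactly.

```python
import math

def snr_db(power_signal_plus_noise, power_noise_only):
    """SNR in dB, assuming signal and noise are uncorrelated so powers add."""
    signal_power = power_signal_plus_noise - power_noise_only
    if signal_power <= 0 or power_noise_only <= 0:
        raise ValueError("powers must be positive and signal+noise > noise")
    return 10.0 * math.log10(signal_power / power_noise_only)

# Rounded values from the question (the flowgraph uses unrounded ones).
snr = snr_db(0.329, 0.000007)
print(round(snr, 2))
```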
As an aside, unless you are also using intermediate signals for other purposes, there are a few simplifications that can be made to your flowgraph:
The multiplications in the upper and lower branches by the same constant factor will eventually cancel out in the divide block. You could save yourself the trouble of the scaling altogether.
From Parseval's theorem, the summation of the squared magnitudes in the frequency-domain is proportional to the summation of the squared samples in the time-domain. The FFT blocks would thus not be necessary.
That said, in your flowgraph you are using some intermediate signals for GUI output purposes. In this case, you could simply put the required constant scaling just before the Number Sink.
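The Parseval argument can be checked numerically. The sketch below uses a naive DFT written out in plain Python (not GNU Radio blocks) on an arbitrary test signal; with this unnormalized DFT convention the proportionality constant is 1/N.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, unnormalized."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# Arbitrary test signal: two sinusoids that fit the block exactly.
x = [math.sin(2 * math.pi * 5 * t / n) + 0.25 * math.cos(2 * math.pi * 12 * t / n)
     for t in range(n)]

time_energy = sum(v * v for v in x)
freq_energy = sum(abs(X) ** 2 for X in dft(x)) / n

print(time_energy, freq_energy)
```

Because the two sums agree up to a constant, a ratio of two such powers (as in an SNR) is the same whether it is formed before or after the FFT.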

Sinusoids with frequencies that are random variables - What does the FFT impulse look like?

I'm currently working on a program in C++ in which I am computing the time varying FFT of a wav file. I have a question regarding plotting the results of an FFT.
Say for example I have a 70 Hz signal that is produced by some instrument with certain harmonics. Even though I say this signal is 70 Hz, it's a real signal and I assume will have some randomness in which that 70Hz signal varies. Say I sample it for 1 second at a sample rate of 20kHz. I realize the sample period probably doesn't need to be 1 second, but bear with me.
Because I now have 20000 samples, when I compute the FFT I will have 20000 (or 19999) frequency bins. Let's also assume that my sample rate, in conjunction with some windowing technique, minimizes spectral leakage.
My question then: Will the FFT still produce a relatively ideal impulse at 70 Hz? Or will there 'appear to be' spectral leakage caused by the randomness of the original signal? In other words, what does the FFT of a sinusoid whose frequency is a random variable look like?
Some of the more common modulation schemes will add sidebands that carry the information in the modulation. Depending on the amount and type of modulation with respect to the length of the FFT, the sidebands can either appear separate from the FFT peak, or just "fatten" a single peak.
Your spectrum will appear broadened, and this happens in the real world. Look, e.g., at the Voigt profile, which is a Lorentzian (the result of an ideal exponential decay) convolved with a Gaussian of a certain width, the width being determined by stochastic fluctuations, e.g. the Doppler effect on molecules in a gas that is being probed by a narrow-band laser.
You will not get an 'ideal' frequency peak either way. The limit for the resolution of the FFT is one frequency bin (frequency resolution being given by the inverse of the time vector length), but even that (as @xvan pointed out) is in general broadened by the window function. If your window is nonexistent, i.e. it is in fact a rectangular window of the length of the time vector, then you'll get spectral peaks that are convolved with a sinc function, and thus broadened.
The best way to visualize this is to make a long vector and plot a spectrogram (often shown for audio signals) with enough resolution so you can see the individual variation. The FFT of the overall signal is then the projection of the moving peaks onto the vertical axis of the spectrogram. The FFT of a given time vector does not have any time resolution, but sums up all frequencies that happen during the time you FFT. So the spectrogram (often people simply use the STFT, short time fourier transform) has at any given time the 'full' resolution, i.e. narrow lineshape that you expect. The FFT of the full time vector shows the algebraic sum of all your lineshapes and therefore appears broadened.
To sum it up there are two separate effects:
a) broadening from the window function (as the commenters 1 and 2 pointed out)
b) broadening from the effect of frequency fluctuation that you are trying to simulate and that happens in real life (e.g. you sitting on a swing while receiving a radio signal).
Finally, note the significance of @xvan's comment: phi = phi(t). If the phase angle is time dependent then it has a derivative that is not zero. With phi in radians, (1/(2*pi)) * dphi/dt is a frequency shift, so your instantaneous frequency becomes f0 + (1/(2*pi)) * dphi/dt.
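The broadening from frequency fluctuation can be illustrated with a small experiment. As an assumption made purely for reproducibility, the random fluctuation is replaced here by a slow deterministic wobble (a 1 Hz vibrato of +/- 2 Hz around 70 Hz); the sampling rate, duration, and the helper name `peak_energy_fraction` are likewise hypothetical choices.

```python
import numpy as np

fs = 2000          # Hz, assumed for the demo
duration = 1.0     # s, so the FFT bin spacing is 1 Hz
t = np.arange(int(fs * duration)) / fs

f0 = 70.0          # nominal frequency
fm = 1.0           # wobble rate (Hz)
beta = 2.0         # phase deviation (rad); peak frequency deviation = beta*fm

steady = np.sin(2 * np.pi * f0 * t)
phi = beta * np.sin(2 * np.pi * fm * t)        # time-dependent phase phi(t)
wobbling = np.sin(2 * np.pi * f0 * t + phi)    # frequency swings 68..72 Hz

def peak_energy_fraction(x):
    """Fraction of one-sided spectral energy held by the strongest bin."""
    power = np.abs(np.fft.rfft(x)) ** 2
    return power.max() / power.sum()

steady_fraction = peak_energy_fraction(steady)
wobbling_fraction = peak_energy_fraction(wobbling)
print(steady_fraction, wobbling_fraction)
```

The steady tone puts essentially all of its energy in one bin, while the wobbling tone spreads it over several neighbouring bins, even though no window-related leakage is involved (both signals fit the block exactly).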

Optimal value of sampling frequency for guitar notes detection

I am running FFT algorithm to detect the music note played on a guitar.
The frequencies that I am interested are in the range 65.41Hz (C2) to 1864.7Hz (A#6).
If I set the sampling frequency of the input to 16KHz, the output of FFT would yield N points from 0Hz to 16KHz linearly. All the input I am interested would be in the first N/8 points approximately. The other N*7/8 points are of no use to me. They actually are decreasing my resolution.
From Nyquist's theory (https://en.wikipedia.org/wiki/Nyquist_frequency), the sampling frequency that is needed is just twice the maximum frequency one desires. In my case, this would be about 4KHz.
Is 4KHz really the ideal sampling frequency for a guitar tuning app?
Intuitively, one would feel a better sampling frequency would give you more accurate results. However, in this case, it seems having a lesser sampling frequency is better for improving the resolution. Regards.
You are confusing the pitch of a guitar note with spectral frequency. A guitar generates lots of overtones and harmonics at much higher frequencies than the pitch of a played note. Those higher harmonics and overtones, more than the possibly weak fundamental frequency in some cases, are what the human ear hears and interprets as the lower perceived pitch.
Any of the overtones and harmonics around or above 2 kHz that are not completely low pass filtered out before sampling at 4 kHz will cause aliasing and thus corruption of your sampled data and its spectrum.
If you want to create an accurate tuner, use a pitch estimation algorithm, not an FFT peak frequency bin estimator. And depending on which pitch estimation method you choose, a higher density of samples per unit time might allow finer accuracy or greater reliability under background noise or more prompt responsiveness.
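To make the difference concrete, here is a sketch comparing an FFT peak picker with one simple pitch estimator, a time-domain autocorrelation search. This is just one of many pitch estimation methods and is not necessarily what the answer has in mind; the test note (110 Hz with a weak fundamental), the harmonic amplitudes, and the search limits are all assumed values.

```python
import numpy as np

def autocorr_pitch(x, fs, fmin=60.0, fmax=1000.0):
    """Pitch (Hz) from the lag that maximizes the autocorrelation."""
    x = x - np.mean(x)
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)
    lags = np.arange(min_lag, max_lag + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    return fs / lags[np.argmax(r)]

fs = 8000
t = np.arange(4000) / fs          # 0.5 s of signal, 2 Hz FFT bins
f0 = 110.0
amps = [0.1, 1.0, 0.8, 0.5]       # weak fundamental, strong overtones
x = sum(a * np.sin(2 * np.pi * (k + 1) * f0 * t) for k, a in enumerate(amps))

spectrum = np.abs(np.fft.rfft(x))
fft_peak_hz = np.argmax(spectrum) * fs / len(x)
pitch_hz = autocorr_pitch(x, fs)
print(fft_peak_hz, pitch_hz)
```

The FFT peak lands on the second harmonic, an octave above the played note, while the autocorrelation lag corresponds to the 110 Hz pitch (to within the integer-lag resolution of this simple version).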
Is 4KHz really the ideal sampling frequency for a guitar tuning app?
You've been misreading Nyquist's theorem if you ask it like that.
It states that every sampling frequency above twice your maximum signal frequency will allow you to perfectly reconstruct your original signal. So there's no "ideal" frequency, just a set of frequencies that are sufficient. What is ideal hence depends on a lot of other things: mainly, what your digitizer really supports (hint: most sound cards can do 44.1 kHz, but not 4 kHz), what kind of margin you want to have for filters etc. to work on, and what kind of processing power you can spend (hint: modern smartphones, PCs and even pocket calculators don't really have a hard time processing a couple hundred kHz in real time).
Also note that @hotpaw2 is right: the harmonics are important, and are multiples of the base tone frequency.
However, in this case, it seems having a lesser sampling frequency is better for improving the resolution.
No. No matter where that idea comes from, it's wrong. Information theory's first and foremost result is that, based upon more information, you can't make worse estimates. An oversampled signal is simply more information on the same signal.
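One way to see why a lower sampling rate does not by itself improve resolution: the FFT bin spacing is fs/N, and for a recording of fixed duration T the number of samples is N = fs*T, so the spacing is 1/T whatever the rate. A quick arithmetic check, with an assumed recording length of 0.5 s:

```python
def bin_spacing_hz(fs, n):
    """Spacing between adjacent FFT bins for n samples taken at fs Hz."""
    return fs / n

duration = 0.5  # seconds of audio analysed (assumed)
spacings = {}
for fs in (4000, 16000, 44100):
    n = int(fs * duration)          # samples captured in that time
    spacings[fs] = bin_spacing_hz(fs, n)
    print(fs, n, spacings[fs])
```

All three rates give the same 2 Hz spacing. The higher rates simply add bins above the range of interest; the resolution within that range is set by how long you listen.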
Yes, if all you are interested in is frequencies up to 2 kHz then you only need a sampling frequency of 4 kHz. This should include an anti-aliasing filter in front of the ADC or any downconverter to prevent any higher frequency components from aliasing into a lower frequency.
If all you are interested in is specific frequencies (one or two) then you may want to look at the Goertzel algorithm which is more efficient than an FFT for a single frequency. Also, the chirp-Z transform can be used to effectively get a zoomed FFT (resulting in a higher resolution over a smaller bandwidth without the computational complexity of an FFT with the same resolution). You may want to check out this CZT tutorial
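For reference, a minimal sketch of the Goertzel recursion in plain Python. The function name, the test tone, and the block length are illustrative assumptions; this version snaps the target frequency to the nearest DFT bin and returns the squared magnitude of that bin.

```python
import math

def goertzel_power(samples, fs, target_hz):
    """Squared magnitude of the DFT bin nearest target_hz."""
    n = len(samples)
    k = round(n * target_hz / fs)              # nearest bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

fs = 8000
n = 800                                        # 10 Hz bin spacing
tone = [math.sin(2 * math.pi * 440 * i / fs) for i in range(n)]

p_440 = goertzel_power(tone, fs, 440.0)
p_500 = goertzel_power(tone, fs, 500.0)
print(p_440, p_500)
```

Each call costs one multiply and two additions per sample, so checking a handful of candidate note frequencies is cheaper than a full FFT of the same length.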

Voice Spectrogram

I am working on a spectrogram project and trying to plot the frequencies with the highest magnitude in each section. We have tested and recorded the do-re-mi-fa-so-la-ti-do sung by a human. After plotting the spectrogram, we have seen multiple sets of increases in magnitude. In this image, we have circled the frequencies we would ideally like to plot.
However, there were some sections that had the frequencies with the highest magnitude located outside our ideal set of frequencies. For example, in time 6-7, the frequency plotted was around 200 instead of 400.
Does anybody have an idea why this happens?
This is normal and expected. The overtone or harmonic with the highest magnitude in speech or singing can vary with the pitch and voicing (the particular vowel sound, etc.) Change the speaker, pitch or vowel and the overtone or harmonic frequency multiplier for the highest energy peak can change. It can even change over time for a constant vowel and pitch.

Effects for bad sampling in frequency formula

Is there any formula to calculate the frequency (or frequencies) of a signal that is badly sampled?
For example, what's the output of an analog signal with F = 22 kHz when it's sampled at 25 kHz, or 10 kHz?
EDIT:
In this example, the sampled signal (on the right) has a different frequency than the original one, because it was badly sampled (Fs is less than 2*F).
My question is: is there any formula to know what's the frequency of this 20kHz signal, sampled at 30kHz?
There is no general formula for the frequency content of an arbitrary undersampled signal, but it is a fact that the frequency of an undersampled sinusoid is reflected about the Nyquist frequency. In your example, sampling at 30 kHz means that the Nyquist frequency is 15 kHz, which is not enough to record the original signal (20 kHz) correctly. The signal exceeds the Nyquist frequency by 5 kHz, and after reflection about the Nyquist frequency it appears at 15 - 5 = 10 kHz. So the frequency of the sampled signal will be 10 kHz in your case.
Unless the bandwidth of the signal is less than half the sampling rate, you lose information during sampling and generally can't distinguish frequencies after that due to aliasing.
See Undersampling for more details about sampling at rates lower than twice the maximum signal frequency.
There's no simple formula that can give you the spectral content of a signal or the main frequency. In general you need to calculate a Discrete Fourier Transform of the sampled signal to find that out. If you're interested in whether or not there's a specific frequency, or how strong it is, you can calculate DFT at that frequency. The Goertzel algorithm can be an option.
EDIT: a signal at frequency f such that fsample/2 <= f < fsample will alias to f* = fsample - f, hence a 20KHz sine wave sampled at 30KHz will appear as a 10KHz sine wave.
In general, frequencies above fsample/2 can be observed in the sampled signal, but their frequency is ambiguous. That is, a frequency component with frequency f cannot be distinguished from other components with frequencies N*fsample + f and N*fsample - f for nonzero integers N. This ambiguity is called aliasing.
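For a single real sinusoid, the folding rule can be written as a short function: reduce f modulo fsample, then reflect anything above fsample/2. The function name `alias_frequency` is hypothetical; the example values are the ones asked about in the question.

```python
def alias_frequency(f, fs):
    """Apparent frequency (0..fs/2) of a real sinusoid at f sampled at fs."""
    f_mod = f % fs                    # aliases repeat every fs
    return min(f_mod, fs - f_mod)     # fold about fs/2

print(alias_frequency(20_000, 30_000))   # 20 kHz sampled at 30 kHz
print(alias_frequency(22_000, 25_000))   # 22 kHz sampled at 25 kHz
print(alias_frequency(22_000, 10_000))   # 22 kHz sampled at 10 kHz
```

This only tells you where a known single tone would appear. It cannot be inverted: once sampled, a 10 kHz reading at 30 kHz sampling could have come from 10, 20, 40, 50 kHz and so on.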
Assuming a constant sampling rate, any sampling will alias together spectral content from below and above half the sampling rate. If you have frequency content on both sides of Fs/2 that you don't want combined, you will have to filter one or the other frequency band out before the sampling, or you will have a problem. For instance, a low-pass filter which only passes signals below Fs/2, or a bandpass filter that only passes signals strictly between n*Fs/2 and (n+1)*Fs/2 for some integer n, might be appropriate.
Note that the accuracy of the sampling rate must be higher (lower jitter) for n > 0. Lack of this lower jitter would be an example of bad sampling that would add random phase noise.
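The band condition in this answer can be checked with a small helper. This is a sketch with a hypothetical function name; as a simplification, the lower band edge is allowed to sit exactly on n*Fs/2, while the upper edge must stay strictly below (n+1)*Fs/2.

```python
import math

def nyquist_zone(f_low, f_high, fs):
    """Return n if [f_low, f_high] fits inside one zone n*fs/2..(n+1)*fs/2, else None."""
    half = fs / 2.0
    n = math.floor(f_low / half)
    if f_high < (n + 1) * half:
        return n
    return None

print(nyquist_zone(0, 2_000, 8_000))         # baseband, n = 0
print(nyquist_zone(30_000, 40_000, 50_000))  # fits between 25 and 50 kHz, n = 1
print(nyquist_zone(20_000, 30_000, 50_000))  # straddles 25 kHz
```

A result of `None` means the band crosses a multiple of Fs/2, so parts of it would fold on top of each other unless it is filtered first or a different sampling rate is chosen.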

Resources