Frequency Modulation

I'm trying to implement frequency modulation, but could anyone explain what happens with a non-sinusoidal (and maybe non-periodic) carrier? Could we assume some function FM(A(t), B(t)) that modulates a carrier given by an ABSTRACT (non-sinusoidal) function A(t) with a signal given by an abstract function B(t)? Could anyone write or explain something about that? What would the formula look like in that most general case? I'd like some kind of recursive formula in terms like "A(t-1)", or else an explanation of why that is not possible.
The description of frequency modulation (FM) suggests some kind of "varying playback speed", but that approach seems to do something wrong, so I'm asking again: how?

Well, for a non-sinusoidal but periodic carrier you could either use a look-up table as suggested in Paul R's answer, or you could break the periodic carrier up into its Fourier modes, create an individual oscillator for each mode, modulate each one and sum them up.
In the case of a non-periodic signal the phase or frequency is not defined in general. Just think of noise: how should that be modulated? You would first need to define what frequency modulation means for arbitrary signals.
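
As a rough illustration of the Fourier-mode approach, here is a minimal NumPy sketch (all names here, fs, cycle, f_c and mod, are made up for the example; the carrier is a square wave, and every harmonic is an oscillator driven by the same modulated fundamental phase):

import numpy as np

fs = 48000
t = np.arange(fs) / fs
cycle = np.sign(np.sin(2 * np.pi * np.arange(512) / 512))  # one period of a square-wave carrier
f_c = 220.0
mod = 30.0 * np.sin(2 * np.pi * 3.0 * t)                   # +/-30 Hz frequency deviation at 3 Hz

coeffs = np.fft.rfft(cycle) / len(cycle)                   # Fourier modes of one carrier period
phase = np.cumsum((f_c + mod) / fs)                        # shared modulated fundamental phase, in cycles
out = np.zeros_like(t)
for k in range(1, len(coeffs)):
    if k * f_c < fs / 2:                                   # drop modes that would exceed Nyquist
        out += 2 * np.abs(coeffs[k]) * np.cos(2 * np.pi * k * phase + np.angle(coeffs[k]))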

If you are using a look-up table for your waveform generation then it's pretty easy to modify the standard phase accumulator synthesis method to add an FM input. See e.g. this answer.
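
For the look-up-table route, a minimal sketch of a phase accumulator with an FM input could look like this (fs, table, f_c and mod are assumptions for the example, not anything from the linked answer):

import numpy as np

fs = 48000
table = np.sin(2 * np.pi * np.arange(2048) / 2048)   # single-cycle waveform; any shape works
f_c = 440.0
t = np.arange(fs) / fs
mod = 50.0 * np.sin(2 * np.pi * 5.0 * t)             # FM input: +/-50 Hz deviation at 5 Hz

phase = 0.0
out = np.empty(len(mod))
for n in range(len(mod)):
    phase += (f_c + mod[n]) / fs                     # advance by the instantaneous frequency
    phase -= np.floor(phase)                         # wrap the accumulator to [0, 1)
    out[n] = table[int(phase * len(table))]          # nearest-neighbour table lookup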

Python: time stretch wave files - comparison between three methods

I'm doing some data augmentation on a speech dataset, and I want to stretch/squeeze each audio file in the time domain.
I found the following three ways to do that, but I'm not sure which is the best or most efficient way:
dimension = int(len(signal) * speed)
res = librosa.effects.time_stretch(signal, speed)
res = cv2.resize(signal, (1, dimension)).squeeze()
res = skimage.transform.resize(signal, (dimension, 1)).squeeze()
However, I found that librosa.effects.time_stretch adds unwanted echo (or something like that) to the signal.
So, my question is: What are the main differences between these three ways? And is there any better way to do that?
librosa.effects.time_stretch(signal, speed) (docs)
In essence, this approach transforms the signal using the STFT (short-time Fourier transform), stretches it using a phase vocoder and uses the inverse STFT to reconstruct the time-domain signal. Typically, when doing it this way, one introduces a little bit of "phasiness", i.e. a metallic clang, because the phase cannot be reconstructed 100%. That's probably what you've identified as "echo".
Note that while this approach effectively stretches audio in the time domain (i.e., the input is in the time domain as well as the output), the work is actually being done in the frequency domain.
cv2.resize(signal, (1, dimension)).squeeze() (docs)
All this approach does is interpolate the given signal using bilinear interpolation. This is suitable for images, but strikes me as unsuitable for audio signals. Have you listened to the result? Does it sound at all like the original signal, only faster/slower? I would assume that not only the tempo changes, but also the frequencies, and that there are perhaps other artifacts.
skimage.transform.resize(signal, (dimension, 1)).squeeze() (docs)
Again, this is meant for images, not sound. In addition to the interpolation (spline interpolation of order 1 by default), this function also does anti-aliasing for images. Note that this has nothing to do with avoiding audio aliasing (Nyquist), so you should probably turn it off by passing anti_aliasing=False. Again, I would assume that the results may not be exactly what you want (changed frequencies, other artifacts).
What to do?
IMO, you have several options.
If what you feed into your ML algorithms ends up being something like a Mel spectrogram, you could simply treat it as an image and stretch it using the skimage or opencv approach. Frequency ranges would be preserved. I have successfully used this kind of approach in this music tempo estimation paper.
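A minimal sketch of that idea (the file name and stretch factor are placeholders):

import librosa
import skimage.transform

signal, sr = librosa.load("speech.wav", sr=None)          # hypothetical input file
mel = librosa.feature.melspectrogram(y=signal, sr=sr)     # shape: (n_mels, n_frames)
factor = 1.2                                              # arbitrary stretch factor
stretched = skimage.transform.resize(
    mel, (mel.shape[0], int(mel.shape[1] * factor)),      # keep mel bins, rescale the time axis
    anti_aliasing=False)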
Use a better time_stretch library, e.g. rubberband. librosa is great, but its current time scale modification (TSM) algorithm is not state of the art. For a review of TSM algorithms, see for example this article.
Ignore the fact that the frequencies change and simply add zero samples to the signal on a regular basis, or drop samples from it on a regular basis (much like your image interpolation does). If you don't stretch too far it may still work for data augmentation purposes. After all, the word content is not changed if the audio ends up with slightly higher or lower frequencies.
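A naive sketch of that sample-dropping/repeating idea (the function name is made up):

import numpy as np

def naive_speed_change(signal, speed):
    # speed > 1 shortens the signal, speed < 1 lengthens it, by picking output
    # sample positions on a regular grid and rounding to the nearest input sample
    idx = np.round(np.arange(0, len(signal), speed)).astype(int)
    idx = idx[idx < len(signal)]          # guard against rounding past the last sample
    return signal[idx]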
Resample the signal to another sampling frequency, e.g. 44100 Hz -> 43000 Hz or 44100 Hz -> 46000 Hz, using a library like resampy, and then pretend that it's still 44100 Hz. This still changes the frequencies, but at least you get the benefit that resampy does proper filtering of the result, so that you avoid the aforementioned aliasing which otherwise occurs.
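For example (assuming signal is the 1-D array from your question; the target rate is arbitrary):

import resampy

sr_orig, sr_fake = 44100, 46000
res = resampy.resample(signal, sr_orig, sr_fake)   # proper band-limited resampling
# ...then keep treating `res` as if its sample rate were still 44100 Hz.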

fundamental frequency of female voice

According to what I have read on the internet, the normal range of the fundamental frequency of the female voice is 165 to 255 Hz.
I am using Praat and also a Python library called Parselmouth to get the fundamental frequency values of a female voice in an audio file (.wav). However, I got some values that are over 255 Hz (e.g. 400+ Hz, 500 Hz).
Is it normal to get big values like this?
It is possible, but unlikely, if you are trying to capture the fundamental frequency (F0) of a speaking voice. It seems more likely that you are capturing a more easily resonating overtone (e.g. F1 or F2) instead.
My experiments with Praat give me the impression that with good parameters it will reliably extract F0.
What you'll want to do is to verify that by comparing the pitch curve with a spectrogram. Here's an example of a fitting made by Praat (female speaker):
You can see from the image that
The most prominent frequency seems to be F2
Around 200 Hz seems likely to be F0, since there's only noise below that (compared to before/after the segment)
Praat has calculated a good estimate of F0 for the voiced speech segments
If, after a visual inspection, it seems that you are getting wrong results, you can try to tweak the parameters. Window length greatly affects the frequency resolution.
If you can't capture frequencies this low, you should try increasing the window length - the intuition is that it gives the algorithm a better chance at finding slowly changing periodic features in the data.
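
If you are calling Parselmouth directly, constraining the pitch search range is the first parameter to try. A minimal sketch (the file name and exact bounds are placeholders, and the keyword names follow Parselmouth's Sound.to_pitch as I recall them):

import parselmouth

snd = parselmouth.Sound("voice.wav")                              # hypothetical input file
pitch = snd.to_pitch(pitch_floor=100.0, pitch_ceiling=300.0)      # plausible female F0 range
f0 = pitch.selected_array['frequency']                            # 0.0 marks unvoiced frames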

8dpsk. How to "improve" constellation diagram?

I'm trying to demodulate 8DPSK in software. Carrier frequency = 1800 Hz, modulation rate = 1600 baud, i.e. ITU-T V.27. The demodulator has the following properties:
Input passband hardware filter: 50...3600 Hz;
Sampling frequency: 9600 Hz;
Matched filter: RRC, beta = 1;
Timing recovery: simple Gardner, first-order loop, 2 points per symbol.
Also, the demodulator has an interpolator to interpolate between the matched filter outputs.
The physical line is short and I believe the amount of AWGN must be relatively small. The demodulator works without errors, but the constellation diagram looks ugly (see picture). Can anyone tell me how to "improve" the constellation diagram?
Update: I have placed the interpolator before the matched filter to minimize ISI and applied an equalizer. Now it looks much better.
Your question has no general answer, but I want to suggest a few things.
First, it is important which filter you use in your Gardner algorithm to compute the timing error and the fractional interval mu, and how you set the loop filter coefficients. Tuning these should improve your results.
Second, the RRC filter roll-off factor is very important for getting better results.
Third, if you correct the frequency offset before timing recovery, you get better results in time-to-lock and in constellation density. To do this, you should feed the phase estimated by the PLL back to the timing recovery input and compensate the phase offset for symbol synchronization.
Fourth, the interpolation you use to extract symbols from the samples is important.
I have written these suggestions in the order that seems most useful for your constellation.
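
Since the Gardner detector and its loop filter came up above, here is a minimal sketch of the error detector itself at 2 samples per symbol; the loop filter, NCO and interpolator around it are assumed to exist elsewhere, and this is an illustration rather than the asker's implementation:

import numpy as np

def gardner_ted(x, k):
    # x: complex matched-filter output at 2 samples/symbol
    # x[k] and x[k-2] are consecutive symbol-spaced samples, x[k-1] is the midpoint sample
    return np.real(np.conj(x[k - 1]) * (x[k] - x[k - 2]))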

bayesianoptimization in machine learning

Thanks for reading this. I am currently studying Bayesian optimization and following a tutorial; please see the attachment: Bayesian optimization tutorial.
On page 11, about the acquisition function: before I raise my question I need to state my understanding of Bayesian optimization, to see if there is anything wrong with it.
First we take some training points and assume they follow a multivariate Gaussian distribution. Then we use an acquisition function to find the next point we want to sample. So, for example, we use x1...x(t) as training points, then use the acquisition function to find x(t+1) and sample it. Then we assume x1...x(t), x(t+1) follow a multivariate Gaussian distribution and use the acquisition function to find x(t+2) to sample, and so on.
On page 11 it seems we need to find the x that maximizes the probability of improvement. f(x+) comes from the sampled training points (x1...xt) and is easy to get. But how do we get u(x) and the variance here? I don't know what the x in the equation is. It should be x(t+1), but the paper doesn't say that. And if it is indeed x(t+1), how could I get u(x(t+1))? You may say to use the equation at the bottom of page 8, but we can only use that equation once we have already found x(t+1) and put it into the multivariate Gaussian distribution. Since we don't yet know the next point x(t+1), I have no way to calculate it, in my opinion.
I know this is a tough question. Thanks for answering!!
In fact, I have found the answer.
Indeed it is x(t+1). The direct way is to compute the mean u and variance of every candidate x outside the training data, put them into the acquisition function, and see which one gives the maximum.
This is time consuming, so instead of trying candidates one by one we use a nonlinear global optimizer such as DIRECT to find the x that maximizes the acquisition function.
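
A minimal sketch of that probability-of-improvement acquisition, assuming a fitted Gaussian process regressor (e.g. scikit-learn's GaussianProcessRegressor) called gp and the best observed value f_best; the names and the small xi jitter term are illustrative:

import numpy as np
from scipy.stats import norm

def probability_of_improvement(x_candidates, gp, f_best, xi=0.01):
    # x_candidates: array of shape (n, d); returns PI(x) = Phi((u(x) - f(x+) - xi) / sigma(x))
    mu, sigma = gp.predict(x_candidates, return_std=True)   # GP posterior mean and std at each candidate
    return norm.cdf((mu - f_best - xi) / sigma)

# x(t+1) is then the candidate with the largest PI value, found either on a grid
# of candidates or with a global optimizer such as DIRECT.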

vDSP: Do the FFT functions include windowing?

I am working on implementing an algorithm using vDSP.
1) take FFT
2) take log of square of absolute value (can be done with lookup table)
3) take another FFT
4) take absolute value
I'm not sure if it is up to me to throw the incoming data through a windowing function before I run the FFT on it.
vDSP_fft_zrip(setupReal, &A, stride, log2n, direction);
That is my FFT call.
Do I need to throw the data through vDSP_hamm_window(...) first?
The iOS Accelerate library function vDSP_fft_zrip() does not include applying a window function (unless you count the implied rectangular window due to the finite length parameter).
So you need to apply your chosen window function (there are many different ones) first.
It sounds like you're doing cepstral analysis and yes, you do need a window function prior to the first FFT. I would suggest a simple Hann or Hamming window.
I don't have any experience with your particular library, but in every other FFT library I know of it's up to you to window the data first. If nothing else, the library can't know what window you wish to use, and sometimes you don't want to use a window (if you're using the FFT for overlap-add filtering, or if you know the signal is exactly periodic in the transform block).
Also, just offhand, it seems like if you're doing 2 FFTs, the overhead of calling a logarithm function is relatively minor.
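
For illustration only (in NumPy rather than vDSP), the four steps with an explicit Hann window applied before the first FFT look like this:

import numpy as np

frame = np.random.randn(1024)                     # placeholder for one analysis frame
windowed = frame * np.hanning(len(frame))         # window first; the FFT will not do this for you
spec = np.fft.rfft(windowed)                      # 1) take FFT
log_power = np.log(np.abs(spec) ** 2 + 1e-12)     # 2) log of squared magnitude (small offset avoids log(0))
ceps = np.abs(np.fft.rfft(log_power))             # 3) + 4) second FFT, then absolute value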
