I am trying to demodulate 8-DPSK in software. Carrier frequency = 1800 Hz, modulation rate = 1600 baud, i.e. ITU-T V.27. The demodulator has the following properties:
Input passband hardware filter: 50...3600 Hz;
Sampling frequency: 9600 Hz;
Matched filter: RRC, beta = 1;
Timing recovery: simple Gardner, first-order loop, 2 points per symbol.
The demodulator also has an interpolator that interpolates between the matched filter outputs.
The physical line is short, so I believe the amount of AWGN must be relatively small. The demodulator works without errors, but the constellation diagram looks ugly (see picture). Can anyone tell me how to "improve" the constellation diagram?
I have now placed the interpolator before the matched filter to minimize ISI and applied an equalizer. The constellation looks much better now.
Your question has no general answer, but I can suggest a few things.
First, it matters which filter you use in your Gardner algorithm to calculate the jitter and mu, and how you set the loop filter coefficients. Tuning these will likely improve your results.
Second, the RRC filter roll-off factor is very important for getting good results.
Third, if you correct the frequency offset before timing recovery, you get better results in both time to lock and constellation density. To do this, feed the phase estimated by the PLL back to the timing recovery input and compensate the phase offset before symbol synchronization.
Fourth, the interpolation that you use to extract symbols from the samples is important.
I have listed these suggestions in the order in which I think they will be useful for your constellation.
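For reference, the error signal that drives a Gardner loop can be sketched as follows. This is a minimal illustration and not the asker's implementation: it assumes the input already runs at exactly 2 samples per symbol, with even indices on the symbol strobes and odd indices on the midpoints, and it uses one common sign convention. The function name gardner_ted is made up for this example.

```python
def gardner_ted(samples):
    """Gardner timing error for a stream at 2 samples per symbol.

    Assumption of this sketch: even indices are symbol strobes and
    odd indices are the midpoints between two strobes.
    """
    errors = []
    for k in range(0, len(samples) - 2, 2):
        prev_sym = complex(samples[k])
        mid = complex(samples[k + 1])
        cur_sym = complex(samples[k + 2])
        # e = Re{ (y[k] - y[k-1]) * conj(y[k-1/2]) }
        errors.append(((cur_sym - prev_sym) * mid.conjugate()).real)
    return errors
```

With correct timing the midpoint of every symbol transition sits on a zero crossing, so the error averages to zero. A timing offset makes the error consistently positive or negative, and the loop filter turns that into a correction of mu.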
I've been given some digitized sound recordings and asked to plot the sound pressure level per Hz.
The signal is sampled at 40 kHz and the units for the y axis are simply volts.
I've been asked to produce a graph of the SPL as dB/Hz vs Hz.
EDIT: The input units are voltage vs time.
Does this make sense? I thought SPL was a time-domain measure?
If it does make sense, how would I go about producing this graph? Apply the dB formula (20 * log10(x), IIRC) and do an FFT on that, or...?
What you're describing is a power spectral density (PSD). MATLAB, for example, has a pwelch function that does literally what you're asking for. To scale to dB/Hz, simply apply 10*log10(psd), where psd is the output of pwelch. Note that with an input in volts this gives dB relative to 1 V^2/Hz; an absolute dB SPL/Hz scale additionally requires the calibration (sensitivity) of the recording chain. Let me know if you need help with the function inputs.
If you're working with a different framework, let me know which one. I'm 100% sure it will have a version of this function, possibly with a different output format, in which case the scaling might be different.
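In Python, for example, the equivalent of pwelch is scipy.signal.welch. The sketch below is only one way to set it up: the function name psd_db_per_hz and the volts_per_pascal parameter are hypothetical, and the calibration value has to come from the recording setup, since it cannot be derived from the voltage data alone.

```python
import numpy as np
from scipy.signal import welch


def psd_db_per_hz(volts, fs, volts_per_pascal=None, nperseg=4096):
    """Welch PSD in dB/Hz.

    Without a calibration factor the result is in dB re 1 V^2/Hz.
    With volts_per_pascal (hypothetical microphone sensitivity) the
    result is in dB re (20 uPa)^2/Hz, i.e. dB SPL per Hz.
    """
    x = np.asarray(volts, dtype=float)
    ref = 1.0
    if volts_per_pascal is not None:
        x = x / volts_per_pascal  # volts -> pascals
        ref = 20e-6               # standard SPL reference pressure in Pa
    freqs, pxx = welch(x, fs=fs, nperseg=nperseg, scaling="density")
    pxx = np.maximum(pxx, np.finfo(float).tiny)  # avoid log10(0)
    return freqs, 10.0 * np.log10(pxx / ref**2)
```

Plotting the second return value against the first gives the requested graph. Note the factor 10 rather than 20: the PSD is already a power quantity.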
I'm doing some data augmentation on a speech dataset, and I want to stretch/squeeze each audio file in the time domain.
I found the following three ways to do that, but I'm not sure which one is the best or more optimized way:
dimension = int(len(signal) * speed)
res = librosa.effects.time_stretch(signal, speed)
res = cv2.resize(signal, (1, dimension)).squeeze()
res = skimage.transform.resize(signal, (dimension, 1)).squeeze()
However, I found that librosa.effects.time_stretch adds unwanted echo (or something like that) to the signal.
So, my question is: What are the main differences between these three ways? And is there any better way to do that?
librosa.effects.time_stretch(signal, speed) (docs)
In essence, this approach transforms the signal using stft (short time Fourier transform), stretches it using a phase vocoder and uses the inverse stft to reconstruct the time domain signal. Typically, when doing it this way, one introduces a little bit of "phasiness", i.e. a metallic clang, because the phase cannot be reconstructed 100%. That's probably what you've identified as "echo."
Note that while this approach effectively stretches audio in the time domain (i.e., the input is in the time domain as well as the output), the work is actually being done in the frequency domain.
cv2.resize(signal, (1, dimension)).squeeze() (docs)
All this approach does is interpolate the given signal using bilinear interpolation. This is suitable for images, but strikes me as unsuitable for audio signals. Have you listened to the result? Does it sound at all like the original signal, only faster or slower? I would assume that not only the tempo changes, but also the frequency, and that there may be other effects.
skimage.transform.resize(signal, (dimension, 1)).squeeze() (docs)
Again, this is meant for images, not sound. In addition to the interpolation (spline interpolation of order 1 by default), this function also does anti-aliasing for images. Note that this has nothing to do with avoiding audio aliasing effects (Nyquist/aliasing), so you should probably turn it off by passing anti_aliasing=False. Again, I would assume that the results may not be exactly what you want (changed frequencies, other artifacts).
What to do?
IMO, you have several options.
If what you feed into your ML algorithms ends up being something like a Mel spectrogram, you could simply treat it as an image and stretch it using the skimage or opencv approach. Frequency ranges would be preserved. I have successfully used this kind of approach in this music tempo estimation paper.
Use a better time_stretch library, e.g. rubberband. librosa is great, but its current time scale modification (TSM) algorithm is not state of the art. For a review of TSM algorithms, see for example this article.
Ignore the fact that the frequency changes and simply add 0 samples to the signal at regular intervals, or drop samples from it at regular intervals (much like your image interpolation does). If you don't stretch too far, it may still work for data augmentation purposes. After all, the word content does not change when the audio content has higher or lower frequencies.
Resample the signal to another sampling frequency, e.g. 44100 Hz -> 43000 Hz or 44100 Hz -> 46000 Hz, using a library like resampy, and then pretend that it's still 44100 Hz. This still changes the frequencies, but at least resampy filters the result properly, so you avoid the aforementioned aliasing that would otherwise occur.
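The last option can be sketched as follows. Note that this example swaps resampy for scipy.signal.resample_poly, which also applies an anti-aliasing filter; the function name change_speed and the max_denominator parameter are made up for illustration, and the speed factor is approximated by a small rational number.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly


def change_speed(signal, speed, max_denominator=100):
    """Speed a signal up (speed > 1) or slow it down (speed < 1).

    The output has about len(signal) / speed samples and is meant to
    be played back at the ORIGINAL sampling rate, so the pitch scales
    by the same factor as the tempo.
    """
    ratio = Fraction(1.0 / speed).limit_denominator(max_denominator)
    # resample_poly low-pass filters internally, which avoids aliasing.
    return resample_poly(np.asarray(signal, dtype=float),
                         ratio.numerator, ratio.denominator)
```

For instance, change_speed(signal, 1.1) shortens the signal by about 9% and, when played back at the original rate, raises all frequencies by 10%.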
According to what I have read on the internet, the normal range of the fundamental frequency of a female voice is 165 to 255 Hz.
I am using Praat and a Python library called Parselmouth to get the fundamental frequency values of a female voice in an audio file (.wav). However, I got some values that are over 255 Hz (e.g. 400+ Hz, 500 Hz).
Is it normal to get big values like this?
It is possible, but unlikely, if you are trying to capture the fundamental frequency (F0) of a speaking voice. It sounds likely that you are capturing a more easily resonating overtone (e.g. F1 or F2) instead.
My experiments with Praat give me the impression that, with good parameters, it will reliably extract F0.
What you'll want to do is to verify that by comparing the pitch curve with a spectrogram. Here's an example of a fitting made by Praat (female speaker):
You can see from the image that
Most prominent frequency seems to be F2
Around 200 Hz seems likely to be F0, since there's only noise below that (compared to before/after the segment)
Praat has calculated a good estimate of F0 for the voiced speech segments
If, after a visual inspection, it seems that you are getting wrong results, you can try to tweak the parameters. Window length greatly affects the frequency resolution.
If you can't capture frequencies this low, you should try increasing the window length - the intuition is that it gives the algorithm a better chance at finding slowly changing periodic features in the data.
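To get a feeling for why the parameters matter, here is a toy autocorrelation pitch estimator. It is not Praat's algorithm (Praat's default method is autocorrelation-based, but far more refined), and the names estimate_f0, pitch_floor and pitch_ceiling are only loosely modelled on Praat's settings. The frame has to span at least one full period of the lowest pitch you want to find, which is why a longer window helps with low voices.

```python
import math


def estimate_f0(frame, fs, pitch_floor=75.0, pitch_ceiling=600.0):
    """Return fs / lag for the lag that maximizes the autocorrelation.

    Only lags between fs / pitch_ceiling and fs / pitch_floor are
    searched, so the search range directly limits the possible result.
    """
    min_lag = math.ceil(fs / pitch_ceiling)
    max_lag = min(int(fs / pitch_floor), len(frame) - 1)
    best_lag, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: fewer terms at larger lags.
        value = sum(frame[n] * frame[n + lag]
                    for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return fs / best_lag
```

For a 50 ms test frame containing a 200 Hz fundamental plus a stronger 400 Hz harmonic, this returns 200 Hz with the default range. If the search is restricted to 300...600 Hz, the estimate is forced onto the harmonic near 400 Hz instead, which is the kind of too-high value described in the question.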
Sec. 4.7 of the classic textbook "Discrete-Time Signal Processing" (3rd ed.) discusses the efficient implementation of multirate processing. The first method deals with the "interchange of filtering with compressor/expander", and the following figure shows the interchange for downsampling.
Since downsampling can cause aliasing, pre-filtering is necessary. In the figure, we see H(z) in (a) and H(z^M) in (b). However, if aliasing has already occurred after downsampling in (a), can H(z) eliminate the aliasing? Thank you!
Yes, as long as the original filter was of the form H(z^M), meaning that only every Mth coefficient of the filter is non-zero.
The reason this is possible comes down to the fact that only each Mth input sample actually factors into the output sequence in this configuration. It is a special case since input samples at non multiples of M are always cancelled out by either the filter zero coefficients or the decimator. It is unnecessary to even consider input samples at indexes other than multiples of M.
This means you can decimate the input first and then apply the filter with its zero coefficients removed.
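The equivalence is easy to check numerically. The following sketch uses plain lists and a direct-form convolution; the helper names are made up for this example.

```python
def convolve(x, h):
    """Full linear convolution of two sequences."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def expand_filter(h, M):
    """Coefficients of H(z^M): M - 1 zeros between the taps of H(z)."""
    out = [0] * (M * (len(h) - 1) + 1)
    out[::M] = h
    return out


def filter_then_decimate(x, h, M):
    """Structure (b): filter with H(z^M), then keep every Mth sample."""
    return convolve(x, expand_filter(h, M))[::M]


def decimate_then_filter(x, h, M):
    """Structure (a): keep every Mth sample, then filter with H(z)."""
    return convolve(x[::M], h)
```

Both functions return the same sequence for any input x, but the second one does only about 1/M of the multiplications and none of them are multiplications by zero.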
I'm trying to implement frequency modulation. Could anyone explain how it works for a non-sinusoidal (and maybe non-periodic) carrier? Can we define some function FM(A(t), B(t)) that modulates a carrier given by an abstract (non-sinusoidal) function A(t) with a signal given by an abstract function B(t)? What would the formula look like in that most general case? I would like some kind of recursive formula in terms like "A(t-1)", or otherwise an explanation of why that is not possible.
The question "Frequency modulation (FM)" proposes some kind of "varying playback speed", but it seems to do something wrong, so I am asking again: how?
Well, for a non-sinusoidal but periodic carrier you could either use a look-up table, as suggested in the answer by Paul R, or you could break the periodic carrier up into its Fourier modes, create an individual oscillator for each mode, modulate each one and sum them up.
In the case of a non-periodic signal, the phase or frequency is not defined in general. Just think of noise: how should that be modulated? You would need to define what frequency modulation means for arbitrary signals.
If you are using a look-up table for your waveform generation then it's pretty easy to modify the standard phase accumulator synthesis method to add an FM input. See e.g. this answer.
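A minimal sketch of that idea, assuming the carrier A is stored as one period in a table and the modulating signal B is normalized to roughly [-1, 1]. The recursion is phase[n] = (phase[n-1] + (f_c + deviation * B[n]) / f_s) mod 1 and out[n] = A(phase[n]). The function and parameter names are made up for this example, and the table lookup simply truncates instead of interpolating.

```python
def fm_wavetable(table, carrier_hz, modulator, deviation_hz, fs):
    """Frequency-modulate an arbitrary periodic waveform.

    table holds one period of the carrier waveform; modulator is the
    modulating signal, one value per output sample.
    """
    n_table = len(table)
    phase = 0.0  # position within the period, in cycles, in [0, 1)
    out = []
    for m in modulator:
        out.append(table[int(phase * n_table) % n_table])
        inst_freq = carrier_hz + deviation_hz * m  # instantaneous frequency
        phase = (phase + inst_freq / fs) % 1.0     # phase accumulator
    return out
```

With a zero modulator this just plays the table back at the carrier frequency. A positive modulator value steps through the table faster, and a negative one steps through it more slowly.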