GNU Radio buffers and sampling rate

I am kind of confused about how the ring buffers in GNU Radio blocks work.
I am using a UHD: USRP Source block with a sampling rate of 20 Msps.
I believe each block has an input buffer and an output buffer that is fed into the next block.
In my flowgraph, does that mean the USRP Source block is pulling data from the hardware at a rate of 20 Msps and that its buffer will contain 20 Msamples?
The output and input buffer sizes in all the blocks are left at the default of 0.
So what do the buffers for the USRP Source and the subsequent blocks look like?
And from the Stream to Vector block onwards, does that become only 64 samples?

This is not the case. The source will sample at 20 Msps and put samples into the output buffer of the source block. Each buffer has a maximum size before it overflows.
In the code of each block there is a function called forecast(), which tells the GNU Radio scheduler how many input samples the block needs in order to produce a certain number of output samples. The scheduler then determines when there are enough samples for a block to produce output. If there is enough data in the input buffers and enough space in the output buffers, the scheduler calls that block's work() function, which does the necessary signal processing on the input and produces output. The number of samples at the input and the amount of space at the output is never constant.
So in the example of the Stream to Vector block there may be 4096 samples at the input. It is the block's job to check this and produce 64 vectors at the output.
GNU Radio prefers that a block process as much data as possible per call to the work function, since moving samples between blocks is a costly operation.
There are some exceptions, such as tagged stream blocks, but for most blocks this is the basic idea.
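For concreteness, here is a minimal sketch of a custom Python block (recent GNU Radio Python API; the block name and the 64-sample vector length are purely illustrative, not the actual Stream to Vector implementation) showing how forecast() and general_work() cooperate with the scheduler:

    import numpy as np
    from gnuradio import gr

    class stream_to_vec64(gr.basic_block):
        """Illustrative block: packs 64 input samples into one output vector."""

        def __init__(self):
            gr.basic_block.__init__(
                self,
                name="stream_to_vec64",
                in_sig=[np.complex64],
                out_sig=[(np.complex64, 64)],
            )

        def forecast(self, noutput_items, ninputs):
            # Tell the scheduler: each output vector needs 64 input samples.
            # (Older GNU Radio versions instead fill a list passed in as the
            # second argument rather than returning one.)
            return [noutput_items * 64] * ninputs

        def general_work(self, input_items, output_items):
            inp, out = input_items[0], output_items[0]
            # Process as many whole vectors as the current buffers allow;
            # the amount available is never constant from call to call.
            nvec = min(len(inp) // 64, len(out))
            out[:nvec] = inp[:nvec * 64].reshape(nvec, 64)
            self.consume(0, nvec * 64)  # samples actually read from input 0
            return nvec                 # number of output items (vectors) produced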

Related

How to normalize amplitude differences within a 433 MHz signal burst in GNU Radio Companion?

I'm learning SDR by trying to decode different 433 MHz keyfob signals.
My initial flow for capture and pre-processing looks like this:
What I get on the sink is:
I guess the bits are visible fine and I could proceed to decode them somehow. But I am worried about the big difference in amplitude: the very beginning of the burst has a much higher amplitude than the rest of the packet. This picture is very consistent (I was not able to get bursts with more balanced amplitudes). If I were talking about music recording I would look for a compression method, but I don't know what the equivalent is in the SDR world.
I'm not sure if this will be a problem when I try to quadrature demodulate, binary slice and/or clock recover.
Is this a known problem, and what is the approach to eliminating it within GNU Radio Companion?

Python: time stretch wave files - comparison between three methods

I'm doing some data augmentation on a speech dataset, and I want to stretch/squeeze each audio file in the time domain.
I found the following three ways to do that, but I'm not sure which is the best or most optimized way:
    dimension = int(len(signal) * speed)
    res = librosa.effects.time_stretch(signal, speed)
    res = cv2.resize(signal, (1, dimension)).squeeze()
    res = skimage.transform.resize(signal, (dimension, 1)).squeeze()
However, I found that librosa.effects.time_stretch adds unwanted echo (or something like that) to the signal.
So, my question is: What are the main differences between these three ways? And is there any better way to do that?
librosa.effects.time_stretch(signal, speed) (docs)
In essence, this approach transforms the signal using the STFT (short-time Fourier transform), stretches it using a phase vocoder and uses the inverse STFT to reconstruct the time-domain signal. Typically, when doing it this way, one introduces a little bit of "phasiness", i.e. a metallic clang, because the phase cannot be reconstructed 100%. That's probably what you've identified as "echo."
Note that while this approach effectively stretches audio in the time domain (i.e., the input is in the time domain as well as the output), the work is actually being done in the frequency domain.
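As a rough illustration of that pipeline (not librosa's exact internals; the 440 Hz test tone is just a placeholder for real audio):

    import numpy as np
    import librosa

    sr = 22050
    t = np.arange(2 * sr) / sr
    signal = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

    D = librosa.stft(signal)                     # time domain -> STFT frames
    D_fast = librosa.phase_vocoder(D, rate=1.5)  # stretch/compress the phases
    faster = librosa.istft(D_fast)               # inverse STFT -> time domain
    # 'faster' is roughly 1/1.5 times as long as 'signal'; the imperfect phase
    # reconstruction is where the "phasiness" comes from.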
cv2.resize(signal, (1, dimension)).squeeze() (docs)
All this approach does is interpolate the given signal using bilinear interpolation. This approach is suitable for images, but strikes me as unsuitable for audio signals. Have you listened to the result? Does it sound at all like the original signal, only faster/slower? I would assume not only the tempo changes, but also the frequency, and perhaps other effects appear.
skimage.transform.resize(signal, (dimension, 1)).squeeze() (docs)
Again, this is meant for images, not sound. In addition to the interpolation (spline interpolation of order 1 by default), this function also does anti-aliasing for images. Note that this has nothing to do with avoiding audio aliasing effects (Nyquist/aliasing), so you should probably turn it off by passing anti_aliasing=False. Again, I would assume that the results may not be exactly what you want (changed frequencies, other artifacts).
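If you do go the skimage route anyway, a minimal sketch that disables the image-style anti-aliasing might look like this (the speed factor and random signal are placeholders):

    import numpy as np
    from skimage.transform import resize

    signal = np.random.randn(22050).astype(np.float32)  # placeholder audio
    speed = 0.8                                          # arbitrary squeeze factor
    dimension = int(len(signal) * speed)

    res = resize(
        signal.reshape(-1, 1),   # treat the 1-D signal as an N x 1 "image"
        (dimension, 1),
        order=1,                 # spline interpolation of order 1
        anti_aliasing=False,     # image-style smoothing is not wanted here
    ).squeeze()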
What to do?
IMO, you have several options.
If what you feed into your ML algorithms ends up being something like a Mel spectrogram, you could simply treat it as an image and stretch it using the skimage or opencv approach. Frequency ranges would be preserved. I have successfully used this kind of approach in this music tempo estimation paper.
Use a better time_stretch library, e.g. rubberband. librosa is great, but its current time scale modification (TSM) algorithm is not state of the art. For a review of TSM algorithms, see for example this article.
Ignore the fact that the frequencies change, and simply add zero samples to the signal on a regular basis or drop samples from it on a regular basis (much like your image interpolation does). If you don't stretch too far it may still work for data augmentation purposes. After all, the word content is not changed if the audio content ends up with slightly higher or lower frequencies.
Resample the signal to another sampling frequency, e.g. 44100 Hz -> 43000 Hz or 44100 Hz -> 46000 Hz, using a library like resampy, and then pretend that it's still 44100 Hz (a minimal sketch follows below). This still changes the frequencies, but at least you get the benefit that resampy does proper filtering of the result, so that you avoid the aforementioned aliasing, which would otherwise occur.
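A minimal sketch of that last option, assuming a 440 Hz test tone stands in for your speech data:

    import numpy as np
    import resampy

    sr = 44100
    t = np.arange(sr) / sr                               # 1 second of audio
    signal = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)

    stretched = resampy.resample(signal, sr_orig=sr, sr_new=46000)
    # 'stretched' has ~46000 samples; if you keep treating it as 44.1 kHz audio,
    # it plays back slower (and pitched down) by a factor of 46000/44100.
    print(len(signal), len(stretched))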

Input representation in FFT for a given list of amplitudes and sampling rate

How do I represent a sound wave (sine wave, 1000 Hz, 3 s, -3 dBFS, 44.1 kHz) as input to an FFT program? The input for the program is a list of amplitudes and the sampling rate.
I mean, how do I turn a sound file (e.g. XYZ.wav) into input for the FFT, where one input argument needs to take a .dat file consisting of amplitudes, another needs to take the sampling rate, and whatever else is necessary?
Typically when you execute an FFT call you supply a one-dimensional array which represents a curve in the time domain. Often this is your audio curve, but an FFT will transform any time-series curve. When you start from an audio file, say a WAV file, you must transform the binary data into this floating-point 1D array. If it is WAV, the file will begin with a 44-byte header which details essential attributes like sample rate, bit depth and endianness; the rest of the WAV file is the payload. Depending on the bit depth you then need to parse a set of bytes, typically 16 bits consuming two bytes, and turn them into an integer by doing some bit shifting. To do that you need to be aware of the notion of endianness (big or little endian), as well as handle the interleaving of a multi-channel signal like stereo. Once you have generated the floating-point array representation, just feed it into your FFT call. For starters, ignore the WAV file and simply synthesize your own sine curve and feed that into an FFT call, just to confirm that the known frequency going in shows up in the frequency-domain representation coming out of the FFT call.
The response back from an FFT call (or DFT) will be a 1D array of complex numbers. There is a simple formula to calculate the magnitude and phase of each frequency in this FFT result set. Be aware of what the Nyquist limit is and how to fold the frequency-domain array on top of itself to double the magnitude while only using half of its elements. Element 0 of this frequency-domain array is your DC offset, and each subsequent element is called a frequency bin, separated from its neighbours by a constant frequency increment that is also calculated by a simple formula. Pipe back if you are interested in what these formulas are.
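Here is a minimal NumPy sketch of both steps, using the values from the question (1000 Hz sine, 3 s, -3 dBFS, 44.1 kHz): synthesize the known sine, feed it to an FFT, and read off the magnitude, phase and frequency of each bin:

    import numpy as np

    fs = 44100                        # sampling rate
    duration = 3.0                    # seconds
    f0 = 1000.0                       # known test frequency
    amp = 10 ** (-3 / 20)             # -3 dBFS as a linear amplitude

    t = np.arange(int(fs * duration)) / fs
    signal = amp * np.sin(2 * np.pi * f0 * t)   # the 1-D time-domain array

    spectrum = np.fft.rfft(signal)    # real-input FFT: complex bins, DC first
    magnitude = np.abs(spectrum)      # sqrt(re^2 + im^2) for each bin
    phase = np.angle(spectrum)        # atan2(im, re) for each bin
    bin_hz = fs / len(signal)         # constant frequency increment per bin

    peak = np.argmax(magnitude)
    print(peak * bin_hz)              # ~1000.0 Hz, confirming the known tone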
Now you can appreciate the people who spend their entire careers pushing the frontiers of the algorithms working behind the curtain of these API calls. Slapping together 30 lines of API calls to perform all of the above is probably possible, however it is far more noble to write the code to perform all of the above yourself by hand, as I know it will open up new horizons and enable you to conquer more subtle questions.
A super interesting aspect of transforming a curve in the time domain into its frequency-domain counterpart by making an FFT call is that you have retained all of the information of your source signal. To prove this point, I highly suggest you take the next step and perform the symmetrical operation: transform the output of your FFT call back into the time domain.
audio curve in time domain --> fft --> freq domain representation --> inverse fft --> back to original audio curve in time domain
This cycle of transformations is powerful, as it is a nice way to confirm that your audio-curve-to-frequency-domain step is working.
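A quick way to convince yourself of this round trip with NumPy:

    import numpy as np

    x = np.random.randn(1024)         # any time-domain curve
    X = np.fft.fft(x)                 # forward FFT to the frequency domain
    x_back = np.fft.ifft(X).real      # inverse FFT back to the time domain
    print(np.allclose(x, x_back))     # True: no information was lost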

Frequency analysis of very short signal in GNU Octave

I have some very short signals from an oscilloscope (50k-200k samples) registered over about 2 ms. They are acoustic signals recording the spark of an ESD (electrostatic discharge).
I'd like to get some frequency data for that signal, in the near-acoustic frequency range (up to about 30 kHz), with as high a time resolution as possible.
I have tried plotting a spectrogram (specgram in Octave) to view the signal, but the output is not really useful. Using specgram( x, N, fs );, where x is my signal with sampling rate fs, I get a plot starting at very high frequencies of about 500 MHz for low values of N. I get better frequency resolution for big N values (like 2^12-2^13), but then the window is too wide and I receive only 2 spectrum values over the whole signal length.
I understand that this may be a limitation of the Fourier transform, which is probably used by the specgram function (actually, I don't know much about signal analysis).
Is there any other way to get frequency (as a function of time) information for that kind of signal? I've read something about wavelets, but when I tried using the dwt function of the signal package, I received this error:
error: 'wfilters' undefined near line 51 column 14
error: called from
dwt at line 51 column 12
Even if this worked, I am not so sure I'd know how to actually use the output of those wavelet functions ...
To get audio-frequency information from such a high sample rate, you will need to obtain a sample vector long enough to contain at least a few whole cycles at audio frequencies, e.g. many tens of milliseconds of contiguous samples, which may or may not be more than your scope can gather. To reasonably process this amount of data, you might low-pass filter the sample data to keep just the audio frequencies, and then resample it to a lower sample rate that is still above twice the filter cut-off frequency. Then you will end up with a much shorter sample vector to feed to an FFT for your audio spectrum analysis.
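The question is about Octave, but the same idea sketched in Python/SciPy (the scope sample rate and the random signal are placeholder assumptions) looks like this:

    import numpy as np
    from scipy import signal

    fs = 100e6                        # assumed scope sample rate (100 MS/s)
    x = np.random.randn(200_000)      # placeholder for the captured ESD signal

    # Low-pass filter and downsample in stages (decimate applies an
    # anti-aliasing filter before each downsampling step):
    y = x
    for q in (10, 10, 10):            # overall factor 1000 -> 100 kS/s
        y = signal.decimate(y, q, zero_phase=True)
    fs_new = fs / 1000                # still covers audio up to 50 kHz

    # A short-window spectrogram of the much shorter vector; as noted above,
    # a capture longer than 2 ms is needed for good audio-frequency resolution.
    f, t, Sxx = signal.spectrogram(y, fs=fs_new, nperseg=64)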

What is an ideal domain for FFT?

I am using the Jtransforms library which seems to be wicked fast for my purpose.
At this point I think I have a pretty good handle on how the FFT works, so now I am wondering if there is any standard domain which is used for audio visualizations like spectrograms.
Thanks to Android's native FFT in 2.3 I had been using bytes as the range, although I am still unclear as to whether it is signed or not. (I know Java doesn't have unsigned bytes, but Google implemented these functions natively and the waveform is 8-bit unsigned PCM.)
However, I am adapting my app to work with mic audio and 2.1 phones. At this point, having the input domain be in the range of bytes, whether [-128, 127] or [0, 255], no longer seems optimal.
I would like the range of my FFT function to be [0,1] so that I can scale it easily.
So should I use a domain of [-1, 1] or [0, 1]?
Essentially, the input domain does not matter. At most, it causes an offset and a change in scaling on your original data, which will be turned into an offset on bin #0 and an overall change in scaling on your frequency-domain results, respectively.
As to limiting your FFT output to [0,1]; that's essentially impossible. In general, the FFT output will be complex, there's no way to manipulate your input data so that the output is restricted to positive real numbers.
If you use DCT instead of FFT your output range will be real. (Read about the difference and decide if DCT is suitable for your application.)
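A quick check of that difference, using SciPy for illustration:

    import numpy as np
    from scipy.fft import dct, rfft

    x = np.random.rand(256)
    print(rfft(x).dtype)   # complex128 -- FFT bins are complex numbers
    print(dct(x).dtype)    # float64   -- DCT coefficients are purely real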
FFT implementations for real-valued input only return about half of the output bins, because the spectrum of a real signal is symmetric and the second half carries no extra information. Therefore the fact that you get both a real and an imaginary part for each output sample doesn't affect the size of the result much compared to the size of the source: the output is roughly n/2 + 1 complex values for n real input samples.
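To see both points, the effect of the input domain on bin #0 and the size of a real-input FFT result, here is a short NumPy check with illustrative values:

    import numpy as np

    n = 256
    x = np.random.rand(n)                    # input in [0, 1]
    X1 = np.fft.rfft(x)
    X2 = np.fft.rfft(2 * x - 1)              # same data remapped to [-1, 1]

    print(len(X1))                           # n/2 + 1 = 129 complex bins
    # Rescaling/offsetting the input only rescales the spectrum and moves DC:
    print(np.allclose(X2[1:], 2 * X1[1:]))   # True
    print(np.isclose(X2[0], 2 * X1[0] - n))  # True: the offset lands in bin 0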
