How to "MUX" audio tracks in Audacity? - signal-processing

I have two mono tracks (A & B) with a 48 kHz sampling rate. Is there a way to generate a new mono track at a 96 kHz sampling rate in which the samples alternate between the original mono tracks, i.e. a1, b1, a2, b2, a3, b3, ...?
Is there a better tool for this kind of process?
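Outside Audacity, the interleaving described above can be sketched in a few lines of Python; this is a minimal sketch, assuming numpy and soundfile are available, with hypothetical file names:
import numpy as np
import soundfile as sf

a, sr_a = sf.read("a.wav")   # mono track A, 48000 Hz (hypothetical file)
b, sr_b = sf.read("b.wav")   # mono track B, 48000 Hz (hypothetical file)
assert sr_a == sr_b == 48000 and a.ndim == 1 and b.ndim == 1

n = min(len(a), len(b))                  # trim to the shorter track
muxed = np.empty(2 * n, dtype=a.dtype)
muxed[0::2] = a[:n]                      # a1, a2, a3, ... on even indices
muxed[1::2] = b[:n]                      # b1, b2, b3, ... on odd indices

sf.write("muxed_96k.wav", muxed, 96000)  # one mono track at twice the rate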

Related

How to fix the parameters for MFCC feature extraction?

I want to extract MFCC features with librosa from a 10-second speech clip recorded at 44.1 kHz.
Should I fix my sr at 8k or 44.1k (given that speech is mostly in the lower band)?
Also, how do I choose the values for hop_length, window_length, n_mels, n_mfcc, and n_fft? How does the calculation work?
I would like to use this audio for an ASR task. Thanks in advance for your expertise!
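Not a recommendation for specific values, but a minimal sketch of how these parameters plug into librosa.feature.mfcc; the file name and numbers below are illustrative, assuming a 25 ms window with a 10 ms hop, a common choice for speech:
import librosa

y, sr = librosa.load("speech.wav", sr=None)   # keep the native 44.1 kHz (hypothetical file)

n_fft = int(0.025 * sr)        # ~25 ms analysis window (also used as the window length)
hop_length = int(0.010 * sr)   # ~10 ms between frames

mfcc = librosa.feature.mfcc(
    y=y, sr=sr,
    n_mfcc=13,                 # number of cepstral coefficients to keep
    n_fft=n_fft,
    hop_length=hop_length,
    n_mels=40,                 # size of the mel filterbank
)
print(mfcc.shape)              # (n_mfcc, number of frames)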

Peak-to-peak amplitude of a sine wave signal as a function of time in LabVIEW

A current source excites a loudspeaker with an AC current of ±5 mA. The voltage across the loudspeaker is measured with NI data acquisition. The loudspeaker's resistance changes with time, and the peak-to-peak amplitude of the voltage signal changes accordingly. How do I define the relationship between the loudspeaker's resistance and the voltage's peak-to-peak amplitude? In other words, how can I plot the signal's peak-to-peak amplitude as a function of time in LabVIEW?
Measure with an appropriate sample rate, at least 10 times higher than the maximum frequency of your signal.
Use a DAQ that supports synchronous sampling.
Measure current and voltage synchronously at a high sample rate. You can use either a shunt or a current transducer for the current measurement.
Sample in blocks. This means: let the DAQ device store e.g. 10k values (at a sample rate of 100 kHz) in its internal memory, and read that buffer every 100 ms. Go to the Example Finder (Help -> Find Examples) and look for "continuous analog measurement" examples.
Calculate the RMS value of both signals for each block and plot it in a graph. If you want to keep it simple, feed both signals into a Chart.
If the current is constant (which should show as a straight line in the graph!), the voltage should rise over time as the inner resistance of the loudspeaker rises.
Note: be aware that with the example numbers above (100 kHz sample rate, block size 10k), calculating the RMS value will produce wrong results when your signal's main frequency is below 10 Hz!
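LabVIEW itself is graphical, but the block-wise RMS idea above can be sketched in Python with a simulated signal; the DAQ buffer read is replaced by slicing an array, and all numbers follow the example above:
import numpy as np

SAMPLE_RATE = 100_000          # 100 kHz, as in the example above
BLOCK_SIZE = 10_000            # 10k samples per block -> one value every 100 ms

# simulated 1 kHz voltage whose amplitude grows over 2 s (stand-in for a rising resistance)
t = np.arange(2 * SAMPLE_RATE) / SAMPLE_RATE
voltage = (1.0 + 0.5 * t) * np.sin(2 * np.pi * 1_000 * t)

rms_per_block = []
for start in range(0, len(voltage), BLOCK_SIZE):
    block = voltage[start:start + BLOCK_SIZE]            # one DAQ buffer's worth
    rms_per_block.append(np.sqrt(np.mean(block ** 2)))   # same per-block RMS applies to the current channel

block_times = np.arange(len(rms_per_block)) * BLOCK_SIZE / SAMPLE_RATE
# for a sine, peak-to-peak ≈ 2*sqrt(2)*RMS; plot rms_per_block against block_times
print(list(zip(block_times, rms_per_block)))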

How can I resample an audio file programmatically in Swift?

I would like to know if it's possible to resample an already written AVAudioFile.
None of the references I found address this particular problem, since:
They propose resampling while the user is recording an AVAudioFile, i.e. while installTap is running. In this approach, AVAudioConverter works on each buffer chunk delivered by the inputNode and appends it to the AVAudioFile. [1] [2]
The point is that I would like to resample my audio file independently of the recording process.
A harder approach would be to upsample the signal by a factor of L and then apply decimation by a factor of M, using vDSP:
Audio on Compact Disc has a sampling rate of 44.1 kHz; to transfer it to a digital medium that uses 48 kHz, method 1 above can be used with L = 160, M = 147 (since 48000/44100 = 160/147). For the reverse conversion, the values of L and M are swapped. Per above, in both cases, the low-pass filter should be set to 22.05 kHz. [3]
The last one obviously seems like an overly hard-coded way to solve it. I hope there's a way to resample it with AVAudioConverter, but it lacks documentation :(
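Not Swift, but a minimal sketch of the quoted L/M (upsample, filter, decimate) approach using SciPy's polyphase resampler, just to illustrate the idea before hand-rolling it with vDSP; the input here is stand-in noise:
import numpy as np
from scipy.signal import resample_poly

sr_in, sr_out = 44_100, 48_000
L, M = 160, 147                        # 48000/44100 reduced to lowest terms

x = np.random.randn(sr_in)             # one second of stand-in audio at 44.1 kHz
y = resample_poly(x, up=L, down=M)     # the required low-pass filtering is handled internally

print(len(x), len(y))                  # ~44100 -> ~48000 samples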

Python: time stretch wave files - comparison between three methods

I'm doing some data augmentation on a speech dataset, and I want to stretch/squeeze each audio file in the time domain.
I found the following three ways to do that, but I'm not sure which one is the best or most efficient:
dimension = int(len(signal) * speed)
res = librosa.effects.time_stretch(signal, speed)                  # method 1: librosa
res = cv2.resize(signal, (1, dimension)).squeeze()                 # method 2: OpenCV
res = skimage.transform.resize(signal, (dimension, 1)).squeeze()   # method 3: scikit-image
However, I found that librosa.effects.time_stretch adds unwanted echo (or something like that) to the signal.
So, my question is: What are the main differences between these three ways? And is there any better way to do that?
librosa.effects.time_stretch(signal, speed) (docs)
In essence, this approach transforms the signal using stft (short time Fourier transform), stretches it using a phase vocoder and uses the inverse stft to reconstruct the time domain signal. Typically, when doing it this way, one introduces a little bit of "phasiness", i.e. a metallic clang, because the phase cannot be reconstructed 100%. That's probably what you've identified as "echo."
Note that while this approach effectively stretches audio in the time domain (i.e., the input is in the time domain as well as the output), the work is actually being done in the frequency domain.
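A minimal sketch of that stft -> phase vocoder -> istft chain, which is roughly what time_stretch does internally; the file name, FFT size, and hop length below are illustrative:
import librosa

y, sr = librosa.load("speech.wav")            # hypothetical input file

rate = 1.5                                    # >1 speeds up, <1 slows down
n_fft, hop = 2048, 512

D = librosa.stft(y, n_fft=n_fft, hop_length=hop)              # to the frequency domain
D_fast = librosa.phase_vocoder(D, rate=rate, hop_length=hop)  # stretch the frames
y_fast = librosa.istft(D_fast, hop_length=hop)                # back to the time domain

print(len(y), len(y_fast))                    # output is roughly len(y) / rate samples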
cv2.resize(signal, (1, dimension)).squeeze() (docs)
All this approach does is interpolate the given signal using bilinear interpolation. This is suitable for images, but strikes me as unsuitable for audio signals. Have you listened to the result? Does it sound at all like the original signal, only faster/slower? I would assume not only the tempo changes, but also the frequency, and perhaps there are other effects.
skimage.transform.resize(signal, (dimension, 1)).squeeze() (docs)
Again, this is meant for images, not sound. In addition to the interpolation (spline interpolation with order 1 by default), this function also performs anti-aliasing for images. Note that this has nothing to do with avoiding audio aliasing (Nyquist/aliasing), so you should probably turn it off by passing anti_aliasing=False. Again, I would assume that the results may not be exactly what you want (changed frequencies, other artifacts).
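If you do stay with the scikit-image route, this is the small change suggested above, sketched with stand-in data (the variable names follow the question's snippet):
import numpy as np
import skimage.transform

signal = np.random.randn(16_000)      # stand-in for your audio signal
speed = 1.2
dimension = int(len(signal) * speed)

res = skimage.transform.resize(
    signal.reshape(-1, 1),            # resize expects an image-like array
    (dimension, 1),
    order=1,                          # spline interpolation of order 1, as by default
    anti_aliasing=False,              # disable the image-oriented smoothing
).squeeze()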
What to do?
IMO, you have several options.
If what you feed into your ML algorithms ends up being something like a mel spectrogram, you could simply treat it as an image and stretch it using the skimage or opencv approach. Frequency ranges would be preserved. I have successfully used this kind of approach in this music tempo estimation paper.
Use a better time_stretch library, e.g. rubberband. librosa is great, but its current time scale modification (TSM) algorithm is not state of the art. For a review of TSM algorithms, see for example this article.
Ignore the fact that the frequencies change and simply insert zero samples into the signal, or drop samples from it, on a regular basis (much like your image interpolation does). If you don't stretch too far, it may still work for data augmentation purposes. After all, the word content does not change just because the audio ends up with slightly higher or lower frequencies.
Resample the signal to another sampling frequency, e.g. 44100 Hz -> 43000 Hz or 44100 Hz -> 46000 Hz, using a library like resampy, and then pretend that it's still 44100 Hz. This still changes the frequencies, but at least you get the benefit that resampy properly filters the result, so you avoid the aforementioned aliasing that would otherwise occur.
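A minimal sketch of that last option, assuming resampy and soundfile are available (file names and rates are illustrative):
import librosa
import resampy
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=44_100)     # hypothetical input file

sr_fake = 46_000                                  # pretend-target rate
y_stretched = resampy.resample(y, sr, sr_fake)    # proper band-limited resampling

# write it back claiming the ORIGINAL rate: the clip now plays ~4% slower and
# slightly lower in pitch, but without aliasing artifacts
sf.write("speech_stretched.wav", y_stretched, sr)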

Keyword spotter doesn't work well with narrowband speech signal. How to solve it?

Here's what I have:
An acoustic model (CMU Sphinx) to be used in a keyword spotter. It is trained for speech sampled at 16 kHz and performs well. It doesn't perform well when presented with speech sampled at 8 kHz, or with speech that has a maximum bandwidth of 4 kHz at a 16 kHz sample rate.
A microphone which only delivers a narrowband signal. The bandwidth of the signal is at most 4 kHz. I can set the sample rate (audio driver API) to 16 kHz, but the bandwidth stays the same since the underlying hardware samples at 8 kHz. I can't change that!
Here's the result:
The keyword spotter fails when it is presented with a speech signal (sample rate 16 kHz) which only has a bandwidth of 4 kHz.
Here's my question:
Would it be reasonable to expect the keyword spotter to work if I "fake it" by bandwidth-extending the narrowband signal before sending it to the keyword spotter?
What is the simplest bandwidth extender? (I'm looking for something that can be implemented quickly.)
Thanks
There are 8 kHz models; you should use them instead.
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-ptm-8khz-5.2.tar.gz
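If the capture has to stay at 16 kHz on the driver side, the stream can simply be decimated to 8 kHz before it reaches the 8 kHz model, since there is no content above 4 kHz anyway. A minimal sketch with SciPy (file names are illustrative; this is not CMU Sphinx API code):
import soundfile as sf
from scipy.signal import resample_poly

y, sr = sf.read("mic_16k.wav")           # 16 kHz stream captured from the driver
assert sr == 16_000

y_8k = resample_poly(y, up=1, down=2)    # 16 kHz -> 8 kHz; content is below 4 kHz
sf.write("mic_8k.wav", y_8k, 8_000)      # feed this to the 8 kHz CMU Sphinx model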
