AudioKit: Noise gate - iOS

I'm trying to implement a simple noise gate with AudioKit that only lets audio through when the amplitude is above a certain threshold.
I believe this should be simple: I just need to use AKAmplitudeTracker and set the output to zero, but I can't work out how to do the latter part.
Source for AKAmplitudeTracker

If I understand your question, you don't know how to set the output to zero. I'll go ahead and write the most obvious answer first: send the output through a booster,
...tracker stuff...
let booster = AKBooster(tracker, gain: 0)
AudioKit.output = booster
and then, wherever you poll the tracker, set:
if tracker.amplitude > threshold {
    booster.gain = 1
} else {
    booster.gain = 0
}
Mind you, this will be very primitive and you'll have a better noise gate doing things at the DSP level, but this may be good enough for a proof of concept or test.
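For reference, here is a minimal sketch of how that polling could be wired up with a Timer. This is my own illustration rather than part of the original answer: it assumes the AudioKit 4 API (AKMicrophone, AKAmplitudeTracker, AKBooster), and the threshold and polling interval are arbitrary example values.

import AudioKit
import Foundation

// Minimal noise-gate sketch: the tracker measures the mic level and the
// booster mutes or passes the signal depending on that level.
let mic = AKMicrophone()
let tracker = AKAmplitudeTracker(mic)
let booster = AKBooster(tracker, gain: 0)   // start with the gate closed
AudioKit.output = booster
try? AudioKit.start()                       // plain AudioKit.start() on older 4.x versions

let threshold = 0.05                        // arbitrary example threshold

// Poll the tracker periodically and open or close the gate.
Timer.scheduledTimer(withTimeInterval: 0.01, repeats: true) { _ in
    booster.gain = tracker.amplitude > threshold ? 1 : 0
}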


How to use Apple's Accelerate framework in Swift in order to compute the FFT of a real signal?

How am I supposed to use the Accelerate framework to compute the FFT of a real signal in Swift on iOS?
Available examples on the web
Apple’s Accelerate framework seems to provide functions to compute the FFT of a signal efficiently.
Unfortunately, most of the examples available on the Internet, like Swift-FFT-Example and TempiFFT, crash if tested extensively and call the Objective-C API.
The Apple documentation answers many questions, but also leads to some others (Is this piece mandatory? Why do I need this call to convert?).
Threads on Stack Overflow
There are a few threads addressing various aspects of the FFT with concrete examples, notably FFT Using Accelerate In Swift, DFT result in Swift is different than that of MATLAB and FFT Calculating incorrectly - Swift.
None of them directly addresses the question "What is the proper way to do it, starting from zero?"
It took me one day to figure out how to do it properly, so I hope this thread can give a clear explanation of how you are supposed to use Apple's FFT, show the pitfalls to avoid, and help developers save precious hours of their time.
TL;DR: If you need a working implementation to copy-paste, here is a gist.
What is FFT?
The Fast Fourier Transform is an algorithm that takes a signal in the time domain -- a collection of measurements taken at a regular, usually small, interval of time -- and turns it into a signal expressed in the frequency domain (a collection of frequencies).
The ability to express the signal along time is lost by the transformation (the transformation is invertible, which means no information is lost by computing the FFT, and you can apply an IFFT to get the original signal back), but we gain the ability to distinguish between the frequencies the signal contains. This is what is typically used to display the spectrograms of the music you are listening to on various hardware and in YouTube videos.
The FFT works with complex numbers. If you don't know what they are, let's just pretend each one is a combination of a radius and an angle: there is one complex number per point on a 2D plane. Real numbers (your usual floats) can be seen as a position on a line (negative on the left, positive on the right).
NB: FFT(FFT(FFT(FFT(X)))) = X (up to a constant depending on your FFT implementation).
How to compute the FFT of a real signal.
Usually you want to compute the FFT of a small window of an audio signal. For the sake of the example, we will take a small window of 1024 samples. You should also prefer a power of two; otherwise things get a little bit more difficult.
var signal: [Float] // Array of length 1024
First, you need to initialize some constants for the computation.
// The length of the input
let length = vDSP_Length(signal.count)
// The log2 of two times the length of the input, rounded up.
// Do not forget this factor 2.
let log2n = vDSP_Length(ceil(log2(Float(length * 2))))
// Create the FFT setup, which allows computing the FFT of complex vectors
// of length up to 2^log2n.
let fftSetup = vDSP.FFT(log2n: log2n, radix: .radix2, ofType: DSPSplitComplex.self)!
Following Apple's documentation, we first need to create a complex array that will be our input.
Don't get misled by the tutorial: what you usually want is to copy your signal into the real part of the input and keep the imaginary part null.
// Input / Output arrays
var forwardInputReal = [Float](signal) // Copy the signal here
var forwardInputImag = [Float](repeating: 0, count: Int(length))
var forwardOutputReal = [Float](repeating: 0, count: Int(length))
var forwardOutputImag = [Float](repeating: 0, count: Int(length))
Be careful: the FFT function does not allow using the same DSPSplitComplex as both input and output. If you experience crashes, this may be the cause. This is why we define both an input and an output.
Now, we have to be careful and "lock" the pointers to these four arrays, as shown in the documentation example. If you simply use &forwardInputReal as an argument to your DSPSplitComplex, the pointer may become invalidated on the following line and you will likely experience sporadic crashes of your app.
forwardInputReal.withUnsafeMutableBufferPointer { forwardInputRealPtr in
    forwardInputImag.withUnsafeMutableBufferPointer { forwardInputImagPtr in
        forwardOutputReal.withUnsafeMutableBufferPointer { forwardOutputRealPtr in
            forwardOutputImag.withUnsafeMutableBufferPointer { forwardOutputImagPtr in
                // Input
                let forwardInput = DSPSplitComplex(realp: forwardInputRealPtr.baseAddress!, imagp: forwardInputImagPtr.baseAddress!)
                // Output
                var forwardOutput = DSPSplitComplex(realp: forwardOutputRealPtr.baseAddress!, imagp: forwardOutputImagPtr.baseAddress!)
                // FFT call goes here
            }
        }
    }
}
Now, the final line: the call to your FFT, which goes inside the innermost closure (in place of the // FFT call goes here comment):
fftSetup.forward(input: forwardInput, output: &forwardOutput)
The result of your FFT is now available in forwardOutputReal and forwardOutputImag.
If you want only the amplitude of each frequency, and you don't care about the real and imaginary parts, you can declare, alongside the input and output, an additional array:
var magnitudes = [Float](repeating: 0, count: Int(length))
and, right after your FFT call, compute the amplitude of each "bin" with:
vDSP.absolute(forwardOutput, result: &magnitudes)
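As a small optional follow-up (my own addition, not part of the original answer), you can then locate the loudest bin with the same vDSP API. Converting a bin index to Hertz would additionally require the sample rate of your signal and the transform length, which depend on how you sized the setup above.

// Assumes `import Accelerate`, as in the rest of the walkthrough.
// Find the loudest bin in the magnitude spectrum.
let (loudestBin, loudestMagnitude) = vDSP.indexOfMaximum(magnitudes)
print("Loudest bin: \(loudestBin), magnitude: \(loudestMagnitude)")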

Python: time stretch wave files - comparison between three methods

I'm doing some data augmentation on a speech dataset, and I want to stretch/squeeze each audio file in the time domain.
I found the following three ways to do that, but I'm not sure which one is the best or most optimized way:
dimension = int(len(signal) * speed)
res = librosa.effects.time_stretch(signal, speed)
res = cv2.resize(signal, (1, dimension)).squeeze()
res = skimage.transform.resize(signal, (dimension, 1)).squeeze()
However, I found that librosa.effects.time_stretch adds unwanted echo (or something like that) to the signal.
So, my question is: What are the main differences between these three ways? And is there any better way to do that?
librosa.effects.time_stretch(signal, speed) (docs)
In essence, this approach transforms the signal using stft (short time Fourier transform), stretches it using a phase vocoder and uses the inverse stft to reconstruct the time domain signal. Typically, when doing it this way, one introduces a little bit of "phasiness", i.e. a metallic clang, because the phase cannot be reconstructed 100%. That's probably what you've identified as "echo."
Note that while this approach effectively stretches audio in the time domain (i.e., the input is in the time domain as well as the output), the work is actually being done in the frequency domain.
cv2.resize(signal, (1, dimension)).squeeze() (docs)
All this approach does is interpolate the given signal using bilinear interpolation. It is suitable for images, but strikes me as unsuitable for audio signals. Have you listened to the result? Does it sound at all like the original signal, only faster/slower? I would assume not only the tempo changes, but also the frequencies, and perhaps there are other effects.
skimage.transform.resize(signal, (dimension, 1)).squeeze() (docs)
Again, this is meant for images, not sound. In addition to the interpolation (spline interpolation of order 1 by default), this function also does anti-aliasing for images. Note that this has nothing to do with avoiding audio aliasing effects (Nyquist/aliasing), so you should probably turn that off by passing anti_aliasing=False. Again, I would assume that the results may not be exactly what you want (changed frequencies, other artifacts).
What to do?
IMO, you have several options.
If what you feed into your ML algorithms ends up being something like a Mel spectrogram, you could simply treat it as an image and stretch it using the skimage or opencv approach. Frequency ranges would be preserved. I have successfully used this kind of approach in this music tempo estimation paper.
Use a better time_stretch library, e.g. rubberband. librosa is great, but its current time scale modification (TSM) algorithm is not state of the art. For a review of TSM algorithms, see for example this article.
Ignore the fact that the frequency changes and simply add zero samples on a regular basis to the signal, or drop samples on a regular basis from the signal (much like your image interpolation does). If you don't stretch too far, it may still work for data augmentation purposes. After all, the word content is not changed if the audio content has slightly higher or lower frequencies.
Resample the signal to another sampling frequency, e.g. 44100 Hz -> 43000 Hz or 44100 Hz -> 46000 Hz, using a library like resampy, and then pretend that it's still 44100 Hz. This still changes the frequencies, but at least you get the benefit that resampy properly filters the result, so that you avoid the aforementioned aliasing, which would otherwise occur.

AudioKit - Issues getting specific frequencies (above 20kHz)

I got a small problem using the AudioKit framework:
----> I can't get the AudioKit framework to pick up frequencies outside a specific range (frequencies below 100 Hz & above 20 kHz).
Edit: I've tested some frequency-tracker apps on my iOS device, combined with some online tone-generator tools, to check whether my iPhone microphone is able to pick up frequencies above 20000 Hz... And it is.
But using the default frequency-tracker code snippets, frequencies above 20000 Hz cannot be picked up by AudioKit.
mic = AKMicrophone()
tracker = AKFrequencyTracker(mic)
silence = AKBooster(tracker, gain: 0)
AKSettings.audioInputEnabled = true
AudioKit.output = silence
AudioKit.start()
print(tracker.frequency)
---> Is this a limitation of the AudioKit framework, a limitation of the default iOS device settings, or is there maybe another way to get at those frequencies?
This is a limitation in AudioKit. The frequency tracker is based on a Soundpipe opcode called ptrack, which is in turn based on Csound's ptrack. You could modify the parameters to try to give you more precision in the area of your concern, but if you want better results both low and high, that will be very processor intensive. Perhaps using two "banded" trackers for low and high would be better, as sketched below. There are always choices to be made, and AudioKit's frequency tracker is geared towards typical audible ranges.
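For illustration only (this sketch is mine, not the original answerer's), a "banded" tracker for the high end could look roughly like the following with AudioKit 4 class names; the cutoff frequency is an arbitrary assumption, and a second chain with a low-pass filter could be built the same way for the low band. Note that ptrack's inherent limits near and above 20 kHz still apply.

import AudioKit

// Rough sketch of a "banded" tracker: isolate the high band first,
// then track only that band.
let mic = AKMicrophone()

// Keep only content above ~5 kHz before tracking (assumed cutoff).
let highBand = AKHighPassFilter(mic, cutoffFrequency: 5_000)
let highTracker = AKFrequencyTracker(highBand)

// Mute the monitoring path, as in the original snippet.
let silence = AKBooster(highTracker, gain: 0)

AKSettings.audioInputEnabled = true
AudioKit.output = silence
try? AudioKit.start()   // plain AudioKit.start() on older AudioKit 4 versions

// Poll as before; this tracker only ever "sees" the high band.
print(highTracker.frequency)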

My Double DQN algorithm for 2048 game never learns

I am trying to make a Double DQN algorithm learn to play the 2048 game. My implementation is available on GitHub if you want to check the code. (https://github.com/codetiger/MachineLearning-2048)
My code stops learning after a basic level: it's not able to achieve more than the 256 tile. Some of my guesses are below.
I am using a random player to train the code. I guess RL algorithms learn this way: they try all possible moves and learn from failures. My wild guess is that, since I am training it using random moves, the code learns very little.
The maximum number of episodes I tried is 4000. How do I calculate the optimal number of episodes?
There is a problem with my code.
I am not able to identify the issue with my approach and would like to get some views on this.
My pseudocode is here:
for e in range(EPISODES):
    gameEnv.Reset()
    state = gameEnv.GetFlatGrid()
    state = np.reshape(state, [1, state_size])
    reward = 0.0
    prevMaxNumber = 0
    while True:
        action = agent.get_action(state)
        (moveScore, isValid) = gameEnv.Move(action + 1)
        next_state = gameEnv.GetFlatGrid()
        next_state = np.reshape(next_state, [1, state_size])
        if isValid:
            # Reward for step score
            reward += moveScore
            # Reward for new max number
            if gameEnv.GetMaxNumber() > prevMaxNumber:
                reward += 10.0
                prevMaxNumber = gameEnv.GetMaxNumber()
            gameEnv.AddNewNumber()
        else:
            reward = -50.0
        done = gameEnv.CheckGameOver()
        if done:
            reward = -100.0
        agent.append_sample(state, action, reward, next_state, done)
        agent.train_model()
        state = next_state
        if done:
            agent.update_target_model()
            break
My two cents,
RL algorithms don't learn randomly. I suggest you take a look at 'Sutton and Barto (Second Edition)' for a detailed description of the wide variety of algorithms. Having said that, I don't think the Git repository you linked does what you expect (Why do you have an ES module? Are you training the network using evolutionary algorithms?). You might want to start off with simpler and more stable implementations like this: https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html.
2048 is probably a difficult game for a simple Q-network to learn because it requires long-term planning. It's much easier for a DQN to learn to play control/instant-action games like Pong or Breakout, but it doesn't do well on games that need some amount of planning (Pac-Man, for example).

Frequency Modulation

I'm trying to implement frequency modulation. But could anyone explain what happens with a non-sinusoidal (and maybe non-periodic) carrier? Could we assume some FM(A(t), B(t)) function which modulates a carrier given by an abstract (non-sinusoidal) function A(t) with a signal given by an abstract function B(t)? Could anyone write or explain something about that? What would the formula look like in that most general case? I want some kind of recursive formula in terms like "A(t-1)", or else some explanation of why that is not possible.
Frequency modulation (FM) proposes some kind of "varying playback speed", but it seems to do something wrong, so I am asking again: how?
Well, for a non-sinusoidal but periodic carrier you could either use a look-up table, as suggested by Paul R's answer, or you could break the periodic carrier up into its Fourier modes, create an individual oscillator for each mode, modulate each one, and sum them up.
In the case of a non-periodic carrier, the phase or frequency is not defined in general. Just think of noise: how should that be modulated? You would first need to define what frequency modulation means for arbitrary signals.
If you are using a look-up table for your waveform generation then it's pretty easy to modify the standard phase accumulator synthesis method to add an FM input. See e.g. this answer.
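To make that concrete, here is a minimal sketch (my own illustration, not taken from the linked answer) of a phase-accumulator wavetable oscillator with an FM input. The sample rate, table size, and the sine table itself are arbitrary assumptions; any single-cycle waveform could be loaded into the table instead.

import Foundation

// Minimal phase-accumulator wavetable oscillator with an FM input.
// The sample rate, table size, and sine table below are assumptions.
struct WavetableFMOscillator {
    let sampleRate: Double = 44_100
    let table: [Double]          // one cycle of the carrier waveform A(.)
    var phase: Double = 0        // current position in [0, 1)

    init(tableSize: Int = 1_024) {
        // Any single-cycle waveform works here; a sine is used as an example.
        table = (0..<tableSize).map { sin(2.0 * .pi * Double($0) / Double(tableSize)) }
    }

    // baseFrequency: carrier frequency in Hz.
    // fmInput: instantaneous frequency deviation in Hz (the modulator B(t)).
    mutating func nextSample(baseFrequency: Double, fmInput: Double) -> Double {
        // Look up the carrier at the current phase (nearest-neighbour lookup).
        let index = Int(phase * Double(table.count)) % table.count
        let sample = table[index]

        // Advance the phase by the instantaneous frequency: base + modulation.
        let instantaneousFrequency = baseFrequency + fmInput
        phase += instantaneousFrequency / sampleRate
        phase -= phase.rounded(.down)   // wrap back into [0, 1)
        return sample
    }
}

// Example: a 440 Hz carrier modulated by a 5 Hz vibrato of ±20 Hz.
var osc = WavetableFMOscillator()
let samples = (0..<44_100).map { n -> Double in
    let t = Double(n) / 44_100
    let modulator = 20.0 * sin(2.0 * .pi * 5.0 * t)   // B(t)
    return osc.nextSample(baseFrequency: 440, fmInput: modulator)
}

The key point, matching the answer above: the phase accumulator advances by the instantaneous frequency each sample, so modulation is applied to the phase increment rather than to the output samples.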
