Recording, modifying and playing audio on iOS

EDIT: In the end I did exactly what I describe below, using AVRecorder to record the speech and OpenAL for the pitch shift and playback. It worked out quite well.
I got a question regarding recording, modifying and playing back audio. I asked a similar question before ( Record, modify pitch and play back audio in real time on iOS ) but I now have more information and could do with some further advice please.
So firstly this is what I am trying to do (on a separate thread to the main thread):
monitor the iPhone mic
check for sound greater than a certain volume
if above the threshold, start recording (e.g. the person starts talking)
continue to record until the volume drops below the threshold (e.g. the person stops talking)
modify the pitch of the recorded sound
play back the sound
I was thinking of using AVRecorder to monitor and record the sound; there is a good tutorial here: http://mobileorchard.com/tutorial-detecting-when-a-user-blows-into-the-mic/
and I was thinking of using OpenAL to modify the pitch of the recorded audio.
So my question is: is my thinking in the list of points above correct, am I missing something, or is there a better/easier way to do it? Can I avoid mixing audio libraries and just use AVFoundation to change the pitch too?

You can either use AVRecorder or something lower-level like the RemoteIO audio unit.
The concept of 'volume' is pretty vague. You might want to look at the difference between calculating peak and RMS values, and understanding how to integrate an RMS value over a given time (say 300ms which is what a VU meter uses).
Basically you sum the squares of the sample values. The RMS is the square root of the mean, so you convert to dBFS with 20 * log10f(sqrtf(sum / num_samples)); equivalently, you can skip the square root and compute 10 * log10f(sum / num_samples) in one step.
You'll need to do a lot of adjusting of integration times and thresholds to get it to behave the way you want.
For pitch shifting, I think OpenAL will do the trick; the technique behind it is called band-limited interpolation - https://ccrma.stanford.edu/~jos/resample/Theory_Ideal_Bandlimited_Interpolation.html
This example shows an RMS calculation as a running average. The circular buffer maintains a history of squares, which eliminates the need to re-sum all the squares on every call. I haven't run it, so treat it as pseudo code ;)
Example:
#include <cmath>     // log10f
#include <strings.h> // bzero

class VUMeter
{
protected:
    // samples per second
    float _sampleRate;
    // the integration time in seconds (a VU meter uses 300ms)
    float _integrationTime;
    // these maintain a circular buffer which contains
    // the 'squares' of the audio samples
    int _integrationBufferLength;
    float *_integrationBuffer;
    float *_integrationBufferEnd;
    float *_cursor;
    // this is a sort of accumulator to make a running
    // average more efficient
    float _sum;

public:
    VUMeter()
        : _sampleRate(48000.0f)
        , _integrationTime(0.3f)
        , _sum(0.0f)
    {
        // create a buffer of values to be integrated
        // e.g. 300ms @ 48kHz is 14400 samples
        _integrationBufferLength = (int) (_integrationTime * _sampleRate);
        _integrationBuffer = new float[_integrationBufferLength + 1];
        bzero(_integrationBuffer, (_integrationBufferLength + 1) * sizeof(float));

        // set the pointers for our circular buffer
        _integrationBufferEnd = _integrationBuffer + _integrationBufferLength;
        _cursor = _integrationBuffer;
    }

    ~VUMeter()
    {
        delete [] _integrationBuffer;
    }

    float getRms(float *audio, int samples)
    {
        // process the samples
        // this part accumulates the 'squares'
        for (int i = 0; i < samples; ++i)
        {
            // get the input sample
            float s = audio[i];

            // remove the oldest value from the sum
            _sum -= *_cursor;

            // calculate the square and write it into the buffer
            float square = s * s;
            *_cursor = square;

            // add it to the sum
            _sum += square;

            // increment the buffer cursor and wrap
            ++_cursor;
            if (_cursor == _integrationBufferEnd)
                _cursor = _integrationBuffer;
        }

        // now calculate the 'root mean square' value in dB
        // (10 * log10 of the mean square equals 20 * log10 of the RMS)
        return 10 * log10f(_sum / _integrationBufferLength);
    }
};

OpenAL resampling changes the pitch and the duration inversely: a sound resampled to a higher pitch will play back for a shorter amount of time, i.e. faster.
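For example, here is a minimal sketch of the OpenAL pitch control, assuming 'source' is an ALuint source you have already generated and attached a buffer to:

#include <OpenAL/al.h>

// AL_PITCH above 1.0 shifts up and shortens playback; below 1.0 shifts down and lengthens it.
static void playOctaveUp(ALuint source)   // 'source' already has a buffer attached
{
    alSourcef(source, AL_PITCH, 2.0f);    // one octave up, plays in half the time
    alSourcePlay(source);
}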

Related

Record PCM audio data every 10 milliseconds without playback

I need to record PCM audio every 10 milliseconds, without playback, in Swift.
I have tried this code, but I can't work out how to stop playback while recording:
RecordAudio Github Repo
And a second question: how can I properly get PCM data out of the circular buffer for an encode/decode process? When I convert the recorded audio data to signed bytes, unsigned bytes or anything else, the converted data sometimes comes out corrupted. What is the best practice for this kind of process?
In the RecordAudio sample code, the audio format is specified as Float (32-bit floats). When doing a float to integer conversion, you have to make sure your scale and offset result in a value in the legal range for the destination type, e.g. check that -1.0 to 1.0 maps to 0 to 255 (unsigned byte), and that out-of-range values are clipped to legal values. Also pay attention to the number of samples you convert, as an Audio Unit callback can vary the frameCount sent (the number of samples returned). You most likely won't get exactly 10 ms in any single RemoteIO callback, but may have to observe a circular buffer filled by multiple callbacks, or a larger buffer that you will have to split.
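As a rough sketch of that clip-then-scale step (the function name and exact mapping are illustrative, not taken from the RecordAudio repo):

#include <algorithm>
#include <cmath>
#include <cstdint>

// Convert 32-bit float samples (-1.0 ... 1.0) to unsigned 8-bit samples (0 ... 255),
// clipping out-of-range input before scaling.
static void floatToUInt8(const float *in, uint8_t *out, int sampleCount)
{
    for (int i = 0; i < sampleCount; ++i)
    {
        float s = std::max(-1.0f, std::min(1.0f, in[i]));            // clip to the legal range
        out[i] = (uint8_t) std::lrintf((s * 0.5f + 0.5f) * 255.0f);  // -1.0 -> 0, +1.0 -> 255
    }
}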
When RemoteIO is running in play-and-record mode, you can usually silence playback by zeroing the bufferList buffers (after copying, analyzing, or otherwise using the data in the buffers) before returning from the Audio Unit callback.
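A sketch of that zeroing step, assuming 'ioData' is the AudioBufferList handed to your render callback:

#include <AudioToolbox/AudioToolbox.h>
#include <cstring>

// Silence the output of a play-and-record RemoteIO unit: call this at the end of the
// render callback, after the captured samples have been copied elsewhere.
static void silenceOutput(AudioBufferList *ioData)
{
    for (UInt32 b = 0; b < ioData->mNumberBuffers; ++b)
        memset(ioData->mBuffers[b].mData, 0, ioData->mBuffers[b].mDataByteSize);
}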

Measure (frequency-weighted) sound levels with AudioKit

I am trying to implement an SLM app for iOS using AudioKit. Therefore I need to determine different loudness values to a) display the current loudness (averaged over a second) and b) do further calculations (e.g. to calculate the "Equivalent Continuous Sound Level" over a longer time span). The app should be able to track frequency-weighted decibel values like dB(A) and dB(C).
I do understand that some of the issues I'm facing are related to my general lack of understanding in the field of signal and audio processing. My question is how one would approach this task with AudioKit. I will describe my current process and would like to get some input:
Create an instance of AKMicrophone and an AKFrequencyTracker on this microphone
Create a Timer instance with some interval (currently 1/48_000.0)
Inside the timer: retrieve the amplitude and frequency. Calculate a decibel value from the amplitude with 20 * log10(amplitude) + calibrationOffset (calibration offset will be determined per device model with the help of a professional SLM). Calculate offsets for the retrieved frequency according to frequency-weighting (A and C) and apply these to the initial dB value. Store dB, dB(A) and dB(C) values in an array.
Calculate the average of the arrays over the given timeframe (1 second).
I read somewhere else that using a Timer this is not the best approach. What else is there that I could use for the "sampling"? What exactly is the frequency of AKFrequencyTracker? Will this frequency be sufficient to determine dB(A) and dB(C) values or will I need an AKFFTTap for this? How are values retrieved from the AKFrequencyTracker averaged, i.e. what time frame is used for the RMS?
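For reference, the IEC 61672 A- and C-weighting curves can be evaluated directly at a single frequency. Here is a minimal sketch, assuming you approximate the signal by the one frequency the tracker reports (which is cruder than per-band weighting from an FFT):

#include <cmath>

// IEC 61672 A- and C-weighting offsets in dB, evaluated at a single frequency f (Hz).
// Applying the offset of one "dominant" frequency to a broadband level is only an
// approximation; an FFT with per-band weighting is closer to what a real SLM does.
static double aWeightingDb(double f)
{
    double f2 = f * f;
    double ra = (148693636.0 * f2 * f2) /                          // 12194^2
                ((f2 + 424.36) *                                   // 20.6^2
                 std::sqrt((f2 + 11599.29) * (f2 + 544496.41)) *   // 107.7^2, 737.9^2
                 (f2 + 148693636.0));
    return 20.0 * std::log10(ra) + 2.00;   // normalised to 0 dB at 1 kHz
}

static double cWeightingDb(double f)
{
    double f2 = f * f;
    double rc = (148693636.0 * f2) / ((f2 + 424.36) * (f2 + 148693636.0));
    return 20.0 * std::log10(rc) + 0.06;
}

// usage: dBA = dB + aWeightingDb(trackedFrequency); dBC = dB + cWeightingDb(trackedFrequency)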
Possibly related questions: Get dB(a) level from AudioKit in swift, AudioKit FFT conversion to dB?

No sound issue kills audio for all apps on the device

We are losing sound (no sound) in our app, but this somehow causes all other apps to lose sound as well. I don't know how it would even be possible for us to block the sound of an external app like the Apple Music app.
We are dumping the contents of our AVAudioSession session and there are no differences that we can see between when sound is working and not. We have verified that the route output is still the iPhone's speaker, even when we lost sound.
This is happening on the iPhone 6s and 6s Plus with the speaker. We can "fix" the audio by changing the output route, for example by plugging in and unplugging headphones.
How is it possible to impact the ability to play sound of other apps, which may help troubleshoot what is happening?
We tracked down the source of the problem to having bad data in the audio buffer that was sent to Core Audio. Specifically, one of the audio processing steps output data that was NaN (Not a Number), instead of a float in the valid range of +/- 1.0.
It appears that on some devices, if the data contains NaN, it kills the audio of the whole device.
We worked around it by looping through the audio data, checking for NaN values and converting them to 0.0 instead. Note that checking whether a float is NaN looks like a strange check (or it seemed strange to me): NaN is not equal to anything, including itself.
Some pseudocode to work around the problem until we get new libraries that have a proper fix:
float *interleavedAudio;        // pointer to a buffer of the audio data
unsigned int numberOfSamples;   // number of left/right samples in the audio buffer
unsigned int numberOfLeftRightSamples = numberOfSamples * 2; // number of float values in the audio buffer

// loop through each float in the audio data
for (unsigned int i = 0; i < numberOfLeftRightSamples; i++)
{
    float *sample = interleavedAudio + i;

    // NaN is never equal to anything, including itself
    if (*sample != *sample)
    {
        // This sample is NaN - force it to 0.0 so it doesn't corrupt the audio
        *sample = 0.0f;
    }
}

Accurate timer using AudioUnit

I'm trying to make an accurate timer to analyze an input. I'd like to be able to measure 1% deviation in signals of ~200ms.
My understanding is that using an AudioUnit I should be able to get <1ms resolution.
I tried implementing the code from Stefan Popp's example.
After updating a few things to get it to work on Xcode 6.3, I have the example working, however:
While I do eventually want to capture audio, I thought there should be some way to get a notification, like NSTimer, so I tried AudioUnitAddRenderNotify, but it does exactly what it says it should, i.e. it's tied to the render, not just an arbitrary timer. Is there some way to get a callback triggered without having to record or play?
When I examine mSampleTime, I find that the interval between slices does match the inNumberFrames - 512 - which works out to 11.6ms. I see the same interval for both record and play. I need more resolution than that.
I tried playing with kAudioSessionProperty_PreferredHardwareIOBufferDuration but all the examples I could find use the deprecated AudioSessions, so I tried to convert to AudioUnits:
Float32 preferredBufferSize = .001; // in seconds
status = AudioUnitSetProperty(audioUnit, kAudioSessionProperty_PreferredHardwareIOBufferDuration, kAudioUnitScope_Output, kOutputBus, &preferredBufferSize, sizeof(preferredBufferSize));
But I get OSStatus -10879, kAudioUnitErr_InvalidProperty.
Then I tried kAudioUnitProperty_MaximumFramesPerSlice with values of 128 and 256, but inNumberFrames is always 512.
UInt32 maxFrames = 128;
status = AudioUnitSetProperty(audioUnit, kAudioUnitProperty_MaximumFramesPerSlice, kAudioUnitScope_Global, 0, &maxFrames, sizeof(maxFrames));
[EDIT]
I am trying to compare the timing of an input (user's choice of MIDI or microphone) to when it should be. Specifically, is the instrument being played before or after the beat/metronome and by how much? This is for musicians, not a game, so precision is expected.
[EDIT]
The answers seem reactive to events, i.e. they let me see precisely when something happened, but I don't see how to make something happen accurately. My fault for not being clear. My app needs to be the metronome as well - synchronize playing a click on the beat and flashing a dot on the beat - so that I can then analyze the user's actions to compare the timing. But if I can't play the beat accurately, the rest falls apart. Maybe I'm supposed to record audio - even if I don't want it - just to get inTimeStamp from the callback?
[EDIT]
Currently my metronome is:
- (void) setupAudio
{
    AVAudioPlayer *audioPlayer;
    NSString *path = [NSString stringWithFormat:@"%@/click.mp3", [[NSBundle mainBundle] resourcePath]];
    NSURL *soundUrl = [NSURL fileURLWithPath:path];

    audioPlayer = [[AVAudioPlayer alloc] initWithContentsOfURL:soundUrl error:nil];
    [audioPlayer prepareToPlay];

    CADisplayLink *syncTimer;
    syncTimer = [CADisplayLink displayLinkWithTarget:self selector:@selector(syncFired:)];
    syncTimer.frameInterval = 30;
    [syncTimer addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSDefaultRunLoopMode];
}

- (void)syncFired:(CADisplayLink *)displayLink
{
    [audioPlayer play];
}
You should be using a circular buffer, and performing your analysis on the signal in chunks that match your desired frame count, on your own timer. To do this you set up a render callback, then feed your circular buffer the input audio inside the callback. Then you set up your own timer which pulls from the tail of the buffer and does your analysis. This way you could be feeding the buffer 1024 frames every ~0.023 seconds, while your analysis timer fires maybe every 0.000725 seconds and analyzes 32 samples. Here is a related question about circular buffers.
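A minimal sketch of that producer/consumer split, assuming a single render callback writing and a single analysis timer reading (the SampleRing class is purely illustrative; something battle-tested like TPCircularBuffer is a better choice in practice):

#include <atomic>
#include <vector>

// Single-producer/single-consumer ring buffer: the render callback pushes,
// the analysis timer pops. No overrun protection - size it generously.
class SampleRing
{
public:
    explicit SampleRing(size_t capacity) : _data(capacity), _head(0), _tail(0) {}

    // called from the Audio Unit render callback (producer)
    void push(const float *samples, size_t count)
    {
        size_t head = _head.load(std::memory_order_relaxed);
        for (size_t i = 0; i < count; ++i)
            _data[(head + i) % _data.size()] = samples[i];
        _head.store(head + count, std::memory_order_release);
    }

    // called from the analysis timer (consumer); returns the number of samples actually read
    size_t pop(float *out, size_t count)
    {
        size_t head = _head.load(std::memory_order_acquire);
        size_t tail = _tail.load(std::memory_order_relaxed);
        size_t available = head - tail;
        size_t n = available < count ? available : count;
        for (size_t i = 0; i < n; ++i)
            out[i] = _data[(tail + i) % _data.size()];
        _tail.store(tail + n, std::memory_order_release);
        return n;
    }

private:
    std::vector<float> _data;
    std::atomic<size_t> _head, _tail;
};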
EDIT
To get precision timing using a ring buffer, you could also store the timestamp corresponding to the audio buffer. I use TPCircularBuffer for doing just that. TPCircularBufferPrepareEmptyAudioBufferList, TPCircularBufferProduceAudioBufferList, and TPCircularBufferNextBufferList will copy and retrieve the audio buffer and timestamp to and from a ring buffer. Then when you are doing your analysis, there will be a timestamp corresponding to each buffer, eliminating the need to do all of your work in the render thread, and allowing you to pick and choose your analysis window.
If you are using something like cross-correlation and/or a peak detector to find a matched sample vector within an audio sample buffer (or a ring buffer containing samples), then you should be able to count samples between sharp events to within one sample (1/44100, or about 0.0227 milliseconds, at a 44.1 kHz sample rate), plus or minus some time estimation error. For events more than one Audio Unit buffer apart, you can add up the number of samples within the intervening buffers to get a more precise time interval than just using the (much coarser) buffer timing.
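As a rough illustration of the cross-correlation idea (a brute-force search for clarity; on iOS, vDSP_conv in the Accelerate framework is a faster route):

#include <cstddef>

// Brute-force cross-correlation peak search. Returns the offset (in samples) within
// 'haystack' where 'needle' lines up best. offset / sampleRate gives the event time
// relative to the start of the buffer.
static size_t bestOffset(const float *haystack, size_t haystackLen,
                         const float *needle, size_t needleLen)
{
    size_t best = 0;
    float bestScore = -1e30f;
    for (size_t offset = 0; offset + needleLen <= haystackLen; ++offset)
    {
        float score = 0.0f;
        for (size_t i = 0; i < needleLen; ++i)
            score += haystack[offset + i] * needle[i];   // correlation at this lag
        if (score > bestScore) { bestScore = score; best = offset; }
    }
    return best;
}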
However, note that there is a latency or delay between every sample buffer and the speaker audio going out, as well as between microphone sound reception and the buffer callbacks. That has to be measured; for instance, you can measure the round-trip time between sending a sample buffer out and when the input-buffer autocorrelation estimation function gets it back. This is how long it takes the hardware to buffer, convert (analog to digital and vice versa) and pass the data along. That latency might be in the region of 2 to 6 times 5.8 milliseconds, using appropriate Audio Session settings, but might be different for different iOS devices.
Yes, the most accurate way to measure audio is to capture the audio and look at the data in the actual sampled audio stream.

Performing FFT on PCM file to generate a spectrogram

I'm writing an app that visualizes music. So far I have an audio file from the iPod library converted to PCM and placed inside the app's directory. Now I am trying to perform an FFT on that PCM file to give me frequency and dB over time. Here is code I found which uses Apple's Accelerate framework to perform the FFT:
int bufferFrames = 1024;
int bufferlog2 = round(log2(bufferFrames));
FFTSetup fftSetup = vDSP_create_fftsetup(bufferlog2, kFFTRadix2);
float outReal[bufferFrames / 2];
float outImaginary[bufferFrames / 2];
COMPLEX_SPLIT out = { .realp = outReal, .imagp = outImaginary };
vDSP_ctoz((COMPLEX *)data, 2, &out, 1, bufferFrames / 2);
vDSP_fft_zrip(fftSetup, &out, 1, bufferlog2, FFT_FORWARD);
Now I don't understand how to feed this the PCM file. I believe 'data' is an array of COMPLEX values which holds the portion of the audio that the FFT will be applied to. How do I build such a data structure from the PCM file?
I found some Java code that might be useful, but I'm not sure how to convert it to C. Also, what is audioData and how do I fill it from the PCM file?
Complex[] complexData = new Complex[audioData.length];
for (int i = 0; i < complexData.length; i++) {
    complexData[i] = new Complex(audioData[i], 0);
}
The "Apple Accelerate framework" is fine, I have used it on a project. Just read the docs carefully. Basically you will need to read the PCM data into memory and perhaps massage the format, then call the FFT on it.
To massage the data, loop through your PCM data sample by sample and copy/convert it as appropriate into another array.
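A rough sketch of that read-convert-FFT flow, assuming the PCM file holds mono 16-bit signed samples that have already been loaded into memory (the function and variable names are illustrative):

#include <Accelerate/Accelerate.h>
#include <vector>

// Convert one 1024-sample frame of 16-bit PCM to float, then run the same vDSP
// calls shown in the question. In real code, create the FFTSetup once and reuse it.
static void fftOneFrame(const short *pcm /* at least 1024 samples */)
{
    const int bufferFrames = 1024;
    const int bufferLog2 = 10;                                // log2(1024)

    std::vector<float> samples(bufferFrames);
    vDSP_vflt16(pcm, 1, samples.data(), 1, bufferFrames);     // int16 -> float

    float scale = 1.0f / 32768.0f;                            // optional: scale to -1.0 ... 1.0
    vDSP_vsmul(samples.data(), 1, &scale, samples.data(), 1, bufferFrames);

    std::vector<float> outReal(bufferFrames / 2), outImag(bufferFrames / 2);
    DSPSplitComplex out = { outReal.data(), outImag.data() };

    // pack even/odd samples into the split-complex layout, then FFT in place
    vDSP_ctoz((const DSPComplex *)samples.data(), 2, &out, 1, bufferFrames / 2);

    FFTSetup setup = vDSP_create_fftsetup(bufferLog2, kFFTRadix2);
    vDSP_fft_zrip(setup, &out, 1, bufferLog2, FFT_FORWARD);
    vDSP_destroy_fftsetup(setup);

    // magnitude of bin i: sqrt(realp[i]^2 + imagp[i]^2); dB: 20 * log10(magnitude)
    // (note: vDSP_fft_zrip output is scaled by 2, and element 0 packs DC/Nyquist)
}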
The docs are here; you want the vDSP_fft* functions.
Mike Ash has a good write-up on his Friday Q&A site.
Audio data is the result of voltage samples taken periodically from the audio signal, and it comes in many different formats. If you are going to work with this, you will have to spend some time learning the basics. Also, if you are going to run an FFT on audio data, you need to learn about sampling frequency and "aliasing", sometimes called "fold back".
