I'm building an audio app in which I need to manipulate audio buffers in real time for different sounds. I need an asynchronous clock running in the background whose time I can read from various callbacks, accurate to the millisecond, for purposes such as timing when audio playback starts and stops and when recording starts and stops.
The precision needs to be good enough that I can measure the latency caused by the various processes and compensate for it in my code.
How would one implement such a clock in iOS?
If you are using the RemoteIO Audio Unit and requesting very short audio buffers from the Audio Session, then counting RemoteIO callbacks and the PCM samples within each callback buffer appears to be the most precise way to align audio with sub-millisecond resolution. Anything else will be off by the amount of latency between callbacks.
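A minimal sketch of the sample-counting idea, in portable C. The `SampleClock` type and its functions are hypothetical names for illustration, not part of Core Audio; the assumption is that the render callback calls `sample_clock_advance` once per buffer with the number of frames it handled.

```c
#include <stdint.h>

/* Hypothetical sample clock: counts every frame the render callback has handled. */
typedef struct {
    double   sample_rate;    /* frames per second, e.g. 44100.0 */
    uint64_t frames_elapsed; /* frames delivered in all previous callbacks */
} SampleClock;

/* Time in seconds of the frame at `offset` within the current callback buffer. */
static double sample_clock_time(const SampleClock *c, uint32_t offset) {
    return (double)(c->frames_elapsed + offset) / c->sample_rate;
}

/* Call once at the end of each render callback. */
static void sample_clock_advance(SampleClock *c, uint32_t frames_in_buffer) {
    c->frames_elapsed += frames_in_buffer;
}
```

Because the clock only moves when audio frames are actually delivered, it stays locked to the playback rate, and the `offset` argument gives resolution finer than one callback period.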
You can also use the timer declared in the mach_time.h header (mach_absolute_time) for fairly precise monotonic time measurement, but this won't be synced to the audio playback rate, nor will it account for the various latencies between subsystems.
If you have not already done so, I would definitely advise you to look into the AVFoundation framework and make use of Core Audio. Given the accuracy you need, you can forget about the NSTimer and NSDate options.
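The raw Mach tick count has to be scaled by the timebase ratio before it means anything in seconds. The sketch below shows only that arithmetic, written as portable C: on iOS the ticks would come from `mach_absolute_time()` and the ratio from `mach_timebase_info()`, but here they are plain parameters, and `ticks_to_nanos` is a hypothetical helper name.

```c
#include <stdint.h>

/* Convert a tick delta to nanoseconds using a numer/denom timebase.
   The ticks and the ratio are passed in so the sketch stays portable;
   on iOS they would be read from the Mach time API. */
static uint64_t ticks_to_nanos(uint64_t ticks, uint32_t numer, uint32_t denom) {
    /* Split the multiplication to reduce overflow risk for large tick counts. */
    uint64_t whole = (ticks / denom) * numer;
    uint64_t rem   = (ticks % denom) * numer / denom;
    return whole + rem;
}
```

As an illustrative value only, a ratio of 125/3 corresponds to a 24 MHz tick, so a delta of 24000 ticks would come out as 1000000 ns (1 ms).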
Related
I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback, where the start delay was more significant, and that is what led me to this technique).
I know about AVAudioSession's inputLatency, outputLatency and bufferDuration properties, and I've read that these can be used to identify latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01s when using the device on its own, and 0.05 seconds when using AirPods' inputs and outputs.
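For illustration, here is a rough sketch of that arithmetic in portable C. Summing the three values is an assumption about how the estimate is formed rather than a documented formula, and the struct and function names are hypothetical; on a device the fields would be filled from AVAudioSession's inputLatency, outputLatency and bufferDuration.

```c
#include <stdint.h>

/* All values in seconds. The field names mirror the AVAudioSession properties,
   but this struct itself is hypothetical. */
typedef struct {
    double input_latency;
    double output_latency;
    double buffer_duration;
} LatencyReport;

/* Assumed estimate: simple sum of the three reported values. */
static double estimated_round_trip(const LatencyReport *r) {
    return r->input_latency + r->output_latency + r->buffer_duration;
}

/* Extra delay of one route over a baseline, rounded to whole frames. */
static int64_t extra_delay_frames(double route_seconds, double baseline_seconds,
                                  double sample_rate) {
    double diff = (route_seconds - baseline_seconds) * sample_rate;
    return (int64_t)(diff < 0 ? diff - 0.5 : diff + 0.5);
}
```

With the figures quoted above (0.05 s against 0.01 s), the difference at 48 kHz works out to 1920 frames. This only covers what the session reports; it says nothing about any latency the session does not report.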
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
Strangely, it looks as though the recorded audio and video are in sync with each other, but not with the backing track. This makes me think that the system is still adding compensation to one of those two forms of captured media, but that the compensation doesn't relate to the audio actively being played back, so the user is potentially listening to delayed playback and I'm not accounting for that extra delay.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for bluetooth synchronisation will be to either synchronise audio and visual output, or to synchronise only the audio and visual input when recording, not a third factor whereby the user is performing alongside an audio or video source on device that is later added to a resultant asset writing session/media file.
This is a rather minute question about timing...
I'm using iOS's RemoteIO audio unit. I just wonder how exactly the system handles the timing: after calling AudioOutputUnitStart(), the unit should be "on", and render callbacks will then be pulled by downstream units. Allow me to guess:
Possibility 1: the next render callback happens right after the execution of AudioOutputUnitStart(), then it goes on
Possibility 2: the system has its own render callback rhythm. After calling AudioOutputUnitStart(), the next render callback lands on the system's next tick, and things start from there
1 or 2? Or is there a 3? Thanks in advance!
The audio latency seems to depend on the specific device model, audio session and options, requested sample rate and buffer size, and whether any other audio (background or recently closed app) is or has recently been playing or recording on the system. Whether or not the internal audio amplifier circuits (etc.) need to be powered up or are already turned on may make the biggest difference. Requesting certain sample rates seems to also cause extra time due to the buffering potentially needed by the OS resampling and mixer code.
So likely (2) or (3).
The best way to minimize latency when using RemoteIO is to request very short buffers (1 to 6 ms) in the audio session setup, start the audio session and Audio Unit way ahead of time (at app startup, view load, etc.), then fill the callback buffers with zeros (or discard recorded callback data) until you need sound.
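The "fill with zeros until you need sound" part might look roughly like this. It is a sketch in portable C with a hypothetical helper, `render_or_silence`, and a simplified mono float buffer; a real RemoteIO render callback has a different signature and works on an AudioBufferList.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical render helper: fills `out` with `frames` mono float samples.
   While no sound is needed it writes silence, so the audio unit keeps running
   and nothing has to be started at the moment sound is wanted. */
static void render_or_silence(float *out, uint32_t frames, bool sound_needed,
                              const float *source, uint32_t *source_pos,
                              uint32_t source_len) {
    if (!sound_needed) {
        memset(out, 0, frames * sizeof(float)); /* silence */
        return;
    }
    for (uint32_t i = 0; i < frames; i++) {
        /* Copy from the source, padding with zeros once it runs out. */
        out[i] = (*source_pos < source_len) ? source[(*source_pos)++] : 0.0f;
    }
}
```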
I need to analyse chunks of audio data of (approximately) 1 second with a sample rate of 8kHz. Although the audio will be recorded in real time, it will only be used for detecting specific events. Hence, there are no strict latency requirements. What would be the best framework to use in this case?
I have already started learning Core Audio and worked through the book Learning Core Audio. With the minimal amount of Swift documentation available on the internet, I was able to set up an AUGraph for iOS to record audio with the remote I/O audio unit and to get access to the raw samples in the output render callback. Unfortunately, I got stuck trying to create chunks of 1 second of audio samples to perform the audio analysis. Could a custom AudioBufferList be used for this? Or could a large ring buffer be implemented on the remote I/O audio unit (as is required in the case of a HAL audio unit)?
I also tried to adopt AVFoundation with AVAssetReader to obtain the audio chunks. Although I was able to obtain samples of a recorded audio signal, I did not succeed in creating a buffer of 1 second (and I don't even know whether it would be possible to do this in real time). Would AVFoundation be a good choice in this situation anyhow?
I would appreciate any advice on this.
A main problem for me is that I am trying to use Swift, but there is not much example code available and even less documentation. I feel that it would be better to switch to Objective-C for audio programming and to stop trying to do everything in Swift. I am curious whether this would be a better time investment?
For analyzing 1 second windows of audio samples, the simplest solution would be to use the Audio Queue API with a lock-free ring buffer (say around 2 seconds long) to record samples. You can use a repeating NSTimer task to poll how full the buffer is, and emit 1 second chunks to a processing task when they become available.
Core Audio and the RemoteIO Audio Unit are for cases where you need much shorter data windows with latency requirements on the order of a few milliseconds.
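A minimal sketch of such a ring buffer in C11, assuming exactly one producer (the recording callback) and one consumer (the polling task). All names are hypothetical, the Audio Queue setup and the timer are not shown, and the capacity is chosen as a power of two just over 2 seconds at 8 kHz.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 16384u /* power of two, about 2.05 s at 8 kHz */
#define CHUNK_FRAMES  8000u  /* 1 s at 8 kHz */

typedef struct {
    int16_t data[RING_CAPACITY];
    _Atomic uint32_t write_count; /* total samples written (producer only) */
    _Atomic uint32_t read_count;  /* total samples read (consumer only) */
} Ring;

/* Producer side (recording callback). Returns how many samples were stored. */
static uint32_t ring_write(Ring *r, const int16_t *src, uint32_t n) {
    uint32_t w  = atomic_load_explicit(&r->write_count, memory_order_relaxed);
    uint32_t rd = atomic_load_explicit(&r->read_count, memory_order_acquire);
    uint32_t space = RING_CAPACITY - (w - rd);
    if (n > space) n = space; /* drop whatever does not fit */
    for (uint32_t i = 0; i < n; i++)
        r->data[(w + i) & (RING_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&r->write_count, w + n, memory_order_release);
    return n;
}

/* Consumer side (polling task). Copies one chunk if a full second is available. */
static bool ring_read_chunk(Ring *r, int16_t *dst) {
    uint32_t rd = atomic_load_explicit(&r->read_count, memory_order_relaxed);
    uint32_t w  = atomic_load_explicit(&r->write_count, memory_order_acquire);
    if (w - rd < CHUNK_FRAMES) return false; /* not enough data yet */
    for (uint32_t i = 0; i < CHUNK_FRAMES; i++)
        dst[i] = r->data[(rd + i) & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->read_count, rd + CHUNK_FRAMES, memory_order_release);
    return true;
}
```

The polling task would call `ring_read_chunk` on each timer tick and hand the chunk to the analysis code whenever it returns true. Since each counter is written by only one side, no lock is needed.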
Core Audio is a C API.
Objective-C is an extension of C. I find that Objective-C is much nicer for working with Core Audio than Swift.
I created a cross-platform lockless ring buffer in C. There is sample code that demonstrates setting up the ring, setting up the mic, playing audio, and reading and writing from the ring.
The ring records the last N seconds that you specify. Old data is overwritten by new data. So you might specify that you want the latest 3 seconds recorded. The sample I show plays a sine wave while recording through the microphone. Every 7 seconds, it grabs the last 2 seconds of recorded audio.
Here is the complete sample code on GitHub.
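The overwrite behaviour described here can be illustrated with a small sketch. This is not the code from the linked sample: it is a single-threaded illustration with hypothetical names, it leaves out the lock-free synchronisation entirely, and it uses a tiny capacity so the wrap-around is easy to follow.

```c
#include <stdint.h>

#define HISTORY_FRAMES 8u /* tiny for illustration; seconds * sample rate in practice */

typedef struct {
    float    data[HISTORY_FRAMES];
    uint64_t total_written; /* frames ever written; the oldest are overwritten */
} HistoryRing;

/* Append n frames, overwriting the oldest data once the ring is full. */
static void history_write(HistoryRing *h, const float *src, uint32_t n) {
    for (uint32_t i = 0; i < n; i++)
        h->data[(h->total_written + i) % HISTORY_FRAMES] = src[i];
    h->total_written += n;
}

/* Copy the newest n frames into dst in chronological order.
   Returns the number copied (fewer than n if less has been recorded). */
static uint32_t history_latest(const HistoryRing *h, float *dst, uint32_t n) {
    uint64_t available = h->total_written < HISTORY_FRAMES
                             ? h->total_written : HISTORY_FRAMES;
    if (n > available) n = (uint32_t)available;
    uint64_t start = h->total_written - n;
    for (uint32_t i = 0; i < n; i++)
        dst[i] = h->data[(start + i) % HISTORY_FRAMES];
    return n;
}
```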
Inside my iOS 8.0 app I need to apply some custom audio processing to (non-realtime) audio playback. Typically, the audio comes from a device-local audio file.
Currently, I use MTAudioProcessingTap on an AVMutableAudioMix. Inside the process callback I then call my processing code. In certain cases this processing code may produce more samples than the number of samples passed in, and I wonder what the best way to handle this is (think of a time-stretching effect, for example).
The process callback takes an incoming CMItemCount *numberFramesOut argument that signals the amount of outgoing frames. For in-place processing where the amount of incoming frames and outgoing frames is identical this is no problem. In the case where my processing generates more samples I need a way to get the playback going until my output buffers are emptied.
Is MTAudioProcessingTap the right choice here anyway?
MTAudioProcessingTap does not support changing the number of samples between the input and the output (to skip silences for instance).
You will need a custom audio unit graph for this.
A circular buffer/FIFO is one of the most common ways to mediate between different producer and consumer rates, as long as the long-term rate is the same. If, in the long term, you plan on producing more samples than are played, you may need to occasionally stop producing samples for a while, while still playing, in order not to fill up the buffer or the system's memory.
I'm trying to build an app which plays a sequence of tones in a loop.
Currently I use OpenAL, and my experience with the framework has been positive, as I can also pitch-shift sounds.
Here's the scenario:
load a short sound (3 seconds) from a CAF file
play that sound in a loop and apply a pitch shift as well.
This works well, provided that the tick rate isn't too high, by which I mean more than 10 milliseconds per tone.
However, my NSTimer (which drives the sound sequence) should be configurable, and as soon as the tick rate increases (less than 10 ms per tone), the sound is no longer played back correctly; some tones are even dropped, seemingly at random.
It seems that real-time sound processing becomes an issue.
I'm still a novice in iOS programming, but I believe that Apple sets a limit on time consumption and/or semaphores.
Now my questions:
OpenAL is written in C, and so far I haven't understood all of the code and the philosophy behind the framework. Is it possible to resolve the problem described above with some modifications, such as setting flags/values or overriding certain methods?
If not, do you know of another iOS sound framework more appropriate for this kind of real-time sound processing?
Many thanks in advance!
I know that this is a rather unusual and difficult problem, but maybe one of you has solved a similar one? Just to emphasize: the pitch shift must be guaranteed!
It is not immediately clear from the explanation precisely what you're trying to achieve. Some code is expected.
However, your use of NSTimer to sequence audio playback is clearly problematic. It is intended as neither a reliable nor a high-resolution timer.
NSTimer delivers events through a run-loop queue (probably your application's main queue), where they contend with user interface events.
As the main thread is not a real-time thread, it may not even be scheduled to run for some time.
There may be quantisation effects on the delay you requested, meaning your events effectively round to zero clock ticks and get scheduled immediately.
Periodic timers have deleterious effects on battery life. iOS and Mac OS X both take steps to reduce their impact through timer coalescing.
The clock you should be using for sequencing events is the playback sample clock, which is available in the render handler of whatever framework you use. As well as being reliable, this is efficient, since the render handler will be running periodically anyway, and in a real-time thread.
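A rough sketch of sequencing against the sample clock, in portable C. The `ToneSchedule` type and `next_tone_in_buffer` are hypothetical names; the assumption is that the render handler knows the absolute frame position of the buffer it is filling, and that `interval_frames` is greater than zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical repeating event, expressed in frames on the playback sample clock. */
typedef struct {
    uint64_t next_frame;      /* absolute frame at which the next tone starts */
    uint64_t interval_frames; /* frames between tones, e.g. 441 for 10 ms at 44.1 kHz */
} ToneSchedule;

/* Called from the render handler for a buffer covering
   [buffer_start, buffer_start + frames). If a tone starts inside it, writes the
   offset within the buffer, advances the schedule and returns true.
   Call repeatedly until it returns false, because short intervals can put
   several tones in one buffer. */
static bool next_tone_in_buffer(ToneSchedule *s, uint64_t buffer_start,
                                uint32_t frames, uint32_t *offset_out) {
    if (s->next_frame >= buffer_start + frames)
        return false; /* next tone belongs to a later buffer */
    /* A tone that is already overdue starts at the beginning of this buffer. */
    *offset_out = s->next_frame > buffer_start
                      ? (uint32_t)(s->next_frame - buffer_start) : 0;
    s->next_frame += s->interval_frames;
    return true;
}
```

With a 512-frame buffer and a 441-frame interval, the first buffer would start tones at offsets 0 and 441, and the next buffer at offset 370. The tone positions depend only on the frame count, not on when any timer happens to fire.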