I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback, where the start delay was more significant, which is what led me to this technique).
I know about AVAudioSession's inputLatency, outputLatency and ioBufferDuration properties, and I've read that these can be used to estimate latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01 seconds when using the device on its own, and 0.05 seconds when using AirPods' inputs and outputs.
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
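To make the arithmetic concrete, here is a minimal sketch in plain C. It assumes the round-trip figure is simply input latency plus output latency plus one I/O buffer duration, which may not be exactly the calculation used above. The names RouteLatency, round_trip_latency and extra_offset are made up for illustration, and the values would come from the audio session at runtime.

```c
#include <assert.h>

/* Latency figures in seconds for one audio route (hypothetical struct). */
typedef struct {
    double input_latency;   /* reported input latency */
    double output_latency;  /* reported output latency */
    double buffer_duration; /* reported I/O buffer duration */
} RouteLatency;

/* Assumed model: round trip = input + output + one I/O buffer. */
double round_trip_latency(RouteLatency r) {
    assert(r.input_latency >= 0.0 && r.output_latency >= 0.0 &&
           r.buffer_duration >= 0.0);
    return r.input_latency + r.output_latency + r.buffer_duration;
}

/* Extra seconds to compensate for when recording on `route`
   instead of on `baseline` (e.g. AirPods vs. built-in mic and speaker). */
double extra_offset(RouteLatency route, RouteLatency baseline) {
    return round_trip_latency(route) - round_trip_latency(baseline);
}
```

With totals of roughly 0.05 and 0.01 seconds, extra_offset gives about 0.04 seconds to apply to the timestamps. Any delay the route does not report would still be missing from this figure.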
Strangely, the recorded audio and video appear to be in sync with each other, but not with the backing track. This makes me think that the system is still adding compensation to one of those two forms of captured media, but that this compensation doesn't relate to the audio actively being played back, so the user is potentially listening to delayed playback audio and I'm not accounting for that extra delay.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for Bluetooth synchronisation are either to synchronise audio and visual output, or to synchronise only the audio and visual input when recording, not a third case in which the user performs alongside an on-device audio or video source that is later added to the resulting asset-writer session or media file.
Related
I need to analyse chunks of audio data of (approximately) 1 second with a sample rate of 8kHz. Although the audio will be recorded in real time, it will only be used for detecting specific events. Hence, there are no strict latency requirements. What would be the best framework to use in this case?
I have already started learning Core Audio and worked through the book Learning Core Audio. With the minimal amount of Swift documentation available on the internet, I was able to set up an AUGraph for iOS to record audio with the remote I/O audio unit and to get access to the raw samples with the output render callback. Unfortunately, I got stuck when trying to create 1-second chunks of audio samples for the analysis. Could a custom AudioBufferList be used for this? Or could a large ring buffer be implemented on the remote I/O audio unit (as is required for a HAL audio unit)?
I also tried using AVFoundation with AVAssetReader to obtain the audio chunks. Although I was able to obtain samples of a recorded audio signal, I did not succeed in creating a 1-second buffer (and I don't even know whether it would be possible to do this in real time). Would AVFoundation be a good choice in this situation anyway?
I would appreciate any advice on this.
A major problem for me is that I am trying to use Swift, but there is not much example code available and even less documentation. I feel that it might be better to switch to Objective-C for audio programming and to stop trying to do everything in Swift. Would that be a better investment of my time?
To analyze 1-second windows of audio samples, the simplest solution would be to use the Audio Queue API with a lock-free ring buffer (say, around 2 seconds long) to record samples. You can use a repeating NSTimer task to poll how full the buffer is, and emit 1-second chunks to a processing task when they become available.
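The ring buffer part could look roughly like the following. This is only a sketch under assumptions: one producer thread (the recording callback) and one consumer thread (the polling timer), 16-bit mono samples at 8 kHz, and C11 atomics. The names Ring, ring_write and ring_read_chunk are made up, and the Audio Queue setup itself is not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 16384u /* power of two, just over 2 s at 8 kHz */
#define CHUNK_FRAMES  8000u  /* 1 s at 8 kHz */

typedef struct {
    int16_t data[RING_CAPACITY];
    atomic_size_t head; /* total samples written (producer only) */
    atomic_size_t tail; /* total samples read (consumer only) */
} Ring;

/* Producer: call from the recording callback. Returns samples accepted;
   anything that does not fit is dropped. */
size_t ring_write(Ring *r, const int16_t *src, size_t n) {
    assert(r != NULL && src != NULL);
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_CAPACITY - (head - tail);
    if (n > space) n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(head + i) & (RING_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}

/* Consumer: call from the polling timer. Copies one full 1-second chunk
   into dst and returns true, or returns false if less is buffered. */
bool ring_read_chunk(Ring *r, int16_t *dst) {
    assert(r != NULL && dst != NULL);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head - tail < CHUNK_FRAMES) return false;
    for (size_t i = 0; i < CHUNK_FRAMES; i++)
        dst[i] = r->data[(tail + i) & (RING_CAPACITY - 1)];
    atomic_store_explicit(&r->tail, tail + CHUNK_FRAMES, memory_order_release);
    return true;
}
```

The timer handler would call ring_read_chunk in a loop until it returns false and hand each chunk to the analysis code. Keeping the capacity a power of two lets the index wrap with a mask instead of a modulo.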
Core Audio and the RemoteIO Audio Unit are for cases where you need much shorter data windows, with latency requirements on the order of a few milliseconds.
Core Audio is a C API.
Objective-C is an extension of C. I find that Objective-C is much nicer than Swift for working with Core Audio.
I created a cross-platform, lock-free ring buffer in C. There is sample code that demonstrates setting up the ring, setting up the mic, playing audio, and reading from and writing to the ring.
The ring records the last N seconds of audio, where you specify N. Old data is overwritten by new data. So, for example, you can specify that you want the latest 3 seconds recorded. The sample plays a sine wave while recording through the microphone. Every 7 seconds, it grabs the last 2 seconds of recorded audio.
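For readers who just want the idea of an overwriting "latest N seconds" ring, here is a small illustrative sketch. It is not the code from the linked project: it is single-threaded (no lock-free machinery), the capacity is tiny for readability, and the names History, history_push and history_latest are made up.

```c
#include <assert.h>
#include <stddef.h>

#define HISTORY_CAPACITY 8 /* in practice: seconds * sample_rate */

typedef struct {
    float data[HISTORY_CAPACITY];
    size_t written; /* total samples ever pushed */
} History;

/* Store one sample, overwriting the oldest once the ring is full. */
void history_push(History *h, float sample) {
    assert(h != NULL);
    h->data[h->written % HISTORY_CAPACITY] = sample;
    h->written++;
}

/* Copy the most recent n samples into dst, oldest first.
   Returns how many were actually available and copied. */
size_t history_latest(const History *h, float *dst, size_t n) {
    assert(h != NULL && dst != NULL);
    size_t avail = h->written < HISTORY_CAPACITY ? h->written : HISTORY_CAPACITY;
    if (n > avail) n = avail;
    size_t start = h->written - n;
    for (size_t i = 0; i < n; i++)
        dst[i] = h->data[(start + i) % HISTORY_CAPACITY];
    return n;
}
```

Grabbing "the last 2 seconds" then amounts to calling history_latest with n set to 2 times the sample rate.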
Here is the complete sample code on GitHub.
Is it possible to record output audio in an app using Swift? So, for example, say I'm listening to a podcast, and I want to, within a separate app, record a small segment of the podcast's audio. Is there any way to do that?
I've looked around but have only been able to find information on recording from the microphone and the like.
It depends on how you are producing the audio. If the production of the audio is within your control, you can put a tap on the output and record to a file as it plays. The easiest way is with the new AVAudioEngine feature (there are other ways, but AVAudioEngine is basically an easy front end for them).
Of course, if the real problem is to take a copy of a podcast, then obviously all you have to do is download the podcast as opposed to listening to it. Similarly, you could buffer and save streaming audio to a file. There are many apps that do this. But this is not because the device's output is being hijacked; it is, again, because we have control of the sound data itself.
I believe you'll have to write a kernel extension to do that:
https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KEXTConcept/KEXTConceptIOKit/iokit_tutorial.html
You'd have to make your own audio driver to record it.
It appears as though that is how Soundflowerbed was made; see this Softonic article:
http://features.en.softonic.com/how-to-record-internal-sound-on-a-mac
I'm building an audio app in which I need to manipulate audio buffers in real time for different sounds. I need an asynchronous clock running in the background whose time I can read in various callbacks, accurate to the millisecond, for audio manipulation purposes such as the start and stop times of my audio playback, and when recording starts and stops.
Its precision must be high enough for me to determine the latency caused by the various processes and compensate for it in my code.
How would one implement such a clock in iOS?
If you are using the RemoteIO Audio Unit and requesting very short audio buffers from the Audio Session, then counting RemoteIO callbacks and PCM samples within the callback buffer appears to be the most precise way to align audio with sub-millisecond resolution. Anything else will be off by the amount of latency between callbacks.
You can also use the timer in the mach_time.h header for fairly precise monotonic time measurement, but this won't be sync'd to audio playback rate, nor account for various latencies between subsystems.
If you have not already done so, I would definitely advise you to look into the AVFoundation framework and make use of Core Audio. Given the accuracy you need, you can forget about the native NSTimer and NSDate options.
I've been happily synthesizing audio (at 44.1 kHz) and sending it out through the RemoteIO audio unit. It's come to my attention that my app's audio is "garbled" when going out via HDMI to a certain model of TV. It looks to me like the problem is related to the fact that this TV is looking for audio data at 48 kHz.
Here are some questions:
Does RemoteIO adopt the sample rate of whichever device it's outputting to? If I'm sending audio via HDMI to a device that asks for 48 kHz, do my RemoteIO callback buffers become 48 kHz?
Is there some tidy way to just force RemoteIO to still think in terms of 44.1 kHz, and be smart enough to perform any necessary sample rate conversions on its own, before it hands data off to the device?
If RemoteIO does indeed just defer to the device it's connected to, then presumably I need to do some sample rate conversion between my synthesis engine and RemoteIO. Is AudioConverterConvertComplexBuffer the best way to do this?
Fixed my problem. I was incorrectly assuming that the number of frames requested by the render callback would be a power of two. Changed my code to accommodate any arbitrary number of frames and all seems to work fine now.
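For anyone hitting the same thing, the fix boils down to a render routine that fills exactly as many frames as it is asked for and carries its state across calls. Here is a minimal sketch using a sawtooth oscillator. SawState and render_saw are made-up names, and this is not the original synthesis code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double phase;      /* position in the cycle, 0.0 <= phase < 1.0 */
    double phase_step; /* cycles per frame = frequency / sample_rate */
} SawState;

/* Fill exactly `frames` samples, whatever that number is.
   The phase carries over, so consecutive calls join without a click. */
void render_saw(SawState *s, float *out, size_t frames) {
    assert(s != NULL && out != NULL);
    assert(s->phase_step >= 0.0 && s->phase_step < 1.0);
    for (size_t i = 0; i < frames; i++) {
        out[i] = (float)(2.0 * s->phase - 1.0); /* map [0,1) to [-1,1) */
        s->phase += s->phase_step;
        if (s->phase >= 1.0) s->phase -= 1.0;
    }
}
```

Rendering 470 frames and then 471 frames this way produces the same samples as rendering 941 frames in one call, so the callback's buffer size no longer matters.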
If you want sample rate conversion, try using the Audio Queue API, or do the conversion within your own app using some DSP code.
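As an illustration of the "own DSP code" option, here is a deliberately naive linear-interpolation resampler in C. It is only a sketch: real converters use proper filtering, and linear interpolation will add some distortion. The function name resample_linear and its parameters are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Resample `in` (in_len frames at in_rate) into `out` at out_rate using
   linear interpolation between neighbouring input frames.
   Returns the number of output frames written (at most out_cap). */
size_t resample_linear(const float *in, size_t in_len, double in_rate,
                       float *out, size_t out_cap, double out_rate) {
    assert(in != NULL && out != NULL);
    assert(in_rate > 0.0 && out_rate > 0.0);
    if (in_len == 0) return 0;
    double step = in_rate / out_rate; /* input frames per output frame */
    size_t n = 0;
    for (; n < out_cap; n++) {
        double pos = (double)n * step; /* fractional input position */
        size_t i = (size_t)pos;
        if (i + 1 >= in_len) break;    /* need both in[i] and in[i + 1] */
        double frac = pos - (double)i;
        out[n] = (float)((1.0 - frac) * in[i] + frac * in[i + 1]);
    }
    return n;
}
```

For 44.1 kHz to 48 kHz, each output frame advances 0.91875 input frames. A streaming version would also need to carry the fractional position and the last input frame from one buffer to the next.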
Whether the RemoteIO buffer size or sample rate can be configured or not might depend on iOS device model, OS version, audio routes, background modes, etc., so an app must accommodate different buffer sizes and sample rates when using RemoteIO.
In Xcode 3.2.5 I would like to play multiple audio files in sequence (50+) from a single UIButton. I've tried several approaches, but they leak memory. Any suggestions? I'm still learning, so please include code for both the header and implementation files. My thanks in advance.
Use the interfaces in Audio Queue Services (AudioToolbox/AudioQueue.h). Create one audio queue object for each sound that you want to play. Then specify simultaneous start times for the first audio buffer in each audio queue, using the AudioQueueEnqueueBufferWithParameters function.
The following limitations pertain for simultaneous sounds in iPhone OS, depending on the audio data format:
AAC, MP3, and ALAC (Apple Lossless) audio: You may play multiple AAC, MP3, and ALAC format sounds simultaneously; playback of multiple sounds of these formats will require CPU resources for decoding.
Linear PCM and IMA/ADPCM (IMA4 audio): You can play multiple linear PCM or IMA4 format sounds simultaneously without CPU resource concerns.
Taken from play multiple sounds simultaneously
This is just conceptual, but what about (a) creating an array of sound names you want to play (this can be during runtime), in the proper order, then (b) writing a function where each soundHandler-type object checks to see where it is in the array; if it's not last it constructs a soundPlayer, loads the sound, plays and then calls the next soundHandler in the array. (If it's last it just constructs/loads/plays, and maybe notifies the parent that it's done.) Each soundHandler (I'm just making that up, you'll have to write it) then can dealloc itself when complete.
If you run into latency/loading issues, you could always have each soundHandler call n+2 in the array, and of course then check to see if it's penultimate instead of the end.
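Stripped of the audio parts, the bookkeeping for this could be as small as the sketch below. It is plain C with made-up names (Playlist, playlist_advance). Creating, playing and releasing the actual player is left to whatever "finished playing" callback your player provides.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *const *names; /* sound names in playback order */
    size_t count;             /* number of entries in names */
    size_t next;              /* index of the next sound to start */
} Playlist;

/* Returns the name of the next sound to play, or NULL when the list is
   exhausted. Call once to start, then once from each completion callback,
   after releasing the player that just finished. */
const char *playlist_advance(Playlist *p) {
    assert(p != NULL);
    if (p->next >= p->count) return NULL;
    return p->names[p->next++];
}
```

Because only one player needs to exist at a time, releasing the finished one before starting the next should keep memory use flat even with 50+ files.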
Is that more what you had in mind?