iOS audio over HDMI -- how to deal with 48 kHz sample rate?

I've been happily synthesizing audio (at 44.1 kHz) and sending it out through the RemoteIO audio unit. It's come to my attention that my app's audio is "garbled" when going out via HDMI to a certain model of TV. It looks to me like the problem is related to the fact that this TV expects audio data at 48 kHz.
Here are some questions:
Does RemoteIO adopt the sample rate of whichever device it's outputting to? If I'm sending audio via HDMI to a device that asks for 48 kHz, do my RemoteIO callback buffers become 48 kHz?
Is there some tidy way to just force RemoteIO to still think in terms of 44.1khz, and be smart enough to perform any necessary sample rate conversions on its own, before it hands data off to the device?
If RemoteIO does indeed just defer to the device it's connected to, then presumably I need to do some sample rate conversion between my synthesis engine and remote IO. Is AudioConverterConvertComplexBuffer the best way to do this?

Fixed my problem. I was incorrectly assuming that the number of frames requested by the render callback would be a power of two. Changed my code to accommodate any arbitrary number of frames and all seems to work fine now.
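For anyone hitting the same thing, here is a minimal sketch in plain C of a render routine that makes no assumption about the frame count. It is not the asker's code: `SawSynth` and `synth_render` are hypothetical names, and a naive sawtooth stands in for the real synthesis engine. The point is only that the oscillator state lives outside the call, so any sequence of buffer sizes produces the same stream.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical synth state; not part of any Apple API. */
typedef struct {
    double phase;        /* position within the current cycle, in [0, 1) */
    double frequency;    /* tone frequency in Hz */
    double sample_rate;  /* rate the output is currently running at */
} SawSynth;

/* Fill `out` with exactly `frame_count` mono float samples.
   Works for any frame_count (471, 940, 1024, ...), not only powers of two,
   because the phase carries over from one call to the next. */
static void synth_render(SawSynth *s, float *out, size_t frame_count)
{
    assert(s != NULL && out != NULL && s->sample_rate > 0.0);
    const double step = s->frequency / s->sample_rate;
    for (size_t i = 0; i < frame_count; ++i) {
        out[i] = (float)(2.0 * s->phase - 1.0);   /* naive sawtooth in [-1, 1) */
        s->phase += step;
        if (s->phase >= 1.0) s->phase -= 1.0;
    }
}
```

Rendering 471 frames and then 553 frames gives the same 1024 samples as a single 1024-frame call, which is the property a render callback needs when the system picks the buffer size.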

If you want sample rate conversion, try using the Audio Queue API, or do the conversion within your own app using some DSP code.
Whether the RemoteIO buffer size or sample rate can be configured or not might depend on iOS device model, OS version, audio routes, background modes, etc., so an app must accommodate different buffer sizes and sample rates when using RemoteIO.
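As a rough illustration of "do the conversion within your own app", the sketch below resamples one block of mono float samples by linear interpolation. This is an assumption-laden toy, not what RemoteIO or any Apple converter does internally: `resample_linear` is a hypothetical helper, it keeps no state between blocks, and linear interpolation has audibly worse quality than a proper band-limited resampler.

```c
#include <assert.h>
#include <stddef.h>

/* Resample `in` (in_count frames at in_rate) into `out` by linear
   interpolation. Returns the number of frames written, which is at most
   out_capacity. Stops once an output position would need a sample past
   the end of the input. */
static size_t resample_linear(const float *in, size_t in_count, double in_rate,
                              float *out, size_t out_capacity, double out_rate)
{
    assert(in != NULL && out != NULL && in_rate > 0.0 && out_rate > 0.0);
    const double step = in_rate / out_rate;  /* input frames per output frame */
    size_t written = 0;
    while (written < out_capacity) {
        double pos = (double)written * step;
        size_t i0 = (size_t)pos;
        if (i0 + 1 >= in_count) break;       /* need in[i0] and in[i0 + 1] */
        double frac = pos - (double)i0;
        out[written] = (float)((1.0 - frac) * in[i0] + frac * in[i0 + 1]);
        ++written;
    }
    return written;
}
```

For a 44.1 kHz engine feeding a 48 kHz output, `in_rate` would be 44100 and `out_rate` 48000. A real implementation would also carry the fractional position and the last input sample across callbacks so that block boundaries do not click.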


Using VoiceProcessingIO for Voip and getting raw mic input as well

I am using a VoiceProcessingIO audio unit for voip calls. However, when I set the loud speaker (setting the kAudioSessionOverrideAudioRoute_Speaker audio session property), the PCM data received in the input callback by calling AudioUnitRender has a very low volume.
For a voip call, it is actually fine. The interlocutor hears it fainter, but he hears it. However, I would like to save to disk a good quality version of the input audio, possibly a raw audio from the mic.
Is it actually possible? In my tests I have not been able to do it. When VoiceProcessingIO is in use, the audio from the input callback is just very low. Perhaps I can get the unprocessed audio from some other source? Note, VoiceProcessingIO must still be used during the voip call.
The same question on Apple's forum is thread-655091; it was asked a year ago and has no answers.
Closest questions on SO I found are Two audio units? and Effect before render callback?, but they are more concerned about the output of VoiceProcessingIO rather than the input.
An idea would be to add a parallel "raw" RemoteIO unit to get the audio from the mic, but both in Two audio units? and in apple-forum-110816, developers say it will not be possible to add another RemoteIO in parallel to the VoiceProcessingIO because, with the audio session category set to PlayAndRecord and the audio mode set to VoiceChat, RemoteIO will not function as usual. I have not had a chance to try it, but it seems possible.
Are there other strategies? Are there some "pre-render input callbacks" called before VoiceProcessingIO unit kicks in and processes the raw data from the mic?
Is it possible to install some TAP between the mic and the VoiceProcessingIO unit?
AFAIK, there is no public API that allows getting both processed and unprocessed input from the microphone on an iOS device.
If you need processed input (voice processing for echo cancellation, etc.), then your best bet is to just add gain to the audio data for your other needs (via some DSP library, etc.), since it is float data.
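Adding gain to float samples can be sketched as below. This assumes the copy you save to disk is mono float PCM in the nominal range [-1, 1]; `apply_gain` is a hypothetical helper, not a Core Audio call. Note that gain raises the noise floor along with the voice, so it makes the recording louder rather than recovering the raw mic signal.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply every sample by `gain` and hard-clip the result to [-1, 1]
   so boosted peaks do not wrap or overflow when written to a file. */
static void apply_gain(float *samples, size_t count, float gain)
{
    assert(samples != NULL);
    for (size_t i = 0; i < count; ++i) {
        float v = samples[i] * gain;
        if (v > 1.0f) v = 1.0f;
        else if (v < -1.0f) v = -1.0f;
        samples[i] = v;
    }
}
```

Apply it to a copy of the buffer returned by AudioUnitRender, so the audio sent to the other party in the call is left untouched.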

Synchronising AVAudioEngine audio recording with backing track, using AirPods

I'm trying to identify how much latency is being experienced when using AirPods, compared to using the device mic & speaker, for the purposes of recording user video & audio that must be synchronised to a backing track.
Here's how my system currently works:
I have a recording pipeline that uses AVCaptureSession to record video, and AVAudioEngine to record audio.
During the recording process, I play audio via AVAudioEngine, which the user will 'perform to'. I create a movie file using AVAssetWriter where the user's captured audio (utilising noise cancellation) is added to the file, and the backing audio file is written into a separate track.
The audio file's presentation timestamps are modified slightly to account for the initial playback delay experienced in AVAudioEngine, and this works well (I previously used AVPlayer for audio playback and the start delay was more significant, and that's what led to making use of this technique).
I know about AVAudioSession's inputLatency, outputLatency and ioBufferDuration properties, and I've read that these can be used to identify latency, at least in one sense. I notice that this calculation yields a total round-trip latency of around 0.01 s when using the device on its own, and 0.05 s when using AirPods' inputs and outputs.
This is useful, and I can apply that extra time difference in my own logic to improve synchronisation, but there is definitely additional latency in the output, and I can't identify its source.
Strangely, it looks as though the recorded audio and video are in sync, but not in sync with the backing track. This makes me think that the system is still adding compensation to one of those two forms of captured media, but it doesn't relate to the actively played-back audio, and so the user is potentially listening to delayed-playback audio and I'm not accounting for that extra delay.
Does anyone have any thoughts on what other considerations may be required? I feel as though most use cases for bluetooth synchronisation will be to either synchronise audio and visual output, or to synchronise only the audio and visual input when recording, not a third factor whereby the user is performing alongside an audio or video source on device that is later added to a resultant asset writing session/media file.
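For reference, the latency arithmetic described above can be written out as follows. This is only a sketch of one plausible reading of "this calculation": it assumes the round trip is the sum of the three reported values, and `RouteLatency`, `round_trip_seconds` and `latency_in_frames` are hypothetical names. The values would be read from AVAudioSession at runtime; the 48 kHz rate in the example is an assumption.

```c
#include <assert.h>

/* Values as reported by the audio session for the current route, in seconds. */
typedef struct {
    double input_latency;
    double output_latency;
    double io_buffer_duration;
} RouteLatency;

/* Assumed model: round trip = input + output + one I/O buffer. */
static double round_trip_seconds(RouteLatency r)
{
    return r.input_latency + r.output_latency + r.io_buffer_duration;
}

/* Convert a time offset to a whole number of frames, rounding to nearest. */
static long latency_in_frames(double seconds, double sample_rate)
{
    assert(seconds >= 0.0 && sample_rate > 0.0);
    return (long)(seconds * sample_rate + 0.5);
}
```

With the figures quoted above, the extra 0.04 s on AirPods corresponds to `latency_in_frames(0.05 - 0.01, 48000.0)`, i.e. 1920 frames of shift on the backing track. Any delay the route does not report is not covered by this sum.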

AudioUnit recording glitches every 30 seconds

I've used this sample code to create an audio recorder.
I'm finding I get glitches about every 30 seconds. They sound a bit like buffer glitches to me, although I might be wrong. I've tried contacting the author of the article but haven't had much success. I'm really struggling to follow some of this code. I think it's missing a circular buffer but I'm not sure how important that is here. I'm hoping someone can point me in the right direction to either:
Point me to some different example code or suggest what I need to add to this (high level suggestion is fine - I'm happy to research and do the work, I'm just not confident what the work is)
Suggest some better values to use for things like the buffer data size.
Tell me that there's nothing wrong with this code and my bug is almost certainly elsewhere.
Suggest a library I can use that should take care of it (Amazing Audio Engine 2 looks good for me but I'm a bit worried about the note saying it's retired. AudioKit looks great too but it's missing a peak power reading, which would be a shame to have to implement myself after having imported such a complex library)
Why aren't I using AVAudioSession? I need the user to be able to set mic level while recording and to be able to listen back at the same time. Previously I did this with AVAudioSession but on more recent devices isInputGainSettable returns NO. It also returns NO for many hardware mics plugged in via lightning cable, which we're seeing more and more now the headphone jack is gone.
Several problems.
Apple recommends that object methods not be called in the audio context (the callbacks). Your code has several. Use C functions instead.
Newer iOS devices likely use a hardware sample rate of 48000, not 44100. Resampling potentially causes buffers to change sizes.
The code seems to assume that the play callback buffer was the same size as the input callback buffer. This is not guaranteed. Thus the playback might end up with too few samples, causing periodic glitches.
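One common way to decouple the two callbacks is a ring (circular) buffer: the input callback writes however many frames it received and the play callback reads however many it was asked for. The sketch below is plain C11 with hypothetical names (`FloatRing`, `ring_write`, `ring_read`), assumes mono float samples, and assumes exactly one writer thread and one reader thread. It is an illustration of the idea, not a drop-in replacement for a tested library.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_CAPACITY 1024  /* frames; must be a power of two */
_Static_assert((RING_CAPACITY & (RING_CAPACITY - 1)) == 0, "power of two");

typedef struct {
    float data[RING_CAPACITY];
    atomic_size_t write_index;  /* total frames written so far */
    atomic_size_t read_index;   /* total frames read so far */
} FloatRing;

/* Called from the input callback. Returns frames actually stored;
   anything that does not fit is dropped. */
static size_t ring_write(FloatRing *r, const float *src, size_t count)
{
    assert(r != NULL && src != NULL);
    size_t w = atomic_load_explicit(&r->write_index, memory_order_relaxed);
    size_t rd = atomic_load_explicit(&r->read_index, memory_order_acquire);
    size_t space = RING_CAPACITY - (w - rd);
    if (count > space) count = space;
    for (size_t i = 0; i < count; ++i)
        r->data[(w + i) & (RING_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&r->write_index, w + count, memory_order_release);
    return count;
}

/* Called from the play callback. Always fills `count` frames of dst,
   padding with silence on underrun. Returns frames of real audio. */
static size_t ring_read(FloatRing *r, float *dst, size_t count)
{
    assert(r != NULL && dst != NULL);
    size_t rd = atomic_load_explicit(&r->read_index, memory_order_relaxed);
    size_t w = atomic_load_explicit(&r->write_index, memory_order_acquire);
    size_t avail = w - rd;
    size_t n = count < avail ? count : avail;
    for (size_t i = 0; i < n; ++i)
        dst[i] = r->data[(rd + i) & (RING_CAPACITY - 1)];
    for (size_t i = n; i < count; ++i)
        dst[i] = 0.0f;
    atomic_store_explicit(&r->read_index, rd + n, memory_order_release);
    return n;
}
```

The ring is expected to start zero-initialised. Both functions are plain C with no locks or allocation, which fits the advice above about keeping object method calls out of the audio callbacks.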
In my experience (iPhone 6) sample rate from microphone can be 48000 when a headset is not plugged in, and change to 44100 when a headset is plugged in.
If your audio unit is expecting a sample rate of 44100 then glitches like these are to be expected. To verify, you could check whether the problem remains when you plug in a headset.
A workaround for the glitch problem seems to be to use an AVAudioEngine. Connect its inputNode to its mainMixerNode using the inputFormat of the inputNode. Connect the mainMixerNode to your AudioUnit in your desired format. Connect your AudioUnit to outputNode of the AVAudioEngine.
Using this mixerNode between inputNode and audioUnit is essential in this workaround.

iPhone music streaming

I'm trying to send music over bluetooth from one iOS device to another. I've been using this to build packets like in Ray Wenderlich's SNAP tutorial, but I've been having trouble reconstructing the packet information on the receiving phone. I have tried using but I think it is too complicated for my needs (since I do not need synced playing). What is the simplest buffer approach that accounts for things like lost/out of order packets? I have read through a lot of CoreAudio stuff but it is very dense, so I would appreciate help from someone who has tackled this type of problem.
When you talk about lost/out-of-order packets, you're talking about the topic of Packet Loss Concealment, which is a very dense topic (I mean, if you think Core Audio is dense, wait till you dive into PLC).
In a nutshell, there are many ways to deal with packet loss, but the simplest way (which I advise you to do) is to replace the lost packets with silence. The same goes for out-of-order packets: if a packet arrives out of order, just discard it.
That being said, you are dealing with audio that is streamed to you (i.e. sent over the Bluetooth/Wi-Fi network), which means it is almost always compressed audio (i.e. variable bit rate audio, VBR). If you simply try to substitute lost VBR packets with silence, you'll run into problems. You'll either have to insert silence packets in the same compression format as the VBR audio you're dealing with, or you will have to convert your compressed audio into uncompressed audio (linear PCM) and then insert zeros in place of the missing packets.
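A minimal sketch of the silence-substitution approach, for the case where the audio has already been decoded to PCM. Everything here is assumed for illustration: the packet layout (`PcmPacket` with a sequence number and a fixed `FRAMES_PER_PACKET`), the `Receiver` state and `receiver_accept` are hypothetical, and sequence-number wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAMES_PER_PACKET 4   /* hypothetical fixed packet size, in frames */

typedef struct {
    uint32_t seq;                     /* sender-assigned sequence number */
    float pcm[FRAMES_PER_PACKET];     /* decoded mono float samples */
} PcmPacket;

typedef struct {
    uint32_t next_seq;                /* sequence number expected next */
} Receiver;

/* Append the packet's audio to `out`, preceded by one packet of silence
   for every sequence number that was skipped. Late (out-of-order) packets
   are discarded. Returns the number of frames written to `out`. */
static size_t receiver_accept(Receiver *rx, const PcmPacket *p,
                              float *out, size_t out_capacity)
{
    assert(rx != NULL && p != NULL && out != NULL);
    if (p->seq < rx->next_seq) return 0;             /* arrived too late */
    size_t missing = (size_t)(p->seq - rx->next_seq);
    size_t needed = (missing + 1) * FRAMES_PER_PACKET;
    if (needed > out_capacity) return 0;             /* caller's buffer too small */
    /* All-zero bytes are 0.0f for IEEE 754 floats, i.e. silence. */
    memset(out, 0, missing * FRAMES_PER_PACKET * sizeof(float));
    memcpy(out + missing * FRAMES_PER_PACKET, p->pcm, sizeof p->pcm);
    rx->next_seq = p->seq + 1;
    return needed;
}
```

If packets 0 and 2 arrive, the second call writes one packet of zeros followed by packet 2's samples, and a packet 1 that shows up afterwards is dropped. The output would then typically be fed to a playback buffer.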

What exactly is an audio queue processing tap?

These have been around in OS X for a little while now and just recently became available in iOS with iOS 6. I am trying to figure out what they let you do exactly. So the idea is you can tap into an audio queue and process the data before sending it on. Does this mean you can now intercept raw audio coming from different applications and process that (such as the iOS music player) before it plays? In other words is inter-app audio possible? I have read over the AudioQueue.h file and can't quite figure out what to make of it.
Consider it a mid-level entry for your audio custom processing (e.g. insert effect) or reading (e.g. for analysis or display purposes) of the queue's sample data. A basic interface for reading or processing an AQ's data.
Does this mean you can now intercept raw audio coming from different applications and process that (such as the iOS music player) before it plays? In other words is inter-app audio possible?
Nope - it's not inter-process; you have no access to other processes' audio queues. These are for your queues' sample data. They can be used to simplify general audio render or analysis chains (the common case, by app count). My guess is that it was provided because a lot of people wanted an easier entry to access this sample data for processing or analysis. Custom processing entries on iOS can also be more complicated to implement (i.e. AudioUnit availability is restricted).
