Low jitter audio on iOS

I'd like to load a small audio clip, like a beep, into memory and schedule playback after x seconds with very low jitter. Ideally my application needs less than ±1 ms, but ±5 ms could still be useful. The time is synchronized to a remote application without a microphone. My question is: what kind of jitter can I expect from the audio APIs, and are they all equal in this regard?
I'm not familiar with the audio APIs, but in the latency discussions I've seen the figure 5.8 ms quoted for RemoteIO audio units. Does this mean ±3 ms would be the best precision possible?
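The ±3 ms estimate assumes playback can only start on a buffer boundary. A common alternative is to start the beep at a sample offset inside the buffer. A minimal sketch in C, assuming the render callback knows the sample time of the first frame in the current buffer (on iOS the callback's AudioTimeStamp carries this in mSampleTime); the function and parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: given the sample time of the first frame in the
 * current render buffer and the sample time at which the beep should
 * start, return the frame offset inside this buffer where playback
 * should begin, or -1 if the start does not fall inside this buffer. */
long beep_start_offset(int64_t buffer_start_sample,
                       uint32_t frames_in_buffer,
                       int64_t target_sample)
{
    int64_t offset = target_sample - buffer_start_sample;
    if (offset < 0 || offset >= (int64_t)frames_in_buffer)
        return -1; /* target already passed, or belongs to a later buffer */
    return (long)offset;
}

/* Convert a delay in seconds to a target sample time, rounded to the
 * nearest frame. At 44.1 kHz one frame is about 22.7 microseconds. */
int64_t seconds_to_sample(int64_t now_sample, double delay_s, double rate)
{
    return now_sample + (int64_t)(delay_s * rate + 0.5);
}
```

Under these assumptions the start time is quantized to one frame rather than one buffer; how well that sample clock tracks the remote time base is a separate question.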

You would need to set this process as real-time to get a guarantee of low delay; otherwise you can get jitter of up to seconds, because the operating system can decide to run some background job.
Once it is real-time, you might achieve lower delay.
Please check with Apple whether you can make the process real-time (with scheduling options). You may need extra permissions and kernel-level support in your app to do this properly, so that you can have a guaranteed 1 ms delay for an audio app.

Related

Battery impact of keeping microphone on while in-app

I've been looking around for a while but couldn't find much about this. I have an AudioComponentInstance that I use to continuously record the user while in-app. This doesn't get written to a file, but I do some light processing in a recording callback. This light processing is basically an offline, lightweight voice activity detection system on every 100ms of audio data.
So essentially what I have is like the Hey Siri feature. While in-app, the microphone is always on. It waits for the user to start talking, and once the lightweight recognizer detects speech, other stuff happens.
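The question doesn't show the recognizer itself. As a rough illustration of what a very lightweight per-block check can look like, here is a sketch of the simplest kind of voice activity test, an energy threshold on each 100 ms block, assuming 16-bit PCM (e.g. 1600 samples at an assumed 16 kHz) and a hypothetical linear RMS threshold. Practical VADs typically add noise-floor tracking and hangover on top of this.

```c
#include <stddef.h>

/* Hypothetical energy-based voice activity check on one block of
 * 16-bit samples. Returns 1 if the block's RMS level (relative to full
 * scale) exceeds threshold_rms, which is in the range [0, 1]. */
int block_has_voice(const short *samples, size_t n, double threshold_rms)
{
    if (n == 0)
        return 0;
    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = samples[i] / 32768.0; /* normalize to [-1, 1) */
        sum_sq += s * s;
    }
    /* Compare the mean square with threshold^2 to avoid calling sqrt(). */
    return sum_sq / (double)n > threshold_rms * threshold_rms;
}
```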
I know this can be very battery efficient because Hey Siri is a system-wide feature. But at the same time, I have no clear idea of the impact on battery life. I only have anecdotal data – for example, the Sleep Cycle app uses 30% of battery if your phone is not charging while you sleep. So in that case, 30% battery for 8 hours of mic use. But that might be high because they're constantly doing some sort of sleep processing?
Is there a way to use Instruments or something to do an isolated battery test, or someone who has a better understanding of the microphone's impact on battery life? Thanks!
In your case, using "Hey Siri" as a comparison is not accurate, because this feature relies on a dedicated low-power coprocessor specifically to optimize power usage. In your scenario, you have no choice but to consume main-CPU resources, which will result in a higher power draw.
While further testing would be required, my assumption is that your power usage will be no better than an app in idle state at best (YMMV based on what else your app is doing).
https://machinelearning.apple.com/2017/10/01/hey-siri.html
To avoid running the main processor all day just to listen for the trigger phrase, the iPhone’s Always On Processor (AOP) (a small, low-power auxiliary processor, that is, the embedded Motion Coprocessor) has access to the microphone signal (on 6S and later). We use a small proportion of the AOP’s limited processing power to run a detector with a small version of the acoustic model (DNN).
The acoustic model it refers to detects the trigger phrase "Hey Siri" and has been highly optimized for that task, again for power and performance reasons.

ALSA: snd_pcm_readi() and real-time threads

I've got a dedicated thread that captures audio from ALSA through snd_pcm_readi(). Periodically I get a short read, meaning snd_pcm_readi() returns a positive integer lower than my buffer size, and there's an obvious 'pop' in my audio stream. I then set the thread priority to real-time, which gives a tangible benefit (far fewer short reads), but doesn't solve the problem.
Now the question: before going down the bumpy road of a real-time-patched Linux kernel, is there something else I can do to squeeze out more performance? Is calling snd_pcm_readi() in a dedicated thread the best way to pull audio out of ALSA?
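The question doesn't show how the capture thread was made real-time. On Linux this is commonly done with POSIX scheduling calls; a minimal sketch, assuming Linux/glibc, where the function name and the priority offset are hypothetical choices.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Request SCHED_FIFO (real-time, first-in-first-out) scheduling for the
 * calling thread, e.g. the thread that loops on snd_pcm_readi().
 * priority_offset is subtracted from the maximum FIFO priority so the
 * capture thread does not outrank every other real-time thread.
 * Returns 0 on success or an errno value (commonly EPERM when the
 * process lacks CAP_SYS_NICE or a sufficient RLIMIT_RTPRIO). */
int make_current_thread_realtime(int priority_offset)
{
    int max = sched_get_priority_max(SCHED_FIFO);
    int min = sched_get_priority_min(SCHED_FIFO);
    if (max == -1 || min == -1)
        return errno;

    struct sched_param param;
    memset(&param, 0, sizeof param);
    param.sched_priority = max - priority_offset;
    if (param.sched_priority < min)
        param.sched_priority = min;

    /* Applies to the calling thread only. */
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```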
For playback, the buffer size determines the latency.
For capture, it does not; only the period size determines how long you must wait until recorded samples are reported to be available.
So to prevent overruns, make the buffer as large as possible (e.g., by calling snd_pcm_hw_params_set_buffer_size_max() after setting the other parameters).
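To put numbers on this answer, a small sketch of the arithmetic; the 48 kHz rate, 480-frame period and 4800-frame buffer used to exercise it are illustrative assumptions, not values from the question.

```c
/* Convert a frame count to milliseconds at a given sample rate. */
double frames_to_ms(unsigned long frames, unsigned int rate)
{
    return 1000.0 * (double)frames / (double)rate;
}

/* Capture: samples are reported available once per period, so the
 * period size sets the latency. */
double capture_latency_ms(unsigned long period_frames, unsigned int rate)
{
    return frames_to_ms(period_frames, rate);
}

/* Roughly how late the reading thread may be, after a period becomes
 * ready, before the hardware overwrites unread data (an overrun). */
double overrun_headroom_ms(unsigned long buffer_frames,
                           unsigned long period_frames, unsigned int rate)
{
    if (buffer_frames <= period_frames)
        return 0.0;
    return frames_to_ms(buffer_frames - period_frames, rate);
}
```

With a 10 ms period, a buffer ten periods long lets the capture thread be descheduled for roughly 90 ms before an overrun, while the capture latency stays at 10 ms.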

How to ensure audio rendered within time limit on iOS?

I am rendering low-latency audio from my custom synth code via the iOS Audio Unit render callback. Obviously if my rendering code is too slow then it will return from the callback too late and there will be a buffer underrun. I know from experience this results in silence being output.
I want to know what time limit I have, so that I can manage the amount of processing to match the device's limitations.
Obviously the length of the buffer (in samples) determines the duration of audio being rendered and this sets an overall limit. However I suspect that the Apple audio engine will have a smaller time limit between issuing the render callback and requiring the response.
How can I find out this time limit and is that something I can do within the callback function itself?
If I happen to exceed the time limit and cause a buffer underrun, is there a notification I can receive or a status object I can interrogate?
NB: In my app I am creating a single 'output' audio unit, so I don't need to worry about chaining audio units together.
The amount of audio rendering that can be done in Audio Unit callbacks depends on the iOS device model and OS version, as well as on potential CPU clock speed throttling due to temperature or background modes. Thus, it needs to be profiled on the oldest, slowest iOS device you plan for your app to support, with some margin.
To support iOS 9, I very conservatively profile my apps on an iPhone 4S test device (ARM Cortex A9 CPU at 800 MHz), or on an even older, slower device when targeting an earlier iOS version. When doing this profiling, one can add some percentage of "make work" to an audio callback to see whether there is any margin (for a 50% margin, generate the sample buffer twice, etc.). Other developers appear to be less conservative.
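The "make work" margin test can be sketched in C as: render the same buffer several times and check that the total still fits within the buffer's real-time duration. This version times with POSIX clock_gettime(); render_fn, render_silence and fits_with_margin are hypothetical placeholders for your own synth code, and a single measurement is noisy, so in practice you would repeat it and look at the worst case.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Hypothetical signature for the app's synth code: fill n frames. */
typedef void (*render_fn)(float *out, unsigned n);

/* Trivial placeholder renderer (writes silence) so the sketch compiles. */
void render_silence(float *out, unsigned n)
{
    for (unsigned i = 0; i < n; i++)
        out[i] = 0.0f;
}

static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run the renderer `repeats` times on one buffer (repeats = 2 is the
 * "generate the sample buffer twice" test for a 50% margin) and report
 * whether that still fits within the buffer's real-time duration. */
int fits_with_margin(render_fn render, float *buf, unsigned frames,
                     double sample_rate, unsigned repeats)
{
    double budget_s = frames / sample_rate; /* seconds of audio per callback */
    double start = now_seconds();
    for (unsigned r = 0; r < repeats; r++)
        render(buf, frames);
    return (now_seconds() - start) < budget_s;
}
```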
This is why it is important for a mobile audio developer to have (or have access to) several iOS devices (the older the better). If the callback meets the time limit on an old, slow test device, it will very likely be more than fast enough on any newer iOS device.
Depending on the OS version, an underrun can either result in silence, or the Audio Unit stopping or crashing (which can be detected by no more or not enough callbacks within some predictable amount of time).
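The "not enough callbacks" check can be sketched as a counter that the render callback increments and a watchdog on another thread inspects. This is a sketch in C11; the names and the 50% tolerance used in the example are hypothetical choices, not an Apple API.

```c
#include <stdatomic.h>

/* Incremented once per render callback; atomic, so the callback never locks. */
static atomic_ulong callback_count;

void on_render_callback(void)
{
    atomic_fetch_add_explicit(&callback_count, 1, memory_order_relaxed);
}

/* Called periodically from a non-audio thread. elapsed_s is the time
 * since the previous check; tolerance (e.g. 0.5) is the fraction of the
 * expected callbacks that must have arrived. Returns 1 if audio looks
 * stalled or has stopped. */
int audio_seems_stalled(unsigned long *last_count, double elapsed_s,
                        double buffer_duration_s, double tolerance)
{
    unsigned long now =
        atomic_load_explicit(&callback_count, memory_order_relaxed);
    unsigned long seen = now - *last_count;
    *last_count = now;
    double expected = elapsed_s / buffer_duration_s;
    return (double)seen < expected * tolerance;
}
```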
But the best way to avoid underrun is to do most of the heavy audio work in another thread outside the audio unit thread, and pass samples to/from the audio unit callback using a lock-free circular fifo/queue.
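One possible shape for that lock-free FIFO is a single-producer/single-consumer ring buffer in C11. The capacity, float sample type and function names are assumptions; the key property is that each index is written by only one thread, and acquire/release ordering publishes the sample data between them.

```c
#include <stdatomic.h>
#include <stddef.h>

#define RING_CAPACITY 4096u /* must be a power of two */

/* The producer (worker thread) only writes head; the consumer (audio
 * callback) only writes tail, so no locks are needed. */
typedef struct {
    float data[RING_CAPACITY];
    atomic_size_t head; /* total samples written */
    atomic_size_t tail; /* total samples read */
} ring_t;

void ring_init(ring_t *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side: copy up to n samples in; returns how many fit. */
size_t ring_write(ring_t *r, const float *src, size_t n)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    size_t space = RING_CAPACITY - (head - tail);
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++)
        r->data[(head + i) & (RING_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&r->head, head + n, memory_order_release);
    return n;
}

/* Consumer side (audio callback): copy up to n samples out. Any
 * shortfall is filled with silence, so an underrun outputs zeros. */
size_t ring_read(ring_t *r, float *dst, size_t n)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t got = n < avail ? n : avail;
    for (size_t i = 0; i < got; i++)
        dst[i] = r->data[(tail + i) & (RING_CAPACITY - 1)];
    for (size_t i = got; i < n; i++)
        dst[i] = 0.0f;
    atomic_store_explicit(&r->tail, tail + got, memory_order_release);
    return got;
}
```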
Adding to what hotpaw2 said, the worst-performing iOS device I have encountered is the iPod touch 16G without the rear-facing camera. I have done projects where every device except the iPod touch 16G played audio smoothly, and I had to bump the buffer duration up to the next size to accommodate it.
I typically do all audio preparation before the render callback, in a separate lockless ring buffer, and keep the render callback limited to copying data. I let the application "deal" with buffer underruns.
I personally have never measured the render callback variance, but I would guess that the interval is consistently equal to the buffer duration (e.g. 5 ms), with extremely minimal jitter. I doubt it would be 4.9 ms one time and 5.1 ms the next.
To get timing information, you can use mach_absolute_time() from <mach/mach_time.h>.
You didn't really say what your timing requirements are. I assume you need low-latency audio; otherwise, you can just set the buffer duration to be fairly large. I assume that you want to increase the latency for slow devices using this code. I usually find what works on an iPod 16G and use that as the worst case.
NSTimeInterval _preferredDuration = ...
NSError* err;
[[AVAudioSession sharedInstance] setPreferredIOBufferDuration:_preferredDuration error:&err];
And of course, you should read back the actual duration used. The OS will pick a power-of-two buffer size (in frames) based on the sample rate:
NSTimeInterval _actualBufferDuration;
_actualBufferDuration = [[AVAudioSession sharedInstance] IOBufferDuration];
As far as adjusting for device performance, you can set the buffer duration
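The answer only says the OS picks "some power of two". A sketch of that arithmetic, assuming the requested frame count is rounded up to the next power of two (the real rounding rule is up to the OS; the function names are hypothetical). Under this assumption a 5 ms request at 44.1 kHz becomes 256 frames, about 5.8 ms, which is why reading IOBufferDuration back matters more than the preferred value.

```c
/* Hypothetical model: requested duration * sample rate, rounded up to
 * the next power-of-two frame count. */
unsigned long pow2_buffer_frames(double preferred_s, double sample_rate)
{
    double wanted = preferred_s * sample_rate;
    unsigned long frames = 1;
    while ((double)frames < wanted)
        frames <<= 1;
    return frames;
}

/* The duration that would actually result, in seconds. */
double actual_duration_s(double preferred_s, double sample_rate)
{
    return pow2_buffer_frames(preferred_s, sample_rate) / sample_rate;
}
```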

Sound Synchronization Issues

We are going to develop a project on sound source localization using LabVIEW. We are still at an initial stage and are going to do everything in software, with four mics connected to a PC (for this initial stage; later we plan to develop it using NI hardware if possible).
Initially we are acquiring sound from four different microphones connected to the computer through USB. All the microphones pick up sound from a single source, with some delay (milliseconds) between them because of their different positions. But the sound data acquired over USB cannot be written to the sound card simultaneously: the data incurs some hold time while being written to the sound card, and we get delayed samples when synchronizing all the channels. Is there any way to reduce this hold time when the data is written to the sound card?
Suppose the hold time is 10 ms; we want to reduce it to microseconds or nanoseconds.
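To see why a 10 ms hold time matters, compare it with the delays the localization is trying to measure. A sketch, assuming a speed of sound of about 343 m/s and a hypothetical 0.5 m mic spacing: the largest possible inter-mic delay is then about 1.46 ms (roughly 64 samples at 44.1 kHz), so an unknown 10 ms offset between channels is several times larger than the quantity being estimated.

```c
/* Approximate speed of sound in air at about 20 degrees C, in m/s. */
#define SPEED_OF_SOUND 343.0

/* Largest possible arrival-time difference between two microphones
 * spacing_m apart (source on the line through both mics), in seconds. */
double max_tdoa_s(double spacing_m)
{
    return spacing_m / SPEED_OF_SOUND;
}

/* The same quantity expressed in samples at `rate` Hz. */
double max_tdoa_samples(double spacing_m, double rate)
{
    return max_tdoa_s(spacing_m) * rate;
}
```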
Reducing the hold time, as well as precise inter-channel synchronisation, is not possible with LabVIEW running under Windows and regular sound acquisition hardware. Internal software delays comparable to the OS time slice (~10 ms) are to be expected.
You need at least dedicated acquisition hardware (not a number of USB sound cards), and if you want precise synchronisation of your output with the input with minimal jitter, you need NI-FPGA. Given these requirements, I would look at the R-series.

Time between callback calls?

I have a lab project that mainly uses PyAudio, and to better understand how it works I made some measurements, in this case of the time between callbacks (using callback mode).
I timed it and got an interesting result (chunk size 256, fs = 44.1 kHz; intervals in seconds):
0.0099701; 0.0000365; 0.0000201; 0.0201579
This pattern goes on and on.
Between two long intervals there are two short ones, and the long interval itself varies (about 10 ms or about 20 ms in the sample above). Mind you, I don't do anything in the program other than time the callbacks.
If we average this out, we get the expected callback period:
256 / 44100 s (roughly 5.8 ms)
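The measurement can be summarized by two numbers: the mean interval and the worst deviation from the nominal 256/44100 s period. A sketch in C (the question used Python; the function name is hypothetical):

```c
#include <stddef.h>

/* Summarize callback timestamps t[0..n-1] (seconds). Writes the mean
 * interval and the largest deviation of any single interval from
 * nominal_s (e.g. 256.0 / 44100.0). Returns -1 if n < 2, else 0. */
int callback_interval_stats(const double *t, size_t n, double nominal_s,
                            double *mean_s, double *max_dev_s)
{
    if (n < 2)
        return -1;
    double sum = 0.0, worst = 0.0;
    for (size_t i = 1; i < n; i++) {
        double dt = t[i] - t[i - 1];
        double dev = dt > nominal_s ? dt - nominal_s : nominal_s - dt;
        sum += dt;
        if (dev > worst)
            worst = dev;
    }
    *mean_s = sum / (double)(n - 1);
    *max_dev_s = worst;
    return 0;
}
```

A bursty pattern can average out to exactly the nominal period while individual intervals are off by a whole native buffer, which is why it is worth reporting the deviation and not only the mean.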
So can someone explain what exactly happens here under the hood?
What happens under the hood in PortAudio is dependent on a number of factors, including:
- Which native audio API PortAudio is talking to
- What buffer size and latency parameters you passed to Pa_OpenStream()
- The capabilities of the audio hardware and its drivers, including its supported buffer sizes, buffering model and timing characteristics.
Under some circumstances PortAudio will request larger buffers from the native audio API and then invoke the PortAudio user callback multiple times in quick succession. This can happen if you have selected a small callback buffer size and a long latency.
Another scenario is that the native audio API doesn't support the buffer size that you requested for your callback size (framesPerBuffer parameter to Pa_OpenStream()). In this case PortAudio will be forced to use a driver-supported buffer size and then "adapt" between that buffer size and your callback buffer size. This adaption process can cause irregular timing.
Yet another possibility is that the native audio API uses a large ring buffer. Each time PortAudio polls the native host API, it will work to fill the native ring buffer by calling your callback as many times as needed. In this case irregular timing is related to the polling rate.
The above are not the only possibilities.
One likely explanation of what is happening in your case is that PortAudio is calling your callback 3 times in fast succession (a guess would be that the native buffer size is 3x your callback buffer size), for one of the reasons above.
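A simplified, capture-side model of the buffer adaptation described above (not PortAudio's actual code; the function name is hypothetical): each native period delivers a fixed number of frames, and the user callback is invoked back-to-back for every full user buffer available.

```c
#include <stddef.h>

/* Every native period, native_frames new frames arrive; the user
 * callback (framesPerBuffer = user_frames) is then called back-to-back
 * for as many full user buffers as are available. Writes the number of
 * callbacks triggered in each of `periods` native periods to calls[]. */
void simulate_adaptation(unsigned native_frames, unsigned user_frames,
                         size_t periods, unsigned *calls)
{
    unsigned pending = 0; /* frames received but not yet given to the user */
    for (size_t p = 0; p < periods; p++) {
        pending += native_frames;
        calls[p] = pending / user_frames;
        pending %= user_frames;
    }
}
```

With a native buffer three times the callback size, every native period produces three back-to-back calls, matching the guess above. With a native size that isn't a multiple (441 frames is 10 ms at 44.1 kHz), the number of calls per period alternates between 1 and 2, which also produces irregular intervals.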
Another possibility is that the native audio subsystem is signalling PortAudio irregularly. This can happen if a system layer below PortAudio is doing similar kinds of buffering to what I described above. I have seen this happen with DirectSound on Windows 7 for example. ASIO4ALL drivers will exhibit +/- 1ms jitter (which is not what you're seeing).
You can try reducing the requested stream latency to 0 and see if that changes the result. This will force double-buffering, which may or may not produce stable output. Another thing to try is to use the paFramesPerBufferUnspecified parameter, which will cause the callback to be called with the native buffer size -- then you can observe whether there is greater periodicity, what that buffer size is, and also whether the buffer size varies from callback to callback.
You didn't say which operating system and host API you're targeting, so it's hard to give more specific details than the above.
The internal buffering models used by the various PortAudio host API backends are described in some detail on the PortAudio wiki.
To answer a related question: why is it like this? Aside from the cases where it is a function of the lower layers of the native audio subsystem, or the buffer adaption process, it is often a result of specifying a large suggested latency to Pa_OpenStream(). Some PortAudio host APIs will relax the buffer periodicity if the specified latency is very high, in order to reduce system load that would be caused by high-frequency timer callbacks.
