It occurred to me that Core Audio callbacks require very low latency. In my case I'm getting requests for 512 samples at a time, which at 44100 Hz means the callback has at most 11.6 milliseconds to run.
Now, as I understand garbage collection, each collection cycle requires the VM to stop all threads. It is therefore possible for a garbage collection cycle to interrupt a Core Audio callback and cause glitches.
If so, then it is not really safe to use Core Audio from MonoTouch.
Am I correct in these assumptions, or have I misunderstood something?
The Core Audio render callback is going to be called on a realtime thread which is very strict about its deadlines. From the sounds of it, you're occasionally exceeding the render callback's time allowance, and being cut off (which == glitches). While I don't know much about MonoTouch, your guess about GC delays being the culprit does sound like a very likely conclusion.
To give you a sense of just how strict Core Audio render callbacks are, here are some things that are unacceptable in that context (a sketch of a callback that respects these constraints follows the list):
Allocating memory
Waiting on a mutex
Reading data from disk
Objective-C messaging
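To make that concrete, here is a minimal render callback sketch that stays within those constraints (the state struct, its field names and the mono Float32 format are assumptions): it only reads from preallocated memory, takes no locks, allocates nothing and sends no Objective-C messages.
#include <AudioUnit/AudioUnit.h>

// Hypothetical state, allocated and filled on a non-realtime thread.
typedef struct {
    float  *ringBuffer;      // preallocated sample storage
    UInt32  readIndex;       // next frame the callback will read
    UInt32  capacityFrames;  // total frames in the ring buffer
} PlayerState;

// Render callback: no malloc, no locks, no file I/O, no objc_msgSend.
static OSStatus RenderCallback(void *inRefCon,
                               AudioUnitRenderActionFlags *ioActionFlags,
                               const AudioTimeStamp *inTimeStamp,
                               UInt32 inBusNumber,
                               UInt32 inNumberFrames,
                               AudioBufferList *ioData)
{
    PlayerState *state = (PlayerState *)inRefCon;
    float *out = (float *)ioData->mBuffers[0].mData;   // mono Float32 assumed

    for (UInt32 i = 0; i < inNumberFrames; i++) {
        out[i] = state->ringBuffer[state->readIndex];
        state->readIndex = (state->readIndex + 1) % state->capacityFrames;
    }
    return noErr;
}
The ring buffer is refilled by a lower-priority thread; the callback itself only reads.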
Due to the architecture of Core Audio, render callbacks are going to be triggered very shortly before the audio you produce will be heard. Therefore, even a brief GC hangup could trigger audible glitches.
No. The MonoTouch VM does not appear to be guaranteed to execute code in deterministic time. Real-time audio callbacks require code (usually compiled native C) whose performance can be strictly bounded in time, including all OS calls and any interpreter overhead.
I am trying to understand under what circumstances and how iOS may throttle my application threads due to excessive CPU consumption. The results I'm getting are kind of strange.
I have an application with an OpenGL / GLKViewController rendering a view and a separate logic thread, started in the background using NSThread.detachNewThreadSelector, performing calculations. I find that if I (for the purposes of discussion) let my computation thread run flat out as fast as it can, iOS quickly throttles it down. For example, I monitor the FPS of both the view and my thread, and I see that the view maintains 60fps while my logic thread hums along but then suddenly drops after a few seconds.
So it makes sense to me that perhaps iOS tries to limit thread consumption. What is weird is that it doesn't just slow down gradually; it seems to "quantize" my logic thread's FPS at approximately some multiple of the GPU frame rate (i.e. 30 or 60fps)!
Now, keep in mind that there is no synchronization between these threads, and the logic loop is a self-contained hard loop equivalent to while(true), so I have no idea how it's even possible for iOS to accomplish this magic unless it is somehow aware of my top-level loop and interjecting itself into it.
In case you don't believe me that there is no synchronization point: I have created a test case that literally just has an empty GLKViewController loop and a dumb logic thread that churns some numbers, and it exhibits the same behavior. Screenshots are below, and I can post the code if anyone is interested.
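For reference, the logic thread in that test case has roughly this shape (a sketch with hypothetical names; the real code may differ):
// Started from the view controller, e.g.:
//   [NSThread detachNewThreadSelector:@selector(logicLoop)
//                            toTarget:self withObject:nil];
- (void)logicLoop
{
    NSUInteger iterations = 0;
    CFAbsoluteTime last = CFAbsoluteTimeGetCurrent();
    while (true) {                    // self-contained hard loop, no sleeping, no locks
        [self churnSomeNumbers];      // hypothetical busy work
        iterations++;
        CFAbsoluteTime now = CFAbsoluteTimeGetCurrent();
        if (now - last >= 1.0) {      // report the thread's "FPS" once a second
            NSLog(@"logic FPS: %lu", (unsigned long)iterations);
            iterations = 0;
            last = now;
        }
    }
}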
The screenshots below show the per-second FPS printouts for two different "loads" of the logic thread, running on an iPad Air with iOS 8.
What's even stranger is that sometimes setting a lower preferred GLK frame rate (e.g. 30fps) can actually make my logic thread run slower. I'd have expected that reducing the work done by the GPU would free up resources (and heat dissipation headroom) and reduce the need for throttling, but that doesn't always seem to be the case.
Does anyone have an explanation for this behavior? Is it documented? Thanks.
EDIT: My only guess at this point is that if the GPU runs too hot, they shut down the second core and migrate threads back to the first... and then somehow thread prioritization accounts for the implicit synchronization, although I still can't envision exactly how that happens.
I have a lab project that mainly uses PyAudio, and to further understand how it works I made some measurements, in this case the time between callbacks (using callback mode).
I timed it, and got an interesting result
(chunk size 256, 44.1 kHz sample rate): 0.0099701; 0.0000365; 0.0000201; 0.0201579
This pattern goes on and on.
Between two longer calls, we have two shorter calls and sometimes the longer call is shorter (mind you I don't do anything else in the program than time the callbacks).
If we average this out we get our desired callback time:
1/44100 * 256 (roughly 5.8 ms)
Here is my measurement visualized:
So can someone explain what exactly happens here under the hood?
What happens under the hood in PortAudio is dependent on a number of factors, including:
Which native audio API PortAudio is talking to
What buffer size and latency parameters you passed to Pa_OpenStream()
The capabilities of the audio hardware and its drivers, including its supported buffer sizes, buffering model and timing characteristics.
Under some circumstances PortAudio will request larger buffers from the native audio API and then invoke the PortAudio user callback multiple times in quick succession. This can happen if you have selected a small callback buffer size and a long latency.
Another scenario is that the native audio API doesn't support the buffer size that you requested for your callback size (the framesPerBuffer parameter to Pa_OpenStream()). In this case PortAudio will be forced to use a driver-supported buffer size and then "adapt" between that buffer size and your callback buffer size. This adaptation process can cause irregular timing.
Yet another possibility is that the native audio API uses a large ring buffer. Each time PortAudio polls the native host API, it will work to fill the native ring buffer by calling your callback as many times as needed. In this case irregular timing is related to the polling rate.
The above are not the only possibilities.
One likely explanation of what is happening in your case is that PortAudio is calling your callback 3 times in fast succession (a guess would be that the native buffer size is 3x your callback buffer size), for one of the reasons above.
Another possibility is that the native audio subsystem is signalling PortAudio irregularly. This can happen if a system layer below PortAudio is doing similar kinds of buffering to what I described above. I have seen this happen with DirectSound on Windows 7 for example. ASIO4ALL drivers will exhibit +/- 1ms jitter (which is not what you're seeing).
You can try reducing the requested stream latency to 0 and see if that changes the result. This will force double-buffering, which may or may not produce stable output. Another thing to try is to use the paFramesPerBufferUnspecified parameter, which will cause the callback to be called with the native buffer size -- then you can observe whether there is greater periodicity, what that buffer size is, and also whether the buffer size varies from callback to callback.
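For example, opening a stream with paFramesPerBufferUnspecified looks roughly like this (a C sketch; the device, format and sample rate are placeholders, and Pa_Initialize() is assumed to have been called already):
#include <portaudio.h>

/* Sketch: let PortAudio pick the native buffer size; frameCount in the
   callback then shows what the host API actually delivers. */
static int MyCallback(const void *input, void *output,
                      unsigned long frameCount,
                      const PaStreamCallbackTimeInfo *timeInfo,
                      PaStreamCallbackFlags statusFlags,
                      void *userData)
{
    /* frameCount may vary from call to call */
    return paContinue;
}

PaError openNativeSizeStream(PaStream **stream)
{
    PaStreamParameters out = { 0 };
    out.device = Pa_GetDefaultOutputDevice();
    out.channelCount = 2;
    out.sampleFormat = paFloat32;
    out.suggestedLatency = Pa_GetDeviceInfo(out.device)->defaultLowOutputLatency;

    return Pa_OpenStream(stream, NULL, &out, 44100.0,
                         paFramesPerBufferUnspecified,  /* no fixed callback size */
                         paNoFlag, MyCallback, NULL);
}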
You didn't say which operating system and host API you're targeting, so it's hard to give more specific details than the above.
The internal buffering models used by the various PortAudio host API backends are described in some detail on the PortAudio wiki.
To answer a related question: why is it like this? Aside from the cases where it is a function of the lower layers of the native audio subsystem, or of the buffer adaptation process, it is often a result of specifying a large suggested latency to Pa_OpenStream(). Some PortAudio host APIs will relax the buffer periodicity if the specified latency is very high, in order to reduce the system load that would be caused by high-frequency timer callbacks.
I am using the mmap function on iOS in a drawing app. I map a file with mmap and then create a CGBitmapContext from that memory. The user may perform many Core Graphics operations on this CGBitmapContext with their finger, which causes the memory to be updated constantly.
How often will this flush to the flash storage and is this a concern for wearing out the flash storage or for performance? I haven't noticed anything bad in my tests, but I am not familiar enough with mmap to know for sure.
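For context, a minimal version of the setup described above might look like this (the sizes, file path and pixel format are hypothetical; error handling omitted):
// Sketch: map a file and draw into it through a CGBitmapContext.
// Needs <sys/mman.h>, <fcntl.h>, <unistd.h> and CoreGraphics.
size_t width = 1024, height = 768, bytesPerRow = width * 4;
size_t length = bytesPerRow * height;

int fd = open("/path/to/canvas.raw", O_RDWR | O_CREAT, 0644);
ftruncate(fd, length);                        // size the backing file
void *memoryMap = mmap(NULL, length, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);    // MAP_SHARED so writes reach the file

CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
CGContextRef ctx = CGBitmapContextCreate(memoryMap, width, height, 8,
                                         bytesPerRow, space,
                                         kCGImageAlphaPremultipliedLast);
CGColorSpaceRelease(space);
// ...Core Graphics drawing into ctx now updates the mapped memory...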
iOS is pretty smart about how often it flushes to disk. I have noticed that when I background the app, as long as I call msync, i.e. msync(self.memoryMap, self.memoryMapLength, MS_SYNC); it flushes properly.
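A sketch of what that can look like, assuming the flush is hooked to the did-enter-background notification (the property names are taken from the call above; the notification observer itself is an assumption):
// Sketch: flush the mapped region whenever the app is backgrounded.
[[NSNotificationCenter defaultCenter]
    addObserverForName:UIApplicationDidEnterBackgroundNotification
                object:nil
                 queue:[NSOperationQueue mainQueue]
            usingBlock:^(NSNotification *note) {
                msync(self.memoryMap, self.memoryMapLength, MS_SYNC);
            }];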
While I use the app, even if there is a crash or sudden termination, usually all data is saved. If I kill my app while debugging, sometimes the last few changes are not saved, but usually everything is saved.
So my conclusion is that this is not a concern. iOS is not constantly writing to disk, it is writing to disk at smart intervals.
I have an iOS app that processes video frames from captureOutput straight from the camera. As part of the processing I'm calling several C functions in another source file. I convert UIImages into raw data and pass these rapidly - all of the processing is done on a queue tied to the video output.
This seems to work fine, up to a point. It seems that I'm hitting a limit when the data I'm passing becomes too large and I get seemingly random EXC_BAD_ACCESS errors popping up during the initialisation phase of the C function.
By initialisation I mean declaring small static arrays, setting them to zero, and suchlike.
I was wondering if I was hitting some kind of stack limit from passing large amounts of data, so I tried upping the stack size using Other Linker Flags and -Wl,-stack_size, but this didn't seem to make a difference.
Is there anything else I should be aware of calling C functions from a non-UI thread in this way?
Sorry to be a little general, but I'm unable to post specifics of the code and am looking for general advice and tips for this kind of situation.
Some further information - we had issues with releasing memory and used autorelease pools on the video processing side in Objective-C (as recommended, since we're on a different thread) - perhaps we're hitting the same difficulty with the C code. Is there a way to increase the frequency at which releases/frees are executed in C, or am I just chasing my tail?
So, the root of your problem is memory usage. Even if you don't leak any memory and are very careful, writing a video processing app on iOS is very tricky, because there is only so much memory you can actually allocate before the OS will terminate your app due to memory use. If you would like to read my blog post about this subject, you can find it at video_and_memory_usage_on_ios.
Some easy rules to remember: you can basically allocate and use something like 10 megs of memory for a short time, but anything more than that and you risk upsetting the OS, and your app can be terminated. With virtual memory mapped from a file, the upper limit is a total of about 700 megs for all mapped memory at any one time. This is not a stack issue; I am talking about heap memory. You should not be putting video memory on the stack, that is just crazy.
Be careful to only pass around pointers to memory and NEVER copy the memory from one buffer into another; just pass around refs to the memory in the buffer. The iOS APIs in CoreGraphics and CoreVideo support this type of "allocate a buffer and pass around the pointer" approach. The other rule of thumb to remember is to only process one frame at a time, and then reuse the same buffer to process the next frame after the data has been written to a file or into an h.264 video via the AVAssets APIs.
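As an illustration of that "pass around the pointer" approach (a sketch: processFrame is a hypothetical C function, and the delegate method shown is the standard AVCaptureVideoDataOutput callback):
// Sketch: pass a pointer to the existing pixel buffer into a C routine
// instead of copying the frame. processFrame is hypothetical.
- (void)captureOutput:(AVCaptureOutput *)output
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);

    void  *base        = CVPixelBufferGetBaseAddress(pixelBuffer);
    size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
    size_t height      = CVPixelBufferGetHeight(pixelBuffer);
    processFrame(base, bytesPerRow, height);   // operate in place, no copy

    CVPixelBufferUnlockBaseAddress(pixelBuffer, kCVPixelBufferLock_ReadOnly);
}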
After including several @autoreleasepool blocks around some key areas of code, we identified the major cause of the memory problems we were having.
It seemed that including the following block just inside the captureOutput callback function did the trick.
@autoreleasepool
{
    imageColour = [self imageFromSampleBuffer:sampleBuffer];
}
Note: imageFromSampleBuffer was taken from this question: ios capturing image using AVFramework
I'm looking to do some high-precision Core Motion reading (>=100 Hz if possible) and motion analysis on the iPhone 4+ which will run continuously for the duration of the main part of the app. It's imperative that the motion response and the signals that the analysis code sends out are as free from lag as possible.
My original plan was to launch a dedicated NSThread based on the code in the metronome project as referenced here: Accurate timing in iOS, along with a protocol for motion analysers to link in and use the thread. I'm wondering whether GCD or NSOperation queues might be better?
My impression after copious reading is that they are designed to handle a quantity of discrete, one-off operations rather than a small number of operations performed over and over again at a regular interval, and that using them every millisecond or so might inadvertently create a lot of thread creation/destruction overhead. Does anyone have any experience here?
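For what it's worth, the GCD pattern usually suggested for this kind of periodic work is a repeating dispatch timer source rather than submitting a fresh block per tick, which avoids per-operation thread churn (a sketch only; the 10 ms interval and handler body are placeholders, and this says nothing about whether its jitter meets your latency requirement):
// Sketch: a repeating GCD timer source on a high-priority global queue.
dispatch_queue_t queue =
    dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
dispatch_source_t timer =
    dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, queue);

dispatch_source_set_timer(timer,
                          dispatch_time(DISPATCH_TIME_NOW, 0),
                          10 * NSEC_PER_MSEC,   // fire every 10 ms (~100 Hz)
                          1 * NSEC_PER_MSEC);   // permitted leeway
dispatch_source_set_event_handler(timer, ^{
    // read the latest motion sample and run the analysis here
});
dispatch_resume(timer);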
I'm also wondering about the performance implications of an endless while loop in a thread (such as in the code in the above link). Does anyone know more about how things work under the hood with threads? I know that the iPhone 4 (and earlier) are single-core processors and use some sort of intelligent multitasking (pre-emptive?) which switches threads based on various timing and I/O demands to create the effect of parallelism...
If you have a thread with a simple "while" loop running endlessly but only doing additional work every millisecond or so, does the scheduler consider the endless loop a "high demand" on resources, thus hogging them from other threads, or will it be smart enough to allocate resources more heavily towards other threads in the "downtime" between additional code execution?
Thanks in advance for the help and expertise...
IMO the bottleneck is rather the sensors. The actual update frequency is most often not equal to what you have specified. See "update frequency set for deviceMotionUpdateInterval it's the actual frequency?" and "Actual frequency of device motion updates lower than expected, but scales up with setting".
Some time ago I made a couple of measurements using Core Motion and the raw sensor data as well. I needed a high update rate too, because I was doing a Simpson integration and thus wanted to minimise errors. It turned out that the real frequency is always lower and that there is a limit at about 80 Hz. It was an iPhone 4 running iOS 4. But as long as you don't need this for scientific purposes, in most cases 60-70 Hz should fit your needs anyway.
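For reference, requesting a 100 Hz device-motion rate looks like this; as described above, the interval is only a request and the delivered rate can be lower (a sketch; the queue choice and error handling are simplified):
// Sketch: request 100 Hz device motion; the delivered rate may be lower.
// Needs <CoreMotion/CoreMotion.h>.
CMMotionManager *manager = [[CMMotionManager alloc] init];
manager.deviceMotionUpdateInterval = 1.0 / 100.0;   // a request, not a guarantee

[manager startDeviceMotionUpdatesToQueue:[[NSOperationQueue alloc] init]
                             withHandler:^(CMDeviceMotion *motion, NSError *error) {
                                 // compare consecutive timestamps to see the real rate
                                 NSLog(@"t = %f", motion.timestamp);
                             }];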