Understanding hardware / OS level thread CPU throttling in iOS - ios

I am trying to understand under what circumstances and how iOS may throttle my application threads due to excessive CPU consumption. The results I'm getting are kind of strange.
I have an application with OpenGL / GLKViewController rendering a view and a separate logic thread, started in the background using NSThread.detachNewThreadSelector, performing calculations. I find that if I (for purposes of discussion) let my computation thread run flat out as fast as it can, iOS quickly throttles it down. e.g. I monitor the FPS of both the view and my thread and I see that the view maintains e.g. 60fps and my logic thread is humming along but then suddenly drops after a few seconds.
So that makes sense to me that perhaps iOS tries to limit thread consumption. What is weird is that it doesn't just slow down gradually but it seems to "quantize" my logic thread's FPS at approximately some multiple of the GPU frame rate (i.e. 30 or 60fps)!
Now, keep in mind that there is no synchronization between these threads and the logic loop is self contained hard loop equivalent to while(true) so I have no idea how it's even possible for iOS to accomplish this magic unless it is somehow aware of my top level loop and interjecting itself into it.
In case you don't believe me that there is no synchronization point I will tell you that I have created a test case that literally just has an empty GLKViewController loop and an dumb logic thread that churns some numbers and it exhibits the behavior. Screenshots are below and I can post the code if anyone is interested.
The screenshots below are for two different "loads" of the logic thread, printed at intervals of a second, running on an iPad Air with iOS 8.
What's even stranger is - sometimes setting a lower preferred GLK frame rate (e.g. 30fps) can actually make my logic thread run slower. I'd have expected that reducing the work done by the GPU would free up (resources / heat dissipation) and reduce the need for throttling, but it doesn't always seem to be the case.
Does anyone have an explanation for this behavior? Is it documented? Thanks.
EDIT: My only guess at this point is that if the GPU runs to hot they shut down the second core and migrate threads back to the first... and then somehow thread prioritization accounts for the implicit synchronization, although I still can't envision exactly how this happens.

Related

Alternatives to D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS?

This is a followon to this question about using the DX11VideoRenderer sample (a replacement for EVR that uses DirectX11 instead of EVR's DirectX9).
I've been trying to track down why it uses so much more CPU than the EVR. Task Manager shows me that most of that time is kernel mode.
Using profiling tools, I see that a LOT of time is being spent in numerous calls to NtDelayExecution (aka Sleep). How many calls? ~100,000 over the course of ~12 seconds. Ok, yeah, I'm sending a lot of frames in those 12 seconds, but that's still a lot of calls, every one of which requires a kernel mode transition.
The callstack shows the last call in "my" code is to IDXGISwapChain1::Present(0, 0). The actual call seems be Sleep(0) and comes from nvwgf2umx.dll (which is why this question is tagged NVidia: hopefully someone there can call up the code and see what the logic is behind such frequent calls).
I couldn't quite figure out why it would need to do /any/ Sleeping during Present. It's not like we wait for vertical retrace anymore, is it? But the other reason to use Sleep has to do with yielding to other threads. Which led me to a serious clue:
If I use D3D11_CREATE_DEVICE_PREVENT_INTERNAL_THREADING_OPTIMIZATIONS, the CPU utilization drops. Along with some other fixes, the DX11 version is now faster and uses less CPU time than the DX9 version (which is what I would hope/expect). Profiling shows that Sleep has dropped from >30% to <1%.
Unfortunately, this page tells me:
This flag is not recommended for general use.
Oh.
So, any ideas on how to get decent performance without using debug flags?

The same operation takes different execution times in two identical projects

I have an app which I am trying to migrate to a new project. There is a heavy operation which I am handling in main thread asynchronously. In my older project, it takes just a second to complete this task but in my new project, it takes 6-7 seconds for the same task.
I observed the CPU usage and it looks like the new app is using less CPU and getting very few threads while the old one gets lots of threads for the same task. PS: I am using the same device.
What could cause this? Any ideas or suggestions to find out?
Thanks.
Finally, I found the problem. It was caused by Optimization Level setting in Xcode Build Settings. When a new project created, default Debug optimization level is none, and Release optimization level is Fastest, Smallest [-Os] So when I changed Debug to Fastest, Smallest [-Os] my task completion time dropped to 1 sec.
From Apple:
The Xcode compiler supports optimization options that let you choose
whether you prefer a smaller binary size, faster code, or faster build
times. For new projects, Xcode automatically disables optimizations
for the debug build configuration and selects the Fastest, Smallest
option for the release build configuration. Code optimizations of any
kind result in slower build times because of the extra work involved
in the optimization process. If your code is changing, as it does
during the development cycle, you do not want optimizations enabled.
As you near the end of your development cycle, though, the release
build configuration can give you an indication of the size of your
finished product, so the Fastest, Smallest option is appropriate.
If you want to read more about optimization levels and performance: Tuning for Performance and Responsiveness
Side note: Changing the optimization level to fastest, Smallest [-0s] in debug mode might affect the debugger breakpoints and it will behave abruptly.
Cheers.
This probably isn't really a response to your question, you answered it yourself quite nicely, but nonetheless I feel like it's needed.
I'd like to stress out how you should NOT make long running operation on the main thread. For no reason. Actually, if you want the screen to refresh 60 times per second (which should always be your goal) it means that every block of code you submit to the main thread must last less than 0.016 seconds (1/60) to avoid losing some frames. If, in the meantime, you also need to make the main thread do some complex animation and other stuff, well probably you need to go far behind the 0.016 seconds point.
If you block the main thread for much more than that (like 1 second in this case) than the users will experience a stuck interface, they can't scroll a scrollView or navigate the app. They may as well close your app entirely since they may feel like it's stuck.
In your case, for example, you may want to add some nice loading animation, like the ActivityIndicator or some nicer animation, to express you're actually working at that moment and you didn't freeze. That's really expected by users nowadays.
You may (or may not, it's up to you) also wanna add a cancel button, if the user wants to cancel the long running operation and do something else with your app.
To avoid what you say that causes the loss of performances (the task is slowed up to 7-8 seconds) you may want to use a serialQueue with a high quality of service.
Probably userInitiated is what you want.
This way you still have those task be prioritized by the OS, but you won't block the main thread in meantime, allowing you to add that loading animation for example.
If that's still too low of a performance, you could think of splitting the task in subtasks and having them performed in parallel by using DispatchQueue.concurrentPerform(iterations:execute:) (but I don't know if that's doable in your case).
I hope this helps you.
Cheers

Should I programly put computation-heavy tasks on a separate thread on IOS to utilize multi-core?

I am making a real-time image processing app on IOS with my team. I am handling the custom computation kernel (mostly on CPU rather than GPU) and my teammates deals with the GUI. When I tested my kernel on a toy app, the core (ignoring any IO overhead ) runs steadily at 100ms per image. However, when put into the full-functioning one, it is slowed down to 500ms per image.
I have checked that the data is pretty much the same and I am only measuring time consumed within the kernel, on the same iphone6. There are hardly any other computation in the full-functioning app so I am not sure what is pulling behind. Though GPU-processing is definitely an alternative and I am working on it, I would like to know if there is any tricks to use for now.
Currently, there is no explicit multi-threading in the computation part, so my simple guess is: should I programly put the computation part on a separate thread so the second core can be utilized?
[Update]
It turns out that I made some mistakes in packing my code as library, as the copying over the source code works out nicely. I have not figured out my problem yet and am going to post it on a separate question.
GPU Acceleration
This massively depends on the tasks you're performing, the GPU is good a specific subset of tasks and simply utilising it can sometimes even slow things down. Check this out
A lot of image based tasks that are part of the Quartz framework e.t.c are GPU accelerated (like blurring). Also if you use a library like OpenCV you get GPU acceleration on certain tasks out the box.
Unless you're a real pro I would avoid using the GPU specifically and let the frameworks and libraries you use do that for you.
Concurrency
It will certainly help to put intensive tasks on a background thread. Just be aware of what it entails (i.e. you can't make any UIKit calls from a background thread.
The answer heavily depends on how you do the processing. Some methods in the SDK perform their job in a background thread, while others require the caller to create and use one.
In general, in the case of drawing, most methods require you to create one explicitly. This is important especially for the ones that perform their work on the CPU (e.g. using CoreGraphics to draw within a drawRect method). If you're using methods that use GPU for the processing, then creating threads won't be much of use since CPU won't be the cause of the bottleneck.
If you want to determine why your app slows down, use Instruments. (Time Profiler for CPU and Core Animation for drawing)

NSThread, NSOperation or GCD for CoreMotion and accurate timing purposes?

I'm looking to do some high precision core motion reading (>=100Hz if possible) and motion analysis on the iPhone 4+ which will run continuously for the duration of the main part of the app. It's imperative that the motion response and the signals that the analysis code sends out are as free from lag as possible.
My original plan was to launch a dedicated NSThread based on the code in the metronome project as referenced here: Accurate timing in iOS, along with a protocol for motion analysers to link in and use the thread. I'm wondering whether GCD or NSOperation queues might be better?
My impression after copious reading is that they are designed to handle a quantity of discrete, one-off operations rather than a small number of operations performed over and over again on a regular interval and that using them every millisecond or so might inadvertently create a lot of thread creation/destruction overhead. Does anyone have any experience here?
I'm also wondering about the performance implications of an endless while loop in a thread (such as in the code in the above link). Does anyone know more about how things work under the hood with threads? I know that iPhone4 (and under) are single core processors and use some sort of intelligent multitasking (pre-emptive?) which switches threads based on various timing and I/O demands to create the effect of parallelism...
If you have a thread that has a simple "while" loop running endlessly but only doing any additional work every millisecond or so, does the processor's switching algorithm consider the endless loop a "high demand" on resources thus hogging them from other threads or will it be smart enough to allocate resources more heavily towards other threads in the "downtime" between additional code execution?
Thanks in advance for the help and expertise...
IMO the bottleneck are rather the sensors. The actual update frequency is most often not equal to what you have specified. See update frequency set for deviceMotionUpdateInterval it's the actual frequency? and Actual frequency of device motion updates lower than expected, but scales up with setting
Some time ago I made a couple of measurements using Core Motion and the raw sensor data as well. I needed a high update rate too because I was doing a Simpson integration and thus wnated to minimise errors. It turned out that the real frequency is always lower and that there is limit at about 80 Hz. It was an iPhone 4 running iOS 4. But as long as you don't need this for scientific purposes in most cases 60-70 Hz should fit your needs anyway.

Do NSThread have same memory privileges as main thread?

I'm using NSOperationQueue to manage a phase of an iOS application which is quite long so I would like to manage it asynchronously. Inside that phase I allocate big arrays in C by using directly calloc functions.
With big I mean a 1024x256 bidimensional array of floats and similar things.
If everything resides on the main thread than the app locks up while computing but everything goes fine, if, instead, I move the heavy part to a NSInvocationOperation then I got many strange results, with debugger sometimes I get a strange message in console stating
No memory available to program now: unsafe to call malloc
so I was wondering if threads managed by an operation queue have some different restrictions compared to main thread, and in case what is better to do to get around this issue.
There's no restrictions that I know of.. however, you may be hitting the edge of available RAM. Since iOS doesn't do virtual memory, when memory gets low, it'll send a warning to other apps to free up RAM. That may be the source of your issue.
Use instruments to profile how much RAM you're using. If it's more than about 20MB or so, you're probably in danger of being terminated due to excessive memory usage anyway.

Resources