I have an application that does some drawing via Direct2D and captures screen contents via the Desktop Duplication API. It is a C# application, but I use custom layered windows for display and render via Direct2D to get around the "smart" drawing in WPF, which previously caused my application to compete poorly for graphics resources.
If I run this application at the same time as a DirectX game that takes lots of graphics resources, then the behavior I get is as follows:
if my application is the foreground application, it works OK
if my application is visible but not the foreground application, the calls to Map Subresource and Copy Resource take highly variable amounts of time, sometimes blocking for hundreds of milliseconds (according to the profiler). I am not sure I believe the outliers in the hundreds, but Map Subresource certainly averages 50ms, so the capture rate drops below 20 FPS. When running in the foreground, the same call takes 4ms on average.
I have already checked whether manipulating process and thread priorities helps, but it does not. It appears that DirectX is throttling my work when I am not in the foreground, independent of priorities. This is a problem because my application needs to run at full frame rate without being in the foreground.
If DirectX is strictly prioritizing GPU commands from the foreground application, I could see how my app might be starved on the channels to/from the GPU, which would explain stalls like that. If I could somehow exempt this application from such policing, that would be optimal. For example, is there some sort of manifest entry or DirectX call that tells the driver (or the stack) not to consider my application a background one?
I compared the code of OBS (as far as I can work it out) and I don't see what they do differently. How does OBS manage to keep duplicating/capturing even though it is clearly not the foreground application when streaming even demanding games?
[edit]
The game in question is DCS, which always uses all available GPU resources if you let it, and uses about 1.5 cores. When I say it starts to slow down when not in the foreground, I am literally talking about clicking on one of my app's windows versus clicking in the game window (not even minimizing anything). When I am in the foreground, DCS slows down as I consume resources. If DCS is in the foreground, it continues to render at its full rate, uses all resources, and my capture slows way down.
I have found that I can make things work well if I simply limit DCS to 60 FPS via vsync, so that it can't use all resources. Then everything works perfectly fine. Obviously, that does not solve the problem, but it further suggests this is most likely resource contention. I am fine competing for the resources, but DirectX (or Nvidia's driver?) does seem to prioritize the foreground app.
[/edit]
I encountered the exact same issue and found that it's because of Windows' Game Mode feature. Turning it off and rebooting fixed it for me.
Also, I was using the DirectX video processor for scaling my images, and its performance is highly inconsistent across driver updates, so for now I've decided to use the images as-is, without scaling.
Related
I am making a real-time image processing app on iOS with my team. I am handling the custom computation kernel (mostly on the CPU rather than the GPU) and my teammates deal with the GUI. When I tested my kernel in a toy app, the core (ignoring any I/O overhead) ran steadily at 100ms per image. However, when put into the full-featured app, it slowed down to 500ms per image.
I have checked that the data is pretty much the same, I am only measuring time consumed within the kernel, and the tests are on the same iPhone 6. There is hardly any other computation in the full app, so I am not sure what is dragging it down. GPU processing is definitely an alternative and I am working on it, but I would like to know if there are any tricks I can use for now.
Currently, there is no explicit multi-threading in the computation part, so my simple guess is: should I programmatically put the computation on a separate thread so the second core can be utilized?
[Update]
It turns out that I made some mistakes when packaging my code as a library, since copying the source code over directly works fine. I have not figured out the underlying problem yet and will post it as a separate question.
GPU Acceleration
This depends heavily on the tasks you're performing; the GPU is good at a specific subset of tasks, and simply using it can sometimes even slow things down.
A lot of image-based tasks that are part of the Quartz frameworks etc. are GPU accelerated (blurring, for example). Also, if you use a library like OpenCV, you get GPU acceleration for certain tasks out of the box.
Unless you're a real pro, I would avoid targeting the GPU directly and let the frameworks and libraries you use do that for you.
Concurrency
It will certainly help to put intensive tasks on a background thread. Just be aware of what that entails (e.g. you can't make any UIKit calls from a background thread).
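A rough sketch of that pattern (the process() function and the image view are placeholders, not from the question): run the kernel on a global queue and hop back to the main queue for any UI work.

import UIKit

// Placeholder for the CPU-heavy kernel described in the question.
func process(_ image: UIImage) -> UIImage {
    return image
}

func handleNewFrame(_ image: UIImage, into imageView: UIImageView) {
    // Run the expensive kernel off the main thread...
    DispatchQueue.global(qos: .userInitiated).async {
        let result = process(image)
        // ...and return to the main queue for any UIKit work.
        DispatchQueue.main.async {
            imageView.image = result
        }
    }
}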
The answer heavily depends on how you do the processing. Some methods in the SDK perform their job on a background thread, while others require the caller to create and use one.
In general, for drawing, most methods require you to create one explicitly. This is especially important for the ones that do their work on the CPU (e.g. using Core Graphics to draw within a drawRect method). If you're using methods that do the processing on the GPU, then creating threads won't be of much use, since the CPU won't be the bottleneck.
If you want to determine why your app slows down, use Instruments (Time Profiler for CPU, Core Animation for drawing).
I am trying to understand under what circumstances and how iOS may throttle my application threads due to excessive CPU consumption. The results I'm getting are kind of strange.
I have an application with OpenGL / GLKViewController rendering a view and a separate logic thread, started in the background using NSThread.detachNewThreadSelector, performing calculations. I find that if I (for the purposes of discussion) let my computation thread run flat out as fast as it can, iOS quickly throttles it down. For example, I monitor the FPS of both the view and my thread and see that the view maintains 60fps while my logic thread hums along, but then suddenly drops after a few seconds.
So it makes sense to me that perhaps iOS tries to limit thread consumption. What is weird is that it doesn't just slow down gradually; it seems to "quantize" my logic thread's FPS at approximately some multiple of the GPU frame rate (i.e. 30 or 60fps)!
Now, keep in mind that there is no synchronization between these threads, and the logic loop is a self-contained hard loop equivalent to while(true), so I have no idea how it is even possible for iOS to accomplish this magic unless it is somehow aware of my top-level loop and interjecting itself into it.
In case you don't believe me that there is no synchronization point: I have created a test case that literally just has an empty GLKViewController loop and a dumb logic thread that churns some numbers, and it exhibits the behavior. Screenshots are below and I can post the code if anyone is interested.
The screenshots below are for two different "loads" of the logic thread, printed at intervals of a second, running on an iPad Air with iOS 8.
What's even stranger is that sometimes setting a lower preferred GLK frame rate (e.g. 30fps) can actually make my logic thread run slower. I'd have expected that reducing the work done by the GPU would free up resources (and heat headroom) and reduce the need for throttling, but that doesn't always seem to be the case.
Does anyone have an explanation for this behavior? Is it documented? Thanks.
EDIT: My only guess at this point is that if the GPU runs too hot, they shut down the second core and migrate threads back to the first... and then thread prioritization somehow accounts for the implicit synchronization, although I still can't envision exactly how this happens.
I am nearing the end of my first (very simple) game using SpriteKit.
I have noticed a long delay (about 10-13 seconds) at startup when running the app on a real device.
I am assuming this is the time taken to load resources and execute code in initWithSize().
Is there a workaround, or what is the recommended approach, i.e. use a splash screen while the game loads?
I am using a texture atlas, but my understanding is that it is for optimising resource access at runtime.
Cheers
Using the Instruments tool can help you locate where bottlenecks appear in your code. I doubt it is really 10-13 seconds from app launch, though, as any app that takes 10 seconds or longer to launch is killed by the system.
Try to load resources intelligently. At launch, load only what is absolutely necessary, which I assume would be your menu. If there are variants for different gameplay modes, or similar assets used across multiple levels, load those up; then, once a selection has been made, load any remaining initial resources.
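For instance, a sketch of staged loading with SpriteKit texture atlases (the atlas names here are made up):

import SpriteKit

let menuAtlas = SKTextureAtlas(named: "Menu")       // hypothetical atlas with menu art
let levelAtlas = SKTextureAtlas(named: "Level1")    // hypothetical atlas with gameplay art

// Warm up only what the menu needs before presenting it.
SKTextureAtlas.preloadTextureAtlases([menuAtlas]) {
    // present the menu scene here
}

// Once the player has made a selection, load the gameplay assets.
func playerDidSelectLevel() {
    levelAtlas.preload {
        // transition into the level scene here
    }
}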
Also try to recycle resources where possible. For instance, if an enemy is destroyed by the player, don't destroy the object; reuse it. The memory it occupied is likely to be needed again, and creating an object is more expensive than reusing one.
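A tiny pooling sketch along those lines (the class and texture names are illustrative, not from the question):

import SpriteKit

// A small reuse pool so "destroyed" enemies are parked and recycled
// instead of being deallocated and recreated.
final class EnemyPool {
    private var idleEnemies: [SKSpriteNode] = []

    func spawnEnemy(in scene: SKScene, at position: CGPoint) -> SKSpriteNode {
        // Reuse a parked sprite if one is available, otherwise create a new one.
        let enemy = idleEnemies.popLast() ?? SKSpriteNode(imageNamed: "enemy")
        enemy.position = position
        enemy.isHidden = false
        if enemy.parent == nil {
            scene.addChild(enemy)
        }
        return enemy
    }

    func recycle(_ enemy: SKSpriteNode) {
        // Instead of removing the node from the scene, hide it and park it.
        enemy.isHidden = true
        idleEnemies.append(enemy)
    }
}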
I found a lot of brilliant pointers on using the profiling tools in the WWDC talks; they are always a great resource.
All over this document Apple mentions that iOS terminates apps under certain conditions, and the most popular reason seems to be freeing up RAM. This causes issues for apps that do not implement state restoration: some of the content the user was working on, and stepped away from for a moment, can easily be lost. There is even a 16-page thread on the Apple forums where users complain about exactly that.
Is anyone aware of why iOS actually terminates apps instead of moving the memory they occupy to disk (swap)?
Does termination actually provide a considerable performance improvement compared to other means?
What you are describing is paging, or more accurately, page swapping. The iOS version of BSD Unix does not perform paging, for lots of reasons. Here are a few educated guesses:
It's too power-hungry for a mobile device.
Flash memory can't handle the churn involved in paging. Flash memory has a limited number of lifetime write cycles per storage location, and paging would chew through the life of the flash chip.
As the other poster pointed out, swapping to disk would use up available disk space, which is also limited. That's not a problem when you have a 500 GB drive, but it is a big problem on a device with only 16 GB of storage and 1 GB of RAM.
You're not going to get an answer for this question here. Apple doesn't explain the inner workings of iOS, and anything else is going to be guesswork.
Here's my guesswork:
iOS is a heavily resource-constrained environment. Memory is limited, but so is disk space: a 16GB iPhone has 1GB of RAM, so "swapping to disk" isn't something that can be applied freely. When do you stop? How do you know this isn't already being done, with only a limited amount of swap in place?
The primary goal of iOS has always been to prioritise the responsiveness of the foreground app. Anything other than warning, then closing, background apps would probably impact this too much. If there are 15 apps in the background, imagine the processor load of nicely swapping out the memory of each process.
Because RAM that has been swapped out to disk would be much slower to access. It's better to kill the program than to have it run poorly. I think that answers both questions.
Thanks everybody for the responses. I had to do some research to answer this question myself, though, since I was looking for a better understanding of what leads to the "app termination" decision. I know there are some smart people working at Apple, but it always helps me to understand why something is built "this way" rather than just following it.
It boiled down to these two questions:
Why does iOS terminate apps instead of freeing memory by paging out (swapping)?
Does termination provide a considerable performance win?
To understand this, I dug a bit into the history of the iPhone. There was a video accessible on iTunes; unfortunately the link does not work anymore. Anyway, the video introduced the very first version of multitasking on the iPhone 3G (or was it the 3GS? I am not sure which device first supported multitasking).
Nowadays iPhone devices are quite advanced in terms of hardware - actually more advanced than some desktops we had 7-10 years ago, which had incorporated swapping long before. But the first iPhone releases were not nearly as advanced: the iPhone 3G has a 620 MHz ARM and 128 MB of RAM, and the first-generation iPod touch had a 400 MHz ARM. And multitasking was supposed to run on all the devices of that time.
If we look at iOS, it has always prioritised smooth animations; looking at that hardware, I can see it would be challenging to keep the device snappy and responsive while also swapping background applications' memory, so it seems very logical and fair to terminate apps. A year or two later Apple provided APIs to facilitate implementing state restoration.
But if we look at current iPhones and iPads, they do have enough power not to terminate apps and simply drop their memory to disk without any hiccups in animations or foreground app performance. Why not add that on the latest devices? I assume this is common in the software industry: new features are often prioritised over improvements to existing workflows. Apple has been busy releasing MobileMe, Retina display support, Auto Layout, iCloud - so I can understand that nice improvements to already existing features have been sacrificed.
The issue with apps that don't provide state restoration is easily solved by providing state restoration.
Just killing apps when the system runs out of memory is a huge performance gain. Consider that the system usually runs out of memory when you launch another app, and anything done instead of killing old apps would have to be done before launching the new app - that's about the most performance-critical point in time.
And for at least five years you have been told that when your app goes to the background, you should store just enough state to come back to that state if your app is restarted.
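A minimal sketch of that advice (the keys and properties here are invented for illustration): persist a lightweight snapshot when the app is backgrounded and read it back on the next launch.

import UIKit

class AppDelegate: UIResponder, UIApplicationDelegate {
    // Hypothetical bits of state the user would not want to lose.
    var currentDocumentID: String?
    var draftText: String = ""

    func applicationDidEnterBackground(_ application: UIApplication) {
        // Persist just enough to rebuild the screen the user was on.
        let defaults = UserDefaults.standard
        defaults.set(currentDocumentID, forKey: "lastDocumentID")
        defaults.set(draftText, forKey: "unsavedDraft")
    }

    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // If the app was terminated while in the background, restore the snapshot.
        let defaults = UserDefaults.standard
        currentDocumentID = defaults.string(forKey: "lastDocumentID")
        draftText = defaults.string(forKey: "unsavedDraft") ?? ""
        return true
    }
}

(UIKit also has a dedicated state restoration API; the UserDefaults approach above is just the simplest version of "store just enough state".)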
What coding tricks, compilation flags, and software-architecture considerations can be applied to keep power consumption low in an AIR for iOS application (or to reduce power consumption in an existing application that burns too much battery)?
One of the biggest things you can do is adjust the frame rate based on the app state.
You can do this by adding handlers inside your App.mxml:
<s:ViewNavigatorApplication xmlns:fx="http://ns.adobe.com/mxml/2009"
                            xmlns:s="library://ns.adobe.com/flex/spark"
                            activate="activate()" deactivate="close()" />
Inside your activate and close methods (in the application's script block):
private function activate():void {
    // Foreground: run at the full frame rate.
    FlexGlobals.topLevelApplication.frameRate = 24;
}
private function close():void {
    // Deactivated/background: drop to a minimal frame rate to save power.
    FlexGlobals.topLevelApplication.frameRate = 2;
}
You can also play around with this number depending on what your app is currently doing. If you're just displaying text, try lowering your fps further. This should give you the most bang for your buck in power savings.
Generally, high power consumption can be the result of:
intensive network usage
no sleep mode for the display while the app is idle
unneeded location services usage
continuously high CPU usage
Regarding (Flex/Flash) AIR, I would suggest the following:
First, use the Flex profiler plus the task manager to monitor CPU and memory usage, and try to reduce both as much as possible. Once they are low on Windows/Mac, they should (theoretically) be lower on mobile devices too.
The next step would be to use a network monitor and reduce the number and size of the network (web service) calls. Try to identify unneeded network activity and eliminate it.
Try to detect any idle state of the app (possible in Flex, not sure about Flash) and maybe put the whole app into an idle mode (if you have a fireworks animation running, just call stop()).
Also, I am not sure about this, but using Stage3D (available for mobile since AIR 3.2) for complex animations will definitely shift work from the CPU to the GPU. It may reduce execution time since hardware acceleration is used, so power consumption may be lower.
If I am wrong about something please comment/downvote (as you like) but this is my personal impression.
Update 1
As prompted in the comments, there is no 100% correlation between CPU usage on a desktop and on a mobile device, but theoretically, at a low level, we should at least see the same CPU usage trend.
My tips:
As a first step, profile your app with the profiler in Flash Builder
If you have a Mac, profile your app with Instruments from Xcode
And important:
The behaviors of Simulator, IPA-Interpreter, and IPA-Test builds are different:
Simulator - pro forma optimizations
IPA-Interpreter - gives a feel for performance
IPA-Test - "real" performance behavior
And finally, test the App Store build; it is the fastest packaging mode in terms of performance.
Additionally, we saw that all these modes can vary.