I am working on a Sprite Kit game and I need to do some multithreading to maintain a healthy FPS.
On every update I call a function that creates a lot of UIBezierPaths and merges them using a C++ static library.
If I have more than 10 shapes, the frame rate drops dramatically, so I decided to give GCD a try and solve the issue with a separate thread.
I put this in didMoveToView:
queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
and in the function that is being called on every frame, I call this:
dispatch_async(queue,^(void){[self heavyCalculationsFunc];});
For somebody who knows GCD well it might be obvious what this does on every frame (whether it spawns a new thread each time), but it wasn't clear to me yet.
My question is: is there any way to reuse a single thread for the work I dispatch on every update?
Thanks for your help in advance!
If you have work that you need to do on every frame, and that needs to get done before the frame is rendered, multithreading probably won't help you, unless you're willing to put a lot of effort into it.
Maintaining a frame rate is all about time — not CPU resources, just wall time. To keep a 60fps framerate, you have 16.67 ms to do all your work in. (Actually, less than that, because SpriteKit and OpenGL need some of that time to render the results of your work.) This is a synchronous problem — you have work, you have a specific amount of time to do it in, so the first step to improving performance is to do less work or do it more efficiently.
Multithreading, on the other hand, is generally for asynchronous problems — there's work you need to do, but it doesn't need to get done right now, so you can get on with the other things you need to do right now (like returning from your update method within 16 ms to keep your framerate up) and check back for the results of that work later (say, on a later frame).
There is a little bit of wiggle room between these two definitions, though: just about all modern iOS devices have multicore CPUs, so if you play your cards right you can fit a little bit of asynchronicity into your synchronous problem by parallelizing your workload. Getting this done, and doing it well, is no small feat — it's been the subject of serious research and investment by big game studios for years.
Take a look at the figure under "How a Scene Processes Frames of Animation" in the SpriteKit Programming Guide. That's your 16 ms clock. The light blue regions are slices of that 16 ms that Apple's SpriteKit (and OpenGL, and other system frameworks) code is responsible for. The other slices are yours. Let's unroll that diagram for a better look:
If you do too much work in any of those slices, or make SpriteKit's workload too large, the whole thing gets bigger than 16 ms and your framerate drops.
The opportunity for threading is to get some work done on the other CPU during that same timeline. If SpriteKit's handling of actions, physics, and constraints doesn't depend on that work, you can do it in parallel with those things:
Or, if your work needs to happen before SpriteKit runs actions & physics, but you have other work you need to do in the update method, you can send some of the work off to another thread while doing the rest of your update work, then check for results while still in your update method:
So how to accomplish these things? Here's one approach using dispatch groups and the assumption that actions/physics/constraints don't depend on your background work — it's totally off the top of my head, so it may not be the best. :)
// in setup (these need to outlive a single method, e.g. as ivars on your scene)
dispatch_queue_t workQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_group_t resultCatchingGroup = dispatch_group_create();
id stuffThatGetsMadeInTheBackground;

- (void)update:(NSTimeInterval)currentTime {
    dispatch_group_async(resultCatchingGroup, workQueue, ^{
        // Do the background work
        stuffThatGetsMadeInTheBackground = // ...
    });
    // Do anything else you need to before actions/physics/constraints
}

- (void)didFinishUpdate {
    // wait for results from the background work
    dispatch_group_wait(resultCatchingGroup, DISPATCH_TIME_FOREVER);
    // use those results
    [self doSomethingWith:stuffThatGetsMadeInTheBackground];
}
Of course, dispatch_group_wait will, as its name suggests, block execution to wait until your background work is done, so you still have that 16ms time constraint. If the foreground work (the rest of your update, plus SpriteKit's actions/physics/constraints work and any of your other work that gets done in response to those things) gets done before your background work does, you'll be waiting for it. And if the background work plus SpriteKit's rendering work (plus whatever you do in update before spawning the background work) takes longer than 16 ms, you'll still drop frames. So the trick to this is knowing your workload in enough detail to schedule it well.
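For the second arrangement described earlier (kick off background work, do the rest of your update work, then collect the results before returning from update), a variation on the same sketch could look like this. It reuses the ivars above; doOtherUpdateWork and buildStuffInBackground are hypothetical placeholders for your own code:

- (void)update:(NSTimeInterval)currentTime {
    dispatch_group_async(resultCatchingGroup, workQueue, ^{
        // hypothetical method that produces the expensive result
        stuffThatGetsMadeInTheBackground = [self buildStuffInBackground];
    });
    [self doOtherUpdateWork]; // runs in parallel with the block above
    // collect the results while still inside update:
    dispatch_group_wait(resultCatchingGroup, DISPATCH_TIME_FOREVER);
    [self doSomethingWith:stuffThatGetsMadeInTheBackground];
}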
Consider a slightly different approach. Create and maintain your own queue rather than getting a system queue.
a) Call dispatch_queue_create to make a new queue and save it in your object. Use dispatch_async on that queue to run your job. You may need to synchronize if the job has to complete before the next frame, etc. (see the sketch after this list).
b) If you have multiple jobs, consider a concurrent queue instead of a serial queue, which may or may not make things 'faster' depending on your dependencies.
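A minimal sketch of (a), assuming a jobQueue property on your object to keep the queue alive; the queue label and heavyCalculationsFunc are just placeholders:

// in setup: create and keep a serial queue
// (pass DISPATCH_QUEUE_CONCURRENT instead for option (b))
self.jobQueue = dispatch_queue_create("com.example.game.work", DISPATCH_QUEUE_SERIAL);

// on update: enqueue the job; GCD reuses its worker threads under the hood,
// so no new thread is created per frame
dispatch_async(self.jobQueue, ^{
    [self heavyCalculationsFunc];
});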
With GCD you're not supposed to think about threads, or about whether new threads are created or reused. Just think about the queues and what you're pushing onto them. Reading Apple's Concurrency Programming Guide and the GCD reference will also hopefully help.
MetalKit calls drawInMTKView when it wants your delegate to draw a new frame, but I wonder if it waits for the last drawable to have been presented before it asks your delegate to draw on a new one?
From what I understand reading this article, Core Animation can provide up to three "in flight" drawables, but I can't find out whether MetalKit tries to draw to them as soon as possible or waits for something else to happen.
What would this something else be? What confuses me a little is the idea of drawing up to two frames in advance, since it means the CPU must already know what it wants to render two frames into the future, and I feel like that isn't always the case. For instance, if your application depends on user input, you can't know upfront what the user will have done between now and when those two frames are presented, so they may be presented with out-of-date content. Is this assumption right? In that case, it could make sense to only call the delegate method at a maximum rate determined by the intended frame rate.
The problem with synchronizing with the frame rate is that this means the CPU may sometimes be inactive when it could have done some useful work.
I also have the intuition that this may not be how it works, since in the aforementioned article drawInMTKView seems to be called as soon as a drawable is available: they rely on it being called to do work that uses resources in a way that avoids CPU stalling. But since many points are unclear to me, I am not sure exactly what is happening.
The MTKView documentation mentions, on the page for the paused property, that
If the value is NO, the view periodically redraws the contents, at a frame rate set by the value of preferredFramesPerSecond.
Based on the samples available for MTKView, it probably uses a combination of an internal timer and CVDisplayLink callbacks. That means it basically chooses the "right" interval to call your drawing function, usually just after the previous drawable is shown "on glass", i.e. at V-Sync points, so that your frame has the most CPU time available to get drawn.
You can make your own view and use CVDisplayLink or CADisplayLink to manage the rate at which your draws are called. There are also other ways such as relying on back pressure of the drawable queue (basically just calling nextDrawable in a loop, because it will block the thread until the drawable is available) or using presentAfterMinimumDuration. Some of these are discussed in this WWDC video.
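For example, a hand-rolled loop driven by CADisplayLink might look roughly like this. This is only a sketch: the metalLayer property and the command-encoding step are placeholders for your own setup:

- (void)startRenderLoop {
    CADisplayLink *link = [CADisplayLink displayLinkWithTarget:self
                                                      selector:@selector(renderFrame:)];
    link.preferredFramesPerSecond = 60; // cap the callback rate
    [link addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
}

- (void)renderFrame:(CADisplayLink *)link {
    // nextDrawable blocks until Core Animation has a drawable available,
    // which is the back pressure mentioned above
    id<CAMetalDrawable> drawable = [self.metalLayer nextDrawable];
    if (drawable == nil) { return; }
    // ... encode a command buffer that presents `drawable`, then commit ...
}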
I think Core Animation triple-buffers everything that gets composited by the window server, so basically it waits for you to finish drawing your frame, then composites it with the other frames and presents the result to the glass.
As to your question about the delay: you are right, the CPU is two or even three "frames" ahead of the GPU. I am not too familiar with this, and I haven't tried it, but I think it's possible to actually "skip" the frames you drew ahead of time if you delay the presentation of your drawables until the last moment, possibly until a scheduled handler on one of your command buffers.
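A sketch of what that deferred presentation could look like, assuming you already have a commandBuffer and a drawable in hand:

// Instead of calling [commandBuffer presentDrawable:drawable] up front,
// defer the present until the buffer is actually scheduled on the GPU:
[commandBuffer addScheduledHandler:^(id<MTLCommandBuffer> buffer) {
    [drawable present];
}];
[commandBuffer commit];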
I'm looking for a way to stall core animation until I can finish doing some processing in core graphics. I've tried a few methods, but nothing seems to work.
Background:
I've got a lot of Core Graphics drawing I need to do. All of the processing is done in an image context, so I render the images concurrently and once they're done I set the appropriate layer.contents to the resulting image (on the main thread). The problem I have is that often the current run loop will end before I'm done processing the images, and Core Animation will begin its thing without all the proper view backgrounds in place (often displaying black backgrounds) during the animation. I need a way to stall Core Animation until all the concurrent image processing is done.
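Roughly, the arrangement described above looks like this (a sketch; the drawing code and the layer are placeholders):

CGSize size = layer.bounds.size; // capture on the main thread
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
    UIGraphicsBeginImageContextWithOptions(size, NO, 0);
    // ... expensive Core Graphics drawing ...
    UIImage *image = UIGraphicsGetImageFromCurrentImageContext();
    UIGraphicsEndImageContext();
    dispatch_async(dispatch_get_main_queue(), ^{
        // This can land after the current run loop has already committed
        // the transaction, which is the black-background problem described.
        layer.contents = (id)image.CGImage;
    });
});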
I also do a lot of "stamping": caching the resulting image and returning it so I don't have to perform unnecessary processing. I believe this rules out using [CALayer drawInContext:] with drawsAsynchronously = YES, since I don't want to reprocess the Core Graphics drawing for each view.
Here's what I've tried:
I've tried using performSelector on a method that will "wait" (run an empty while loop) until the processing is done, but you can only either perform the selector immediately (too soon), or else in the next run loop using delay (too late).
I've tried adding an observer to the main run loop:
Code:
CFRunLoopObserverRef observer = CFRunLoopObserverCreateWithHandler(NULL, kCFRunLoopAllActivities, YES, 0, ^(CFRunLoopObserverRef observer, CFRunLoopActivity activity) {
    [[Themes class] performSelectorOnMainThread:@selector(waitOnDraw)
                                     withObject:nil
                                  waitUntilDone:YES
                                          modes:@[NSRunLoopCommonModes]];
});
CFRunLoopAddObserver(CFRunLoopGetMain(), observer, kCFRunLoopCommonModes);
Note: kCFRunLoopAllActivities is overkill, but none of the other activity flags worked. Just trying to cover all my bases.
If I could access the thread the core-animation runs on, maybe I could stall it directly, but I'm not sure how to access that thread.
And that's it... I can't think of any other way to do it.
Any help would be greatly appreciated.
Note: I know I could just run all the processing on the main thread, but I'm looking for as many performance gains as possible. Processing the Core-graphics code concurrently really speeds things up on newer devices.
I'm developing an iPad app that uses large textures in OpenGL ES. When the scene first loads I get a large black artifact on the ceiling for a few frames, as seen in the picture below. It's as if higher levels of the mipmap have not yet been filled in. On subsequent frames, the ceiling displays correctly.
This problem only began showing up when I started using mipmapping. One possible explanation is that the glGenerateMipmap() call does its work asynchronously, spawning some mipmap creation worker (in a separate process, or perhaps in the GPU) and returning.
Is this possible, or am I barking up the wrong tree?
Within a single context, all operations will appear to execute strictly in order. However, in your most recent reply, you mentioned using a second thread. To do that, you must have created a second shared context: it is always illegal to re-enter an OpenGL context. If already using a shared context, there are still some synchronization rules you must follow, documented at http://developer.apple.com/library/ios/ipad/#DOCUMENTATION/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithOpenGLESContexts/WorkingwithOpenGLESContexts.html
It should be synchronous; OpenGL does not in itself have any real concept of threading (excepting the implicit asynchronous dialogue between CPU and GPU).
A good way to diagnose would be to switch to GL_LINEAR_MIPMAP_LINEAR. If it's genuinely a problem with lower resolution mip maps not arriving until later then you'll see the troublesome areas on the ceiling blend into one another rather than the current black-or-correct effect.
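The switch itself is a single texture parameter (a standard GL ES call; textureName stands in for whatever handle the texture was created with):

glBindTexture(GL_TEXTURE_2D, textureName);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);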
A second guess, based on the output, would be some sort of depth buffer clearing issue.
I followed @Tommy's suggestion and switched to GL_LINEAR_MIPMAP_LINEAR. Now the black-or-correct effect changed to a fade between correct and black.
I guess that although we all know that OpenGL is a pipeline (and therefore asynchronous unless you are retrieving state or explicitly synchronizing), we tend to forget it. I certainly did in this case, where I was not drawing, but loading and setting up textures.
Once I confirmed the nature of the problem, I added a glFinish() after loading all my textures, and the problem went away. (Btw, my draw loop is in the foreground and my texture-loading loop, because it is so time-consuming and would impair interactivity, is in the background. Also, since this may vary between platforms: I'm using iOS 5 on an iPad 2.)
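For reference, the fix amounts to draining the pipeline at the end of the background loading pass, something like this (textureName, width, height, and pixelData are placeholders):

glBindTexture(GL_TEXTURE_2D, textureName);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_BYTE, pixelData);
glGenerateMipmap(GL_TEXTURE_2D);
glFinish(); // blocks until every previously submitted GL command has completed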
Currently, I have a fixed-timestep game loop running on a second thread in my game. The OpenGL context is on the same thread, and rendering is done once per frame after any updates, so the main "loop" has to wait for drawing each frame before it can proceed. This wasn't really a problem until I wrote my particle system. Upwards of 1500 particles with a physics step of 16 ms causes the framerate to drop just below 30; any more and it's worse. The particle rendering can't be optimized any further without losing capability, so I decided to try moving OpenGL to a third thread. I know this is somewhat of an extreme case, but I feel it should be able to handle it.
I've thought of running 2 loops concurrently, one for the main stepping (fixed timestep) and one for drawing (however fast it can go). However, the rendering calls pass in data that may be changed on each update, so I was concerned that locking would slow it down and negate the benefit. However, after implementing a test to do this, I'm just getting EXC_BAD_ACCESS after less than a second of runtime. I assume that's because they're trying to access the same data at the same time? I thought the system automatically handled this?
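For what it's worth, the crash is consistent with the two threads reading and writing the same particle data at once; neither GCD nor raw threads synchronize shared data for you. A minimal sketch of the kind of guard that would be needed (all names here are hypothetical):

@interface ParticleSystem : NSObject {
    NSLock *_lock;
    NSData *_vertexData; // the buffer both threads touch
}
- (void)stepWithDelta:(NSTimeInterval)dt;
- (NSData *)vertexDataSnapshot;
@end

@implementation ParticleSystem
- (id)init {
    if ((self = [super init])) {
        _lock = [[NSLock alloc] init];
    }
    return self;
}
- (void)stepWithDelta:(NSTimeInterval)dt { // called from the update thread
    [_lock lock];
    // ... advance particle physics and rebuild _vertexData ...
    [_lock unlock];
}
- (NSData *)vertexDataSnapshot { // called from the render thread
    [_lock lock];
    NSData *snapshot = [_vertexData copy];
    [_lock unlock];
    return snapshot; // draw from the snapshot, outside the lock
}
@end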
When I was first learning OpenGL on the iPhone, I had OpenGL set up on the main thread and would call performSelectorOnMainThread:withObject:waitUntilDone: with the rendering selector; these errors would happen any time waitUntilDone was false. If it was true, it would happen randomly sometimes, but sometimes I could let it run for 30 minutes and it would be fine. Same concept as what's happening now, I assume. I am getting the first frame drawn to the screen before it crashes, though, so I know something is working.
How would this be properly handled, and if so would it even provide the speed up I'm looking for? Or would multiple access slow it down just as much?
OK, so apparently XNA games can only run at 30 FPS, which is a shame, because our game on iPhone looked a lot better at 60...
At any rate, because the only way you can get information about the touch screen is to poll its current state, this effectively means you can only sample the touch screen at 30 FPS.
Even if our game has to run at 30 FPS, is there any way to get higher-resolution sampling from the touch screen? Maybe through callbacks? Or by accessing a list of touch events with timestamps?
The function you are looking for is TouchPanel.GetState. It is a simple matter of calling this function at 60Hz.
To get 60Hz you could set Game.TargetElapsedTime to 1/60th of a second. This will give you two updates to every one draw (according to Shawn Hargreaves' post here) assuming you are VSyncing at 30FPS.
If you still want your game state updates to run at 30FPS (just doing touch input at 60FPS), then you could put those updates on a different thread. Start an update going on that thread on the first call to Game.Update, and wait for it to finish on the second one, and so on.
(You should note that normally XNA input must be done on the main thread (source). I assume this applies to Phone and to touch input.)
Alternately you could replace the Game class's timing yourself entirely (calling GraphicsDevice.Present yourself). It's not easy to do, but it's possible. A good place to start is to look at the Game class in Reflector.
(Disclaimer: I haven't tried any actual Phone-based development yet, so there may be some Phone-related gotchas I am unaware of.)
The sampling rate of 30fps is set for performance reasons.
Even if you could find a way to query for touches more frequently, you still couldn't update the UI at a faster rate, so I'm not sure what benefit you'd get.
Before spending too much time on trying to find a solution I'd test on an actual device to see how acceptable 30fps really is.