DirectX: Game loop order, draw first and then handle input? - directx

I was just reading through the DirectX documentation and encountered something interesting in the page for IDirect3DDevice9::BeginScene :
To enable maximal parallelism between
the CPU and the graphics accelerator,
it is advantageous to call
IDirect3DDevice9::EndScene as far
ahead of calling present as possible.
I've been accustomed to writing my game loop to handle input and such, then draw. Do I have it backwards? Maybe the game loop should be more like this: (semi-pseudocode, obviously)
while(running) {
d3ddev->Clear(...);
d3ddev->BeginScene();
// draw things
d3ddev->EndScene();
// handle input
// do any other processing
// play sounds, etc.
d3ddev->Present(NULL, NULL, NULL, NULL);
}
According to that sentence of the documentation, this loop would "enable maximal parallelism".
Is this commonly done? Are there any downsides to ordering the game loop like this? I see no real problem with it after the first iteration... And I know the best way to know the actual speed increase of something like this is to actually benchmark it, but has anyone else already tried this and can you attest to any actual speed increase?

Since I always felt that it was "awkward" to draw-before-sim, I tended to push the draws until after the update but also after the "present" call. E.g.
while True:
Simulate()
FlipBuffers()
Render()
While on the first frame you're flipping nothing (and you need to set up things so that the first flip does indeed flip to a known state), this always struck me as a bit nicer than putting the Render() first, even though the order of operations are the same once you're under way.

The short answer is yes, this is how it's commonly done. Take a look at the following presentation on the game loop in God of War III on the PS3:
http://www.tilander.org/aurora/comp/gdc2009_Tilander_Filippov_SPU.pdf
If you're running a double buffered game at 30 fps, the input lag will be 1 / 30 ~= 0.033 seconds which is way to small to be detected by a human (for comparison, any reaction time under 0.1 seconds on 100 metres is considered to be a false start).

Its worth noting that on nearly all PC hardware BeginScene and EndScene do nothing. In fact the driver buffers up all the draw commands and then when you call present it may not even begin drawing. They commonly buffer up several frames of draw commands to smooth out frame rate. Usually the driver does things based around the present call.
This can cause input lag when frame rate isn't particularly high.
I'd wager if you did your rendering immediately before the present you'd notice no difference to the loop you give above. Of course on some odd bits of hardware this may then cause issues so, in general, you are best off looping as you suggest above.

Related

When exactly is drawInMTKView called?

MetalKit calls drawInMTKView when it wants a your delegate to draw a new frame, but I wonder if it waits for the last drawable to have been presented before it asks your delegate to draw on a new one?
From what I understand reading this article, CoreAnimation can provide up to three "in flight" drawables, but I can't find whether MetalKit tries to draw to them as soon as possible or if it waits for something else to happen.
What would this something else be? What confuses me a little is the idea of drawing to up to two frames in advance, since it means the CPU must already know what it wants to render two frames in the future, and I feel like it isn't always the case. For instance if your application depends on user input, you can't know upfront the actions the user will have done between now and when the two frames you are drawing to will be presented, so they may be presented with out of date content. Is this assumption right? In this case, it could make some sense to only call the delegate method at a maximum rate determined by the intended frame rate.
The problem with synchronizing with the frame rate is that this means the CPU may sometimes be inactive when it could have done some useful work.
I also have the intuition this may not be happening this way since in the article aforementioned, it seems like drawInMTKView is called as often as a drawable is available, since they seem to rely on it being called to make work that uses resources in a way that avoids CPU stalling, but since there are many points that are unclear to me I am not sure of what is happening exactly.
MTKView documentation mentions in paused page that
If the value is NO, the view periodically redraws the contents, at a frame rate set by the value of preferredFramesPerSecond.
Based on samples there are for MTKView, it probably uses a combination of an internal timer and CVDisplayLink callbacks. Which means it will basically choose the "right" interval to call your drawing function at the right times, usually after other drawable is shown "on-glass", so at V-Sync interval points, so that your frame has the most CPU time to get drawn.
You can make your own view and use CVDisplayLink or CADisplayLink to manage the rate at which your draws are called. There are also other ways such as relying on back pressure of the drawable queue (basically just calling nextDrawable in a loop, because it will block the thread until the drawable is available) or using presentAfterMinimumDuration. Some of these are discussed in this WWDC video.
I think Core Animation triple buffers everything that gets composed by Window Server, so basically it waits for you to finish drawing your frame, then it draws it with the other frames and then presents it to "glass".
As to a your question about the delay: you are right, the CPU is two or even three "frames" ahead of the GPU. I am not too familiar with this, and I haven't tried it, but I think it's possible to actually "skip" the frames you drew ahead of time if you delay the presentation of your drawables up until the last moment, possibly until scheduled handler on one of your command buffers.

Is iOS glGenerateMipmap synchronous, or is it possibly asynchronous?

I'm developing an iPad app that uses large textures in OpenGL ES. When the scene first loads I get a large black artifact on the ceiling for a few frames, as seen in the picture below. It's as if higher levels of the mipmap have not yet been filled in. On subsequent frames, the ceiling displays correctly.
This problem only began showing up when I started using mipmapping. One possible explanation is that the glGenerateMipmap() call does its work asynchronously, spawning some mipmap creation worker (in a separate process, or perhaps in the GPU) and returning.
Is this possible, or am I barking up the wrong tree?
Within a single context, all operations will appear to execute strictly in order. However, in your most recent reply, you mentioned using a second thread. To do that, you must have created a second shared context: it is always illegal to re-enter an OpenGL context. If already using a shared context, there are still some synchronization rules you must follow, documented at http://developer.apple.com/library/ios/ipad/#DOCUMENTATION/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithOpenGLESContexts/WorkingwithOpenGLESContexts.html
It should be synchronous; OpenGL does not in itself have any real concept of threading (excepting the implicit asynchronous dialogue between CPU and GPU).
A good way to diagnose would be to switch to GL_LINEAR_MIPMAP_LINEAR. If it's genuinely a problem with lower resolution mip maps not arriving until later then you'll see the troublesome areas on the ceiling blend into one another rather than the current black-or-correct effect.
A second guess, based on the output, would be some sort of depth buffer clearing issue.
I followed #Tommy's suggestion and switched to GL_LINEAR_MIPMAP_LINEAR. Now the black-or-correct effect changed to a fade between correct and black.
I guess that although we all know that OpenGL is a pipeline (and therefore asynchronous unless you are retrieving state or explicity synchronizing), we tend to forget it. I certainly did in this case, where I was not drawing, but loading and setting up textures.
Once I confirmed the nature of the problem, I added a glFinish() after loading all my textures, and the problem went away. (Btw, my draw loop is in the foreground and my texture loading loop - because it is so time consuming and would impair interactivity - is in the background. Also, since this may vary between platforms, I'm using iOS5 on an iPad 2)

How can I ensure the correct frame rate when recording an animation using DirectShow?

I am attempting to record an animation (computer graphics, not video) to a WMV file using DirectShow. The setup is:
A Push Source that uses an in-memory bitmap holding the animation frame. Each time FillBuffer() is called, the bitmap's data is copied over into the sample, and the sample is timestamped with a start time (frame number * frame length) and duration (frame length). The frame rate is set to 10 frames per second in the filter.
An ASF Writer filter. I have a custom profile file that sets the video to 10 frames per second. Its a video-only filter, so there's no audio.
The pins connect, and when the graph is run, a wmv file is created. But...
The problem is it appears DirectShow is pushing data from the Push Source at a rate greater than 10 FPS. So the resultant wmv, while playable and containing the correct animation (as well as reporting the correct FPS), plays the animation back several times too slowly because too many frames were added to the video during recording. That is, a 10 second video at 10 FPS should only have 100 frames, but about 500 are being stuffed into the video, resulting in the video being 50 seconds long.
My initial attempt at a solution was just to slow down the FillBuffer() call by adding a sleep() for 1/10th second. And that indeed does more or less work. But it seems hackish, and I question whether that would work well at higher FPS.
So I'm wondering if there's a better way to do this. Actually, I'm assuming there's a better way and I'm just missing it. Or do I just need to smarten up the manner in which FillBuffer() in the Push Source is delayed and use a better timing mechanism?
Any suggestions would be greatly appreciated!
I do this with threads. The main thread is adding bitmaps to a list and the recorder thread takes bitmaps from that list.
Main thread
Animate your graphics at time T and render bitmap
Add bitmap to renderlist. If list is full (say more than 8 frames) wait. This is so you won't use too much memory.
Advance T with deltatime corresponding to desired framerate
Render thread
When a frame is requested, pick and remove a bitmap from the renderlist. If list is empty wait.
You need a threadsafe structure such as TThreadList to hold the bitmaps. It's a bit tricky to get right but your current approach is guaranteed to give to timing problems.
I am doing just the right thing for my recorder application (www.videophill.com) for purposes of testing the whole thing.
I am using Sleep() method to delay the frames, but am taking great care to ensure that timestamps of the frames are correct. Also, when Sleep()ing from frame to frame, please try to use 'absolute' time differences, because Sleep(100) will sleep about 100ms, not exactly 100ms.
If it won't work for you, you can always go for IReferenceClock, but I think that's overkill here.
So:
DateTime start=DateTime.Now;
int frameCounter=0;
while (wePush)
{
FillDataBuffer(...);
frameCounter++;
DateTime nextFrameTime=start.AddMilliseconds(frameCounter*100);
int delay=(nextFrameTime-DateTime.Now).TotalMilliseconds;
Sleep(delay);
}
EDIT:
Keep in mind: IWMWritter is time insensitive as long as you feed it with SAMPLES that are properly time-stamped.

openCV: is it possible to time cvQueryFrame to synchronize with a projector?

When I capture camera images of projected patterns using openCV via 'cvQueryFrame', I often end up with an unintended artifact: the projector's scan line. That is, since I'm unable to precisely time when 'cvQueryFrame' captures an image, the image taken does not respect the constant 30Hz refresh of the projector. The result is that typical horizontal band familiar to those who have turned a video camera onto a TV screen.
Short of resorting to hardware sync, has anyone had some success with approximate (e.g., 'good enough') informal projector-camera sync in openCV?
Below are two solutions I'm considering, but was hoping this is a common enough problem that an elegant solution might exist. My less-than-elegant thoughts are:
Add a slider control in the cvWindow displaying the video for the user to control a timing offset from 0 to 1/30th second, then set up a queue timer at this interval. Whenever a frame is needed, rather than calling 'cvQueryFrame' directly, I would request a callback to execute 'cvQueryFrame' at the next firing of the timer. In this way, theoretically the user would be able to use the slider to reduce the scan line artifact, provided that the timer resolution is sufficient.
After receiving a frame via 'cvQueryFrame', examine the frame for the tell-tale horizontal band by looking for a delta in HSV values for a vertical column of pixels. Naturally this would only work when the subject being photographed contains a fiducial strip of uniform color under smoothly varying lighting.
I've used several cameras with OpenCV, most recently a Canon SLR (7D).
I don't think that your proposed solution will work. cvQueryFrame basically copies the next available frame from the camera driver's buffer (or advances a pointer in a memory mapped region, or blah according to your driver implementation).
In any case, the timing of the cvQueryFrame call has no effect on when the image was captured.
So as you suggested, hardware sync is really the only route, unless you have a special camera, like a point grey camera, which gives you explicit software control of the frame integration start trigger.
I know this has nothing to do with synchronizing but, have you tried extending the exposure time? Or doing so by intentionally "blending" two or more images into one?

windows phone 7, xna, how do I sample the touch screen more regularly

ok, so apparently xna games can only run at 30fps, which is a shame, because our game on iphone looked alot better at 60...
at any rate, because the only way you can get information about the touch screen state is to get its current state, effectively this means you can only sample the touch screen at 30 fps.
even if our game has to run at 30fps, is there any way to get higher resolution sampling from the touch screen? maybe through callbacks? or by accessing a list of touch events with time stamps?
The function you are looking for is TouchPanel.GetState. It is a simple matter of calling this function at 60Hz.
To get 60Hz you could set Game.TargetElapsedTime to 1/60th of a second. This will give you two updates to every one draw (according to Shawn Hargreaves' post here) assuming you are VSyncing at 30FPS.
If you still want your game state updates to run at 30FPS (just doing touch input at 60FPS), then you could put those updates on a different thread. Start an update going on that thread on the first call to Game.Update, and wait for it to finish on the second one, and so on.
(You should note that normally XNA input must be done on the main thread (source). I assume this applies to Phone and to touch input.)
Alternately you could replace the Game class's timing yourself entirely (calling GraphicsDevice.Present yourself). It's not easy to do, but it's possible. A good place to start is to look at the Game class in Reflector.
(Disclaimer: I haven't tried any actual Phone-based development yet, so there may be some Phone-related gotchas I am unaware of.)
The sampling rate of 30fps is set for performance reasons.
Even if you could find a way to query for touches more frequently you still couldn't update the UI at a faster rate so I'm not sure what benefit you'd get.
Before spending too much time on trying to find a solution I'd test on an actual device to see how acceptable 30fps really is.

Resources