Screen tearing and camera capture with Metal - ios

To avoid writing to a constant buffer from both the gpu and cpu at the same time, Apple recommends using a triple-buffered system with the help of a semaphore to prevent the cpu getting too far ahead of the gpu (this is fine and covered in at least three Metal videos now at this stage).
However, when the constant resource is an MTLTexture and the AVCaptureVideoDataOutput delegate runs separately than the rendering loop (CADisplaylink), how can a similar triple-buffered system (as used in Apple’s sample code MetalVideoCapture) guarantee synchronization? Screen tearing (texture tearing) can be observed if you take the MetalVideoCapture code and simply render a full screen quad and change the preset to AVCaptureSessionPresetHigh (at the moment the tearing is obscured by the rotating quad and low quality preset).
I realize that the rendering loop and the captureOutput delegate method (in this case) are both on the main thread and that the semaphore (in the rendering loop) keeps the _constantDataBufferIndex integer in check (which indexes into the MTLTexture for creation and encoding), but screen tearing can still be observed, which is puzzling to me (it would make sense if the gpu writing of the texture is not the next frame after encoding but 2 or 3 frames after, but I don’t believe this to be the case). Also, just a minor point: shouldn’t the rendering loop and the captureOutput have the same frame rate for a buffered texture system so old frames aren’t rendered interleaved with recent ones.
Any thoughts or clarification on this matter would be greatly appreciated; there is another example from McZonk, which doesn’t use the triple-buffered system, but I also observed tearing with this approach (but less so). Obviously, no tearing is observed if I use waitUntilCompleted (equivalent to Open GL’s glfinish), but thats like playing an accordion with one arm tied behind your back!

Related

Fast display of image using openCV

I have written an image processing application using Visual C++ forms and OpenCV on a windows machine. Everything seems to work ok, but displaying the images is very slow - only a few fps. I would like to be able to get to 30 or so. I am currently using the standard imshow(...) followed by waitkey(1).
My question is: Is there a better (i.e. faster) way to get an image from memory to the monitor.
The Mat structure used by openCV is essentially a fancy header pointing to a contiguous block of unsigned char values.
Edit:
I tested my code with the VS2013 profiler and it claims that I am spending 50% of the execution time in imshow/waitkey.
I've seen several discussions on this in the OpenCV Q/A forum and they always end with "you shouldn't be using imshow except for debugging" but nobody is suggesting anything else to use, so I thought I'd try here.
guy
Without seeing what you have, here is the approach I would take to achieve what you want.
Have a dedicated thread for frame acquisition from the camera. Insert the acquired frames into a synchronized queue, that is consumed by:
Image processing thread. Takes frames from the queue, processes them into images suitable for display. It changes a synchronized output image, and notifies GUI about it.
Main (GUI) thread is only dedicated to display. When it is notified of an image update, it swaps the synchronized output image with its current working image. (To avoid copying and extra allocations, we just reuse those two image buffers.) Then it invalidates the window. In a WM_PAINT handler, it then displays the image using BitBlt.
Some notes:
Minimize allocation/deallocation of buffers. For acquisition, you could have a pre-allocated pool of buffers to cycle through.
Prepare the output images in format and size that suit display.
Keep track of the number of frames in the queue and set some upper limit. Define an algorithm for dropping excess frames, so that you don't run out of memory and don't lag too much.
If you just want to ditch the sleep in waitKey and want something simpler, have a look at this question
Instrument your code -- add timing of the crucial parts using high resolution timer. Log them, and/or keep statistics, history.

How does UIImageView animate so smoothly? (Or: how to get OpenGL to efficiently animate frame by frame textures)

To clarify, I know that a texture atlas improves performance when using multiple distinct images. But I'm interested in how things are done when you are not doing this.
I tried doing some frame-by-frame animation manually in custom OpenGL where each frame I bind a new texture and draw it on the same point sprite. It works, but it is very slow compared to the UIImageView ability to abstract the same. I load all the textures up front, but the rebinding is done each frame. By comparison, UIImageView accepts the individual images, not a texture atlas, so I'd imagine it is doing similarly.
These are 76 images loaded individually, not as a texture atlas, and each is about 200px square. In OpenGL, I suspect the bottleneck is the requirement to rebind a texture at every frame. But how is UIImageView doing this as I'd expect a similar bottleneck?? Is UIImageView somehow creating an atlas behind the scenes so no rebinding of textures is necessary? Since UIKit ultimately has OpenGL running beneath it, I'm curious how this must be working.
If there is a more efficient means to animate multiple textures, rather than swapping out different bound textures each frame in OpenGL, I'd like to know, as it might hint at what Apple is doing in their framework.
If I did in fact get a new frame for each of 60 frames in a second, then it would take about 1.25 seconds to animate through my 76 frames. Indeed I get that with UIImageView, but the OpenGL is taking about 3 - 4 seconds.
I would say your bottleneck is somewhere else. The openGL is more then capable doing an animation the way you are doing. Since all the textures are loaded and you just bind another one each frame there is no loading time or anything else. Consider for a comparison I have an application that can in runtime generate or delete textures and can at some point have a great amount of textures loaded on the GPU, I have to bind all those textures every frame (not 1 every frame), using all from depth buffer, stencil, multiple FBOs, heavy user input, about 5 threads bottlenecked into 1 to process all the GL code and I have no trouble with the FPS at all.
Since you are working with the iOS I suggest you run some profilers to see what code is responsible for the overhead. And if for some reason your time profiler will tell you that the line with glBindTexture is taking too long I would still say that the problem is somewhere else.
So to answer your question, it is normal and great that UIImageView does its work so smoothly and there should be no problem achieving same performance with openGL. THOUGH, there are a few things to consider at this point. How can you say that image view does not skip images, you might be setting a pointer to a different image 60 times per second but the image view might just ask itself 30 times per second to redraw and when it does just uses a current image assigned to it. On the other hand with your GL code you are forcing the application to do the redraw 60FPS regardless to if it is capable of doing so.
Taking all into consideration, there is a thing called display link that apple developers created for you. I believe it is meant for exactly what you want to do. The display link will tell you how much time has elapsed between frames and by that key you should ask yourself what texture to bind rather then trying to force them all in a time frame that might be too short.
And another thing, I have seen that if you try to present render buffer at 100 FPS on most iOS devices (might be all), you will only get 60 FPS as the method to present render buffer will pause your thread if it has been called in less then 1/60s. That being said it is rather impossible do display anything at all at 60 FPS on iOS devices and everything running 30+ FPS is considered good.
"not as a texture atlas" is the sentence that is a red flag for me.
USing a texture atlas is a good thing....the texture is loaded into memory once and then you just move the rectangle position to play the animation. It's fast because its already all in memory. Any operation which involves constantly loading and reloading new image frames is going to be slower than that.
You'd have to post source code to get any more exact an answer than that.

iOS OpenGL ES - Only draw on request

I'm using OpenGL ES to write a custom UI framework on iOS. The use case is for an application, as in something that won't be updating on a per-frame basis (such as a game). From what I can see so far, it seems that the default behavior of the GLKViewController is to redraw the screen at a rate of about 30fps. It's typical for UI to only redraw itself when necessary to reduce resource usage, and I'd like to not drain extra battery power by utilizing the GPU while the user isn't doing anything.
I tried only clearing and drawing the screen once as a test, and got a warning from the profiler saying that an uninitialized color buffer was being displayed.
Looking into it, I found this documentation: http://developer.apple.com/library/ios/#DOCUMENTATION/iPhone/Reference/EAGLDrawable_Ref/EAGLDrawable/EAGLDrawable.html
The documentation states that there is a flag, kEAGLDrawablePropertyRetainedBacking, which when set to YES, will allow the backbuffer to retain things drawn to it in the previous frame. However, it also states that it isn't recommended and cause performance and memory issues, which is exactly what I'm trying to avoid in the first place.
I plan to try both ways, drawing every frame and not, but I'm curious if anyone has encountered this situation. What would you recommend? Is it not as big a deal as I assume it is to re-draw everything 30 times per frame?
In this case, you shouldn't use GLKViewController, as its very purpose is to provide a simple animation timer on the main loop. Instead, your view can be owned by any other subclass of UIViewController (including one of your own creation), and you can rely on the usual setNeedsDisplay/drawRect system used by all other UIKit views.
It's not the backbuffer that retains the image, but a separate buffer. Possibly a separate buffer created specifically for your view.
You can always set paused on the GLKViewController to pause the rendering loop.

Is iOS glGenerateMipmap synchronous, or is it possibly asynchronous?

I'm developing an iPad app that uses large textures in OpenGL ES. When the scene first loads I get a large black artifact on the ceiling for a few frames, as seen in the picture below. It's as if higher levels of the mipmap have not yet been filled in. On subsequent frames, the ceiling displays correctly.
This problem only began showing up when I started using mipmapping. One possible explanation is that the glGenerateMipmap() call does its work asynchronously, spawning some mipmap creation worker (in a separate process, or perhaps in the GPU) and returning.
Is this possible, or am I barking up the wrong tree?
Within a single context, all operations will appear to execute strictly in order. However, in your most recent reply, you mentioned using a second thread. To do that, you must have created a second shared context: it is always illegal to re-enter an OpenGL context. If already using a shared context, there are still some synchronization rules you must follow, documented at http://developer.apple.com/library/ios/ipad/#DOCUMENTATION/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/WorkingwithOpenGLESContexts/WorkingwithOpenGLESContexts.html
It should be synchronous; OpenGL does not in itself have any real concept of threading (excepting the implicit asynchronous dialogue between CPU and GPU).
A good way to diagnose would be to switch to GL_LINEAR_MIPMAP_LINEAR. If it's genuinely a problem with lower resolution mip maps not arriving until later then you'll see the troublesome areas on the ceiling blend into one another rather than the current black-or-correct effect.
A second guess, based on the output, would be some sort of depth buffer clearing issue.
I followed #Tommy's suggestion and switched to GL_LINEAR_MIPMAP_LINEAR. Now the black-or-correct effect changed to a fade between correct and black.
I guess that although we all know that OpenGL is a pipeline (and therefore asynchronous unless you are retrieving state or explicity synchronizing), we tend to forget it. I certainly did in this case, where I was not drawing, but loading and setting up textures.
Once I confirmed the nature of the problem, I added a glFinish() after loading all my textures, and the problem went away. (Btw, my draw loop is in the foreground and my texture loading loop - because it is so time consuming and would impair interactivity - is in the background. Also, since this may vary between platforms, I'm using iOS5 on an iPad 2)

iOS: playing a frame-by-frame greyscale animation in a custom colour

I have a 32 frame greyscale animation of a diamond exploding into pieces (ie 32 PNG images # 1024x1024)
my game consists of 12 separate colours, so I need to perform the animation in any desired colour
this I believe rules out any Apple frameworks, also it rules out a lot of public code for animating frame by frame in iOS.
what are my potential solution paths?
these are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
that last one just shows it is may be possible to load an image into a GL texture each frame ( he is doing it from the camera, so if I have everything stored in memory, that should be even faster )
I can see these options ( listed laziest first, most optimised last )
option A
each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture
I'm pretty sure this is stupid, so onto option B
option B
preload all images into memory
then as per above, only we load from memory rather than from file
I think this is going to be the ideal solution, can anyone give it the thumbs up or thumbs down?
option C
preload all of my PNGs into a single GL texture of the maximum size, creating a texture Atlas. each frame, set the texture coordinates to the rectangle in the Atlas for that frame.
while this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution; on older iOS devices maximum texture size is 1024x1024. if we are cramming 32 frames into this ( really this is the same as cramming 64 ) we would be at 128x128 for each frame. if the resulting animation is close to full screen on the iPad this isn't going to hack it
option D
instead of loading into a single GL texture, load into a bunch of textures
moreover, we can squeeze 4 images into a single texture using all four channels
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
If something higher performance than (B) is needed, it looks like the key is glTexSubImage2D http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pull across one frame at a time from memory, we could arrange say 16 512x512x8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024x32bit RGBA texture, and then split it within GL using the above function.
This would mean that we are performing one [RAM->VRAM] transfer per 16 frames rather than per one frame.
Of course, for more modern devices we could get 64 instead of 16, since more recent iOS devices can handle 2048x2048 textures.
I will first try technique (B) and leave it at that if it works ( I don't want to over code ), and look at this if needed.
I still can't find any way to query how many GL textures it is possible to hold on the graphics chip. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. however to implement this properly I would want to make sure that I am not sailing close to the wind re: resources... I don't want my animation to use up so much VRAM that the rest of my rendering fails...
You would be able to get this working just fine with CoreGraphics APIs, there is no reason to deep dive into OpenGL for a simple 2D problem like this. For the general approach you should take to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color so that what is left is the diamond colored in with the specific color you have selected. You could do this at runtime, or you could do it offline and create 1 video for each of the colors you want to support. It is be easier on your CPU if you do the operation N times and save the results into files, but modern iOS hardware is much faster than it used to be. Beware of memory usage issues when writing video processing code, see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex openGL stuff, but an approach that makes use of videos would be a lot easier to deal with and you would not need to worry so much about resource usage, see my library linked in the memory post for more info if you are interested in saving time on the implementation.

Resources