At the moment I am using snapshot to do my picking: I change the render code to render out object IDs, grab the snapshot, then take the value of the pixel under the user's tap. I think this is quite inefficient, though, and I'm getting reports of slowness on some iPads (my mini is fine).
Is it possible to render to the back buffer and use a call to glReadPixels to retrieve only the pixel under the user's tap, without the object IDs ever being rendered to the screen? I am using GLKView for my rendering. I've tried glReadPixels with my current code, and it always seems to return black. I know that the documentation for GLKView recommends only using snapshot, but surely it is more efficient for picking to retrieve just a single pixel.
You are correct; a much better way is to render the object IDs to the back buffer and read back a particular pixel (or block of pixels).
(If you're doing a lot of selection, you could even use a second offscreen renderbuffer and generate the object IDs every frame in a single render pass.)
But you will have to write your own view code to allocate offscreen render buffers, depth buffers, and so on. GLKView is a convenience class, a high-level wrapper, and the Apple documentation specifically says not to mess with the underlying implementation.
Setting up your own GL render buffers isn't too difficult, and there's example code all over the place. I've used the example code on the Apple dev site and from the OpenGL SuperBible.
Actually, it is quite possible to read from the back buffer, even when using GLKView. The documentation advises against it, but after a bit of fiddling I got it to work. The only snag was that on iOS glReadPixels only accepts GL_RGBA as the format argument (not GL_RGB). As long as you make sure glClear is called after the picking pass, the object IDs will never be rendered to the screen.
Using snapshot to do the picking on an iPad mini slowed the app down by 50%. Using glReadPixels causes no noticeable slowdown at all. You could also do this by allocating an extra framebuffer, but I don't think it is necessary.
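For reference, the readback itself boils down to something like the following. This is a minimal sketch, not the exact code from above: it assumes the object-id pass has just been drawn into the GLKView's framebuffer, that the picking shader packed the id into the colour channels, and that the tap location has already been converted to framebuffer pixel coordinates.

#include <OpenGLES/ES2/gl.h>

// Read back the object id under a tap. tapX/tapY are framebuffer pixel
// coordinates (origin bottom-left, already scaled by contentScaleFactor).
static unsigned int ObjectIdAtTap(GLint tapX, GLint tapY)
{
    GLubyte pixel[4] = {0, 0, 0, 0};
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    // On iOS the readable format/type pair is GL_RGBA / GL_UNSIGNED_BYTE.
    glReadPixels(tapX, tapY, 1, 1, GL_RGBA, GL_UNSIGNED_BYTE, pixel);
    // Undo whatever packing the picking shader used; here one byte per channel.
    return pixel[0] | ((unsigned int)pixel[1] << 8) | ((unsigned int)pixel[2] << 16);
}

Remember to glClear and render the normal scene afterwards, as noted above, so the id pass never reaches the screen.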
I am trying to get the color of pixels/points (it doesn't matter which for my use case) of the current screen content on iOS. So, for example, I want to get the color of each pixel from screen coordinates (0, 0) to (10, 10). Additionally, the operation should be as fast as possible, since I will perform it at regular intervals from a Timer. The timer will fire multiple times a second, but it doesn't have to reach 25 fps.
Acceptable solutions:
Anything that returns the current color of a pixel or point on screen at a given position, doesn't produce noticeable UI lag, and doesn't turn my app into a battery hog. The result might be a CGImage, a UIImage, or a buffer array; I don't really care. I also don't care if the solution uses additional Apple frameworks, such as OpenGL or Metal.
It is also acceptable if the solution does not capture system UI, like the status bar. Capturing the content of my own app is sufficient.
Things I tried so far:
Using UIWindow's drawHierarchy(in:afterScreenUpdates:). This method turns out to be way too slow: on my iPad Pro it took 0.25 s, which causes noticeable UI lag.
Using CALayer's render(in:), but this method does not render UIVisualEffectViews, which I require. Also, while faster than drawHierarchy, I measured it at about 0.04 s, which still causes noticeable lag in the UI.
Using OpenGL, as described here, for example. I don't know anything about OpenGL, so I might be using it wrong, but I never got it to return anything other than a black image (see the sketch just below for how the readback has to be timed).
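For item 3, the usual cause of an all-black result is timing: the pixels have to be read inside the view's draw callback, after the content has been drawn but before the renderbuffer is presented. A minimal sketch of that readback, with illustrative names and assuming a GLKView-style ES 2.0 setup:

#include <OpenGLES/ES2/gl.h>

// Read a w x h block of the currently bound (GLKView-managed) framebuffer.
// Must run inside the draw callback, after drawing and before presentation.
// x/y are framebuffer pixel coordinates (origin bottom-left, i.e. points
// multiplied by the screen's contentScaleFactor).
static void ReadPixelBlock(GLint x, GLint y, GLsizei w, GLsizei h,
                           GLubyte *outRGBA /* w * h * 4 bytes */)
{
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    glReadPixels(x, y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, outRGBA);
    // outRGBA[(row * w + col) * 4 + 0..3] then holds the R, G, B, A of one pixel.
}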
To avoid writing to a constant buffer from both the GPU and the CPU at the same time, Apple recommends a triple-buffered system, with a semaphore to keep the CPU from getting too far ahead of the GPU (this is fine, and it is covered in at least three Metal videos at this stage).
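For readers who haven't seen those videos, the pattern is roughly the following. This is a conceptual sketch in C++, not Metal code: std::counting_semaphore stands in for dispatch_semaphore_t, and onGpuFrameCompleted() stands in for the block registered with the command buffer's completion handler.

#include <array>
#include <cstdint>
#include <semaphore>

constexpr int kInflightBuffers = 3;

struct FrameResources {
    uint64_t lastFrameWritten = 0;   // stands in for the per-slot MTLTexture / constant buffer
};

std::array<FrameResources, kInflightBuffers> resources;
std::counting_semaphore<kInflightBuffers> inflight(kInflightBuffers);
int bufferIndex = 0;

void renderFrame(uint64_t frameNumber) {
    inflight.acquire();                              // block if the GPU still owns all three slots
    FrameResources& slot = resources[bufferIndex];
    slot.lastFrameWritten = frameNumber;             // CPU writes this frame's data into the slot
    // ... encode GPU work that reads `slot`, register onGpuFrameCompleted as the
    //     command buffer's completion handler, then commit ...
    bufferIndex = (bufferIndex + 1) % kInflightBuffers;
}

void onGpuFrameCompleted() {
    inflight.release();                              // the GPU has finished with its oldest slot
}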
However, when the constant resource is an MTLTexture and the AVCaptureVideoDataOutput delegate runs separately from the rendering loop (CADisplayLink), how can a similar triple-buffered system (as used in Apple's sample code MetalVideoCapture) guarantee synchronization? Screen tearing (texture tearing) can be observed if you take the MetalVideoCapture code, simply render a full-screen quad, and change the preset to AVCaptureSessionPresetHigh (at the moment the tearing is obscured by the rotating quad and the low-quality preset).
I realize that the rendering loop and the captureOutput delegate method (in this case) are both on the main thread, and that the semaphore (in the rendering loop) keeps the _constantDataBufferIndex integer in check (which indexes into the MTLTexture for creation and encoding), but screen tearing can still be observed, which is puzzling to me (it would make sense if the GPU were writing the texture not in the next frame after encoding but 2 or 3 frames later, but I don't believe this to be the case). Also, a minor point: shouldn't the rendering loop and captureOutput run at the same frame rate for a buffered texture system, so old frames aren't rendered interleaved with recent ones?
Any thoughts or clarification on this matter would be greatly appreciated. There is another example from McZonk that doesn't use the triple-buffered system, but I also observed tearing with that approach (though less of it). Obviously, no tearing is observed if I use waitUntilCompleted (the equivalent of OpenGL's glFinish), but that's like playing an accordion with one arm tied behind your back!
I am trying to capture a screenshot on iOS from an OpenGL view using glReadPixels at half of the native resolution.
glReadPixels is quite slow on retina screens, so I'd like to somehow force it to read every second pixel of every second row, resulting in a non-retina screenshot (a quarter of the pixels).
I tried setting these:
glPixelStorei(GL_PACK_SKIP_PIXELS, 2);
glPixelStorei(GL_PACK_SKIP_ROWS, 2);
before calling glReadPixels, but it doesn't seem to change anything. Instead, it just captures a quarter of the original image (a crop rather than a downscale), because the width and height I'm passing to glReadPixels are the view's non-retina size.
Alternatively, if you know of a more performant way of capturing an OpenGL screenshot, feel free to share that as well.
I don't think there's a very direct way of doing what you're looking for. As you already found out, GL_PACK_SKIP_ROWS and GL_PACK_SKIP_PIXELS do not do what you intended: they only control how many rows/pixels are skipped at the start, not after each row/pixel. And I believe they control skipping in the destination memory anyway, not in the framebuffer you're reading from.
One simple approach to a partial solution would be to make a separate glReadPixels() call per row, which you can then do for every second row only. You would still have to copy every second pixel out of each of those rows, but at least it cuts the amount of data you read in half, and the extra memory needed drops to little more than the quarter-size result plus a single full-resolution row. Of course, you add the overhead of making many more glReadPixels() calls, so it's hard to predict whether this will be faster overall.
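A sketch of that row-by-row variant, with illustrative names and assuming an RGBA framebuffer:

#include <OpenGLES/ES2/gl.h>
#include <stdlib.h>
#include <string.h>

// Read every second row in one glReadPixels() call each, keeping every second
// pixel of that row. srcW/srcH is the full drawable size; dst must hold
// (srcW / 2) * (srcH / 2) * 4 bytes.
static void ReadHalfResByRows(GLsizei srcW, GLsizei srcH, GLubyte *dst)
{
    GLsizei dstW = srcW / 2;
    GLubyte *row = (GLubyte *)malloc((size_t)srcW * 4);
    glPixelStorei(GL_PACK_ALIGNMENT, 1);
    for (GLint y = 0; y < srcH; y += 2) {
        glReadPixels(0, y, srcW, 1, GL_RGBA, GL_UNSIGNED_BYTE, row);
        for (GLint x = 0; x < srcW; x += 2)
            memcpy(&dst[((size_t)(y / 2) * dstW + x / 2) * 4], &row[(size_t)x * 4], 4);
    }
    free(row);
}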
The nicer approach would be to produce a half-resolution frame that you can read back directly. To do that, you could either:
If your toolkit allows it, re-render the frame at half the resolution. You could use an FBO half the size of the window as the render target for this.
Copy the frame while downscaling it in the process: again, create an FBO with a render target half the size, and copy from the default framebuffer to this FBO using glBlitFramebuffer(), as sketched below.
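A sketch of the blit-and-downscale option, with illustrative names. It assumes an ES 3.0 (or desktop GL) context, since glBlitFramebuffer is not available in unextended ES 2.0, and srcFbo is the framebuffer to capture (the GLKView's framebuffer name on iOS, 0 for the default framebuffer on desktop GL).

#include <OpenGLES/ES3/gl.h>

static void ReadHalfResByBlit(GLuint srcFbo, GLint srcW, GLint srcH,
                              GLubyte *dst /* (srcW/2) * (srcH/2) * 4 bytes */)
{
    GLint dstW = srcW / 2, dstH = srcH / 2;
    GLuint fbo = 0, color = 0;

    // Half-size offscreen colour buffer.
    glGenRenderbuffers(1, &color);
    glBindRenderbuffer(GL_RENDERBUFFER, color);
    glRenderbufferStorage(GL_RENDERBUFFER, GL_RGBA8, dstW, dstH);

    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo);
    glFramebufferRenderbuffer(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_RENDERBUFFER, color);

    // The filtered blit does the 2:1 downscale on the GPU.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, srcFbo);
    glBlitFramebuffer(0, 0, srcW, srcH, 0, 0, dstW, dstH, GL_COLOR_BUFFER_BIT, GL_LINEAR);

    // Read the reduced image back.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, fbo);
    glReadPixels(0, 0, dstW, dstH, GL_RGBA, GL_UNSIGNED_BYTE, dst);

    glDeleteFramebuffers(1, &fbo);
    glDeleteRenderbuffers(1, &color);
}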
You can also look into making the readback asynchronous by using a pixel pack buffer (see the GL_PIXEL_PACK_BUFFER target for glBindBuffer()). This will most likely not make the operation itself faster, but it allows you to keep feeding commands to the GPU while you wait for the glReadPixels() results to arrive. It might help you take screenshots while being less disruptive to the gameplay.
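A sketch of what the asynchronous path looks like, again with illustrative names and again assuming ES 3.0 or desktop GL, where pixel pack buffers and glMapBufferRange are available:

#include <OpenGLES/ES3/gl.h>
#include <string.h>

// Kick off the readback: with a pack buffer bound, the last glReadPixels()
// argument is an offset into that buffer and the call returns immediately.
static GLuint StartAsyncRead(GLint w, GLint h)
{
    GLuint pbo = 0;
    glGenBuffers(1, &pbo);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glBufferData(GL_PIXEL_PACK_BUFFER, w * h * 4, NULL, GL_STREAM_READ);
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    return pbo;
}

// Collect the result a frame or two later, once the GPU has finished the copy.
static void FinishAsyncRead(GLuint pbo, GLint w, GLint h, GLubyte *dst)
{
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    const void *src = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, w * h * 4, GL_MAP_READ_BIT);
    if (src) memcpy(dst, src, (size_t)w * h * 4);
    glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
    glBindBuffer(GL_PIXEL_PACK_BUFFER, 0);
    glDeleteBuffers(1, &pbo);
}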
To clarify, I know that a texture atlas improves performance when using multiple distinct images. But I'm interested in how things are done when you are not doing this.
I tried doing some frame-by-frame animation manually in custom OpenGL, where each frame I bind a new texture and draw it on the same point sprite. It works, but it is very slow compared to UIImageView's ability to do the same thing. I load all the textures up front, but the rebinding is done every frame. By comparison, UIImageView accepts the individual images, not a texture atlas, so I'd imagine it is doing something similar.
These are 76 images loaded individually, not as a texture atlas, and each is about 200 px square. In OpenGL, I suspect the bottleneck is the requirement to rebind a texture every frame. But how is UIImageView doing this? I'd expect a similar bottleneck. Is UIImageView somehow creating an atlas behind the scenes so that no rebinding of textures is necessary? Since UIKit ultimately has OpenGL running beneath it, I'm curious how this must be working.
If there is a more efficient way to animate multiple textures than swapping out different bound textures each frame in OpenGL, I'd like to know, as it might hint at what Apple is doing in their framework.
If I did in fact get a new frame for each of 60 frames in a second, it would take about 1.25 seconds to animate through my 76 frames. Indeed, I get that with UIImageView, but the OpenGL version takes about 3 to 4 seconds.
I would say your bottleneck is somewhere else. OpenGL is more than capable of handling an animation done the way you describe: since all the textures are already loaded and you just bind a different one each frame, there is no loading time involved at all. For comparison, I have an application that can generate and delete textures at runtime and can, at some point, have a great number of textures loaded on the GPU. I have to bind all of those textures every frame (not just one per frame), while also using a depth buffer, a stencil buffer, multiple FBOs, and heavy user input, with about five threads funnelled into one to process all the GL code, and I have no trouble with the FPS at all.
Since you are working on iOS, I suggest you run some profilers to see which code is responsible for the overhead. Even if the time profiler tells you that the line with glBindTexture is taking too long, I would still say the problem is somewhere else.
So, to answer your question: it is normal and great that UIImageView does its work so smoothly, and there should be no problem achieving the same performance with OpenGL. There are, however, a few things to consider. How can you be sure the image view does not skip images? You might be setting a pointer to a different image 60 times per second, but the image view might only ask itself to redraw 30 times per second and, when it does, simply use whatever image is currently assigned to it. With your GL code, on the other hand, you are forcing the application to redraw at 60 FPS regardless of whether it is capable of doing so.
Taking all of this into consideration, there is a thing called a display link that Apple created for exactly this purpose. The display link tells you how much time has elapsed between frames, and from that you should decide which texture to bind, rather than trying to force all of them into a time frame that might be too short.
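In code, the idea is simply to derive the frame index from the elapsed time and skip the bind when nothing has changed; a rough sketch with illustrative names and numbers:

// The elapsed-time value would come from the display link's timestamps in practice.
struct FlipbookAnimation {
    int    frameCount    = 76;           // 76 individual textures, as in the question
    double frameDuration = 1.0 / 60.0;   // how long each image stays on screen
    int    currentFrame  = -1;

    // Returns the frame index to bind, or -1 if the current binding is still valid.
    int frameForElapsedTime(double elapsedSeconds) {
        int frame = (int)(elapsedSeconds / frameDuration) % frameCount;
        if (frame == currentFrame) return -1;   // no redundant glBindTexture
        currentFrame = frame;
        return frame;
    }
};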
And another thing: I have seen that if you try to present the render buffer at 100 FPS on most iOS devices (it might be all of them), you will only get 60 FPS, because the method that presents the render buffer pauses your thread if it is called less than 1/60 s after the previous call. That being said, it is effectively impossible to display anything faster than 60 FPS on iOS devices, and anything running at 30+ FPS is considered good.
"not as a texture atlas" is the sentence that is a red flag for me.
USing a texture atlas is a good thing....the texture is loaded into memory once and then you just move the rectangle position to play the animation. It's fast because its already all in memory. Any operation which involves constantly loading and reloading new image frames is going to be slower than that.
You'd have to post source code to get any more exact an answer than that.
I'm writing a 3D modeling application in D3D9 that I'd like to make as broadly compatible as possible. This means using few hardware-dependent features such as multisampling. However, while the real-time render doesn't need to be flawless, I do need to provide nice-looking screen captures, which, without multisampling, look quite aliased and poor.
To produce my screen captures, I create a temporary surface in memory, render the scene to it once, then save it to a file. My first thought for achieving an antialiased capture was to create my off-screen depth-stencil surface as multisampled, but of course DX wouldn't allow that, since the device itself had been initialized with D3DMULTISAMPLE_NONE.
To start off, here's a sample of exactly how I create the screen capture. I know it'd be simpler to just save the back buffer of an already-rendered frame; however, I need the ability to save images with dimensions different from those of the actual render window, which is why I do it this way. Error checking, code for restoring state, and resource releases are omitted here for brevity. m_d3ddev is my LPDIRECT3DDEVICE9.
//Get the current pp
LPDIRECT3DSWAPCHAIN9 sc;
D3DPRESENT_PARAMETERS pp;
m_d3ddev->GetSwapChain(0, &sc);
sc->GetPresentParameters(&pp);
//Create a new surface to which we'll render
LPDIRECT3DSURFACE9 ScreenShotSurface = NULL;
LPDIRECT3DSURFACE9 newDepthStencil = NULL;
LPDIRECT3DTEXTURE9 pRenderTexture = NULL;
m_d3ddev->CreateDepthStencilSurface(_Width, _Height, pp.AutoDepthStencilFormat, pp.MultiSampleType, pp.MultiSampleQuality, FALSE, &newDepthStencil, NULL );
m_d3ddev->SetDepthStencilSurface( newDepthStencil );
m_d3ddev->CreateTexture(_Width, _Height, 1, D3DUSAGE_RENDERTARGET, pp.BackBufferFormat, D3DPOOL_DEFAULT, &pRenderTexture, NULL);
pRenderTexture->GetSurfaceLevel(0,&ScreenShotSurface);
//Render the scene to the new surface
m_d3ddev->SetRenderTarget(0, ScreenShotSurface);
RenderFrame();
//Save the surface to a file
D3DXSaveSurfaceToFile(_OutFile, D3DXIFF_JPG, ScreenShotSurface, NULL, NULL);
You can see the call to CreateDepthStencilSurface(), which is where I was hoping I could replace pp.MultiSampleType with, say, D3DMULTISAMPLE_4_SAMPLES, but that didn't work.
My next thought was to create an entirely separate LPDIRECT3DDEVICE9 as a D3DDEVTYPE_REF device, which always supports D3DMULTISAMPLE_4_SAMPLES (regardless of the video card). However, all of my resources (meshes, textures) have been loaded into m_d3ddev, my HAL device, so I couldn't use them to render the scene on the REF device. Note that resources can be shared between devices under Direct3D9Ex (Vista), but I'm working on XP. Since there are quite a lot of resources, reloading everything to render this one frame and then unloading it all is too time-inefficient for my application.
I've looked at other options for antialiasing the image after capture (e.g. a 3x3 blur filter), but they all produced pretty poor results, so I'd really like to get an antialiased scene straight out of D3D if possible...
Any wisdom or pointers would be GREATLY appreciated...
Thanks!
Supersampling, by either rendering to a larger buffer and scaling down or by combining jittered buffers, is probably your best bet. Combining multiple jittered buffers should give you the best quality for a given number of samples (better than the regular grid you get from simply rendering at a multiple of the resolution and scaling down), but it has the extra overhead of multiple rendering passes. On the other hand, it is not limited by the maximum supported size of your render target, and it lets you choose pretty much an arbitrary level of AA (though you'll have to watch out for precision issues if you combine many jittered buffers).
The article "Antialiasing with Accumulation Buffer" at opengl.org describes how to modify your projection matrix for jittered sampling (OpenGL but the math is basically the same). The paper "Interleaved Sampling" by Alexander Keller and Wolfgang Heidrich talks about an extension of the technique that gives you a better sampling pattern at the expense of even more rendering passes. Sorry about not providing links - as a new user I can only post one link per answer. Google should find them for you.
If you want to go the route of rendering to a larger buffer and down sampling but don't want to be limited by the maximum allowed render target size then you can generate a tiled image using off center projection matrices as described here.
You could always render to a texture that is twice the width and height (i.e. 4x the pixels) and then downsample it.
Admittedly, you'd still have problems if the card can't create a texture 4x the size of the back buffer...
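A sketch of what that could look like, reusing the names from the snippet in the question (untested; error checking and cleanup omitted, and filtered StretchRect between render targets should be verified for the formats involved):

//Create a 2x-size target plus a target at the requested output size
LPDIRECT3DSURFACE9 bigTarget = NULL, bigDepth = NULL, smallTarget = NULL;
m_d3ddev->CreateRenderTarget(_Width * 2, _Height * 2, pp.BackBufferFormat,
                             D3DMULTISAMPLE_NONE, 0, FALSE, &bigTarget, NULL);
m_d3ddev->CreateDepthStencilSurface(_Width * 2, _Height * 2, pp.AutoDepthStencilFormat,
                                    D3DMULTISAMPLE_NONE, 0, FALSE, &bigDepth, NULL);
m_d3ddev->CreateRenderTarget(_Width, _Height, pp.BackBufferFormat,
                             D3DMULTISAMPLE_NONE, 0, FALSE, &smallTarget, NULL);
//Render the scene at double resolution
m_d3ddev->SetRenderTarget(0, bigTarget);
m_d3ddev->SetDepthStencilSurface(bigDepth);
RenderFrame();
//Filtered downscale, then save as before
m_d3ddev->StretchRect(bigTarget, NULL, smallTarget, NULL, D3DTEXF_LINEAR);
D3DXSaveSurfaceToFile(_OutFile, D3DXIFF_JPG, smallTarget, NULL, NULL);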
Edit: There is another way that comes to mind.
If you re-render the frame n times with tiny jitters applied to the view matrix, you can generate as many images as you like, which you can then average together afterwards to form a very highly anti-aliased image. The bonus is that it will work on any machine that can render the image. It is, obviously, slower though. Still, 256x AA really does look good when you do this!
This article http://msdn.microsoft.com/en-us/library/bb172266(VS.85).aspx seems to imply that you can use the render state flag D3DRS_MULTISAMPLEANTIALIAS to control this. Could you create your device with antialiasing enabled, but turn it off for screen rendering and on for your offscreen rendering, using this render state flag?
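Something along these lines, as an untested sketch; it only has an effect if the device and its render targets were actually created with a multisample type:

//Real-time view: antialiasing off
m_d3ddev->SetRenderState(D3DRS_MULTISAMPLEANTIALIAS, FALSE);
// ... render the interactive frame ...
//Screen capture pass: antialiasing on
m_d3ddev->SetRenderState(D3DRS_MULTISAMPLEANTIALIAS, TRUE);
// ... render to the capture surface ...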
I've not tried this myself though.