Triple buffering using OpenGL on iOS

Our app still uses OpenGL ES 2.0 on iOS. Yes, I know we should use Metal, but our app also runs on Android. Most of the time it runs perfectly happily at 60 fps, but occasionally there's a glitch, and in some cases it seems to alternate between taking one frame to render the scene and two frames: 1, 2, 1, 2, 1, 2... Then, without any change to what's rendered, it will jump back to 1, 1, 1, i.e. 60 fps. The delay is in the first glClear after we've 'presented' the last buffer. I guess OpenGL is still rendering the last scene and has to wait a whole frame to sync up again. Maybe our render/update loop takes close to, or just over, a whole frame - that would help explain the delay, as it 'misses' the vsync.
However, if we had triple buffering I would expect the frame times to be 1, 1, 2, 1, 1, 2... not 1, 2, 1, 2, 1, 2. Is there a way to get iOS to use triple buffering?
Currently we only seem to initialise two 'buffers'
GLuint viewRenderbuffer;
GLuint viewFramebuffer;
glGenFramebuffers(1, &viewFramebuffer);
glGenRenderbuffers(1, &viewRenderbuffer);
glBindFramebuffer(GL_FRAMEBUFFER, viewFramebuffer);
glBindRenderbuffer(GL_RENDERBUFFER, viewRenderbuffer);
Then we call this after each frame has finished rendering:
glBindRenderbuffer(GL_RENDERBUFFER, viewRenderbuffer);
[context presentRenderbuffer:GL_RENDERBUFFER];
Normally I would expect to call a swap-buffers function somewhere, but I expect the equivalent is inside the presentRenderbuffer call. I guess it's then up to the driver to handle double or triple buffering.
Is there a way to force triple buffering, or is this actually already being used?
Thanks
Shaun

Rendering on iOS is triple buffered by default. This prevents frame tearing and/or stalls. Frame stuttering usually occurs when your average frame time is longer than the limit imposed by the vsync interval (e.g. ~16.6 ms for 60 fps). You can check this time in the Xcode profiling tools or measure it yourself using system timers and draw the result in a debug HUD.
Unexpected performance drops with the same rendered content could be due to CPU/GPU frequency management by the OS.
Please check out this talk about Frame Pacing (6:00 and onwards)
https://developer.apple.com/videos/play/wwdc2018/612/
On a side note, performance may be bad not only because of raw load, but also because of synchronization issues. Examples of synchronization issues are reading back framebuffer content or improper handling of dynamic vertex/index buffers.
Improving rendering performance on mobile is a complex topic involving careful handling of FBOs to avoid unnecessary bandwidth usage. You can use the Xcode profiling tools and frame capture to find the bottleneck.
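Measuring frame time yourself can be as simple as a rolling average fed one sample per frame from the render loop, with the result drawn in a debug HUD. A minimal sketch in plain C (FrameClock and the helper names are illustrative):

```c
#define HISTORY 120  /* keep two seconds of samples at 60 fps */

/* Rolling frame-time average; feed it one delta (in ms) per frame. */
typedef struct {
    double samples[HISTORY];
    int    count, next;
} FrameClock;

static void frame_clock_add(FrameClock *fc, double dt_ms) {
    fc->samples[fc->next] = dt_ms;
    fc->next = (fc->next + 1) % HISTORY;
    if (fc->count < HISTORY) fc->count++;
}

static double frame_clock_average(const FrameClock *fc) {
    double sum = 0.0;
    for (int i = 0; i < fc->count; i++) sum += fc->samples[i];
    return fc->count ? sum / fc->count : 0.0;
}
```

An average sitting right at the vsync budget (~16.6 ms) is exactly the regime where the 1, 2, 1, 2 cadence appears.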

Related

Webgl Upload Texture Data to the gpu without a draw call

I'm using webgl to do YUV to RGB conversions on a custom video codec.
The video has to play at 30 fps. In order to make this happen I'm doing all my math every other requestAnimationFrame.
This works great, but I noticed when profiling that uploading the textures to the gpu takes the longest amount of time.
So I uploaded the "Y" texture and the "UV" texture separately.
Now the first "requestAnimationFrame" will upload the "Y" texture like this:
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, yTextureRef);
gl.texImage2D(gl.TEXTURE_2D, 0, gl.LUMINANCE, textureWidth, textureHeight, 0, gl.LUMINANCE, gl.UNSIGNED_BYTE, yData);
The second "requestAnimationFrame" will upload the "UV" texture in the same way, and make a draw call to the fragment shader doing the math between them.
But this doesn't change anything in the profiler. I still show nearly 0 gpu time on the frame that uploads the "Y" texture, and the same amount of time as before on the frame that uploads the "UV" texture.
However if I add a draw call to my "Y" texture upload function, then the profiler shows the expected results. Every frame has nearly half the gpu time.
From this I'm guessing the Y texture isn't really uploaded to the GPU by the texImage2D call.
However I don't really want to draw the Y texture on the screen as it doesn't have the correct UV texture to do anything with until a frame later. So is there any way to force the gpu to upload this texture without performing a draw call?
Update
I misunderstood the question
It really depends on the driver. The problem is that OpenGL/OpenGL ES/WebGL's texture API really sucks. 'Sucks' is a technical term for 'has unintended consequences'.
The issue is that the driver can't really fully upload the data until you draw, because it doesn't know what you're going to change. You could change all the mip levels in any order and any size and then fix them all up afterwards, so until you draw it has no idea which other functions you're going to call to manipulate the texture.
Say you create a 4x4 level 0 mip:
gl.texImage2D(
gl.TEXTURE_2D,
0, // mip level
gl.RGBA,
4, // width
4, // height
...);
What memory should it allocate? 4 (width) * 4 (height) * 4 (rgba)? But what if you call gl.generateMipmap? Now it needs 4*4*4 + 2*2*4 + 1*1*4. Ok, but now you allocate an 8x8 mip on level 3. You intend to then replace levels 0 to 2 with 64x64, 32x32, and 16x16 respectively, but you did level 3 first. What should it do when you replace level 3 before replacing the levels above it? You then add in level 4 as 4x4, level 5 as 2x2, and level 6 as 1x1.
As you can see, the API lets you change mips in any order. In fact I could allocate level 7 as 723x234 and then fix it later. The API is designed not to care until draw time, when all the mips must be the correct size; at that point the driver can finally allocate memory on the GPU and copy the mips in.
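The allocation arithmetic above can be sketched as a small helper (plain C; mip_chain_bytes is an illustrative name):

```c
#include <stddef.h>

/* Total bytes a full RGBA8 mip chain needs, halving each dimension
   (never below 1) until reaching 1x1. */
static size_t mip_chain_bytes(size_t w, size_t h) {
    size_t total = 0;
    for (;;) {
        total += w * h * 4;          /* 4 bytes per RGBA texel */
        if (w == 1 && h == 1) break;
        if (w > 1) w /= 2;
        if (h > 1) h /= 2;
    }
    return total;
}
```

For the 4x4 example this is 4\*4\*4 + 2\*2\*4 + 1\*1\*4 = 84 bytes; until draw time the driver can't know whether to reserve 64 bytes or 84.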
You can see a demonstration and test of this issue here. The test uploads mips out of order to verify that WebGL implementations correctly fail while they are not all the correct size, and correctly start working once they are.
You can see this was arguably a bad API design.
They added gl.texStorage2D to fix it, but gl.texStorage2D is not available in WebGL1, only WebGL2. gl.texStorage2D has new issues of its own though :(
TLDR; textures get uploaded to the driver when you call gl.texImage2D, but the driver can't upload to the GPU until draw time.
Possible solution: use gl.texSubImage2D. Since it does not allocate memory, it's possible the driver could upload sooner. I suspect most drivers don't, because you can still use gl.texSubImage2D before drawing. Still, it's worth a try.
Let me also add that gl.LUMINANCE might be a bottleneck as well. IIRC DirectX doesn't have a corresponding format and neither does OpenGL Core Profile. Both support a RED-only format, but WebGL1 does not, so LUMINANCE has to be emulated by expanding the data on upload.
Old Answer
Unfortunately there is no way to upload video to WebGL except via texImage2D and texSubImage2D
Some browsers try to make that happen faster. I notice you're using gl.LUMINANCE. You might try using gl.RGB or gl.RGBA and see if things speed up. It's possible browsers only optimize for the more common case. On the other hand it's possible they don't optimize at all.
Two extensions that would allow using video without a copy have been proposed, but AFAIK no browser has ever implemented them.
WEBGL_video_texture
WEBGL_texture_source_iframe
It's actually a much harder problem than it sounds like.
Video data can be in various formats. You mentioned YUV but there are others. Should the browser tell the app the format or should the browser convert to a standard format?
The problem with telling is that lots of devs will get it wrong, and then a user will provide a video in a format they don't support.
The WEBGL_video_texture extension converts to a standard format by rewriting your shaders. You tell it uniform samplerVideoWEBGL video and then it knows it can rewrite your color = texture2D(video, uv) to color = convertFromVideoFormatToRGB(texture(video, uv)). It also means they'd have to rewrite shaders on the fly if you play videos in different formats.
Synchronization
It sounds great to get the video data into WebGL, but now you have the issue that by the time you get the data and render it to the screen, you've added a few frames of latency, so the audio is no longer in sync.
How to deal with that is out of the scope of WebGL, as WebGL doesn't have anything to do with audio, but it does point out that it's not as simple as just handing WebGL the data. Once you make the data available, people will ask for more APIs to get the audio and more info so they can delay one or both and keep them in sync.
TLDR; there is no way to upload video to WebGL except via texImage2D and texSubImage2D

Wanting to ditch MTKView.currentRenderPassDescriptor

I have an occasional issue with my MTKView renderer stalling on obtaining a currentRenderPassDescriptor for 1.0 s. According to the docs, this is either due to the view's device not being set (it is) or there being no drawables available.
If there are no drawables available, I don't see a means of immediately bailing out or skipping that video frame. The render loop will simply stall for 1.0 s.
Is there a workaround for this? Any help would be appreciated.
My workflow is a bunch of kernel shader work followed by one final vertex shader. I could draw the final shader's output onto my own texture (instead of using the currentRenderPassDescriptor), then hoodwink that texture into the view's currentDrawable -- but in obtaining that drawable we're back to the same stalling situation.
Should I get rid of MTKView entirely and fall back to using a CAMetalLayer instead? Again, I suspect the same stalling issues would arise. Is there a way to set the maximumDrawableCount on an MTKView like there is on CAMetalLayer?
I'm a little baffled as, according to the Metal System Trace, my work is invariably completed in under 5.0 ms per frame on an iMac 2015 R9 M395.

Screen tearing and camera capture with Metal

To avoid writing to a constant buffer from both the gpu and cpu at the same time, Apple recommends using a triple-buffered system with the help of a semaphore to prevent the cpu getting too far ahead of the gpu (this is fine and covered in at least three Metal videos now at this stage).
However, when the constant resource is an MTLTexture and the AVCaptureVideoDataOutput delegate runs separately from the rendering loop (CADisplayLink), how can a similar triple-buffered system (as used in Apple's sample code MetalVideoCapture) guarantee synchronization? Screen tearing (texture tearing) can be observed if you take the MetalVideoCapture code, simply render a full-screen quad, and change the preset to AVCaptureSessionPresetHigh (at the moment the tearing is obscured by the rotating quad and the low-quality preset).
I realize that the rendering loop and the captureOutput delegate method (in this case) are both on the main thread, and that the semaphore (in the rendering loop) keeps the _constantDataBufferIndex integer in check (which indexes into the MTLTextures for creation and encoding), but screen tearing can still be observed, which is puzzling to me (it would make sense if the GPU were writing the texture not in the next frame after encoding but 2 or 3 frames later, but I don't believe this to be the case). Also, just a minor point: shouldn't the rendering loop and captureOutput run at the same frame rate for a buffered texture system, so old frames aren't rendered interleaved with recent ones?
Any thoughts or clarification on this matter would be greatly appreciated; there is another example from McZonk which doesn't use the triple-buffered system, but I also observed tearing with that approach (though less of it). Obviously, no tearing is observed if I use waitUntilCompleted (equivalent to OpenGL's glFinish), but that's like playing an accordion with one arm tied behind your back!
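For reference, the CPU side of the triple-buffered scheme described in the question boils down to a ring of in-flight slots guarded by a counting semaphore. A plain C model with illustrative names (the real code uses dispatch_semaphore_t and MTLCommandBuffer completion handlers):

```c
#define MAX_FRAMES_IN_FLIGHT 3

/* Toy model of the CPU side: 'permits' stands in for the semaphore's
   count, 'slot' for an index like _constantDataBufferIndex. */
typedef struct {
    int permits;  /* how many frames the CPU may still encode */
    int slot;     /* which buffer/texture slot the next frame uses */
} FrameRing;

/* Called at the top of the render loop (dispatch_semaphore_wait in the
   real code). Returns the slot to encode into, or -1 if the CPU would
   have to block because the GPU still owns all three slots. */
static int ring_begin_frame(FrameRing *r) {
    if (r->permits == 0) return -1;
    r->permits--;
    int s = r->slot;
    r->slot = (r->slot + 1) % MAX_FRAMES_IN_FLIGHT;
    return s;
}

/* Called from the command buffer's completion handler
   (dispatch_semaphore_signal): the GPU has finished one frame. */
static void ring_end_frame(FrameRing *r) {
    r->permits++;
}
```

Note this model only protects writes that go through ring_begin_frame; a capture callback that writes a texture outside this permit accounting is not covered by it.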

glReadPixels specify resolution

I am trying to capture a screenshot on iOS from an OpenGL view using glReadPixels at half of the native resolution.
glReadPixels is quite slow on retina screens so I'd like to somehow force reading every second pixel and every second row, resulting in a non-retina screenshot (1/4 of the resolution).
I tried setting these:
glPixelStorei(GL_PACK_SKIP_PIXELS, 2);
glPixelStorei(GL_PACK_SKIP_ROWS, 2);
before calling glReadPixels, but it doesn't seem to change anything at all. Instead, it just renders 1/4 of the original image, because the width and height I'm passing to glReadPixels are the view's non-retina size.
Alternatively, if you know any more performant way of capturing an OpenGL screenshot, feel free to share it as well.
I don't think there's a very direct way of doing what you're looking for. As you already found out, GL_PACK_SKIP_ROWS and GL_PACK_SKIP_PIXELS do not have the functionality you intended: they only control how many rows/pixels are skipped at the start, not after each row/pixel. And they control skipping in the destination memory anyway, not in the framebuffer you're reading from.
One simple approach to a partial solution would be to make a separate glReadPixels() call per row, which you can then make for every second row only. You would still have to copy every second pixel from those rows, but at least it would cut the amount of data read in half. It also reduces the additional memory needed to almost a quarter, since you only ever store one row at full resolution. Of course there is overhead in making many more glReadPixels() calls, so it's hard to predict whether this is faster overall.
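The per-pixel copy step can be sketched as follows (plain C; downsample_skip is an illustrative name), operating on tightly packed RGBA data after it has been read back:

```c
/* Copy every second pixel of every second row from a full-resolution
   RGBA buffer (w x h) into a half-resolution buffer (w/2 x h/2),
   keeping a quarter of the pixels. Nearest-neighbor, no filtering. */
static void downsample_skip(const unsigned char *src, int w, int h,
                            unsigned char *dst) {
    for (int y = 0; y < h / 2; y++) {
        for (int x = 0; x < w / 2; x++) {
            const unsigned char *s = src + ((2 * y) * w + 2 * x) * 4;
            unsigned char *d = dst + (y * (w / 2) + x) * 4;
            d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
        }
    }
}
```

In the row-by-row variant above, the same inner loop runs on each single row buffer instead of the full image.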
The nicer approach would be to produce a half-resolution frame that you can read directly. To do that, you could either:
If your toolkit allows it, re-render the frame at half the resolution. You could use an FBO as the render target for this, with half the size of the window.
Copy the frame while downscaling it in the process. Again, create an FBO with a render target of half the size, and copy from the default framebuffer to this FBO using glBlitFramebuffer().
You can also look into making the readback asynchronous by using a pixel pack buffer (bind a buffer to the GL_PIXEL_PACK_BUFFER target with glBindBuffer()). This will most likely not make the operation itself faster, but it allows you to continue feeding commands to the GPU while you wait for the glReadPixels() results to arrive. It might let you take screenshots while being less disruptive to the gameplay.

How does UIImageView animate so smoothly? (Or: how to get OpenGL to efficiently animate frame by frame textures)

To clarify, I know that a texture atlas improves performance when using multiple distinct images. But I'm interested in how things are done when you are not doing this.
I tried doing some frame-by-frame animation manually in custom OpenGL, where each frame I bind a new texture and draw it on the same point sprite. It works, but it is very slow compared to UIImageView's ability to do the same. I load all the textures up front, but the rebinding is done every frame. By comparison, UIImageView accepts the individual images, not a texture atlas, so I'd imagine it is doing something similar.
These are 76 images loaded individually, not as a texture atlas, and each is about 200 px square. In OpenGL, I suspect the bottleneck is the requirement to rebind a texture every frame. But how is UIImageView doing this, as I'd expect a similar bottleneck? Is UIImageView somehow creating an atlas behind the scenes so no rebinding of textures is necessary? Since UIKit ultimately has OpenGL running beneath it, I'm curious how this must be working.
If there is a more efficient means to animate multiple textures, rather than swapping out different bound textures each frame in OpenGL, I'd like to know, as it might hint at what Apple is doing in their framework.
If I did in fact get a new frame for each of 60 frames in a second, it would take about 1.25 seconds to animate through my 76 frames. Indeed I get that with UIImageView, but the OpenGL version takes about 3-4 seconds.
I would say your bottleneck is somewhere else. OpenGL is more than capable of handling animation the way you are doing it. Since all the textures are loaded and you just bind a different one each frame, there is no loading time or anything else involved. For comparison, I have an application that can generate and delete textures at runtime and can at some point have a great number of textures loaded on the GPU. I have to bind many of those textures every frame (not one per frame), while using the depth buffer, stencil, multiple FBOs, heavy user input, and about five threads bottlenecked into one to process all the GL code, and I have no trouble with the FPS at all.
Since you are working on iOS, I suggest you run some profilers to see what code is responsible for the overhead. And if for some reason the time profiler tells you that the line with glBindTexture is taking too long, I would still say the problem is somewhere else.
So to answer your question: it is normal and great that UIImageView does its work so smoothly, and there should be no problem achieving the same performance with OpenGL. THOUGH, there are a few things to consider here. How can you tell that the image view does not skip images? You might be setting a pointer to a different image 60 times per second, but the image view might only redraw itself 30 times per second, using whatever image is currently assigned when it does. With your GL code, on the other hand, you are forcing the application to redraw at 60 FPS regardless of whether it is capable of doing so.
Taking all this into consideration, there is a thing called a display link that Apple created for exactly what you want to do. The display link will tell you how much time has elapsed between frames, and from that you should decide which texture to bind, rather than trying to force them all into a time frame that might be too short.
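That elapsed-time-driven frame selection can be sketched like this (plain C; frame_for_time is an illustrative name):

```c
/* Pick the animation frame to show from elapsed wall-clock time,
   instead of advancing one frame per render pass. Dropped render
   passes then skip frames rather than slowing the animation down. */
static int frame_for_time(double elapsed_s, double fps, int frame_count) {
    int idx = (int)(elapsed_s * fps);
    return idx % frame_count;  /* loop the animation */
}
```

At 60 fps the 76-frame sequence from the question finishes at 76/60 ≈ 1.27 s regardless of how many render passes actually happened in between.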
And another thing: I have seen that if you try to present the render buffer at 100 FPS on most iOS devices (it might be all of them), you will only get 60 FPS, as the method that presents the render buffer will pause your thread if it was last called less than 1/60 s ago. That being said, it is rather impossible to display anything above 60 FPS on iOS devices, and everything running at 30+ FPS is considered good.
"not as a texture atlas" is the sentence that is a red flag for me.
USing a texture atlas is a good thing....the texture is loaded into memory once and then you just move the rectangle position to play the animation. It's fast because its already all in memory. Any operation which involves constantly loading and reloading new image frames is going to be slower than that.
You'd have to post source code to get any more exact an answer than that.