How to batch sprites in iOS/OpenGL ES 2.0 - ios

I have developed my own sprite library on top of OpenGL ES 2.0. Right now, I am not doing any batching of draw calls; instead, each sprite has its own VBO/VAO of four textured vertices, drawn as a triangle strip (The VAO/VBO itself is managed by the Texture atlas, so identical sprites reuse the same VAO/VBO, which is 'reference counted' and hence deleted when no sprite instances reference it).
Before drawing each sprite, I'll bind its texture, upload its uniforms/attributes to the shader (modelview matrix, opacity - Projection matrix stays constant all along), bind its Vertex Array Object (4 textured vertices + four indices), and call glDrawElements(). I do cull off-screen sprites (based on position and bounds), but still it is one draw call per sprite, even if all sprites share the same texture. The vertex positions and texture coordinates for each sprite never change.
I must say that, despite this inefficiency, I have never experienced performance issues, even when drawing many sprites on screen. I do split the sprites into opaque/non-opaque, draw the opaque ones first, and the non-opaque ones after, back to front. I have seen performance suffer only when I overdraw (tax the fill rate).
Nevertheless, the OpenGL instruments in Xcode will complain that I draw too many small meshes and that I should consolidate my geometry into less objects. And in the Unity world everyone talks about limiting the number of draw calls as if they were the plague.
So, how should I go around batching very many sprites, each with a different transform and opacity value (but the same texture), into one draw call? One thing that comes to mind is to modify the vertex data every frame, and stream it: applying the modelview matrix of each sprite to all its vertices, assembling the transformed vertices for all sprites into one mesh, and submitting it to the GPU. This approach does not solve the problem of varying opacity between sprites.
Another idea that comes to mind is to have all the textured vertices of all the sprites assembled into a single mesh (VBO), treated as 'static' (same vertex format I am using now), and a separate array with the stuff that changes per sprite every frame (transform matrix and opacity), and only stream that data each frame, and pull it/apply it on the vertex shader side. That is, have a separate array where the 'attribute' being represented is the modelview matrix/alpha for the corresponding vertices. Still have to figure out the exact implementation in terms of data format/strides etc. In any case, there is the additional complication that arises whenever a new sprite is created/destroyed, the whole mesh has to be modified...
Or perhaps there is an ideal, 'textbook' solution to this problem out there that I haven't figured out? What does cocos2d do?

When I initially started reading you post I though that each quad used a different texture (since you stated "Before drawing each sprite, I'll bind its texture") but then you said that each sprite has "the same texture".
A possible easy win is to control the way you bind your textures during the draw since each call is a burden for the OpenGL driver. If (and I am not really sure abut this from your post) you use different textures, I suggest to go for a simple texture atlas where all the sprites are inside a single picture (preferably a power of 2 texture with mipmapping) and then you take the piece of the texture you need in the fragment using texture coordinates (this is the reason they exist in the end)
If the position of the sprites change over time (of course it does) at each frame, a possible advantage would be to pack the new vertex coordinates of your sprites at each frame and draw directly from memory (possibly over VAO. VBO could cost more since you need to build it each frame? to be tested in real scenario). This would be a good call pack operation and I am pretty sure it will bust the performances.
Consider that the VAO option could be feasible since we are talking about very small amount of data and the memory bandwidth should not represent a real bottleneck (each quad I guess uses 12 floats for vertex coordinates, 8 for textures and 12 for normals, 128 byte?), it shouldn't be a big problem over VAO.
About opacity, can't you play using an uniform to your fragment shader where you play with alpha? Am I wrong with it? It should work.
I hope this helps.
Ciao,
Maurizio

Related

Get GLSL vertex shader positions back to use on cpu event collider functions

I'm using python kivy to render meshes with opengl onto a canvas. I want to return vertex data from the fragment shader so i can build a collider (to use on my cpu event listeners after doing projection and model view transforms). I can replicate the matrix multiplications on the cpu (i guess that's the easy way out), but then i would have to do the same calculations twice (not good).
The only way I can think of doing this (after some browsing) is to imprint an object id onto my rendered mesh alpha channel (wouldn't affect much if i'd keep data coding near value 1 for alpha ). And create some kind of 'color picker' on the cpu side to decode it (I'm guessing that's not hard to do using kivy).
Anyone has a better idea to deal with this? Or a better approach?
First criterion here is: do you need collision for picking or for physics simulation?
If it is for physics: you almost never want the same mesh for rendering and for physics collisions. Typically, you use a very rough approximation for the physics shape, nearly always a convex shape, or a union of convex shapes. (Colliding arbitrary concave meshes is something that no physics engine can do well, and if they attempt it at all, performance will be poor.)
If it is for the purpose of picking an object with a mouse-click: you can go two different ways for this:
You replicate the geometry on the CPU, and use the mouse-location plus camera-view to create a ray that intersects this geometry, to see what is hit first.
After rendering your scene, you read back a single pixel from the depth buffer. (The pixel that your mouse is over.) With the depth value you get back, plus camera info, you can reconstruct a corresponding 3D position in your world. Once you have a 3D location, you can query your world to see which object is the closest to this point, and you will have your hit.

Making GL_POINTS look like a 3D Rectangle

Suppose one has an array of GL_POINTS and wants to make each appear to have a distinct "height" or "depth", so instead of appearing like a flat scatter of squares they appear to be a scatter of 3D rectangles / right rectangular prisms.
Is there a technique in WebGL that will allow one to achieve this effect? One could of course use vertices that actually articulate those 3D rectangles, but my goal is to optimize for performance as I have ~100,000 of these rectangles to render, and I thought points would be the best primitive to use.
Right now I am thinking one could probably use a series of point sprites each with varying depth, then assign each point the sprite that corresponds most closely with the desired depth (effectively quantizing the depth data field). But is there a way to keep the depth field continuous?
Any pointers on this question would be greatly appreciated!
In my experience POINTS are not faster than making your own vertices. Also, if you use instanced drawing you can get away with almost the same amount of data. You need one quad and then position, width, and height for each rectangle. Not sure instancing is as fast as just making all the vertices though. Might depend on the GPU/driver
As pointed out 😄 in many other Q&As on points, the maximum point size is GPU/driver specific and allowed to be as low as 1 pixel. There are plenty of GPUs that only allow size >= 256 pixels (no idea why) and a few with only size >= 64. Yet another reason to not use POINTS
Otherwise though, POINTS always draw a square so you'd have to draw a square large enough that contains your rectangle and then in the fragment shader, discard the pixels outside of the rectangle.
That's unlikely to be good for speed though. Every pixel of the square will still need to be evaluated by the fragment shader which is slower than drawing a rectangle with vertices since then those pixels outside the rectangle are not even considered. Further, using discard in a shader is often slower than not using it. This is because, for example, things like setting the depth buffer, if there is no discard nothing needs to be checked, the depth buffer can be updated unconditionally separate from the shader. With discard the depth buffer can't be updated until the GPU knows if the shader kept or discarded the fragment.
As for making them appear 3D I'm not sure what you mean. Effectively points are just like drawing a square quad so you can put anything you want on that square. The majority of shaders on shadertoy can be adapted to draw themselves on points. I wouldn't recommend it as it would likely be slow but just pointing out that it's just a quad. Draw a texture on them, draw a procedural texture on them, draw a solid color on them, draw a procedural snail on them.
Another possible solution is you can apply a normal map to the quad and then do lighting calculations on those normals so each quad will have the correct lighting for its position relative to your light(s)

iOS OpenGL ES 2.0 VBO confusion

I'm attempting to render a large number of textured quads on the iPhone. To improve render speeds I've created a VBO that I leverage to render my objects in a single draw call. This seems to work well, but I'm new to OpenGL and have run into issues when it comes to providing a unique transform for each of my quads (ultimately I'm looking for each quad to have a custom scale, position and rotation).
After a decent amount of Googling, it appears that the standard means of handling this situation is to pass a uniform matrix to the vertex shader and to have each quad take care of rendering itself. But this approach seems to negate the purpose of the VBO, by ultimately requiring a draw call per object.
In my mind, it makes sense that each object should keep it's own model view matrix, using it to transform, scale and rotate the object as necessary. But applying separate matrices to objects in a VBO has me lost. I've considered two approaches:
Send the model view matrix to the vertex shader as a non-uniform attribute and apply it within the shader.
Or transform the vertex data before it's stored in the VBO and sent to the GPU
But the fact that I'm finding it difficult to find information on how best to handle this leads me to believe I'm confusing the issue. What's the best way of handling this?
This is the "evergreen" question (a good one) on how to optimize the rendering of many simple geometries (a quad is in fact 2 triangles, 6 vertices most of the time unless we use a strip).
Anyway, the use of VBO vs VAO in this case should not mean a significant advantage since the size of the data to be transferred on the memory buffer is rather low (32 bytes per vertex, 96 bytes per triangle, 192 per quad) which is not a big effort for nowadays memory bandwidth (yet it depends on How many quads you mean. If you have 20.000 quads per frame then it would be a problem anyway).
A possible approach could be to batch the drawing of the quads by building a new VAO at each frame with the different quads positioned in your own coordinate system. Something like shifting the quads vertices to the correct position in a "virtual" mesh origin. Then you just perform a single draw of the newly creates mesh in your VAO.
In this way, you could batch the drawing of multiple objects in fewer calls.
The problem would be if your quads need to "scale" and "rotate" and not just translate, you can compute it with CPU the actual vertices position but it would be way to costly in terms of computing power.
A simple suggestion on top of the way you transfer the meshes is to use a texture atlas for all the textures of your quads, in this way you will need a much lower (if not needed at all) texture bind operation which might be costly in rendering operations.

Avoid gap between in textures while scaling in OpenGL ES 1.1

I am using glScale() to zoom out whole game scene. But at some scales I get a gap between textures:
How can I avoid this gap?
I have already tried to put upper texture little lower. But then I get a darker line between textures (because my textures have alpha channel).
I may scale down whole scene manually in CPU (by calculating vertices for scaled textures). But in this case I can't take advantage of VBOs, because vertices will change in every frame (zooming is very dynamic in my case).
What you can suggest to avoid this gap between textures, when I scale down the scene?
I wasn't able to found a solution with textures. Thus, I have created one more texture big enough to contain two these texture I wanted to draw. I have rendered to these two texture to this additional texture (using FBO). And finally I render the scene using this big one texture.

Optimise OpenGL ES 2.0 2D drawing using dirty rectangles

Is it possible to optimise OpenGL ES 2.0 drawing by using dirty rectangles?
In my case, I have a 2D app that needs to draw a background texture (full screen on iPad), followed by the contents of several VBOs on each frame. The problem is that these VBOs can potentially contain millions of vertices, taking anywhere up to a couple of seconds to draw everything to the display. However, only a small fraction of the display would actually be updated each frame.
Is this optimisation possible, and how (or perhaps more appropriately, where) would this be implemented? Would some kind of clipping plane need to be passed into the vertex shader?
If you set an area with glViewport, clipping is adjusted accordingly. This however happens after the vertex shader stage, just before rasterization. As the GL cannot know the result of your own vertex program, it cannot sort out any vertex before applying the vertex program. After that, it does. How efficent it does depents on the actual GPU.
Thus you have to sort and split your objects to smaller (eg. rectangulary bounded) tiles and test them against the field of view by yourself for full performance.

Resources