OpenGL slows down when rendering nearby objects on top of others - iOS

I am writing an iOS app using OpenGL ES 2.0 to render a number of objects to the screen.
Currently, those objects are simple shapes (squares, spheres, and cylinders).
When none of the objects overlap each other, the program runs smoothly at 30 fps.
My problem arises when I add objects that appear behind the rest of my models (a background rectangle, for example). When I attempt to draw a background rectangle, I can only draw objects in front of it that take up less than half the screen. Any larger than that and the frame rate drops to between 15 and 20 fps.
As it stands, all of my models, including the background, are drawn with the following code:
- (void)drawSingleModel:(Model *)model
{
    // Create a model transform matrix.
    CC3GLMatrix *modelView = [CC3GLMatrix matrix];

    // Transform model view
    // ...

    // Pass matrix to shader.
    glUniformMatrix4fv(_modelViewUniform, 1, 0, modelView.glMatrix);

    // Bind the correct buffers to OpenGL.
    glBindBuffer(GL_ARRAY_BUFFER, [model vertexBuffer]);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, [model indexBuffer]);

    glVertexAttribPointer(_positionSlot, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), 0);
    glVertexAttribPointer(_colorSlot, 4, GL_FLOAT, GL_FALSE, sizeof(Vertex), (GLvoid *)(sizeof(float) * 3));

    // Load vertex texture coordinate attributes into the texture buffer.
    glVertexAttribPointer(_texCoordSlot, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (GLvoid *)(sizeof(float) * 7));

    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, [model textureIndex]);
    glUniform1i(_textureUniform, 0);

    glDrawElements([model drawMode], [model numIndices], GL_UNSIGNED_SHORT, 0);
}
This code is called from my draw method, which is defined as follows:
- (void)draw
{
    glUseProgram(_programHandle);

    // Perform OpenGL rendering here.
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glEnable(GL_BLEND);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glEnable(GL_DEPTH_TEST);
    glEnable(GL_CULL_FACE);

    _camera = [CC3GLMatrix matrix];
    // Camera orientation code.
    // ...

    // Pass the camera matrix to the shader program.
    glUniformMatrix4fv(_projectionUniform, 1, 0, _camera.glMatrix);

    glViewport(0, 0, self.frame.size.width, self.frame.size.height);

    // Render the background.
    [self drawSingleModel:_background];

    // Render the objects.
    for (int x = 0; x < [_models count]; ++x)
    {
        [self drawSingleModel:[_models objectAtIndex:x]];
    }

    // Send the contents of the render buffer to the UIView.
    [_context presentRenderbuffer:GL_RENDERBUFFER];
}
I found that by changing the render order as follows:
for (int x = 0; x < [_models count]; ++x)
{
    [self drawSingleModel:[_models objectAtIndex:x]];
}
[self drawSingleModel:_background];
my frame rate when rendering on top of the background is 30 fps.
Of course, the slowdown still occurs if any objects in _models must render in front of each other. Additionally, rendering in this order causes translucent and transparent objects to be drawn black.
I'm still somewhat new to OpenGL, so I don't quite know where my problem lies. My assumption is that there is a slowdown in performing depth testing, and I also realize I'm working on a mobile device. But I can't believe that iOS devices are simply too slow to do this. The program is only rendering 5 models, with around 180 triangles each.
Is there something I'm not seeing, or some sort of workaround for this?
Any suggestions or pointers would be greatly appreciated.

You're running into one of the peculiarities of mobile GPUs: most of them (the NVIDIA Tegra being an exception) don't rely on a straightforward depth test for hidden surface removal. Most mobile GPUs, including the one in the iPad, are tile-based rasterizers. The reason for this is to save memory bandwidth, because memory access is a power-intensive operation, and in the power-constrained environment of a mobile device, reducing the required memory bandwidth gains significant battery lifetime.
Tile-based renderers split the viewport into a number of tiles. As geometry is sent in, it is split across the tiles, and within each tile it is intersected with the geometry already present there. Most of the time a tile is covered by only a single primitive. If the incoming primitive happens to be in front of the geometry already present, it replaces it. If there is a cutting intersection, a new edge is added. Only when a certain threshold number of edges is reached does that single tile switch to depth-testing mode.
Only at synchronization points are the prepared tiles actually rasterized.
Now it's obvious why overlapping objects reduce rendering performance: the more primitives overlap, the more preprocessing has to be done to set up the tiles.

See "transparency sorting"/"alpha sorting".
I suspect the slowness you're seeing is largely due to "overdraw", i.e. framebuffer pixels being drawn more than once. This is worst when you draw the scene back-to-front, since the depth test always passes. While the iPhone 4/4S/5 may have a beefy GPU, last I checked the memory bandwidth was pretty terrible (I don't know how big the GPU cache is).
If you render front-to-back, the problem is that transparent pixels still write to the depth buffer, causing them to occlude polys behind them. You can reduce this slightly (but only slightly) using the alpha test.
The simple solution: render opaque polys approximately front-to-back and then transparent polys back-to-front. This may mean making two passes through your scene, and ideally you want to sort the transparent polys, which isn't that easy to do well.
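Applied to the draw method in the question, that could look roughly like this — a sketch only, which assumes the models have been split into hypothetical opaque and translucent lists, each already sorted appropriately:
// Pass 1: opaque geometry, roughly front-to-back, depth writes on, no blending.
glDisable(GL_BLEND);
glDepthMask(GL_TRUE);
for (Model *model in opaqueModelsSortedFrontToBack) {   // hypothetical, pre-sorted list
    [self drawSingleModel:model];
}

// Pass 2: translucent geometry, back-to-front, blending on and depth writes off
// so transparent fragments don't occlude what's behind them.
glEnable(GL_BLEND);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
glDepthMask(GL_FALSE);
for (Model *model in translucentModelsSortedBackToFront) {   // hypothetical, pre-sorted list
    [self drawSingleModel:model];
}
glDepthMask(GL_TRUE);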
I think it's also possible (in principle) to render everything front-to-back and perform alpha testing on the destination alpha, but I don't think OpenGL supports this.

Related

VAO + VBOs logic for data visualization (boost graph)

I'm using the Boost Graph Library to organize points linked by edges in a graph, and now I'm working on their display.
I'm a newbie with OpenGL ES 2/GLKit and Vertex Array Objects / Vertex Buffer Objects. I followed this tutorial, which is really good, but in the end what I guess I should do is:
Create vertices only once for a "model" instance of a Shape class (the "sprite" representing my Boost point position);
Use this model to feed VBOs;
Bind the VBOs to a unique VAO;
Draw everything in a single draw call, changing the matrix for each "sprite" (a rough sketch of this setup follows below).
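In code, I imagine the one-time setup (the first three steps) looking roughly like this — an untested sketch where the buffer/array names and the vertex layout are placeholders:
// One-time setup: upload the shared "model" quad into a VBO wrapped in a VAO.
glGenVertexArraysOES(1, &_spriteVAO);
glBindVertexArrayOES(_spriteVAO);

glGenBuffers(1, &_spriteVBO);
glBindBuffer(GL_ARRAY_BUFFER, _spriteVBO);
glBufferData(GL_ARRAY_BUFFER, sizeof(spriteVertices), spriteVertices, GL_STATIC_DRAW);

glEnableVertexAttribArray(GLKVertexAttribPosition);
glVertexAttribPointer(GLKVertexAttribPosition, 3, GL_FLOAT, GL_FALSE, 3 * sizeof(GLfloat), 0);

glBindVertexArrayOES(0);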
I've read that accessing VBOs frequently is really bad for performance, and that I should use swapping (double-buffered) VBOs.
My questions are:
Is the matrix translation/scaling/rotation possible in a single call?
Then, if it is: is my logic sound?
Finally: it would be great to have some code examples :-)
If you just want to draw charts, there are much easier libraries to use besides OpenGL ES. But assuming you have your reasons:
Just take a stab at what you've described and test it. If it's good enough then congratulations: you're done.
You don't mention how many graphs, how many points per graph, how often the points are modified, and the frame rate you desire.
If you're only updating a few hundred vertices, and they don't change frequently, you might not even need VBOs; recent hardware can render a lot of sprites even without them.
To start, try this:
// Bind the shader.
glUseProgram(defaultShaderProgram);

// Set the projection (camera) matrix.
glUniformMatrix4fv(uProjectionMatrix, 1, GL_FALSE, (GLfloat *)projectionMatrix[0]);

for ( /* each chart */ )
{
    // Set the sprite (scale/rotate/translate) matrix.
    glUniformMatrix4fv(uModelViewMatrix, 1, GL_FALSE, (GLfloat *)spriteViewMatrix[0]);

    // Set the vertices (the attribute arrays must have been enabled
    // with glEnableVertexAttribArray during setup).
    glVertexAttribPointer(ATTRIBUTE_VERTEX_POSITION, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), &pVertices->x);
    glVertexAttribPointer(ATTRIBUTE_VERTEX_DIFFUSE, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(Vertex), &pVertices->color);

    // Render. Assumes your shader does not use a texture,
    // since we did not set one.
    glDrawArrays(GL_TRIANGLES, 0, numVertices);
}

render a small texture every frame and then scale it up?

Using OpenGL on iOS, is it possible to update a small texture (by setting each pixel individually) and then scale it up to fill the screen (60 frames per second)?
You should be able to update the content of a texture using glTexImage2D.
Untested example:
GLubyte data[32 * 32 * 4]; // 32x32 texels, RGBA (power-of-two size)
for (int i = 0; i < (int)sizeof(data); i += 4) {
    // write a red pixel (RGBA)
    data[i]     = 255;
    data[i + 1] = 0;
    data[i + 2] = 0;
    data[i + 3] = 255;
}
glBindTexture(GL_TEXTURE_2D, my_texture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 32, 32, 0, GL_RGBA, GL_UNSIGNED_BYTE, data);
// then simply render a quad with this texture.
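If you update the texture every frame, it may be cheaper to allocate it once with glTexImage2D and afterwards only replace the pixels with glTexSubImage2D, which reuses the existing storage (same data layout as above):
// Reuse the storage allocated by the earlier glTexImage2D call.
glBindTexture(GL_TEXTURE_2D, my_texture);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 32, 32, GL_RGBA, GL_UNSIGNED_BYTE, data);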
In general the answer is yes, it is possible. But it might depend on what you need to draw.
Since you don't provide more details I will describe the general approach:
Bind a texture to a framebuffer object so you can render into it (a good explanation with code is the "Example 6.10. Initialize() for Supersampling" code example; a minimal sketch also follows these steps).
Now draw what you need in the same way as you would on the screen (transformations, modelview matrix, etc.). If you need pixel accuracy (to modify each and every pixel), consider using an orthographic projection; whether that is possible depends on what you need to draw. All of this drawing is performed into your texture, which achieves the "update the texture" part.
Bind the normal framebuffer that you use to draw on the screen, and draw a rectangle (possibly using an orthographic projection again) that uses the texture from the previous step. You can scale this rectangle to fill the screen.
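A minimal sketch of step 1 in OpenGL ES 2.0 (handle names and the 256x256 size are placeholders; error handling is omitted):
// Create a texture to render into (power-of-two size for older hardware).
GLuint targetTexture, offscreenFBO;
glGenTextures(1, &targetTexture);
glBindTexture(GL_TEXTURE_2D, targetTexture);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, 256, 256, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);

// Attach it to a framebuffer object as the colour target.
glGenFramebuffers(1, &offscreenFBO);
glBindFramebuffer(GL_FRAMEBUFFER, offscreenFBO);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, targetTexture, 0);
if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) {
    // handle the error
}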
Whether the above approach can achieve 60 fps depends on your target device and the scene you need to render.
Hope that helps

optimizing openGL ES 2.0 2D texture output and framerate

I was hoping someone could help me make some progress with some texture benchmarks I'm doing in OpenGL ES 2.0 on an iPhone 4.
I have an array that contains sprite objects. The render loop cycles through all the sprites per texture and retrieves all their texture coords and vertex coords. It adds those to a giant interleaved array, using degenerate vertices and indices, and sends those to the GPU (I'm embedding code at the bottom). This is all done per texture, so I'm binding the texture once, then building my interleaved array, then drawing it. Everything works just great and the results on the screen are exactly what they should be.
My benchmark test adds 25 new sprites per touch at varying opacities and changes their vertices on each update so that they bounce around the screen while rotating, all while running OpenGL ES Analyzer on the app.
Here's where I'm hoping for some help...
I can get to around 275 32x32 sprites with varying opacity bouncing around the screen at 60 fps. By 400 I'm down to 40 fps. When I run the OpenGL ES Performance Detective it tells me...
The app rendering is limited by triangle rasterization - the process of converting triangles into pixels. The total area in pixels of all of the triangles you are rendering is too large. To draw at a faster frame rate, simplify your scene by reducing either the number of triangles, their size, or both.
The thing is, I just whipped up a test in Cocos2D using CCSpriteBatchNode with the same texture, created 800 transparent sprites, and the framerate is an easy 60 fps.
Here is some code that may be pertinent...
Shader.vsh (matrices are set up once at the beginning)
void main()
{
    gl_Position = projectionMatrix * modelViewMatrix * position;
    texCoordOut = texCoordIn;
    colorOut = colorIn;
}
Shader.fsh (colorOut is used to calc opacity)
void main()
{
    lowp vec4 fColor = texture2D(texture, texCoordOut);
    gl_FragColor = vec4(fColor.xyz, fColor.w * colorOut.a);
}
VBO setup
glGenBuffers(1, &_vertexBuf);
glGenBuffers(1, &_indiciesBuf);
glGenVertexArraysOES(1, &_vertexArray);
glBindVertexArrayOES(_vertexArray);
glBindBuffer(GL_ARRAY_BUFFER, _vertexBuf);
glBufferData(GL_ARRAY_BUFFER, sizeof(TDSEVertex)*12000, &vertices[0].x, GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(GLKVertexAttribPosition);
glVertexAttribPointer(GLKVertexAttribPosition, 2, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), BUFFER_OFFSET(0));
glEnableVertexAttribArray(GLKVertexAttribTexCoord0);
glVertexAttribPointer(GLKVertexAttribTexCoord0, 2, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), BUFFER_OFFSET(8));
glEnableVertexAttribArray(GLKVertexAttribColor);
glVertexAttribPointer(GLKVertexAttribColor, 4, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), BUFFER_OFFSET(16));
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, _indiciesBuf);
glBufferData(GL_ELEMENT_ARRAY_BUFFER, sizeof(ushort)*12000, indicies, GL_STATIC_DRAW);
glBindVertexArrayOES(0);
Update Code
/*
 Here it cycles through all the sprites, gets their vert info (includes coords, texture coords, and color) and adds them to this giant array.
 The array is of...
 typedef struct {
     float x, y;
     float tx, ty;
     float r, g, b, a;
 } TDSEVertex;
*/
glBindBuffer(GL_ARRAY_BUFFER, _vertexBuf);
//glBufferSubData(GL_ARRAY_BUFFER, sizeof(vertices[0])*(start), sizeof(TDSEVertex)*(indicesCount), &vertices[start]);
glBufferData(GL_ARRAY_BUFFER, sizeof(TDSEVertex)*indicesCount, &vertices[start].x, GL_DYNAMIC_DRAW);
glBindBuffer(GL_ARRAY_BUFFER, 0);
Render Code
GLKTextureInfo* textureInfo = [[TDSETextureManager sharedTextureManager].textures objectForKey:textureName];
glBindTexture(GL_TEXTURE_2D, textureInfo.name);
glBindVertexArrayOES(_vertexArray);
glDrawElements(GL_TRIANGLE_STRIP, indicesCount, GL_UNSIGNED_SHORT, BUFFER_OFFSET(start));
glBindVertexArrayOES(0);
Here's a screenshot at 400 sprites (800 triangles plus 800 degenerate triangles) to give an idea of the opacity layering as the textures are moving...
Again, I should note that a VBO is created and sent per texture, so I'm binding and then drawing only twice per frame (since there are only two textures).
Sorry if this is overwhelming, but it's my first post on here and I wanted to be thorough.
Any help would be much appreciated.
PS: I know that I could just use Cocos2D instead of writing everything from scratch, but where's the fun (and learning) in that?!
UPDATE #1
When I switch my fragment shader to only be
gl_FragColor = texture2D(texture, texCoordOut);
it gets to 802 sprites at 50 fps (4804 triangles including degenerate triangles), though setting sprite opacity is lost. Any suggestions as to how I can still handle opacity in my shader without running at 1/4th the speed?
UPDATE #2
So I ditched GLKit's view and view controller and wrote a custom view loaded from the AppDelegate: 902 sprites with opacity and transparency at 60 fps.
Mostly miscellaneous thoughts...
If you're triangle limited, try switching from GL_TRIANGLE_STRIP to GL_TRIANGLES. You're still going to need to specify exactly the same number of indices — six per quad — but the GPU never has to spot that the connecting triangles between quads are degenerate (i.e., it never has to convert them into zero pixels). You'll need to profile to see whether you end up paying a cost for no longer implicitly sharing edges.
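For what it's worth, the GL_TRIANGLES index pattern can be generated with something like this (a sketch; it assumes each sprite contributes four consecutive vertices to the interleaved array, and the names are placeholders):
// Six indices per quad: two triangles that share the 1-2 diagonal.
for (int q = 0; q < spriteCount; ++q)
{
    GLushort base = (GLushort)(q * 4);   // four vertices per sprite, laid out consecutively
    indices[q * 6 + 0] = base + 0;
    indices[q * 6 + 1] = base + 1;
    indices[q * 6 + 2] = base + 2;
    indices[q * 6 + 3] = base + 2;
    indices[q * 6 + 4] = base + 1;
    indices[q * 6 + 5] = base + 3;
}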
You should also shrink the footprint of your vertices. I would dare imagine you can specify x, y, tx and ty as 16-bit integers, and your colours as 8-bit integers without any noticeable change in rendering. That would reduce the footprint of each vertex from 32 bytes (eight components, each four bytes in size) to 12 bytes (four two-byte values plus four one-byte values, with no padding needed because everything is already aligned) — cutting almost 63% of the memory bandwidth costs there.
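A rough sketch of such a smaller layout (untested; it assumes your positions and texture coordinates survive quantisation to 16 bits, and relies on the normalized flags to convert back to 0..1 floats in the shader):
typedef struct {
    GLshort  x, y;        // position, plain integer units (scale via the matrix if needed)
    GLushort tx, ty;      // texture coords, 0..65535 normalized to 0..1
    GLubyte  r, g, b, a;  // colour, 0..255 normalized to 0..1
} TDSECompactVertex;      // 12 bytes instead of 32

glVertexAttribPointer(GLKVertexAttribPosition,  2, GL_SHORT,          GL_FALSE, sizeof(TDSECompactVertex), BUFFER_OFFSET(0));
glVertexAttribPointer(GLKVertexAttribTexCoord0, 2, GL_UNSIGNED_SHORT, GL_TRUE,  sizeof(TDSECompactVertex), BUFFER_OFFSET(4));
glVertexAttribPointer(GLKVertexAttribColor,     4, GL_UNSIGNED_BYTE,  GL_TRUE,  sizeof(TDSECompactVertex), BUFFER_OFFSET(8));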
As you actually seem to be fill-rate limited, you should consider your source texture too. Anything you can do to trim its byte size will directly help texel fetches and hence fill rate.
It looks like you're using art that is consciously about the pixels so switching to PVR probably isn't an option. That said, people sometimes don't realise the full benefit of PVR textures; if you switch to, say, the 4 bits per pixel mode then you can scale your image up to be twice as wide and twice as tall so as to reduce compression artefacts and still only be paying 16 bits on each source pixel but likely getting a better luminance range than a 16 bpp RGB texture.
Assuming you're currently using a 32 bpp texture, you should at least see whether an ordinary 16 bpp RGB texture is sufficient using any of the provided hardware modes (especially if the 1 bit of alpha plus 5 bits per colour channel is appropriate to your art, since that loses only 9 bits of colour information versus the original while reducing bandwidth costs by 50%).
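For reference, uploading one of those 16 bpp formats is just a different format/type pair in glTexImage2D (a sketch; it assumes the pixel data has already been repacked to 16 bits per pixel, and width/height/pixel pointers are placeholders):
// RGBA5551: 1 bit of alpha, 5 bits per colour channel.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
             GL_RGBA, GL_UNSIGNED_SHORT_5_5_5_1, pixels5551);

// RGB565: no alpha, 5/6/5 bits per channel.
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB, width, height, 0,
             GL_RGB, GL_UNSIGNED_SHORT_5_6_5, pixels565);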
It also looks like you're uploading indices every single frame. Upload only when you add extra objects to the scene or if the buffer as last uploaded is hugely larger than it needs to be. You can just limit the count passed to glDrawElements to cut back on objects without a reupload. You should also check whether you actually gain anything by uploading your vertices to a VBO and then reusing them if they're just changing every frame. It might be faster to provide them directly from client memory.
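If you want to measure the client-memory path against your VBO path, a sketch of the former (untested, using your existing attribute slots and TDSEVertex layout) would be along these lines:
// Back to the default vertex array so client-memory pointers are legal,
// and unbind the buffers so the pointers are read from app memory.
glBindVertexArrayOES(0);
glBindBuffer(GL_ARRAY_BUFFER, 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);

glEnableVertexAttribArray(GLKVertexAttribPosition);
glEnableVertexAttribArray(GLKVertexAttribTexCoord0);
glEnableVertexAttribArray(GLKVertexAttribColor);

glVertexAttribPointer(GLKVertexAttribPosition,  2, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), &vertices[0].x);
glVertexAttribPointer(GLKVertexAttribTexCoord0, 2, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), &vertices[0].tx);
glVertexAttribPointer(GLKVertexAttribColor,     4, GL_FLOAT, GL_FALSE, sizeof(TDSEVertex), &vertices[0].r);

// Indices also come straight from app memory here.
glDrawElements(GL_TRIANGLE_STRIP, indicesCount, GL_UNSIGNED_SHORT, indices);
Whether this wins over the VBO depends on how much data changes per frame, so profile both.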

Automatically calculate normals in GLKit/OpenGL-ES

I'm making some fairly basic shapes in OpenGL-ES based on sample code from Apple. They've used an array of points, with an array of indices into the first array and each set of three indices creates a polygon. That's all great, I can make the shapes I want. To shade the shapes correctly I believe I need to calculate normals for each vertex on each polygon. At first the shapes were cuboidal so it was very easy, but now I'm making (slightly) more advanced shapes I want to create those normals automatically. It seems easy enough if I get vectors for two edges of a polygon (all polys are triangles here) and use their cross product for every vertex on that polygon. After that I use code like below to draw the shape.
glEnableVertexAttribArray(GLKVertexAttribPosition);
glVertexAttribPointer(GLKVertexAttribPosition, 3, GL_FLOAT, GL_FALSE, 0, triangleVertices);
glEnableVertexAttribArray(GLKVertexAttribColor);
glVertexAttribPointer(GLKVertexAttribColor, 4, GL_FLOAT, GL_FALSE, 0, triangleColours);
glEnableVertexAttribArray(GLKVertexAttribNormal);
glVertexAttribPointer(GLKVertexAttribNormal, 3, GL_FLOAT, GL_FALSE, 0, triangleNormals);
glDrawArrays(GL_TRIANGLES, 0, 48);
glDisableVertexAttribArray(GLKVertexAttribPosition);
glDisableVertexAttribArray(GLKVertexAttribColor);
glDisableVertexAttribArray(GLKVertexAttribNormal);
What I'm having trouble understanding is why I have to do this manually. I'm sure there are cases when you'd want something other than just a vector perpendicular to the surface, but I'm also sure that this is the most popular use case by far, so shouldn't there be an easier way? Have I missed something obvious? glCalculateNormals() would be great.
And here is an answer:
Pass in a GLKVector3[] that you wish to be filled with your normals, another with the vertices (each group of three forms a polygon), and then the count of the vertices.
- (void)calculateSurfaceNormals:(GLKVector3 *)normals forVertices:(GLKVector3 *)incomingVertices count:(int)numOfVertices
{
    for (int i = 0; i < numOfVertices; i += 3)
    {
        GLKVector3 vector1 = GLKVector3Subtract(incomingVertices[i + 1], incomingVertices[i]);
        GLKVector3 vector2 = GLKVector3Subtract(incomingVertices[i + 2], incomingVertices[i]);
        GLKVector3 normal  = GLKVector3Normalize(GLKVector3CrossProduct(vector1, vector2));
        normals[i]     = normal;
        normals[i + 1] = normal;
        normals[i + 2] = normal;
    }
}
And again the answer is: OpenGL is neither a scene management library nor a geometry library, but just a drawing API that draws nice pictures to the screen. For lighting it needs normals, and you give it the normals. That's all. Why should it compute normals if this can just be done by the user and has nothing to do with the actual drawing?
Often you don't compute them at runtime anyway, but load them from a file. And there are many, many ways to compute normals. Do you want per-face normals or per-vertex normals? Do you need any specific hard edges or any specific smooth patches? If you want to average face normals to get vertex normals, how do you want to average these?
And with the advent of shaders and the removal of the built-in normal attribute and lighting computations in newer OpenGL versions, this whole question becomes obsolete anyway, as you can do lighting any way you want and don't necessarily need traditional normals anymore.
By the way, it sounds like at the moment you are using per-face normals, which means every vertex of a face has the same normal. This creates a very faceted model with hard edges and also doesn't work very well together with indices. If you want a smooth model (I don't know, maybe you really do want a faceted look), you should average the face normals of the adjacent faces for each vertex to compute per-vertex normals. That is actually the more usual use case, rather than per-face normals.
So you can do something like this pseudo-code:
for each vertex normal:
    initialize to zero vector
for each face:
    compute face normal using cross product
    add face normal to each vertex normal of this face
for each vertex normal:
    normalize
to generate smooth per-vertex normals. Even in actual code this should result in something between 10 and 20 lines, which isn't really complex.
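In GLKit terms, that pseudo-code might translate into something like the following — an untested sketch for an indexed triangle list (method and parameter names are placeholders):
- (void)calculateSmoothNormals:(GLKVector3 *)normals
                   forVertices:(const GLKVector3 *)vertices
                   vertexCount:(int)vertexCount
                       indices:(const GLushort *)indices
                    indexCount:(int)indexCount
{
    // Start every vertex normal at zero.
    for (int v = 0; v < vertexCount; ++v)
        normals[v] = GLKVector3Make(0.0f, 0.0f, 0.0f);

    // Accumulate each face normal into the normals of its three vertices.
    for (int i = 0; i < indexCount; i += 3)
    {
        GLushort a = indices[i], b = indices[i + 1], c = indices[i + 2];
        GLKVector3 edge1 = GLKVector3Subtract(vertices[b], vertices[a]);
        GLKVector3 edge2 = GLKVector3Subtract(vertices[c], vertices[a]);
        GLKVector3 faceNormal = GLKVector3CrossProduct(edge1, edge2);
        normals[a] = GLKVector3Add(normals[a], faceNormal);
        normals[b] = GLKVector3Add(normals[b], faceNormal);
        normals[c] = GLKVector3Add(normals[c], faceNormal);
    }

    // Normalize the accumulated sums to get unit per-vertex normals.
    for (int v = 0; v < vertexCount; ++v)
        normals[v] = GLKVector3Normalize(normals[v]);
}
Note that leaving the face normals unnormalized before accumulation weights each face by its area, which is usually a reasonable default.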

OpenGL ES rendering triangle mesh - Black dots in iPhone and perfect image in simulator

This is not a texture-related problem like the one described in other Stack Overflow questions ("Rendering to texture on iOS...").
My Redraw loop:
glClearColor(0.0f, 0.0f, 0.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
glMatrixMode(GL_MODELVIEW);
glLoadIdentity();
glTranslatef(0.0f, 0.0f, -300.0f);
glMultMatrixf(transform);
glVertexPointer(3, GL_FLOAT, MODEL_STRIDE, &model_vertices[0]);
glEnableClientState(GL_VERTEX_ARRAY);
glNormalPointer(GL_FLOAT, MODEL_STRIDE, &model_vertices[3]);
glEnableClientState(GL_NORMAL_ARRAY);
glColorPointer(4, GL_FLOAT, MODEL_STRIDE, &model_vertices[6]);
glEnableClientState(GL_COLOR_ARRAY);
glEnable(GL_COLOR_MATERIAL);
glDrawArrays(GL_TRIANGLES, 0, MODEL_NUM_VERTICES);
The result in the simulator:
Then the result on the iPhone 4 (iOS 5, using OpenGL ES 1.1):
Notice the black dots; they appear at random positions as you rotate the object (a brain).
The mesh has 15002 vertices and 30k triangles.
Any ideas on how to fix this jitter in the device image?
I've solved the problem by increasing the precision of the depth buffer:
// Set up the projection
static const GLfloat zNear = 0.1f, zFar = 1000.0f, fieldOfView = 45.0f;
glEnable(GL_DEPTH_TEST);
glMatrixMode(GL_PROJECTION);
GLfloat size = zNear * tanf(DEGREES_TO_RADIANS(fieldOfView) / 2.0);
CGRect rect = self.bounds;
glFrustumf(-size, size, -size / (rect.size.width / rect.size.height), size / (rect.size.width / rect.size.height), zNear, zFar);
glViewport(0, 0, rect.size.width, rect.size.height);
glMatrixMode(GL_MODELVIEW);
In the code that produced the jitter, zNear was 0.01f.
The hint came from devforums.apple.com
There's nothing special in the code you posted that would cause this. The problem is likely in your mesh data rather than in your code, due to precision limitations in the processing of the vertices in your model. This type of problem is common if you have adjacent triangles with close, but not identical, values for the positions of the vertices they share. It's also the type of thing that will commonly vary between a GPU and a simulator.
You say that the black dots flash around randomly as you rotate the object. If you're rotating the object, I assume your real code isn't always loading the identity matrix in for the model-view?
If the gaps between your triangles are much smaller than the projected size of one pixel, they will usually end up being rounded to the same pixel and you won't see any problem. But if one vertex is rounded in one direction and the other vertex is rounded in the other direction, that can leave a one-pixel gap. The locations of the rounding errors depend on the transform matrix, so they move every frame as the object rotates.
If you load a different mesh, do you get the same errors?
If you have your brain mesh in a data format that you can edit in a 3D modeling app, then search for an option named something like "weld vertices" or "merge vertices". You set a minimum threshold for vertices to be considered identical, and it will look for vertex pairs within that distance and move one (or both) to match perfectly. Many 3D modelling apps also have cleanup tools to ensure that a mesh is manifold, which means (among other things) that there are no holes in the mesh. You usually only want to deal with manifold meshes in 3D rendering. You can also weld vertices in your own code, though the operation is expensive and not usually the type of thing you want to do at runtime unless you really have to.
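If you do want to weld in your own code, a brute-force pass over the positions could look roughly like this — a sketch using GLKit's vector math, which assumes you can extract the positions into a GLKVector3 array and write them back into your interleaved model_vertices afterwards:
// Snap any vertex that lies within 'threshold' of an earlier vertex onto that
// earlier vertex's exact position, so shared edges match bit-for-bit.
// O(n^2), so run it once as a preprocessing step, not per frame.
static void WeldVertexPositions(GLKVector3 *positions, int count, float threshold)
{
    for (int i = 1; i < count; ++i)
    {
        for (int j = 0; j < i; ++j)
        {
            if (GLKVector3Distance(positions[i], positions[j]) <= threshold)
            {
                positions[i] = positions[j];   // reuse the earlier, matching position
                break;
            }
        }
    }
}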
