glGenTextures speed and memory concerns

I am learning OpenGL and recently discovered glGenTextures.
Although several sites explain what it does, I'm left wondering how it behaves in terms of speed and, particularly, memory.
Exactly what should I consider when calling glGenTextures? Should I consider unloading and reloading textures for better speed? How many textures does a typical game need? What workarounds are there for whatever limitations memory and speed impose?

According to the manual, glGenTextures only allocates texture "names" (i.e., IDs) with no "dimensionality". So you are not actually allocating texture memory as such, and the overhead here is negligible compared to actual texture memory allocation.
glTexImage is what actually controls the amount of texture memory used per texture. Your application's best use of texture memory will depend on many factors, including the maximum working set of textures used per frame, the available dedicated texture memory of the hardware, and the bandwidth of texture memory.
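A minimal C sketch of the distinction, assuming a current GL context (the 1024x1024 size and the create_texture name are just placeholders):

    #include <GL/gl.h>
    #include <stddef.h>

    /* Sketch: glGenTextures hands out a cheap name; glTexImage2D commits memory. */
    GLuint create_texture(void)
    {
        GLuint tex;
        glGenTextures(1, &tex);            /* allocates only an ID; no texture memory yet */
        glBindTexture(GL_TEXTURE_2D, tex); /* the name gains its "dimensionality" here */

        /* This is the call that actually allocates storage: 1024x1024 RGBA8
           is ~4 MB, ~5.3 MB with a full mipmap chain. Passing NULL reserves
           the memory without uploading any pixel data. */
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1024, 1024, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL);
        return tex;
    }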
As for your question about a typical game: what sort of game are you creating? Console games are starting to fill Blu-ray disc capacity (I've worked on a PS3 title that was initially not projected to fit on Blu-ray), and a large portion of that space is textures. Downloadable web games, on the other hand, are much more constrained.
Essentially, you need to work with reasonable game design and come up with an estimate of:
1. The total textures used by your game.
2. The maximum textures used at any one time.
Then you need to look at your target hardware and decide how to make it all fit.
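As a back-of-the-envelope sketch of that estimate, in C, with made-up numbers (uncompressed RGBA8 at 4 bytes per texel, plus roughly a third more for a full mipmap chain):

    #include <stdio.h>
    #include <stddef.h>

    /* Rough cost of one uncompressed RGBA8 texture, in bytes. */
    static size_t texture_bytes(size_t width, size_t height, int mipmapped)
    {
        size_t base = width * height * 4;          /* 4 bytes per texel */
        return mipmapped ? base + base / 3 : base; /* mip chain adds ~1/3 */
    }

    int main(void)
    {
        /* Hypothetical working set: 150 mipmapped 512x512 textures ~= 200 MB. */
        size_t total = 150 * texture_bytes(512, 512, 1);
        printf("working set: ~%zu MB\n", total / (1024 * 1024));
        return 0;
    }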
Here's a link to an old Game Developer article that should get you started:
http://number-none.com/blow/papers/implementing_a_texture_caching_system.pdf

Related

Need to put out serious numbers of SKSpriteNodes and CPU usage already at 90%

OK, so I'd like to place a large number of SKSpriteNodes on screen. In the game I'm working on, and even in the sample twirling-spaceship game, CPU usage runs high in the simulator. I'm not sure how to test my game on an actual device (unless I submit to Apple), but I'd like to know whether having something like 50-100 nodes on screen would use too much CPU time.
I've tested putting out large numbers of SKSpriteNodes and the CPU usage reads 90% or more. Is this normal? Will I get laughed at if I hand Apple this game, given its extremely high (and growing) CPU usage?
Lastly, is there a way to avoid lag at different points in the game? Caching? Preloading textures? Something like that.
Performance results seen in the simulator are not relevant at all. If you are interested in real results, you should test on different devices.
From the docs:
Rendering performance of OpenGL ES in Simulator has no relation to the performance of OpenGL ES on an actual device. Simulator provides an optimized software rasterizer that takes advantage of the vector-processing capabilities of your Macintosh computer. As a result, your OpenGL ES code may run faster or slower in iOS simulator (depending on your computer and what you are drawing) than on an actual device. Always profile and optimize your drawing code on a real device, and never assume that Simulator reflects real-world performance.
On the other hand, SpriteKit is capable of rendering hundreds of sprites at 60 fps if you use texture atlases, which let many nodes be drawn in a single draw pass.
For preloading textures into memory, see the "Preload Texture Atlas Data" section of Apple's SpriteKit documentation and the +preloadTextureAtlases:withCompletionHandler: class method.
Hope this helps.

iOS OpenGL ES 2.0 VBO vertex count limit: Once exceeded, CPU bound

I am testing the rendering of extremely large 3D meshes, currently on an iPhone 5 (I also have an iPad 3).
I have two Instruments screenshots from a profiling run: the first rendering a 1.3M-vertex mesh, the second a 2.1M-vertex mesh.
The blue histogram bar at the top shows CPU load, and for the first mesh it hovers at around 10%, so the GPU is doing most of the heavy lifting. The mesh is very detailed, and my point-light-with-specular shader makes it look quite impressive if I say so myself, as it renders consistently above 20 frames per second. Oh, and 4x MSAA is enabled as well!
However, once I step up to the 2-million-plus vertex mesh, everything goes to crap: it becomes massively CPU bound, and all instruments report 1 frame per second.
So it's pretty clear that somewhere between these two assets (and I will admit that both are tremendously large meshes to be loading into a single VBO), some limit is being surpassed by the 2-megavertex (462K-triangle) mesh, whether it's the vertex buffer size or the index buffer size that goes over.
So, the question is, what is this limit, and how can I query it? It would really be very preferable if I can have some reasonable assurance that my app will function well without exhaustively testing every device.
I also see an alternative approach to this problem, which is to stick to a known-good VBO size limit (I have read that 4 MB is a good limit) and basically just have the CPU work a little harder when the mesh being rendered is monstrous. With a 100 MB VBO, splitting it into 4 MB chunks (segmenting the mesh into 25 draw calls) does not really sound that bad.
But I'm still curious. How can I check the max size, in order to work around the CPU fallback? Could I be running into an out-of-memory condition, with Apple simply applying a CPU-based workaround (oh LORD have mercy, 2 million vertices in immediate mode...)?
In desktop OpenGL, there are two implementation-defined limits: GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES. When they are exceeded, performance can drop off a cliff on some implementations.
I spent a while looking through the OpenGL ES specification for an equivalent and could not find one. Chances are it's buried in one of the OES or vendor-specific extensions to OpenGL ES. Nevertheless, there is a very real hardware limit to the number of elements and vertices you can draw in one call. Past a point, with too many indices, you can exceed the capacity of the post-T&L cache. Two million is a lot for a single draw call; in lieu of being able to query the OpenGL ES implementation for this information, I'd try successively lower powers of two until you dial it back to the sweet spot.
65,536 used to be the sweet spot on DX9 hardware. That was the limit for 16-bit indices and was always guaranteed to be below the maximum hardware vertex count. Chances are it'll work for OpenGL ES class hardware too...
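To make that concrete, here is how those two limits are read in desktop GL, along with a sketch of the chunked-draw fallback discussed above; this assumes a current context, headers that expose GL 1.2+ constants, and a chunk size that is a multiple of 3 so triangle lists aren't split (65,535 satisfies both constraints):

    #include <GL/gl.h>
    #include <stdio.h>
    #include <stddef.h>

    /* Desktop GL only; OpenGL ES 2.0 exposes no equivalent query. */
    void print_draw_range_limits(void)
    {
        GLint max_verts = 0, max_inds = 0;
        glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &max_verts);
        glGetIntegerv(GL_MAX_ELEMENTS_INDICES, &max_inds);
        printf("recommended vertices per draw: %d\n", max_verts);
        printf("recommended indices per draw:  %d\n", max_inds);
    }

    /* Fallback: split one huge triangle-list draw into fixed-size chunks.
       Assumes the VBO/IBO are already bound and attributes are set up;
       chunk must be a multiple of 3 so triangles aren't split. */
    void draw_in_chunks(GLsizei index_count, GLsizei chunk)
    {
        for (GLsizei first = 0; first < index_count; first += chunk) {
            GLsizei n = (index_count - first < chunk) ? index_count - first : chunk;
            glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_SHORT,
                           (const void *)(size_t)(first * sizeof(GLushort)));
        }
    }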

cocos2d flicker

I have a pool of CCSprites, 1200 in each of two arrays, displayGrid1 and displayGrid2. I turn them visible or invisible when showing walls or floors. Floors have a number of different textures and are not z-order dependent. Walls also have several textures and are z-order dependent.
I am getting about 6-7 frames per second when moving, which is okay because it's a turn-based isometric rogue-like. However, I am also getting a small amount of flicker, which I think is performance-related, because there is no flicker in the simulator.
I would like to improve performance. I am considering using an array of CCSpriteBatchNodes for the floor, which is not z-order dependent, but I am concerned about the cost of frequently adding and removing sprites between the elements of this array, which I think would be necessary.
Can anyone please advise as to how I can improve performance?
As mentioned in the comments, you're loading many small sprite files individually, which can cause performance problems: on hardware that requires power-of-two texture dimensions, each small texture gets padded up to the next power of two, wasting memory on excess pixels around each sprite. Although I believe OpenGL ES under iOS handles this padding automatically, it can come with a big performance hit. Grouping sprites together into a single, correctly sized texture can be a tremendous boon to rendering performance.
You can use an app like Zwoptex to group all these smaller sprite files into larger, more manageable sprite sheets/texture atlases, and use one CCSpriteBatchNode per sprite sheet/texture atlas.
Cocos2D has pretty good support for sprite sheets with texture atlases, and converting your code to use them instead of individual files takes little effort. Creating individual sprites from a texture atlas is easy: you reference the sprite by frame name instead of by file.
CCSpriteBatchNodes group the OpenGL calls for their sprites together, a process known as batching, which means the operating system and OpenGL make fewer round trips to the GPU and rendering performance improves greatly. The catch is that a CCSpriteBatchNode can only draw sprites from the single texture that backs it (enter sprite sheets/texture atlases).
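At the GL level, batching boils down to something like the following C sketch; atlas_tex, batch_vbo, batch_ibo, and quad_count are hypothetical names, glBindBuffer assumes GL 1.5+ entry points, and vertex attribute setup is assumed to happen elsewhere:

    #include <GL/gl.h>

    /* One bind + one draw for every sprite sharing the same atlas texture,
       instead of a bind + draw per sprite. */
    void draw_sprite_batch(GLuint atlas_tex, GLuint batch_vbo, GLuint batch_ibo,
                           GLsizei quad_count)
    {
        glBindTexture(GL_TEXTURE_2D, atlas_tex);          /* the single backing texture */
        glBindBuffer(GL_ARRAY_BUFFER, batch_vbo);         /* all quads' vertices */
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, batch_ibo); /* all quads' indices */
        /* two triangles (6 indices) per sprite quad, drawn in one call */
        glDrawElements(GL_TRIANGLES, quad_count * 6, GL_UNSIGNED_SHORT, 0);
    }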

When to use power of 2 textures in XNA?

I hear a lot that power-of-two textures are better for performance reasons, but I couldn't find much solid information about whether it's a problem when using XNA. Most of my textures have arbitrary dimensions and I don't see much of a problem, but maybe the VS profiler doesn't show it.
In general, power-of-two textures are better. Most graphics cards do allow non-power-of-two textures with a minimal loss of performance, but if you target the XNA Reach profile, non-power-of-two textures come with restrictions (no wrapping, no mipmaps, no DXT compression), and some low-end graphics cards only support the Reach profile.
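If you end up padding assets to power-of-two sizes for the Reach profile, the test and round-up are simple bit tricks; a generic C sketch, nothing XNA-specific:

    #include <stdbool.h>
    #include <stdint.h>

    /* True if n is a power of two (and nonzero). */
    static bool is_pow2(uint32_t n)
    {
        return n != 0 && (n & (n - 1)) == 0;
    }

    /* Smallest power of two >= n, e.g. 640 -> 1024. */
    static uint32_t next_pow2(uint32_t n)
    {
        uint32_t p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }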
XNA is really a layer built on top of DirectX, so any performance guidelines that apply there also apply to anything using XNA.
The VS profiler also won't really capture the graphics-specific work you are doing. That needs to be profiled separately, with a tool that can check how the graphics card itself is doing. If the graphics card is struggling, it won't show up as high resource usage on your CPU, but rather as a slow rendering speed.

Performance loss for using non-power-of-two textures

Is there any performance loss for using non-power-of-two textures under iOS? I have not noticed any in my quick benchmarks. I can save quite a bit of active memory by dumping them altogether, since there is a lot of wasted padding (despite texture packing). I don't care about the older hardware that can't use them.
This can vary widely depending on the circumstances and your particular device. On iOS, the loss is smaller if you use NEAREST filtering rather than LINEAR, but it isn't huge to begin with (think 5-10%).
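For what it's worth, full NPOT support can be checked at runtime; a C sketch assuming an OpenGL ES 2.0 context, where core ES 2.0 allows NPOT textures only without mipmaps and with CLAMP_TO_EDGE wrapping, and full support (REPEAT wrapping, mipmaps) is advertised via the GL_OES_texture_npot extension:

    #include <string.h>
    #include <OpenGLES/ES2/gl.h>   /* iOS; use <GLES2/gl2.h> elsewhere */

    /* Returns nonzero if NPOT textures may use mipmaps and REPEAT wrapping. */
    int has_full_npot(void)
    {
        const char *exts = (const char *)glGetString(GL_EXTENSIONS);
        return exts && strstr(exts, "GL_OES_texture_npot") != NULL;
    }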
