My team is running into an issue where the amount of texture memory allocated via glTexImage2D is high enough that it crashes the app (at about 400 MB on an iPhone 5). We're taking steps to minimize the texture allocation (via compression, using fewer bits per channel, and doing procedural shaders for VFX, etc.).
Since the app crashed on glTexImage2D, my feeling is that it's running out of texture memory (as opposed to virtual memory). Is there any documentation or guideline on recommended texture memory usage for an app (beyond the generic "optimize your texture memory")?
AFAIK, on iOS devices (and many Android devices) there's no dedicated VRAM, and our app process is still well within the virtual memory limit. Is this somehow related to the size of physical RAM? My searches so far have turned up only info on maximum texture size and tricks for optimizing texture usage. Any information is appreciated.
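For context on the "fewer bits per channel" point, here is a minimal sketch (names like pixels4444, width and height are placeholders) of uploading a texture as RGBA4444 instead of RGBA8888, which roughly halves the GPU-side allocation for that texture:

    #import <OpenGLES/ES2/gl.h>

    // Sketch only: the same image uploaded at 16 bits per pixel instead of 32
    // costs width * height * 2 bytes of texture memory instead of * 4.
    // pixels4444 is assumed to be pre-converted pixel data.
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // 32 bpp version, for comparison:
    // glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
    //              GL_RGBA, GL_UNSIGNED_BYTE, pixels8888);

    // 16 bpp version:
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_SHORT_4_4_4_4, pixels4444);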
Related
OK, so I'd like to place a large number of SKSpriteNodes on screen. In the game I'm working on, and even in the sample twirling-spaceship game, the CPU usage runs high in the simulator. I'm not sure how to test my game on an actual device (unless I submit to Apple), but I'd like to know whether having something like 50-100 nodes on screen would use too much CPU time.
I've tested putting out large numbers of SKSpriteNodes, and the CPU usage reads 90% or more. Is this normal? Will I get laughed at if I hand Apple this game, given the extremely high (and growing) CPU usage?
Lastly, is there a way to avoid lagging at different points in the game? Arching? Preloading textures? I don't know, something like that.
Performance results seen in the simulator are not relevant at all. If you are interested in real results, you should test on different devices.
From the docs:
Rendering performance of OpenGL ES in Simulator has no relation to the performance of OpenGL ES on an actual device. Simulator provides an optimized software rasterizer that takes advantage of the vector-processing capabilities of your Macintosh computer. As a result, your OpenGL ES code may run faster or slower in iOS simulator (depending on your computer and what you are drawing) than on an actual device. Always profile and optimize your drawing code on a real device, and never assume that Simulator reflects real-world performance.
On the other hand, SpriteKit is capable of rendering hundreds of sprites at 60 fps if you are using texture atlases to draw many nodes in a single draw pass. Read more here.
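As a rough illustration (the atlas name "sprites" and texture name "enemy" are made up, and scene is assumed to exist), creating all the nodes from one SKTextureAtlas lets SpriteKit batch them into a single draw pass:

    #import <SpriteKit/SpriteKit.h>

    // Sketch: every sprite pulls its texture from the same atlas, so SpriteKit
    // can batch all of them into one draw call.
    SKTextureAtlas *atlas = [SKTextureAtlas atlasNamed:@"sprites"];
    for (int i = 0; i < 100; i++) {
        SKSpriteNode *node =
            [SKSpriteNode spriteNodeWithTexture:[atlas textureNamed:@"enemy"]];
        node.position = CGPointMake(arc4random_uniform(320), arc4random_uniform(568));
        [scene addChild:node];
    }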
For preloading textures into memory, check the Preload Texture Atlas Data section and the + preloadTextureAtlases:withCompletionHandler: method.
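A minimal sketch of preloading (again assuming a hypothetical "sprites" atlas and an existing skView and scene); the completion handler fires once the texture data is resident:

    // Sketch: preload the atlas before presenting the scene so the first
    // frame doesn't stall on texture uploads.
    SKTextureAtlas *atlas = [SKTextureAtlas atlasNamed:@"sprites"];
    [SKTextureAtlas preloadTextureAtlases:@[atlas] withCompletionHandler:^{
        dispatch_async(dispatch_get_main_queue(), ^{
            [skView presentScene:scene];
        });
    }];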
Hope this helps.
In Chrome's Task Manager, there is a column called GPU Memory. In GPU-Z, I can see the memory size of the video card, which I assume is the video memory. Is that the same as GPU memory?
Yes, that is the same as GPU memory.
The only exception is that some lower-end computers use a technique called shared graphics memory, in which an integrated graphics card uses part of the system RAM as video memory. With a dedicated (non-integrated) graphics card, this would not be the case.
I am testing the rendering of extremely large 3d meshes, and I am currently testing on an iPhone 5 (I also have an iPad 3).
I have here two screenshots of Instruments with a profiling run. The first one is rendering a 1.3M vertex mesh, and the second is rendering a 2.1M vertex mesh.
The blue histogram bar at the top shows CPU load, and it can be seen that for the first mesh it hovers at around 10%, so the GPU is doing most of the heavy lifting. The mesh is very detailed, and my point-light-with-specular shader makes it look quite impressive if I say so myself, as it renders consistently above 20 frames per second. Oh, and 4x MSAA is enabled as well!
However, once I step up to the 2-million-plus-vertex mesh, everything goes to crap: we see a massive CPU-bound situation, and all instruments report 1 frame per second.
So it's pretty clear that somewhere between these two assets (and I will admit that they are both tremendously large meshes to be loading into a single VBO), some limit is being surpassed by the 2-megavertex (462K tris) mesh, whether it is the vertex buffer size or the index buffer size that is over it.
So, the question is, what is this limit, and how can I query it? It would really be very preferable if I can have some reasonable assurance that my app will function well without exhaustively testing every device.
I also see an alternative approach to this problem, which is to stick to a known good VBO size limit (I have read about 4MB being a good limit), and basically just have the CPU work a little bit harder if the mesh being rendered is monstrous. With a 100MB VBO, having it in 4MB chunks (segmenting the mesh into 25 draw calls) does not really sound that bad.
But, I'm still curious. How can I check the max size, in order to work around the CPU fallback? Could I be running into an out of memory condition, and Apple is simply applying a CPU based workaround (oh LORD have mercy, 2 million vertices in immediate mode...)?
In pure OpenGL, there are two implementation-defined attributes: GL_MAX_ELEMENTS_VERTICES and GL_MAX_ELEMENTS_INDICES. When these are exceeded, performance can drop off a cliff in some implementations.
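On desktop GL you can query them like this (a sketch; note these are advisory values associated with glDrawRangeElements, not hard caps, and the header shown is the macOS one):

    #include <OpenGL/gl.h>
    #include <stdio.h>

    // Sketch: query the recommended per-draw-call limits on desktop GL.
    GLint maxVertices = 0, maxIndices = 0;
    glGetIntegerv(GL_MAX_ELEMENTS_VERTICES, &maxVertices);
    glGetIntegerv(GL_MAX_ELEMENTS_INDICES, &maxIndices);
    printf("Recommended max vertices per draw: %d, max indices: %d\n",
           maxVertices, maxIndices);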
I spent a while looking through the OpenGL ES specification for the equivalent and could not find it. Chances are it's buried in one of the OES or vendor-specific extensions to OpenGL ES. Nevertheless, there is a very real hardware limit to the number of elements and vertices you can draw. Past a point, with too many indices, you can exceed the capacity of the post-T&L cache. 2 million is a lot for a single draw call; in the absence of a way to query the OpenGL ES implementation for this information, I'd try successively lower powers of two until you dial it back to the sweet spot.
65,536 used to be a sweet spot on DX9 hardware. That was the limit for 16-bit indices and was always guaranteed to be below the maximum hardware vertex count. Chances are it'll work for OpenGL ES-class hardware too...
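If you go the chunking route, here is a minimal sketch of the draw loop (the chunk splitting itself, i.e. assigning triangles to chunks and remapping indices at load time, is assumed to happen elsewhere): keep each chunk under 65,536 vertices so 16-bit indices suffice, and issue one glDrawElements per chunk.

    #import <OpenGLES/ES2/gl.h>

    typedef struct {
        GLuint vertexBuffer;  // VBO holding <= 65,536 vertices
        GLuint indexBuffer;   // 16-bit index buffer for this chunk
        GLsizei indexCount;
    } MeshChunk;

    // Sketch: draw a huge mesh as a series of small chunks. Only the position
    // attribute (location 0) is set up here; other attributes are omitted.
    static void DrawChunkedMesh(const MeshChunk *chunks, int chunkCount)
    {
        for (int i = 0; i < chunkCount; i++) {
            glBindBuffer(GL_ARRAY_BUFFER, chunks[i].vertexBuffer);
            glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, chunks[i].indexBuffer);
            glEnableVertexAttribArray(0);
            glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE,
                                  3 * sizeof(GLfloat), (const void *)0);
            glDrawElements(GL_TRIANGLES, chunks[i].indexCount,
                           GL_UNSIGNED_SHORT, (const void *)0);
        }
    }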
I'm doing some tests regarding loading of POT vs NPOT textures on OpenGL ES 2.0 iOS devices.
Surprisingly, NPOT textures (smaller in size) seem to take more memory than the next biggest POT texture. Can anybody explain why?
My test consists of a bare-bones app in which I load a really big texture (I'm using cocos2d, so this could be a bug in that engine). Then I output memory usage using this method. (I'm looking for a better way of reporting texture memory; see here.)
The NPOT texture is 1010x1708 (3399 kB at RGBA4444). The equivalent POT texture is 1024 x 2048 (4096 kB at RGBA4444).
With the POT texture, app memory usage stabilizes at a little over 16,000,000 bytes (I did three runs, with these values: 16261120, 16232448 and 16240640). With the NPOT texture, memory usage stabilizes at around 19,000,000 bytes (19173376, 19038208 and 19140608). Nothing else changes between runs, only the texture.
Why, oh, why? :-)
Note: I did these tests on iOS 6.1 (iOS 5 was known to have a bug which caused POT textures to take 33% more memory than NPOT ones).
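For reference, a quick sketch of the expected arithmetic for the two textures at 2 bytes per pixel (a pure calculation of texel storage, not of whatever padding or alignment the driver actually adds):

    #import <Foundation/Foundation.h>

    // Sketch: nominal texel storage for the two textures at RGBA4444.
    size_t npotBytes = 1010 * 1708 * 2;  // 3,450,160 bytes (~3.3 MB)
    size_t potBytes  = 1024 * 2048 * 2;  // 4,194,304 bytes (4.0 MB)
    NSLog(@"NPOT: %zu bytes, POT: %zu bytes", npotBytes, potBytes);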
Our product contains a software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and accessing the memory buffer directly; for each frame we call CGBitmapContextCreateImage and then draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent frame rate (but it was okay for non-Retina devices).
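For reference, a stripped-down sketch of that CoreGraphics path (one possible version of it; layer, width and height are placeholders). The per-frame CGBitmapContextCreateImage copy is what eats the CPU at Retina resolutions:

    #import <UIKit/UIKit.h>
    #import <QuartzCore/QuartzCore.h>

    // One-time setup: a 32 bpp RGBA bitmap context whose backing buffer the
    // decoder writes into directly.
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, width * 4,
                                             colorSpace, kCGImageAlphaPremultipliedLast);
    void *pixels = CGBitmapContextGetData(ctx);  // decoder writes frames here

    // Per frame: wrap the buffer in a CGImage (this copies it) and display it.
    CGImageRef frame = CGBitmapContextCreateImage(ctx);
    layer.contents = (__bridge id)frame;
    CGImageRelease(frame);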
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom left for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But, the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to be able to swap out enough pages to make it possible to stream 2x full screen size images from mapped memory. This used to work great with 1x full screen sizes, but now the 2x size screens are actually 4x the amount of memory and the hardware just cannot keep up.
You could also try to store frames on disk in a more compressed format, like PNG. But, then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in this example xcode project opengl_color_cycle.
But, my advice would be to look at how you could make use of the h.264 decoder since it is actually able to decode that much data in hardware and no other approaches are likely to give you the kind of results you are looking for.
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL) and achieves excellent performance even on older iOS devices such as the iPhone 5 or the original iPad Air. On those devices it achieves 60 fps for all pixel formats except 24-bpp formats, where it achieves around 30-50 fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
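A minimal sketch of one common CoreVideo + OpenGL ES path (eaglContext, width and height are assumed to exist, and error handling is omitted): create an IOSurface-backed CVPixelBuffer, let the decoder write into it, then map it to a GL texture through a CVOpenGLESTextureCache so each frame avoids a separate glTexImage2D upload copy.

    #import <Foundation/Foundation.h>
    #import <CoreVideo/CoreVideo.h>
    #import <OpenGLES/ES2/gl.h>
    #import <OpenGLES/ES2/glext.h>

    // One-time setup: texture cache tied to the EAGL context, plus an
    // IOSurface-backed pixel buffer the decoder can write into directly.
    CVOpenGLESTextureCacheRef cache = NULL;
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &cache);

    NSDictionary *attrs = @{ (__bridge id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
    CVPixelBufferRef pixelBuffer = NULL;
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        kCVPixelFormatType_32BGRA,
                        (__bridge CFDictionaryRef)attrs, &pixelBuffer);

    // Per frame: the decoder fills pixelBuffer (lock, write, unlock), then the
    // buffer is mapped to a GL texture without an extra upload.
    CVOpenGLESTextureRef texture = NULL;
    CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache,
        pixelBuffer, NULL, GL_TEXTURE_2D, GL_RGBA, (GLsizei)width, (GLsizei)height,
        GL_BGRA_EXT, GL_UNSIGNED_BYTE, 0, &texture);

    glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
    // ... draw a full-screen quad with this texture, then release per-frame objects:
    CFRelease(texture);
    CVOpenGLESTextureCacheFlush(cache, 0);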
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render to texture, or RTT, and it's much quicker. Search for pbuffer or FBO in the EGL documentation.