(2d Texture Array) vs (Texture Atlas) vs (Multiple Binds / Draws) - directx

My question pertains to the best way to handle multiple textures. First some context:
I'm using DirectX-11 in a non-gaming application; the gui uses DirectX exclusively. I'm in the process of making the gui skinnable, so the user can customize the gui to their liking.
I've written the code in such a way that the gui layout and the size of each gui element can change based on a configuration file. The gui currently uses only DirectX primatives via DrawIndexedInstanced, but I'd like to support user supplied textures. The size of these textures can vary. There can be as many as two dozen of these different textures.
I can solve this problem by either:
Dynamically putting together a texture atlas, or...
Forcing all of the textures into a 2d texture array (by making all of the textures the same size via padding as needed), or ...
Splitting up the DrawIndexedInstanced calls so that there's one draw call for each of the different textures (i.e. multiple binds / draws).
I spent the afternoon looking for consensus. I didn't find it. Penny for your thoughts?

The approach that runs fastest is the texture atlas. This is why 2d games use sprite maps. Multiple binds / draws is the slowest approach.

Related

OpenGL-ES iOS DrawText on screen drops FPS dramastically

Our team has developed an OpenGL application which draws different polygons on the screen. Additionally we want to create about 1000 different strings to print on the screen. If we do this with the Texture2D class the FPS drops down under 3.
I've already tested Bitmap fonts, which doesn't improve the performance.
Which is the best way in OpenGL iOS to draw a lot of text without loss of performance and without losing quality (text should be scalable)?
Allocating 1000 textures takes up a huge amount of memory and will slow down your app, especially if they are at a high enough resolution for readable text. You should generate these textures as they are needed and free them once they are no longer being displayed. Make sure that you aren't generating and freeing textures each frame, but only as needed.
If you are drawing all 1000 strings in the same scene, you should combine as many as you can into like textures. This will allow you to leverage Cocos2D's TrueType rendering system to keep text high-quality. On the other hand if this is not an option and all 1000 strings need to be distinct from each other, consider building a font rendering system that renders each character as a glyph image. This will reduce the number of textures used from 1000 to about 100 to represent all standard English characters and punctuation. I had to do something similar for a video game with lots of dynamic text in an OpenGL environment, and got good performance out of it. However, I do not recommend it unless it's absolutely necessary since it limits your text to only the glyphs you define and you have to program the formatting yourself.

How does GLKTextureLoader loads cube-map from image-strip?

Is there a way to load cube-map using giant image-strip in OpenGL-ES? (or desktop GL or extension, anything)
For example, GLKTextureLoader class offers loading cube-map at once if they're sequenced vertically. I want to know there's some GL functions for this feature or the class is just splitting textures when loading. Of course, I can use this class, but I want to know which is more efficient between loading long image-stip or separated 6 images for each side.
My guess is that this function is indeed just splitting up the image into 6 sets of faces after loading the image file and then using these to do standard cubemap generation via the standard cubemap calls:
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X...)
glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_Y...)
etc
Note that the GLKTextureLoader class also defines cubeMapWithContentsOfFiles which allows you to specify 6 individual image files to define the face textures.
You could check the time it takes to setup the cubemap using the 6-files input versus the cube strip input (cubeMapWithContentsOfFile). Whhich runs faster will depend on if the loading of 6 files is faster than loading one big file and having the method split it up. Otherwise, I bet all the rest of the code between the two functions is identical and uses the standard cubemap texture calls as above.
Since GLKit is Apple proprietary, we can't just look at the source as we can with most OpenGl functions.

On iOS, how do CALayer bitmaps (CGImage objects) get displayed onto Graphics Card?

On iOS, I was able to create 3 CGImage objects, and use a CADisplayLink at 60fps to do
self.view.layer.contents = (__bridge id) imageArray[counter++ % 3];
inside the ViewController, and each time, an image is set to the view's CALayer contents, which is a bitmap.
And this all by itself, can alter what the screen shows. The screen will just loop through these 3 images, at 60fps. There is no UIView's drawRect, no CALayer's display, drawInContext, or CALayer's delegate's drawLayerInContext. All it does is to change the CALayer's contents.
I also tried adding a smaller size sublayer to self.view.layer, and set that sublayer's contents instead. And that sublayer will cycle through those 3 images.
So this is very similar to back in the old days even on Apple ][ or even in King's Quest III era, which are DOS video games, where there is 1 bitmap, and the screen just constantly shows what the bitmap is.
Except this time, it is not 1 bitmap, but a tree or a linked list of bitmaps, and the graphics card constantly use the Painter's Model to paint those bitmaps (with position and opacity), onto the main screen. So it seems that drawRect, CALayer, everything, were all designed to achieve this final purpose.
Is that how it works? Does the graphics card take an ordered list of bitmaps or a tree of bitmaps? (and then constantly show them. To simplify, we don't consider the Implicit animation in the CA framework) What is actually happening down in the graphics card handling layer? (and actually, is this method almost the same on iOS, Mac OS X, and on the PCs?)
(this question aims to understand how our graphics programming actually get rendered in modern graphics cards, since for example, if we need to understand UIView and how CALayer works, or even use CALayer's bitmap directly, we do need to understand the graphics architecture.)
Modern display libraries (such as Quartz used in iOS and Mac OS) use hardware accelerated compositing. The workings is very similar to how computer graphics libraries such as OpenGL work. In essence, each CALayer is kept in as a separate surface that is buffered and rendered by the video hardware much like a texture in a 3D game. This is exceptionally well implemented in iOS and this is why the iPhone is so well-known for having a smooth UI.
In the "old days" (i.e. Windows 9x, Mac OS Classic, etc), the screen was essentially one big framebuffer, and everything that was exposed by e.g. moving a window had to be redrawn manually by each application. The redrawing was mostly done by the CPU, which put an upper limit on animation performance. Animation were usually very "flickery" due to the redrawing involved. This technique was mostly suited for desktop applications without too much animation. Notably, Android uses (or at least used to use) this technique, which is a big problem when porting iOS applications over to Android.
Games of the old days days (e.g. DOS, arcade machines, etc, also used a lot on Mac OS classic), something called sprite animation was used to improve performance and reduce flickering by keeping the moving images in offscreen buffers that were rendered by the hardware and synchronized with the monitor's vblank, which meant that animations were smooth even on very low-end systems. However, the size of these images were very limited and the screen resolutions were low, only about 10-15% of the pixels of even an iPhone screen of today.
You've got a reasonable intuition here, but there are still several steps between contents and the display. First off, contents doesn't have to be a CGImage. It is often a private class called CABackingStorage which is not quite the same thing. In many cases there are hardware optimizations going on to bypass rendering the image into main memory and then copying it to video memory. And since the contents of various layers are all composited together, you're still a ways from the "real" display memory. Not to mention that modifications to contents just directly impacts the model layer, not the presentation or render layers. Plus there are CGLayer objects that can store their image directly in video memory. There's a lot of different stuff going on.
So the answer is, no, the video "card" (chip; it's the PowerVR BTW) does not take an ordered bunch of layers. It takes lower-level data in ways that are not well documented. Some things (particularly parts of Core Animation, and perhaps CGLayer) appear to be wrappers around OpenGL textures, but others are probably Core Graphics directly accessing the hardware itself. Once you get to this level of the stack, it's all private and can change from version to version and from device to device.
You also may find Brad Larson's response useful here:
iOS: is Core Graphics implemented on top of OpenGL?
You may also be interested in Chapter 6 of iOS:PTL. While it doesn't go into the implementation specifics, it does include a lot of practical discussion of how to improve drawing performance and best utilize the hardware with Core Graphics. Chapter 7 details all the developer-accessible steps involved in CALayer drawing.

On iPad, to keep a bitmap of the current main view, how is Core Graphics compared with cocos2D and OpenGL ES?

I heard that for a single view app, it is best to keep the current main view's content in a bitmap, and we paint extra things on this bitmap, and when it is time to do drawRect (when called by iOS), then we just display the bitmap.
Is it true that Core Graphics is quite capable of handling it? But since both cocos2D and OpenGL ES are said to be super fast when handling graphics, is using one of them good for the situation above, making it more smooth or is it not that different from using Core Graphics?
(Or maybe it matters for type of application, such as a simple drawing program, versus a game that refreshes 30fps or 60fps?)
It depends if you need to draw every frame and have your drawings change.
Check out the AccelerometerGraph sample code provided by Apple. This is one way of using Core Animation layers and Core Graphics (Quartz drawing) together to do continuous drawings across many frames without redrawing what has already been drawn.
Cocos2D is less for drawing and more for sprite-based animations where your sprites are pre-rendered images that you have imported (png, etc). It's very very fast at handling multiple images simultaneously and offers batch nodes for consolidating all images to just one single Open GL call.
Quartz (Core Graphics) by itself is not usually the best option for continuous drawing, but is better for generating static images once that you then move around, if necessary.

iOS: playing a frame-by-frame greyscale animation in a custom colour

I have a 32 frame greyscale animation of a diamond exploding into pieces (ie 32 PNG images # 1024x1024)
my game consists of 12 separate colours, so I need to perform the animation in any desired colour
this I believe rules out any Apple frameworks, also it rules out a lot of public code for animating frame by frame in iOS.
what are my potential solution paths?
these are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
that last one just shows it is may be possible to load an image into a GL texture each frame ( he is doing it from the camera, so if I have everything stored in memory, that should be even faster )
I can see these options ( listed laziest first, most optimised last )
option A
each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture
I'm pretty sure this is stupid, so onto option B
option B
preload all images into memory
then as per above, only we load from memory rather than from file
I think this is going to be the ideal solution, can anyone give it the thumbs up or thumbs down?
option C
preload all of my PNGs into a single GL texture of the maximum size, creating a texture Atlas. each frame, set the texture coordinates to the rectangle in the Atlas for that frame.
while this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution; on older iOS devices maximum texture size is 1024x1024. if we are cramming 32 frames into this ( really this is the same as cramming 64 ) we would be at 128x128 for each frame. if the resulting animation is close to full screen on the iPad this isn't going to hack it
option D
instead of loading into a single GL texture, load into a bunch of textures
moreover, we can squeeze 4 images into a single texture using all four channels
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
If something higher performance than (B) is needed, it looks like the key is glTexSubImage2D http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pull across one frame at a time from memory, we could arrange say 16 512x512x8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024x32bit RGBA texture, and then split it within GL using the above function.
This would mean that we are performing one [RAM->VRAM] transfer per 16 frames rather than per one frame.
Of course, for more modern devices we could get 64 instead of 16, since more recent iOS devices can handle 2048x2048 textures.
I will first try technique (B) and leave it at that if it works ( I don't want to over code ), and look at this if needed.
I still can't find any way to query how many GL textures it is possible to hold on the graphics chip. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. however to implement this properly I would want to make sure that I am not sailing close to the wind re: resources... I don't want my animation to use up so much VRAM that the rest of my rendering fails...
You would be able to get this working just fine with CoreGraphics APIs, there is no reason to deep dive into OpenGL for a simple 2D problem like this. For the general approach you should take to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color so that what is left is the diamond colored in with the specific color you have selected. You could do this at runtime, or you could do it offline and create 1 video for each of the colors you want to support. It is be easier on your CPU if you do the operation N times and save the results into files, but modern iOS hardware is much faster than it used to be. Beware of memory usage issues when writing video processing code, see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex openGL stuff, but an approach that makes use of videos would be a lot easier to deal with and you would not need to worry so much about resource usage, see my library linked in the memory post for more info if you are interested in saving time on the implementation.

Resources