I have an application (hardware) that produces large images (e.g. 2048x5000) very fast (e.g. 30 fps).
So I would like to use the GPU in order to scale and display them together with an overlay (e.g. text annotations).
What is the fastest way to do this?
Copy images into an offscreen surface, stretch it into the backbuffer, redraw all annotations.
Create textures (tiling?) and map them onto a rectangle
DirectShow?
Other options?
Thanks,
Florian
P.S.: Should run with Windows XP, too
If it has to run on Windows XP, your best bet is Direct3D 9, using a video card that supports textures up to 8192 pixels tall. Otherwise, you will have to map 2-4 quads vertically to cover the whole area.
Surfaces and textures of the same format should be about the same speed, but StretchRect() with surfaces is going to be way more convenient.
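As a rough sketch (not from the original answer) of the surface-plus-StretchRect path in Direct3D 9, assuming the device is already created and that 'frame'/'framePitch' describe the incoming 2048x5000 image in X8R8G8B8 layout; error handling is omitted, and in practice the offscreen surface would be created once and reused rather than per frame:

    #include <d3d9.h>
    #include <cstring>

    void PresentFrame(IDirect3DDevice9* device, const BYTE* frame, UINT framePitch)
    {
        IDirect3DSurface9* offscreen = nullptr;
        device->CreateOffscreenPlainSurface(2048, 5000, D3DFMT_X8R8G8B8,
                                            D3DPOOL_DEFAULT, &offscreen, nullptr);

        // Copy the new frame row by row to honour the surface pitch.
        D3DLOCKED_RECT lr;
        offscreen->LockRect(&lr, nullptr, 0);
        for (UINT y = 0; y < 5000; ++y)
            std::memcpy(static_cast<BYTE*>(lr.pBits) + y * lr.Pitch,
                        frame + y * framePitch, 2048 * 4);
        offscreen->UnlockRect();

        // Let the GPU scale the image into the backbuffer.
        IDirect3DSurface9* backbuffer = nullptr;
        device->GetBackBuffer(0, 0, D3DBACKBUFFER_TYPE_MONO, &backbuffer);
        device->StretchRect(offscreen, nullptr, backbuffer, nullptr, D3DTEXF_LINEAR);

        backbuffer->Release();
        offscreen->Release();
        // ... draw the text annotations here (e.g. with ID3DXFont), then Present().
    }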
My question pertains to the best way to handle multiple textures. First some context:
I'm using DirectX 11 in a non-gaming application; the gui uses DirectX exclusively. I'm in the process of making the gui skinnable, so the user can customize the gui to their liking.
I've written the code in such a way that the gui layout and the size of each gui element can change based on a configuration file. The gui currently uses only DirectX primitives via DrawIndexedInstanced, but I'd like to support user-supplied textures. The size of these textures can vary. There can be as many as two dozen of these different textures.
I can solve this problem by either:
Dynamically putting together a texture atlas, or...
Forcing all of the textures into a 2d texture array (by making all of the textures the same size via padding as needed), or ...
Splitting up the DrawIndexedInstanced calls so that there's one draw call for each of the different textures (i.e. multiple binds / draws).
I spent the afternoon looking for consensus. I didn't find it. Penny for your thoughts?
The fastest approach is the texture atlas; this is why 2D games use sprite sheets. Multiple binds/draws is the slowest approach.
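For illustration only (none of these names come from the question), the atlas bookkeeping that keeps DrawIndexedInstanced down to a single bind might look like this: each skin image gets a pixel rectangle inside the atlas, and the per-instance data carries the normalized UV rectangle.

    // Hypothetical structures; the vertex shader computes
    // uv = uvOffset + quadCorner * uvScale, so the whole GUI keeps
    // drawing with one texture bind and one instanced draw call.
    struct AtlasEntry { float x, y, w, h; };   // placement in the atlas, in pixels

    struct InstanceData {
        float uvOffset[2];   // top-left of the sub-image in [0,1] texture space
        float uvScale[2];    // size of the sub-image in [0,1] texture space
        // ... position, size, color, etc.
    };

    InstanceData MakeInstance(const AtlasEntry& e, float atlasW, float atlasH)
    {
        InstanceData d = {};
        d.uvOffset[0] = e.x / atlasW;
        d.uvOffset[1] = e.y / atlasH;
        d.uvScale[0]  = e.w / atlasW;
        d.uvScale[1]  = e.h / atlasH;
        return d;
    }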
Our team has developed an OpenGL application which draws different polygons on the screen. Additionally, we want to draw about 1000 different strings on the screen. If we do this with the Texture2D class, the FPS drops below 3.
I've already tested bitmap fonts, which didn't improve the performance.
What is the best way in OpenGL on iOS to draw a lot of text without losing performance or quality (the text should be scalable)?
Allocating 1000 textures takes up a huge amount of memory and will slow down your app, especially if they are at a high enough resolution for readable text. You should generate these textures as they are needed and free them once they are no longer being displayed. Make sure that you aren't generating and freeing textures each frame, but only as needed.
If you are drawing all 1000 strings in the same scene, you should combine as many of them as you can into shared textures. This will allow you to leverage Cocos2D's TrueType rendering system to keep the text high-quality.
On the other hand, if this is not an option and all 1000 strings need to be distinct from each other, consider building a font rendering system that renders each character as a glyph image. This reduces the number of textures used from 1000 to about 100 to represent all standard English characters and punctuation. I had to do something similar for a video game with lots of dynamic text in an OpenGL environment, and got good performance out of it. However, I don't recommend it unless it's absolutely necessary, since it limits your text to only the glyphs you define and you have to program the formatting yourself.
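As a rough sketch of the glyph-image idea (assuming OpenGL ES 2 and that each character has already been rasterized to an 8-bit alpha bitmap by some other means; all names here are illustrative, not from the answer):

    #include <OpenGLES/ES2/gl.h>
    #include <map>

    struct Glyph { float u0, v0, u1, v1; float advance; };

    // Caches rasterized characters in one shared texture so strings can be
    // rebuilt at draw time from quads instead of one texture per string.
    class GlyphCache {
    public:
        explicit GlyphCache(int size) : size_(size), penX_(0), penY_(0), rowH_(0) {
            glGenTextures(1, &tex_);
            glBindTexture(GL_TEXTURE_2D, tex_);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
            glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
            glTexImage2D(GL_TEXTURE_2D, 0, GL_ALPHA, size, size, 0,
                         GL_ALPHA, GL_UNSIGNED_BYTE, nullptr);   // empty atlas
        }
        // 'pixels' is the 8-bit alpha bitmap of one rasterized character.
        Glyph add(char c, const unsigned char* pixels, int w, int h, float advance) {
            if (penX_ + w > size_) { penX_ = 0; penY_ += rowH_; rowH_ = 0; } // next row
            glBindTexture(GL_TEXTURE_2D, tex_);
            glPixelStorei(GL_UNPACK_ALIGNMENT, 1);
            glTexSubImage2D(GL_TEXTURE_2D, 0, penX_, penY_, w, h,
                            GL_ALPHA, GL_UNSIGNED_BYTE, pixels);
            Glyph g = { penX_ / (float)size_, penY_ / (float)size_,
                        (penX_ + w) / (float)size_, (penY_ + h) / (float)size_, advance };
            glyphs_[c] = g;
            penX_ += w; if (h > rowH_) rowH_ = h;
            return g;
        }
    private:
        GLuint tex_; int size_, penX_, penY_, rowH_;
        std::map<char, Glyph> glyphs_;
    };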
What would be a better approach in Corona re static background for a 2D scrolling based game?
Let's say the game level "size" is the equivalent of two screens wide and two screens deep.
Q1 - Would one large background image be an OK approach? This would probably be easier, as you could prepare the artwork in Photoshop. Or is there a significant performance advantage in Corona to using a small image "pattern" and repeating it in Corona (Lua code) to create the backdrop?
Q2 - If the single large background image approach is OK, am I right to assume that one might have to sacrifice image resolution on higher-resolution devices, given its size (2x screens wide and 2x screens deep)? That is, for the iPad 3, say, where your configuration would normally pick up the 3x image version (for other, smaller images such as play icons), the background might have to remain at the 1x or 2x size. Otherwise it may hit the texture limit (I've read that "most devices have a maximum texture size of 2048x2048"). Is this correct / does this make sense?
I have used both approaches in my games.
Advantages of tiled mode:
You can make huge backgrounds.
Can be made to use less memory (especially with smallish tiles that repeat a lot, like real-world wallpaper)
Allows for some interesting effects (like parallax scrolling).
Problems of tiled mode:
Uses more CPU
Might be buggy and hard to make behave correctly (for example, in one of my games gaps showed between tiles, but only on the iPad Retina... it required some heavy math hackery to make it work)
It is hard to make complex and awesome backgrounds (which is why my point-and-click adventure games don't use tiled backgrounds).
Note that some devices have a limit on texture size in pixels; this may be the ultimate limit on how large a single-texture background can be.
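The limit mentioned above is the maximum texture size of the underlying OpenGL driver. Corona hides GL from you, but for reference this is what the limit looks like at the native level (a minimal sketch, not Corona code; the same query exists on desktop OpenGL):

    #include <OpenGLES/ES2/gl.h>

    GLint MaxTextureSize()
    {
        GLint maxSize = 0;
        glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxSize);
        return maxSize;   // e.g. 2048 on older iOS devices, 4096 or more on newer ones
    }

A single-texture background larger than this in either dimension will fail to load.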
On iOS, I was able to create 3 CGImage objects, and use a CADisplayLink at 60fps to do
self.view.layer.contents = (__bridge id) imageArray[counter++ % 3];
inside the ViewController; each time, one of the images is set as the view's CALayer contents, which is a bitmap.
And this, all by itself, changes what the screen shows. The screen just loops through these 3 images at 60fps. There is no UIView drawRect, no CALayer display or drawInContext, and no CALayer delegate drawLayer:inContext:. All it does is change the CALayer's contents.
I also tried adding a smaller size sublayer to self.view.layer, and set that sublayer's contents instead. And that sublayer will cycle through those 3 images.
So this is very similar to the old days of the Apple ][ or the King's Quest III era of DOS video games, where there is one bitmap and the screen just constantly shows whatever that bitmap contains.
Except this time it is not one bitmap, but a tree (or linked list) of bitmaps, and the graphics card constantly uses the painter's model to paint those bitmaps (with position and opacity) onto the main screen. So it seems that drawRect, CALayer, everything, were all designed to achieve this final purpose.
Is that how it works? Does the graphics card take an ordered list of bitmaps, or a tree of bitmaps, and then constantly display them? (To simplify, let's ignore implicit animation in the Core Animation framework.) What is actually happening down at the graphics-card level? (And is this approach roughly the same on iOS, Mac OS X, and PCs?)
(This question aims to understand how our graphics programming actually gets rendered by modern graphics cards, since, for example, to understand UIView and how CALayer works, or even to use CALayer's bitmap directly, we need to understand the graphics architecture.)
Modern display libraries (such as Quartz, used in iOS and Mac OS) use hardware-accelerated compositing. This works very much like computer graphics libraries such as OpenGL. In essence, each CALayer is kept as a separate surface that is buffered and rendered by the video hardware, much like a texture in a 3D game. This is exceptionally well implemented in iOS, which is why the iPhone is so well known for having a smooth UI.
In the "old days" (e.g. Windows 9x, Mac OS Classic, etc.), the screen was essentially one big framebuffer, and everything that was exposed by, say, moving a window had to be redrawn manually by each application. The redrawing was mostly done by the CPU, which put an upper limit on animation performance. Animation was usually very "flickery" due to the redrawing involved. This technique was mostly suited to desktop applications without too much animation. Notably, Android uses (or at least used to use) this technique, which is a big problem when porting iOS applications to Android.
In games of the old days (e.g. DOS, arcade machines, etc., and also commonly on Mac OS Classic), something called sprite animation was used to improve performance and reduce flickering: the moving images were kept in offscreen buffers that were rendered by the hardware and synchronized with the monitor's vblank, which meant that animations were smooth even on very low-end systems. However, the size of these images was very limited and screen resolutions were low, only about 10-15% of the pixels of even a modern iPhone screen.
You've got a reasonable intuition here, but there are still several steps between contents and the display. First off, contents doesn't have to be a CGImage. It is often a private class called CABackingStorage, which is not quite the same thing. In many cases there are hardware optimizations going on to bypass rendering the image into main memory and then copying it to video memory. And since the contents of various layers are all composited together, you're still a ways from the "real" display memory. Not to mention that modifications to contents directly affect only the model layer, not the presentation or render layers. Plus there are CGLayer objects that can store their image directly in video memory. There's a lot of different stuff going on.
So the answer is, no, the video "card" (chip; it's the PowerVR BTW) does not take an ordered bunch of layers. It takes lower-level data in ways that are not well documented. Some things (particularly parts of Core Animation, and perhaps CGLayer) appear to be wrappers around OpenGL textures, but others are probably Core Graphics directly accessing the hardware itself. Once you get to this level of the stack, it's all private and can change from version to version and from device to device.
You also may find Brad Larson's response useful here:
iOS: is Core Graphics implemented on top of OpenGL?
You may also be interested in Chapter 6 of iOS:PTL. While it doesn't go into the implementation specifics, it does include a lot of practical discussion of how to improve drawing performance and best utilize the hardware with Core Graphics. Chapter 7 details all the developer-accessible steps involved in CALayer drawing.
I have a 32-frame greyscale animation of a diamond exploding into pieces (i.e. 32 PNG images @ 1024x1024)
my game consists of 12 separate colours, so I need to perform the animation in any desired colour
this, I believe, rules out any Apple frameworks; it also rules out a lot of public code for animating frame by frame in iOS.
what are my potential solution paths?
these are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
that last one just shows that it may be possible to load an image into a GL texture each frame (he is doing it from the camera, so if I have everything stored in memory, that should be even faster)
I can see these options ( listed laziest first, most optimised last )
option A
each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture
I'm pretty sure this is stupid, so onto option B
option B
preload all images into memory
then as per above, only we load from memory rather than from file
I think this is going to be the ideal solution, can anyone give it the thumbs up or thumbs down?
option C
preload all of my PNGs into a single GL texture of the maximum size, creating a texture Atlas. each frame, set the texture coordinates to the rectangle in the Atlas for that frame.
while this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution; on older iOS devices maximum texture size is 1024x1024. if we are cramming 32 frames into this ( really this is the same as cramming 64 ) we would be at 128x128 for each frame. if the resulting animation is close to full screen on the iPad this isn't going to hack it
option D
instead of loading into a single GL texture, load into a bunch of textures
moreover, we can squeeze 4 images into a single texture using all four channels
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
If something higher performance than (B) is needed, it looks like the key is glTexSubImage2D http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pull across one frame at a time from memory, we could arrange say 16 512x512x8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024x32bit RGBA texture, and then split it within GL using the above function.
This would mean that we are performing one [RAM->VRAM] transfer per 16 frames rather than per one frame.
Of course, for more modern devices we could get 64 instead of 16, since more recent iOS devices can handle 2048x2048 textures.
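A minimal sketch of that batched upload, assuming the destination texture has already been allocated at 1024x1024 RGBA and that 'packed' holds the 16 greyscale frames laid out so they alias one RGBA image (2x2 quadrants, four frames per quadrant in the R/G/B/A channels); a fragment shader would then pick the right quadrant and channel for the frame being shown:

    #include <OpenGLES/ES2/gl.h>

    void UploadPackedFrames(GLuint atlasTex, const unsigned char* packed)
    {
        glBindTexture(GL_TEXTURE_2D, atlasTex);
        glTexSubImage2D(GL_TEXTURE_2D, 0,
                        0, 0, 1024, 1024,                // replace the whole level
                        GL_RGBA, GL_UNSIGNED_BYTE, packed);
        // One call moves all 16 frames from RAM to VRAM.
    }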
I will first try technique (B) and leave it at that if it works ( I don't want to over code ), and look at this if needed.
I still can't find any way to query how many GL textures it is possible to hold on the graphics chip. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. however to implement this properly I would want to make sure that I am not sailing close to the wind re: resources... I don't want my animation to use up so much VRAM that the rest of my rendering fails...
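On the resource question: GL has no query for free texture memory; the usual approach is to attempt the allocation and then check glGetError() for GL_OUT_OF_MEMORY, with the caveat that some drivers defer the real allocation until the texture is first used. A sketch:

    #include <OpenGLES/ES2/gl.h>

    bool TryAllocateTexture(GLuint tex, GLsizei w, GLsizei h)
    {
        while (glGetError() != GL_NO_ERROR) {}            // clear any stale errors
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, w, h, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, nullptr); // reserve storage only
        return glGetError() == GL_NO_ERROR;               // GL_OUT_OF_MEMORY on failure
    }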
You would be able to get this working just fine with Core Graphics APIs; there is no reason to dive into OpenGL for a simple 2D problem like this. For the general approach you should take to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color so that what is left is the diamond colored in with the specific color you have selected. You could do this at runtime, or you could do it offline and create one video for each of the colors you want to support. It would be easier on your CPU if you do the operation N times and save the results into files, but modern iOS hardware is much faster than it used to be.
Beware of memory usage issues when writing video processing code; see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex OpenGL stuff, but an approach that makes use of videos would be a lot easier to deal with, and you would not need to worry so much about resource usage; see my library linked in the memory post for more info if you are interested in saving time on the implementation.
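A minimal sketch of the clip-to-mask step described above, assuming each greyscale frame is available as a CGImageRef usable as a mask; error handling is omitted and the caller releases the returned image:

    #include <CoreGraphics/CoreGraphics.h>

    CGImageRef ColorizeFrame(CGImageRef mask, CGFloat r, CGFloat g, CGFloat b)
    {
        size_t w = CGImageGetWidth(mask), h = CGImageGetHeight(mask);
        CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
        CGContextRef ctx = CGBitmapContextCreate(NULL, w, h, 8, 0, space,
                                                 kCGImageAlphaPremultipliedLast);
        CGRect rect = CGRectMake(0, 0, w, h);
        CGContextClipToMask(ctx, rect, mask);   // keep only the diamond's pixels
        CGContextSetRGBFillColor(ctx, r, g, b, 1.0);
        CGContextFillRect(ctx, rect);           // the fill shows through the mask
        CGImageRef colored = CGBitmapContextCreateImage(ctx);
        CGContextRelease(ctx);
        CGColorSpaceRelease(space);
        return colored;
    }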