I'm making a Worms-style bitmap destructible terrain game using OpenGL. I'd like to know where the limitations in terms of video memory are for the size of the worlds.
Currently, I use blocks of 512*512 RGBA textures for the terrain.
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Is there any internal, automatic compression going on?
How much video memory can I expect most users' computers to have free?
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Not enough information. You should always use sized OpenGL image formats (GL_RGBA8, GL_RGBA16); the unsized ones leave the actual precision up to the driver.
GL_RGBA8 takes up 32 bits per pixel, which is 4 bytes. Therefore, 512*512*4 = 1 MB.
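For concreteness, a minimal sketch of allocating one such block with a sized internal format (assuming a current GL context; the function name is just illustrative):

    #include <GL/gl.h>

    /* Allocate one 512x512 terrain block with an explicitly sized internal
       format, so the driver cannot silently pick a different precision.
       Assumes a GL context is already current. */
    GLuint create_terrain_block(void)
    {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0,
                     GL_RGBA, GL_UNSIGNED_BYTE, NULL); /* pixels uploaded later */
        /* Footprint: 512 * 512 * 4 bytes = 1 MB (about 1/3 more with mipmaps). */
        return tex;
    }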
Is there any internal, automatic compression going on?
No.
How much video memory can I expect most users' computers to have free?
How much are you using currently?
OpenGL will page image data in and out according to the available space. If you run out of GPU memory, OpenGL will happily allocate system memory and upload the images as needed.
But to be honest, your little Worms game isn't going to actually cost anything in terms of memory size. Maybe 64MB when you're done, tops. It's nothing you need to be concerned about.
I would not worry about that very much. Even with an 8192*2048 world (4 screens wide and 2 screens tall, which is very big for a Worms-style game) you would need only 8192*2048*4 bytes = 64 MB; add mipmaps, other textures, and the framebuffer and you should still fit within 128 MB. As far as I know even older GPUs have that much memory (we're not talking about GeForce4 cards, right?).
Older GPUs may have limitations on how big each texture can be, but since you already split your world into 512x512 chunks that won't be a problem.
If video memory becomes an issue you could allow users to use half-sized textures (i.e. downsample the world to 4096*1024 and 256x256 chunks) and fetch new / discard unused regions on demand.
With 32-bpp (4 bytes) you get 4*512*512 = 1 MB
See this regarding texture compression: http://www.oldunreal.com/editing/s3tc/ARB_texture_compression.pdf
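If you do want compression, you have to request it explicitly with a compressed internal format. A rough sketch (assuming the EXT_texture_compression_s3tc extension is available, a context is current, and the target texture is bound; whether DXT suits frequently rewritten destructible terrain is another question):

    #include <GL/gl.h>
    #include <GL/glext.h>

    /* Upload a 512x512 RGBA image and ask the driver to store it as DXT5.
       Returns the compressed size in bytes as reported by the driver. */
    GLint upload_compressed(const void *rgba_pixels)
    {
        GLint is_compressed = GL_FALSE, compressed_size = 0;

        glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
                     512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba_pixels);

        /* Check whether compression actually happened, and how big it ended up. */
        glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                                 GL_TEXTURE_COMPRESSED, &is_compressed);
        if (is_compressed)
            glGetTexLevelParameteriv(GL_TEXTURE_2D, 0,
                                     GL_TEXTURE_COMPRESSED_IMAGE_SIZE,
                                     &compressed_size);
        return compressed_size;
    }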
Again, this depends on your engine, but if I were you I would do this:
Since your terrain texture will probably reuse some mosaic-like tile textures, and you only need to know which tile texel a pixel uses and whether that pixel has been destroyed, you could get away with a two-channel GL_RG8 internal format, provided your mosaic textures are no larger than 256x256. Each component would be a texture coordinate (which you would map from [0, 255] to [0.0, 1.0]), with one special value reserved to mean the terrain is destroyed. That makes every 512x512 block take up only 0.5 MB.
It's tempting to add an extra byte to indicate terrain presence, but a 3-byte format wouldn't cache well.
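A sketch of that two-channel index idea, just to make the layout concrete (the sentinel value and function names are my own; GL_RG8 needs GL 3.0 / ARB_texture_rg, and the block's texture is assumed to be bound):

    #include <GL/gl.h>
    #include <GL/glext.h>
    #include <stdint.h>

    /* Each texel of a 512x512 block stores an (x, y) lookup into a mosaic
       texture that is at most 256x256, so 8 bits per component is enough.
       The pair (255, 255) is reserved here to mean "destroyed". */
    enum { DESTROYED = 0xFF };

    void upload_index_block(const uint8_t rg[512 * 512 * 2])
    {
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RG8, 512, 512, 0,
                     GL_RG, GL_UNSIGNED_BYTE, rg);
        /* Footprint: 512 * 512 * 2 bytes = 0.5 MB per block. */
    }

    /* Blow up one terrain pixel: update the CPU-side copy, then re-upload
       just that texel instead of the whole block. */
    void destroy_pixel(uint8_t rg[512 * 512 * 2], int x, int y)
    {
        uint8_t *texel = &rg[(y * 512 + x) * 2];
        texel[0] = DESTROYED;
        texel[1] = DESTROYED;
        glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, 1, 1,
                        GL_RG, GL_UNSIGNED_BYTE, texel);
    }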
I am interested in the quadtree data structure and I want to use it for a project of mine. OK, let's make an example:
We have a 3D space where the camera position is locked but I can rotate the camera. Whenever I rotate the camera to a certain point, a large 2D image (bigger than the frustum) is shown.
1. Loading the whole image isn't necessary when I can only see 1/4 of it. Does it make sense to use quadtrees here, to load only the parts of the image that are visible to me (when using OpenGL/WebGL)? If so, does each quadtree node have to contain its own vertex buffer and texture, or not?
A quadtree is a good fit when you need to switch between multiple precision levels on demand; geographical maps with zooming are a good example. If your tiles have only one level of precision it is easier to control their loading / visibility without such a complicated structure. You could just load a low-precision image fast and then load high-precision tiles on demand.
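Purely for illustration, a minimal quadtree node for on-demand tile loading might look like the sketch below; the visible() and load_tile() callbacks stand in for your own frustum test and texture loader.

    #include <GL/gl.h>
    #include <stdbool.h>

    /* Hypothetical node: each one covers a square region of the image and
       owns its own texture (and, if you draw one quad per node, its own
       vertex data).  Children exist only where more detail was requested. */
    typedef struct QuadNode {
        float x, y, size;          /* region covered, in image space   */
        GLuint texture;            /* 0 until the tile has been loaded */
        struct QuadNode *child[4]; /* NULL until subdivided            */
    } QuadNode;

    /* Walk the tree, loading only tiles that intersect the view and
       skipping entire subtrees that are off-screen. */
    void update_node(QuadNode *n,
                     bool (*visible)(const QuadNode *),
                     GLuint (*load_tile)(const QuadNode *))
    {
        if (!visible(n))
            return;                    /* nothing under here is on screen */
        if (n->texture == 0)
            n->texture = load_tile(n); /* fetch this tile on demand       */
        for (int i = 0; i < 4; ++i)
            if (n->child[i])
                update_node(n->child[i], visible, load_tile);
    }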
Also, speaking of your case: 50 MB for a 4k image sounds strange. Compressed DDS/DXT1 or PVRTC textures should take less space (and ordinary JPG/PNG files much less). It also helps to determine the lowest image precision that is acceptable in your case, so you don't waste space/traffic without reason.
It is my understanding that PVR textures, made with texturetool, are simply compressed images. Therefore the difference lies in the file size.
Frankly, the file size doesn't interest me. What I want to know is, can a PVR texture consume less RAM than a normal .PNG texture? Or does this depend entirely on the texture format (like RGBA8888 etc)?
The essential question would be:
Given X.png and X.pvr, if I display both with texture format RGBA8888, will one consume less RAM than the other?
Yes, the PVR will consume less RAM at all stages — it's unpacked live by the GPU as it's accessed. There's no intermediate decompression.
A PVR-like approach used in digital video: instead of storing RGB at every pixel, convert to YUV, then store Y at every pixel and U and V only twice per four-pixel block. So you go from 128 bits for the block to 64 bits. To get back to RGB, the output stage reads the exact Y for that pixel and interpolates or picks the nearest U and V as necessary.
Schemes like PVR do a similar thing: they don't store the full value at every pixel but infer parts of it from nearby context. What counts as nearby is chosen to correspond directly to how caching is arranged on that GPU. It's usually more involved than just scaling down the sampling resolution of some of the channels; e.g. specifying a base offset for samples and then using a tiny precision for each is also common.
So the GPU can always get a value for pixel X by reading only values in a very small, local region of the data.
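As a purely illustrative toy version of that idea (this is not PVR's actual bit layout), a shared-chroma block and its strictly local decode could look like this:

    #include <stdint.h>

    /* Four pixels stored as 4 luma samples plus 2 shared chroma pairs:
       8 bytes (64 bits) instead of 4 x 32 = 128 bits.  Decoding any one
       pixel touches only this block -- no earlier data is needed, which is
       the property PVR-style formats are built around. */
    typedef struct {
        uint8_t y[4];   /* one luma sample per pixel        */
        uint8_t u[2];   /* chroma shared by two pixels each */
        uint8_t v[2];
    } YuvBlock;

    /* Reconstruct pixel i (0..3) of the block by reading only local data. */
    void decode_pixel(const YuvBlock *b, int i,
                      uint8_t *y, uint8_t *u, uint8_t *v)
    {
        *y = b->y[i];      /* the exact luma value          */
        *u = b->u[i / 2];  /* nearest shared chroma samples */
        *v = b->v[i / 2];
    }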
This contrasts with traditional schemes like PNG where having to know every pixel in the stream prior to X is acceptable if it improves the compression. Processing such things live would flood the GPU's memory bandwidth and hence be completely impractical, so such textures are decompressed from disk and then uploaded.
So schemes like PVR tend to lead to poorer compression and lower per-pixel quality but the win is that they can sit in VRAM compressed. A game will often increase the resolution of its textures if using PVR to try to find a comfortable balance.
Uncompressed textures are:
16 bits per pixel (RGB565, RGBA4444), or
24 bits per pixel (RGB888).
PVRTC textures are either 4 bpp or even 2 bpp, so yes, they do use less memory.
They also perform better, because less memory bandwidth is needed to fetch texels.
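A quick back-of-the-envelope comparison for a hypothetical 1024x1024 texture (ignoring mipmaps and PVRTC's square/minimum-size restrictions):

    #include <stdio.h>

    int main(void)
    {
        const int w = 1024, h = 1024;
        printf("RGBA8888:   %d bytes\n", w * h * 4);  /* 4 MB   */
        printf("RGB565:     %d bytes\n", w * h * 2);  /* 2 MB   */
        printf("PVRTC 4bpp: %d bytes\n", w * h / 2);  /* 512 KB */
        printf("PVRTC 2bpp: %d bytes\n", w * h / 4);  /* 256 KB */
        return 0;
    }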
I'm currently trying to reduce the memory size of my textures. I use texture packer already, as well as .pvr.cczs with either RGB565 or RGB5551. This, however, often leads to a huge, unacceptable reduction in texture quality.
Specifically, I have a spritesheet for the main character. It's roughly 4k*2.5k pixels. This is not really negotiable, as we have lots of different animations and we need the character at a size acceptable for the retina displays of iPads. So reducing the size of the character sprite would again result in huge reductions in quality when we use him in a scene.
So of course I'm trying to use 16-bit textures as often as possible. Using the above-mentioned spritesheet as a 16-bit texture takes about 17 MB of memory. This is already a lot. As it's a spritesheet for a character, the texture needs transparency and therefore I need to use RGB5551 as the colour depth. With only 1 bit for the alpha channel, the character just looks plain ugly. In fact, everything that needs alpha looks rather ugly with only 1 bit for the alpha channel.
However, if I used RGB8888 instead, the spritesheet would use double the memory, around 34 MB. Now imagine several characters in a scene and you'll end up with 100 MB of memory for characters alone. Add general overhead, sound, background, foreground, objects and UI to it and you'll end up with far too much memory. In fact, 100 MB is "far too much memory" as far as I'm concerned.
I feel like I'm overlooking something in the whole process, like something obvious I didn't do. RGB4444 is no solution either; it really looks unacceptably bad.
In short: how do I get acceptable texture quality, including an alpha channel, for less than 100 MB of memory? "Not at all"? Because that's kinda where it leads as far as I can see.
Split your main texture into 'per character / per animation / per resolution' files. Use .pvr.ccz because they load faster (much faster; I've measured 8x faster on some devices). If you are using TexturePacker, you should be able to eliminate most if not all artefacts from the 'pvr' conversion.
When running your scenes, preload only the 'next' posture/stance/combat animation that you know you will need. Experiment with asynchronous loading, with a completion block to signal when the texture is available for use. Dump your unused textures as fast as you can. This will tend to keep the memory requirement flattish, at a much lower level than if you load all animations at once.
Finally, do you really need 15 frames for all these animations? I get away with as few as 5 frames for some of them (idle, asleep, others too). TexturePacker takes care of symmetrical animations around a certain frame: just point frames midPoint+1 ... midPoint+N at midPoint-N ... midPoint-1.
I'm working with DirectX9 and now I'm having problems with the texture creation.
I'm using the functions CreateTexture and LoadSurfaceFromMemory with D3DFMT_DXT1 compression. I checked the device caps of my graphics card and D3DPTEXTURECAPS_POW2 and D3DPTEXTURECAPS_NONPOW2CONDITIONAL are off; I think this means that my graphics card supports non-power-of-two textures and I can use textures of any size.
My problem is that most of the textures work well (and their sizes aren't powers of two), but some cases don't, like "1228 x 453"; if I resize to "1228 x 452" the texture works well.
What's going on?
Sorry for my English!
Thanks.
The BCn texture formats are block-based. The blocks pack pixels into groups of 4x4 texels, so the texture dimensions must be aligned to 4 for these formats.
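For example, a rough DXT1 size/alignment calculation (8 bytes per 4x4 block) shows why 1228 x 452 is fine while 1228 x 453 is not:

    #include <stddef.h>

    /* DXT1 stores 4x4 blocks at 8 bytes each, so dimensions are rounded up
       to multiples of 4.  1228 and 452 are already multiples of 4; 453 is
       not, which is why that surface trips up some drivers/loaders. */
    size_t dxt1_size(int width, int height)
    {
        int blocks_x = (width  + 3) / 4;   /* 1228 -> 307 blocks            */
        int blocks_y = (height + 3) / 4;   /* 453  -> 114 blocks (456 rows) */
        return (size_t)blocks_x * blocks_y * 8;
    }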
Unfortunately, this is a graphics card issue. Even if the card claims support for non power of two textures, support is often buggy / limited.
You could pad the texture and use a subtexture, but the best approach is to build a texture atlas (in general you should be doing this anyway to conserve memory bandwidth).
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage, and then draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's retina display at a decent framerate (but it was okay for non-Retina-devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA data, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way to blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).
But the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to swap in enough pages to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens hold 4x the amount of data and the hardware just cannot keep up.
You could also try to store frames on disk in a more compressed format, like PNG. But then decoding the compressed format changes the problem from IO-bound to CPU-bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.
If you have a very specific format that you can limit the input image data to (like an 8-bit table), then you could use the GPU to blit 8-bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle. But my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
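As I understand it, the concrete CoreVideo mechanism for this on iOS 5+ is the CVOpenGLESTextureCache API. The sketch below is only an outline of that route (error handling, cleanup, row-padding handling via CVPixelBufferGetBytesPerRow, and the actual draw call are all omitted), not a drop-in implementation:

    #include <string.h>
    #include <CoreVideo/CoreVideo.h>
    #include <OpenGLES/ES2/gl.h>
    #include <OpenGLES/ES2/glext.h>

    /* Write decoded pixels into an IOSurface-backed CVPixelBuffer and let
       CVOpenGLESTextureCache expose it to GL, avoiding a glTexImage2D copy.
       'eaglContext' is your existing EAGLContext (passed as CVEAGLContext). */
    void present_frame(CVEAGLContext eaglContext, int width, int height,
                       const void *decodedPixels, size_t bytesPerRow)
    {
        CVOpenGLESTextureCacheRef cache = NULL;
        CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext,
                                     NULL, &cache);

        /* The IOSurface attribute is what makes the zero-copy path possible. */
        CFDictionaryRef empty = CFDictionaryCreate(kCFAllocatorDefault,
            NULL, NULL, 0,
            &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
        const void *keys[]   = { kCVPixelBufferIOSurfacePropertiesKey };
        const void *values[] = { empty };
        CFDictionaryRef attrs = CFDictionaryCreate(kCFAllocatorDefault,
            keys, values, 1,
            &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

        CVPixelBufferRef pixelBuffer = NULL;
        CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                            kCVPixelFormatType_32BGRA, attrs, &pixelBuffer);

        /* Decode (or copy) straight into the buffer's memory. */
        CVPixelBufferLockBaseAddress(pixelBuffer, 0);
        memcpy(CVPixelBufferGetBaseAddress(pixelBuffer), decodedPixels,
               bytesPerRow * height);
        CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

        /* Wrap the pixel buffer as a GL texture without another upload. */
        CVOpenGLESTextureRef texture = NULL;
        CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault,
            cache, pixelBuffer, NULL, GL_TEXTURE_2D, GL_RGBA, width, height,
            GL_BGRA_EXT, GL_UNSIGNED_BYTE, 0, &texture);

        glBindTexture(GL_TEXTURE_2D, CVOpenGLESTextureGetName(texture));
        /* ...draw a full-screen quad with this texture... */
    }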
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render to texture, or RTT. It's much quicker; search for "pbuffer" or "FBO" in the EGL documentation.