Is it ok to mix use different texture formats at the same time? - directx

I am using the expensive R32G32B32A32Float format for some quality reason. In particular, I need to retrieve the result by un-doing the pre-multiply on the buffer. Now I want to do some optimization. Thus I wonder if there would be problem / hidden trap if I mix use textures in different formats. e.g. Use another lightweight texture format for non-transparent images.
Note: I am not making games but some image processing stuff. While speed is not that much of a concern because of hardware acceleration which is already faster than doing that on CPU, GPU memory is quite limited. Thus I am here asking.

Related

Convoluting a large filter in GPGPU

I wish to apply a certain 2D filter to 2D images, however, the filter size is huge. Image dimensions are about 2000x2000 and the filter size is about 500*500.
No, I cannot do this in frequency domain so FFT is no go. I'm aware of normal GPU convolution and the use of shared memory for coalescing memory access, however shared memory doesn't seem feasible since the space needed by the filter is large and would therefore need to be divided, this might even prove to be very complex to write.
Any ideas?
I think you can easily manage doing filtering for such sized images. You can transfer hundreds of megabytes to the videomemory. Such size is going to be working well.
You can use byte matrices to transfer the image data then you can use your filter to operate on it.

High-performance copying of RGB pixel data to the screen in iOS

Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage, and then draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's retina display at a decent framerate (but it was okay for non-Retina-devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that with 30 FPS, CPU usage is nearly at %100 just for copying the pixels to the screen, which means we don't have much to work with for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as a 32-bit-per-pixel RGBA data but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage). But, the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to be able to swap out enough pages to make it possible to stream 2x full screen size images from mapped memory. This used to work great with 1x full screen sizes, but now the 2x size screens are actually 4x the amount of memory and the hardware just cannot keep up. You could also try to store frames on disk in a more compressed format, like PNG. But, then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach. If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in this example xcode project opengl_color_cycle. But, my advice would be to look at how you could make use of the h.264 decoder since it is actually able to decode that much data in hardware and no other approaches are likely to give you the kind of results you are looking for.
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called Render to Texture or RTT. its much quicker search pbuffer or FBO in EGL

Performace loss for using non-power of two textures

Is there any performance loss for using non-power-of-two textures under iOS? I have not noticed any my in quick benchmarks. I can save quite a bit of active memory by dumping them all together since there is a lot of wasted padding (despite texture packing). I don't care about the older hardware that can't use them.
This can vary widely depending on the circumstances and your particular device. On iOS, the loss is smaller if you use NEAREST filtering rather than LINEAR, but it isn't huge to begin with (think 5-10%).

OpenGL: Texture size and video memory

I'm making a Worms-style bitmap destructible terrain game using OpenGL. I'd like to know where the limitiations in terms of video memory are for the size of the worlds.
Currently, I use blocks of 512*512 RGBA textures for the terrain.
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Is there any internal, automatic compression going on?
How much video memory can I expect most user's computers to have free?
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Not enough information. You should always use sized OpenGL image formats (GL_RGBA8, GL_RGBA16).
GL_RGBA8 takes up 32-bits per pixel, which is 4 bytes. Therefore, 512*512*4 = 1MB.
Is there any internal, automatic compression going on?
No.
How much video memory can I expect most user's computers to have free?
How much are you using currently?
OpenGL will page image data in and out according to the available space. If you run out of GPU memory, OpenGL will happily allocate system memory and upload the images as needed.
But to be honest, your little Worms game isn't going to actually cost anything in terms of memory size. Maybe 64MB when you're done, tops. It's nothing you need to be concerned about.
I would not worry about that very much. Even with 8192*2048 world (4 screens wide and 2 screens tall, which is very big for Worms-style game) you would require only 8*2*4=64Mb (add mipmaps, other textures, framebuffer) you should fit into 128MB bounds. As far as I know even older GPUs have that kind of memory (we don't speak about GeForce4 cards, right?).
Older GPUs may have limitation on how big each texture could be, but since you already split your world into 512x512 chunks it won't be a problem.
If video memory becomes an issue you could allow users to use half-sized textures (i.e. downsample the world to 4096*1024 and 256x256 chinks) and fetch new / discard unused regions on demand.
With 32-bpp (4 bytes) you get 4*512*512 = 1 MB
See this regarding texture compression: http://www.oldunreal.com/editing/s3tc/ARB_texture_compression.pdf
Again, this depends on your engine, but if I were you I would do this:
Since your terrain texture will probably be reusing some mosaic-like textures, and you need to know whether a pixel is present, or destroyed, then given you are using mosaic textures no larger than 256x256 you could definitely get away with an GL_RG16 internal format (where each component would be a texture coordinate that you would need to map from [0, 255] -> [0.0, 1.0] and you would reserve some special value to indicate that the terrain is destroyed) for your terrain texture, making every 512x512 block take up 0.5MB.
Although it's temping to add an extra byte to indicate terrain presence, but a 3 byte format wouldn't cache too well

Speed of ComputeShader vs. PixelShader

I've got a question regarding ComputeShader compared to PixelShader.
I want to do some processing on a buffer, and this is possible both with a pixel shader and a compute shader, and now I wonder if there is any advantage in either over the other one, specifically when it comes to speed. I've had issues with either getting to use just 8 bit values, but I should be able to work-around that.
Every data point in the output will be calculated from using in total 8 data points surrounding it (MxN matrix), so I'd think this would be perfect for a pixel shader, since the different outputs don't influence each other at all.
But I was unable to find any benchmarkings to compare the shaders, and now I wonder which one I should aim for. Only target is the speed.
From what i understand, shaders are shaders in the sense that they are just programs run by alot of threads on data. Therefore, in general there should not be any diffrence in terms of computing power/speed doing calculations in the pixel shader as opposed to the compute shader. However..
To do calculations on the pixelshader you have to massage your data so that it looks like image data, this means you have to draw a quad first of all, but also that your output must have the 'shape' of a pixel (float4 basically). This data must then be interpreted by you app into something useful
if you're using the computeshader you can completly control the number of threads to use where as for pixel shaders they have to be valid resolutions. Also you can input and output data in any format you like and take advantage of accelerated conversion using UAVs (i think)
i'd recommend using computeshaders since they are ment for doing general purpose computation and are alot easier to work with. Your over all application will probably be faster too, even if the actual shader computation time is about the same, just because you can avoid some of the hoops you have to jump through just through to get pixel shaders to do what you want.

Resources