Can I use a GDI+ image as a DirectX 9 texture without copy overhead?

I have a stream of JPEG pictures (25 fps, about 1000x700) and I want to render them to the screen with as little CPU usage as possible.
So far I have found a fast way to decompress JPEG images: the GDI+ API. On my machine it takes about 50ns per frame. I don't know how they manage it, but it's true; libjpeg8, for example, is much slower, as I remember it.
I tried to use GDI+ to output a stretched picture, but it uses too much CPU for such a simple job. So I switched to DirectX 9. That works well for me, but I can't find a good way to convert a GDI+ picture to a DirectX 9 texture.
There are a lot of ways to do it, and all of them are slow and have high CPU usage.
One of them:
get the surface from the texture
get an HDC from the surface
create a GDI+ Graphics object from the HDC
draw without stretching (GdipDrawImageI from the flat API).
Another way:
lock the bits of the image
lock the bits of the surface
copy the bits
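The "copy bits" step has to go row by row. That is because the GDI+ `BitmapData::Stride` (from `Bitmap::LockBits`) and the Direct3D `D3DLOCKED_RECT::Pitch` (from `IDirect3DTexture9::LockRect`) are both allowed to be larger than width × 4. Below is a minimal, platform-neutral sketch. It assumes both buffers hold 32-bit pixels in the same byte order (for example `PixelFormat32bppARGB` into `D3DFMT_A8R8G8B8`) and that both strides are positive. `CopyPixels32` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copy a 32-bpp image between two locked buffers whose
// row strides (in bytes) may include padding beyond width * 4.
// Assumes positive (top-down) strides and identical pixel layouts.
void CopyPixels32(const std::uint8_t* src, std::size_t srcStride,
                  std::uint8_t* dst, std::size_t dstPitch,
                  std::size_t width, std::size_t height)
{
    const std::size_t rowBytes = width * 4;  // 4 bytes per pixel
    assert(srcStride >= rowBytes && dstPitch >= rowBytes);
    for (std::size_t y = 0; y < height; ++y) {
        // One memcpy per row; padding bytes at the end of each destination row are left untouched.
        std::memcpy(dst + y * dstPitch, src + y * srcStride, rowBytes);
    }
}
```

If both strides happen to equal width × 4, the loop degenerates into a single contiguous copy. It is still a copy, though; this sketch only avoids doing it badly.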
By the way, D3DXCreateTextureFromFileInMemory is slow.
The question is: how can I use an image as a texture without copy overhead? Or what is the best way to convert an image to a texture?

Related

FMX TBitmap fastest saving of transparent image

I am scaling icons (using a software scaler algorithm) and want to cache the newly resized icons by saving them to disk while preserving transparency (the alpha channel).
Calling "Bitmap.SaveToDisk('filename.bmp')" followed by "Bitmap.LoadFromDisk('filename.bmp')" strips the alpha channel.
Calling "Bitmap.SaveToDisk('filename.png')" followed by "Bitmap.LoadFromDisk('filename.png')" maintains the alpha channel but has a much higher CPU overhead due to the encoding/decoding required by the PNG format.
I'm aware I can go behind the scenes, get the scanlines and simply dump the scanline data to a file, but I was wondering if there was a more straightforward method with lower CPU utilization?
Edit #1:
I am still interested in an answer, but in the meantime I wrote a workaround unit that saves/loads raw ARGB data from a FireMonkey TBitmap:
https://github.com/bLightZP/Save-and-Load-FMX-ARGB-Bitmap
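The idea behind such a workaround is to store the pixel buffer verbatim, plus just enough header data to rebuild it, so that no encoding step costs CPU time. Below is a minimal C++ sketch of that idea; it is not the Delphi code from the linked repository. It assumes 32-bit ARGB pixels and a hypothetical 8-byte header (width, height in native byte order). `RawImage`, `SaveRaw` and `LoadRaw` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical container: width * height ARGB values, alpha kept as-is.
struct RawImage {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::vector<std::uint32_t> pixels;
};

// Hypothetical serializer: 8-byte header (width, height) followed by the raw pixels.
std::vector<std::uint8_t> SaveRaw(const RawImage& img)
{
    assert(img.pixels.size() == std::size_t(img.width) * img.height);
    const std::size_t pixelBytes = img.pixels.size() * sizeof(std::uint32_t);
    std::vector<std::uint8_t> bytes(8 + pixelBytes);
    std::memcpy(bytes.data(), &img.width, 4);
    std::memcpy(bytes.data() + 4, &img.height, 4);
    if (pixelBytes) std::memcpy(bytes.data() + 8, img.pixels.data(), pixelBytes);
    return bytes;
}

// Hypothetical deserializer; returns false if the buffer is shorter than its header claims.
bool LoadRaw(const std::vector<std::uint8_t>& bytes, RawImage& img)
{
    if (bytes.size() < 8) return false;
    std::memcpy(&img.width, bytes.data(), 4);
    std::memcpy(&img.height, bytes.data() + 4, 4);
    const std::size_t count = std::size_t(img.width) * img.height;
    if (bytes.size() - 8 < count * sizeof(std::uint32_t)) return false;
    img.pixels.resize(count);
    if (count) std::memcpy(img.pixels.data(), bytes.data() + 8, count * sizeof(std::uint32_t));
    return true;
}
```

The byte vector can then be written and read with any file API. Because the header is in native byte order, the cache files are only portable between machines of the same endianness, which is usually acceptable for a local cache.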

High-performance copying of RGB pixel data to the screen in iOS

Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be copied rapidly to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly; then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly 100% just for copying the pixels to the screen, which means we don't have much left for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
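Some numbers help here. A full-screen 32-bpp frame on the iPad 3 (2048 × 1536) is 12 MiB, four times the 3 MiB of a 1024 × 768 frame on earlier iPads. At 30 FPS that means moving about 360 MiB of pixel data per second before any drawing happens. Below is a small sketch of that arithmetic; `FrameBytes` and `BytesPerSecond` are hypothetical helper names.

```cpp
#include <cstdint>

// Hypothetical helper: bytes in one uncompressed frame.
constexpr std::uint64_t FrameBytes(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// Hypothetical helper: bytes that must be moved per second to refresh every frame.
constexpr std::uint64_t BytesPerSecond(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bytesPerPixel, std::uint64_t fps)
{
    return FrameBytes(width, height, bytesPerPixel) * fps;
}

// Retina iPad 3 versus earlier 1024 x 768 iPads, both at 32 bpp.
static_assert(FrameBytes(2048, 1536, 4) == 12u * 1024 * 1024, "12 MiB per Retina frame");
static_assert(FrameBytes(1024, 768, 4) == 3u * 1024 * 1024, "3 MiB per non-Retina frame");
static_assert(BytesPerSecond(2048, 1536, 4, 30) == 360u * 1024 * 1024, "360 MiB/s at 30 FPS");
```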

So, the bad news is that you have run into a really hard problem. I have done quite a lot of research in this specific area, and currently the only way you can actually blit a framebuffer the size of the full screen at 2x (Retina) scale is to use the H.264 decoder.
There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage). But the big problem is not how to move the pixels from live memory onto the screen; the real issue is how to move the pixels from their encoded form on disk into live memory.
One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to swap enough pages to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens need 4x the amount of memory, and the hardware just cannot keep up. You could also try storing frames on disk in a more compressed format, like PNG, but then decoding the compressed format changes the problem from IO-bound to CPU-bound, and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and the timing results I found with that approach.
If you can limit the input image data to a very specific format (like 8-bit indices into a color table), then you could use the GPU to blit the 8-bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle.
But my advice would be to look at how you could make use of the H.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
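With an 8-bit indexed format, each frame is a quarter of the size. Expanding it to 32 bpp is a per-pixel table lookup, which is the work the shader in the 8-bit table approach does on the GPU. Below is a CPU reference sketch of that lookup; it can be useful for checking shader output. It also includes a palette rotation, on the assumption that "color cycle" refers to the classic technique of animating by rotating palette entries. `ExpandIndexed` and `CyclePalette` are hypothetical names, and how the linked project stores its palette is not shown here.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Palette = std::array<std::uint32_t, 256>;  // one 32-bit RGBA colour per index

// Hypothetical helper: out[i] = palette[indices[i]], the per-pixel lookup a shader would do.
std::vector<std::uint32_t> ExpandIndexed(const std::vector<std::uint8_t>& indices,
                                         const Palette& palette)
{
    std::vector<std::uint32_t> out(indices.size());
    for (std::size_t i = 0; i < indices.size(); ++i)
        out[i] = palette[indices[i]];
    return out;
}

// Hypothetical helper for colour cycling: rotate entries [first, last] by one step
// (the entry at `last` moves to `first`), so the image animates while only the
// 1 KiB palette changes and the index data stays untouched.
void CyclePalette(Palette& palette, std::size_t first, std::size_t last)
{
    assert(first <= last && last < palette.size());
    std::rotate(palette.begin() + first, palette.begin() + last, palette.begin() + last + 1);
}
```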

After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* modes for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL) and achieves excellent performance even on older iOS devices such as the iPhone 5 or the original iPad Air. On those devices it achieves 60 FPS for all pixel formats except the 24bpp formats, where it achieves around 30-50 FPS (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.

CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.

The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called Render to Texture (RTT). It's much quicker; search for pbuffer or FBO in EGL.

(iPhone, OpenGL) direct texture data storage in files

At the moment I use this sequence to load an OpenGL texture from a PNG:
load PNG via UIImage
get pixels data via bitmap context
repack pixels to new format (currently RGBA8 -> RGBA4, RGB8 -> RGB565, using ARM NEON instructions)
create OpenGL texture with data
(this approach is commonly used in Cocos2d engine)
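The repacking step can be written as a plain scalar loop; the question uses ARM NEON for the same job. The sketch below assumes tightly packed 8-bit input. It also assumes the 16-bit layouts that OpenGL uses for `GL_UNSIGNED_SHORT_4_4_4_4` and `GL_UNSIGNED_SHORT_5_6_5`, with red in the most significant bits. It simply truncates the low bits, with no dithering. `PackRGBA4444` and `PackRGB565` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: RGBA8 (4 bytes/pixel) -> RGBA4444 packed as R:15-12 G:11-8 B:7-4 A:3-0.
std::vector<std::uint16_t> PackRGBA4444(const std::vector<std::uint8_t>& rgba)
{
    assert(rgba.size() % 4 == 0);
    std::vector<std::uint16_t> out(rgba.size() / 4);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint8_t* p = &rgba[i * 4];
        // Keep the top 4 bits of each channel.
        out[i] = static_cast<std::uint16_t>(((p[0] >> 4) << 12) | ((p[1] >> 4) << 8) |
                                            ((p[2] >> 4) << 4) | (p[3] >> 4));
    }
    return out;
}

// Hypothetical helper: RGB8 (3 bytes/pixel) -> RGB565 packed as R:15-11 G:10-5 B:4-0.
std::vector<std::uint16_t> PackRGB565(const std::vector<std::uint8_t>& rgb)
{
    assert(rgb.size() % 3 == 0);
    std::vector<std::uint16_t> out(rgb.size() / 3);
    for (std::size_t i = 0; i < out.size(); ++i) {
        const std::uint8_t* p = &rgb[i * 3];
        // Keep the top 5 bits of red and blue, the top 6 bits of green.
        out[i] = static_cast<std::uint16_t>(((p[0] >> 3) << 11) | ((p[1] >> 2) << 5) | (p[2] >> 3));
    }
    return out;
}
```

The packed buffers are what the question proposes to write to disk once and later hand straight to glTexImage2D, skipping this loop at load time.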
It takes a lot of time and seems to do extra work that could be done once per build. So I want to save the repacked pixel data back into a file and load it directly into OpenGL the second time.
I would like to know the practical advantages. Has anyone tried it? Is it worth compressing the data via zip (as far as I know, current iDevices have a bottleneck in file access)? I would be very thankful for shared real-world experience.

Even better, if these are pre-existing images, compress them using PowerVR Texture Compression (PVRTC). PVRTC textures can be loaded directly, and are stored on the GPU in their compressed form, so they can be much smaller than the various raw pixel formats.
I provide an example of how to compress and use PVRTC textures in this sample code (the texture coordinates are a little messed up there, because I haven't corrected them yet). In that example, I just reuse Apple's PVRTexture sample class for handling this type of texture. The PVRTC textures are compressed via a script that's part of one of the build phases, so this can be automated for your various source images.
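The size advantage comes from PVRTC storing 4 or 2 bits per texel. A 512 × 512 texture takes 128 KiB (4bpp) or 64 KiB (2bpp), compared with 1 MiB as RGBA8 or 512 KiB as RGBA4/RGB565. The sketch below computes the size of one mip level. It follows the block-count rule in Apple's PVRTexture sample as I recall it: 4 × 4 blocks for 4bpp, 8 × 4 blocks for 2bpp, 8 bytes per block, and at least 2 blocks along each axis. Treat that minimum-block rule as an assumption to verify. `PvrtcLevelBytes` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: bytes for one mip level of a PVRTC texture (2 or 4 bits per pixel).
std::uint32_t PvrtcLevelBytes(std::uint32_t width, std::uint32_t height, unsigned bpp)
{
    assert(bpp == 2 || bpp == 4);
    const std::uint32_t blockW = (bpp == 4) ? 4 : 8;  // blocks are 4x4 (4bpp) or 8x4 (2bpp)
    // Assumed rule: clamp to a minimum of 2 blocks per axis.
    const std::uint32_t blocksX = std::max<std::uint32_t>(width / blockW, 2);
    const std::uint32_t blocksY = std::max<std::uint32_t>(height / 4, 2);
    return blocksX * blocksY * 8;                     // 64 bits per block
}
```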

So, I have made a successful experiment:
I compress the texture data with zlib (maximum compression ratio) and save it to a file (via NSData methods). The file is much smaller than the PNG in some cases.
As for loading time, I can't give exact timings because my project has 2 parallel threads: one loads textures in the background while the other keeps rendering the scene. It is approximately twice as fast; IMHO the main reasons are that we copy the image data directly to OpenGL without repacking, and the amount of input data is smaller.
PS: The build optimization level plays a very large role in loading time: about 4 seconds in the debug configuration vs. 1 second in release.

Ignore any advice about PVRTC; that stuff is only useful for 3D textures that have limited color usage. It is better to just use 24 or 32 BPP textures from real images. If you would like to see a real working example of the process you describe, take a look at load-opengl-textures-with-alpha-channel-on-ios. The example shows how texture data can be compressed with 7-Zip (much better than zip) when attached as an app resource; the results are then decompressed and saved to disk in an optimal format that can be sent directly to the video card without further rearranging of pixel components. This example uses a POT (power-of-two) texture, but it would not be too hard to adapt it to non-POT textures and to use the Apple optimizations so that the texture data need not be explicitly copied to the graphics card. Those optimizations are already implemented when sending video data to CoreGraphics.

OpenGL: Texture size and video memory

I'm making a Worms-style destructible bitmap terrain game using OpenGL. I'd like to know where the video memory limitations lie in terms of world size.
Currently, I use blocks of 512*512 RGBA textures for the terrain.
How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Is there any internal, automatic compression going on?
How much video memory can I expect most users' computers to have free?

How much memory, very roughly, can I expect such a 512*512 RGBA texture to take up?
Not enough information. You should always use sized OpenGL image formats (GL_RGBA8, GL_RGBA16).
GL_RGBA8 takes up 32-bits per pixel, which is 4 bytes. Therefore, 512*512*4 = 1MB.
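The same arithmetic as a sketch. It optionally includes a full mipmap chain, which the question doesn't mention; that part is included as an assumption because mipmapped textures are common. A full chain adds roughly one third, so a mipmapped 512 × 512 GL_RGBA8 texture is about 1.33 MB. Drivers may add padding or alignment on top of this. `TextureBytes` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: approximate storage for an uncompressed texture,
// optionally including every mip level down to 1x1.
std::uint64_t TextureBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bytesPerTexel, bool mipmapped)
{
    assert(width > 0 && height > 0);
    std::uint64_t total = 0;
    for (;;) {
        total += std::uint64_t(width) * height * bytesPerTexel;
        if (!mipmapped || (width == 1 && height == 1)) break;
        width = width > 1 ? width / 2 : 1;   // each level halves, clamped at 1
        height = height > 1 ? height / 2 : 1;
    }
    return total;
}
```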
Is there any internal, automatic compression going on?
No.
How much video memory can I expect most users' computers to have free?
How much are you using currently?
OpenGL will page image data in and out according to the available space. If you run out of GPU memory, OpenGL will happily allocate system memory and upload the images as needed.
But to be honest, your little Worms game isn't going to actually cost anything in terms of memory size. Maybe 64MB when you're done, tops. It's nothing you need to be concerned about.

I would not worry about that very much. Even with an 8192*2048 world (4 screens wide and 2 screens tall, which is very big for a Worms-style game), you would need only 8K*2K*4 bytes = 64MB. Adding mipmaps, other textures, and the framebuffer, you should still fit within 128MB. As far as I know, even older GPUs have that much memory (we're not talking about GeForce4 cards, right?).
Older GPUs may have limitation on how big each texture could be, but since you already split your world into 512x512 chunks it won't be a problem.
If video memory becomes an issue, you could allow users to use half-sized textures (i.e. downsample the world to 4096*1024 and 256x256 chunks) and fetch new / discard unused regions on demand.
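The downsampling can be done with a 2 × 2 box filter, which averages each 2 × 2 block of RGBA8 texels into one. Below is a minimal sketch that assumes even dimensions and tightly packed rows. It averages alpha like any other channel, without premultiplication. `HalveRGBA8` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: downsample a tightly packed RGBA8 image to half width and height.
std::vector<std::uint8_t> HalveRGBA8(const std::vector<std::uint8_t>& src,
                                     std::size_t width, std::size_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    assert(src.size() == width * height * 4);
    const std::size_t outW = width / 2, outH = height / 2;
    std::vector<std::uint8_t> dst(outW * outH * 4);
    for (std::size_t y = 0; y < outH; ++y)
        for (std::size_t x = 0; x < outW; ++x)
            for (std::size_t c = 0; c < 4; ++c) {
                // Sum the 2x2 source block for this channel, then round to nearest.
                const std::size_t s0 = ((2 * y) * width + 2 * x) * 4 + c;
                const std::size_t s1 = s0 + width * 4;  // same column, next source row
                const unsigned sum = src[s0] + src[s0 + 4] + src[s1] + src[s1 + 4];
                dst[(y * outW + x) * 4 + c] = static_cast<std::uint8_t>((sum + 2) / 4);
            }
    return dst;
}
```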

With 32-bpp (4 bytes) you get 4*512*512 = 1 MB
See this regarding texture compression: http://www.oldunreal.com/editing/s3tc/ARB_texture_compression.pdf

Again, this depends on your engine, but if I were you I would do this:
Since your terrain texture will probably reuse some mosaic-like textures, and you need to know whether a pixel is present or destroyed, then given you are using mosaic textures no larger than 256x256 you could definitely get away with a GL_RG8 internal format for your terrain texture (where each 8-bit component would be a texture coordinate that you would need to map from [0, 255] -> [0.0, 1.0], and you would reserve some special value to indicate that the terrain is destroyed), making every 512x512 block take up 0.5MB.
Although it's tempting to add an extra byte to indicate terrain presence, a 3-byte format wouldn't cache too well.
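Below is a sketch of that encoding. It assumes two bytes per texel, as with GL_RG8. As a purely hypothetical choice, it reserves (255, 255) as the "destroyed" marker, which means that mosaic coordinate itself can't be used. `TerrainTexel`, `kDestroyed`, `Encode`, `IsDestroyed`, `Normalized` and `BlockBytes` are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical texel: an 8-bit (u, v) coordinate into a mosaic texture of up to 256x256.
struct TerrainTexel { std::uint8_t u, v; };

// Hypothetical reserved value meaning "terrain destroyed here".
constexpr TerrainTexel kDestroyed{255, 255};

constexpr bool IsDestroyed(TerrainTexel t) { return t.u == kDestroyed.u && t.v == kDestroyed.v; }

// Pack a mosaic coordinate; (255, 255) is off-limits because it is the marker.
TerrainTexel Encode(unsigned u, unsigned v)
{
    assert(u < 256 && v < 256 && !(u == 255 && v == 255));
    return TerrainTexel{static_cast<std::uint8_t>(u), static_cast<std::uint8_t>(v)};
}

// What a shader samples from an unsigned-normalized 8-bit channel: n / 255.
constexpr float Normalized(std::uint8_t n) { return n / 255.0f; }

// Storage for a w x h terrain block at 2 bytes per texel.
constexpr std::uint32_t BlockBytes(std::uint32_t w, std::uint32_t h) { return w * h * 2; }
```

In the shader, multiplying the sampled value by 255 recovers the integer coordinate, which is then scaled by the mosaic texture size.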

Example code for Resizing an image using DirectX

I know it is possible, and a lot faster than using GDI+. However, I haven't found any good example of using DirectX to resize an image and save it to disk. I have implemented this over and over in GDI+; that's not difficult. However, GDI+ does not use any hardware acceleration, and I was hoping to get better performance by tapping into the graphics card.

You can load the image as a texture, texture-map it onto a quad and draw that quad in any size on the screen. That will do the scaling. Afterwards you can grab the pixel-data from the screen, store it in a file or process it further.
It's easy. The basic texturing DirectX examples that come with the SDK can be adjusted to do just this.
However, it is slow. Not the rendering itself, but the transfer of pixel data from the screen to a memory buffer.
Imho it would be much simpler and faster to just write a little code that resizes an image using bilinear scaling from one buffer to another.
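Below is a minimal sketch of such a resize. For brevity it assumes a tightly packed 8-bit single-channel (grayscale) image; for RGBA, apply the same weights to each channel. Each destination pixel centre is mapped to a source coordinate, and the four nearest source pixels are blended. `ResizeBilinear` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: bilinear resize of a tightly packed 8-bit grayscale image.
std::vector<std::uint8_t> ResizeBilinear(const std::vector<std::uint8_t>& src,
                                         int srcW, int srcH, int dstW, int dstH)
{
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    assert(src.size() == std::size_t(srcW) * srcH);
    std::vector<std::uint8_t> dst(std::size_t(dstW) * dstH);
    const float sx = float(srcW) / dstW, sy = float(srcH) / dstH;
    auto at = [&](int x, int y) { return float(src[std::size_t(y) * srcW + x]); };
    for (int y = 0; y < dstH; ++y) {
        // Source row coordinate of this destination pixel's centre, clamped to the image.
        const float fy = std::clamp((y + 0.5f) * sy - 0.5f, 0.0f, float(srcH - 1));
        const int y0 = int(fy), y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - y0;
        for (int x = 0; x < dstW; ++x) {
            const float fx = std::clamp((x + 0.5f) * sx - 0.5f, 0.0f, float(srcW - 1));
            const int x0 = int(fx), x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - x0;
            // Interpolate horizontally on both rows, then vertically between them.
            const float top = at(x0, y0) + (at(x1, y0) - at(x0, y0)) * wx;
            const float bottom = at(x0, y1) + (at(x1, y1) - at(x0, y1)) * wx;
            dst[std::size_t(y) * dstW + x] = static_cast<std::uint8_t>(top + (bottom - top) * wy + 0.5f);
        }
    }
    return dst;
}
```

For large downscales (more than 2x), plain bilinear sampling skips source pixels and aliases. Repeated halving or a box prefilter gives better results.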

Do you really need to use DirectX? GDI+ does the job well for resizing images. In DirectX, you don't really need to resize images, as most likely you'll be displaying your images as textures. Since textures can only be applied to 3D objects (triangles/polygons/meshes), the size of the 3D object and the viewport determine the actual displayed image size. If you need to scale your texture within the 3D object, just adjust the texture coordinates or the texture matrix.
To manipulate the texture, you can use alpha blending, masking, and all sorts of texture manipulation techniques, if that's what you're looking for. To manipulate individual pixels as in GDI+, I still think GDI+ is the way to go. DirectX was never meant for image manipulation.
