At this moment I use this scenario to load OpenGL texture from PNG:
load PNG via UIImage
get pixels data via bitmap context
repack pixels to new format (currently RGBA8 -> RGBA4, RGB8 -> RGB565, using ARM NEON instructions)
create OpenGL texture with data
(this approach is commonly used in Cocos2d engine)
It takes much time and seems to do extra work that may be done once per build. So I want to save repacked pixels data back into file and load it directly to OpenGL on second time.
I would know the practical advantages. Does anyone tried it? Is it worth to compress data via zip (as I know, current iDevices have bottleneck in file access)? Would be very thankful for real experience sharing.
Even better, if these are pre-existing images, compress them using PowerVR Texture Compression (PVRTC). PVRTC textures can be loaded directly, and are stored on the GPU in their compressed form, so they can be much smaller than the various raw pixel formats.
I provide an example of how to compress and use PVRTC textures in this sample code (the texture coordinates are a little messed up there, because I haven't corrected them yet). In that example, I just reuse Apple's PVRTexture sample class for handling this type of texture. The PVRTC textures are compressed via a script that's part of one of the build phases, so this can be automated for your various source images.
So, I have made some successful experiment:
I compress texture data by zlib (max compression ratio) and save it to file (via NSData methods). The size of file is much smaller then PNG in some cases.
As for loading time, I can't say exact timestamps because in my project there are 2 parallel threads - one is loading textures on background while another is still rendering scene. It is approximately twice faster - IMHO the main reason is that we copy image data directly to OpengGL without repacking, and input data amount smaller).
PS: Build optimization level plays very high role in loading time: about 4 seconds in debug configuration vs. 1 second in release.
Ignore any advice about PVRTC, that stuff is only useful for 3D textures that have limited color usage. It is better to just use 24 or 32 BPP textures from real images. If you would like to see a real working example of the process you describe then take a look at load-opengl-textures-with-alpha-channel-on-ios. The example shows how texture data can be compressed with 7zip (much better than zip) when attached to the app resource, but then the results are decompressed and saved to disk in an optimal format that can be directly sent to the video card without further pixel component rearranging. This example uses a POT texture, but it would not be too hard to adapt to non-POT and to use the Apple optimizations so that the texture data need not be explicitly copied into the graphics card. Those optimizations are already implemented when sending video data to CoreGraphics.
Related
I am using the expensive R32G32B32A32Float format for some quality reason. In particular, I need to retrieve the result by un-doing the pre-multiply on the buffer. Now I want to do some optimization. Thus I wonder if there would be problem / hidden trap if I mix use textures in different formats. e.g. Use another lightweight texture format for non-transparent images.
Note: I am not making games but some image processing stuff. While speed is not that much of a concern because of hardware acceleration which is already faster than doing that on CPU, GPU memory is quite limited. Thus I am here asking.
I have an interesting problem I need to research related to very low level video streaming.
Has anyone had any experience converting a raw stream of bytes(separated into per pixel information, but not a standard format of video) into a low resolution video stream? I believe that I can map the data into RGB value per pixel bytes, as the color values that correspond to the value in the raw data will be determined by us. I'm not sure where to go from there, or what the RGB format needs to be per pixel.
I've looked at FFMPeg but it's documentation is massive and I don't know where to start.
Specific questions I have include, is it possible to create CVPixelBuffer with that pixel data? If I were to do that, what sort of format for the per pixel data would I need to convert to?
Also, should I be looking deeper into OpenGL, and if so where would the best place to look for information on this topic?
What about CGBitmapContextCreate? For example, if I went I went with something like this, what would a typical pixel byte need to look like? Would this be fast enough to keep the frame rate above 20fps?
EDIT:
I think with the excellent help of you two, and some more research on my own, I've put together a plan for how to construct the raw RGBA data, then construct a CGImage from that data, in turn create a CVPixelBuffer from that CGImage from here CVPixelBuffer from CGImage.
However, to then play that live as the data comes in, I'm not sure what kind of FPS I would be looking at. Do I paint them to a CALayer, or is there some similar class to AVAssetWriter that I could use to play it as I append CVPixelBuffers. The experience that I have is using AVAssetWriter to export constructed CoreAnimation hierarchies to video, so the videos are always constructed before they begin playing, and not displayed as live video.
I've done this before, and I know that you found my GPUImage project a little while ago. As I replied on the issues there, the GPUImageRawDataInput is what you want for this, because it does a fast upload of RGBA, BGRA, or RGB data directly into an OpenGL ES texture. From there, the frame data can be filtered, displayed to the screen, or recorded into a movie file.
Your proposed path of going through a CGImage to a CVPixelBuffer is not going to yield very good performance, based on my personal experience. There's too much overhead when passing through Core Graphics for realtime video. You want to go directly to OpenGL ES for the fastest display speed here.
I might even be able to improve my code to make it faster than it is right now. I currently use glTexImage2D() to update texture data from local bytes, but it would probably be even faster to use the texture caches introduced in iOS 5.0 to speed up refreshing data within a texture that maintains its size. There's some overhead in setting up the caches that makes them a little slower for one-off uploads, but rapidly updating data should be faster with them.
My 2 cents:
I made an opengl game which lets the user record a 3d scene. Playback was done via replaying the scene (instead of playing a video because realtime encoding did not yield a comfortable FPS.
There is a technique which could help out, unfortunately I didn't have time to implement it:
http://allmybrain.com/2011/12/08/rendering-to-a-texture-with-ios-5-texture-cache-api/
This technique should cut down time on getting pixels back from openGL. You might get an acceptable video encoding rate.
I'm working on an XNA project and running into issues involving memory usage. I was curious if using indexed color PNG 8s vs PNG 24s or PNG 32s for certain textures would free up any memory. It definitely makes the app smaller, but I'm curious if when XNA's content manager loads them they somehow become uncompressed or act as loss-less images.
I answered a similar question just recently.
If you want to save space when distributing your game, you could distribute your PNG files (you can also use jpeg) and load them with Texture2D.FromStream. Note that you'll have to handle things like premultiplied alpha yourself.
If you want to save space on the GPU (and this also improves texture-fetch because the amount of data transferred is smaller) you can use a smaller texture format. The list of supported texture formats is available in this table.
While there is no support for palletised images (like PNG8) or 8-bit colour images at all, there are several 16-bit formats (Bgr565, Bgra5551, Bgra4444). DXT compressed formats are also available. They compress at 4:1 ratio, except opaque DXT1, which is 6:1.
You can use both techniques at once.
Someone just pointed to me to this link on Facebook, immediately after I posted this question.
http://msdn.microsoft.com/en-us/library/windowsphone/develop/hh855082%28v=vs.92%29.aspx
Apparently, in XNA, PNGs of any type become uncompressed in memory and I should look at using DXTs.
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage, and then draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's retina display at a decent framerate (but it was okay for non-Retina-devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that with 30 FPS, CPU usage is nearly at %100 just for copying the pixels to the screen, which means we don't have much to work with for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as a 32-bit-per-pixel RGBA data but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage). But, the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to be able to swap out enough pages to make it possible to stream 2x full screen size images from mapped memory. This used to work great with 1x full screen sizes, but now the 2x size screens are actually 4x the amount of memory and the hardware just cannot keep up. You could also try to store frames on disk in a more compressed format, like PNG. But, then decoding the compressed format changes the problem from IO bound to CPU bound and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach. If you have a very specific format that you can limit the input image data to (like an 8 bit table), then you could use the GPU to blit 8 bit data as 32BPP pixels via a shader, as shown in this example xcode project opengl_color_cycle. But, my advice would be to look at how you could make use of the h.264 decoder since it is actually able to decode that much data in hardware and no other approaches are likely to give you the kind of results you are looking for.
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called Render to Texture or RTT. its much quicker search pbuffer or FBO in EGL
For iPhone game development, I switched from PNG format to PVRTC format for the sake of performance. But PVRTC compression is creating files that are much bigger than the PNG files.. So a PNG of 140 KB (1024x1024) gets bloated to 512 KB or more in the PVRTC format.. I read somewhere that a PNG file of 50KB got compressed to some 10KB and all, in my case, its the other way around..
Any reason why it happens this way and how I can avoid this.. If PVRTC compression is blindly doing 4bpp conversion (1024x1024x0.5) irrespective of the transparencies in the PNG, then whats the compression we are achieving here..
I have 100s of these 1024x1024 images in my game as there are numerous characters each doing some complex animations.. so in this rate of 512KB per image, my app would get more than 50MB.. which is unacceptable for my customer.. ( with PNG, I could have got my app to 10MB)..
In general, uncompressed image data is either 24bpp (RGB) or 32bpp (RGBA) flatrate. PVRTC is 4bpp (or 2bpp) flatrate so there is a compression of 6 or 8 (12 or 16) times compared to this.
A requirement for graphics hardware to use textures natively is that the format of the texture must be random accessible for the hardware. PVRTC is this kind of format, PNG is not and this is why PNG can achieve greater compression ratios. PVRTC is a runtime, deployment format; PNG is a storage format.
PVRTC compression is carried out on 4x4 blocks of pixels at a time and at a flat bit rate so it is easy to calculate where in memory to retrieve the data required to derive a particular texel's value from and there is only one access to memory required. There is dedicated circuitry in the graphics core which will decode this 4x4 block and give the texel value to your shader/texture combiner etc.
PNG compression does not work at a flat bitrate and is more complicated to retrieve specific values from; memory needs to be accessed from multiple locations in order to retrieve a single colour value and far more memory and processing would be required every single time a texture read occurs. So it's not suitable for use as a native texture format and this is why your textures must be decompressed before the graphics hardware will use them. This increases bandwidth use when compared to PVRTC, which requires no decompression for use.
So for offline storage (the size of your application on disk), PNG is smaller than PVRTC which is smaller than completely uncompressed. For runtime memory footprint and performance, PVRTC is smaller and faster than PNG which, because it must be decompressed, is just as large and slow as uncompressed textures. You might gain some advantage with PNG at initialisation for disk access, but then you'd lose time for decompression.
If you want to reduce the storage footprint of PVRTC you could try zip-style compression on the texture files and expand these when you load from disk.
PVRTC (PowerVR Texture Compression) is a texture compression format. On devices using PowerVR e.g. most higher end mobile phones including the iPhone and other ARM-based gadgets like the iPod it is very fast to draw since drawing it is hardware accelerated. It also uses much less memory since images are represented in their compressed form and decoded each draw, whereas a PNG needs to be decompressed before being drawn.
PNG is lossless compression.
PVRTC is lossy compression meaning it approximates the image. It has a completely different design criteria.
PVRTC will 'compress' (by approximating) any type of artwork, giving a fixed bits per texel, including photographic images.
PNG does not approximate the image, so if the image contains little redundancy it will hardly compress at all. On the other hand, a uniform image e.g. an illustration will compress best with PNG.
Its apples and oranges.
Place more than one frame tiled onto a single image and blit the subrectangles of the texture. This will dramatically reduce your memory consumption.
If you images are, say, 64x64, then you can place 256 of them on a 1024x1024 texture in a 16x16 arrangement.
With a little effort, images do not need to be all the same size, just so long as you keep track in the code of the rectangle in the texture that each image is at.
This is how iPhone game developers do it.
I agree with Will. There is no point in the question. I read the question 3 times, but I still don't know what Sankar want to know. It's just a complain, no question.
The only thing I can advice, don't use PVRTC if you mind to use it. It offers performance gain and saves VRAM, but it won't help you in this case. Because what you want is just reducing game volume, not a consideration about trade-off between performance and quality.