We are building an application that works with a lot of images. We are interested in Core Image, GPUImage, and UIImage, and in how each of them decompresses images. We already know that decompressing images on a background thread helps remove stutter or jitter in our UI while scrolling. However, we are not so familiar with where this decompression work actually happens. We also do some cropping of images using UIImage. So here are the questions:
Background: We are supporting devices all the way back to the iPhone 4, but may soon drop the iPhone 4 and make the iPhone 4S our oldest supported device.
1) Is decompression of an image done on the GPU? ... Core Image? GPUImage? UIImage?
2) Can cropping of an image be done on the GPU? ... Core Image? GPUImage? UIImage?
3) Is there a difference in GPU support based on our device profile?
Basically we want to offload as much as we can to the GPU to free up the CPUs on the device. Also, we want to do any operation on the GPU that would be faster to do there instead of on the CPU.
To answer your question about decompression: Core Image, GPUImage, and UIImage all use pretty much the same means of loading an image from disk. For Core Image, you start with a UIImage, and for GPUImage you can see in the GPUImagePicture source that it currently relies on a CGImageRef usually obtained via a UIImage.
UIImage does image decompression on the CPU side, and other libraries I've looked at for improving image loading performance for GPUImage do the same. The biggest bottleneck in GPUImage for image loading is having to load the image into a UIImage, then take a trip through Core Graphics to upload it into a texture. I'm looking into more direct ways to obtain pixel data, but all of the decompression routines I've tried to date end up being slower than native UIImage loading.
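To force that CPU-side decompression off the main thread, the usual trick is to draw the image into a bitmap context on a background queue; a minimal sketch, assuming a typical 32-bit RGBA source (the method name here is mine):

// Forces the decode by drawing into a bitmap context; run this on a background queue.
- (UIImage *)decompressedImageFromImage:(UIImage *)image
{
    CGImageRef imageRef = image.CGImage;
    size_t width = CGImageGetWidth(imageRef);
    size_t height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(NULL, width, height, 8, 0, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);
    if (!context) return image;
    // The draw call is what actually triggers the PNG/JPEG decode.
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGImageRef decodedRef = CGBitmapContextCreateImage(context);
    CGContextRelease(context);
    UIImage *decoded = [UIImage imageWithCGImage:decodedRef];
    CGImageRelease(decodedRef);
    return decoded;
}

Dispatch this with dispatch_async to a background queue, then hand the returned image back to the main thread for display.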
Cropping of an image can be done on the GPU, and both Core Image and GPUImage let you do this. With image loading overhead, this may or may not be faster than cropping via Core Graphics, so you'd need to benchmark that yourself for the image sizes you care about. More complex image processing operations, like adjustment of color, etc. generally end up being overall wins on the GPU for most image sizes on most devices. If this image loading overhead could be reduced, GPU-side processing would win in more cases.
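For illustration, a sketch of GPU-side cropping in both frameworks; sourceImage and the rectangle values are placeholders:

// Core Image: the crop is applied lazily, on the GPU, when the context renders.
CIImage *input = [CIImage imageWithCGImage:sourceImage.CGImage];
CIImage *cropped = [input imageByCroppingToRect:CGRectMake(0, 0, 512, 512)];
CIContext *ciContext = [CIContext contextWithOptions:nil];
CGImageRef rendered = [ciContext createCGImage:cropped fromRect:cropped.extent]; // caller releases

// GPUImage: the crop region is given in normalized (0..1) coordinates.
GPUImageCropFilter *cropFilter =
    [[GPUImageCropFilter alloc] initWithCropRegion:CGRectMake(0.25, 0.25, 0.5, 0.5)];
UIImage *croppedImage = [cropFilter imageByFilteringImage:sourceImage];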
As far as GPU capabilities with device classes, there are significant performance differences between different iOS devices, but the capabilities tend to be mostly the same. Fragment shader processing performance can be orders of magnitude different between iPhone 4, 4S, 5, and 5S devices, where for some operations the 5S is 1000x faster than the 4. The A6 and A7 devices have a handful of extensions that the older devices lack, but those only come into play in very specific situations.
The biggest difference tends to be the maximum texture size supported by GPU hardware, with iPhone 4 and earlier devices limited to 2048x2048 textures and iPhone 4S and higher supporting 4096x4096 textures. This can limit the size of images that can be processed on the GPU using something like GPUImage.
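Rather than hard-coding those limits per device class, you can query the limit at runtime (this assumes a current EAGLContext):

// Reports 2048 on iPhone 4 and earlier, 4096 on the 4S and later.
GLint maxTextureSize = 0;
glGetIntegerv(GL_MAX_TEXTURE_SIZE, &maxTextureSize);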
Related
I have been working on a photo editor app for iOS using the Core Image (CIFilter) framework and the GPUImage framework, and applying filters to high-resolution images takes a lot of time.
To decrease the processing time, I implemented the filtering and editing features on a downscaled copy of the original image. Naturally, this produces a low-resolution image as output.
Now I am struggling to generate a high-resolution image as the output. Any ideas or possible solutions for decreasing the processing time, or for upscaling the result back to the original resolution, would be a great help.
In our apps, we use different resolutions for editing and exporting. For editing, the rendering needs to be fast and snappy, but for export, depending on the user-chosen export resolution, processing might take some time.
We reduce the export time for older devices by processing on a smaller resolution internally (but still much higher than preview resolution) and upsampling the image afterward.
For upsampling, you can use a joint bilateral upsampling technique, which uses the original image to scale up the smaller, filtered image with very high quality. Apple implemented this technique in the CIEdgePreserveUpsampleFilter.
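A minimal sketch of that filter (available since iOS 10), assuming fullResOriginal is the untouched full-resolution CIImage and smallFiltered is the low-resolution filtered result:

CIFilter *upsample = [CIFilter filterWithName:@"CIEdgePreserveUpsampleFilter"];
[upsample setValue:fullResOriginal forKey:kCIInputImageKey]; // full-res guide image
[upsample setValue:smallFiltered forKey:@"inputSmallImage"]; // small filtered image
CIImage *highResResult = upsample.outputImage; // filtered result at full resolution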
I am designing a game that makes use of large backgrounds. These are illustrated backgrounds that currently sit at around 4.5 MB and, being backgrounds, remain in the scene for the entirety of the game.
First, I am not sure whether this would cause memory usage to ramp up, but I imagine it would, given that there are also other textures overlaid on the screen. So that is my first question: can it cause memory issues?
Second, if I have a background that is 2048 x 1536 at 300 dpi and I compress/optimise this image, would that reduce memory/CPU usage? Is there documentation on how best to optimise these kinds of images?
There are several techniques for this; which one to use depends on how you're going to use the images.
If it's a background in motion, you can split it into tiles so that you only ever render smaller images (see the sketch at the end of this answer).
It also depends on the format. Most people only know PNG and JPEG, but there are other projects/formats you can use. Some give smaller files at the cost of slower reads/writes, so it's up to you how to trade that off, e.g.: https://github.com/onevcat/APNGKit
If your background doesn't need an alpha channel, use JPEG instead of PNG and you'll save some space.
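As an illustration of the tiling idea above, a hypothetical sketch that slices a background image into 512x512 tiles with Core Graphics:

// CGImageCreateWithImageInRect clips the rect, so edge tiles may come out smaller.
NSMutableArray *tiles = [NSMutableArray array];
CGImageRef source = background.CGImage;
size_t tileSize = 512;
for (size_t y = 0; y < CGImageGetHeight(source); y += tileSize) {
    for (size_t x = 0; x < CGImageGetWidth(source); x += tileSize) {
        CGImageRef tileRef = CGImageCreateWithImageInRect(source,
                                 CGRectMake(x, y, tileSize, tileSize));
        [tiles addObject:[UIImage imageWithCGImage:tileRef]];
        CGImageRelease(tileRef);
    }
}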
I'm using Brad Larson's GPUImage framework. However, when I try to apply a Kuwahara filter with a filter radius of 5.0f, I get artifacts on an iPhone 4S. (It works fine on higher-performance devices.)
Source image size was 2048x2048px.
By reading the original developer's comments, I understood that there's a kind of watchdog timer that fires when something takes too long to run on the GPU.
So my question is: what is the maximum resolution at which I can apply a Kuwahara filter with a radius of 5.0f on an iPhone 4S without getting artifacts?
The Kuwahara filter produces square artefacts and is very expensive to compute.
You can use the generalised Kuwahara filter instead (e.g. with 8 segments).
You can also generate a shader with no loops for a chosen radius. To reduce the number of texture reads, there is a trick:
Generate the shader for a constant radius.
Scale the pixel offsets by the ratio of the desired radius to that constant radius.
You'll get some artefacts, but they look artistic (like canvas), and the Kuwahara filter will run faster.
There really isn't a hard limit. The tiling artifacts you are seeing are due to the OpenGL ES watchdog timer aborting the scene rendering after it takes too long. If you have a single frame that takes longer than approximately 2 seconds to render, your frame rendering will be killed in this manner.
The exact time it takes is a function of hardware capabilities, system load, shader complexity, and iOS version. In GPUImage, you pretty much only see this with the Kuwahara filter because of the ridiculously unoptimized shader I use for that. It's drawn from a publication that was doing this using desktop GPUs, and is about the worst case operation for a mobile GPU like these. Someone contributed a fixed-radius version of this which is significantly faster, but you'll need to create your own optimized version if you want to use this with large images on anything but the latest devices.
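If a radius of 3 is acceptable, that contributed fixed-radius filter is a drop-in replacement; a sketch, assuming sourceImage is your input UIImage:

// GPUImageKuwaharaRadius3Filter unrolls the loops for a constant radius of 3,
// which is dramatically faster than the general shader at radius 5.
GPUImageKuwaharaRadius3Filter *kuwahara = [[GPUImageKuwaharaRadius3Filter alloc] init];
UIImage *filtered = [kuwahara imageByFilteringImage:sourceImage];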
Our product contains a kind of software image decoder that essentially produces full-frame pixel data that needs to be rapidly copied to the screen (we're running on iOS).
Currently we're using CGBitmapContextCreate and we access the memory buffer directly, then for each frame we call CGBitmapContextCreateImage and draw that bitmap to the screen. This is WAY too slow for full-screen refreshes on the iPad's Retina display at a decent framerate (but it was okay for non-Retina devices).
We've tried all kinds of OpenGL ES-based approaches, including the use of glTexImage2D and glTexSubImage2D (essentially rendering to a texture), but CPU usage is still high and we can't get more than ~30 FPS for full-screen refreshes on the iPad 3. The problem is that at 30 FPS, CPU usage is nearly at 100% just for copying the pixels to the screen, which means we don't have much headroom for our own rendering on the CPU.
We are open to using OpenGL or any iOS API that would give us maximum performance. The pixel data is formatted as 32-bit-per-pixel RGBA, but we have some flexibility there...
Any suggestions?
So, the bad news is that you have run into a really hard problem. I have been doing quite a lot of research in this specific area, and currently the only way that you can actually blit a framebuffer that is the size of the full screen at 2x is to use the h.264 decoder. There are quite a few nice tricks that can be done with OpenGL once you have image data already decoded into actual memory (take a look at GPUImage).

But, the big problem is not how to move the pixels from live memory onto the screen. The real issue is how to move the pixels from the encoded form on disk into live memory. One can use file-mapped memory to hold the pixels on disk, but the IO subsystem is not fast enough to page in enough data to make it possible to stream 2x full-screen-size images from mapped memory. This used to work great with 1x full-screen sizes, but the 2x screens hold 4x the amount of pixel data and the hardware just cannot keep up.

You could also try to store frames on disk in a more compressed format, like PNG. But then decoding the compressed format changes the problem from IO-bound to CPU-bound, and you are still stuck. Please have a look at my blog post opengl_write_texture_cache for the full source code and timing results I found with that approach.

If you have a very specific format that you can limit the input image data to (like an 8-bit table), then you could use the GPU to blit 8-bit data as 32BPP pixels via a shader, as shown in the example Xcode project opengl_color_cycle.

But, my advice would be to look at how you could make use of the h.264 decoder, since it is actually able to decode that much data in hardware, and no other approach is likely to give you the kind of results you are looking for.
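For a rough idea of the h.264 route, here is a hedged sketch that uses AVAssetReader to pull hardware-decoded frames as CVPixelBuffers; the asset URL and the BGRA output format are illustrative:

AVURLAsset *asset = [AVURLAsset URLAssetWithURL:movieURL options:nil];
AVAssetTrack *videoTrack = [[asset tracksWithMediaType:AVMediaTypeVideo] firstObject];
NSDictionary *settings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
AVAssetReaderTrackOutput *output =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack outputSettings:settings];
NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
[reader addOutput:output];
[reader startReading];
CMSampleBufferRef sampleBuffer = NULL;
while ((sampleBuffer = [output copyNextSampleBuffer])) {
    // Hardware-decoded frame; hand it to a texture cache or display layer here.
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CFRelease(sampleBuffer);
}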
After several years, and several different situations where I ran into this need, I've decided to implement a basic "pixel viewer" view for iOS. It supports highly optimized display of a pixel buffer in a wide variety of formats, including 32-bpp RGBA, 24-bpp RGB, and several YpCbCr formats.
It also supports all of the UIViewContentMode* for smart scaling, scale to fit/fill, etc.
The code is highly optimized (using OpenGL), and achieves excellent performance on even older iOS devices such as iPhone 5 or the original iPad Air. On those devices it achieves 60FPS on all pixel formats except for 24bpp formats, where it achieves around 30-50fps (I usually benchmark by showing a pixel buffer at the device's native resolution, so obviously an iPad has to push far more pixels than the iPhone 5).
Please check out EEPixelViewer.
CoreVideo is most likely the framework you should be looking at. With the OpenGL and CoreGraphics approaches, you're being hit hard by the cost of moving bitmap data from main memory onto GPU memory. This cost exists on desktops as well, but is especially painful on iPhones.
In this case, OpenGL won't net you much of a speed boost over CoreGraphics because the bottleneck is the texture data copy. OpenGL will get you a more efficient rendering pipeline, but the damage will have already been done by the texture copy.
So CoreVideo is the way to go. As I understand the framework, it exists to solve the very problem you're encountering.
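A minimal sketch of the Core Video route, assuming an IOSurface-backed pixel buffer and a current eaglContext with known width and height; the texture cache lets OpenGL ES sample the buffer without a separate glTexImage2D upload:

CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, eaglContext, NULL, &textureCache);

NSDictionary *attrs = @{ (id)kCVPixelBufferIOSurfacePropertiesKey : @{} };
CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, width, height, kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)attrs, &pixelBuffer);

// Write each decoded frame into the buffer...
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
// memcpy(CVPixelBufferGetBaseAddress(pixelBuffer), frameBytes, frameLength);
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

// ...then wrap it as a GL texture with no extra copy.
CVOpenGLESTextureRef texture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, pixelBuffer,
                                             NULL, GL_TEXTURE_2D, GL_RGBA,
                                             (GLsizei)width, (GLsizei)height,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));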
The pbuffer or FBO can then be used as a texture map for further rendering by OpenGL ES. This is called render-to-texture (RTT), and it's much quicker; search for "pbuffer" or "FBO" in the EGL documentation.
I'm working on an iPad app that displays lightmapped scenes. Loading the 20 or so 1Kx1K textures that are involved is taking a while, and when I started timing the various operations I found it was taking slightly less than 1/2 second per texture.
It turns out that loading a texture image from the filesystem is pretty fast, and that the bottleneck is copying the UIImage into a CGContext in order to pass the image to glTexImage2D().
I've tried two different ways of making the copy:
CGContextSetInterpolationQuality(textureCopyContext, kCGInterpolationNone);
CGContextDrawImage( textureCopyContext, CGRectMake( 0, 0, width, height ), image);
and
UIGraphicsPushContext(textureCopyContext) ;
[uiImage drawInRect:CGRectMake(0, 0, width, height)] ;
UIGraphicsPopContext() ;
and both take about 0.45 seconds. This strikes me as excessive, even for a relatively underpowered device.
I'm relatively new to iOS development, so I just want to ask whether the times I'm seeing are reasonable, or whether they can be improved.
Update: I'm aware of the PVRTC alternative, but for now I've got to stick with PNGs. However, there is an excellent summary of the pros and cons of PVRTC in this answer. The same answer also hints at why PNGs result in such long texture setup times -- "internal pixel reordering". Can anybody confirm this?
Switching texture context has traditionally been expensive, dating back to desktops (it's a lot faster on modern GPUs). You could try using a texture atlas; depending on how big your textures are, this is the most efficient approach. A texture atlas packs multiple textures into one image. I believe the iPad is able to load 2048x2048 textures, so you could squash four 1Kx1K textures together.
The other alternative is PVRTC texture compression, which can reduce file size by about 25% depending on quality. The PowerVR chip stores the texture on the device in this compressed form, so it saves time AND bandwidth when copying. It can look lossy on occasion at lower settings, but for 3D textures it is the best option, whereas 2D sprites prefer the first option.
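For reference, once you have the raw PVRTC payload (a .pvr file with its header stripped), the upload is a single call; a sketch, with pvrtcData and the 1024x1024 size as assumptions:

// 4-bpp PVRTC occupies width * height / 2 bytes, with a 32-byte minimum.
GLsizei width = 1024, height = 1024;
GLsizei dataLength = MAX(width * height / 2, 32);
glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG,
                       width, height, 0, dataLength, pvrtcData);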
Let me know if you need more clarification. Don't forget that PNG files are compressed on disk; when loaded from the filesystem they are expanded into a full-size pixel buffer, which is a LOT bigger.
I'd start by taking a look at how it's done in https://github.com/cocos2d/cocos2d-iphone/blob/develop/cocos2d/CCTexture2D.m. I hope there's something in there that helps. There they're doing glTexImage2D straight from the image data, no CGContext involved.
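In the same spirit, a hedged sketch of feeding glTexImage2D straight from the CGImage's backing data; this only avoids the CGContext blit when the image's pixel layout already matches what GL expects (tightly packed RGBA8):

CGImageRef imageRef = uiImage.CGImage;
CFDataRef pixelData = CGDataProviderCopyData(CGImageGetDataProvider(imageRef)); // decoded bytes
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA,
             (GLsizei)CGImageGetWidth(imageRef), (GLsizei)CGImageGetHeight(imageRef),
             0, GL_RGBA, GL_UNSIGNED_BYTE, CFDataGetBytePtr(pixelData));
CFRelease(pixelData);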