I'm working on drawing individual pixels to a UIView to create fractal images. My problem is my rendering speed. I am currently running this loop 260,000 times, but would like to render even more pixels. As it is, it takes about 5 seconds to run on my iPad Mini.
I was using a UIBezierPath before, but that was even a bit slower (about 7 seconds). I've been looking into NSBitmap stuff, but I'm not exactly sure whether that would speed it up or how to implement it in the first place.
I was also thinking about trying to store the pixels from my loop into an array, and then draw them all together after my loop. Again though, I am not quite sure what the best process would be to store and then retrieve pixels into and from an array.
Any help on speeding up this process would be great.
for (int i = 0; i < 260000; i++) {
    // Pick a transform/color bucket at random
    float RN = drand48();
    for (int j = 1; j < numBuckets; j++) {
        if (RN < bucket[j]) {
            col = j;
            CGContextSetFillColor(context, CGColorGetComponents([UIColor colorWithRed:(colorSelector[j][0]) green:(colorSelector[j][1]) blue:(colorSelector[j][2]) alpha:(1)].CGColor));
            break;
        }
    }

    // Apply the affine transform for the chosen bucket
    xT = myTextFieldArray[1][1][col]*x1 + myTextFieldArray[1][2][col]*y1 + myTextFieldArray[1][5][col];
    yT = myTextFieldArray[1][3][col]*x1 + myTextFieldArray[1][4][col]*y1 + myTextFieldArray[1][6][col];
    x1 = xT;
    y1 = yT;

    if (i > 10000) {
        // Once the bounds have settled, plot the point
        CGContextFillRect(context, CGRectMake(xOrigin+(xT-xMin)*sizeScalar, yOrigin-(yT-yMin)*sizeScalar, .5, .5));
    }
    else if (i < 10000) {
        // Still settling: track the bounding box of the attractor
        if (x1 < xMin) {
            xMin = x1;
        }
        else if (x1 > xMax) {
            xMax = x1;
        }
        if (y1 < yMin) {
            yMin = y1;
        }
        else if (y1 > yMax) {
            yMax = y1;
        }
    }
    else if (i == 10000) {
        // Derive the scale and origin from the bounding box
        if (xMax - xMin > yMax - yMin) {
            sizeScalar = 960/(xMax - xMin);
            yOrigin = 1000-(1000-sizeScalar*(yMax-yMin))/2;
        }
        else {
            sizeScalar = 960/(yMax - yMin);
            xOrigin = (1000-sizeScalar*(xMax-xMin))/2;
        }
    }
}
Edit
I created a multidimensional array to store UIColors into, so I could use a bitmap to draw my image. It is significantly faster, but my colors are not working appropriately now.
Here is where I am storing my UIColors into the array:
int xPixel = xOrigin+(xT-xMin)*sizeScalar;
int yPixel = yOrigin-(yT-yMin)*sizeScalar;
pixelArray[1000-yPixel][xPixel] = customColors[col];
Here is my drawing stuff:
CGDataProviderRef provider = CGDataProviderCreateWithData(nil, pixelArray, 1000000, nil);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGImageRef image = CGImageCreate(1000,                       // width
                                 1000,                       // height
                                 8,                          // bits per component
                                 32,                         // bits per pixel
                                 4000,                       // bytes per row
                                 colorSpace,
                                 kCGBitmapByteOrder32Big | kCGImageAlphaNoneSkipLast,
                                 provider,
                                 nil,                        // no decode
                                 NO,                         // no interpolation
                                 kCGRenderingIntentDefault); // default rendering
CGContextDrawImage(context, self.bounds, image);
Not only are the colors not what they are supposed to be, but every time I render my image, the colors are completely different from the previous time. I have been testing different stuff with the colors, but I still have no idea why the colors are wrong, and I'm even more confused how they keep changing.
Per-pixel drawing — with complicated calculations for each pixel, like fractal rendering — is one of the hardest things you can ask a computer to do. Each of the other answers here touches on one aspect of its difficulty, but that's not quite all. (Luckily, this kind of rendering is also something that modern hardware is optimized for, if you know what to ask it for. I'll get to that.)
Both @jcaron and @JustinMeiners note that vector drawing operations (even rect fill) in CoreGraphics take a penalty for CPU-based rasterization. Manipulating a buffer of bitmap data would be faster, but not a lot faster.
Getting that buffer onto the screen also takes time, especially if you're having to go through a process of creating bitmap image buffers and then drawing them in a CG context: that's doing a lot of sequential drawing work on the CPU and a lot of memory-bandwidth work to copy that buffer around. So @JustinMeiners is right that direct access to GPU texture memory would be a big help.
However, if you're still filling your buffer in CPU code, you're still hampered by two costs (at best, worse if you do it naively):
sequential work to render each pixel
memory transfer cost from texture memory to frame buffer when rendering
@JustinMeiners' answer is good for his use case: image sequences are pre-rendered, so he knows exactly what each pixel is going to be and he just has to schlep it into texture memory. But your use case requires a lot of per-pixel calculation.
Luckily, per-pixel calculations are what GPUs are designed for! Welcome to the world of pixel shaders. For each pixel on the screen, you can run an independent calculation to determine the relationship of that point to your fractal set and thus what color to draw it in. The GPU can run that calculation in parallel for many pixels at once, and its output goes straight to the screen, so there's no memory overhead from dumping a bitmap into the framebuffer.
One easy way to work with pixel shaders on iOS is SpriteKit — it can handle most of the necessary OpenGL/Metal setup for you, so all you have to write is the per-pixel algorithm in GLSL (actually, a subset of GLSL that gets automatically translated to Metal shader language on Metal-supported devices). Here's a good tutorial on that, and here's another on OpenGL ES pixel shaders for iOS in general.
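As a rough, hedged illustration of the SpriteKit route (the node names and the Mandelbrot-style escape-time coloring below are my own stand-ins, not code from those tutorials, and not the same iterated-function-system fractal as in the question), a full-screen sprite with a custom fragment shader might look something like this:

// Minimal sketch: a full-screen SKSpriteNode whose fragment shader computes an
// escape-time value per pixel. Assumes an existing SKScene; names are illustrative.
NSString *fractalShaderSource =
    @"void main() {\n"
    "    // Map the sprite's texture coordinates onto a region of the complex plane\n"
    "    vec2 c = vec2(-2.5, -1.5) + v_tex_coord * vec2(3.5, 3.0);\n"
    "    vec2 z = vec2(0.0);\n"
    "    float n = 0.0;\n"
    "    for (int i = 0; i < 64; i++) {\n"
    "        z = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;\n"
    "        if (dot(z, z) > 4.0) { break; }\n"
    "        n += 1.0;\n"
    "    }\n"
    "    float t = n / 64.0;\n"
    "    gl_FragColor = vec4(t, t * t, sqrt(t), 1.0);\n"
    "}\n";

SKSpriteNode *fractalNode = [SKSpriteNode spriteNodeWithColor:[UIColor blackColor]
                                                         size:scene.size];
fractalNode.position = CGPointMake(CGRectGetMidX(scene.frame), CGRectGetMidY(scene.frame));
fractalNode.shader = [SKShader shaderWithSource:fractalShaderSource];
[scene addChild:fractalNode];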
If you really want to change many different pixels individually, your best option is probably to allocate a chunk of memory (of size width * height * bytes per pixel), make the changes directly in memory, and then convert the whole thing into a bitmap at once with CGBitmapContextCreateWithData.
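A hedged sketch of that idea, reusing the variables from the question's loop (xOrigin, sizeScalar, colorSelector, col, and so on) and assuming an RGBA byte layout, might look like this:

// Allocate a raw RGBA buffer instead of drawing rects into the context.
size_t width = 1000, height = 1000;
size_t bytesPerRow = width * 4;                      // 4 bytes per pixel (RGBA)
uint8_t *pixels = calloc(height * bytesPerRow, 1);   // zeroed = transparent black

// Inside the iteration loop, instead of CGContextFillRect:
int px = xOrigin + (xT - xMin) * sizeScalar;
int py = yOrigin - (yT - yMin) * sizeScalar;
if (px >= 0 && px < (int)width && py >= 0 && py < (int)height) {
    size_t offset = py * bytesPerRow + px * 4;
    pixels[offset + 0] = (uint8_t)(colorSelector[col][0] * 255.0);  // R
    pixels[offset + 1] = (uint8_t)(colorSelector[col][1] * 255.0);  // G
    pixels[offset + 2] = (uint8_t)(colorSelector[col][2] * 255.0);  // B
    pixels[offset + 3] = 255;                                       // A
}

// After the loop, wrap the buffer in a bitmap context and draw it once:
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef bitmapContext = CGBitmapContextCreateWithData(pixels, width, height, 8, bytesPerRow,
                                                           colorSpace,
                                                           kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big,
                                                           NULL, NULL);
CGImageRef image = CGBitmapContextCreateImage(bitmapContext);
CGContextDrawImage(context, self.bounds, image);

CGImageRelease(image);
CGContextRelease(bitmapContext);
CGColorSpaceRelease(colorSpace);
free(pixels);

The key point is that each plotted point becomes a few byte writes instead of a Core Graphics call, and only one draw hits the context at the end.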
There may be even faster methods than this (see Justin's answer).
If you want to maximize render speed I would recommend bitmap rendering. Vector rasterization is much slower and CGContext drawing isn't really intended for high performance realtime rendering.
I faced a similar technical challenge and found CVOpenGLESTextureCacheRef to be the fastest. The texture cache allows you to upload a bitmap directly into graphics memory for fast rendering. Rendering utilizes OpenGL, but because it's just a 2D fullscreen image, you really don't need to learn much about OpenGL to use it.
You can see an example I wrote of using the texture cache here: https://github.com/justinmeiners/image-sequence-streaming
My original question related to this is here:
How to directly update pixels - with CGImage and direct CGDataProvider
My project renders bitmaps from files, so it is a little bit different, but you could look at ISSequenceView.m for an example of how to use the texture cache and set up OpenGL for this kind of rendering.
Your rendering procedure could look something like this:
1. Draw to buffer (raw bytes)
2. Lock texture cache
3. Copy buffer to texture cache
4. Unlock texture cache.
5. Draw fullscreen quad with texture
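A hedged sketch of those steps with CVOpenGLESTextureCacheRef (assuming you already have an EAGLContext and a fullscreen-quad draw call; the buffer size and variable names are illustrative, not from my project):

#import <CoreVideo/CoreVideo.h>
#import <OpenGLES/ES2/gl.h>
#import <OpenGLES/ES2/glext.h>

// One-time setup: a texture cache tied to your EAGLContext, plus an IOSurface-backed pixel buffer.
CVOpenGLESTextureCacheRef textureCache = NULL;
CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, NULL, glContext, NULL, &textureCache);

NSDictionary *attributes = @{ (__bridge NSString *)kCVPixelBufferIOSurfacePropertiesKey : @{} };
CVPixelBufferRef pixelBuffer = NULL;
CVPixelBufferCreate(kCFAllocatorDefault, 1024, 1024, kCVPixelFormatType_32BGRA,
                    (__bridge CFDictionaryRef)attributes, &pixelBuffer);

// Each frame: write raw BGRA bytes into the pixel buffer (steps 1-4)...
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
uint8_t *base = (uint8_t *)CVPixelBufferGetBaseAddress(pixelBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer);
// ... fill base[y * bytesPerRow + x * 4 + channel] here ...
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

// ...then get a GL texture for it and draw the fullscreen quad (step 5):
CVOpenGLESTextureRef texture = NULL;
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache, pixelBuffer, NULL,
                                             GL_TEXTURE_2D, GL_RGBA, 1024, 1024,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);
glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
// draw the quad here, then release the per-frame texture object
CFRelease(texture);
CVOpenGLESTextureCacheFlush(textureCache, 0);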
Related
I have two questions:
First, is there any more direct, sane way to go from a texture atlas image to a texture array in WebGL than what I'm doing below? I've not tried this, but doing it entirely in WebGL seems possible, though it would be four times the work and I'd still have to make two round trips to the GPU to do it.
And am I right that because buffer data for texImage3D() must come from a PIXEL_UNPACK_BUFFER, this data must come directly from the CPU side? I.e., there is no way to copy from one block of GPU memory to a PIXEL_UNPACK_BUFFER without copying it to the CPU first. I'm pretty sure the answer to this is a hard "no".
In case my questions themselves are stupid (and they may be), my ultimate goal here is simply to convert a texture atlas PNG to a texture array. From what I've tried, the fastest way to do this by far is via PIXEL_UNPACK_BUFFER, rather than extracting each sub-image and sending them in one at a time, which for large atlases is extremely slow.
This is basically how I'm currently getting my pixel data.
const imageToBinary = async (image: HTMLImageElement) => {
    const canvas = document.createElement('canvas');
    canvas.width = image.width;
    canvas.height = image.height;

    const context = canvas.getContext('2d');
    if (!context) {
        throw new Error('Could not get a 2D context');
    }

    context.drawImage(image, 0, 0);
    const imageData = context.getImageData(0, 0, image.width, image.height);
    return imageData.data;
};
So, I'm creating an HTMLImageElement object, which contains the uncompressed pixel data I want, but has no methods to get at it directly. Then I'm creating a 2D context version containing the same pixel data a second time. Then I'm repopulating the GPU with the same pixel data a third time. Seems bonkers to me, but I don't see a way around it.
I'm working on an iOS app that requires drawing Bézier curves in real time in response to the user's input. At first, I decided to try using CoreGraphics, which has a fantastic vector drawing API. However, I quickly discovered that performance was painfully, excruciatingly slow, to the point where the framerate started dropping severely with just ONE curve on my retina iPad. (Admittedly, this was a quick test with inefficient code. For example, the curve was getting redrawn every frame. But surely today's computers are fast enough to handle drawing a simple curve every 1/60th of a second, right?!)
After this experiment, I switched to OpenGL and the MonkVG library, and I couldn't be happier. I can now render HUNDREDS of curves simultaneously without any framerate drop, with only a minimal impact on fidelity (for my use case).
Is it possible that I misused CoreGraphics somehow (to the point where it was several orders of magnitude slower than the OpenGL solution), or is performance really that terrible? My hunch is that the problem lies with CoreGraphics, based on the number of StackOverflow/forum questions and answers regarding CG performance. (I've seen several people state that CG isn't meant to go in a run loop, and that it should only be used for infrequent rendering.) Why is this the case, technically speaking?
If CoreGraphics really is that slow, how on earth does Safari work so smoothly? I was under the impression that Safari isn't hardware-accelerated, and yet it has to display hundreds (if not thousands) of vector characters simultaneously without dropping any frames.
More generally, how do applications with heavy vector use (browsers, Illustrator, etc.) stay so fast without hardware acceleration? (As I understand it, many browsers and graphics suites now come with a hardware acceleration option, but it's often not turned on by default.)
UPDATE:
I have written a quick test app to more accurately measure performance. Below is the code for my custom CALayer subclass.
With NUM_PATHS set to 5 and NUM_POINTS set to 15 (5 curve segments per path), the code runs at 20fps in non-retina mode and 6fps in retina mode on my iPad 3. The profiler lists CGContextDrawPath as having 96% of the CPU time. Yes — obviously, I can optimize by limiting my redraw rect, but what if I really, truly needed full-screen vector animation at 60fps?
OpenGL eats this test for breakfast. How is it possible for vector drawing to be so incredibly slow?
#import "CGTLayer.h"
#implementation CGTLayer
- (id) init
{
self = [super init];
if (self)
{
self.backgroundColor = [[UIColor grayColor] CGColor];
displayLink = [[CADisplayLink displayLinkWithTarget:self selector:#selector(updatePoints:)] retain];
[displayLink addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
initialized = false;
previousTime = 0;
frameTimer = 0;
}
return self;
}
- (void) updatePoints:(CADisplayLink*)displayLink
{
for (int i = 0; i < NUM_PATHS; i++)
{
for (int j = 0; j < NUM_POINTS; j++)
{
points[i][j] = CGPointMake(arc4random()%768, arc4random()%1024);
}
}
for (int i = 0; i < NUM_PATHS; i++)
{
if (initialized)
{
CGPathRelease(paths[i]);
}
paths[i] = CGPathCreateMutable();
CGPathMoveToPoint(paths[i], &CGAffineTransformIdentity, points[i][0].x, points[i][0].y);
for (int j = 0; j < NUM_POINTS; j += 3)
{
CGPathAddCurveToPoint(paths[i], &CGAffineTransformIdentity, points[i][j].x, points[i][j].y, points[i][j+1].x, points[i][j+1].y, points[i][j+2].x, points[i][j+2].y);
}
}
[self setNeedsDisplay];
initialized = YES;
double time = CACurrentMediaTime();
if (frameTimer % 30 == 0)
{
NSLog(#"FPS: %f\n", 1.0f/(time-previousTime));
}
previousTime = time;
frameTimer += 1;
}
- (void)drawInContext:(CGContextRef)ctx
{
// self.contentsScale = [[UIScreen mainScreen] scale];
if (initialized)
{
CGContextSetLineWidth(ctx, 10);
for (int i = 0; i < NUM_PATHS; i++)
{
UIColor* randomColor = [UIColor colorWithRed:(arc4random()%RAND_MAX/((float)RAND_MAX)) green:(arc4random()%RAND_MAX/((float)RAND_MAX)) blue:(arc4random()%RAND_MAX/((float)RAND_MAX)) alpha:1];
CGContextSetStrokeColorWithColor(ctx, randomColor.CGColor);
CGContextAddPath(ctx, paths[i]);
CGContextStrokePath(ctx);
}
}
}
#end
You really should not compare Core Graphics drawing with OpenGL; you are comparing completely different features designed for very different purposes.
In terms of image quality, Core Graphics and Quartz are going to be far superior to OpenGL with less effort. The Core Graphics framework is designed for optimal appearance: naturally antialiased lines and curves, and the polish associated with Apple UIs. But this image quality comes at a price: rendering speed.
OpenGL, on the other hand, is designed with speed as a priority. High-performance, fast drawing is hard to beat with OpenGL. But this speed comes at a cost: it is much harder to get smooth and polished graphics with OpenGL. There are many different strategies for doing something as "simple" as antialiasing in OpenGL, which is handled much more easily by Quartz/Core Graphics.
First, see Why is UIBezierPath faster than Core Graphics path? and make sure you're configuring your path optimally. By default, CGContext adds a lot of "pretty" options to paths that can add a lot of overhead. If you turn these off, you will likely find dramatic speed improvements.
The next problem I've found with Core Graphics Bézier curves is when you have many components in a single curve (I was seeing problems when I went over about 3000-5000 elements). I found very surprising amounts of time spent in CGPathAdd.... Reducing the number of elements in your path can be a major win. From my talks with the Core Graphics team last year, this may have been a bug in Core Graphics and may have been fixed. I haven't re-tested.
EDIT: I'm seeing 18-20FPS in Retina on an iPad 3 by making the following changes:
Move the CGContextStrokePath() outside the loop. You shouldn't stroke every path. You should stroke once at the end. This takes my test from ~8FPS to ~12FPS.
Turn off anti-aliasing (which is probably turned off by default in your OpenGL tests):
CGContextSetShouldAntialias(ctx, false);
That gets me to 18-20FPS (Retina) and up to around 40FPS non-Retina.
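Putting those two changes together in the poster's -drawInContext: would look roughly like this (note that stroking once means a single stroke color for all paths, so the per-path random colors go away):

- (void)drawInContext:(CGContextRef)ctx
{
    if (!initialized)
    {
        return;
    }

    CGContextSetShouldAntialias(ctx, false);   // match the typical OpenGL comparison
    CGContextSetLineWidth(ctx, 10);
    CGContextSetStrokeColorWithColor(ctx, [UIColor redColor].CGColor);

    // Accumulate every path, then rasterize them all with one stroke call
    for (int i = 0; i < NUM_PATHS; i++)
    {
        CGContextAddPath(ctx, paths[i]);
    }
    CGContextStrokePath(ctx);
}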
I don't know what you're seeing in OpenGL. Remember that Core Graphics is designed to make things beautiful; OpenGL is designed to make things fast. Core Graphics relies on OpenGL, so I would always expect well-written OpenGL code to be faster.
Disclaimer: I'm the author of MonkVG.
The biggest reason that MonkVG is so much faster than CoreGraphics is actually not so much that it is implemented with OpenGL ES as a render backing, but that it "cheats" by tessellating the contours into polygons before any rendering is done. The contour tessellation is actually painfully slow, and if you were to dynamically generate contours you would see a big slowdown. The great benefit of an OpenGL backing (versus CoreGraphics using direct bitmap rendering) is that any transform such as translation, rotation, or scaling does not force a complete re-tessellation of the contours; it's essentially "free".
Your slowdown is because of this line of code:
[self setNeedsDisplay];
You need to change this to:
[self setNeedsDisplayInRect:changedRect];
It's up to you to calculate what rectangle has changed every frame, but if you do this properly, you will likely see over an order of magnitude performance improvement with no other changes.
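A minimal sketch of that, inside the poster's -updatePoints: after the paths are rebuilt (previousDirtyRect is an assumed ivar, and the 5-point inset accounts for half of the 10-point line width):

CGRect dirtyRect = CGRectNull;
for (int i = 0; i < NUM_PATHS; i++)
{
    // Pad each path's bounds so the stroke width is covered
    CGRect pathBounds = CGRectInset(CGPathGetBoundingBox(paths[i]), -5.0, -5.0);
    dirtyRect = CGRectUnion(dirtyRect, pathBounds);
}

// Invalidate both where the paths are now and where they were last frame
[self setNeedsDisplayInRect:CGRectUnion(dirtyRect, previousDirtyRect)];
previousDirtyRect = dirtyRect;

In the random-points stress test above the paths cover most of the screen, so the gain there is limited; in a real app where only a small region changes per frame, this is where the order-of-magnitude win comes from.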
I'm new to OpenGL ES and am teaching myself how to program iOS games. I'm currently playing with a project that I would like to put a HUD over with some custom text. I don't want to do this with a UILabel, and I currently have no idea how to use quads to cut up a PNG full of character glyphs and map them onto text for display. I would like the end result to be passing a simple string to a command/method and having the output displayed using the textures/bitmap for the quads. Say glPrint("Hello World");. Would anyone be able to point me in the proper direction? There doesn't seem to be a single good tutorial on how to do this for OpenGL ES 2.0 (just OpenGL). I also want to avoid using 3rd-party APIs. I really need/want to understand how to tackle this.
When I was getting started with OpenGL ES for my current 2D project I used Ray's tutorial, which helped me get a handle on rendering textured 2D quads. In conjunction with his 3D OpenGL ES tutorial, you might be able to piece together what you want to do. Note that you probably wouldn't render every single quad separately as in the tutorial, since that is very inefficient. Instead, you would gather all of the vertices for the characters into two big arrays/vertex buffers and batch-render the characters. The basic flow for each frame would probably be: render the 3D scene as you already do (normal perspective projection matrix, your 3D vertex data, your shaders). Then, immediately after, for the text: pass in an orthographic projection matrix, bind your font texture (generally generated earlier with the GLKTextureLoader class) to the active texture unit, generate two big arrays of texture and geometric vertices for the characters (or update your VBOs if the text has changed), pass those in, and batch-render all of the letters at once with either glDrawArrays or glDrawElements (which requires indices).
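As a hedged sketch of that text pass (GLKBaseEffect is used for brevity; the vertex struct, the textVBO/vertices/vertexCount variables, and the atlas file name are all illustrative):

#import <GLKit/GLKit.h>
#include <stddef.h>

typedef struct {
    GLfloat position[2];
    GLfloat texCoord[2];
} TexturedVertex;

// One-time setup: load the font atlas and configure an orthographic effect for 2D text.
NSError *error = nil;
GLKTextureInfo *fontTexture =
    [GLKTextureLoader textureWithContentsOfFile:[[NSBundle mainBundle] pathForResource:@"font_atlas" ofType:@"png"]
                                        options:nil
                                          error:&error];

GLKBaseEffect *textEffect = [[GLKBaseEffect alloc] init];
textEffect.transform.projectionMatrix = GLKMatrix4MakeOrtho(0, viewWidth, 0, viewHeight, -1, 1);
textEffect.texture2d0.name = fontTexture.name;
textEffect.texture2d0.enabled = GL_TRUE;

// Each frame, after the 3D pass: upload all character quads at once and draw them in one call.
[textEffect prepareToDraw];
glBindBuffer(GL_ARRAY_BUFFER, textVBO);
glBufferData(GL_ARRAY_BUFFER, vertexCount * sizeof(TexturedVertex), vertices, GL_DYNAMIC_DRAW);
glEnableVertexAttribArray(GLKVertexAttribPosition);
glVertexAttribPointer(GLKVertexAttribPosition, 2, GL_FLOAT, GL_FALSE, sizeof(TexturedVertex),
                      (void *)offsetof(TexturedVertex, position));
glEnableVertexAttribArray(GLKVertexAttribTexCoord0);
glVertexAttribPointer(GLKVertexAttribTexCoord0, 2, GL_FLOAT, GL_FALSE, sizeof(TexturedVertex),
                      (void *)offsetof(TexturedVertex, texCoord));
glDrawArrays(GL_TRIANGLES, 0, vertexCount);   // 6 vertices (2 triangles) per character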
Also, as I'm new to OpenGL myself, some of this may be wrong or inefficient. I've yet to use OpenGL ES to render anything 3D, so I'm not sure what other state changes (enabling, disabling, etc.) besides a different projection matrix might be needed between rendering your 3D scene and the 2D scene (text).
It seems that drawing text using only OpenGL is a relatively difficult and tedious task, so if you just want to render a HUD overlay displaying frame rates and other things you are much better off using UILabels and saving yourself the trouble, especially if your project is not very complex. This also prevents you from having to deal with wrapping, kerning, font sizes, colors, different languages and a load of other stuff that greatly complicates text rendering if you need anything more complex.
Rather than tracking the location of each letter, why not use Core Graphics to draw your entire string into a bitmap, then upload that as a texture? You'd just need to get the dimensions from your bitmap to know what size quad to draw for that text string.
Within my open source GPUImage framework, I have an input class called a GPUImageUIElement that does something similar. The relevant code from that input is as follows:
CGSize layerPixelSize = [self layerSizeInPixels];
GLubyte *imageData = (GLubyte *) calloc(1, (int)layerPixelSize.width * (int)layerPixelSize.height * 4);
CGColorSpaceRef genericRGBColorspace = CGColorSpaceCreateDeviceRGB();
CGContextRef imageContext = CGBitmapContextCreate(imageData, (int)layerPixelSize.width, (int)layerPixelSize.height, 8, (int)layerPixelSize.width * 4, genericRGBColorspace, kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst);
CGContextTranslateCTM(imageContext, 0.0f, layerPixelSize.height);
CGContextScaleCTM(imageContext, layer.contentsScale, -layer.contentsScale);
[layer renderInContext:imageContext];
CGContextRelease(imageContext);
CGColorSpaceRelease(genericRGBColorspace);
glBindTexture(GL_TEXTURE_2D, outputTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (int)layerPixelSize.width, (int)layerPixelSize.height, 0, GL_BGRA, GL_UNSIGNED_BYTE, imageData);
free(imageData);
This code takes a CALayer (either directly or from the backing layer of a UIView) and renders its contents to a texture. I've already initialized the texture before this, so the code sets up a bitmap context, renders the layer into that context using -renderInContext:, and then uploads that bitmap to the texture for use in OpenGL ES.
The helper method -layerSizeInPixels just accounts for the current Retina scale factor as follows:
- (CGSize)layerSizeInPixels;
{
CGSize pointSize = layer.bounds.size;
return CGSizeMake(layer.contentsScale * pointSize.width, layer.contentsScale * pointSize.height);
}
If you used a UILabel for your view and had it autosize to fit its text, you could set the text on it, use the above to render and upload your texture, and then take the pixel size of the element to determine your quad size. However, it would probably be more efficient to just draw the text yourself using -drawAtPoint:withFont: or the like with an NSString.
Using Core Graphics to render your text makes it easy to manipulate the text as an NSString and use all of Core Graphics' typesetting capabilities instead of rolling your own.
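A hedged sketch of that variant (using the UIStringDrawing methods of the era; outputTexture is assumed to already exist, as in the snippet above, and the font choice is arbitrary):

NSString *text = @"Hello World";
UIFont *font = [UIFont systemFontOfSize:32.0];
CGSize textSize = [text sizeWithFont:font];

GLubyte *imageData = (GLubyte *)calloc(1, (int)textSize.width * (int)textSize.height * 4);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef textContext = CGBitmapContextCreate(imageData, (int)textSize.width, (int)textSize.height,
                                                 8, (int)textSize.width * 4, colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedFirst);

// Flip the context so UIKit's top-left-origin text isn't drawn upside down
CGContextTranslateCTM(textContext, 0.0f, textSize.height);
CGContextScaleCTM(textContext, 1.0f, -1.0f);

UIGraphicsPushContext(textContext);
[[UIColor whiteColor] set];
[text drawAtPoint:CGPointZero withFont:font];
UIGraphicsPopContext();

glBindTexture(GL_TEXTURE_2D, outputTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, (int)textSize.width, (int)textSize.height, 0,
             GL_BGRA, GL_UNSIGNED_BYTE, imageData);

CGContextRelease(textContext);
CGColorSpaceRelease(colorSpace);
free(imageData);

textSize then gives you the quad dimensions directly.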
UPDATE
I got around CG's limitations by drawing everything with OpenGL. Still some glitches, but so far it's working much, much faster.
Some interesting points:
GLKView: That's an iOS-specific view, and it helps a lot in setting up the OpenGL context and rendering loop. If you're not on iOS, I'm afraid you're on your own.
Shader precision: The precision of shader variables in the current version of OpenGL ES (2.0) is 16-bit. That was a little low for my purposes, so I emulated 32-bit arithmetic with pairs of 16-bit variables.
GL_LINES: OpenGL ES can natively draw simple lines. Not very well (no joins, no caps, see the purple/grey line at the top of the screenshot below), but to improve that you'll have to write a custom shader, convert each line into a triangle strip and pray that it works! (supposedly that's how browsers do it when they tell you that Canvas2D is GPU-accelerated)
Draw as little as possible. I suppose that makes sense, but you can frequently avoid rendering things that are, for instance, outside of the viewport.
OpenGL ES has no support for filled polygons, so you have to tessellate them yourself. Consider using iPhone-GLU: that's a port of the MESA code and it's pretty good, although it's a little hard to use (no standard Objective-C interface).
Original Question
I'm trying to draw lots of CGPaths (typically more than 1000) in the drawRect method of my scroll view, which is refreshed when the user pans with his finger. I have the same application in JavaScript for the browser, and I'm trying to port it to an iOS native app.
The iOS test code is (with 100 line operations, path being a pre-made CGMutablePathRef):
- (void) drawRect:(CGRect)rect {
    // Start the timer
    BSInitClass(@"Renderer");
    BSStartTimedOp(@"Rendering");

    // Get the context
    CGContextRef context = UIGraphicsGetCurrentContext();

    CGContextSetLineWidth(context, 2.0);
    CGContextSetFillColorWithColor(context, [[UIColor redColor] CGColor]);
    CGContextSetStrokeColorWithColor(context, [[UIColor blueColor] CGColor]);
    CGContextTranslateCTM(context, 800, 800);

    // Draw the points
    CGContextAddPath(context, path);
    CGContextStrokePath(context);

    // Display the elapsed time
    BSEndTimedOp(@"Rendering");
}
In JavaScript, for reference, the code is (with 10000 line operations):
window.onload = function() {
    canvas = document.getElementById("test");
    ctx = canvas.getContext("2d");

    // Prepare the points before drawing
    var data = [];
    for (var i = 0; i < 100; i++) data.push({x: Math.random()*canvas.width, y: Math.random()*canvas.height});

    // Draw those points, and write the elapsed time
    var __start = new Date().getTime();
    for (var i = 0; i < 100; i++) {
        for (var j = 0; j < data.length; j++) {
            var d = data[j];
            if (j == 0) ctx.moveTo(d.x, d.y);
            else ctx.lineTo(d.x, d.y);
        }
    }
    ctx.stroke();

    document.write("Finished in " + (new Date().getTime() - __start) + "ms");
};
Now, I'm much more proficient in optimizing JavaScript than I am at iOS, but, after some profiling, it seems that CGPath's overhead is absolutely, incredibly bad compared to JavaScript. Both snippets run at about the same speed on a real iOS device, and the JavaScript code has 100x the number of line operations of the Quartz2D code!
EDIT: Here is the top of the time profiler in Instruments :
Running Time Self Symbol Name
6487.0ms 77.8% 6487.0 aa_render
449.0ms 5.3% 449.0 aa_intersection_event
112.0ms 1.3% 112.0 CGSColorMaskCopyARGB8888
73.0ms 0.8% 73.0 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::LookupBucketFor(objc_object* const&, std::pair<objc_object*, unsigned long>*&) const
69.0ms 0.8% 69.0 CGSFillDRAM8by1
66.0ms 0.7% 66.0 ml_set_interrupts_enabled
46.0ms 0.5% 46.0 objc_msgSend
42.0ms 0.5% 42.0 floor
29.0ms 0.3% 29.0 aa_ael_insert
It is my understanding that this should be much faster on iOS, simply because the code is native... So, do you know :
...what I am doing wrong here?
...and if there's another, better solution to draw that many lines in real-time?
Thanks a lot!
As you described in your question, using OpenGL is the right solution.
Theoretically, you can emulate any kind of graphics drawing with OpenGL, but you need to implement all the shape algorithms yourself. For example, you need to extrude the corners of lines into geometry yourself. There's no real concept of lines in OpenGL; line drawing is a kind of utility feature, used almost only for debugging. You should treat everything as a set of triangles.
I believe 16-bit floats are enough for most drawings. If you're using coordinates with large values, consider dividing space into multiple sectors to keep the coordinate values smaller. Float precision degrades as values become very large or very small.
Update
I think you will run into this issue soon if you try to display UIKit elements over an OpenGL view. Unfortunately, I haven't been able to find a solution yet either.
How to synchronize OpenGL drawing with UIKit updates
You killed CGPath performance by using CGContextAddPath.
Apple explicitly says this will run slowly - if you want it to run fast, you are required to attach your CGPath objects to CAShapeLayer instances.
You're doing dynamic, runtime drawing - blocking all of Apple's performance optimizations. Try switching to CALayer - especially CAShapeLayer - and you should see performance improve by a large amount.
(NB: there are other performance bugs in CG rendering that might affect this use case, such as obscure default settings in CG/Quartz/CA, but ... you need to get rid of the bottleneck on CGContextAddPath first)
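A minimal sketch of the CAShapeLayer approach, assuming your pre-built path and a hypothetical panOffset from your gesture handling:

#import <QuartzCore/QuartzCore.h>

CAShapeLayer *shapeLayer = [CAShapeLayer layer];
shapeLayer.frame = self.bounds;
shapeLayer.path = path;                        // your pre-made CGPathRef
shapeLayer.strokeColor = [UIColor blueColor].CGColor;
shapeLayer.fillColor = nil;                    // stroke only
shapeLayer.lineWidth = 2.0;
[self.layer addSublayer:shapeLayer];

// Panning becomes a cheap layer transform instead of a full re-rasterization in drawRect:
[shapeLayer setAffineTransform:CGAffineTransformMakeTranslation(panOffset.x, panOffset.y)];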
I am working on an iOS application where the user can apply a certain set of photo filters. Each filter is basically a set of Photoshop actions with specific parameters. These actions are:
Levels adjustment
Brightness / Contrast
Hue / Saturation
Single and multiple overlay
I've reproduced all these actions in my code using arithmetic expressions that loop through all the pixels in the image. But when I run my app on an iPhone 4, each filter takes about 3-4 seconds to apply, which is quite a long time for the user to wait. The image size is 640 x 640 px, which is @2x of my view size because it's displayed on a Retina display. I've found that my main problem is the levels modification, which calls the pow() C function each time I need to adjust the gamma. I am using floats, not doubles, of course, because ARMv6 and ARMv7 are slow with doubles. I tried enabling and disabling Thumb and got the same result.
Here is an example of the simplest filter in my app, which runs pretty fast (about 2 seconds). The other filters include more expressions and pow() calls, which makes them slower:
https://gist.github.com/1156760
I've seen some solutions that use the Accelerate Framework's vDSP matrix transformations for fast image modifications. I've also seen OpenGL ES solutions. I am not sure that they can meet my needs, but perhaps it's just a matter of translating my set of changes into a good convolution matrix?
Any advice would be helpful.
Thanks,
Andrey.
For the filter in your example code, you could use a lookup table to make it much faster. I assume your input image is 8 bits per color and you are converting it to float before passing it to this function. For each color, this only gives 256 possible values and therefore only 256 possible output values. You could precompute these and store them in an array. This would avoid the pow() calculation and the bounds checking since you could factor them into the precomputation.
It would look something like this:
unsigned char table[256];
for (int i = 0; i < 256; i++) {
    float tmp = pow((float)i/255.0f, 1.3f) * 255.0;
    table[i] = tmp > 255 ? 255 : (unsigned char)tmp;
}

for (int i = 0; i < length; ++i)
    m_OriginalPixelBuf[i] = table[m_OriginalPixelBuf[i]];
In this case, you only have to perform pow() 256 times instead of 3*640*640 times. You would also avoid the branching caused by the bounds checking in your main image loop which can be costly. You would not have to convert to float either.
An even faster way may be to precompute the table outside the program and just put the 256 coefficients in the code.
None of the operations you have listed there should require a convolution or even a matrix multiply. They are all pixel-wise operations, meaning that each output pixel only depends on the single corresponding input pixel. You would need to consider convolution for operations like blurring or sharpening where multiple input pixels affect a single output pixel.
If you're looking for the absolute fastest way to do this, you're going to want to use the GPU to handle the processing. It's built to do massively parallel operations, like color adjustments on single pixels.
As I've mentioned in other answers, I measured a 14X - 28X improvement in performance when running an image processing operation using OpenGL ES instead of on the CPU. You can use the Accelerate framework to do faster on-CPU image manipulation (I believe Apple claims around a ~4-5X boost is possible here), but it won't be as fast as OpenGL ES. It can be easier to implement, however, which is why I've sometimes used Accelerate for this over OpenGL ES.
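For the Accelerate route, a hedged sketch of a pixel-wise brightness/contrast pass over a float buffer (the function and the mid-grey pivot are my own framing, not an Apple recipe):

#include <Accelerate/Accelerate.h>

// output[i] = clamp(input[i] * contrast + offset, 0, 255), vectorized over the whole buffer.
void adjustBrightnessContrast(const float *input, float *output, vDSP_Length length,
                              float contrast, float brightness)
{
    float scale = contrast;
    float offset = brightness + 127.5f * (1.0f - contrast);    // pivot the contrast around mid-grey

    vDSP_vsmsa(input, 1, &scale, &offset, output, 1, length);  // multiply-add in one pass

    float lo = 0.0f, hi = 255.0f;
    vDSP_vclip(output, 1, &lo, &hi, output, 1, length);        // clamp back into range
}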
iOS 5.0 also brings over Core Image from the desktop, which gives you a nice wrapper around these kind of on-GPU image adjustments. However, there are some limitations to the iOS Core Image implementation that you don't have when working with OpenGL ES 2.0 shaders directly.
I present an example of an OpenGL ES 2.0 shader image filter in my article here. The hardest part about doing this kind of processing is getting the OpenGL ES scaffolding set up. Using my sample application there, you should be able to extract that setup code and apply your own filters using it. To make this easier, I've created an open source framework called GPUImage that handles all of the OpenGL ES interaction for you. It has almost every filter you list above, and most run in under 2.5 ms for a 640x480 frame of video on an iPhone 4, so they're far faster than anything processed on the CPU.
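As a quick hedged example of what using GPUImage on a still image looks like (the filter choice and values here are arbitrary):

#import "GPUImage.h"

UIImage *inputImage = [UIImage imageNamed:@"photo.png"];

GPUImageBrightnessFilter *brightnessFilter = [[GPUImageBrightnessFilter alloc] init];
brightnessFilter.brightness = 0.1;   // -1.0 to 1.0, where 0.0 leaves the image unchanged

UIImage *filteredImage = [brightnessFilter imageByFilteringImage:inputImage];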
As I said in a comment, you should post this question on the official Apple Developer Forums as well.
That aside, one real quick check: are you calling pow( ) or powf( )? Even if your data is float, calling pow( ) will get you the double-precision math library function, which is significantly slower than the single-precision variant powf( ) (and you'll have to pay for the extra conversions between float and double as well).
And a second check: have you profiled your filters in Instruments? Do you actually know where the execution time is being spent, or are you guessing?
I actually wanted to do all this myself, but then I found Silverberg's Image Filters. You can apply various Instagram-type image filters to your images. This is so much better than other image filter libraries out there, such as GLImageProcessing or Cimg.
Also check Instagram Image Filters on iPhone.
Hope this helps...
From iOS 5 upwards, you can use the Core Image filters to adjust a good range of image parameters.
To adjust contrast for example, this code works like a charm:
- (void)setImageContrast:(float)contrast forImageView:(UIImageView *)imageView {
    if (contrast > MIN_CONTRAST && contrast < MAX_CONTRAST) {
        CIImage *inputImage = [[CIImage alloc] initWithImage:imageView.image];
        CIFilter *exposureAdjustmentFilter = [CIFilter filterWithName:@"CIColorControls"];
        [exposureAdjustmentFilter setDefaults];
        [exposureAdjustmentFilter setValue:inputImage forKey:@"inputImage"];
        [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:contrast] forKey:@"inputContrast"]; //default = 1.00
        // [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:1.0f] forKey:@"inputSaturation"]; //default = 1.00
        // [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:0.0f] forKey:@"inputBrightness"];

        CIImage *outputImage = [exposureAdjustmentFilter valueForKey:@"outputImage"];
        CIContext *context = [CIContext contextWithOptions:nil];
        imageView.image = [UIImage imageWithCGImage:[context createCGImage:outputImage fromRect:outputImage.extent]];
    }
}
N.B. Default value for contrast is 1.0 (maximum value suggested is 4.0).
Also, the contrast is calculated here from the imageView's current image, so calling this method repeatedly will compound the contrast. That means if you call this method with a contrast value of 2.0 first and then again with a contrast value of 3.0, you will get the original image with its contrast increased by 6.0 (2.0 * 3.0), not 5.0.
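To avoid that compounding, one hedged approach is to keep the untouched original around (here as an assumed originalImage property) and always filter from it:

- (void)setAbsoluteContrast:(float)contrast forImageView:(UIImageView *)imageView {
    CIImage *inputImage = [[CIImage alloc] initWithImage:self.originalImage];
    CIFilter *contrastFilter = [CIFilter filterWithName:@"CIColorControls"];
    [contrastFilter setDefaults];
    [contrastFilter setValue:inputImage forKey:@"inputImage"];
    [contrastFilter setValue:[NSNumber numberWithFloat:contrast] forKey:@"inputContrast"];

    CIImage *outputImage = [contrastFilter valueForKey:@"outputImage"];
    CIContext *context = [CIContext contextWithOptions:nil];
    CGImageRef cgImage = [context createCGImage:outputImage fromRect:outputImage.extent];
    imageView.image = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage);
}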
Check the Apple documentation for more filters and parameters.
To list all available filters and parameters in code, just run this loop:
NSArray* filters = [CIFilter filterNamesInCategories:nil];
for (NSString* filterName in filters)
{
    NSLog(@"Filter: %@", filterName);
    NSLog(@"Parameters: %@", [[CIFilter filterWithName:filterName] attributes]);
}
This is an old thread, but I got to it from another link on SO, so people still read it.
With iOS 5, Apple added support for Core Image, and a decent number of Core Image filters. I'm pretty sure all the ones the OP mentioned are available.
Core Image uses OpenGL shaders under the covers, so it's really fast. It's much easier to use than OpenGL, however. If you aren't already working in OpenGL and just want to apply filters to CGImage or UIImage objects, Core Image filters are the way to go.