Painfully slow software vectors, particularly CoreGraphics vs. OpenGL - iOS

I'm working on an iOS app that requires drawing Bézier curves in real time in response to the user's input. At first, I decided to try using CoreGraphics, which has a fantastic vector drawing API. However, I quickly discovered that performance was painfully, excruciatingly slow, to the point where the framerate started dropping severely with just ONE curve on my retina iPad. (Admittedly, this was a quick test with inefficient code. For example, the curve was getting redrawn every frame. But surely today's computers are fast enough to handle drawing a simple curve every 1/60th of a second, right?!)
After this experiment, I switched to OpenGL and the MonkVG library, and I couldn't be happier. I can now render HUNDREDS of curves simultaneously without any framerate drop, with only a minimal impact on fidelity (for my use case).
Is it possible that I misused CoreGraphics somehow (to the point where it was several orders of magnitude slower than the OpenGL solution), or is performance really that terrible? My hunch is that the problem lies with CoreGraphics, based on the number of StackOverflow/forum questions and answers regarding CG performance. (I've seen several people state that CG isn't meant to go in a run loop, and that it should only be used for infrequent rendering.) Why is this the case, technically speaking?
If CoreGraphics really is that slow, how on earth does Safari work so smoothly? I was under the impression that Safari isn't hardware-accelerated, and yet it has to display hundreds (if not thousands) of vector characters simultaneously without dropping any frames.
More generally, how do applications with heavy vector use (browsers, Illustrator, etc.) stay so fast without hardware acceleration? (As I understand it, many browsers and graphics suites now come with a hardware acceleration option, but it's often not turned on by default.)
UPDATE:
I have written a quick test app to more accurately measure performance. Below is the code for my custom CALayer subclass.
With NUM_PATHS set to 5 and NUM_POINTS set to 15 (5 curve segments per path), the code runs at 20fps in non-retina mode and 6fps in retina mode on my iPad 3. The profiler lists CGContextDrawPath as having 96% of the CPU time. Yes — obviously, I can optimize by limiting my redraw rect, but what if I really, truly needed full-screen vector animation at 60fps?
OpenGL eats this test for breakfast. How is it possible for vector drawing to be so incredibly slow?
#import "CGTLayer.h"
#implementation CGTLayer
- (id) init
{
self = [super init];
if (self)
{
self.backgroundColor = [[UIColor grayColor] CGColor];
displayLink = [[CADisplayLink displayLinkWithTarget:self selector:#selector(updatePoints:)] retain];
[displayLink addToRunLoop:[NSRunLoop mainRunLoop] forMode:NSRunLoopCommonModes];
initialized = false;
previousTime = 0;
frameTimer = 0;
}
return self;
}
- (void) updatePoints:(CADisplayLink*)displayLink
{
for (int i = 0; i < NUM_PATHS; i++)
{
for (int j = 0; j < NUM_POINTS; j++)
{
points[i][j] = CGPointMake(arc4random()%768, arc4random()%1024);
}
}
for (int i = 0; i < NUM_PATHS; i++)
{
if (initialized)
{
CGPathRelease(paths[i]);
}
paths[i] = CGPathCreateMutable();
CGPathMoveToPoint(paths[i], &CGAffineTransformIdentity, points[i][0].x, points[i][0].y);
for (int j = 0; j < NUM_POINTS; j += 3)
{
CGPathAddCurveToPoint(paths[i], &CGAffineTransformIdentity, points[i][j].x, points[i][j].y, points[i][j+1].x, points[i][j+1].y, points[i][j+2].x, points[i][j+2].y);
}
}
[self setNeedsDisplay];
initialized = YES;
double time = CACurrentMediaTime();
if (frameTimer % 30 == 0)
{
NSLog(#"FPS: %f\n", 1.0f/(time-previousTime));
}
previousTime = time;
frameTimer += 1;
}
- (void)drawInContext:(CGContextRef)ctx
{
// self.contentsScale = [[UIScreen mainScreen] scale];
if (initialized)
{
CGContextSetLineWidth(ctx, 10);
for (int i = 0; i < NUM_PATHS; i++)
{
UIColor* randomColor = [UIColor colorWithRed:(arc4random()%RAND_MAX/((float)RAND_MAX)) green:(arc4random()%RAND_MAX/((float)RAND_MAX)) blue:(arc4random()%RAND_MAX/((float)RAND_MAX)) alpha:1];
CGContextSetStrokeColorWithColor(ctx, randomColor.CGColor);
CGContextAddPath(ctx, paths[i]);
CGContextStrokePath(ctx);
}
}
}
#end

You really should not compare Core Graphics drawing with OpenGL; you are comparing completely different features for very different purposes.
In terms of image quality, Core Graphics and Quartz are going to be far superior to OpenGL with less effort. The Core Graphics framework is designed for optimal appearance: naturally antialiased lines and curves and the polish associated with Apple UIs. But this image quality comes at a price: rendering speed.
OpenGL, on the other hand, is designed with speed as a priority. High-performance, fast drawing is hard to beat with OpenGL. But this speed comes at a cost: it is much harder to get smooth and polished graphics with OpenGL. There are many different strategies for doing something as "simple" as antialiasing in OpenGL, something which is handled far more easily by Quartz/Core Graphics.

First, see Why is UIBezierPath faster than Core Graphics path? and make sure you're configuring your path optimally. By default, CGContext applies a number of "pretty" options to paths that carry significant overhead. If you turn these off, you will likely see dramatic speed improvements.
The next problem I've found with Core Graphics Bézier curves comes when a single curve has many components (I was seeing problems once I went over about 3000-5000 elements). A surprising amount of time was spent in CGPathAdd.... Reducing the number of elements in your path can be a major win. From my talks with the Core Graphics team last year, this may have been a bug in Core Graphics and may since have been fixed; I haven't re-tested.
EDIT: I'm seeing 18-20FPS in Retina on an iPad 3 by making the following changes:
Move the CGContextStrokePath() outside the loop. You shouldn't stroke every path. You should stroke once at the end. This takes my test from ~8FPS to ~12FPS.
Turn off anti-aliasing (which is probably turned off by default in your OpenGL tests):
CGContextSetShouldAntialias(ctx, false);
That gets me to 18-20FPS (Retina) and up to around 40FPS non-Retina.
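For reference, a minimal sketch of the revised -drawInContext: with both changes applied. Because everything is stroked in one call, all of the paths share whatever stroke color was set last; pick a single color (or group the paths by color) if that matters for your test.

- (void)drawInContext:(CGContextRef)ctx
{
    if (!initialized)
        return;

    CGContextSetShouldAntialias(ctx, false);   // skip the expensive antialiasing pass
    CGContextSetLineWidth(ctx, 10);
    CGContextSetStrokeColorWithColor(ctx, [UIColor whiteColor].CGColor);

    for (int i = 0; i < NUM_PATHS; i++)
        CGContextAddPath(ctx, paths[i]);       // accumulate every path first...

    CGContextStrokePath(ctx);                  // ...then stroke them all in one call
}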
I don't know what you're seeing in OpenGL. Remember that Core Graphics is designed to make things beautiful; OpenGL is designed to make things fast. Core Graphics relies on OpenGL, so I would always expect well-written OpenGL code to be faster.

Disclaimer: I'm the author of MonkVG.
The biggest reason that MonkVG is so much faster than CoreGraphics is actually not so much that it is implemented with OpenGL ES as a rendering backend, but that it "cheats" by tessellating the contours into polygons before any rendering is done. The contour tessellation is actually painfully slow, and if you were to generate contours dynamically you would see a big slowdown. The great benefit of an OpenGL backend (versus CoreGraphics' direct bitmap rendering) is that any transform such as a translation, rotation, or scaling does not force a complete re-tessellation of the contours -- it comes essentially for "free".

Your slowdown is because of this line of code:
[self setNeedsDisplay];
You need to change this to:
[self setNeedsDisplayInRect:changedRect];
It's up to you to calculate what rectangle has changed every frame, but if you do this properly, you will likely see over an order of magnitude performance improvement with no other changes.
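For the test layer above, a rough sketch of that calculation could union the bounding boxes of the new paths, pad for the stroke width, and also include last frame's rectangle so the old curves get erased (previousDirtyRect is an assumed CGRect ivar, not part of the original code):

CGRect dirtyRect = CGRectNull;
for (int i = 0; i < NUM_PATHS; i++)
    dirtyRect = CGRectUnion(dirtyRect, CGPathGetBoundingBox(paths[i]));

dirtyRect = CGRectInset(dirtyRect, -5, -5);    // pad by half the 10pt stroke width

// Invalidate both where the curves are now and where they were last frame.
[self setNeedsDisplayInRect:CGRectUnion(dirtyRect, previousDirtyRect)];
previousDirtyRect = dirtyRect;                 // assumed CGRect ivar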

Related

Render speed for individual pixels in loop

I'm working on drawing individual pixels to a UIView to create fractal images. My problem is my rendering speed. I am currently running this loop 260,000 times, but would like to render even more pixels. As it is, it takes about 5 seconds to run on my iPad Mini.
I was using a UIBezierPath before, but that was even a bit slower (about 7 seconds). I've been looking into NSBitmap-related APIs, but I'm not exactly sure whether they would speed this up or how to implement them in the first place.
I was also thinking about storing the pixels from my loop in an array and then drawing them all together after the loop. Again, though, I'm not quite sure of the best way to store pixels in an array and then retrieve them for drawing.
Any help on speeding up this process would be great.
for (int i = 0; i < 260000; i++) {
    float RN = drand48();

    for (int i = 1; i < numBuckets; i++) {
        if (RN < bucket[i]) {
            col = i;
            CGContextSetFillColor(context, CGColorGetComponents([UIColor colorWithRed:(colorSelector[i][0]) green:(colorSelector[i][1]) blue:(colorSelector[i][2]) alpha:(1)].CGColor));
            break;
        }
    }

    xT = myTextFieldArray[1][1][col]*x1 + myTextFieldArray[1][2][col]*y1 + myTextFieldArray[1][5][col];
    yT = myTextFieldArray[1][3][col]*x1 + myTextFieldArray[1][4][col]*y1 + myTextFieldArray[1][6][col];
    x1 = xT;
    y1 = yT;

    if (i > 10000) {
        CGContextFillRect(context, CGRectMake(xOrigin+(xT-xMin)*sizeScalar,yOrigin-(yT-yMin)*sizeScalar,.5,.5));
    }
    else if (i < 10000) {
        if (x1 < xMin) {
            xMin = x1;
        }
        else if (x1 > xMax) {
            xMax = x1;
        }
        if (y1 < yMin) {
            yMin = y1;
        }
        else if (y1 > yMax) {
            yMax = y1;
        }
    }
    else if (i == 10000) {
        if (xMax - xMin > yMax - yMin) {
            sizeScalar = 960/(xMax - xMin);
            yOrigin=1000-(1000-sizeScalar*(yMax-yMin))/2;
        }
        else {
            sizeScalar = 960/(yMax - yMin);
            xOrigin=(1000-sizeScalar*(xMax-xMin))/2;
        }
    }
}
Edit
I created a multidimensional array to store UIColors into, so I could use a bitmap to draw my image. It is significantly faster, but my colors are not working appropriately now.
Here is where I am storing my UIColors into the array:
int xPixel = xOrigin+(xT-xMin)*sizeScalar;
int yPixel = yOrigin-(yT-yMin)*sizeScalar;
pixelArray[1000-yPixel][xPixel] = customColors[col];
Here is my drawing stuff:
CGDataProviderRef provider = CGDataProviderCreateWithData(nil, pixelArray, 1000000, nil);
CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGImageRef image = CGImageCreate(1000,                        // width
                                 1000,                        // height
                                 8,                           // bits per component
                                 32,                          // bits per pixel
                                 4000,                        // bytes per row
                                 colorSpace,
                                 kCGBitmapByteOrder32Big | kCGImageAlphaNoneSkipLast,
                                 provider,
                                 nil,                         // No decode
                                 NO,                          // No interpolation
                                 kCGRenderingIntentDefault);  // Default rendering
CGContextDrawImage(context, self.bounds, image);
Not only are the colors not what they are supposed to be, but every time I render my image, the colors are completely different from the previous time. I have been testing different things with the colors, but I still have no idea why they are wrong, and I'm even more confused about how they keep changing.
Per-pixel drawing — with complicated calculations for each pixel, like fractal rendering — is one of the hardest things you can ask a computer to do. Each of the other answers here touches on one aspect of its difficulty, but that's not quite all. (Luckily, this kind of rendering is also something that modern hardware is optimized for, if you know what to ask it for. I'll get to that.)
Both @jcaron and @JustinMeiners note that vector drawing operations (even a rect fill) in CoreGraphics pay a penalty for CPU-based rasterization. Manipulating a buffer of bitmap data directly would be faster, but not a lot faster.
Getting that buffer onto the screen also takes time, especially if you have to go through the process of creating bitmap image buffers and then drawing them into a CG context: that's a lot of sequential drawing work on the CPU and a lot of memory-bandwidth work to copy the buffer around. So @JustinMeiners is right that direct access to GPU texture memory would be a big help.
However, if you're still filling your buffer in CPU code, you're still hampered by two costs (at best; worse if you do it naively):
sequential work to render each pixel
memory transfer cost from texture memory to the frame buffer when rendering
@JustinMeiners' answer is good for his use case: his image sequences are pre-rendered, so he knows exactly what each pixel is going to be and just has to schlep it into texture memory. But your use case requires a lot of per-pixel calculation.
Luckily, per-pixel calculations are what GPUs are designed for! Welcome to the world of pixel shaders. For each pixel on the screen, you can run an independent calculation to determine that point's relationship to your fractal set and thus what color to draw it in. The GPU can run that calculation in parallel for many pixels at once, and its output goes straight to the screen, so there's no memory overhead for dumping a bitmap into the framebuffer.
One easy way to work with pixel shaders on iOS is SpriteKit — it can handle most of the necessary OpenGL/Metal setup for you, so all you have to write is the per-pixel algorithm in GLSL (actually, a subset of GLSL that gets automatically translated to Metal shader language on Metal-supported devices). Here's a good tutorial on that, and here's another on OpenGL ES pixel shaders for iOS in general.
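As a rough, hedged sketch of that approach (SKShader requires iOS 8 or later; the escape-time fragment source is purely illustrative, and scene is assumed to be your SKScene):

NSString *fractalSource =
    @"void main() {"
    @"    vec2 c = v_tex_coord * 3.0 - vec2(2.0, 1.5);"       // map this pixel into the complex plane
    @"    vec2 z = vec2(0.0);"
    @"    float n = 0.0;"
    @"    for (int i = 0; i < 64; i++) {"                     // escape-time iteration
    @"        z = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;"
    @"        if (dot(z, z) > 4.0) break;"
    @"        n += 1.0;"
    @"    }"
    @"    gl_FragColor = vec4(vec3(n / 64.0), 1.0);"
    @"}";

SKSpriteNode *canvas = [SKSpriteNode spriteNodeWithColor:[UIColor blackColor] size:scene.size];
canvas.position = CGPointMake(scene.size.width / 2, scene.size.height / 2);
canvas.shader = [SKShader shaderWithSource:fractalSource];
[scene addChild:canvas];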
If you really want to change many different pixels individually, your best option is probably to allocate a chunk of memory (of size width * height * bytes per pixel), make the changes directly in memory, and then convert the whole thing into a bitmap at once with CGBitmapContextCreateWithData.
There may be even faster methods than this (see Justin's answer).
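A hedged sketch of that approach, assuming a 1000 x 1000 canvas and 4-byte RGBA pixels (CGBitmapContextCreate is used here; CGBitmapContextCreateWithData is the same call plus a release callback):

size_t width = 1000, height = 1000;
size_t bytesPerPixel = 4, bytesPerRow = width * bytesPerPixel;
uint8_t *buffer = calloc(height * bytesPerRow, 1);

// Inside your loop, plot a point by writing bytes directly:
//   size_t offset = y * bytesPerRow + x * bytesPerPixel;
//   buffer[offset] = r; buffer[offset + 1] = g; buffer[offset + 2] = b; buffer[offset + 3] = 255;

CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
CGContextRef bitmap = CGBitmapContextCreate(buffer, width, height, 8, bytesPerRow, colorSpace,
                                            kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
CGImageRef image = CGBitmapContextCreateImage(bitmap);
CGContextDrawImage(context, self.bounds, image);   // context and self.bounds as in your drawRect:

CGImageRelease(image);
CGContextRelease(bitmap);
CGColorSpaceRelease(colorSpace);
free(buffer);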
If you want to maximize render speed I would recommend bitmap rendering. Vector rasterization is much slower, and CGContext drawing isn't really intended for high-performance realtime rendering.
I faced a similar technical challenge and found CVOpenGLESTextureCacheRef to be the fastest. The texture cache allows you to upload a bitmap directly into graphics memory for fast rendering. Rendering utilizes OpenGL, but because it's just a 2D fullscreen image, you really don't need to learn much about OpenGL to use it.
You can see an example I wrote of using the texture cache here: https://github.com/justinmeiners/image-sequence-streaming
My original question related to this is here:
How to directly update pixels - with CGImage and direct CGDataProvider
My project renders bitmaps from files so it is a little bit different but you could look at ISSequenceView.m for an example of how to use the texture cache and setup OpenGL for this kind of rendering.
Your rendering procedure could look something like this (a condensed code sketch follows the steps):
1. Draw to buffer (raw bytes)
2. Lock texture cache
3. Copy buffer to texture cache
4. Unlock texture cache.
5. Draw fullscreen quad with texture
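A heavily condensed, hedged sketch of those steps. Error checking and the GL quad/context setup are omitted; pixelBuffer is assumed to be a kCVPixelFormatType_32BGRA CVPixelBuffer created up front, cache a CVOpenGLESTextureCacheRef created with your EAGLContext, and myPixelBytes/bufferWidth/bufferHeight stand in for your own buffer.

CVOpenGLESTextureRef texture = NULL;

// Steps 1-4: lock the pixel buffer, copy your CPU-side bytes in, unlock.
CVPixelBufferLockBaseAddress(pixelBuffer, 0);
memcpy(CVPixelBufferGetBaseAddress(pixelBuffer), myPixelBytes, myByteCount);  // names assumed
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

// Wrap the pixel buffer as a GL texture; no separate glTexImage2D upload needed.
CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, cache, pixelBuffer, NULL,
                                             GL_TEXTURE_2D, GL_RGBA, bufferWidth, bufferHeight,
                                             GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);

// Step 5: bind it and draw the fullscreen quad.
glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture));
// ... glDrawArrays(GL_TRIANGLE_STRIP, 0, 4) with your quad's vertex/texcoord attributes ...

CFRelease(texture);
CVOpenGLESTextureCacheFlush(cache, 0);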

iOS CGPath Performance

UPDATE
I got around CG's limitations by drawing everything with OpenGL. Still some glitches, but so far it's working much, much faster.
Some interesting points :
GLKView: that's an iOS-specific view, and it helps a lot in setting up the OpenGL context and rendering loop. If you're not on iOS, I'm afraid you're on your own.
Shader precision: the precision of shader variables in the current version of OpenGL ES (2.0) is 16-bit. That was a little low for my purposes, so I emulated 32-bit arithmetic with pairs of 16-bit variables.
GL_LINES: OpenGL ES can natively draw simple lines. Not very well (no joints, no caps; see the purple/grey line at the top of the screenshot below), but to improve on that you'll have to write a custom shader, convert each line into a triangle strip and pray that it works! (Supposedly that's how browsers do it when they tell you that Canvas2D is GPU-accelerated.)
Draw as little as possible. I suppose that makes sense, but you can frequently avoid rendering things that are, for instance, outside of the viewport.
OpenGL ES has no support for filled polygons, so you have to tessellate them yourself. Consider using iPhone-GLU: it's a port of the MESA code and it's pretty good, although it's a little hard to use (no standard Objective-C interface).
Original Question
I'm trying to draw lots of CGPaths (typically more than 1000) in the drawRect method of my scroll view, which is refreshed when the user pans with his finger. I have the same application in JavaScript for the browser, and I'm trying to port it to an iOS native app.
The iOS test code is (with 100 line operations, path being a pre-made CGMutablePathRef):
- (void) drawRect:(CGRect)rect {
    // Start the timer
    BSInitClass(@"Renderer");
    BSStartTimedOp(@"Rendering");

    // Get the context
    CGContextRef context = UIGraphicsGetCurrentContext();
    CGContextSetLineWidth(context, 2.0);
    CGContextSetFillColorWithColor(context, [[UIColor redColor] CGColor]);
    CGContextSetStrokeColorWithColor(context, [[UIColor blueColor] CGColor]);
    CGContextTranslateCTM(context, 800, 800);

    // Draw the points
    CGContextAddPath(context, path);
    CGContextStrokePath(context);

    // Display the elapsed time
    BSEndTimedOp(@"Rendering");
}
In JavaScript, for reference, the code is (with 10000 line operations):
window.onload = function() {
    canvas = document.getElementById("test");
    ctx = canvas.getContext("2d");

    // Prepare the points before drawing
    var data = [];
    for (var i = 0; i < 100; i++) data.push({x: Math.random()*canvas.width, y: Math.random()*canvas.height});

    // Draw those points, and write the elapsed time
    var __start = new Date().getTime();
    for (var i = 0; i < 100; i++) {
        for (var j = 0; j < data.length; j++) {
            var d = data[j];
            if (j == 0) ctx.moveTo(d.x, d.y);
            else ctx.lineTo(d.x, d.y);
        }
    }
    ctx.stroke();
    document.write("Finished in " + (new Date().getTime() - __start) + "ms");
};
Now, I'm much more proficient in optimizing JavaScript than I am at iOS, but, after some profiling, it seems that CGPath's overhead is absolutely, incredibly bad compared to JavaScript. Both snippets run at about the same speed on a real iOS device, and the JavaScript code has 100x the number of line operations of the Quartz2D code!
EDIT: Here is the top of the time profiler in Instruments :
Running Time Self Symbol Name
6487.0ms 77.8% 6487.0 aa_render
449.0ms 5.3% 449.0 aa_intersection_event
112.0ms 1.3% 112.0 CGSColorMaskCopyARGB8888
73.0ms 0.8% 73.0 objc::DenseMap<objc_object*, unsigned long, true, objc::DenseMapInfo<objc_object*>, objc::DenseMapInfo<unsigned long> >::LookupBucketFor(objc_object* const&, std::pair<objc_object*, unsigned long>*&) const
69.0ms 0.8% 69.0 CGSFillDRAM8by1
66.0ms 0.7% 66.0 ml_set_interrupts_enabled
46.0ms 0.5% 46.0 objc_msgSend
42.0ms 0.5% 42.0 floor
29.0ms 0.3% 29.0 aa_ael_insert
It is my understanding that this should be much faster on iOS, simply because the code is native... So, do you know:
...what I am doing wrong here?
...and if there's another, better solution to draw that many lines in real-time?
Thanks a lot!
As you described in your question, using OpenGL is the right solution.
Theoretically, you can emulate any kind of graphics drawing with OpenGL, but you need to implement every shape algorithm yourself. For example, you need to build the edges and corners of lines yourself. There is no real concept of lines in OpenGL: its line drawing is a utility feature, used almost exclusively for debugging. You should treat everything as a set of triangles.
I believe 16-bit floats are enough for most drawing. If you're using coordinates with large values, consider dividing the space into multiple sectors to keep the coordinate numbers smaller; float precision degrades when values get very large or very small.
Update
I think you will run into this issue soon if you try to display UIKit views over an OpenGL display. Unfortunately, I haven't been able to find a solution yet either.
How to synchronize OpenGL drawing with UIKit updates
You killed CGPath performance by using CGContextAddPath.
Apple explicitly says this will run slowly - if you want it to run fast, you are required to attach your CGPath objects to CAShapeLayer instances.
You're doing dynamic, runtime drawing - blocking all of Apple's performance optimizations. Try switching to CALayer - especially CAShapeLayer - and you should see performance improve by a large amount.
(NB: there are other performance bugs in CG rendering that might affect this use case, such as obscure default settings in CG/Quartz/CA, but ... you need to get rid of the bottleneck on CGContextAddPath first)
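A minimal sketch of that change, reusing the pre-made path from the question (standard CAShapeLayer usage):

CAShapeLayer *shapeLayer = [CAShapeLayer layer];
shapeLayer.path        = path;                          // the pre-made CGMutablePathRef
shapeLayer.strokeColor = [[UIColor blueColor] CGColor];
shapeLayer.fillColor   = nil;                           // stroke only
shapeLayer.lineWidth   = 2.0;
[self.layer addSublayer:shapeLayer];

// Panning/zooming then becomes a cheap transform on the layer rather than a
// re-rasterization of the path in drawRect:.
shapeLayer.position = CGPointMake(800, 800);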

Why is renderInContext so much slower than drawing to the screen?

I have a pretty large CAShapeLayer that I'm rendering. The layer is completely static, but it's contained in a UIScrollView so it can move around and be zoomed -- basically, it must be redrawn every now and then. In an attempt to improve the framerate of this scrolling, I set shouldRasterize = YES on the layer, which worked perfectly. Because I never change any property of the layer it never has a rasterization miss and I get a solid 60 fps. High fives all around, right?
Until the layer gets a little bigger. Eventually -- and it doesn't take long -- the rasterized image gets too large for the GPU to handle. According to my console, <Notice>: CoreAnimation: surface 2560 x 4288 is too large, and it just doesn't draw anything on the screen. I don't really blame it -- 2560 x 4288 is pretty big -- but I spent a while scratching my head before I noticed this in the device console.
Now, my question is: how can I work around this limitation? How can I rasterize a really large layer?
The obvious solution seems to be to break the layer up into multiple sublayers, say one for each quadrant, and rasterize each one independently. Is there an "easy" way to do this? Can I create a new layer that renders a rectangular area from another layer? Or is there some other solution I should explore?
Edit
Creating a tiled composite seems to have really bad performance because the layers are re-rasterized every time they enter the screen, making for a very jerky scrolling experience. Is there some way to cache those rasterizations? Or is this the wrong approach altogether?
Edit
Alright, here's my current solution: render the layer once to a CGImageRef. Create multiple tile layers using sub-rectangles from that image, and actually put those on the screen.
- (CALayer *)getTiledLayerFromLayer:(CALayer *)sourceLayer withHorizontalTiles:(int)horizontalTiles verticalTiles:(int)verticalTiles
{
    CALayer *containerLayer = [CALayer layer];
    CGFloat tileWidth = sourceLayer.bounds.size.width / horizontalTiles;
    CGFloat tileHeight = sourceLayer.bounds.size.height / verticalTiles;

    // make sure these are integral, otherwise you'll have image alignment issues!
    NSLog(@"tileWidth:%f height:%f", tileWidth, tileHeight);

    UIGraphicsBeginImageContextWithOptions(sourceLayer.bounds.size, NO, 0);
    CGContextRef tileContext = UIGraphicsGetCurrentContext();
    [sourceLayer renderInContext:tileContext];
    CGImageRef image = CGBitmapContextCreateImage(tileContext);
    UIGraphicsEndImageContext();

    for (int horizontalIndex = 0; horizontalIndex < horizontalTiles; horizontalIndex++) {
        for (int verticalIndex = 0; verticalIndex < verticalTiles; verticalIndex++) {
            CGRect frame = CGRectMake(horizontalIndex * tileWidth, verticalIndex * tileHeight, tileWidth, tileHeight);
            CGRect visibleRect = CGRectMake(horizontalIndex / (CGFloat)horizontalTiles, verticalIndex / (CGFloat)verticalTiles, 1.0f / horizontalTiles, 1.0f / verticalTiles);

            CALayer *tile = [CALayer layer];
            tile.frame = frame;
            tile.contents = (__bridge id)image;
            tile.contentsRect = visibleRect;
            [containerLayer addSublayer:tile];
        }
    }

    CGImageRelease(image);
    return containerLayer;
}
This works great...sort of. On the one hand, I get 60fps panning and zooming of a 1980 x 3330 layer on a retina iPad. On the other hand, it takes 20 seconds to start up! So while this solution solves my original problem, it gives me a new one: how can I generate the tiles faster?
Literally all of the time is spent in the [sourceLayer renderInContext:tileContext]; call. This seems weird to me, because if I just add that layer directly I can render it about 40 times per second, according to the Core Animation Instrument. Is it possible that creating my own image context causes it to not use the GPU or something?
Breaking the layer into tiles is the only solution. You can, however, implement it in many different ways. I suggest doing it manually (creating the layers and sublayers yourself), but many people recommend using CATiledLayer (http://www.mlsite.net/blog/?p=1857), which is how maps are usually implemented; zooming and rotating are quite easy with it. The tiles of a CATiledLayer are loaded (drawn) on demand, just after they are put on the screen. This implies a short delay (a blink) before each tile is fully drawn, and AFAIK it is quite hard to get rid of this behaviour.
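For reference, a hedged sketch of the CATiledLayer route; the class name and property values here are illustrative, not prescriptive:

#import <UIKit/UIKit.h>
#import <QuartzCore/QuartzCore.h>

@interface TiledShapeView : UIView
@end

@implementation TiledShapeView

+ (Class)layerClass
{
    return [CATiledLayer class];
}

- (instancetype)initWithFrame:(CGRect)frame
{
    if ((self = [super initWithFrame:frame])) {
        CATiledLayer *tiledLayer = (CATiledLayer *)self.layer;
        tiledLayer.tileSize = CGSizeMake(256, 256);
        tiledLayer.levelsOfDetail = 4;       // cached zoom-out levels
        tiledLayer.levelsOfDetailBias = 2;   // extra levels of detail when zooming in
    }
    return self;
}

- (void)drawRect:(CGRect)rect
{
    // Called once per tile, clipped to that tile's rect, off the main thread.
    CGContextRef ctx = UIGraphicsGetCurrentContext();
    // ... draw only the content that intersects rect, using ctx ...
}

@end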

How to implement fast image filters on iOS platform

I am working on an iOS application where the user can apply a certain set of photo filters. Each filter is basically a set of Photoshop actions with specific parameters. These actions are:
Levels adjustment
Brightness / Contrast
Hue / Saturation
Single and multiple overlay
I've reproduced all of these actions in my code using arithmetic expressions that loop over all the pixels in the image. But when I run my app on an iPhone 4, each filter takes about 3-4 seconds to apply, which is quite a long time for the user to wait. The image size is 640 x 640 px, which is @2x of my view size because it's displayed on a Retina display. I've found that my main problem is the levels modification, which calls the pow() C function every time I need to adjust the gamma. I am using floats, not doubles, of course, because ARMv6 and ARMv7 are slow with doubles. I tried enabling and disabling Thumb and got the same result.
Here is an example of the simplest filter in my app, which nonetheless runs fairly fast (2 seconds). The other filters include more expressions and pow() calls, making them slower.
https://gist.github.com/1156760
I've seen some solutions that use Accelerate framework vDSP matrix transformations for fast image modifications. I've also seen OpenGL ES solutions. I'm not sure they are suited to my needs. But perhaps it's just a matter of translating my set of changes into a suitable convolution matrix?
Any advice would be helpful.
Thanks,
Andrey.
For the filter in your example code, you could use a lookup table to make it much faster. I assume your input image is 8 bits per color and you are converting it to float before passing it to this function. For each color, this only gives 256 possible values and therefore only 256 possible output values. You could precompute these and store them in an array. This would avoid the pow() calculation and the bounds checking since you could factor them into the precomputation.
It would look something like this:
unsigned char table[256];
for (int i = 0; i < 256; i++) {
    float tmp = pow((float)i/255.0f, 1.3f) * 255.0;
    table[i] = tmp > 255 ? 255 : (unsigned char)tmp;
}

for (int i = 0; i < length; ++i)
    m_OriginalPixelBuf[i] = table[m_OriginalPixelBuf[i]];
In this case, you only have to perform pow() 256 times instead of 3*640*640 times. You would also avoid the branching caused by the bounds checking in your main image loop which can be costly. You would not have to convert to float either.
An even faster way would be to precompute the table outside the program and just put the 256 coefficients in the code.
None of the operations you have listed there should require a convolution or even a matrix multiply. They are all pixel-wise operations, meaning that each output pixel only depends on the single corresponding input pixel. You would need to consider convolution for operations like blurring or sharpening where multiple input pixels affect a single output pixel.
If you're looking for the absolute fastest way to do this, you're going to want to use the GPU to handle the processing. It's built to do massively parallel operations, like color adjustments on single pixels.
As I've mentioned in other answers, I measured a 14X - 28X improvement in performance when running an image processing operation using OpenGL ES instead of on the CPU. You can use the Accelerate framework to do faster on-CPU image manipulation (I believe Apple claims a roughly 4-5X boost is possible here), but it won't be as fast as OpenGL ES. It can be easier to implement, however, which is why I've sometimes used Accelerate for this over OpenGL ES.
iOS 5.0 also brings over Core Image from the desktop, which gives you a nice wrapper around these kind of on-GPU image adjustments. However, there are some limitations to the iOS Core Image implementation that you don't have when working with OpenGL ES 2.0 shaders directly.
I present an example of an OpenGL ES 2.0 shader image filter in my article here. The hardest part about doing this kind of processing is getting the OpenGL ES scaffolding set up. Using my sample application there, you should be able to extract that setup code and apply your own filters using it. To make this easier, I've created an open source framework called GPUImage that handles all of the OpenGL ES interaction for you. It has almost every filter you list above, and most run in under 2.5 ms for a 640x480 frame of video on an iPhone 4, so they're far faster than anything processed on the CPU.
As I said in a comment, you should post this question on the official Apple Developer Forums as well.
That aside, one real quick check: are you calling pow() or powf()? Even if your data is float, calling pow() will get you the double-precision math library function, which is significantly slower than the single-precision variant powf() (and you'll have to pay for the extra conversions between float and double as well).
And a second check: have you profiled your filters in Instruments? Do you actually know where the execution time is being spent, or are you guessing?
I actually wanted to do all of this myself, but then I found Silverberg's Image Filters. You can apply various Instagram-style image filters to your images. This is so much better than other image filter libraries out there, such as GLImageProcessing or CImg.
Also check Instagram Image Filters on iPhone.
Hope this helps...
From iOS 5 upwards, you can use the Core Image filters to adjust a good range of image parameters.
To adjust contrast for example, this code works like a charm:
- (void)setImageContrast:(float)contrast forImageView:(UIImageView *)imageView {
    if (contrast > MIN_CONTRAST && contrast < MAX_CONTRAST) {
        CIImage *inputImage = [[CIImage alloc] initWithImage:imageView.image];

        CIFilter *exposureAdjustmentFilter = [CIFilter filterWithName:@"CIColorControls"];
        [exposureAdjustmentFilter setDefaults];
        [exposureAdjustmentFilter setValue:inputImage forKey:@"inputImage"];
        [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:contrast] forKey:@"inputContrast"]; // default = 1.00
        // [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:1.0f] forKey:@"inputSaturation"]; // default = 1.00
        // [exposureAdjustmentFilter setValue:[NSNumber numberWithFloat:0.0f] forKey:@"inputBrightness"];

        CIImage *outputImage = [exposureAdjustmentFilter valueForKey:@"outputImage"];
        CIContext *context = [CIContext contextWithOptions:nil];
        imageView.image = [UIImage imageWithCGImage:[context createCGImage:outputImage fromRect:outputImage.extent]];
    }
}
N.B. Default value for contrast is 1.0 (maximum value suggested is 4.0).
Also, the contrast is calculated here from the imageView's current image, so calling this method repeatedly will compound the contrast. In other words, if you call this method first with a contrast value of 2.0 and then again with 3.0, you get the original image with its contrast increased by a factor of 6.0 (2.0 * 3.0), not 5.0.
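One hedged way around the compounding is to keep an untouched copy of the original image and always filter from that; originalImage below is an assumed property, not part of the snippet above:

CIImage *inputImage = [[CIImage alloc] initWithImage:self.originalImage]; // assumed property
CIFilter *contrastFilter = [CIFilter filterWithName:@"CIColorControls"];
[contrastFilter setDefaults];
[contrastFilter setValue:inputImage forKey:@"inputImage"];
[contrastFilter setValue:[NSNumber numberWithFloat:contrast] forKey:@"inputContrast"]; // absolute, not relative

CIImage *outputImage = [contrastFilter valueForKey:@"outputImage"];
CIContext *ciContext = [CIContext contextWithOptions:nil];
imageView.image = [UIImage imageWithCGImage:[ciContext createCGImage:outputImage fromRect:outputImage.extent]];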
Check the Apple documentation for more filters and parameters.
To list all available filters and parameters in code, just run this loop:
NSArray* filters = [CIFilter filterNamesInCategories:nil];
for (NSString* filterName in filters)
{
    NSLog(@"Filter: %@", filterName);
    NSLog(@"Parameters: %@", [[CIFilter filterWithName:filterName] attributes]);
}
This is an old thread, but I got to it from another link on SO, so people still read it.
With iOS 5, Apple added support for Core Image, and a decent number of Core Image filters. I'm pretty sure all the ones the OP mentioned are available.
Core Image uses OpenGL shaders under the covers, so it's really fast. It's much easier to use than OpenGL, however. If you aren't already working in OpenGL and just want to apply filters to CGImage or UIImage objects, Core Image filters are the way to go.

Adjust brightness/contrast/gamma of scene in DirectX?

In a Direct3D application filled with Sprite objects from D3DX, I would like to be able to globally adjust the brightness and contrast. The contrast is important, if at all possible.
I've seen this question here about OpenGL: Tweak the brightness/gamma of the whole scene in OpenGL
But that doesn't give me what I would need in a DirectX context. I know this is something I could probably do with a pixel shader, too, but that seems like shooting a fly with a bazooka and I worry about backwards compatibility with older GPUs that would have to do any shaders in software. It seems like this should be possible, I remember even much older games like the original Half Life having settings like this well before the days of shaders.
EDIT: Also, please note that this is not a fullscreen application, so this would need to be something that would just affect the one Direct3D device and that is not a global setting for the monitor overall.
A lot of games increase brightness by literally applying a full-screen poly over the screen with additive blending, i.e.
SRCBLEND = ONE
DESTBLEND = ONE
and then applying a texture with a colour of (1, 1, 1, 1) will increase the brightness(ish) of every pixel by 1.
To adjust contrast you need to do something similar, but MULTIPLY by a constant factor instead. This requires the following blend settings:
SRCBLEND = DESTCOLOR
DESTBLEND = ZERO
This way if you blend a pixel with a value of (2, 2, 2, 2) then you will change the contrast.
Gamma is a far more complicated beast, and I'm not aware of a way to fake it like the above.
Neither of these solutions is entirely accurate, but they will give you a result that looks "slightly" correct. Doing either properly is much more complicated, but using these methods you'll see effects that look INCREDIBLY similar to various games you've played, and that's because that's EXACTLY what they are doing ;)
For a game I made in XNA, I used a pixel shader. Here is the code. [disclaimer] I'm not entirely sure the logic is right and some settings can have weird effects (!) [/disclaimer]
float offsetBrightness = 0.0f;   // can be set from my C# code, [-1, 1]
float offsetContrast = 0.0f;     // can be set from my C# code, [-1, 1]
sampler2D screen : register(s0); // can be set from my C# code

float4 PSBrightnessContrast(float2 inCoord : TEXCOORD0) : COLOR0
{
    return (tex2D(screen, inCoord.xy) + offsetBrightness) * (1.0 + offsetContrast);
}

technique BrightnessContrast
{
    pass Pass1 { PixelShader = compile ps_2_0 PSBrightnessContrast(); }
}
You didn't specify which version of DirectX you're using, so I'll assume 9. Probably the best you can do is to use the gamma-ramp functionality. Pushing the ramp up and down increases or decreases brightness. Increasing or decreasing the gradient alters contrast.
Another approach is to edit all your textures. It would basically be the same approach as the gamma ramp, except you apply the ramp manually to the pixel data in each of your image files to generate the textures.
Just use a shader. Anything that doesn't at least support SM 2.0 is ancient at this point.
The answers in this forum may be of help, as I have never tried this.
http://social.msdn.microsoft.com/Forums/en-US/windowsdirectshowdevelopment/thread/95df6912-1e44-471d-877f-5546be0eabf2
