CGContextDrawLayerAtPoint is slow on iPad 3 - ios

I have a custom view (inherited from UIView) in my app. The custom view overrides
- (void) drawRect:(CGRect) rect
The problem is: the drawRect: executes many times longer on iPad 3 than on iPad 2 (about 0.1 second on iPad 3 and 0.003 second on iPad 2). It's about 30 times slower.
Basically, I am using some pre-created layers and draw them in the drawRect:. The last call
CGContextDrawLayerAtPoint(context, CGPointZero, m_currentLayer);
takes most of the time (about 95% of total time in drawRect:)
What might be slowing things so much and how should I fix the cause?
There are no threads directly involved. I do call setNeedsDisplay in one thread and drawRect: gets called from another, but that's it. The same goes for locks (none are used).
The view gets redrawn in response to touches (it's a coloring book app). On iPad 2 I get reasonable delay between a touch and an update of the screen. I want to achieve the same on iPad 3.

So, the iPad 3 is definitely slower in a lot of areas. I have a theory about this. Marco Arment noted that the method renderInContext is ridiculously slow on the new iPad. I also found this to be the case when trying to create a magnifying glass for a custom text view. In the end I had to forgo renderInContext in favor of custom Core Graphics drawing.
I've also been hitting the dreaded wait_fences errors in my Core Graphics drawing; see: Only on new iPad 3: wait_fences: failed to receive reply: 10004003.
This is what I've figured out so far. The iPad 3 obviously has 4 times the pixels to drive. This can cause problems in two places:
First, the CPU. All Core Graphics drawing is done by the CPU. In the case of rotation events, if the CPU takes too long to draw, it hits the wait_fences error, which I believe is simply a call that tells the device to wait a little longer before actually performing the rotation, hence the delay.
Second, transferring images to the GPU. The GPU obviously handles the retina resolution just fine (see Infinity Blade 2). But when Core Graphics draws, it draws its images directly to the GPU buffers to avoid memcpy. However, either the GPU buffers haven't changed since the iPad 2 or they just didn't make them large enough, because it's remarkably easy to overload those buffers. When that happens, I believe the CPU writes the images to standard memory and then copies them to the GPU when the GPU buffers can handle it. This, I think, is what causes the performance problems. That extra copy is time consuming with so many pixels and slows things down considerably.
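To put a rough number on "4 times the pixels", here is a back-of-the-envelope calculation. It assumes 4 bytes per pixel (32-bit RGBA), which is my assumption and not something stated above; the function name is made up for illustration.

```python
# Rough size of one full-screen bitmap, assuming 32-bit RGBA pixels.
BYTES_PER_PIXEL = 4  # assumption: 8 bits each for R, G, B, A


def fullscreen_bitmap_bytes(width_px, height_px, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes needed to hold one uncompressed full-screen bitmap."""
    return width_px * height_px * bytes_per_pixel


ipad2_bytes = fullscreen_bitmap_bytes(1024, 768)    # iPad 2 screen
ipad3_bytes = fullscreen_bitmap_bytes(2048, 1536)   # iPad 3 (retina) screen

print(ipad2_bytes / 2**20, "MiB per full-screen bitmap on iPad 2")
print(ipad3_bytes / 2**20, "MiB per full-screen bitmap on iPad 3")
print("ratio:", ipad3_bytes // ipad2_bytes)
```

So every full-screen redraw or copy touches about 12 MiB on the iPad 3 instead of about 3 MiB, before any extra copy is counted.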
To avoid memcpy I recommend several things:
Only draw what you need. Avoid drawing anything offscreen at all costs. If you're drawing a large view, but only display part of that view (subviews covering it, for example) try to find a way to only draw what is visible.
If you have to draw a large view, consider breaking the view up into parts, either as subviews or sublayers (probably sublayers in your case), and only redraw what you need. Take the Notability app, for example. When you zoom in, you can literally watch it redraw one square at a time. Or in Safari you can watch it update squares as you scroll. Unfortunately, I haven't had to do this myself, so I'm uncertain of the methodology.
Try to keep your drawings simple. I had an awesome looking custom Core Text view that had to redraw on every character entered. Very slow. I changed the background to simple white (in Core Graphics) and it sped up well. Even better would be for me to not redraw the background at all.
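As a rough illustration of the "break the view into parts and only redraw what you need" idea, here is a small platform-neutral sketch. It is not UIKit code: the function name, the tile size and the (x, y, width, height) rectangle convention are all assumptions made for the example. It works out which tiles of a grid overlap a changed region, so only those tiles would have to be redrawn.

```python
import math


def tiles_for_rect(rect, tile_size, view_size):
    """Return (column, row) indices of the tiles overlapped by rect.

    rect is (x, y, width, height); view_size is (width, height).
    Anything outside the view is ignored, so offscreen changes cost nothing.
    """
    x, y, w, h = rect
    view_w, view_h = view_size

    # Clip the changed region to the view bounds.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(view_w, x + w), min(view_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return []  # nothing visible changed

    first_col = math.floor(x0) // tile_size
    last_col = (math.ceil(x1) - 1) // tile_size
    first_row = math.floor(y0) // tile_size
    last_row = (math.ceil(y1) - 1) // tile_size

    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# A 100x50 change near the top-left of a 1024x768 view with 256-pixel tiles
# touches two of the twelve tiles.
print(tiles_for_rect((200, 100, 100, 50), 256, (1024, 768)))
```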
I would like to point out that my theory is conjecture. Apple doesn't really explain what exactly they do. My theory is just based on what they have said and how the iPad responds as well as my own experimentation.
So Apple has now released the 2012 WWDC Developer videos. They have two videos that may help you (requires developer account):
iOS App Performance: Responsiveness
iOS App Performance: Graphics and Animation
One thing they talk about I think may help you is using the method: setNeedsDisplayInRect:(CGRect)rect. Using this method instead of the normal setNeedsDisplay and making sure that your drawRect method only draws the rect given to it can greatly help performance. Personally, I use the function: CGContextClipToRect(context, rect); to clip my drawing only to the rect provided.
As an example, I have a separate class I use to draw text directly to my views using Core Text. My UIView subclass keeps a reference to this object and uses it to draw its text rather than use a UILabel. I used to refresh the entire view (setNeedsDisplay) when the text changed. Now I have my CoreText object calculate the changed CGRect and use setNeedsDisplayInRect to only change the portion of the view that contains the text. This really helped my performance when scrolling.
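For a touch-driven app like the coloring book in the question, the changed rect could be computed per stroke segment. The following is only a sketch of that geometry in plain Python, not UIKit code; the function name, the brush radius parameter and the (x, y, width, height) tuple convention are assumptions for illustration. On iOS the resulting rectangle is what you would hand to setNeedsDisplayInRect:.

```python
def stroke_dirty_rect(p0, p1, brush_radius, bounds):
    """Smallest rect covering a stroke from p0 to p1, clipped to bounds.

    p0 and p1 are (x, y) points; bounds and the result are
    (x, y, width, height). Returns None if nothing inside bounds changed.
    """
    # Bounding box of the segment, padded by the brush radius.
    left = min(p0[0], p1[0]) - brush_radius
    top = min(p0[1], p1[1]) - brush_radius
    right = max(p0[0], p1[0]) + brush_radius
    bottom = max(p0[1], p1[1]) + brush_radius

    # Clip to the view so we never invalidate offscreen pixels.
    bx, by, bw, bh = bounds
    left, top = max(left, bx), max(top, by)
    right, bottom = min(right, bx + bw), min(bottom, by + bh)
    if left >= right or top >= bottom:
        return None
    return (left, top, right - left, bottom - top)


view_bounds = (0, 0, 1024, 768)
dirty = stroke_dirty_rect((100, 100), (120, 90), 10, view_bounds)
print(dirty)  # a 40x30 region instead of the whole 1024x768 view
```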

I ended up using the approach described in Kurt Revis's answer to a similar question.
I minimized the number of layers used, added a UIImageView and set its image to a UIImage wrapping my CGImageRef. Please read the mentioned answer for more details about the approach.
In the end my application became even simpler than before and works at almost identical speed on iPad 2 and iPad 3.


Fastest way to get pixel/point color of screen content on iOS

I am trying to get the color of pixels/points (doesn't matter for my use case) of the current screen content in iOS. So, for example, I want to get the color of each pixel from screen coordinates 0, 0 to 10, 10. Additionally, the operation should be as fast as possible, since I will do it at regular intervals on a Timer. The timer should fire multiple times a second, but it doesn't have to be 25fps.
Acceptable solutions:
Anything that returns the current color of a pixel or point on screen at a given position, doesn't produce noticeable UI lag and doesn't turn my app into a battery hog. The result might be a CGImage, UIImage, buffer array, I don't really care. I also don't care if the solution uses additional Apple frameworks, such as OpenGL or Metal.
It is also acceptable if the solution does not capture system-UI, like the statusbar. Capturing the content of my app is sufficient.
Things I tried so far:
Using UIWindow's drawHierarchy(in:afterScreenUpdates:). This method turns out to be way too slow. On my iPad Pro, it took 0.25s, which causes noticeable UI lag.
Using CALayer's render(in:), but this method does not render UIVisualEffectViews, which I require. Also, while faster than drawHierarchy, I measured it at about 0.04s, which still causes noticeable lag in the UI.
Use OpenGL, as for example described here. I don't know anything about OpenGL, so I might be using this wrong, but I never got it to return anything other than a black image.

On iOS, how do CALayer bitmaps (CGImage objects) get displayed onto Graphics Card?

On iOS, I was able to create 3 CGImage objects, and use a CADisplayLink at 60fps to do
self.view.layer.contents = (__bridge id) imageArray[counter++ % 3];
inside the ViewController, and each time, an image is set to the view's CALayer contents, which is a bitmap.
And this, all by itself, can alter what the screen shows. The screen will just loop through these 3 images at 60fps. There is no UIView drawRect:, no CALayer display or drawInContext:, and no CALayer delegate drawLayer:inContext:. All it does is change the CALayer's contents.
I also tried adding a smaller size sublayer to self.view.layer, and set that sublayer's contents instead. And that sublayer will cycle through those 3 images.
So this is very similar to the old days, on the Apple ][ or in the era of DOS video games such as King's Quest III, where there is 1 bitmap, and the screen just constantly shows what the bitmap is.
Except this time, it is not 1 bitmap, but a tree or a linked list of bitmaps, and the graphics card constantly uses the Painter's Model to paint those bitmaps (with position and opacity) onto the main screen. So it seems that drawRect, CALayer, everything, were all designed to achieve this final purpose.
Is that how it works? Does the graphics card take an ordered list of bitmaps or a tree of bitmaps? (and then constantly show them. To simplify, we don't consider the Implicit animation in the CA framework) What is actually happening down in the graphics card handling layer? (and actually, is this method almost the same on iOS, Mac OS X, and on the PCs?)
(This question aims to understand how our graphics programming actually gets rendered on modern graphics cards, since, for example, if we need to understand UIView and how CALayer works, or even use a CALayer's bitmap directly, we do need to understand the graphics architecture.)
Modern display libraries (such as Quartz used in iOS and Mac OS) use hardware accelerated compositing. The way this works is very similar to how computer graphics libraries such as OpenGL work. In essence, each CALayer is kept as a separate surface that is buffered and rendered by the video hardware, much like a texture in a 3D game. This is exceptionally well implemented in iOS, and this is why the iPhone is so well-known for having a smooth UI.
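As a purely conceptual illustration of compositing separate surfaces back to front (the painter's model mentioned in the question), here is a tiny software sketch. It is not how Core Animation or the GPU is actually implemented; the Layer type, the grayscale float pixels and the single per-layer opacity are simplifying assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    x: int                      # position of the layer's top-left corner
    y: int
    opacity: float              # 0.0 (invisible) to 1.0 (opaque)
    bitmap: List[List[float]]   # grayscale pixels, rows of values in 0..1


def composite(layers, width, height, background=0.0):
    """Blend layers back to front into one frame (source-over)."""
    frame = [[background] * width for _ in range(height)]
    for layer in layers:  # the list order is the z-order, back first
        for row_idx, row in enumerate(layer.bitmap):
            fy = layer.y + row_idx
            if not 0 <= fy < height:
                continue  # this row is offscreen
            for col_idx, value in enumerate(row):
                fx = layer.x + col_idx
                if not 0 <= fx < width:
                    continue  # this pixel is offscreen
                a = layer.opacity
                frame[fy][fx] = value * a + frame[fy][fx] * (1.0 - a)
    return frame


white_back = Layer(0, 0, 1.0, [[1.0, 1.0], [1.0, 1.0]])
half_black = Layer(1, 0, 0.5, [[0.0]])
print(composite([white_back, half_black], 2, 2))
```

The point of the hardware version is that each layer's bitmap stays cached, so moving a layer or changing its opacity only changes the blend, not the bitmap.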
In the "old days" (i.e. Windows 9x, Mac OS Classic, etc.), the screen was essentially one big framebuffer, and everything that was exposed by e.g. moving a window had to be redrawn manually by each application. The redrawing was mostly done by the CPU, which put an upper limit on animation performance. Animations were usually very "flickery" due to the redrawing involved. This technique was mostly suited for desktop applications without too much animation. Notably, Android uses (or at least used to use) this technique, which is a big problem when porting iOS applications over to Android.
In games of the old days (e.g. DOS and arcade machines; the technique was also used a lot on Mac OS Classic), something called sprite animation was used to improve performance and reduce flickering by keeping the moving images in offscreen buffers that were rendered by the hardware and synchronized with the monitor's vblank, which meant that animations were smooth even on very low-end systems. However, the size of these images was very limited and the screen resolutions were low, only about 10-15% of the pixels of even an iPhone screen of today.
You've got a reasonable intuition here, but there are still several steps between contents and the display. First off, contents doesn't have to be a CGImage. It is often a private class called CABackingStore, which is not quite the same thing. In many cases there are hardware optimizations going on to bypass rendering the image into main memory and then copying it to video memory. And since the contents of various layers are all composited together, you're still a ways from the "real" display memory. Not to mention that modifications to contents directly impact only the model layer, not the presentation or render layers. Plus there are CGLayer objects that can store their image directly in video memory. There's a lot of different stuff going on.
So the answer is, no, the video "card" (chip; it's the PowerVR BTW) does not take an ordered bunch of layers. It takes lower-level data in ways that are not well documented. Some things (particularly parts of Core Animation, and perhaps CGLayer) appear to be wrappers around OpenGL textures, but others are probably Core Graphics directly accessing the hardware itself. Once you get to this level of the stack, it's all private and can change from version to version and from device to device.
You also may find Brad Larson's response useful here:
iOS: is Core Graphics implemented on top of OpenGL?
You may also be interested in Chapter 6 of iOS:PTL. While it doesn't go into the implementation specifics, it does include a lot of practical discussion of how to improve drawing performance and best utilize the hardware with Core Graphics. Chapter 7 details all the developer-accessible steps involved in CALayer drawing.

Quartz Performance Drawing Large Buffers

I am wondering if what I'm attempting is just a bad idea. I'm currently working in MonoTouch. Is it possible to draw a screen-sized buffer (on my iPhone 4 it's about 320x460) onto a UIView of equal size fast enough that animated changes to that buffer look smooth to the end user? (It needs to be around 20ms per draw.)
I've attempted many different implementations. The best one so far seems to be using an in-memory CGLayer and calling context.DrawLayer() to apply it to the view inside of Draw(). But even that takes 30-40ms per DrawLayer call.
I'm writing my own tile-image control, and aside from performance, the idea is working well. I just can't figure out how to get the buffer onto the UIView fast enough.
Any ideas?
I've been dealing with custom views a lot lately, and I've had a bunch of performance problems, too.
All of these performance issues could be solved by determining the elements that need to be redrawn and, more importantly, the elements that do not need to be redrawn.
Then, split the contents of the layer into individual sublayers and only redraw them if necessary. The good thing is, animations and so on are very smooth for those individual layers. (Their content is only a simple bitmap and does not change until you tell it to.)
The only limitation I've come across is that you cannot use CG blend modes (e.g. multiply) for the sublayers. As far as I know that is not possible. You can only use those blend modes inside the CG code used to draw the contents of the sublayers, but after that they are all composited in "normal" mode.
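The "only redraw a sublayer if necessary" idea above boils down to caching each layer's bitmap behind a dirty flag. Here is a minimal platform-neutral sketch of that pattern; the class and method names are invented for illustration and are not the MonoTouch or Core Animation API.

```python
class CachedLayer:
    """Keeps the last drawn bitmap and redraws only when marked dirty."""

    def __init__(self, name, draw):
        self.name = name
        self._draw = draw        # expensive callable that produces the bitmap
        self._bitmap = None
        self.dirty = True        # nothing has been drawn yet
        self.draw_count = 0      # how many times the expensive path ran

    def set_needs_display(self):
        self.dirty = True

    def bitmap(self):
        if self.dirty:
            self._bitmap = self._draw()
            self.draw_count += 1
            self.dirty = False
        return self._bitmap


def render_frame(layers):
    """Collect the (cached) bitmaps of all layers for compositing."""
    return [layer.bitmap() for layer in layers]


background = CachedLayer("background", lambda: "static tiles")
cursor = CachedLayer("cursor", lambda: "cursor sprite")

for _ in range(3):
    render_frame([background, cursor])
    cursor.set_needs_display()   # only the cursor changes between frames

print(background.draw_count, cursor.draw_count)
```

Over three frames the static layer is drawn once, while only the layer that actually changes pays the drawing cost each frame.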
It really depends on what you are drawing.
If you are just drawing a solid filled color, that should not be a problem. The question is how much of the surface you are changing, and how you are changing it.
Again, it depends on what you are drawing and whether you could offload some of the work to the GPU. For example if you have static parts of your interface that will remain the same, or are animated/updated independently, you could use a different layer for those areas and let the GPU compose those.
Layers have the advantage that they are composited by the GPU, and they are backed by their own bitmaps. Once you draw into the surface of the layer, the OS will cache the result in the GPU and compose all of your layers at the same time.
Then you can determine which parts of your application actually need to be redrawn and only redraw those sections on each frame.
But again, it really will depend a lot on what you are trying to do.

CATiledLayer and UIImageView what's the big deal between them?

A few months ago I found a really good piece of sample code on Apple's site called "LargeImageDownsizing". The great thing about it is that it explains a lot about how images are read from resources and then rendered on screen. Digging into the code, I found something that bothers me a little. The downsized image is passed to a view backed by a CATiledLayer, but instead of handing each tile its own piece of the image to improve memory performance, the code just sets the tile size and then loads the whole image (I'm simplifying to get to the point). So my question is basically: why? Why use a CATiledLayer if it is not fed tile by tile? They could have used a normal UIImageView. So I ran a few tests to see whether I was right. I modified the code by simply adding a scroll view with an image view as a subview and responding to the scroll view delegate for zooming. Testing on device and simulator, I came to these conclusions:
- The memory impact and footprint are exactly the same, even during zooming and scrolling. That doesn't surprise me at all, since the image is decompressed in memory.
- Time Profiler says the tiled view takes more time to draw during scrolling and zooming than the UIImageView. That doesn't surprise me either, since the UIImageView is already drawn.
- If I send a memory warning, nothing changes between the two solutions (simulator only).
- Testing Core Animation performance, I get the same results, around 60 FPS.
So what's the difference between these two views/layers, and why should I pick one over the other in this specific case? UIImageView seems to win the battle.
I hope that someone could help me to understand that.
They might perform the same for small images, because then the only difference in terms of performance is that CATiledLayer draws on a background thread. Depending on the tile size, CATiledLayer could even be slower, because it has to draw multiple tiles for one image.
BUT ...
The point of CATiledLayer is that you don't need to draw all tiles, especially when zooming into a very, very large image. It is smart enough to know which parts are actually needed. It is also smart about evicting tiles that are no longer needed.
For this mechanism to work you need to provide the individual parts of the image separately. We're talking about an image whose total size probably cannot be held in memory uncompressed.
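To make the "load only the tiles you need and evict the rest" idea concrete, here is a small sketch of an on-demand tile cache. It is not CATiledLayer's implementation: the least-recently-used eviction policy, the class name and the loader callback are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TileCache:
    """Loads tiles on demand and evicts the least recently used ones."""

    def __init__(self, capacity, load_tile):
        self._capacity = capacity
        self._load_tile = load_tile      # callable: (col, row) -> tile data
        self._tiles = OrderedDict()      # oldest entry first

    def get(self, key):
        if key in self._tiles:
            self._tiles.move_to_end(key)  # mark as most recently used
            return self._tiles[key]
        tile = self._load_tile(key)       # e.g. decode one piece of the image
        self._tiles[key] = tile
        if len(self._tiles) > self._capacity:
            self._tiles.popitem(last=False)  # drop the least recently used
        return tile


loaded = []


def load(key):
    loaded.append(key)
    return "pixels for tile %s" % (key,)


cache = TileCache(capacity=2, load_tile=load)
for key in [(0, 0), (1, 0), (0, 0), (2, 0), (0, 0), (1, 0)]:
    cache.get(key)

print(loaded)  # (1, 0) is loaded twice because it was evicted in between
```

With a loader that decodes only one piece of the image at a time, memory use is bounded by the cache capacity rather than by the full uncompressed image.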

Image partly off screen killing as3 frame rate on IOS

I'm developing a game in AS3 for iPhone, and I've gotten it running reasonably well (consistently 24fps on iPhone 3G), but I've noticed that when the "character" goes partly off the screen, the frame rate drops to 10-12fps. Does anyone know why this is and what I can do to remedy it?
Update - Been through the code pretty thoroughly, even made a new project just to test animations. Started an image offscreen and moved it across the screen and back off. Any time the image is offscreen, even partially, the frame rates are terrible. Once the image is fully on the screen, things pick back up to a solid 24fps. I'm using cacheAsBitmap, I've tried masking the stage, I've tried placing the image in a MovieClip and using scrollRect. I would keep objects from going off the screen, except that the nature of the game I'm working on has objects dropping from the top down (yes, I'm using object pooling. No, I'm not scaling anything. Strictly x,y translations). And yes, I realize that Obj-C is probably the best answer, but I'd really like to avoid that if I can. AS3 is so much nicer to write in.
Try and take a look at the 'blitmasking' technique:
From Doyle himself:
A BlitMask is basically a rectangular Sprite that acts as a high-performance mask for a DisplayObject by caching a bitmap version of it and blitting only the pixels that should be visible at any given time, although its bitmapMode can be turned off to restore interactivity in the DisplayObject whenever you want. When scrolling very large images or text blocks, BlitMask can greatly improve performance, especially on mobile devices that have weaker processors.
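The core of "blitting only the pixels that should be visible" is a rectangle clip. The sketch below is not BlitMask's code and not AS3; it is a plain Python illustration, with invented names, of how the source rectangle inside a cached bitmap and the destination point on screen could be computed for a sprite that is partly offscreen.

```python
def visible_blit_rects(sprite_x, sprite_y, sprite_w, sprite_h, screen_w, screen_h):
    """Work out which part of a cached sprite bitmap needs to be copied.

    Returns (source_rect, dest_point), where source_rect is
    (x, y, width, height) inside the sprite's bitmap and dest_point is
    (x, y) on screen, or None if the sprite is completely offscreen.
    """
    # Intersection of the sprite's frame with the screen.
    left = max(sprite_x, 0)
    top = max(sprite_y, 0)
    right = min(sprite_x + sprite_w, screen_w)
    bottom = min(sprite_y + sprite_h, screen_h)
    if left >= right or top >= bottom:
        return None  # nothing to draw at all

    source_rect = (left - sprite_x, top - sprite_y, right - left, bottom - top)
    dest_point = (left, top)
    return source_rect, dest_point


# A 64x64 sprite hanging 20 pixels off the left edge of a 320x480 screen:
# only a 44x64 strip of its bitmap is copied.
print(visible_blit_rects(-20, 10, 64, 64, 320, 480))
```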
