Quartz Performance Drawing Large Buffers - ios

I am wondering if what I'm attempting is just a bad idea. I'm currently working in monotouch. Is it possible to draw a screen-sized (on my iPhone 4 its about 320x460) buffer onto a UIView of equal size fast enough so that animated changes to that buffer look smooth to the end user (need it to be around 20ms per draw).
I've attempted many different implementations. The best one so far seems to be using an in-memory CGLayer and calling context.DrawLayer() to apply it to the view inside of Draw(). But even that takes 30-40ms per DrawLayer.
I'm writing my own tile-image control, and aside from performance, the idea is working well. I just can't figure out how to get the buffer onto the UIView fast enough.
Any ideas?

I've been dealing with custom views a lot lately, and i've had a bunch of performance problems, too.
All of these performance issues could be solved by determining the elements that need to be redrawn, and, more importantly, the elements that do not need to be redrawn.
Then, split the contents in the layer into individual sublayers and only redraw them if necessary. The good thing is, animations and so on are very smooth for those individual layers. (Their content is only a simple bitmap and does not change until you tell it to).
The only limitation i've come across was, that you cannot use CG blend modes (e.g. multiply) for the sublayers. As far as i know that is not possible. You can only use those blend modes inside the CG code used to draw the contents of the sublayers, but after that they are all composed in "normal" mode.

It really depends on what you are drawing.
If you are just drawing a solid filled color, that should not be a problem. The question is how much of the surface you are changing, and how you are changing it.
Again, it depends on what you are drawing and whether you could offload some of the work to the GPU. For example if you have static parts of your interface that will remain the same, or are animated/updated independently, you could use a different layer for those areas and let the GPU compose those.
Layers have the advantage that they are composited by the GPU, and they are backed by their own bitmaps. Once you draw into the surface of the layer, the OS will cache the result in the GPU and compose all of your layers at the same time.
Then you can determine which parts of your application actually need to be redrawn and only redraw those sections on each frame.
But again, it really will depend a lot on what you are trying to do.

Related

Need to make a Gantt chart like control in iOS, to draw or to subview?

I'm about ready to begin to create a Gantt chart like control in iOS for my app. I need to show a timeline of events. Basically a bunch of rectangles, some lines/arcs for some decoration, possibly a touch point or two to edit attributes. It will basically be a "full screen" control on my phone.
I see two basic paths to implement this custom UIView subclass:
Simply implement drawRect: and go to town using CoreGraphics calls. Most likely split over a bunch of private methods that do all the drawing work. Possibly cache some information as needed, to help with any sub region hit detection.
Rather than "draw" the graphics, add a bunch of UIViews as children using addSubview: and manipulating their layer properties to get them to show the different graphic pieces, and bounds\frame to get them positioned appropriately. And then just let "drawing" take care of itself.
Is one path better than the other? I may end up trying both in the long run just to see, but I figured I'd seek the wisdom of those who've gone before first.
My guess is that the quicker solution would be to go the drawRect: route, and the subview approach would require more code, but maybe be more robust (easier hit detection, animation support, automatic clipping management, etc). I do want to be able to implement pinch to zoom and the like, long term.
UPDATE
I went with the UICollectionView approach. Which got me selection and scrolling for free (after some surprises). I've been pretty pleased with the results so far.
Going with CoreGraphics is going force you to write many more lines of code than building with UIViews, although it is more performant and better on memory. However, you're likely going to need a more robust solution for managing all of that content. A UICollectionView seems like an appropriate solution for mapping your data on to a view with a custom UICollectionViewCell subclass. This is going to be much quicker to develop than rolling your own, and comes with great flexibility through UICollectionViewLayout subclasses. Pinch to zoom isn't supported out of the box, but there are ways to do it. This is also better for memory than using a bunch of UIViews because of cell reuse, but reloading can become slow with a few hundred items that all have different sizes to be calculated.
When it comes to performance, a well written drawRect: is preferred, especially when you would potentially have to render many many rects. With views, an entire layout system goes to work, much worse if you have autolayout, where an entire layout system goes to town and kills your performance. I recently upgraded our calendar views from view-based to CG-based for performance reasons.
In all other aspects, working with views is much preferred, of course. Interface Builder, easy gesture recognizer setup, OO, etc. You could still create logical classes for each element and have it draw itself in the current context (best to pass a context reference and draw on that), but still not as straight forward.
On newer devices, view drawing performance is quite high actually. But if you want these iPhone 4 and 4S devices, if you want these iPad 3 devices, which lack quite a lot in GPU performance, I would say, depending on your graphs potential sizes, you might have to go the CG way.
Now, you mention pinch to zoom. This is a bitch no matter what. If you write your drawRect: well, you could eventually work your way to tiling and work with that.
If you plan on letting the user move parts of the chart around I would definitely suggest going with the views.
FYI, you will be able to handle pinch to zoom with drawRect just fine.
What would push me to using UIView's in this case would be to support dragging parts of the chart, animating transitions in the chart, and tapping on elements in the chart (though that wouldn't be too hard with drawRect:). Also, if you have elements in your chart that will need heavy CPU usage to render you will get better performance if you need to redraw sub portions of your chart with UIView's since the rendering of the elements is cached to a layer and you will only need to redraw the pieces you care about and not the entire chart.
If you chart will be VERY big AND you want to use drawRect: you will probably want to look at using CATileLayer for you backing so that you don't have the entire layer in memory. This can add added challenges if you only want to render the requested tiles and not the entire area.

Why is -drawRect faster than using CALayers/UIViews for UITableViews?

I can already hear the wrenching guts of a thousand iOS developers.
No, I am not noob.
Why is -drawRect faster for UITableView performance than having multiple views?
I understand that compositing operations take place on the GPU. But compositing is a one-time operation; once the layers are committed to memory, it is no different from a cached buffer that, from the point of view of the GPU, gets translated in and out of view. Compare this to using Core Graphics in drawRect, which employ an unknown amount of operations on the CPU to produce pixels that end up getting cached in CALayers anyway. What's the difference if it all ends up cached and flattened anyway?
Also, if you're handling cell reuse properly, you shouldn't need to regenerate views on each call to -cellForRowAtIndexPath. In fact, there may be a performance benefit to having the state data (font, font size, text color, attributes, etc) cached by UIView/CALayer objects than having them constantly recreated during -drawRect.
Why the craze for drawRect? Can someone give me pointers?
When you talking about optimization, you need to provide specific situations and conditions and limitations. Because optimization is all about micro-management. Otherwise, it's meaningless.
What's the basis of your faster? How did you measured it? What's the numbers?
For example, no-op or very simple -drawRect: can be faster, but it doesn't mean it always does.
I don't know internal design of CA neither. So here are my guesses.
In case of static content
It's weird that your drawing code is being called constantly. Because CALayer caches drawing result, and won't draw it again until you send setNeedsDisplay message. If you don't update cell's content, it's just same with single bitmap layer. Should be faster than multiple composited layers because it doesn't need composition cost. If you're using only small number of cells which are enough to be exist all in the pool at same time, it doesn't need to be updated. As RAM becomes larger in recent model, it's more likely to happen in recent models.
In case of dynamic content
If it is being updated constantly, it means you're actually updating them yourself. So maybe your layer-composited version would also being updated constantly. It means it is being composited again for every frame. It could be slower by how it is complex and large. If it's complex and large and have a lot of overlapping areas, it could be slower. I guess CA will draw everything strictly if it can't determine what area is fine to ignore. Unlike you can choose what to draw or not.
In case of actual drawing is done in CPU
Even you configure your view as pure composition of many layers, each sublayers should be drawn eventually. And drawing of their content is not guaranteed to be done in GPU. For example, I believe CATextLayer is drawing itself in CPU. (because drawing text with polygons on current mobile GPU doesn't make sense in performance perspective) And some filtering effects too. In that case, overall cost would be similar and plus it requires compositing cost.
In case of well balanced load of CPU and GPU
If your GPU is very busy for heavy load because there're too many layers or direct OpenGL drawings, your CPU may be idle. If your CG drawing can be done within the idle CPU time, it could be faster than giving more load to GPU.
None of them is your case?
If your case is none of situations I listed above, I really want to see and check the CG code draws faster than CA composition. I wish you attach some source code.
well, your program could easily end up moving and converting a lot of pixel data if going back and forth from GPU to CPU based renderers.
as well, many layers can consume a lot of memory.
I'm only seeing half the conversation here, so I might have misunderstood. Based on my recent experiences optimizing CALayer rendering, and investigating the ways Apple does(n't) optimize stuff you'd expect to be optimized...
What's the difference if it all ends up cached and flattened anyway?
Apple ends up creating a separate GPU element per layer. If you have lots of layers, you have lots of GPU elements. If you have one drawRect, you only have one element. Apple often does NOT flatten those, even where they could (and possibly "should").
In many cases, "lots of elements" is no issue. But if they get to be large ... or there's enough of them ... or they're bad sizes for OpenGL ... AND (see below) they get stored in CPU instead of on GPU, then things start to get nasty. NB: in my experience:
"enough": 40+ in memory
"large": 100x100 points (200x200 retina pixels)
Apple's code for GPU elements / buffers is well optimized in MOST places, but in a few places it's very POORLY optimized. The performance drop is like going off a cliff.
Also, if you're handling cell reuse properly, you shouldn't need to
regenerate views on each call to -cellForRowAtIndexPath
You say "properly", except ... IIRC Apple's docs tell people not to do it that way, they go for a simpler approach (IMHO: weak docs), and instead re-populate all the subviews on every call. At which point ... how much are you saving?
FINALLY:
...doesn't all this change with iOS 6, where the cost of creating a UIView is greatly reduced? (I haven't profiled it yet, just been hearing about it from other devs)

CGContextDrawLayerAtPoint is slow on iPad 3

I have a custom view (inherited from UIView) in my app. The custom view overrides
- (void) drawRect:(CGRect) rect
The problem is: the drawRect: executes many times longer on iPad 3 than on iPad 2 (about 0.1 second on iPad 3 and 0.003 second on iPad 2). It's about 30 times slower.
Basically, I am using some pre-created layers and draw them in the drawRect:. The last call
CGContextDrawLayerAtPoint(context, CGPointZero, m_currentLayer);
takes most of the time (about 95% of total time in drawRect:)
What might be slowing things so much and how should I fix the cause?
UPDATE:
There are no threads directly involved. I do call setNeedsDisplay: in one thread and drawRect: gets called from another but that's it. The same goes for locks (there are no locks used).
The view gets redrawn in response to touches (it's a coloring book app). On iPad 2 I get reasonable delay between a touch and an update of the screen. I want to achieve the same on iPad 3.
So, the iPad 3 is definitely slower in a lot of areas. I have a theory about this. Marco Arment noted that the method renderInContext is ridiculously slow on the new iPad. I also found this to be the case when trying to create a magnifying glass for a custom text view. In the end I had to forego renderInContext for custom Core Graphics drawing.
I've also been having problem hitting the dreaded wait_fences errors on my core graphics drawing here: Only on new iPad 3: wait_fences: failed to receive reply: 10004003.
This is what I've figured out so far. The iPad 3 obviously has 4 times the pixels to drive. This can cause problems in two place:
First, the CPU. All core graphics drawing is done by the CPU. In the case of rotational events, if the CPU takes too long to draw, it hits the wait_fences error, which I believe is simply a call that tells the device to wait a little longer to actually perform the rotation, thus the delay.
Transferring images to the GPU. The GPU obviously handles the retina resolution just fine (see Infinity Blade 2). But when core graphics draws, it draws its images directly to the GPU buffers to avoid memcpy. However, either the GPU buffers haven't changes since the iPad 2 or they just didn't make them large enough, because it's remarkably easy to overload those buffers. When that happens, I believe the CPU writes the images to standard memory and then copies them to the GPU when the GPU buffers can handle it. This, I think is what causes the performance problems. That extra copy is time consuming with so many pixels and slows things down considerably.
To avoid memcpy I recommend several things:
Only draw what you need. Avoid drawing anything offscreen at all costs. If you're drawing a large view, but only display part of that view (subviews covering it, for example) try to find a way to only draw what is visible.
If you have to draw a large view, consider breaking the view up in to parts either as subviews or sublayers (probably sublayers in your case). And only redraw what you need. Take the notability app, for example. When you zoom in, you can literally watch it redraw one square at a time. Or in safari you can watch it update squares as you scroll. Unfortunately, I haven't had to do this so I'm uncertain of the methodology.
Try to keep your drawings simple. I had an awesome looking custom core text view that had to redraw on every character entered. Very slow. I changed the background to simple white (in core graphics) and it sped up well. Even better would be for me to not redraw the background.
I would like to point out that my theory is conjecture. Apple doesn't really explain what exactly they do. My theory is just based on what they have said and how the iPad responds as well as my own experimentation.
UPDATE
So Apple has now released the 2012 WWDC Developer videos. They have two videos that may help you (requires developer account):
iOS App Performance: Responsiveness
iOS App Performance: Graphics and Animation
One thing they talk about I think may help you is using the method: setNeedsDisplayInRect:(CGRect)rect. Using this method instead of the normal setNeedsDisplay and making sure that your drawRect method only draws the rect given to it can greatly help performance. Personally, I use the function: CGContextClipToRect(context, rect); to clip my drawing only to the rect provided.
As an example, I have a separate class I use to draw text directly to my views using Core Text. My UIView subclass keeps a reference to this object and uses it to draw it's text rather than use a UILabel. I used to refresh the entire view (setNeedsDisplay) when the text change. Now I have my CoreText object calculate the changed CGRect and use setNeedsDisplayInRect to only change the portion of the view that contains the text. This really helped my performance when scrolling.
I ended up using approach described in #Kurt Revis answer for similar question.
I minimized number of layers used, added UIImageView and set its image to an UIImage wrapping my CGImageRef. Please read the mentioned answer to get more details about the approach.
In the end my application become even simpler than before and works with almost identical speed on iPad 2 and iPad 3.

On iOS, how do CALayer bitmaps (CGImage objects) get displayed onto Graphics Card?

On iOS, I was able to create 3 CGImage objects, and use a CADisplayLink at 60fps to do
self.view.layer.contents = (__bridge id) imageArray[counter++ % 3];
inside the ViewController, and each time, an image is set to the view's CALayer contents, which is a bitmap.
And this all by itself, can alter what the screen shows. The screen will just loop through these 3 images, at 60fps. There is no UIView's drawRect, no CALayer's display, drawInContext, or CALayer's delegate's drawLayerInContext. All it does is to change the CALayer's contents.
I also tried adding a smaller size sublayer to self.view.layer, and set that sublayer's contents instead. And that sublayer will cycle through those 3 images.
So this is very similar to back in the old days even on Apple ][ or even in King's Quest III era, which are DOS video games, where there is 1 bitmap, and the screen just constantly shows what the bitmap is.
Except this time, it is not 1 bitmap, but a tree or a linked list of bitmaps, and the graphics card constantly use the Painter's Model to paint those bitmaps (with position and opacity), onto the main screen. So it seems that drawRect, CALayer, everything, were all designed to achieve this final purpose.
Is that how it works? Does the graphics card take an ordered list of bitmaps or a tree of bitmaps? (and then constantly show them. To simplify, we don't consider the Implicit animation in the CA framework) What is actually happening down in the graphics card handling layer? (and actually, is this method almost the same on iOS, Mac OS X, and on the PCs?)
(this question aims to understand how our graphics programming actually get rendered in modern graphics cards, since for example, if we need to understand UIView and how CALayer works, or even use CALayer's bitmap directly, we do need to understand the graphics architecture.)
Modern display libraries (such as Quartz used in iOS and Mac OS) use hardware accelerated compositing. The workings is very similar to how computer graphics libraries such as OpenGL work. In essence, each CALayer is kept in as a separate surface that is buffered and rendered by the video hardware much like a texture in a 3D game. This is exceptionally well implemented in iOS and this is why the iPhone is so well-known for having a smooth UI.
In the "old days" (i.e. Windows 9x, Mac OS Classic, etc), the screen was essentially one big framebuffer, and everything that was exposed by e.g. moving a window had to be redrawn manually by each application. The redrawing was mostly done by the CPU, which put an upper limit on animation performance. Animation were usually very "flickery" due to the redrawing involved. This technique was mostly suited for desktop applications without too much animation. Notably, Android uses (or at least used to use) this technique, which is a big problem when porting iOS applications over to Android.
Games of the old days days (e.g. DOS, arcade machines, etc, also used a lot on Mac OS classic), something called sprite animation was used to improve performance and reduce flickering by keeping the moving images in offscreen buffers that were rendered by the hardware and synchronized with the monitor's vblank, which meant that animations were smooth even on very low-end systems. However, the size of these images were very limited and the screen resolutions were low, only about 10-15% of the pixels of even an iPhone screen of today.
You've got a reasonable intuition here, but there are still several steps between contents and the display. First off, contents doesn't have to be a CGImage. It is often a private class called CABackingStorage which is not quite the same thing. In many cases there are hardware optimizations going on to bypass rendering the image into main memory and then copying it to video memory. And since the contents of various layers are all composited together, you're still a ways from the "real" display memory. Not to mention that modifications to contents just directly impacts the model layer, not the presentation or render layers. Plus there are CGLayer objects that can store their image directly in video memory. There's a lot of different stuff going on.
So the answer is, no, the video "card" (chip; it's the PowerVR BTW) does not take an ordered bunch of layers. It takes lower-level data in ways that are not well documented. Some things (particularly parts of Core Animation, and perhaps CGLayer) appear to be wrappers around OpenGL textures, but others are probably Core Graphics directly accessing the hardware itself. Once you get to this level of the stack, it's all private and can change from version to version and from device to device.
You also may find Brad Larson's response useful here:
iOS: is Core Graphics implemented on top of OpenGL?
You may also be interested in Chapter 6 of iOS:PTL. While it doesn't go into the implementation specifics, it does include a lot of practical discussion of how to improve drawing performance and best utilize the hardware with Core Graphics. Chapter 7 details all the developer-accessible steps involved in CALayer drawing.

CATiledLayer and UIImageView what's the big deal between them?

few months ago I've found a really awesome sample code from Apple site. The sample is called "LargeImageDownsizing" the wonderful thing is that it explain a lot about how image are read from resources and then rendered on screen. Digging into that code I've found something that is disturbing me a little. The downsized image is passed to a view that has a CATiledLayer, but without giving a piece of image at each tile to improve memory performance, it just set the tile size and then load image (I'm making things simple to go to the concept). So my question basically is why?Why use a CATiledLayer if it is not feed in the right way, they could have used a normal UIImageView... So I made few tests to understand if I was right. Modifing the code simple adding a scrollview with an image view as subview and responding to the delegate scrollview for zoom. I went to those conclusions testing on device and sim:
-The memory impact and footprint is exactly the same, even during zooming scrolling operation and it doesn't surprise me at all, the image is decompressed in memory
-Time profile say that a tileview take more time to be drawn during scrolling zoom operation instead of a uiimageview and that doesn't surprise me at all again the uiimageview is already drawn
-If I send memory warning nothing change between the two solution(only on sim)
-Testing Core Animation performance I get the same results around 60FPS
So what's the deal between those two views/layers why should I pick one instead of the other in these specific case? UIImageView seems to win the battle.
I hope that someone could help me to understand that.
They might perform the same for small images because ghen the only difference in terms os performance is that CATiledLayer draws on a background thread. Depending on the tile size CATiledLayer would even be slower because it has to draw multiple tiles for one image.
BUT ...
the point of CATiledLayer is that you don't need to draw all tiles, especially when zooming into a very very large image. It is smart to know which parts are actually needed. It also is smart about evicting tiles that are not needed any more.
Or this mechanism to work you need to provide the individual parts of the image separately. We're talking a total size of an image that probably cannot be held in memory uncompressed.

Resources