What is byte alignment (cache line alignment) for Core Animation? Why does it matter? - ios

I am loading images on a scroll view in a non-lazy way, so that the stutter behavior is not seen. The code works and the FPS is close to 60.
BUT, I do not understand what byte alignment (or cache line alignment) is for Core Animation.
As mentioned here and here, this is an important thing to do. However, I noticed that as long as I do the steps mentioned here, whether the data is byte-aligned or not does not really matter.
Does anyone know what exactly it is?

When the CPU copies something from memory into the CPU cache, it does so in chunks. Those chunks are cache lines and they are of a fixed size. When data is stored in the CPU cache, it's stored as lines. Making your data fit into the cache line size for your target architecture can be important for performance because it affects data locality.
ARMv7 uses 32 byte cache lines (like PowerPC). The A9 processor uses 64 byte cache lines. Because of this, you will see the most benefit by rendering into a rectangle that is on a 64 byte boundary and has dimensions that are a multiple of 64 bytes.
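To make "on a 64 byte boundary" concrete, here is a minimal sketch of rounding a bitmap's row size up to a multiple of the cache line. The helper names are hypothetical, and the 64-byte value is simply the figure quoted above passed in as a parameter, not something queried from the hardware.

```c
#include <stddef.h>

/* Round n up to the next multiple of alignment.
   Assumes alignment is a power of two (e.g. 32 or 64). */
static size_t align_up(size_t n, size_t alignment) {
    return (n + alignment - 1) & ~(alignment - 1);
}

/* Bytes per row for a bitmap, padded so that every row starts on a
   cache-line boundary. cache_line is an assumption supplied by the caller. */
static size_t aligned_bytes_per_row(size_t width_px, size_t bytes_per_pixel,
                                    size_t cache_line) {
    return align_up(width_px * bytes_per_pixel, cache_line);
}
```

For a 100 pixel wide RGBA bitmap (4 bytes per pixel) the unpadded row is 400 bytes, and the padded row is 448 bytes; a 320 pixel wide row is already 1280 bytes, a multiple of 64, so it needs no padding.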
On the other hand, the graphics accelerator does prefer working with image data that is a square power of two in dimensions. This doesn't have anything to do with cache lines or byte alignment. This is another thing that can have a large impact on performance.
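A quick way to check the power-of-two condition for an image dimension, and to find the size you would have to pad up to, is sketched below. The function names are hypothetical and the code is plain integer arithmetic, not an Apple API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n is an exact power of two (1, 2, 4, ..., 1024, ...). */
static bool is_power_of_two(uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Smallest power of two that is >= n. Valid for n up to 2^31. */
static uint32_t next_power_of_two(uint32_t n) {
    if (n <= 1) return 1;
    n--;                 /* so exact powers of two map to themselves */
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;        /* all bits below the top set bit are now 1 */
    return n + 1;
}
```

For example, a 640x960 image is not a power of two in either dimension; both sides would round up to 1024.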
In the specific cases you linked to, the Apple API being called (Core Animation, QT, etc.) is performing these kinds of optimizations on the caller's behalf. In the case of Core Animation, the caller is giving it data that it then optimizes for the hardware. According to what Path wrote in the documentation you linked to, they suggest giving Core Animation data it will not have to optimize (in this case, optimizing and making a copy) so that the optimization step is skipped.
So if your images are some multiple of 64 bytes in dimension and each dimension is a square power of two, you're good to go ;) Rendering that image into an area of the screen that is on a 64 byte boundary is also good, but is not always realistic for anything but a full screen application like a game.
That said, use Instruments. Build your application, profile it with Instruments and a representative workload (UIAutomation is great for this). If you see scrolling performance problems Instruments will give you everything you need to zero in on where the bottleneck is.
I can honestly say that all of the scrolling performance problems I have seen have not involved byte alignment or cache lines. Instead it's been other forms of Core Animation abuse (not using rasterization and caching), or doing too much other work on the main thread, etc.
The guidance on the effect of byte alignment on performance is mentioned in the Quartz 2D Programming Guide
This is the format that Core Animation is optimizing images to when it does a copy. If you already have your data in the format Core Animation wants, it will skip the potentially expensive optimization step.
If you want to know more about how the iOS graphics pipeline works, see:
WWDC 2012 Session 238 "iOS App Performance: Graphics and Animations"
WWDC 2012 Session 235 "iOS App Performance: Responsiveness"
WWDC 2011 Session 121 "Understanding UIKit Rendering"
iOS Device Compatibility Reference: OpenGL ES Graphics

Related

Sprite Animation file sizes in SpriteKit

I looked into inverse kinematics as a way of doing animation, but overall thought I might want to proceed with using sprite texture atlases to create animation instead. The only thing is I'm concerned about size.
I wanted to ask for some help in the "overall global solution":
I will have 100 monsters. Each has 25 frames of animation for an attack, idle, and spawning animation. Thus 75 frames in total per monster.
I'd imagine I want to do 3x, 2x and 1x animations so that means even more frames (75 x 3 images per monster). Unless I do pdf vectors then it's just one size.
Is this approach just too much in terms of size? 25 frames of animation alone was 4MB on the hard disk, but I'm not sure what happens in terms of compression when you load that into Xcode and a texture atlas.
Does anyone know if the approach I'm embarking on will take up a lot of space and potentially be a poor decision long term if I want even more monsters? Right now I only have a few monsters and other images, and I'm already up to ~150MB when I go to the app on the phone and look at its storage. It's hard to tell what would happen in the long term with way more monsters, but I feel like it would be prohibitively large, like 4GB+.
To me, this sounds like the wrong approach, and yet everywhere I read, they encourage using sprites and atlases accordingly. What am I doing wrong? Too many frames of animation? Too many monsters?
Thanks!
So, you are correct that you will run into a problem. In general, the tutorials you find online simply ignore the issue of download size and memory use on the device. When building a real game you will need to consider total download size and the amount of memory on the actual device when rendering multiple animations on screen at the same time. There are 3 approaches: just store everything as PNG, make use of an animation format that compresses better than PNG, or encode things as H264. Each of these approaches has issues. If you would like to take a look at my solution to the memory use issue at runtime, have a peek at SpriteKitFireAnimation link at this question. If you want to roll your own approach with H264, you can get lots of compression but you will have issues with alpha channel support. The lazy thing to do is use PNGs. That will work and supports an alpha channel, but PNGs will bloat your app and runtime memory use is heavy.
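As a rough back-of-the-envelope check on the numbers in the question, the sketch below multiplies them out. It assumes every 25-frame animation costs about the same 4MB on disk as the one that was measured, which is an assumption and will overestimate the smaller 1x assets. The function names are hypothetical.

```c
#include <stdint.h>

/* Total number of individual frame images across all monsters and scales. */
static uint64_t total_frames(uint64_t monsters, uint64_t frames_per_monster,
                             uint64_t scales) {
    return monsters * frames_per_monster * scales;
}

/* Estimated disk size in MB, assuming each animation costs
   mb_per_animation at every scale (an upper-bound style assumption). */
static uint64_t estimated_disk_mb(uint64_t monsters,
                                  uint64_t animations_per_monster,
                                  uint64_t mb_per_animation,
                                  uint64_t scales) {
    return monsters * animations_per_monster * mb_per_animation * scales;
}
```

With 100 monsters, 3 animations each and 4MB per animation, a single scale already comes to about 1200MB, and 100 x 75 x 3 is 22,500 separate frame images. That is in the same range as the "prohibitively large" guess in the question.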

App keeps crashing due to memory pressure

My app is saving and retrieving data from Parse.com. And showing images, buttons, scrollviews, etc.. (the normal stuff). Then when I got near finishing my app, it started to receive memory warnings and the app started crashing often. I checked it in the Instruments and noticed the live bytes was extremely high at some points, and I can't figure out why.
Is the app crashing because of the high live bytes? What should value of the live bytes be?
Obviously something is going on in the VM. But I have no idea what this is. What is the VM: CG raster data? And this: VM: CG Image? I am not using CGImages, only UIImages.
Is the app crashing because of the high live bytes?
Yes.
What should value of the live bytes be?
There's no fixed number. The limits change from OS version to OS version, and sometimes depend on the device and what else is going on at the moment. The right thing to do is (a) try not to use so much, and (b) heed the warnings and dispose of stuff you don't need.
Obviously something is going on in the VM. But I have no idea what this is. What is the VM: CG raster data? And this: VM: CG Image? I am not using CGImages, only UIImages.
A UIImage is just a wrapper around a CGImage.
You have too many images alive at the same time. That's the problem you have to fix.
So, how many is too many? It depends on how big they are.
Also, note that the "raster data" is the decompressed size. A 5Mpix RGBA image at 8 bits per channel takes 20MB of RAM for its raster data, whether the file is 8MB or 8KB.
I still feel the number is too high though, or is 30-40 MB an okay number for handling 3-6 full-screen sized images at a time? This is when tested on a 4 year old iPhone 4, iOS 7. If that matters.
On an iPhone 4, "full-screen" means 640x960 pixels. RGBA at 8 bits per channel means 4 bytes per pixel. So, with 6 such images, that's 640*960*4*6 = 14MB. So, that's the absolute minimum storage you should expect if you've loaded and drawn 6 full-screen images.
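The arithmetic above is easy to reproduce. Here is a minimal sketch with a hypothetical helper name; the 2500x2000 dimensions in the note below are just one illustrative way to get a 5 megapixel image.

```c
#include <stdint.h>

/* Decompressed (raster) size of an image in bytes. The size of the
   compressed file on disk plays no part in this number. */
static uint64_t raster_bytes(uint64_t width_px, uint64_t height_px,
                             uint64_t bytes_per_pixel) {
    return width_px * height_px * bytes_per_pixel;
}
```

Six 640x960 images at 4 bytes per pixel come to 6 * 2,457,600 = 14,745,600 bytes, which is the 14MB figure. A 2500x2000 image at 4 bytes per pixel is 20,000,000 bytes.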
So, why do you actually see more than twice that?
Well, as Images and Memory Management in the class reference says:
In low-memory situations, image data may be purged from a UIImage object to free up memory on the system. This purging behavior affects only the image data stored internally by the UIImage object and not the object itself. When you attempt to draw an image whose data has been purged, the image object automatically reloads the data from its original file. This extra load step, however, may incur a small performance penalty.
So think of that 14MB as basically a cache that iOS uses to speed things up, in case you want to draw the images again. If you run a little low on memory, it'll purge the cache automatically, so you don't have to worry about it.
So, that leaves you with 16-26MB (the 30-40MB you measured minus the 14MB), which is presumably used by the buffers of your UI widgets and layers and by the compositor behind the scenes. That's a bit more than the theoretical minimum of 14MB, but not horribly so.
If you want to reduce memory usage further, what you probably need to do is not draw all 6 images. If they're full-screen, there's no way the user can see more than 1 or 2 at a time. So, you could load and render them on demand instead of preloading them (or, if you can predict which one will usually be needed next, preload 1 of them instead of all of them), and destroy them when they're no longer visible. Since you'd then only have 2 images instead of 6, that should drop your memory usage from 16-26MB + a 14MB cache to 5-9MB + a 5MB cache. This obviously means a bit more CPU. It probably won't noticeably affect responsiveness or battery drain, but you'd want to test that. And, more importantly, it will definitely make your code more complicated.
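The "visible image plus one predicted next image" policy can be reduced to a small predicate. This is only a sketch of the bookkeeping, with hypothetical names; the actual loading and releasing of UIImages is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Keep an image in memory only if it is the visible one or the one we
   predict will be shown next. Everything else can be released. */
static bool should_keep_loaded(size_t index, size_t visible_index,
                               size_t predicted_next_index) {
    return index == visible_index || index == predicted_next_index;
}

/* How many of the total images stay loaded under that policy. */
static size_t loaded_count(size_t total, size_t visible_index,
                           size_t predicted_next_index) {
    size_t count = 0;
    for (size_t i = 0; i < total; i++) {
        if (should_keep_loaded(i, visible_index, predicted_next_index)) {
            count++;
        }
    }
    return count;
}
```

With 6 images, image 2 visible and image 3 predicted next, only 2 images stay loaded, which is where the "2 images instead of 6" estimate comes from.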
Obviously, if it's appropriate for your images, you could also do things like using non-Retina images (which will cut memory by 75%) or dropping color depth from RGBA-8 to ARGB-1555 (50%), but most images don't look as good that way (which is why we have high-color Retina displays).
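The 75% and 50% figures follow directly from the pixel counts and pixel sizes. A small sketch, with a hypothetical helper name, assuming the non-Retina version of a 640x960 image is 320x480:

```c
#include <stdint.h>

/* Percentage of memory saved when going from before_bytes to after_bytes.
   Assumes before_bytes is non-zero and after_bytes <= before_bytes. */
static uint64_t percent_saved(uint64_t before_bytes, uint64_t after_bytes) {
    return 100 - (after_bytes * 100) / before_bytes;
}
```

A 640x960 image at 4 bytes per pixel is 2,457,600 bytes. At 320x480 it is 614,400 bytes (75% saved). At 640x960 with 2 bytes per pixel, as in a 16-bit ARGB-1555 format, it is 1,228,800 bytes (50% saved).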

Does UIView transparency affect the performance of an app?

I'm creating an app with up to 40 UIViews, where every view holds a drawing of a stick that is available in several positions (rotated to a 30 degree angle, a 45 degree angle, etc.). The background of each view is transparent. These views can intersect with each other, so I need the UIViews to be transparent so that the user can see both drawings where views overlap. I wonder if this (the transparency of all 40 UIViews) seriously affects the performance of the application? And how can I track how much memory or CPU my app currently uses?
I recommend watching WWDC 2012 Session 238 - iOS App Performance: Graphics and Animations, which covers these questions.
As a broad answer:
The iPhone will probably handle your 40-view requirement fine, but it's impossible to know for sure without trying it out, and without more context (are they being animated? Are they scrolling?)
More views create more performance problems, because all of the views need to be packaged up and shipped off to be rendered (by backboardd I think).
Transparency will hurt application performance. I believe the core reason is that transparent views need to be drawn in an off-screen buffer rather than be painted over existing content (something like that).
Use Instruments for Profiling
Profile your GPU usage using the OpenGL ES Driver instrument (look at 'Device Utilization')
Measure CPU usage using Time Profiler
Measure FPS and check for common performance problems using the CoreAnimation instrument
I wouldn't bother thinking about this until you actually see performance issues. If you do, I can't recommend that WWDC session enough—it covers things like what strategy you should take to optimize performance (e.g. moving work to the GPU as long as it can handle more; the basics of profiling, etc.) as well as tips and tricks based on the implementation details of iOS.

Why is -drawRect faster than using CALayers/UIViews for UITableViews?

I can already hear the wrenching guts of a thousand iOS developers.
No, I am not a noob.
Why is -drawRect faster for UITableView performance than having multiple views?
I understand that compositing operations take place on the GPU. But compositing is a one-time operation; once the layers are committed to memory, it is no different from a cached buffer that, from the point of view of the GPU, gets translated in and out of view. Compare this to using Core Graphics in drawRect, which employs an unknown number of operations on the CPU to produce pixels that end up getting cached in CALayers anyway. What's the difference if it all ends up cached and flattened anyway?
Also, if you're handling cell reuse properly, you shouldn't need to regenerate views on each call to -cellForRowAtIndexPath. In fact, there may be a performance benefit to having the state data (font, font size, text color, attributes, etc) cached by UIView/CALayer objects than having them constantly recreated during -drawRect.
Why the craze for drawRect? Can someone give me pointers?
When you're talking about optimization, you need to provide the specific situation, conditions, and limitations, because optimization is all about micro-management. Otherwise, it's meaningless.
What's the basis of your "faster"? How did you measure it? What are the numbers?
For example, a no-op or very simple -drawRect: can be faster, but that doesn't mean it always is.
I don't know the internal design of CA either, so here are my guesses.
In case of static content
It's odd if your drawing code is being called constantly, because CALayer caches the drawing result and won't draw again until you send it a setNeedsDisplay message. If you don't update the cell's content, it's the same as a single bitmap layer. That should be faster than multiple composited layers because it has no composition cost. If you're using only a small number of cells, few enough to all exist in the pool at the same time, they never need to be redrawn. As RAM gets larger on recent models, this becomes more likely.
In case of dynamic content
If it is being updated constantly, it means you're actually updating it yourself. So your layer-composited version would probably also be updated constantly, which means it is composited again for every frame. How much slower that is depends on how complex and large it is; if it's complex and large and has a lot of overlapping areas, it could be slower. I guess CA will draw everything strictly if it can't determine which areas are fine to ignore, whereas in your own drawing code you can choose what to draw and what to skip.
In case the actual drawing is done on the CPU
Even if you configure your view as a pure composition of many layers, each sublayer still has to be drawn eventually, and the drawing of its content is not guaranteed to be done on the GPU. For example, I believe CATextLayer draws itself on the CPU (because drawing text with polygons on current mobile GPUs doesn't make sense from a performance perspective), and so do some filtering effects. In that case, the overall drawing cost would be similar, plus you pay the compositing cost.
In case of well balanced load of CPU and GPU
If your GPU is very busy under heavy load because there are too many layers or direct OpenGL drawing, your CPU may be idle. If your CG drawing can be done within that idle CPU time, it could be faster than giving more load to the GPU.
None of these is your case?
If your case is none of the situations I listed above, I'd really like to see and check the CG code that draws faster than CA composition. I wish you would attach some source code.
Well, your program could easily end up moving and converting a lot of pixel data if it goes back and forth between GPU-based and CPU-based renderers.
As well, many layers can consume a lot of memory.
I'm only seeing half the conversation here, so I might have misunderstood. Based on my recent experiences optimizing CALayer rendering, and investigating the ways Apple does(n't) optimize stuff you'd expect to be optimized...
What's the difference if it all ends up cached and flattened anyway?
Apple ends up creating a separate GPU element per layer. If you have lots of layers, you have lots of GPU elements. If you have one drawRect, you only have one element. Apple often does NOT flatten those, even where they could (and possibly "should").
In many cases, "lots of elements" is no issue. But if they get to be large ... or there's enough of them ... or they're bad sizes for OpenGL ... AND (see below) they get stored in CPU instead of on GPU, then things start to get nasty. NB: in my experience:
"enough": 40+ in memory
"large": 100x100 points (200x200 retina pixels)
Apple's code for GPU elements / buffers is well optimized in MOST places, but in a few places it's very POORLY optimized. The performance drop is like going off a cliff.
Also, if you're handling cell reuse properly, you shouldn't need to regenerate views on each call to -cellForRowAtIndexPath
You say "properly", except ... IIRC Apple's docs tell people not to do it that way, they go for a simpler approach (IMHO: weak docs), and instead re-populate all the subviews on every call. At which point ... how much are you saving?
FINALLY:
...doesn't all this change with iOS 6, where the cost of creating a UIView is greatly reduced? (I haven't profiled it yet, just been hearing about it from other devs)

iOS: playing a frame-by-frame greyscale animation in a custom colour

I have a 32 frame greyscale animation of a diamond exploding into pieces (ie 32 PNG images @ 1024x1024)
my game consists of 12 separate colours, so I need to perform the animation in any desired colour
this I believe rules out any Apple frameworks, also it rules out a lot of public code for animating frame by frame in iOS.
what are my potential solution paths?
these are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
that last one just shows it may be possible to load an image into a GL texture each frame ( he is doing it from the camera, so if I have everything stored in memory, that should be even faster )
I can see these options ( listed laziest first, most optimised last )
option A
each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture
I'm pretty sure this is stupid, so onto option B
option B
preload all images into memory
then as per above, only we load from memory rather than from file
I think this is going to be the ideal solution, can anyone give it the thumbs up or thumbs down?
option C
preload all of my PNGs into a single GL texture of the maximum size, creating a texture Atlas. each frame, set the texture coordinates to the rectangle in the Atlas for that frame.
while this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution; on older iOS devices maximum texture size is 1024x1024. if we are cramming 32 frames into this ( really this is the same as cramming 64 ) we would be at 128x128 for each frame. if the resulting animation is close to full screen on the iPad this isn't going to hack it
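the 128x128 figure, and the texture coordinates for each frame, can be worked out as in this sketch. it assumes a square grid whose side is a power of two (which is why 32 frames cost the same as 64), with frames laid out row by row; the type and function names are hypothetical.

```c
#include <stdint.h>

/* Texture coordinates of one atlas cell, in the 0..1 range. */
typedef struct { float u0, v0, u1, v1; } UVRect;

/* Smallest power-of-two grid side whose square holds `frames` cells. */
static uint32_t grid_side_for_frames(uint32_t frames) {
    uint32_t side = 1;
    while (side * side < frames) side *= 2;
    return side;
}

/* Pixel size of one frame when `frames` cells share a square atlas. */
static uint32_t cell_size(uint32_t atlas_side, uint32_t frames) {
    return atlas_side / grid_side_for_frames(frames);
}

/* Texture coordinates for frame number `frame`, counted row by row. */
static UVRect uv_for_frame(uint32_t frame, uint32_t frames) {
    uint32_t side = grid_side_for_frames(frames);
    float step = 1.0f / (float)side;
    uint32_t col = frame % side;
    uint32_t row = frame / side;
    UVRect r = { col * step, row * step, (col + 1) * step, (row + 1) * step };
    return r;
}
```

for 32 frames the grid is 8x8, so a 1024x1024 atlas gives 128x128 per frame, the same as for 64 frames.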
option D
instead of loading into a single GL texture, load into a bunch of textures
moreover, we can squeeze 4 images into a single texture using all four channels
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
If something higher performance than (B) is needed, it looks like the key is glTexSubImage2D http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pull across one frame at a time from memory, we could arrange say 16 512x512x8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024x32bit RGBA texture, and then split it within GL using the above function.
This would mean that we are performing one [RAM->VRAM] transfer per 16 frames rather than per one frame.
Of course, for more modern devices we could get 64 instead of 16, since more recent iOS devices can handle 2048x2048 textures.
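The 16 and 64 figures come from comparing byte counts: an RGBA texture holds 4 bytes per texel, an 8-bit greyscale frame holds 1 byte per pixel. A minimal sketch with hypothetical helper names:

```c
#include <stdint.h>

/* How many square 8-bit greyscale frames fit, byte for byte, into one
   square 32-bit RGBA texture of the given side length. */
static uint32_t frames_per_rgba_texture(uint32_t texture_side,
                                        uint32_t frame_side) {
    uint64_t texture_bytes = (uint64_t)texture_side * texture_side * 4;
    uint64_t frame_bytes = (uint64_t)frame_side * frame_side;
    return (uint32_t)(texture_bytes / frame_bytes);
}

/* Byte offset of a frame inside the contiguous buffer that is
   uploaded as one texture. */
static uint64_t frame_byte_offset(uint32_t frame_index, uint32_t frame_side) {
    return (uint64_t)frame_index * frame_side * frame_side;
}
```

A 1024x1024 RGBA texture is 4,194,304 bytes, exactly 16 frames of 512x512 bytes each; a 2048x2048 texture holds 64 of them.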
I will first try technique (B) and leave it at that if it works ( I don't want to over code ), and look at this if needed.
I still can't find any way to query how many GL textures it is possible to hold on the graphics chip. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. however to implement this properly I would want to make sure that I am not sailing close to the wind re: resources... I don't want my animation to use up so much VRAM that the rest of my rendering fails...
You would be able to get this working just fine with CoreGraphics APIs; there is no reason to deep dive into OpenGL for a simple 2D problem like this. For the general approach you should take to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color so that what is left is the diamond colored in with the specific color you have selected. You could do this at runtime, or you could do it offline and create 1 video for each of the colors you want to support. It would be easier on your CPU if you do the operation N times and save the results into files, but modern iOS hardware is much faster than it used to be. Beware of memory usage issues when writing video processing code; see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex OpenGL stuff, but an approach that makes use of videos would be a lot easier to deal with and you would not need to worry so much about resource usage. See my library linked in the memory post for more info if you are interested in saving time on the implementation.
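To illustrate the idea of colorizing a grayscale frame, here is a plain CPU loop in C. This is not the CGContextClipToMask() path itself, just a stand-in that shows the same effect under one assumption: the grey value of each pixel is treated as coverage (alpha) for the chosen color, and the output is premultiplied RGBA. The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Tint an 8-bit greyscale frame with one color.
   gray:     pixel_count grey values, 0 = empty, 255 = fully covered
   rgba_out: pixel_count * 4 bytes, premultiplied RGBA, written by this call */
static void tint_gray_frame(const uint8_t *gray, size_t pixel_count,
                            uint8_t r, uint8_t g, uint8_t b,
                            uint8_t *rgba_out) {
    for (size_t i = 0; i < pixel_count; i++) {
        uint32_t a = gray[i];
        /* Scale each color channel by coverage, rounding to nearest. */
        rgba_out[4 * i + 0] = (uint8_t)((r * a + 127) / 255);
        rgba_out[4 * i + 1] = (uint8_t)((g * a + 127) / 255);
        rgba_out[4 * i + 2] = (uint8_t)((b * a + 127) / 255);
        rgba_out[4 * i + 3] = (uint8_t)a;
    }
}
```

Running this once per color, offline or at load time, gives the 12 colored variants from a single greyscale source. Each 1024x1024 output frame is 4MB of RGBA, so the memory warnings above apply.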

Resources