Lightweight iOS performance profiling

Lightweight iOS performance profiling - ios

I need to do some performance profiling of an iOS app (including CPU usage, Memory usage, network usage). I need a way to store the results with graphs of what those metrics look like to compare over time. I need useful/helpful graphs and hopefully smallish in size, I’m not necessarily interested in stack traces across all threads for each time slice or any of that type of additional fluff.
I have tried instruments with adding time profiler (and some of the other templates), but I have 2 big issues:
The graphs are kind of tiny to the eye/not particularly helpful.
a 30 second profile used up something like 100mb, which is too big for what I’m looking for with regards to long term storage as each profiling session will probably 10+ minutes

You can do 2 things:
After entering Instruments, there are Record and Pause buttons. You can use Pause button to pause and unpause your desired operation profiling.
Under Instruments->Preferences->Recording tab, there is a Background Sampling Duration parameter - it allows you to specify how often it records the activity. Play with this parameter. You may get your desired file size.
If you observe below screenshot: There is also one more parameter named max backtrace depth. It changes size of your recorded call stack. You can also play with it to observe file size changes.

Related

How to know if I have to do memory profiling too?

I currently do CPU sampling of an ASP.NET Core application where I send huge number of requests(> 500K) to it. I see that the peak working set of the application is around ~300 MB which in my opinion is not huge considering the number of requests being made to the application. But what I have been observing is huge drop in requests per second when I enable certain pieces of functionality in my application.
Question:
Should I do memory profiling too? I ask this because even though the peak working set is ~300MB, there could be large number of short lived objects that could be created & collected by GC and since work by GC also counts as CPU, should I do memory profiling too to see if I allocate too much?

I will answer this question myself based on new information that I found out.
This is based on the tool PerfView, which provides information about GC and allocations.
When you open the GCStats view navigate the links to the process you care and you should see information like below:
Notice that view has the information has the % CPU Time spent Garbage Collecting. If you see this to be > 5% then it should be a cause of concern and you should start memory profiling.

Which factors affect the speed of cpu tracing?

When I use YJP to do cpu-tracing profile on our own product, it is really slow.
The product runs in a 16 core machine with 8GB heap, and I use grinder to run a small load test (e.g. 10 grinder threads) which have about 7~10 steps during the profiling. I have a script to start the product with profiler, start profiling (using controller api) and then start grinder to emulate user operations. When all the operations finish, the script tells the profiler to stop profiling and save snapshot.
During the profiling, for each step in the grinder test, it takes more than 1 million ms to finish. The whole profiling often takes more than 10 hours with just 10 grinder threads, and each runs the test 10 times. Without profiler, it finishes within 500 ms.
So... besides the problems with the product to be profiled, is there anything else that affects the performance of the cpu tracing process itself?

Last I used YourKit (v7.5.11, which is pretty old, current version is 12) it had two CPU profiling settings: sampling and tracing, the latter being much faster and less accurate. Since tracing is supposed to be more accurate I used it myself and also observed huge slowdown, in spite of the statement that the slowdown were "average". Yet it was far less than your results: from 2 seconds to 10 minutes. My code is a fragment of a calculation engine, virtually no IO, no waits on whatever, just reading a input, calculating and output the result into the console - so the whole slowdown comes from the profiler, no external influences.
Back to your question: the option mentioned - samping vs tracing, will affect the performance, so you may try sampling.
Now that I think of it: YourKit can be setup such that it does things automatically, like making snapshots periodically or on low memory, profiling memory usage, object allocations, each of this measures will make profiling slowlier. Perhaps you should make an online session instead of script controlled, to see what it really does.

According to some Yourkit Doc:
Although tracing provides more information, it has its drawbacks.
First, it may noticeably slow down the profiled application, because
the profiler executes special code on each enter to and exit from the
methods being profiled. The greater the number of method invocations
in the profiled application, the lower its speed when tracing is
turned on.
The second drawback is that, since this mode affects the execution
speed of the profiled application, the CPU times recorded in this mode
may be less adequate than times recorded with sampling. Please use
this mode only if you really need method invocation counts.
Also:
When sampling is used, the profiler periodically queries stacks of
running threads to estimate the slowest parts of the code. No method
invocation counts are available, only CPU time.
Sampling is typically the best option when your goal is to locate and
discover performance bottlenecks. With sampling, the profiler adds
virtually no overhead to the profiled application.
Also, it's a little confusing what the doc means by "CPU time", because it also talks about "wall-clock time".
If you are doing any I/O, waits, sleeps, or any other kind of blocking, it is important to get samples on wall-clock time, not CPU-only time, because it's dangerous to assume that blocked time is either insignificant or unavoidable.
Fortunately, that appears to be the default (though it's still a little unclear):
The default configuration for CPU sampling is to measure wall time for
I/O methods and CPU time for all other methods.
"Use Preconfigured Settings..." allows to choose this and other
presents. (sic)
If your goal is to make the code as fast as possible, don't be concerned with invocation counts and measurement "accuracy"; do find out which lines of code are on the stack a large fraction of the time, and why.
More on all that.

Using Instruments to Work Through Low Memory Warnings

I am trying to work through some low memory conditions using instruments. I can watch memory consumption in the Physical Memory Free monitor drop down to a couple of MB, even though Allocations shows that All Allocations is about 3 MB and Overall Bytes is 34 MB.
I have started to experience crashing since I moved some operations to a separate thread with an NSOperationQueue. But I wasn't using instruments before the change. Nevertheless, I'm betting I did something that I can undo to stop the crashes.
By the way, it is much more stable without instruments or the debugger connected.
I have the leaks down to almost none (maybe a hundred bytes max before a crash).
When I look at Allocations, I only see very primitive objects. And the total memory reported by it is also very low. So I cant see how my app is causing these low memory warnings.
When I look at Heap Shots from the start up, I don't see more than about 3 MB there, between the baseline and the sum of all the heap growth values.
What should I be looking at to find where the problem is? Can I isolate it to one of my view controller instances, for example? Or to one of my other instances?
What I have done:
I powered the device off and back on, and this made a significant improvement. Instruments is not reporting a low memory warning. Also, I noticed that Physical Free Memory at start up was only about 7 MB before restarting, and its about 60 MB after restarting.
However, I am seeing a very regular (periodic) drop in Physical Free Memory, dropping from 43 MB to 6 MB (an then back up to 43 MB). I would like to knwo what it causing that. I don't have any timers running in this app. (I do have some performSelector:afterDelay:, but those aren't active during these tests.)
I am not using ARC.

The allocations and the leaks instruments only show what the objects actually take, but not what their underlaying non-object structures (the backing stores) are taking. For example, for UIImages it will show you have a few allocated bytes. This is because a UIImage object only takes those bytes, but the CGImageRef that actually contains the image data is not an object, and it is not taken into account in these instruments.
If you are not doing it already, try running the VM Tracker at the same time you run the allocations instrument. It will give you an idea of the type memory that is being allocated. For iOS the "Dirty Memory", shown by this instrument, is what normally triggers the memory warnings. Dirty memory is memory that cannot be automatically discarded by the VM system. If you see lots of CGImages, images might be your problem.
Another important concept is abandoned memory. This is memory that was allocated, it is still referenced somewhere (and as such not a leak), but not used. An example of this type of memory is a cache of some sort, which is not freeing up upon memory warning. A way to find this out is to use the heap shot analysis. Press the "Mark Heap" button of the allocations instrument, do some operation, return to the previous point in the app and press "Mark Heap" again. The second heap shot should show you what new objects have been allocated between those two moments, and might shed some light on the mystery. You could also repeat the operation simulating a memory warning to see if that behaviour changes.
Finally, I recommend you to read this article, which explains how all this works: http://liam.flookes.com/wp/2012/05/03/finding-ios-memory/.

The difference between physical memory from VM Tracker and allocated memory from "Allocations" is due to the major differences of how these instruments work:
Allocations traces what your app does by installing a tap in the functions that allocate memory (malloc, NSAllocateObject, ...). This method yields very precise information about each allocation, like position in code (stack), amount, time, type. The downside is that if you don't trace every function (like vm_allocate) that somehow allocates memory, you lose this information.
VM Tracker samples the state of the system's virtual memory in regular intervals. This is a much less precise method, as it just gives you an overall view of the current state. It operates at a low frequency (usually something like every three seconds) and you get no idea of how this state was reached.
A known culprit of invisible allocations is CoreGraphics: It uses a lot of memory when decompressing images, drawing bitmap contexts and the like. This memory is usually invisible in the Allocations instrument. So if your app handles a lot of images it is likely that you see a big difference between the amount of physical memory and the overall allocated size.
Spikes in physical memory might result from big images being decompressed, downsized and then only used in screen resolution in some view's or layer's contents. All this might happen automatically in UIKit without your code being involved.

I have the leaks down to almost none (maybe a hundred bytes max before a crash).
In my experience, also very small leaks are "dangerous" sign. In fact, I have never seen a leak larger than 4K, and leaks I usually see are a couple hundreds of bytes. Still, they usually "hide" behind themselves a much larger memory which is lost.
So, my first suggestion is: get rid of those leaks, even though they seem small and insignificant -- they are not.
I have started to experience crashing since I moved some operations to a separate thread with an NSOperationQueue.
Is there a chance that the operation you moved to the thread is the responsible for the pulsing peak? Could it be spawned more than once at a time?
As to the peaks, I see two ways you can go about them:
use the Time Profiler in Instruments and try to understand what code is executing while you see the peak rising;
selectively comment out portions of your code (I mean: entire parts of your app -- e.g., replace a "real" controller with a basic/empty UIViewController, etc) and see if you can identify the culprit this way.
I have never seen such a pulsating behaviour, so I assume it depends on your app or on your device. Have you tried with a different device? What happens in the simulator (do you see the peak)?

When I'm reading your text, I have the impression that you might have some hidden leaks. I could be wrong but, are you 100% sure that you have check all leaks?
I remember one particular project I was doing few month ago, I had the same kind of issue, and no leaks in Instruments. My memory kept growing up and I get memory warnings... I start to log on some important dealloc method. And I've seen that some objects, subviews (UIView) were "leaking". But they were not seen by Instruments because they were still attached to a main view.
Hope this was helpful.

In the Allocations Instrument make sure you have "Only Track Active Allocations" checked. See Image Below. I think this makes it easier to see what is actually happening.

Have you run Analyze on the project? If there's any analyze warnings, fix them first.
Are you using any CoreFoundation stuff? Some of the CF methods have ... strange ... interactions with the ObjC runtime and mem management (they shouldn't do, AFAICS, but I've seen some odd behaviour with the low-level image and AV manipulations where it seems like mem is being used outside the core app process - maybe the OS calls being used by Apple?)
... NB: there have also, in previous versions of iOS, been a few mem-leaks inside Apple's CF methods. IIRC the last of those was fixed in iOS 5.0.
(StackOVerflow's parser sucks: I typed "3" not "1") Are you doing something with a large number of / large-sized CALayer instances (or UIView's with CG* methods, e.g. a custom drawRect method in a UIView?)
... NB: I have seen the exact behaviour you describe caused by 2 and 3 above, either in the CF libraries, or in the Apple windowing system when it tries to work with image data that was originally generated inside CF libraries - or which found its way into CALayers.
It seems that Instruments DOES NOT CORRECTLY TRACK memory usage inside the CA / CG system; this area is a bit complex since Apple is shuffling back and forth between CPU and GPU ram, but it's disappointing that the mem usage seems to simply "disappear" when it clearly is still being used!
Final thought (4. -- but SO won't let me type that) - are you using the invisible RHS of Instruments?
Apple hardcoded Instruments to always disable itself everytime you run it (so you have to keep manually opening it). This is stupid, since some of the core information only exists in the RHS bar. But I've worked with several people who didn't even know it existed :)

NSThread, NSOperation or GCD for CoreMotion and accurate timing purposes?

I'm looking to do some high precision core motion reading (>=100Hz if possible) and motion analysis on the iPhone 4+ which will run continuously for the duration of the main part of the app. It's imperative that the motion response and the signals that the analysis code sends out are as free from lag as possible.
My original plan was to launch a dedicated NSThread based on the code in the metronome project as referenced here: Accurate timing in iOS, along with a protocol for motion analysers to link in and use the thread. I'm wondering whether GCD or NSOperation queues might be better?
My impression after copious reading is that they are designed to handle a quantity of discrete, one-off operations rather than a small number of operations performed over and over again on a regular interval and that using them every millisecond or so might inadvertently create a lot of thread creation/destruction overhead. Does anyone have any experience here?
I'm also wondering about the performance implications of an endless while loop in a thread (such as in the code in the above link). Does anyone know more about how things work under the hood with threads? I know that iPhone4 (and under) are single core processors and use some sort of intelligent multitasking (pre-emptive?) which switches threads based on various timing and I/O demands to create the effect of parallelism...
If you have a thread that has a simple "while" loop running endlessly but only doing any additional work every millisecond or so, does the processor's switching algorithm consider the endless loop a "high demand" on resources thus hogging them from other threads or will it be smart enough to allocate resources more heavily towards other threads in the "downtime" between additional code execution?
Thanks in advance for the help and expertise...

IMO the bottleneck are rather the sensors. The actual update frequency is most often not equal to what you have specified. See update frequency set for deviceMotionUpdateInterval it's the actual frequency? and Actual frequency of device motion updates lower than expected, but scales up with setting
Some time ago I made a couple of measurements using Core Motion and the raw sensor data as well. I needed a high update rate too because I was doing a Simpson integration and thus wnated to minimise errors. It turned out that the real frequency is always lower and that there is limit at about 80 Hz. It was an iPhone 4 running iOS 4. But as long as you don't need this for scientific purposes in most cases 60-70 Hz should fit your needs anyway.

Ambiguities in using Instruments for iOS Development

I am Profiling an Application with Instruments. The profiling is done using Allocations Tool in two ways:
By choosing Directly the Allocations when I run the App for Profiling
By Choosing Leaks when I run the App for Profiling.
In both the cases , I had Allocations tool enabled for testing. But surprisingly , I had two different kind of Out put for Allocations at these instances.
Are they supposed to behave differently? or this is a problem with Instruments.
The time I Profile the with Leaks Tool:
In Allocations Graph:
1. I get lot of Peaks in Graph, The Live bytes and overall bytes are same.
2. I get the Black flags (I think it alarms about memory warning) after 1 minutes usage. Then after a set of flags appearing, my App Crashes. (This happens at times, even when directly run the App in Device)
The time I Profile the with Allocation Tool:
In Allocations Graph:
1. I do not get peaks often as it was in the above case. The Live bytes was always way less than Overall bytes.
2. I have used for over 20 minutes and never got Black flags.
One fact I came to know is that, when Live bytes and overall bytes are equal, the NSZombieEnabled could be enabled.
Have any of you ever encountered this problem.
UPDATE 1:
I faced another problem with first case. Whenever I profiled after a short duration (compared to profiling in second case), the app got lots of Black Flags and my App Crashed. (Due to Memory warning)
And when I tried the similar step by step use of Application my Application did not crash and got no flags.
Why this discrepancy?

In the first case, you are only tracking live allocations because the "Leaks" template configures the Allocations instrument that way. In the second, you are tracking both live and deallocated allocations. (As CocoaFu said).
Both are useful, but for slightly different reasons.
Only tracking live allocations (in combination with Heapshot Analysis, typically), is a great way to analyze permanent heap growth in your application. Once you know what is sticking around forever, you can figure out why and see if there are ways to optimize it away.
Tracking all allocations, alive and dead, is a very effective means of tracking allocation bandwidth. You can sort by overall bytes and start with the largest #. Have a look at all the points of allocation (click the little arrow next to the label in the Category of the selected row), and see where all the allocations are coming from.
For example, your graph shows that there are 1.27MB of 14 byte allocations -- 9218 allocations -- over that period of time. All have been free()d [good!], but that still represents a bunch of work to allocate, fill with data (presumably), and free each one of those. It may be a problem, maybe not.
(To put this in perspective, I used this technique to optimize an application. By merely focusing on reducing the # of transient -- short lived -- allocations, I was able to make the primary algorithms of the application 5x faster and reduce memory use by 85%. Turns out the app was copying strings many, many, times.)
Not sure why your app crashed as you described. Since it is a memory warning, you should see what is most frequently allocated.
Keep in mind that if you have zombie detection enabled, that takes a lot of additional memory.

Depending on the way Allocations is instantiated there are different options. Check the options by clicking the "i" symbol in the Allocations tile.
Yes, I find this annoying also.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart