Perf profiling shows wrong call graph / stack

I am trying to profile my code.
After running perf, the resulting flame graph looks like this:
The left 30% follows my program's call graph. The 70% on the right are various samples that are incorrectly detached from the left 30%, even though they are always called from within those 30%.
So the left 30% should in fact stretch to 100%, and the right 70% should sit on top of it.
Why is perf losing track of the callers in the stack at some point?
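A common cause of detached stacks like this is frame-pointer omission in optimized builds, which breaks perf's default frame-pointer unwinder partway up the stack. As a sketch (./myprogram is a placeholder for your binary), recording with DWARF-based unwinding, or rebuilding with frame pointers kept, often restores the full call graph:

# Unwind using DWARF debug info instead of frame pointers
perf record --call-graph dwarf ./myprogram

# Or rebuild with frame pointers preserved and use the default unwinder
gcc -O2 -g -fno-omit-frame-pointer -o myprogram myprogram.c
perf record -g ./myprogram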

Related

1024x768 resolution with nearest neighbor scaling

I'm using Arch Linux with Cinnamon and a 4K, or rather UHD, screen (3840x2160), and I would like to use a 1024x768 resolution with black bars on
the left and right and nearest-neighbor scaling, instead of the bilinear filtering that is the default on every monitor I've ever seen.
There are ways to more or less get this to work by actually running at 3840x2160 but rendering at 1024x768 and scaling it up.
This can be done with xrandr or nvidia-settings.
I also managed to get some black bars going.
So what worked best for me so far was this command:
nvidia-settings -a CurrentMetaMode="DP-2: 3840x2160_60 {ViewPortIn=1024x768, ViewPortOut=2880x2160+480+0, ResamplingMethod=Nearest}"
This gives me the crisp upscaling and black bars.
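For comparison, the nearest-neighbor upscale part can also be sketched with plain xrandr (assuming xrandr 1.5.1 or newer for the --filter option; DP-2 as above):

# Crisp nearest-neighbor upscale; note this stretches 1024x768 to the full
# 16:9 panel, so unlike the MetaMode above it does not add black bars
xrandr --output DP-2 --mode 3840x2160 --scale-from 1024x768 --filter nearest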
There's one problem with the MetaMode setup though: the right side of the screen is "cut off".
That means when I maximize windows, it acts as if I were still using a 16:9 resolution, leaving the right part of, say, a browser inaccessible.
In games that scroll by putting the mouse at the edge of the screen, edge scrolling doesn't work on the right side, while it works on the left, top, and bottom.
Does anyone know about this problem or have a better solution?
I'm open to anything, for example using some WINE settings to pull this off. Since this is mainly for playing old games, a completely different approach with WINE would be totally fine to solve the problem.
I've already tried all kinds of things over the last few days. At this point I would jump into the air if somebody knew a way to get this to work.

CAShapeLayer poor scrolling performance. Why?

I have a single container view inside a UIScrollView, and that container view has several CAShapeLayer-based subviews (about 200). Each CAShapeLayer contains a very simple CGPath (a filled polygon of about 10 points). You can see it as a sort of map.
The container view itself is big (about 1000x2500 points), but I have implemented zooming using the transform property (I'm not using the UIScrollView zooming implementation), so at small scales it's entirely visible.
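For context, the kind of setup described might look roughly like this in Swift (a minimal sketch; ShapeView, polygonBounds, polygonPoints, and containerView are hypothetical names):

import UIKit

// A CAShapeLayer-backed view, one of ~200
class ShapeView: UIView {
    override class var layerClass: AnyClass { CAShapeLayer.self }
}

let v = ShapeView(frame: polygonBounds)
let path = CGMutablePath()
path.addLines(between: polygonPoints)   // ~10 CGPoints per polygon
path.closeSubpath()
(v.layer as! CAShapeLayer).path = path
(v.layer as! CAShapeLayer).fillColor = UIColor.blue.cgColor
containerView.addSubview(v)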
At high scales (only a part of the container view is visible on screen) this works great and scrolls smoothly, even on old hardware.
However, at small scales (when most of the container view is visible), this results in very bad scrolling performance on old hardware (40 fps on an iPhone 4S). And if I go over 200 subviews (which is something I would like to do), it gets much worse (down to 15 fps), even on newer hardware.
I have done some profiling, but I can't find the bottleneck. During scrolling I take the following measurements (on average):
Activity Monitor instrument:
CPU: 8%
GPU Driver instrument:
Device utilization: 40%
Renderer utilization: 35%
Tiler utilization: 8%
FPS: 40
Here is what I have tried:
Setting shouldRasterize with the proper rasterizationScale on every CAShapeLayer. It makes things worse.
Setting shouldRasterize with the proper rasterizationScale on the layer of the container view (sketched after this list). This improves scrolling at very small scales (when the entire container is visible inside the scroll view), but makes it much worse at bigger scales. Activating rasterization only for small scales leaves a gap (between approximately 0.5 and 2.0) where both options drop frames.
Using layers only instead of views. Doesn't improve anything.
Using CATiledLayer for the container view layer. Doesn't improve anything.
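For reference, the container-view rasterization experiment looked roughly like this (a sketch; containerView and zoomScale are placeholders for the actual view and the current zoom factor):

containerView.layer.shouldRasterize = true
// Rasterize at the effective on-screen resolution, not just the screen scale
containerView.layer.rasterizationScale = UIScreen.main.scale * zoomScale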
Any ideas? If there were a hardware limitation, shouldn't I see 100% somewhere? What else should I be profiling?
At this point the main thing I'm asking for is help on how to profile my app and understand what is causing lost frames. Then maybe, depending on what is happening and what is doable, I will try to improve things. I'm not asking for every single possible tweak in the book that could maybe improve things a little if I'm lucky.
EDIT
Just found out that although my app is running at 8% CPU, backboardd runs at 100% CPU during scrolling. So the bottleneck is in the Core Animation render server !
Now I just have to figure out why. Maybe there are just too many layers... Anyone knows what is the maximum number of layers the render server can process every 16ms ? I can't find any official documentation about that.
EDIT 2
So after some more profiling, it looks like it has nothing to do with the number of layers. If I combine all the shapes into a small number of layers (around 10) by merging some of the paths (and as a result using more complex paths), the result is approximately the same: backboardd still runs at 100% CPU during scrolling.
So I guess it's the path processing/drawing/something that takes time, regardless of whether the shapes are split into smaller/simpler paths. Maybe it's simply bound to the number of points. I'm starting to think that there is nothing I can do...
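The merging experiment, sketched (polygons and mergedLayer are hypothetical names; polygons would be an array of point arrays):

// Merge many small polygons into one complex path on a single CAShapeLayer
let combined = CGMutablePath()
for polygon in polygons {
    combined.addLines(between: polygon)
    combined.closeSubpath()
}
mergedLayer.path = combined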

Interpreting downward spikes in Time Profiler

I'd appreciate some help on how I should interpret some results I get from Time Profiler and Activity Monitor. I couldn't find anything on this on the site, probably because the question is rather specific. However, I imagine I'm not the only one unsure what to read into the spikes they see in Time Profiler.
I'm trying to figure out why my game is having regular hiccups on the iPhone 4. I'm trying to run it at 60 FPS, so I know it's tricky on such an old device, but I know some other games manage that fine. I'm using Unity, but this is a more general question about interpreting Instruments results. I don't have enough reputation to post images, and I can only post two links, so I can't post everything I'd like.
Here is what I get running my game on Time Profiler:
Screenshot of Time Profiler running my game
As far as I understand (but please correct me if I'm wrong), this graph is showing how much CPU my game uses during each sample the Time Profiler takes (I've set the samples to be taken once per millisecond). As you can see, there are frequent downward spikes in that graph, which (based on looking at the game itself as it plays) coincide with the hiccups in the game.
Additionally, the spikes are more common while I touch the device, especially if I move my finger on it continuously (which is what I did while playing our game above). (I couldn't make a comparable non-touching version because my game requires touching, but see below for a comparison.)
What confuses me here is that the spikes are downward: if my code were inefficient, doing too many calculations on some frames, I'd expect to see upward spikes, not downward. So here are the theories I've managed to come up with:
1) The downward spikes represent something else stealing CPU time (a background task, the CPU's speed itself varying, or something similar). Because less time is available for my processing, I get hiccups, and it also shows up as my app using less CPU.
2) My code is in fact inefficient, causing spikes every now and then. Because the processing isn't finished in one frame, it continues into the next, but only needs a little extra time there. That means that on that second frame, it uses less CPU, resulting in a downward spike. (It is my understanding that iOS frames are always of equal length, say 1/60 s, so the third frame cannot start early even if we spent just a little extra time on the second.)
3) This is just a sampling problem, caused by the fact that the sampling frequency is 1 ms while the frame length is about 16 ms.
The first two theories would make sense to me, and would also explain why our game has hiccups while some lighter games don't. 1) Lighter games would not suffer as badly from stolen CPU, because they don't need that much CPU to begin with. 2) Lighter games don't have as many spikes of their own.
However, some other tests seem to go against each of these theories:
1) If frames always got stolen like this, I'd expect similar spikes to appear in other games too. However, testing with another game (from the App Store, also made with Unity), I don't get them (I had an image to show that, but unfortunately I cannot post it).
Note: this game has lots of hiccups while running in the Time Profiler as well, so hiccups don't always seem to mean downward spikes.
2) To test the hypothesis that my app is simply spiking, I wrote a program (again in Unity) that wastes a consistent number of milliseconds per frame by running a loop until the specified time has passed according to the system clock (see the sketch after this list). Here's what I get in Time Profiler when I make it waste 8 ms per frame:
Screenshot of Time Profiler running my time waster app
As you can see, the downward spikes are still there, even though the app really shouldn't be able to cause spikes of its own. (You can also see the effect of touching here: I didn't touch the device for the first half of the visible graph, and touched it continuously for the second.)
3) If this were due to a lack of sync between the frame rate and the sampling, I'd expect a lot more oscillation. Surely my app would use 100% of the milliseconds until it's done with a frame, then drop to zero?
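For reference, the time-waster described in 2) can be sketched like this (a minimal Unity script; the 8 ms figure matches the test above):

using System.Diagnostics;
using UnityEngine;

public class TimeWaster : MonoBehaviour
{
    public float wasteMilliseconds = 8f;

    void Update()
    {
        // Busy-wait until the requested time has passed on the system clock
        var sw = Stopwatch.StartNew();
        while (sw.Elapsed.TotalMilliseconds < wasteMilliseconds) { }
    }
}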
So I'm pretty confused about what to make of this. I'd appreciate any insight you can provide into this, and if you can tell me how to fix it, all the better!
Best regards,
Tommi Horttana
Have you tried Unity's profiler? Does it show similar results? Note that Unity3D has two profilers on iOS:
editor profiler - Pro only (but there is a 30-day trial)
internal profiler - you have to enable it in the Xcode project's source
Look at http://docs.unity3d.com/Manual/MobileProfiling.html; maybe something there will give you a hint.
If I had to guess, I'd check one of the most common sources of timing hiccups: the Mono garbage collector.
Try running it yourself at a set frequency (say, every 250 ms) and see if there is a difference in the pattern:
System.GC.Collect();
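A minimal sketch of doing that on a fixed schedule (assuming a MonoBehaviour attached to some object in the scene):

using UnityEngine;

public class PeriodicGC : MonoBehaviour
{
    void Start()
    {
        // Force a collection every 250 ms so GC pauses happen on a known schedule
        InvokeRepeating("Collect", 0.25f, 0.25f);
    }

    void Collect()
    {
        System.GC.Collect();
    }
}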

iOS: Instruments, allocations peaking to a flat line

I'm trying to learn how to use Instruments. I wonder if I can get some opinions or insight as to what is going on here.
Firstly, shortly after 02:00 my app crashes due to creating many high-resolution views. You can see the graph peak when the views are created. I think around 20-30 rendered views at approximately 1000 points.
My question is this: note how the graph peaks and flattens at the end of the track (see the red arrow), starting just before 02:00 when the views are created. Does this mean that the device (an iPhone 5) has "run out of memory"? I see that the allocations listed as "All Allocations" come to 17.76 MB. Could this be the reason for the crash? Or is it the graphics crashing?
does this mean that the device (an iPhone 5) has "run out of memory"?
No. It is relative to peak active memory used in that run. Illustration: you would likely see the allocation amount at 1 minute "shrink" when you start creating all those views around 2 minutes.

How to profile on iOS?

I'm using instruments to profile the CPU activity of an iOS game.
The problem I'm having is that I'm not entirely sure what the data I'm looking at represents.
Here is the screen I see after running my game for a couple of minutes,
I can expand the call tree to see exactly which methods are using the most CPU time. I'm unsure whether this data represents CPU usage for the entire duration the profiler was running, or just at that point in time.
I've tried running the slider along the timeline to see what effect that has on the numbers, and it doesn't seem to have any. So that leads me to believe the data represents CPU usage for the duration the game was running.
If this is the case, is it possible to see CPU usage at a particular point in time? There are a few spikes along the timeline; I would like to see exactly what was happening at those times to see if there are any improvements I can make.
Thanks in advance for any responses.
To select a time range, use the "inspection range" buttons at the top of the window (to the left of the stopwatch).
First select the start of the range by clicking on the graph ruler, then press the leftmost button to set the left edge. Then select the end of the range on the graph ruler and press the rightmost button to set the right edge.
