Metal performance debugging

I have a Metal application on iOS that takes video frames and passes each frame through a number of shaders. Some are compute shaders applied in multiple passes, and four independent MTKViews display computed textures (for example, a histogram) alongside the video preview. Sometimes (but not always) on older hardware such as the iPhone 6s, the app becomes very sluggish, with the frame rate dropping to 1 or 2 frames per second. How do I find out which Metal shaders are clogging the CPU/GPU, and how do I optimize the performance of my Metal-related code?

Run your application in Xcode.
Select Debug -> Capture GPU frame
Select Issue Navigator from the left toolbar
Select Runtime
Fix the listed issues, at least those marked "high" priority.
You can also see where the time is being spent by looking at the timings of drawPrimitives and similar calls in the debug navigator on the left.
To view a shader's performance details, select its draw or dispatch call in the event list on the left. The details include a "performance" section.
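Alongside the frame capture, per-pass GPU time can also be logged from code. The following is only a minimal sketch, assuming iOS 10.3 or later (where MTLCommandBuffer exposes gpuStartTime and gpuEndTime). The helper name encodeTimedPass and the label strings are made up for illustration and are not part of the original answer:

```swift
import Metal

/// Encodes one pass into its own labeled command buffer and logs how long
/// the GPU spent on it. `encodeTimedPass` is a hypothetical helper name.
@discardableResult
func encodeTimedPass(on queue: MTLCommandQueue,
                     label: String,
                     encode: (MTLCommandBuffer) -> Void) -> Bool {
    guard let commandBuffer = queue.makeCommandBuffer() else { return false }
    // The label also appears in Xcode's GPU frame capture event list.
    commandBuffer.label = label
    // The caller creates its encoders and encodes the pass here.
    encode(commandBuffer)
    // Must be registered before commit(); it runs once the GPU has finished.
    commandBuffer.addCompletedHandler { finished in
        let gpuMilliseconds = (finished.gpuEndTime - finished.gpuStartTime) * 1000
        print("\(finished.label ?? "unlabeled"): \(gpuMilliseconds) ms of GPU time")
    }
    commandBuffer.commit()
    return true
}
```

Wrapping each compute pass (the histogram pass, for example) in its own call should make the slow one stand out in the log. Using a separate command buffer per pass adds overhead of its own, so this is something to enable for diagnosis rather than leave in place.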

Related

Metal App FPS drops in Debug but fine in Instruments while profiling

I am using Metal to render live video frames, plus a custom control (a circular slider) for zooming that I implemented with the Quartz 2D API. When I run the app in the debugger, the frame rate drops from 30 FPS to sometimes 11 FPS and zooming is not smooth on older devices such as the iPad Mini 2. I then ran the code in Time Profiler and, surprisingly, there was no FPS drop; the app runs smoothly in the profiler. How do I find out what is causing the FPS drop when debugging?

It's probably the Metal validation layer, which is active for your debug scheme. It's generally not surprising that a program performs worse under the debugger (due to lack of optimizations, enabled asserts, and so on).
If you want similar Metal performance while debugging, you can try disabling Metal API Validation in the scheme settings. Of course, you then lose the debugging benefit of having your use of Metal validated.

SceneKit scenes lag when resuming app

In my app, I have several simple scenes (a single 80-segment sphere with a 500px by 1000px texture, rotating once a minute) displayed at once. When I open the app, everything runs smoothly: I get a constant 120 fps with less than 50 MB of memory usage and around 30% CPU usage.
However, if I minimize the app and come back to it a minute later, or just stop interacting with the app for a while, the scenes all lag terribly and run at around 4 fps, despite Xcode reporting 30 fps, normal memory usage, and very low (~3%) CPU usage.
I see this behavior when testing on a real iPhone 7 running iOS 10.3.1, and I'm not sure whether it also occurs on other devices or in the Simulator.
Here is a sample project I put together to demonstrate this issue. (link here) Am I doing something wrong? How can I make the scenes wake up and resume using as much CPU as they need to maintain a good frame rate?

I probably can't answer your question directly, but I can give you some points to think about.
I launched your demo app on my iPod touch (6th generation, 64-bit) running iOS 10.3.1, and it lags from the very beginning for about a minute at 2-3 FPS. After that it starts to spin smoothly. The same happens after going to the background and returning to the foreground. This could be explained by some caching of textures.
I resized one of the SCNViews so that it fills the screen, leaving the other views behind it, and set v4.showsStatistics = true.
Here is what I got: Metal flush takes about 18.3 ms per frame, and that is for only one SCNView.
According to this answer on Stack Overflow: "So, if my interpretation is correct, that would mean that "Metal flush" measures the time the CPU spends waiting on video memory to free up so it can push more data and request operations to the GPU."
So we might suspect that the problem is four different SCNViews working with the GPU simultaneously.
Let's check. Compared with the previous step, I deleted the three SCNViews in the back and moved their planets into the front view, so that one SCNView shows four planets at once.
Now Metal flush takes at most 5 ms, right from the beginning, and everything runs smoothly. Note also that the triangle count (top-right icon) is four times what it was in the first measurement.
To sum up, try combining all SCNNodes in one SCNView and you may get a speedup.

So, I finally figured out a partially functional solution, even though it's not what I thought it would be.
The first thing I tried was to keep all the nodes in a single global scene, as suggested by Sander's answer, and to set the delegate on one of the SCNViews, as suggested in the second answer to this question. Maybe this used to work, or it worked in a different context, but it didn't work for me.
Where Sander did end up helping me was with the performance statistics, which I didn't know existed. I enabled them for one of my scenes, and something stood out to me:
In the first few seconds of running, before the app gets dramatic frame drops, the performance display read 240 fps. "Why is this?", I thought. Who would need 240 fps on a mobile phone with a 60 Hz display, especially when the SceneKit default is 60? Then it hit me: 60 * 4 = 240.
My guess is that each update in a single scene triggered a "Metal flush", meaning that each scene was being flushed 240 times per second. I would guess that this slowly fills the GPU buffer (or memory? I have no idea), that eventually SceneKit needs to start clearing it out, and that 240 fps across 4 views is simply too much for it to keep up with. That would explain why performance is good initially before dropping completely.
My solution (and this is why I said "partial solution") was to set preferredFramesPerSecond for each SCNView to 15, for a total of 60. (I can also get away with 30 on my phone, but I'm not sure whether this holds up on weaker devices.) Unfortunately 15 fps is noticeably choppy, but it is far better than the terrible performance I was getting originally.
Maybe in the future Apple will enable unique refreshes per SCNView.
TL;DR: set preferredFramesPerSecond so that it sums to 60 over all of your SCNViews.
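The budget split described above can be sketched as follows. This is only an illustration: the protocol FrameRateLimited, the helper splitFrameBudget, and the FakeView placeholder are hypothetical names introduced so the sketch runs without SceneKit. In a real app the same loop would be applied to the SCNView instances, which have a settable preferredFramesPerSecond property.

```swift
/// Stand-in for SCNView, which has a settable `preferredFramesPerSecond: Int`.
protocol FrameRateLimited: AnyObject {
    var preferredFramesPerSecond: Int { get set }
}

/// Divides a total frame budget evenly across views (integer division,
/// never below 1 fps). `splitFrameBudget` is a hypothetical helper.
func splitFrameBudget(across views: [FrameRateLimited], totalFPS: Int = 60) {
    guard !views.isEmpty else { return }
    let perView = max(1, totalFPS / views.count)
    for view in views {
        view.preferredFramesPerSecond = perView
    }
}

// Placeholder type so the sketch runs without SceneKit.
final class FakeView: FrameRateLimited {
    var preferredFramesPerSecond = 60
}

let views: [FrameRateLimited] = [FakeView(), FakeView(), FakeView(), FakeView()]
splitFrameBudget(across: views)
print(views.map { $0.preferredFramesPerSecond })  // [15, 15, 15, 15]
```

Passing totalFPS: 120 would give each of four views 30 fps, which matches the "30 on my phone" variant mentioned above.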

How to prevent pixel bleeding from rendering sprite-sheet generated with Zwoptex on older iOS device?

I packed several individual sprites into one big 2048*2048 sprite sheet with Zwoptex. I scale it down to match each iOS device, for example 2048*2048 for iPad HD, 512*512 for iPhone, and so on.
I found that the "Spacing Pixel" option in Zwoptex affects how the sprites render on the device. The value is the space (in pixels) between the individual sprites packed into the sprite sheet. If I set the value too low, pixel bleeding is more likely to occur, on newer and better devices as well as on older ones. If I increase the value, the chance drops, and for a sufficiently high value pixel bleeding (hopefully) won't happen.
I set the value to around 17-20, which is really high and consumes valuable space on the sprite sheet. Even so, on the iPhone Simulator there is still a problem.
We can only keep some devices from installing the game by requiring a certain iOS version, but the iPhone 3GS can still update to the newest version, so I need to solve this problem.
So I want to know how to prevent pixel bleeding across all iOS devices, from iPhone to iPad (Retina included).
It would be great to know any best practice or practical approach for choosing a "Spacing Pixel" value that removes the problem when rendering.

If only the Simulator shows those artifacts, then by all means ignore them! None of your users will ever run your app in the Simulator, will they? The Simulator isn't perfect.
A spacing of 2 pixels around each texture atlas sprite frame is enough (and generally recommended) to kill all artifacts caused by pixel bleeding. If you still see artifacts, they are not caused by too little spacing. They can't be.
I'm not sure about Zwoptex: do you actually have to create each scaled-down version of the texture atlas manually? You may be doing something wrong there. Try TexturePacker; I wouldn't be surprised if the artifacts go away just like that.
For example, one type of artifact is caused by not placing objects at integer positions. You may see a gap (usually a black line) between two objects if their positions are something like (1.23545, 10.0) and (41.23545, 10.0). Using integer coordinates (1,10) and (41,10) would fix the issue. The difficulty is that this goes all the way up the hierarchy: if these objects' parent node is also at a non-integer position, you can still get this line gap artifact.
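The integer-position point can be illustrated with a small, framework-independent sketch. The Point type and the helpers snapped and worldPosition are made up for illustration and are not cocos2d API. The 0.4 parent offset is likewise just an example value.

```swift
struct Point: Equatable {
    var x: Double
    var y: Double
}

/// Position on screen = parent position + position relative to the parent.
func worldPosition(parent: Point, local: Point) -> Point {
    return Point(x: parent.x + local.x, y: parent.y + local.y)
}

/// Rounds both coordinates to the nearest whole pixel.
func snapped(_ p: Point) -> Point {
    return Point(x: p.x.rounded(), y: p.y.rounded())
}

let a = snapped(Point(x: 1.23545, y: 10.0))   // (1, 10)
let b = snapped(Point(x: 41.23545, y: 10.0))  // (41, 10)

// An integer local position is not enough when the parent is fractional:
let parent = Point(x: 0.4, y: 0.0)
let onScreen = worldPosition(parent: parent, local: a)        // x is about 1.4
let fixed = worldPosition(parent: snapped(parent), local: a)  // (1, 10)
```

The last two lines show why the parent has to be snapped as well: the child sits at a whole-pixel local position, yet it only lands on a whole pixel on screen once every ancestor does too.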
If you search around you'll find numerous cause-and-effect discussions of cocos2d artifacts. One thing to keep in mind: do not use the CC_FIX_ARTIFACTS_BY_STRECHING_TEXEL macro. It's not a fix; it doesn't even come close. It more or less fixes the non-integer position artifact and introduces another (much worse, IMHO): aliasing/flickering during movement.

cacheAsBitmap has no effect on a Sprite masked with a scrollRect in AIR for iOS

I'm developing a simple kinetic menu UI component for AIR for iPad. It's basically a lightweight stand-in for a combobox that matches the style of iOS. I have a sprite containing anywhere from 2 to 60 item buttons that pops up and lets you flick/scroll through them, showing only about 7 items at any given time.
My first attempt used a mask over my sprite, moving the menu sprite up and down under the stationary mask. This produced lackluster results on the test device (< 20 fps).
I then tried a blitting solution, leaving the menu sprite off the display list and using BitmapData.draw() to render only the part of the list I needed visible. This produced the best results on my Windows dev platform, but this time the frame rate dropped below 10 fps on the iPad. I assume I was incurring either heavy CPU usage or a GPU readback penalty. Originally I had hoped to run my app at 60 fps, but I've ratcheted my goal down to a more humble 30 fps.
Which brings me to my third attempt at this UI component, using the sprite's .scrollRect masking function in conjunction with .cacheAsBitmap. Again, the observed behavior differs wildly between AIR on Windows and on iOS. On Windows it redraws only the part of the menu sprite bounded by the dimensions of the scrollRect, as it should. On iOS I can touch the area of the screen above or below the visible area of the menu sprite and still drag the menu, even though my finger is over "empty" space! The performance here is decent, hovering between 19 and 25 fps, and would almost certainly be a solid 30 if it worked as it does on Windows.
Does anyone have any ideas about the scrollRect feature's behavior on AIR for iOS, or a better way of implementing an iOS-native-style gliding menu in AIR for iOS?
Note: the above methods were tried in both CPU and GPU mode, but CPU mode performed vastly better. I used AIR 2.7 installed on top of Flash Pro CS 5.5, with FlashDevelop as my IDE.

http://esdot.ca/site/2011/fast-rendering-in-air-3-0-ios-android#comment-10
The (really nice) author of the post linked above replied: "Ya, scrollRect is basically a no-go on mobile, basically forget that API even exists. Believe it or not… old school masking is the way to go. Round and round we go!"

iOS: playing a frame-by-frame greyscale animation in a custom colour

I have a 32-frame greyscale animation of a diamond exploding into pieces (i.e. 32 PNG images @ 1024x1024).
My game uses 12 separate colours, so I need to be able to play the animation in any desired colour.
I believe this rules out any Apple frameworks, and it also rules out a lot of public code for frame-by-frame animation on iOS.
What are my potential solution paths?
These are the best SO links I have found:
Faster iPhone PNG Animations
frame by frame animation
Is it possible using video as texture for GL in iOS?
That last one just shows that it may be possible to load an image into a GL texture each frame (he is doing it from the camera, so if I have everything stored in memory, that should be even faster).
I can see these options (listed laziest first, most optimised last):
Option A
Each frame (courtesy of CADisplayLink), load the relevant image from file into a texture, and display that texture.
I'm pretty sure this is stupid, so on to option B.
Option B
Preload all images into memory,
then proceed as above, only loading from memory rather than from file.
I think this is going to be the ideal solution. Can anyone give it the thumbs up or thumbs down?
Option C
Preload all of my PNGs into a single GL texture of the maximum size, creating a texture atlas. Each frame, set the texture coordinates to the rectangle in the atlas for that frame.
While this is potentially a perfect balance between coding efficiency and performance efficiency, the main problem here is losing resolution: on older iOS devices the maximum texture size is 1024x1024. If we are cramming 32 frames into this (really this is the same as cramming 64), we would be at 128x128 for each frame. If the resulting animation is close to full screen on the iPad, this isn't going to hack it.
Option D
Instead of loading into a single GL texture, load into a bunch of textures.
Moreover, we can squeeze 4 images into a single texture by using all four channels.
I baulk at the sheer amount of fiddly coding required here. My RSI starts to tingle even thinking about this approach.
I think I have answered my own question here, but if anyone has actually done this or can see the way through, please answer!
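The arithmetic behind option C can be sketched in a few lines. This is only an illustration under the assumptions stated in the question (a square atlas, square frames, and a grid side rounded up to a power of two, which is why 32 frames cost as much as 64). The type AtlasLayout and its members are hypothetical names.

```swift
/// Smallest power of two that is >= n (for n >= 1).
func nextPowerOfTwo(_ n: Int) -> Int {
    var p = 1
    while p < n { p *= 2 }
    return p
}

/// Layout of equally sized square frames in a square atlas, with the grid
/// side rounded up to a power of two as assumed in option C.
struct AtlasLayout {
    let gridSide: Int    // frames per row and per column
    let frameSide: Int   // pixels per frame edge

    init(atlasSide: Int, frameCount: Int) {
        var side = 1
        while side * side < frameCount { side += 1 }  // ceil(sqrt(frameCount))
        gridSide = nextPowerOfTwo(side)
        frameSide = atlasSide / gridSide
    }

    /// Normalized texture rectangle (u, v, width, height) of frame `index`,
    /// counting frames row by row from the atlas origin.
    func textureRect(forFrame index: Int) -> (u: Double, v: Double, w: Double, h: Double) {
        let step = 1.0 / Double(gridSide)
        let column = index % gridSide
        let row = index / gridSide
        return (Double(column) * step, Double(row) * step, step, step)
    }
}

let layout = AtlasLayout(atlasSide: 1024, frameCount: 32)
print(layout.gridSide, layout.frameSide)  // 8 128
```

With a 1024x1024 atlas the 32 frames end up on an 8x8 grid at 128x128 pixels each, which is the resolution loss the question describes.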

If something with higher performance than (B) is needed, it looks like the key is glTexSubImage2D: http://www.opengl.org/sdk/docs/man/xhtml/glTexSubImage2D.xml
Rather than pulling across one frame at a time from memory, we could arrange, say, 16 512x512 8-bit greyscale frames contiguously in memory, send this across to GL as a single 1024x1024 32-bit RGBA texture, and then split it within GL using the above function.
This would mean that we perform one [RAM->VRAM] transfer per 16 frames rather than one per frame.
Of course, on more modern devices we could fit 64 frames instead of 16, since more recent iOS devices can handle 2048x2048 textures.
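The frame counts above follow from a simple byte count, sketched here under the assumption of 8 bits per greyscale pixel and 8 bits per RGBA channel. The function name is made up for illustration.

```swift
/// Number of 8-bit greyscale frames whose bytes fit exactly into one
/// square RGBA texture with 4 bytes per pixel.
func greyscaleFramesPerTexture(textureSide: Int, frameSide: Int) -> Int {
    let textureBytes = textureSide * textureSide * 4  // RGBA, 8 bits per channel
    let frameBytes = frameSide * frameSide            // 1 byte per pixel
    return textureBytes / frameBytes
}

print(greyscaleFramesPerTexture(textureSide: 1024, frameSide: 512))   // 16
print(greyscaleFramesPerTexture(textureSide: 2048, frameSide: 512))   // 64
print(greyscaleFramesPerTexture(textureSide: 1024, frameSide: 1024))  // 4
```

The last line is the "4 images into a single texture using all four channels" case from option D in the question, with the frames kept at their original 1024x1024 size.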
I will first try technique (B) and leave it at that if it works (I don't want to over-code), and look at this only if needed.
I still can't find any way to query how many GL textures the graphics chip can hold. I have been told that when you try to allocate memory for a texture, GL just returns 0 when it has run out of memory. However, to implement this properly I would want to make sure that I am not sailing close to the wind on resources. I don't want my animation to use up so much VRAM that the rest of my rendering fails.

You would be able to get this working just fine with the CoreGraphics APIs; there is no reason to dive deep into OpenGL for a simple 2D problem like this. For the general approach to creating colored frames from a grayscale frame, see colorizing-image-ignores-alpha-channel-why-and-how-to-fix. Basically, you need to use CGContextClipToMask() and then render a specific color, so that what is left is the diamond filled in with the color you have selected. You could do this at runtime, or you could do it offline and create one video for each of the colors you want to support. It would be easier on your CPU to do the operation N times and save the results to files, but modern iOS hardware is much faster than it used to be. Beware of memory usage issues when writing video processing code; see video-and-memory-usage-on-ios-devices for a primer that describes the problem space. You could code it all up with texture atlases and complex OpenGL code, but an approach that makes use of videos would be a lot easier to deal with, and you would not need to worry so much about resource usage. See my library linked in the memory post for more info if you are interested in saving time on the implementation.
