Are SIMD groups supported in Metal on iOS? - ios

I can't figure out whether or not SIMD groups are supported on iOS.
The Metal Shading Language Specification states at the time of writing on page 59, section 4.4.1:
iOS: No support for SIMD-groups.
However, in Table 6.11., "SIMD-group functions in the Metal standard library", some SIMD-group functions are listed as supported on iOS. This is one of the ones I'd like to use:
T simd_shuffle_down(T data, ushort delta)
macOS: Since Metal 2.0.
iOS: Since Metal 2.2.
Similarly, Table 5.7., "Attributes for kernel function input arguments", states that some attributes are available:
threads_per_simdgroup
macOS: Since Metal 2.0.
iOS: Since Metal 2.2.
So it's not clear from the documentation whether any SIMD group functionality is supposed to be supported. Using a function argument with the threads_per_simdgroup attribute in a compute kernel currently causes the run-time Metal compiler to crash on iPhone 7 and 8 (but not 11):
Compiler failed with XPC_ERROR_CONNECTION_INTERRUPTED
All devices tested with iOS 13.3. Metal language version was 2.2. Xcode version 11.3.

I think that the claim that SIMD-groups are unsupported on iOS is either inaccurate, or not specific enough.
If you consult the Metal Feature Set Tables for Metal 2.2, you'll note that "SIMD-scoped permute operations" (simd_broadcast, simd_shuffle, simd_shuffle_up, etc.) are supported on MTLGPUFamilyApple6, which includes devices with A13 processors. That is why this works on iPhone 11.
The fact that using this attribute on unsupported devices causes a compiler crash is a bug, and I'd recommend that you file feedback.

Related

iOS App is >20x slower after building with Xcode 11

I have an app (Xcode project) that I developed for iOS 12 (with Xcode 10, Swift 4.2) which I now want to rebuild for iOS 13 using Xcode 11.
When I opened the project in Xcode 11, then built and ran it, I found that my app was suddenly more than 20 times slower! I checked multiple times that the Release configuration was selected and that the same compile flags were used as in Xcode 10. Needless to say, compiler optimisations are maxed out, just like in Xcode 10.
The app uses a lot of SIMD operations (float4, float3, float4x4, etc.) and works with pointers (e.g. UnsafeMutablePointer<float4>). I found that Xcode 11 showed me that simd types like float4 are now deprecated and typealiased to new SIMD structs.
Is there something obvious I should know about porting from iOS 12 to iOS 13? Can this change of SIMD types be the source of such a terrible decrease in performance? How do I restore the original speeds (iOS 13 seems like a downgrade)?
My money is on Xcode 11. Apple probably changed the compiler, or compiler settings, in a way that affected the generation of matrix math instructions.
Apple's major releases of Xcode definitely break stuff sometimes, and it can take a while for them to sort it out.
Try building for iOS 12 while using Xcode 11. Then take the same project and build it from Xcode 10.2.
My company's app is HUGE, and build times went up by ≈3X under Xcode 11. It's quite painful. That's a different issue than execution speed, obviously, but it still tells you that there were big changes to the build process.
After I rewrote the SIMD-heavy Swift codebase to Objective-C, the old performance was restored. Objective-C thankfully still uses the good old SIMD data types without pointless wrappers.
The performance drop correlates with changes in Swift's implementation of SIMD. In Xcode 10.1 and older, Swift used built-in SIMD data types. With Xcode 10.2 (and newer), Apple added a useless object-oriented wrapper for SIMD data types and operations (e.g. float3 is now a typealias for SIMD3<Float>) and didn't even bother to test the performance, which made my code more than 20x slower.
Sometimes you just need speed; don't force protocol-oriented design where it doesn't belong.
It doesn't matter if you choose to use Swift 4 or target iOS 12 in Xcode 10.2 and newer, you'll still get the same, bad SIMD performance.
TL;DR: Apple messed up pre-Swift-5 SIMD levels of performance in Xcode 10.2 and newer. Port your SIMD-heavy code to Objective-C to get the good old levels of performance (>20x faster in my case). Let's hope Apple doesn't touch Objective-C or we're soon going to write in NEON assembly to get decent performance...

How does Swift 5.1 runtime work with older versions of iOS?

About a year ago, if you wanted to use Swift 4.2 for iOS development, you had to install Xcode 10, which meant that you used the iOS 12 SDK. As part of your app's deployment, the Swift 4.2 runtime would automatically be bundled with your app binary. This meant that a user installing your app would essentially download a copy of the Swift runtime that enables your app to work.
However, ABI stability came with Swift 5, and you no longer needed to bundle a runtime if your deployment target was iOS 12.2, since the runtime was now part of that iOS version. However, if you wanted to support iOS 10 and iOS 11, this Swift runtime would still be bundled with your app binary, and it would behave the same way as described above.
Documentation on swift.org states the same:
Apps deploying back to earlier OS releases will have a copy of the Swift runtime embedded inside them. Those copies of the runtime will be ignored — essentially inert — when running on OS releases that ship with the Swift runtime.
So far so good. If you use Xcode 10.2 with Swift 5.0, and you deploy your app to older iOS releases, you will still bundle the Swift 5.0 runtime with it. Then, if your app is running on iOS 12, the app will use the runtime provided by iOS, and if it's running on e.g. iOS 11, it will use the runtime that was bundled as part of the app binary. Now the first question: Is that a correct assumption?
Now we come to Swift 5.1 and iOS 13 that will be released in September. Swift 5.1 includes some additional runtime features, e.g. opaque result types, which require Swift 5.1 runtime.
In WWDC 2019 session 402 "What's New in Swift", the speaker, when discussing the Swift 5.1 feature Opaque Result Type (SE-0244), mentions that the feature will only work on new OSes:
Requires new Swift runtime support
Available on macOS Catalina, iOS 13, tvOS 13, watchOS 6 and later
This is the confusing part for me. Wouldn't Swift runtime 5.1 be shipped with your app regardless if you support older iOS versions (e.g. iOS 10 as well), thus enabling it to use these new runtime features or am I just not understanding this correctly?
Now the first question: Is that a correct assumption?
Yes, that is correct.
Wouldn't Swift runtime 5.1 be shipped with your app regardless if you support older iOS versions (e.g. iOS 10 as well), thus enabling it to use these new runtime features or am I just not understanding this correctly?
The embedded runtime is not exactly the same runtime as the one found in your OS. E.g. the runtime in your OS is tightly integrated:
By being in the OS, the Swift runtime libraries can be tightly integrated with other components of the OS, particularly the Objective-C runtime and Foundation framework. The OS runtime libraries can also be incorporated into the dyld shared cache so that they have minimal memory and load time overhead compared to dylibs outside the shared cache.
Source: https://swift.org/blog/abi-stability-and-apple/
Of course, the embedded runtime cannot be tightly integrated into older systems. The embedded runtime can only support features that were already possible on the system it is being executed on. Features that require a newer system are simply not present when your app runs on an older one.
Note that this has never been different for ObjC. If a class or a method only exists starting with a certain OS version, you can still deploy backwards to older system versions but then you cannot use that class/method there as it simply doesn't exist.
if (@available(iOS 13, *)) {
// Code requiring iOS 13
} else {
// Alternative code for older OS versions
}
or in Swift:
if #available(iOS 13, *) {
// Code requiring iOS 13
} else {
// Alternative code for older OS versions
}
Just like with ObjC, new Swift features will from now on only be available on new OSes. A feature may also deploy backwards only if it is possible to make it available on older OSes, regardless of whether those shipped with a runtime or need to use the embedded one, and even then not necessarily all the way back.
For example, suppose macOS 10.15 introduces a new feature in its bundled runtime. If this feature can also be made available for 10.14 and 10.13 using a shim library, but not for 10.12 down to 10.9, then it will be tagged as "Requiring macOS 10.13 or newer".
If you deploy to 10.15, nothing has to be done, as the runtime of 10.15 supports the feature. If you deploy to 10.14 or 10.13, the compiler will add a shim library (like it would add an embedded runtime); on 10.13 and 10.14 the code in this library will be used, while on 10.15 and later the code in the runtime will be used. If you deploy to systems earlier than 10.13, this is okay, but you must not use this feature on those systems.
Of course, if a new feature can be made available even through the embedded runtime, it can certainly also be made available using a shim library for all systems that shipped with their own runtime which just didn't support this feature, as the shim library can then use the same code that the embedded runtime uses.
The ability to sometimes make new features available even to older systems is explained by the very last question on that page:
Is there anything that can be done to allow runtime support for new Swift features to be backward deployed to older OSes?
It may be possible for some kinds of runtime functionality to be backward deployed, potentially using techniques such as embedding a “shim” runtime library within an app. However, this may not always be possible. The ability to successfully backward-deploy functionality is fundamentally constrained by the limitations and existing bugs of the shipped binary artifact in the old operating system. The Core Team will consider the backward deployment implications of new proposals under review on a case-by-case basis going forward
Source: https://swift.org/blog/abi-stability-and-apple/

Instanced drawing with OpenGL ES 2.0 on iOS

In short:
Can anyone confirm whether it is possible to use the built-in variable gl_InstanceID (or gl_InstanceIDEXT) in a vertex shader using OpenGL ES 2.0 on iOS with GL_EXT_draw_instanced enabled?
Longer:
I want to draw multiple instances of an object using glDrawArraysInstanced and gl_InstanceID, and I want my application to run on multiple platforms, including iOS.
The specification clearly says that these features require ES 3.0. According to the iOS Device Compatibility Reference ES 3.0 is only available on a few devices (those based on the A7 GPU; so iPhone 5s, but not on iPhone 5 or earlier).
So my first assumption was that I needed to avoid using instanced drawing on older iOS devices.
However, further down in the compatibility reference document it says that the EXT_draw_instanced extension is supported for all SGX Series 5 processors (that includes iPhone 5 and 4s).
This makes me think that I could indeed use instanced drawing on older iOS devices too, by looking up and using the appropriate extension function (EXT or ARB) for glDrawArraysInstanced.
I'm currently just running some test code using SDL and GLEW on Windows so I haven't tested anything on iOS yet.
However, in my current setup I'm having trouble using the gl_InstanceID built-in variable in a vertex shader. I'm getting the following error message:
'gl_InstanceID' : variable is not available in current GLSL version
Enabling the "draw_instanced" extension in GLSL has no effect:
#extension GL_ARB_draw_instanced : enable
#extension GL_EXT_draw_instanced : enable
The error goes away when I specifically declare that I need ES 3.0 (GLSL 300 ES):
#version 300 es
Although that seems to work fine on my Windows desktop machine in an ES 2.0 context, I doubt that this would work on an iPhone 5.
So, shall I abandon the idea of being able to use instanced drawing on older iOS devices?
From here:
Instanced drawing is available in the core OpenGL ES 3.0 API and in
OpenGL ES 2.0 through the EXT_draw_instanced and EXT_instanced_arrays
extensions.
You can see that it's available on all of their GPUs, PowerVR SGX, Apple A7, A8.
(Looks like @Shammi's not coming back... if they do, you can change the accepted answer :)

How to use ARM intrinsics in iOS?

I need to compute the MSB (most significant bit) of millions of 32-bit integers on iPad very fast. I have my own (ugly) implementation of MSB written in plain C, which is slow. ARM processors have a CLZ (count leading zeroes) hardware instruction, which can be very useful for that. According to the ARM reference there is an intrinsic C function __CLZ. How can I add support for ARM intrinsic functions to my Xcode project?
P.S. I've managed to find a way of accessing hardware CLZ from NEON (by including arm_neon.h), but that's not what I need, because it only works with vectors, and I need scalar MSB.
I found the ARM intrinsic function names on page 44 of the ARM C Language Extensions. Some of them work in Xcode. This prints 31, as expected:
NSLog(@"%u", __builtin_clz(1));
Notes:
I haven't found any references to this in Apple docs. Most likely Xcode inherited those functions from LLVM or Clang.
You don't need to include any special headers or frameworks to use those functions. Xcode IDE autocomplete doesn't know about them.
Only a few functions from the extensions list are implemented. According to pages 12-13 of the same document there should be two header files: arm_acle.h for non-NEON intrinsics and arm_neon.h for NEON intrinsics. Xcode has only the second file, but some of the functions from the first file are declared somewhere else.
This may be obvious, but if you use ARM-specific instructions, you will not be able to run your app in the iOS simulator. The simulator uses the native x86-64 hardware of your Mac.
You could create a wrapper function that uses a compiler directive to use the ARM command or fall back to the "ugly" code if you don't have support.
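A minimal sketch of such a wrapper, for illustration only. The names msb32 and msb32_portable are hypothetical, and the sketch assumes a 32-bit unsigned int and a GCC- or Clang-compatible compiler for the __builtin_clz path. Since __builtin_clz is a compiler builtin rather than an ARM-only instruction, this particular wrapper should also compile for non-ARM targets.

```c
#include <stdint.h>

/* Portable fallback: index of the highest set bit (0..31), or -1 for zero. */
int msb32_portable(uint32_t x)
{
    int index = -1;
    while (x != 0) {
        x >>= 1;
        index++;
    }
    return index;
}

/* Hypothetical wrapper: uses the compiler builtin where available,
   otherwise falls back to the plain loop above. */
int msb32(uint32_t x)
{
    if (x == 0) {
        return -1;                  /* __builtin_clz(0) is undefined */
    }
#if defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(x);   /* assumes unsigned int is 32 bits wide */
#else
    return msb32_portable(x);
#endif
}
```

With this in place, the calling code uses msb32 everywhere and never refers to the builtin directly, so the fallback can be swapped in without touching the callers.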

Is there a way to detect VFP/NEON/Thumb/... on iOS at runtime?

So it's fairly easy to figure out what kind of CPU an iOS device runs by querying sysctlbyname("hw.cpusubtype", ...), but there seems to be no obvious way to figure out what features the CPU actually has (think VFP, NEON, Thumb, ...). Can someone think of a way to do this?
Basically, what I need is something similar to getauxval(AT_HWCAP) on Linux/Android, which returns a bit mask of features supported by the CPU.
A few things to note:
The information must be retrieved at runtime from the OS. No preprocessor defines.
Fat binaries is not a solution. I really do need to know this stuff in an ARM v6 binary.
Thanks in advance!
sysctlbyname has “hw.optional.neon”. I do not see a name for VFP, except “hw.optional.vfp_shortvector”, which is a deprecated feature.
Do a float matrix multiplication via Accelerate.framework and measure the execution time. The difference between NEON- and VFP-driven math will be so large that you simply cannot miss it.
Thumb is always there, and the presence of NEON means armv7, which in turn means Thumb-2.
First, consider carefully whether or not you really need to support armv6 binaries for iOS. According to published version share statistics, something like 98.5% of iOS devices are running iOS 5.0 or later, which does not support armv6 devices (armv6 binaries will still run on current iOS versions, obviously, but all new apps should really be targeting armv7; there’s basically zero reason for your customers to be shipping armv6 binaries for iOS today).
Similarly, your concerns about code size are misplaced. If you provide a fat library, and your customer builds an armv6 binary against it, only the armv6 bits of your library will be built into their application. Furthermore, code size is usually a nearly trivial fraction of application bundle size; most of the size of an application comes from other resources.
Ok. All that aside, if you really want to pursue this: VFP and thumb are supported on all iOS devices, so there’s no need to check for support. You can check for NEON and thumb-2 using the method that Eric Postpischil suggested (all armv7 iOS devices have NEON support, so availability of NEON coincides exactly with availability of thumb-2).
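The sysctlbyname query mentioned above could be sketched roughly as follows. The function name cpu_has_neon is hypothetical, and the sketch assumes that "hw.optional.neon" is reported as an int; on non-Apple systems, or when the key is missing, it reports "unknown" rather than guessing.

```c
#include <stddef.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Returns 1 if the OS reports NEON support, 0 if it reports none,
   and -1 if the answer is unknown (key missing or not an Apple OS). */
int cpu_has_neon(void)
{
#if defined(__APPLE__)
    int value = 0;                  /* assumption: the key holds an int */
    size_t size = sizeof(value);
    if (sysctlbyname("hw.optional.neon", &value, &size, NULL, 0) != 0) {
        return -1;                  /* key not present on this system */
    }
    return value != 0;
#else
    return -1;                      /* sysctlbyname is not queried here */
#endif
}
```

Treating a missing key as "unknown" instead of "no NEON" leaves the decision to the caller, which matters on older systems where the key may not exist at all.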
