Minimum matrix sizes to benefit from matrix multiplication on GPU - metal

I am particularly interested in matrix multiplication using Metal Performance Shaders, but answers about other frameworks are also fine.
Matrix multiplication is a theoretically highly parallelisable operation. I need to multiply a lot of matrices by themselves, as in A'A (where the apostrophe denotes transposition). The matrices A are about 4000 x 300. I was wondering whether it's worth porting the multiplication code to the GPU given these sizes. As I understand it, multiplying on the GPU also involves copying the data from main memory to GPU memory (I'm using an eGPU, so the memory is not shared). There must therefore be a trade-off between the extra cost of copying the data back and forth and the speed-up in the calculations. So my question is: at roughly what matrix sizes would I start to see a benefit from doing this on the GPU?
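For a rough sense of scale (my own back-of-the-envelope numbers): one A'A product at this size costs about 300 × 300 × 4000 ≈ 3.6×10^8 multiply-adds (roughly 0.7 GFLOP), while shipping A to the GPU is only 4000 × 300 × 4 bytes ≈ 4.8 MB of float32 data, so my naive expectation is that compute would dominate the transfer once many such products are batched; I just don't know where the crossover actually is.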
P.S. There is also this paper, which basically says not to bother because the GPU doesn't help, citing slow GPU cache bandwidth (on GPUs in general): https://graphics.stanford.edu/papers/gpumatrixmult/gpumatrixmult.pdf

I recommend you check out the vDSP section of Apple's Accelerate framework. They have very fast SIMD functions for matrix multiplication and transposition.
They also added some Swift-friendly APIs recently.
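To give a feel for the API, here is a minimal sketch of C = A'A with vDSP; this is my own illustration rather than code from the framework docs, it assumes row-major [Float] storage, and since vDSP_mmul has no transpose option, A' is materialized first with vDSP_mtrans:

import Accelerate

let m = 4000, n = 300
let A = [Float](repeating: 1, count: m * n)   // A is m x n
var At = [Float](repeating: 0, count: n * m)  // A' is n x m
var C = [Float](repeating: 0, count: n * n)   // C = A'A is n x n
vDSP_mtrans(A, 1, &At, 1, vDSP_Length(n), vDSP_Length(m))
vDSP_mmul(At, 1, A, 1, &C, 1,
          vDSP_Length(n),  // rows of C
          vDSP_Length(n),  // columns of C
          vDSP_Length(m))  // interior (shared) dimension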

I've run a test, and for my case the GPU is significantly faster (8-9x), even including all the memory copying from CPU to GPU and back. I am comparing float32 matrix multiplication performance, since Metal doesn't support float64.
let count = 100
let N = 7005
let K = 700
let DIV = 8
// Round K and N up to the next multiple of DIV (padding for the MPS matrices)
let K2 = (K / DIV) * DIV + (K % DIV > 0 ? 1 : 0) * DIV
let N2 = (N / DIV) * DIV + (N % DIV > 0 ? 1 : 0) * DIV
print(N2)
print(K2)

printTimeElapsedWhenRunningCode(title: "vDSP(f)") {
    let ATf = [Float](repeating: 1, count: N * K)
    let Af = [Float](repeating: 1, count: N * K)
    var C = [Float](repeating: 0, count: K * K)
    for _ in 0..<count {
        vDSP_mmul(ATf, 1,
                  Af, 1,
                  &C, 1,
                  vDSP_Length(K),
                  vDSP_Length(K),
                  vDSP_Length(N))
    }
}
guard let bufferA = device.makeBuffer(length: K2 * N2 * MemoryLayout<Float>.stride,
                                      options: [.storageModeManaged]) else {
    fatalError("Could not make buffer A")
}
guard let bufferC = device.makeBuffer(length: K2 * K2 * MemoryLayout<Float>.stride,
                                      options: [.storageModeManaged]) else {
    fatalError("Could not make buffer C")
}
let descA = MPSMatrixDescriptor(dimensions: N2,
                                columns: K2,
                                rowBytes: K2 * MemoryLayout<Float>.stride,
                                dataType: .float32)
let descC = MPSMatrixDescriptor(dimensions: K2,
                                columns: K2,
                                rowBytes: K2 * MemoryLayout<Float>.stride,
                                dataType: .float32)
let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA)
let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC)
let matrixMultiplication = MPSMatrixMultiplication(device: device,
                                                   transposeLeft: true,
                                                   transposeRight: false,
                                                   resultRows: K2,
                                                   resultColumns: K2,
                                                   interiorColumns: N2,
                                                   alpha: 1,
                                                   beta: 0)
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not make command queue")
}
printTimeElapsedWhenRunningCode(title: "Metal") {
let Af = [Float].init(repeating: Float(1), count: N*K)
let zeros = [Float].init(repeating: Float(0), count: K2)
for i in 0..<count {
var dest = bufferA.contents()
Af.withUnsafeBufferPointer { pA in
var from = pA.baseAddress!
for _ in 0..<N {
dest.copyMemory(from: from, byteCount: K)
dest += K
if K2 > K {
dest.copyMemory(from: zeros, byteCount: K2 - K)
dest += K2 - K
}
from += K
}
}
for _ in 0..<(N2-N) {
dest.copyMemory(from: zeros, byteCount: K2)
}
bufferA.didModifyRange(0..<N2*K2)
let commandBuffer = commandQueue.makeCommandBuffer()!
matrixMultiplication.encode(commandBuffer: commandBuffer,
leftMatrix: matrixA,
rightMatrix: matrixA,
resultMatrix: matrixC)
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
blitEncoder.synchronize(resource: bufferC)
blitEncoder.endEncoding()
commandBuffer.commit()
if i == count - 1 {
commandBuffer.waitUntilCompleted()
}
}
}
Output:
AMD Radeon RX 5700 XT
7008
704
Time elapsed for vDSP(f): 5.156805992126465 s.
Time elapsed for Metal: 0.6834449768066406 s.
DONE.

Related

How do I convert an OpenGL GLKView to a MTLKit Metal based View?

A few years ago, Apple started warning anyone using GLKit in their app that OpenGL was going away:
warning: OpenGLES is deprecated. Consider migrating to Metal instead
warning: GLKit is deprecated. Consider migrating to MetalKit instead
My app uses a complex OpenGL class, and I don't know either OpenGL or Metal. Apple has a few WWDC sessions on this, but they are targeted at OpenGL experts. Since Apple is going to remove OpenGL someday, I want to start this now before I only have a few months to do it. What should I do?
tldr;
Once I started seeing the build error messages in iOS12:
warning: OpenGLES is deprecated. Consider migrating to Metal instead
warning: GLKit is deprecated. Consider migrating to MetalKit instead
I knew I had to do something. Who knows exactly when Apple will remove OpenGL and GLKit? Trust me, you don't want to wait until you have just a few months to convert to Metal, as the process is by no means straightforward.
What follows is the process I used to convert an Objective-C/OpenGL view into Metal. It was a long, arduous process; several times I put my head on my desk and cried in frustration.
The fundamental steps I took were ones I would suggest others adopt too:
Remove all business logic and anything not directly related to OpenGL from the view, and restructure the primary app as necessary.
Create a test harness app that you will use for the conversion, and absolutely put it under version control.
Add the OpenGL view to the test harness.
Once the ViewController can drive the view, and you can see it, you are ready to start the transition.
In my case, I had three hurdles to jump: convert the view to Swift, recreate the functionality in Metal, then replace all GLK vector and matrix values and operations with simd.
My suggestion for proceeding:
Convert any Objective-C to Swift (I used Swiftify, which is free for limited translation; I had a subscription)
Add a MTKView to the test harness, and put code switches in the ViewController so as to alternately look at either view (comparing both was a big help to me).
Since I didn't know either OpenGL or Metal, I spent a lot of time downloading open source Metal projects and tutorials.
Create the Metal boilerplate (based on examples/tutorials) along with a shader.
Put a pad on your desk so when you bang your head in frustration trying to get anything to show in the Metal view you don't seriously hurt yourself.
Once you are over the hill, convert the GLK values/operations to simd, making use of the translation functions shown later.
I cannot stress this enough—commit every time you change a few operations and test them! You will surely break things and that way you can reference earlier working code.
The test harness will prove useful, as you will likely find that timing changes result in undesired behavior. In my case I created two harnesses, a second that had more of the app code in it so I could better debug the actual usage.
Project
I forked an open source project, Panorama. The master branch contains the Metal/simd code, and the Swift-OpenGL branch contains the original Objective-C code along with the Swift conversion. This lets a reader compare the two side by side. However, you don't need to reference it to glean much of how to convert OpenGL code in Objective-C to Swift, or to convert GLKit vectors and matrices to simd, as follows.
ObjectiveC to Swift
The OpenGL code makes much use of pointers, and those are a bit more burdensome in Swift. For instance:
GLfloat *m_TexCoordsData; // treated as an array of pairs of floats
glTexCoordPointer(2, GL_FLOAT, 0, m_TexCoordsData);
became
// The WithRawPtr closure type is used below; this definition is assumed,
// since the original code did not show it:
typealias WithRawPtr = (UnsafeRawPointer) -> Void

struct Pointer2 {
    private var array: [SIMD2<Float>]

    init(size: Int) {
        let n: SIMD2<Float> = [Float.nan, Float.nan]
        array = Array<SIMD2<Float>>(repeating: n, count: size)
    }

    subscript(index: Int) -> SIMD2<Float> {
        get { return array[index] }
        set(newValue) { array[index] = newValue }
    }

    mutating func usingRawPointer(block: WithRawPtr) {
        array.withUnsafeBytes { (bufPtr) -> Void in
            block(bufPtr.baseAddress!)
        }
    }
}

private var tPtr: Pointer2 // m_TexCoordsData

tPtr.usingRawPointer(block: { (ptr) in
    glTexCoordPointer(2, UInt32(GL_FLOAT), 0, ptr)
})
This all went away in the final Metal code.
Doing the conversion turned up latent bugs too!
In addition, I had to cast (convert) many of the values to stricter Swift types:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
became
glTexParameteri(UInt32(GL_TEXTURE_2D), UInt32(GL_TEXTURE_WRAP_S), GLint(GL_REPEAT))
and this process was just tedious. However, the Swift code was thinner and easier to read IMHO.
A couple of functions translated easily:
GLKQuaternion GLKQuaternionFromTwoVectors(GLKVector3 u, GLKVector3 v) {
    GLKVector3 w = GLKVector3CrossProduct(u, v);
    GLKQuaternion q = GLKQuaternionMake(w.x, w.y, w.z, GLKVector3DotProduct(u, v));
    q.w += GLKQuaternionLength(q);
    return GLKQuaternionNormalize(q);
}
became
func GLKQuaternionFromTwoVectors(_ u: GLKVector3, _ v: GLKVector3) -> GLKQuaternion {
    let w = GLKVector3CrossProduct(u, v)
    var q = GLKQuaternionMake(w.x, w.y, w.z, GLKVector3DotProduct(u, v))
    q.w += GLKQuaternionLength(q)
    return GLKQuaternionNormalize(q)
}
Later you will see that the translation to simd was not so easy.
OpenGL to Metal
Unfortunately, there is no magic wand here. Apple has some WWDC sessions on this, but they didn't really enlighten me. Metal uses two types of kernels: compute and shader, with compute being the easier of the two. However, in my case I had to use a shader, which was harder for me to grasp.
Metal Resources
A good place to start if you know nothing of Metal is this Metal Tutorial on the Ray Wenderlich site. A second article there is even more helpful: Moving From OpenGL to Metal on the Ray Wenderlich site. Both have copious references to more Metal material.
Two other excellent articles I found helpful: Donald Pinckney's Blog (Older). Another helpful author: Alex Barbulescu
The guy who literally wrote the book on Metal is Warren Moore. His book and articles are invaluable!
Things to keep in mind
OpenGL uses a clip space of -1 to 1 (z values). You need to account for this in your shader. Warren Moore personally suggested I ensure my shader was not returning negative z values by using this code:
v.z = v.z * 0.5 + v.w * 0.5;
This obviates the need to completely redo your OpenGL code that might have used negative z values.
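If you would rather patch this on the CPU side, the same remap can be folded into the projection matrix; a sketch of mine (untested against the Panorama code, and glProjection stands in for whatever projection matrix you already build):

import simd

// Maps OpenGL's -1...1 z clip range into Metal's 0...1 range.
// Columns are simd_float4x4 column vectors, so this encodes z' = 0.5*z + 0.5*w.
let zRemap = simd_float4x4(columns: (
    simd_float4(1, 0, 0, 0),
    simd_float4(0, 1, 0, 0),
    simd_float4(0, 0, 0.5, 0),
    simd_float4(0, 0, 0.5, 1)
))
let metalProjection = zRemap * glProjection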
The background color of an MTKView is not set by using a backgroundColor property, but by setting clearColor.
The communication from App space to shader space is done using structures, which must be separately defined in each respectively. For instance, in my app this struct:
private struct Uniforms {
    let projectionMatrix: simd_float4x4
    let attitudeMatrix: simd_float4x4
}
is defined in the shader as:
struct Uniforms {
    float4x4 projectionMatrix;
    float4x4 attitudeMatrix;
};
These structs are application defined.
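Getting one of these structs from app space to shader space is then just a byte copy per frame; a minimal sketch, assuming a MTLRenderCommandEncoder called encoder, projection and attitude matrices computed elsewhere, and buffer index 1 matching a [[buffer(1)]] parameter in the shader:

var uniforms = Uniforms(projectionMatrix: projection, attitudeMatrix: attitude)
encoder.setVertexBytes(&uniforms,
                       length: MemoryLayout<Uniforms>.stride,
                       index: 1) // must match [[buffer(1)]] in the vertex function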
Textures
If you are using images to create textures, it's a bit different in Metal. This
NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:[NSNumber numberWithBool:YES], GLKTextureLoaderOriginBottomLeft, nil];
GLKTextureInfo *info = [GLKTextureLoader textureWithContentsOfFile:path options:options error:&error];
glBindTexture(GL_TEXTURE_2D, info.name);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, IMAGE_SCALING);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, IMAGE_SCALING);
became
let loader = MTKTextureLoader(device: mtlDevice)
do {
    let texture = try loader.newTexture(cgImage: cgImage, options: [
        MTKTextureLoader.Option.origin: MTKTextureLoader.Origin.bottomLeft,
        MTKTextureLoader.Option.SRGB: false // with true, the image washed out
    ])
    return texture
} catch {
    fatalError("Could not load texture: \(error)")
}
Note that you need to set the origin to bottomLeft.
Final Comments
Unless you plan on really learning Metal deeply, you will be doing a lot of experimentation and asking questions. Having a test harness app will prove invaluable as you spend hours trying to get your code to do what you want.
Translate GLK to simd
Any Apple code is surely full of GLKVectors, GLKMatrices, and related functions. Unfortunately, there are no tools to convert them; you must do it by hand, line by line, and sometimes there is no simd equivalent. Sometimes I used Xcode's search and replace, but not often.
GLK -> SIMD
First, to get the simd types and functions, add this to your source files: import simd
GLfloat -> Float
GLint -> Int
GLKMatrix4 -> simd_float4x4
GLKMatrix4Identity -> matrix_identity_float4x4 (not easy to find)
GLKMatrix4Invert -> simd_inverse(simd_float4x4)
GLKMatrix4Make -> simd_float4x4(simd_float4, simd_float4, simd_float4, simd_float4)
GLKMatrix4MakeFrustum -> no replacement, function provided below
GLKMatrix4MakeLookAt -> no replacement, function provided below
GLKMatrix4MakeWithQuaternion -> simd_matrix4x4(simd_quatf)
GLKMatrix4Multiply -> simd_float4x4 * simd_float4x4
GLKMatrix4MultiplyVector3 -> no replacement, function provided below
GLKMatrix4MultiplyVector4 -> simd_float4x4 * simd_float4
GLKQuaternion -> simd_quatf
GLKQuaternionLength -> simd_quatf.length
GLKQuaternionMake -> simd_quaternion(_ x: Float, _ y: Float, _ z: Float, _ w: Float)
GLKQuaternionNormalize -> simd_quatf.normalized
GLKTextureInfo -> MTLTexture
GLKVector3 -> simd_float3
GLKVector3CrossProduct -> simd_cross(simd_float3, simd_float3)
GLKVector3DotProduct -> simd_dot(simd_float3, simd_float3)
GLKVector3Make -> simd_make_float3(_ x: Float, _ y: Float, _ z: Float)
GLKVector3Normalize -> simd_normalize(simd_float3)
GLKVector4 -> simd_float4 (typealias for SIMD4<Float>)
GLKVector4Make -> simd_make_float4(_ x: Float, _ y: Float, _ z: Float, _ w: Float)
I should note that Dash was a tremendous help in digging through the simd functions.
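To make the table concrete, here are a few of the mappings side by side (illustrative values only):

import simd

let m: simd_float4x4 = matrix_identity_float4x4    // was GLKMatrix4Identity
let v: simd_float4 = simd_make_float4(1, 2, 3, 1)  // was GLKVector4Make
let transformed = m * v                            // was GLKMatrix4MultiplyVector4
let q: simd_quatf = simd_quaternion(0, 0, 0, 1)    // was GLKQuaternionMake
let rotation = simd_float4x4(q)                    // was GLKMatrix4MakeWithQuaternion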
The two functions referenced above:
func simd_make_look_at_float4x4(
    eyeX: Float,
    eyeY: Float,
    eyeZ: Float,
    centerX: Float,
    centerY: Float,
    centerZ: Float,
    upX: Float,
    upY: Float,
    upZ: Float
) -> simd_float4x4 {
    // https://stackoverflow.com/questions/9053377/ios-questions-about-camera-information-within-glkmatrix4makelookat-result
    let ev = simd_float3(eyeX, eyeY, eyeZ)
    let cv = simd_float3(centerX, centerY, centerZ)
    let uv = simd_float3(upX, upY, upZ)
    let subbed = ev - cv
    let n = simd_normalize(subbed)
    let cross_p = simd_cross(uv, n)
    let u = simd_normalize(cross_p)
    let v = simd_cross(n, u)
    let c0: simd_float4 = [u[0], v[0], n[0], 0]
    let c1: simd_float4 = [u[1], v[1], n[1], 0]
    let c2: simd_float4 = [u[2], v[2], n[2], 0]
    let v0 = simd_dot(-1 * u, ev)
    let v1 = simd_dot(-1 * v, ev)
    let v2 = simd_dot(-1 * n, ev)
    let c3: simd_float4 = [v0, v1, v2, 1]
    let m: simd_float4x4 = simd_float4x4(columns: (c0, c1, c2, c3))
    return m
}
func simd_make_frustum_float4x4(frustum: Float, aspectRatio: Float) -> simd_float4x4 {
    // https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glFrustum.xml
    let left = -frustum
    let right = frustum
    let bottom = -frustum / aspectRatio
    let top = frustum / aspectRatio
    let near = PanoramaView.Z_NEAR
    let far = PanoramaView.Z_FAR
    let m00 = (2.0 * near) / (right - left)
    let m11 = (2.0 * near) / (top - bottom)
    let m20 = (right + left) / (right - left)
    let m21 = (top + bottom) / (top - bottom)
    let m22 = -1 * (far + near) / (far - near)
    let m23 = Float(-1)
    let m32 = -1 * (2 * far * near) / (far - near)
    let c0: simd_float4 = [m00, 0, 0, 0]
    let c1: simd_float4 = [0, m11, 0, 0]
    let c2: simd_float4 = [m20, m21, m22, m23]
    let c3: simd_float4 = [0, 0, m32, 0]
    let m = simd_float4x4(columns: (c0, c1, c2, c3))
    return m
}
// Translated from the original Panorama code
func simd_make_quaternion_from_two_vectors(_ u: simd_float3, _ v: simd_float3) -> simd_quatf {
    let w: simd_float3 = simd_cross(u, v)
    var q: simd_quatf = simd_quaternion(w.x, w.y, w.z, simd_dot(u, v))
    q.real += q.length
    return q.normalized
}
Translate back and forth between GLK and simd
These functions are found in the Panorama repository mentioned earlier, in the file GLK-Metal-Tools.swift. If, as recommended, you translate back and forth once your controller is solely simd, you can put these in your view and use them as you slowly remove the GLK code.
func glkV3_to_simd(_ v3: GLKVector3) -> simd_float3 {
    let v: simd_float3 = simd_make_float3(v3.x, v3.y, v3.z)
    return v
}

func simd3_to_glk(_ v3: simd_float3) -> GLKVector3 {
    let v = GLKVector3Make(v3[0], v3[1], v3[2])
    return v
}

func glkV4_to_simd(_ v3: GLKVector4) -> simd_float4 {
    let v: simd_float4 = simd_make_float4(v3.x, v3.y, v3.z, v3.w)
    return v
}

func simd4x4_to_glk(_ m: simd_float4x4) -> GLKMatrix4 {
    var array: [GLKVector4] = []
    for i in 0..<4 {
        let fv: simd_float4 = m[i]
        let v: GLKVector4 = GLKVector4Make(fv[0], fv[1], fv[2], fv[3])
        array.append(v)
    }
    let mg: GLKMatrix4 = GLKMatrix4MakeWithColumns(array[0], array[1], array[2], array[3])
    return mg
}

func glkm4_to_simd(_ m: GLKMatrix4) -> simd_float4x4 {
    var array: [simd_float4] = []
    for i in 0..<4 {
        let fv: GLKVector4 = GLKMatrix4GetColumn(m, Int32(i))
        let v: simd_float4 = simd_make_float4(fv[0], fv[1], fv[2], fv[3])
        array.append(v)
    }
    let ms: simd_float4x4 = simd_matrix(array[0], array[1], array[2], array[3])
    return ms
}
I used these print routines to check various values during development; you might find them useful too:
func print4x4SIMD(msg: String, m: simd_float4x4) {
    var s = ""
    s += "---COL: \(msg)\n"
    let (c0, c1, c2, c3) = m.columns
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 0, c0[0], c0[1], c0[2], c0[3])
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 1, c1[0], c1[1], c1[2], c1[3])
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 2, c2[0], c2[1], c2[2], c2[3])
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 3, c3[0], c3[1], c3[2], c3[3])
    print("\n\(s)\n")
}

func print4x4GLK(msg: String, m: GLKMatrix4) {
    var s = ""
    s += "---COL: \(msg)\n"
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 0, m.m00, m.m01, m.m02, m.m03)
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 1, m.m10, m.m11, m.m12, m.m13)
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 2, m.m20, m.m21, m.m22, m.m23)
    s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 3, m.m30, m.m31, m.m32, m.m33)
    print("\n\(s)\n")
}
simd_float4x4 rotation
While I haven't used this yet, I may need it someday (it's untested):
func matrix_from_rotation(radians: Float, v _v: simd_float3) -> simd_float4x4 {
    // https://www.haroldserrano.com/blog/rotating-a-2d-object-using-metal
    let v: simd_float3 = simd_normalize(_v)
    let cos: Float = cos(radians)
    let cosp: Float = 1 - cos
    let sin: Float = sin(radians)
    let col0 = simd_float4(
        cos + cosp * v.x * v.x,
        cosp * v.x * v.y + v.z * sin,
        cosp * v.x * v.z - v.y * sin,
        0
    )
    let col1 = simd_float4(
        cosp * v.x * v.y - v.z * sin,
        cos + cosp * v.y * v.y,
        cosp * v.y * v.z + v.x * sin,
        0.0
    )
    let col2 = simd_float4(
        cosp * v.x * v.z + v.y * sin,
        cosp * v.y * v.z - v.x * sin,
        cos + cosp * v.z * v.z,
        0.0
    )
    let col3 = simd_float4(0, 0, 0, 1)
    let m: simd_float4x4 = simd_float4x4(columns: (col0, col1, col2, col3))
    return m
}
Conclusion
This project took me about half a year working one weekend day per week, though it spanned a period of 18 months. The reason: I spent so many days making no progress, getting bizarre corrupted output or no output at all, that when I finally got the primary view to show in Metal as it had in OpenGL, I put the project away. I was just too burned out to go on.
That said, OpenGL on iOS is going to end, and as the years roll by that end is getting nearer.
I was originally going to stop when I got Metal working with the GLK vectors and matrices, but was urged to convert to simd now by Warren Moore.
It was a moment of pure ecstasy when I finally built my company's app and there wasn't a single compiler warning related to GLKit!

How to convert colors?

I'd like to do some kind of special color comparison.
During my research I found out that the comparison should not be done in RGB, because other color models like HSL & HSV are designed to "more closely align with the way human vision perceives color-making attributes" (quote: Wikipedia).
So I need a way to convert different colorSystems into each other.
One of the most important conversions for my purposes would be HEX to HSL (using Swift).
Because I'm a bloody beginner, this code is all that I've got so far:
// conversion HEX to HSL
HexToHSL("#F23CFF") // HSL should be "HSL: 296° 100% 62%"

func HexToHSL(_ hex: String) {
    let rgb = HexToRgb(hex)
    let r = rgb[0],
        g = rgb[1],
        b = rgb[2],
        a = rgb[3]
}

func RgbToHSL(r: Int, g: Int, b: Int) -> [Int] {
    let r = r/255, g = g/255, b = b/255;
    let max = [r, g, b].max()!, min = [r, g, b].min()!;
    let (h, s, l) = Double(max + min)*0.5; // "Expression type 'Double' is ambiguous without more context"
    if (max == min) {
        h = s = 0;
    } else {
        let d = max - min;
        s = l > 0.5 ? d / (2 - max - min) : d / (max + min);
        h /= 6;
    }
    return [ h, s, l ];
}

func HexToRgb(_ hex: String) -> [Int] {
    let hex = hex.substring(fromIndex: 1)
    var rgbValue: UInt32 = 0
    Scanner(string: hex).scanHexInt32(&rgbValue)
    let red = Int((rgbValue & 0xFF0000) >> 16),
        green = Int((rgbValue & 0x00FF00) >> 8),
        blue = Int(rgbValue & 0x0000FF),
        alpha = Int(255.0)
    return [red, green, blue, alpha]
}
Any help with fixing the color conversion from HEX to HSL would be much appreciated; thanks in advance!
Note: There's also a JavaScript sample for some kind of color conversion. Maybe it's helpful :)
Edit: I have fixed the code for rgb to hsl like this:
func RgbToHSL(_ rgb: [Int]) -> [Double] {
    let r = Double(rgb[0])/255, g = Double(rgb[1])/255, b = Double(rgb[2])/255;
    let max = [r, g, b].max()!, min = [r, g, b].min()!;
    var h = Double(max + min)*0.5,
        s = Double(max + min)*0.5,
        l = Double(max + min)*0.5;
    if (max == min) {
        h = 0
        s = 0
        l = 0
    } else {
        let d = max - min;
        s = l > 0.5 ? d / (2 - max - min) : d / (max + min);
        switch (max) {
        case r: h = (g - b) / d + (g < b ? 6 : 0); break;
        case g: h = (b - r) / d + 2; break;
        case b: h = (r - g) / d + 4; break;
        default: break;
        }
        h /= 6;
    }
    return [ h, s, l ];
}
... but the result for rgb = [242, 60, 255] will be [0.8222222222222223, 1.0, 0.61764705882352944], which doesn't look right because it should be 296° 100% 62%! :o
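(For what it's worth, scaling those normalized components, h × 360, s × 100, l × 100, gives 0.8222 × 360 ≈ 296°, 100%, and ≈ 62%, so the values may simply be expressed in the 0...1 range rather than in degrees and percent.)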
In order to compare colours, and thus compute colour differences, you need to use a perceptually uniform colourspace.
HSL and HSV are actually very poor colourspaces for this; they should not be used for proper colorimetric computations, because their Lightness and Value axes are not actual perceptual representations of Luminance, contrary to colourspaces such as CIE L*a*b* and CIE L*u*v*.
There are multiple ways to compute colour difference in colour science; usually the simplest, assuming you are using a uniform colourspace, is euclidean distance.
This is what DeltaE CIE 1976 does, using the CIE L*a*b* colourspace. The CIE noticed that some colours with low DeltaE values were actually appearing quite different; this was a side effect of the CIE L*a*b* colourspace not being perceptually uniform enough. From there, research has produced many new colour difference formulas and new perceptually uniform colourspaces.
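As a concrete illustration, DeltaE CIE 1976 is nothing more than the euclidean distance between two CIE L*a*b* coordinates; a minimal sketch in Swift, assuming the RGB-to-Lab conversion happens elsewhere:

import Foundation

struct Lab {
    var L: Double // lightness
    var a: Double // green-red axis
    var b: Double // blue-yellow axis
}

// DeltaE CIE 1976: euclidean distance in CIE L*a*b*
func deltaE1976(_ x: Lab, _ y: Lab) -> Double {
    let dL = x.L - y.L
    let da = x.a - y.a
    let db = x.b - y.b
    return (dL * dL + da * da + db * db).squareRoot()
}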
Here is a non-exhaustive list, from oldest to most recent, of notable colour difference formulas and perceptually uniform colourspaces; notice that implementation complexity almost follows the list order:
DeltaE CIE 1976
DeltaE CMC
DeltaE CIE 1994
DIN99
IPT
DeltaE CIE 2000
CIECAM02 & CAM02-UCS
CAM16 & CAM16-UCS
ICTCP
JzAzBz
I would suggest looking at something like ICTCP or JzAzBz, which offer good performance and are not overly complex to implement, or at the very least using CIE L*a*b* with euclidean distance; avoid HSL and HSV.
We have reference implementations for everything mentioned here in Colour.

Thresholding image works in Swift and Matlab but not Core Image kernel

tl;dr: When I threshold an image with a specific threshold in Swift, I get clean segmentation (and double checking it in Matlab perfectly matches), but when I do it in a Core Image kernel, it doesn't segment cleanly. Do I have a bug in my kernel?
I'm trying to threshold with a Core Image kernel. My code seems simple enough:
class ThresholdFilter: CIFilter {
    var inputImage: CIImage?
    var threshold: Float = 0.554688 // This is set to a good value via Otsu's method

    var thresholdKernel = CIColorKernel(source:
        "kernel vec4 thresholdKernel(sampler image, float threshold) {" +
        "  vec4 pixel = sample(image, samplerCoord(image));" +
        "  const vec3 rgbToIntensity = vec3(0.114, 0.587, 0.299);" +
        "  float intensity = dot(pixel.rgb, rgbToIntensity);" +
        "  return intensity < threshold ? vec4(0, 0, 0, 1) : vec4(1, 1, 1, 1);" +
        "}")

    override var outputImage: CIImage! {
        guard let inputImage = inputImage,
              let thresholdKernel = thresholdKernel else {
            return nil
        }
        let extent = inputImage.extent
        let arguments: [Any] = [inputImage, threshold]
        return thresholdKernel.apply(extent: extent, arguments: arguments)
    }
}
Images like this simple leaf get properly thresholded. But some images, like this one with a muddier background, become garbage. I don't think it's simply a matter of choosing a poor threshold, as I can use this exact same threshold in Matlab and get a clean segmentation.
To double check, I "redid" the kernel in outputImage in pure Swift, just printing to the console:
let img: CGImage = inputImage.cgImage!
let imgProvider: CGDataProvider = img.dataProvider!
let imgBitmapData: CFData = imgProvider.data!
var imgBuffer = vImage_Buffer(data: UnsafeMutableRawPointer(mutating: CFDataGetBytePtr(imgBitmapData)),
                              height: vImagePixelCount(img.height),
                              width: vImagePixelCount(img.width),
                              rowBytes: img.bytesPerRow)
for i in 0..<img.height {
    for j in 0..<img.width {
        let test = imgBuffer.data.load(fromByteOffset: (i * img.width + j) * 4, as: UInt32.self)
        let r = Float((test >> 16) & 255) / 256
        let g = Float((test >> 8) & 255) / 256
        let b = Float(test & 255) / 256
        let intensity = 0.114 * r + 0.587 * g + 0.299 * b
        print(intensity > threshold ? "1" : "0", terminator: "")
    }
    print("")
}
And this prints a cleanly segmented image in 0s and 1s. I can't zoom out far enough to get it all on my screen at once, but the hole in the leaf is clearly segmented.
I was worried that pixel intensities might differ between Matlab and the kernel (since RGB-to-intensity conversion can be done in different ways), so I used this console-printing method to check the exact intensities of different pixels; they all matched the intensities I'm seeing for the same image in Matlab. As I'm using the same dot product in Swift and in the kernel, I'm at a loss as to why this threshold works in Swift and Matlab, but not in the kernel.
Any ideas what's going on?
Solved it.
Core Image "helpfully" translates everything into light-linear color space because certain filters are helped by that, and you have to explicitly disable that if you want true colors. https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/CoreImaging/ci_performance/ci_performance.html#//apple_ref/doc/uid/TP30001185-CH10-SW7
You can do so when initializing the CIImage that you pass to the filter:
filter.inputImage = CIImage(image: image!, options: [kCIImageColorSpace: NSNull()])
I have no idea why this is only done within CIFilters and not everywhere else in an app or across all the other types of image processing; this seems like a very inconsistent and hidden "feature".
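For what it's worth, the same opt-out also appears to exist at the context level, if you would rather disable color management globally; this assumes the CIContextOption API (older code spells it kCIContextWorkingColorSpace):

// Disabling the working color space on the context should turn off
// the light-linear conversion for everything rendered through it.
let context = CIContext(options: [.workingColorSpace: NSNull()])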

How to normalize disparity data in iOS?

In the WWDC session "Image Editing with Depth" they mentioned normalizedDisparity and normalizedDisparityImage a few times:
"The basic idea is that we're going to map our normalized disparity
values into values between 0 and 1"
"So once you know the min and max you can normalize the depth or disparity between 0 and 1."
I tried to first get the disparity image like this:
let disparityImage = depthImage.applyingFilter(
"CIDepthToDisparity", withInputParameters: nil)
Then I tried to get the depthDataMap and do the normalization, but it didn't work. Am I on the right track? I'd appreciate some hints on what to do.
Edit:
This is my test code, sorry for the quality. I get the min and max, then I try to loop over the data to normalize it (let normalizedPoint = (point - min) / (max - min)):
let depthDataMap = depthData!.depthDataMap
let width = CVPixelBufferGetWidth(depthDataMap) // 768 on an iPhone 7+
let height = CVPixelBufferGetHeight(depthDataMap) // 576 on an iPhone 7+
CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
// Convert the base address to a safe pointer of the appropriate type
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap),
                                to: UnsafeMutablePointer<Float32>.self)
var min = floatBuffer[0]
var max = floatBuffer[0]
for y in 0..<height {
    for x in 0..<width {
        // row-major offset into the float buffer
        let distanceAtXYPoint = floatBuffer[y * width + x]
        if distanceAtXYPoint < min {
            min = distanceAtXYPoint
        }
        if distanceAtXYPoint > max {
            max = distanceAtXYPoint
        }
    }
}
What I expected is that the data would reflect the disparity where the user clicked on the image, but it didn't match. The code to find the disparity where the user clicked is here:
// Apply the filter with the sampleRect from the user's tap. Don't forget to clamp!
let minMaxImage = normalized?.clampingToExtent().applyingFilter(
    "CIAreaMinMaxRed", withInputParameters:
    [kCIInputExtentKey: CIVector(cgRect: rect2)])
// A four-byte buffer to store a single pixel value
var pixel = [UInt8](repeating: 0, count: 4)
// Render the image to a 1x1 rect. Be sure to use a nil color space.
context.render(minMaxImage!, toBitmap: &pixel, rowBytes: 4,
               bounds: CGRect(x: 0, y: 0, width: 1, height: 1),
               format: kCIFormatRGBA8, colorSpace: nil)
// The max is stored in the green channel. Min is in the red.
let disparity = Float(pixel[1]) / 255.0
There's a blog post on raywenderlich.com called "Image Depth Maps Tutorial for iOS" that contains a sample app and details related to working with depth. The sample code shows how to normalize the depth data using a CVPixelBuffer extension:
extension CVPixelBuffer {
    func normalize() {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(self),
                                        to: UnsafeMutablePointer<Float>.self)
        var minPixel: Float = 1.0
        var maxPixel: Float = 0.0
        for y in 0 ..< height {
            for x in 0 ..< width {
                let pixel = floatBuffer[y * width + x]
                minPixel = min(pixel, minPixel)
                maxPixel = max(pixel, maxPixel)
            }
        }
        let range = maxPixel - minPixel
        for y in 0 ..< height {
            for x in 0 ..< width {
                let pixel = floatBuffer[y * width + x]
                floatBuffer[y * width + x] = (pixel - minPixel) / range
            }
        }
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    }
}
Something to keep in mind when working with depth data: it is lower resolution than the actual image, so you need to scale it up (more info in the blog and in the WWDC video).
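A sketch of that upscaling step with a plain Core Image transform (illustrative only; depthPixelBuffer and photoExtent are assumed to come from your capture code):

import CoreImage

// Scale the lower-resolution depth map up to the photo's dimensions
let depthImage = CIImage(cvPixelBuffer: depthPixelBuffer)
let scaleX = photoExtent.width / depthImage.extent.width
let scaleY = photoExtent.height / depthImage.extent.height
let scaledDepth = depthImage.transformed(by: CGAffineTransform(scaleX: scaleX, y: scaleY))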
Will's answer above is very good, but it can be improved as follows. I'm using it with depth data from a photo; it's possible that it won't work if the depth data doesn't follow 16 bits, as mentioned above. Haven't found such a photo yet. I'm surprised there isn't a filter to handle this in Core Image.
extension CVPixelBuffer {
    func normalize() {
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height
        let pixelBufferBase = unsafeBitCast(CVPixelBufferGetBaseAddressOfPlane(self, 0),
                                            to: UnsafeMutablePointer<Float>.self)
        let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: pixelBufferBase, count: count)
        let maxValue = vDSP.maximum(depthCopyBuffer)
        let minValue = vDSP.minimum(depthCopyBuffer)
        let range = maxValue - minValue
        let negMinValue = -minValue
        let subtractVector = vDSP.add(negMinValue, depthCopyBuffer)
        let normalizedDisparity = vDSP.divide(subtractVector, range)
        pixelBufferBase.initialize(from: normalizedDisparity, count: count)
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    }
}
Try using the Accelerate framework's vDSP vector functions; here is a normalize done in two functions. To rescale the CVPixelBuffer to a 0..1 normalized range:
myCVPixelBuffer.setUpNormalize()
import Accelerate

extension CVPixelBuffer {
    func vectorNormalize(targetVector: UnsafeMutableBufferPointer<Float>) -> [Float] {
        // range = max - min
        // normalized to 0..1 is (pixel - minPixel) / range
        // see "Using vDSP for Vector-based Arithmetic" in the Accelerate documentation,
        // in particular the section 'Vector extrema calculation':
        //   static func maximum<U>(U) -> Float // returns the maximum element of a single-precision vector
        //   static func minimum<U>(U) -> Float // returns the minimum element of a single-precision vector
        let maxValue = vDSP.maximum(targetVector)
        let minValue = vDSP.minimum(targetVector)
        let range = maxValue - minValue
        let negMinValue = -minValue
        // adding the negated minimum is subtracting the minimum
        let subtractVector = vDSP.add(negMinValue, targetVector)
        let result = vDSP.divide(subtractVector, range)
        return result
    }

    func setUpNormalize() -> CVPixelBuffer {
        // grayscale buffer of float32 (Float); returns the normalized CVPixelBuffer
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height
        let bufferBaseAddress = CVPixelBufferGetBaseAddressOfPlane(self, 0) // UnsafeMutableRawPointer
        let pixelBufferBase = unsafeBitCast(bufferBaseAddress, to: UnsafeMutablePointer<Float>.self)
        let depthCopy = UnsafeMutablePointer<Float>.allocate(capacity: count)
        depthCopy.initialize(from: pixelBufferBase, count: count)
        let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: depthCopy, count: count)
        let normalizedDisparity = vectorNormalize(targetVector: depthCopyBuffer)
        // copy the normalized map back into the CVPixelBuffer
        pixelBufferBase.initialize(from: normalizedDisparity, count: count)
        depthCopy.deallocate()
        // depthCopyBuffer.deallocate()
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        return self
    }
}
You can see it in action in a modified version of the Apple sample 'PhotoBrowse' app at
https://github.com/racewalkWill/PhotoBrowseModified

Is it feasable to use AVAudioEngine to detect pitch in real time?

I'm trying to write a music app where pitch detection is at the core of it all. I've seen solutions to this problem as well as apps on the App Store, but most of them are pretty dated, and I'd like to do this in Swift. I've been looking at AVAudioEngine as a way to do this, but I find the documentation lacking, or maybe I haven't been looking hard enough.
What I have found is that I can tap the inputNode bus like this:
self.audioEngine = AVAudioEngine()
self.audioInputNode = self.audioEngine.inputNode!
self.audioInputNode.installTapOnBus(0,
                                    bufferSize: 256,
                                    format: audioInputNode.outputFormatForBus(0),
                                    block: { (buffer, time) in
    self.analyzeBuffer(buffer)
})
The bus is tapped 2-3 times per second and the buffer contains more than 16000 floats for each tap. Are these amplitude samples from the microphone?
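(That cadence is consistent with the buffer size: at a 44,100 Hz sample rate, a 16,384-frame buffer spans about 0.37 s, i.e. roughly 2-3 buffers per second; as far as I can tell, the 256 passed as bufferSize is only a hint that the engine may ignore.)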
The docs at least claim it's output from the node: "The buffer parameter is a buffer of audio captured from the output of an AVAudioNode."
Is it possible to use AVAudioEngine to detect pitch in real time or should I go about this another way?
A few different concepts here. AVAudioEngine is just the engine that gets you the raw PCM data; you could use Novocaine, Core Audio directly, or other options.
The PCM data is the floating point samples from the microphone.
As far as the pitch tracking goes, there are various techniques. One thing to note is that frequency detection is different from pitch detection.
FFT, which is good, but will not be able to detect the pitch of signals with missing fundamentals. You would need to run the signal through a low-pass filter to reduce possible aliasing of frequencies higher than the Nyquist frequency, and then window it before passing it to the FFT in order to reduce spectral leakage (a minimal windowing sketch appears after the review link below). The FFT outputs spectral content in a series of bins; the bin with the highest value is said to be the strongest frequency in the signal.
Autocorrelation, which can give better results. It's basically the signal correlated with itself.
In the end it's down to what you would like to detect; there are a few considerations to take into account. Things like the male voice and certain instruments can give incorrect results through a normal FFT running on buffers that haven't been preprocessed.
Check this PITCH DETECTION METHODS REVIEW
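To make the preprocessing step concrete, here is a minimal windowing-plus-FFT sketch using Accelerate's classic FFT API; this is my own illustration, it assumes the buffer length is a power of two, and it recreates the FFT setup on every call, which you would hoist out in real code:

import Accelerate
import Foundation

func magnitudeSpectrum(_ samples: [Float]) -> [Float] {
    let n = samples.count
    let log2n = vDSP_Length(log2(Double(n)))
    // Hann window to reduce spectral leakage
    var window = [Float](repeating: 0, count: n)
    vDSP_hann_window(&window, vDSP_Length(n), Int32(vDSP_HANN_NORM))
    var windowed = [Float](repeating: 0, count: n)
    vDSP_vmul(samples, 1, window, 1, &windowed, 1, vDSP_Length(n))
    // Pack the real input into split-complex form and run a real FFT
    var real = [Float](repeating: 0, count: n / 2)
    var imag = [Float](repeating: 0, count: n / 2)
    var magnitudes = [Float](repeating: 0, count: n / 2)
    real.withUnsafeMutableBufferPointer { rp in
        imag.withUnsafeMutableBufferPointer { ip in
            var split = DSPSplitComplex(realp: rp.baseAddress!, imagp: ip.baseAddress!)
            windowed.withUnsafeBytes { raw in
                vDSP_ctoz(raw.bindMemory(to: DSPComplex.self).baseAddress!, 2,
                          &split, 1, vDSP_Length(n / 2))
            }
            let setup = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2))!
            vDSP_fft_zrip(setup, &split, 1, log2n, FFTDirection(FFT_FORWARD))
            vDSP_destroy_fftsetup(setup)
            // Squared magnitude per bin; the largest bin is the strongest frequency
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(n / 2))
        }
    }
    return magnitudes
}

The index of the loudest bin then maps to a frequency of binIndex * sampleRate / n.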
As far as Swift goes, it's not well suited for real-time, performance-focused systems. You can check the old benchmarks of Swift vs C++:
the C++ FFT implementation is over 24x faster
I realize that Hellium3 is really giving me information about what pitch is and whether it's a good idea to do these things in Swift.
My question was originally about whether tapping the PCM bus is the way to obtain input signals from the microphone.
Since asking this question I've done exactly that: used the data obtained by tapping the PCM bus and analysed the buffer windows.
It works really well, and it was my lack of understanding of what a PCM bus, a buffer and a sampling frequency are that made me ask the question in the first place.
Knowing those three makes it easier to see that this is right on.
Edit: On request, here is my (now deprecated) implementation of the PitchDetector.
class PitchDetector {
    var samplingFrequency: Float
    var harmonicConstant: Float

    init(harmonicConstant: Float, samplingFrequency: Float) {
        self.harmonicConstant = harmonicConstant
        self.samplingFrequency = samplingFrequency
    }

    //------------------------------------------------------------------------------
    // MARK: Signal processing
    //------------------------------------------------------------------------------

    func detectPitch(_ samples: [Float]) -> Pitch? {
        let snac = self.snac(samples)
        let (lags, peaks) = self.findKeyMaxima(snac)
        let (τBest, clarity) = self.findBestPeak(lags, peaks: peaks)
        if τBest > 0 {
            let frequency = self.samplingFrequency / τBest
            if PitchManager.sharedManager.inManageableRange(frequency) {
                return Pitch(measuredFrequency: frequency, clarity: clarity)
            }
        }
        return nil
    }

    // Returns the Special Normalisation of the AutoCorrelation (SNAC) function
    // for various lags, with values between -1 and 1
    private func snac(_ samples: [Float]) -> [Float] {
        let τMax = Int(self.samplingFrequency / PitchManager.sharedManager.noteFrequencies.first!) + 1
        var snac = [Float](repeating: 0.0, count: samples.count)
        let acf = self.acf(samples)
        let norm = self.m(samples)
        for τ in 1 ..< τMax {
            snac[τ] = 2 * acf[τ + acf.count / 2] / norm[τ]
        }
        return snac
    }

    // Autocorrelation function
    private func acf(_ x: [Float]) -> [Float] {
        let resultSize = 2 * x.count - 1
        var result = [Float](repeating: 0, count: resultSize)
        let xPad = repeatElement(Float(0.0), count: x.count - 1)
        let xPadded = xPad + x + xPad
        vDSP_conv(xPadded, 1, x, 1, &result, 1, vDSP_Length(resultSize), vDSP_Length(x.count))
        return result
    }

    private func m(_ samples: [Float]) -> [Float] {
        var sum: Float = 0.0
        for i in 0 ..< samples.count {
            sum += 2.0 * samples[i] * samples[i]
        }
        var m = [Float](repeating: 0.0, count: samples.count)
        m[0] = sum
        for i in 1 ..< samples.count {
            m[i] = m[i - 1] - samples[i - 1] * samples[i - 1] - samples[samples.count - i - 1] * samples[samples.count - i - 1]
        }
        return m
    }

    /**
     * Finds the indices of all key maximum points in data
     */
    private func findKeyMaxima(_ data: [Float]) -> (lags: [Float], peaks: [Float]) {
        var keyMaximaLags: [Float] = []
        var keyMaximaPeaks: [Float] = []
        var newPeakIncoming = false
        var currentBestPeak: Float = 0.0
        var currentBestτ = -1
        for τ in 0 ..< data.count - 1 { // stop one short so data[τ + 1] stays in bounds
            newPeakIncoming = newPeakIncoming || ((data[τ] < 0) && (data[τ + 1] > 0))
            if newPeakIncoming {
                if data[τ] > currentBestPeak {
                    currentBestPeak = data[τ]
                    currentBestτ = τ
                }
                let zeroCrossing = (data[τ] > 0) && (data[τ + 1] < 0)
                if zeroCrossing {
                    let (τEst, peakEst) = self.approximateTruePeak(currentBestτ, data: data)
                    keyMaximaLags.append(τEst)
                    keyMaximaPeaks.append(peakEst)
                    newPeakIncoming = false
                    currentBestPeak = 0.0
                    currentBestτ = -1
                }
            }
        }
        if keyMaximaLags.count <= 1 {
            let unwantedPeakOfLowPitchTone = (keyMaximaLags.count == 1 && data[Int(keyMaximaLags[0])] < data.max()!)
            if unwantedPeakOfLowPitchTone {
                keyMaximaLags.removeAll()
                keyMaximaPeaks.removeAll()
            }
            let (τEst, peakEst) = self.approximateTruePeak(data.index(of: data.max()!)!, data: data)
            keyMaximaLags.append(τEst)
            keyMaximaPeaks.append(peakEst)
        }
        return (lags: keyMaximaLags, peaks: keyMaximaPeaks)
    }

    /**
     * Approximates the true peak according to
     * https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html
     */
    private func approximateTruePeak(_ τ: Int, data: [Float]) -> (τEst: Float, peakEst: Float) {
        let α = data[τ - 1]
        let β = data[τ]
        let γ = data[τ + 1]
        let p = 0.5 * ((α - γ) / (α - 2.0 * β + γ))
        let peakEst = min(1.0, β - 0.25 * (α - γ) * p)
        let τEst = Float(τ) + p
        return (τEst, peakEst)
    }

    private func findBestPeak(_ lags: [Float], peaks: [Float]) -> (τBest: Float, clarity: Float) {
        let threshold: Float = self.harmonicConstant * peaks.max()!
        for i in 0 ..< peaks.count {
            if peaks[i] > threshold {
                return (τBest: lags[i], clarity: peaks[i])
            }
        }
        return (τBest: lags[0], clarity: peaks[0])
    }
}
All credit to Philip McLeod, whose research is used in my implementation above. http://www.cs.otago.ac.nz/research/publications/oucs-2008-03.pdf
