Is it feasable to use AVAudioEngine to detect pitch in real time?

Is it feasable to use AVAudioEngine to detect pitch in real time? - ios

I'm trying to write a music app where detection of pitch is the core of it all. I've seen solutions to this problem as well as apps on the AppStore. However most of them are pretty dated and I'd like to do this is Swift. I've been looking at AVAudioEngine as a way to do this, but I find the documentation lacking or maybe I haven't been looking hard enough.
What I have found is that I can tap the inputNode bus like this:
self.audioEngine = AVAudioEngine()
self.audioInputNode = self.audioEngine.inputNode!
self.audioInputNode.installTapOnBus(0, bufferSize:256, format: audioInputNode.outputFormatForBus(0), block: {(buffer, time) in
self.analyzeBuffer(buffer)
})
The bus is tapped 2-3 times per second and the buffer contains more than 16000 floats for each tap. Are these amplitude samples from the microphone?
The docs at least claims it's output from the node: "The buffer parameter is a buffer of audio captured from the output of an AVAudioNode."
Is it possible to use AVAudioEngine to detect pitch in real time or should I go about this another way?

A few different concepts here. AVAudioEngine is just the engine that gets you the raw PCM data, you could use Novocaine, Core-Audio directly or other options.
The PCM data is the floating point samples from the microphone.
As far as the pitch tracking goes, there are various techniques. One thing to note is that frequency detection is different from pitch detection.
FFT which is good but will not be able to detect the pitch of signals with missing fundamentals. You would need to run the signal through a low pass filter to reduce possible aliasing of frequencies higher than the Nyquist Frequency and then window it before passing it to the FFT, this is to reduce spectral leakage. The FFT will output spectral content inside a series of bins, the bin with the highest value is said to be the strongest frequency in the signal.
Autocorrelation which can give better results. It's basically the signal correlated with itself.
In the end its down to what you would like to detect, there are a few considerations to take into account. Things like the male voice and certain instruments can give incorrect results through a normal FFT running on buffers that haven't been preprocessed.
Check this PITCH DETECTION METHODS REVIEW
As far as Swift goes, it's not well suited for real-time, performance focused systems. You can check the old benchmarks of Swift vs C++
the C++ FFT implementation is over 24x faster

I realize that Hellium3 is really giving me information to what pitch is and if it's a good idea to do these things with Swift.
My question was originally about if tapping the PCM bus is the way to obtain input signals from the microphone.
Since asking this question I've done exactly that. Use the data obtained by tapping the PCM bus and analyse the buffer windows.
It works really well and it was my lack of understanding of what a PCM bus, buffer and sampling frequency is that made me ask the question in the first place.
Knowing those three makes it easier to see that this is right on.
Edit: On demand I'll paste my (deprecated) implementation of the PitchDetector.
class PitchDetector {
var samplingFrequency: Float
var harmonicConstant: Float
init(harmonicConstant: Float, samplingFrequency: Float) {
self.harmonicConstant = harmonicConstant
self.samplingFrequency = samplingFrequency
}
//------------------------------------------------------------------------------
// MARK: Signal processing
//------------------------------------------------------------------------------
func detectPitch(_ samples: [Float]) -> Pitch? {
let snac = self.snac(samples)
let (lags, peaks) = self.findKeyMaxima(snac)
let (τBest, clarity) = self.findBestPeak(lags, peaks: peaks)
if τBest > 0 {
let frequency = self.samplingFrequency / τBest
if PitchManager.sharedManager.inManageableRange(frequency) {
return Pitch(measuredFrequency: frequency, clarity: clarity)
}
}
return nil
}
// Returns a Special Normalision of the AutoCorrelation function array for various lags with values between -1 and 1
private func snac(_ samples: [Float]) -> [Float] {
let τMax = Int(self.samplingFrequency / PitchManager.sharedManager.noteFrequencies.first!) + 1
var snac = [Float](repeating: 0.0, count: samples.count)
let acf = self.acf(samples)
let norm = self.m(samples)
for τ in 1 ..< τMax {
snac[τ] = 2 * acf[τ + acf.count / 2] / norm[τ]
}
return snac
}
// Auto correlation function
private func acf(_ x: [Float]) -> [Float] {
let resultSize = 2 * x.count - 1
var result = [Float](repeating: 0, count: resultSize)
let xPad = repeatElement(Float(0.0), count: x.count - 1)
let xPadded = xPad + x + xPad
vDSP_conv(xPadded, 1, x, 1, &result, 1, vDSP_Length(resultSize), vDSP_Length(x.count))
return result
}
private func m(_ samples: [Float]) -> [Float] {
var sum: Float = 0.0
for i in 0 ..< samples.count {
sum += 2.0 * samples[i] * samples[i]
}
var m = [Float](repeating: 0.0, count: samples.count)
m[0] = sum
for i in 1 ..< samples.count {
m[i] = m[i - 1] - samples[i - 1] * samples[i - 1] - samples[samples.count - i - 1] * samples[samples.count - i - 1]
}
return m
}
/**
* Finds the indices of all key maximum points in data
*/
private func findKeyMaxima(_ data: [Float]) -> (lags: [Float], peaks: [Float]) {
var keyMaximaLags: [Float] = []
var keyMaximaPeaks: [Float] = []
var newPeakIncoming = false
var currentBestPeak: Float = 0.0
var currentBestτ = -1
for τ in 0 ..< data.count {
newPeakIncoming = newPeakIncoming || ((data[τ] < 0) && (data[τ + 1] > 0))
if newPeakIncoming {
if data[τ] > currentBestPeak {
currentBestPeak = data[τ]
currentBestτ = τ
}
let zeroCrossing = (data[τ] > 0) && (data[τ + 1] < 0)
if zeroCrossing {
let (τEst, peakEst) = self.approximateTruePeak(currentBestτ, data: data)
keyMaximaLags.append(τEst)
keyMaximaPeaks.append(peakEst)
newPeakIncoming = false
currentBestPeak = 0.0
currentBestτ = -1
}
}
}
if keyMaximaLags.count <= 1 {
let unwantedPeakOfLowPitchTone = (keyMaximaLags.count == 1 && data[Int(keyMaximaLags[0])] < data.max()!)
if unwantedPeakOfLowPitchTone {
keyMaximaLags.removeAll()
keyMaximaPeaks.removeAll()
}
let (τEst, peakEst) = self.approximateTruePeak(data.index(of: data.max()!)!, data: data)
keyMaximaLags.append(τEst)
keyMaximaPeaks.append(peakEst)
}
return (lags: keyMaximaLags, peaks: keyMaximaPeaks)
}
/**
* Approximates the true peak according to https://www.dsprelated.com/freebooks/sasp/Quadratic_Interpolation_Spectral_Peaks.html
*/
private func approximateTruePeak(_ τ: Int, data: [Float]) -> (τEst: Float, peakEst: Float) {
let α = data[τ - 1]
let β = data[τ]
let γ = data[τ + 1]
let p = 0.5 * ((α - γ) / (α - 2.0 * β + γ))
let peakEst = min(1.0, β - 0.25 * (α - γ) * p)
let τEst = Float(τ) + p
return (τEst, peakEst)
}
private func findBestPeak(_ lags: [Float], peaks: [Float]) -> (τBest: Float, clarity: Float) {
let threshold: Float = self.harmonicConstant * peaks.max()!
for i in 0 ..< peaks.count {
if peaks[i] > threshold {
return (τBest: lags[i], clarity: peaks[i])
}
}
return (τBest: lags[0], clarity: peaks[0])
}
}
All credit to Philip McLeod whose research is used in my implementation above. http://www.cs.otago.ac.nz/research/publications/oucs-2008-03.pdf

Related

How do I convert an OpenGL GLKView to a MTLKit Metal based View?

A few years ago, Apple started warning anyone using GLKit in their app that OpenGL was going away:
warning: OpenGLES is deprecated. Consider migrating to Metal instead
warning: GLKit is deprecated. Consider migrating to MetalKit instead
My app uses a complex OpenGL class, and I don't know either OpenGL or Metal. Apple has a few WWDC sessions on this, but they are targeted at OpenGL experts. Since Apple is going to remove OpenGL someday, I want to start this now before I only have a few months to do it. What should I do?

tldr;
Once I started seeing the build error messages in iOS12:
warning: OpenGLES is deprecated. Consider migrating to Metal instead
warning: GLKit is deprecated. Consider migrating to MetalKit instead
I knew I had to do something. Who knows exactly when Apple will remove OpenGL and GLKit? Trust me, you don't want to wait until you have just a few months to convert to Metal, as the process is by no means straight forward.
What follows is the process I used to convert
an Objective-C/OpenGL view into Metal. It was a long arduous process and several times
I put my head on my desk and cried in frustration.
The fundamental steps I took were ones I would suggest others adopt too:
Remove all business logic and anything not directly related to OpenGL from the view, and restructure the primary app as necessary.
Create a test harness app that you will use for the conversion, and absolutely put it under version control.
Add the OpenGL view to the test harness.
Once the ViewController can drive the view, and you can see it, you are ready to start the transition.
In my case, I had three hurtles to jump: convert the view to Swift, recreate the functionality in Metal, then replace all GLK vector and matrix values and operations to simd.
My suggestion for proceeding:
Convert any ObjectiveC to Swift (I used Swiftify, free for limited translation, however I had a subscription)
Add a MTKView to the test harness, and put code switches in the ViewController so as to alternately look at either view (comparing both was a big help to me).
Since I didn't know either OpenGL or Metal, I spent a lot of time downloading open source Metal projects and tutorials.
Create the Metal boilerplate (based on examples/tutorials) along with a shader.
Put a pad on your desk so when you bang your head in frustration trying to get anything to show in the Metal view you don't seriously hurt yourself.
Once you are over the hill, convert the GLK values/operations to simd, making use of the translation functions shown later.
I cannot stress this enough—commit every time you change a few operations and test them! You will surely break things and that way you can reference earlier working code.
The test harness will prove useful, as you will likely find that timing changes result in undesired behavior. In my case I created two harnesses, a second that had more of the app code in it so I could better debug the actual usage.
Project
I forked an open source project Panorama. The master branch contains the Metal/simd code, and the Swift-OpenGL branch contains the original ObjectiveC code along with the Swift conversion. This lets a reader compare the two side-by-side. However, you don't need to referernce that to gleen much of how to convert OpenGL code in ObjectiveC to Swift, or to convert GLKit vectors and matrices to simd, as follows.
ObjectiveC to Swift
The OpenGL code makes much use of pointers, and those are a bit more burdensome in Swift. For instance:
GLfloat *m_TexCoordsData; // treated as an array of pairs of floats
glTexCoordPointer(2, GL_FLOAT, 0, m_TexCoordsData);
became
struct Pointer2 {
private var array: [SIMD2<Float>]
init(size: Int) {
let n: SIMD2<Float> = [Float.nan, Float.nan]
array = Array<SIMD2<Float>>(repeating: n, count: size)
}
subscript(index: Int) -> SIMD2<Float>{
get { return array[index] }
set(newValue) { array[index] = newValue }
}
mutating func usingRawPointer(block: WithRawPtr) {
array.withUnsafeBytes { (bufPtr) -> Void in
block(bufPtr.baseAddress!)
}
}
}
private var tPtr: Pointer2 // m_TexCoordsData
tPtr.usingRawPointer(block: { (ptr) in
glTexCoordPointer(2, UInt32(GL_FLOAT), 0, ptr)
})
This all went away in the final Metal code.
Doing the conversion turned up latent bugs too!
In addition, I had to cast (convert) many of the values to stricter Swift types:
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
became
glTexParameteri(UInt32(GL_TEXTURE_2D), UInt32(GL_TEXTURE_WRAP_S), GLint(GL_REPEAT))
and this process was just tedious. However, the Swift code was thinner and easier to read IMHO.
A couple of functions translated easily:
GLKQuaternion GLKQuaternionFromTwoVectors(GLKVector3 u, GLKVector3 v) {
GLKVector3 w = GLKVector3CrossProduct(u, v);
GLKQuaternion q = GLKQuaternionMake(w.x, w.y, w.z, GLKVector3DotProduct(u, v));
q.w += GLKQuaternionLength(q);
return GLKQuaternionNormalize(q);
}
became
func GLKQuaternionFromTwoVectors(_ u: GLKVector3, _ v: GLKVector3) -> GLKQuaternion {
let w = GLKVector3CrossProduct(u, v)
var q = GLKQuaternionMake(w.x, w.y, w.z, GLKVector3DotProduct(u, v))
q.w += GLKQuaternionLength(q)
return GLKQuaternionNormalize(q)
}
Later you will see that the translation to simd was not so easy.
OpenGL to Metal
Unfortunately, there is no magic wand here. Apple has some WWDC sessions on this, but they didn't really enlighten me. Metal uses two types of kernels: compute and shader, with compute the easier. However in my case I had to use a shader, which was more difficult for me to grasp.
Metal Resources
A good place to start if you know nothing of Metal is this Metal Tutorial on the Ray Wenderlich site. A second article there is even more helpful: Moving From OpenGL to Metal on the Ray Wenderlich site. Both have copious references to more Metal material.
Two other excellent articles I found helpful: Donald Pinckney's Blog (Older). Another helpful author: Alex Barbulescu
The guy who literally wrote the book on Metal is Warren Moore. His book and articles are invaluable!
Things to keep in mind
OpenGL uses a clip space of -1 to 1 (z values). You need to account for this in your shader. Warren Moore personally suggested I insure my shader was not returning negative z values by using this code:
v.z = v.z * 0.5 + v.w * 0.5;
This obviates the need to completely redo your OpenGL code that might have used negative z values.
The backgroundColor of a MTLView is not set by using that property, but setting the clearColor.
The communication from App space to shader space is done using structures, which must be separately defined in each respectively. For instance, in my app this struct:
private struct Uniforms {
let projectionMatrix: simd_float4x4
let attitudeMatrix: simd_float4x4
}
is defined in the shader as:
struct Uniforms {
float4x4 projectionMatrix;
float4x4 attitudeMatrix;
};
These structs are application defined.
Textures
If you are using images to create textures, it's a bit different in Metal. This
NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:[NSNumber numberWithBool:YES], GLKTextureLoaderOriginBottomLeft, nil];
GLKTextureInfo *info=[GLKTextureLoader textureWithContentsOfFile:path options:options error:&error];
glBindTexture(GL_TEXTURE_2D, info.name);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_REPEAT);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, IMAGE_SCALING);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, IMAGE_SCALING);
became
let loader: MTKTextureLoader = MTKTextureLoader(device: mtlDevice)
do {
let texture = try loader.newTexture(cgImage: cgImage, options: [
MTKTextureLoader.Option.origin: MTKTextureLoader.Origin.bottomLeft,
MTKTextureLoader.Option.SRGB: false // yes and image washed out
])
return texture
} catch {
Note that you need to set the origin to bottomLeft.
Final Comments
Unless you plan on really learning Metal deeply, you will be doing a lot of experimentation and asking questions. Having a test harness app will prove invaluable as you spend hours trying to get your code to do what you want.
Translate GLK to simd
Any Apple code is surely full of GLKVectors, GLKMatrices, and related functions. Unfortunately, there are no tools to convert them—you must do it by hand, line by line, and sometimes there is not simd equivalent. Sometime I used Xcode's search and replace, but not often.
GLK -> SIMD
First, to get the simd macros, add this to your source files: import simd
GLfloat -> Float
GLint -> Int
GLKMatrix4 -> simd_float4 (typealias for SIMD4)
GLKMatrix4Identity -> matrix_identity_float4x4 (not easy to find)
GLKMatrix4Invert -> simd_inverse(simd_float4x4)
GLKMatrix4Make -> simd_float4x4(simd_float4, simd_float4, simd_float4, simd_float4)
GLKMatrix4MakeFrustum -> no replacement, function provided below
GLKMatrix4MakeLookAt -> no replacement, function provided below
GLKMatrix4MakeWithQuaternion -> simd_matrix4x4(simd_quatf)
GLKMatrix4Multiply -> simd_float4x4 * simd_float4x4
GLKMatrix4MultiplyVector3 -> no replacement, function provided below
GLKMatrix4MultiplyVector4 ->simd_float4x4 * simd_float4
GLKQuaternion -> simd_quatf
GLKQuaternionLength -> simd_quatf.length
GLKQuaternionMake -> simd_quaternion(_ x: Float, _y: Float, _ z: Float, _ w: Float)
GLKQuaternionNormalize -> simd_quatf.normalized
GLKTextureInfo -> MTLTexture
GLKVector3 -> simd_float3
GLKVector3CrossProduct -> simd_cross(simd_float3, simd_float3)
GLKVector3DotProduct -> simd_dot(simd_float3, simd_float3)
GLKVector3Make -> simd_make_float3(_ x: Float, _y: Float, _ z: Float)
GLKVector3Normalize -> simd_normalize(simd_float3)
GLKVector4 -> simd_float4
GLKVector4Make -> simd_make_float4(_ x: Float, _y: Float, _ z: Float, _ w: Float)
I should note that Dash was a tremendous help in digging through the simd functions.
The two functions referenced above:
func simd_make_look_at_float4x4(
eyeX: Float,
eyeY: Float,
eyeZ: Float,
centerX: Float,
centerY: Float,
centerZ: Float,
upX: Float,
upY: Float,
upZ: Float
) -> simd_float4x4 {
// https://stackoverflow.com/questions/9053377/ios-questions-about-camera-information-within-glkmatrix4makelookat-result
let ev = simd_float3(eyeX, eyeY, eyeZ)
let cv = simd_float3(centerX, centerY, centerZ)
let uv = simd_float3(upX, upY, upZ)
let subbed = ev - cv
let n = simd_normalize(subbed)
let cross_p = simd_cross(uv, n)
let u = simd_normalize(cross_p)
let v = simd_cross(n, u)
let c0: simd_float4 = [u[0], v[0], n[0], 0]
let c1: simd_float4 = [u[1], v[1], n[1], 0]
let c2: simd_float4 = [u[2], v[2], n[2], 0]
let v0 = simd_dot(-1*u, ev)
let v1 = simd_dot(-1*v, ev)
let v2 = simd_dot(-1*n, ev)
let c3: simd_float4 = [v0, v1, v2, 1]
let m: simd_float4x4 = simd_float4x4(columns: (c0, c1, c2, c3))
return m
}
func simd_make_frustum_float4x4(frustum: Float, aspectRatio: Float) -> simd_float4x4 {
// https://www.khronos.org/registry/OpenGL-Refpages/gl2.1/xhtml/glFrustum.xml
let left = -frustum
let right = frustum
let bottom = -frustum/aspectRatio
let top = frustum/aspectRatio
let near = PanoramaView.Z_NEAR
let far = PanoramaView.Z_FAR
let m00 = (2.0 * near) / (right - left)
let m11 = (2.0 * near) / (top - bottom)
let m20 = (right + left) / (right - left)
let m21 = (top + bottom) / (top - bottom)
let m22 = -1 * (far + near) / (far - near)
let m23 = Float(-1)
let m32 = -1 * (2 * far * near) / (far - near)
let c0: simd_float4 = [m00, 0, 0, 0]
let c1: simd_float4 = [0, m11, 0, 0]
let c2: simd_float4 = [m20, m21, m22, m23]
let c3: simd_float4 = [0, 0, m32, 0]
let m = simd_float4x4(columns: (c0, c1, c2, c3))
return m
}
// Translated from the original Panorama code
func simd_make_quaternion_from_two_vectors(_ u: simd_float3, _ v: simd_float3) -> simd_quatf {
let w: simd_float3 = simd_cross(u, v)
var q: simd_quatf = simd_quaternion(w.x, w.y, w.z, simd_dot(u, v))
q.real += q.length
return q.normalized
}
Translate back and forth between GLK and simd
These functions are found in the Panorama repository mentioned earlier, in a file GLK-Metal-Tools.swift. If as recommended you translate back and forth after your controller is solely simd, you can put these in your view to as you slowly remove the GLK code.
func glkV3_to_simd(_ v3: GLKVector3) -> simd_float3 {
let v: simd_float3 = simd_make_float3(v3.x, v3.y, v3.z)
return v
}
func simd3_to_glk(_ v3: simd_float3) -> GLKVector3 {
let v = GLKVector3Make(v3[0], v3[1], v3[2])
return v
}
func glkV4_to_simd(_ v3: GLKVector4) -> simd_float4 {
let v: simd_float4 = simd_make_float4(v3.x, v3.y, v3.z, v3.w)
return v
}
func simd4x4_to_glk(_ m: simd_float4x4) -> GLKMatrix4 {
var array: [GLKVector4] = []
for i in 0..<4 {
let fv: simd_float4 = m[i]
let v: GLKVector4 = GLKVector4Make(fv[0], fv[1], fv[2], fv[3]);
array.append(v)
}
let mg: GLKMatrix4 = GLKMatrix4MakeWithColumns(array[0], array[1], array[2], array[3]);
return mg;
}
func glkm4_to_simd(_ m: GLKMatrix4) -> simd_float4x4 {
var array: [simd_float4] = []
for i in 0..<4 {
let fv: GLKVector4 = GLKMatrix4GetColumn(m, Int32(i))
let v: simd_float4 = simd_make_float4(fv[0], fv[1], fv[2], fv[3]);
array.append(v)
}
let ms: simd_float4x4 = simd_matrix(array[0], array[1], array[2], array[3]);
return ms;
}
I used these print routines to check various values during development, you might find them of use too:
func print4x4SIMD(
msg: String,
m: simd_float4x4
) {
var s = ""
s += "---COL: \(msg)\n"
let (c0, c1, c2, c3) = m.columns
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 0, c0[0], c0[1], c0[2], c0[3])
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 1, c1[0], c1[1], c1[2], c1[3])
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 2, c2[0], c2[1], c2[2], c2[3])
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 3, c3[0], c3[1], c3[2], c3[3])
print("\n\(s)\n")
}
func print4x4GLK(
msg: String,
m: GLKMatrix4
) {
var s = ""
s += "---COL: \(msg)\n"
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 0, m.m00, m.m01, m.m02, m.m03)
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 1, m.m10, m.m11, m.m12, m.m13)
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 2, m.m20, m.m21, m.m22, m.m23)
s += String(format: "[%.2d] %10.4lf %10.4lf %10.4lf %10.4lf\n", 3, m.m30, m.m31, m.m32, m.m33)
print("\n\(s)\n")
}
simd_float4x4 rotation
While I didn't use this yet, I may need it someday (it's untested):
func matrix_from_rotation(radians: Float, v _v: simd_float3) -> simd_float4x4 {
// https://www.haroldserrano.com/blog/rotating-a-2d-object-using-metal
let v: simd_float3 = simd_normalize(_v)
let cos: Float = cos(radians)
let cosp: Float = 1 - cos
let sin: Float = sin(radians)
let col0 = simd_float4(
cos + cosp * v.x * v.x,
cosp * v.x * v.y + v.z * sin,
cosp * v.x * v.z - v.y * sin,
0
)
let col1 = simd_float4(
cosp * v.x * v.y - v.z * sin,
cos + cosp * v.y * v.y,
cosp * v.y * v.z + v.x * sin,
0.0
)
let col2 = simd_float4(
cosp * v.x * v.z + v.y * sin,
cosp * v.y * v.z - v.x * sin,
cos + cosp * v.z * v.z,
0.0
)
let col3 = simd_float4(0, 0, 0, 1)
let m: simd_float4x4 = simd_float4x4(columns: (col0, col1, col2, col3))
return m
}
Conclusion
This project took me about half a year working one weekend day per week, but it spanned a period of 18 months. The reason: I spent so many days making no progress, getting bizarre corrupted output, or no output, that when I finally got the primary view to show in Metal as it did in, I put the project away. I was just too burned out to go on.
That said, the end of in iOS is going to end, and as the years roll by that end is getting nearer.
I was originally going to stop when I got Metal working with the GLK vectors and matrices, but was urged to convert to simd now by Warren Moore.
It was a moment of pure ecstasy when I finally built my company's app and there wasn't a single compiler warning related to GLKit!

Minimum matrix sizes to benefit from matrix multiplication on GPU

I am particularly interested in matrix multiplication using Metal Performance Shaders, but answers about other frameworks are also fine.
Matrix multiplication is theoretically highly parallelisable operation. I need to multiply a lot of matrices by themselves like A’ A (where apostrophe stands for transposition). The size of the matrices A is about 4000 x 300. I was wondering if it’s worth porting the multiplication code to the GPU given the size of these matrices. As I understand, multiplying on GPU will also involve copying the data from main memory to GPU memory (I’m using eGPU, so the memory is not shared). Then there must be a trade off between additional effort for copying the data back and forth, and speed up in the calculations. So my question is: at what sizes of the matrices (approx) I could start to see the benefits of doing it on GPU?
P.S. There is also this article which basically says to not bother because GPU doesn’t help, something about its memory cache being slow (in general on all GPUs): https://graphics.stanford.edu/papers/gpumatrixmult/gpumatrixmult.pdf

I recommend you check out the vDSP section of Apple's Accelerate framework. They have very fast SIMD functions for matrix multiplication and transposition.
They also added some Swift-friendly APIs recently.

I've made a test, and it's significantly faster (x 8-9) on GPU for my case, even including all the memory copying from CPU to GPU and back. I am comparing float32 matrix multiplication performance, since Metal doesn't support float64.
let count = 100
let N = 7005
let K = 700
let DIV = 8
let K2 = (K / DIV) * DIV + (K % DIV > 0 ? 1 : 0) * DIV
let N2 = (N / DIV) * DIV + (N % DIV > 0 ? 1 : 0) * DIV
print(N2)
print(K2)
printTimeElapsedWhenRunningCode(title: "vDSP(f)") {
let ATf = [Float].init(repeating: Float(1), count: N*K)
let Af = [Float].init(repeating: Float(1), count: N*K)
var C = Array(repeating: Float(0), count: K*K)
for _ in 0..<count {
vDSP_mmul(ATf, 1,
Af, 1,
&C, 1,
vDSP_Length(K),
vDSP_Length(K),
vDSP_Length(N))
}
}
guard let bufferA = device.makeBuffer(length: K2 * N2 * MemoryLayout<Float>.stride,
options: [.storageModeManaged]) else {
fatalError("Could not make buffer A")
}
guard let bufferC = device.makeBuffer(length: K2 * K2 * MemoryLayout<Float>.stride,
options: [.storageModeManaged]) else {
fatalError("Could not make buffer C")
}
let descA = MPSMatrixDescriptor(dimensions: N2,
columns: K2,
rowBytes: K2 * MemoryLayout<Float>.stride,
dataType: .float32)
let descC = MPSMatrixDescriptor(dimensions: K2,
columns: K2,
rowBytes: K2 * MemoryLayout<Float>.stride,
dataType: .float32)
let matrixA = MPSMatrix(buffer: bufferA, descriptor: descA)
let matrixC = MPSMatrix(buffer: bufferC, descriptor: descC)
let matrixMultiplication = MPSMatrixMultiplication(device: device,
transposeLeft: true,
transposeRight: false,
resultRows: K2,
resultColumns: K2,
interiorColumns: N2,
alpha: 1,
beta: 0)
guard let commandQueue = device.makeCommandQueue() else {
fatalError("Could not make command queue")
}
printTimeElapsedWhenRunningCode(title: "Metal") {
let Af = [Float].init(repeating: Float(1), count: N*K)
let zeros = [Float].init(repeating: Float(0), count: K2)
for i in 0..<count {
var dest = bufferA.contents()
Af.withUnsafeBufferPointer { pA in
var from = pA.baseAddress!
for _ in 0..<N {
dest.copyMemory(from: from, byteCount: K)
dest += K
if K2 > K {
dest.copyMemory(from: zeros, byteCount: K2 - K)
dest += K2 - K
}
from += K
}
}
for _ in 0..<(N2-N) {
dest.copyMemory(from: zeros, byteCount: K2)
}
bufferA.didModifyRange(0..<N2*K2)
let commandBuffer = commandQueue.makeCommandBuffer()!
matrixMultiplication.encode(commandBuffer: commandBuffer,
leftMatrix: matrixA,
rightMatrix: matrixA,
resultMatrix: matrixC)
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
blitEncoder.synchronize(resource: bufferC)
blitEncoder.endEncoding()
commandBuffer.commit()
if i == count - 1 {
commandBuffer.waitUntilCompleted()
}
}
}
Output:
AMD Radeon RX 5700 XT
7008
704
Time elapsed for vDSP(f): 5.156805992126465 s.
Time elapsed for Metal: 0.6834449768066406 s.
DONE.

How to normalize disparity data in iOS?

In WWDC session "Image Editing with Depth" they mentioned few times normalizedDisparity and normalizedDisparityImage:
"The basic idea is that we're going to map our normalized disparity
values into values between 0 and 1"
"So once you know the min and max you can normalize the depth or disparity between 0 and 1."
I tried to first get the disparit image like this:
let disparityImage = depthImage.applyingFilter(
"CIDepthToDisparity", withInputParameters: nil)
Then I tried to get depthDataMap and do normalization but it didn't work. I'm I on the right track? would be appreciate some hint on what to do.
Edit:
This is my test code, sorry for the quality. I get the min and max then I try to loop over the data to normalize it (let normalizedPoint = (point - min) / (max - min))
let depthDataMap = depthData!.depthDataMap
let width = CVPixelBufferGetWidth(depthDataMap) //768 on an iPhone 7+
let height = CVPixelBufferGetHeight(depthDataMap) //576 on an iPhone 7+
CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
// Convert the base address to a safe pointer of the appropriate type
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap),
to: UnsafeMutablePointer<Float32>.self)
var min = floatBuffer[0]
var max = floatBuffer[0]
for x in 0..<width{
for y in 0..<height{
let distanceAtXYPoint = floatBuffer[Int(x * y)]
if(distanceAtXYPoint < min){
min = distanceAtXYPoint
}
if(distanceAtXYPoint > max){
max = distanceAtXYPoint
}
}
}
What I expected is the the data will reflect the disparity where the user clicked on the image but it didn't match. The code to find the disparity where the user clicked is here:
// Apply the filter with the sampleRect from the user’s tap. Don’t forget to clamp!
let minMaxImage = normalized?.clampingToExtent().applyingFilter(
"CIAreaMinMaxRed", withInputParameters:
[kCIInputExtentKey : CIVector(cgRect:rect2)])
// A four-byte buffer to store a single pixel value
var pixel = [UInt8](repeating: 0, count: 4)
// Render the image to a 1x1 rect. Be sure to use a nil color space.
context.render(minMaxImage!, toBitmap: &pixel, rowBytes: 4,
bounds: CGRect(x:0, y:0, width:1, height:1),
format: kCIFormatRGBA8, colorSpace: nil)
// The max is stored in the green channel. Min is in the red.
let disparity = Float(pixel[1]) / 255.0

There's a new blog post on raywenderlich.com called "Image Depth Maps Tutorial for iOS" contains sample app and details related to working with depth. The sample code shows how to normalize the depth data using a CVPixelBuffer extension:
extension CVPixelBuffer {
func normalize() {
let width = CVPixelBufferGetWidth(self)
let height = CVPixelBufferGetHeight(self)
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(self), to: UnsafeMutablePointer<Float>.self)
var minPixel: Float = 1.0
var maxPixel: Float = 0.0
for y in 0 ..< height {
for x in 0 ..< width {
let pixel = floatBuffer[y * width + x]
minPixel = min(pixel, minPixel)
maxPixel = max(pixel, maxPixel)
}
}
let range = maxPixel - minPixel
for y in 0 ..< height {
for x in 0 ..< width {
let pixel = floatBuffer[y * width + x]
floatBuffer[y * width + x] = (pixel - minPixel) / range
}
}
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
}
}
Something to keep in mind when working with depth data that they are lower resolution than the actual image so you need to scale up (more info in the blog and in the WWDC video)

Will's answer above is very good, but it can be improved as follows. I'm using it with depth data from a photo, it's possible that if the depth data doesn't follow 16-bits, as mentioned above, it won't work. Haven't found such a photo yet. I'm surprised there isn't a filter to handle this in Core Image.
extension CVPixelBuffer {
func normalize() {
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
let width = CVPixelBufferGetWidthOfPlane(self, 0)
let height = CVPixelBufferGetHeightOfPlane(self, 0)
let count = width * height
let pixelBufferBase = unsafeBitCast(CVPixelBufferGetBaseAddressOfPlane(self, 0), to: UnsafeMutablePointer<Float>.self)
let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: pixelBufferBase, count: count)
let maxValue = vDSP.maximum(depthCopyBuffer)
let minValue = vDSP.minimum(depthCopyBuffer)
let range = maxValue - minValue
let negMinValue = -minValue
let subtractVector = vDSP.add(negMinValue, depthCopyBuffer)
let normalizedDisparity = vDSP.divide(subtractVector, range)
pixelBufferBase.initialize(from: normalizedDisparity, count: count)
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
}
}

Try using the Accelerate Framework vDSP vector functions.. here is a normalize in two functions.
to change the cvPixel buffer to a 0..1 normalized range
myCVPixelBuffer.setUpNormalize()
import Accelerate
extension CVPixelBuffer {
func vectorNormalize( targetVector: UnsafeMutableBufferPointer<Float>) -> [Float] {
// range = max - min
// normalized to 0..1 is (pixel - minPixel) / range
// see Documentation "Using vDSP for Vector-based Arithmetic" in vDSP under system "Accelerate" documentation
// see also the Accelerate documentation section 'Vector extrema calculation'
// Maximium static func maximum<U>(U) -> Float
// Returns the maximum element of a single-precision vector.
//static func minimum<U>(U) -> Float
// Returns the minimum element of a single-precision vector.
let maxValue = vDSP.maximum(targetVector)
let minValue = vDSP.minimum(targetVector)
let range = maxValue - minValue
let negMinValue = -minValue
let subtractVector = vDSP.add(negMinValue, targetVector)
// adding negative value is subtracting
let result = vDSP.divide(subtractVector, range)
return result
}
func setUpNormalize() -> CVPixelBuffer {
// grayscale buffer float32 ie Float
// return normalized CVPixelBuffer
CVPixelBufferLockBaseAddress(self,
CVPixelBufferLockFlags(rawValue: 0))
let width = CVPixelBufferGetWidthOfPlane(self, 0)
let height = CVPixelBufferGetHeightOfPlane(self, 0)
let count = width * height
let bufferBaseAddress = CVPixelBufferGetBaseAddressOfPlane(self, 0)
// UnsafeMutableRawPointer
let pixelBufferBase = unsafeBitCast(bufferBaseAddress, to: UnsafeMutablePointer<Float>.self)
let depthCopy = UnsafeMutablePointer<Float>.allocate(capacity: count)
depthCopy.initialize(from: pixelBufferBase, count: count)
let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: depthCopy, count: count)
let normalizedDisparity = vectorNormalize(targetVector: depthCopyBuffer)
pixelBufferBase.initialize(from: normalizedDisparity, count: count)
// copy back the normalized map into the CVPixelBuffer
depthCopy.deallocate()
// depthCopyBuffer.deallocate()
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
return self
}
}
You can see it in action in a modified version of the Apple sample 'PhotoBrowse' app at
https://github.com/racewalkWill/PhotoBrowseModified

Cannot invoke initializer for type 'UnsafeMutablePointer<Float>' with an argument list of type '(UnsafeMutableRawPointer!)'

This association on line 5 does not contribute to the success of the app. But I honestly do not understand what happens at that moment.
Overloads for ÜnsafeMutablePointer exist with partially matching parameter list: (RawPointer).
But what does it mean? Thanks
override func buffer(withCsound cs: CsoundObj) -> Data {
let length = Int(AKSettings.shared().numberOfChannels) *
Int(AKSettings.shared().samplesPerControlPeriod) * 4
let num = length / 4
let floats = UnsafeMutablePointer<Float>(malloc(length))
/* The phase and amplitude are different for each line to get a nice
* gimmick. */
let phase = (self.amplifier + 0.8) / 1.8
for i in 0 ... num - 1 {
/* The amplitude is placed within the for-loop because it can fade
* to a slightly different value during one plot refresh. */
let amplitude = self.amplifier * self.amplitude
/* It is incredibly important that `time` and `phase` aren't
* multiplied with the frequency or else it will bump at each
* frequency change. */
var t = (time + Double(i) / Double(num) * self.frequency + phase)
floats[i] = Float(sin(t * 2 * 3.14))
/* It is multiplied with a "regular" 0.5 Hz sine to get both ends
* to fade out nicely. It's sort of a simplistic window function. */
t = Double(i) / Double(num)
floats[i] *= Float(sin(t * 1 * 3.14) * amplitude)
floats[i] *= 1 - pow(1 - Float(i) / Float(num), 2.0)
time += self.frequency / 44100 / 2
/* Fade smoothly to the next frequency and amplitude. */
self.frequency += (nextFrequency - self.frequency) / 44100.0 / 4.0
self.amplitude += (nextAmplitude - self.amplitude) / 44100.0 / 2.0
}
/* We keep the time between 0 and 1 to make sure it never overflows /
* loses the necessary precision. */
time = fmod(time, 1.0)
return Data(bytesNoCopy: UnsafeMutablePointer<UInt8>(floats), count: length, deallocator: .free)
}

You can find some similar articles searching with the error message.
For example: How to use UnsafeMutablePointer in Swift 3?
And you'd better read MIGRATING TO SWIFT 3, especially this article.
To be specific to your case.
Change this line:
let floats = UnsafeMutablePointer<Float>(malloc(length))
to:
let rawBytes = malloc(length)!
let floats = rawBytes.assumingMemoryBound(to: Float.self)
And change the last line:
return Data(bytesNoCopy: UnsafeMutablePointer<UInt8>(floats), count: length, deallocator: .free)
to:
return Data(bytesNoCopy: rawBytes, count: length, deallocator: .free)
Option 2.
Change the line let floats = UnsafeMutablePointer<Float>(malloc(length)) to:
var data = Data(count: length)
data.withUnsafeMutableBytes {(floats: UnsafeMutablePointer<Float>) in
And change the last line to:
}
return data
(All lines between them are enclosed in the closure {(floats: UnsafeMutablePointer<Float>) in ...}.)

let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
var buf = unsafeBitCast(baseAddress, to: UnsafeMutablePointer<UInt8>.self)

How to check if a float is a multiple of another float in Swift

I receive a float number from the user and check if it's valid depending on an increment defined (0.1, 0.5, 0.01, etc). For example, if the increment is 0.5, then 1.0, 1.5, 2.0 are valid but 1.2 is not. I'm using modulo to check if it is valid.
if (input % increment) == 0 {
println("Pass")
} else {
println("Fail")
}
The problem is that when the increment is 0.1, most of the valid values of input are detected as invalid.
I used the formula given in this answer and it improves the detection of valid inputs but still fails most of the time.

Your problem is that 0.1 isn't exactly representable in binary whereas 0.5 for instance is. this means that 0.1 * 5 doesn't equal 0.5. It's a lot like in decimal where (1 / 3) * 3 ≃ 0.3333333 * 3 = 0.99999999 not 1. To solve this sort of problem you can introduce an epsilon. An epsilon is a very small value, and you check whether your result is only a very small distance from the value you actually want, if it is you hope that the value is correct.

Based on James Snook's answer, I ended up with this solution.
let epsilon = 1e-6
let division = input / increment
let diff = abs(division - round(division))
if (diff < epsilon) {
println("Pass")
} else {
println("Fail")
}

I believe that the modulo works only with integers, if I'm right you should do your own workaround to see if the given number is a multiple or not, which translate to: the result of the division must be an integer.
The code should be something like this:
let floatDivision = input/increment;
let isInteger = floor(floatDivision) == floatDivision // true
if (isInteger) {
println("Pass")
} else {
println("Fail")
}
And this should work with any increment number (even more than one decimal point digit).
EDIT
as James Said, the float division of 1.2 over 0.1 is not coded exactly as 12 but 11.9999... So I added the epsilon in the comparison:
let input = 1.2;
let increment = 0.1;
let epsilon = 0.00000000000001;
let floatDivision = input/increment;
let dif = abs(round(floatDivision) - floatDivision);
println(String(format: "%.20f", floatDivision)); // 11.99999999999999822364
println(String(format: "%.20f", round(floatDivision))); // 12.00000000000000000000
println(String(format: "%.20f", dif)) // 0.00000000000000177636
let isInteger = dif < epsilon // Pass if (isInteger) {
println("Pass") } else {
println("Fail") }
Best of luck.

To handle numbers with one number after the decimal point you could do:
if (input * 10) % (increment * 10) == 0 {
println("Pass")
} else {
println("Fail")
}
To handle numbers with more than one number after the decimal point just increase the multiple. For example, if (input * 100000) % (increment * 100000) == 0.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

Is it feasable to use AVAudioEngine to detect pitch in real time? - ios

Related

How do I convert an OpenGL GLKView to a MTLKit Metal based View?

Minimum matrix sizes to benefit from matrix multiplication on GPU

How to normalize disparity data in iOS?

Cannot invoke initializer for type 'UnsafeMutablePointer<Float>' with an argument list of type '(UnsafeMutableRawPointer!)'

How to check if a float is a multiple of another float in Swift

Categories

Resources