There seem to be at least half a dozen matrix libraries in the Apple system. One of them is the simd library, whose types work the same in CPU and GPU code.
import simd
let mat = float3x3(...)
let vec = float3(...)
mat * vec
I'm having trouble finding documentation for it. Unlike most things, it does not show up in Xcode's documentation browser. I know that a different library (GLKit) has matrix types with functions for building rotation matrices. For example,
GLKMatrix3RotateY(mat, radians)
Are there similar functions for the simd matrix types?
You can go through simd_quat. Quaternions have a simple connection to the angle-axis representation. The SIMD library can construct a quaternion from angle-axis, and there's also a function to construct a float3x3 from a quaternion. (This is C++, but same idea should work in Swift).
inline float3x3 MakeRotation(float radians, float x, float y, float z) {
    simd_quatf quat = simd_quaternion(radians, (simd_float3){x, y, z});
    return simd_matrix3x3(quat);
}
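To make the angle-axis to quaternion to matrix route concrete, here is a minimal sketch of the same math in plain C++ with no Apple frameworks. The `Quat` and `Mat3` types and all function names are hypothetical stand-ins for illustration, not simd types, and the matrix is stored row-major purely for readability.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper types for illustration; these are not simd types.
struct Quat { float x, y, z, w; };   // w is the scalar part
struct Mat3 { float m[3][3]; };      // row-major: m[row][col]

// Build a unit quaternion from an angle in radians and an axis.
// The axis is normalized here, so any non-zero vector is accepted.
inline Quat quatFromAxisAngle(float radians, float ax, float ay, float az) {
    const float len = std::sqrt(ax * ax + ay * ay + az * az);
    assert(len > 0.0f && "rotation axis must be non-zero");
    const float s = std::sin(radians * 0.5f) / len;
    return {ax * s, ay * s, az * s, std::cos(radians * 0.5f)};
}

// Convert a unit quaternion to the equivalent 3x3 rotation matrix.
inline Mat3 matrixFromQuat(const Quat& q) {
    const float x = q.x, y = q.y, z = q.z, w = q.w;
    Mat3 r = {{
        {1.0f - 2.0f * (y * y + z * z), 2.0f * (x * y - z * w),        2.0f * (x * z + y * w)},
        {2.0f * (x * y + z * w),        1.0f - 2.0f * (x * x + z * z), 2.0f * (y * z - x * w)},
        {2.0f * (x * z - y * w),        2.0f * (y * z + x * w),        1.0f - 2.0f * (x * x + y * y)},
    }};
    return r;
}

// Same shape as MakeRotation above: angle plus axis in, rotation matrix out.
inline Mat3 makeRotation(float radians, float x, float y, float z) {
    return matrixFromQuat(quatFromAxisAngle(radians, x, y, z));
}
```

For a quarter turn about +Y, `makeRotation(pi / 2, 0, 1, 0)` should map the +Z axis onto the +X axis, which is a quick way to check the handedness you get.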
There are not currently utility functions for creating such matrices in simd.framework, Metal, or MetalKit. However, you can use GLKit's matrix functions and convert the resulting GLKMatrix4s into float4x4s before, for example, copying them into a Metal buffer for use in a shader.
A GLKMatrix4 is just a union containing an array of 16 floats, stored in column-major order.
Therefore, we can write an extension on float4x4 that allows initializing a simd matrix with a GLKit matrix:
extension float4x4 {
    // Each GLKit column (m0x, m1x, m2x, m3x) becomes one simd column vector.
    init(matrix: GLKMatrix4) {
        self.init(columns: (float4(x: matrix.m00, y: matrix.m01, z: matrix.m02, w: matrix.m03),
                            float4(x: matrix.m10, y: matrix.m11, z: matrix.m12, w: matrix.m13),
                            float4(x: matrix.m20, y: matrix.m21, z: matrix.m22, w: matrix.m23),
                            float4(x: matrix.m30, y: matrix.m31, z: matrix.m32, w: matrix.m33)))
    }
}
I verified that the resulting matrix matched my expectations by creating a GLKit matrix that represents a 45-degree rotation counterclockwise about the +Z axis, and ensuring that it does, in fact, rotate the unit vector <1, 0, 0> onto the unit vector <sqrt(2)/2, sqrt(2)/2, 0>:
let rotation = GLKMatrix4MakeZRotation(.pi / 4)
let simdRotation = float4x4(matrix: rotation)
let v = float4(1, 0, 0, 0)
let vp = simdRotation * v
> float4(0.707107, 0.707107, 0.0, 0.0)
Note that I'm abiding by the convention here that matrix-vector multiplication treats the vector as a column vector and places the matrix on the left, which is the most common convention in current use.
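The column-major layout and the column-vector convention can be checked without GLKit or simd. The following is only a sketch in plain C++: `Mat4`, `Vec4`, `makeZRotation` and `multiply` are made-up names for illustration, with a flat array of 16 floats playing the role of the GLKit matrix.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec4 = std::array<float, 4>;
// 16 floats in column-major order: element (row r, column c) is m[c * 4 + r].
using Mat4 = std::array<float, 16>;

// Counterclockwise rotation about +Z, written out column by column.
inline Mat4 makeZRotation(float radians) {
    assert(std::isfinite(radians));
    const float c = std::cos(radians), s = std::sin(radians);
    return {   c,    s, 0.0f, 0.0f,    // column 0
              -s,    c, 0.0f, 0.0f,    // column 1
            0.0f, 0.0f, 1.0f, 0.0f,    // column 2
            0.0f, 0.0f, 0.0f, 1.0f };  // column 3
}

// Matrix on the left, column vector on the right: out = M * v.
inline Vec4 multiply(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}
```

Calling `multiply(makeZRotation(pi / 4), {1, 0, 0, 0})` reproduces the check above: the X unit vector lands on roughly (0.7071, 0.7071, 0, 0).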
There is one caveat you should be aware of with respect to GLKit and Metal's clip space. You can read about the issue, and how to correct for it, here.
I have a 3d vector I'm applying as a physics force:
let force = SCNVector3(x: 0, y: 0, z: -5)
node.physicsBody?.applyForce(force, asImpulse: true)
I need to rotate the force based on the mobile device's position which is available to me as a 4x4 matrix transform or euler angles.
var transform :matrix_float4x4 - The position and orientation of the camera in world coordinate space.
var eulerAngles :vector_float3 - The orientation of the camera, expressed as roll, pitch, and yaw values.
I think this is more of a fundamental 3d graphics question, but the application of this is a Swift based iOS app using SceneKit and ARKit.
There are some utilities available to me in the SceneKit and simd libraries. Unfortunately my naive attempts to do things like simd_mul(force, are failing me.
@orangenkopf provided a great answer that helped me come up with this:
let force = simd_make_float4(0, 0, -5, 0)
let rotatedForce = simd_mul(, force)
let vectorForce = SCNVector3(x:rotatedForce.x, y:rotatedForce.y, z:rotatedForce.z)
node.physicsBody?.applyForce(vectorForce, asImpulse: true)
Your idea is right. You need to multiply the transform and the direction.
I can't find any documentation on simd_mul, but I suspect you have at least one of the following problems:
simd_mul applies both the rotation and the translation of the transform.
The transform of the camera is in world coordinate space. Depending on your node hierarchy, this can result in a direction that is way off.
SceneKit does not provide many linear algebra functions, so we have to build our own:
extension SCNMatrix4 {
    // Matrix * column vector; mRC is the element in row R, column C of the SCNMatrix4 struct.
    static public func *(left: SCNMatrix4, right: SCNVector4) -> SCNVector4 {
        let x = left.m11*right.x + left.m21*right.y + left.m31*right.z + left.m41*right.w
        let y = left.m12*right.x + left.m22*right.y + left.m32*right.z + left.m42*right.w
        let z = left.m13*right.x + left.m23*right.y + left.m33*right.z + left.m43*right.w
        // Note: the third term uses m34 (not m43), matching the pattern of the rows above.
        let w = left.m14*right.x + left.m24*right.y + left.m34*right.z + left.m44*right.w
        return SCNVector4(x: x, y: y, z: z, w: w)
    }
}

extension SCNVector4 {
    // Drop the w component.
    public func to3() -> SCNVector3 {
        return SCNVector3(self.x, self.y, self.z)
    }
}
Now do the following:
Convert the camera transform to the node's local coordinate system
Create the force as a 4D vector, and set the fourth element to 0 to ignore the translation
Multiply the transform and the vector
// Convert the transform to a SCNMatrix4
let transform = SCNMatrix4FromMat4(
// Convert the matrix to the node's coordinate space
let localTransform = node.convertTransform(transform, from: nil)
let force = SCNVector4(0, 0, -5, 0)
let rotatedForce = (localTransform * force).to3()
node.physicsBody?.applyForce(rotatedForce, asImpulse: true)
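The reason for setting the fourth component to 0 is easiest to see with numbers. Below is a small sketch in plain C++ (no SceneKit; `Mat4`, `Vec4`, `transform`, `transformDirection` and `exampleCameraTransform` are hypothetical names, and the matrix is assumed to be column-major with the translation in the last column).

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<float, 16>;  // column-major: element (r, c) is m[c * 4 + r]

// out = M * v, treating v as a column vector.
inline Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += m[c * 4 + r] * v[c];
    return out;
}

// Transform a direction: w = 0 multiplies the translation column by zero.
inline Vec4 transformDirection(const Mat4& m, float x, float y, float z) {
    const Vec4 out = transform(m, Vec4{x, y, z, 0.0f});
    // Holds whenever the bottom row of m is (0, 0, 0, 1), as in an affine transform.
    assert(out[3] == 0.0f);
    return out;
}

// Example pose: a 90-degree rotation about +Y combined with a translation by (1, 2, 3).
inline Mat4 exampleCameraTransform() {
    return { 0.0f, 0.0f, -1.0f, 0.0f,    // column 0: where the X axis ends up
             0.0f, 1.0f,  0.0f, 0.0f,    // column 1: where the Y axis ends up
             1.0f, 0.0f,  0.0f, 0.0f,    // column 2: where the Z axis ends up
             1.0f, 2.0f,  3.0f, 1.0f };  // column 3: translation
}
```

With this pose, the force (0, 0, -5) with w = 0 comes out as (-5, 0, 0), a pure rotation. The same vector with w = 1 comes out as (-4, 2, 3), because the translation is added on top, which is the "way off" result described above.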
I'm trying to estimate my device position related to a QR code in space. I'm using ARKit and the Vision framework, both introduced in iOS11, but the answer to this question probably doesn't depend on them.
With the Vision framework, I'm able to get the rectangle that bounds a QR code in the camera frame. I'd like to match this rectangle to the device translation and rotation necessary to transform the QR code from a standard position.
For instance if I observe the frame:
* *
* *
while if I was 1m away from the QR code, centered on it, and assuming the QR code has a side of 10cm I'd see:
* *
A0 B0
D0 C0
* *
what has been my device transformation between those two frames? I understand that an exact result might not be possible, because maybe the observed QR code is slightly non planar and we're trying to estimate an affine transform on something that is not one perfectly.
I guess the sceneView.pointOfView?.camera?.projectionTransform is more helpful than the sceneView.pointOfView?.camera?.projectionTransform?.camera.projectionMatrix since the latter already takes into account the transform inferred by ARKit, which I'm not interested in for this problem.
How would I fill
func getTransform(
    qrCodeRectangle: VNBarcodeObservation,
    cameraTransform: SCNMatrix4) {
    // qrCodeRectangle.topLeft etc is the position in [0, 1] * [0, 1] of A0
    // expected real world position of the QR code in a referential coordinate system
    let a0 = SCNVector3(x: -0.05, y: 0.05, z: 1)
    let b0 = SCNVector3(x: 0.05, y: 0.05, z: 1)
    let c0 = SCNVector3(x: 0.05, y: -0.05, z: 1)
    let d0 = SCNVector3(x: -0.05, y: -0.05, z: 1)
    let A0, B0, C0, D0 = ?? // CGPoints representing position in
                            // camera frame for camera in 0, 0, 0 facing Z+
    // then get transform from 0, 0, 0 to current position/rotation that sees
    // a0, b0, c0, d0 through the camera as qrCodeRectangle
}
After trying a number of things, I ended up going for camera pose estimation using OpenCV's perspective-n-point solver, solvePnP. This gives me a rotation and translation that should represent the camera pose in the QR code referential. However, when using those values and placing objects corresponding to the inverse transformation, where the QR code should be in the camera space, I get inaccurate shifted values, and I'm not able to get the rotation to work:
// some flavor of pseudo code below
func renderer(_ sender: SCNSceneRenderer, updateAtTime time: TimeInterval) {
guard let currentFrame = sceneView.session.currentFrame, let pov = sceneView.pointOfView else { return }
let intrisics =
let QRCornerCoordinatesInQRRef = [(-0.05, -0.05, 0), (0.05, -0.05, 0), (-0.05, 0.05, 0), (0.05, 0.05, 0)]
// uses VNDetectBarcodesRequest to find a QR code and returns a bounding rectangle
guard let qr = findQRCode(in: currentFrame) else { return }
let imageSize = CGSize(
width: CVPixelBufferGetWidth(currentFrame.capturedImage),
height: CVPixelBufferGetHeight(currentFrame.capturedImage)
let observations = [
].map({ (imageSize.height * (1 - $0.y), imageSize.width * $0.x) })
// image and SceneKit coordinated are not the same
// replacing this by:
// (imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
// weirdly fixes an issue, see below
let rotation, translation = openCV.solvePnP(QRCornerCoordinatesInQRRef, observations, intrisics)
// calls openCV solvePnP and get the results
let positionInCameraRef = -rotation.inverted * translation
let node = SCNNode(geometry: someGeometry)
node.position = translation
node.orientation = rotation.asQuaternion
Here is the output:
where A, B, C, D are the QR code corners in the order they are passed to the program.
The predicted origin stays in place when the phone rotates, but it's shifted from where it should be. Surprisingly, if I shift the observations values, I'm able to correct this:
// (imageSize.height * (1 - $0.y), imageSize.width * $0.x)
// replaced by:
(imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
and now the predicted origin stays robustly in place. However I don't understand where the shift values come from.
Finally, I've tried to get an orientation fixed relatively to the QR code referential:
var n = SCNNode(geometry: redGeometry)
n.position = SCNVector3(0.1, 0, 0)
n = SCNNode(geometry: blueGeometry)
n.position = SCNVector3(0, 0.1, 0)
n = SCNNode(geometry: greenGeometry)
n.position = SCNVector3(0, 0, 0.1)
The orientation is fine when I look at the QR code straight, but then it shifts by something that seems to be related to the phone rotation:
Outstanding questions I have are:
How do I solve the rotation?
Where do the position shift values come from?
What simple relationship do rotation, translation, QRCornerCoordinatesInQRRef, observations, intrisics verify? Is it O ~ K^-1 * (R_3x2 | T) Q ? Because if so, that's off by a few orders of magnitude.
If that's helpful, here are a few numerical values:
Intrinsics matrix
Mat 3x3
1090.318, 0.000, 618.661
0.000, 1090.318, 359.616
0.000, 0.000, 1.000
1280.0, 720.0
414.0, 736.0
==== Edit2 ====
I've noticed that the rotation works fine when the phone stays horizontally parallel to the QR code (ie the rotation matrix is [[a, 0, b], [0, 1, 0], [c, 0, d]]), no matter what the actual QR code orientation is:
Other rotations don't work.
Coordinate systems' correspondence
Take into consideration that Vision/CoreML coordinate system doesn't correspond to ARKit/SceneKit coordinate system. For details look at this post.
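As one concrete piece of that mismatch, here is a small sketch in plain C++ of converting a normalized point to pixel coordinates. It assumes the input follows the Vision convention (both components in [0, 1], origin at the lower-left corner of the image) and that the target is ordinary image pixel space (origin at the top-left, Y pointing down). `Point` and `normalizedToPixel` are hypothetical names, and the sketch deliberately ignores any portrait/landscape axis swap between the captured image and the screen.

```cpp
#include <cassert>

struct Point { double x, y; };

// Assumes p is a Vision-style normalized point: origin at the lower-left,
// both components in [0, 1]. Returns pixel coordinates with the origin at
// the top-left corner and Y growing downward.
inline Point normalizedToPixel(Point p, double imageWidth, double imageHeight) {
    assert(imageWidth > 0.0 && imageHeight > 0.0);
    return { p.x * imageWidth, (1.0 - p.y) * imageHeight };
}
```

For a 1280 x 720 image, the normalized point (0.25, 0.75) lands at pixel (320, 180): a quarter of the way across and a quarter of the way down from the top.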
Rotation's direction
I suppose the problem is not in the matrix but in the vertex placement. For tracking 2D images you need to place the ABCD vertices counter-clockwise (the starting point is vertex A, located at the imaginary origin x:0, y:0). I think Apple's documentation on the VNRectangleObservation class (info about projected rectangular regions detected by an image analysis request) is vague. You placed your vertices in the same order as in the official documentation:
var bottomLeft: CGPoint
var bottomRight: CGPoint
var topLeft: CGPoint
var topRight: CGPoint
But they need to be placed in the direction of positive rotation (about the Z axis) in a Cartesian coordinate system:
World Coordinate Space in ARKit (as well as in SceneKit and Vision) always follows a right-handed convention (the positive Y axis points upward, the positive Z axis points toward the viewer and the positive X axis points toward the viewer's right), but is oriented based on your session's configuration. Camera works in Local Coordinate Space.
Rotation about any axis is either positive (counter-clockwise) or negative (clockwise). This is critically important for tracking in ARKit and Vision.
The order of rotation also matters. ARKit, as well as SceneKit, applies rotation relative to the node's pivot property in the reverse order of the components: first roll (about the Z axis), then yaw (about the Y axis), then pitch (about the X axis). So the rotation order is ZYX.
Math (Trig.):
Notes: the bottom is l (the QR code length), the left angle is k, and the top angle is i (the camera)
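On the question of which relationship rotation, translation, the QR corner coordinates, the observations and the intrinsics should satisfy: the pinhole model that solvePnP is based on is O ~ K (R | T) Q, with K itself rather than its inverse, followed by a division by depth. Here is a minimal sketch in plain C++ using the intrinsics quoted in the question; the types and function names are hypothetical, and lens distortion is ignored.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>;  // row-major: m[row][col]

struct Pixel { double u, v; };

inline Vec3 mul(const Mat3& m, const Vec3& v) {
    Vec3 out{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out[r] += m[r][c] * v[c];
    return out;
}

// Pinhole projection: pixel ~ K * (R * Q + T), then divide by the depth.
// Q is a point in the QR code frame; R and T take it into the camera frame.
inline Pixel project(const Mat3& K, const Mat3& R, const Vec3& T, const Vec3& Q) {
    Vec3 cam = mul(R, Q);
    for (int i = 0; i < 3; ++i) cam[i] += T[i];
    assert(cam[2] > 0.0 && "point must be in front of the camera");
    const Vec3 h = mul(K, cam);
    return { h[0] / h[2], h[1] / h[2] };
}
```

With K = [[1090.318, 0, 618.661], [0, 1090.318, 359.616], [0, 0, 1]], R the identity and T = (0, 0, 1), the corner Q = (0.05, 0.05, 0) projects to about (673.18, 414.13) in pixels. A corner 5 cm off-center at 1 m is about 55 pixels from the principal point, so a result that is orders of magnitude away from that usually means K was inverted or the units are inconsistent.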
I'm attempting to write an augmented reality app using SceneKit, and I need accurate 3D points from the current rendered frame, given a 2D pixel and depth using SCNSceneRenderer's unprojectPoint method. This requires an x, y, and z where the x and y is a pixel coordinate and normally the z is a value read from the depth buffer at that frame.
The SCNView's delegate has this method to render the depth frame:
func renderer(_ renderer: SCNSceneRenderer, willRenderScene scene: SCNScene, atTime time: TimeInterval) {
func renderDepthFrame(){
// setup our viewport
let viewport: CGRect = CGRect(x: 0, y: 0, width: Double(SettingsModel.model.width), height: Double(SettingsModel.model.height))
// depth pass descriptor
let renderPassDescriptor = MTLRenderPassDescriptor()
let depthDescriptor: MTLTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: MTLPixelFormat.depth32Float, width: Int(SettingsModel.model.width), height: Int(SettingsModel.model.height), mipmapped: false)
let depthTex = scnView!.device!.makeTexture(descriptor: depthDescriptor)
depthTex.label = "Depth Texture"
renderPassDescriptor.depthAttachment.texture = depthTex
renderPassDescriptor.depthAttachment.loadAction = .clear
renderPassDescriptor.depthAttachment.clearDepth = 1.0
renderPassDescriptor.depthAttachment.storeAction = .store
let commandBuffer = commandQueue.makeCommandBuffer()
scnRenderer.scene = scene
scnRenderer.pointOfView = scnView.pointOfView!
scnRenderer!.render(atTime: 0, viewport: viewport, commandBuffer: commandBuffer, passDescriptor: renderPassDescriptor)
// setup our depth buffer so the cpu can access it
let depthImageBuffer: MTLBuffer = scnView!.device!.makeBuffer(length: depthTex.width * depthTex.height*4, options: .storageModeShared)
depthImageBuffer.label = "Depth Buffer"
let blitCommandEncoder: MTLBlitCommandEncoder = commandBuffer.makeBlitCommandEncoder()
blitCommandEncoder.copy(from: renderPassDescriptor.depthAttachment.texture!, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOriginMake(0, 0, 0), sourceSize: MTLSizeMake(Int(SettingsModel.model.width), Int(SettingsModel.model.height), 1), to: depthImageBuffer, destinationOffset: 0, destinationBytesPerRow: 4*Int(SettingsModel.model.width), destinationBytesPerImage: 4*Int(SettingsModel.model.width)*Int(SettingsModel.model.height))
commandBuffer.addCompletedHandler({(buffer) -> Void in
let rawPointer: UnsafeMutableRawPointer = UnsafeMutableRawPointer(mutating: depthImageBuffer.contents())
let typedPointer: UnsafeMutablePointer<Float> = rawPointer.assumingMemoryBound(to: Float.self)
self.currentMap = Array(UnsafeBufferPointer(start: typedPointer, count: Int(SettingsModel.model.width)*Int(SettingsModel.model.height)))
This works. I get depth values between 0 and 1. The problem is that I can't use them in the unprojectPoint because they don't appear to be scaled the same as the initial pass, despite using the same SCNScene and SCNCamera.
My questions:
Is there any way to get the depth values directly from SceneKit SCNView's main pass without having to do an extra pass with a separate SCNRenderer?
Why don't the depth values in my pass match the values I get from doing a hit test and then unprojecting? The depth values from my pass go from 0.78 to 0.94. The depth values in the hit test range from 0.89 to 0.97, which curiously enough, matches the OpenGL depth values of the scene when I rendered it in Python.
My hunch is this is a difference in viewports and SceneKit is doing something to scale the depth values from -1 to 1 just like OpenGL.
EDIT: And in case you're wondering, I can't use the hitTest method directly. It's too slow for what I'm trying to achieve.
SceneKit uses a log scale reverse Z-Buffer by default. You can disable the reverse Z-Buffer quite easily (scnView.usesReverseZ = false) but taking the log depth to [0, 1] range with linear distribution requires access to the depth buffer, the value of the far clipping range and the value of the near clipping range. Here is the process of taking a non-reverse-z-log-depth to a linearly distributed depth in the range of [0, 1]:
float delogDepth(float depth, float nearClip, float farClip) {
// The depth buffer is in Log Format. Probably a 24bit float depth with 8 for stencil.
// We need to undo the log format.
float logTuneConstant = nearClip / farClip;
float deloggedDepth = ((pow(logTuneConstant * farClip + 1.0, depth) - 1.0) / logTuneConstant) / farClip;
// The values are going to hover around a particular range. Linearize that distribution.
// This part may not be necessary, depending on how you will use the depth.
float negativeOneOneDepth = deloggedDepth * 2.0 - 1.0;
float zeroOneDepth = ((2.0 * nearClip) / (farClip + nearClip - negativeOneOneDepth * (farClip - nearClip)));
return zeroOneDepth;
}
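As a sanity check of the arithmetic in that function, here is the same computation transcribed to plain C++ in double precision, so the endpoints can be evaluated on the CPU. This is only a transcription for checking numbers, not SceneKit or Metal code, and the clip planes used in the note below (0.1 and 100) are arbitrary example values.

```cpp
#include <cassert>
#include <cmath>

// Same steps as delogDepth above, in double precision so the endpoint
// values are not blurred by float rounding.
inline double delogDepth(double depth, double nearClip, double farClip) {
    assert(nearClip > 0.0 && farClip > nearClip);
    // Step 1: undo the log encoding.
    const double logTuneConstant = nearClip / farClip;
    const double delogged =
        ((std::pow(logTuneConstant * farClip + 1.0, depth) - 1.0) / logTuneConstant) / farClip;
    // Step 2: remap to [-1, 1], then apply the linearization from the original.
    const double negativeOneOneDepth = delogged * 2.0 - 1.0;
    return (2.0 * nearClip) / (farClip + nearClip - negativeOneOneDepth * (farClip - nearClip));
}
```

With nearClip = 0.1 and farClip = 100, a stored depth of 0 maps to nearClip / farClip = 0.001 and a stored depth of 1 maps to 1. So the output of the function as written spans [nearClip / farClip, 1] rather than starting exactly at 0.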
As a workaround, I switched to OpenGL ES and read the depth buffer by adding a fragment shader that packs the depth value into the RGBA renderbuffer SCNShadable.
See here for more info:
I understand this is a valid approach as it is used in shadow mapping quite often on OpenGL ES devices and WebGL, but this feels hacky to me and I shouldn't have to do this. I would still be interested in another answer if someone can figure out Metal's viewport transformation.
The iOS 5 documentation reveals that GLKMatrix4MakeLookAt operates the same as gluLookAt.
The definition is provided here:
static __inline__ GLKMatrix4 GLKMatrix4MakeLookAt(float eyeX, float eyeY, float eyeZ,
                                                  float centerX, float centerY, float centerZ,
                                                  float upX, float upY, float upZ)
{
    GLKVector3 ev = { eyeX, eyeY, eyeZ };
    GLKVector3 cv = { centerX, centerY, centerZ };
    GLKVector3 uv = { upX, upY, upZ };
    GLKVector3 n = GLKVector3Normalize(GLKVector3Add(ev, GLKVector3Negate(cv)));
    GLKVector3 u = GLKVector3Normalize(GLKVector3CrossProduct(uv, n));
    GLKVector3 v = GLKVector3CrossProduct(n, u);

    GLKMatrix4 m = { u.v[0], v.v[0], n.v[0], 0.0f,
                     u.v[1], v.v[1], n.v[1], 0.0f,
                     u.v[2], v.v[2], n.v[2], 0.0f,
                     GLKVector3DotProduct(GLKVector3Negate(u), ev),
                     GLKVector3DotProduct(GLKVector3Negate(v), ev),
                     GLKVector3DotProduct(GLKVector3Negate(n), ev),
                     1.0f };

    return m;
}
I'm trying to extract camera information from this:
1. Read the camera position
GLKVector3 cPos = GLKVector3Make(mx.m30, mx.m31, mx.m32);
2. Read the camera right vector as `u` in the above
GLKVector3 cRight = GLKVector3Make(mx.m00, mx.m10, mx.m20);
3. Read the camera up vector as `v` in the above
GLKVector3 cUp = GLKVector3Make(mx.m01, mx.m11, mx.m21);
4. Read the camera look-at vector as `n` in the above
GLKVector3 cLookAt = GLKVector3Make(mx.m02, mx.m12, mx.m22);
There are two questions:
The look-at vector seems negated as they defined it, since they perform (eye - center) rather than (center - eye). Indeed, when I call GLKMatrix4MakeLookAt with a camera position of (0,0,-10) and a center of (0,0,1) my extracted look at is (0,0,-1), i.e. the negative of what I expect. So should I negate what I extract?
The camera position I extract is the result of the view transformation matrix premultiplying the view rotation matrix, hence the dot products in their definition. I believe this is incorrect - can anyone suggest how else I should calculate the position?
Many thanks for your time.
Per its documentation, gluLookAt calculates centre - eye, uses that for some intermediate steps, then negates it for placement into the resulting matrix. So if you want centre - eye back, taking the negative is correct.
You'll also notice that the result returned is equivalent to a multMatrix with the rotational part of the result followed by a glTranslate by -eye. Since the classic OpenGL matrix operations post multiply, that means gluLookAt is defined to post multiply the rotational by the translational. So Apple's implementation is correct, and the same as first moving the camera, then rotating it — which is correct.
So if you define R = (the matrix defining the rotational part of your instruction), T = (the translational analogue), you get R.T. If you want to extract T you could premultiply by the inverse of R and then pull the results out of the final column, since matrix multiplication is associative.
As a bonus, because R is orthonormal, the inverse is just the transpose.
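To make that concrete, here is a sketch in plain C++ that rebuilds the same look-at matrix and then recovers the eye position as -(R^T t), where t is the last column of the matrix. It uses no GLKit; `Vec3`, `Mat4`, `makeLookAt`, `extractEye` and `extractForward` are hypothetical names, and the flat array mirrors GLKMatrix4's column-major layout.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>;  // column-major, like GLKMatrix4: m[col * 4 + row]

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
inline Vec3 normalize(Vec3 a) {
    const float len = std::sqrt(dot(a, a));
    assert(len > 0.0f && "cannot normalize a zero vector");
    return {a.x / len, a.y / len, a.z / len};
}

// Same construction as GLKMatrix4MakeLookAt above: the rows of the rotation are u, v, n.
inline Mat4 makeLookAt(Vec3 eye, Vec3 center, Vec3 up) {
    const Vec3 n = normalize(sub(eye, center));
    const Vec3 u = normalize(cross(up, n));
    const Vec3 v = cross(n, u);
    return { u.x, v.x, n.x, 0.0f,
             u.y, v.y, n.y, 0.0f,
             u.z, v.z, n.z, 0.0f,
             -dot(u, eye), -dot(v, eye), -dot(n, eye), 1.0f };
}

// eye = -(R^T * t), where t = (m30, m31, m32) is the translation column.
inline Vec3 extractEye(const Mat4& m) {
    const float tx = m[12], ty = m[13], tz = m[14];
    return { -(m[0] * tx + m[1] * ty + m[2]  * tz),
             -(m[4] * tx + m[5] * ty + m[6]  * tz),
             -(m[8] * tx + m[9] * ty + m[10] * tz) };
}

// The viewing direction (center - eye, normalized) is the negated n row.
inline Vec3 extractForward(const Mat4& m) { return { -m[2], -m[6], -m[10] }; }
```

For the case in the question, eye (0, 0, -10) and center (0, 0, 1) with up (0, 1, 0), `extractEye` gives back (0, 0, -10) and `extractForward` gives (0, 0, 1), whereas reading (m30, m31, m32) directly yields the rotated translation, not the position.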