Is there a Metal library function to create a simd rotation matrix? - metal

There seem to be at least half a dozen matrix libraries in the Apple system. One of them is the simd library, with types that work the same in CPU and GPU code.
import simd
let mat = float3x3(...)
let vec = float3(...)
mat * vec
I'm having trouble finding documentation for it. Unlike most things, it does not show up in Xcode's documentation browser. I know that a different library (GLKit) has matrix types with functions for building rotation matrices. For example,
GLKMatrix3MakeXRotation(radians)
GLKMatrix3RotateY(mat, radians)
Are there similar functions for the simd matrix types?

You can go through simd_quatf. Quaternions have a simple connection to the angle-axis representation. The simd library can construct a quaternion from an angle and axis, and there's also a function to construct a float3x3 from a quaternion. (This is C++, but the same idea works in Swift.)
#include <simd/simd.h>
using namespace simd;

// Note: simd_quaternion expects the axis to be unit length.
inline float3x3 MakeRotation(float radians, float x, float y, float z) {
    simd_quatf quat = simd_quaternion(radians, (simd_float3){x, y, z});
    return simd_matrix3x3(quat);
}
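For reference, here is a minimal sketch of the same idea in Swift (assuming iOS 11 / macOS 10.13 or later, where the simd quaternion type is available; makeRotation is just an illustrative name):
import simd

// Builds a 3x3 rotation matrix from an angle (radians) and an axis.
func makeRotation(radians: Float, axis: simd_float3) -> float3x3 {
    // simd_quatf expects a unit-length axis, so normalize it first.
    let quat = simd_quatf(angle: radians, axis: simd_normalize(axis))
    return float3x3(quat)
}

// e.g. a 90° rotation about +Y:
let m = makeRotation(radians: .pi / 2, axis: simd_float3(0, 1, 0))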

There are not currently utility functions for creating such matrices in simd.framework, Metal, or MetalKit. However, you can use GLKit's matrix functions and convert the resulting GLKMatrix4s into float4x4s before, for example, copying them into a Metal buffer for use in a shader.
A GLKMatrix4 is just a union containing an array of 16 floats, stored in column-major order.
Therefore, we can write an extension on float4x4 that allows initializing a simd matrix with a GLKit matrix:
extension float4x4 {
    init(matrix: GLKMatrix4) {
        self.init(columns: (float4(x: matrix.m00, y: matrix.m01, z: matrix.m02, w: matrix.m03),
                            float4(x: matrix.m10, y: matrix.m11, z: matrix.m12, w: matrix.m13),
                            float4(x: matrix.m20, y: matrix.m21, z: matrix.m22, w: matrix.m23),
                            float4(x: matrix.m30, y: matrix.m31, z: matrix.m32, w: matrix.m33)))
    }
}
I verified that the resulting matrix matched my expectations by creating a GLKit matrix that represents a 45-degree rotation counterclockwise about the +Z axis, and ensuring that it does, in fact, rotate the unit vector <1, 0, 0> onto the unit vector <sqrt(2)/2, sqrt(2)/2, 0>:
let rotation = GLKMatrix4MakeZRotation(.pi / 4)
let simdRotation = float4x4(matrix: rotation)
let v = float4(1, 0, 0, 0)
let vp = simdRotation * v
print("\(vp)")
> float4(0.707107, 0.707107, 0.0, 0.0)
Note that I'm abiding the convention here that matrix-vector multiplication treats the vector as a column vector and places the matrix on the left, which is the most common convention in current use.
There is one caveat you should be aware of with respect to GLKit and Metal's clip space. You can read about the issue, and how to correct for it, here.
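For context, the caveat concerns projection matrices rather than pure rotations: GLKit builds projections for OpenGL's [-1, 1] clip-space z range, while Metal expects [0, 1]. A hedged sketch of the usual correction (my assumption about what the linked discussion covers) is to pre-multiply by a z-remapping matrix:
// Remap clip-space z from [-1, 1] (OpenGL/GLKit) to [0, 1] (Metal): z' = 0.5*z + 0.5*w
let zRemap = float4x4(columns: (float4(1, 0, 0,   0),
                                float4(0, 1, 0,   0),
                                float4(0, 0, 0.5, 0),
                                float4(0, 0, 0.5, 1)))
let glkProjection = GLKMatrix4MakePerspective(.pi / 3, 16.0 / 9.0, 0.1, 100.0)
let metalProjection = zRemap * float4x4(matrix: glkProjection)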

Related

How to cast a CMRotationMatrix into a float4x4?

I am building a Metal app that uses float4x4 matrices to rotate 3D objects and/or the world view.
The app also reads the device attitude, which can be expressed as a rotation matrix in the CMRotationMatrix format (CoreMotion).
How can I cast my attitude.rotationMatrix into a float4x4 object?
You can create an extension that allows you to initialize a float4x4 from the elements of a CMRotationMatrix:
extension float4x4 {
    init(_ m: CMRotationMatrix) {
        let x = float4(Float(m.m11), Float(m.m21), Float(m.m31), 0)
        let y = float4(Float(m.m12), Float(m.m22), Float(m.m32), 0)
        let z = float4(Float(m.m13), Float(m.m23), Float(m.m33), 0)
        let w = float4(0, 0, 0, 1)
        self.init(columns: (x, y, z, w))
    }
}
Using it is as simple as calling the initializer:
let simdMatrix = float4x4(deviceMotion.attitude.rotationMatrix)
This creates a matrix that is suitable for transforming objects whose transformations are expressed relative to the reference attitude frame. You may find that you need to permute the columns so that they match the world coordinate system of your application.
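As a purely illustrative sketch (not part of the original answer), such a remap could swap or negate columns of the converted matrix; which permutation you actually need depends entirely on your renderer's conventions (motionManager is an assumed CMMotionManager instance):
import CoreMotion
import simd

// Hypothetical example only: remap Core Motion's reference frame for a renderer
// whose camera looks down -Z with +Y up. Your app may need a different permutation.
if let attitude = motionManager.deviceMotion?.attitude {
    let m = float4x4(attitude.rotationMatrix)
    let remapped = float4x4(columns: (m.columns.0, m.columns.2, -m.columns.1, m.columns.3))
    // use `remapped` as the object/world transform
}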

How to point the camera towards a SCNVector3 point below iOS 11

I just started learning how to use SceneKit yesterday, so I may get some things wrong. I am trying to make my cameraNode look at a SCNVector3 point in the scene.
I am trying to make my app available to people below iOS 11.0. However, the look(at:) function is only for iOS 11.0+.
Here is my function where I initialise the camera:
func initCamera() {
    cameraNode = SCNNode()
    cameraNode.camera = SCNCamera()
    cameraNode.position = SCNVector3(5, 12, 10)
    if #available(iOS 11.0, *) {
        cameraNode.look(at: SCNVector3(0, 5, 0)) // Calculate the look angle
    } else {
        // How can I calculate the orientation? <-----------
    }
    print(cameraNode.rotation) // Prints: SCNVector4(x: -0.7600127, y: 0.62465125, z: 0.17941462, w: 0.7226559)
    gameScene.rootNode.addChildNode(cameraNode)
}
The orientation of SCNVector4(x: -0.7600127, y: 0.62465125, z: 0.17941462, w: 0.7226559) in degrees is x: -43.5, y: 35.8, z: 10.3, and I don't understand w. (Also, why isn't z = 0? I thought z was the roll...?)
Here are my workings for recreating what I thought the Y-angle should be:
So I worked it out to be 63.4 degrees, but the returned rotation shows that it should be 35.8 degrees. Is there something wrong with my calculations, do I not fully understand SCNVector4, or is there another method to do this?
I looked at Explaining in Detail the SCNVector4 method to understand what SCNVector4 is, but I still don't really understand what w is for. It says that w is the 'angle of rotation', which is what I thought x, y & z were for.
If you have any questions, please ask!
Although @rickster has given the explanations of the properties of the node, I have figured out a method to rotate the node to look at a point using maths (trigonometry).
Here is my code:
// Extension for Float
extension Float {
    /// Convert degrees to radians
    func asRadians() -> Float {
        return self * Float.pi / 180
    }
}
and also:
// Extension for SCNNode
extension SCNNode {
    /// Look at a SCNVector3 point
    func lookAt(_ point: SCNVector3) {
        // Find change in positions
        let changeX = self.position.x - point.x // Change in X position
        let changeY = self.position.y - point.y // Change in Y position
        let changeZ = self.position.z - point.z // Change in Z position

        // Calculate the X and Y angles
        let angleX = atan2(changeZ, changeY) * (changeZ > 0 ? -1 : 1)
        let angleY = atan2(changeZ, changeX)

        // Calculate the X and Y rotations
        let xRot = Float(-90).asRadians() - angleX // X rotation
        let yRot = Float(90).asRadians() - angleY  // Y rotation

        self.eulerAngles = SCNVector3(CGFloat(xRot), CGFloat(yRot), 0) // Rotate
    }
}
And you call the function using:
cameraNode.lookAt(SCNVector3(0, 5, 0))
Hope this helps people in the future!
There are three ways to express a 3D rotation in SceneKit:
What you're doing on paper is calculating separate angles around the x, y, and z axes. These are called Euler angles, or pitch, yaw, and roll. You might get results that more resemble your hand-calculations if you use eulerAngles or simdEulerAngles instead of rotation. (Or you might not, because one of the difficulties of an Euler-angle system is that you have to apply each of those three rotations in the correct order.)
simdRotation or rotation uses a four-component vector (float4 or SCNVector4) to express an axis-angle representation of the rotation. This relies on a bit of math that isn't obvious for many newcomers to 3D graphics: the result of any sequence of rotations around different axes can be minimally expressed as a single rotation around a new axis.
For example, a rotation of π/2 radians (90°) around the z-axis (0,0,1) followed by a rotation of π/2 around the y-axis (0,1,0) has the same result as a rotation of 2π/3 around the axis (-1/√3, 1/√3, 1/√3).
This is where you're getting confused about the x, y, z, and w components of a SceneKit rotation vector — the first three components are lengths, expressing a 3D vector, and the fourth is a rotation in radians around that vector.
Quaternions are another way to express 3D rotation (and one that's even further off the beaten path for those of us with the formal math education common to undergraduate computer science curricula, but not crazy advanced, either). These have lots of great features for 3D graphics, like being easy to compose and interpolate between. In SceneKit, the simdOrientation or orientation property lets you work with a node's rotation as a quaternion.
Explaining how quaternions work is too much for one SO answer, but the practical upshot is this: if you're working with a good vector math library (like the SIMD library built into iOS 9 and later), you can basically treat them as opaque — just convert from whichever other rotation representation is easiest for you, and reap the benefits.
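For example (assuming iOS 11 or later, where SceneKit's simd bridging and simd_quatf are available), converting between an axis-angle pair and a quaternion is a one-liner in each direction:
import SceneKit
import simd

let node = SCNNode()
// Axis-angle in, quaternion out: 45° about +Y.
let q = simd_quatf(angle: .pi / 4, axis: simd_float3(0, 1, 0))
node.simdOrientation = q
// ...and back out again, if axis-angle is easier to reason about.
let angle = node.simdOrientation.angle
let axis  = node.simdOrientation.axis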

iOS revert camera projection

I'm trying to estimate my device position related to a QR code in space. I'm using ARKit and the Vision framework, both introduced in iOS11, but the answer to this question probably doesn't depend on them.
With the Vision framework, I'm able to get the rectangle that bounds a QR code in the camera frame. I'd like to match this rectangle to the device translation and rotation necessary to transform the QR code from a standard position.
For instance if I observe the frame:
* *
B
C
A
D
* *
while if I were 1 m away from the QR code, centered on it, and assuming the QR code has a side of 10 cm, I'd see:
* *
A0 B0
D0 C0
* *
what was my device transformation between those two frames? I understand that an exact result might not be possible, because the observed QR code may be slightly non-planar and we're trying to estimate an affine transform on something that isn't perfectly one.
I guess sceneView.pointOfView?.camera?.projectionTransform is more helpful than sceneView.pointOfView?.camera?.projectionTransform?.camera.projectionMatrix, since the latter already takes into account the transform inferred by ARKit, which I'm not interested in for this problem.
How would I fill
func getTransform(
    qrCodeRectangle: VNBarcodeObservation,
    cameraTransform: SCNMatrix4) {
    // qrCodeRectangle.topLeft etc is the position in [0, 1] * [0, 1] of A0
    // expected real world position of the QR code in a referential coordinate system
    let a0 = SCNVector3(x: -0.05, y: 0.05, z: 1)
    let b0 = SCNVector3(x: 0.05, y: 0.05, z: 1)
    let c0 = SCNVector3(x: 0.05, y: -0.05, z: 1)
    let d0 = SCNVector3(x: -0.05, y: -0.05, z: 1)

    let A0, B0, C0, D0 = ?? // CGPoints representing position in
                            // camera frame for camera in 0, 0, 0 facing Z+

    // then get transform from 0, 0, 0 to current position/rotation that sees
    // a0, b0, c0, d0 through the camera as qrCodeRectangle
}
==== Edit ====
After trying a number of things, I ended up going for camera pose estimation using OpenCV's projection and perspective solver, solvePnP. This gives me a rotation and translation that should represent the camera pose in the QR code referential. However, when using those values and placing objects corresponding to the inverse transformation, where the QR code should be in the camera space, I get inaccurate, shifted values, and I'm not able to get the rotation to work:
// some flavor of pseudo code below
func renderer(_ sender: SCNSceneRenderer, updateAtTime time: TimeInterval) {
    guard let currentFrame = sceneView.session.currentFrame, let pov = sceneView.pointOfView else { return }

    let intrinsics = currentFrame.camera.intrinsics
    let QRCornerCoordinatesInQRRef = [(-0.05, -0.05, 0), (0.05, -0.05, 0), (-0.05, 0.05, 0), (0.05, 0.05, 0)]

    // uses VNDetectBarcodesRequest to find a QR code and returns a bounding rectangle
    guard let qr = findQRCode(in: currentFrame) else { return }

    let imageSize = CGSize(
        width: CVPixelBufferGetWidth(currentFrame.capturedImage),
        height: CVPixelBufferGetHeight(currentFrame.capturedImage)
    )

    let observations = [
        qr.bottomLeft,
        qr.bottomRight,
        qr.topLeft,
        qr.topRight,
    ].map({ (imageSize.height * (1 - $0.y), imageSize.width * $0.x) })
    // image and SceneKit coordinates are not the same
    // replacing this by:
    // (imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
    // weirdly fixes an issue, see below

    let (rotation, translation) = openCV.solvePnP(QRCornerCoordinatesInQRRef, observations, intrinsics)
    // calls openCV solvePnP and gets the results

    let positionInCameraRef = -rotation.inverted * translation
    let node = SCNNode(geometry: someGeometry)
    pov.addChildNode(node)
    node.position = translation
    node.orientation = rotation.asQuaternion
}
Here is the output:
where A, B, C, D are the QR code corners in the order they are passed to the program.
The predicted origin stays in place when the phone rotates, but it's shifted from where it should be. Surprisingly, if I shift the observation values, I'm able to correct this:
// (imageSize.height * (1 - $0.y), imageSize.width * $0.x)
// replaced by:
(imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
and now the predicted origin stays robustly in place. However I don't understand where the shift values come from.
Finally, I've tried to get an orientation fixed relatively to the QR code referential:
var n = SCNNode(geometry: redGeometry)
node.addChildNode(n)
n.position = SCNVector3(0.1, 0, 0)
n = SCNNode(geometry: blueGeometry)
node.addChildNode(n)
n.position = SCNVector3(0, 0.1, 0)
n = SCNNode(geometry: greenGeometry)
node.addChildNode(n)
n.position = SCNVector3(0, 0, 0.1)
The orientation is fine when I look at the QR code straight, but then it shifts by something that seems to be related to the phone rotation:
Outstanding questions I have are:
How do I solve the rotation?
Where do the position shift values come from?
What simple relationship do rotation, translation, QRCornerCoordinatesInQRRef, observations, and intrinsics satisfy? Is it O ~ K^-1 * (R_3x2 | T) * Q? Because if so, that's off by a few orders of magnitude.
If that's helpful, here are a few numerical values:
Intrinsics matrix
Mat 3x3
1090.318, 0.000, 618.661
0.000, 1090.318, 359.616
0.000, 0.000, 1.000
imageSize
1280.0, 720.0
screenSize
414.0, 736.0
==== Edit2 ====
I've noticed that the rotation works fine when the phone stays horizontally parallel to the QR code (i.e. the rotation matrix is [[a, 0, b], [0, 1, 0], [c, 0, d]]), no matter what the actual QR code orientation is:
Other rotations don't work.
Coordinate systems' correspondence
Take into consideration that the Vision/CoreML coordinate system doesn't correspond to the ARKit/SceneKit coordinate system. For details, look at this post.
Rotation's direction
I suppose the problem is not in the matrix. It's in the placement of the vertices. For tracking 2D images you need to place the ABCD vertices counter-clockwise (the starting point being vertex A, located at the imaginary origin x: 0, y: 0). I think Apple's documentation on the VNRectangleObservation class (info about projected rectangular regions detected by an image analysis request) is vague. You placed your vertices in the same order as in the official documentation:
var bottomLeft: CGPoint
var bottomRight: CGPoint
var topLeft: CGPoint
var topRight: CGPoint
But they need to be placed the same way a positive rotation direction (about the Z axis) proceeds in a Cartesian coordinate system:
World Coordinate Space in ARKit (as well as in SceneKit and Vision) always follows a right-handed convention (the positive Y axis points upward, the positive Z axis points toward the viewer and the positive X axis points toward the viewer's right), but is oriented based on your session's configuration. Camera works in Local Coordinate Space.
Rotation about any axis is positive when counter-clockwise and negative when clockwise. For tracking in ARKit and Vision, this is critically important.
The order of rotation also matters. ARKit, as well as SceneKit, applies rotation relative to the node's pivot property in the reverse order of the components: first roll (about the Z axis), then yaw (about the Y axis), then pitch (about the X axis). So the rotation order is ZYX.
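If it helps to see that order concretely, here is a small sketch of my own (assuming iOS 11+ for SceneKit's simd quaternion bridging) of the ZYX composition described above:
import SceneKit
import simd

let node = SCNNode()
let (pitch, yaw, roll): (Float, Float, Float) = (0.1, 0.2, 0.3)

// Roll (Z) is applied first, then yaw (Y), then pitch (X): ZYX order.
node.simdOrientation = simd_quatf(angle: pitch, axis: simd_float3(1, 0, 0))
                     * simd_quatf(angle: yaw,   axis: simd_float3(0, 1, 0))
                     * simd_quatf(angle: roll,  axis: simd_float3(0, 0, 1))
// This should match: node.eulerAngles = SCNVector3(pitch, yaw, roll)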
Math (Trig.):
Notes: the bottom is l (the QR code length), the left angle is k, and the top angle is i (the camera)

Difference between CATransform3DMakeRotation and CATransform3DRotate

I was looking at the official documentation for CATransform3DMakeRotation and CATransform3DRotate and I cannot understand the difference between them. When would someone use CATransform3DMakeRotation, and when CATransform3DRotate?
You can represent a wide variety of 3D transformations using a 4 x 4 matrix, including translation, scaling, rotation, skewing, and perspective.
You can represent multiple successive transformations in a single matrix by multiplying together the matrices representing each individual transformation.
CATransform3DMakeRotation creates a matrix that represents a single transformation: rotation by a given angle around a given axis.
CATransform3DRotate creates a matrix just like CATransform3DMakeRotation does, and then multiplies that matrix by another matrix, thus adding the rotation to an existing sequence of transformations.
So you really only need one or the other. If you have one, you can easily define the other.
You can write CATransform3DRotate using CATransform3DMakeRotation like this:
func CATransform3DRotate(_ t: CATransform3D, _ angle: CGFloat, _ x: CGFloat, _ y: CGFloat, _ z: CGFloat) -> CATransform3D {
    let rotation = CATransform3DMakeRotation(angle, x, y, z)
    return CATransform3DConcat(rotation, t)
}
CATransform3DConcat returns the product of the two matrices.
Or you can write CATransform3DMakeRotation using CATransform3DRotate like this:
func myCATransform3DMakeRotation(_ angle: CGFloat, _ x: CGFloat, _ y: CGFloat, _ z: CGFloat) -> CATransform3D {
    return CATransform3DRotate(CATransform3DIdentity, angle, x, y, z)
}
CATransform3DIdentity is the identity matrix, and represents no transformation at all.
If you want to understand more about transformation matrices, how to construct and combine them, and why you need a 4x4 matrix for 3D transformations, type homogeneous coordinates 3d into your favorite search engine.
CATransform3DMakeRotation creates a new transform.
CATransform3DRotate takes an existing transform and rotates it.
If you're just trying to rotate, there isn't really a difference. But if you need to scale, then rotate, then translate, there could be a difference by the end.
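A tiny sketch of that difference (illustrative values only):
import QuartzCore

let scaled = CATransform3DMakeScale(2, 2, 1)
// Builds on the existing transform: scale, then rotate.
let a = CATransform3DRotate(scaled, .pi / 4, 0, 0, 1)
// Starts from scratch: the scale is discarded.
let b = CATransform3DMakeRotation(.pi / 4, 0, 0, 1)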

Converting OpenCV's findHomography perspective matrix to iOS' CATransform3D

I'd like to take the perspective transform matrix returned from OpenCV's findHomography function and convert it (either in C++ or Objective-C) to iOS' CATransform3D. I'd like them to be as close as possible in terms of accurately reproducing the "warp" effect on the Core Graphics side. Example code would really be appreciated!
From iOS' CATransform3D.h:
/* Homogeneous three-dimensional transforms. */
struct CATransform3D
{
    CGFloat m11, m12, m13, m14;
    CGFloat m21, m22, m23, m24;
    CGFloat m31, m32, m33, m34;
    CGFloat m41, m42, m43, m44;
};
Similar questions:
Apply homography matrix using Core Graphics
convert an opencv affine matrix to CGAffineTransform
Disclaimer
I have never tried this so take it with a grain of salt.
CATransform3D is a 4x4 matrix which operates on a 3-dimensional homogeneous vector (4x1) to produce another vector of the same type. I am assuming that when rendered, objects described by a 4x1 vector have each element divided by the 4th element, and the 3rd element is used only to determine which objects appear on top of which. Assuming this is correct...
Reasoning
The 3x3 matrix returned by findHomography operates on a 2-dimensional homogeneous vector. That process can be thought of in 4 steps:
1. The first column of the homography is multiplied by x
2. The second column of the homography is multiplied by y
3. The third column of the homography is multiplied by 1
4. The resulting 1st and 2nd vector elements are divided by the 3rd
You need this process to be replicated with a 4x4 matrix, in which I am assuming the 3rd element of the resulting vector is meaningless for your purposes.
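To make those four steps concrete, here is a small Swift/simd sketch of my own (not from the answer) that applies a 3x3 homography to a 2D point:
import simd

func applyHomography(_ H: float3x3, to p: SIMD2<Float>) -> SIMD2<Float> {
    let v = H * SIMD3<Float>(p.x, p.y, 1)      // steps 1-3: weight the columns by x, y, 1
    return SIMD2<Float>(v.x / v.z, v.y / v.z)  // step 4: divide by the homogeneous term
}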
Solution
Construct your matrix like this (H is your homography matrix)
[H(0,0), H(0,1), 0, H(0,2),
 H(1,0), H(1,1), 0, H(1,2),
 0,      0,      1, 0,
 H(2,0), H(2,1), 0, H(2,2)]
This clearly satisfies steps 1, 2, and 3. Step 4 is satisfied because the homogeneous element is always the last one. That is why the "homogeneous row", if you will, had to get bumped down one line. The 1 in the 3rd row is there to let the z component of the vector pass through unmolested.
All of the above is done in row-major notation (like OpenCV) to try to keep things from being confusing. You can look at Tommy's answer to see how the conversion to column-major looks (you basically just transpose it). Note, however, that at the moment Tommy and I disagree about how to construct the matrix.
From my reading of the documentation, m11 in CATransform3D is equivalent to a in CGAffineTransform, m12 is equivalent to b and so on.
As per your comment below, I understand the matrix OpenCV returns to be 3x3 (which, in retrospect, is the size you'd expect). So you'd fill in the other elements with those equivalent to the identity matrix. As per Hammer's answer, you want to preserve the portion that deals with the (usually implicit) homogeneous coordinate in its place while padding everything else with the identity.
[aside: my original answer was wrong. I've edited it to be correct since I've posted code and Hammer hasn't. This post is marked as community wiki to reflect that it's in no sense solely my answer]
So I think you'd want:
CATransform3D MatToTransform(Mat cvTransform)
{
    CATransform3D transform;
    transform.m11 = cvTransform.at<float>(0, 0);
    transform.m12 = cvTransform.at<float>(1, 0);
    transform.m13 = 0.0f;
    transform.m14 = cvTransform.at<float>(2, 0);
    transform.m21 = cvTransform.at<float>(0, 1);
    transform.m22 = cvTransform.at<float>(1, 1);
    transform.m23 = 0.0f;
    transform.m24 = cvTransform.at<float>(2, 1);
    transform.m31 = 0.0f;
    transform.m32 = 0.0f;
    transform.m33 = 1.0f;
    transform.m34 = 0.0f;
    transform.m41 = cvTransform.at<float>(0, 2);
    transform.m42 = cvTransform.at<float>(1, 2);
    transform.m43 = 0.0f;
    transform.m44 = cvTransform.at<float>(2, 2);
    return transform;
}
Or use cvGetReal1D if you're keeping C++ out of it.
Tommy's answer worked for me, but I needed to use double instead of float. This is also a shortened version of the code:
CATransform3D MatToCATransform3D(cv::Mat H) {
    return {
        H.at<double>(0, 0), H.at<double>(1, 0), 0.0, H.at<double>(2, 0),
        H.at<double>(0, 1), H.at<double>(1, 1), 0.0, H.at<double>(2, 1),
        0.0,                0.0,                1.0, 0.0,
        H.at<double>(0, 2), H.at<double>(1, 2), 0.0, H.at<double>(2, 2)
    };
}
