I'm using both ARKit & Vision, following along Apple's sample project, "Using Vision in Real Time with ARKit". So I am not setting up my camera as ARKit handles that for me.
Using Vision's VNDetectFaceRectanglesRequest, I'm able to get back a collection of VNFaceObservation objects.
Following various guides online, I'm able to transform the VNFaceObservation's boundingBox to one that I can use on my ViewController's UIView.
The Y-axis is correct when placed on my UIView in ARKit, but the X-axis is completely off & inaccurate.
// face is an instance of VNFaceObservation
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -view.frame.height)
let translate = CGAffineTransform.identity.scaledBy(x: view.frame.width, y: view.frame.height)
let rect = face.boundingBox.applying(translate).applying(transform)
What is the correct way to display the boundingBox on the screen (in ARKit/UIKit) so that the X & Y axes line up with the detected face rectangle? I can't use self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect) since I'm not using AVCaptureSession.
Update: Digging into this further, the camera's image is 1920 x 1440. Most of it is not displayed on ARKit's screen space. The iPhone XS screen is 375 x 812 points.
After I get Vision's observation boundingBox, I transform it to fit the current view (375 x 812). This isn't working, since the actual displayed width seems to be about 500 (the left & right sides are outside the screen). How do I use a CGAffineTransform to map the CGRect bounding box from 375x812 to the actual displayed size (seems like 500x812, a total guess)?
The key piece missing here is ARFrame's displayTransform(for:viewportSize:), which is described in Apple's ARFrame documentation.
This function will generate the appropriate transform for a given frame and viewport size (the CGRect of the view you're displaying the image and bounding box in).
func visionTransform(frame: ARFrame, viewport: CGRect) -> CGAffineTransform {
    let orientation = UIApplication.shared.statusBarOrientation
    let transform = frame.displayTransform(for: orientation,
                                           viewportSize: viewport.size)
    let scale = CGAffineTransform(scaleX: viewport.width,
                                  y: viewport.height)
    var t = CGAffineTransform()
    if orientation.isPortrait {
        t = CGAffineTransform(scaleX: -1, y: 1)
        t = t.translatedBy(x: -viewport.width, y: 0)
    } else if orientation.isLandscape {
        t = CGAffineTransform(scaleX: 1, y: -1)
        t = t.translatedBy(x: 0, y: -viewport.height)
    }
    return transform.concatenating(scale).concatenating(t)
}
You can then use this like so:
let transform = visionTransform(frame: yourARFrame, viewport: yourViewport)
let rect = face.boundingBox.applying(transform)
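To illustrate what displayTransform(for:viewportSize:) has to account for, here is a minimal sketch of the aspect-fill arithmetic alone. It assumes the camera image is scaled to fill the view and centered, and that the normalized rect already refers to the upright (portrait) image. The helper name aspectFillViewRect is hypothetical, not an ARKit API, and it does not handle orientation the way displayTransform does.

```swift
import Foundation

/// Converts a Vision-style normalized rect (origin at the bottom-left, values in 0...1)
/// into view coordinates, assuming the image is drawn aspect-fill and centered.
func aspectFillViewRect(forNormalized rect: CGRect,
                        imageSize: CGSize,
                        viewSize: CGSize) -> CGRect {
    // Aspect-fill uses the larger of the two scale factors, so one axis overflows the view.
    let scale = max(viewSize.width / imageSize.width,
                    viewSize.height / imageSize.height)
    let scaledWidth = imageSize.width * scale
    let scaledHeight = imageSize.height * scale
    // Amount cropped off each side of the overflowing axis.
    let cropX = (scaledWidth - viewSize.width) / 2
    let cropY = (scaledHeight - viewSize.height) / 2
    // Flip Y because UIKit's origin is at the top-left.
    let flippedY = 1 - rect.origin.y - rect.size.height
    return CGRect(x: rect.origin.x * scaledWidth - cropX,
                  y: flippedY * scaledHeight - cropY,
                  width: rect.size.width * scaledWidth,
                  height: rect.size.height * scaledHeight)
}

// The question's numbers: a 1920 x 1440 capture viewed in portrait (1440 x 1920)
// on a 375 x 812 point screen.
let full = aspectFillViewRect(forNormalized: CGRect(x: 0, y: 0, width: 1, height: 1),
                              imageSize: CGSize(width: 1440, height: 1920),
                              viewSize: CGSize(width: 375, height: 812))
print(full) // about 609 points wide, with about 117 points cropped on each side
```

Under these assumptions the displayed image is about 609 points wide rather than 375, which is why scaling the bounding box by view.frame.width alone puts the X-axis off while the Y-axis looks right.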
I have a 3d vector I'm applying as a physics force:
let force = SCNVector3(x: 0, y: 0, z: -5)
node.physicsBody?.applyForce(force, asImpulse: true)
I need to rotate the force based on the mobile device's position which is available to me as a 4x4 matrix transform or euler angles.
var transform: matrix_float4x4 - The position and orientation of the camera in world coordinate space.
var eulerAngles: vector_float3 - The orientation of the camera, expressed as roll, pitch, and yaw values.
I think this is more of a fundamental 3d graphics question, but the application of this is a Swift based iOS app using SceneKit and ARKit.
There are some utilities available to me in the SceneKit and simd libraries. Unfortunately my naive attempts to do things like simd_mul(force, currentFrame.camera.transform) are failing me.
@orangenkopf provided a great answer that helped me come up with this:
let force = simd_make_float4(0, 0, -5, 0)
let rotatedForce = simd_mul(currentFrame.camera.transform, force)
let vectorForce = SCNVector3(x:rotatedForce.x, y:rotatedForce.y, z:rotatedForce.z)
node.physicsBody?.applyForce(vectorForce, asImpulse: true)
Your idea is right. You need to multiply the transform and the direction.
I can't find any documentation on simd_mul, but I suspect you have at least one of the following problems:
simd_mul applies both the rotation and the translation of the transform.
The transform of the camera is in world coordinate space. Depending on your node hierarchy, this can result in a direction that is way off.
SceneKit does not provide many linear algebra functions, so we have to build our own:
extension SCNMatrix4 {
    // Multiplies a 4D vector by the matrix (SceneKit keeps the translation in m41...m43)
    static public func *(left: SCNMatrix4, right: SCNVector4) -> SCNVector4 {
        let x = left.m11*right.x + left.m21*right.y + left.m31*right.z + left.m41*right.w
        let y = left.m12*right.x + left.m22*right.y + left.m32*right.z + left.m42*right.w
        let z = left.m13*right.x + left.m23*right.y + left.m33*right.z + left.m43*right.w
        let w = left.m14*right.x + left.m24*right.y + left.m34*right.z + left.m44*right.w
        return SCNVector4(x: x, y: y, z: z, w: w)
    }
}
extension SCNVector4 {
    // Drops the w component
    public func to3() -> SCNVector3 {
        return SCNVector3(self.x, self.y, self.z)
    }
}
Now do the following:
Convert the camera transform to the node's local coordinate system
Create the force as a 4D vector, with the fourth element set to 0 to ignore the translation
Multiply the transform and the vector
// Convert the transform to an SCNMatrix4
let transform = SCNMatrix4FromMat4(currentFrame.camera.transform)
// Convert the matrix to the node's coordinate space
let localTransform = node.convertTransform(transform, from: nil)
let force = SCNVector4(0, 0, -5, 0)
let rotatedForce = (localTransform * force).to3()
node.physicsBody?.applyForce(rotatedForce, asImpulse: true)
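To make the role of the fourth component concrete, here is a small dependency-free sketch. It uses plain arrays in place of simd or SceneKit types (an assumption made so it runs anywhere), stores the matrix as four columns like simd does, and uses a hand-written example transform rather than a real camera transform.

```swift
/// A 4x4 matrix stored as four columns, mirroring simd's column-major layout.
typealias Matrix4 = [[Double]]
typealias Vector4 = [Double]

/// Computes M * v for a column-major matrix.
func multiply(_ m: Matrix4, _ v: Vector4) -> Vector4 {
    var result = [0.0, 0.0, 0.0, 0.0]
    for column in 0..<4 {
        for row in 0..<4 {
            result[row] += m[column][row] * v[column]
        }
    }
    return result
}

// Example transform: a 90-degree rotation about Y combined with a translation of (10, 20, 30).
let cameraTransform: Matrix4 = [
    [0, 0, -1, 0],   // column 0: rotated X axis
    [0, 1, 0, 0],    // column 1: rotated Y axis
    [1, 0, 0, 0],    // column 2: rotated Z axis
    [10, 20, 30, 1]  // column 3: translation
]

let asDirection = multiply(cameraTransform, [0, 0, -5, 0]) // w = 0: rotation only
let asPoint = multiply(cameraTransform, [0, 0, -5, 1])     // w = 1: rotation and translation
print(asDirection) // [-5.0, 0.0, 0.0, 0.0]
print(asPoint)     // [5.0, 20.0, 30.0, 1.0]
```

With w = 0 the force keeps its length of 5 and is only rotated; with w = 1 the camera position is added in, which is not what you want for a force.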
I'm trying to estimate my device position related to a QR code in space. I'm using ARKit and the Vision framework, both introduced in iOS11, but the answer to this question probably doesn't depend on them.
With the Vision framework, I'm able to get the rectangle that bounds a QR code in the camera frame. I'd like to match this rectangle to the device translation and rotation necessary to transform the QR code from a standard position.
For instance if I observe the frame:
* *
* *
while if I was 1m away from the QR code, centered on it, and assuming the QR code has a side of 10cm I'd see:
* *
A0 B0
D0 C0
* *
What has been my device transformation between those two frames? I understand that an exact result might not be possible, because the observed QR code may be slightly non-planar and we're trying to estimate an affine transform for something that is not perfectly affine.
I guess sceneView.pointOfView?.camera?.projectionTransform is more helpful than sceneView.session.currentFrame?.camera.projectionMatrix, since the latter already takes into account the transform inferred by ARKit, which I'm not interested in for this problem.
How would I fill
func getTransform(
    qrCodeRectangle: VNBarcodeObservation,
    cameraTransform: SCNMatrix4) {
    // qrCodeRectangle.topLeft etc is the position in [0, 1] * [0, 1] of A0

    // expected real world position of the QR code in a referential coordinate system
    let a0 = SCNVector3(x: -0.05, y: 0.05, z: 1)
    let b0 = SCNVector3(x: 0.05, y: 0.05, z: 1)
    let c0 = SCNVector3(x: 0.05, y: -0.05, z: 1)
    let d0 = SCNVector3(x: -0.05, y: -0.05, z: 1)

    let A0, B0, C0, D0 = ?? // CGPoints representing position in
                            // camera frame for camera in 0, 0, 0 facing Z+

    // then get transform from 0, 0, 0 to current position/rotation that sees
    // a0, b0, c0, d0 through the camera as qrCodeRectangle
}
After trying a number of things, I ended up going for camera pose estimation using OpenCV's projection and perspective solver, solvePnP. This gives me a rotation and translation that should represent the camera pose in the QR code referential. However, when using those values and placing objects corresponding to the inverse transformation, where the QR code should be in camera space, I get inaccurate, shifted values, and I'm not able to get the rotation to work:
// some flavor of pseudo code below
func renderer(_ sender: SCNSceneRenderer, updateAtTime time: TimeInterval) {
    guard let currentFrame = sceneView.session.currentFrame, let pov = sceneView.pointOfView else { return }
    let intrisics = currentFrame.camera.intrinsics
    let QRCornerCoordinatesInQRRef = [(-0.05, -0.05, 0), (0.05, -0.05, 0), (-0.05, 0.05, 0), (0.05, 0.05, 0)]

    // uses VNDetectBarcodesRequest to find a QR code and returns a bounding rectangle
    guard let qr = findQRCode(in: currentFrame) else { return }

    let imageSize = CGSize(
        width: CVPixelBufferGetWidth(currentFrame.capturedImage),
        height: CVPixelBufferGetHeight(currentFrame.capturedImage)
    )

    let observations = [
    ].map({ (imageSize.height * (1 - $0.y), imageSize.width * $0.x) })
    // image and SceneKit coordinates are not the same
    // replacing this by:
    // (imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
    // weirdly fixes an issue, see below

    let rotation, translation = openCV.solvePnP(QRCornerCoordinatesInQRRef, observations, intrisics)
    // calls openCV solvePnP and gets the results

    let positionInCameraRef = -rotation.inverted * translation
    let node = SCNNode(geometry: someGeometry)
    node.position = translation
    node.orientation = rotation.asQuaternion
}
Here is the output:
where A, B, C, D are the QR code corners in the order they are passed to the program.
The predicted origin stays in place when the phone rotates, but it's shifted from where it should be. Surprisingly, if I shift the observations values, I'm able to correct this:
// (imageSize.height * (1 - $0.y), imageSize.width * $0.x)
// replaced by:
(imageSize.height * (1.35 - $0.y), imageSize.width * ($0.x - 0.2))
and now the predicted origin stays robustly in place. However I don't understand where the shift values come from.
Finally, I've tried to get an orientation that is fixed relative to the QR code referential:
var n = SCNNode(geometry: redGeometry)
n.position = SCNVector3(0.1, 0, 0)
n = SCNNode(geometry: blueGeometry)
n.position = SCNVector3(0, 0.1, 0)
n = SCNNode(geometry: greenGeometry)
n.position = SCNVector3(0, 0, 0.1)
The orientation is fine when I look at the QR code straight, but then it shifts by something that seems to be related to the phone rotation:
Outstanding questions I have are:
How do I solve the rotation?
Where do the position shift values come from?
What simple relationship do rotation, translation, QRCornerCoordinatesInQRRef, observations, intrisics verify? Is it O ~ K^-1 * (R_3x2 | T) Q? Because if so, that's off by a few orders of magnitude.
If that's helpful, here are a few numerical values:
Intrinsics matrix
Mat 3x3
1090.318, 0.000, 618.661
0.000, 1090.318, 359.616
0.000, 0.000, 1.000
Image size: 1280.0, 720.0
Screen size: 414.0, 736.0
==== Edit2 ====
I've noticed that the rotation works fine when the phone stays horizontally parallel to the QR code (ie the rotation matrix is [[a, 0, b], [0, 1, 0], [c, 0, d]]), no matter what the actual QR code orientation is:
Other rotations don't work.
Coordinate systems' correspondence
Take into consideration that the Vision/CoreML coordinate system doesn't correspond to the ARKit/SceneKit coordinate system. For details, look at this post.
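As a minimal sketch of one such mismatch: Vision reports normalized points with the origin at the bottom-left, while pixel coordinates in the captured image have the origin at the top-left. The helper below is hypothetical (pixelPoint is not a Vision or ARKit API) and assumes the observation was produced from the unrotated captured image, so that x runs along the image width.

```swift
import Foundation

/// Converts a Vision normalized point (origin bottom-left, values in 0...1) to pixel
/// coordinates in the captured image (origin top-left, x along the width).
func pixelPoint(fromVision point: CGPoint, imageSize: CGSize) -> CGPoint {
    return CGPoint(x: point.x * imageSize.width,
                   y: (1 - point.y) * imageSize.height)
}

// Image size quoted in the question.
let imageSize = CGSize(width: 1280, height: 720)
let center = pixelPoint(fromVision: CGPoint(x: 0.5, y: 0.5), imageSize: imageSize)
print(center.x, center.y) // 640.0 360.0
```

The image center lands near the principal point in the question's intrinsics (618.661, 359.616), which suggests the intrinsics expect x to span the 1280-pixel width. The question's mapping instead pairs imageSize.height with the first component, which may be worth checking.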
Rotation's direction
I suppose the problem is not in the matrix but in the placement of the vertices. For tracking 2D images you need to place the ABCD vertices counter-clockwise (the starting point is vertex A, located at the imaginary origin x:0, y:0). I think Apple's documentation on the VNRectangleObservation class (info about projected rectangular regions detected by an image analysis request) is vague. You placed your vertices in the same order as in the official documentation:
var bottomLeft: CGPoint
var bottomRight: CGPoint
var topLeft: CGPoint
var topRight: CGPoint
But they need to be placed in the order in which positive rotation (about the Z axis) proceeds in a Cartesian coordinate system:
World Coordinate Space in ARKit (as well as in SceneKit and Vision) always follows a right-handed convention (the positive Y axis points upward, the positive Z axis points toward the viewer and the positive X axis points toward the viewer's right), but is oriented based on your session's configuration. Camera works in Local Coordinate Space.
Rotation about any axis is either positive (counter-clockwise) or negative (clockwise). For tracking in ARKit and Vision this is critically important.
The order of rotation also matters. ARKit, as well as SceneKit, applies rotation relative to the node's pivot property in the reverse order of the components: first roll (about Z axis), then yaw (about Y axis), then pitch (about X axis). So the rotation order is ZYX.
Math (Trig.):
Notes: the bottom is l (the QR code length), the left angle is k, and the top angle is i (the camera)
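On the question of how rotation, translation, the QR corners, the observations and the intrinsics relate: in the usual pinhole model that solvePnP is based on, the pixel is obtained from K (R Q + T) followed by division by depth, so K is applied directly rather than inverted. Here is a minimal sketch of that projection, assuming the point is already expressed in camera coordinates (that is, R Q + T has been applied) with Z pointing forward and image Y growing downward. The function name project is hypothetical.

```swift
/// Projects a point given in camera coordinates to pixel coordinates with a pinhole model:
/// u = fx * X / Z + cx, v = fy * Y / Z + cy.
func project(_ p: (x: Double, y: Double, z: Double),
             fx: Double, fy: Double, cx: Double, cy: Double) -> (u: Double, v: Double) {
    return (u: fx * p.x / p.z + cx, v: fy * p.y / p.z + cy)
}

// Intrinsics quoted in the question.
let fx = 1090.318, fy = 1090.318, cx = 618.661, cy = 359.616

// A QR corner 5 cm to the right of and 5 cm below the optical axis, 1 m from the camera.
let corner = project((x: 0.05, y: 0.05, z: 1.0), fx: fx, fy: fy, cx: cx, cy: cy)
print(corner) // roughly (u: 673.18, v: 414.13)
```

With these intrinsics a 10 cm QR code seen from 1 m spans about 109 pixels (1090.318 * 0.1), which is a quick way to check that the magnitudes are in the right range.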
For a voxel art app, the goal is to let users move and rotate a camera in a SceneKit scene then tap to place a block.
The code below lets a user rotate a camera by panning. After the gesture ends, we move an existing block so it is -X units on the camera's Z-axis (i.e., -X units in front of the camera).
cameraNode is the scene's point of view and is a child of userNode. When the user moves a joystick, we update the position of userNode.
Question: Other SO posts manipulate camera nodes by applying a transform instead of changing the rotation and position properties. Is one approach better than the other?
func sceneViewPannedOneFinger(sender: UIPanGestureRecognizer) {
    // Get pan distance & convert to radians
    let translation = sender.translationInView(sender.view!)
    var xRadians = GLKMathDegreesToRadians(Float(translation.x))
    var yRadians = GLKMathDegreesToRadians(Float(translation.y))

    // Get x & y radians
    xRadians = (xRadians / 6) + curXRadians
    yRadians = (yRadians / 6) + curYRadians

    // Limit yRadians to prevent rotating 360 degrees vertically
    yRadians = max(Float(-M_PI_2), min(Float(M_PI_2), yRadians))

    // Set rotation values to avoid Gimbal Lock
    cameraNode.rotation = SCNVector4(x: 1, y: 0, z: 0, w: yRadians)
    userNode.rotation = SCNVector4(x: 0, y: 1, z: 0, w: xRadians)

    // Save value for next rotation
    if sender.state == UIGestureRecognizerState.Ended {
        curXRadians = xRadians
        curYRadians = yRadians
    }
}
// Set preview block
private func setPreviewBlock(var futurePosition: SCNVector3 = SCNVector3Zero, reach: Float = 8) -> SCNVector3 {
    // Get future position
    if SCNVector3EqualToVector3(futurePosition, SCNVector3Zero) {
        futurePosition = userNode.position
    }

    // Get current position after accounting for rotations
    let hAngle = Float(cameraNode.rotation.w * cameraNode.rotation.x)
    let vAngle = Float(userNode.rotation.w * userNode.rotation.y)
    var position = getSphericalCoords(hAngle, t: vAngle, r: reach)
    position += userNode.position

    // Snap position to grid
    position = position.rounded()

    // Ensure preview block never dips below floor
    position.y = max(0, position.y)

    // Return if snapped position hasn't changed
    if SCNVector3EqualToVector3(position, previewBlock.position) {
        return position
    }

    // If here, animate preview block to new position
    previewBlock.position = position

    // Return position
    return position
}
func getSphericalCoords(s: Float, t: Float, r: Float) -> SCNVector3 {
    return SCNVector3(-(cos(s) * sin(t) * r),
                      sin(s) * r,
                      -(cos(s) * cos(t) * r))
}
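As a quick sanity check on the formula in getSphericalCoords, here is a dependency-free sketch that returns a plain tuple instead of SCNVector3 (an assumption made so it runs without SceneKit). The parameter labels pitch and yaw are hypothetical names: in the code above, s is derived from cameraNode's rotation about X and t from userNode's rotation about Y.

```swift
import Foundation

/// Same formula as getSphericalCoords, returning a plain tuple instead of SCNVector3.
func sphericalOffset(pitch s: Float, yaw t: Float, reach r: Float) -> (x: Float, y: Float, z: Float) {
    return (x: -(cos(s) * sin(t) * r),
            y: sin(s) * r,
            z: -(cos(s) * cos(t) * r))
}

// With no rotation the offset is `reach` units along -Z, directly in front of the camera.
let ahead = sphericalOffset(pitch: 0, yaw: 0, reach: 8)
print(ahead)

// Pitching up by 90 degrees moves the offset onto +Y.
let up = sphericalOffset(pitch: .pi / 2, yaw: 0, reach: 8)
print(up)
```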
Hi, I'm trying to rotate a 3D object in SceneKit with no success. Here's my code:
let rotateAction = SCNAction.rotateByAngle(90, aroundAxis: SCNVector3Make(0, 1, 0), duration: 3)
let moveAction = SCNAction.moveByX(25, y: 0, z: 0, duration: 6)
ship.runAction(rotateAction, completionHandler: {ship.runAction(moveAction)})
I have managed to get it rotating on the correct axis, but for some reason it's not rotating by the 90 degrees that I've stated; it just spins numerous times for the 3 seconds. I appreciate any help, thanks.
The angle for rotateByAngle is in radians, so for 90 degrees you'd have to make the angle 1.571 radians.
If you'd like to be more thorough, add a little function to convert degrees to radians. It will also make the code easier to understand in the future.
func degToRadians(degrees: Double) -> Double {
    return degrees * (M_PI / 180)
}
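For newer Swift versions, where Double.pi is available, the same conversion can be sketched as below. The name degreesToRadians is just an illustrative choice. The second calculation shows why passing 90 directly makes the node spin many times.

```swift
/// Converts an angle in degrees to radians.
func degreesToRadians(_ degrees: Double) -> Double {
    return degrees * .pi / 180
}

let quarterTurn = degreesToRadians(90)
print(quarterTurn) // about 1.5708

// Interpreting 90 as radians instead asks for this many full turns:
let turns = 90 / (2 * Double.pi)
print(turns) // about 14.3
```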
SCNAction Class Reference