ARKit - Projection of ARAnchor to 2D space - ios

I am trying to project an ARAnchor into 2D space, but I am running into an orientation issue.
Below is my function for projecting the top-left, top-right, bottom-left, and bottom-right corner positions into 2D space:
/// Returns the projection of an `ARImageAnchor` from the 3D world space
/// detected by ARKit into the 2D space of a view rendering the scene.
/// - Parameter from: An Anchor instance for projecting.
/// - Returns: An optional `CGRect` corresponding to the `ARImageAnchor` projection.
internal func projection(from anchor: ARImageAnchor,
alignment: ARPlaneAnchor.Alignment,
debug: Bool = false) -> CGRect? {
guard let camera = session.currentFrame?.camera else {
return nil
}
let refImg = anchor.referenceImage
let anchor3DPoint = anchor.transform.columns.3
let size = view.bounds.size
let width = Float(refImg.physicalSize.width / 2)
let height = Float(refImg.physicalSize.height / 2)
/// Upper left corner point
let projection = ProjectionHelper.projection(from: anchor3DPoint,
width: width,
height: height,
focusAlignment: alignment)
let topLeft = projection.0
let topLeftProjected = camera.projectPoint(topLeft,
orientation: .portrait,
viewportSize: size)
let topRight:simd_float3 = projection.1
let topRightProjected = camera.projectPoint(topRight,
orientation: .portrait,
viewportSize: size)
let bottomLeft = projection.2
let bottomLeftProjected = camera.projectPoint(bottomLeft,
orientation: .portrait,
viewportSize: size)
let bottomRight = projection.3
let bottomRightProjected = camera.projectPoint(bottomRight,
orientation: .portrait,
viewportSize: size)
let result = CGRect(origin: topLeftProjected,
size: CGSize(width: topRightProjected.distance(point: topLeftProjected),
height: bottomRightProjected.distance(point: bottomLeftProjected)))
return result
}
This function works pretty well when I am in front of the world origin. However, if I move left or right the calculation of the corner points does not work.

I found a solution to get corner 3D points of an ARImageAnchor depending on the anchor.transform and project them to 2D space:
extension simd_float4 {
var vector_float3: vector_float3 { return simd_float3([x, y, z]) }
}
/// Returns the projection of an `ARImageAnchor` from the 3D world space
/// detected by ARKit into the 2D space of a view rendering the scene.
/// - Parameter from: An Anchor instance for projecting.
/// - Returns: An optional `CGRect` corresponding to the `ARImageAnchor` projection.
internal func projection(from anchor: ARImageAnchor) -> CGRect? {
    guard let camera = session.currentFrame?.camera else {
        return nil
    }
    let refImg = anchor.referenceImage
    let transform = anchor.transform.transpose
    let size = view.bounds.size
    let width = Float(refImg.physicalSize.width / 2)
    let height = Float(refImg.physicalSize.height / 2)
    // Get corner 3D points
    let pointsWorldSpace = [
        matrix_multiply(simd_float4([width, 0, -height, 1]), transform).vector_float3, // top right
        matrix_multiply(simd_float4([width, 0, height, 1]), transform).vector_float3, // bottom right
        matrix_multiply(simd_float4([-width, 0, -height, 1]), transform).vector_float3, // bottom left
        matrix_multiply(simd_float4([-width, 0, height, 1]), transform).vector_float3 // top left
    ]
    // Project each 3D point to 2D space
    let pointsViewportSpace = pointsWorldSpace.map { (point) -> CGPoint in
        return camera.projectPoint(
            point,
            orientation: .portrait,
            viewportSize: size
        )
    }
    // Create a rectangle shape of the projection
    // to calculate the Intersection Over Union of other `ARImageAnchor`
    let result = CGRect(
        origin: pointsViewportSpace[3],
        size: CGSize(
            width: pointsViewportSpace[0].distance(point: pointsViewportSpace[3]),
            height: pointsViewportSpace[1].distance(point: pointsViewportSpace[2])
        )
    )
    return result
}
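The rectangle above is built from one corner plus two edge lengths, so it only matches the projected image when the projection is not rotated or skewed on screen. If the goal is an Intersection Over Union comparison, one option is to take the axis-aligned bounding box of all four projected corners instead. The following is a minimal sketch of that idea; `boundingRect(of:)` is a hypothetical helper, not an ARKit API, and the corner values are made up for illustration.

```swift
import Foundation

/// Smallest axis-aligned rectangle containing every point.
/// Returns nil for an empty array.
func boundingRect(of points: [CGPoint]) -> CGRect? {
    guard let first = points.first else { return nil }
    var minX = first.x, maxX = first.x
    var minY = first.y, maxY = first.y
    for p in points.dropFirst() {
        minX = min(minX, p.x); maxX = max(maxX, p.x)
        minY = min(minY, p.y); maxY = max(maxY, p.y)
    }
    return CGRect(x: minX, y: minY, width: maxX - minX, height: maxY - minY)
}

// Hypothetical projected corners of an image seen at an angle.
let corners = [CGPoint(x: 120, y: 80), CGPoint(x: 260, y: 95),
               CGPoint(x: 250, y: 210), CGPoint(x: 110, y: 190)]
let rect = boundingRect(of: corners)!
print(rect.origin, rect.size)
```

In the function above, the same helper could be fed `pointsViewportSpace` directly.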


Vision Recognized Object results into ARView as AnchorEntity

I get ARFrames from the session delegate of an ARView, then perform inference with CoreML + Vision using a YOLOv5 model. I successfully get an array of VNRecognizedObjectObservation values.
I pass these observations to a function like this:
func add(inferenceResults: [VNRecognizedObjectObservation], from frame: ARFrame) {
for inference in inferenceResults {
//NOTE: 1
let flippedNormalizedBoundingBox = inference.boundingBox.flipYCoordinateFromBottomLeftToUpperLeft
let point =
let label = inference.labels.first?.identifier ?? "Unknown"
guard arView.entity(at: point) == nil else {
let estimatedPlane = ARRaycastQuery.Target.estimatedPlane
let alignment = ARRaycastQuery.TargetAlignment.any
//NOTE: 2
let raycastQuery = frame.raycastQuery(from: point, allowing: estimatedPlane, alignment: alignment)
guard let raycastResult = arView.session.raycast(raycastQuery).first else {
print("No Ray cast results")
let newAnchor = AnchorEntity(world: raycastResult.worldTransform)
let squareMaterial = SimpleMaterial(color: .blue, isMetallic: true)
let textMaterial = SimpleMaterial(color: .white, isMetallic: true)
let squareEntity = ModelEntity(mesh: MeshResource.generatePlane(width: 0.1, height: 0.1, cornerRadius: 0), materials: [squareMaterial])
let textMesh = MeshResource.generateText(label, extrusionDepth: 0.1, font: .systemFont(ofSize: 2), containerFrame: .zero, alignment: .center, lineBreakMode: .byCharWrapping)
let textEntity = ModelEntity(mesh: textMesh, materials: [textMaterial])
textEntity.scale = SIMD3<Float>(0.03, 0.03, 0.1)
squareEntity.addChild(textEntity) = label
Some extensions
extension CGRect {
    /// This will change the Y origin from the lower left corner to the upper left corner
    public var flipYCoordinateFromBottomLeftToUpperLeft: CGRect {
        return CGRect(x: self.origin.x, y: (1 - self.origin.y - self.height), width: self.width, height: self.height)
    }

    /// Returns a `CGPoint` that represents the center of the `CGRect`
    /// - Returns: A `CGPoint` constructed by obtaining the `midX` and `midY` values
    public func center() -> CGPoint {
        let midY = self.midY
        let midX = self.midX
        let point = CGPoint(x: midX, y: midY)
        return point
    }
}
I end up getting results like this
NOTE 1: Bounding boxes from Vision are normalized and have their origin in the lower-left corner.
PROBLEM 1: Because inference runs quickly, I don't want to keep adding AnchorEntities at the same location. The guard on arView.entity(at:) is my attempt to stop further processing, but it never triggers.
NOTE 2: I know ARView has its own raycast function, but it seems I should use the one from the ARFrame. I speculate that after a few milliseconds of inference on a background thread, the results may differ depending on which object I raycast from, because the user may have moved.
PROBLEM 2: My AnchorEntities are always black.
PROBLEM 3: The text and bounding box are never aligned with the camera ("billboard style").
In general I would like to place a square with a label in AR that reflects the size of the bounding box from Vision. I need to get past these few problems before I refine to that level. Any help is appreciated! AR is fun.
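For NOTE 1, here is a minimal sketch of turning a normalized, lower-left-origin bounding box into a rectangle in view points with an upper-left origin. `viewRect(fromNormalized:viewSize:)` is a hypothetical helper, and it assumes the analysed image fills the view exactly, with no aspect-fill cropping or rotation in between; in a real ARView that assumption usually needs checking.

```swift
import Foundation

/// Converts a normalized rect with a lower-left origin into a rect in
/// points with an upper-left origin.
/// Assumption: the analysed image fills the view exactly, with no cropping.
func viewRect(fromNormalized box: CGRect, viewSize: CGSize) -> CGRect {
    // Flip Y: the top edge of the box becomes its new origin.
    let flippedY = 1 - box.origin.y - box.size.height
    return CGRect(x: box.origin.x * viewSize.width,
                  y: flippedY * viewSize.height,
                  width: box.size.width * viewSize.width,
                  height: box.size.height * viewSize.height)
}

// Made-up observation: a box covering the middle half of the width.
let box = CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25)
let rect = viewRect(fromNormalized: box, viewSize: CGSize(width: 400, height: 800))
let center = CGPoint(x: rect.origin.x + rect.size.width / 2,
                     y: rect.origin.y + rect.size.height / 2)
print(rect.origin, rect.size, center)
```

The resulting center is in view points, which is the unit a screen-space raycast or hit test expects, rather than the 0...1 range of the normalized box.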

Transforming ARFrame#capturedImage to view size

When using the ARSessionDelegate to process the raw camera image in ARKit...
func session(_ session: ARSession, didUpdate frame: ARFrame) {
guard let currentFrame = session.currentFrame else { return }
let capturedImage = currentFrame.capturedImage
debugPrint("Display size", UIScreen.main.bounds.size)
debugPrint("Camera frame resolution", CVPixelBufferGetWidth(capturedImage), CVPixelBufferGetHeight(capturedImage))
// ...
}
... as documented, the camera image data doesn't match the screen size, for example, on iPhone X I get:
Display size: 375x812pt
Camera resolution: 1920x1440px
Now there is the displayTransform(for:viewportSize:) API to transform camera coordinates to view coordinates. When using the API like this:
let ciimage = CIImage(cvImageBuffer: capturedImage)
let transform = currentFrame.displayTransform(for: .portrait, viewportSize: UIScreen.main.bounds.size)
var transformedImage = ciimage.transformed(by: transform)
debugPrint("Transformed size", transformedImage.extent.size)
I get a size of 2340x1920, which seems incorrect: the result should have an aspect ratio of 375:812 (~0.46). What am I missing here? What is the correct way to use this API to transform the camera image to an image "as displayed by ARSCNView"?
(Example project: ARKitCameraImage)
This turned out to be quite complicated because displayTransform(for:viewportSize:) expects normalized image coordinates. It also seems you have to flip the coordinates only in portrait mode, and the image needs to be not only transformed but also cropped. The following code does the trick for me. Suggestions on how to improve it would be appreciated.
guard let frame = session.currentFrame else { return }
let imageBuffer = frame.capturedImage
let imageSize = CGSize(width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
let viewPort = sceneView.bounds
let viewPortSize = sceneView.bounds.size
let interfaceOrientation : UIInterfaceOrientation
if #available(iOS 13.0, *) {
interfaceOrientation = self.sceneView.window!.windowScene!.interfaceOrientation
} else {
interfaceOrientation = UIApplication.shared.statusBarOrientation
}
let image = CIImage(cvImageBuffer: imageBuffer)
// The camera image doesn't match the view rotation and aspect ratio
// Transform the image:
// 1) Convert to "normalized image coordinates"
let normalizeTransform = CGAffineTransform(scaleX: 1.0/imageSize.width, y: 1.0/imageSize.height)
// 2) Flip both axes, mapping (x, y) to (1 - x, 1 - y) (for some mysterious reason this is only necessary in portrait mode)
let flipTransform = (interfaceOrientation.isPortrait) ? CGAffineTransform(scaleX: -1, y: -1).translatedBy(x: -1, y: -1) : .identity
// 3) Apply the transformation provided by ARFrame
// This transformation converts:
// - From Normalized image coordinates (Normalized image coordinates range from (0,0) in the upper left corner of the image to (1,1) in the lower right corner)
// - To view coordinates ("a coordinate space appropriate for rendering the camera image onscreen")
// See also:
let displayTransform = frame.displayTransform(for: interfaceOrientation, viewportSize: viewPortSize)
// 4) Convert to view size
let toViewPortTransform = CGAffineTransform(scaleX: viewPortSize.width, y: viewPortSize.height)
// Transform the image and crop it to the viewport
let transformedImage = image.transformed(by: normalizeTransform.concatenating(flipTransform).concatenating(displayTransform).concatenating(toViewPortTransform)).cropped(to: viewPort)
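To see why cropping is needed at all, it can help to work out which part of the camera image survives. The sketch below computes the centred region that stays visible when an image is scaled to fill a viewport. It assumes the view shows the camera image in an aspect-fill manner and that the image has already been rotated to the interface orientation; `aspectFillCrop(imageSize:viewSize:)` is a hypothetical helper, not part of ARKit.

```swift
import Foundation

/// Rect, in image pixels, of the centred region that remains visible when an
/// image of `imageSize` is scaled to fill a viewport of `viewSize` (aspect fill).
func aspectFillCrop(imageSize: CGSize, viewSize: CGSize) -> CGRect {
    let imageAspect = imageSize.width / imageSize.height
    let viewAspect = viewSize.width / viewSize.height
    if imageAspect > viewAspect {
        // Image is relatively wider: keep full height, trim left and right.
        let width = imageSize.height * viewAspect
        return CGRect(x: (imageSize.width - width) / 2, y: 0,
                      width: width, height: imageSize.height)
    } else {
        // Image is relatively taller: keep full width, trim top and bottom.
        let height = imageSize.width / viewAspect
        return CGRect(x: 0, y: (imageSize.height - height) / 2,
                      width: imageSize.width, height: height)
    }
}

// Camera image from the question, rotated to portrait (1440x1920),
// shown in a 375x812 pt viewport.
let crop = aspectFillCrop(imageSize: CGSize(width: 1440, height: 1920),
                          viewSize: CGSize(width: 375, height: 812))
print(crop.origin, crop.size)
```

With these numbers the full height is kept and roughly 277 pixels are trimmed from each side, which gives the 375:812 aspect ratio the question expected.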
Thank you so much for your answer! I was working on this for a week.
Here's an alternative way to do it without messing with the orientation. Instead of using the capturedImage property you can use a snapshot of the screen.
func session(_ session: ARSession, didUpdate frame: ARFrame) {
guard let image = CIImage(image: sceneView.snapshot()) else { return }
let imageSize = image.extent.size
// Convert to "normalized image coordinates"
let resize = CGAffineTransform(scaleX: 1.0 / imageSize.width, y: 1.0 / imageSize.height)
// Convert to view size
let viewSize = CGAffineTransform(scaleX: sceneView.bounds.size.width, y: sceneView.bounds.size.height)
// Transform image
let editedImage = image.transformed(by: resize.concatenating(viewSize)).cropped(to: sceneView.bounds)
// `context` is assumed to be a CIContext created elsewhere
sceneView.scene.background.contents = context.createCGImage(editedImage, from: editedImage.extent)
}

Crop Image from Camera in Swift without move to another ViewController

I have an image overlay inside CameraViewController:
I want to get the image from inside this red square.
I don't want to move to another view controller to setup a CropViewController, the crop should be done inside this Controller.
The code below almost works. The problem is that the image generated by the camera is 1080x1920 while self.cropView.bounds is (0,0,185,120), so the two are not in the same scale.
extension UIImage {
    func crop(rect: CGRect) -> UIImage {
        // Note: `cropping(to:)` works in the pixel space of the underlying CGImage
        let imageRef = self.cgImage!.cropping(to: rect)
        let image = UIImage(cgImage: imageRef!, scale: self.scale, orientation: self.imageOrientation)
        return image
    }
}
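The scale mismatch described above can be handled by mapping the overlay rectangle from the preview's coordinate space into image pixels before cropping. The following is a minimal sketch under two assumptions: the image fills the preview exactly (same aspect ratio, no letterboxing), and the overlay rectangle is its frame in the preview's coordinates rather than its `bounds`, whose origin is always (0,0). The helper name and the numbers are hypothetical.

```swift
import Foundation

/// Maps a rect expressed in a preview's coordinate space into image pixels.
/// Assumption: the image fills the preview exactly (no letterboxing).
func imageRect(for viewRect: CGRect, previewSize: CGSize, imageSize: CGSize) -> CGRect {
    let sx = imageSize.width / previewSize.width
    let sy = imageSize.height / previewSize.height
    return CGRect(x: viewRect.origin.x * sx, y: viewRect.origin.y * sy,
                  width: viewRect.size.width * sx, height: viewRect.size.height * sy)
}

// Made-up layout: a 185x120 overlay centred in a 360x640 preview,
// with a 1080x1920 captured image (scale factor 3 on both axes).
let overlayFrame = CGRect(x: 87.5, y: 260, width: 185, height: 120)
let pixelRect = imageRect(for: overlayFrame,
                          previewSize: CGSize(width: 360, height: 640),
                          imageSize: CGSize(width: 1080, height: 1920))
print(pixelRect.origin, pixelRect.size)
```

The resulting rectangle is what would be passed to a pixel-space crop such as the `crop(rect:)` extension above.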
You can always crop any image visually in a quadrilateral (a four-sided shape that doesn't have to be a rectangle) using a Core Image filter called CIPerspectiveCorrection.
Let's say you have an imageView frame that is 414 wide by 716 high, holding an image that is 1600 wide by 900 high. (You are using a content mode of .aspectFit, right?) Let's say you want to crop a four-sided shape whose corners, in (X,Y) coordinates in the imageView, are (50,50), (75,75), (100,300), and (25,200). Note that I'm listing the points in top left (TL), top right (TR), bottom right (BR), bottom left (BL) order. Also note that this is not a straightforward rectangle.
What you need to do is this:
Convert the UIImage to a CIImage where the "extent" is the UIImage size,
Convert those UIImageView coordinates to CIImage coordinates,
pass them and the CIImage into the CIPerspectiveCorrection filter for cropping, and
render the CIImage output into a UIImageView.
The below code is a little rough around the edges, but hopefully you get the concept:
class ViewController: UIViewController {
    // Corner points in image-view coordinates, matching the TL, TR, BR, BL order above
    let uiTL = CGPoint(x: 50, y: 50)
    let uiTR = CGPoint(x: 75, y: 75)
    let uiBR = CGPoint(x: 100, y: 300)
    let uiBL = CGPoint(x: 25, y: 200)
    var ciImage: CIImage!
    var ctx: CIContext!
    @IBOutlet weak var imageView: UIImageView!

    override func viewDidLoad() {
        super.viewDidLoad()
        ctx = CIContext(options: nil)
        ciImage = CIImage(image: imageView.image!)
    }

    override func viewWillLayoutSubviews() {
        let ciTL = createVector(createScaledPoint(uiTL))
        let ciTR = createVector(createScaledPoint(uiTR))
        let ciBR = createVector(createScaledPoint(uiBR))
        let ciBL = createVector(createScaledPoint(uiBL))
        imageView.image = doPerspectiveCorrection(CIImage(image: imageView.image!)!,
                                                  context: ctx,
                                                  topLeft: ciTL,
                                                  topRight: ciTR,
                                                  bottomRight: ciBR,
                                                  bottomLeft: ciBL)
    }

    // Parameter list reconstructed from the call site above
    func doPerspectiveCorrection(
        _ image: CIImage,
        context: CIContext,
        topLeft: CIVector,
        topRight: CIVector,
        bottomRight: CIVector,
        bottomLeft: CIVector)
        -> UIImage {
        let filter = CIFilter(name: "CIPerspectiveCorrection")
        filter?.setValue(topLeft, forKey: "inputTopLeft")
        filter?.setValue(topRight, forKey: "inputTopRight")
        filter?.setValue(bottomRight, forKey: "inputBottomRight")
        filter?.setValue(bottomLeft, forKey: "inputBottomLeft")
        filter!.setValue(image, forKey: kCIInputImageKey)
        let cgImage = context.createCGImage((filter?.outputImage)!, from: (filter?.outputImage!.extent)!)
        return UIImage(cgImage: cgImage!)
    }

    // Scale a point from image-view coordinates to CIImage pixel coordinates
    func createScaledPoint(_ pt: CGPoint) -> CGPoint {
        let x = (pt.x / imageView.frame.width) * ciImage.extent.width
        let y = (pt.y / imageView.frame.height) * ciImage.extent.height
        return CGPoint(x: x, y: y)
    }

    // Flip Y, because the CIImage origin is the bottom left
    func createVector(_ point: CGPoint) -> CIVector {
        return CIVector(x: point.x, y: ciImage.extent.height - point.y)
    }

    func createPoint(_ vector: CGPoint) -> CGPoint {
        return CGPoint(x: vector.x, y: ciImage.extent.height - vector.y)
    }
}
EDIT: I'm putting this here to explain things. The two of us swapped projects, and there was an issue with the questioner's code where a nil return was happening. First, here's the corrected code, which should be in the cropImage() function:
let ciTL = createVector(createScaledPoint(topLeft, overlay: cameraView, image: image), image: image)
let ciTR = createVector(createScaledPoint(topRight, overlay: cameraView, image: image), image: image)
let ciBR = createVector(createScaledPoint(bottomRight, overlay: cameraView, image: image), image: image)
let ciBL = createVector(createScaledPoint(bottomLeft, overlay: cameraView, image: image), image: image)
The issue is with the last two lines, which were transposed by passing bottomLeft where it should have been bottomRight, and vice-versa. (Easy mistake to make, I've done it too!)
Some explanation to help those who use CIPerspectiveCorrection (and other filters that use CIVectors).
A CIVector can have anywhere from, I think, 2 to an almost unlimited number of components. It depends on the filter. In this case there are two components (X, Y). Simple enough, but the twist is that the 4 CIVectors describe 4 points inside the CIImage extent where the origin is the bottom left, not the top left.
Note I did not say a four-sided shape. You can actually have a "figure 8" like shape where the "bottom right" point is left of the "bottom left" point! This would result in a shape where two sides cross each other.
All that matters is that all 4 points lie within the CIImage extent. If they don't, the filter will return nil for its output image.
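Given that constraint, it may be worth validating the four points before handing them to the filter, so that a bad point produces a clear early exit rather than a force-unwrap crash later. This is a minimal sketch; `allPoints(_:lieWithin:)` is a hypothetical helper, the sample points are made up, and treating points exactly on the edge as valid is an assumption.

```swift
import Foundation

/// True when every point lies inside `extent`, edges included.
func allPoints(_ points: [CGPoint], lieWithin extent: CGRect) -> Bool {
    let minX = extent.origin.x, maxX = extent.origin.x + extent.size.width
    let minY = extent.origin.y, maxY = extent.origin.y + extent.size.height
    return points.allSatisfy { p in
        p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
    }
}

// Extent of a 1600x900 image, with corner points already in image coordinates.
let extent = CGRect(x: 0, y: 0, width: 1600, height: 900)
let inside = [CGPoint(x: 193, y: 837), CGPoint(x: 290, y: 806),
              CGPoint(x: 386, y: 523), CGPoint(x: 97, y: 649)]
let outside = inside + [CGPoint(x: 1700, y: 100)]
print(allPoints(inside, lieWithin: extent), allPoints(outside, lieWithin: extent))
```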
One last note for those who haven't worked with CIImage filters before: the filters will not execute until you ask for the outputImage. You can instantiate one, fill in the parameters, chain them, whatever. You can even make a typo in the filter name (or any of their keys). Until your code asks for the filter.outputImage, nothing happens.

Adding AnchorPoint to SKNode breaks SKScene positioning

I am trying to have my SKCameraNode start in the bottom left corner, and have my background anchored there as well. When I set the anchor point to CGPointZero, here is what my camera shows:
Interestingly, if I set my anchorPoint to CGPoint(x:0.5, y:0.2), I get it mostly lined up. Does it have to do with the camera scale?
If I change my scene size, I can change where the background nodes show up. Usually they appear with their anchor point placed in the center of the screen, which implies the anchorPoint of the scene is in the center of the screen.
I am new to using the SKCameraNode, so I am probably setting its constraints incorrectly.
Here are my camera constraints: I don't have my player added yet, but I want to set my world up first before I add my player. Again I am trying to have everything anchored off CGPointZero.
//Camera Settings
func setCameraConstraints() {
guard let camera = camera else { return }
if let player = worldLayer.childNodeWithName("playerNode") as? EntityNode {
let zeroRange = SKRange(constantValue: 0.0)
let playerNode = player
let playerLocationConstraint = SKConstraint.distance(zeroRange, toNode: playerNode)
let scaledSize = CGSize(width: SKMViewSize!.width * camera.xScale, height: SKMViewSize!.height * camera.yScale)
let boardContentRect = worldFrame
let xInset = min((scaledSize.width / 2), boardContentRect.width / 2)
let yInset = min((scaledSize.height / 2), boardContentRect.height / 2)
let insetContentRect = boardContentRect.insetBy(dx: xInset, dy: yInset)
let xRange = SKRange(lowerLimit: insetContentRect.minX, upperLimit: insetContentRect.maxX)
let yRange = SKRange(lowerLimit: insetContentRect.minY, upperLimit: insetContentRect.maxY)
let levelEdgeConstraint = SKConstraint.positionX(xRange, y: yRange)
levelEdgeConstraint.referenceNode = worldLayer
camera.constraints = [playerLocationConstraint, levelEdgeConstraint]
}
}
I have been using a Udemy course to learn the SKCameraNode, and I have been trying to modify it.
Here is where I set the SKMViewSize:
convenience init(screenSize: CGSize, canvasSize: CGSize) {
if (screenSize.height < screenSize.width) {
SKMViewSize = screenSize
} else {
SKMViewSize = CGSize(width: screenSize.height, height: screenSize.width)
}
SKMSceneSize = canvasSize
SKMScale = (SKMViewSize!.height / SKMSceneSize!.height)
let scale:CGFloat = min( SKMSceneSize!.width/SKMViewSize!.width, SKMSceneSize!.height/SKMViewSize!.height )
SKMUIRect = CGRect(x: ((((SKMViewSize!.width * scale) - SKMSceneSize!.width) * 0.5) * -1.0), y: ((((SKMViewSize!.height * scale) - SKMSceneSize!.height) * 0.5) * -1.0), width: SKMViewSize!.width * scale, height: SKMViewSize!.height * scale)
}
How can I get both the camera to be constrained by my world, and have everything anchored to the CGPointZero?
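One way to reason about this: an SKCameraNode's position marks the centre of the visible area, so with a world whose lower-left corner sits at CGPointZero the camera cannot itself sit at the origin without showing empty space. The lowest valid camera position is half the visible size in from the corner. The sketch below reproduces the inset arithmetic from the constraints above as a plain function; `clampedCameraCenter` is a hypothetical helper and the sizes are made up.

```swift
import Foundation

/// Clamps a desired camera centre so the visible area stays inside `world`.
/// `viewSize` is the visible size in scene units (view size times camera scale).
func clampedCameraCenter(_ desired: CGPoint, world: CGRect, viewSize: CGSize) -> CGPoint {
    // Inset by half the visible size, but never more than half the world,
    // mirroring the xInset / yInset computation in the question.
    let xInset = min(viewSize.width / 2, world.size.width / 2)
    let yInset = min(viewSize.height / 2, world.size.height / 2)
    let minX = world.origin.x + xInset
    let maxX = world.origin.x + world.size.width - xInset
    let minY = world.origin.y + yInset
    let maxY = world.origin.y + world.size.height - yInset
    return CGPoint(x: min(max(desired.x, minX), maxX),
                   y: min(max(desired.y, minY), maxY))
}

// World anchored at the origin; camera asked to centre on the origin.
let world = CGRect(x: 0, y: 0, width: 2000, height: 1000)
let start = clampedCameraCenter(.zero, world: world,
                                viewSize: CGSize(width: 800, height: 600))
print(start)
```

Here the camera ends up at (400, 300), which puts the world's lower-left corner exactly in the lower-left corner of the view.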

Draw a grid with SpriteKit

What would be the best way to draw a grid like this by using the SpriteKit 2D game engine?
Input the number of columns and rows programmatically (5x5, 10x3, 3x4 etc.).
Draw it programmatically using something like SKSpriteNode or SKShapeNode, since just using images of a square like this doesn't seem very efficient to me.
The squares should have a fixed size (let's say each is 40x40).
The grid should be vertically and horizontally centred in the view.
I'm planning to use a SKSpriteNode (from an image) as a player moving in different squares in this grid.
So, I'll save in a 2 dimensional array the central point (x,y) of each square and then move from the player's current position to that position. If you have a better suggestion for this too, I'd like to hear it.
I would appreciate a solution in Swift (preferably 2.1), but Objective-C would do too. Planning on using this only on iPhone devices.
My question is close to this one. Any help is appreciated.
I suggest you implement the grid as a texture of an SKSpriteNode because Sprite Kit will render the grid in a single draw call. Here's an example of how to do that:
class Grid: SKSpriteNode {
    var rows: Int!
    var cols: Int!
    var blockSize: CGFloat!

    convenience init?(blockSize: CGFloat, rows: Int, cols: Int) {
        guard let texture = Grid.gridTexture(blockSize: blockSize, rows: rows, cols: cols) else {
            return nil
        }
        self.init(texture: texture, color: SKColor.clear, size: texture.size())
        self.blockSize = blockSize
        self.rows = rows
        self.cols = cols
    }

    class func gridTexture(blockSize: CGFloat, rows: Int, cols: Int) -> SKTexture? {
        // Add 1 to the height and width to ensure the borders are within the sprite
        let size = CGSize(width: CGFloat(cols)*blockSize+1.0, height: CGFloat(rows)*blockSize+1.0)
        // An image context must be open before there is a current context to draw into
        UIGraphicsBeginImageContext(size)
        defer { UIGraphicsEndImageContext() }
        guard UIGraphicsGetCurrentContext() != nil else {
            return nil
        }
        let bezierPath = UIBezierPath()
        let offset: CGFloat = 0.5
        // Draw vertical lines
        for i in 0...cols {
            let x = CGFloat(i)*blockSize + offset
            bezierPath.move(to: CGPoint(x: x, y: 0))
            bezierPath.addLine(to: CGPoint(x: x, y: size.height))
        }
        // Draw horizontal lines
        for i in 0...rows {
            let y = CGFloat(i)*blockSize + offset
            bezierPath.move(to: CGPoint(x: 0, y: y))
            bezierPath.addLine(to: CGPoint(x: size.width, y: y))
        }
        // Stroke the path into the context (white is an assumed line color)
        SKColor.white.setStroke()
        bezierPath.lineWidth = 1.0
        bezierPath.stroke()
        guard let image = UIGraphicsGetImageFromCurrentImageContext() else {
            return nil
        }
        return SKTexture(image: image)
    }

    func gridPosition(row: Int, col: Int) -> CGPoint {
        let offset = blockSize / 2.0 + 0.5
        let x = CGFloat(col) * blockSize - (blockSize * CGFloat(cols)) / 2.0 + offset
        let y = CGFloat(rows - row - 1) * blockSize - (blockSize * CGFloat(rows)) / 2.0 + offset
        return CGPoint(x: x, y: y)
    }
}
And here's how to create a grid and add a game piece to the grid
class GameScene: SKScene {
    override func didMove(to view: SKView) {
        if let grid = Grid(blockSize: 40.0, rows: 5, cols: 5) {
            grid.position = CGPoint(x: frame.midX, y: frame.midY)
            addChild(grid)
            let gamePiece = SKSpriteNode(imageNamed: "Spaceship")
            gamePiece.position = grid.gridPosition(row: 1, col: 0)
            grid.addChild(gamePiece)
        }
    }
}
To determine which grid square was touched, add this to init
self.isUserInteractionEnabled = true
and this to the Grid class:
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    for touch in touches {
        let position = touch.location(in: self)
        let node = atPoint(position)
        if node != self {
            // A child node (e.g. the game piece) was touched: spin it
            let action = SKAction.rotate(by: CGFloat.pi*2, duration: 1)
            node.run(action)
        } else {
            // Convert to a top-left origin: x selects the column, y the row
            let x = size.width / 2 + position.x
            let y = size.height / 2 - position.y
            let col = Int(floor(x / blockSize))
            let row = Int(floor(y / blockSize))
            print("\(row) \(col)")
        }
    }
}
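The mapping between cells and positions can be checked without SpriteKit. The sketch below restates it as a plain struct and verifies that converting a cell to its centre and back gives the same cell. `GridGeometry` is a hypothetical type for illustration; it omits the 0.5-point line offset used in `gridPosition` above and assumes a coordinate space centred on the grid with y pointing up and row 0 at the top.

```swift
import Foundation

struct GridGeometry {
    let blockSize: CGFloat
    let rows: Int
    let cols: Int

    /// Centre of a cell, relative to the grid's centre (y points up, row 0 on top).
    func position(row: Int, col: Int) -> CGPoint {
        let x = (CGFloat(col) + 0.5) * blockSize - CGFloat(cols) * blockSize / 2
        let y = (CGFloat(rows - row) - 0.5) * blockSize - CGFloat(rows) * blockSize / 2
        return CGPoint(x: x, y: y)
    }

    /// Cell containing a point in the same coordinate space, or nil if outside.
    func cell(at point: CGPoint) -> (row: Int, col: Int)? {
        // Shift to a top-left origin: x selects the column, y the row.
        let x = point.x + CGFloat(cols) * blockSize / 2
        let y = CGFloat(rows) * blockSize / 2 - point.y
        let col = Int((x / blockSize).rounded(.down))
        let row = Int((y / blockSize).rounded(.down))
        guard (0..<rows).contains(row), (0..<cols).contains(col) else { return nil }
        return (row, col)
    }
}

let geometry = GridGeometry(blockSize: 40, rows: 5, cols: 5)
let piece = geometry.position(row: 1, col: 0)
let hit = geometry.cell(at: piece)
print(piece, hit as Any)
```

Storing only `(row, col)` for the player and deriving the position on demand avoids keeping a separate 2D array of centre points.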
