Vision Recognized Object results into ARView as AnchorEntity

I get ARFrames from the session delegate of an ARView, where I then perform inference with CoreML + Vision using a YOLOv5 model. I successfully get an array of [VNRecognizedObjectObservation].
I pass these observations to a function like this:
func add(inferenceResults: [VNRecognizedObjectObservation], from frame: ARFrame) {
    for inference in inferenceResults {
        // NOTE: 1
        let flippedNormalizedBoundingBox = inference.boundingBox.flipYCoordinateFromBottomLeftToUpperLeft
        let point = flippedNormalizedBoundingBox.center()
        let label = inference.labels.first?.identifier ?? "Unknown"
        // PROBLEM: 1
        guard arView.entity(at: point) == nil else {
            break
        }
        let estimatedPlane = ARRaycastQuery.Target.estimatedPlane
        let alignment = ARRaycastQuery.TargetAlignment.any
        // NOTE: 2
        let raycastQuery = frame.raycastQuery(from: point, allowing: estimatedPlane, alignment: alignment)
        guard let raycastResult = arView.session.raycast(raycastQuery).first else {
            print("No Ray cast results")
            break
        }
        let newAnchor = AnchorEntity(world: raycastResult.worldTransform)
        // PROBLEM: 2
        let squareMaterial = SimpleMaterial(color: .blue, isMetallic: true)
        let textMaterial = SimpleMaterial(color: .white, isMetallic: true)
        let squareEntity = ModelEntity(mesh: MeshResource.generatePlane(width: 0.1, height: 0.1, cornerRadius: 0), materials: [squareMaterial])
        let textMesh = MeshResource.generateText(label, extrusionDepth: 0.1, font: .systemFont(ofSize: 2), containerFrame: .zero, alignment: .center, lineBreakMode: .byCharWrapping)
        let textEntity = ModelEntity(mesh: textMesh, materials: [textMaterial])
        textEntity.scale = SIMD3<Float>(0.03, 0.03, 0.1)
        squareEntity.addChild(textEntity)
        newAnchor.name = label
        newAnchor.addChild(squareEntity)
        // PROBLEM: 3
        self.arView.scene.addAnchor(newAnchor)
    }
}
Some extensions
extension CGRect {
    /// Changes the Y origin from the lower-left corner to the upper-left corner
    public var flipYCoordinateFromBottomLeftToUpperLeft: CGRect {
        return CGRect(x: origin.x, y: 1 - origin.y - height, width: width, height: height)
    }

    /// Returns a `CGPoint` that represents the center of the `CGRect`
    /// - Returns: A `CGPoint` constructed from the `midX` and `midY` values
    public func center() -> CGPoint {
        return CGPoint(x: midX, y: midY)
    }
}
I end up getting results like this:
NOTE 1: Bounding boxes from Vision are normalized and have an odd origin (lower left).
PROBLEM 1: Because I can run inference quickly, I don't want to keep adding AnchorEntities at the same location. The guard is an attempt to stop further processing, but the break is never reached.
NOTE 2: I know there is a raycast function on the ARView, but it seems like I want the raycast function on the ARFrame. I speculate that after a few milliseconds of inference on a background thread, the results may differ depending on which object I raycast from, because the user may have moved.
PROBLEM 2: My AnchorEntities are always black.
PROBLEM 3: The text and bounding box are never aligned with the camera, "billboard style".
In general I would like to place a square with a label in AR that reflects the size of the bounding box from Vision. I need to get past these few problems before I can refine to that level. Any help is appreciated! AR is fun.
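(A note on PROBLEM 1: ARView.entity(at:) expects a point in the view's coordinate space, while ARFrame.raycastQuery(from:allowing:alignment:) interprets its point in normalized image coordinates, so the same point value cannot serve both calls. Below is a minimal sketch of the conversion for entity(at:), assuming the camera image fills the view; a production version would go through ARFrame.displayTransform(for:viewportSize:) to handle aspect fill and interface orientation.)

// Hypothetical helper, not from the original post: maps a Vision point that
// has already been flipped to a top-left origin into ARView coordinates
func viewPoint(fromFlippedNormalized point: CGPoint, in arView: ARView) -> CGPoint {
    return CGPoint(x: point.x * arView.bounds.width,
                   y: point.y * arView.bounds.height)
}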

Related

How to Rotate ARKit Entity Towards the User's Screen/Camera

I have it set up so that the mesh I'm adding to my scene renders correctly (right side up, so that the text I add as a child is legible), but the rotation is always the same (globally), whereas I want the rotation (on the XZ plane) to face the camera. I'm not exactly sure how to go about this.
My code looks like this:
@objc func handleTap() {
    guard let view = self.view else { return }
    guard let query = view.makeRaycastQuery(from: view.center, allowing: .estimatedPlane, alignment: .any) else { return }
    guard let raycastResult = view.session.raycast(query).first else { return }
    // Set a transform to an existing entity
    var transform = Transform(matrix: raycastResult.worldTransform)
    // Create a new anchor to add content to
    if oldAnchor != nil {
        view.scene.removeAnchor(oldAnchor!)
    }
    let anchor = AnchorEntity()
    oldAnchor = anchor
    view.scene.anchors.append(anchor)
    let material = SimpleMaterial(color: .lightGray, isMetallic: false)
    // Add a curve entity
    let curveEntity = try! ModelEntity.loadModel(named: "curve")
    curveEntity.transform = transform
    curveEntity.transform.rotation = simd_quatf(angle: 0, axis: SIMD3<Float>(1, 1, 1))
    curveEntity.scale = [0.0002, 0.0002, 0.0002]
    let curveRadians = 90.0 * Float.pi / 180.0
    curveEntity.setOrientation(simd_quatf(angle: curveRadians, axis: SIMD3<Float>(1, 0, 0)), relativeTo: curveEntity)
    // Adding text and children and materials etc.
    ...
    anchor.addChild(curveEntity)
}
Without the curveEntity.transform.rotation = simd_quatf(angle: 0, axis: SIMD3<Float>(1, 1, 1)) line, the rotation of the curve is relative to the normal of the surface the raycast hits, instead of constant.
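One possible way to get the camera-facing rotation the question asks for (my sketch, not from the original post): re-aim the entity at the camera every frame with Entity.look(at:from:relativeTo:), clamping the target to the entity's own height so it only yaws on the XZ plane. arView is assumed to be the hosting ARView, and depending on the mesh's forward axis you may also need a 180-degree flip around Y.

// Call per frame, e.g. from an ARSessionDelegate callback or a scene
// subscription, to keep the entity upright and facing the camera
func billboard(_ entity: Entity, toward arView: ARView) {
    let camera = arView.cameraTransform.translation
    let position = entity.position(relativeTo: nil)
    // Keep the look target at the entity's own height so it only yaws
    let target = SIMD3<Float>(camera.x, position.y, camera.z)
    entity.look(at: target, from: position, relativeTo: nil)
}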

ARKit - Projection of ARAnchor to 2D space

I am trying to project an ARAnchor to 2D space, but I am facing an orientation issue...
Below is my function to project the top-left, top-right, bottom-left, and bottom-right corner positions to 2D space:
/// Returns the projection of an `ARImageAnchor` from the 3D world space
/// detected by ARKit into the 2D space of a view rendering the scene.
///
/// - Parameter anchor: An anchor instance to project.
/// - Returns: An optional `CGRect` corresponding to the `ARImageAnchor` projection.
internal func projection(from anchor: ARImageAnchor,
                         alignment: ARPlaneAnchor.Alignment,
                         debug: Bool = false) -> CGRect? {
    guard let camera = session.currentFrame?.camera else {
        return nil
    }
    let refImg = anchor.referenceImage
    let anchor3DPoint = anchor.transform.columns.3
    let size = view.bounds.size
    let width = Float(refImg.physicalSize.width / 2)
    let height = Float(refImg.physicalSize.height / 2)
    // Corner points, starting from the upper-left corner
    let projection = ProjectionHelper.projection(from: anchor3DPoint,
                                                 width: width,
                                                 height: height,
                                                 focusAlignment: alignment)
    let topLeft = projection.0
    let topLeftProjected = camera.projectPoint(topLeft,
                                               orientation: .portrait,
                                               viewportSize: size)
    let topRight: simd_float3 = projection.1
    let topRightProjected = camera.projectPoint(topRight,
                                                orientation: .portrait,
                                                viewportSize: size)
    let bottomLeft = projection.2
    let bottomLeftProjected = camera.projectPoint(bottomLeft,
                                                  orientation: .portrait,
                                                  viewportSize: size)
    let bottomRight = projection.3
    let bottomRightProjected = camera.projectPoint(bottomRight,
                                                   orientation: .portrait,
                                                   viewportSize: size)
    let result = CGRect(origin: topLeftProjected,
                        size: CGSize(width: topRightProjected.distance(point: topLeftProjected),
                                     height: bottomRightProjected.distance(point: bottomLeftProjected)))
    return result
}
This function works pretty well when I am in front of the world origin. However, if I move left or right, the calculation of the corner points stops working.
I found a solution that gets the corner 3D points of an ARImageAnchor from the anchor.transform and projects them to 2D space:
extension simd_float4 {
    var vector_float3: vector_float3 { return simd_float3([x, y, z]) }
}

/// Returns the projection of an `ARImageAnchor` from the 3D world space
/// detected by ARKit into the 2D space of a view rendering the scene.
///
/// - Parameter anchor: An anchor instance to project.
/// - Returns: An optional `CGRect` corresponding to the `ARImageAnchor` projection.
internal func projection(from anchor: ARImageAnchor) -> CGRect? {
    guard let camera = session.currentFrame?.camera else {
        return nil
    }
    let refImg = anchor.referenceImage
    let transform = anchor.transform.transpose
    let size = view.bounds.size
    let width = Float(refImg.physicalSize.width / 2)
    let height = Float(refImg.physicalSize.height / 2)
    // Get corner 3D points
    let pointsWorldSpace = [
        matrix_multiply(simd_float4([width, 0, -height, 1]), transform).vector_float3,  // top right
        matrix_multiply(simd_float4([width, 0, height, 1]), transform).vector_float3,   // bottom right
        matrix_multiply(simd_float4([-width, 0, -height, 1]), transform).vector_float3, // bottom left
        matrix_multiply(simd_float4([-width, 0, height, 1]), transform).vector_float3   // top left
    ]
    // Project each 3D point into 2D space
    let pointsViewportSpace = pointsWorldSpace.map { (point) -> CGPoint in
        return camera.projectPoint(
            point,
            orientation: .portrait,
            viewportSize: size
        )
    }
    // Create a rectangle of the projection to calculate the
    // Intersection over Union with other `ARImageAnchor`s
    let result = CGRect(
        origin: pointsViewportSpace[3],
        size: CGSize(
            width: pointsViewportSpace[0].distance(point: pointsViewportSpace[3]),
            height: pointsViewportSpace[1].distance(point: pointsViewportSpace[2])
        )
    )
    return result
}
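Note that both snippets rely on a CGPoint.distance(point:) helper that the post never defines. A minimal version, plus the Intersection-over-Union calculation the last comment alludes to, might look like this (my sketch, not from the original answer):

extension CGPoint {
    /// Euclidean distance to another point
    func distance(point: CGPoint) -> CGFloat {
        let dx = point.x - x
        let dy = point.y - y
        return (dx * dx + dy * dy).squareRoot()
    }
}

/// Intersection over Union of two projected rectangles, e.g. for comparing
/// this projection against another ARImageAnchor's projection
func intersectionOverUnion(_ a: CGRect, _ b: CGRect) -> CGFloat {
    let overlap = a.intersection(b)
    guard !overlap.isNull else { return 0 }
    let overlapArea = overlap.width * overlap.height
    let unionArea = a.width * a.height + b.width * b.height - overlapArea
    return unionArea > 0 ? overlapArea / unionArea : 0
}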

Swift 4: How to create a face map with ios11 vision framework from face landmark points

I am using the iOS 11 Vision framework to get face landmark points in real time. I am able to get the landmark points and overlay the camera layer with a UIBezierPath of them. However, I would like to get something like the bottom-right picture. Currently I have something that looks like the left picture, and I tried looping through the points and adding midpoints, but I don't know how to generate all those triangles from the points. How would I go about generating the map on the right from the points on the left?
I'm not sure it's possible with all the points I have. Not that it will help much, but I also have points from the bounding box of the entire face. Lastly, is there any framework that would let me recognize all the points I need, such as OpenCV or something else? Please let me know. Thanks!
Here is the code I've been using from https://github.com/DroidsOnRoids/VisionFaceDetection:
func detectLandmarks(on image: CIImage) {
    try? faceLandmarksDetectionRequest.perform([faceLandmarks], on: image)
    if let landmarksResults = faceLandmarks.results as? [VNFaceObservation] {
        for observation in landmarksResults {
            DispatchQueue.main.async {
                if let boundingBox = self.faceLandmarks.inputFaceObservations?.first?.boundingBox {
                    let faceBoundingBox = boundingBox.scaled(to: self.view.bounds.size)
                    // Different types of landmarks
                    let faceContour = observation.landmarks?.faceContour
                    self.convertPointsForFace(faceContour, faceBoundingBox)
                    let leftEye = observation.landmarks?.leftEye
                    self.convertPointsForFace(leftEye, faceBoundingBox)
                    let rightEye = observation.landmarks?.rightEye
                    self.convertPointsForFace(rightEye, faceBoundingBox)
                    let leftPupil = observation.landmarks?.leftPupil
                    self.convertPointsForFace(leftPupil, faceBoundingBox)
                    let rightPupil = observation.landmarks?.rightPupil
                    self.convertPointsForFace(rightPupil, faceBoundingBox)
                    let nose = observation.landmarks?.nose
                    self.convertPointsForFace(nose, faceBoundingBox)
                    let lips = observation.landmarks?.innerLips
                    self.convertPointsForFace(lips, faceBoundingBox)
                    let leftEyebrow = observation.landmarks?.leftEyebrow
                    self.convertPointsForFace(leftEyebrow, faceBoundingBox)
                    let rightEyebrow = observation.landmarks?.rightEyebrow
                    self.convertPointsForFace(rightEyebrow, faceBoundingBox)
                    let noseCrest = observation.landmarks?.noseCrest
                    self.convertPointsForFace(noseCrest, faceBoundingBox)
                    let outerLips = observation.landmarks?.outerLips
                    self.convertPointsForFace(outerLips, faceBoundingBox)
                }
            }
        }
    }
}

func convertPointsForFace(_ landmark: VNFaceLandmarkRegion2D?, _ boundingBox: CGRect) {
    if let points = landmark?.points, let count = landmark?.pointCount {
        let convertedPoints = convert(points, with: count)
        let faceLandmarkPoints = convertedPoints.map { (point: (x: CGFloat, y: CGFloat)) -> (x: CGFloat, y: CGFloat) in
            let pointX = point.x * boundingBox.width + boundingBox.origin.x
            let pointY = point.y * boundingBox.height + boundingBox.origin.y
            return (x: pointX, y: pointY)
        }
        DispatchQueue.main.async {
            self.draw(points: faceLandmarkPoints)
        }
    }
}

func draw(points: [(x: CGFloat, y: CGFloat)]) {
    let newLayer = CAShapeLayer()
    newLayer.strokeColor = UIColor.blue.cgColor
    newLayer.lineWidth = 4.0
    let path = UIBezierPath()
    path.move(to: CGPoint(x: points[0].x, y: points[0].y))
    for i in 0..<points.count - 1 {
        let point = CGPoint(x: points[i].x, y: points[i].y)
        path.addLine(to: point)
        path.move(to: point)
    }
    path.addLine(to: CGPoint(x: points[0].x, y: points[0].y))
    newLayer.path = path.cgPath
    shapeLayer.addSublayer(newLayer)
}
I did end up finding a solution that works. I used Delaunay triangulation via https://github.com/AlexLittlejohn/DelaunaySwift, and I modified it to work with the points generated by the Vision framework's face landmark detection request. This is not easily explained with a code snippet, so I have linked my GitHub repo below, which shows my solution. Note that this doesn't get the points from the forehead, as the Vision framework only provides points from the eyebrows down.
https://github.com/ahashim1/Face
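If you want to try the same route, the first step is just flattening every landmark region into a single point array in image coordinates before triangulating. A rough sketch (my own, not from the repo; the region list is illustrative and the triangulation call itself depends on the library):

import Vision

func allLandmarkPoints(from observation: VNFaceObservation, imageSize: CGSize) -> [CGPoint] {
    guard let landmarks = observation.landmarks else { return [] }
    let regions: [VNFaceLandmarkRegion2D?] = [
        landmarks.faceContour, landmarks.leftEyebrow, landmarks.rightEyebrow,
        landmarks.leftEye, landmarks.rightEye, landmarks.nose,
        landmarks.noseCrest, landmarks.outerLips, landmarks.innerLips
    ]
    // pointsInImage(imageSize:) converts each region's normalized points
    // into the coordinate space of an image of the given size
    return regions.compactMap { $0 }
                  .flatMap { $0.pointsInImage(imageSize: imageSize) }
}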
What you want in the image on the right is a Candide mesh. You need to map these points to that mesh and that will be it. I don't think you need to go the route that has been discussed in the comments.
P.S. I found Candide while going through the APK contents of a famous filters app (reminds me of Casper); I haven't had the time to try it myself yet.

Draw a grid with SpriteKit

What would be the best way to draw a grid like this using the SpriteKit 2D game engine?
Requirements:
Input the number of columns and rows programmatically (5x5, 10x3, 3x4, etc.).
Draw it programmatically using something like SKSpriteNode or SKShapeNode, since just using images of a square doesn't seem very efficient to me.
The squares should have a fixed size (let's say each is 40x40).
The grid should be vertically and horizontally centred in the view.
I'm planning to use an SKSpriteNode (from an image) as a player moving between squares in this grid.
So I'll save the central point (x, y) of each square in a two-dimensional array and then move the player from its current position to that position. If you have a better suggestion for this too, I'd like to hear it.
I would appreciate a solution in Swift (preferably 2.1), but Objective-C would do too. I'm planning on using this only on iPhone devices.
My question is close to this one. Any help is appreciated.
I suggest you implement the grid as the texture of an SKSpriteNode, because Sprite Kit will render the grid in a single draw call. Here's an example of how to do that:
class Grid: SKSpriteNode {
    var rows: Int!
    var cols: Int!
    var blockSize: CGFloat!

    convenience init?(blockSize: CGFloat, rows: Int, cols: Int) {
        guard let texture = Grid.gridTexture(blockSize: blockSize, rows: rows, cols: cols) else {
            return nil
        }
        self.init(texture: texture, color: SKColor.clear, size: texture.size())
        self.blockSize = blockSize
        self.rows = rows
        self.cols = cols
    }

    class func gridTexture(blockSize: CGFloat, rows: Int, cols: Int) -> SKTexture? {
        // Add 1 to the height and width to ensure the borders are within the sprite
        let size = CGSize(width: CGFloat(cols) * blockSize + 1.0, height: CGFloat(rows) * blockSize + 1.0)
        UIGraphicsBeginImageContext(size)
        guard let context = UIGraphicsGetCurrentContext() else {
            return nil
        }
        let bezierPath = UIBezierPath()
        let offset: CGFloat = 0.5
        // Draw vertical lines
        for i in 0...cols {
            let x = CGFloat(i) * blockSize + offset
            bezierPath.move(to: CGPoint(x: x, y: 0))
            bezierPath.addLine(to: CGPoint(x: x, y: size.height))
        }
        // Draw horizontal lines
        for i in 0...rows {
            let y = CGFloat(i) * blockSize + offset
            bezierPath.move(to: CGPoint(x: 0, y: y))
            bezierPath.addLine(to: CGPoint(x: size.width, y: y))
        }
        SKColor.white.setStroke()
        bezierPath.lineWidth = 1.0
        bezierPath.stroke()
        context.addPath(bezierPath.cgPath)
        let image = UIGraphicsGetImageFromCurrentImageContext()
        UIGraphicsEndImageContext()
        return SKTexture(image: image!)
    }

    func gridPosition(row: Int, col: Int) -> CGPoint {
        let offset = blockSize / 2.0 + 0.5
        let x = CGFloat(col) * blockSize - (blockSize * CGFloat(cols)) / 2.0 + offset
        let y = CGFloat(rows - row - 1) * blockSize - (blockSize * CGFloat(rows)) / 2.0 + offset
        return CGPoint(x: x, y: y)
    }
}
And here's how to create a grid and add a game piece to it:
class GameScene: SKScene {
    override func didMove(to view: SKView) {
        if let grid = Grid(blockSize: 40.0, rows: 5, cols: 5) {
            grid.position = CGPoint(x: frame.midX, y: frame.midY)
            addChild(grid)
            let gamePiece = SKSpriteNode(imageNamed: "Spaceship")
            gamePiece.setScale(0.0625)
            gamePiece.position = grid.gridPosition(row: 1, col: 0)
            grid.addChild(gamePiece)
        }
    }
}
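Since the question also asks about moving the player between squares, the same gridPosition(row:col:) helper gives you the target point. A one-line sketch (hypothetical, reusing grid and gamePiece from above):

// Animate the piece to row 3, column 2 of the grid
gamePiece.run(SKAction.move(to: grid.gridPosition(row: 3, col: 2), duration: 0.3))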
Update:
To determine which grid square was touched, add this to the initializer
self.isUserInteractionEnabled = true
and this to the Grid class:
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
    for touch in touches {
        let position = touch.location(in: self)
        let node = atPoint(position)
        if node != self {
            let action = SKAction.rotate(by: CGFloat.pi * 2, duration: 1)
            node.run(action)
        } else {
            // Convert from the grid's center-based coordinates to
            // top-left-based column/row indices
            let x = size.width / 2 + position.x
            let y = size.height / 2 - position.y
            let col = Int(floor(x / blockSize))
            let row = Int(floor(y / blockSize))
            print("\(row) \(col)")
        }
    }
}
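From there it is a small step to act on the touched square instead of just printing it, for example snapping a piece to its center (a hypothetical continuation for the else branch above):

// Move the grid's first child piece to the touched square
if let piece = children.first {
    piece.run(SKAction.move(to: gridPosition(row: row, col: col), duration: 0.2))
}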

Spawning a circle in a random spot on screen

I've been racking my brain and searching here and all over to figure out how to generate a random position on screen to spawn a circle. I'm hoping someone here can help, because I'm completely stumped. Basically, I'm trying to create a shape that always spawns in a random spot on screen when the user touches.
override func touchesBegan(touches: Set<NSObject>, withEvent event: UIEvent) {
    let screenSize: CGRect = UIScreen.mainScreen().bounds
    let screenHeight = screenSize.height
    let screenWidth = screenSize.width
    let currentBall = SKShapeNode(circleOfRadius: 100)
    currentBall.position = CGPointMake(CGFloat(arc4random_uniform(UInt32(Float(screenWidth)))), CGFloat(arc4random_uniform(UInt32(Float(screenHeight)))))
    self.removeAllChildren()
    self.addChild(currentBall)
}
If you need more of my code, there really isn't any more. Thank you for whatever help you can give! (Just to reiterate, this code kind of works, but the majority of the spawned balls seem to spawn offscreen.)
The problem there is that your scene is bigger than your screen bounds:
let viewMidX = view!.bounds.midX
let viewMidY = view!.bounds.midY
print(viewMidX)
print(viewMidY)
let sceneHeight = view!.scene!.frame.height
let sceneWidth = view!.scene!.frame.width
print(sceneWidth)
print(sceneHeight)
let currentBall = SKShapeNode(circleOfRadius: 100)
currentBall.fillColor = .green
// Map the view's bounds into the scene's coordinate space before randomizing
let x = view!.scene!.frame.midX - viewMidX + CGFloat(arc4random_uniform(UInt32(viewMidX * 2)))
let y = view!.scene!.frame.midY - viewMidY + CGFloat(arc4random_uniform(UInt32(viewMidY * 2)))
print(x)
print(y)
currentBall.position = CGPoint(x: x, y: y)
// Add the ball once (adding the same node to the scene twice would crash)
self.removeAllChildren()
self.addChild(currentBall)
First: Determine the area that will be valid. It might not be the frame of the superview, because the ball (let's call it ballView) might get cut off. The valid area will likely be (in pseudocode):
CGSize(width of the superview - width of ballView, height of the superview - height of ballView)
Once you have a view of that size, just place it on screen with the origin at (0, 0).
Second: Now you have a range of valid coordinates. Just use a random function (like the one you are using) to select one of them.
Create a Swift file with the following:
extension Int {
    static func random(range: Range<Int>) -> Int {
        var offset = 0
        if range.lowerBound < 0 { // allow negative ranges
            offset = abs(range.lowerBound)
        }
        let mini = UInt32(range.lowerBound + offset)
        let maxi = UInt32(range.upperBound + offset)
        return Int(mini + arc4random_uniform(maxi - mini)) - offset
    }
}
And now you can get a random integer as follows:
Int.random(range: 1..<1001) // generate a random integer from 1 to 1000
You can generate the values for the x and y coordinates now using this function.
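Applied to the question's snippet (hypothetical; screenWidth, screenHeight, and currentBall come from the code in the question), that could look like:

// Offset the random range by the ball's radius so it stays fully onscreen
let radius = 100
let x = CGFloat(Int.random(range: radius ..< (Int(screenWidth) - radius)))
let y = CGFloat(Int.random(range: radius ..< (Int(screenHeight) - radius)))
currentBall.position = CGPoint(x: x, y: y)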
Given the following random generators:
public extension CGFloat {
    static var random: CGFloat { return CGFloat(arc4random()) / CGFloat(UInt32.max) }
    static func random(between x: CGFloat, and y: CGFloat) -> CGFloat {
        let (start, end) = x < y ? (x, y) : (y, x)
        return start + CGFloat.random * (end - start)
    }
}

public extension CGRect {
    var randomPoint: CGPoint {
        var point = CGPoint()
        point.x = CGFloat.random(between: origin.x, and: origin.x + width)
        point.y = CGFloat.random(between: origin.y, and: origin.y + height)
        return point
    }
}
You can paste the following into a playground:
import PlaygroundSupport
import SpriteKit

let view = SKView(frame: CGRect(x: 0, y: 0, width: 500, height: 500))
PlaygroundPage.current.liveView = view
let scene = SKScene(size: view.frame.size)
view.presentScene(scene)

let wait = SKAction.wait(forDuration: 0.5)
let popIn = SKAction.scale(to: 1, duration: 0.25)
let popOut = SKAction.scale(to: 0, duration: 0.25)
let remove = SKAction.removeFromParent()
let popInAndOut = SKAction.sequence([popIn, wait, popOut, remove])

let addBall = SKAction.run { [unowned scene] in
    let ballRadius: CGFloat = 25
    let ball = SKShapeNode(circleOfRadius: ballRadius)
    // Inset the spawn area so the whole ball stays onscreen
    let popInArea = scene.frame.insetBy(dx: ballRadius, dy: ballRadius)
    ball.position = popInArea.randomPoint
    ball.xScale = 0
    ball.yScale = 0
    ball.run(popInAndOut)
    scene.addChild(ball)
}

scene.run(SKAction.repeatForever(SKAction.sequence([addBall, wait])))
(Just make sure to paste in the random generators as well, or copy them into the playground's Sources, and show the live view so you can watch the animation.)
