Show depth data with ARKit and MetalKit - ios

I am total beginner in Swift & iOS, and I am trying to:
Visualise the depth map on the phone screen, instead of the actual video recording.
Save both the RGB and depth data stream.
I am currently stuck on the first one. I am using ARKit4 with MetalKit. It seems that I can get the depth data from the frame, but the visualisation that I am rendering is really bad. According to the ARKit4 video (https://youtu.be/SpZyxHkmfqE?t=1132 - with timestamp), the quality of the depth map is really low, the colors are actually different, and the distant objects are not shown at all (of course, I do not mean really distant objects, but even on ~1m it already completely fails in the indoor static environment). Examples are in the bottom of the question.
My ViewController.swift:
import UIKit
import Metal
import MetalKit
import ARKit
extension MTKView : RenderDestinationProvider {
}
class ViewController: UIViewController, MTKViewDelegate, ARSessionDelegate {
var session: ARSession!
var configuration = ARWorldTrackingConfiguration()
var renderer: Renderer!
var depthBuffer: CVPixelBuffer!
var confidenceBuffer: CVPixelBuffer!
override func viewDidLoad() {
super.viewDidLoad()
// Set the view's delegate
session = ARSession()
session.delegate = self
// Set the view to use the default device
if let view = self.view as? MTKView {
view.device = MTLCreateSystemDefaultDevice()
view.backgroundColor = UIColor.clear
view.delegate = self
guard view.device != nil else {
print("Metal is not supported on this device")
return
}
// Configure the renderer to draw to the view
renderer = Renderer(session: session, metalDevice: view.device!, renderDestination: view)
renderer.drawRectResized(size: view.bounds.size)
}
//let tapGesture = UITapGestureRecognizer(target: self, action: #selector(ViewController.handleTap(gestureRecognize:)))
//view.addGestureRecognizer(tapGesture)
}
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(animated)
// Create a session configuration
//let configuration = ARWorldTrackingConfiguration()
configuration.frameSemantics = .sceneDepth
// Run the view's session
session.run(configuration)
UIApplication.shared.isIdleTimerDisabled = true
}
override func viewWillDisappear(_ animated: Bool) {
super.viewWillDisappear(animated)
// Pause the view's session
session.pause()
}
/*#objc
func handleTap(gestureRecognize: UITapGestureRecognizer) {
// Create anchor using the camera's current position
if let currentFrame = session.currentFrame {
// Create a transform with a translation of 0.2 meters in front of the camera
var translation = matrix_identity_float4x4
translation.columns.3.z = -0.2
let transform = simd_mul(currentFrame.camera.transform, translation)
// Add a new anchor to the session
let anchor = ARAnchor(transform: transform)
session.add(anchor: anchor)
}
}
*/
// MARK: - MTKViewDelegate
// Called whenever view changes orientation or layout is changed
func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
renderer.drawRectResized(size: size)
}
// Called whenever the view needs to render
func draw(in view: MTKView) {
renderer.update()
}
// MARK: - ARSessionDelegate
func session(_ session: ARSession, didFailWithError error: Error) {
// Present an error message to the user
}
func sessionWasInterrupted(_ session: ARSession) {
// Inform the user that the session has been interrupted, for example, by presenting an overlay
}
func sessionInterruptionEnded(_ session: ARSession) {
// Reset tracking and/or remove existing anchors if consistent tracking is required
}
}
My Renderer.swift (only the modified functions updateCaptureImageTextures(frame: ARFrame) and drawCapturedImage(renderEncoder: MTLRenderCommandEncoder):
import Foundation
import Metal
import MetalKit
import ARKit
protocol RenderDestinationProvider {
var currentRenderPassDescriptor: MTLRenderPassDescriptor? { get }
var currentDrawable: CAMetalDrawable? { get }
var colorPixelFormat: MTLPixelFormat { get set }
var depthStencilPixelFormat: MTLPixelFormat { get set }
var sampleCount: Int { get set }
}
// The max number of command buffers in flight
let kMaxBuffersInFlight: Int = 3
// The max number anchors our uniform buffer will hold
let kMaxAnchorInstanceCount: Int = 64
// The 16 byte aligned size of our uniform structures
let kAlignedSharedUniformsSize: Int = (MemoryLayout<SharedUniforms>.size & ~0xFF) + 0x100
let kAlignedInstanceUniformsSize: Int = ((MemoryLayout<InstanceUniforms>.size * kMaxAnchorInstanceCount) & ~0xFF) + 0x100
// Vertex data for an image plane
let kImagePlaneVertexData: [Float] = [
-1.0, -1.0, 0.0, 1.0,
1.0, -1.0, 1.0, 1.0,
-1.0, 1.0, 0.0, 0.0,
1.0, 1.0, 1.0, 0.0,
]
class Renderer {
let session: ARSession
let device: MTLDevice
let inFlightSemaphore = DispatchSemaphore(value: kMaxBuffersInFlight)
var renderDestination: RenderDestinationProvider
// Metal objects
var commandQueue: MTLCommandQueue!
var sharedUniformBuffer: MTLBuffer!
var anchorUniformBuffer: MTLBuffer!
var imagePlaneVertexBuffer: MTLBuffer!
var capturedImagePipelineState: MTLRenderPipelineState!
var capturedImageDepthState: MTLDepthStencilState!
var anchorPipelineState: MTLRenderPipelineState!
var anchorDepthState: MTLDepthStencilState!
var capturedImageTextureY: CVMetalTexture?
var capturedImageTextureCbCr: CVMetalTexture?
// Captured image texture cache
var capturedImageTextureCache: CVMetalTextureCache!
// Metal vertex descriptor specifying how vertices will by laid out for input into our
// anchor geometry render pipeline and how we'll layout our Model IO vertices
var geometryVertexDescriptor: MTLVertexDescriptor!
// MetalKit mesh containing vertex data and index buffer for our anchor geometry
var cubeMesh: MTKMesh!
// Used to determine _uniformBufferStride each frame.
// This is the current frame number modulo kMaxBuffersInFlight
var uniformBufferIndex: Int = 0
// Offset within _sharedUniformBuffer to set for the current frame
var sharedUniformBufferOffset: Int = 0
// Offset within _anchorUniformBuffer to set for the current frame
var anchorUniformBufferOffset: Int = 0
// Addresses to write shared uniforms to each frame
var sharedUniformBufferAddress: UnsafeMutableRawPointer!
// Addresses to write anchor uniforms to each frame
var anchorUniformBufferAddress: UnsafeMutableRawPointer!
// The number of anchor instances to render
var anchorInstanceCount: Int = 0
// The current viewport size
var viewportSize: CGSize = CGSize()
// Flag for viewport size changes
var viewportSizeDidChange: Bool = false
var depthTexture: CVMetalTexture?
var confidenceTexture: CVMetalTexture?
.......................................
func updateCapturedImageTextures(frame: ARFrame) {
// Create two textures (Y and CbCr) from the provided frame's captured image
//
guard let depthData = frame.sceneDepth ?? frame.sceneDepth else { return }
var pixelBufferDepth: CVPixelBuffer!
pixelBufferDepth = depthData.depthMap
var texturePixelFormat: MTLPixelFormat!
setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
depthTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)
pixelBufferDepth = depthData.confidenceMap
setMTLPixelFormat(&texturePixelFormat, basedOn: pixelBufferDepth)
confidenceTexture = createTexture(fromPixelBuffer: pixelBufferDepth, pixelFormat: texturePixelFormat, planeIndex: 0)
let pixelBuffer = frame.capturedImage
if (CVPixelBufferGetPlaneCount(pixelBuffer) < 2) {
return
}
capturedImageTextureY = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.r8Unorm, planeIndex:0)
capturedImageTextureCbCr = createTexture(fromPixelBuffer: pixelBuffer, pixelFormat:.rg8Unorm, planeIndex:1)
}
func createTexture(fromPixelBuffer pixelBuffer: CVPixelBuffer, pixelFormat: MTLPixelFormat, planeIndex: Int) -> CVMetalTexture? {
let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, planeIndex)
let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, planeIndex)
var texture: CVMetalTexture? = nil
let status = CVMetalTextureCacheCreateTextureFromImage(nil, capturedImageTextureCache, pixelBuffer, nil, pixelFormat, width, height, planeIndex, &texture)
if status != kCVReturnSuccess {
texture = nil
}
return texture
}
func drawCapturedImage(renderEncoder: MTLRenderCommandEncoder) {
guard let textureY = capturedImageTextureY, let textureCbCr = capturedImageTextureCbCr, let depthTexture = depthTexture, let confidenceTexture = confidenceTexture else {
return
}
// Push a debug group allowing us to identify render commands in the GPU Frame Capture tool
renderEncoder.pushDebugGroup("DrawCapturedImage")
// Set render command encoder state
renderEncoder.setCullMode(.none)
renderEncoder.setRenderPipelineState(capturedImagePipelineState)
renderEncoder.setDepthStencilState(capturedImageDepthState)
// Set mesh's vertex buffers
renderEncoder.setVertexBuffer(imagePlaneVertexBuffer, offset: 0, index: Int(kBufferIndexMeshPositions.rawValue))
// Set any textures read/sampled from our render pipeline
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureY), index: Int(kTextureIndexY.rawValue))
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(textureCbCr), index: Int(kTextureIndexCbCr.rawValue))
renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2)
//renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(confidenceTexture), index: 3)
// Draw each submesh of our mesh
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
renderEncoder.popDebugGroup()
}
}
Everything else is the same like in MetalKit default template of Xcode.
So, do I access the data in some wrong way? Do I have some configuration parameters wrong? Do I just render the depth map in some bad way? Or the sensor on new iPhone just really has so bad data (though does not look like, as I have managed to acquire decent 3D point clouds with some apps from AppStore, even on distance of 3-4 meters).
Update: I've figured out that the quality is better if I change renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 2) to renderEncoder.setFragmentTexture(CVMetalTextureGetTexture(depthTexture), index: 1). This is, however, just a random observation because the documentation is... well, not very extensive. The rendered image is, however, still green-to-white, while I want it to be either grayscale, or looking as the RGB map shown in the referenced video (that would be perfect, but the grayscale version would be enough).

Related

Number text recognition not highlighting/recognizing text

I am following the apple phone number recognition sample. Normally it creates a red outline around the recognized text. Mine does not seem to do recognizing the text and creating the red outline even though I used their code. The only difference is my view controller class is called "TextScanViewController" where their's is just "ViewController". I went through and made sure that any "ViewControllers" were changed to "TextScanViewController". Am I missing something else that I should change?
Here is what it should look like (when I use the original Apple project) compared to what it is doing (should have red outlines but is not showing them even if the text is perfectly in the center of the rectangle)
Should look like:
Looks like:
There are 5 different swift files I am using (PreviewView, TextScanViewController, VisionViewController, StringUtils, AppDelegate)
TextScanViewController:
import UIKit
import AVFoundation
import Vision
class TextScanViewController: UIViewController {
// MARK: - UI objects
#IBOutlet weak var previewView: PreviewView!
#IBOutlet weak var cutoutView: UIView!
#IBOutlet weak var numberView: UILabel!
var maskLayer = CAShapeLayer()
// Device orientation. Updated whenever the orientation changes to a
// different supported orientation.
var currentOrientation = UIDeviceOrientation.portrait
// MARK: - Capture related objects
private let captureSession = AVCaptureSession()
let captureSessionQueue = DispatchQueue(label: "com.example.apple-samplecode.CaptureSessionQueue")
var captureDevice: AVCaptureDevice?
var videoDataOutput = AVCaptureVideoDataOutput()
let videoDataOutputQueue = DispatchQueue(label: "com.example.apple-samplecode.VideoDataOutputQueue")
// MARK: - Region of interest (ROI) and text orientation
// Region of video data output buffer that recognition should be run on.
// Gets recalculated once the bounds of the preview layer are known.
var regionOfInterest = CGRect(x: 0, y: 0, width: 1, height: 1)
// Orientation of text to search for in the region of interest.
var textOrientation = CGImagePropertyOrientation.up
// MARK: - Coordinate transforms
var bufferAspectRatio: Double!
// Transform from UI orientation to buffer orientation.
var uiRotationTransform = CGAffineTransform.identity
// Transform bottom-left coordinates to top-left.
var bottomToTopTransform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
// Transform coordinates in ROI to global coordinates (still normalized).
var roiToGlobalTransform = CGAffineTransform.identity
// Vision -> AVF coordinate transform.
var visionToAVFTransform = CGAffineTransform.identity
// MARK: - View controller methods
override func viewDidLoad() {
super.viewDidLoad()
// Set up preview view.
previewView.session = captureSession
// Set up cutout view.
cutoutView.backgroundColor = UIColor.gray.withAlphaComponent(0.5)
maskLayer.backgroundColor = UIColor.clear.cgColor
maskLayer.fillRule = .evenOdd
cutoutView.layer.mask = maskLayer
// Starting the capture session is a blocking call. Perform setup using
// a dedicated serial dispatch queue to prevent blocking the main thread.
captureSessionQueue.async {
self.setupCamera()
// Calculate region of interest now that the camera is setup.
DispatchQueue.main.async {
// Figure out initial ROI.
self.calculateRegionOfInterest()
}
}
}
override func viewWillTransition(to size: CGSize, with coordinator: UIViewControllerTransitionCoordinator) {
super.viewWillTransition(to: size, with: coordinator)
// Only change the current orientation if the new one is landscape or
// portrait. You can't really do anything about flat or unknown.
let deviceOrientation = UIDevice.current.orientation
if deviceOrientation.isPortrait || deviceOrientation.isLandscape {
currentOrientation = deviceOrientation
}
// Handle device orientation in the preview layer.
if let videoPreviewLayerConnection = previewView.videoPreviewLayer.connection {
if let newVideoOrientation = AVCaptureVideoOrientation(deviceOrientation: deviceOrientation) {
videoPreviewLayerConnection.videoOrientation = newVideoOrientation
}
}
// Orientation changed: figure out new region of interest (ROI).
calculateRegionOfInterest()
}
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
updateCutout()
}
// MARK: - Setup
func calculateRegionOfInterest() {
// In landscape orientation the desired ROI is specified as the ratio of
// buffer width to height. When the UI is rotated to portrait, keep the
// vertical size the same (in buffer pixels). Also try to keep the
// horizontal size the same up to a maximum ratio.
let desiredHeightRatio = 0.15
let desiredWidthRatio = 0.6
let maxPortraitWidth = 0.8
// Figure out size of ROI.
let size: CGSize
if currentOrientation.isPortrait || currentOrientation == .unknown {
size = CGSize(width: min(desiredWidthRatio * bufferAspectRatio, maxPortraitWidth), height: desiredHeightRatio / bufferAspectRatio)
} else {
size = CGSize(width: desiredWidthRatio, height: desiredHeightRatio)
}
// Make it centered.
regionOfInterest.origin = CGPoint(x: (1 - size.width) / 2, y: (1 - size.height) / 2)
regionOfInterest.size = size
// ROI changed, update transform.
setupOrientationAndTransform()
// Update the cutout to match the new ROI.
DispatchQueue.main.async {
// Wait for the next run cycle before updating the cutout. This
// ensures that the preview layer already has its new orientation.
self.updateCutout()
}
}
func updateCutout() {
// Figure out where the cutout ends up in layer coordinates.
let roiRectTransform = bottomToTopTransform.concatenating(uiRotationTransform)
let cutout = previewView.videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: regionOfInterest.applying(roiRectTransform))
// Create the mask.
let path = UIBezierPath(rect: cutoutView.frame)
path.append(UIBezierPath(rect: cutout))
maskLayer.path = path.cgPath
// Move the number view down to under cutout.
var numFrame = cutout
numFrame.origin.y += numFrame.size.height
numberView.frame = numFrame
}
func setupOrientationAndTransform() {
// Recalculate the affine transform between Vision coordinates and AVF coordinates.
// Compensate for region of interest.
let roi = regionOfInterest
roiToGlobalTransform = CGAffineTransform(translationX: roi.origin.x, y: roi.origin.y).scaledBy(x: roi.width, y: roi.height)
// Compensate for orientation (buffers always come in the same orientation).
switch currentOrientation {
case .landscapeLeft:
textOrientation = CGImagePropertyOrientation.up
uiRotationTransform = CGAffineTransform.identity
case .landscapeRight:
textOrientation = CGImagePropertyOrientation.down
uiRotationTransform = CGAffineTransform(translationX: 1, y: 1).rotated(by: CGFloat.pi)
case .portraitUpsideDown:
textOrientation = CGImagePropertyOrientation.left
uiRotationTransform = CGAffineTransform(translationX: 1, y: 0).rotated(by: CGFloat.pi / 2)
default: // We default everything else to .portraitUp
textOrientation = CGImagePropertyOrientation.right
uiRotationTransform = CGAffineTransform(translationX: 0, y: 1).rotated(by: -CGFloat.pi / 2)
}
// Full Vision ROI to AVF transform.
visionToAVFTransform = roiToGlobalTransform.concatenating(bottomToTopTransform).concatenating(uiRotationTransform)
}
func setupCamera() {
guard let captureDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: AVMediaType.video, position: .back) else {
print("Could not create capture device.")
return
}
self.captureDevice = captureDevice
// NOTE:
// Requesting 4k buffers allows recognition of smaller text but will
// consume more power. Use the smallest buffer size necessary to keep
// down battery usage.
if captureDevice.supportsSessionPreset(.hd4K3840x2160) {
captureSession.sessionPreset = AVCaptureSession.Preset.hd4K3840x2160
bufferAspectRatio = 3840.0 / 2160.0
} else {
captureSession.sessionPreset = AVCaptureSession.Preset.hd1920x1080
bufferAspectRatio = 1920.0 / 1080.0
}
guard let deviceInput = try? AVCaptureDeviceInput(device: captureDevice) else {
print("Could not create device input.")
return
}
if captureSession.canAddInput(deviceInput) {
captureSession.addInput(deviceInput)
}
// Configure video data output.
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_420YpCbCr8BiPlanarFullRange]
if captureSession.canAddOutput(videoDataOutput) {
captureSession.addOutput(videoDataOutput)
// NOTE:
// There is a trade-off to be made here. Enabling stabilization will
// give temporally more stable results and should help the recognizer
// converge. But if it's enabled the VideoDataOutput buffers don't
// match what's displayed on screen, which makes drawing bounding
// boxes very hard. Disable it in this app to allow drawing detected
// bounding boxes on screen.
videoDataOutput.connection(with: AVMediaType.video)?.preferredVideoStabilizationMode = .off
} else {
print("Could not add VDO output")
return
}
// Set zoom and autofocus to help focus on very small text.
do {
try captureDevice.lockForConfiguration()
captureDevice.videoZoomFactor = 2
captureDevice.autoFocusRangeRestriction = .near
captureDevice.unlockForConfiguration()
} catch {
print("Could not set zoom level due to error: \(error)")
return
}
captureSession.startRunning()
}
// MARK: - UI drawing and interaction
func showString(string: String) {
// Found a definite number.
// Stop the camera synchronously to ensure that no further buffers are
// received. Then update the number view asynchronously.
captureSessionQueue.sync {
self.captureSession.stopRunning()
DispatchQueue.main.async {
self.numberView.text = string
self.numberView.isHidden = false
}
}
}
#IBAction func handleTap(_ sender: UITapGestureRecognizer) {
captureSessionQueue.async {
if !self.captureSession.isRunning {
self.captureSession.startRunning()
}
DispatchQueue.main.async {
self.numberView.isHidden = true
}
}
}
}
// MARK: - AVCaptureVideoDataOutputSampleBufferDelegate
extension TextScanViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// This is implemented in VisionViewController.
}
}
// MARK: - Utility extensions
extension AVCaptureVideoOrientation {
init?(deviceOrientation: UIDeviceOrientation) {
switch deviceOrientation {
case .portrait: self = .portrait
case .portraitUpsideDown: self = .portraitUpsideDown
case .landscapeLeft: self = .landscapeRight
case .landscapeRight: self = .landscapeLeft
default: return nil
}
}
}
PreviewView:
import Foundation
import UIKit
import AVFoundation
class PreviewView: UIView {
var videoPreviewLayer: AVCaptureVideoPreviewLayer {
guard let layer = layer as? AVCaptureVideoPreviewLayer else {
fatalError("Expected `AVCaptureVideoPreviewLayer` type for layer. Check PreviewView.layerClass implementation.")
}
return layer
}
var session: AVCaptureSession? {
get {
return videoPreviewLayer.session
}
set {
videoPreviewLayer.session = newValue
}
}
// MARK: UIView
override class var layerClass: AnyClass {
return AVCaptureVideoPreviewLayer.self
}
}
VisionViewController:
import UIKit
import AVFoundation
import Vision
class VisionViewController: TextScanViewController {
var request: VNRecognizeTextRequest!
// Temporal string tracker
let numberTracker = StringTracker()
override func viewDidLoad() {
// Set up vision request before letting ViewController set up the camera
// so that it exists when the first buffer is received.
request = VNRecognizeTextRequest(completionHandler: recognizeTextHandler)
super.viewDidLoad()
}
// MARK: - Text recognition
// Vision recognition handler.
func recognizeTextHandler(request: VNRequest, error: Error?) {
var numbers = [String]()
var redBoxes = [CGRect]() // Shows all recognized text lines
var greenBoxes = [CGRect]() // Shows words that might be serials
guard let results = request.results as? [VNRecognizedTextObservation] else {
return
}
let maximumCandidates = 1
for visionResult in results {
guard let candidate = visionResult.topCandidates(maximumCandidates).first else { continue }
// Draw red boxes around any detected text, and green boxes around
// any detected phone numbers. The phone number may be a substring
// of the visionResult. If a substring, draw a green box around the
// number and a red box around the full string. If the number covers
// the full result only draw the green box.
var numberIsSubstring = true
if let result = candidate.string.extractPhoneNumber() {
let (range, number) = result
// Number may not cover full visionResult. Extract bounding box
// of substring.
if let box = try? candidate.boundingBox(for: range)?.boundingBox {
numbers.append(number)
greenBoxes.append(box)
numberIsSubstring = !(range.lowerBound == candidate.string.startIndex && range.upperBound == candidate.string.endIndex)
}
}
if numberIsSubstring {
redBoxes.append(visionResult.boundingBox)
}
}
// Log any found numbers.
numberTracker.logFrame(strings: numbers)
show(boxGroups: [(color: UIColor.red.cgColor, boxes: redBoxes), (color: UIColor.green.cgColor, boxes: greenBoxes)])
// Check if we have any temporally stable numbers.
if let sureNumber = numberTracker.getStableString() {
showString(string: sureNumber)
numberTracker.reset(string: sureNumber)
}
}
override func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) {
// Configure for running in real-time.
request.recognitionLevel = .fast
// Language correction won't help recognizing phone numbers. It also
// makes recognition slower.
request.usesLanguageCorrection = false
// Only run on the region of interest for maximum speed.
request.regionOfInterest = regionOfInterest
let requestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: textOrientation, options: [:])
do {
try requestHandler.perform([request])
} catch {
print(error)
}
}
}
// MARK: - Bounding box drawing
// Draw a box on screen. Must be called from main queue.
var boxLayer = [CAShapeLayer]()
func draw(rect: CGRect, color: CGColor) {
let layer = CAShapeLayer()
layer.opacity = 0.5
layer.borderColor = color
layer.borderWidth = 2
layer.frame = rect
boxLayer.append(layer)
previewView.videoPreviewLayer.insertSublayer(layer, at: 1)
}
// Remove all drawn boxes. Must be called on main queue.
func removeBoxes() {
for layer in boxLayer {
layer.removeFromSuperlayer()
}
boxLayer.removeAll()
}
typealias ColoredBoxGroup = (color: CGColor, boxes: [CGRect])
// Draws groups of colored boxes.
func show(boxGroups: [ColoredBoxGroup]) {
DispatchQueue.main.async {
let layer = self.previewView.videoPreviewLayer
self.removeBoxes()
for boxGroup in boxGroups {
let color = boxGroup.color
for box in boxGroup.boxes {
let rect = layer.layerRectConverted(fromMetadataOutputRect: box.applying(self.visionToAVFTransform))
self.draw(rect: rect, color: color)
}
}
}
}
}
StringUtils:
import Foundation
extension Character {
// Given a list of allowed characters, try to convert self to those in list
// if not already in it. This handles some common misclassifications for
// characters that are visually similar and can only be correctly recognized
// with more context and/or domain knowledge. Some examples (should be read
// in Menlo or some other font that has different symbols for all characters):
// 1 and l are the same character in Times New Roman
// I and l are the same character in Helvetica
// 0 and O are extremely similar in many fonts
// oO, wW, cC, sS, pP and others only differ by size in many fonts
func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
let conversionTable = [
"s": "S",
"S": "5",
"5": "S",
"o": "O",
"Q": "O",
"O": "0",
"0": "O",
"l": "I",
"I": "1",
"1": "I",
"B": "8",
"8": "B"
]
// Allow a maximum of two substitutions to handle 's' -> 'S' -> '5'.
let maxSubstitutions = 2
var current = String(self)
var counter = 0
while !allowedChars.contains(current) && counter < maxSubstitutions {
if let altChar = conversionTable[current] {
current = altChar
counter += 1
} else {
// Doesn't match anything in our table. Give up.
break
}
}
return current.first!
}
}
extension String {
// Extracts the first US-style phone number found in the string, returning
// the range of the number and the number itself as a tuple.
// Returns nil if no number is found.
func extractPhoneNumber() -> (Range<String.Index>, String)? {
// Do a first pass to find any substring that could be a US phone
// number. This will match the following common patterns and more:
// xxx-xxx-xxxx
// xxx xxx xxxx
// (xxx) xxx-xxxx
// (xxx)xxx-xxxx
// xxx.xxx.xxxx
// xxx xxx-xxxx
// xxx/xxx.xxxx
// +1-xxx-xxx-xxxx
// Note that this doesn't only look for digits since some digits look
// very similar to letters. This is handled later.
let pattern = #"""
(?x) # Verbose regex, allows comments
(?:\+1-?)? # Potential international prefix, may have -
[(]? # Potential opening (
\b(\w{3}) # Capture xxx
[)]? # Potential closing )
[\ -./]? # Potential separator
(\w{3}) # Capture xxx
[\ -./]? # Potential separator
(\w{4})\b # Capture xxxx
"""#
guard let range = self.range(of: pattern, options: .regularExpression, range: nil, locale: nil) else {
// No phone number found.
return nil
}
// Potential number found. Strip out punctuation, whitespace and country
// prefix.
var phoneNumberDigits = ""
let substring = String(self[range])
let nsrange = NSRange(substring.startIndex..., in: substring)
do {
// Extract the characters from the substring.
let regex = try NSRegularExpression(pattern: pattern, options: [])
if let match = regex.firstMatch(in: substring, options: [], range: nsrange) {
for rangeInd in 1 ..< match.numberOfRanges {
let range = match.range(at: rangeInd)
let matchString = (substring as NSString).substring(with: range)
phoneNumberDigits += matchString as String
}
}
} catch {
print("Error \(error) when creating pattern")
}
// Must be exactly 10 digits.
guard phoneNumberDigits.count == 10 else {
return nil
}
// Substitute commonly misrecognized characters, for example: 'S' -> '5' or 'l' -> '1'
var result = ""
let allowedChars = "0123456789"
for var char in phoneNumberDigits {
char = char.getSimilarCharacterIfNotIn(allowedChars: allowedChars)
guard allowedChars.contains(char) else {
return nil
}
result.append(char)
}
return (range, result)
}
}
class StringTracker {
var frameIndex: Int64 = 0
typealias StringObservation = (lastSeen: Int64, count: Int64)
// Dictionary of seen strings. Used to get stable recognition before
// displaying anything.
var seenStrings = [String: StringObservation]()
var bestCount = Int64(0)
var bestString = ""
func logFrame(strings: [String]) {
for string in strings {
if seenStrings[string] == nil {
seenStrings[string] = (lastSeen: Int64(0), count: Int64(-1))
}
seenStrings[string]?.lastSeen = frameIndex
seenStrings[string]?.count += 1
print("Seen \(string) \(seenStrings[string]?.count ?? 0) times")
}
var obsoleteStrings = [String]()
// Go through strings and prune any that have not been seen in while.
// Also find the (non-pruned) string with the greatest count.
for (string, obs) in seenStrings {
// Remove previously seen text after 30 frames (~1s).
if obs.lastSeen < frameIndex - 30 {
obsoleteStrings.append(string)
}
// Find the string with the greatest count.
let count = obs.count
if !obsoleteStrings.contains(string) && count > bestCount {
bestCount = Int64(count)
bestString = string
}
}
// Remove old strings.
for string in obsoleteStrings {
seenStrings.removeValue(forKey: string)
}
frameIndex += 1
}
func getStableString() -> String? {
// Require the recognizer to see the same string at least 10 times.
if bestCount >= 10 {
return bestString
} else {
return nil
}
}
func reset(string: String) {
seenStrings.removeValue(forKey: string)
bestCount = 0
bestString = ""
}
}
AppDelegate:
import UIKit
#UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
var window: UIWindow?
}
I was using the wrong class on the view controller.. instead of it being TextScanViewController it should have been set to Visionviewcontroller... it was skipping a whole class. I didn't realize how classes are inherited and that there was an important order to them. I have a lot to learn but learning a lot! :)

AVCaptureVideoDataOutputSampleBufferDelegate drop frames using CIFilters for video filtering

I have very strange case where AVCaptureVideoDataOutputSampleBufferDelegate drops frames if I use 13 different filter chains. Let me explain:
I have CameraController setup, nothing special, here is my delegate method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
if connection.output?.connection(with: .audio) == nil {
//capture video
// my try to avoid "Out of buffers error", no luck ;(
lastCapturedBuffer = nil
let err = CMSampleBufferCreateCopy(allocator: kCFAllocatorDefault, sampleBuffer: sampleBuffer, sampleBufferOut: &lastCapturedBuffer)
if err == noErr {
}
connection.videoOrientation = .portrait
// getting image
let pixelBuffer = CMSampleBufferGetImageBuffer(lastCapturedBuffer!)
// remove if any
CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: pixelBuffer!)
//remove if any
CVPixelBufferUnlockBaseAddress(pixelBuffer!,CVPixelBufferLockFlags(rawValue: 0))
//CVPixelBufferUnlockBaseAddress(pixelBuffer!, .readOnly)
// transform image to targer resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(lastCapturedBuffer!)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(lastCapturedBuffer!)
}
// apply effect in realtime <- here is problem. If I comment next line, it will be fixed but effect will n't be applied
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(lastCapturedBuffer!)
}
}
} else {
// paused
}
}
I also implemented didDrop delegate method, here is how I figure out why it drops frames:
func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
print("did drop")
var mode: CMAttachmentMode = 0
let reason = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_DroppedFrameReason, attachmentModeOut: &mode)
print("reason \(String(describing: reason))") // Optional(OutOfBuffers)
}
So I did it like a pro and just commented parts of code to find where is the problem. So, it here:
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
FilterManager - is singleton, here is called func:
func applyFilterForCamera(inputImage: CIImage) -> CIImage {
return currentVsFilter!.apply(sourceImage: inputImage)
}
currentVsFilter is object of VSFilter type - here is example of one:
import Foundation
import AVKit
class TestFilter: CustomFilter {
let _name = "Тестовый Фильтр"
let _displayName = "Test Filter"
var tempImage: CIImage?
var final: CGImage?
override func name() -> String {
return _name
}
override func displayName() -> String {
return _displayName
}
override init() {
super.init()
print("Test Filter init")
// setup my custom kernel filter
self.noise.type = GlitchFilter.GlitchType.allCases[2]
}
// this returns composition for playback using AVPlayer
override func composition(asset: AVAsset) -> AVMutableVideoComposition {
let composition = AVMutableVideoComposition(asset: asset, applyingCIFiltersWithHandler: { request in
let inputImage = request.sourceImage.cropped(to: request.sourceImage.extent)
DispatchQueue.global(qos: .userInitiated).async {
let output = self.apply(sourceImage: inputImage, forComposition: true)
request.finish(with: output, context: nil)
}
})
let size = FilterManager.shared.cropRectForOrientation().size
composition.renderSize = size
return composition
}
// this returns actual filtered CIImage, used for both AVPlayer composition and realtime camera
override func apply(sourceImage: CIImage, forComposition: Bool = false) -> CIImage {
// rendered text
tempImage = FilterManager.shared.textRenderedImage()
// some filters chained one by one
self.screenBlend?.setValue(tempImage, forKey: kCIInputImageKey)
self.screenBlend?.setValue(sourceImage, forKey: kCIInputBackgroundImageKey)
self.noise.inputImage = self.screenBlend?.outputImage
self.noise.inputAmount = CGFloat.random(in: 1.0...3.0)
// result
tempImage = self.noise.outputImage
// correct crop
let rect = forComposition ? FilterManager.shared.cropRectForOrientation() : FilterManager.shared.cropRect
final = self.context.createCGImage(tempImage!, from: rect!)
return CIImage(cgImage: final!)
}
}
And now, the most strange thing, I have 30 VSFilters and when I got to 13(switching one by one by UIButton) I got error "Out of Buffer", this one:
kCMSampleBufferDroppedFrameReason_OutOfBuffers
What I tested:
I changed vsFilters order in filters array inside FilterManager singleton - same
I tried switch from first to 12 one by one, then go back - works, but after I switched to 13tn(of 30th from 0) - bug
Looks like it can handle only 12 VSFIlter objects, like if it retains them somehow or maybe it's related to threading, I don't know.
This app made for iOs devices, tested on iPhone X iOs 13.3.1
This is video editor app to apply different effects to both live stream from camera and video files from camera roll
Maybe someone has experience with this?
Have a great day
Best, Victor
Edit 1. If I reinit cameraController(AVCaptureSession. input/output devices) it works but this is ugly option and it adds lag when switching filters
Ok, so I finally won this battle. In case some one else get this "OutOfBuffer" problem, here is my solution
As I figured out, CIFilter grabs CVPixelBuffer and don't release it while filtering images. It's kinda creates one huge buffer, I guess. Strange thing: it don't create memory leak, so I guess it grabs not particular buffer but creates strong reference to it. As rumors(me) say, it can handle only 12 such references.
So, my approach was to copy CVPixelBuffer and then work with it instead of buffer I got from AVCaptureVideoDataOutputSampleBufferDelegate didOutput func
Here is my new code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
//print("camera controller \(id) got frame")
if connection.output?.connection(with: .audio) == nil {
//capture video
connection.videoOrientation = .portrait
// getting image
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
// this works!
let copyBuffer = pixelBuffer.copy()
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: copyBuffer)
//remove if any
// transform image to targer resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(sampleBuffer)
}
self.captured = FilterManager.shared.applyFilterForCamera(inputImage: self.captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(sampleBuffer)
}
}
} else {
// paused
//print("paused camera controller \(id)")
}
}
and there is func to copy buffer:
func copy() -> CVPixelBuffer {
precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")
var _copy : CVPixelBuffer?
CVPixelBufferCreate(
kCFAllocatorDefault,
CVPixelBufferGetWidth(self),
CVPixelBufferGetHeight(self),
CVPixelBufferGetPixelFormatType(self),
nil,
&_copy)
guard let copy = _copy else { fatalError() }
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
let copyBaseAddress = CVPixelBufferGetBaseAddress(copy)
let currBaseAddress = CVPixelBufferGetBaseAddress(self)
print("copy data size: \(CVPixelBufferGetDataSize(copy))")
print("self data size: \(CVPixelBufferGetDataSize(self))")
memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(copy))
//memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(self) * 2)
CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
return copy
}
I used it as extension
I hope, this will help anyone with similar problem
Best, Victor

MTKView Drawing Performance

What I am Trying to Do
I am trying to show filters on a camera feed by using a Metal view: MTKView. I am closely following the method of Apple's sample code - Enhancing Live Video by Leveraging TrueDepth Camera Data (link).
What I Have So Far
Following code works great (mainly interpreted from above-mentioned sample code) :
class MetalObject: NSObject, MTKViewDelegate {
private var metalBufferView : MTKView?
private var metalDevice = MTLCreateSystemDefaultDevice()
private var metalCommandQueue : MTLCommandQueue!
private var ciContext : CIContext!
private let colorSpace = CGColorSpaceCreateDeviceRGB()
private var videoPixelBuffer : CVPixelBuffer?
private let syncQueue = DispatchQueue(label: "Preview View Sync Queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
private var textureWidth : Int = 0
private var textureHeight : Int = 0
private var textureMirroring = false
private var sampler : MTLSamplerState!
private var renderPipelineState : MTLRenderPipelineState!
private var vertexCoordBuffer : MTLBuffer!
private var textCoordBuffer : MTLBuffer!
private var internalBounds : CGRect!
private var textureTranform : CGAffineTransform?
private var previewImage : CIImage?
init(with frame: CGRect) {
super.init()
self.metalBufferView = MTKView(frame: frame, device: self.metalDevice)
self.metalBufferView!.contentScaleFactor = UIScreen.main.nativeScale
self.metalBufferView!.framebufferOnly = true
self.metalBufferView!.colorPixelFormat = .bgra8Unorm
self.metalBufferView!.isPaused = true
self.metalBufferView!.enableSetNeedsDisplay = false
self.metalBufferView!.delegate = self
self.metalCommandQueue = self.metalDevice!.makeCommandQueue()
self.ciContext = CIContext(mtlDevice: self.metalDevice!)
//Configure Metal
let defaultLibrary = self.metalDevice!.makeDefaultLibrary()!
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "vertexPassThrough")
pipelineDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "fragmentPassThrough")
// To determine how our textures are sampled, we create a sampler descriptor, which
// will be used to ask for a sampler state object from our device below.
let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.sAddressMode = .clampToEdge
samplerDescriptor.tAddressMode = .clampToEdge
samplerDescriptor.minFilter = .linear
samplerDescriptor.magFilter = .linear
sampler = self.metalDevice!.makeSamplerState(descriptor: samplerDescriptor)
do {
renderPipelineState = try self.metalDevice!.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch {
fatalError("Unable to create preview Metal view pipeline state. (\(error))")
}
}
final func update (newVideoPixelBuffer: CVPixelBuffer?) {
self.syncQueue.async {
var filteredImage : CIImage
self.videoPixelBuffer = newVideoPixelBuffer
//---------
//Core image filters
//Strictly CIFilters, chained together
//---------
self.previewImage = filteredImage
//Ask Metal View to draw
self.metalBufferView?.draw()
}
}
//MARK: - Metal View Delegate
final func draw(in view: MTKView) {
print (Thread.current)
guard let drawable = self.metalBufferView!.currentDrawable,
let currentRenderPassDescriptor = self.metalBufferView!.currentRenderPassDescriptor,
let previewImage = self.previewImage else {
return
}
// create a texture for the CI image to render to
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: .bgra8Unorm,
width: Int(previewImage.extent.width),
height: Int(previewImage.extent.height),
mipmapped: false)
textureDescriptor.usage = [.shaderWrite, .shaderRead]
let texture = self.metalDevice!.makeTexture(descriptor: textureDescriptor)!
if texture.width != textureWidth ||
texture.height != textureHeight ||
self.metalBufferView!.bounds != internalBounds {
setupTransform(width: texture.width, height: texture.height, mirroring: mirroring, rotation: rotation)
}
// Set up command buffer and encoder
guard let commandQueue = self.metalCommandQueue else {
print("Failed to create Metal command queue")
return
}
guard let commandBuffer = commandQueue.makeCommandBuffer() else {
print("Failed to create Metal command buffer")
return
}
// add rendering of the image to the command buffer
ciContext.render(previewImage,
to: texture,
commandBuffer: commandBuffer,
bounds: previewImage.extent,
colorSpace: self.colorSpace)
guard let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor) else {
print("Failed to create Metal command encoder")
return
}
// add vertex and fragment shaders to the command buffer
commandEncoder.label = "Preview display"
commandEncoder.setRenderPipelineState(renderPipelineState!)
commandEncoder.setVertexBuffer(vertexCoordBuffer, offset: 0, index: 0)
commandEncoder.setVertexBuffer(textCoordBuffer, offset: 0, index: 1)
commandEncoder.setFragmentTexture(texture, index: 0)
commandEncoder.setFragmentSamplerState(sampler, index: 0)
commandEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
commandEncoder.endEncoding()
commandBuffer.present(drawable) // Draw to the screen
commandBuffer.commit()
}
final func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
}
Notes
The reason MTKViewDelegate is used instead of subclassing MTKView is that when it was subclassed, the draw call was called on the main thread. With the delegate method shown above, it seems to be a different metal related thread call each loop. Above method seem to give much better performance.
Details on CIFilter usage on update method above had to be redacted. All it is a heavy chain of CIFilters stacked. Unfortunately there is no room for any tweaks with these filters.
Issue
Above code seems to slow down the main thread a lot, causing rest of the app UI to be choppy. For example, scrolling a UIScrollview gets seem to be slow and choppy.
Goal
Tweak Metal view to ease up on CPU and go easy on the main thread to leave enough juice for rest of the UI.
According to the above graphics, preparation of command buffer is all done in CPU until presented and committed(?). Is there a way to offload that from CPU?
Any hints, feedback, tips, etc to improve the drawing efficiency would be appreciated.
There are a few things you can do to improve the performance:
Render into the view’s drawable directly instead of rendering into a texture and then rendering again to render that texture into the view.
Use the newish CIRenderDestination API to defer the actual texture retrieval to the moment the view is actually rendered to (i.e. when Core Image is done).
Here’s the draw(in view: MTKView) I’m using in my Core Image project, modified for your case:
public func draw(in view: MTKView) {
if let currentDrawable = view.currentDrawable,
let commandBuffer = self.commandQueue.makeCommandBuffer() {
let drawableSize = view.drawableSize
// optional: scale the image to fit the view
let scaleX = drawableSize.width / image.extent.width
let scaleY = drawableSize.height / image.extent.height
let scale = min(scaleX, scaleY)
let scaledImage = previewImage.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
// optional: center in the view
let originX = max(drawableSize.width - scaledImage.extent.size.width, 0) / 2
let originY = max(drawableSize.height - scaledImage.extent.size.height, 0) / 2
let centeredImage = scaledImage.transformed(by: CGAffineTransform(translationX: originX, y: originY))
// create a render destination that allows to lazily fetch the target texture
// which allows the encoder to process all CI commands _before_ the texture is actually available;
// this gives a nice speed boost because the CPU doesn’t need to wait for the GPU to finish
// before starting to encode the next frame
let destination = CIRenderDestination(width: Int(drawableSize.width),
height: Int(drawableSize.height),
pixelFormat: view.colorPixelFormat,
commandBuffer: commandBuffer,
mtlTextureProvider: { () -> MTLTexture in
return currentDrawable.texture
})
let task = try! self.context.startTask(toRender: centeredImage, to: destination)
// bonus: you can Quick Look the task to see what’s actually scheduled for the GPU
commandBuffer.present(currentDrawable)
commandBuffer.commit()
// optional: you can wait for the task execution and Quick Look the info object to get insights and metrics
DispatchQueue.global(qos: .background).async {
let info = try! task.waitUntilCompleted()
}
}
}
If this is still too slow, you can try setting the priorityRequestLow CIContextOption when creating your CIContext to tell Core Image to render in low priority.

ARKit 3D Head tracking in scene

I am using ARKit to create an augmented camera app. When the ARSession initialises, a 3d character is shown in a ARSCNView. I am trying to get the character's
head to track the ARCamera's point of view so they are always looking at the camera as the user moves to take a photo.
I've used Apple's chameleon demo, which adds a focus node that tracks the cameras point of view using SCNLookAtConstraint but I am getting
strange behaviour. The head drops to the side and rotates as the ARCamera pans. If I add a SCNTransformConstraint to restrict the
head movement to up/down/side-to-side, it stays vertical but then looks away and doesn't track.
I've tried picking the chameleon demo apart to see why mine is not working but after a few days I am stuck.
The code I am using is:
class Daisy: SCNScene, ARCharacter, CAAnimationDelegate {
// Rig for animation
private var contentRootNode: SCNNode! = SCNNode()
private var geometryRoot: SCNNode!
private var head: SCNNode!
private var leftEye: SCNNode!
private var rightEye: SCNNode!
// Head tracking properties
private var focusOfTheHead = SCNNode()
private let focusNodeBasePosition = simd_float3(0, 0.1, 0.25)
// State properties
private var modelLoaded: Bool = false
private var headIsMoving: Bool = false
private var shouldTrackCamera: Bool = false
/*
* MARK: - Init methods
*/
override init() {
super.init()
loadModel()
setupSpecialNodes()
setupConstraints()
}
/*
* MARK: - Setup methods
*/
func loadModel() {
guard let virtualObjectScene = SCNScene(named: "daisy_3.dae", inDirectory: "art.scnassets") else {
print("virtualObjectScene not intialised")
return
}
let wrapper = SCNNode()
for child in virtualObjectScene.rootNode.childNodes {
wrapper.addChildNode(child)
}
self.rootNode.addChildNode(contentRootNode)
contentRootNode.addChildNode(wrapper)
hide()
modelLoaded = true
}
private func setupSpecialNodes() {
// Assign characters rig elements to nodes
geometryRoot = self.rootNode.childNode(withName: "D_Rig", recursively: true)
head = self.rootNode.childNode(withName: "D_RigFBXASC032Head", recursively: true)
leftEye = self.rootNode.childNode(withName: "D_Eye_L", recursively: true)
rightEye = self.rootNode.childNode(withName: "D_Eye_R", recursively: true)
// Set up looking position nodes
focusOfTheHead.simdPosition = focusNodeBasePosition
geometryRoot.addChildNode(focusOfTheHead)
}
/*
* MARK: - Head animations
*/
func updateForScene(_ scene: ARSCNView) {
guard shouldTrackCamera, let pointOfView = scene.pointOfView else {
print("Not going to updateForScene")
return
}
followUserWithHead(to: pointOfView)
}
private func followUserWithHead(to pov: SCNNode) {
guard !headIsMoving else { return }
// Update the focus node to the point of views position
let target = focusOfTheHead.simdConvertPosition(pov.simdWorldPosition, to: nil)
// Slightly delay the head movement and the animate it to the new focus position
DispatchQueue.main.asyncAfter(deadline: .now() + 0.2, execute: {
let moveToTarget = SCNAction.move(to: SCNVector3(target.x, target.y, target.z), duration: 1.5)
self.headIsMoving = true
self.focusOfTheHead.runAction(moveToTarget, completionHandler: {
self.headIsMoving = false
})
})
}
private func setupConstraints() {
let headConstraint = SCNLookAtConstraint(target: focusOfTheHead)
headConstraint.isGimbalLockEnabled = true
let headRotationConstraint = SCNTransformConstraint(inWorldSpace: false) { (node, transform) -> SCNMatrix4 in
// Only track the up/down and side to side movement
var eulerX = node.presentation.eulerAngles.x
var eulerZ = node.presentation.eulerAngles.z
// Restrict the head movement so it doesn't rotate too far
if eulerX < self.rad(-90) { eulerX = self.rad(-90) }
if eulerX > self.rad(90) { eulerX = self.rad(90) }
if eulerZ < self.rad(-30) { eulerZ = self.rad(-30) }
if eulerZ > self.rad(30) { eulerZ = self.rad(30) }
let tempNode = SCNNode()
tempNode.transform = node.presentation.transform
tempNode.eulerAngles = SCNVector3(eulerX, 0, eulerZ)
return tempNode.transform
}
head?.constraints = [headConstraint, headRotationConstraint]
}
// Helper to convert degrees to radians
private func rad(_ deg: Float) -> Float {
return deg * Float.pi / 180
}
}
The model in the Scene editor is:
I have solved the problem I was having. There were 2 issues:
The target in followUserWithHead should have converted the simdWorldPosition for it's parent and been convert from (not to)
focusOfTheHead.parent!.simdConvertPosition(pov.simdWorldPosition, from: nil)
The local coordinates for the head node are incorrect. The z-axis should be the x-axis so when I got the focus the head movement tracking, the ear was always following the camera.
I didn't realise that the Debug View Hierarchy in Xcode will show the details of an SCNScene. This helped me to debug the scene and find where the nodes were tracking. You can export the scene as a dae and then load into SceneKit editor
Edit:
I used localFront as mnuages suggested in the comments below, which got the tracking working in the correct direction. The head did occasionally moved about though. I have put this down to the animation that was running on the model trying to apply a transform that was then changed on the next update cycle. I decided to remove the tracking from the head and use the same approach to track the eyes only.

Setting up Metal in Swift 3 on an iOS device

I've been trying to convert Apple's MetalBasicTessellation project to work in swift 3 on an iPad Air 3, but thus far have been unsuccessful. Frustratingly, the project comes with an iOS implementation (written in objectiveC, and a swift playground), but no swift 3 implementation.
I have gotten the code to compile, but fails to run on my iPad with the following error:
2017-05-14 14:25:54.268400-0700 MetalBasicTessellation[2436:570250] -[MTLRenderPipelineDescriptorInternal validateWithDevice:], line 1728: error 'tessellation is only supported on MTLFeatureSet_iOS_GPUFamily3_v1 and later'
I am pretty sure that the iPad Air 2 is compliant, but I have the feeling the error is due to an improperly configured MetalKitView. I have reverse-engineered what I could from the project's objective-c and playground files, but I have gone as far as I am able to understand with my current expertise.
//
// ViewController.swift
// MetalBasicTessellation
//
// Created by vladimir sierra on 5/10/17.
//
//
import UIKit
import Metal
import MetalKit
class ViewController: UIViewController {
#IBOutlet weak var mtkView: MTKView!
// Seven steps required to set up metal for rendering:
// 1. Create a MTLDevice
// 2. Create a CAMetalLayer
// 3. Create a Vertex Buffer
// 4. Create a Vertex Shader
// 5. Create a Fragment Shader
// 6. Create a Render Pipeline
// 7. Create a Command Queue
var device: MTLDevice! // to be initialized in viewDidLoad
//var metalLayer: CAMetalLayer! // to be initialized in viewDidLoad
var vertexBuffer: MTLBuffer! // to be initialized in viewDidLoad
var library: MTLLibrary!
// once we create a vertex and fragment shader, we combine them in an object called render pipeline. In Metal the shaders are precompiled, and the render pipeline configuration is compiled after you first set it up. This makes everything extremely efficient
var renderPipeline: MTLRenderPipelineState! // to be initialized in viewDidLoad
var commandQueue: MTLCommandQueue! // to be initialized in viewDidLoad
//var timer: CADisplayLink! // function to be called every time the device screen refreshes so we can redraw the screen
override func viewDidLayoutSubviews() {
super.viewDidLayoutSubviews()
/*
if let window = view.window {
let scale = window.screen.nativeScale // (2 for iPhone 5s, 6 and iPads; 3 for iPhone 6 Plus)
let layerSize = view.bounds.size
// apply the scale to increase the drawable texture size.
view.contentScaleFactor = scale
//metalLayer.frame = CGRect(x: 0, y: 0, width: layerSize.width, height: layerSize.height)
//metalLayer.drawableSize = CGSize(width: layerSize.width * scale, height: layerSize.height * scale)
} */
}
override func viewDidLoad() {
super.viewDidLoad()
device = MTLCreateSystemDefaultDevice() // returns a reference to the default MTLDevice
//device.supportsFeatureSet(MTLFeatureSet_iOS_GPUFamily3_v2)
// set up layer to display metal content
//metalLayer = CAMetalLayer() // initialize metalLayer
//metalLayer.device = device // device the layer should use
//metalLayer.pixelFormat = .bgra8Unorm // normalized 8 bit rgba
//metalLayer.framebufferOnly = true // set to true for performance issues
//view.layer.addSublayer(metalLayer) // add sublayer to main view's layer
// precompile custom metal functions
let defaultLibrary = device.newDefaultLibrary()! // MTLLibrary object with precompiled shaders
let fragmentProgram = defaultLibrary.makeFunction(name: "tessellation_fragment")
let vertexProgram = defaultLibrary.makeFunction(name: "tessellation_vertex_triangle")
// Setup Compute Pipeline
let kernelFunction = defaultLibrary.makeFunction(name: "tessellation_kernel_triangle")
var computePipeline: MTLComputePipelineState?
do {
computePipeline = try device.makeComputePipelineState(function: kernelFunction!)
} catch let error as NSError {
print("compute pipeline error: " + error.description)
}
// Setup Vertex Descriptor
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float4
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0;
vertexDescriptor.layouts[0].stepFunction = .perPatchControlPoint
vertexDescriptor.layouts[0].stepRate = 1
vertexDescriptor.layouts[0].stride = 4*MemoryLayout<Float>.size
// Setup Render Pipeline
let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
renderPipelineDescriptor.vertexDescriptor = vertexDescriptor
//renderPipelineDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "tessellation_fragment")
renderPipelineDescriptor.fragmentFunction = fragmentProgram
//renderPipelineDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "tessellation_vertex_triangle")
renderPipelineDescriptor.vertexFunction = vertexProgram
//renderPipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm // normalized 8 bit rgba
renderPipelineDescriptor.colorAttachments[0].pixelFormat = mtkView.colorPixelFormat
renderPipelineDescriptor.isTessellationFactorScaleEnabled = false
renderPipelineDescriptor.tessellationFactorFormat = .half
renderPipelineDescriptor.tessellationControlPointIndexType = .none
renderPipelineDescriptor.tessellationFactorStepFunction = .constant
renderPipelineDescriptor.tessellationOutputWindingOrder = .clockwise
renderPipelineDescriptor.tessellationPartitionMode = .fractionalEven
renderPipelineDescriptor.maxTessellationFactor = 64;
// Compile renderPipeline
do {
renderPipeline = try device.makeRenderPipelineState(descriptor: renderPipelineDescriptor)
} catch let error as NSError {
print("render pipeline error: " + error.description)
}
// Setup Buffers
let tessellationFactorsBuffer = device.makeBuffer(length: 256, options: MTLResourceOptions.storageModePrivate)
let controlPointPositions: [Float] = [
-0.8, -0.8, 0.0, 1.0, // lower-left
0.0, 0.8, 0.0, 1.0, // upper-middle
0.8, -0.8, 0.0, 1.0, // lower-right
]
let controlPointsBuffer = device.makeBuffer(bytes: controlPointPositions, length:256 , options: [])
// Tessellation Pass
let commandBuffer = commandQueue.makeCommandBuffer()
let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder()
computeCommandEncoder.setComputePipelineState(computePipeline!)
let edgeFactor: [Float] = [16.0]
let insideFactor: [Float] = [8.0]
computeCommandEncoder.setBytes(edgeFactor, length: MemoryLayout<Float>.size, at: 0)
computeCommandEncoder.setBytes(insideFactor, length: MemoryLayout<Float>.size, at: 1)
computeCommandEncoder.setBuffer(tessellationFactorsBuffer, offset: 0, at: 2)
computeCommandEncoder.dispatchThreadgroups(MTLSizeMake(1, 1, 1), threadsPerThreadgroup: MTLSizeMake(1, 1, 1))
computeCommandEncoder.endEncoding()
let renderPassDescriptor = mtkView.currentRenderPassDescriptor
let renderCommandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor!)
renderCommandEncoder.setRenderPipelineState(renderPipeline!)
renderCommandEncoder.setVertexBuffer(controlPointsBuffer, offset: 0, at: 0)
renderCommandEncoder.setTriangleFillMode(.lines)
renderCommandEncoder.setTessellationFactorBuffer(tessellationFactorsBuffer, offset: 0, instanceStride: 0)
renderCommandEncoder.drawPatches(numberOfPatchControlPoints: 3, patchStart: 0, patchCount: 1, patchIndexBuffer: nil, patchIndexBufferOffset: 0, instanceCount: 1, baseInstance: 0)
renderCommandEncoder.endEncoding()
commandBuffer.present(mtkView.currentDrawable!)
commandBuffer.commit()
commandBuffer.waitUntilCompleted()
/*
// finally create an ordered list of commands forthe GPU to execute
commandQueue = device.makeCommandQueue()
timer = CADisplayLink(target: self, selector: #selector(ViewController.gameloop)) // call gameloop every time the screen refreshes
timer.add(to: RunLoop.main, forMode: RunLoopMode.defaultRunLoopMode)
*/
}
override func didReceiveMemoryWarning() {
super.didReceiveMemoryWarning()
// Dispose of any resources that can be recreated.
}
/*
func render() {
guard let drawable = metalLayer?.nextDrawable() else { return } // returns the texture to draw into in order for something to appear on the screen
//objectToDraw.render(commandQueue: commandQueue, renderPipeline: renderPipeline, drawable: drawable, clearColor: nil)
}
// this is the routine that gets run every time the screen refreshes
func gameloop() {
autoreleasepool {
self.render()
}
} */
}
The entire git can be found here
Would some kind metal-guru-soul lend a hand? Documentation out there is pretty sparse.
The docs for MTLFeatureSet_iOS_GPUFamily3_v1 say:
Introduced with the Apple A9 GPU and iOS 9.0.
(Emphasis added.)
Meanwhile, the iOS Device Compatibility Reference: Hardware GPU Information article says the iPad Air 2 has an A8 GPU.
I don't believe your device is capable.
In general, the configuration of the MTKView will not affect the feature set that's supported. That's inherent in the device (the combination of hardware and OS version). You can query whether a device supports a given feature set using the supportsFeatureSet(_:) method of MTLDevice. Since a device can be (and usually is) acquired independently of any other object such as an MTKView, the feature set support can't depend on such other objects.

Resources