is iOS/Metal possible to read pixel depth? - ios

I want to read depth on a certain point(x, y). In OpenGL, it's glReadPixels, but OpenGL ES does not support reading depth. So I wonder if iOS/Metal can do the trick.
There are few examples that I can find, I've checked other questions but few people answered and those answers are not complete.
A similar question is:
iOS/Metal: how to read from the depth buffer at a point?
but it's four years ago. and I don't understand the only answer. Forgive me that I am a very new metal user. I wonder if there is a better way now with newest swift and metal.
I have a draw function shown below,
func draw(in view: MTKView) {
guard let drawable = view.currentDrawable,
let pipelineState = pipelineState,
let descriptor = view.currentRenderPassDescriptor else { return }
let commandBuffer = commandQueue.makeCommandBuffer()
let commandEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: descriptor)
commandEncoder?.setRenderPipelineState(pipelineState)
plane2.draw(view, commandEncoder)
plane.draw(view, commandEncoder)
commandEncoder?.endEncoding()
commandBuffer?.present(drawable)
commandBuffer?.commit()
}
my pipelineState is:
let library = device.makeDefaultLibrary()
let vertexFunction = library?.makeFunction(name: "vertex_shader")
let fragmentFunction = library?.makeFunction(name: "fragment_shader")
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
do {
pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch let error as NSError {
print("error: \(error.localizedDescription)")
}
My app is very simple. just lines, triangles without textures, only primitive geometry. I expect to read the depth from a Touched point and unproject which to my camera view.

Related

ARKit draw using addChild Node

I am trying to draw the using ARKit add child node while transmitting the into webrtc. If drawing more annotations app gets freezes. Can anyone please guide me to overcome this issue?
This is the code I used to transfer the frame in webrtc:
guard let currentFrame = localArView.session.currentFrame else {
return }
guard let sceneImage: CGImage = self.localArView.snapshot().cgImage else { return }
let rtcPixelBuffer = RTCCVPixelBuffer(pixelBuffer: self.getCVPixelBuffer(UIImage(cgImage: sceneImage).cgImage!)!)
let timeStampNs: Int64 = Int64(currentFrame.timestamp * 1000000000)
let videoFrame = RTCVideoFrame(buffer: rtcPixelBuffer, rotation: RTCVideoRotation._0, timeStampNs: timeStampNs)
self.webRTCClient.didCaptureLocalFrame(videoFrame)

How to combine MTLTextures into the currentDrawable

I am new to using Metal but I have been following the tutorial here that takes the camera output and renders it on to the screen using metal.
Now I want to take an image, turn it into a MTLTexture, and position and render that texture on top of the camera output.
My current rendering code is as follows:
private func render(texture: MTLTexture, withCommandBuffer commandBuffer: MTLCommandBuffer, device: MTLDevice) {
guard
let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
let currentDrawable = metalView.currentDrawable,
let renderPipelineState = renderPipelineState,
let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)
else {
semaphore.signal()
return
}
encoder.pushDebugGroup("RenderFrame")
encoder.setRenderPipelineState(renderPipelineState)
encoder.setFragmentTexture(texture, index: 0)
encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
encoder.popDebugGroup()
encoder.endEncoding()
commandBuffer.addScheduledHandler { [weak self] (buffer) in
guard let unwrappedSelf = self else { return }
unwrappedSelf.didRenderTexture(texture, withCommandBuffer: buffer, device: device)
unwrappedSelf.semaphore.signal()
}
commandBuffer.present(currentDrawable)
commandBuffer.commit()
}
I know that I can convert a UIImage to a MTLTexture using the following code:
let textureLoader = MTKTextureLoader(device: device)
let cgImage = UIImage(named: "myImage")!.cgImage!
let imageTexture = try! textureLoader.newTexture(cgImage: cgImage, options: nil)
So now I have two MTLTextures. Is there a simple function that allows me to combine them? I've been trying to search online and someone mentioned a function called over, but I haven't actually been able to find that one. Any help would be greatly appreciated.
You can simply do this inside the shader by adding or multiplying color values. I guess that's what shaders are for.

MTKView Drawing Performance

What I am Trying to Do
I am trying to show filters on a camera feed by using a Metal view: MTKView. I am closely following the method of Apple's sample code - Enhancing Live Video by Leveraging TrueDepth Camera Data (link).
What I Have So Far
Following code works great (mainly interpreted from above-mentioned sample code) :
class MetalObject: NSObject, MTKViewDelegate {
private var metalBufferView : MTKView?
private var metalDevice = MTLCreateSystemDefaultDevice()
private var metalCommandQueue : MTLCommandQueue!
private var ciContext : CIContext!
private let colorSpace = CGColorSpaceCreateDeviceRGB()
private var videoPixelBuffer : CVPixelBuffer?
private let syncQueue = DispatchQueue(label: "Preview View Sync Queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
private var textureWidth : Int = 0
private var textureHeight : Int = 0
private var textureMirroring = false
private var sampler : MTLSamplerState!
private var renderPipelineState : MTLRenderPipelineState!
private var vertexCoordBuffer : MTLBuffer!
private var textCoordBuffer : MTLBuffer!
private var internalBounds : CGRect!
private var textureTranform : CGAffineTransform?
private var previewImage : CIImage?
init(with frame: CGRect) {
super.init()
self.metalBufferView = MTKView(frame: frame, device: self.metalDevice)
self.metalBufferView!.contentScaleFactor = UIScreen.main.nativeScale
self.metalBufferView!.framebufferOnly = true
self.metalBufferView!.colorPixelFormat = .bgra8Unorm
self.metalBufferView!.isPaused = true
self.metalBufferView!.enableSetNeedsDisplay = false
self.metalBufferView!.delegate = self
self.metalCommandQueue = self.metalDevice!.makeCommandQueue()
self.ciContext = CIContext(mtlDevice: self.metalDevice!)
//Configure Metal
let defaultLibrary = self.metalDevice!.makeDefaultLibrary()!
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "vertexPassThrough")
pipelineDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "fragmentPassThrough")
// To determine how our textures are sampled, we create a sampler descriptor, which
// will be used to ask for a sampler state object from our device below.
let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.sAddressMode = .clampToEdge
samplerDescriptor.tAddressMode = .clampToEdge
samplerDescriptor.minFilter = .linear
samplerDescriptor.magFilter = .linear
sampler = self.metalDevice!.makeSamplerState(descriptor: samplerDescriptor)
do {
renderPipelineState = try self.metalDevice!.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch {
fatalError("Unable to create preview Metal view pipeline state. (\(error))")
}
}
final func update (newVideoPixelBuffer: CVPixelBuffer?) {
self.syncQueue.async {
var filteredImage : CIImage
self.videoPixelBuffer = newVideoPixelBuffer
//---------
//Core image filters
//Strictly CIFilters, chained together
//---------
self.previewImage = filteredImage
//Ask Metal View to draw
self.metalBufferView?.draw()
}
}
//MARK: - Metal View Delegate
final func draw(in view: MTKView) {
print (Thread.current)
guard let drawable = self.metalBufferView!.currentDrawable,
let currentRenderPassDescriptor = self.metalBufferView!.currentRenderPassDescriptor,
let previewImage = self.previewImage else {
return
}
// create a texture for the CI image to render to
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: .bgra8Unorm,
width: Int(previewImage.extent.width),
height: Int(previewImage.extent.height),
mipmapped: false)
textureDescriptor.usage = [.shaderWrite, .shaderRead]
let texture = self.metalDevice!.makeTexture(descriptor: textureDescriptor)!
if texture.width != textureWidth ||
texture.height != textureHeight ||
self.metalBufferView!.bounds != internalBounds {
setupTransform(width: texture.width, height: texture.height, mirroring: mirroring, rotation: rotation)
}
// Set up command buffer and encoder
guard let commandQueue = self.metalCommandQueue else {
print("Failed to create Metal command queue")
return
}
guard let commandBuffer = commandQueue.makeCommandBuffer() else {
print("Failed to create Metal command buffer")
return
}
// add rendering of the image to the command buffer
ciContext.render(previewImage,
to: texture,
commandBuffer: commandBuffer,
bounds: previewImage.extent,
colorSpace: self.colorSpace)
guard let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor) else {
print("Failed to create Metal command encoder")
return
}
// add vertex and fragment shaders to the command buffer
commandEncoder.label = "Preview display"
commandEncoder.setRenderPipelineState(renderPipelineState!)
commandEncoder.setVertexBuffer(vertexCoordBuffer, offset: 0, index: 0)
commandEncoder.setVertexBuffer(textCoordBuffer, offset: 0, index: 1)
commandEncoder.setFragmentTexture(texture, index: 0)
commandEncoder.setFragmentSamplerState(sampler, index: 0)
commandEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
commandEncoder.endEncoding()
commandBuffer.present(drawable) // Draw to the screen
commandBuffer.commit()
}
final func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
}
}
Notes
The reason MTKViewDelegate is used instead of subclassing MTKView is that when it was subclassed, the draw call was called on the main thread. With the delegate method shown above, it seems to be a different metal related thread call each loop. Above method seem to give much better performance.
Details on CIFilter usage on update method above had to be redacted. All it is a heavy chain of CIFilters stacked. Unfortunately there is no room for any tweaks with these filters.
Issue
Above code seems to slow down the main thread a lot, causing rest of the app UI to be choppy. For example, scrolling a UIScrollview gets seem to be slow and choppy.
Goal
Tweak Metal view to ease up on CPU and go easy on the main thread to leave enough juice for rest of the UI.
According to the above graphics, preparation of command buffer is all done in CPU until presented and committed(?). Is there a way to offload that from CPU?
Any hints, feedback, tips, etc to improve the drawing efficiency would be appreciated.
There are a few things you can do to improve the performance:
Render into the view’s drawable directly instead of rendering into a texture and then rendering again to render that texture into the view.
Use the newish CIRenderDestination API to defer the actual texture retrieval to the moment the view is actually rendered to (i.e. when Core Image is done).
Here’s the draw(in view: MTKView) I’m using in my Core Image project, modified for your case:
public func draw(in view: MTKView) {
if let currentDrawable = view.currentDrawable,
let commandBuffer = self.commandQueue.makeCommandBuffer() {
let drawableSize = view.drawableSize
// optional: scale the image to fit the view
let scaleX = drawableSize.width / image.extent.width
let scaleY = drawableSize.height / image.extent.height
let scale = min(scaleX, scaleY)
let scaledImage = previewImage.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
// optional: center in the view
let originX = max(drawableSize.width - scaledImage.extent.size.width, 0) / 2
let originY = max(drawableSize.height - scaledImage.extent.size.height, 0) / 2
let centeredImage = scaledImage.transformed(by: CGAffineTransform(translationX: originX, y: originY))
// create a render destination that allows to lazily fetch the target texture
// which allows the encoder to process all CI commands _before_ the texture is actually available;
// this gives a nice speed boost because the CPU doesn’t need to wait for the GPU to finish
// before starting to encode the next frame
let destination = CIRenderDestination(width: Int(drawableSize.width),
height: Int(drawableSize.height),
pixelFormat: view.colorPixelFormat,
commandBuffer: commandBuffer,
mtlTextureProvider: { () -> MTLTexture in
return currentDrawable.texture
})
let task = try! self.context.startTask(toRender: centeredImage, to: destination)
// bonus: you can Quick Look the task to see what’s actually scheduled for the GPU
commandBuffer.present(currentDrawable)
commandBuffer.commit()
// optional: you can wait for the task execution and Quick Look the info object to get insights and metrics
DispatchQueue.global(qos: .background).async {
let info = try! task.waitUntilCompleted()
}
}
}
If this is still too slow, you can try setting the priorityRequestLow CIContextOption when creating your CIContext to tell Core Image to render in low priority.

Filtering Depth Data on iOS 12 appears to be rotated

I am having an issue where the Depth Data for the .builtInDualCamera appears to be rotated 90 degrees when isFilteringEnabled = true
Here is my code:
fileprivate let session = AVCaptureSession()
fileprivate let meta = AVCaptureMetadataOutput()
fileprivate let video = AVCaptureVideoDataOutput()
fileprivate let depth = AVCaptureDepthDataOutput()
fileprivate let camera: AVCaptureDevice
fileprivate let input: AVCaptureDeviceInput
fileprivate let synchronizer: AVCaptureDataOutputSynchronizer
init(delegate: CaptureSessionDelegate?) throws {
self.delegate = delegate
session.sessionPreset = .vga640x480
// Setup Camera Input
let discovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualCamera], mediaType: .video, position: .unspecified)
if let device = discovery.devices.first {
camera = device
} else {
throw SessionError.CameraNotAvailable("Unable to load camera")
}
input = try AVCaptureDeviceInput(device: camera)
session.addInput(input)
// Setup Metadata Output (Face)
session.addOutput(meta)
if meta.availableMetadataObjectTypes.contains(AVMetadataObject.ObjectType.face) {
meta.metadataObjectTypes = [ AVMetadataObject.ObjectType.face ]
} else {
print("Can't Setup Metadata: \(meta.availableMetadataObjectTypes)")
}
// Setup Video Output
video.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
session.addOutput(video)
video.connection(with: .video)?.videoOrientation = .portrait
// ****** THE ISSUE IS WITH THIS BLOCK HERE ******
// Setup Depth Output
depth.isFilteringEnabled = true
session.addOutput(depth)
depth.connection(with: .depthData)?.videoOrientation = .portrait
// Setup Synchronizer
synchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [depth, video, meta])
let outputRect = CGRect(x: 0, y: 0, width: 1, height: 1)
let videoRect = video.outputRectConverted(fromMetadataOutputRect: outputRect)
let depthRect = depth.outputRectConverted(fromMetadataOutputRect: outputRect)
// Ratio of the Depth to Video
scale = max(videoRect.width, videoRect.height) / max(depthRect.width, depthRect.height)
// Set Camera to the framerate of the Depth Data Collection
try camera.lockForConfiguration()
if let fps = camera.activeDepthDataFormat?.videoSupportedFrameRateRanges.first?.minFrameDuration {
camera.activeVideoMinFrameDuration = fps
}
camera.unlockForConfiguration()
super.init()
synchronizer.setDelegate(self, queue: syncQueue)
}
func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput data: AVCaptureSynchronizedDataCollection) {
guard let delegate = self.delegate else {
return
}
// Check to see if all the data is actually here
guard
let videoSync = data.synchronizedData(for: video) as? AVCaptureSynchronizedSampleBufferData,
!videoSync.sampleBufferWasDropped,
let depthSync = data.synchronizedData(for: depth) as? AVCaptureSynchronizedDepthData,
!depthSync.depthDataWasDropped
else {
return
}
// It's OK if the face isn't found.
let face: AVMetadataFaceObject?
if let metaSync = data.synchronizedData(for: meta) as? AVCaptureSynchronizedMetadataObjectData {
face = (metaSync.metadataObjects.first { $0 is AVMetadataFaceObject }) as? AVMetadataFaceObject
} else {
face = nil
}
// Convert Buffers to CIImage
let videoImage = convertVideoImage(fromBuffer: videoSync.sampleBuffer)
let depthImage = convertDepthImage(fromData: depthSync.depthData, andFace: face)
// Call Delegate
delegate.captureImages(video: videoImage, depth: depthImage, face: face)
}
fileprivate func convertVideoImage(fromBuffer sampleBuffer: CMSampleBuffer) -> CIImage {
// Convert from "CoreMovie?" to CIImage - fairly straight-forward
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let image = CIImage(cvPixelBuffer: pixelBuffer!)
return image
}
fileprivate func convertDepthImage(fromData depthData: AVDepthData, andFace face: AVMetadataFaceObject?) -> CIImage {
var convertedDepth: AVDepthData
// Convert 16-bif floats up to 32
if depthData.depthDataType != kCVPixelFormatType_DisparityFloat32 {
convertedDepth = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
} else {
convertedDepth = depthData
}
// Pixel buffer comes straight from depthData
let pixelBuffer = convertedDepth.depthDataMap
let image = CIImage(cvPixelBuffer: pixelBuffer)
return image
}
The original Video Looks like this: (For reference)
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = false
depth.connection(with: .depthData)?.videoOrientation = .portrait
The Image looks like this: (you can see the closer jacket is white, the farther jacket is grey, and the distance is dark grey - as expected)
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = true
depth.connection(with: .depthData)?.videoOrientation = .portrait
The image looks like this: (You can see the color values appear to be in the right places, but the shapes in the smoothing filter appear to be rotated)
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = true
depth.connection(with: .depthData)?.videoOrientation = .landscapeRight
The image looks like this: (Both the colors and the shapes appear to be horizontal)
Am I doing something wrong to get these incorrect values?
I have tried re-ordering the code
// Setup Depth Output
depth.connection(with: .depthData)?.videoOrientation = .portrait
depth.isFilteringEnabled = true
But that does nothing.
I think this is an issue related to iOS 12, because I remember this working just fine under iOS 11 (although I don't have any images saved to prove it)
Any Help is appreciated, thanks!
Unlike the suggestion to review other answers on rotating the image after creation, which I found did not work, in the AVDepthData documentation, there is a method available that does the orientation correction for you.
The method is called: depthDataByApplyingExifOrientation: which returns an instance of AVDepthData with the orientation applied, ie. you can create your image in the correct orientation you desire by passing in the parameter of your choice.
This is my helper method that returns a UIImage with the orientation fix.
- (UIImage *)createDepthMapImageFromCapturePhoto:(AVCapturePhoto *)photo {
// AVCapturePhoto which has depthData - in swift you should confirm this exists
AVDepthData *frontDepthData = [photo depthData];
// Overwrite the instance with the correct orientation applied.
frontDepthData = [frontDepthData depthDataByApplyingExifOrientation:kCGImagePropertyOrientationRight];
// Create the CIImage from the depth data using the available method.
CIImage *ciDepthImage = [CIImage imageWithDepthData:frontDepthData];
// Create CIContext which enables converting CIImage to CGImage
CIContext *context = [[CIContext alloc] init];
// Create the CGImage
CGImageRef img = [context createCGImage:ciDepthImage fromRect:[ciDepthImage extent]];
// Create the final image.
UIImage *depthImage = [UIImage imageWithCGImage:img];
// Return the depth image.
return depthImage;
}

How to draw detected rectangle path on live camera feed using CAShapeLayer and UIBezeirPath

I am developing an application to detect rectangles in a live camera feed and highlight the detected rectangle. I did camera thing using AVFoundation and used below methods in order to do detect and highlight the detected rectangle.
var detector: CIDetector?;
override func viewDidLoad() {
super.viewDidLoad();
detector = self.prepareRectangleDetector();
}
func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) { // re check this method
// Need to shimmy this through type-hell
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
// Force the type change - pass through opaque buffer
let opaqueBuffer = Unmanaged<CVImageBuffer>.passUnretained(imageBuffer!).toOpaque()
let pixelBuffer = Unmanaged<CVPixelBuffer>.fromOpaque(opaqueBuffer).takeUnretainedValue()
let sourceImage = CIImage(CVPixelBuffer: pixelBuffer, options: nil)
// Do some detection on the image
self.performRectangleDetection(sourceImage);
var outputImage = sourceImage
// Do some clipping
var drawFrame = outputImage.extent
let imageAR = drawFrame.width / drawFrame.height
let viewAR = videoDisplayViewBounds.width / videoDisplayViewBounds.height
if imageAR > viewAR {
drawFrame.origin.x += (drawFrame.width - drawFrame.height * viewAR) / 2.0
drawFrame.size.width = drawFrame.height / viewAR
} else {
drawFrame.origin.y += (drawFrame.height - drawFrame.width / viewAR) / 2.0
drawFrame.size.height = drawFrame.width / viewAR
}
//videoDisplayView is a GLKView which is used to display camera feed
videoDisplayView.bindDrawable()
if videoDisplayView.context != EAGLContext.currentContext() {
EAGLContext.setCurrentContext(videoDisplayView.context)
}
// clear eagl view to grey
glClearColor(0.5, 0.5, 0.5, 1.0);
glClear(0x00004000)
// set the blend mode to "source over" so that CI will use that
glEnable(0x0BE2);
glBlendFunc(1, 0x0303);
renderContext.drawImage(outputImage, inRect: videoDisplayViewBounds, fromRect: drawFrame);
videoDisplayView.display();
}
func prepareRectangleDetector() -> CIDetector {
let options: [String: AnyObject] = [CIDetectorAccuracy: CIDetectorAccuracyHigh];
return CIDetector(ofType: CIDetectorTypeRectangle, context: nil, options: options);
}
func performRectangleDetection(image: CIImage){
let resultImage: CIImage? = nil;
if let detector = detector {
// Get the detections
let features = detector.featuresInImage(image, options: [CIDetectorAspectRatio:NSNumber(float:1.43)]);
if features.count != 0{ // feature found
for feature in features as! [CIRectangleFeature] {
self.previewImageView.layer.sublayers = nil;
let line: CAShapeLayer = CAShapeLayer();
line.frame = self.videoDisplayView.bounds;
let linePath: UIBezierPath = UIBezierPath();
linePath.moveToPoint(feature.topLeft);
linePath.addLineToPoint(feature.topRight);
linePath.addLineToPoint(feature.bottomRight);
linePath.addLineToPoint(feature.bottomLeft);
linePath.addLineToPoint(feature.topLeft);
linePath.closePath();
line.lineWidth = 5.0;
line.path = linePath.CGPath;
line.fillColor = UIColor.clearColor().CGColor;
line.strokeColor = UIColor(netHex: 0x3399CC, alpha: 1.0).CGColor;
// videoDisplayParentView is the parent of videoDisplayView and they both have same bounds
self.videoDisplayParentView.layer.addSublayer(line);
}
}
}
}
I used CAShapeLayer and UIBezierPath to draw the rectangle. This is very very slow. Path gets visible after minutes.
Can Someone please help me to figure out why it is slow or let me know if I am doing something wrong here. Any help would be highly appreciated.
Or if there is some way easy than this I would like to know it too.
If you get into the business of adding a sublayer to a GLKView it will be slow. The GLKView here refreshes multiple times every second (as it is in captureOutput:didOutputSampleBuffer:.. method), the process of creating and adding the sublayer every time will not be able to keep up with.
A better way is to draw the path using CoreImage and compositing it over resultImage.

Resources