How do I create a simple camera app in Swift using Core ML that does not take live input? - ios

I have been trying to create a simple camera image recognition app in Xcode with Swift that allows a user to take a picture. The photo is then fed into an already trained Core ML model, and the predicted class with its confidence is displayed in a label.
I have searched multiple websites, and all I can find are tutorials such as
https://medium.freecodecamp.org/ios-coreml-vision-image-recognition-3619cf319d0b
that do real-time recognition of images. I do not want it to be real time; I just want to let someone take a picture. I would like to know how I could transform this code so that it does not use live input:
import UIKit
import AVFoundation
import Vision
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate {
let label: UILabel = {
let label = UILabel()
label.textColor = .white
label.translatesAutoresizingMaskIntoConstraints = false
label.text = "Label"
label.font = label.font.withSize(30)
return label
}()
override func viewDidLoad() {
super.viewDidLoad()
// establish the capture session and add the label
setupCaptureSession()
view.addSubview(label)
setupLabel()
// Do any additional setup after loading the view, typically from a nib.
}
func setupCaptureSession() {
// create a new capture session
let captureSession = AVCaptureSession()
// find the available cameras
let availableDevices = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: AVMediaType.video, position: .back).devices
do {
// select a camera
if let captureDevice = availableDevices.first {
captureSession.addInput(try AVCaptureDeviceInput(device: captureDevice))
}
} catch {
// print an error if the camera is not available
print(error.localizedDescription)
}
// setup the video output to the screen and add output to our capture session
let captureOutput = AVCaptureVideoDataOutput()
captureSession.addOutput(captureOutput)
let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer.frame = view.frame
view.layer.addSublayer(previewLayer)
// buffer the video and start the capture session
captureOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
captureSession.startRunning()
}
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// load our Core ML model
guard let model = try? VNCoreMLModel(for: aslModel().model) else { return }
// run an inference with CoreML
let request = VNCoreMLRequest(model: model) { (finishedRequest, error) in
// grab the inference results
guard let results = finishedRequest.results as? [VNClassificationObservation] else { return }
// grab the highest confidence result
guard let Observation = results.first else { return }
// create the label text components
let predclass = "\(Observation.identifier)"
let predconfidence = String(format: "%.02f%%", Observation.confidence * 100)
// set the label text
DispatchQueue.main.async(execute: {
self.label.text = "\(predclass) \(predconfidence)"
})
}
// create a Core Video pixel buffer which is an image buffer that holds pixels in main memory
// Applications generating frames, compressing or decompressing video, or using Core Image
// can all make use of Core Video pixel buffers
guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
// execute the request
try? VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([request])
}
func setupLabel() {
// constrain the label in the center
label.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
// constrain the the label to 50 pixels from the bottom
label.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -50).isActive = true
}
override func didReceiveMemoryWarning() {
super.didReceiveMemoryWarning()
// Dispose of any resources that can be recreated.
}
}
Right now, as stated before, it takes in live image input.

I wrote a post on Medium about that, but it is in Portuguese. See if Medium's auto-translation lets you understand the post:
Swift + Core ML
I hope this can help you.
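Not the approach from that post, but a minimal sketch of one way to do it: keep the capture session, replace the AVCaptureVideoDataOutput with an AVCapturePhotoOutput, and run the Vision request once per captured photo instead of once per frame. aslModel and label are the names from the question; the takePicture action and the view controller name are illustrative.
import UIKit
import AVFoundation
import Vision

class PhotoClassifierViewController: UIViewController, AVCapturePhotoCaptureDelegate {
    let captureSession = AVCaptureSession()
    let photoOutput = AVCapturePhotoOutput()
    let label = UILabel()

    override func viewDidLoad() {
        super.viewDidLoad()
        captureSession.sessionPreset = .photo
        guard let device = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .back),
              let input = try? AVCaptureDeviceInput(device: device),
              captureSession.canAddInput(input),
              captureSession.canAddOutput(photoOutput) else { return }
        captureSession.addInput(input)
        captureSession.addOutput(photoOutput)
        captureSession.startRunning()
    }

    // Wire this to a "take picture" button instead of classifying every frame.
    @objc func takePicture() {
        photoOutput.capturePhoto(with: AVCapturePhotoSettings(), delegate: self)
    }

    // Called once per captured photo; run the Core ML model here.
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto,
                     error: Error?) {
        guard let data = photo.fileDataRepresentation(),
              let cgImage = UIImage(data: data)?.cgImage,
              let model = try? VNCoreMLModel(for: aslModel().model) else { return }
        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let best = (request.results as? [VNClassificationObservation])?.first else { return }
            DispatchQueue.main.async {
                self.label.text = String(format: "%@ %.02f%%", best.identifier, best.confidence * 100)
            }
        }
        try? VNImageRequestHandler(cgImage: cgImage, options: [:]).perform([request])
    }
}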

Related

Unwanted "smoothing" in AVDepthData on iPhone 13 (not evident in iPhone 12)

We are writing an app which analyzes real-world 3D data by using the TrueDepth camera on the front of an iPhone, and an AVCaptureSession configured to produce AVDepthData along with image data. This worked great on iPhone 12, but the same code on iPhone 13 produces an unwanted "smoothing" effect which makes the scene impossible to process and breaks our app. We are unable to find any information on this effect, from Apple or otherwise, much less how to avoid it, so we are asking you experts.
At the bottom of this post (Figure 3) is our code which configures the capture session, using an AVCaptureDataOutputSynchronizer, to produce frames of 640x480 image and depth data. I boiled it down as much as possible, sorry it's so long. The two main parts are the configure function, which sets up our capture session, and the dataOutputSynchronizer function, near the bottom, which fires when a synced set of data is available. In the latter function I've included the code which extracts the information from the AVDepthData object, including looping through all 640x480 depth data points (in meters). I've excluded further processing for brevity (believe it or not :)).
On an iPhone 12 device, the PNG data and the depth data merge nicely. The front view and side view of the merged point cloud are below (Figure 1). The angles visible in the side view are due to the application of the focal length, which "de-perspectives" the data and places the points in their proper position in xyz space.
The same code on an iPhone 13 produces depth maps that result in the point cloud further below (Figure 2 -- straight-on view, angled view, and side view). There is no longer any clear distinction between objects and the background because the depth data appears to be "smoothed" between the mannequin and the background -- i.e., there are seven or eight points between the subject and the background that are not realistic and make it impossible to do any meaningful processing such as segmenting the scene.
Has anyone else encountered this issue, or have any insight into how we might change our code to avoid it? Any help or ideas are MUCH appreciated, since this is a definite showstopper (we can't tell people to only run our app on older phones :)). Thank you!
Figure 1 -- Merged depth data and image into point cloud, from iPhone 12
Figure 2 -- Merged depth data and image into point cloud, from iPhone 13; unwanted smoothing effect visible
Figure 3 -- Our configuration code and capture handler; edited to remove downstream processing of captured data (which was basically formatting it into an XML file and uploading to the cloud)
import Foundation
import Combine
import AVFoundation
import Photos
import UIKit
import FirebaseStorage
public struct AlertError {
public var title: String = ""
public var message: String = ""
public var primaryButtonTitle = "Accept"
public var secondaryButtonTitle: String?
public var primaryAction: (() -> ())?
public var secondaryAction: (() -> ())?
public init(title: String = "", message: String = "", primaryButtonTitle: String = "Accept", secondaryButtonTitle: String? = nil, primaryAction: (() -> ())? = nil, secondaryAction: (() -> ())? = nil) {
self.title = title
self.message = message
self.primaryAction = primaryAction
self.primaryButtonTitle = primaryButtonTitle
self.secondaryAction = secondaryAction
}
}
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
//
//
// this is the CameraService class, which configures and runs a capture session
// which acquires syncronized image and depth data
// using an AVCaptureDataOutputSynchronizer
//
//
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
public class CameraService: NSObject,
AVCaptureVideoDataOutputSampleBufferDelegate,
AVCaptureDepthDataOutputDelegate,
AVCaptureDataOutputSynchronizerDelegate,
MyFirebaseProtocol,
ObservableObject{
@Published public var shouldShowAlertView = false
@Published public var shouldShowSpinner = false
public var labelStatus: String = "Ready"
var images: [UIImage?] = []
public var alertError: AlertError = AlertError()
public let session = AVCaptureSession()
var isSessionRunning = false
var isConfigured = false
var setupResult: SessionSetupResult = .success
private let sessionQueue = DispatchQueue(label: "session queue") // Communicate with the session and other session objects on this queue.
@objc dynamic var videoDeviceInput: AVCaptureDeviceInput!
private let videoDeviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInTrueDepthCamera], mediaType: .video, position: .front)
var videoCaptureDevice : AVCaptureDevice? = nil
let videoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput() // Define frame output.
let depthDataOutput = AVCaptureDepthDataOutput()
var outputSynchronizer: AVCaptureDataOutputSynchronizer? = nil
let dataOutputQueue = DispatchQueue(label: "video data queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
var scanStateCounter: Int = 0
var m_DepthDatasetsToUpload = [AVCaptureSynchronizedDepthData]()
var m_FrameBufferToUpload = [AVCaptureSynchronizedSampleBufferData]()
var firebaseDepthDatasetsArray: [String] = []
@Published var firebaseImageUploadCount = 0
@Published var firebaseTextFileUploadCount = 0
public func configure() {
/*
Setup the capture session.
In general, it's not safe to mutate an AVCaptureSession or any of its
inputs, outputs, or connections from multiple threads at the same time.
Don't perform these tasks on the main queue because
AVCaptureSession.startRunning() is a blocking call, which can
take a long time. Dispatch session setup to the sessionQueue, so
that the main queue isn't blocked, which keeps the UI responsive.
*/
sessionQueue.async {
self.configureSession()
}
}
// MARK: Checks for user's permissions
public func checkForPermissions() {
switch AVCaptureDevice.authorizationStatus(for: .video) {
case .authorized:
// The user has previously granted access to the camera.
break
case .notDetermined:
/*
The user has not yet been presented with the option to grant
video access. Suspend the session queue to delay session
setup until the access request has completed.
*/
sessionQueue.suspend()
AVCaptureDevice.requestAccess(for: .video, completionHandler: { granted in
if !granted {
self.setupResult = .notAuthorized
}
self.sessionQueue.resume()
})
default:
// The user has previously denied access.
setupResult = .notAuthorized
DispatchQueue.main.async {
self.alertError = AlertError(title: "Camera Access", message: "SwiftCamera doesn't have access to use your camera, please update your privacy settings.", primaryButtonTitle: "Settings", secondaryButtonTitle: nil, primaryAction: {
UIApplication.shared.open(URL(string: UIApplication.openSettingsURLString)!,
options: [:], completionHandler: nil)
}, secondaryAction: nil)
self.shouldShowAlertView = true
}
}
}
// MARK: Session Management
// Call this on the session queue.
/// - Tag: ConfigureSession
private func configureSession() {
if setupResult != .success {
return
}
session.beginConfiguration()
session.sessionPreset = AVCaptureSession.Preset.vga640x480
// Add video input.
do {
var defaultVideoDevice: AVCaptureDevice?
let frontCameraDevice = AVCaptureDevice.default(.builtInTrueDepthCamera, for: .video, position: .front)
// Use the front TrueDepth camera.
defaultVideoDevice = frontCameraDevice
videoCaptureDevice = defaultVideoDevice
guard let videoDevice = defaultVideoDevice else {
print("Default video device is unavailable.")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
let videoDeviceInput = try AVCaptureDeviceInput(device: videoDevice)
if session.canAddInput(videoDeviceInput) {
session.addInput(videoDeviceInput)
self.videoDeviceInput = videoDeviceInput
} else if session.inputs.isEmpty == false {
self.videoDeviceInput = videoDeviceInput
} else {
print("Couldn't add video device input to the session.")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
} catch {
print("Couldn't create video device input: \(error)")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// MARK: add video output to session
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
videoDataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA)] as [String : Any]
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera_frame_processing_queue"))
if session.canAddOutput(self.videoDataOutput) {
session.addOutput(self.videoDataOutput)
} else if session.outputs.contains(videoDataOutput) {
} else {
print("Couldn't create video device output")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
guard let connection = self.videoDataOutput.connection(with: AVMediaType.video),
connection.isVideoOrientationSupported else { return }
connection.videoOrientation = .portrait
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// MARK: add depth output to session
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Add a depth data output
if session.canAddOutput(depthDataOutput) {
session.addOutput(depthDataOutput)
depthDataOutput.isFilteringEnabled = false
depthDataOutput.setDelegate(self, callbackQueue: DispatchQueue(label: "depth_frame_processing_queue"))
if let connection = depthDataOutput.connection(with: .depthData) {
connection.isEnabled = true
} else {
print("No AVCaptureConnection")
}
} else if session.outputs.contains(depthDataOutput){
} else {
print("Could not add depth data output to the session")
session.commitConfiguration()
return
}
// Search for the highest-resolution depth format with half-precision (Float16) depth values
let depthFormats = videoCaptureDevice!.activeFormat.supportedDepthDataFormats
let filtered = depthFormats.filter({
CMFormatDescriptionGetMediaSubType($0.formatDescription) == kCVPixelFormatType_DepthFloat16
})
let selectedFormat = filtered.max(by: {
first, second in CMVideoFormatDescriptionGetDimensions(first.formatDescription).width < CMVideoFormatDescriptionGetDimensions(second.formatDescription).width
})
do {
try videoCaptureDevice!.lockForConfiguration()
videoCaptureDevice!.activeDepthDataFormat = selectedFormat
videoCaptureDevice!.unlockForConfiguration()
} catch {
print("Could not lock device for configuration: \(error)")
session.commitConfiguration()
return
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Use an AVCaptureDataOutputSynchronizer to synchronize the video data and depth data outputs.
// The first output in the dataOutputs array, in this case the AVCaptureVideoDataOutput, is the "master" output.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
outputSynchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [videoDataOutput, depthDataOutput])
outputSynchronizer!.setDelegate(self, queue: dataOutputQueue)
session.commitConfiguration()
self.isConfigured = true
//self.start()
}
// MARK: Device Configuration
/// - Tag: Stop capture session
public func stop(completion: (() -> ())? = nil) {
sessionQueue.async {
//print("entered stop")
if self.isSessionRunning {
//print(self.setupResult)
if self.setupResult == .success {
//print("entered success")
DispatchQueue.main.async{
self.session.stopRunning()
self.isSessionRunning = self.session.isRunning
if !self.session.isRunning {
DispatchQueue.main.async {
completion?()
}
}
}
}
}
}
}
/// - Tag: Start capture session
public func start() {
// We use our capture session queue to ensure our UI runs smoothly on the main thread.
sessionQueue.async {
if !self.isSessionRunning && self.isConfigured {
switch self.setupResult {
case .success:
self.session.startRunning()
self.isSessionRunning = self.session.isRunning
if self.session.isRunning {
}
case .configurationFailed, .notAuthorized:
print("Application not authorized to use camera")
DispatchQueue.main.async {
self.alertError = AlertError(title: "Camera Error", message: "Camera configuration failed. Either your device camera is not available or its missing permissions", primaryButtonTitle: "Accept", secondaryButtonTitle: nil, primaryAction: nil, secondaryAction: nil)
self.shouldShowAlertView = true
}
}
}
}
}
// ------------------------------------------------------------------------
// MARK: CAPTURE HANDLERS
// ------------------------------------------------------------------------
public func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
//printWithTime("Capture")
guard let syncedDepthData: AVCaptureSynchronizedDepthData =
synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData else {
return
}
guard let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
return
}
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
//
//
// Below is the code that extracts the information from depth data
// The depth data is 640x480, which matches the size of the synchronized image
// I save this info to a file, upload it to the cloud, and merge it with the image
// on a PC to create a pointcloud
//
//
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
let depth_data : AVDepthData = syncedDepthData.depthData
let cvpixelbuffer : CVPixelBuffer = depth_data.depthDataMap
let height : Int = CVPixelBufferGetHeight(cvpixelbuffer)
let width : Int = CVPixelBufferGetWidth(cvpixelbuffer)
let quality : AVDepthData.Quality = depth_data.depthDataQuality
let accuracy : AVDepthData.Accuracy = depth_data.depthDataAccuracy
let pixelsize : Float = depth_data.cameraCalibrationData!.pixelSize
let camcaldata : AVCameraCalibrationData = depth_data.cameraCalibrationData!
let intmat : matrix_float3x3 = camcaldata.intrinsicMatrix
let cal_lensdistort_x : CGFloat = camcaldata.lensDistortionCenter.x
let cal_lensdistort_y : CGFloat = camcaldata.lensDistortionCenter.y
let cal_matrix_width : CGFloat = camcaldata.intrinsicMatrixReferenceDimensions.width
let cal_matrix_height : CGFloat = camcaldata.intrinsicMatrixReferenceDimensions.height
let intrinsics_fx : Float = camcaldata.intrinsicMatrix.columns.0.x
let intrinsics_fy : Float = camcaldata.intrinsicMatrix.columns.1.y
let intrinsics_ox : Float = camcaldata.intrinsicMatrix.columns.2.x
let intrinsics_oy : Float = camcaldata.intrinsicMatrix.columns.2.y
let pixelformattype : OSType = CVPixelBufferGetPixelFormatType(cvpixelbuffer)
CVPixelBufferLockBaseAddress(cvpixelbuffer, CVPixelBufferLockFlags(rawValue: 0))
let int16Buffer = unsafeBitCast(CVPixelBufferGetBaseAddress(cvpixelbuffer), to: UnsafeMutablePointer<Float16>.self)
let int16PerRow = CVPixelBufferGetBytesPerRow(cvpixelbuffer) / 2
for x in 0...height-1
{
for y in 0...width-1
{
let luma = int16Buffer[x * int16PerRow + y]
/////////////////////////
// SAVE DEPTH VALUE 'luma' to FILE FOR PROCESSING
}
}
CVPixelBufferUnlockBaseAddress(cvpixelbuffer, CVPixelBufferLockFlags(rawValue: 0))
}
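There is no confirmed fix in this thread, but one quick runtime check (a sketch against the capture code above, not a solution) is to verify whether the depth frames arriving at the synchronizer are actually unfiltered; AVDepthData reports this directly:
import AVFoundation

// Sketch: log whether a received AVDepthData was filtered (smoothed) by the
// capture pipeline. Call this from dataOutputSynchronizer(_:didOutput:) with
// syncedDepthData.depthData.
func logDepthFiltering(_ depthData: AVDepthData) {
    // isFilteringEnabled = false on the output should yield unfiltered data;
    // this confirms what actually arrived on the device in question.
    print("filtered:", depthData.isDepthDataFiltered,
          "quality:", depthData.depthDataQuality.rawValue,
          "accuracy:", depthData.depthDataAccuracy.rawValue)
}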

AVCaptureVideoDataOutputSampleBufferDelegate drops frames when using CIFilters for video filtering

I have a very strange case where AVCaptureVideoDataOutputSampleBufferDelegate drops frames if I use 13 different filter chains. Let me explain:
I have a CameraController set up, nothing special; here is my delegate method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
if connection.output?.connection(with: .audio) == nil {
//capture video
// my try to avoid "Out of buffers error", no luck ;(
lastCapturedBuffer = nil
let err = CMSampleBufferCreateCopy(allocator: kCFAllocatorDefault, sampleBuffer: sampleBuffer, sampleBufferOut: &lastCapturedBuffer)
if err == noErr {
}
connection.videoOrientation = .portrait
// getting image
let pixelBuffer = CMSampleBufferGetImageBuffer(lastCapturedBuffer!)
// remove if any
CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: pixelBuffer!)
//remove if any
CVPixelBufferUnlockBaseAddress(pixelBuffer!,CVPixelBufferLockFlags(rawValue: 0))
//CVPixelBufferUnlockBaseAddress(pixelBuffer!, .readOnly)
// transform image to target resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(lastCapturedBuffer!)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(lastCapturedBuffer!)
}
// apply effect in realtime <- here is problem. If I comment next line, it will be fixed but effect will n't be applied
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(lastCapturedBuffer!)
}
}
} else {
// paused
}
}
I also implemented the didDrop delegate method; here is how I figured out why it drops frames:
func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
print("did drop")
var mode: CMAttachmentMode = 0
let reason = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_DroppedFrameReason, attachmentModeOut: &mode)
print("reason \(String(describing: reason))") // Optional(OutOfBuffers)
}
So I did it like a pro and just commented out parts of the code to find where the problem is. It's here:
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
FilterManager is a singleton; here is the called function:
func applyFilterForCamera(inputImage: CIImage) -> CIImage {
return currentVsFilter!.apply(sourceImage: inputImage)
}
currentVsFilter is an object of the VSFilter type -- here is an example of one:
import Foundation
import AVKit
class TestFilter: CustomFilter {
let _name = "Тестовый Фильтр" // Russian for "Test Filter"
let _displayName = "Test Filter"
var tempImage: CIImage?
var final: CGImage?
override func name() -> String {
return _name
}
override func displayName() -> String {
return _displayName
}
override init() {
super.init()
print("Test Filter init")
// setup my custom kernel filter
self.noise.type = GlitchFilter.GlitchType.allCases[2]
}
// this returns composition for playback using AVPlayer
override func composition(asset: AVAsset) -> AVMutableVideoComposition {
let composition = AVMutableVideoComposition(asset: asset, applyingCIFiltersWithHandler: { request in
let inputImage = request.sourceImage.cropped(to: request.sourceImage.extent)
DispatchQueue.global(qos: .userInitiated).async {
let output = self.apply(sourceImage: inputImage, forComposition: true)
request.finish(with: output, context: nil)
}
})
let size = FilterManager.shared.cropRectForOrientation().size
composition.renderSize = size
return composition
}
// this returns actual filtered CIImage, used for both AVPlayer composition and realtime camera
override func apply(sourceImage: CIImage, forComposition: Bool = false) -> CIImage {
// rendered text
tempImage = FilterManager.shared.textRenderedImage()
// some filters chained one by one
self.screenBlend?.setValue(tempImage, forKey: kCIInputImageKey)
self.screenBlend?.setValue(sourceImage, forKey: kCIInputBackgroundImageKey)
self.noise.inputImage = self.screenBlend?.outputImage
self.noise.inputAmount = CGFloat.random(in: 1.0...3.0)
// result
tempImage = self.noise.outputImage
// correct crop
let rect = forComposition ? FilterManager.shared.cropRectForOrientation() : FilterManager.shared.cropRect
final = self.context.createCGImage(tempImage!, from: rect!)
return CIImage(cgImage: final!)
}
}
And now the strangest thing: I have 30 VSFilters, and when I get to the 13th (switching one by one via a UIButton) I get the "Out of Buffers" error, this one:
kCMSampleBufferDroppedFrameReason_OutOfBuffers
What I tested:
I changed the vsFilters order in the filters array inside the FilterManager singleton - same result
I tried switching from the first to the 12th one by one, then going back - that works, but after I switched to the 13th (of 30, counting from 0) - the bug
It looks like it can handle only 12 VSFilter objects, as if it retains them somehow, or maybe it's related to threading, I don't know.
This app is made for iOS devices, tested on an iPhone X running iOS 13.3.1.
This is a video editor app to apply different effects to both the live stream from the camera and video files from the camera roll.
Maybe someone has experience with this?
Have a great day
Best, Victor
Edit 1. If I reinit the cameraController (AVCaptureSession, input/output devices) it works, but this is an ugly option and it adds lag when switching filters.
OK, so I finally won this battle. In case someone else gets this "OutOfBuffers" problem, here is my solution.
As I figured out, CIFilter grabs the CVPixelBuffer and doesn't release it while filtering images. It kind of creates one huge buffer, I guess. Strange thing: it doesn't create a memory leak, so I guess it grabs not a particular buffer but creates a strong reference to it. As rumor (i.e., me) has it, it can handle only 12 such references.
So my approach was to copy the CVPixelBuffer and then work with the copy instead of the buffer I got from the AVCaptureVideoDataOutputSampleBufferDelegate didOutput func.
Here is my new code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
//print("camera controller \(id) got frame")
if connection.output?.connection(with: .audio) == nil {
//capture video
connection.videoOrientation = .portrait
// getting image
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
// this works!
let copyBuffer = pixelBuffer.copy()
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: copyBuffer)
//remove if any
// transform image to target resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
}
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
}
if writable, (videoWriterInput.isReadyForMoreMediaData) {
videoWriterInput.append(sampleBuffer)
}
self.captured = FilterManager.shared.applyFilterForCamera(inputImage: self.captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
audioWriterInput?.append(sampleBuffer)
}
}
} else {
// paused
//print("paused camera controller \(id)")
}
}
and here is the function to copy the buffer:
func copy() -> CVPixelBuffer {
precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")
var _copy : CVPixelBuffer?
CVPixelBufferCreate(
kCFAllocatorDefault,
CVPixelBufferGetWidth(self),
CVPixelBufferGetHeight(self),
CVPixelBufferGetPixelFormatType(self),
nil,
&_copy)
guard let copy = _copy else { fatalError() }
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
let copyBaseAddress = CVPixelBufferGetBaseAddress(copy)
let currBaseAddress = CVPixelBufferGetBaseAddress(self)
print("copy data size: \(CVPixelBufferGetDataSize(copy))")
print("self data size: \(CVPixelBufferGetDataSize(self))")
memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(copy))
//memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(self) * 2)
CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
return copy
}
I used it as an extension on CVPixelBuffer.
I hope this will help anyone with a similar problem.
Best, Victor

Slow frame rate when rendering a CIFiltered CIImage in an MTKView while using face detection (Vision and CIDetector)

I have an app which does real-time filtering on the camera feed. I get each frame from the camera, do some filtering using CIFilter, and then pass the final frame (CIImage) to an MTKView to be shown in my SwiftUI view. It works fine, but when I want to do face/body detection, in real time, on the camera feed, the frame rate goes down to 8 frames per second and it's super laggy.
I tried everything I could find on the internet, using Vision, CIDetector, Core ML; everything gives the same result. I would do this on a global thread, which keeps the UI responsive, but the feed I'm showing in the main view is still laggy, although things like a scroll view work fine.
So I tried to change the view from MTKView to UIImageView. Xcode shows it rendering at 120 FPS (which I don't understand, since it's 30 FPS when not using any face detection), but the feed is still laggy and somehow cannot keep up with the output frame rate. I'm new to this and I don't understand why it behaves like that.
I also tried just passing the incoming image to the MTKView (without any filtering in between, with face detection), with the same laggy result; without face detection, it goes to 30 FPS (why not 120?).
This is the code I'm using for converting the sampleBuffer to a CIImage:
extension CICameraCapture: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
var ciImage = CIImage(cvImageBuffer: imageBuffer)
if self.cameraPosition == AVCaptureDevice.Position.front {
ciImage = ciImage.oriented(.downMirrored)
}
ciImage = ciImage.transformed(by: CGAffineTransform(rotationAngle: 3 * .pi / 2))
ciImage = ciImage.transformToOrigin(withSize: ciImage.extent.size)
detectFace(image: ciImage) // this is for detecting the face in real time; I have done it with Vision
//and also CIDetector - CIDetector is a little faster when set to low accuracy
//but still not the desired result (frame rate)
DispatchQueue.main.async {
self.callback(ciImage)
}
}
}
and this is the MTKView code, which is a very simple and basic implementation:
import MetalKit
import CoreImage
class MetalRenderView: MTKView {
//var textureCache: CVMetalTextureCache?
override init(frame frameRect: CGRect, device: MTLDevice?) {
super.init(frame: frameRect, device: device)
if super.device == nil {
fatalError("No support for Metal. Sorry")
}
framebufferOnly = false
preferredFramesPerSecond = 120
sampleCount = 2
}
required init(coder: NSCoder) {
fatalError("init(coder:) has not been implemented")
}
private lazy var commandQueue: MTLCommandQueue? = {
[unowned self] in
return self.device!.makeCommandQueue()
}()
private lazy var ciContext: CIContext = {
[unowned self] in
return CIContext(mtlDevice: self.device!)
}()
var image: CIImage? {
didSet {
renderImage()
}
}
private func renderImage() {
guard var image = image else { return }
image = image.transformToOrigin(withSize: drawableSize) // this is an extension to resize
//the image to the render size so i dont get the render error while rendering a frame
let commandBuffer = commandQueue?.makeCommandBuffer()
let destination = CIRenderDestination(width: Int(drawableSize.width),
height: Int(drawableSize.height),
pixelFormat: .bgra8Unorm,
commandBuffer: commandBuffer) { () -> MTLTexture in
return self.currentDrawable!.texture
}
try! ciContext.startTask(toRender: image, to: destination)
commandBuffer?.present(currentDrawable!)
commandBuffer?.commit()
draw()
}
}
and here is the code for face detection using CIDetector:
func detectFace (image: CIImage){
//DispatchQueue.global().async {
let options = [CIDetectorAccuracy: CIDetectorAccuracyHigh,
CIDetectorSmile: true, CIDetectorTypeFace: true] as [String : Any]
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil,
options: options)!
let faces = faceDetector.features(in: image)
if let face = faces.first as? CIFaceFeature {
AppState.shared.mouth = face.mouthPosition
AppState.shared.leftEye = face.leftEyePosition
AppState.shared.rightEye = face.rightEyePosition
}
//}
}
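This is not an answer from the thread, but one thing worth noting in the snippet above: the CIDetector is created on every call, and constructing a detector is expensive. A minimal sketch that builds it once and reuses it (AppState and detectFace are the names from the question; this would live inside CICameraCapture) could look like this:
// Create the detector once; building a CIDetector per frame is costly.
// Lower accuracy plus tracking also helps on a live feed.
private lazy var faceDetector: CIDetector? = CIDetector(
    ofType: CIDetectorTypeFace,
    context: nil,
    options: [CIDetectorAccuracy: CIDetectorAccuracyLow,
              CIDetectorTracking: true])

func detectFace(image: CIImage) {
    guard let faces = faceDetector?.features(in: image),
          let face = faces.first as? CIFaceFeature else { return }
    AppState.shared.mouth = face.mouthPosition
    AppState.shared.leftEye = face.leftEyePosition
    AppState.shared.rightEye = face.rightEyePosition
}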
What I have tried:
1) Different face detection methods, using Vision, CIDetector and also Core ML (this one not very deeply, as I don't have experience with it).
I would get the detection info, but the frame rate is 8, or at best 15 (which would be a delayed detection).
2) I've read somewhere that it might be a result of the image colorspace, so I have tried different video settings and a different rendering colorspace; still no change in the frame rate.
3) I'm fairly sure it might be related to pixel buffer release time, so I deep-copied the imageBuffer and passed the copy to the detection. Besides some memory issues, it went up to 15 FPS, but still not a minimum of 30 FPS. Here I also tried converting the imageBuffer to a CIImage, then rendering the CIImage to a CGImage and back to a CIImage just to release the buffer, but I still could not get more than 15 FPS (on average; sometimes it goes to 17 or 19, but still laggy).
I'm new to this and still trying to figure it out; I would appreciate any suggestions, samples, or tips that could direct me to a better path of solving this.
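Beyond the attempts listed above, another common pattern, offered here only as a sketch and not a confirmed fix for this case, is to keep detection entirely off the render path and drop frames while a detection is still in flight, so rendering never waits on the detector (detectFace is the question's existing method; the other names are illustrative):
// Inside CICameraCapture: gate detection so at most one runs at a time and
// extra frames are skipped instead of queueing up behind the detector.
private let detectionQueue = DispatchQueue(label: "face.detection", qos: .userInitiated)
private let detectionGate = DispatchSemaphore(value: 1)

func scheduleDetection(for image: CIImage) {
    // If a detection is still in flight, skip this frame and keep rendering.
    guard detectionGate.wait(timeout: .now()) == .success else { return }
    detectionQueue.async {
        defer { self.detectionGate.signal() }
        self.detectFace(image: image)   // the existing detection call
    }
}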
Update
This is the camera capture setup code:
class CICameraCapture: NSObject {
typealias Callback = (CIImage?) -> ()
private var cameraPosition = AVCaptureDevice.Position.front
var ciContext: CIContext?
let callback: Callback
private let session = AVCaptureSession()
private let sampleBufferQueue = DispatchQueue(label: "buffer", qos: .userInitiated)//, attributes: [], autoreleaseFrequency: .workItem)
// face detection
//private var sequenceHandler = VNSequenceRequestHandler()
//var request: VNCoreMLRequest!
//var visionModel: VNCoreMLModel!
//let detectionQ = DispatchQueue(label: "detectionQ", qos: .background)//, attributes: [], autoreleaseFrequency: .workItem)
init(callback: @escaping Callback) {
self.callback = callback
super.init()
prepareSession()
ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
}
func start() {
session.startRunning()
}
func stop() {
session.stopRunning()
}
private func prepareSession() {
session.sessionPreset = .high //.hd1920x1080
let cameraDiscovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualCamera, .builtInWideAngleCamera], mediaType: .video, position: cameraPosition)
guard let camera = cameraDiscovery.devices.first else { fatalError("Can't get hold of the camera") }
//try! camera.lockForConfiguration()
//camera.activeVideoMinFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].minFrameDuration
//camera.activeVideoMaxFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration
//camera.unlockForConfiguration()
guard let input = try? AVCaptureDeviceInput(device: camera) else { fatalError("Can't get hold of the camera") }
session.addInput(input)
let output = AVCaptureVideoDataOutput()
output.videoSettings = [:]
//print(output.videoSettings.description)
//[875704438, 875704422, 1111970369]
//output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32BGRA)]
output.setSampleBufferDelegate(self, queue: sampleBufferQueue)
session.addOutput(output)
session.commitConfiguration()
}
}

Filtering Depth Data on iOS 12 appears to be rotated

I am having an issue where the Depth Data for the .builtInDualCamera appears to be rotated 90 degrees when isFilteringEnabled = true
Here is my code:
fileprivate let session = AVCaptureSession()
fileprivate let meta = AVCaptureMetadataOutput()
fileprivate let video = AVCaptureVideoDataOutput()
fileprivate let depth = AVCaptureDepthDataOutput()
fileprivate let camera: AVCaptureDevice
fileprivate let input: AVCaptureDeviceInput
fileprivate let synchronizer: AVCaptureDataOutputSynchronizer
init(delegate: CaptureSessionDelegate?) throws {
self.delegate = delegate
session.sessionPreset = .vga640x480
// Setup Camera Input
let discovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualCamera], mediaType: .video, position: .unspecified)
if let device = discovery.devices.first {
camera = device
} else {
throw SessionError.CameraNotAvailable("Unable to load camera")
}
input = try AVCaptureDeviceInput(device: camera)
session.addInput(input)
// Setup Metadata Output (Face)
session.addOutput(meta)
if meta.availableMetadataObjectTypes.contains(AVMetadataObject.ObjectType.face) {
meta.metadataObjectTypes = [ AVMetadataObject.ObjectType.face ]
} else {
print("Can't Setup Metadata: \(meta.availableMetadataObjectTypes)")
}
// Setup Video Output
video.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
session.addOutput(video)
video.connection(with: .video)?.videoOrientation = .portrait
// ****** THE ISSUE IS WITH THIS BLOCK HERE ******
// Setup Depth Output
depth.isFilteringEnabled = true
session.addOutput(depth)
depth.connection(with: .depthData)?.videoOrientation = .portrait
// Setup Synchronizer
synchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [depth, video, meta])
let outputRect = CGRect(x: 0, y: 0, width: 1, height: 1)
let videoRect = video.outputRectConverted(fromMetadataOutputRect: outputRect)
let depthRect = depth.outputRectConverted(fromMetadataOutputRect: outputRect)
// Ratio of the Depth to Video
scale = max(videoRect.width, videoRect.height) / max(depthRect.width, depthRect.height)
// Set Camera to the framerate of the Depth Data Collection
try camera.lockForConfiguration()
if let fps = camera.activeDepthDataFormat?.videoSupportedFrameRateRanges.first?.minFrameDuration {
camera.activeVideoMinFrameDuration = fps
}
camera.unlockForConfiguration()
super.init()
synchronizer.setDelegate(self, queue: syncQueue)
}
func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput data: AVCaptureSynchronizedDataCollection) {
guard let delegate = self.delegate else {
return
}
// Check to see if all the data is actually here
guard
let videoSync = data.synchronizedData(for: video) as? AVCaptureSynchronizedSampleBufferData,
!videoSync.sampleBufferWasDropped,
let depthSync = data.synchronizedData(for: depth) as? AVCaptureSynchronizedDepthData,
!depthSync.depthDataWasDropped
else {
return
}
// It's OK if the face isn't found.
let face: AVMetadataFaceObject?
if let metaSync = data.synchronizedData(for: meta) as? AVCaptureSynchronizedMetadataObjectData {
face = (metaSync.metadataObjects.first { $0 is AVMetadataFaceObject }) as? AVMetadataFaceObject
} else {
face = nil
}
// Convert Buffers to CIImage
let videoImage = convertVideoImage(fromBuffer: videoSync.sampleBuffer)
let depthImage = convertDepthImage(fromData: depthSync.depthData, andFace: face)
// Call Delegate
delegate.captureImages(video: videoImage, depth: depthImage, face: face)
}
fileprivate func convertVideoImage(fromBuffer sampleBuffer: CMSampleBuffer) -> CIImage {
// Convert from the Core Media sample buffer to CIImage - fairly straightforward
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let image = CIImage(cvPixelBuffer: pixelBuffer!)
return image
}
fileprivate func convertDepthImage(fromData depthData: AVDepthData, andFace face: AVMetadataFaceObject?) -> CIImage {
var convertedDepth: AVDepthData
// Convert 16-bit floats up to 32
if depthData.depthDataType != kCVPixelFormatType_DisparityFloat32 {
convertedDepth = depthData.converting(toDepthDataType: kCVPixelFormatType_DisparityFloat32)
} else {
convertedDepth = depthData
}
// Pixel buffer comes straight from depthData
let pixelBuffer = convertedDepth.depthDataMap
let image = CIImage(cvPixelBuffer: pixelBuffer)
return image
}
The original video looks like this (for reference):
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = false
depth.connection(with: .depthData)?.videoOrientation = .portrait
The Image looks like this: (you can see the closer jacket is white, the farther jacket is grey, and the distance is dark grey - as expected)
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = true
depth.connection(with: .depthData)?.videoOrientation = .portrait
The image looks like this: (You can see the color values appear to be in the right places, but the shapes in the smoothing filter appear to be rotated)
When the values are:
// Setup Depth Output
depth.isFilteringEnabled = true
depth.connection(with: .depthData)?.videoOrientation = .landscapeRight
The image looks like this: (Both the colors and the shapes appear to be horizontal)
Am I doing something wrong to get these incorrect values?
I have tried re-ordering the code
// Setup Depth Output
depth.connection(with: .depthData)?.videoOrientation = .portrait
depth.isFilteringEnabled = true
But that does nothing.
I think this is an issue related to iOS 12, because I remember this working just fine under iOS 11 (although I don't have any images saved to prove it)
Any Help is appreciated, thanks!
Unlike the suggestion to review other answers on rotating the image after creation, which I found did not work, the AVDepthData documentation describes a method that does the orientation correction for you.
The method is called depthDataByApplyingExifOrientation:, which returns an instance of AVDepthData with the orientation applied, i.e. you can create your image in the correct orientation you desire by passing in the parameter of your choice.
This is my helper method that returns a UIImage with the orientation fix.
- (UIImage *)createDepthMapImageFromCapturePhoto:(AVCapturePhoto *)photo {
// AVCapturePhoto which has depthData - in swift you should confirm this exists
AVDepthData *frontDepthData = [photo depthData];
// Overwrite the instance with the correct orientation applied.
frontDepthData = [frontDepthData depthDataByApplyingExifOrientation:kCGImagePropertyOrientationRight];
// Create the CIImage from the depth data using the available method.
CIImage *ciDepthImage = [CIImage imageWithDepthData:frontDepthData];
// Create CIContext which enables converting CIImage to CGImage
CIContext *context = [[CIContext alloc] init];
// Create the CGImage
CGImageRef img = [context createCGImage:ciDepthImage fromRect:[ciDepthImage extent]];
// Create the final image.
UIImage *depthImage = [UIImage imageWithCGImage:img];
// Return the depth image.
return depthImage;
}
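A rough Swift equivalent of that helper, under the same assumption that the AVCapturePhoto carries depth data, might look like this:
import AVFoundation
import CoreImage
import UIKit

// Sketch: build a UIImage from an AVCapturePhoto's depth data with the
// EXIF orientation applied before the image is created.
func depthMapImage(from photo: AVCapturePhoto) -> UIImage? {
    guard let depthData = photo.depthData else { return nil }
    // Apply the desired orientation to the depth data itself.
    let oriented = depthData.applyingExifOrientation(.right)
    guard let ciImage = CIImage(depthData: oriented),
          let cgImage = CIContext().createCGImage(ciImage, from: ciImage.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}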

Face Detection with Camera

How can I do face detection in real time just as "Camera" does?
I noticed that AVCaptureStillImageOutput is deprecated after 10.0, so I use
AVCapturePhotoOutput instead. However, I found that the image I save for facial detection is not satisfactory. Any ideas?
UPDATE
After trying what @Shravya Boggarapu mentioned, I currently use AVCaptureMetadataOutput to detect the face without CIFaceDetector. It works as expected. However, when I try to draw the bounds of the face, they seem mislocated. Any idea?
let metaDataOutput = AVCaptureMetadataOutput()
captureSession.sessionPreset = AVCaptureSessionPresetPhoto
let backCamera = AVCaptureDevice.defaultDevice(withDeviceType: .builtInWideAngleCamera, mediaType: AVMediaTypeVideo, position: .back)
do {
let input = try AVCaptureDeviceInput(device: backCamera)
if (captureSession.canAddInput(input)) {
captureSession.addInput(input)
// MetadataOutput instead
if(captureSession.canAddOutput(metaDataOutput)) {
captureSession.addOutput(metaDataOutput)
metaDataOutput.setMetadataObjectsDelegate(self, queue: DispatchQueue.main)
metaDataOutput.metadataObjectTypes = [AVMetadataObjectTypeFace]
previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer?.frame = cameraView.bounds
previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill
cameraView.layer.addSublayer(previewLayer!)
captureSession.startRunning()
}
}
} catch {
print(error.localizedDescription)
}
and
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
if findFaceControl {
findFaceControl = false
for metadataObject in metadataObjects {
if (metadataObject as AnyObject).type == AVMetadataObjectTypeFace {
print("😇😍😎")
print(metadataObject)
let bounds = (metadataObject as! AVMetadataFaceObject).bounds
print("origin x: \(bounds.origin.x)")
print("origin y: \(bounds.origin.y)")
print("size width: \(bounds.size.width)")
print("size height: \(bounds.size.height)")
print("cameraView width: \(self.cameraView.frame.width)")
print("cameraView height: \(self.cameraView.frame.height)")
var face = CGRect()
face.origin.x = bounds.origin.x * self.cameraView.frame.width
face.origin.y = bounds.origin.y * self.cameraView.frame.height
face.size.width = bounds.size.width * self.cameraView.frame.width
face.size.height = bounds.size.height * self.cameraView.frame.height
print(face)
showBounds(at: face)
}
}
}
}
}
Original
see in Github
var captureSession = AVCaptureSession()
var photoOutput = AVCapturePhotoOutput()
var previewLayer: AVCaptureVideoPreviewLayer?
override func viewWillAppear(_ animated: Bool) {
super.viewWillAppear(true)
captureSession.sessionPreset = AVCaptureSessionPresetHigh
let backCamera = AVCaptureDevice.defaultDevice(withMediaType: AVMediaTypeVideo)
do {
let input = try AVCaptureDeviceInput(device: backCamera)
if (captureSession.canAddInput(input)) {
captureSession.addInput(input)
if(captureSession.canAddOutput(photoOutput)){
captureSession.addOutput(photoOutput)
captureSession.startRunning()
previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
previewLayer?.videoGravity = AVLayerVideoGravityResizeAspectFill
previewLayer?.frame = cameraView.bounds
cameraView.layer.addSublayer(previewLayer!)
}
}
} catch {
print(error.localizedDescription)
}
}
func captureImage() {
let settings = AVCapturePhotoSettings()
let previewPixelType = settings.availablePreviewPhotoPixelFormatTypes.first!
let previewFormat = [kCVPixelBufferPixelFormatTypeKey as String: previewPixelType
]
settings.previewPhotoFormat = previewFormat
photoOutput.capturePhoto(with: settings, delegate: self)
}
func capture(_ captureOutput: AVCapturePhotoOutput, didFinishProcessingPhotoSampleBuffer photoSampleBuffer: CMSampleBuffer?, previewPhotoSampleBuffer: CMSampleBuffer?, resolvedSettings: AVCaptureResolvedPhotoSettings, bracketSettings: AVCaptureBracketedStillImageSettings?, error: Error?) {
if let error = error {
print(error.localizedDescription)
}
// Not include previewPhotoSampleBuffer
if let sampleBuffer = photoSampleBuffer,
let dataImage = AVCapturePhotoOutput.jpegPhotoDataRepresentation(forJPEGSampleBuffer: sampleBuffer, previewPhotoSampleBuffer: nil) {
self.imageView.image = UIImage(data: dataImage)
self.imageView.isHidden = false
self.previewLayer?.isHidden = true
self.findFace(img: self.imageView.image!)
}
}
findFace works with a normal image. However, the image I capture via the camera does not work, or sometimes only one face is recognized.
Normal Image
Capture Image
func findFace(img: UIImage) {
guard let faceImage = CIImage(image: img) else { return }
let accuracy = [CIDetectorAccuracy: CIDetectorAccuracyHigh]
let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil, options: accuracy)
// For converting the Core Image Coordinates to UIView Coordinates
let detectedImageSize = faceImage.extent.size
var transform = CGAffineTransform(scaleX: 1, y: -1)
transform = transform.translatedBy(x: 0, y: -detectedImageSize.height)
if let faces = faceDetector?.features(in: faceImage, options: [CIDetectorSmile: true, CIDetectorEyeBlink: true]) {
for face in faces as! [CIFaceFeature] {
// Apply the transform to convert the coordinates
var faceViewBounds = face.bounds.applying(transform)
// Calculate the actual position and size of the rectangle in the image view
let viewSize = imageView.bounds.size
let scale = min(viewSize.width / detectedImageSize.width,
viewSize.height / detectedImageSize.height)
let offsetX = (viewSize.width - detectedImageSize.width * scale) / 2
let offsetY = (viewSize.height - detectedImageSize.height * scale) / 2
faceViewBounds = faceViewBounds.applying(CGAffineTransform(scaleX: scale, y: scale))
print("faceBounds = \(faceViewBounds)")
faceViewBounds.origin.x += offsetX
faceViewBounds.origin.y += offsetY
showBounds(at: faceViewBounds)
}
if faces.count != 0 {
print("Number of faces: \(faces.count)")
} else {
print("No faces 😢")
}
}
}
func showBounds(at bounds: CGRect) {
let indicator = UIView(frame: bounds)
indicator.frame = bounds
indicator.layer.borderWidth = 3
indicator.layer.borderColor = UIColor.red.cgColor
indicator.backgroundColor = .clear
self.imageView.addSubview(indicator)
faceBoxes.append(indicator)
}
There are two ways to detect faces: CIFaceDetector and AVCaptureMetadataOutput. Depending on your requirements, choose what is relevant for you.
CIFaceDetector has more features, it gives you the location of the eyes and mouth, a smile detector, etc.
On the other hand, AVCaptureMetadataOutput is computed on the frames, the detected faces are tracked, and there is no extra code to be added by us. I find that, because of the tracking, faces are detected more reliably in this process. The downside is that you will simply detect faces, not the position of the eyes or mouth.
Another advantage of this method is that orientation issues are smaller, as you can use videoOrientation whenever the device orientation changes, and the orientation of the faces will be relative to that orientation.
In my case, my application uses YUV420 as the required format so using CIDetector (which works with RGB) in real-time was not viable. Using AVCaptureMetadataOutput saved a lot of effort and performed more reliably due to continuous tracking.
Once I had the bounding box for the faces, I coded extra features, such as skin detection and applied it on the still image.
Note: When you capture a still image, the face box information is added along with the metadata so there are no sync issues.
You can also use a combination of the two to get better results.
Explore and evaluate the pros and cons as per your application.
The face rectangle is relative to the image origin, so for the screen it may be different.
Use:
for (AVMetadataFaceObject *faceFeatures in metadataObjects) {
CGRect face = faceFeatures.bounds;
CGRect facePreviewBounds = CGRectMake(face.origin.y * previewLayerRect.size.width,
face.origin.x * previewLayerRect.size.height,
face.size.width * previewLayerRect.size.height,
face.size.height * previewLayerRect.size.width);
/* Draw rectangle facePreviewBounds on screen */
}
To perform face detection on iOS, there is either the CIDetector (Apple)
or the Mobile Vision (Google) API.
IMO, Google Mobile Vision provides better performance.
If you are interested, here is a project you can play with. (iOS 10.2, Swift 3)
After WWDC 2017, Apple introduced Core ML in iOS 11.
The Vision framework makes face detection more accurate :)
I've made a demo project comparing Vision vs. CIDetector. It also contains face landmark detection in real time.
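For reference, and not taken from that demo project, a minimal Vision-based sketch looks roughly like this (each observation's boundingBox is normalized to the image):
import Vision
import CoreImage

// Detect face rectangles in a CIImage using the Vision framework (iOS 11+).
func detectFaces(in image: CIImage, completion: @escaping ([VNFaceObservation]) -> Void) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        // Each observation's boundingBox is in normalized (0...1) image coordinates.
        completion((request.results as? [VNFaceObservation]) ?? [])
    }
    let handler = VNImageRequestHandler(ciImage: image, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}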
A bit late, but here is the solution for the coordinates problem. There is a method you can call on the preview layer to transform the metadata object to your coordinate system: transformedMetadataObject(for: metadataObject).
guard let transformedObject = previewLayer.transformedMetadataObject(for: metadataObject) else {
continue
}
let bounds = transformedObject.bounds
showBounds(at: bounds)
Source: https://developer.apple.com/documentation/avfoundation/avcapturevideopreviewlayer/1623501-transformedmetadataobjectformeta
By the way, in case you are using (or upgrade your project to) Swift 4, the delegate method of AVCaptureMetadataOutputObjectsDelegate has changed to:
func metadataOutput(_ output: AVCaptureMetadataOutput, didOutput metadataObjects: [AVMetadataObject], from connection: AVCaptureConnection)
Kind regards
extension CameraViewController: AVCaptureMetadataOutputObjectsDelegate {
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputMetadataObjects metadataObjects: [Any]!, from connection: AVCaptureConnection!) {
if findFaceControl {
findFaceControl = false
let faces = metadataObjects
    .compactMap { $0 as? AVMetadataFaceObject }
    .compactMap { (face) -> CGRect? in
        guard let localizedFace =
            previewLayer?.transformedMetadataObject(for: face) else { return nil }
        return localizedFace.bounds
    }
for face in faces {
    let temp = UIView(frame: face)
    temp.layer.borderColor = UIColor.white.cgColor
    temp.layer.borderWidth = 2.0
    view.addSubview(temp)
}
}
}
}
}
Be sure to remove the views created by didOutputMetadataObjects.
Keeping track of the active face IDs is the best way to do this.
Also, when you're trying to find the location of faces for your preview layer, it is much easier to use the face metadata and transform it. I also think CIDetector is junk; AVCaptureMetadataOutput uses hardware for face detection, making it really fast.
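A minimal sketch of that idea, assuming the question's previewLayer, a metadata delegate running on the main queue, and a hypothetical faceViews property keyed by faceID:
// Keep one overlay view per tracked faceID; remove overlays whose faces are gone.
var faceViews: [Int: UIView] = [:]

func metadataOutput(_ output: AVCaptureMetadataOutput,
                    didOutput metadataObjects: [AVMetadataObject],
                    from connection: AVCaptureConnection) {
    let faces = metadataObjects.compactMap { $0 as? AVMetadataFaceObject }
    let activeIDs = Set(faces.map { $0.faceID })

    // Remove overlays for faces that are no longer reported.
    for (id, overlay) in faceViews where !activeIDs.contains(id) {
        overlay.removeFromSuperview()
        faceViews[id] = nil
    }
    // Add or move overlays for the faces currently reported.
    for face in faces {
        guard let localized = previewLayer?.transformedMetadataObject(for: face) else { continue }
        let overlay = faceViews[face.faceID] ?? {
            let v = UIView()
            v.layer.borderColor = UIColor.white.cgColor
            v.layer.borderWidth = 2
            view.addSubview(v)
            faceViews[face.faceID] = v
            return v
        }()
        overlay.frame = localized.bounds
    }
}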
1. Create the capture session.
2. For the AVCaptureVideoDataOutput, create the following settings:
output.videoSettings = [ kCVPixelBufferPixelFormatTypeKey as AnyHashable: Int(kCMPixelFormat_32BGRA) ]
3. When you receive the CMSampleBuffer, create an image:
DispatchQueue.main.async {
let sampleImg = self.imageFromSampleBuffer(sampleBuffer: sampleBuffer)
self.imageView.image = sampleImg
}
func imageFromSampleBuffer(sampleBuffer : CMSampleBuffer) -> UIImage
{
// Get a CMSampleBuffer's Core Video image buffer for the media data
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
// Lock the base address of the pixel buffer
CVPixelBufferLockBaseAddress(imageBuffer!, CVPixelBufferLockFlags.readOnly);
// Get the base address of the pixel buffer
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer!);
// Get the number of bytes per row for the pixel buffer
let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer!);
// Get the pixel buffer width and height
let width = CVPixelBufferGetWidth(imageBuffer!);
let height = CVPixelBufferGetHeight(imageBuffer!);
// Create a device-dependent RGB color space
let colorSpace = CGColorSpaceCreateDeviceRGB();
// Create a bitmap graphics context with the sample buffer data
var bitmapInfo: UInt32 = CGBitmapInfo.byteOrder32Little.rawValue
bitmapInfo |= CGImageAlphaInfo.premultipliedFirst.rawValue & CGBitmapInfo.alphaInfoMask.rawValue
//let bitmapInfo: UInt32 = CGBitmapInfo.alphaInfoMask.rawValue
let context = CGContext.init(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo)
// Create a Quartz image from the pixel data in the bitmap graphics context
let quartzImage = context?.makeImage();
// Unlock the pixel buffer
CVPixelBufferUnlockBaseAddress(imageBuffer!, CVPixelBufferLockFlags.readOnly);
// Create an image object from the Quartz image
let image = UIImage.init(cgImage: quartzImage!);
return (image);
}
By looking at your code I detected two things that could lead to wrong/poor face detection.
One of them is the face detector feature options, where you are filtering the results by [CIDetectorSmile: true, CIDetectorEyeBlink: true]. Try setting it to nil: faceDetector?.features(in: faceImage, options: nil)
Another guess I have is the resulting image orientation. I noticed you use the AVCapturePhotoOutput.jpegPhotoDataRepresentation method to generate the source image for the detection, and the system, by default, generates that image with a specific orientation, of type Left/LandscapeLeft, I think. So, basically, you can tell the face detector to take that into account by using the CIDetectorImageOrientation key.
CIDetectorImageOrientation: the value for this key is an integer NSNumber from 1..8 such as that found in kCGImagePropertyOrientation. If present, the detection will be done based on that orientation but the coordinates in the returned features will still be based on those of the image.
Try to set it like faceDetector?.features(in: faceImage, options: [CIDetectorImageOrientation: 8 /*Left, bottom*/]).
