I have made an MLModel in CreateML that will detect hockey pucks in images. I use the camera on the phone to take a video, and while it is being recorded, I convert each frame to a CGImage and try to detect pucks in each frame.
At first when I received the memory crashes, I tried removing a trajectory detection I was running at the same time, however this made no change. When monitoring the memory usage during runtime, my app uses a small and consistent amount of memory; it is "Other processes" that goes over the limit, which is quite confusing. I also removed a for loop that filtered out objects with low confidence (below 0.5) but this does not have an effect either.
Being new to MLModel and machine learning, can anybody steer me in the right direction? Please let me know if any more details are needed, if I missed something.I will attach all of the code because it is only 100 lines or so, and it may be important for context. However, the initializeCaptureSession method and captureOutput method would probably be the ones to look at.
import UIKit
import AVFoundation
import ImageIO
import Vision
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate, AVCaptureAudioDataOutputSampleBufferDelegate {
var cameraPreviewLayer: AVCaptureVideoPreviewLayer?
var camera: AVCaptureDevice?
var microphone: AVCaptureDevice?
let session = AVCaptureSession()
var videoDataOutput = AVCaptureVideoDataOutput()
var audioDataOutput = AVCaptureAudioDataOutput()
#IBOutlet var trajectoriesLabel: UILabel!
#IBOutlet var pucksLabel: UILabel!
override func viewDidLoad() {
super.viewDidLoad()
initializeCaptureSession()
// Do any additional setup after loading the view.
}
// Lazily create a single instance of VNDetectTrajectoriesRequest.
private lazy var request: VNDetectTrajectoriesRequest = {
request.objectMinimumNormalizedRadius = 0.0
request.objectMaximumNormalizedRadius = 0.5
return VNDetectTrajectoriesRequest(frameAnalysisSpacing: .zero, trajectoryLength: 10, completionHandler: completionHandler)
}()
// AVCaptureVideoDataOutputSampleBufferDelegate callback.
func captureOutput(_ output: AVCaptureOutput,
didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection) {
// Process the results.
do {
let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else{
print("cannot make pixelbuffer for image conversion")
return
}
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
let width = CVPixelBufferGetWidth(pixelBuffer)
let height = CVPixelBufferGetHeight(pixelBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
guard let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue) else{
print("cannot make context for image conversion")
return
}
guard let cgImage = context.makeImage() else{
print("cannot make cgimage for image conversion")
return
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
let model = try VNCoreMLModel(for: PucksV7(configuration: MLModelConfiguration()).model)
let request = VNCoreMLRequest(model: model)
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
try? handler.perform([request])
guard let pucks = request.results as? [VNDetectedObjectObservation] else{
print("Could not convert detected pucks")
return
}
DispatchQueue.main.async {
self.pucksLabel.text = "Pucks: \(pucks.count)"
}
try requestHandler.perform([request])
} catch {
// Handle the error.
}
}
func completionHandler(request: VNRequest, error: Error?) {
//identify results
guard let observations = request.results as? [VNTrajectoryObservation] else { return }
// Process the results.
self.trajectoriesLabel.text = "Trajectories: \(observations.count)"
}
func initializeCaptureSession(){
session.sessionPreset = .hd1920x1080
camera = AVCaptureDevice.default(for: .video)
microphone = AVCaptureDevice.default(for: .audio)
do{
session.beginConfiguration()
//adding camera
let cameraCaptureInput = try AVCaptureDeviceInput(device: camera!)
if session.canAddInput(cameraCaptureInput){
session.addInput(cameraCaptureInput)
}
//output
let queue = DispatchQueue(label: "output")
if session.canAddOutput(videoDataOutput) {
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
videoDataOutput.setSampleBufferDelegate(self, queue: queue)
session.addOutput(videoDataOutput)
}
let captureConnection = videoDataOutput.connection(with: .video)
// Always process the frames
captureConnection?.isEnabled = true
do {
try camera!.lockForConfiguration()
camera!.unlockForConfiguration()
} catch {
print(error)
}
session.commitConfiguration()
cameraPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
cameraPreviewLayer?.videoGravity = .resizeAspectFill
cameraPreviewLayer?.frame = view.bounds
cameraPreviewLayer?.connection?.videoOrientation = .landscapeRight
view.layer.insertSublayer(cameraPreviewLayer!, at: 0)
DispatchQueue.global(qos: .background).async {
self.session.startRunning()
}
} catch {
print(error.localizedDescription)
}
}
}
Execution speed. You are dispatching threads faster than they can be processed.
In my experience, not on this platform, object detection using a cnn is not fast enough to process every frame from the camera in real-time at 30 fps.
With hardware acceleration, like the "Apple Neural Engine", it is possible (I have an FPGA on my desk that does this task in real time in "hardware" using 15 watts).
I would suggest processing every 50th frame and speed it up until it fails.
The other issue is image size. To be performant the image must be as small as possible and still detect the feature.
The larger the input image, the more convolution layers are required. Most models are in the smaller ranges like 200x200 pixels.
Related
I'm trying to captur image from camera preview but can't get image from preview layer. What I want to do is kinda similar to iOS 15 OCR mode in Photo app which processes image during camera preview, does not require user to take a shot nor start recording video, just process image in preview. I looked into docs and searched on net but could not find any useful info.
What I tried was, save previewLayer and call previewLayer.draw(in: context) periodically. But the image drawn in the context is blank. Now I wonder if it is possible first of all.
There might be some security issue there to restrict processing image in camera preview that only genuine app is allowed to access I guess, so I probably need to find other ways.
Please enlighten me if any workaround.
Thanks!
Ok. With MadProgrammer's help I got things working properly. Anurag Ajwani's site is very helpful.
Here is my simple snippet to capture video frames. You need to ensure permissions before CameraView gets instantiated.
class VideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
//private var previewLayer: AVCaptureVideoPreviewLayer? = nil
private var session: AVCaptureSession? = nil
private var videoOutput: AVCaptureVideoDataOutput? = nil
private var videoHandler: ((UIImage) -> Void)?
override init() {
super.init()
let deviceSession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualWideCamera, .builtInWideAngleCamera], mediaType: .video, position: .back)
guard deviceSession.devices.count > 0 else { return }
if let input = try? AVCaptureDeviceInput(device: deviceSession.devices.first!) {
let session = AVCaptureSession()
session.addInput(input)
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString): NSNumber(value: kCVPixelFormatType_32BGRA)] as [String:Any]
videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "my.image.handling.queue"))
videoOutput.alwaysDiscardsLateVideoFrames = true
if session.canAddOutput(videoOutput) {
session.sessionPreset = .high
session.addOutput(videoOutput)
self.videoOutput = videoOutput
}
for connection in videoOutput.connections {
if connection.isVideoOrientationSupported {
connection.videoOrientation = .portrait
}
}
session.commitConfiguration()
self.session = session
/*
self.previewLayer = AVCaptureVideoPreviewLayer(session: session)
if let previewLayer = self.previewLayer {
previewLayer.videoGravity = .resizeAspectFill
layer.insertSublayer(previewLayer, at: 0)
CameraPreviewView.initialized = true
}
*/
}
}
func startCapturing(_ videoHandler: #escaping (UIImage) -> Void) -> Void {
if let session = session {
session.startRunning()
}
self.videoHandler = videoHandler
}
// AVCaptureVideoDataOutputSampleBufferDelegate
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
debugPrint("unable to get video frame")
return
}
//print("got video frame")
if let videoHandler = self.videoHandler {
let rect = CGRect(x: 0, y: 0, width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
let ciImage = CIImage.init(cvImageBuffer: imageBuffer)
let ciContext = CIContext()
let cgImage = ciContext.createCGImage(ciImage, from: rect)
guard cgImage != nil else {return }
let uiImage = UIImage(cgImage: cgImage!)
videoHandler(uiImage)
}
}
}
struct CameraView: View {
#State var capturedVideo: UIImage? = nil
let videoCapture = VideoCapture()
var body: some View {
VStack {
ZStack(alignment: .center) {
if let capturedVideo = self.capturedVideo {
Image(uiImage: capturedVideo)
.resizable()
.scaledToFill()
}
}
}
.background(Color.black)
.onAppear {
self.videoCapture.startCapturing { uiImage in
self.capturedVideo = uiImage
}
}
}
I can't seem to find a way to not use the document scanner, and supplement it with AVFoundation instead. I'm trying to create a feature where the user can click a button, scan text, and then save that to some textview w/o having the user click the camera button, keep scan, save, etc.
I've got it to work with object detection, but I can't get it to work for text-recognition. So, is there any way to use Apple's vision framework for real-time text recognition? Any help would be much appreciated
For performance reasons, I'd prefer to not convert the CMSampleBuffer to a UIImage, and would instead use the following to create an AVCaptureVideoPreviewLayer for live video:
class CameraFeedView: UIView {
private var previewLayer: AVCaptureVideoPreviewLayer!
override class var layerClass: AnyClass {
return AVCaptureVideoPreviewLayer.self
}
init(frame: CGRect, session: AVCaptureSession, videoOrientation: AVCaptureVideoOrientation) {
super.init(frame: frame)
previewLayer = layer as? AVCaptureVideoPreviewLayer
previewLayer.session = session
previewLayer.videoGravity = .resizeAspect
previewLayer.connection?.videoOrientation = videoOrientation
}
required init?(coder: NSCoder) {
fatalError("init(coder:) has not been implemented")
}
}
Once you have this, you can work on the live video data using Vision:
class CameraViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
private let videoDataOutputQueue = DispatchQueue(label: "CameraFeedDataOutput", qos: .userInitiated,
attributes: [], autoreleaseFrequency: .workItem)
private var drawingView: UILabel = {
let view = UILabel(frame: UIScreen.main.bounds)
view.font = UIFont.boldSystemFont(ofSize: 30.0)
view.textColor = .red
view.translatesAutoresizingMaskIntoConstraints = false
return view
}()
private var cameraFeedSession: AVCaptureSession?
private var cameraFeedView: CameraFeedView! //Wrap
override func viewDidLoad() {
super.viewDidLoad()
do {
try setupAVSession()
} catch {
print("setup av session failed")
}
}
func setupAVSession() throws {
// Create device discovery session for a wide angle camera
let wideAngle = AVCaptureDevice.DeviceType.builtInWideAngleCamera
let discoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [wideAngle], mediaType: .video, position: .back)
// Select a video device, make an input
guard let videoDevice = discoverySession.devices.first else {
print("Could not find a wide angle camera device.")
}
guard let deviceInput = try? AVCaptureDeviceInput(device: videoDevice) else {
print("Could not create video device input.")
}
let session = AVCaptureSession()
session.beginConfiguration()
// We prefer a 1080p video capture but if camera cannot provide it then fall back to highest possible quality
if videoDevice.supportsSessionPreset(.hd1920x1080) {
session.sessionPreset = .hd1920x1080
} else {
session.sessionPreset = .high
}
// Add a video input
guard session.canAddInput(deviceInput) else {
print("Could not add video device input to the session")
}
session.addInput(deviceInput)
let dataOutput = AVCaptureVideoDataOutput()
if session.canAddOutput(dataOutput) {
session.addOutput(dataOutput)
// Add a video data output
dataOutput.alwaysDiscardsLateVideoFrames = true
dataOutput.videoSettings = [
String(kCVPixelBufferPixelFormatTypeKey): Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
]
dataOutput.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
} else {
print("Could not add video data output to the session")
}
let captureConnection = dataOutput.connection(with: .video)
captureConnection?.preferredVideoStabilizationMode = .standard
captureConnection?.videoOrientation = .portrait
// Always process the frames
captureConnection?.isEnabled = true
session.commitConfiguration()
cameraFeedSession = session
// Get the interface orientaion from window scene to set proper video orientation on capture connection.
let videoOrientation: AVCaptureVideoOrientation
switch view.window?.windowScene?.interfaceOrientation {
case .landscapeRight:
videoOrientation = .landscapeRight
default:
videoOrientation = .portrait
}
// Create and setup video feed view
cameraFeedView = CameraFeedView(frame: view.bounds, session: session, videoOrientation: videoOrientation)
setupVideoOutputView(cameraFeedView)
cameraFeedSession?.startRunning()
}
The key functions to implement once you've got an AVCaptureSession set up are the delegate and request handler:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer, orientation: .down)
let request = VNRecognizeTextRequest(completionHandler: textDetectHandler)
do {
// Perform the text-detection request.
try requestHandler.perform([request])
} catch {
print("Unable to perform the request: \(error).")
}
}
func textDetectHandler(request: VNRequest, error: Error?) {
guard let observations =
request.results as? [VNRecognizedTextObservation] else { return }
// Process each observation to find the recognized body pose points.
let recognizedStrings = observations.compactMap { observation in
// Return the string of the top VNRecognizedText instance.
return observation.topCandidates(1).first?.string
}
DispatchQueue.main.async {
self.drawingView.text = recognizedStrings.first
}
}
}
Note, you will probably want to process each of the recognizedStrings in order to choose the one with the highest confidence, but this is a proof of concept. You could also add a bounding box, and the docs have an example of that.
I need to detect real face from iPhone front camera. So I have used vision framework to achieve it. But it is detecting the face from static image (human photo) also which is not required. Here is my code snippet.
class ViewController {
func sessionPrepare() {
session = AVCaptureSession()
guard let session = session, let captureDevice = frontCamera else { return }
do {
let deviceInput = try AVCaptureDeviceInput(device: captureDevice)
session.beginConfiguration()
if session.canAddInput(deviceInput) {
session.addInput(deviceInput)
}
let output = AVCaptureVideoDataOutput()
output.videoSettings = [
String(kCVPixelBufferPixelFormatTypeKey) : Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
]
output.alwaysDiscardsLateVideoFrames = true
if session.canAddOutput(output) {
session.addOutput(output)
}
session.commitConfiguration()
let queue = DispatchQueue(label: "output.queue")
output.setSampleBufferDelegate(self, queue: queue)
print("setup delegate")
} catch {
print("can't setup session")
}
}
}
}
It is also detecting face from a static image if I place it in front of camera.
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let attachments = CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate)
let ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: attachments as! [String : Any]?)
let ciImageWithOrientation = ciImage.applyingOrientation(Int32(UIImageOrientation.leftMirrored.rawValue))
detectFace(on: ciImageWithOrientation)
}
}
func detectFace(on image: CIImage) {
try? faceDetectionRequest.perform([faceDetection], on: image)
if let results = faceDetection.results as? [VNFaceObservation] {
if !results.isEmpty {
faceLandmarks.inputFaceObservations = results
detectLandmarks(on: image)
DispatchQueue.main.async {
self.shapeLayer.sublayers?.removeAll()
}
}
}
}
func detectLandmarks(on image: CIImage) {
try? faceLandmarksDetectionRequest.perform([faceLandmarks], on: image)
if let landmarksResults = faceLandmarks.results as? [VNFaceObservation] {
for observation in landmarksResults {
DispatchQueue.main.async {
if let boundingBox = self.faceLandmarks.inputFaceObservations?.first?.boundingBox {
let faceBoundingBox = boundingBox.scaled(to: self.view.bounds.size)
//different types of landmarks
let faceContour = observation.landmarks?.faceContour
let leftEye = observation.landmarks?.leftEye
let rightEye = observation.landmarks?.rightEye
let nose = observation.landmarks?.nose
let lips = observation.landmarks?.innerLips
let leftEyebrow = observation.landmarks?.leftEyebrow
let rightEyebrow = observation.landmarks?.rightEyebrow
let noseCrest = observation.landmarks?.noseCrest
let outerLips = observation.landmarks?.outerLips
}
}
}
}
}
So is there any way to get it done using only from real time camera detection? I would be very grateful for your help and advice
I need to do the same and after a lot of experiments finally, I have found this
https://github.com/syaringan357/iOS-MobileFaceNet-MTCNN-FaceAntiSpoofing
It is detecting only Live camera faces. But it is not using Vision framework.
I am trying to process the realtime video form the iPhone camera by using the function in AVCaptureVideoDataOutputSampleBufferDelegate.
The video had been edited but the direction of the video is changed, and the proportion of the video is strange.
I use the following code to edit the video.
override func viewDidLoad() {
super.viewDidLoad()
// Do any additional setup after loading the view.
guard let captureDevice = AVCaptureDevice.default(for: .video) else { return }
guard let input = try? AVCaptureDeviceInput(device: captureDevice) else { return }
captureSession.addInput(input)
let dataOutput = AVCaptureVideoDataOutput()
dataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
captureSession.addOutput(dataOutput)
let preview = AVCaptureVideoPreviewLayer(session: captureSession)
preview.frame = cview.frame
cview.layer.addSublayer(preview)
captureSession.startRunning()
}
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
let cameraImage = CIImage(cvPixelBuffer: imageBuffer!)
let comicEffect = CIFilter(name: "CIComicEffect")
comicEffect!.setValue(cameraImage, forKey: kCIInputImageKey)
let filteredImage = UIImage(ciImage: comicEffect!.value(forKey: kCIOutputImageKey) as! CIImage!)
DispatchQueue.main.async {
self.image.image = filteredImage
}
}
And it returns the following output:
To make the picture easier to compare, I removed the comicEffect:
The correct proportion should be like:
May I know how should I solve this problem?
I am implementing a camera application. I initiate the camera as follows:
let input = try AVCaptureDeviceInput(device: captureDevice!)
captureSession = AVCaptureSession()
captureSession?.addInput(input)
videoPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession!)
videoPreviewLayer?.videoGravity = AVLayerVideoGravity.resizeAspectFill
videoPreviewLayer?.frame = view.layer.bounds
previewView.layer.insertSublayer(videoPreviewLayer!, at: 0)
Now I want to have a small rectangle on top of the preview layer. In that rectangle area, I want to zoom a specific area from the preview layer. To do it, I add a new UIView on top of other views, but I don't know how to display a specific area from the previewer (e.g. zoom factor = 2).
The following figure shows what I want to have:
How can I do it?
Finally, I found a solution.
The idea is to extract the real-time frames from the output of the camera, then use an UIImage view to show the enlarged frame. Following is the portion of code to add an video output:
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sample buffer"))
guard captureSession.canAddOutput(videoOutput) else { return }
captureSession.addOutput(videoOutput)
and we need to implement a delegate function:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
guard let uiImage = imageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
DispatchQueue.main.async { [unowned self] in
self.delegate?.captured(image: uiImage)
}
}
private func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
let ciImage = CIImage(cvPixelBuffer: imageBuffer)
guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
return UIImage(cgImage: cgImage)
}
The code was taken from this article.