I am implementing a camera application. I initiate the camera as follows:
let input = try AVCaptureDeviceInput(device: captureDevice!)
captureSession = AVCaptureSession()
videoPreviewLayer = AVCaptureVideoPreviewLayer(session: captureSession!)
videoPreviewLayer?.videoGravity = AVLayerVideoGravity.resizeAspectFill
videoPreviewLayer?.frame = view.layer.bounds
previewView.layer.insertSublayer(videoPreviewLayer!, at: 0)
Now I want to have a small rectangle on top of the preview layer. In that rectangle, I want to show a zoomed-in view of a specific area of the preview layer. To do that, I added a new UIView on top of the other views, but I don't know how to display a specific area of the preview layer in it (e.g. with zoom factor = 2).
The following figure shows what I want to have:
How can I do it?
Finally, I found a solution.
The idea is to extract the real-time frames from the output of the camera, then use a UIImageView to show the enlarged frame. The following code adds a video output:
let videoOutput = AVCaptureVideoDataOutput()
videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "sample buffer"))
guard captureSession.canAddOutput(videoOutput) else { return }
captureSession.addOutput(videoOutput)
and we need to implement a delegate function:
// Swift 4 and later use this signature; the older Swift 3 form is never called there.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let uiImage = imageFromSampleBuffer(sampleBuffer: sampleBuffer) else { return }
    DispatchQueue.main.async { [unowned self] in
        self.delegate?.captured(image: uiImage)
    }
}

// `context` is a CIContext kept as a property, for example `private let context = CIContext()`.
private func imageFromSampleBuffer(sampleBuffer: CMSampleBuffer) -> UIImage? {
    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return nil }
    let ciImage = CIImage(cvPixelBuffer: imageBuffer)
    guard let cgImage = context.createCGImage(ciImage, from: ciImage.extent) else { return nil }
    return UIImage(cgImage: cgImage)
}
The code was taken from this article.
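The answer shows how to get each frame but not how to pick the region to enlarge. As a minimal sketch, assuming a hypothetical helper named zoomCropRect (it is not part of AVFoundation or Core Image), the area to show for a given zoom factor could be computed like this:

```swift
import Foundation

/// Returns the sub-rectangle of `extent` that shows the frame at `zoomFactor`
/// once it is scaled up to fill the magnifier view.
/// `zoomCropRect` is a hypothetical helper written for this example.
func zoomCropRect(extent: CGRect, zoomFactor: CGFloat, center: CGPoint? = nil) -> CGRect {
    // A factor below 1 would ask for more pixels than the frame contains.
    let factor = max(zoomFactor, 1)
    let width = extent.size.width / factor
    let height = extent.size.height / factor

    // Default to the middle of the frame when no focus point is given.
    let focus = center ?? CGPoint(x: extent.origin.x + extent.size.width / 2,
                                  y: extent.origin.y + extent.size.height / 2)

    // Clamp the origin so the crop never leaves the frame.
    let maxX = extent.origin.x + extent.size.width - width
    let maxY = extent.origin.y + extent.size.height - height
    let x = min(max(focus.x - width / 2, extent.origin.x), maxX)
    let y = min(max(focus.y - height / 2, extent.origin.y), maxY)
    return CGRect(x: x, y: y, width: width, height: height)
}

let frame = CGRect(x: 0, y: 0, width: 1920, height: 1080)
let crop = zoomCropRect(extent: frame, zoomFactor: 2)
print(crop.origin.x, crop.origin.y, crop.size.width, crop.size.height)
```

In imageFromSampleBuffer, passing such a rectangle as the from: argument of createCGImage instead of ciImage.extent should render only that region, which the image view then scales up. Note that Core Image uses a bottom-left origin, so a focus point taken from UIKit coordinates would need its y value flipped first.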
I have made an MLModel in CreateML that will detect hockey pucks in images. I use the camera on the phone to take a video, and while it is being recorded, I convert each frame to a CGImage and try to detect pucks in each frame.
When I first got the memory crashes, I tried removing a trajectory detection I was running at the same time, but this made no difference. When I monitor memory usage at runtime, my app uses a small and consistent amount of memory; it is "Other processes" that goes over the limit, which is quite confusing. I also removed a for loop that filtered out objects with low confidence (below 0.5), but this had no effect either.
Being new to MLModel and machine learning, I would appreciate it if anybody could steer me in the right direction. Please let me know if more details are needed or if I missed something. I will attach all of the code because it is only 100 lines or so, and it may be important for context. However, the initializeCaptureSession method and the captureOutput method are probably the ones to look at.
import UIKit
import AVFoundation
import ImageIO
import Vision
class ViewController: UIViewController, AVCaptureVideoDataOutputSampleBufferDelegate, AVCaptureAudioDataOutputSampleBufferDelegate {

    var cameraPreviewLayer: AVCaptureVideoPreviewLayer?
    var camera: AVCaptureDevice?
    var microphone: AVCaptureDevice?
    let session = AVCaptureSession()
    var videoDataOutput = AVCaptureVideoDataOutput()
    var audioDataOutput = AVCaptureAudioDataOutput()

    @IBOutlet var trajectoriesLabel: UILabel!
    @IBOutlet var pucksLabel: UILabel!

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view.
        initializeCaptureSession()
    }

    // Lazily create a single instance of VNDetectTrajectoriesRequest.
    private lazy var request: VNDetectTrajectoriesRequest = {
        let request = VNDetectTrajectoriesRequest(frameAnalysisSpacing: .zero, trajectoryLength: 10, completionHandler: completionHandler)
        request.objectMinimumNormalizedRadius = 0.0
        request.objectMaximumNormalizedRadius = 0.5
        return request
    }()
    // AVCaptureVideoDataOutputSampleBufferDelegate callback.
    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Process the results.
        do {
            let requestHandler = VNImageRequestHandler(cmSampleBuffer: sampleBuffer)
            guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else{
                print("cannot make pixelbuffer for image conversion")
                return
            }
            CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
            let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
            let width = CVPixelBufferGetWidth(pixelBuffer)
            let height = CVPixelBufferGetHeight(pixelBuffer)
            let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
            let colorSpace = CGColorSpaceCreateDeviceRGB()
            let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedFirst.rawValue | CGBitmapInfo.byteOrder32Little.rawValue)
            guard let context = CGContext(data: baseAddress, width: width, height: height, bitsPerComponent: 8, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue) else{
                print("cannot make context for image conversion")
                return
            }
            guard let cgImage = context.makeImage() else{
                print("cannot make cgimage for image conversion")
                return
            }
            CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
            let model = try VNCoreMLModel(for: PucksV7(configuration: MLModelConfiguration()).model)
            let request = VNCoreMLRequest(model: model)
            let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
            try? handler.perform([request])
            guard let pucks = request.results as? [VNDetectedObjectObservation] else{
                print("Could not convert detected pucks")
                return
            }
            DispatchQueue.main.async {
                self.pucksLabel.text = "Pucks: \(pucks.count)"
            }
            try requestHandler.perform([request])
        } catch {
            // Handle the error.
        }
    }
    func completionHandler(request: VNRequest, error: Error?) {
        //identify results
        guard let observations = request.results as? [VNTrajectoryObservation] else { return }
        // Process the results.
        self.trajectoriesLabel.text = "Trajectories: \(observations.count)"
    }
    func initializeCaptureSession(){
        session.sessionPreset = .hd1920x1080
        camera = AVCaptureDevice.default(for: .video)
        microphone = AVCaptureDevice.default(for: .audio)
        do{
            //adding camera
            let cameraCaptureInput = try AVCaptureDeviceInput(device: camera!)
            if session.canAddInput(cameraCaptureInput){
                session.addInput(cameraCaptureInput)
            }
            let queue = DispatchQueue(label: "output")
            if session.canAddOutput(videoDataOutput) {
                videoDataOutput.alwaysDiscardsLateVideoFrames = true
                videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
                videoDataOutput.setSampleBufferDelegate(self, queue: queue)
                session.addOutput(videoDataOutput)
            }
            let captureConnection = videoDataOutput.connection(with: .video)
            // Always process the frames
            captureConnection?.isEnabled = true
            do {
                try camera!.lockForConfiguration()
                camera!.unlockForConfiguration()
            } catch {
                print(error)
            }
            cameraPreviewLayer = AVCaptureVideoPreviewLayer(session: session)
            cameraPreviewLayer?.videoGravity = .resizeAspectFill
            cameraPreviewLayer?.frame = view.bounds
            cameraPreviewLayer?.connection?.videoOrientation = .landscapeRight
            view.layer.insertSublayer(cameraPreviewLayer!, at: 0)
            DispatchQueue.global(qos: .background).async {
                self.session.startRunning()
            }
        } catch {
            print(error.localizedDescription)
        }
    }
}
The problem is execution speed: you are dispatching work faster than it can be processed.
In my experience (not on this platform), object detection with a CNN is not fast enough to process every camera frame in real time at 30 fps.
With hardware acceleration such as the Apple Neural Engine it is possible (I have an FPGA on my desk that does this task in real time in hardware using 15 watts).
I would suggest processing every 50th frame, then increasing the rate until it fails.
The other issue is image size. To be performant, the image must be as small as it can be while still letting the model detect the feature.
The larger the input image, the more work the convolution layers have to do. Most models take small inputs, around 200x200 pixels.
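The every-Nth-frame suggestion can be sketched with a small counter. This is only an illustration: FrameThrottle is a hypothetical type, not part of AVFoundation or Vision, and the interval of 50 is simply the starting value suggested above.

```swift
/// Lets one frame through out of every `interval` frames.
/// `FrameThrottle` is a hypothetical helper written for this example.
struct FrameThrottle {
    let interval: Int
    private var counter = 0

    init(interval: Int) {
        // An interval below 1 makes no sense, so fall back to "every frame".
        self.interval = max(interval, 1)
    }

    /// Call once per delivered frame; returns true when the frame should be analysed.
    mutating func shouldProcess() -> Bool {
        counter += 1
        if counter >= interval {
            counter = 0
            return true
        }
        return false
    }
}

// Simulate 200 delivered frames and record which ones would be analysed.
var throttle = FrameThrottle(interval: 50)
let processed = (1...200).filter { _ in throttle.shouldProcess() }
print(processed) // [50, 100, 150, 200]
```

In the question's captureOutput, the throttle would be stored as a property and checked at the top of the callback, returning early when shouldProcess() is false. Lowering the interval step by step then shows how many frames per second the device can actually sustain.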
I'm trying to capture an image from the camera preview, but I can't get an image out of the preview layer. What I want is similar to the iOS 15 OCR mode in the Photos app, which processes the image during camera preview: it does not require the user to take a shot or start recording video. I looked through the docs and searched the net but could not find any useful info.
What I tried was to keep a reference to previewLayer and call previewLayer.draw(in: context) periodically, but the image drawn into the context is blank. Now I wonder whether this is possible at all.
I guess there might be a security restriction so that only system apps are allowed to process the camera preview, in which case I probably need to find another way.
Please enlighten me if there is any workaround.
Ok. With MadProgrammer's help I got things working properly. Anurag Ajwani's site is very helpful.
Here is my simple snippet to capture video frames. You need to ensure permissions before CameraView gets instantiated.
class VideoCapture: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    //private var previewLayer: AVCaptureVideoPreviewLayer? = nil
    private var session: AVCaptureSession? = nil
    private var videoOutput: AVCaptureVideoDataOutput? = nil
    private var videoHandler: ((UIImage) -> Void)?

    override init() {
        super.init()
        let deviceSession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualWideCamera, .builtInWideAngleCamera], mediaType: .video, position: .back)
        guard deviceSession.devices.count > 0 else { return }
        if let input = try? AVCaptureDeviceInput(device: deviceSession.devices.first!) {
            let session = AVCaptureSession()
            session.addInput(input)
            let videoOutput = AVCaptureVideoDataOutput()
            videoOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString): NSNumber(value: kCVPixelFormatType_32BGRA)] as [String:Any]
            videoOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "my.image.handling.queue"))
            videoOutput.alwaysDiscardsLateVideoFrames = true
            if session.canAddOutput(videoOutput) {
                session.sessionPreset = .high
                session.addOutput(videoOutput)
                self.videoOutput = videoOutput
            }
            for connection in videoOutput.connections {
                if connection.isVideoOrientationSupported {
                    connection.videoOrientation = .portrait
                }
            }
            self.session = session
            // Preview layer setup, left disabled because this class only delivers frames:
            // self.previewLayer = AVCaptureVideoPreviewLayer(session: session)
            // if let previewLayer = self.previewLayer {
            //     previewLayer.videoGravity = .resizeAspectFill
            //     layer.insertSublayer(previewLayer, at: 0)
            //     CameraPreviewView.initialized = true
            // }
        }
    }

    func startCapturing(_ videoHandler: @escaping (UIImage) -> Void) -> Void {
        if let session = session {
            session.startRunning()
        }
        self.videoHandler = videoHandler
    }

    // AVCaptureVideoDataOutputSampleBufferDelegate
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            debugPrint("unable to get video frame")
            return
        }
        //print("got video frame")
        if let videoHandler = self.videoHandler {
            let rect = CGRect(x: 0, y: 0, width: CVPixelBufferGetWidth(imageBuffer), height: CVPixelBufferGetHeight(imageBuffer))
            let ciImage = CIImage.init(cvImageBuffer: imageBuffer)
            let ciContext = CIContext()
            guard let cgImage = ciContext.createCGImage(ciImage, from: rect) else { return }
            let uiImage = UIImage(cgImage: cgImage)
            // Hand the image over on the main queue because the handler updates view state.
            DispatchQueue.main.async {
                videoHandler(uiImage)
            }
        }
    }
}
struct CameraView: View {
    @State var capturedVideo: UIImage? = nil
    let videoCapture = VideoCapture()

    var body: some View {
        VStack {
            ZStack(alignment: .center) {
                if let capturedVideo = self.capturedVideo {
                    Image(uiImage: capturedVideo)
                }
            }
        }
        .onAppear {
            self.videoCapture.startCapturing { uiImage in
                self.capturedVideo = uiImage
            }
        }
    }
}
I am trying to process the real-time video from the iPhone camera by using the callback in AVCaptureVideoDataOutputSampleBufferDelegate.
The filter is applied to the video, but the orientation of the video changes and its proportions look wrong.
I use the following code to process the video.
override func viewDidLoad() {
    super.viewDidLoad()
    // Do any additional setup after loading the view.
    guard let captureDevice = AVCaptureDevice.default(for: .video) else { return }
    guard let input = try? AVCaptureDeviceInput(device: captureDevice) else { return }
    captureSession.addInput(input)
    let dataOutput = AVCaptureVideoDataOutput()
    dataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "videoQueue"))
    captureSession.addOutput(dataOutput)
    let preview = AVCaptureVideoPreviewLayer(session: captureSession)
    preview.frame = cview.frame
    captureSession.startRunning()
}

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    let cameraImage = CIImage(cvPixelBuffer: imageBuffer!)
    let comicEffect = CIFilter(name: "CIComicEffect")
    comicEffect!.setValue(cameraImage, forKey: kCIInputImageKey)
    let filteredImage = UIImage(ciImage: comicEffect!.value(forKey: kCIOutputImageKey) as! CIImage)
    DispatchQueue.main.async {
        self.image.image = filteredImage
    }
}
And it returns the following output:
To make the picture easier to compare, I removed the comicEffect:
The correct proportion should be like:
How can I solve this problem?
I am creating a custom camera with filters. When I add the following line it crashes without showing any exception.
//Setting video output
func setupBuffer() {
    videoBuffer = AVCaptureVideoDataOutput()
    videoBuffer?.alwaysDiscardsLateVideoFrames = true
    videoBuffer?.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString): NSNumber(value: kCVPixelFormatType_32RGBA)]
    videoBuffer?.setSampleBufferDelegate(self, queue: DispatchQueue.main)
}

public func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    if connection.videoOrientation != .portrait {
        connection.videoOrientation = .portrait
    }
    guard let image = GMVUtility.sampleBufferTo32RGBA(sampleBuffer) else {
        print("No Image 😂")
        return
    }
    pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
    ciImage = CIImage(cvImageBuffer: pixelBuffer!, options: CMCopyDictionaryOfAttachments(kCFAllocatorDefault, sampleBuffer, kCMAttachmentMode_ShouldPropagate) as! [String : Any]?)
    CameraView.filter = CIFilter(name: "CIPhotoEffectProcess")
    CameraView.filter?.setValue(ciImage, forKey: kCIInputImageKey)
    let cgimg = CameraView.context.createCGImage(CameraView.filter!.outputImage!, from: ciImage.extent)
    DispatchQueue.main.async {
        self.preview.image = UIImage(cgImage: cgimg!)
    }
}
But it's crashing on:
guard let image = GMVUtility.sampleBufferTo32RGBA(sampleBuffer) else {
    print("No Image 😂")
    return
}
When I instead pass an image created from the CIImage, it doesn't recognize the face in the image.
Complete code file is https://www.dropbox.com/s/y1ewd1sh18h3ezj/CameraView.swift.zip?dl=0
1) Create a separate queue for the buffer.
fileprivate var videoDataOutputQueue = DispatchQueue(label: "VideoDataOutputQueue")
2) Set up the buffer with that queue and the 32BGRA pixel format:
videoBuffer = AVCaptureVideoDataOutput()
videoBuffer?.alwaysDiscardsLateVideoFrames = true
videoBuffer?.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString): NSNumber(value: kCVPixelFormatType_32BGRA)]
videoBuffer?.setSampleBufferDelegate(self, queue: videoDataOutputQueue)
I am using Swift 3 and can't set a custom resolution. When I use AVCaptureSessionPresetMedium or a similar preset, the output doesn't match the screen's aspect ratio (1:1.77).
let output = AVCaptureVideoDataOutput()
output.setSampleBufferDelegate(self, queue: sampleQueue)
let metaOutput = AVCaptureMetadataOutput()
metaOutput.setMetadataObjectsDelegate(self, queue: faceQueue)
// Desired resolution : 720x1280px
// session.sessionPreset = AVCaptureSessionPresetMedium;
if session.canAddInput(input) {
    session.addInput(input)
}
if session.canAddOutput(output) {
    output.alwaysDiscardsLateVideoFrames = true
    session.addOutput(output)
    connection1 = output.connection(withMediaType: AVMediaTypeVideo)
    connection1?.preferredVideoStabilizationMode = AVCaptureVideoStabilizationMode.auto
    connection1?.videoOrientation = .portrait
    connection1?.isVideoMirrored = true
}
if session.canAddOutput(metaOutput) {
    output.alwaysDiscardsLateVideoFrames = true
    session.addOutput(metaOutput)
    connection2 = metaOutput.connection(withMediaType: AVMediaTypeMetadata)
    connection2?.preferredVideoStabilizationMode = AVCaptureVideoStabilizationMode.auto
    connection2?.videoOrientation = .portrait
    connection2?.isVideoMirrored = true
}
You should use the AVCaptureSessionPreset1280x720 preset. The presets are named in landscape, but a capture setting of 1280x720 is the same as 720x1280; the only difference is the orientation. For example, in an app that supports rotation:
func captureOutput(_ captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, from connection: AVCaptureConnection!) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    cameraImage = CIImage(cvPixelBuffer: pixelBuffer)
    print(cameraImage?.extent ?? "")
}
Will print (0.0, 0.0, 1280.0, 720.0) when in landscape, and (0.0, 0.0, 720.0, 1280.0) when in portrait.
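The relationship between the preset name and the delivered buffer size can be written down as a small function. This is a sketch for illustration only: CaptureOrientation and bufferSize are hypothetical names, and the function just models the swap described above rather than querying AVFoundation.

```swift
/// Simplified orientation used only in this example.
enum CaptureOrientation {
    case portrait
    case landscape
}

/// Presets are named in landscape (for example 1280x720);
/// in portrait the same pixels arrive with the two sides swapped.
/// `bufferSize` is a hypothetical helper written for this example.
func bufferSize(presetWidth: Int, presetHeight: Int,
                orientation: CaptureOrientation) -> (width: Int, height: Int) {
    let longSide = max(presetWidth, presetHeight)
    let shortSide = min(presetWidth, presetHeight)
    switch orientation {
    case .landscape:
        return (width: longSide, height: shortSide)
    case .portrait:
        return (width: shortSide, height: longSide)
    }
}

let portrait = bufferSize(presetWidth: 1280, presetHeight: 720, orientation: .portrait)
let landscape = bufferSize(presetWidth: 1280, presetHeight: 720, orientation: .landscape)
print(portrait, landscape)
```

With the 1280x720 preset and the connection's videoOrientation set to .portrait, this corresponds to the 720x1280 frame the question asks for, and 1280 / 720 is roughly the 1.77 ratio mentioned there.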