I'm trying to align the screenshots emitted by RPScreenRecorder's startCapture method to logs saved elsewhere in my code.
I was hoping that I could just match CMSampleBuffer's presentationTimeStamp to the timestamp reported by CMClockGetHostTimeClock(), but that doesn't seem to be true.
I've created a small sample project to demonstrate my problem (available on GitHub), but here's the relevant code:
To show the current time, I'm updating a label with the current value of CMClockGetTime(CMClockGetHostTimeClock()) when CADisplayLink fires:
override func viewDidLoad() {
    // ...
    displayLink = CADisplayLink(target: self, selector: #selector(displayLinkDidFire))
    displayLink?.add(to: .main, forMode: .common)
}

@objc private func displayLinkDidFire(_ displayLink: CADisplayLink) {
    timestampLabel.text = String(format: "%.3f", CMClockGetTime(CMClockGetHostTimeClock()).seconds)
}
And here is where I'm saving RPScreenRecorder's buffers to disk.
Each filename is the buffer's presentationTimeStamp in seconds, truncated to milliseconds:
RPScreenRecorder.shared().startCapture(handler: { buffer, bufferType, error in
    switch bufferType {
    case .video:
        guard let imageBuffer = buffer.imageBuffer else {
            return
        }
        CVPixelBufferLockBaseAddress(imageBuffer, .readOnly) // Do I need this?
        autoreleasepool {
            let ciImage = CIImage(cvImageBuffer: imageBuffer)
            let uiImage = UIImage(ciImage: ciImage)
            let data = uiImage.jpegData(compressionQuality: 0.5)
            let filename = String(format: "%.3f", buffer.presentationTimeStamp.seconds)
            let url = Self.screenshotDirectoryURL.appendingPathComponent(filename)
            FileManager.default.createFile(atPath: url.path, contents: data)
        }
        CVPixelBufferUnlockBaseAddress(imageBuffer, .readOnly)
    default:
        break
    }
})
The result is a collection of screenshots like this:
I'd expect each screenshot's filename to match the timestamp visible in the screenshot, or at least be off by some consistent duration. Instead, I'm seeing variable differences which seem to get worse over time. More confusing, I also sometimes get duplicates of the same screenshot. For example, here are the times from a recent recording:
[Table: time visible in the screenshot vs. the screenshot's filename]
The results are wild enough that I think I must be doing something exceptionally stupid, but I'm not sure what it is. Any ideas/recommendations? Or, ideas for how to better accomplish my goal?
I am new to using Metal, but I have been following the tutorial here that takes the camera output and renders it onto the screen using Metal.
Now I want to take an image, turn it into a MTLTexture, and position and render that texture on top of the camera output.
My current rendering code is as follows:
private func render(texture: MTLTexture, withCommandBuffer commandBuffer: MTLCommandBuffer, device: MTLDevice) {
    guard
        let currentRenderPassDescriptor = metalView.currentRenderPassDescriptor,
        let currentDrawable = metalView.currentDrawable,
        let renderPipelineState = renderPipelineState,
        let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor)
    else {
        return
    }
    encoder.setRenderPipelineState(renderPipelineState)
    encoder.setFragmentTexture(texture, index: 0)
    encoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4, instanceCount: 1)
    encoder.endEncoding()
    commandBuffer.addScheduledHandler { [weak self] (buffer) in
        guard let unwrappedSelf = self else { return }
        unwrappedSelf.didRenderTexture(texture, withCommandBuffer: buffer, device: device)
    }
    commandBuffer.present(currentDrawable)
    commandBuffer.commit()
}
I know that I can convert a UIImage to a MTLTexture using the following code:
let textureLoader = MTKTextureLoader(device: device)
let cgImage = UIImage(named: "myImage")!.cgImage!
let imageTexture = try! textureLoader.newTexture(cgImage: cgImage, options: nil)
So now I have two MTLTextures. Is there a simple function that allows me to combine them? I've been trying to search online and someone mentioned a function called over, but I haven't actually been able to find that one. Any help would be greatly appreciated.
You can simply do this inside the shader by adding or multiplying color values. I guess that's what shaders are for.
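As a rough illustration of the per-pixel arithmetic such a shader could do, here is a CPU-side sketch in plain Swift of "source-over" blending, which is probably the "over" operation mentioned in the question. The `RGBA` type and the `over` function are hypothetical names made up for this example, and the sketch assumes non-premultiplied colors with components in 0...1. In practice the same math would live in a Metal fragment function that samples both textures.

```swift
// Hypothetical color type: non-premultiplied RGBA, components in 0...1.
struct RGBA: Equatable {
    var r: Double, g: Double, b: Double, a: Double
}

// Source-over compositing: draws `top` over `bottom`.
func over(_ top: RGBA, _ bottom: RGBA) -> RGBA {
    let outA = top.a + bottom.a * (1 - top.a)
    // Both inputs fully transparent: nothing to show.
    guard outA > 0 else { return RGBA(r: 0, g: 0, b: 0, a: 0) }
    // Weights each channel by its alpha, then un-premultiplies the result.
    func mix(_ t: Double, _ b: Double) -> Double {
        return (t * top.a + b * bottom.a * (1 - top.a)) / outA
    }
    return RGBA(r: mix(top.r, bottom.r),
                g: mix(top.g, bottom.g),
                b: mix(top.b, bottom.b),
                a: outA)
}

let overlay = RGBA(r: 1, g: 0, b: 0, a: 0.5)  // half-transparent red (the image)
let camera = RGBA(r: 0, g: 0, b: 1, a: 1)     // opaque blue (the camera pixel)
let blended = over(overlay, camera)           // (r: 0.5, g: 0, b: 0.5, a: 1)
```

Plain adding or multiplying, as suggested above, ignores the overlay's alpha; source-over is the variant that lets transparent parts of the image show the camera feed through.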
I have an app that does real-time filtering on the camera feed. I get each frame from the camera, apply some filtering with CIFilter, and pass the final frame (a CIImage) to an MTKView shown in my SwiftUI view. This works fine, but when I add real-time face/body detection on the camera feed, the frame rate drops to 8 frames per second and the feed becomes very laggy.
I have tried everything I could find on the internet (Vision, CIDetector, Core ML) and the result is always the same. Running the detection on a global queue keeps the UI responsive (things like scroll views work fine), but the feed shown in the main view is still laggy.
So I tried changing the view from MTKView to UIImageView. Xcode shows it rendering at 120 FPS (which I don't understand, since it is 30 FPS without face detection), but the feed is still laggy and somehow cannot keep up with the output frame rate. I'm new to this and don't understand why it behaves like that.
I also tried passing the incoming image straight to the MTKView (no filtering in between, with face detection) and got the same laggy result. Without face detection it runs at 30 FPS (why not 120?).
This is the code I'm using to convert the sampleBuffer to a CIImage:
extension CICameraCapture: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        var ciImage = CIImage(cvImageBuffer: imageBuffer)
        if self.cameraPosition == AVCaptureDevice.Position.front {
            ciImage = ciImage.oriented(.downMirrored)
        }
        ciImage = ciImage.transformed(by: CGAffineTransform(rotationAngle: 3 * .pi / 2))
        ciImage = ciImage.transformToOrigin(withSize: ciImage.extent.size)
        detectFace(image: ciImage) // this is for detecting the face in real time; I have done it with Vision
        // and also CIDetector. CIDetector is a little faster when set to low accuracy,
        // but still does not reach the desired frame rate
        DispatchQueue.main.async {
            self.callback(ciImage)
        }
    }
}
And this is the MTKView code, which is a very simple and basic implementation:
import MetalKit
import CoreImage

class MetalRenderView: MTKView {
    //var textureCache: CVMetalTextureCache?

    override init(frame frameRect: CGRect, device: MTLDevice?) {
        super.init(frame: frameRect, device: device)
        if super.device == nil {
            fatalError("No support for Metal. Sorry")
        }
        framebufferOnly = false
        preferredFramesPerSecond = 120
        sampleCount = 2
    }

    required init(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }

    private lazy var commandQueue: MTLCommandQueue? = {
        [unowned self] in
        return self.device!.makeCommandQueue()
    }()

    private lazy var ciContext: CIContext = {
        [unowned self] in
        return CIContext(mtlDevice: self.device!)
    }()

    var image: CIImage? {
        didSet {
            renderImage()
        }
    }

    private func renderImage() {
        guard var image = image else { return }
        image = image.transformToOrigin(withSize: drawableSize) // this is an extension that resizes
        // the image to the render size so I don't get a render error while rendering a frame
        let commandBuffer = commandQueue?.makeCommandBuffer()
        let destination = CIRenderDestination(width: Int(drawableSize.width),
                                              height: Int(drawableSize.height),
                                              pixelFormat: .bgra8Unorm,
                                              commandBuffer: commandBuffer) { () -> MTLTexture in
            return self.currentDrawable!.texture
        }
        try! ciContext.startTask(toRender: image, to: destination)
        commandBuffer?.present(currentDrawable!)
        commandBuffer?.commit()
    }
}
And here is the code for face detection using CIDetector:
func detectFace(image: CIImage) {
    //DispatchQueue.global().async {
    let options = [CIDetectorAccuracy: CIDetectorAccuracyHigh,
                   CIDetectorSmile: true, CIDetectorTypeFace: true] as [String : Any]
    let faceDetector = CIDetector(ofType: CIDetectorTypeFace, context: nil,
                                  options: options)!
    let faces = faceDetector.features(in: image)
    if let face = faces.first as? CIFaceFeature {
        AppState.shared.mouth = face.mouthPosition
        AppState.shared.leftEye = face.leftEyePosition
        AppState.shared.rightEye = face.rightEyePosition
    }
    //}
}
What I have tried:
1) Different face detection methods: Vision, CIDetector, and Core ML (the last one not very deeply, as I have no experience with it).
I do get the detection info, but the frame rate is 8 FPS, or 15 FPS at best (which gives delayed detection).
2) I read somewhere that it might be caused by the image color space, so I tried different video settings and different rendering color spaces. The frame rate did not change.
3) I suspect it is related to when the pixel buffer is released, so I deep-copied the imageBuffer and passed the copy to the detection. Apart from some memory issues, this raised the frame rate to 15 FPS, which is still short of the 30 FPS minimum. I also tried converting the imageBuffer to a CIImage, rendering that to a CGImage, and converting it back to a CIImage just to release the buffer, but could not get more than 15 FPS (on average; it sometimes reaches 17 or 19, but is still laggy).
I'm new to this and still trying to figure it out. I would appreciate any suggestions, samples, or tips that could point me toward a better way of solving this.
This is the camera capture setup code:
class CICameraCapture: NSObject {
    typealias Callback = (CIImage?) -> ()
    private var cameraPosition = AVCaptureDevice.Position.front
    var ciContext: CIContext?
    let callback: Callback
    private let session = AVCaptureSession()
    private let sampleBufferQueue = DispatchQueue(label: "buffer", qos: .userInitiated)//, attributes: [], autoreleaseFrequency: .workItem)
    // face detection
    //private var sequenceHandler = VNSequenceRequestHandler()
    //var request: VNCoreMLRequest!
    //var visionModel: VNCoreMLModel!
    //let detectionQ = DispatchQueue(label: "detectionQ", qos: .background)//, attributes: [], autoreleaseFrequency: .workItem)

    init(callback: @escaping Callback) {
        self.callback = callback
        super.init()
        prepareSession()
        ciContext = CIContext(mtlDevice: MTLCreateSystemDefaultDevice()!)
    }

    func start() {
        session.startRunning()
    }

    func stop() {
        session.stopRunning()
    }

    private func prepareSession() {
        session.sessionPreset = .high //.hd1920x1080
        let cameraDiscovery = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInDualCamera, .builtInWideAngleCamera], mediaType: .video, position: cameraPosition)
        guard let camera = cameraDiscovery.devices.first else { fatalError("Can't get hold of the camera") }
        //try! camera.lockForConfiguration()
        //camera.activeVideoMinFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].minFrameDuration
        //camera.activeVideoMaxFrameDuration = camera.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration
        guard let input = try? AVCaptureDeviceInput(device: camera) else { fatalError("Can't get hold of the camera") }
        session.addInput(input)
        let output = AVCaptureVideoDataOutput()
        output.videoSettings = [:]
        //[875704438, 875704422, 1111970369]
        //output.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32BGRA)]
        output.setSampleBufferDelegate(self, queue: sampleBufferQueue)
        session.addOutput(output)
    }
}
I am new to iOS and have experience with image processing coding in other languages, which I was hoping to translate into an app, however I am getting some unusual behavior which I don't understand. When I convert an image to a Data array and look at the number of elements in the array, with every new image this number changes. When I look at the specific data in the array, the values are 0-255 which matches what I would expect for a grayscale image, but I am confused why the size (or number of elements) in the data array changes. I would expect it to be held constant since I set the captureSession to 640x480. Why is this not the case? Even if it wasn't a grayscale image, I would expect the size to remain the same and not change picture to picture.
I am getting the UIImage from AVFoundation, and the code is shown below. The rest of the code, not shown, just starts the session. I basically want to turn the image into raw pixel data. I have seen many different ways to do this, but this seems like a good method.
Relevant Code:
@objc func timerHandle() {
    imageView.image = uiimages
}

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    uiimages = sampleBuffer.image(orientation: .down, scale: 1.0)!
    print(uiimages) //output1
    let data = sampleBuffer.data()
    let newData = Array(data!)
    print(data!.count) //output2
}

extension CMSampleBuffer {
    func image(orientation: UIImageOrientation = .left, scale: CGFloat = 1.0) -> UIImage? {
        if let buffer = CMSampleBufferGetImageBuffer(self) {
            let ciImage = CIImage(cvPixelBuffer: buffer).applyingFilter("CIColorControls", parameters: [kCIInputSaturationKey:0.0])
            return UIImage(ciImage: ciImage, scale: scale, orientation: orientation)
        }
        return nil
    }

    func data(orientation: UIImageOrientation = .left, scale: CGFloat = 1.0) -> Data? {
        if let buffer = CMSampleBufferGetImageBuffer(self) {
            let size = self.image()?.size
            let scale = self.image()?.scale
            let ciImage = CIImage(cvPixelBuffer: buffer).applyingFilter("CIColorControls", parameters: [kCIInputSaturationKey:0.0])
            UIGraphicsBeginImageContextWithOptions(size!, false, scale!)
            defer { UIGraphicsEndImageContext() }
            UIImage(ciImage: ciImage).draw(in: CGRect(origin: .zero, size: size!))
            guard let redraw = UIGraphicsGetImageFromCurrentImageContext() else { return nil }
            return UIImagePNGRepresentation(redraw)
        }
        return nil
    }
}
At output1, when I straight print the uiimage variable I get:
<UIImage: 0x1c40b0ec0>, {640, 480}
which shows correct dimensions
at output2, when I print the count, every time captureOutput is called I get a different value:
640x480 should give me 307,200, so why am I not at least getting a constant number, even if the value isn't correct?
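One thing that may explain the varying counts: `UIImagePNGRepresentation` returns PNG-encoded (compressed) data, so its length depends on the image content and not only on the dimensions. The arithmetic for an uncompressed buffer can be sketched as follows. `rawByteCount` is a hypothetical helper written for this example, and it assumes tightly packed rows with no padding, which real pixel buffers do not always guarantee.

```swift
// Hypothetical helper: bytes needed for an uncompressed image,
// assuming rows are tightly packed (no per-row padding).
func rawByteCount(width: Int, height: Int, bytesPerPixel: Int) -> Int {
    return width * height * bytesPerPixel
}

// One byte per pixel (8-bit grayscale): 307,200 bytes.
let grayscale = rawByteCount(width: 640, height: 480, bytesPerPixel: 1)
// Four bytes per pixel (e.g. BGRA): 1,228,800 bytes.
let bgra = rawByteCount(width: 640, height: 480, bytesPerPixel: 4)
```

A constant count like this would only be expected when reading the pixel bytes directly, not from an encoded file format such as PNG or JPEG.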
In the function below (didPressTakePhoto), I am trying to take a series of pictures (10 in this case), store them in an array, and display them as an animation in the "gif". Yet the program keeps crashing and I have no idea why. This all happens after one button click, hence the function name. Also, I tried taking the animation code outside the for loop, but the imageArray would then lose its value for some reason.
func didPressTakePhoto(){
    if let videoConnection = stillImageOutput?.connectionWithMediaType(AVMediaTypeVideo){
        videoConnection.videoOrientation = AVCaptureVideoOrientation.Portrait
        stillImageOutput?.captureStillImageAsynchronouslyFromConnection(videoConnection, completionHandler: {
            (sampleBuffer, error) in
            //var counter = 0
            if sampleBuffer != nil {
                for var index = 0; index < 10; ++index {
                    let imageData = AVCaptureStillImageOutput.jpegStillImageNSDataRepresentation(sampleBuffer)
                    let dataProvider = CGDataProviderCreateWithCFData(imageData)
                    let cgImageRef = CGImageCreateWithJPEGDataProvider(dataProvider, nil, true, CGColorRenderingIntent.RenderingIntentDefault)
                    var imageArray: [UIImage] = []
                    let image = UIImage(CGImage: cgImageRef!, scale: 1.0, orientation: UIImageOrientation.Right)
                    imageArray.insert(image, atIndex: index++)
                    self.tempImageView.image = image
                    self.tempImageView.hidden = false
                    //UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil)
                    var gif: UIImageView!
                    gif.animationImages = imageArray
                    gif.animationRepeatCount = -1
                    gif.animationDuration = 1
                }
            }
        })
    }
}
Never try to make an array of images (i.e., a [UIImage] as you are doing). A UIImage is very big, so an array of many images is huge and you will run out of memory and crash.
Save your images to disk, maintaining only references to them (i.e. an array of their names).
Before using your images in the interface, reduce them to the actual physical size you will need for that interface. Using a full-size image for a mere screen-sized display (or smaller) is a huge waste of energy. You can use the ImageIO framework to get a "thumbnail" smaller version of the image from disk without wasting memory.
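The "save to disk, keep only references" idea can be sketched roughly like this. `saveFrames` is a hypothetical helper, the temporary directory and the `frame-N.jpg` naming are assumptions chosen for the example, and the input is assumed to be already-encoded image data (for instance the JPEG data from each capture).

```swift
import Foundation

// Hypothetical helper: writes each encoded image to a fresh temporary
// directory and returns only the file URLs, so the images themselves
// do not all have to stay in memory at once.
func saveFrames(_ frames: [Data], prefix: String = "frame") throws -> [URL] {
    let directory = FileManager.default.temporaryDirectory
        .appendingPathComponent(UUID().uuidString, isDirectory: true)
    try FileManager.default.createDirectory(at: directory,
                                            withIntermediateDirectories: true)

    var urls: [URL] = []
    for (index, data) in frames.enumerated() {
        let url = directory.appendingPathComponent("\(prefix)-\(index).jpg")
        try data.write(to: url)
        urls.append(url)   // keep the reference, not the image
    }
    return urls
}
```

Each image can then be loaded from its URL only when it is about to be displayed, ideally at a reduced size as described above.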
You are creating a new [UIImage] in every iteration of the loop, so after the last iteration the array holds only one image. You should move the imageArray creation out of the loop. Having said that, you should also take into account what @matt answered.
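The effect of where the array is declared can be seen in a small sketch that uses strings as stand-ins for the captured images. The function names are made up for illustration.

```swift
// Array declared inside the loop: it is re-created on every pass,
// so it never holds more than one element.
func collectInsideLoop() -> Int {
    var lastCount = 0
    for index in 0..<10 {
        var imageArray: [String] = []
        imageArray.append("image\(index)")
        lastCount = imageArray.count
    }
    return lastCount
}

// Array declared once, before the loop: elements accumulate.
func collectOutsideLoop() -> Int {
    var imageArray: [String] = []
    for index in 0..<10 {
        imageArray.append("image\(index)")
    }
    return imageArray.count
}
```

Here `collectInsideLoop()` returns 1 and `collectOutsideLoop()` returns 10.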
I am a beginner programmer and am creating a game using iOS sprite-kit. I have a simple animated GIF (30 frames) saved as a .gif file. Is there a simple way (few lines of code maybe similar to adding a regular .png through UIImage) of displaying this GIF in my game? I have done some research on displaying an animated GIF in Xcode and most involve importing extensive classes, most of which is stuff I don't think I need (I barely know enough to sift through it).
The way I think of it, GIFs are just like animating a sprite. So what I would do is add the GIF frames as textures for an SKSpriteNode in a for loop and then tell it to run on the device with SKAction.repeatActionForever().
To be honest I'm fairly new to this as well. I'm just trying to give my best answer. This is written in Swift, but I don't think it'll be too hard to translate to Objective-C.
var gifTextures: [SKTexture] = [];
for i in 1...30 {
    gifTextures.append(SKTexture(imageNamed: "gif\(i)"));
}
gifNode.runAction(SKAction.repeatActionForever(SKAction.animateWithTextures(gifTextures, timePerFrame: 0.125)));
Michael Choi's answer will get you half way there. The rest is getting the individual frames out of the gif file. Here's how I do it (in Swift):
func load(imagePath: String) -> ([SKTexture], TimeInterval?) {
    guard let imageSource = CGImageSourceCreateWithURL(URL(fileURLWithPath: imagePath) as CFURL, nil) else {
        return ([], nil)
    }
    let count = CGImageSourceGetCount(imageSource)
    var images: [CGImage] = []
    for i in 0..<count {
        guard let img = CGImageSourceCreateImageAtIndex(imageSource, i, nil) else { continue }
        images.append(img)
    }
    // Uses the first frame's delay for every frame
    let frameTime = count > 1 ? imageSource.delayFor(imageAt: 0) : nil
    return (images.map { SKTexture(cgImage: $0) }, frameTime)
}
extension CGImageSource { // this was originally from another SO post for which I've lost the link. Apologies.
    func delayFor(imageAt index: Int) -> TimeInterval {
        var delay = 0.1

        // Get dictionaries
        let cfProperties = CGImageSourceCopyPropertiesAtIndex(self, index, nil)
        // Room for one pointer, which CFDictionaryGetValueIfPresent writes into
        let gifPropertiesPointer = UnsafeMutablePointer<UnsafeRawPointer?>.allocate(capacity: 1)
        defer { gifPropertiesPointer.deallocate() }
        if CFDictionaryGetValueIfPresent(cfProperties, Unmanaged.passUnretained(kCGImagePropertyGIFDictionary).toOpaque(), gifPropertiesPointer) == false {
            return delay
        }
        let gifProperties: CFDictionary = unsafeBitCast(gifPropertiesPointer.pointee, to: CFDictionary.self)

        // Get delay time: prefer the unclamped value, fall back to the clamped one
        var delayObject: AnyObject = unsafeBitCast(
            CFDictionaryGetValue(gifProperties,
                Unmanaged.passUnretained(kCGImagePropertyGIFUnclampedDelayTime).toOpaque()),
            to: AnyObject.self)
        if delayObject.doubleValue == 0 {
            delayObject = unsafeBitCast(CFDictionaryGetValue(gifProperties,
                Unmanaged.passUnretained(kCGImagePropertyGIFDelayTime).toOpaque()), to: AnyObject.self)
        }
        delay = delayObject as? TimeInterval ?? 0.1

        if delay < 0.1 {
            delay = 0.1 // Make sure they're not too fast
        }
        return delay
    }
}
Note that I assume that each frame of the gif is the same length, which is not always the case.
You could also pretty easily construct an SKTextureAtlas with these images.
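If the frames do have different lengths, one option is to read a delay for every frame (for example by calling `delayFor(imageAt:)` for each index instead of only index 0) and schedule each frame individually. The sketch below covers only the bookkeeping, in plain Swift. `frameStartTimes` is a hypothetical helper, and turning the result into SpriteKit actions is left out.

```swift
import Foundation

// Hypothetical helper: given one delay per frame, returns the time
// (in seconds from the start of the animation) at which each frame begins.
func frameStartTimes(_ delays: [TimeInterval]) -> [TimeInterval] {
    var starts: [TimeInterval] = []
    var elapsed: TimeInterval = 0
    for delay in delays {
        starts.append(elapsed)
        elapsed += delay
    }
    return starts
}

// Three frames lasting 0.5 s, 0.25 s and 0.25 s.
let starts = frameStartTimes([0.5, 0.25, 0.25])   // [0.0, 0.5, 0.75]
```

With per-frame delays like these, a single `timePerFrame` no longer fits, so each texture change would need its own wait duration.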