I’m trying to create an app that can record video at 100 FPS using AVAssetWriter AND detect if a person is performing an action using the ActionClassifier from Create ML. But when I try to put the 2 together the FPS drops to 30 when recording and detecting actions.
If I do the recording by itself then it records at 100 FPS.
I am able to set the FPS of the camera to 100 FPS through the device configuration.
Capture output Function is setup
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
bufferImage = sampleBuffer
guard let calibrationData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) as? Data else {
return
}
cameraCalibrationMatrix = calibrationData.withUnsafeBytes { $0.pointee }
if self.isPredictorActivated == true {
do {
let poses = try predictor.processFrame(sampleBuffer)
if (predictor.isReadyToMakePrediction) {
let prediction = try predictor.makePrediction()
let confidence = prediction.confidence * 100
DispatchQueue.main.async {
self.predictionLabel.text = prediction.label + " " + String(confidence.rounded(toPlaces: 0))
if (prediction.label == "HandsUp" && prediction.confidence > 0.85) {
print("Challenging")
self.didChallengeVideo()
}
}
}
} catch {
print(error.localizedDescription)
}
}
let presentationTimeStamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
if assetWriter == nil {
createWriterInput(for: presentationTimeStamp)
} else {
let chunkDuration = CMTimeGetSeconds(CMTimeSubtract(presentationTimeStamp, chunkStartTime))
// print("Challenge\(isChallenging)")
if chunkDuration > 1500 || isChallenging {
assetWriter.endSession(atSourceTime: presentationTimeStamp)
// make a copy, as finishWriting is asynchronous
let newChunkURL = chunkOutputURL!
let chunkAssetWriter = assetWriter!
chunkAssetWriter.finishWriting {
print("finishWriting says: \(chunkAssetWriter.status.rawValue) \(String(describing: chunkAssetWriter.error))")
print("queuing \(newChunkURL)")
print("Chunk Duration: \(chunkDuration)")
let asset = AVAsset(url: newChunkURL)
print("FPS of CHUNK \(asset.tracks.first?.nominalFrameRate)")
if self.isChallenging {
self.challengeVideoProcess(video: asset)
}
self.isChallenging = false
}
createWriterInput(for: presentationTimeStamp)
}
}
if !assetWriterInput.append(sampleBuffer) {
print("append says NO: \(assetWriter.status.rawValue) \(String(describing: assetWriter.error))")
}
}
Performing action classification is quite expensive if you want to run it every frame so it may affect overall performance of the app (including video footage FPS). I don't know how often you need prediction but I would suggest you to try running Action Classifier 2-3 times per second maximum and see if that helps.
Running action classifier every frame won't change your classification that much because you're adding just one frame to your classifier action window so there is no need to run it so often.
For example if your action classifier was setup with window 3s and trained on 30fps videos, your classification is based on 3 * 30 = 90 frames. One frame won't make a difference.
Also make sure that your 100fps matches footage that you used for training action classifier. Otherwise you can get wrong predictions because running Action Classifier trained on 30fps video will treat 1s of 100fps footage as more than 3,333s.
Related
Im looking for a fast way to compare two frames of video, and decide if a lot has changed between them. This will be used to decide if I should send a request to image recognition service over REST, so I don't want to keep sending them, until there might be some different results. Something similar is doing Vuforia SDK. Im starting with a Framebuffer from ARKit, and I have it scaled to 640:480 and converted to RGB888 vBuffer_image. It could compare just few points, but it needs to find out if difference is significant nicely.
I started by calculating difference between few points using vDSP functions, but this has a disadvantage - if I move camera even very slightly to left/right, then the same points have different portions of image, and the calculated difference is high, even if nothing really changed much.
I was thinking about using histograms, but I didn't test this approach yet.
What would be the best solution for this? It needs to be fast, it can compare just smaller version of image, etc.
I have tested another approach using VNFeaturePointObservation from Vision. This works a lot better, but Im afraid it might be more CPU demanding. I need to test this on some older devices. Anyway, this is a part of code that works nicely. If someone could suggest some better approach to test, please let know:
private var lastScanningImageFingerprint: VNFeaturePrintObservation?
// Returns true if these are different enough
private func compareScanningImages(current: VNFeaturePrintObservation, last: VNFeaturePrintObservation?) -> Bool {
guard let last = last else { return true }
var distance = Float(0)
try! last.computeDistance(&distance, to: current)
print(distance)
return distance > 10
}
// After scanning is done, subclass should prepare suggestedTargets array.
private func performScanningIfNeeded(_ sender: Timer) {
guard !scanningInProgress else { return } // Wait for previous scanning to finish
guard let vImageBuffer = deletate?.currentFrameScalledImage else { return }
guard let image = CGImage.create(from: vImageBuffer) else { return }
func featureprintObservationForImage(image: CGImage) -> VNFeaturePrintObservation? {
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
let request = VNGenerateImageFeaturePrintRequest()
do {
try requestHandler.perform([request])
return request.results?.first as? VNFeaturePrintObservation
} catch {
print("Vision error: \(error)")
return nil
}
}
guard let imageFingerprint = featureprintObservationForImage(image: image) else { return }
guard compareScanningImages(current: imageFingerprint, last: lastScanningImageFingerprint) else { return }
print("SCANN \(Date())")
lastScanningImageFingerprint = featureprintObservationForImage(image: image)
executeScanning(on: image) { [weak self] in
self?.scanningInProgress = false
}
}
Tested on older iPhone - as expected this causes some frame drops on camera preview. So I need a faster algorithm
I have an app that uses samplers to play loops. I am in the process of converting my app from using AVAudioEngine to AudioKit. My app now works well except for this: Approximately every 1-3 minutes, my app receives two .AVAudioEngineConfigurationChange notifications in a row. There is no apparent pattern to its repetition and this happens on both my iPhone 6s and new iPad.
Here is my init code for my "conductor" singleton:
init() {
//sampler array
//sampler array is cycled through as user changes sounds
samplerArray = [sampler0, sampler1, sampler2, sampler3]
//start by loading samplers with default preset
for sampler in samplerArray {
//get the sampler preset
let presetPath = Bundle.main.path(forResource: currentSound, ofType: "aupreset")
let presetURL = NSURL.fileURL(withPath: presetPath!)
do {
try sampler.samplerUnit.loadPreset(at: presetURL)
print("rrob: loaded sample")
} catch {
print("rrob: failed to load sample")
}
}
//signal chain
samplerMixer = AKMixer(samplerArray)
filter = AKMoogLadder(samplerMixer)
reverb = AKCostelloReverb(filter)
reverbMixer = AKDryWetMixer(filter, reverb, balance: 0.3)
outputMixer = AKMixer(reverbMixer)
AudioKit.output = outputMixer
//AKSettings.enableRouteChangeHandling = false
AKSettings.playbackWhileMuted = true
do {
try AKSettings.setSession(category: AKSettings.SessionCategory.playback, with: AVAudioSessionCategoryOptions.mixWithOthers)
} catch {
print("rrob: failed to set audio session")
}
//AudioBus recommended buffer length
AKSettings.bufferLength = .medium
AudioKit.start()
print("rrob: did init autoEngine")
}
Any AudioKit experts have ideas for where I can start troubleshooting? Happy to provide more info. Thanks.
I am trying to save depth images from the iPhoneX TrueDepth camera. Using the AVCamPhotoFilter sample code, I am able to view the depth, converted to grayscale format, on the screen of the phone in real-time. I cannot figure out how to save the sequence of depth images in the raw (16 bits or more) format.
I have depthData which is an instance of AVDepthData. One of its members is depthDataMap which is an instance of CVPixelBuffer and image format type kCVPixelFormatType_DisparityFloat16. Is there a way to save it to the phone to transfer for offline manipulation?
There's no standard video format for "raw" depth/disparity maps, which might have something to do with AVCapture not really offering a way to record it.
You have a couple of options worth investigating here:
Convert depth maps to grayscale textures (which you can do using the code in the AVCamPhotoFilter sample code), then pass those textures to AVAssetWriter to produce a grayscale video. Depending on the video format and grayscale conversion method you choose, other software you write for reading the video might be able to recover depth/disparity info with sufficient precision for your purposes from the grayscale frames.
Anytime you have a CVPixelBuffer, you can get at the data yourself and do whatever you want with it. Use CVPixelBufferLockBaseAddress (with the readOnly flag) to make sure the content won't change while you read it, then copy data from the pointer CVPixelBufferGetBaseAddress provides to wherever you want. (Use other pixel buffer functions to see how many bytes to copy, and unlock the buffer when you're done.)
Watch out, though: if you spend too much time copying from buffers, or otherwise retain them, they won't get deallocated as new buffers come in from the capture system, and your capture session will hang. (All told, it's unclear without testing whether a device has the memory & I/O bandwidth for much recording this way.)
You can use Compression library to create a zip file with the raw CVPixelBuffer data.
Few problems with this solution.
It's a lot of data and zip is not a good compression. (the compressed file is 20 times bigger than 32bits per frame video with the same number of frames).
Apple's Compression library creates a file which standard zip program does't open. I use zlib in C code to read it and use inflateInit2(&strm, -15); to make it work.
You'll need to do some work to export the file out of your application
Here is my code (which I limited to 250 frames since it hold it in RAM but you can flush to disk if needed more frames):
// DepthCapture.swift
// AVCamPhotoFilter
//
// Created by Eyal Fink on 07/04/2018.
// Copyright © 2018 Resonai. All rights reserved.
//
// Capture the depth pixelBuffer into a compress file.
// This is very hacky and there are lots of TODOs but instead we need to replace
// it with a much better compression (video compression)....
import AVFoundation
import Foundation
import Compression
class DepthCapture {
let kErrorDomain = "DepthCapture"
let maxNumberOfFrame = 250
lazy var bufferSize = 640 * 480 * 2 * maxNumberOfFrame // maxNumberOfFrame frames
var dstBuffer: UnsafeMutablePointer<UInt8>?
var frameCount: Int64 = 0
var outputURL: URL?
var compresserPtr: UnsafeMutablePointer<compression_stream>?
var file: FileHandle?
// All operations handling the compresser oobjects are done on the
// porcessingQ so they will happen sequentially
var processingQ = DispatchQueue(label: "compression",
qos: .userInteractive)
func reset() {
frameCount = 0
outputURL = nil
if self.compresserPtr != nil {
//free(compresserPtr!.pointee.dst_ptr)
compression_stream_destroy(self.compresserPtr!)
self.compresserPtr = nil
}
if self.file != nil {
self.file!.closeFile()
self.file = nil
}
}
func prepareForRecording() {
reset()
// Create the output zip file, remove old one if exists
let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0] as NSString
self.outputURL = URL(fileURLWithPath: documentsPath.appendingPathComponent("Depth"))
FileManager.default.createFile(atPath: self.outputURL!.path, contents: nil, attributes: nil)
self.file = FileHandle(forUpdatingAtPath: self.outputURL!.path)
if self.file == nil {
NSLog("Cannot create file at: \(self.outputURL!.path)")
return
}
// Init the compression object
compresserPtr = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
compression_stream_init(compresserPtr!, COMPRESSION_STREAM_ENCODE, COMPRESSION_ZLIB)
dstBuffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
compresserPtr!.pointee.dst_ptr = dstBuffer!
//defer { free(bufferPtr) }
compresserPtr!.pointee.dst_size = bufferSize
}
func flush() {
//let data = Data(bytesNoCopy: compresserPtr!.pointee.dst_ptr, count: bufferSize, deallocator: .none)
let nBytes = bufferSize - compresserPtr!.pointee.dst_size
print("Writing \(nBytes)")
let data = Data(bytesNoCopy: dstBuffer!, count: nBytes, deallocator: .none)
self.file?.write(data)
}
func startRecording() throws {
processingQ.async {
self.prepareForRecording()
}
}
func addPixelBuffers(pixelBuffer: CVPixelBuffer) {
processingQ.async {
if self.frameCount >= self.maxNumberOfFrame {
// TODO now!! flush when needed!!!
print("MAXED OUT")
return
}
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let add : UnsafeMutableRawPointer = CVPixelBufferGetBaseAddress(pixelBuffer)!
self.compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(add.assumingMemoryBound(to: UInt8.self))
let height = CVPixelBufferGetHeight(pixelBuffer)
self.compresserPtr!.pointee.src_size = CVPixelBufferGetBytesPerRow(pixelBuffer) * height
let flags = Int32(0)
let compression_status = compression_stream_process(self.compresserPtr!, flags)
if compression_status != COMPRESSION_STATUS_OK {
NSLog("Buffer compression retured: \(compression_status)")
return
}
if self.compresserPtr!.pointee.src_size != 0 {
NSLog("Compression lib didn't eat all data: \(compression_status)")
return
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
// TODO(eyal): flush when needed!!!
self.frameCount += 1
print("handled \(self.frameCount) buffers")
}
}
func finishRecording(success: #escaping ((URL) -> Void)) throws {
processingQ.async {
let flags = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
self.compresserPtr!.pointee.src_size = 0
//compresserPtr!.pointee.src_ptr = UnsafePointer<UInt8>(0)
let compression_status = compression_stream_process(self.compresserPtr!, flags)
if compression_status != COMPRESSION_STATUS_END {
NSLog("ERROR: Finish failed. compression retured: \(compression_status)")
return
}
self.flush()
DispatchQueue.main.sync {
success(self.outputURL!)
}
self.reset()
}
}
}
I am working on an application that plays back video and allows the user to scrub forwards and backwards in the video. The scrubbing has to happen smoothly, so we always re-write the video with SDAVAssetExportSession with the video compression property AVVideoMaxKeyFrameIntervalKey:#1 so that each frame will be a keyframe and allow smooth reverse scrubbing. This works great and provides smooth playback. The application uses video from a variety of sources and can be recorded on android or iOS devices and even downloaded from the web and added to the application, so we end up with quite different encodings, some of which are already suited for scrubbing (each frame is a keyframe). Is there a way to detect the keyframe interval of a video file so I can avoid needless video processing? I have been through much of AVFoundation's docs and don't see an obvious way to get this information. Thanks for any help on this.
If you can quickly parse the file without decoding the images by creating an AVAssetReaderTrackOutput with nil outputSettings. The frame sample buffers you encounter have an attachment array containing a dictionary with useful information, include whether the frame depends on other frames, or whether other frames depend on it. I would interpret that former as indicating a keyframe, although it gives me some low number (4% keyframes in one file?). Anyway, the code:
let asset = AVAsset(url: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: nil)
reader.add(trackReaderOutput)
reader.startReading()
var numFrames = 0
var keyFrames = 0
while true {
if let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
// NB: not every sample buffer corresponds to a frame!
if CMSampleBufferGetNumSamples(sampleBuffer) > 0 {
numFrames += 1
if let attachmentArray = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false) as? NSArray {
let attachment = attachmentArray[0] as! NSDictionary
// print("attach on frame \(frame): \(attachment)")
if let depends = attachment[kCMSampleAttachmentKey_DependsOnOthers] as? NSNumber {
if !depends.boolValue {
keyFrames += 1
}
}
}
}
} else {
break
}
}
print("\(keyFrames) on \(numFrames)")
N.B. This only works for local file assets.
p.s. you don't say how you're scrubbing or playing. An AVPlayerViewController and an AVPlayer?
Here is the Objective C version of the same answer. After implementing this and using it, Videos that should have all keyframes are returning about 96% keyframes from this code. I'm not sure why, so I am using that number as a determining factor even though I would like it to be more accurate. I am also only looking through the first 600 frames or the end of the video (whichever comes first) since I don't need to read through a whole 20 minute video to make this determination.
+ (BOOL)videoNeedsProcessingForSlomo:(NSURL*)fileUrl {
BOOL needsProcessing = YES;
AVAsset* anAsset = [AVAsset assetWithURL:fileUrl];
NSError *error;
AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:anAsset error:&error];
if (error) {
DLog(#"Error:%#", error.localizedDescription);
return YES;
}
AVAssetTrack *videoTrack = [[anAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];
AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack outputSettings:nil];
[assetReader addOutput:trackOutput];
[assetReader startReading];
float numFrames = 0;
float keyFrames = 0;
while (numFrames < 600) { // If the video is long - only parse through 20 seconds worth.
CMSampleBufferRef sampleBuffer = [trackOutput copyNextSampleBuffer];
if (sampleBuffer) {
// NB: not every sample buffer corresponds to a frame!
if (CMSampleBufferGetNumSamples(sampleBuffer) > 0) {
numFrames += 1;
NSArray *attachmentArray = ((NSArray*)CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false));
if (attachmentArray) {
NSDictionary *attachment = attachmentArray[0];
NSNumber *depends = attachment[(__bridge NSNumber*)kCMSampleAttachmentKey_DependsOnOthers];
if (depends) {
if (depends.boolValue) {
keyFrames += 1;
}
}
}
}
}
else {
break;
}
}
needsProcessing = keyFrames / numFrames < 0.95f; // If more than 95% of the frames are keyframes - don't decompress.
return needsProcessing;
}
Using kCMSampleAttachmentKey_DependsOnOthers was giving me 0 key frames in some cases, when ffprobe would return key frames.
To get the same number of key frames as ffprobe shows, I used:
if attachment[CMSampleBuffer.PerSampleAttachmentsDictionary.Key.notSync] == nil {
keyFrames += 1
}
In the CoreMedia header it says:
/// Boolean (absence of this key implies Sync)
public static let notSync: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
for dependsOnOthers key it says:
/// `true` (e.g., non-I-frame), `false` (e.g. I-frame), or absent if
/// unknown
public static let dependsOnOthers: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
I’m looking for a way to maintain a seamless audio track while flipping between front and back camera. Many apps in the market can do this, one example is SnapChat…
Solutions should use AVCaptureSession and AVAssetWriter. Also it should explicitly not use AVMutableComposition since there is a bug between AVMutableComposition and AVCaptureSession ATM. Also, I can't afford post processing time.
Currently when I change the video input the audio recording skips and becomes out of sync.
I’m including the code that could be relevant.
Flip Camera
-(void) updateCameraDirection:(CamDirection)vCameraDirection {
if(session) {
AVCaptureDeviceInput* currentInput;
AVCaptureDeviceInput* newInput;
BOOL videoMirrored = NO;
switch (vCameraDirection) {
case CamDirection_Front:
currentInput = input_Back;
newInput = input_Front;
videoMirrored = NO;
break;
case CamDirection_Back:
currentInput = input_Front;
newInput = input_Back;
videoMirrored = YES;
break;
default:
break;
}
[session beginConfiguration];
//disconnect old input
[session removeInput:currentInput];
//connect new input
[session addInput:newInput];
//get new data connection and config
dataOutputVideoConnection = [dataOutputVideo connectionWithMediaType:AVMediaTypeVideo];
dataOutputVideoConnection.videoOrientation = AVCaptureVideoOrientationPortrait;
dataOutputVideoConnection.videoMirrored = videoMirrored;
//finish
[session commitConfiguration];
}
}
Sample Buffer
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
//not active
if(!recordingVideo)
return;
//start session if not started
if(!startedSession) {
startedSession = YES;
[assetWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
}
//Process sample buffers
if (connection == dataOutputAudioConnection) {
if([assetWriterInputAudio isReadyForMoreMediaData]) {
BOOL success = [assetWriterInputAudio appendSampleBuffer:sampleBuffer];
//…
}
} else if (connection == dataOutputVideoConnection) {
if([assetWriterInputVideo isReadyForMoreMediaData]) {
BOOL success = [assetWriterInputVideo appendSampleBuffer:sampleBuffer];
//…
}
}
}
Perhaps adjust audio sample timeStamp?
Hey I was facing the same issue and discovered that after switching cameras the next frame was pushed far out of place. This seemed to shift every frame after that thus causing the the video and audio to be out of sync. My solution was to shift every misplaced frame to it's correct position after switching cameras.
Sorry my answer will be in Swift 4.2
You'll have to use AVAssetWriterInputPixelBufferAdaptor in order to append the sample buffers at a specify presentation timestamp.
previousPresentationTimeStamp is the presentation timestamp of the previous frame and currentPresentationTimestamp is as you guessed the presentation timestamp of the current. maxFrameDistance worked every well when testing but you can change this to your liking.
let currentFramePosition = (Double(self.frameRate) * Double(currentPresentationTimestamp.value)) / Double(currentPresentationTimestamp.timescale)
let previousFramePosition = (Double(self.frameRate) * Double(previousPresentationTimeStamp.value)) / Double(previousPresentationTimeStamp.timescale)
var presentationTimeStamp = currentPresentationTimestamp
let maxFrameDistance = 1.1
let frameDistance = currentFramePosition - previousFramePosition
if frameDistance > maxFrameDistance {
let expectedFramePosition = previousFramePosition + 1.0
//print("[mwCamera]: Frame at incorrect position moving from \(currentFramePosition) to \(expectedFramePosition)")
let newFramePosition = ((expectedFramePosition) * Double(currentPresentationTimestamp.timescale)) / Double(self.frameRate)
let newPresentationTimeStamp = CMTime.init(value: CMTimeValue(newFramePosition), timescale: currentPresentationTimestamp.timescale)
presentationTimeStamp = newPresentationTimeStamp
}
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
fatalError(error.localizedDescription)
}
Also please note - This worked because I kept the frame rate consistent, so make sure that you have total control of the capture device's frame rate throughout this process.
I have a repo using this logic here
I did manage to find an intermediate solution for the sync problem I found on the Woody Jean-louis solution using is repo.
The results are similar to what instagram does but it seems to work a little bit better. Basically what I do is to prevent the assetWriterAudioInput to append new samples when switching cameras. There is no way to know exactly when this happens so I figured out that before and after the switch the captureOutput method was sending video samples every 0.02 seconds +- (max 0.04 seconds).
Knowing this I created a self.lastVideoSampleDate that is updated every time a video sample is appended to assetWriterInputPixelBufferAdator and I only allow the audio sample to be appended to assetWriterAudioInput is that date is lower than 0.05.
if let assetWriterAudioInput = self.assetWriterAudioInput,
output == self.audioOutput, assetWriterAudioInput.isReadyForMoreMediaData {
let since = Date().timeIntervalSince(self.lastVideoSampleDate)
if since < 0.05 {
let success = assetWriterAudioInput.append(sampleBuffer)
if !success, let error = assetWriter.error {
print(error)
fatalError(error.localizedDescription)
}
}
}
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
print(error)
fatalError(error.localizedDescription)
}
self.lastVideoSampleDate = Date()
The most 'stable way' to fix this problem - is to 'pause' recording when switching sources.
But also you can 'fill the gap' with blank video and silent audio frames.
This is what I have implemented in my project.
So, create boolean to block ability to append new CMSampleBuffer's while switching cameras/microphones and reset it after some delay:
let idleTime = 1.0
self.recordingPaused = true
DispatchQueue.main.asyncAfter(deadline: .now() + idleTime) {
self.recordingPaused = false
}
writeAllIdleFrames()
In writeAllIdleFrames method you need to calculate how many frames you need to write:
func writeAllIdleFrames() {
let framesPerSecond = 1.0 / self.videoConfig.fps
let samplesPerSecond = 1024 / self.audioConfig.sampleRate
let videoFramesCount = Int(ceil(self.switchInputDelay / framesPerSecond))
let audioFramesCount = Int(ceil(self.switchInputDelay / samplesPerSecond))
for index in 0..<max(videoFramesCount, audioFramesCount) {
// creation synthetic buffers
recordingQueue.async {
if index < videoFramesCount {
let pts = self.nextVideoPTS()
self.writeBlankVideo(pts: pts)
}
if index < audioFramesCount {
let pts = self.nextAudioPTS()
self.writeSilentAudio(pts: pts)
}
}
}
}
How to calculate next PTS?
func nextVideoPTS() -> CMTime {
guard var pts = self.lastVideoRawPTS else { return CMTime.invalid }
let framesPerSecond = 1.0 / self.videoConfig.fps
let delta = CMTime(value: Int64(framesPerSecond * Double(pts.timescale)),
timescale: pts.timescale, flags: pts.flags, epoch: pts.epoch)
pts = CMTimeAdd(pts, delta)
return pts
}
Tell me, if you also need code that creates blank/silent video/audio buffers :)