I’m looking for a way to maintain a seamless audio track while flipping between the front and back cameras. Many apps on the market can do this; Snapchat is one example.
Solutions should use AVCaptureSession and AVAssetWriter. They should explicitly not use AVMutableComposition, since there is currently a bug between AVMutableComposition and AVCaptureSession. Also, I can't afford post-processing time.
Currently when I change the video input the audio recording skips and becomes out of sync.
I’m including the code that could be relevant.
Flip Camera
-(void) updateCameraDirection:(CamDirection)vCameraDirection {
    if(session) {
        AVCaptureDeviceInput* currentInput;
        AVCaptureDeviceInput* newInput;
        BOOL videoMirrored = NO;
        switch (vCameraDirection) {
            case CamDirection_Front:
                currentInput = input_Back;
                newInput = input_Front;
                videoMirrored = NO;
                break; // without this the Front case falls through into Back
            case CamDirection_Back:
                currentInput = input_Front;
                newInput = input_Back;
                videoMirrored = YES;
                break;
            default:
                break;
        }
        [session beginConfiguration];
        //disconnect old input
        [session removeInput:currentInput];
        //connect new input
        [session addInput:newInput];
        //get new data connection and config
        dataOutputVideoConnection = [dataOutputVideo connectionWithMediaType:AVMediaTypeVideo];
        dataOutputVideoConnection.videoOrientation = AVCaptureVideoOrientationPortrait;
        dataOutputVideoConnection.videoMirrored = videoMirrored;
        [session commitConfiguration];
    }
}
Sample Buffer
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    //not active
    //start session if not started
    if(!startedSession) {
        startedSession = YES;
        [assetWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
    }
    //Process sample buffers
    if (connection == dataOutputAudioConnection) {
        if([assetWriterInputAudio isReadyForMoreMediaData]) {
            BOOL success = [assetWriterInputAudio appendSampleBuffer:sampleBuffer];
        }
    } else if (connection == dataOutputVideoConnection) {
        if([assetWriterInputVideo isReadyForMoreMediaData]) {
            BOOL success = [assetWriterInputVideo appendSampleBuffer:sampleBuffer];
        }
    }
}
Perhaps adjust audio sample timeStamp?
Hey, I was facing the same issue and discovered that after switching cameras the next frame was pushed far out of place. This seemed to shift every frame after it, causing the video and audio to go out of sync. My solution was to shift every misplaced frame to its correct position after switching cameras.
Sorry my answer will be in Swift 4.2
You'll have to use AVAssetWriterInputPixelBufferAdaptor in order to append the sample buffers at a specific presentation timestamp.
previousPresentationTimeStamp is the presentation timestamp of the previous frame and currentPresentationTimestamp is, as you guessed, the presentation timestamp of the current one. A maxFrameDistance of 1.1 worked very well in testing, but you can change this to your liking.
let currentFramePosition = (Double(self.frameRate) * Double(currentPresentationTimestamp.value)) / Double(currentPresentationTimestamp.timescale)
let previousFramePosition = (Double(self.frameRate) * Double(previousPresentationTimeStamp.value)) / Double(previousPresentationTimeStamp.timescale)
var presentationTimeStamp = currentPresentationTimestamp
let maxFrameDistance = 1.1
let frameDistance = currentFramePosition - previousFramePosition
if frameDistance > maxFrameDistance {
    let expectedFramePosition = previousFramePosition + 1.0
    //print("[mwCamera]: Frame at incorrect position moving from \(currentFramePosition) to \(expectedFramePosition)")
    let newFramePosition = ((expectedFramePosition) * Double(currentPresentationTimestamp.timescale)) / Double(self.frameRate)
    let newPresentationTimeStamp = CMTime.init(value: CMTimeValue(newFramePosition), timescale: currentPresentationTimestamp.timescale)
    presentationTimeStamp = newPresentationTimeStamp
}
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
    // handle the error
}
Also please note - This worked because I kept the frame rate consistent, so make sure that you have total control of the capture device's frame rate throughout this process.
I have a repo using this logic here
I did manage to find an intermediate solution for the sync problem I ran into with Woody Jean-louis's solution, using his repo.
The results are similar to what Instagram does, but it seems to work a little bit better. Basically, I prevent assetWriterAudioInput from appending new samples while the cameras are switching. There is no way to know exactly when the switch happens, but I noticed that before and after the switch the captureOutput method was delivering video samples roughly every 0.02 seconds (0.04 seconds at most).
Knowing this, I created a self.lastVideoSampleDate that is updated every time a video sample is appended to assetWriterInputPixelBufferAdator, and I only allow an audio sample to be appended to assetWriterAudioInput if less than 0.05 seconds have passed since that date.
if let assetWriterAudioInput = self.assetWriterAudioInput,
    output == self.audioOutput, assetWriterAudioInput.isReadyForMoreMediaData {
    let since = Date().timeIntervalSince(self.lastVideoSampleDate)
    if since < 0.05 {
        let success = assetWriterAudioInput.append(sampleBuffer)
        if !success, let error = assetWriter.error {
            // handle the error
        }
    }
}
let success = assetWriterInputPixelBufferAdator.append(pixelBuffer, withPresentationTime: presentationTimeStamp)
if !success, let error = assetWriter.error {
    // handle the error
}
self.lastVideoSampleDate = Date()
The most 'stable' way to fix this problem is to 'pause' recording when switching sources.
You can also 'fill the gap' with blank video and silent audio frames.
This is what I have implemented in my project.
So, create a boolean that blocks appending new CMSampleBuffers while switching cameras/microphones, and reset it after some delay:
let idleTime = 1.0
self.recordingPaused = true
DispatchQueue.main.asyncAfter(deadline: .now() + idleTime) {
    self.recordingPaused = false
}
In the writeAllIdleFrames method you need to calculate how many frames you need to write:
func writeAllIdleFrames() {
    // duration of one video frame, in seconds
    let secondsPerFrame = 1.0 / self.videoConfig.fps
    // duration of one 1024-sample audio buffer, in seconds
    let secondsPerAudioBuffer = 1024 / self.audioConfig.sampleRate
    let videoFramesCount = Int(ceil(self.switchInputDelay / secondsPerFrame))
    let audioFramesCount = Int(ceil(self.switchInputDelay / secondsPerAudioBuffer))
    for index in 0..<max(videoFramesCount, audioFramesCount) {
        // create synthetic buffers
        recordingQueue.async {
            if index < videoFramesCount {
                let pts = self.nextVideoPTS()
                self.writeBlankVideo(pts: pts)
            }
            if index < audioFramesCount {
                let pts = self.nextAudioPTS()
                self.writeSilentAudio(pts: pts)
            }
        }
    }
}
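To make the counting in writeAllIdleFrames concrete, here is a minimal standalone sketch of the same arithmetic. The function name idleBufferCounts and its parameters are my own, not from the answer, and it multiplies by the rate instead of dividing by the per-buffer duration (algebraically the same, but it avoids a floating-point rounding surprise before ceil).

```swift
import Foundation

/// Hypothetical helper (not from the answer): how many blank video frames and
/// silent audio buffers are needed to cover `delay` seconds.
func idleBufferCounts(delay: Double,
                      fps: Double,
                      sampleRate: Double,
                      samplesPerBuffer: Double = 1024) -> (video: Int, audio: Int) {
    // delay / (1 / fps) == delay * fps
    let video = Int((delay * fps).rounded(.up))
    // delay / (samplesPerBuffer / sampleRate) == delay * sampleRate / samplesPerBuffer
    let audio = Int((delay * sampleRate / samplesPerBuffer).rounded(.up))
    return (video, audio)
}

let counts = idleBufferCounts(delay: 1.0, fps: 30, sampleRate: 44_100)
print(counts.video, counts.audio) // prints "30 44"
```

So a one-second switch at 30 fps and 44.1 kHz needs 30 blank frames but 44 silent audio buffers, which is why the loop runs to the larger of the two counts.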
How to calculate next PTS?
func nextVideoPTS() -> CMTime {
    guard var pts = self.lastVideoRawPTS else { return CMTime.invalid }
    // duration of one video frame, in seconds
    let secondsPerFrame = 1.0 / self.videoConfig.fps
    let delta = CMTime(value: Int64(secondsPerFrame * Double(pts.timescale)),
                       timescale: pts.timescale, flags: pts.flags, epoch: pts.epoch)
    pts = CMTimeAdd(pts, delta)
    return pts
}
Tell me, if you also need code that creates blank/silent video/audio buffers :)
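For anyone who wants a starting point for the blank video side, below is a rough sketch of creating a zero-filled pixel buffer with CoreVideo. This is not the answerer's code: makeBlankPixelBuffer is a name I made up, the 32BGRA format is an assumption, and the silent audio buffer is not covered here.

```swift
import CoreVideo
import Foundation

/// Hypothetical helper (not from the answer): a BGRA pixel buffer whose bytes
/// are all zero, which encodes as a black frame.
func makeBlankPixelBuffer(width: Int, height: Int) -> CVPixelBuffer? {
    var buffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     width,
                                     height,
                                     kCVPixelFormatType_32BGRA, // assumed format
                                     nil,
                                     &buffer)
    guard status == kCVReturnSuccess, let pixelBuffer = buffer else { return nil }

    // The base address is only valid while the buffer is locked.
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    if let base = CVPixelBufferGetBaseAddress(pixelBuffer) {
        // Rows may be padded, so use bytes-per-row rather than width * 4.
        memset(base, 0, CVPixelBufferGetBytesPerRow(pixelBuffer) * height)
    }
    return pixelBuffer
}
```

A buffer like this could then be appended through the pixel buffer adaptor with the PTS from nextVideoPTS(); in a real recorder you would probably match the size and pixel format of the camera frames.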
I’m trying to create an app that can record video at 100 FPS using AVAssetWriter AND detect if a person is performing an action using the ActionClassifier from Create ML. But when I try to put the 2 together the FPS drops to 30 when recording and detecting actions.
If I do the recording by itself then it records at 100 FPS.
I am able to set the FPS of the camera to 100 FPS through the device configuration.
The capture output function is set up like this:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
bufferImage = sampleBuffer
guard let calibrationData = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, attachmentModeOut: nil) as? Data else {
cameraCalibrationMatrix = calibrationData.withUnsafeBytes { $0.pointee }
if self.isPredictorActivated == true {
do {
let poses = try predictor.processFrame(sampleBuffer)
if (predictor.isReadyToMakePrediction) {
let prediction = try predictor.makePrediction()
let confidence = prediction.confidence * 100
DispatchQueue.main.async {
self.predictionLabel.text = prediction.label + " " + String(confidence.rounded(toPlaces: 0))
if (prediction.label == "HandsUp" && prediction.confidence > 0.85) {
} catch {
let presentationTimeStamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
if assetWriter == nil {
createWriterInput(for: presentationTimeStamp)
} else {
let chunkDuration = CMTimeGetSeconds(CMTimeSubtract(presentationTimeStamp, chunkStartTime))
// print("Challenge\(isChallenging)")
if chunkDuration > 1500 || isChallenging {
assetWriter.endSession(atSourceTime: presentationTimeStamp)
// make a copy, as finishWriting is asynchronous
let newChunkURL = chunkOutputURL!
let chunkAssetWriter = assetWriter!
chunkAssetWriter.finishWriting {
print("finishWriting says: \(chunkAssetWriter.status.rawValue) \(String(describing: chunkAssetWriter.error))")
print("queuing \(newChunkURL)")
print("Chunk Duration: \(chunkDuration)")
let asset = AVAsset(url: newChunkURL)
print("FPS of CHUNK \(asset.tracks.first?.nominalFrameRate)")
if self.isChallenging {
self.challengeVideoProcess(video: asset)
self.isChallenging = false
createWriterInput(for: presentationTimeStamp)
if !assetWriterInput.append(sampleBuffer) {
print("append says NO: \(assetWriter.status.rawValue) \(String(describing: assetWriter.error))")
Performing action classification on every frame is quite expensive, so it may affect the overall performance of the app (including the FPS of the recorded footage). I don't know how often you need a prediction, but I would suggest running the Action Classifier 2-3 times per second at most and seeing if that helps.
Running action classifier every frame won't change your classification that much because you're adding just one frame to your classifier action window so there is no need to run it so often.
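As a rough sketch of that throttling idea: the PredictionThrottle type below is hypothetical (it is not part of Create ML or of the question's code) and simply gates an expensive step by media time.

```swift
import Foundation

/// Hypothetical helper: allows an expensive step to run at most
/// `maxPerSecond` times per second of media time.
struct PredictionThrottle {
    private let minInterval: Double
    private var lastRun: Double?

    init(maxPerSecond: Double) {
        precondition(maxPerSecond > 0, "maxPerSecond must be positive")
        minInterval = 1.0 / maxPerSecond
    }

    /// Returns true when at least `minInterval` seconds have passed since the
    /// last accepted timestamp (the first call is always accepted).
    mutating func shouldRun(at timestamp: Double) -> Bool {
        if let last = lastRun, timestamp - last < minInterval {
            return false
        }
        lastRun = timestamp
        return true
    }
}

// Simulate one second of 100 fps capture with a limit of 2 predictions per second.
var throttle = PredictionThrottle(maxPerSecond: 2)
var predictions = 0
for frame in 0..<100 {
    if throttle.shouldRun(at: Double(frame) / 100.0) {
        predictions += 1
    }
}
print(predictions) // prints "2"
```

In captureOutput you would pass the sample buffer's presentation time in seconds and only call makePrediction() when shouldRun returns true. Whether processFrame also needs to be throttled depends on how your predictor builds its window, so treat that part as something to measure.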
For example if your action classifier was setup with window 3s and trained on 30fps videos, your classification is based on 3 * 30 = 90 frames. One frame won't make a difference.
Also make sure that your 100fps capture matches the footage you used for training the action classifier. Otherwise you can get wrong predictions, because an Action Classifier trained on 30fps video will treat 1s of 100fps footage as roughly 3.33s (100 / 30).
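The frame-rate mismatch is easy to check with a couple of lines of arithmetic. The function names here are mine and only illustrate the ratio; the 90-frame window is the example from this answer.

```swift
import Foundation

/// How long `captureSeconds` of footage appears to a model that assumes `trainingFPS`.
func apparentSeconds(captureSeconds: Double, captureFPS: Double, trainingFPS: Double) -> Double {
    return captureSeconds * captureFPS / trainingFPS
}

/// Real time covered by a window of `windowFrames` frames at `captureFPS`.
func windowSeconds(windowFrames: Int, captureFPS: Double) -> Double {
    return Double(windowFrames) / captureFPS
}

// 1 s of 100 fps footage looks like about 3.33 s to a 30 fps model.
print(apparentSeconds(captureSeconds: 1, captureFPS: 100, trainingFPS: 30))
// A 90-frame window (3 s at 30 fps) covers only 0.9 s at 100 fps.
print(windowSeconds(windowFrames: 90, captureFPS: 100)) // prints "0.9"
```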
I am trying to record 60 fps video by modifying AVCAM application which can be found at:
By default it records at 2-30 fps on my phone (iPhone X), so I tried to change the format it uses to capture video.
do {
// Choose the back dual camera if available, otherwise default to a wide angle camera.
if let dualCameraDevice = AVCaptureDevice.default(.builtInWideAngleCamera, for: .video, position: .front) {
defaultVideoDevice = dualCameraDevice
if let formats = defaultVideoDevice?.formats {
for format in formats{
let formatDesc = format.formatDescription
let frameRate = format.videoSupportedFrameRateRanges.first
if let frameRate = frameRate, frameRate.maxFrameRate == 60.0 {
try defaultVideoDevice?.lockForConfiguration()
print(frameRate.maxFrameRate) //here prints 60.0
defaultVideoDevice?.activeVideoMaxFrameDuration = CMTimeMake(1,60)
defaultVideoDevice?.activeVideoMinFrameDuration = CMTimeMake(1,60)
Here at the line of 'defaultVideoDevice?.activeVideoMaxFrameDuration = CMTimeMake(1,60)' I am getting this error :
2019-11-21 09:23:50.225376+0300 AVCam[1250:667986] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[AVCaptureDevice setActiveVideoMaxFrameDuration:] Unsupported frame duration - use -activeFormat.videoSupportedFrameRateRanges to discover valid ranges'
Thanks in advance.
You should set a valid format to AVCaptureDevice.
You can do this
// get the device you want
let device = xxxxx
// list all formats for this device
for format in device.formats {
    var found = false
    // check what you want and pick a suitable format.
    let formatDesc = format.formatDescription
    // mediaType / subType
    let mediaType = format.mediaType
    // if your target is iOS 13 or later, use formatDesc.mediaSubType
    let mediaSubType = CMFormatDescriptionGetMediaSubType(formatDesc)
    // dimensions
    // if your target is iOS 13 or later, use formatDesc.dimensions
    let dimensions = CMVideoFormatDescriptionGetDimensions(formatDesc)
    // fps: check every supported range, not just the first one
    let ranges = format.videoSupportedFrameRateRanges
    for supportedFPSRange in ranges {
        if supportedFPSRange.maxFrameRate == 60 {
            found = true
        }
    }
    // multi-cam support
    let isMultiCamSupported = format.isMultiCamSupported
    if found {
        // Set activeFormat for the device! Your capture device is up and loaded.
        do {
            try device.lockForConfiguration()
            device.activeFormat = format
            device.unlockForConfiguration()
        } catch {
            // catch some locking error
        }
        // Don't forget to break the loop. :p
        break
    }
}
Or you can use filter
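A sketch of what the filter-style version might look like. The helper names firstFormat(on:supporting:) and configure(_:fps:) are my own; the AVCaptureDevice properties they use (formats, videoSupportedFrameRateRanges, activeFormat, activeVideoMinFrameDuration, activeVideoMaxFrameDuration) are the same ones discussed above.

```swift
import AVFoundation

/// Hypothetical helper: the first format with a frame rate range that contains `fps`.
func firstFormat(on device: AVCaptureDevice, supporting fps: Double) -> AVCaptureDevice.Format? {
    return device.formats.first { format in
        format.videoSupportedFrameRateRanges.contains { range in
            range.minFrameRate <= fps && fps <= range.maxFrameRate
        }
    }
}

/// Hypothetical helper: selects a matching format first, then pins the frame rate.
/// Setting the frame durations before a compatible activeFormat is what raises
/// the "Unsupported frame duration" exception from the question.
func configure(_ device: AVCaptureDevice, fps: Int32) throws {
    guard let format = firstFormat(on: device, supporting: Double(fps)) else { return }
    try device.lockForConfiguration()
    defer { device.unlockForConfiguration() }
    device.activeFormat = format
    let frameDuration = CMTime(value: 1, timescale: fps)
    device.activeVideoMinFrameDuration = frameDuration
    device.activeVideoMaxFrameDuration = frameDuration
}
```

This picks the first matching format only; you would probably also want to filter on dimensions or pixel format, as in the loop version.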
I am working on an application that plays back video and allows the user to scrub forwards and backwards in the video. The scrubbing has to happen smoothly, so we always re-write the video with SDAVAssetExportSession with the video compression property AVVideoMaxKeyFrameIntervalKey:@1 so that each frame will be a keyframe and allow smooth reverse scrubbing. This works great and provides smooth playback. The application uses video from a variety of sources and can be recorded on Android or iOS devices and even downloaded from the web and added to the application, so we end up with quite different encodings, some of which are already suited for scrubbing (each frame is a keyframe). Is there a way to detect the keyframe interval of a video file so I can avoid needless video processing? I have been through much of AVFoundation's docs and don't see an obvious way to get this information. Thanks for any help on this.
You can quickly parse the file without decoding the images by creating an AVAssetReaderTrackOutput with nil outputSettings. The frame sample buffers you encounter have an attachment array containing a dictionary with useful information, including whether the frame depends on other frames, or whether other frames depend on it. I would interpret the former as indicating a keyframe, although it gives me some low number (4% keyframes in one file?). Anyway, the code:
let asset = AVAsset(url: inputUrl)
let reader = try! AVAssetReader(asset: asset)
let videoTrack = asset.tracks(withMediaType: AVMediaTypeVideo)[0]
let trackReaderOutput = AVAssetReaderTrackOutput(track: videoTrack, outputSettings: nil)
// the output must be attached and reading started before copying sample buffers
reader.add(trackReaderOutput)
reader.startReading()
var numFrames = 0
var keyFrames = 0
while true {
    if let sampleBuffer = trackReaderOutput.copyNextSampleBuffer() {
        // NB: not every sample buffer corresponds to a frame!
        if CMSampleBufferGetNumSamples(sampleBuffer) > 0 {
            numFrames += 1
            if let attachmentArray = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false) as? NSArray {
                let attachment = attachmentArray[0] as! NSDictionary
                // print("attach on frame \(frame): \(attachment)")
                if let depends = attachment[kCMSampleAttachmentKey_DependsOnOthers] as? NSNumber {
                    if !depends.boolValue {
                        keyFrames += 1
                    }
                }
            }
        }
    } else {
        // no more sample buffers
        break
    }
}
print("\(keyFrames) on \(numFrames)")
N.B. This only works for local file assets.
p.s. you don't say how you're scrubbing or playing. An AVPlayerViewController and an AVPlayer?
Here is the Objective-C version of the same answer. After implementing this and using it, videos that should have all keyframes are returning about 96% keyframes from this code. I'm not sure why, so I am using that number as a determining factor even though I would like it to be more accurate. I am also only looking through the first 600 frames or to the end of the video (whichever comes first), since I don't need to read through a whole 20-minute video to make this determination.
+ (BOOL)videoNeedsProcessingForSlomo:(NSURL*)fileUrl {
    BOOL needsProcessing = YES;
    AVAsset* anAsset = [AVAsset assetWithURL:fileUrl];
    NSError *error;
    AVAssetReader *assetReader = [AVAssetReader assetReaderWithAsset:anAsset error:&error];
    if (error) {
        DLog(@"Error:%@", error.localizedDescription);
        return YES;
    }
    AVAssetTrack *videoTrack = [[anAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0];
    AVAssetReaderTrackOutput *trackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:videoTrack outputSettings:nil];
    [assetReader addOutput:trackOutput];
    [assetReader startReading];
    float numFrames = 0;
    float keyFrames = 0;
    while (numFrames < 600) { // If the video is long - only parse through 20 seconds worth.
        CMSampleBufferRef sampleBuffer = [trackOutput copyNextSampleBuffer];
        if (sampleBuffer) {
            // NB: not every sample buffer corresponds to a frame!
            if (CMSampleBufferGetNumSamples(sampleBuffer) > 0) {
                numFrames += 1;
                NSArray *attachmentArray = ((__bridge NSArray*)CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, false));
                if (attachmentArray) {
                    NSDictionary *attachment = attachmentArray[0];
                    NSNumber *depends = attachment[(__bridge NSString*)kCMSampleAttachmentKey_DependsOnOthers];
                    if (depends) {
                        // DependsOnOthers == NO marks an I-frame, as in the Swift version above
                        if (!depends.boolValue) {
                            keyFrames += 1;
                        }
                    }
                }
            }
            // copyNextSampleBuffer returns an owned reference
            CFRelease(sampleBuffer);
        }
        else {
            break;
        }
    }
    needsProcessing = keyFrames / numFrames < 0.95f; // If more than 95% of the frames are keyframes - don't decompress.
    return needsProcessing;
}
Using kCMSampleAttachmentKey_DependsOnOthers was giving me 0 key frames in some cases, when ffprobe would return key frames.
To get the same number of key frames as ffprobe shows, I used:
if attachment[CMSampleBuffer.PerSampleAttachmentsDictionary.Key.notSync] == nil {
    keyFrames += 1
}
In the CoreMedia header it says:
/// Boolean (absence of this key implies Sync)
public static let notSync: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
for dependsOnOthers key it says:
/// `true` (e.g., non-I-frame), `false` (e.g. I-frame), or absent if
/// unknown
public static let dependsOnOthers: CMSampleBuffer.PerSampleAttachmentsDictionary.Key
I'm new to AVCaptureSession and wish to better understand how to work with it.
So far I have managed to capture the video stream as separate CIImages and convert them to UIImages.
Now I wish to be able to get the number of Frames Per Second captured and preferably to be able to set it.
Any idea how to do that?
AVCaptureConnection's videoMinFrameDuration is deprecated.
You can use AVCaptureDevice properties to detect the supported video frame rate ranges and to assign the minimum and maximum frame rates.
device.activeFormat.videoSupportedFrameRateRanges returns all the video frame rate ranges supported by the device's active format.
device.activeVideoMinFrameDuration and device.activeVideoMaxFrameDuration can be used to specify frame durations.
You could use AVCaptureConnection's videoMinFrameDuration accessor to set the value.
See the AVCaptureConnection documentation
Consider output be AVCaptureVideoDataOutput object.
AVCaptureConnection *conn = [output connectionWithMediaType:AVMediaTypeVideo];
if (conn.isVideoMinFrameDurationSupported)
conn.videoMinFrameDuration = CMTimeMake(1, CAPTURE_FRAMES_PER_SECOND);
if (conn.isVideoMaxFrameDurationSupported)
conn.videoMaxFrameDuration = CMTimeMake(1, CAPTURE_FRAMES_PER_SECOND);
More info, see my answer in this SO question
To set the capture session frame rate, you have to set it on the device using device.activeVideoMinFrameDuration and device.activeVideoMaxFrameDuration (if necessary).
In Swift 4 you can do something like this:
extension AVCaptureDevice {
    func set(frameRate: Double) {
        guard let range = activeFormat.videoSupportedFrameRateRanges.first,
            range.minFrameRate...range.maxFrameRate ~= frameRate
            else {
                print("Requested FPS is not supported by the device's activeFormat !")
                return
        }
        do { try lockForConfiguration()
            activeVideoMinFrameDuration = CMTimeMake(value: 1, timescale: Int32(frameRate))
            activeVideoMaxFrameDuration = CMTimeMake(value: 1, timescale: Int32(frameRate))
            unlockForConfiguration()
        } catch {
            print("LockForConfiguration failed with error: \(error.localizedDescription)")
        }
    }
}
And call it
device.set(frameRate: 60)
Do it like this
if let frameSupportRange = currentCamera.activeFormat.videoSupportedFrameRateRanges.first {
    // currentCamera.activeVideoMinFrameDuration = CMTimeMake(1, Int32(frameSupportRange.maxFrameRate))
    // note: call currentCamera.lockForConfiguration() before this and unlockForConfiguration() after
    currentCamera.activeVideoMinFrameDuration = CMTimeMake(1, YOUR_FPS_RATE)
}
I have an AVPlayer which is streaming a live HLS stream.
When the user backgrounds the app, I see the play rate drop to 0.0 (paused). When the user comes back it returns to 1.0 (play), but playback starts from the point where it was paused.
What is the best way to force the player back to live without restarting the stream completely? Is there a seekToTime method that handles a closest to live time parameter?
I use:
double time = MAXFLOAT;
[player seekToTime: CMTimeMakeWithSeconds(time, NSEC_PER_SEC)];
Works well in my app.
Assuming player is an AVPlayer instance:
CMTimeRange seekableRange = [player.currentItem.seekableTimeRanges.lastObject CMTimeRangeValue];
CGFloat seekableStart = CMTimeGetSeconds(seekableRange.start);
CGFloat seekableDuration = CMTimeGetSeconds(seekableRange.duration);
CGFloat livePosition = seekableStart + seekableDuration;
[player seekToTime:CMTimeMake(livePosition, 1)];
Swift version of Igor Kulagin's answer:
player.seek(to: kCMTimePositiveInfinity)
Works perfectly in any condition. Other solutions gave me NaN error calculating livePosition value, or {INVALID} error working directly with CMTime.
Swift 3.0 Version
public func resumeLive() {
guard let livePosition = player.currentItem?.seekableTimeRanges.last as? CMTimeRange else {
Swift version of Karim Mourra's answer:
let seekableRanges = player.currentItem!.seekableTimeRanges
guard seekableRanges.count > 0 else {
let range = seekableRanges.last!.CMTimeRangeValue
let livePosition = range.start + range.duration
let minus = CMTimeMakeWithSeconds(Float64(timeOffset), Int32(NSEC_PER_SEC))
let time = livePosition - minus
Swift 4 version:
if let seekableRange = player.currentItem?.seekableTimeRanges.last?.timeRangeValue {
let seekableStart = seekableRange.start
let seekableDuration = seekableRange.duration
let livePosition = seekableStart + seekableDuration
player.seek(to: livePosition, completionHandler: { [weak self] _ in
No need to convert to floating point if you use Apple's CMTimeRange manipulation functions:
NSValue *value = player.currentItem.seekableTimeRanges.lastObject;
if (value) {
    CMTimeRange seekableRange = [value CMTimeRangeValue];
    CMTime latestTime = CMTimeRangeGetEnd(seekableRange);
    [player seekToTime:latestTime];
} else {
    // there are no seekable time ranges
}
Please see @Fabian's comment below.
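For completeness, here is a Swift sketch of the same CMTimeRange approach with an optional back-off from the live edge (similar to the timeOffset idea in one of the answers above). The function liveEdge(of:backingOffBy:) is a name I made up; the CoreMedia calls it uses (CMTimeRangeGetEnd, CMTimeSubtract, CMTimeMaximum) are standard.

```swift
import AVFoundation

/// Hypothetical helper: end of the last seekable range, optionally moved back
/// by `offset` seconds and clamped to the start of that range.
func liveEdge(of seekableTimeRanges: [NSValue], backingOffBy offset: Double = 0) -> CMTime? {
    guard let range = seekableTimeRanges.last?.timeRangeValue else {
        return nil // there are no seekable time ranges
    }
    let end = CMTimeRangeGetEnd(range)
    guard end.isNumeric else {
        return nil // avoids seeking to an invalid or indefinite time
    }
    let backoff = CMTime(seconds: offset, preferredTimescale: end.timescale)
    return CMTimeMaximum(range.start, CMTimeSubtract(end, backoff))
}
```

Usage would be something like `if let item = player.currentItem, let time = liveEdge(of: item.seekableTimeRanges, backingOffBy: 2) { player.seek(to: time) }`. Staying a couple of seconds behind the edge is my own suggestion to reduce the chance of stalling, not something from the answers.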