AVCaptureSession and AVCaptureMovieFileOutput frame timestamp - ios

I am recording a movie with AVCaptureSession and AVCaptureMovieFileOutput. I am also recording acceleration data and trying to align the acceleration data with the video.
I am trying to figure out a way to get the time the video file recording started. I am doing the following:
currentDate = [NSDate date];
[output startRecordingToOutputFileURL:fileUrl recordingDelegate:self];
However, according to my tests, the video recording starts 0.12 seconds before the call to startRecordingToOutputFileURL is made. I'm assuming this is because the various video buffers are already full of data which get added to the file.
Is there any way to get the actual NSDate of the first frame of the video?

I had the same issue and finally found the answer. The full code is below, but the missing piece I was looking for was:
self.captureSession.masterClock!.time
The masterClock of the captureSession is the clock that every sample buffer's relative time (its presentationTimeStamp) is based on.
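To get from that clock to an actual NSDate for the first frame, one option is to sample the master clock and the wall clock at the same moment and subtract the frame's age. The following is only a minimal sketch of that arithmetic, not part of the recording code: wallClockDate and its parameter names are hypothetical, and times are plain seconds so the snippet runs on its own. In the app you would presumably pass CMTimeGetSeconds(captureSession.masterClock!.time) as clockNowSeconds and the presentationTimeStamp of the first written buffer, converted to seconds, as ptsSeconds.

```swift
import Foundation

/// Maps a presentation timestamp (seconds on the capture clock) to a wall-clock Date.
/// `clockNowSeconds` and `wallNow` must be sampled at (nearly) the same moment.
func wallClockDate(forPTS ptsSeconds: Double,
                   clockNowSeconds: Double,
                   wallNow: Date) -> Date {
    // How long ago the frame was captured, measured on the capture clock.
    let frameAge = clockNowSeconds - ptsSeconds
    return wallNow.addingTimeInterval(-frameAge)
}

// Hypothetical values: the first frame is 0.12 s older than "now".
let wallNow = Date(timeIntervalSince1970: 1_000_000)
let firstFrameDate = wallClockDate(forPTS: 4321.38,
                                   clockNowSeconds: 4321.5,
                                   wallNow: wallNow)
print(wallNow.timeIntervalSince(firstFrameDate))
```

The wall clock can jump (time sync, user changes), so the two "now" values should be sampled close to the recording. As far as I know, newer SDKs expose the same clock as synchronizationClock.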
Full code and explanation
The first thing you have to do is replace the AVCaptureMovieFileOutput with an AVCaptureVideoDataOutput and an AVCaptureAudioDataOutput. So make sure your class implements AVCaptureVideoDataOutputSampleBufferDelegate and AVCaptureAudioDataOutputSampleBufferDelegate. Both protocols use the same callback, so add it to your class (I will get to the implementation later):
let videoDataOutput = AVCaptureVideoDataOutput()
let audioDataOutput = AVCaptureAudioDataOutput()
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// I will get to this
}
Where the outputs are added to the capture session, my code looks like this (you can change the videoOrientation and other settings if you want):
if captureSession.canAddInput(cameraInput)
&& captureSession.canAddInput(micInput)
// && captureSession.canAddOutput(self.movieFileOutput)
&& captureSession.canAddOutput(self.videoDataOutput)
&& captureSession.canAddOutput(self.audioDataOutput)
{
captureSession.beginConfiguration()
captureSession.addInput(cameraInput)
captureSession.addInput(micInput)
// self.captureSession.addOutput(self.movieFileOutput)
let videoAudioDataOutputQueue = DispatchQueue(label: "com.myapp.queue.video-audio-data-output") //Choose any label you want
self.videoDataOutput.alwaysDiscardsLateVideoFrames = false
self.videoDataOutput.setSampleBufferDelegate(self, queue: videoAudioDataOutputQueue)
self.captureSession.addOutput(self.videoDataOutput)
self.audioDataOutput.setSampleBufferDelegate(self, queue: videoAudioDataOutputQueue)
self.captureSession.addOutput(self.audioDataOutput)
if let connection = self.videoDataOutput.connection(with: .video) {
if connection.isVideoStabilizationSupported {
connection.preferredVideoStabilizationMode = .auto
}
if connection.isVideoOrientationSupported {
connection.videoOrientation = .portrait
}
}
self.captureSession.commitConfiguration()
DispatchQueue.global(qos: .userInitiated).async {
self.captureSession.startRunning()
}
}
To write the video like you would with AVCaptureMovieFileOutput, you can use AVAssetWriter. So add the following to your class:
var videoWriter: AVAssetWriter?
var videoWriterInput: AVAssetWriterInput?
var audioWriterInput: AVAssetWriterInput?
private func setupWriter(url: URL) {
self.videoWriter = try! AVAssetWriter(outputURL: url, fileType: AVFileType.mov)
self.videoWriterInput = AVAssetWriterInput(mediaType: .video, outputSettings: self.videoDataOutput.recommendedVideoSettingsForAssetWriter(writingTo: AVFileType.mov))
self.videoWriterInput!.expectsMediaDataInRealTime = true
self.videoWriter!.add(self.videoWriterInput!)
self.audioWriterInput = AVAssetWriterInput(mediaType: .audio, outputSettings: self.audioDataOutput.recommendedAudioSettingsForAssetWriter(writingTo: AVFileType.mov))
self.audioWriterInput!.expectsMediaDataInRealTime = true
self.videoWriter!.add(self.audioWriterInput!)
self.videoWriter!.startWriting()
}
Every time you want to record, you first need to set up the writer. The startWriting function doesn't actually start writing to the file; it prepares the writer for data that will arrive soon.
Next we add the code to start and stop recording. Please note that I still need to fix stopRecording: it finishes the recording too soon, because the buffers always arrive with a delay. But maybe that doesn't matter to you.
var isRecording = false
var recordFromTime: CMTime?
var sessionAtSourceTime: CMTime?
func startRecording(url: URL) {
guard !self.isRecording else { return }
self.isRecording = true
self.sessionAtSourceTime = nil
self.recordFromTime = self.captureSession.masterClock!.time //This is very important, because based on this time we will start recording appropriately
self.setupWriter(url: url)
//You can let a delegate or something know recording has started now
}
func stopRecording() {
guard self.isRecording else { return }
self.isRecording = false
self.videoWriter?.finishWriting { [weak self] in
self?.sessionAtSourceTime = nil
guard let url = self?.videoWriter?.outputURL else { return }
//Notify finished recording and pass url if needed
}
}
And finally the implementation of the function we mentioned at the beginning of this post:
private func canWrite() -> Bool {
return self.isRecording && self.videoWriter != nil && self.videoWriter!.status == .writing
}
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
guard CMSampleBufferDataIsReady(sampleBuffer), self.canWrite() else { return }
//sessionAtSourceTime is the first buffer we will write to the file
if self.sessionAtSourceTime == nil {
//Make sure we start by capturing the videoDataOutput (if we start with the audio the file gets corrupted)
guard output == self.videoDataOutput else { return }
//Make sure we don't start recording until the buffer reaches the correct time (buffer is always behind, this will fix the difference in time)
guard sampleBuffer.presentationTimeStamp >= self.recordFromTime! else { return }
self.sessionAtSourceTime = sampleBuffer.presentationTimeStamp
self.videoWriter!.startSession(atSourceTime: sampleBuffer.presentationTimeStamp)
}
if output == self.videoDataOutput {
if self.videoWriterInput!.isReadyForMoreMediaData {
self.videoWriterInput!.append(sampleBuffer)
}
} else if output == self.audioDataOutput {
if self.audioWriterInput!.isReadyForMoreMediaData {
self.audioWriterInput!.append(sampleBuffer)
}
}
}
So the most important thing, which fixes the time difference between the start of the recording and your own code, is self.captureSession.masterClock!.time. We compare each buffer's relative time with it and skip buffers until they reach the time you started recording. If you want to fix the end time as well, just add a variable recordUntilTime and check it in the didOutput sampleBuffer method.
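The recordUntilTime idea can be sketched as a small gate that decides, per buffer, whether to skip it, write it, or finish the file. This is only an illustration under assumptions: RecordingWindow, BufferDecision and decide(pts:) are hypothetical names, times are seconds read from the master clock, and the writer itself is left out.

```swift
/// What to do with an incoming sample buffer (hypothetical helper type).
enum BufferDecision { case skip, write, finish }

/// Start/stop gate based on presentation timestamps, in seconds.
struct RecordingWindow {
    var recordFrom: Double           // master clock time when recording was requested
    var recordUntil: Double? = nil   // master clock time when stop was requested

    func decide(pts: Double) -> BufferDecision {
        // Buffers captured before the start request are still in flight: drop them.
        if pts < recordFrom { return .skip }
        // The first buffer captured after the stop request ends the file.
        if let until = recordUntil, pts > until { return .finish }
        return .write
    }
}

var window = RecordingWindow(recordFrom: 10.0)
let early = window.decide(pts: 9.9)     // captured before the start request
let during = window.decide(pts: 10.5)   // inside the recording
window.recordUntil = 12.0               // user tapped stop
let late = window.decide(pts: 11.8)     // delayed buffer, still belongs to the file
let after = window.decide(pts: 12.1)    // past the stop time
print(early, during, late, after)
```

With a gate like this, stopRecording would only set recordUntil from the master clock, and finishWriting would be called from the didOutput callback once the gate returns .finish.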

If I understand your question correctly, you want to know the timestamp of the first recorded frame. You could try:
CMTime captureStartTime; // file-scope variable or ivar: zero-initialized, so it is invalid until set
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
if (CMTIME_IS_INVALID(captureStartTime)) {
captureStartTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
}
// do the other things you want
}

Related

How to send CMSampleBuffer to WebRTC?

So I am using ReplayKit to try to stream my phone screen to a web browser.
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
//if source!.isSocketConnected {
switch sampleBufferType {
case RPSampleBufferType.video:
// Handle video sample buffer
break
case RPSampleBufferType.audioApp:
// Handle audio sample buffer for app audio
break
case RPSampleBufferType.audioMic:
// Handle audio sample buffer for mic audio
break
@unknown default:
break
}
}
So how do we send that data to WebRTC?
In order to use WebRTC, I learned that you need a signaling server.
Is it possible to start a signaling server on your mobile, just like http server?
Hi Sam. WebRTC has a function that can turn captured frames into video frames, but it works with CVPixelBuffer rather than CMSampleBuffer. So you first have to convert your CMSampleBuffer to a CVPixelBuffer, and then add the frames to your localVideoSource through an RTCVideoCapturer. I solved a similar problem with AVCaptureVideoDataOutputSampleBufferDelegate, which produces CMSampleBuffers just as ReplayKit does. I hope the code below helps you solve your problem.
private var videoCapturer: RTCVideoCapturer?
private var localVideoSource = RTCClient.factory.videoSource()
private var localVideoTrack: RTCVideoTrack?
private var remoteVideoTrack: RTCVideoTrack?
private var peerConnection: RTCPeerConnection? = nil
public static let factory: RTCPeerConnectionFactory = {
RTCInitializeSSL()
let videoEncoderFactory = RTCDefaultVideoEncoderFactory()
let videoDecoderFactory = RTCDefaultVideoDecoderFactory()
return RTCPeerConnectionFactory(encoderFactory: videoEncoderFactory, decoderFactory: videoDecoderFactory)
}()
extension RTCClient : AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
print("didOutPut: \(sampleBuffer)")
guard let imageBuffer: CVImageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
let timeStampNs: Int64 = Int64(CMTimeGetSeconds(CMSampleBufferGetPresentationTimeStamp(sampleBuffer)) * 1000000000)
let rtcPixlBuffer = RTCCVPixelBuffer(pixelBuffer: imageBuffer)
let rtcVideoFrame = RTCVideoFrame(buffer: rtcPixlBuffer, rotation: ._90, timeStampNs: timeStampNs)
self.localVideoSource.capturer(videoCapturer!, didCapture: rtcVideoFrame)
}
}
You also need a configuration like this for the media senders:
func createMediaSenders() {
let streamId = "stream"
let videoTrack = self.createVideoTrack()
self.localVideoTrack = videoTrack
self.peerConnection!.add(videoTrack, streamIds: [streamId])
self.remoteVideoTrack = self.peerConnection!.transceivers.first { $0.mediaType == .video }?.receiver.track as? RTCVideoTrack
}
private func createVideoTrack() -> RTCVideoTrack {
let videoTrack = RTCClient.factory.videoTrack(with: self.localVideoSource, trackId: "video0") // the source property is declared above as localVideoSource
return videoTrack
}

Unwanted "smoothing" in AVDepthData on iPhone 13 (not evident in iPhone 12)

We are writing an app which analyzes real-world 3D data by using the TrueDepth camera on the front of an iPhone, and an AVCaptureSession configured to produce AVDepthData along with image data. This worked great on iPhone 12, but the same code on iPhone 13 produces an unwanted "smoothing" effect which makes the scene impossible to process and breaks our app. We are unable to find any information on this effect, from Apple or otherwise, much less how to avoid it, so we are asking you experts.
At the bottom of this post (Figure 3) is our code which configures the capture session, using an AVCaptureDataOutputSynchronizer, to produce frames of 640x480 image and depth data. I boiled it down as much as possible, sorry it's so long. The two main parts are the configure function, which sets up our capture session, and the dataOutputSynchronizer function, near the bottom, which fires when a synced set of data is available. In the latter function I've included my code which extracts the information from the AVDepthData object, including looping through all 640x480 depth data points (in meters). I've excluded further processing for brevity (believe it or not :)).
On an iPhone 12 device, the PNG data and the depth data merge nicely. The front view and side view of the merged point cloud are below (Figure 1). The angles visible in the side view are due to the application of the focal length, which "de-perspectives" the data and places the points in their proper position in xyz space.
The same code on an iPhone 13 produces depth maps that result in the point cloud further below (Figure 2: straight-on view, angled view, and side view). There is no longer any clear distinction between objects and the background because the depth data appears to be "smoothed" between the mannequin and the background, i.e., there are seven or eight points between the subject and background that are not realistic and make it impossible to do any meaningful processing such as segmenting the scene.
Has anyone else encountered this issue, or have any insight into how we might change our code to avoid it? Any help or ideas are MUCH appreciated, since this is a definite showstopper (we can't tell people to only run our App on older phones :)). Thank you!
Figure 1 -- Merged depth data and image into point cloud, from iPhone 12
Figure 2 -- Merged depth data and image into point cloud, from iPhone 13; unwanted smoothing effect visible
Figure 3 -- Our configuration code and capture handler; edited to remove downstream processing of captured data (which was basically formatting it into an XML file and uploading to the cloud)
import Foundation
import Combine
import AVFoundation
import Photos
import UIKit
import FirebaseStorage
public struct AlertError {
public var title: String = ""
public var message: String = ""
public var primaryButtonTitle = "Accept"
public var secondaryButtonTitle: String?
public var primaryAction: (() -> ())?
public var secondaryAction: (() -> ())?
public init(title: String = "", message: String = "", primaryButtonTitle: String = "Accept", secondaryButtonTitle: String? = nil, primaryAction: (() -> ())? = nil, secondaryAction: (() -> ())? = nil) {
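self.title = title
self.message = message
self.primaryAction = primaryAction
self.primaryButtonTitle = primaryButtonTitle
self.secondaryButtonTitle = secondaryButtonTitle // was missing, so the argument was silently dropped
self.secondaryAction = secondaryAction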
}
}
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
//
//
// this is the CameraService class, which configures and runs a capture session
// which acquires synchronized image and depth data
// using an AVCaptureDataOutputSynchronizer
//
//
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
public class CameraService: NSObject,
AVCaptureVideoDataOutputSampleBufferDelegate,
AVCaptureDepthDataOutputDelegate,
AVCaptureDataOutputSynchronizerDelegate,
MyFirebaseProtocol,
ObservableObject{
@Published public var shouldShowAlertView = false
@Published public var shouldShowSpinner = false
public var labelStatus: String = "Ready"
var images: [UIImage?] = []
public var alertError: AlertError = AlertError()
public let session = AVCaptureSession()
var isSessionRunning = false
var isConfigured = false
var setupResult: SessionSetupResult = .success
private let sessionQueue = DispatchQueue(label: "session queue") // Communicate with the session and other session objects on this queue.
@objc dynamic var videoDeviceInput: AVCaptureDeviceInput!
private let videoDeviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInTrueDepthCamera], mediaType: .video, position: .front)
var videoCaptureDevice : AVCaptureDevice? = nil
let videoDataOutput: AVCaptureVideoDataOutput = AVCaptureVideoDataOutput() // Define frame output.
let depthDataOutput = AVCaptureDepthDataOutput()
var outputSynchronizer: AVCaptureDataOutputSynchronizer? = nil
let dataOutputQueue = DispatchQueue(label: "video data queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
var scanStateCounter: Int = 0
var m_DepthDatasetsToUpload = [AVCaptureSynchronizedDepthData]()
var m_FrameBufferToUpload = [AVCaptureSynchronizedSampleBufferData]()
var firebaseDepthDatasetsArray: [String] = []
@Published var firebaseImageUploadCount = 0
@Published var firebaseTextFileUploadCount = 0
public func configure() {
/*
Setup the capture session.
In general, it's not safe to mutate an AVCaptureSession or any of its
inputs, outputs, or connections from multiple threads at the same time.
Don't perform these tasks on the main queue because
AVCaptureSession.startRunning() is a blocking call, which can
take a long time. Dispatch session setup to the sessionQueue, so
that the main queue isn't blocked, which keeps the UI responsive.
*/
sessionQueue.async {
self.configureSession()
}
}
// MARK: Checks for user's permissions
public func checkForPermissions() {
switch AVCaptureDevice.authorizationStatus(for: .video) {
case .authorized:
// The user has previously granted access to the camera.
break
case .notDetermined:
/*
The user has not yet been presented with the option to grant
video access. Suspend the session queue to delay session
setup until the access request has completed.
*/
sessionQueue.suspend()
AVCaptureDevice.requestAccess(for: .video, completionHandler: { granted in
if !granted {
self.setupResult = .notAuthorized
}
self.sessionQueue.resume()
})
default:
// The user has previously denied access.
setupResult = .notAuthorized
DispatchQueue.main.async {
self.alertError = AlertError(title: "Camera Access", message: "SwiftCamera doesn't have access to use your camera, please update your privacy settings.", primaryButtonTitle: "Settings", secondaryButtonTitle: nil, primaryAction: {
UIApplication.shared.open(URL(string: UIApplication.openSettingsURLString)!,
options: [:], completionHandler: nil)
}, secondaryAction: nil)
self.shouldShowAlertView = true
}
}
}
// MARK: Session Management
// Call this on the session queue.
/// - Tag: ConfigureSession
private func configureSession() {
if setupResult != .success {
return
}
session.beginConfiguration()
session.sessionPreset = AVCaptureSession.Preset.vga640x480
// Add video input.
do {
var defaultVideoDevice: AVCaptureDevice?
let frontCameraDevice = AVCaptureDevice.default(.builtInTrueDepthCamera, for: .video, position: .front)
// Only the front TrueDepth camera is requested; there is no fallback device.
defaultVideoDevice = frontCameraDevice
videoCaptureDevice = defaultVideoDevice
guard let videoDevice = defaultVideoDevice else {
print("Default video device is unavailable.")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
let videoDeviceInput = try AVCaptureDeviceInput(device: videoDevice)
if session.canAddInput(videoDeviceInput) {
session.addInput(videoDeviceInput)
self.videoDeviceInput = videoDeviceInput
} else if session.inputs.isEmpty == false {
self.videoDeviceInput = videoDeviceInput
} else {
print("Couldn't add video device input to the session.")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
} catch {
print("Couldn't create video device input: \(error)")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// MARK: add video output to session
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
videoDataOutput.videoSettings = [(kCVPixelBufferPixelFormatTypeKey as NSString) : NSNumber(value: kCVPixelFormatType_32BGRA)] as [String : Any]
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.setSampleBufferDelegate(self, queue: DispatchQueue(label: "camera_frame_processing_queue"))
if session.canAddOutput(self.videoDataOutput) {
session.addOutput(self.videoDataOutput)
} else if session.outputs.contains(videoDataOutput) {
} else {
print("Couldn't create video device output")
setupResult = .configurationFailed
session.commitConfiguration()
return
}
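// Set the orientation when supported, but do not return early here:
// that would skip the depth setup and leave the configuration uncommitted.
if let connection = self.videoDataOutput.connection(with: AVMediaType.video),
connection.isVideoOrientationSupported {
connection.videoOrientation = .portrait
}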
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// MARK: add depth output to session
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Add a depth data output
if session.canAddOutput(depthDataOutput) {
session.addOutput(depthDataOutput)
depthDataOutput.isFilteringEnabled = false
depthDataOutput.setDelegate(self, callbackQueue: DispatchQueue(label: "depth_frame_processing_queue"))
if let connection = depthDataOutput.connection(with: .depthData) {
connection.isEnabled = true
} else {
print("No AVCaptureConnection")
}
} else if session.outputs.contains(depthDataOutput){
} else {
print("Could not add depth data output to the session")
session.commitConfiguration()
return
}
// Search for the highest resolution with half-precision (Float16) depth values
let depthFormats = videoCaptureDevice!.activeFormat.supportedDepthDataFormats
let filtered = depthFormats.filter({
CMFormatDescriptionGetMediaSubType($0.formatDescription) == kCVPixelFormatType_DepthFloat16
})
let selectedFormat = filtered.max(by: {
first, second in CMVideoFormatDescriptionGetDimensions(first.formatDescription).width < CMVideoFormatDescriptionGetDimensions(second.formatDescription).width
})
do {
try videoCaptureDevice!.lockForConfiguration()
videoCaptureDevice!.activeDepthDataFormat = selectedFormat
videoCaptureDevice!.unlockForConfiguration()
} catch {
print("Could not lock device for configuration: \(error)")
session.commitConfiguration()
return
}
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
// Use an AVCaptureDataOutputSynchronizer to synchronize the video data and depth data outputs.
// The first output in the dataOutputs array, in this case the AVCaptureVideoDataOutput, is the "master" output.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////
outputSynchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [videoDataOutput, depthDataOutput])
outputSynchronizer!.setDelegate(self, queue: dataOutputQueue)
session.commitConfiguration()
self.isConfigured = true
//self.start()
}
// MARK: Device Configuration
/// - Tag: Stop capture session
public func stop(completion: (() -> ())? = nil) {
sessionQueue.async {
//print("entered stop")
if self.isSessionRunning {
//print(self.setupResult)
if self.setupResult == .success {
//print("entered success")
DispatchQueue.main.async{
self.session.stopRunning()
self.isSessionRunning = self.session.isRunning
if !self.session.isRunning {
DispatchQueue.main.async {
completion?()
}
}
}
}
}
}
}
/// - Tag: Start capture session
public func start() {
// We use our capture session queue to ensure our UI runs smoothly on the main thread.
sessionQueue.async {
if !self.isSessionRunning && self.isConfigured {
switch self.setupResult {
case .success:
self.session.startRunning()
self.isSessionRunning = self.session.isRunning
if self.session.isRunning {
}
case .configurationFailed, .notAuthorized:
print("Application not authorized to use camera")
DispatchQueue.main.async {
self.alertError = AlertError(title: "Camera Error", message: "Camera configuration failed. Either your device camera is not available or it's missing permissions", primaryButtonTitle: "Accept", secondaryButtonTitle: nil, primaryAction: nil, secondaryAction: nil)
self.shouldShowAlertView = true
}
}
}
}
}
// ------------------------------------------------------------------------
// MARK: CAPTURE HANDLERS
// ------------------------------------------------------------------------
public func dataOutputSynchronizer(_ synchronizer: AVCaptureDataOutputSynchronizer, didOutput synchronizedDataCollection: AVCaptureSynchronizedDataCollection) {
//printWithTime("Capture")
guard let syncedDepthData: AVCaptureSynchronizedDepthData =
synchronizedDataCollection.synchronizedData(for: depthDataOutput) as? AVCaptureSynchronizedDepthData else {
return
}
guard let syncedVideoData: AVCaptureSynchronizedSampleBufferData =
synchronizedDataCollection.synchronizedData(for: videoDataOutput) as? AVCaptureSynchronizedSampleBufferData else {
return
}
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
//
//
// Below is the code that extracts the information from depth data
// The depth data is 640x480, which matches the size of the synchronized image
// I save this info to a file, upload it to the cloud, and merge it with the image
// on a PC to create a pointcloud
//
//
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
let depth_data : AVDepthData = syncedDepthData.depthData
let cvpixelbuffer : CVPixelBuffer = depth_data.depthDataMap
let height : Int = CVPixelBufferGetHeight(cvpixelbuffer)
let width : Int = CVPixelBufferGetWidth(cvpixelbuffer)
let quality : AVDepthData.Quality = depth_data.depthDataQuality
let accuracy : AVDepthData.Accuracy = depth_data.depthDataAccuracy
let pixelsize : Float = depth_data.cameraCalibrationData!.pixelSize
let camcaldata : AVCameraCalibrationData = depth_data.cameraCalibrationData!
let intmat : matrix_float3x3 = camcaldata.intrinsicMatrix
let cal_lensdistort_x : CGFloat = camcaldata.lensDistortionCenter.x
let cal_lensdistort_y : CGFloat = camcaldata.lensDistortionCenter.y
let cal_matrix_width : CGFloat = camcaldata.intrinsicMatrixReferenceDimensions.width
let cal_matrix_height : CGFloat = camcaldata.intrinsicMatrixReferenceDimensions.height
let intrinsics_fx : Float = camcaldata.intrinsicMatrix.columns.0.x
let intrinsics_fy : Float = camcaldata.intrinsicMatrix.columns.1.y
let intrinsics_ox : Float = camcaldata.intrinsicMatrix.columns.2.x
let intrinsics_oy : Float = camcaldata.intrinsicMatrix.columns.2.y
let pixelformattype : OSType = CVPixelBufferGetPixelFormatType(cvpixelbuffer)
// Lock for reading; the defer unlocks on every exit path.
CVPixelBufferLockBaseAddress(cvpixelbuffer, .readOnly)
defer { CVPixelBufferUnlockBaseAddress(cvpixelbuffer, .readOnly) }
guard let baseAddress = CVPixelBufferGetBaseAddress(cvpixelbuffer) else { return }
// The buffer holds Float16 depth values (meters); rows may be padded, so index by bytes per row.
let float16Buffer = baseAddress.assumingMemoryBound(to: Float16.self)
let float16PerRow = CVPixelBufferGetBytesPerRow(cvpixelbuffer) / MemoryLayout<Float16>.stride
for row in 0..<height
{
for col in 0..<width
{
let depthMeters = float16Buffer[row * float16PerRow + col]
/////////////////////////
// SAVE DEPTH VALUE 'depthMeters' to FILE FOR PROCESSING
}
}
}
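For reference, the "de-perspective" step mentioned above is usually a pinhole unprojection using the intrinsics read in the handler (intrinsics_fx, intrinsics_fy, intrinsics_ox, intrinsics_oy). What follows is a minimal sketch, assuming no lens-distortion correction and that the intrinsics are first rescaled from intrinsicMatrixReferenceDimensions to the size of the depth map. PinholeIntrinsics, scaled(by:) and unproject are hypothetical names, not AVFoundation API, and the numbers are made up.

```swift
/// Pinhole intrinsics in pixels (hypothetical helper type).
struct PinholeIntrinsics {
    var fx: Float, fy: Float, ox: Float, oy: Float

    /// Rescales intrinsics given for a reference image size to another image size.
    /// `factor` is depthMapWidth / referenceWidth.
    func scaled(by factor: Float) -> PinholeIntrinsics {
        PinholeIntrinsics(fx: fx * factor, fy: fy * factor,
                          ox: ox * factor, oy: oy * factor)
    }
}

/// Converts a depth map pixel (col, row) with depth in meters to camera-space XYZ.
func unproject(col: Float, row: Float, depth: Float,
               intrinsics k: PinholeIntrinsics) -> (x: Float, y: Float, z: Float) {
    let x = (col - k.ox) * depth / k.fx
    let y = (row - k.oy) * depth / k.fy
    return (x, y, depth)
}

// Made-up intrinsics for a 1280x960 reference image, rescaled to a 640x480 depth map.
let reference = PinholeIntrinsics(fx: 1000, fy: 1000, ox: 640, oy: 480)
let depthIntrinsics = reference.scaled(by: 640.0 / 1280.0)

// A pixel 250 columns right of the principal point, 2 m away.
let point = unproject(col: 570, row: 240, depth: 2, intrinsics: depthIntrinsics)
print(point)
```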

AVAssetWriter async video and audio after calling broadcastPaused()

I am trying to record a video with sound from device screen using ReplayKit and RPBroadcastSampleHandler.
When I record using just "start broadcast" and "stop", the result is great.
But if I try to pause the recording (by tapping the red status bar), I run into problems. The resulting video and audio have different lengths (the audio is shorter but contains everything I need). In the recording, video and audio go out of sync from the moment the status bar is tapped (iOS 14). The audio is fine, but the video freezes when the status bar is tapped and continues when the modal window is closed. As a result, the end of the video has no audio.
Here is my code:
1. All the class fields I have:
class SampleHandler: RPBroadcastSampleHandler {
private let videoService = VideoService()
private let audioService = AudioService()
private var isRecording = false
private let lock = NSLock()
private var finishCalled = false
private var videoWriter: AVAssetWriter!
private var videoWriterInput: AVAssetWriterInput!
private var microphoneWriterInput: AVAssetWriterInput!
private var sessionBeginAtSourceTime: CMTime!
2. Configuration when capturing starts:
override func broadcastStarted(withSetupInfo setupInfo: [String : NSObject]?) {
guard !isRecording else { return }
isRecording = true
BroadcastData.clear()
BroadcastData.startVideoDate = Date()
BroadcastData.status = .writing
sessionBeginAtSourceTime = nil
configurateVideoWriter()
}
private func configurateVideoWriter() {
let outputFileLocation = videoService.getVideoFileLocation()
videoWriter = try? AVAssetWriter.init(outputURL: outputFileLocation,
fileType: AVFileType.mov)
configurateVideoWriterInput()
configurateMicrophoneWriterInput()
if videoWriter.canAdd(videoWriterInput) { videoWriter.add(videoWriterInput) }
if videoWriter.canAdd(microphoneWriterInput) { videoWriter.add(microphoneWriterInput) }
videoWriter.startWriting()
}
private func configurateVideoWriterInput() {
let RESOLUTION_COEF: CGFloat = 16
let naturalWidth = UIScreen.main.bounds.width
let naturalHeight = UIScreen.main.bounds.height
let width = naturalWidth - naturalWidth.truncatingRemainder(dividingBy: RESOLUTION_COEF)
let height = naturalHeight - naturalHeight.truncatingRemainder(dividingBy: RESOLUTION_COEF)
let videoSettings: [String: Any] = [
AVVideoCodecKey: AVVideoCodecType.h264,
AVVideoWidthKey: width,
AVVideoHeightKey: height
]
videoWriterInput = AVAssetWriterInput(mediaType: .video,
outputSettings: videoSettings)
videoWriterInput.expectsMediaDataInRealTime = true
}
private func configurateMicrophoneWriterInput() {
let audioOutputSettings: [String : Any] = [
AVFormatIDKey: kAudioFormatMPEG4AAC,
AVNumberOfChannelsKey : 1,
AVSampleRateKey : 44100.0,
AVEncoderBitRateKey: 96000
]
microphoneWriterInput = AVAssetWriterInput(mediaType: .audio,
outputSettings: audioOutputSettings)
microphoneWriterInput.expectsMediaDataInRealTime = true
}
3. Write process:
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType:
RPSampleBufferType) {
guard isRecording && videoWriter?.status == .writing else { return }
if BroadcastData.status != .writing {
isRecording = false
finishBroadCast()
return
}
if sessionBeginAtSourceTime == nil {
sessionBeginAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
videoWriter.startSession(atSourceTime: sessionBeginAtSourceTime!)
}
switch sampleBufferType {
case .video:
if videoWriterInput.isReadyForMoreMediaData {
videoWriterInput.append(sampleBuffer)
}
case .audioMic:
if microphoneWriterInput.isReadyForMoreMediaData {
microphoneWriterInput.append(sampleBuffer)
}
case .audioApp:
break
@unknown default:
print("unknown")
}
}
4. Pause and resume:
override func broadcastPaused() {
super.broadcastPaused()
}
override func broadcastResumed() {
super.broadcastResumed()
}
Pausing and resuming the recording creates a gap in the video presentation timestamps and a discontinuity in the audio, which I believe explains your symptoms.
What you need to do is measure how long the recording was paused, possibly using the sample buffer timestamps, and then subtract that offset from the presentation timestamps of all the subsequent CMSampleBuffers that you process. CMSampleBufferCreateCopyWithNewTiming() can help you with this.
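The bookkeeping behind that offset can be sketched as follows. This is a simplified model rather than a drop-in fix: PauseOffsetTracker, markResumed() and adjust(pts:duration:) are hypothetical names, times are plain seconds instead of CMTime, and a single tracker is assumed to be shared by all streams. In the real handler you would presumably call markResumed() from broadcastResumed() and apply the resulting offset to each buffer with CMSampleBufferCreateCopyWithNewTiming().

```swift
/// Tracks the accumulated pause time and shifts timestamps to close the gaps.
struct PauseOffsetTracker {
    private(set) var totalOffset: Double = 0   // sum of all pause gaps so far
    private var lastEnd: Double? = nil         // adjusted end time of the last written buffer
    private var resuming = false

    /// Call when the recording resumes after a pause.
    mutating func markResumed() { resuming = true }

    /// Returns the timestamp to write for a buffer with the given original timing.
    mutating func adjust(pts: Double, duration: Double) -> Double {
        if resuming {
            if let end = lastEnd {
                // Gap between where the file currently ends and where this buffer would land.
                totalOffset += (pts - totalOffset) - end
            }
            resuming = false
        }
        let adjusted = pts - totalOffset
        lastEnd = adjusted + duration
        return adjusted
    }
}

var tracker = PauseOffsetTracker()
let first = tracker.adjust(pts: 0, duration: 1)    // written at its original time
let second = tracker.adjust(pts: 1, duration: 1)   // file now ends at 2 s
tracker.markResumed()                              // paused, then resumed 5 s later
let third = tracker.adjust(pts: 7, duration: 1)    // shifted back to continue at 2 s
print(first, second, third, tracker.totalOffset)
```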

Real-time AVAssetWriter synchronise audio and video when pausing/resuming

I am trying to record a video with sound using iPhone's front camera. As I need to also support pause/resume functionality, I need to use AVAssetWriter. I've found an example online, written in Objective-C, which almost achieves the desired functionality (http://www.gdcl.co.uk/2013/02/20/iPhone-Pause.html)
Unfortunately, after converting this example to Swift, I notice that if I pause/resume, at the end of each "section" there is a small but noticeable period during which the video is just a still frame and the audio is playing. So, it seems that when isPaused is triggered, the recorded audio track is longer than the recorded video track.
Sorry if it may seem like a noob question, but I am not a great expert in AVFoundation and some help would be appreciated!
Below I post my implementation of didOutput sampleBuffer.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
var isVideo = true
if videoConntection != connection {
isVideo = false
}
if (!isCapturing || isPaused) {
return
}
if (encoder == nil) {
if isVideo {
return
}
if let fmt = CMSampleBufferGetFormatDescription(sampleBuffer) {
let desc = CMAudioFormatDescriptionGetStreamBasicDescription(fmt as CMAudioFormatDescription)
if let chan = desc?.pointee.mChannelsPerFrame, let rate = desc?.pointee.mSampleRate {
let path = tempPath()!
encoder = VideoEncoder(path: path, height: Int(cameraSize.height), width: Int(cameraSize.width), channels: chan, rate: rate)
}
}
}
if discont {
if isVideo {
return
}
discont = false
var pts = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
let last = lastAudio
if last.flags.contains(CMTimeFlags.valid) {
if cmOffset.flags.contains(CMTimeFlags.valid) {
pts = CMTimeSubtract(pts, cmOffset)
}
let off = CMTimeSubtract(pts, last)
print("setting offset from \(isVideo ? "video":"audio")")
print("adding \(CMTimeGetSeconds(off)) to \(CMTimeGetSeconds(cmOffset)) (pts \(CMTimeGetSeconds(pts)))")
if cmOffset.value == 0 {
cmOffset = off
}
else {
cmOffset = CMTimeAdd(cmOffset, off)
}
}
lastVideo.flags = []
lastAudio.flags = []
return
}
var out:CMSampleBuffer?
if cmOffset.value > 0 {
var count:CMItemCount = CMSampleBufferGetNumSamples(sampleBuffer)
let pInfo = UnsafeMutablePointer<CMSampleTimingInfo>.allocate(capacity: count)
defer { pInfo.deallocate() } // free the timing array once the retimed copy has been created
CMSampleBufferGetSampleTimingInfoArray(sampleBuffer, entryCount: count, arrayToFill: pInfo, entriesNeededOut: &count)
var i = 0
while i<count {
pInfo[i].decodeTimeStamp = CMTimeSubtract(pInfo[i].decodeTimeStamp, cmOffset)
pInfo[i].presentationTimeStamp = CMTimeSubtract(pInfo[i].presentationTimeStamp, cmOffset)
i+=1
}
CMSampleBufferCreateCopyWithNewTiming(allocator: nil, sampleBuffer: sampleBuffer, sampleTimingEntryCount: count, sampleTimingArray: pInfo, sampleBufferOut: &out)
}
else {
out = sampleBuffer
}
var pts = CMSampleBufferGetPresentationTimeStamp(out!)
let dur = CMSampleBufferGetDuration(out!)
if (dur.value > 0)
{
pts = CMTimeAdd(pts, dur);
}
if (isVideo) {
lastVideo = pts;
}
else {
lastAudio = pts;
}
encoder?.encodeFrame(sampleBuffer: out!, isVideo: isVideo)
}
And this is my VideoEncoder class:
final class VideoEncoder {
var writer:AVAssetWriter
var videoInput:AVAssetWriterInput
var audioInput:AVAssetWriterInput
var path:String
init(path:String, height:Int, width:Int, channels:UInt32, rate:Float64) {
self.path = path
if FileManager.default.fileExists(atPath:path) {
try? FileManager.default.removeItem(atPath: path)
}
let url = URL(fileURLWithPath: path)
writer = try! AVAssetWriter(outputURL: url, fileType: .mp4)
videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: [
AVVideoCodecKey: AVVideoCodecType.h264,
AVVideoWidthKey:height,
AVVideoHeightKey:width
])
videoInput.expectsMediaDataInRealTime = true
writer.add(videoInput)
audioInput = AVAssetWriterInput(mediaType: .audio, outputSettings: [
AVFormatIDKey:kAudioFormatMPEG4AAC,
AVNumberOfChannelsKey:channels,
AVSampleRateKey:rate
])
audioInput.expectsMediaDataInRealTime = true
writer.add(audioInput)
}
func finish(with completionHandler:@escaping ()->Void) {
writer.finishWriting(completionHandler: completionHandler)
}
func encodeFrame(sampleBuffer:CMSampleBuffer, isVideo:Bool) -> Bool {
if CMSampleBufferDataIsReady(sampleBuffer) {
if writer.status == .unknown {
writer.startWriting()
writer.startSession(atSourceTime: CMSampleBufferGetPresentationTimeStamp(sampleBuffer))
}
if writer.status == .failed {
QFLogger.shared.addLog(format: "[ERROR initiating AVAssetWriter]", args: [], error: writer.error)
return false
}
if isVideo {
if videoInput.isReadyForMoreMediaData {
videoInput.append(sampleBuffer)
return true
}
}
else {
if audioInput.isReadyForMoreMediaData {
audioInput.append(sampleBuffer)
return true
}
}
}
return false
}
}
The rest of the code should be pretty obvious, but just to make it complete, here is what I have for pausing:
isPaused = true
discont = true
And here is resume:
isPaused = false
If anyone could help me to understand how to align video and audio tracks during such live recording that would be great!
OK, it turns out there was no mistake in the code I provided. The issue I experienced was caused by video smoothing, which was turned on :) I guess it needs extra frames to smooth the video, which is why the video output freezes for a short period at the end.
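If "video smoothing" here means video stabilization on the capture connection (an assumption; the post does not say which setting was involved), a minimal sketch for switching it off could look like this. disableStabilization(on:) is a hypothetical helper name, and the output passed in is assumed to be the AVCaptureVideoDataOutput already added to the session.

```swift
import AVFoundation

/// Hypothetical helper: turns video stabilization off for an output's video connection.
func disableStabilization(on output: AVCaptureOutput) {
    // The connection only exists after the output has been added to a session.
    guard let connection = output.connection(with: .video),
          connection.isVideoStabilizationSupported else { return }
    connection.preferredVideoStabilizationMode = .off
}
```

This would be called once after the session has been configured, before recording starts.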

Using WebRTC to send an iOS device's screen capture using ReplayKit

We would like to use WebRTC to send an iOS device's screen capture using ReplayKit.
ReplayKit has a processSampleBuffer callback which delivers a CMSampleBuffer.
But here is where we are stuck: we can't seem to get the CMSampleBuffer sent to the connected peer.
We have tried to create a pixelBuffer from the sampleBuffer, and then create an RTCVideoFrame.
We also extracted the RTCVideoSource from the RTCPeerConnectionFactory and then used an RTCVideoCapturer to stream it to the localVideoSource.
Any idea what we are doing wrong?
var peerConnectionFactory: RTCPeerConnectionFactory?
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
switch sampleBufferType {
case RPSampleBufferType.video:
// create the CVPixelBuffer
let pixelBuffer:CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!;
// create the RTCVideoFrame
var videoFrame:RTCVideoFrame?;
let timestamp = NSDate().timeIntervalSince1970 * 1000
videoFrame = RTCVideoFrame(pixelBuffer: pixelBuffer, rotation: RTCVideoRotation._0, timeStampNs: Int64(timestamp))
// connect the video frames to the WebRTC
let localVideoSource = self.peerConnectionFactory!.videoSource()
let videoCapturer = RTCVideoCapturer()
localVideoSource.capturer(videoCapturer, didCapture: videoFrame!)
let videoTrack : RTCVideoTrack = self.peerConnectionFactory!.videoTrack(with: localVideoSource, trackId: "100")
let mediaStream: RTCMediaStream = (self.peerConnectionFactory?.mediaStream(withStreamId: "1"))!
mediaStream.addVideoTrack(videoTrack)
self.newPeerConnection!.add(mediaStream)
break
}
}
This is a good idea. In processSampleBuffer you only need to create the RTCVideoFrame and pass it to the video source; all the other objects should be initialized once, outside that method. For better understanding, here is a snippet.
var peerConnectionFactory: RTCPeerConnectionFactory?
var localVideoSource: RTCVideoSource?
var videoCapturer: RTCVideoCapturer?
func setupVideoCapturer(){
// localVideoSource and videoCapturer are created once and reused for every frame
localVideoSource = self.peerConnectionFactory!.videoSource()
videoCapturer = RTCVideoCapturer()
// localVideoSource.capturer(videoCapturer, didCapture: videoFrame!)
let videoTrack : RTCVideoTrack = self.peerConnectionFactory!.videoTrack(with: localVideoSource!, trackId: "100")
let mediaStream: RTCMediaStream = (self.peerConnectionFactory?.mediaStream(withStreamId: "1"))!
mediaStream.addVideoTrack(videoTrack)
self.newPeerConnection!.add(mediaStream)
}
override func processSampleBuffer(_ sampleBuffer: CMSampleBuffer, with sampleBufferType: RPSampleBufferType) {
switch sampleBufferType {
case RPSampleBufferType.video:
// create the CVPixelBuffer
let pixelBuffer:CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!;
// create the RTCVideoFrame
var videoFrame:RTCVideoFrame?;
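// timeStampNs expects nanoseconds, so scale the wall-clock seconds by 1e9
let timestamp = NSDate().timeIntervalSince1970 * 1_000_000_000
videoFrame = RTCVideoFrame(pixelBuffer: pixelBuffer, rotation: RTCVideoRotation._0, timeStampNs: Int64(timestamp))
// hand the frame to the video source created in setupVideoCapturer()
localVideoSource?.capturer(videoCapturer!, didCapture: videoFrame!)
break
default:
// audio buffers are not forwarded in this snippet
break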
}
}
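The timeStampNs parameter is expressed in nanoseconds. As an alternative to wall-clock time, the sample buffer's own presentation timestamp could be converted, which keeps the frame times on the capture clock. The helper below is a sketch of that conversion; nanoseconds(from:) is a hypothetical name, not part of CoreMedia or WebRTC.

```swift
import CoreMedia

/// Hypothetical helper: converts a CMTime to whole nanoseconds.
func nanoseconds(from time: CMTime) -> Int64 {
    // Rescale to a timescale of 1e9 so that `value` counts nanoseconds.
    let scaled = CMTimeConvertScale(time, timescale: 1_000_000_000, method: .default)
    return scaled.value
}
```

Inside processSampleBuffer this would be used as nanoseconds(from: CMSampleBufferGetPresentationTimeStamp(sampleBuffer)) and the result passed as timeStampNs.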
Hope this will help you.

Resources