How to set the timestamp of a CMSampleBuffer for AVAssetWriter writing - iOS

I'm working with AVFoundation to capture and record audio. There are some issues I don't quite understand.
Basically, I want to capture audio from an AVCaptureSession and write it using an AVAssetWriter, but I need to shift the timestamps of the CMSampleBuffers I get from the AVCaptureSession. Reading the CMSampleBuffer documentation, I see two different timestamp terms: 'presentation timestamp' and 'output presentation timestamp'. What is the difference between the two?
Let's say I get a CMSampleBuffer (for audio) instance from the AVCaptureSession and I want to write it to a file using an AVAssetWriter. What function should I use to 'inject' a CMTime into the buffer in order to set its presentation timestamp in the resulting file?
Thanks.

Use CMSampleBufferGetPresentationTimeStamp; that is the time at which the buffer was captured and at which it should be "presented" during playback to stay in sync. To quote session 520 at WWDC 2012: "Presentation time is the time at which the first sample in the buffer was picked up by the microphone".
If you start the AVAssetWriter with
[videoWriter startWriting];
[videoWriter startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
and then append samples with
if (videoWriterInput.readyForMoreMediaData) {
    [videoWriterInput appendSampleBuffer:sampleBuffer];
}
then the frames in the finished video will be consistent with CMSampleBufferGetPresentationTimeStamp (I have checked). If you want to modify the timestamps when adding samples, you have to use an AVAssetWriterInputPixelBufferAdaptor, as sketched below.
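For video frames, that adaptor route looks roughly like this (a sketch; pixelBufferAdaptor and timeOffset are assumed names, not from the original answer):
// Pull the pixels out of the captured buffer and append them with a shifted time.
CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
CMTime shiftedTime = CMTimeAdd(CMSampleBufferGetPresentationTimeStamp(sampleBuffer), timeOffset);
if (pixelBufferAdaptor.assetWriterInput.readyForMoreMediaData) {
    [pixelBufferAdaptor appendPixelBuffer:pixelBuffer withPresentationTime:shiftedTime];
}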

Chunk of sample code from here: http://www.gdcl.co.uk/2013/02/20/iPhone-Pause.html
Here, sample is your input CMSampleBufferRef, sout is the retimed output buffer, and newTimeStamp is the timestamp you want to apply.
CMItemCount count;
CMTime newTimeStamp = CMTimeMake(YOURTIME_VALUE, YOURTIME_TIMESCALE); // CMTimeMake takes a value and a timescale
// First call gets the number of timing entries, second call fills them in.
CMSampleBufferGetSampleTimingInfoArray(sample, 0, NULL, &count);
CMSampleTimingInfo* pInfo = malloc(sizeof(CMSampleTimingInfo) * count);
CMSampleBufferGetSampleTimingInfoArray(sample, count, pInfo, &count);
for (CMItemCount i = 0; i < count; i++)
{
    pInfo[i].decodeTimeStamp = newTimeStamp; // kCMTimeInvalid if in sequence
    pInfo[i].presentationTimeStamp = newTimeStamp;
}
CMSampleBufferRef sout;
CMSampleBufferCreateCopyWithNewTiming(kCFAllocatorDefault, sample, count, pInfo, &sout);
free(pInfo);
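A minimal follow-up for the retimed copy (assuming a writer input named videoWriterInput, and that the writer session has already been started on the new timeline):
// Append the retimed buffer and release the copy created above.
if (videoWriterInput.readyForMoreMediaData) {
    [videoWriterInput appendSampleBuffer:sout];
}
CFRelease(sout);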

Related

AVAssetWriter AVVideoExpectedSourceFrameRateKey (frame rate) ignored

My team and I are trying to re-encode a video file to give it a more "GIF-y" feel by changing the video frame rate. We are using the following properties for the AVAssetWriterInput:
let videoSettings: [String: Any] = [
    AVVideoCodecKey: AVVideoCodecH264,
    AVVideoHeightKey: videoTrack.naturalSize.height,
    AVVideoWidthKey: videoTrack.naturalSize.width,
    AVVideoCompressionPropertiesKey: [AVVideoExpectedSourceFrameRateKey: NSNumber(value: 12)]
]
But the output video keeps playing at the normal frame rate (played using AVPlayer).
What is the right way to reduce the video frame rate (to 12, for example)?
Any help in the right direction would be HIGHLY appreciated. We're stuck.
Best regards,
Roi
You can control the timing of each sample you append to your AVAssetWriterInput directly with CMSampleBufferCreateCopyWithNewTiming.
You need to adjust the timing in the CMSampleTimingInfo you provide.
Retrieve the current timing info with CMSampleBufferGetOutputSampleTimingInfoArray, go over the duration of each sample, calculate the duration needed to get 12 frames per second, and adjust the presentation and decode timestamps to match this new duration.
You then make your copy and feed it to your writer's input.
Let's say you have existingSampleBuffer:
CMSampleBufferRef sampleBufferToWrite = NULL;
CMSampleTimingInfo sampleTimingInfo = {0};
CMSampleBufferGetSampleTimingInfo(existingSampleBuffer, 0, &sampleTimingInfo);
// modify duration & presentationTimeStamp
sampleTimingInfo.duration = CMTimeMake(1, 12); // or whatever frame rate you desire
sampleTimingInfo.presentationTimeStamp = CMTimeAdd(previousPresentationTimeStamp, sampleTimingInfo.duration);
previousPresentationTimeStamp = sampleTimingInfo.presentationTimeStamp; // should be initialised before passing here the first time
OSStatus status = CMSampleBufferCreateCopyWithNewTiming(kCFAllocatorDefault, existingSampleBuffer, 1, &sampleTimingInfo, &sampleBufferToWrite);
if (status == noErr) {
    // you can write sampleBufferToWrite
}
I'm making some assumptions in this code:
The sample buffer contains only one sample.
The sample buffer contains uncompressed video (otherwise, you need to handle decodeTimeStamp as well).
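For context, a rough sketch of the surrounding read/retime/write loop might look like this (assetReaderOutput, writerInput and the retiming step are assumptions, not from the original answer):
CMTime previousPresentationTimeStamp = kCMTimeZero; // initialised once, before the loop
CMSampleBufferRef existingSampleBuffer = NULL;
while ((existingSampleBuffer = [assetReaderOutput copyNextSampleBuffer])) {
    while (!writerInput.readyForMoreMediaData) {
        [NSThread sleepForTimeInterval:0.01]; // crude back-pressure; a real app would use requestMediaDataWhenReadyOnQueue:
    }
    // ...retime existingSampleBuffer into sampleBufferToWrite as shown above, then:
    // [writerInput appendSampleBuffer:sampleBufferToWrite];
    // CFRelease(sampleBufferToWrite);
    CFRelease(existingSampleBuffer);
}
[writerInput markAsFinished];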

How to get the timestamp of each video frame in iOS while decoding a video.mp4

Scenario:
I am writing an iOS app to decode a videoFile.mp4. I am using AVAssetReaderTrackOutput with AVAssetReader to decode frames from the video file. This works very well. I get each and every frame from videoFile.mp4, basically using the following logic at the core.
Code:
AVAssetReader * videoFileReader;
AVAssetReaderTrackOutput * assetReaderOutput = [videoFileReader.outputs objectAtIndex:0];
CMSampleBufferRef sampleBuffer = [assetReaderOutput copyNextSampleBuffer];
sampleBuffer is the buffer of each video frame here.
Question:
How can I get the timestamp of each video frame here ?
In other words, and in more detail, how can I get the timestamp of each sampleBuffer returned by copyNextSampleBuffer?
PS:
Please note that I need the timestamp in milliseconds.
I finally got the answer to my question. The following two lines get the frame timestamp of the sampleBuffer returned by copyNextSampleBuffer:
CMTime frameTime = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);
double frameTimeMillisecs = CMTimeGetSeconds(frameTime) * 1000;
The timestamp is returned in seconds, hence multiplying it by 1000 to convert to milliseconds.
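For illustration, a minimal decode loop that logs every frame's timestamp might look like this (the reader setup is assumed, as in the question):
CMSampleBufferRef sampleBuffer = NULL;
while ((sampleBuffer = [assetReaderOutput copyNextSampleBuffer])) {
    CMTime frameTime = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);
    double frameTimeMillisecs = CMTimeGetSeconds(frameTime) * 1000.0;
    NSLog(@"Frame at %.3f ms", frameTimeMillisecs);
    CFRelease(sampleBuffer); // copyNextSampleBuffer returns a retained buffer
}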

iOS 11 Objective-C - Processing Image Buffers From ReplayKit Using AVAssetWriterInputPixelBufferAdaptor

I'm trying to record my app's screen using ReplayKit, cropping out some parts of it while recording the video. It's not quite going well.
ReplayKit captures the entire screen, so I decided to receive each frame from ReplayKit (as a CMSampleBuffer via startCaptureWithHandler), crop it there, and feed it to a video writer via AVAssetWriterInputPixelBufferAdaptor. But I am having trouble hard-copying the image buffer before cropping it.
This is my working code that records the entire screen:
// Starts recording with a completion/error handler
-(void)startRecordingWithHandler: (RPHandler)handler
{
// Sets up AVAssetWriter that will generate a video file from the recording.
self.writer = [AVAssetWriter assetWriterWithURL:self.outputFileURL
fileType:AVFileTypeQuickTimeMovie
error:nil];
NSDictionary* outputSettings =
@{
    AVVideoWidthKey : @(screen.size.width),   // The whole width of the entire screen.
    AVVideoHeightKey : @(screen.size.height), // The whole height of the entire screen.
    AVVideoCodecKey : AVVideoCodecTypeH264,
};
// Sets up AVAssetWriterInput that will feed ReplayKit's frame buffers to the writer.
self.videoInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
outputSettings:outputSettings];
// Lets it know that the input will be realtime using ReplayKit.
[self.videoInput setExpectsMediaDataInRealTime:YES];
NSDictionary* sourcePixelBufferAttributes =
@{
    (NSString*) kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
    (NSString*) kCVPixelBufferWidthKey : @(screen.size.width),
    (NSString*) kCVPixelBufferHeightKey : @(screen.size.height),
};
// Adds the video input to the writer.
[self.writer addInput:self.videoInput];
// Sets up ReplayKit itself.
self.recorder = [RPScreenRecorder sharedRecorder];
// Arranges the pipeline from ReplayKit to the input.
RPBufferHandler bufferHandler = ^(CMSampleBufferRef sampleBuffer, RPSampleBufferType bufferType, NSError* error) {
[self captureSampleBuffer:sampleBuffer withBufferType:bufferType];
};
RPHandler errorHandler = ^(NSError* error) {
if (error) handler(error);
};
// Starts ReplayKit's recording session.
// Sample buffers will be sent to `captureSampleBuffer` method.
[self.recorder startCaptureWithHandler:bufferHandler completionHandler:errorHandler];
}
// Receives a sample buffer from ReplayKit every frame.
-(void)captureSampleBuffer:(CMSampleBufferRef)sampleBuffer withBufferType:(RPSampleBufferType)bufferType
{
// Uses a queue in sync so that the writer-starting logic won't be invoked twice.
dispatch_sync(dispatch_get_main_queue(), ^{
// Starts the writer if not started yet. We do this here in order to get the proper source time later.
if (self.writer.status == AVAssetWriterStatusUnknown) {
[self.writer startWriting];
return;
}
// Receives a sample buffer from ReplayKit.
switch (bufferType) {
case RPSampleBufferTypeVideo:{
// Initializes the source time when a video frame buffer is received the first time.
// This prevents the output video from starting with blank frames.
if (!self.startedWriting) {
NSLog(@"self.writer startSessionAtSourceTime");
[self.writer startSessionAtSourceTime:CMSampleBufferGetPresentationTimeStamp(sampleBuffer)];
self.startedWriting = YES;
}
// Appends a received video frame buffer to the writer.
[self.videoInput appendSampleBuffer:sampleBuffer];
break;
}
}
});
}
// Stops the current recording session, and saves the output file to the user photo album.
-(void)stopRecordingWithHandler:(RPHandler)handler
{
// Closes the input.
[self.videoInput markAsFinished];
// Finishes up the writer.
[self.writer finishWritingWithCompletionHandler:^{
handler(self.writer.error);
// Saves the output video to the user photo album.
[[PHPhotoLibrary sharedPhotoLibrary] performChanges: ^{ [PHAssetChangeRequest creationRequestForAssetFromVideoAtFileURL: self.outputFileURL]; }
completionHandler: ^(BOOL s, NSError* e) { }];
}];
// Stops ReplayKit's recording.
[self.recorder stopCaptureWithHandler:nil];
}
where each sample buffer from ReplayKit is fed directly to the writer (in the captureSampleBuffer method), hence it records the entire screen.
Then, I replaced that part with identical logic using AVAssetWriterInputPixelBufferAdaptor, which works just fine:
...
case RPSampleBufferTypeVideo:{
... // Initializes source time.
// Gets the timestamp of the sample buffer.
CMTime time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
// Extracts the pixel image buffer from the sample buffer.
CVPixelBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
// Appends a received sample buffer as an image buffer to the writer via the adaptor.
[self.videoAdaptor appendPixelBuffer:imageBuffer withPresentationTime:time];
break;
}
...
where the adaptor is set up as:
NSDictionary* sourcePixelBufferAttributes =
@{
    (NSString*) kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
    (NSString*) kCVPixelBufferWidthKey : @(screen.size.width),
    (NSString*) kCVPixelBufferHeightKey : @(screen.size.height),
};
self.videoAdaptor = [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:self.videoInput
sourcePixelBufferAttributes:sourcePixelBufferAttributes];
So the pipeline is working.
Then, I created a hard copy of the image buffer in main memory and fed it to the adaptor:
...
case RPSampleBufferTypeVideo:{
... // Initializes source time.
// Gets the timestamp of the sample buffer.
CMTime time = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
// Extracts the pixel image buffer from the sample buffer.
CVPixelBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
// Hard-copies the image buffer.
CVPixelBufferRef copiedImageBuffer = [self copy:imageBuffer];
// Appends a received video frame buffer to the writer via the adaptor.
[self.videoAdaptor appendPixelBuffer:copiedImageBuffer withPresentationTime:time];
break;
}
...
// Hard-copies the pixel buffer.
-(CVPixelBufferRef)copy:(CVPixelBufferRef)inputBuffer
{
// Locks the base address of the buffer
// so that GPU won't change the data until unlocked later.
CVPixelBufferLockBaseAddress(inputBuffer, 0); //-------------------------------
char* baseAddress = (char*)CVPixelBufferGetBaseAddress(inputBuffer);
size_t bytesPerRow = CVPixelBufferGetBytesPerRow(inputBuffer);
size_t width = CVPixelBufferGetWidth(inputBuffer);
size_t height = CVPixelBufferGetHeight(inputBuffer);
size_t length = bytesPerRow * height;
// Mallocs the same length as the input buffer for copying.
char* outputAddress = (char*)malloc(length);
// Copies the input buffer's data to the malloced space.
for (int i = 0; i < length; i++) {
outputAddress[i] = baseAddress[i];
}
// Create a new image buffer using the copied data.
CVPixelBufferRef outputBuffer;
CVPixelBufferCreateWithBytes(kCFAllocatorDefault,
width,
height,
kCVPixelFormatType_32BGRA,
outputAddress,
bytesPerRow,
&releaseCallback, // Releases the malloced space.
NULL,
NULL,
&outputBuffer);
// Unlocks the base address of the input buffer
// So that GPU can restart using the data.
CVPixelBufferUnlockBaseAddress(inputBuffer, 0); //-------------------------------
return outputBuffer;
}
// Releases the malloced space.
void releaseCallback(void *releaseRefCon, const void *baseAddress)
{
free((void *)baseAddress);
}
This doesn't work -- the saved video looks like the screenshot on the right-hand side:
It seems like the bytes per row and the color format are wrong. I have researched and experimented with the following, but to no avail:
Hard-coding 4 * width for bytes per row -> "bad access".
Using int and double instead of char -> some weird debugger-terminating exceptions.
Using other image formats -> either "not supported" or access errors.
Additionally, releaseCallback is never called -- RAM runs out within 10 seconds of recording.
What are potential causes from the look of this output?
You could first save the video as it is.
Then, using the AVMutableComposition class, you can crop the video by adding instructions and layer instructions to it.
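One way to express those instructions is with an AVMutableVideoComposition and an export session; a rough sketch (recordedFileURL, croppedFileURL and the crop rectangle values are placeholders, not from the original answer):
AVAsset *asset = [AVAsset assetWithURL:recordedFileURL];
AVAssetTrack *videoTrack = [[asset tracksWithMediaType:AVMediaTypeVideo] firstObject];

AVMutableVideoComposition *videoComposition = [AVMutableVideoComposition videoComposition];
videoComposition.renderSize = CGSizeMake(cropWidth, cropHeight); // size of the cropped output
videoComposition.frameDuration = CMTimeMake(1, 30);

AVMutableVideoCompositionInstruction *instruction = [AVMutableVideoCompositionInstruction videoCompositionInstruction];
instruction.timeRange = CMTimeRangeMake(kCMTimeZero, asset.duration);

AVMutableVideoCompositionLayerInstruction *layerInstruction =
    [AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack:videoTrack];
// Shift the frame so the region of interest falls inside renderSize.
[layerInstruction setTransform:CGAffineTransformMakeTranslation(-cropOriginX, -cropOriginY) atTime:kCMTimeZero];

instruction.layerInstructions = @[layerInstruction];
videoComposition.instructions = @[instruction];

AVAssetExportSession *export = [[AVAssetExportSession alloc] initWithAsset:asset presetName:AVAssetExportPresetHighestQuality];
export.videoComposition = videoComposition;
export.outputURL = croppedFileURL;
export.outputFileType = AVFileTypeQuickTimeMovie;
[export exportAsynchronouslyWithCompletionHandler:^{ /* check export.status and export.error */ }];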
In my case, ReplayKit delivers the sampleBuffer in 420YpCbCr8BiPlanarFullRange format,
not in BGRA format. You need to work with two planes. Your screenshot indicates the Y plane on top and the UV plane on the bottom; the UV plane is half the size of the Y plane.
To get the base address of the two planes:
CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0)
CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 1)
You also need to get the width, height, and bytes per row of each plane with these APIs:
CVPixelBufferGetWidthOfPlane(imageBuffer, 0 or 1)
CVPixelBufferGetHeightOfPlane(imageBuffer, 0 or 1)
CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0 or 1)
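For illustration, a plane-aware copy might look roughly like this (a sketch replacing the single-plane copy: method above; error checking omitted):
// Copies a biplanar (e.g. 420YpCbCr8BiPlanarFullRange) pixel buffer plane by plane.
-(CVPixelBufferRef)copyBiPlanarBuffer:(CVPixelBufferRef)inputBuffer
{
    CVPixelBufferLockBaseAddress(inputBuffer, kCVPixelBufferLock_ReadOnly);
    size_t width = CVPixelBufferGetWidth(inputBuffer);
    size_t height = CVPixelBufferGetHeight(inputBuffer);
    CVPixelBufferRef outputBuffer = NULL;
    CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                        CVPixelBufferGetPixelFormatType(inputBuffer), NULL, &outputBuffer);
    CVPixelBufferLockBaseAddress(outputBuffer, 0);
    for (size_t plane = 0; plane < CVPixelBufferGetPlaneCount(inputBuffer); plane++) {
        uint8_t *src = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(inputBuffer, plane);
        uint8_t *dst = (uint8_t *)CVPixelBufferGetBaseAddressOfPlane(outputBuffer, plane);
        size_t srcBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(inputBuffer, plane);
        size_t dstBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(outputBuffer, plane);
        size_t planeHeight = CVPixelBufferGetHeightOfPlane(inputBuffer, plane);
        // Copy row by row because the two buffers may use different row padding.
        for (size_t row = 0; row < planeHeight; row++) {
            memcpy(dst + row * dstBytesPerRow,
                   src + row * srcBytesPerRow,
                   MIN(srcBytesPerRow, dstBytesPerRow));
        }
    }
    CVPixelBufferUnlockBaseAddress(outputBuffer, 0);
    CVPixelBufferUnlockBaseAddress(inputBuffer, kCVPixelBufferLock_ReadOnly);
    return outputBuffer;
}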

Handle Varying Number of Samples in Audio Unit Rendering Cycle

This is a problem that's come up in my app after the introduction of the iPhone 6s and 6s+, and I'm almost positive that it is because the new model's built-in mic is stuck recording at 48kHz (you can read more about this here). To clarify, this was never a problem with previous phone models that I've tested. I'll walk through my Audio Engine implementation and the varying results at different points depending on the phone model further below.
So here's what's happening - when my code runs on previous devices I get a consistent number of audio samples in each CMSampleBuffer returned by the AVCaptureDevice, usually 1024 samples. The render callback for my audio unit graph provides an appropriate buffer with space for 1024 frames. Everything works great and sounds great.
Then Apple had to go make this damn iPhone 6s (just kidding, it's great, this bug is just getting to my head) and now I get some very inconsistent and confusing results. The AVCaptureDevice now varies between capturing 940 or 941 samples and the render callback now starts making a buffer with space for 940 or 941 sample frames on the first call, but then immediately starts increasing the space it reserves on subsequent calls up to 1010, 1012, or 1024 sample frames, then stays there. The space it ends up reserving varies by session. To be honest, I have no idea how this render callback is determining how many frames it prepares for the render, but I'm guessing it has to do with the sample rate of the Audio Unit that the render callback is on.
The format of the CMSampleBuffer comes in at a 44.1kHz sample rate no matter what the device is, so I'm guessing there's some sort of implicit sample rate conversion that happens before I even receive the CMSampleBuffer from the AVCaptureDevice on the 6s. The only difference is that the preferred hardware sample rate of the 6s is 48kHz, as opposed to 44.1kHz on earlier models.
I've read that with the 6s you do have to be ready to make space for a varying number of samples being returned, but is the kind of behavior I described above normal? If it is, how can my render cycle be tailored to handle this?
Below is the code that is processing the audio buffers if you care to look further into this:
The audio sample buffers, which are CMSampleBufferRefs, come in through the mic AVCaptureDevice and are sent to my audio processing function, which does the following with the captured CMSampleBufferRef named audioBuffer:
CMBlockBufferRef buffer = CMSampleBufferGetDataBuffer(audioBuffer);
CMItemCount numSamplesInBuffer = CMSampleBufferGetNumSamples(audioBuffer);
AudioBufferList audioBufferList;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(audioBuffer,
NULL,
&audioBufferList,
sizeof(audioBufferList),
NULL,
NULL,
kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
&buffer
);
self.audioProcessingCallback(&audioBufferList, numSamplesInBuffer, audioBuffer);
CFRelease(buffer);
This puts the audio samples into an AudioBufferList and sends it, along with the number of samples and the retained CMSampleBuffer, to the function below that I use for audio processing. TL;DR: the following code sets up some Audio Units in an Audio Graph, using the CMSampleBuffer's format to set the ASBD for input, runs the audio samples through a converter unit, a NewTimePitch unit, and then another converter unit. It then starts a render call on the output converter unit with the number of samples received from the CMSampleBufferRef and puts the rendered samples back into the AudioBufferList, to subsequently be written out to the movie file. More on the Audio Unit render callback below.
movieWriter.audioProcessingCallback = {(audioBufferList, numSamplesInBuffer, CMSampleBuffer) -> () in
var ASBDSize = UInt32(sizeof(AudioStreamBasicDescription))
self.currentInputAudioBufferList = audioBufferList.memory
let formatDescription = CMSampleBufferGetFormatDescription(CMSampleBuffer)
let sampleBufferASBD = CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription!)
if (sampleBufferASBD.memory.mFormatID != kAudioFormatLinearPCM) {
print("Bad ASBD")
}
if(sampleBufferASBD.memory.mChannelsPerFrame != self.currentInputASBD.mChannelsPerFrame || sampleBufferASBD.memory.mSampleRate != self.currentInputASBD.mSampleRate){
// Set currentInputASBD to format of data coming IN from camera
self.currentInputASBD = sampleBufferASBD.memory
print("New IN ASBD: \(self.currentInputASBD)")
// set the ASBD for converter in's input to currentInputASBD
var err = AudioUnitSetProperty(self.converterInAudioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
0,
&self.currentInputASBD,
UInt32(sizeof(AudioStreamBasicDescription)))
self.checkErr(err, "Set converter in's input stream format")
// Set currentOutputASBD to the in/out format for newTimePitch unit
err = AudioUnitGetProperty(self.newTimePitchAudioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
0,
&self.currentOutputASBD,
&ASBDSize)
self.checkErr(err, "Get NewTimePitch ASBD stream format")
print("New OUT ASBD: \(self.currentOutputASBD)")
//Set the ASBD for the convert out's input to currentOutputASBD
err = AudioUnitSetProperty(self.converterOutAudioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Input,
0,
&self.currentOutputASBD,
ASBDSize)
self.checkErr(err, "Set converter out's input stream format")
//Set the ASBD for the converter out's output to currentInputASBD
err = AudioUnitSetProperty(self.converterOutAudioUnit,
kAudioUnitProperty_StreamFormat,
kAudioUnitScope_Output,
0,
&self.currentInputASBD,
ASBDSize)
self.checkErr(err, "Set converter out's output stream format")
//Initialize the graph
err = AUGraphInitialize(self.auGraph)
self.checkErr(err, "Initialize audio graph")
self.checkAllASBD()
}
self.currentSampleTime += Double(numSamplesInBuffer)
var timeStamp = AudioTimeStamp()
memset(&timeStamp, 0, sizeof(AudioTimeStamp))
timeStamp.mSampleTime = self.currentSampleTime
timeStamp.mFlags = AudioTimeStampFlags.SampleTimeValid
var flags = AudioUnitRenderActionFlags(rawValue: 0)
err = AudioUnitRender(self.converterOutAudioUnit,
&flags,
&timeStamp,
0,
UInt32(numSamplesInBuffer),
audioBufferList)
self.checkErr(err, "Render Call on converterOutAU")
}
The Audio Unit Render Callback that is called once the AudioUnitRender call reaches the input converter unit is below
func pushCurrentInputBufferIntoAudioUnit(inRefCon : UnsafeMutablePointer<Void>, ioActionFlags : UnsafeMutablePointer<AudioUnitRenderActionFlags>, inTimeStamp : UnsafePointer<AudioTimeStamp>, inBusNumber : UInt32, inNumberFrames : UInt32, ioData : UnsafeMutablePointer<AudioBufferList>) -> OSStatus {
let bufferRef = UnsafeMutablePointer<AudioBufferList>(inRefCon)
ioData.memory = bufferRef.memory
print(inNumberFrames);
return noErr
}
Blah, this is a huge brain dump but I really appreciate ANY help. Please let me know if there's any additional information you need.
Generally, you handle slight variations in buffer size (but a constant sample rate in and out) by putting the incoming samples into a lock-free circular FIFO, and not removing any blocks of samples from that FIFO until you have a full-size block, plus potentially some safety padding to cover future size jitter.
The variation in size probably has to do with the sample rate converter ratio not being a simple multiple, the resampling filter(s) needed, and any buffering needed for the resampling process.
1024 * (44100/48000) = 940.8
So that rate conversion might explain the jitter between 940 and 941 samples. If the hardware is always shipping out blocks of 1024 samples at a fixed rate of 48 kHz, and you need that block resampled to 44100 for your callback ASAP, there's a fraction of a converted sample that eventually needs to be output on only some output callbacks.
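A very small FIFO sketch of that idea (illustrative only; as written it is not actually lock-free, so a real implementation would use atomics or something like TPCircularBuffer):
typedef struct {
    float *data;     // backing storage, capacity samples long
    size_t capacity;
    size_t head;     // write index
    size_t tail;     // read index
} SampleFIFO;

static size_t SampleFIFOCount(const SampleFIFO *f) {
    return (f->head + f->capacity - f->tail) % f->capacity;
}

static void SampleFIFOWrite(SampleFIFO *f, const float *samples, size_t n) {
    // Push whatever the capture side delivers (940, 941, 1024, ...).
    for (size_t i = 0; i < n; i++) {
        f->data[f->head] = samples[i];
        f->head = (f->head + 1) % f->capacity;
    }
}

static bool SampleFIFORead(SampleFIFO *f, float *out, size_t blockSize) {
    if (SampleFIFOCount(f) < blockSize) return false; // not enough buffered yet; output silence or wait
    // Pop a fixed-size block for the render callback.
    for (size_t i = 0; i < blockSize; i++) {
        out[i] = f->data[f->tail];
        f->tail = (f->tail + 1) % f->capacity;
    }
    return true;
}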

Playing multiple files with a single file player audio unit

I'm trying to use a file player audio unit (kAudioUnitSubType_AudioFilePlayer) to play multiple files (not at the same time, of course). That's on iOS.
So I've successfully opened the files and stored their details in an array of AudioFileIDs, which I set on the audio unit using kAudioUnitProperty_ScheduledFileIDs. Now I would like to define two ScheduledAudioFileRegions, one per file, and use them with the file player...
But I can't seem to find out:
How do I set the kAudioUnitProperty_ScheduledFileRegion property to store these two regions (actually, how do I define the index of each region)?
How do I trigger playback of a specific region? My guess is that the kAudioTimeStampSampleTimeValid parameter should enable this, but how do you define which region you want to play?
Maybe I'm just plain wrong about the way I should use this audio unit, but documentation is very difficult to find and I haven't found any example showing the playback of two regions on the same player!
Thanks in advance.
You need to schedule a region every time you want to play a file. In the ScheduledAudioFileRegion you must set the AudioFileID to play. Playback begins when the unit's current time (in samples) is equal to or greater than the sample time in the scheduled region.
Example:
// get current unit time
AudioTimeStamp timeStamp;
UInt32 propSize = sizeof(AudioTimeStamp);
AudioUnitGetProperty(m_playerUnit, kAudioUnitProperty_CurrentPlayTime, kAudioUnitScope_Global, 0, &timeStamp, &propSize);
// when to start playback
timeStamp.mSampleTime += 100;
// schedule region
ScheduledAudioFileRegion region;
memset(&region, 0, sizeof(ScheduledAudioFileRegion));
region.mAudioFile = ...; // your AudioFileID
region.mFramesToPlay = ...; // count of frames to play
region.mLoopCount = 1;
region.mStartFrame = 0;
region.mTimeStamp = timeStamp;
AudioUnitSetProperty(m_playerUnit, kAudioUnitProperty_ScheduledFileRegion, kAudioUnitScope_Global, 0, &region, sizeof(region));
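A commonly needed follow-up (not part of the original answer) is to prime the player and set the schedule start timestamp after scheduling regions; roughly:
// Prime the player (0 = default number of frames to preload).
UInt32 primeFrames = 0;
AudioUnitSetProperty(m_playerUnit, kAudioUnitProperty_ScheduledFilePrime,
                     kAudioUnitScope_Global, 0, &primeFrames, sizeof(primeFrames));
// Tell the player when its internal timeline starts; -1 means "on the next render cycle".
AudioTimeStamp startTime;
memset(&startTime, 0, sizeof(startTime));
startTime.mFlags = kAudioTimeStampSampleTimeValid;
startTime.mSampleTime = -1;
AudioUnitSetProperty(m_playerUnit, kAudioUnitProperty_ScheduleStartTimeStamp,
                     kAudioUnitScope_Global, 0, &startTime, sizeof(startTime));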
