I am trying to create a copy of a CMSampleBuffer as returned by captureOutput in an AVCaptureAudioDataOutputSampleBufferDelegate.
The problem I am having is that frames coming from the delegate method captureOutput:didOutputSampleBuffer:fromConnection: start being dropped after I retain them in a CFArray for a long time.
Obviously, I need to create deep copies of incoming buffers for further processing. I also know that CMSampleBufferCreateCopy only creates shallow copies.
A few related questions have already been asked on SO:
Pulling data from a CMSampleBuffer in order to create a deep copy
Creating copy of CMSampleBuffer in Swift returns OSStatus -12743 (Invalid Media Format)
Deep Copy of CMImageBuffer or CVImageBuffer
But none of them helps me use the 12-parameter CMSampleBufferCreate function correctly:
CMSampleBufferRef copyBuffer;
CMBlockBufferRef data = CMSampleBufferGetDataBuffer(sampleBuffer);
CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
CMItemCount itemCount = CMSampleBufferGetNumSamples(sampleBuffer);
CMTime duration = CMSampleBufferGetDuration(sampleBuffer);
CMTime presentationStamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
CMSampleTimingInfo timingInfo;
timingInfo.duration = duration;
timingInfo.presentationTimeStamp = presentationStamp;
timingInfo.decodeTimeStamp = CMSampleBufferGetDecodeTimeStamp(sampleBuffer);
size_t sampleSize = CMBlockBufferGetDataLength(data);
CMBlockBufferRef sampleData;
if (CMBlockBufferCopyDataBytes(data, 0, sampleSize, &sampleData) != kCMBlockBufferNoErr) {
VLog(@"error during copying sample buffer");
}
// Here I tried data and sampleData CMBlockBuffer instance, but no success
OSStatus status = CMSampleBufferCreate(kCFAllocatorDefault, data, isDataReady, nil, nil, formatDescription, itemCount, 1, &timingInfo, 1, &sampleSize, &copyBuffer);
if (!self.sampleBufferArray) {
self.sampleBufferArray = CFArrayCreateMutable(NULL, 0, &kCFTypeArrayCallBacks);
//EXC_BAD_ACCESS crash when trying to add sampleBuffer to the array
CFArrayAppendValue(self.sampleBufferArray, copyBuffer);
} else {
CFArrayAppendValue(self.sampleBufferArray, copyBuffer);
}
How do you deep copy an audio CMSampleBuffer? Feel free to use any language (Swift/Objective-C) in your answers.
Here is the working solution I finally implemented. I sent this snippet to Apple Developer Technical Support and asked them to check whether it is a correct way to copy an incoming sample buffer. The basic idea is to copy the AudioBufferList, then create a CMSampleBuffer and set the AudioBufferList on that sample buffer.
AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer;
//Create an AudioBufferList containing the data from the CMSampleBuffer,
//and a CMBlockBuffer which references the data in that AudioBufferList.
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
NSUInteger size = sizeof(audioBufferList);
char buffer[size];
memcpy(buffer, &audioBufferList, size);
//This is the Audio data.
NSData *bufferData = [NSData dataWithBytes:buffer length:size];
const void *copyBufferData = [bufferData bytes];
copyBufferData = (char *)copyBufferData;
CMSampleBufferRef copyBuffer = NULL;
OSStatus status = -1;
/* Format Description */
AudioStreamBasicDescription audioFormat = *CMAudioFormatDescriptionGetStreamBasicDescription((CMAudioFormatDescriptionRef) CMSampleBufferGetFormatDescription(sampleBuffer));
CMFormatDescriptionRef format = NULL;
status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &audioFormat, 0, nil, 0, nil, nil, &format);
CMFormatDescriptionRef formatdes = NULL;
status = CMFormatDescriptionCreate(NULL, kCMMediaType_Audio, 'lpcm', NULL, &formatdes);
if (status != noErr)
{
NSLog(#"Error in CMAudioFormatDescriptionCreator");
CFRelease(blockBuffer);
return;
}
/* Create sample Buffer */
CMItemCount framesCount = CMSampleBufferGetNumSamples(sampleBuffer);
CMSampleTimingInfo timing = {.duration= CMTimeMake(1, 44100), .presentationTimeStamp= CMSampleBufferGetPresentationTimeStamp(sampleBuffer), .decodeTimeStamp= CMSampleBufferGetDecodeTimeStamp(sampleBuffer)};
status = CMSampleBufferCreate(kCFAllocatorDefault, nil, NO, nil, nil, format, framesCount, 1, &timing, 0, nil, &copyBuffer);
if( status != noErr) {
NSLog(#"Error in CMSampleBufferCreate");
CFRelease(blockBuffer);
return;
}
/* Copy BufferList to Sample Buffer */
AudioBufferList receivedAudioBufferList;
memcpy(&receivedAudioBufferList, copyBufferData, sizeof(receivedAudioBufferList));
//Creates a CMBlockBuffer containing a copy of the data from the
//AudioBufferList.
status = CMSampleBufferSetDataBufferFromAudioBufferList(copyBuffer, kCFAllocatorDefault , kCFAllocatorDefault, 0, &receivedAudioBufferList);
if (status != noErr) {
NSLog(#"Error in CMSampleBufferSetDataBufferFromAudioBufferList");
CFRelease(blockBuffer);
return;
}
Code-Level Support answer:
This code looks ok (though you’ll want to add some additional error
checking). I've successfully tested it in an app that implements the
AVCaptureAudioDataOutput delegate
captureOutput:didOutputSampleBuffer:fromConnection: method to
capture and record audio. The captured audio I'm getting when using
this deep copy code appears to be the same as what I get when directly
using the provided sample buffer (without the deep copy).
Apple Developer Technical Support
Couldn't find a decent answer doing this in Swift. Here's an extension:
extension CMSampleBuffer {
func deepCopy() -> CMSampleBuffer? {
guard let formatDesc = CMSampleBufferGetFormatDescription(self),
let data = self.data else {
return nil
}
let nFrames = CMSampleBufferGetNumSamples(self)
let pts = CMSampleBufferGetPresentationTimeStamp(self)
let dataBuffer = data.withUnsafeBytes { (buffer) -> CMBlockBuffer? in
var blockBuffer: CMBlockBuffer?
let length: Int = data.count
guard CMBlockBufferCreateWithMemoryBlock(
allocator: kCFAllocatorDefault,
memoryBlock: nil,
blockLength: length,
blockAllocator: nil,
customBlockSource: nil,
offsetToData: 0,
dataLength: length,
flags: 0,
blockBufferOut: &blockBuffer) == noErr else {
print("Failed to create block")
return nil
}
guard CMBlockBufferReplaceDataBytes(
with: buffer.baseAddress!,
blockBuffer: blockBuffer!,
offsetIntoDestination: 0,
dataLength: length) == noErr else {
print("Failed to move bytes for block")
return nil
}
return blockBuffer
}
guard let dataBuffer = dataBuffer else {
return nil
}
var newSampleBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateReadyWithPacketDescriptions(
allocator: kCFAllocatorDefault,
dataBuffer: dataBuffer,
formatDescription: formatDesc,
sampleCount: nFrames,
presentationTimeStamp: pts,
packetDescriptions: nil,
sampleBufferOut: &newSampleBuffer
)
return newSampleBuffer
}
}
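Note that the extension above relies on a `data` accessor on CMSampleBuffer that is not part of the Core Media Swift overlay. If you don't already have one, here is a minimal sketch of such a helper, assuming the sample's bytes live in its CMBlockBuffer:
extension CMSampleBuffer {
    // Hypothetical helper assumed by deepCopy(): copies the block buffer's bytes into a Data value.
    var data: Data? {
        guard let blockBuffer = CMSampleBufferGetDataBuffer(self) else { return nil }
        let length = CMBlockBufferGetDataLength(blockBuffer)
        var bytes = [UInt8](repeating: 0, count: length)
        guard CMBlockBufferCopyDataBytes(blockBuffer,
                                         atOffset: 0,
                                         dataLength: length,
                                         destination: &bytes) == kCMBlockBufferNoErr else { return nil }
        return Data(bytes)
    }
}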
LLooggaann's solution is simpler and works well; however, in case anyone is interested, I migrated the original Objective-C solution to Swift 5.6:
extension CMSampleBuffer {
func deepCopy() -> CMSampleBuffer? {
var audioBufferList : AudioBufferList = AudioBufferList()
var blockBuffer : CMBlockBuffer?
let sizeOfAudioBufferList = MemoryLayout<AudioBufferList>.size
//Create an AudioBufferList containing the data from the CMSampleBuffer.
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(self,
bufferListSizeNeededOut: nil,
bufferListOut: &audioBufferList,
bufferListSize: sizeOfAudioBufferList,
blockBufferAllocator: nil,
blockBufferMemoryAllocator: nil,
flags: 0,
blockBufferOut: &blockBuffer)
guard audioBufferList.mNumberBuffers == 1 else { return nil } //TODO: Make this generic for any number of buffers
/* Deep copy the audio buffer */
let audioBufferDataSize = Int(audioBufferList.mBuffers.mDataByteSize)
let audioBuffer = audioBufferList.mBuffers
let audioBufferDataCopyPointer = UnsafeMutableRawPointer.allocate(byteCount: audioBufferDataSize, alignment: 1)
defer {
audioBufferDataCopyPointer.deallocate()
}
memcpy(audioBufferDataCopyPointer, audioBufferList.mBuffers.mData, audioBufferDataSize)
let copiedAudioBuffer = AudioBuffer(mNumberChannels: audioBuffer.mNumberChannels,
mDataByteSize: audioBufferList.mBuffers.mDataByteSize,
mData: audioBufferDataCopyPointer)
/* Create a new audio buffer list with the deep copied audio buffer */
var copiedAudioBufferList = AudioBufferList(mNumberBuffers: 1, mBuffers: copiedAudioBuffer)
/* Copy audio format description, to be used in the new sample buffer */
guard let sampleBufferFormatDescription = CMSampleBufferGetFormatDescription(self) else { return nil }
/* Create copy of timing for new sample buffer */
var duration = CMSampleBufferGetDuration(self)
duration.value /= Int64(numSamples)
var timing = CMSampleTimingInfo(duration: duration,
presentationTimeStamp: CMSampleBufferGetPresentationTimeStamp(self),
decodeTimeStamp: CMSampleBufferGetDecodeTimeStamp(self))
/* New sample buffer preparation, using the audio format description, and the timing information. */
let sampleCount = CMSampleBufferGetNumSamples(self)
var newSampleBuffer : CMSampleBuffer?
guard CMSampleBufferCreate(allocator: kCFAllocatorDefault,
dataBuffer: nil,
dataReady: false,
makeDataReadyCallback: nil,
refcon: nil,
formatDescription: sampleBufferFormatDescription,
sampleCount: sampleCount,
sampleTimingEntryCount: 1,
sampleTimingArray: &timing,
sampleSizeEntryCount: 0,
sampleSizeArray: nil,
sampleBufferOut: &newSampleBuffer) == noErr else { return nil }
//Create a CMBlockBuffer containing a copy of the data from the AudioBufferList, add to new sample buffer.
let status = CMSampleBufferSetDataBufferFromAudioBufferList(newSampleBuffer!,
blockBufferAllocator: kCFAllocatorDefault,
blockBufferMemoryAllocator: kCFAllocatorDefault,
flags: 0,
bufferList: &copiedAudioBufferList)
guard status == noErr else { return nil }
return newSampleBuffer
}
}
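For completeness, here is a minimal usage sketch tying either deepCopy() extension back to the original question; the delegate class and the retained array are assumptions about your setup, not part of the answers above:
import AVFoundation

final class AudioCaptureRecorder: NSObject, AVCaptureAudioDataOutputSampleBufferDelegate {
    // Deep copies survive after the capture pipeline recycles its own buffers.
    private var retainedBuffers: [CMSampleBuffer] = []

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        if let copy = sampleBuffer.deepCopy() {
            retainedBuffers.append(copy)
        }
    }
}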
Related
CVPixelBufferRef outputPixelBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
void* buffer = (void*)[videoUnit bufferWithH265LengthHeader];
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
buffer,
videoUnit.length,
kCFAllocatorNull,
NULL, 0, videoUnit.length,
0, &blockBuffer);
if(status == kCMBlockBufferNoErr) {
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizeArray[] = {videoUnit.length};
status = CMSampleBufferCreateReady(kCFAllocatorDefault,
blockBuffer,
_decoderFormatDescription ,
1, 0, NULL, 1, sampleSizeArray,
&sampleBuffer);
if (status == kCMBlockBufferNoErr && sampleBuffer && _deocderSession) {
VTDecodeFrameFlags flags = 0;
VTDecodeInfoFlags flagOut = 0;
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_deocderSession,
sampleBuffer,
flags,
&outputPixelBuffer,
&flagOut);
if(decodeStatus == kVTInvalidSessionErr) {
NSLog(#"IOS8VT: Invalid session, reset decoder session");
} else if(decodeStatus == kVTVideoDecoderBadDataErr) {
NSLog(#"IOS8VT: decode failed status=%d(Bad data)", decodeStatus);
} else if(decodeStatus != noErr) {
NSLog(#"IOS8VT: decode failed status=%d", decodeStatus);
}
CFRelease(sampleBuffer);
}
CFRelease(blockBuffer);
}
return outputPixelBuffer;
This is my code for decoding the stream data. It was working well on an iPhone 6s, but when the code runs on an iPhone X or iPhone 11, outputPixelBuffer returns nil. Can anyone help?
Without seeing the code for your decompression session creation, it is hard to say. It could be that your decompression session is delivering the output buffer to the callback function you provided at creation, so I highly recommend you add that part of your code too.
By providing &outputPixelBuffer in:
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_deocderSession,
sampleBuffer,
flags,
&outputPixelBuffer,
&flagOut);
only means that you've provided the reference; it does not mean that it will be filled synchronously.
I also recommend that you print out the OSStatus for:
CMBlockBufferCreateWithMemoryBlock
and
CMSampleBufferCreateReady
If there are issues at those steps, that will tell you where things are going wrong.
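To illustrate the point about the callback, here is a hedged Swift sketch (your session setup and naming will differ) that decodes via the output-handler variant, so the decoded pixel buffer is delivered to a closure rather than read back through the in/out pointer:
import VideoToolbox

// Sketch: decode one sample buffer and receive the result in a handler,
// instead of relying on the outputPixelBuffer pointer being filled synchronously.
func decode(_ sampleBuffer: CMSampleBuffer, with session: VTDecompressionSession) {
    let status = VTDecompressionSessionDecodeFrame(
        session,
        sampleBuffer: sampleBuffer,
        flags: [],
        infoFlagsOut: nil) { decodeStatus, _, imageBuffer, presentationTime, _ in
            guard decodeStatus == noErr, let imageBuffer = imageBuffer else {
                print("Decode failed: \(decodeStatus)")
                return
            }
            // Hand the decoded CVImageBuffer to your renderer here.
            _ = (imageBuffer, presentationTime)
        }
    if status != noErr {
        print("VTDecompressionSessionDecodeFrame failed: \(status)")
    }
}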
I'm struggling to encode audio buffers received from AVCaptureSession using
AudioConverter and then appending them to an AVAssetWriter.
I'm not getting any errors (including OSStatus responses), and the
CMSampleBuffers generated seem to have valid data; however, the resulting file
simply does not have any playable audio. When writing together with video, the video
frames stop getting appended a couple of frames in (appendSampleBuffer()
returns false, but with no AVAssetWriter.error), probably because the asset
writer is waiting for the audio to catch up. I suspect it's related to the way
I'm setting up the priming for AAC.
The app uses RxSwift, but I've removed the RxSwift parts so that it's easier to
understand for a wider audience.
Please check out comments in the code below for more... comments
Given a settings struct:
import Foundation
import AVFoundation
import CleanroomLogger
public struct AVSettings {
let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset = AVCaptureSessionPreset1280x720
let videoBitrate: Int = 2_000_000
let videoExpectedFrameRate: Int = 30
let videoMaxKeyFrameInterval: Int = 60
let audioBitrate: Int = 32 * 1024
/// Settings that are `0` means variable rate.
/// The `mSampleRate` and `mChannelsPerFrame` are overwritten at run-time
/// to values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
mSampleRate: AVAudioSession.sharedInstance().sampleRate,
mFormatID: kAudioFormatMPEG4AAC,
mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
mBytesPerPacket: 0,
mFramesPerPacket: 1024,
mBytesPerFrame: 0,
mChannelsPerFrame: 1,
mBitsPerChannel: 0,
mReserved: 0)
let audioEncoderClassDescriptions = [
AudioClassDescription(
mType: kAudioEncoderComponentType,
mSubType: kAudioFormatMPEG4AAC,
mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]
}
Some helper functions:
public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
switch (settings.sessionPreset, settings.orientation) {
case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
default: fatalError("Unsupported session preset and orientation")
}
}
public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
var result = noErr
var absd = settings.audioOutputABSD
var description: CMAudioFormatDescription?
withUnsafePointer(&absd) { absdPtr in
result = CMAudioFormatDescriptionCreate(nil,
absdPtr,
0, nil,
0, nil,
nil,
&description)
}
if result != noErr {
Log.error?.message("Could not create audio format description")
}
return description!
}
public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
var result = noErr
var description: CMVideoFormatDescription?
let (width, height) = getVideoDimensions(fromSettings: settings)
result = CMVideoFormatDescriptionCreate(nil,
kCMVideoCodecType_H264,
Int32(width),
Int32(height),
[:],
&description)
if result != noErr {
Log.error?.message("Could not create video format description")
}
return description!
}
This is how the asset writer is initialized:
guard let audioDevice = defaultAudioDevice() else
{ throw RecordError.MissingDeviceFeature("Microphone") }
guard let videoDevice = defaultVideoDevice(.Back) else
{ throw RecordError.MissingDeviceFeature("Camera") }
let videoInput = try AVCaptureDeviceInput(device: videoDevice)
let audioInput = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)
let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
outputSettings: nil,
sourceFormatHint: videoFormatHint)
let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
outputSettings: nil,
sourceFormatHint: audioFormatHint)
writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true
let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
.URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
.URLByAppendingPathExtension("mp4")
let assetWriter = try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)
if !assetWriter.canAddInput(writerVideoInput) {
throw RecordError.Unknown("Could not add video input") }
if !assetWriter.canAddInput(writerAudioInput) {
throw RecordError.Unknown("Could not add audio input") }
assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)
And this is how audio samples are being encoded; the problem area is most likely
around here. I've rewritten this so that it doesn't use any Rx-isms.
var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)
var converter: AudioConverter?
// Indicates whether priming information has been attached to the first buffer
var primed = false
func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {
// Create the audio converter if it's not available
if converter == nil {
var classDescriptions = settings.audioEncoderClassDescriptions
var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(CMSampleBufferGetFormatDescription(buffer)!).memory
var outputABSD = settings.audioOutputABSD
outputABSD.mSampleRate = inputABSD.mSampleRate
outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame
var converter: AudioConverterRef = nil
var result = noErr
result = withUnsafePointer(&outputABSD) { outputABSDPtr in
return withUnsafePointer(&inputABSD) { inputABSDPtr in
return AudioConverterNewSpecific(inputABSDPtr,
outputABSDPtr,
UInt32(classDescriptions.count),
&classDescriptions,
&converter)
}
}
if result != noErr { throw RecordError.Unknown }
// At this point I made an attempt to retrieve priming info from
// the audio converter assuming that it will give me back default values
// I can use, but ended up with `nil`
var primeInfo: AudioConverterPrimeInfo? = nil
var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))
// The following returns a `noErr` but `primeInfo` is still `nil`
AudioConverterGetProperty(converter,
kAudioConverterPrimeInfo,
&primeInfoSize,
&primeInfo)
// I've also tried to set `kAudioConverterPrimeInfo` so that it knows
// the leading frames that are being primed, but the set didn't seem to work
// (`noErr` but getting the property afterwards still returned `nil`)
}
let converter = converter!
// Need to give a big enough output buffer.
// The assumption is that it will always be <= the input size
let numSamples = CMSampleBufferGetNumSamples(buffer)
// This becomes 1024 * 2 = 2048
let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)
defer {
outputBufferPtr.destroy()
outputBufferPtr.dealloc(1)
}
var result = noErr
var outputPacketCount = UInt32(1)
var outputData = AudioBufferList(
mNumberBuffers: 1,
mBuffers: AudioBuffer(
mNumberChannels: outputABSD.mChannelsPerFrame,
mDataByteSize: UInt32(outputBufferSize),
mData: outputBufferPtr))
// See below for `EncodeAudioUserData`
var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
inputBytesPerPacket: inputABSD.mBytesPerPacket)
withUnsafeMutablePointer(&userData) { userDataPtr in
// See below for `fetchAudioProc`
result = AudioConverterFillComplexBuffer(
converter,
fetchAudioProc,
userDataPtr,
&outputPacketCount,
&outputData,
nil)
}
if result != noErr {
Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
return nil
}
// See below for `CMSampleBufferCreateCopy`
guard let newBuffer = CMSampleBufferCreateCopy(buffer,
fromAudioBufferList: &outputData,
newFormatDescription: outputFormatDescription) else {
Log.error?.message("Could not create sample buffer from audio buffer list")
return nil
}
if !primed {
primed = true
// Simply picked 2112 samples based on convention, is there a better way to determine this?
let samplesToPrime: Int64 = 2112
let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)
// Without setting the attachment the asset writer will complain about the
// first buffer missing the `TrimDurationAtStart` attachment, is there a way
// to infer the value from the given `AudioBufferList`?
CMSetAttachment(newBuffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, nil),
kCMAttachmentMode_ShouldNotPropagate)
}
return newBuffer
}
Below is the proc that fetches samples for the audio converter, and the data
structure that gets passed to it:
private class EncodeAudioUserData {
var inputSampleBuffer: CMSampleBuffer?
var inputBytesPerPacket: UInt32
init(inputSampleBuffer: CMSampleBuffer,
inputBytesPerPacket: UInt32) {
self.inputSampleBuffer = inputSampleBuffer
self.inputBytesPerPacket = inputBytesPerPacket
}
}
private let fetchAudioProc: AudioConverterComplexInputDataProc = {
(inAudioConverter,
ioDataPacketCount,
ioData,
outDataPacketDescriptionPtrPtr,
inUserData) in
var result = noErr
if ioDataPacketCount.memory == 0 { return noErr }
let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory
// If it's already been processed
guard let buffer = userData.inputSampleBuffer else {
ioDataPacketCount.memory = 0
return -1
}
var inputBlockBuffer: CMBlockBuffer?
var inputBufferList = AudioBufferList()
result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
buffer,
nil,
&inputBufferList,
sizeof(AudioBufferList),
nil,
nil,
0,
&inputBlockBuffer)
if result != noErr {
Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
ioDataPacketCount.memory = 0
return result
}
let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
ioDataPacketCount.memory = packetsCount
ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData
if outDataPacketDescriptionPtrPtr != nil {
outDataPacketDescriptionPtrPtr.memory = nil
}
return noErr
}
This is how I am converting AudioBufferLists to CMSampleBuffers:
public func CMSampleBufferCreateCopy(
buffer: CMSampleBuffer,
inout fromAudioBufferList bufferList: AudioBufferList,
newFormatDescription formatDescription: CMFormatDescription? = nil)
-> CMSampleBuffer? {
var result = noErr
var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
// Copy timing info from the previous buffer
var timingInfo = CMSampleTimingInfo()
result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)
if result != noErr { return nil }
var newBuffer: CMSampleBuffer?
result = CMSampleBufferCreateReady(
kCFAllocatorDefault,
nil,
formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
Int(bufferList.mNumberBuffers),
1, &timingInfo,
1, &sizeArray,
&newBuffer)
if result != noErr { return nil }
guard let b = newBuffer else { return nil }
CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
return newBuffer
}
Is there anything that I am obviously doing wrong? Is there a proper way to
construct CMSampleBuffers from AudioBufferList? How do you transfer priming
information from the converter to CMSampleBuffers that you create?
For my use case I need to do the encoding manually as the buffers will be
manipulated further down the pipeline (although I've disabled all
transformations after the encode in order to make sure that it works.)
Any help would be much appreciated. Sorry that there's so much code to
digest, but I wanted to provide as much context as possible.
Thanks in advance :)
Some related questions:
CMSampleBufferRef kCMSampleBufferAttachmentKey_TrimDurationAtStart crash
Can I use AVCaptureSession to encode an AAC stream to memory?
Writing video + generated audio to AVAssetWriterInput, audio stuttering
How do I use CoreAudio's AudioConverter to encode AAC in real-time?
Some references I've used:
Apple sample code demonstrating how to use AudioConverter
Note describing AAC encoder delay
Turns out there were a variety of things I was doing wrong. Instead of posting a wall of code, I'm going to organize what I discovered into bite-sized pieces.
Samples vs Packets vs Frames
This had been a huge source of confusion for me:
Each CMSampleBuffer can contain one or more samples (discovered via CMSampleBufferGetNumSamples).
Each CMSampleBuffer that contains 1 sample represents a single audio packet.
Therefore, CMSampleBufferGetNumSamples(sample) will return the number of packets contained in the given buffer.
Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and frame count.
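To make the linear PCM relationship concrete, here is a small hedged Swift sketch (the function name is illustrative):
import CoreMedia

// For LPCM, mFramesPerPacket == 1, so packets == frames == samples, and
// the data size is numSamples * mBytesPerFrame.
func expectedLPCMByteSize(of buffer: CMSampleBuffer) -> Int? {
    guard let format = CMSampleBufferGetFormatDescription(buffer),
          let asbd = CMAudioFormatDescriptionGetStreamBasicDescription(format)?.pointee else {
        return nil
    }
    return CMSampleBufferGetNumSamples(buffer) * Int(asbd.mBytesPerFrame)
}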
AudioConverterComplexInputDataProc
This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time, push-style encoding, I needed to ensure that each data push contains at least that minimum number of packets. Something like this (pseudo-code, with a more concrete sketch after it):
let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalSize(buffers) < minimumPackets {
buffers = buffers + [getNextBuffer()]
}
AudioConverterFillComplexBuffer(...)
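A slightly more concrete, hedged Swift take on that accumulation idea (names are illustrative, and it assumes LPCM input where samples == packets):
import CoreMedia

// Collect captured buffers until they hold at least `minimumPackets` input packets,
// then hand the batch to the converter in one go.
var pending: [CMSampleBuffer] = []

func enqueueForEncoding(_ buffer: CMSampleBuffer, minimumPackets: Int) -> [CMSampleBuffer]? {
    pending.append(buffer)
    let totalPackets = pending.reduce(0) { $0 + CMSampleBufferGetNumSamples($1) }
    guard totalPackets >= minimumPackets else { return nil } // not enough data yet
    defer { pending.removeAll() }
    return pending
}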
Slicing CMSampleBuffers
You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets that it asks for, which makes handling timing information for the resulting encoded buffer easier. If you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
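A hedged Swift sketch of that slicing step (range values are illustrative):
import CoreMedia

// Copy only the first `packetsNeeded` samples out of a larger buffer so the converter
// gets exactly what it asked for.
func slice(_ buffer: CMSampleBuffer, packetsNeeded: Int) -> CMSampleBuffer? {
    var sliced: CMSampleBuffer?
    let status = CMSampleBufferCopySampleBufferForRange(
        allocator: kCFAllocatorDefault,
        sampleBuffer: buffer,
        sampleRange: CFRangeMake(0, packetsNeeded),
        sampleBufferOut: &sliced)
    return status == noErr ? sliced : nil
}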
Priming and trim duration
When doing AAC encoding, you must set the trim duration like so:
CMSetAttachment(buffer,
kCMSampleBufferAttachmentKey_TrimDurationAtStart,
CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
kCMAttachmentMode_ShouldNotPropagate)
One thing I did wrong was that I added the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.
Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming:
Priming frames: 2112
Sample rate: 44100
Priming duration: 2112 / 44100 = ~0.0479s
First frame, frames: 1024, priming duration: 1024 / 44100
Second frame, frames: 1024, priming duration: 1088 / 44100
Creating the new CMSampleBuffer
AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:
let bufferList: AudioBufferList
let packetDescriptions: [AudioStreamPacketDescription]
var newBuffer: CMSampleBuffer?
CMAudioSampleBufferCreateWithPacketDescriptions(
kCFAllocatorDefault, // allocator
nil, // dataBuffer
false, // dataReady
nil, // makeDataReadyCallback
nil, // makeDataReadyRefCon
formatDescription, // formatDescription
Int(bufferList.mNumberBuffers), // numSamples
CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
&packetDescriptions, // packetDescriptions
&newBuffer)
I'm working on a project where I record video from the camera, but the audio comes from streaming. The audio frames obviously are not synchronised with the video frames.
If I use AVAssetWriter without video, recording the streamed audio frames works fine. But if I append both video and audio frames, I can't hear anything.
Here is the method for converting the audio data from the stream to a CMSampleBuffer:
AudioStreamBasicDescription monoStreamFormat = [self getAudioDescription];
CMFormatDescriptionRef format = NULL;
OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0,NULL, 0, NULL, NULL, &format);
if (status != noErr) {
// really shouldn't happen
return nil;
}
CMSampleTimingInfo timing = { CMTimeMake(1, 44100.0), kCMTimeZero, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
if (status != noErr) {
// couldn't create the sample buffer
NSLog(@"Failed to create sample buffer");
CFRelease(format);
return nil;
}
// add the samples to the buffer
status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
kCFAllocatorDefault,
kCFAllocatorDefault,
0,
samples);
if (status != noErr) {
NSLog(#"Failed to add samples to sample buffer");
CFRelease(sampleBuffer);
CFRelease(format);
return nil;
}
I don't know if this is related to the timing, but I would like to append the audio frames starting from the first second of the video.
Is that possible?
Thanks
Finally, I did this:
uint64_t _hostTimeToNSFactor = hostTime;
_hostTimeToNSFactor *= info.numer;
_hostTimeToNSFactor /= info.denom;
uint64_t timeNS = (uint64_t)(hostTime * _hostTimeToNSFactor);
CMTime presentationTime = self.initialiseTime;//CMTimeMake(timeNS, 1000000000);
CMSampleTimingInfo timing = { CMTimeMake(1, 44100), presentationTime, kCMTimeInvalid };
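For context, the host-time-to-nanoseconds factor comes from mach_timebase_info; a hedged Swift sketch of the usual conversion (names are illustrative):
import CoreMedia
import Darwin

// Convert a mach host time (e.g. from mach_absolute_time()) into a nanosecond-based CMTime.
func presentationTime(fromHostTime hostTime: UInt64) -> CMTime {
    var info = mach_timebase_info_data_t()
    mach_timebase_info(&info)
    let nanos = hostTime * UInt64(info.numer) / UInt64(info.denom)
    return CMTimeMake(value: Int64(nanos), timescale: 1_000_000_000)
}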
I have two iOS AudioQueues - one input that feeds samples directly to one output. Unfortunately, there is an echo effect that is quite noticeable :(
Is it possible to do low-latency audio using AudioQueues, or do I really need to use AudioUnits? (I have tried the Novocaine framework, which uses AudioUnits, and there the latency is much smaller. I've also noticed this framework seems to use fewer CPU resources. Unfortunately, I was unable to use this framework in my Swift project without major changes to it.)
Here are some extracts of my code, which is mainly done in Swift, except those callbacks which needs to be implemented in C.
private let audioStreamBasicDescription = AudioStreamBasicDescription(
mSampleRate: 16000,
mFormatID: AudioFormatID(kAudioFormatLinearPCM),
mFormatFlags: AudioFormatFlags(kAudioFormatFlagsNativeFloatPacked),
mBytesPerPacket: 4,
mFramesPerPacket: 1,
mBytesPerFrame: 4,
mChannelsPerFrame: 1,
mBitsPerChannel: 32,
mReserved: 0)
private let numberOfBuffers = 80
private let bufferSize: UInt32 = 256
private var active = false
private var inputQueue: AudioQueueRef = nil
private var outputQueue: AudioQueueRef = nil
private var inputBuffers = [AudioQueueBufferRef]()
private var outputBuffers = [AudioQueueBufferRef]()
private var headOfFreeOutputBuffers: AudioQueueBufferRef = nil
// callbacks implemented in Swift
private func audioQueueInputCallback(inputBuffer: AudioQueueBufferRef) {
if active {
if headOfFreeOutputBuffers != nil {
let outputBuffer = headOfFreeOutputBuffers
headOfFreeOutputBuffers = AudioQueueBufferRef(outputBuffer.memory.mUserData)
outputBuffer.memory.mAudioDataByteSize = inputBuffer.memory.mAudioDataByteSize
memcpy(outputBuffer.memory.mAudioData, inputBuffer.memory.mAudioData, Int(inputBuffer.memory.mAudioDataByteSize))
assert(AudioQueueEnqueueBuffer(outputQueue, outputBuffer, 0, nil) == 0)
} else {
println(__FUNCTION__ + ": out-of-output-buffers!")
}
assert(AudioQueueEnqueueBuffer(inputQueue, inputBuffer, 0, nil) == 0)
}
}
private func audioQueueOutputCallback(outputBuffer: AudioQueueBufferRef) {
if active {
outputBuffer.memory.mUserData = UnsafeMutablePointer<Void>(headOfFreeOutputBuffers)
headOfFreeOutputBuffers = outputBuffer
}
}
func start() {
var error: NSError?
audioSession.setCategory(AVAudioSessionCategoryPlayAndRecord, withOptions: .allZeros, error: &error)
dumpError(error, functionName: "AVAudioSessionCategoryPlayAndRecord")
audioSession.setPreferredSampleRate(16000, error: &error)
dumpError(error, functionName: "setPreferredSampleRate")
audioSession.setPreferredIOBufferDuration(0.005, error: &error)
dumpError(error, functionName: "setPreferredIOBufferDuration")
audioSession.setActive(true, error: &error)
dumpError(error, functionName: "setActive(true)")
assert(active == false)
active = true
// cannot provide callbacks to AudioQueueNewInput/AudioQueueNewOutput from Swift and so need to interface C functions
assert(MyAudioQueueConfigureInputQueueAndCallback(audioStreamBasicDescription, &inputQueue, audioQueueInputCallback) == 0)
assert(MyAudioQueueConfigureOutputQueueAndCallback(audioStreamBasicDescription, &outputQueue, audioQueueOutputCallback) == 0)
for (var i = 0; i < numberOfBuffers; i++) {
var audioQueueBufferRef: AudioQueueBufferRef = nil
assert(AudioQueueAllocateBuffer(inputQueue, bufferSize, &audioQueueBufferRef) == 0)
assert(AudioQueueEnqueueBuffer(inputQueue, audioQueueBufferRef, 0, nil) == 0)
inputBuffers.append(audioQueueBufferRef)
assert(AudioQueueAllocateBuffer(outputQueue, bufferSize, &audioQueueBufferRef) == 0)
outputBuffers.append(audioQueueBufferRef)
audioQueueBufferRef.memory.mUserData = UnsafeMutablePointer<Void>(headOfFreeOutputBuffers)
headOfFreeOutputBuffers = audioQueueBufferRef
}
assert(AudioQueueStart(inputQueue, nil) == 0)
assert(AudioQueueStart(outputQueue, nil) == 0)
}
And then my C-code to set up the callbacks back to Swift:
static void MyAudioQueueAudioInputCallback(void * inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer, const AudioTimeStamp * inStartTime,
UInt32 inNumberPacketDescriptions, const AudioStreamPacketDescription * inPacketDescs) {
void(^block)(AudioQueueBufferRef) = (__bridge void(^)(AudioQueueBufferRef))inUserData;
block(inBuffer);
}
static void MyAudioQueueAudioOutputCallback(void *inUserData, AudioQueueRef inAQ, AudioQueueBufferRef inBuffer) {
void(^block)(AudioQueueBufferRef) = (__bridge void(^)(AudioQueueBufferRef))inUserData;
block(inBuffer);
}
OSStatus MyAudioQueueConfigureInputQueueAndCallback(AudioStreamBasicDescription inFormat, AudioQueueRef *inAQ, void(^callback)(AudioQueueBufferRef)) {
return AudioQueueNewInput(&inFormat, MyAudioQueueAudioInputCallback, (__bridge_retained void *)([callback copy]), nil, nil, 0, inAQ);
}
OSStatus MyAudioQueueConfigureOutputQueueAndCallback(AudioStreamBasicDescription inFormat, AudioQueueRef *inAQ, void(^callback)(AudioQueueBufferRef)) {
return AudioQueueNewOutput(&inFormat, MyAudioQueueAudioOutputCallback, (__bridge_retained void *)([callback copy]), nil, nil, 0, inAQ);
}
After a good while I found this great post using AudioUnits instead of AudioQueues. I just ported it to Swift and then simply added:
audioSession.setPreferredIOBufferDuration(0.005, error: &error)
If you're recording audio from a microphone and playing it back within earshot of that microphone, then due to the audio throughput not being instantaneous, some of your previous output will make it into the new input, hence the echo. This phenomenon is called feedback.
This is a structural problem, so changing the recording API won't help (although changing your recording/playback buffer sizes will give you control over the delay in the echo). You can either play back the audio in such a way that the microphone can't hear it (e.g. not at all, or through headphones) or go down the rabbit hole of echo cancellation.
I'm trying to decode a raw stream of .H264 video data but I can't find a way to create a proper
- (void)decodeFrameWithNSData:(NSData*)data presentationTime:
(CMTime)presentationTime
{
@autoreleasepool {
CMSampleBufferRef sampleBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
VTDecodeInfoFlags infoFlags;
int sourceFrame;
if( dSessionRef == NULL )
[self createDecompressionSession];
CMSampleTimingInfo timingInfo ;
timingInfo.presentationTimeStamp = presentationTime;
timingInfo.duration = CMTimeMake(1,100000000);
timingInfo.decodeTimeStamp = kCMTimeInvalid;
//Creates block buffer from NSData
OSStatus status = CMBlockBufferCreateWithMemoryBlock(CFAllocatorGetDefault(), (void*)data.bytes,data.length*sizeof(char), CFAllocatorGetDefault(), NULL, 0, data.length*sizeof(char), 0, &blockBuffer);
//Creates CMSampleBuffer to feed decompression session
status = CMSampleBufferCreateReady(CFAllocatorGetDefault(), blockBuffer,self.encoderVideoFormat,1,1,&timingInfo, 0, 0, &sampleBuffer);
status = VTDecompressionSessionDecodeFrame(dSessionRef,sampleBuffer, kVTDecodeFrame_1xRealTimePlayback, &sourceFrame,&infoFlags);
if(status != noErr) {
NSLog(#"Decode with data error %d",status);
}
}
}
At the end of the call I'm getting a -12911 error from VTDecompressionSessionDecodeFrame, which translates to kVTVideoDecoderMalfunctionErr. After reading this [post], it seems I should create the video format description using CMVideoFormatDescriptionCreateFromH264ParameterSets. But how can I create a new video format description if I don't have the currentSps or currentPps information? How can I get that information from my raw H.264 stream?
CMFormatDescriptionRef decoderFormatDescription;
const uint8_t* const parameterSetPointers[2] =
{ (const uint8_t*)[currentSps bytes], (const uint8_t*)[currentPps bytes] };
const size_t parameterSetSizes[2] =
{ [currentSps length], [currentPps length] };
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL,
2,
parameterSetPointers,
parameterSetSizes,
4,
&decoderFormatDescription);
Thanks in advance,
Marcos
[post] : Decoding H264 VideoToolkit API fails with Error -8971 in VTDecompressionSessionCreate
You MUST call CMVideoFormatDescriptionCreateFromH264ParameterSets first. The SPS/PPS may be stored/transmitted separately from the video stream, or they may come inline.
Note that for VTDecompressionSessionDecodeFrame your NALUs must be preceded with a size, and not a start code.
You can read more here:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
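If the stream is Annex B (start-code delimited), the SPS and PPS are simply NAL units of type 7 and 8 that you can pull out before building the format description. Below is a hedged Swift sketch, assuming 4-byte 0x00000001 start codes; a production parser would also handle 3-byte start codes and emulation-prevention bytes:
import CoreMedia

// Scan an Annex B buffer for SPS (NAL type 7) and PPS (NAL type 8) and build a
// CMVideoFormatDescription from them.
func makeFormatDescription(fromAnnexB stream: Data) -> CMVideoFormatDescription? {
    var sps: Data?
    var pps: Data?
    let startCode = Data([0, 0, 0, 1])
    var starts: [Int] = []
    var searchFrom = stream.startIndex
    while let range = stream.range(of: startCode, in: searchFrom..<stream.endIndex) {
        starts.append(range.lowerBound)
        searchFrom = range.upperBound
    }
    starts.append(stream.endIndex)
    for i in 0..<(starts.count - 1) {
        let nal = stream[(starts[i] + 4)..<starts[i + 1]]
        guard let header = nal.first else { continue }
        switch header & 0x1F {          // low 5 bits = NAL unit type
        case 7: sps = Data(nal)
        case 8: pps = Data(nal)
        default: break
        }
    }
    guard let spsData = sps, let ppsData = pps else { return nil }
    var formatDescription: CMVideoFormatDescription?
    let status = spsData.withUnsafeBytes { (spsPtr: UnsafeRawBufferPointer) in
        ppsData.withUnsafeBytes { (ppsPtr: UnsafeRawBufferPointer) in
            CMVideoFormatDescriptionCreateFromH264ParameterSets(
                allocator: kCFAllocatorDefault,
                parameterSetCount: 2,
                parameterSetPointers: [spsPtr.bindMemory(to: UInt8.self).baseAddress!,
                                       ppsPtr.bindMemory(to: UInt8.self).baseAddress!],
                parameterSetSizes: [spsData.count, ppsData.count],
                nalUnitHeaderLength: 4,
                formatDescriptionOut: &formatDescription)
        }
    }
    return status == noErr ? formatDescription : nil
}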