Append audio samples to AVAssetWriter from streaming - iOS

I'm working on a project where I record video from the camera, but the audio comes from a stream. The audio frames are obviously not synchronised with the video frames.
If I use AVAssetWriter without video and only record the streamed audio frames, it works fine. But if I append both video and audio frames, I can't hear anything.
Here is the method that converts the audio data from the stream into a CMSampleBuffer:
AudioStreamBasicDescription monoStreamFormat = [self getAudioDescription];
CMFormatDescriptionRef format = NULL;
OSStatus status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &monoStreamFormat, 0,NULL, 0, NULL, NULL, &format);
if (status != noErr) {
// really shouldn't happen
return nil;
}
CMSampleTimingInfo timing = { CMTimeMake(1, 44100), kCMTimeZero, kCMTimeInvalid };
CMSampleBufferRef sampleBuffer = NULL;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, numSamples, 1, &timing, 0, NULL, &sampleBuffer);
if (status != noErr) {
// couldn't create the sample buffer
NSLog(@"Failed to create sample buffer");
CFRelease(format);
return nil;
}
// add the samples to the buffer
status = CMSampleBufferSetDataBufferFromAudioBufferList(sampleBuffer,
kCFAllocatorDefault,
kCFAllocatorDefault,
0,
samples);
if (status != noErr) {
NSLog(#"Failed to add samples to sample buffer");
CFRelease(sampleBuffer);
CFRelease(format);
return nil;
}
I don't know if this is related to the timing, but I would like the audio frames to be appended starting from the first second of the video.
Is that possible?
Thanks

Finally I did this:
uint64_t _hostTimeToNSFactor = hostTime;
_hostTimeToNSFactor *= info.numer;
_hostTimeToNSFactor /= info.denom;
uint64_t timeNS = _hostTimeToNSFactor; // hostTime is already folded into the factor above
CMTime presentationTime = self.initialiseTime; //CMTimeMake(timeNS, 1000000000);
CMSampleTimingInfo timing = { CMTimeMake(1, 44100), presentationTime, kCMTimeInvalid };
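For completeness, here is a minimal sketch of the host-time conversion this relies on; it assumes info is filled by mach_timebase_info (not shown in the snippet above), and the helper name is made up:

#include <mach/mach_time.h>

// Convert a mach host-time value into nanoseconds using the host timebase.
static uint64_t HostTimeToNanoseconds(uint64_t hostTime)
{
    static mach_timebase_info_data_t info;
    if (info.denom == 0) {
        mach_timebase_info(&info); // query the timebase once
    }
    return hostTime * info.numer / info.denom;
}

The presentation time for each streamed audio buffer can then be built as CMTimeMake((int64_t)HostTimeToNanoseconds(hostTime), 1000000000) and, if needed, offset by the presentation time of the first video frame so both tracks start together.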

Related

iOS 13.1.3 VTDecompressionSessionDecodeFrame can't decode right

CVPixelBufferRef outputPixelBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
void* buffer = (void*)[videoUnit bufferWithH265LengthHeader];
OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
buffer,
videoUnit.length,
kCFAllocatorNull,
NULL, 0, videoUnit.length,
0, &blockBuffer);
if(status == kCMBlockBufferNoErr) {
CMSampleBufferRef sampleBuffer = NULL;
const size_t sampleSizeArray[] = {videoUnit.length};
status = CMSampleBufferCreateReady(kCFAllocatorDefault,
blockBuffer,
_decoderFormatDescription ,
1, 0, NULL, 1, sampleSizeArray,
&sampleBuffer);
if (status == kCMBlockBufferNoErr && sampleBuffer && _deocderSession) {
VTDecodeFrameFlags flags = 0;
VTDecodeInfoFlags flagOut = 0;
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_deocderSession,
sampleBuffer,
flags,
&outputPixelBuffer,
&flagOut);
if(decodeStatus == kVTInvalidSessionErr) {
NSLog(#"IOS8VT: Invalid session, reset decoder session");
} else if(decodeStatus == kVTVideoDecoderBadDataErr) {
NSLog(#"IOS8VT: decode failed status=%d(Bad data)", decodeStatus);
} else if(decodeStatus != noErr) {
NSLog(#"IOS8VT: decode failed status=%d", decodeStatus);
}
CFRelease(sampleBuffer);
}
CFRelease(blockBuffer);
}
return outputPixelBuffer;
This is my code for decoding the stream data. It worked fine on an iPhone 6s, but when it runs on an iPhone X or iPhone 11, outputPixelBuffer comes back nil. Can anyone help?
Without seeing the code that creates your decompression session, it is hard to say. It could be that your decompression session is delivering the output buffer to the callback function provided at creation, so I highly recommend you add that part of your code too.
Providing &outputPixelBuffer in:
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_deocderSession,
sampleBuffer,
flags,
&outputPixelBuffer,
&flagOut);
only means that you have passed in a reference; it does not guarantee that it will be filled synchronously.
I also recommend that you print out the OSStatus for:
CMBlockBufferCreateWithMemoryBlock
and
CMSampleBufferCreateReady
If there are issues at those steps, the status codes will tell you.
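For reference, the decoded frame is normally delivered to the output callback that was registered when the decompression session was created, via sourceFrameRefCon. A rough sketch of such a callback (the name is hypothetical; it assumes &outputPixelBuffer was passed as the sourceFrameRefCon argument, as in your call above):

static void didDecompress(void *decompressionOutputRefCon,
                          void *sourceFrameRefCon,
                          OSStatus status,
                          VTDecodeInfoFlags infoFlags,
                          CVImageBufferRef imageBuffer,
                          CMTime presentationTimeStamp,
                          CMTime presentationDuration)
{
    CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
    if (status != noErr || imageBuffer == NULL) {
        NSLog(@"Decompression callback failed, status=%d", (int)status);
        return; // *outputPixelBuffer stays NULL, which matches what you are seeing
    }
    if (outputPixelBuffer != NULL) {
        *outputPixelBuffer = CVPixelBufferRetain(imageBuffer);
    }
}

If the session decodes asynchronously, or frames are being reordered, the callback may fire after VTDecompressionSessionDecodeFrame returns, in which case reading outputPixelBuffer right after the call will still give you nil.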

Muxing AAC audio and h.264 video streams to mp4 with AVFoundation

For OS X and iOS, I have streams of real-time encoded video (h.264) and audio (AAC) data coming in, and I want to be able to mux these together into an mp4.
I'm using an AVAssetWriter to perform the muxing.
I have video working, but my audio still sounds like jumbled static. Here's what I'm trying right now (skipping some of the error checks here for brevity):
I initialize the writer:
NSURL *url = [NSURL fileURLWithPath:mContext->filename];
NSError* err = nil;
mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];
I initialize the audio input:
NSDictionary* settings;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
settings = nil; // set output to nil so it becomes a pass-through
CMAudioFormatDescriptionRef audioFormatDesc = nil;
{
AudioStreamBasicDescription absd = {0};
absd.mSampleRate = mParameters.audioSampleRate; //known sample rate
absd.mFormatID = kAudioFormatMPEG4AAC;
absd.mFormatFlags = kMPEG4Object_AAC_Main;
CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
}
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
[mContext->writer addInput:mContext->aacWriterInput];
And start the writer:
[mContext->writer startWriting];
[mContext->writer startSessionAtSourceTime:kCMTimeZero];
Then, I have a callback where I receive a packet with a timestamp (milliseconds) and a std::vector<uint8_t> containing 1024 compressed samples. I make sure isReadyForMoreMediaData is true. Then, the first time the callback is hit, I set up the CMAudioFormatDescription:
OSStatus error = 0;
AudioStreamBasicDescription streamDesc = {0};
streamDesc.mSampleRate = mParameters.audioSampleRate;
streamDesc.mFormatID = kAudioFormatMPEG4AAC;
streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
streamDesc.mChannelsPerFrame = 2; // always stereo for us
streamDesc.mBitsPerChannel = 0;
streamDesc.mBytesPerFrame = 0;
streamDesc.mFramesPerPacket = 1024; // Our AAC packets contain 1024 samples per frame
streamDesc.mBytesPerPacket = 0;
streamDesc.mReserved = 0;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);
And finally, I create a CMSampleBufferRef and send it along:
CMSampleBufferRef buffer = NULL;
CMBlockBufferRef blockBuffer;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());
CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
CMTime pts = CMTimeMake(packet.timestamp, 1000);
CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };
size_t sampleSizeArray[1] = {packet.data.size()};
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);
// The first input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
if (mContext->firstAudioFrame)
{
CFDictionaryRef dict = NULL;
dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
// we must trim the start time on first audio frame...
mContext->firstAudioFrame = false;
}
CMSampleBufferMakeDataReady(buffer);
BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
I guess the part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample sizes array, otherwise I get this error message immediately when checking my writer's status:
Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}
Where the underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.
I did notice an example in Apple's documentation for creating the buffer with AAC data:
https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc
In their example, they specify a long sampleSizeArray with an entry for every single sample. Is that necessary? I don't have that information in this callback, and our Windows implementation didn't need it. So I tried passing packet.data.size() as the sample size, but that doesn't seem right and it certainly doesn't produce pleasant audio.
Any ideas? Either tweaks to my calls here or different APIs I should be using to mux together streams of encoded data.
Thanks!
If you don't want to transcode, do not pass an outputSettings dictionary; pass nil there:
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil sourceFormatHint:audioFormatDesc];
It is explained somewhere in this article:
https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/05_Export.html
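If appendSampleBuffer: ever returns NO, checking the writer directly usually surfaces the underlying OSStatus (a generic sketch, reusing the mContext names from the question):

if (![mContext->aacWriterInput appendSampleBuffer:buffer]) {
    if (mContext->writer.status == AVAssetWriterStatusFailed) {
        // The NSUnderlyingError carries the NSOSStatusErrorDomain code (e.g. -12735).
        NSLog(@"Append failed: %@", mContext->writer.error);
    }
}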

Using CMSampleTimingInfo, CMSampleBuffer and AudioBufferList from raw PCM stream

I'm receiving a raw PCM stream from Google's WebRTC C++ reference implementation (a hook inserted into VoEBaseImpl::GetPlayoutData). The audio appears to be linear PCM, signed int16, but when I record it with an AVAssetWriter, the resulting audio file is highly distorted and higher-pitched.
I am assuming this is an error somewhere with the input parameters, most probably in the conversion of the stereo int16 data to an AudioBufferList and then on to a CMSampleBuffer. Is there any issue with the code below?
void RecorderImpl::RenderAudioFrame(void* audio_data, size_t number_of_frames, int sample_rate, int64_t elapsed_time_ms, int64_t ntp_time_ms) {
OSStatus status;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = sample_rate;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 2;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = audioFormat.mFramesPerPacket * audioFormat.mChannelsPerFrame * audioFormat.mBitsPerChannel / 8;
audioFormat.mBytesPerFrame = audioFormat.mBytesPerPacket / audioFormat.mFramesPerPacket;
CMSampleTimingInfo timing = { CMTimeMake(1, sample_rate), CMTimeMake(elapsed_time_ms, 1000), kCMTimeInvalid };
CMFormatDescriptionRef format = NULL;
status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &audioFormat, sizeof(acl), &acl, 0, NULL, NULL, &format);
if(status != 0) {
NSLog(#"Failed to create audio format description");
return;
}
CMSampleBufferRef buffer;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, (CMItemCount)number_of_frames, 1, &timing, 0, NULL, &buffer);
if(status != 0) {
NSLog(#"Failed to allocate sample buffer");
return;
}
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame;
bufferList.mBuffers[0].mDataByteSize = (UInt32)(number_of_frames * audioFormat.mBytesPerFrame);
bufferList.mBuffers[0].mData = audio_data;
status = CMSampleBufferSetDataBufferFromAudioBufferList(buffer, kCFAllocatorDefault, kCFAllocatorDefault, 0, &bufferList);
if(status != 0) {
NSLog(#"Failed to convert audio buffer list into sample buffer");
return;
}
[recorder writeAudioFrames:buffer];
CFRelease(buffer);
}
For reference, the sample rate I'm receiving from WebRTC on an iPhone 6S+ / iOS 9.2 is 48kHz with 480 samples per invocation of this hook and I'm receiving data every 10 ms.
First of all, congratulations on having the temerity to create an audio CMSampleBuffer from scratch. For most, they are neither created nor destroyed, but handed down immaculate and mysterious from CoreMedia and AVFoundation.
The presentationTimeStamps in your timing info are in integral milliseconds, which cannot represent your 48kHz samples' positions in time.
Instead of CMTimeMake(elapsed_time_ms, 1000), try CMTimeMake(elapsed_frames, sample_rate), where elapsed_frames are the number of frames that you have previously written.
That would explain the distortion, but not the pitch, so make sure that the AudioStreamBasicDescription matches your AVAssetWriterInput setup. It's hard to say without seeing your AVAssetWriter code.
P.S. Look out for writeAudioFrames: if it's asynchronous, you'll have problems with ownership of audio_data.
P.P.S. It looks like you're leaking the CMFormatDescriptionRef.
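A minimal sketch of the frame-based timestamp (elapsed_frames is a new counter that is not in the original code), plus the missing release:

static int64_t elapsed_frames = 0; // running count of frames already written

CMSampleTimingInfo timing = { CMTimeMake(1, sample_rate),
                              CMTimeMake(elapsed_frames, sample_rate),
                              kCMTimeInvalid };
elapsed_frames += number_of_frames;

// ... create, fill and append the sample buffer as before ...

[recorder writeAudioFrames:buffer];
CFRelease(buffer);
CFRelease(format); // fixes the CMFormatDescriptionRef leak mentioned above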
I ended up opening the generated audio file in Audacity and saw that half of every frame was being dropped, which produced a rather bizarre-looking waveform.
Changing acl.mChannelLayoutTag to kAudioChannelLayoutTag_Mono and changing audioFormat.mChannelsPerFrame to 1 solved the issue and now the audio quality is perfect. Hooray!
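For reference, the two changed fields (everything else in RenderAudioFrame stays exactly as in the question):

acl.mChannelLayoutTag = kAudioChannelLayoutTag_Mono; // was kAudioChannelLayoutTag_Stereo
audioFormat.mChannelsPerFrame = 1;                   // was 2
// mBytesPerPacket and mBytesPerFrame are computed from mChannelsPerFrame above,
// so they automatically drop from 4 to 2 bytes per frame.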

Decoding H264 VideoToolkit API fails with Error -12911 in VTDecompressionSessionDecodeFrame

I'm trying to decode a raw stream of .H264 video data, but I can't find a way to create a proper CMVideoFormatDescriptionRef for it. Here is my decode method:
- (void)decodeFrameWithNSData:(NSData*)data presentationTime:(CMTime)presentationTime
{
#autoreleasepool {
CMSampleBufferRef sampleBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
VTDecodeInfoFlags infoFlags;
int sourceFrame;
if( dSessionRef == NULL )
[self createDecompressionSession];
CMSampleTimingInfo timingInfo ;
timingInfo.presentationTimeStamp = presentationTime;
timingInfo.duration = CMTimeMake(1,100000000);
timingInfo.decodeTimeStamp = kCMTimeInvalid;
//Creates block buffer from NSData
OSStatus status = CMBlockBufferCreateWithMemoryBlock(CFAllocatorGetDefault(), (void*)data.bytes,data.length*sizeof(char), CFAllocatorGetDefault(), NULL, 0, data.length*sizeof(char), 0, &blockBuffer);
//Creates CMSampleBuffer to feed decompression session
status = CMSampleBufferCreateReady(CFAllocatorGetDefault(), blockBuffer,self.encoderVideoFormat,1,1,&timingInfo, 0, 0, &sampleBuffer);
status = VTDecompressionSessionDecodeFrame(dSessionRef,sampleBuffer, kVTDecodeFrame_1xRealTimePlayback, &sourceFrame,&infoFlags);
if(status != noErr) {
NSLog(#"Decode with data error %d",status);
}
}
}
At the end of the call I'm getting a -12911 error from VTDecompressionSessionDecodeFrame, which translates to kVTVideoDecoderMalfunctionErr. After reading this [post], I understood that I should build a VideoFormatDescription using CMVideoFormatDescriptionCreateFromH264ParameterSets. But how can I create a new VideoFormatDescription if I don't have the currentSps or currentPps data? How can I get that information from my raw .H264 stream?
CMFormatDescriptionRef decoderFormatDescription;
const uint8_t* const parameterSetPointers[2] =
{ (const uint8_t*)[currentSps bytes], (const uint8_t*)[currentPps bytes] };
const size_t parameterSetSizes[2] =
{ [currentSps length], [currentPps length] };
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL,
2,
parameterSetPointers,
parameterSetSizes,
4,
&decoderFormatDescription);
Thanks in advance,
Marcos
[post] : Decoding H264 VideoToolkit API fails with Error -8971 in VTDecompressionSessionCreate
You MUST call CMVideoFormatDescriptionCreateFromH264ParameterSets first. The SPS/PPS may be stored/transmitted separately from the video stream, or may come inline.
Note that for VTDecompressionSessionDecodeFrame your NALUs must be preceded with a size, and not a start code.
You can read more here:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
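As a rough sketch of pulling the parameter sets out of an Annex B stream (this assumes 4-byte 00 00 00 01 start codes; NAL unit types 7 and 8 are the H.264 SPS and PPS, and the helper name is made up):

// Scan an Annex B buffer for the SPS (type 7) and PPS (type 8) NAL units.
static void FindParameterSets(const uint8_t *buf, size_t len,
                              NSData **outSps, NSData **outPps)
{
    size_t i = 0;
    while (i + 4 < len) {
        if (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 0 && buf[i+3] == 1) {
            size_t start = i + 4;               // first byte of the NAL unit
            size_t end = start;
            while (end + 4 <= len &&
                   !(buf[end] == 0 && buf[end+1] == 0 && buf[end+2] == 0 && buf[end+3] == 1)) {
                end++;                          // advance until the next start code
            }
            if (end + 4 > len) end = len;       // last NAL unit runs to the end
            uint8_t nalType = buf[start] & 0x1F;
            if (nalType == 7) *outSps = [NSData dataWithBytes:buf + start length:end - start];
            if (nalType == 8) *outPps = [NSData dataWithBytes:buf + start length:end - start];
            i = end;
        } else {
            i++;
        }
    }
}

The resulting blobs can then be handed to CMVideoFormatDescriptionCreateFromH264ParameterSets exactly as in the snippet from the question, and each VCL NAL unit must have its start code replaced with a 4-byte big-endian length before it is wrapped in a CMBlockBuffer.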

Pass through CMSampleBufferRef data to audio output jack

I am developing an app in which I need to pass captured audio through to the output audio jack while simultaneously recording and saving video.
I have looked into the aurioTouch Apple sample code and implemented the audio pass-through.
I have also implemented video recording through an AVCaptureSession.
Both of the above work perfectly on their own.
But when I merge them, the audio pass-through stops working because of the AVCaptureSession's audio session.
I have also tried to pass through the audio data that I get from the AVCaptureSession delegate methods. Below is my code:
OSStatus err = noErr;
AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
CMItemCount numberOfFrames = CMSampleBufferGetNumSamples(sampleBuffer); // corresponds to the number of CoreAudio audio frames
currentSampleTime += (double)numberOfFrames;
AudioTimeStamp timeStamp;
memset(&timeStamp, 0, sizeof(AudioTimeStamp));
timeStamp.mSampleTime = currentSampleTime;
timeStamp.mFlags |= kAudioTimeStampSampleTimeValid;
AudioUnitRenderActionFlags flags = 0;
aurioTouchAppDelegate *THIS = (aurioTouchAppDelegate *)[[UIApplication sharedApplication]delegate];
err = AudioUnitRender(self.rioUnit, &flags, &timeStamp, 1, numberOfFrames, &audioBufferList);
if (err) { printf("PerformThru: error %d\n", (int)err); }
But it is giving an error. Please advise what can be done. I have looked into so many docs and so much code but couldn't find a solution. Please help.
Here's some better error handling code. What error does it return? You can look up the error description by searching for it in the documentation.
static void CheckError (OSStatus error, const char *operation) {
if (error == noErr) return;
char str[20] = {};
// see if it appears to be a 4 char code
*(UInt32*)(str + 1) = CFSwapInt32HostToBig(error);
if (isprint(str[1]) && isprint(str[2]) && isprint(str[3]) && isprint(str[4])) {
str[0] = str[5] = '\'';
str[6] = '\0';
} else {
sprintf(str, "%d", (int)error);
}
fprintf(stderr, "Error: %s(%s)\n", operation, str);
exit(1);
}
- (void)yourFunction
{
AudioBufferList audioBufferList;
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &audioBufferList, sizeof(audioBufferList), NULL, NULL, 0, &blockBuffer);
CMItemCount numberOfFrames = CMSampleBufferGetNumSamples(sampleBuffer); // corresponds to the number of CoreAudio audio frames
currentSampleTime += (double)numberOfFrames;
AudioTimeStamp timeStamp;
memset(&timeStamp, 0, sizeof(AudioTimeStamp));
timeStamp.mSampleTime = currentSampleTime;
timeStamp.mFlags |= kAudioTimeStampSampleTimeValid;
AudioUnitRenderActionFlags flags = 0;
aurioTouchAppDelegate *THIS = (aurioTouchAppDelegate *)[[UIApplication sharedApplication]delegate];
CheckError(AudioUnitRender(self.rioUnit, &flags, &timeStamp, 1, numberOfFrames, &audioBufferList),
"Error with AudioUnitRender");
}
