iOS: VideoToolbox decompresses H.263 video abnormally

I am working on H.263 decompression with VideoToolbox, but when decoding a 4CIF video stream the output pixel data are all zero, and no error is reported.
I don't know why this happens, as video streams with CIF resolution are decompressed correctly.
Has anyone run into the same problem?
This is a piece of my code:
CMFormatDescriptionRef newFmtDesc = nil;
OSStatus status = CMVideoFormatDescriptionCreate(kCFAllocatorDefault,
kCMVideoCodecType_H263,
width,
height,
NULL,
&_videoFormatDescription);
if (status)
{
return -1;
}
CFMutableDictionaryRef dpba = CFDictionaryCreateMutable(kCFAllocatorDefault,
2,
&kCFTypeDictionaryKeyCallBacks,
&kCFTypeDictionaryValueCallBacks);
CFDictionarySetValue(dpba,
kCVPixelBufferOpenGLCompatibilityKey,
kCFBooleanFalse);
VTDictionarySetInt32(dpba,
kCVPixelBufferPixelFormatTypeKey,
kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange); // use NV12
VTDictionarySetInt32(dpba,
kCVPixelBufferWidthKey,
dimension.width);
VTDictionarySetInt32(dpba,
kCVPixelBufferHeightKey,
dimension.height);
VTDictionarySetInt32(dpba,
kCVPixelBufferBytesPerRowAlignmentKey,
dimension.width);

// setup decoder callback record
VTDecompressionOutputCallbackRecord decoderCallbackRecord;
decoderCallbackRecord.decompressionOutputCallback = onDecodeCallback;
decoderCallbackRecord.decompressionOutputRefCon = this;

// create decompression session
status = VTDecompressionSessionCreate(kCFAllocatorDefault,
_videoFormatDescription,
nil,
dpba,
&decoderCallbackRecord,
&_session);
// Do Decode
CMSampleBufferRef sampleBuffer;
sampleBuffer = VTSampleBufferCreate(_videoFormatDescription, (void*)data_start, data_len, ts);
VTDecodeFrameFlags flags = 0;
VTDecodeInfoFlags flagOut = 0;
OSStatus decodeStatus = VTDecompressionSessionDecodeFrame(_session,
sampleBuffer,
flags,
nil,
&flagOut);
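When this happens, it can help to confirm what the decoder is actually delivering by inspecting the CVPixelBuffer handed to the output callback. Below is a minimal diagnostic sketch; the signature is the standard VTDecompressionOutputCallback, and everything beyond the logging is illustrative rather than taken from the code above:
void onDecodeCallback(void *refCon, void *sourceFrameRefCon, OSStatus status,
                      VTDecodeInfoFlags infoFlags, CVImageBufferRef imageBuffer,
                      CMTime pts, CMTime duration)
{
    if (status != noErr || imageBuffer == NULL) {
        NSLog(@"decode failed: %d, infoFlags: %u", (int)status, (unsigned)infoFlags);
        return;
    }
    // For NV12 (kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange), plane 0 is luma.
    CVPixelBufferLockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
    const uint8_t *luma = (const uint8_t *)CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0);
    size_t w = CVPixelBufferGetWidthOfPlane(imageBuffer, 0);
    size_t h = CVPixelBufferGetHeightOfPlane(imageBuffer, 0);
    NSLog(@"decoded %zux%zu, first luma byte: %u", w, h, luma ? luma[0] : 0);
    CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
}
If the callback fires with noErr but the luma plane is all zero, the decoder has produced a frame but discarded its content, which is a different failure mode from not being called at all.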
I also tried compressing H.263 with VideoToolbox: I initialized the session with 4CIF resolution and pushed 4CIF NV12 images into the compression session, but the output H.263 stream is in CIF resolution!
Can VideoToolbox really not handle 4CIF H.263 video for either compression or decompression?
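One way to confirm what the encoder is really producing is to read the dimensions from the format description attached to each encoded sample buffer in the compression output callback. A hedged sketch (the callback name is illustrative; the calls are standard CoreMedia/VideoToolbox):
void onEncodeCallback(void *refCon, void *sourceFrameRefCon, OSStatus status,
                      VTEncodeInfoFlags infoFlags, CMSampleBufferRef sampleBuffer)
{
    if (status != noErr || sampleBuffer == NULL) return;
    CMFormatDescriptionRef fmt = CMSampleBufferGetFormatDescription(sampleBuffer);
    CMVideoDimensions dims = CMVideoFormatDescriptionGetDimensions(fmt);
    // If this reports 352x288 (CIF) for 704x576 (4CIF) input, the encoder has
    // silently fallen back to a smaller picture size for H.263.
    NSLog(@"encoded frame dimensions: %dx%d", dims.width, dims.height);
}
If the dimensions are already wrong here, the problem is on the compression side rather than in the decoder setup.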

Related

Crash while sending ARFrame CVPixelBuffer as byte array over network using gstreamer

I want to send ARFrame pixel buffer data over the network; I have listed my setup below. With this setup, if I try to send the frames, the app crashes in gstreamer's C code after a few frames, but if I send the camera's AVCaptureVideoDataOutput pixel buffer instead, the stream works fine. I have set the AVCaptureSession's pixel format type to
kCVPixelFormatType_420YpCbCr8BiPlanarFullRange
so it replicates the same type ARFrame gives. Please help, I am unable to find any solution. I am sorry if my English is bad or I have missed something out; do ask me for it.
My Setup
Get pixelBuffer of ARFrame from didUpdateFrame delegate of ARKit
Encode to h264 using VTCompressionSession
- (void)SendFrames:(CVPixelBufferRef)pixelBuffer :(NSTimeInterval)timeStamp
{
size_t width = CVPixelBufferGetWidth(pixelBuffer);
size_t height = CVPixelBufferGetHeight(pixelBuffer);
if(session == NULL)
{
[self initEncoder:width height:height];
}
CMTime presentationTimeStamp = CMTimeMake(0, 1);
OSStatus statusCode = VTCompressionSessionEncodeFrame(session, pixelBuffer, presentationTimeStamp, kCMTimeInvalid, NULL, NULL, NULL);
if (statusCode != noErr) {
// End the session
VTCompressionSessionInvalidate(session);
CFRelease(session);
session = NULL;
return;
}
VTCompressionSessionEndPass(session, NULL, NULL);
}
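A side note on SendFrames above, separate from the crash itself: the timeStamp argument is never used, so every frame is submitted with a presentation time of zero, and VTCompressionSessionEndPass is part of the multi-pass encoding API rather than something to call per frame. A hedged variant of the encode call (NSTimeInterval is in seconds, hence CMTimeMakeWithSeconds):
// Sketch: use the caller-supplied timestamp; flush only when tearing down.
CMTime presentationTimeStamp = CMTimeMakeWithSeconds(timeStamp, 1000000);
OSStatus statusCode = VTCompressionSessionEncodeFrame(session,
                                                      pixelBuffer,
                                                      presentationTimeStamp,
                                                      kCMTimeInvalid, // unknown duration
                                                      NULL, NULL, NULL);
// When finishing the stream (not once per frame):
// VTCompressionSessionCompleteFrames(session, kCMTimeInvalid);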
- (void) initEncoder:(size_t)width height:(size_t)height
{
OSStatus status = VTCompressionSessionCreate(NULL, (int)width, (int)height, kCMVideoCodecType_H264, NULL, NULL, NULL, OutputCallback, NULL, &session);
NSLog(@"VTCompressionSessionCreate %d", (int)status);
if (status != noErr)
{
NSLog(@"Unable to create a H264 session");
return ;
}
VTSessionSetProperty(session, kVTCompressionPropertyKey_RealTime, kCFBooleanTrue);
VTSessionSetProperty(session, kVTCompressionPropertyKey_ProfileLevel, kVTProfileLevel_H264_Baseline_AutoLevel);
VTCompressionSessionPrepareToEncodeFrames(session);
}
Get sampleBuffer from callback, convert it to elementary stream
void OutputCallback(void *outputCallbackRefCon, void *sourceFrameRefCon, OSStatus status, VTEncodeInfoFlags infoFlags,CMSampleBufferRef sampleBuffer)
{
if (status != noErr) {
NSLog(@"Error encoding video, err=%lld", (int64_t)status);
return;
}
if (!CMSampleBufferDataIsReady(sampleBuffer))
{
NSLog(@"didCompressH264 data is not ready");
return;
}
// In this example we will use a NSMutableData object to store the
// elementary stream.
NSMutableData *elementaryStream = [NSMutableData data];
// This is the start code that we will write to
// the elementary stream before every NAL unit
static const size_t startCodeLength = 4;
static const uint8_t startCode[] = {0x00, 0x00, 0x00, 0x01};
// Write the SPS and PPS NAL units to the elementary stream
CMFormatDescriptionRef description = CMSampleBufferGetFormatDescription(sampleBuffer);
// Find out how many parameter sets there are
size_t numberOfParameterSets;
int AVCCHeaderLength;
CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description,
0, NULL, NULL,
&numberOfParameterSets,
&AVCCHeaderLength);
// Write each parameter set to the elementary stream
for (int i = 0; i < numberOfParameterSets; i++) {
const uint8_t *parameterSetPointer;
int NALUnitHeaderLengthOut = 0;
size_t parameterSetLength;
CMVideoFormatDescriptionGetH264ParameterSetAtIndex(description,
i,
&parameterSetPointer,
&parameterSetLength,
NULL, &NALUnitHeaderLengthOut);
// Write the parameter set to the elementary stream
[elementaryStream appendBytes:startCode length:startCodeLength];
[elementaryStream appendBytes:parameterSetPointer length:parameterSetLength];
}
// Get a pointer to the raw AVCC NAL unit data in the sample buffer
size_t blockBufferLength;
uint8_t *bufferDataPointer = NULL;
size_t lengthAtOffset = 0;
size_t bufferOffset = 0;
CMBlockBufferGetDataPointer(CMSampleBufferGetDataBuffer(sampleBuffer),
bufferOffset,
&lengthAtOffset,
&blockBufferLength,
(char **)&bufferDataPointer);
// Loop through all the NAL units in the block buffer
// and write them to the elementary stream with
// start codes instead of AVCC length headers
while (bufferOffset < blockBufferLength - AVCCHeaderLength) {
// Read the NAL unit length
uint32_t NALUnitLength = 0;
memcpy(&NALUnitLength, bufferDataPointer + bufferOffset, AVCCHeaderLength);
// Convert the length value from Big-endian to Little-endian
NALUnitLength = CFSwapInt32BigToHost(NALUnitLength);
// Write start code to the elementary stream
[elementaryStream appendBytes:startCode length:startCodeLength];
// Write the NAL unit without the AVCC length header to the elementary stream
[elementaryStream appendBytes:bufferDataPointer + bufferOffset + AVCCHeaderLength
length:NALUnitLength];
// Move to the next NAL unit in the block buffer
bufferOffset += AVCCHeaderLength + NALUnitLength;
}
char *bytePtr = (char *)[elementaryStream mutableBytes];
long maxSize = (long)elementaryStream.length;
CMTime presentationtime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);
vidplayer_stream(bytePtr, maxSize, (long)presentationtime.value);
}
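As a further aside, the callback above prepends the SPS and PPS to every frame. If that should only happen on keyframes, the sync status can be read from the sample attachments; a small hedged sketch using standard CoreMedia calls:
// Treat the sample as a keyframe unless the kCMSampleAttachmentKey_NotSync
// attachment is present on its first (and only) sample.
static BOOL IsKeyframe(CMSampleBufferRef sampleBuffer)
{
    CFArrayRef attachments = CMSampleBufferGetSampleAttachmentsArray(sampleBuffer, NO);
    if (attachments == NULL || CFArrayGetCount(attachments) == 0) return YES;
    CFDictionaryRef attachment = (CFDictionaryRef)CFArrayGetValueAtIndex(attachments, 0);
    return !CFDictionaryContainsKey(attachment, kCMSampleAttachmentKey_NotSync);
}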

Muxing AAC audio and h.264 video streams to mp4 with AVFoundation

For OS X and iOS, I have streams of real-time encoded video (H.264) and audio (AAC) data coming in, and I want to be able to mux these together into an mp4.
I'm using an AVAssetWriter to perform the muxing.
I have video working, but my audio still sounds like jumbled static. Here's what I'm trying right now (skipping some of the error checks here for brevity):
I initialize the writer:
NSURL *url = [NSURL fileURLWithPath:mContext->filename];
NSError* err = nil;
mContext->writer = [AVAssetWriter assetWriterWithURL:url fileType:AVFileTypeMPEG4 error:&err];
I initialize the audio input:
NSDictionary* settings;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
settings = nil; // set output to nil so it becomes a pass-through
CMAudioFormatDescriptionRef audioFormatDesc = nil;
{
AudioStreamBasicDescription absd = {0};
absd.mSampleRate = mParameters.audioSampleRate; //known sample rate
absd.mFormatID = kAudioFormatMPEG4AAC;
absd.mFormatFlags = kMPEG4Object_AAC_Main;
CMAudioFormatDescriptionCreate(NULL, &absd, 0, NULL, 0, NULL, NULL, &audioFormatDesc);
}
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:settings sourceFormatHint:audioFormatDesc];
mContext->aacWriterInput.expectsMediaDataInRealTime = YES;
[mContext->writer addInput:mContext->aacWriterInput];
And start the writer:
[mContext->writer startWriting];
[mContext->writer startSessionAtSourceTime:kCMTimeZero];
Then, I have a callback where I receive a packet with a timestamp (milliseconds), and a std::vector<uint8_t> with the data containing 1024 compressed samples. I make sure isReadyForMoreMediaData is true. Then, if this is our first time receiving the callback, I set up the CMAudioFormatDescription:
OSStatus error = 0;
AudioStreamBasicDescription streamDesc = {0};
streamDesc.mSampleRate = mParameters.audioSampleRate;
streamDesc.mFormatID = kAudioFormatMPEG4AAC;
streamDesc.mFormatFlags = kMPEG4Object_AAC_Main;
streamDesc.mChannelsPerFrame = 2; // always stereo for us
streamDesc.mBitsPerChannel = 0;
streamDesc.mBytesPerFrame = 0;
streamDesc.mFramesPerPacket = 1024; // Our AAC packets contain 1024 samples per frame
streamDesc.mBytesPerPacket = 0;
streamDesc.mReserved = 0;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
error = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &streamDesc, sizeof(acl), &acl, 0, NULL, NULL, &mContext->audioFormat);
And finally, I create a CMSampleBufferRef and send it along:
CMSampleBufferRef buffer = NULL;
CMBlockBufferRef blockBuffer;
CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, NULL, packet.data.size(), kCFAllocatorDefault, NULL, 0, packet.data.size(), kCMBlockBufferAssureMemoryNowFlag, &blockBuffer);
CMBlockBufferReplaceDataBytes((void*)packet.data.data(), blockBuffer, 0, packet.data.size());
CMTime duration = CMTimeMake(1024, mParameters.audioSampleRate);
CMTime pts = CMTimeMake(packet.timestamp, 1000);
CMSampleTimingInfo timing = {duration , pts, kCMTimeInvalid };
size_t sampleSizeArray[1] = {packet.data.size()};
error = CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, nullptr, mContext->audioFormat, 1, 1, &timing, 1, sampleSizeArray, &buffer);
// First input buffer must have an appropriate kCMSampleBufferAttachmentKey_TrimDurationAtStart since the codec has encoder delay
if (mContext->firstAudioFrame)
{
CFDictionaryRef dict = NULL;
dict = CMTimeCopyAsDictionary(CMTimeMake(1024, 44100), kCFAllocatorDefault);
CMSetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, dict, kCMAttachmentMode_ShouldNotPropagate);
// we must trim the start time on first audio frame...
mContext->firstAudioFrame = false;
}
CMSampleBufferMakeDataReady(buffer);
BOOL ret = [mContext->aacWriterInput appendSampleBuffer:buffer];
I guess the part I'm most suspicious of is my call to CMSampleBufferCreate. It seems I have to pass in a sample sizes array, otherwise I get this error message immediately when checking my writer's status:
Error Domain=AVFoundationErrorDomain Code=-11800 "The operation could not be completed" UserInfo={NSLocalizedFailureReason=An unknown error occurred (-12735), NSLocalizedDescription=The operation could not be completed, NSUnderlyingError=0x604001e50770 {Error Domain=NSOSStatusErrorDomain Code=-12735 "(null)"}}
Where the underlying error appears to be kCMSampleBufferError_BufferHasNoSampleSizes.
I did notice an example in Apple's documentation for creating the buffer with AAC data:
https://developer.apple.com/documentation/coremedia/1489723-cmsamplebuffercreate?language=objc
In their example, they specify a long sampleSizeArray with an entry for every single sample. Is that necessary? I don't have that information with this callback. And in our Windows implementation we didn't need that data. So I tried sending in packet.data.size() as the sample size but that doesn't seem right and it certainly doesn't produce pleasant audio.
Any ideas? Either tweaks to my calls here or different APIs I should be using to mux together streams of encoded data.
Thanks!
If you don't want to transcode, do not pass the outputSettings dictionary. You should pass nil there:
mContext->aacWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:nil sourceFormatHint:audioFormatDesc];
It is explained somewhere in this article:
https://developer.apple.com/library/archive/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/05_Export.html

Using CMSampleTimingInfo, CMSampleBuffer and AudioBufferList from raw PCM stream

I'm receiving a raw PCM stream from Google's WebRTC C++ reference implementation (a hook inserted into VoEBaseImpl::GetPlayoutData). The audio appears to be linear PCM, signed int16, but when I record it using an AVAssetWriter, the saved audio file is highly distorted and higher-pitched.
I am assuming this is an error somewhere in the input parameters, most probably in the conversion of the stereo int16 samples to an AudioBufferList and then on to a CMSampleBuffer. Is there any issue with the code below?
void RecorderImpl::RenderAudioFrame(void* audio_data, size_t number_of_frames, int sample_rate, int64_t elapsed_time_ms, int64_t ntp_time_ms) {
OSStatus status;
AudioChannelLayout acl;
bzero(&acl, sizeof(acl));
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
AudioStreamBasicDescription audioFormat;
audioFormat.mSampleRate = sample_rate;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kLinearPCMFormatFlagIsSignedInteger | kLinearPCMFormatFlagIsPacked;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 2;
audioFormat.mBitsPerChannel = 16;
audioFormat.mBytesPerPacket = audioFormat.mFramesPerPacket * audioFormat.mChannelsPerFrame * audioFormat.mBitsPerChannel / 8;
audioFormat.mBytesPerFrame = audioFormat.mBytesPerPacket / audioFormat.mFramesPerPacket;
CMSampleTimingInfo timing = { CMTimeMake(1, sample_rate), CMTimeMake(elapsed_time_ms, 1000), kCMTimeInvalid };
CMFormatDescriptionRef format = NULL;
status = CMAudioFormatDescriptionCreate(kCFAllocatorDefault, &audioFormat, sizeof(acl), &acl, 0, NULL, NULL, &format);
if(status != 0) {
NSLog(@"Failed to create audio format description");
return;
}
CMSampleBufferRef buffer;
status = CMSampleBufferCreate(kCFAllocatorDefault, NULL, false, NULL, NULL, format, (CMItemCount)number_of_frames, 1, &timing, 0, NULL, &buffer);
if(status != 0) {
NSLog(@"Failed to allocate sample buffer");
return;
}
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame;
bufferList.mBuffers[0].mDataByteSize = (UInt32)(number_of_frames * audioFormat.mBytesPerFrame);
bufferList.mBuffers[0].mData = audio_data;
status = CMSampleBufferSetDataBufferFromAudioBufferList(buffer, kCFAllocatorDefault, kCFAllocatorDefault, 0, &bufferList);
if(status != 0) {
NSLog(@"Failed to convert audio buffer list into sample buffer");
return;
}
[recorder writeAudioFrames:buffer];
CFRelease(buffer);
}
For reference, the sample rate I'm receiving from WebRTC on an iPhone 6S+ / iOS 9.2 is 48kHz with 480 samples per invocation of this hook and I'm receiving data every 10 ms.
First of all, congratulations on having the temerity to create an audio CMSampleBuffer from scratch. For most, they are neither created nor destroyed, but handed down immaculate and mysterious from CoreMedia and AVFoundation.
The presentationTimeStamps in your timing info are in integral milliseconds, which cannot represent your 48kHz samples' positions in time.
Instead of CMTimeMake(elapsed_time_ms, 1000), try CMTimeMake(elapsed_frames, sample_rate), where elapsed_frames are the number of frames that you have previously written.
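In other words, a minimal sketch of that bookkeeping (totalFramesWritten is an assumed counter kept alongside the recorder, not something from the original code):
// Running count of PCM frames already handed to the writer.
static int64_t totalFramesWritten = 0;

CMSampleTimingInfo timing = {
    CMTimeMake(1, sample_rate),                  // duration of a single frame
    CMTimeMake(totalFramesWritten, sample_rate), // sample-accurate presentation time
    kCMTimeInvalid
};
totalFramesWritten += number_of_frames;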
That would explain the distortion, but not the pitch, so make sure that the AudioStreamBasicDescription matches your AVAssetWriterInput setup. It's hard to say without seeing your AVAssetWriter code.
P.S. Look out for writeAudioFrames - if it's asynchronous, you'll have problems with ownership of the audio_data.
P.P.S. It looks like you're leaking the CMFormatDescriptionRef.
I ended up opening the generated audio file in Audacity and saw that half of every frame had been dropped, which produced a rather bizarre-looking waveform.
Changing acl.mChannelLayoutTag to kAudioChannelLayoutTag_Mono and changing audioFormat.mChannelsPerFrame to 1 solved the issue, and now the audio quality is perfect. Hooray!

Decoding H264 VideoToolkit API fails with Error -12911 in VTDecompressionSessionDecodeFrame

I'm trying to decode a raw stream of H.264 video data, but I can't find a way to set up the decoder properly. Here is my decode method:
- (void)decodeFrameWithNSData:(NSData*)data presentationTime:
(CMTime)presentationTime
{
#autoreleasepool {
CMSampleBufferRef sampleBuffer = NULL;
CMBlockBufferRef blockBuffer = NULL;
VTDecodeInfoFlags infoFlags;
int sourceFrame;
if( dSessionRef == NULL )
[self createDecompressionSession];
CMSampleTimingInfo timingInfo ;
timingInfo.presentationTimeStamp = presentationTime;
timingInfo.duration = CMTimeMake(1,100000000);
timingInfo.decodeTimeStamp = kCMTimeInvalid;
//Creates block buffer from NSData
OSStatus status = CMBlockBufferCreateWithMemoryBlock(CFAllocatorGetDefault(), (void*)data.bytes,data.length*sizeof(char), CFAllocatorGetDefault(), NULL, 0, data.length*sizeof(char), 0, &blockBuffer);
//Creates CMSampleBuffer to feed decompression session
status = CMSampleBufferCreateReady(CFAllocatorGetDefault(), blockBuffer,self.encoderVideoFormat,1,1,&timingInfo, 0, 0, &sampleBuffer);
status = VTDecompressionSessionDecodeFrame(dSessionRef,sampleBuffer, kVTDecodeFrame_1xRealTimePlayback, &sourceFrame,&infoFlags);
if(status != noErr) {
NSLog(@"Decode with data error %d", status);
}
}
}
At the end of the call I'm getting a -12911 error from VTDecompressionSessionDecodeFrame, which translates to kVTVideoDecoderMalfunctionErr. After reading this [post], it seems I should create a VideoFormatDescription using CMVideoFormatDescriptionCreateFromH264ParameterSets. But how can I create a new VideoFormatDescription if I don't have the currentSps or currentPps? How can I get that information from my raw H.264 stream?
CMFormatDescriptionRef decoderFormatDescription;
const uint8_t* const parameterSetPointers[2] =
{ (const uint8_t*)[currentSps bytes], (const uint8_t*)[currentPps bytes] };
const size_t parameterSetSizes[2] =
{ [currentSps length], [currentPps length] };
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL,
2,
parameterSetPointers,
parameterSetSizes,
4,
&decoderFormatDescription);
Thanks in advance,
Marcos
[post] : Decoding H264 VideoToolkit API fails with Error -8971 in VTDecompressionSessionCreate
You MUST call CMVideoFormatDescriptionCreateFromH264ParameterSets first. The SPS/PPS may be stored/transmitted separately from the video stream, or they may come inline.
Note that for VTDecompressionSessionDecodeFrame your NALUs must be preceded with a size, and not a start code.
You can read more here:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
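If the SPS and PPS do come inline in an Annex-B stream, they can be pulled out by scanning for start codes and checking the NAL unit type (the low five bits of the first payload byte; 7 = SPS, 8 = PPS). A rough sketch, not production parsing code, assuming the whole access unit sits in one contiguous buffer:
// Scans an Annex-B buffer (00 00 01 or 00 00 00 01 start codes) for SPS/PPS
// NAL units. Illustrative only: minimal bounds checking, no validation.
static void FindParameterSets(const uint8_t *buf, size_t len,
                              NSData **spsOut, NSData **ppsOut)
{
    size_t payloadStart = 0;
    BOOL inUnit = NO;
    for (size_t i = 0; i + 2 < len; ) {
        BOOL sc3 = (buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 1);
        BOOL sc4 = (i + 3 < len && buf[i] == 0 && buf[i+1] == 0 && buf[i+2] == 0 && buf[i+3] == 1);
        if (sc3 || sc4) {
            if (inUnit && i > payloadStart) {            // close the previous NAL unit at i
                uint8_t nalType = buf[payloadStart] & 0x1F;
                NSData *unit = [NSData dataWithBytes:buf + payloadStart length:i - payloadStart];
                if (nalType == 7 && spsOut) *spsOut = unit;   // 7 = SPS
                if (nalType == 8 && ppsOut) *ppsOut = unit;   // 8 = PPS
            }
            i += sc4 ? 4 : 3;
            payloadStart = i;
            inUnit = YES;
        } else {
            i++;
        }
    }
    if (inUnit && len > payloadStart) {                  // last NAL unit runs to the end
        uint8_t nalType = buf[payloadStart] & 0x1F;
        NSData *unit = [NSData dataWithBytes:buf + payloadStart length:len - payloadStart];
        if (nalType == 7 && spsOut) *spsOut = unit;
        if (nalType == 8 && ppsOut) *ppsOut = unit;
    }
}
The NSData objects found this way can then be used as currentSps/currentPps in the CMVideoFormatDescriptionCreateFromH264ParameterSets call shown in the question.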

Decoding H264 VideoToolkit API fails with Error -8971 in VTDecompressionSessionCreate

I am trying to write a Video decoder using the Hardware supported Video Toolkit Decoder. But if I try to initialize the decoding session like in the example posted below, I get the error -8971 while calling VTDecompressionSessionCreate. Can anyone tell me what I am doing wrong here?
Thank you and best regards,
Oliver
OSStatus status;
int tmpWidth = sps.EncodedWidth();
int tmpHeight = sps.EncodedHeight();
NSLog(@"Got new Width and Height from SPS - %dx%d", tmpWidth, tmpHeight);
const VTDecompressionOutputCallbackRecord callback = { ReceivedDecompressedFrame, self };
status = CMVideoFormatDescriptionCreate(NULL,
kCMVideoCodecType_H264,
tmpWidth,
tmpHeight,
NULL,
&decoderFormatDescription);
if (status == noErr)
{
// Set the pixel attributes for the destination buffer
CFMutableDictionaryRef destinationPixelBufferAttributes = CFDictionaryCreateMutable(
NULL, // CFAllocatorRef allocator
0, // CFIndex capacity
&kCFTypeDictionaryKeyCallBacks,
&kCFTypeDictionaryValueCallBacks);
SInt32 destinationPixelType = kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange;
CFDictionarySetValue(destinationPixelBufferAttributes,kCVPixelBufferPixelFormatTypeKey, CFNumberCreate(NULL, kCFNumberSInt32Type, &destinationPixelType));
CFDictionarySetValue(destinationPixelBufferAttributes,kCVPixelBufferWidthKey, CFNumberCreate(NULL, kCFNumberSInt32Type, &tmpWidth));
CFDictionarySetValue(destinationPixelBufferAttributes, kCVPixelBufferHeightKey, CFNumberCreate(NULL, kCFNumberSInt32Type, &tmpHeight));
CFDictionarySetValue(destinationPixelBufferAttributes, kCVPixelBufferOpenGLCompatibilityKey, kCFBooleanTrue);
// Set the Decoder Parameters
CFMutableDictionaryRef decoderParameters = CFDictionaryCreateMutable(
NULL, // CFAllocatorRef allocator
0, // CFIndex capacity
&kCFTypeDictionaryKeyCallBacks,
&kCFTypeDictionaryValueCallBacks);
CFDictionarySetValue(decoderParameters,kVTDecompressionPropertyKey_RealTime, kCFBooleanTrue);
// Create the decompression session
// Throws Error -8971 (codecExtensionNotFoundErr)
status = VTDecompressionSessionCreate(NULL, decoderFormatDescription, decoderParameters, destinationPixelBufferAttributes, &callback, &decoderDecompressionSession);
// release the dictionaries
CFRelease(destinationPixelBufferAttributes);
CFRelease(decoderParameters);
// Check the Status
if(status != noErr)
{
NSLog(@"Error %d while creating Video Decompression Session.", (int)status);
continue;
}
}
else
{
NSLog(@"Error %d while creating Video Format Description.", (int)status);
continue;
}
I also stumbled over kVTVideoDecoderBadDataErr. In my case I was replacing the 0x00000001 start code with the size of the NAL unit, but that size included the 4 bytes of the header itself, and that was the reason. I changed the size so it does not include those 4 bytes (frame_size = sizeof(NAL) - 4). This size should be encoded in big-endian.
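To illustrate that point with a hedged sketch (nal and nalLength are assumed to describe a single Annex-B NAL unit whose first four bytes are the 00 00 00 01 start code):
// Overwrite the 4-byte start code with a 4-byte big-endian length that
// counts only the payload, not the prefix itself.
uint32_t payloadLength = (uint32_t)(nalLength - 4);
uint32_t bigEndianLength = CFSwapInt32HostToBig(payloadLength);
memcpy(nal, &bigEndianLength, sizeof(bigEndianLength));
// 'nal' is now AVCC-framed and can be wrapped in a CMBlockBuffer for decoding.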
You need to create the CMFormatDescriptionRef from your SPS and PPS like
CMFormatDescriptionRef decoderFormatDescription;
const uint8_t* const parameterSetPointers[2] = { (const uint8_t*)[currentSps bytes], (const uint8_t*)[currentPps bytes] };
const size_t parameterSetSizes[2] = { [currentSps length], [currentPps length] };
status = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL,
2,
parameterSetPointers,
parameterSetSizes,
4,
&decoderFormatDescription);
Also, if you are getting your video data in Annex-B format, you need to remove the start code and replace it with the 4-byte size information so the decoder recognizes it as AVCC-formatted (that's what the 5th parameter to CMVideoFormatDescriptionCreateFromH264ParameterSets, the NAL unit header length, is for).
@Joride
Refer to http://www.szatmary.org/blog/25
It explains that the header (first) byte of each NAL unit describes that unit's type. You need to mask off those bits and compare them to the table provided; note the comment about the bit fields. Mask the byte with 0x1F to get the type value.
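For example, assuming nalu points at the first payload byte of a NAL unit (right after the start code or length prefix):
uint8_t nalType = nalu[0] & 0x1F;   // low five bits = nal_unit_type
// Common values: 1 = non-IDR slice, 5 = IDR slice, 7 = SPS, 8 = PPS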
