CMSampleBufferRef pool to write H.264 AVCC stream - iOS

I'm using AVAssetWriter/AVAssetWriterInput to write raw H.264 data to an MP4 file. As I'm receiving the data from a remote server, I use the following CoreMedia APIs to get a sample buffer (CMSampleBufferRef) containing the H.264 data in AVCC format, which is in turn appended to the MP4 file by sending the AVAssetWriterInput the message (BOOL)appendSampleBuffer:(CMSampleBufferRef)sampleBuffer:
CMBlockBufferCreateWithMemoryBlock to create a memory block
CMBlockBufferReplaceDataBytes to write the H.264 in AVCC format to the memory block
CMSampleBufferCreate to create a sample buffer with the memory block and a format descriptor containing the H.264 "extradata"
Everything works as expected; the only problem with this approach is that I'm periodically calling the above APIs, and what I would really like is to be able to reuse the allocated resources - in particular the CMSampleBufferRef and the CMBlockBufferRef. Basically, I would like to have a pool of CMSampleBuffers and be able to update their memory contents and format descriptors as I receive new H.264 data from the remote server.
I know that AVAssetWriterInputPixelBufferAdaptor exists and gives access to a CVPixelBufferPool, but I can't use it in my case because, as far as I know, to properly instantiate a pixel buffer adaptor I need at minimum to be able to pass the video frame dimensions, which I wouldn't know until I parse the stream. Further, I don't know how to write the H.264 "extradata" with a CVPixelBuffer. So, I'm thinking that I need to stick with CMSampleBuffer. Unfortunately, it seems that the CoreMedia APIs don't offer the possibility to update the memory block or the format descriptor of a sample buffer once it has been created (as far as I can tell, I only have access to immutable references to those objects). Thus, the best I can do so far is to reuse the memory block CMBlockBufferRef, but I'm still recreating the sample buffer. My code is below. Hopefully someone here will have some ideas on how to implement a pool of CMSampleBuffers, or perhaps a more efficient way to write an H.264 AVCC stream to MP4?
- (CMSampleBufferRef)sampleBufferWithData:(NSData*)data formatDescriptor:(CMFormatDescriptionRef)formatDescription
{
    OSStatus result;
    CMSampleBufferRef sampleBuffer = NULL;

    // _blockBuffer is a CMBlockBufferRef instance variable
    if (!_blockBuffer)
    {
        size_t blockLength = MAX_LENGTH;
        result = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                                    NULL,
                                                    blockLength,
                                                    kCFAllocatorDefault,
                                                    NULL,
                                                    0,
                                                    blockLength,
                                                    kCMBlockBufferAssureMemoryNowFlag,
                                                    &_blockBuffer);
        // check error
    }

    result = CMBlockBufferReplaceDataBytes([data bytes], _blockBuffer, 0, [data length]);
    // check error

    const size_t sampleSizes = [data length];
    CMSampleTimingInfo timing = [self sampleTimingInfo];

    result = CMSampleBufferCreate(kCFAllocatorDefault,
                                  _blockBuffer,
                                  YES,
                                  NULL,
                                  NULL,
                                  formatDescription,
                                  1,
                                  1,
                                  &timing,
                                  1,
                                  &sampleSizes,
                                  &sampleBuffer);
    // check error

    return sampleBuffer;
}

If you are receiving raw H.264 data, then there is not much to do and no need to deal with CoreMedia at all.
Buffer all VCL NAL units until you get the SPS/PPS NAL units. Create the extradata from them, then just append all buffered and new VCL NAL units to the file. In case you receive the NAL units in Annex B format, you need to convert them to AVCC format (basically replacing the start code with a length code), as in the sketch below.
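A minimal sketch of that Annex B to AVCC conversion (an illustration, not part of the answer above; it assumes a 4-byte start code, and the helper name is made up):

#import <Foundation/Foundation.h>

// Hypothetical helper: converts one Annex B NAL unit (start code + payload)
// into its AVCC form (4-byte big-endian length prefix + payload).
// Assumes a 4-byte start code (0x00 0x00 0x00 0x01); a real parser must also
// handle 3-byte start codes.
static NSData *AVCCDataFromAnnexBNALUnit(NSData *annexBNALUnit)
{
    const size_t startCodeLength = 4;
    if (annexBNALUnit.length <= startCodeLength) {
        return nil;
    }

    size_t payloadLength = annexBNALUnit.length - startCodeLength;
    uint32_t bigEndianLength = CFSwapInt32HostToBig((uint32_t)payloadLength);

    NSMutableData *avccData = [NSMutableData dataWithCapacity:payloadLength + sizeof(bigEndianLength)];
    [avccData appendBytes:&bigEndianLength length:sizeof(bigEndianLength)];
    [avccData appendBytes:((const uint8_t *)annexBNALUnit.bytes + startCodeLength)
                   length:payloadLength];
    return avccData;
}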
You only need to work with CMSampleBuffer if you want to encode uncompressed pictures or decode compressed pictures. As you are already working with a raw H.264 stream and just want to write it into an MP4 file, just do so. No need to touch CoreMedia at all here.
Regarding CoreMedia: you wrap your video information in a CMBlockBuffer. This buffer, together with a CMVideoFormatDescription (generated from the SPS/PPS) plus a CMTime, makes up a CMSampleBuffer. And multiple CMSampleBuffers make up a CMSampleBufferPool.
CVPixelBuffer and CVPixelBufferPool are not involved. These are either the input or the output of a VTCompressionSession or VTDecompressionSession when dealing with encoding/decoding H.264 video.
As said, in your case there is no need to touch any of the core frameworks at all, as you are just creating a file.
An overview of the Annex B and AVCC stream formats can be found here: Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream

Related

How do I write AVAudioCompressedBuffer to an AAC file in iOS?

I've successfully converted the AVAudioPCMBuffers from AVAudioEngine into AVAudioCompressedBuffers with an AAC format. Now I'm trying to write those buffers to a file but don't know how to do it. AVAudioFile only accepts AVAudioPCMBuffers.
Any help would be greatly appreciated!
The easiest methods are to write to AVAudioFile before converting to compressed, or to convert back to PCM buffer and write to AVAudioFile.
If the easy methods are not an option, I believe you are stuck using Audio File Services. You would use the data pointer from the AVAudioCompressedBuffer's AudioBufferList property as the inBuffer argument to AudioFileWriteBytes. Interacting with the C API can get ugly fast, but it's the only way to write the data straight to an audio file, short of doing it manually.
let data = myAVAudioCompressedBuffer.audioBufferList.pointee.mBuffers.mData
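For illustration only, here is a rough Objective-C sketch of that Audio File Services route. The file type, the variable names, and the use of AudioFileWritePackets (rather than AudioFileWriteBytes, so that the buffer's packet descriptions are preserved) are my assumptions, not something stated in the answer above:

#import <AVFoundation/AVFoundation.h>
#import <AudioToolbox/AudioToolbox.h>

// Sketch: write the AAC packets of an AVAudioCompressedBuffer into an ADTS file.
// 'compressedBuffer' and 'fileURL' are assumed to exist; error handling is omitted.
static void WriteCompressedBufferToADTSFile(AVAudioCompressedBuffer *compressedBuffer, NSURL *fileURL)
{
    AudioStreamBasicDescription asbd = *compressedBuffer.format.streamDescription;

    AudioFileID fileID = NULL;
    AudioFileCreateWithURL((__bridge CFURLRef)fileURL,
                           kAudioFileAAC_ADTSType,
                           &asbd,
                           kAudioFileFlags_EraseFile,
                           &fileID);

    const AudioBufferList *bufferList = compressedBuffer.audioBufferList;
    UInt32 numPackets = compressedBuffer.packetCount;
    AudioFileWritePackets(fileID,
                          false,                               // don't use the cache
                          bufferList->mBuffers[0].mDataByteSize,
                          compressedBuffer.packetDescriptions, // per-packet sizes
                          0,                                   // starting packet (track this across calls in a real loop)
                          &numPackets,
                          bufferList->mBuffers[0].mData);

    AudioFileClose(fileID);
}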
The easiest way I found was to convert the uncompressed AVAudioPCMBuffer to CMSampleBuffer and then use AVAssetWriter to write the audio samples to file. The asset writer can handle format compression to AAC.
I haven't tried this, but you can also probably use the audio buffer list from AVAudioCompressedBuffer and construct CMSampleBuffers.

How to save just raw PCM to file with iOS SDK (Core Audio)?

I'm converting an MP3 file into raw PCM, and I need to save it as just raw PCM. (Note: I am using Java/RoboVM to port to iOS.)
I'm using the coreaudio package, and the relevant part of my code looks like this:
// Define the output PCM format.
AudioStreamBasicDescription outputFormat = new AudioStreamBasicDescription();
outputFormat.setFormat(AudioFormat.LinearPCM);
outputFormat.setFormatFlags(AudioFormatFlags.Canonical);
outputFormat.setBitsPerChannel(16);
outputFormat.setChannelsPerFrame(1);
outputFormat.setFramesPerPacket(1);
outputFormat.setBytesPerFrame(2);
outputFormat.setBytesPerPacket(2);
outputFormat.setSampleRate(22050);
// ...
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
I then run through a loop, reading from the MP3 file and writing to the output file.
Upon importing this raw file into Audacity, I notice it always has a spike at the start, indicating that it's not actually a raw PCM file but instead is inside of a wrapper with a header (whether it be WAV or CAF headers, etc).
I understand I can just take the file and strip the header off and get the raw PCM data, but in terms of space/performance of this part of my app, I'd love if I can just keep it simple and save the raw PCM data as-is without a wrapper, but I don't know how to go about doing that.
The issue arises here:
outputFile = ExtAudioFile.create(outputFileURL, AudioFileType.CAF, outputFormat, null, AudioFileFlags.EraseFile);
There aren't many choices for AudioFileType; I've tried WAVE and CAF. Ideally there would be a PCM or RAW option, but there isn't. Is there a specific AudioFileType I should choose, or do I need to go about this another way?
The extended audio file services framework doesn't support a "raw" PCM format.
For an application to understand a PCM format it needs to know things like:
How many channels are there
Are they interleaved or not
What is the sample rate
Is the data floating point or not
What is the bit depth
etc...
In fact, on iOS and OS X the AudioStreamBasicDescription is a struct which tells you what is required to interpret a PCM stream. For this reason, a "raw PCM" format doesn't really work; it needs at least some metadata. The closest formats to raw PCM are WAV, AIFF and CAF. If these don't serve your purposes you'll have to create a custom file format, but this doesn't need to be difficult.
The extended audio file services APIs are quite configurable. After opening an audio file to read (ExtAudioFileOpenURL) you can set various properties on the ExtAudioFileRef handle.
In your case consider setting kExtAudioFileProperty_ClientDataFormat. This property controls the format of the PCM data read from the file. As ExtAudioFileRead decodes the input file, it will convert the data it sends back to the format you specify. There are some limitations to this method. IIRC, it does not support doing sample rate conversion and things like that.
As you read the properly decoded data, you can then use something like NSOutputStream to write the "raw PCM" format of your choice directly to a file with no metadata at all.
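A sketch of that approach in Objective-C (the client format, buffer size, and function name are assumptions for illustration; error checks are omitted):

#import <AudioToolbox/AudioToolbox.h>
#import <Foundation/Foundation.h>

// Sketch: decode an input file to 16-bit mono integer PCM and write the bare
// samples (no header at all) to 'outputURL' via NSOutputStream.
static void DumpRawPCM(NSURL *inputURL, NSURL *outputURL)
{
    ExtAudioFileRef audioFile = NULL;
    ExtAudioFileOpenURL((__bridge CFURLRef)inputURL, &audioFile);

    // Ask ExtAudioFileRead to hand back 16-bit signed integer mono PCM at 22050 Hz.
    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 22050;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
    clientFormat.mBitsPerChannel   = 16;
    clientFormat.mChannelsPerFrame = 1;
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerFrame    = 2;
    clientFormat.mBytesPerPacket   = 2;
    ExtAudioFileSetProperty(audioFile,
                            kExtAudioFileProperty_ClientDataFormat,
                            sizeof(clientFormat),
                            &clientFormat);

    NSOutputStream *stream = [NSOutputStream outputStreamWithURL:outputURL append:NO];
    [stream open];

    enum { kBufferFrames = 4096 };
    uint8_t buffer[kBufferFrames * 2]; // 2 bytes per frame in the client format

    while (1) {
        AudioBufferList bufferList;
        bufferList.mNumberBuffers              = 1;
        bufferList.mBuffers[0].mNumberChannels = 1;
        bufferList.mBuffers[0].mDataByteSize   = sizeof(buffer);
        bufferList.mBuffers[0].mData           = buffer;

        UInt32 frames = kBufferFrames;
        ExtAudioFileRead(audioFile, &frames, &bufferList);
        if (frames == 0) {
            break; // end of file
        }
        [stream write:buffer maxLength:frames * 2];
    }

    [stream close];
    ExtAudioFileDispose(audioFile);
}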

How to decode a live555 rtsp stream (h.264) MediaSink data using iOS8's VideoToolbox?

OK, I know that this question is almost the same as get-rtsp-stream-from-live555-and-decode-with-avfoundation, but now VideoToolbox has become public for use in iOS 8, and although I know it can be done using this framework, I have no idea how to do it.
My goals are:
Connect with a WiFiCamera using rtsp protocol and receive stream data (Done with live555)
Decode the data and convert to UIImages to display on the screen (motionJPEG like)
And save the streamed data on a .mov file
I reached all of these goals using ffmpeg, but unfortunately I can't use it due to my company's policy.
I know that I can display on the screen using OpenGL too, but this time I have to convert to UIImages. I also tried to use the libraries below:
ffmpeg: can't use this time due to company's policy. (don't ask me why)
libVLC: display lags by about 2 seconds, and I don't have access to the stream data to save it into a .mov file...
gstreamer: same as above
I believe that live555 + VideoToolbox will do the job, I just can't figure out how to make this happen...
I did it. VideoToolbox is still poorly documented and there is not much information about video programming (without using ffmpeg), so it cost me more time than I really expected.
For the stream from live555, I got the SPS and PPS info and created the CMVideoFormatDescription like this:
const uint8_t *props[] = {[spsData bytes], [ppsData bytes]};
size_t sizes[] = {[spsData length], [ppsData length]};
OSStatus result = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL, 2, props, sizes, 4, &videoFormat);
Now, the difficult part (because I'm a noob at video programming): replace the NAL unit header with a 4-byte length code, as described here
int headerEnd = 23; //where the real data starts
uint32_t hSize = (uint32_t)([rawData length] - headerEnd - 4);
uint32_t bigEndianSize = CFSwapInt32HostToBig(hSize);
NSMutableData *videoData = [NSMutableData dataWithBytes:&bigEndianSize length:sizeof(bigEndianSize)];
[videoData appendData:[rawData subdataWithRange:NSMakeRange(headerEnd + 4, [rawData length] - headerEnd - 4)]];
Now I was able to create a CMBlockBuffer successfully using this raw data and pass the buffer to VTDecompressionSessionDecodeFrame (a sketch of this step follows). From here it is easy to convert the resulting CVImageBufferRef to a UIImage... I used this Stack Overflow thread as reference.
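A rough sketch of that wrap-and-decode step, not taken from the answer itself: 'videoFormat' is the format description created above, 'session' is a VTDecompressionSessionRef assumed to have been created with it and an output callback, and 'videoData' is the length-prefixed AVCC data.

#import <Foundation/Foundation.h>
#import <VideoToolbox/VideoToolbox.h>

// Stub output callback: this is where the decoded CVImageBufferRef arrives and
// can be turned into a CIImage/UIImage. It is registered via a
// VTDecompressionOutputCallbackRecord when 'session' is created.
static void DecompressionOutputCallback(void *decompressionOutputRefCon,
                                        void *sourceFrameRefCon,
                                        OSStatus status,
                                        VTDecodeInfoFlags infoFlags,
                                        CVImageBufferRef imageBuffer,
                                        CMTime presentationTimeStamp,
                                        CMTime presentationDuration)
{
    // Convert imageBuffer to an image here.
}

// Sketch: wrap one AVCC-formatted access unit in a CMSampleBuffer and decode it.
static void DecodeAVCCFrame(VTDecompressionSessionRef session,
                            CMVideoFormatDescriptionRef videoFormat,
                            NSData *videoData)
{
    CMBlockBufferRef blockBuffer = NULL;
    CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                       (void *)videoData.bytes, // backed by videoData, which must outlive the buffer
                                       videoData.length,
                                       kCFAllocatorNull,        // don't free the memory block
                                       NULL,
                                       0,
                                       videoData.length,
                                       0,
                                       &blockBuffer);

    CMSampleBufferRef sampleBuffer = NULL;
    const size_t sampleSize = videoData.length;
    CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, NULL,
                         videoFormat, 1, 0, NULL, 1, &sampleSize, &sampleBuffer);

    VTDecodeInfoFlags infoFlags = 0;
    VTDecompressionSessionDecodeFrame(session, sampleBuffer, 0, NULL, &infoFlags);

    CFRelease(sampleBuffer);
    CFRelease(blockBuffer);
}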
And finally, I save the stream data converted to UIImages following the explanation described in How do I export UIImage array as a movie?
I just posted a little bit of my code because I believe this is the important part, or in other words, it is where I was having problems.

How to decode a H.264 frame on iOS by hardware decoding?

I have been using ffmpeg to decode every single frame that I receive from my IP cam. The brief code looks like this:
-(void)decodeFrame:(unsigned char *)frameData frameSize:(int)frameSize
{
    AVFrame frame;
    AVPicture picture;
    AVPacket pkt;
    AVCodecContext *context; // assumed to be configured elsewhere with the H.264 decoder
    int got_picture;
    pkt.data = frameData;
    pkt.size = frameSize;
    avcodec_get_frame_defaults(&frame);
    avpicture_alloc(&picture, PIX_FMT_RGB24, targetWidth, targetHeight);
    avcodec_decode_video2(context, &frame, &got_picture, &pkt);
}
The code works fine, but it's software decoding. I want to improve the decoding performance by using hardware decoding. After lots of research, I know it may be achieved with the AVFoundation framework.
The AVAssetReader class may help, but I can't figure out what comes next. Could anyone point out the following steps for me? Any help would be appreciated.
iOS does not provide any public access directly to the hardware decode engine, because hardware is always used to decode H.264 video on iOS.
Therefore, session 513 gives you all the information you need to allow frame-by-frame decoding on iOS. In short, per that session:
Generate individual network abstraction layer units (NALUs) from your H.264 elementary stream. There is much information on how this is done online. VCL NALUs (IDR and non-IDR) contain your video data and are to be fed into the decoder.
Re-package those NALUs according to the "AVCC" format, removing NALU start codes and replacing them with a 4-byte NALU length header.
Create a CMVideoFormatDescriptionRef from your SPS and PPS NALUs via CMVideoFormatDescriptionCreateFromH264ParameterSets()
Package NALU frames as CMSampleBuffers per session 513.
Create a VTDecompressionSessionRef, and feed VTDecompressionSessionDecodeFrame() with the sample buffers
Alternatively, use AVSampleBufferDisplayLayer, whose -enqueueSampleBuffer: method obviates the need to create your own decoder.
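If display (rather than access to decoded pixels) is the goal, the AVSampleBufferDisplayLayer route from the last step is short. A minimal sketch, assuming it runs inside a view controller and that 'sampleBuffer' is an AVCC-packaged CMSampleBufferRef built as described above:

#import <AVFoundation/AVFoundation.h>

// Sketch: hand AVCC-packaged sample buffers straight to an AVSampleBufferDisplayLayer,
// which decodes and displays them without an explicit VTDecompressionSession.
AVSampleBufferDisplayLayer *displayLayer = [[AVSampleBufferDisplayLayer alloc] init];
displayLayer.frame = self.view.bounds;
displayLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[self.view.layer addSublayer:displayLayer];

// Later, for each sample buffer received from the network:
if (displayLayer.isReadyForMoreMediaData) {
    [displayLayer enqueueSampleBuffer:sampleBuffer];
}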
Edit:
This link provides a more detailed explanation of how to decode H.264 step by step: stackoverflow.com/a/29525001/3156169
Original answer:
I watched session 513, "Direct Access to Video Encoding and Decoding", from WWDC 2014 yesterday, and got the answer to my own question.
The speaker says:
We have Video Toolbox (in iOS 8). Video Toolbox has been there on OS X for a while, but now it's finally populated with headers on iOS. This provides direct access to encoders and decoders.
So, there is no way to do hardware decoding frame by frame in iOS 7, but it can be done in iOS 8.
Has anyone figured out how to directly access video encoding and decoding frame by frame in iOS 8?

How to play audio buffers using `AVCaptureSession` in `didOutputSampleBuffer`

I have been trying to play audio which is received as raw data in the didOutputSampleBuffer delegate. What is the proper way to process the raw data?
Look at the following sample code from Apple: AVCaptureToAudioUnitOSX
There you can see how to properly process the raw audio data and pass it to the AudioUnit.
The basic principle is as follows:
Get the SampleBuffer's AudioStreamBasicDescription for info on format
First get the CMFormatDescriptionRef with CMSampleBufferGetFormatDescription
Then get the AudioStreamBasicDescription with CMAudioFormatDescriptionGetStreamBasicDescription
Now you can get info on sample rate, bits per channel, channels per frame and frames per packet
Get the AudioBufferList with the actual audio data
Either use CoreAudio's Public Utility or check this mailing list entry for a correct way of doing so
The function is called CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer. Its third parameter is bufferListOut, which is the AudioBufferList you want and will pass on to whatever needs it, e.g. the AudioUnit.
Getting the actual raw data
The AudioBufferList contains AudioBuffers each of which contain the data
struct AudioBuffer {
    UInt32 mNumberChannels;
    UInt32 mDataByteSize;
    void   *mData;
};
This should get you going. Look at the sample code from Apple for more info.
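Putting those steps together, here is a sketch of how the delegate method might look inside the capture delegate's @implementation (not taken from the answer above; it assumes interleaved audio so that a single-buffer AudioBufferList on the stack is large enough, and error checks are omitted):

#import <AVFoundation/AVFoundation.h>
#import <CoreMedia/CoreMedia.h>

// Sketch: pull the AudioStreamBasicDescription and AudioBufferList out of the
// sample buffer delivered by AVCaptureSession's audio data output.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // 1. Format information (sample rate, channels, bit depth, ...)
    CMFormatDescriptionRef formatDescription = CMSampleBufferGetFormatDescription(sampleBuffer);
    const AudioStreamBasicDescription *asbd =
        CMAudioFormatDescriptionGetStreamBasicDescription(formatDescription);

    // 2. The actual audio data as an AudioBufferList (retained by 'blockBuffer').
    AudioBufferList audioBufferList;
    CMBlockBufferRef blockBuffer = NULL;
    CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
        sampleBuffer,
        NULL,                     // bufferListSizeNeededOut (not needed for interleaved data)
        &audioBufferList,
        sizeof(audioBufferList),
        NULL, NULL,               // default allocators
        kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
        &blockBuffer);

    for (UInt32 i = 0; i < audioBufferList.mNumberBuffers; i++) {
        AudioBuffer buffer = audioBufferList.mBuffers[i];
        // buffer.mData / buffer.mDataByteSize is the raw audio to hand to an AudioUnit,
        // interpreted according to *asbd.
        (void)buffer;
    }
    (void)asbd;

    if (blockBuffer) {
        CFRelease(blockBuffer);
    }
}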
