Importance of AVAssetWriterInputPixelBufferAdaptor in AVAssetWriter - ios

I'm trying to output video captured from the camera using AVAssetWriter.
I'm following some examples that don't use AVAssetWriterInputPixelBufferAdaptor (Record video with AVAssetWriter), and some that do (AVCaptureSession only got video buffer).
Based on the Apple references, I've interpreted the purpose of AVAssetWriterInputPixelBufferAdaptor (or CVPixelBuffer, CVPixelBufferPool) in general to be an efficient way to buffer incoming pixels in memory. In practice, how important is it to use this when writing video output using AVAssetWriter? I seem to be able to get a basic version working without using the adaptor just fine, but I wanted to understand a bit more the benefit/intent of using AVAssetWriterInputPixelBufferAdaptor in general.

I have been using video recording without the PixelBufferAdaptor for several years without any problems. I essentially use this code:
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection {
    // Only append while the input can accept data; otherwise this frame is skipped.
    if (videoWriterInput.readyForMoreMediaData) {
        [videoWriterInput appendSampleBuffer:sampleBuffer];
    }
}
My take is that since the CMSampleBufferRef contains timing information, it can be written directly, whereas a bare CVPixelBuffer carries none, so you must supply the presentation time through the adaptor. So if you do some image processing before writing, you will end up with a CVPixelBuffer and have to use the adaptor. The adaptor might also add some buffering capabilities for the CVPixelBuffer (it exposes a pixelBufferPool) if your processing takes time.


React native: Real time camera data without image save and preview

I started working on my first non-demo react-native app. I hope it will be an iOS/Android app, but for now I'm focused on iOS only.
I have one problem. How can I get data (base64, an array of pixels, ...) from the camera in real time without saving to the camera roll?
There is this module: but base64 is deprecated and is useless for me, because I want to render a processed image to the user (e.g. with changed colors), not the real picture from the camera as the react-native-camera module does.
(I know how to communicate with Swift code, but I don't know what the options are in native code; I come here from web development.)
Thanks a lot.
This may not be optimal but is what I have been using. If anyone can give a better solution, I would appreciate your help, too!
My basic idea is simply to loop (but not with a simple for-loop, see below), taking still pictures in YUV/RGB format at max resolution, which is reasonably fast (~x0ms with normal exposure duration), and process them. Basically, you set up an AVCaptureStillImageOutput linked to your camera (following the tutorials found everywhere), then set the format to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange (if you want YUV) or kCVPixelFormatType_32BGRA (if you prefer RGBA), like this:
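bool usingYUVFormat = true;
// The pixel format goes under kCVPixelBufferPixelFormatTypeKey in the output settings.
NSDictionary *outputFormat = [NSDictionary dictionaryWithObject:
    [NSNumber numberWithInt:usingYUVFormat ? kCVPixelFormatType_420YpCbCr8BiPlanarFullRange : kCVPixelFormatType_32BGRA]
    forKey:(id)kCVPixelBufferPixelFormatTypeKey];
[yourAVCaptureStillImageOutput setOutputSettings:outputFormat];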
When you are ready, you can start calling
AVCaptureConnection *captureConnection = [yourAVCaptureStillImageOutput connectionWithMediaType:AVMediaTypeVideo];
[yourAVCaptureStillImageOutput captureStillImageAsynchronouslyFromConnection:captureConnection completionHandler:^(CMSampleBufferRef imageDataSampleBuffer, NSError *error) {
    if (imageDataSampleBuffer == NULL) { return; } // capture failed; inspect error
    CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(imageDataSampleBuffer);
    CVPixelBufferLockBaseAddress(imageBuffer, 0);
    // do your magic with the data buffer imageBuffer
    // use CVPixelBufferGetBaseAddressOfPlane(imageBuffer, planeIndex) to get each plane
    // (planes 0 and 1 for the bi-planar YUV format)
    // use CVPixelBufferGetWidth/CVPixelBufferGetHeight to get dimensions
    // if you want more, please google
    CVPixelBufferUnlockBaseAddress(imageBuffer, 0); // balance the lock above
}];
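To make the "get each plane" comments a bit more concrete, here is a minimal sketch in plain C (which also compiles inside an Objective-C file) of how a single pixel could be looked up in the bi-planar YUV layout. The struct and the helper name sample_at are made up for illustration. The sketch assumes plane 0 holds full-resolution luma and plane 1 holds interleaved Cb/Cr at half resolution in both directions, and that the pointers and strides come from CVPixelBufferGetBaseAddressOfPlane and CVPixelBufferGetBytesPerRowOfPlane while the buffer is locked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One pixel's components as stored in a 4:2:0 bi-planar buffer. */
typedef struct {
    uint8_t y;   /* luma, read from plane 0 */
    uint8_t cb;  /* blue-difference chroma, read from plane 1 */
    uint8_t cr;  /* red-difference chroma, read from plane 1 */
} YCbCrSample;

/* Hypothetical helper: fetch the sample at column x, row y.
 * Rows may be padded, so index with the stride (bytes per row), not the width. */
static YCbCrSample sample_at(const uint8_t *y_plane, size_t y_stride,
                             const uint8_t *cbcr_plane, size_t cbcr_stride,
                             size_t x, size_t y)
{
    assert(y_plane != NULL && cbcr_plane != NULL);

    YCbCrSample s;
    s.y = y_plane[y * y_stride + x];

    /* Each 2x2 block of pixels shares one Cb/Cr pair; the pair is stored
     * as two consecutive bytes, Cb first. */
    const uint8_t *pair = cbcr_plane + (y / 2) * cbcr_stride + (x / 2) * 2;
    s.cb = pair[0];
    s.cr = pair[1];
    return s;
}
```

For the BGRA format there is only one plane, so CVPixelBufferGetBaseAddress and CVPixelBufferGetBytesPerRow would be the calls to use instead.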
Additionally, use NSNotificationCenter to register your photo-taking action, and post a notification after you have processed each frame (with some delay perhaps, to cap your throughput and reduce power consumption) so the loop keeps going.
A quick precaution: the Android counterpart is a much worse headache. Few hardware manufacturers implement an API for max-resolution uncompressed photos; most offer uncompressed frames only at 1080p for preview/video, as I have raised in my question. I am still looking for solutions but have given up most hope. JPEG images are just too slow.

Muxing compressed frames from VTCompressionSession with audio data into an MPEG2-TS container for network streaming

I'm working on a project that involves grabbing H.264 encoded frames from VTCompressionSession in iOS 8, muxing them with live AAC or PCM audio from the microphone into a playable MPEG2-TS, and streaming that over a socket in real time with minimum delay (i.e. almost no buffering).
After watching the presentation for the new VideoToolbox in iOS8 and doing some research I guess it's safe to assume that:
The encoded frames you get from VTCompressionSession are not in Annex B format, so I need to convert them somehow. All of the explanations I've seen so far are too vague (e.g. "replace the 3 or 4 byte header with a length header"), so I'm not really sure how to do this.
The encoded frames you get from VTCompressionSession are actually an Elementary Stream. So first I would need to turn them into a Packetized Elementary Stream before it can be muxed.
I would also need an AAC or PCM elementary stream from the microphone data (I presume PCM would be easier since no encoding is involved). Which I don't know how to do either.
In order to mux the Packetized Elementary Streams I would also need some library like libmpegts. Or perhaps ffmpeg (by using libavcodec and libavformat libraries).
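On the Annex B point above, here is a rough sketch in plain C of what the conversion could look like, under the assumption that the encoder output is a sequence of NAL units each prefixed by a 4-byte big-endian length (the prefix size is reported alongside the parameter sets by CMVideoFormatDescriptionGetH264ParameterSetAtIndex, so 4 should be checked rather than taken for granted). Because a 4-byte length and a 4-byte start code are the same size, the rewrite can happen in place. The function name is made up for illustration, and the SPS/PPS, which live in the format description rather than in the frame data, would still have to be emitted separately with their own start codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrites a buffer of 4-byte length-prefixed NAL units in place so that
 * every length field becomes the Annex B start code 00 00 00 01.
 * Returns the number of NAL units converted, or -1 if a length field runs
 * past the end of the buffer (units before the bad one are already rewritten). */
static int avcc_to_annexb_inplace(uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int count = 0;

    while (pos < size) {
        if (size - pos < 4) {
            return -1; /* not enough bytes left for a length field */
        }
        uint32_t nal_len = ((uint32_t)buf[pos] << 24) |
                           ((uint32_t)buf[pos + 1] << 16) |
                           ((uint32_t)buf[pos + 2] << 8) |
                           (uint32_t)buf[pos + 3];
        if (nal_len == 0 || nal_len > size - pos - 4) {
            return -1; /* empty or truncated NAL unit */
        }
        /* Overwrite the length with the 4-byte start code. */
        buf[pos] = 0;
        buf[pos + 1] = 0;
        buf[pos + 2] = 0;
        buf[pos + 3] = 1;

        pos += 4 + (size_t)nal_len;
        count++;
    }
    return count;
}
```

The bytes to feed it would come from the sample buffer's block buffer (for example via CMBlockBufferGetDataPointer), copied into a writable buffer first if the original must stay untouched.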
I'm pretty new to this. Can I get some advice on what would be the right approach to achieve this?
Is there an easier way to implement this using Apple APIs (like AVFoundation)?
Is there any similar project I can take as a reference?
Thanks in advance!
"In order to mux the Packetized Elementary Streams I would also need some library like libmpegts. Or perhaps ffmpeg (by using libavcodec and libavformat libraries)."
From what I can gather, there is no way to mux TS with AVFoundation or related frameworks. While it seems like something one can do manually, I'm trying to use the Bento4 library to accomplish the same task as you. I'm guessing libmpegts, ffmpeg, GPAC, libav, or any other library like that would work too, but I didn't like their APIs.
Basically, I'm following Mp42Ts.cpp, ignoring the Mp4 parts and just looking at the Ts writing parts.
This StackOverflow question outlines how to feed it video and includes an implementation of how to feed it audio. If you have any questions, ping me with a more specific question.
I hope this provides a good starting point for you, though.
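To give a feel for what the "manual" route involves at the lowest level, here is a small sketch in plain C of the fixed 4-byte header that starts every 188-byte transport stream packet. This is not how Bento4 structures its writer; the function and constant names are made up for illustration, and the sketch only covers packets that carry payload with no adaptation field.

```c
#include <stdint.h>

enum {
    TS_PACKET_SIZE = 188,  /* every TS packet has this fixed length */
    TS_SYNC_BYTE   = 0x47  /* first byte of every TS packet */
};

/* Fills in the 4-byte TS packet header.
 *   pid                 13-bit packet identifier of the stream
 *   payload_unit_start  non-zero when a new PES packet (or PSI section) begins here
 *   continuity_counter  per-PID counter, only the low 4 bits are used */
static void ts_write_header(uint8_t out[4], uint16_t pid,
                            int payload_unit_start, uint8_t continuity_counter)
{
    out[0] = TS_SYNC_BYTE;
    /* transport_error = 0, payload_unit_start, priority = 0, PID bits 12..8 */
    out[1] = (uint8_t)((payload_unit_start ? 0x40 : 0x00) | ((pid >> 8) & 0x1F));
    /* PID bits 7..0 */
    out[2] = (uint8_t)(pid & 0xFF);
    /* scrambling = 00, adaptation_field_control = 01 (payload only), counter */
    out[3] = (uint8_t)(0x10 | (continuity_counter & 0x0F));
}
```

A working muxer needs a good deal more than this (PAT and PMT tables, PES headers carrying timestamps, PCR, and stuffing for short payloads), which is a fair argument for leaning on an existing library.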
"I would also need an AAC or PCM elementary stream from the microphone data (I presume PCM would be easier since no encoding is involved). Which I don't know how to do either."
Getting the microphone data as AAC is very straightforward. Something like this:
NSError *error = nil;
AVCaptureDevice *microphone = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
_audioInput = [AVCaptureDeviceInput deviceInputWithDevice:microphone error:&error];
if (_audioInput == nil) {
    NSLog(@"Couldn't open microphone %@: %@", microphone, error);
    return NO;
}
_audioProcessingQueue = dispatch_queue_create("audio processing queue", DISPATCH_QUEUE_SERIAL);
_audioOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioOutput setSampleBufferDelegate:self queue:_audioProcessingQueue];
NSDictionary *audioOutputSettings = @{
    AVFormatIDKey: @(kAudioFormatMPEG4AAC),
    AVNumberOfChannelsKey: @(1),
    AVSampleRateKey: @(44100.),
    AVEncoderBitRateKey: @(64000),
};
_audioWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioOutputSettings];
_audioWriterInput.expectsMediaDataInRealTime = YES;
if (![_writer canAddInput:_audioWriterInput]) {
    NSLog(@"Couldn't add audio input to writer");
    return NO;
}
[_writer addInput:_audioWriterInput];
[_captureSession addInput:_audioInput];
[_captureSession addOutput:_audioOutput];

- (void)audioCapture:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    /// sampleBuffer contains encoded aac samples.
}
I'm guessing you're using an AVCaptureSession for your camera already; you can use the same capture session for the microphone.

CVPixelBufferRef as a GPU Texture

I have one (or possibly two) CVPixelBufferRef objects I am processing on the CPU, and then placing the results onto a final CVPixelBufferRef. I would like to do this processing on the GPU using GLSL instead because the CPU can barely keep up (these are frames of live video). I know this is possible "directly" (i.e. writing my own OpenGL code), but from the (absolutely impenetrable) sample code I've looked at, it's an insane amount of work.
Two options seem to be:
1) GPUImage: This is an awesome library, but I'm a little unclear if I can do what I want easily. First thing I tried was requesting OpenGLES compatible pixel buffers using this code:
@{ (NSString *)kCVPixelBufferPixelFormatTypeKey : [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA],
   (NSString *)kCVPixelBufferOpenGLESCompatibilityKey : [NSNumber numberWithBool:YES] };
Then transferring data from the CVPixelBufferRef to GPUImageRawDataInput as follows:
// setup:
_foreground = [[GPUImageRawDataInput alloc] initWithBytes:nil size:CGSizeMake(0, 0) pixelFormat:GPUPixelFormatBGRA type:GPUPixelTypeUByte];
// call for each frame:
[_foreground updateDataFromBytes:CVPixelBufferGetBaseAddress(foregroundPixelBuffer)
size:CGSizeMake(CVPixelBufferGetWidth(foregroundPixelBuffer), CVPixelBufferGetHeight(foregroundPixelBuffer))];
However, my CPU usage goes from 7% to 27% on an iPhone 5S just with that line (no processing or anything). This suggests there's some copying going on on the CPU, or something else is wrong. Am I missing something?
2) OpenFrameworks: OF is commonly used for this type of thing, and OF projects can easily be set up to use GLSL. However, two questions remain about this solution: 1. Can I use openFrameworks as a library, or do I have to rejigger my whole app just to use the OpenGL features? I don't see any tutorials or docs that show how I might do this without actually starting from scratch and creating an OF app. 2. Is it possible to use a CVPixelBufferRef as a texture?
I am targeting iOS 7+.
I was able to get this to work using the GPUImageMovie class. If you look inside this class, you'll see that there's a private method called:
- (void)processMovieFrame:(CVPixelBufferRef)movieFrame withSampleTime:(CMTime)currentSampleTime
This method takes a CVPixelBufferRef as input.
To access this method, declare a class extension inside your class that exposes it:
@interface GPUImageMovie ()
- (void)processMovieFrame:(CVPixelBufferRef)movieFrame withSampleTime:(CMTime)currentSampleTime;
@end
Then initialize the class, set up the filter, and pass it your video frame:
GPUImageMovie *gpuMovie = [[GPUImageMovie alloc] initWithAsset:nil]; // <- call initWithAsset even though there's no asset
// to initialize internal data structures
// connect filters...
// Call the method we exposed
[gpuMovie processMovieFrame:myCVPixelBufferRef withSampleTime:kCMTimeZero];
One thing: you need to request your pixel buffers with kCVPixelFormatType_420YpCbCr8BiPlanarFullRange in order to match what the library expects.

Efficient use of Core Image with AV Foundation

I'm writing an iOS app that applies filters to existing video files and outputs the results to new ones. Initially, I tried using Brad Larson's nice framework, GPUImage. Although I was able to output filtered video files without much effort, the output wasn't perfect: the videos were the proper length, but some frames were missing, and others were duplicated (see Issue 1501 for more info). I plan to learn more about OpenGL ES so that I can better investigate the dropped/skipped frames issue. However, in the meantime, I'm exploring other options for rendering my video files.
I'm already familiar with Core Image, so I decided to leverage it in an alternative video-filtering solution. Within a block passed to AVAssetWriterInput requestMediaDataWhenReadyOnQueue:usingBlock:, I filter and output each frame of the input video file like so:
CMSampleBufferRef sampleBuffer = [self.assetReaderVideoOutput copyNextSampleBuffer];
if (sampleBuffer != NULL)
{
    CMTime presentationTimeStamp = CMSampleBufferGetOutputPresentationTimeStamp(sampleBuffer);
    CVPixelBufferRef inputPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage* frame = [CIImage imageWithCVPixelBuffer:inputPixelBuffer];
    // a CIFilter created outside the "isReadyForMoreMediaData" loop
    [screenBlend setValue:frame forKey:kCIInputImageKey];
    CVPixelBufferRef outputPixelBuffer = NULL;
    CVReturn result = CVPixelBufferPoolCreatePixelBuffer(NULL, assetWriterInputPixelBufferAdaptor.pixelBufferPool, &outputPixelBuffer);
    // verify that everything's gonna be ok
    NSAssert(result == kCVReturnSuccess, @"CVPixelBufferPoolCreatePixelBuffer failed with error code");
    NSAssert(CVPixelBufferGetPixelFormatType(outputPixelBuffer) == kCVPixelFormatType_32BGRA, @"Wrong pixel format");
    [self.coreImageContext render:screenBlend.outputImage toCVPixelBuffer:outputPixelBuffer];
    BOOL success = [assetWriterInputPixelBufferAdaptor appendPixelBuffer:outputPixelBuffer withPresentationTime:presentationTimeStamp];
    CVPixelBufferRelease(outputPixelBuffer); // balance the buffer created from the pool
    CFRelease(sampleBuffer);                 // balance copyNextSampleBuffer
    sampleBuffer = NULL;
    completedOrFailed = !success;
}
This works well: the rendering seems reasonably fast, and the resulting video file doesn't have any missing or duplicated frames. However, I'm not confident that my code is as efficient as it could be. Specifically, my questions are
1) Does this approach allow the device to keep all frame data on the GPU, or are there any methods (e.g. imageWithCVPixelBuffer: or render:toCVPixelBuffer:) that prematurely copy pixels to the CPU?
2) Would it be more efficient to use CIContext's drawImage:inRect:fromRect: to draw to an OpenGLES context?
3) If the answer to #2 is yes, what's the proper way to pipe the results of drawImage:inRect:fromRect: into a CVPixelBufferRef so that it can be appended to the output video file?
I've searched for an example of how to use CIContext drawImage:inRect:fromRect: to render filtered video frames, but haven't found any. Notably, the source for GPUImageMovieWriter does something similar, but since a) I don't really understand it yet, and b) it's not working quite right for this use case, I'm wary of copying its solution.

AVFoundation: Video to OpenGL texture working - How to play and sync audio?

I've managed to load a video-track of a movie frame by frame into an OpenGL texture with AVFoundation. I followed the steps described in the answer here: iOS4: how do I use video file as an OpenGL texture?
and took some code from the GLVideoFrame sample from WWDC2010 which can be downloaded here.
How do I play the audio-track of the movie synchronously to the video? I think it would not be a good idea to play it in a separate player, but to use the audio-track of the same AVAsset.
AVAssetTrack* audioTrack = [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
I retrieve a video frame and its timestamp in the CADisplayLink callback via
CMSampleBufferRef sampleBuffer = [self.readerOutput copyNextSampleBuffer];
CMTime timestamp = CMSampleBufferGetPresentationTimeStamp( sampleBuffer );
where readerOutput is of type AVAssetReaderTrackOutput*.
How to get the corresponding audio-samples?
And how to play them?
I've looked around a bit, and I think the best option would be to use AudioQueue from AudioToolbox.framework, following the approach described here: AVAssetReader and Audio Queue streaming problem
There is also an audio-player in the AVFoundation: AVAudioPlayer. But I don't know exactly how I should pass data to its initWithData-initializer which expects NSData. Furthermore, I don't think it's the best choice for my case because a new AVAudioPlayer-instance would have to be created for every new chunk of audio samples, as I understand it.
Any other suggestions?
What's the best way to play the raw audio samples which I get from the AVAssetReaderTrackOutput?
You want to do an AV composition. You can merge multiple media sources, synchronized temporally, into one output.
