How to decode multiple videos simultaneously using AVAssetReader? - ios

I'm trying to decode frames from multiple video files and use them as OpenGL textures.
I know how to decode an H.264 file using an AVAssetReader object, but it seems you have to read the frames in a while loop after you call startReading, while the status is AVAssetReaderStatusReading. What I want to do instead is call startReading and then call copyNextSampleBuffer whenever I want. That way I could build a video reader class around AVAssetReader and load frames from multiple video files whenever I need them as OpenGL textures.
Is this doable?

The short answer is yes, you can decode one frame at a time. You will need to manage the decode logic yourself; the simplest approach is to allocate a buffer of BGRA pixels and copy each decoded frame into that temporary buffer. Be warned that you are unlikely to find a small code snippet that does all of this: streaming data from movies into OpenGL is not easy to implement. I would suggest that you avoid attempting it yourself and use a 3rd party library that already implements the hard parts. If you want to see a complete example of something like this, have a look at my blog post Load OpenGL textures with alpha channel on iOS. That post shows how to stream video into OpenGL, though with that approach you would first need to decode the H.264 to disk. It should also be possible to use other libraries to do the same thing; just keep in mind that playing multiple videos at the same time is resource intensive, so you may quickly run into the limits of your hardware. Also, if you do not actually need OpenGL textures, it is a lot easier to operate on Core Graphics APIs directly on iOS.
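To illustrate the on-demand pattern from the question, here is a minimal Swift sketch of a reader that hands back one decoded BGRA frame per call; the class name, BGRA output settings, and error handling are assumptions, not a drop-in solution:

```swift
import AVFoundation
import CoreMedia
import CoreVideo

// Sketch of a reader that decodes one BGRA frame per call, rather than
// pulling everything in a tight while loop. Class name and error handling
// are illustrative only.
final class OnDemandFrameReader {
    private let reader: AVAssetReader
    private let output: AVAssetReaderTrackOutput

    init?(url: URL) {
        let asset = AVAsset(url: url)
        guard let track = asset.tracks(withMediaType: .video).first,
              let assetReader = try? AVAssetReader(asset: asset) else { return nil }

        // Ask for decoded BGRA pixel buffers, which map directly onto an
        // RGBA/BGRA OpenGL texture upload.
        let settings: [String: Any] = [
            kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA
        ]
        let trackOutput = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
        guard assetReader.canAdd(trackOutput) else { return nil }
        assetReader.add(trackOutput)

        reader = assetReader
        output = trackOutput
        reader.startReading()
    }

    // Call this whenever you need the next frame; returns nil at end of
    // file or if the reader has failed.
    func copyNextPixelBuffer() -> CVPixelBuffer? {
        guard reader.status == .reading,
              let sample = output.copyNextSampleBuffer() else { return nil }
        return CMSampleBufferGetImageBuffer(sample)
    }
}
```

Because each call decodes exactly one frame, you can keep one such reader per video file and pull frames only when a texture is actually needed.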

Related

Most performant method of processing video and writing to file - ios AVFoundation

I want to read in a video asset on disk and do a bunch of processing on it: things like using a CICropFilter on each individual frame and cutting out a mask, splitting up one video into several smaller videos, and removing frames from the original track to "compress" it down and make it more GIF-like.
I've come up with a few possible avenues:
AVAssetWriter and AVAssetReader
In this scenario, I would read in the CMSampleBuffers from file, perform my desired manipulations, then write back to a new file using AVAssetWriter.
AVMutableComposition
Here, given a list of CMTimes, I can easily cut out frames and rewrite the video, or even create a separate composition for each new video I want, then export all of them using AVAssetExportSession.
The metrics I'm concerned about are performance and power. That is to say, I'm interested in the method that offers the greatest efficiency in performing my edits while also giving me the flexibility to do what I want. I'd imagine the kind of video editing I'm describing can be done with both approaches, but really I want whichever is the most performant and has the best capabilities.
In my experience AVAssetExportSession is slightly more performant than using AVAssetReader and AVAssetWriter for a straightforward format A -> format B conversion; that said, the difference is probably not big enough to worry about.
According to Apple's own documentation https://developer.apple.com/library/ios/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/00_Introduction.html#//apple_ref/doc/uid/TP40010188:
You use an export session to reencode an existing asset into a format defined by one of a small number of commonly-used presets. If you need more control over the transformation, in iOS 4.1 and later you can use an asset reader and asset writer object in tandem to convert an asset from one representation to another. Using these objects you can, for example, choose which of the tracks you want to be represented in the output file, specify your own output format, or modify the asset during the conversion process.
Given the nature of your question, it seems like you don't have much experience with the AVFoundation framework yet. My advice is to start with AVAssetExportSession and then, when you hit a roadblock, move deeper down the stack into AVAssetReader and AVAssetWriter.
Eventually, depending on how far you take this, you may even want to write your own Custom Compositor.
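For a concrete starting point, here is a rough Swift sketch of the export-session route, combined with an AVMutableComposition that keeps only the time ranges you want; the preset, output type, and helper name are assumptions you would replace with your own:

```swift
import AVFoundation
import CoreMedia

// Rough sketch: keep only the given time ranges of the source video in an
// AVMutableComposition, then export it with a preset.
func exportTrimmedCopy(of sourceURL: URL, keeping ranges: [CMTimeRange], to outputURL: URL) {
    let asset = AVAsset(url: sourceURL)
    let composition = AVMutableComposition()

    guard let sourceTrack = asset.tracks(withMediaType: .video).first,
          let videoTrack = composition.addMutableTrack(withMediaType: .video,
                                                       preferredTrackID: kCMPersistentTrackID_Invalid)
    else { return }

    // Append each wanted range back-to-back; everything in between is
    // dropped, which is the "remove frames to compress it down" idea.
    var cursor = CMTime.zero
    for range in ranges {
        try? videoTrack.insertTimeRange(range, of: sourceTrack, at: cursor)
        cursor = CMTimeAdd(cursor, range.duration)
    }

    guard let session = AVAssetExportSession(asset: composition,
                                             presetName: AVAssetExportPresetHighestQuality)
    else { return }
    session.outputURL = outputURL
    session.outputFileType = .mp4
    session.exportAsynchronously {
        // Inspect session.status / session.error here in real code.
    }
}
```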

Understanding Core Video's CVPixelBufferPool and CVOpenGLESTextureCache semantics

I'm refactoring my iOS OpenGL-based rendering pipeline. My pipeline consists of many rendering steps, hence I need a lot of intermediate textures to render to and read from. Those textures are of various types (unsigned byte and half float) and may possess a different number of channels.
To save memory and allocation effort I recycled textures that were used by previous steps in the pipeline and are no longer needed. In my previous implementation I did that on my own.
In my new implementation I want to use the APIs provided by the Core Video framework for that; especially since they provide much faster access to the texture memory from the CPU. I understand that the CVOpenGLESTextureCache allows me to create OpenGL textures out of CVPixelBuffers that can be created directly or using a CVPixelBufferPool. However, I am unable to find any documentation describing how they really work and how they play together.
Here are the things I want to know:
For getting a texture from the CVOpenGLESTextureCache I always need to provide a pixel buffer. Why is it called a "cache" if I need to provide the memory anyway and am not able to retrieve an old, unused texture?
The CVOpenGLESTextureCacheFlush function "flushes currently unused resources". How does the cache know if a resource is "unused"? Are textures returned to the cache when I release the corresponding CVOpenGLESTextureRef? The same question applies to the CVPixelBufferPool.
Am I able to maintain textures with different properties (type, # of channels, ...) in one texture cache? Does it know whether a texture can be re-used or needs to be created, depending on my request?
CVPixelBufferPools seem only to be able to manage buffers of the same size and type. This means I need to create one dedicated pool for each texture configuration I'm using, correct?
I'd be really happy if at least some of those questions could be clarified.
Well, you will not actually be able to find any of this documented. I looked and looked, and the short answer is that you just have to test things out to see how the implementation behaves. You can find my blog post on the subject, along with example code, at opengl_write_texture_cache. Basically, the way it seems to work is that the texture cache object "holds on" to the association between a buffer (in the pool) and the OpenGL texture that is bound when a triangle render is executed. The result is that the same buffer should not be returned by the pool until after OpenGL is done with it. In the odd case of some kind of race condition, the pool might grow by one buffer to account for a buffer that is held too long. What is really nice about the texture cache API is that you only need to write to the data buffer once, as opposed to calling an API like glTexImage2D(), which would "upload" the data to the graphics card.
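To make the pool/cache interaction concrete, here is a rough Swift sketch of one pool and cache pairing for a single configuration (fixed-size BGRA buffers); the attribute keys and GL formats follow Apple's usual BGRA setup, but treat the details as assumptions you should verify by testing, as described above:

```swift
import CoreVideo
import OpenGLES

// Rough sketch of one pool + cache pairing for a single configuration:
// fixed-size BGRA buffers backing OpenGL ES textures. Width, height and
// the EAGLContext are assumptions supplied by the caller.
func demoPoolAndTextureCache(width: Int, height: Int, context: EAGLContext) {
    // One pool per size/format combination, as suspected in the question.
    let bufferAttributes: [String: Any] = [
        kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA,
        kCVPixelBufferWidthKey as String: width,
        kCVPixelBufferHeightKey as String: height,
        kCVPixelBufferOpenGLESCompatibilityKey as String: true
    ]
    var pool: CVPixelBufferPool?
    CVPixelBufferPoolCreate(kCFAllocatorDefault, nil, bufferAttributes as CFDictionary, &pool)

    var cache: CVOpenGLESTextureCache?
    CVOpenGLESTextureCacheCreate(kCFAllocatorDefault, nil, context, nil, &cache)

    guard let pool = pool, let cache = cache else { return }

    // Grab a buffer from the pool; the pool hands back a recycled buffer
    // once nothing (including the texture cache) retains it any more.
    var pixelBuffer: CVPixelBuffer?
    CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault, pool, &pixelBuffer)
    guard let buffer = pixelBuffer else { return }

    // Wrap the buffer's memory as a GL texture; no glTexImage2D-style
    // upload is needed, the texture reads the buffer's memory directly.
    let bgra = GLenum(0x80E1) // GL_BGRA, from the Apple BGRA8888 extension
    var texture: CVOpenGLESTexture?
    CVOpenGLESTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, buffer, nil,
        GLenum(GL_TEXTURE_2D), GLint(GL_RGBA),
        GLsizei(width), GLsizei(height),
        bgra, GLenum(GL_UNSIGNED_BYTE), 0, &texture)

    if let texture = texture {
        glBindTexture(CVOpenGLESTextureGetTarget(texture), CVOpenGLESTextureGetName(texture))
        // ... draw with, or render into, this texture ...
    }

    // After releasing CVOpenGLESTextureRefs you no longer need, let the
    // cache drop its unused entries (typically once per frame).
    CVOpenGLESTextureCacheFlush(cache, 0)
}
```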

Display H.264 encoded images via AVSampleBufferDisplayLayer

I've been exploring options on iOS for hardware-accelerated decoding of a raw H.264 stream, and so far the only option I have found is to write the H.264 stream into an MP4 file and then pass the file to an instance of AVAssetReader. Although this method works, it's not particularly suitable for realtime applications. The AVFoundation reference indicates the existence of a CALayer that can display compressed video frames (AVSampleBufferDisplayLayer), and I believe this would be a valid alternative to the method mentioned above. Unfortunately this layer is only available on OS X. I would like to file an enhancement radar, but before I do so I would like to know from someone with experience with this layer whether it could indeed be used to display raw H.264 data if it were available on iOS. Currently in my app the decompressed YUV frames are rendered via OpenGL ES. Would using this layer mean that I no longer need to use OpenGL ES?
As of iOS 8, the AVSampleBufferDisplayLayer class is available. Take a look and have fun.
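For reference, feeding sample buffers to the layer looks roughly like this in Swift; building CMSampleBuffers from a raw H.264 stream (format description from the SPS/PPS plus the NAL units) is a separate step that is simply assumed here:

```swift
import AVFoundation
import CoreMedia

// Minimal sketch of driving an AVSampleBufferDisplayLayer (iOS 8+).
let displayLayer = AVSampleBufferDisplayLayer()
displayLayer.videoGravity = .resizeAspect
// Add displayLayer as a sublayer of the view you want to render into.

func display(_ sampleBuffer: CMSampleBuffer) {
    if displayLayer.isReadyForMoreMediaData {
        displayLayer.enqueue(sampleBuffer)   // the layer decodes and displays it
    }
}
```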

Decode video using CoreMedia.framework on iOS

I need to decode an MP4 file and draw it using OpenGL in an iOS app. I need to extract and decode the H.264 frames from the MP4 file, and I heard it's possible to do that using Core Media. Does anybody have any idea how to do it? Are there any examples of using Core Media for this?
It's not Core Media you're looking for, it's AVFoundation. In particular, you'd use an AVAssetReader to load from your movie and iterate through the frames. You then can upload these frames as OpenGL ES textures either by using glTexImage2D() or (on iOS 5.0) by using the much faster texture caches.
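As a rough sketch of the glTexImage2D() route, uploading a single decoded BGRA pixel buffer might look like the following in Swift; the texture ID is assumed to have been generated and configured elsewhere, and a robust version would also account for the buffer's bytes-per-row:

```swift
import CoreVideo
import OpenGLES

// Sketch: upload one decoded BGRA CVPixelBuffer (e.g. from an AVAssetReader
// track output) into an existing OpenGL ES texture with glTexImage2D().
func upload(_ pixelBuffer: CVPixelBuffer, to textureID: GLuint) {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
    let bgra = GLenum(0x80E1) // GL_BGRA, from the Apple BGRA8888 extension

    // Note: a robust version also checks CVPixelBufferGetBytesPerRow() in
    // case the buffer has row padding.
    glBindTexture(GLenum(GL_TEXTURE_2D), textureID)
    glTexImage2D(GLenum(GL_TEXTURE_2D), 0, GL_RGBA,
                 GLsizei(width), GLsizei(height), 0,
                 bgra, GLenum(GL_UNSIGNED_BYTE), baseAddress)
}
```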
If you don't want to roll your own implementation of this, I have working AVFoundation-based movie loading and processing via OpenGL ES within my GPUImage framework. The GPUImageMovie class encapsulates movie reading and the process of uploading to a texture. If you want to extract that texture for use in your own scene, you can chain a GPUImageTextureOutput to it. Examples of both of these classes can be found in the SimpleVideoFileFilter and CubeExample sample applications within the framework distribution.
You can use this directly, or just look at the code I wrote to perform these same actions within the GPUImageMovie class.
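If you call the Objective-C GPUImage classes from Swift, the chain described above might look roughly like this; the exact initializers and the sample file name are assumptions that depend on your GPUImage version and project setup:

```swift
import GPUImage   // the Objective-C GPUImage framework, e.g. via CocoaPods

// Rough sketch of the chain described in the answer above.
let movieURL = Bundle.main.url(forResource: "sample", withExtension: "m4v")!
let movie = GPUImageMovie(url: movieURL)
let textureOutput = GPUImageTextureOutput()

// GPUImageMovie decodes the file and pushes frames to its targets;
// GPUImageTextureOutput then notifies its delegate with an OpenGL ES
// texture you can bind in your own scene.
movie.addTarget(textureOutput)
movie.startProcessing()
```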

Mixing and equalizing multiple streams of compressed audio on iOS

What I'm trying to do is exactly as the title says: decode multiple compressed audio streams/files - they will be extracted from a modified MP4 file - and do EQ on them simultaneously in realtime.
I have read through most of Apple's docs.
I have tried AudioQueues, but I won't be able to do equalization, as once the compressed audio goes in, it doesn't come out ... so I can't manipulate it.
Audio Units don't seem to have any components that handle decompression of AAC and MP3 - if I'm right, the converter unit only handles converting from one LPCM format to another.
I have been trying to work out a solution on and off for about a month and a half now.
I'm now thinking: use a 3rd party decoder (god help me; I haven't a clue how to use those, the source code is Greek to me; oh, and any recommendations? :x), then feed the decoded LPCM into Audio Queues and do the EQ in the callback.
Maybe I'm missing something here. Suggestions? :(
I'm still trying to figure out Core Audio for my own needs, but from what I understand, you want to use Extended Audio File Services, which handles reading and decompression for you, producing PCM data you can then hand off to a buffer. The MixerHost sample project provides an example of using ExtAudioFileOpenURL to do this.
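A minimal Swift sketch of that approach, assuming a 16-bit stereo LPCM client format and an arbitrary chunk size; in a real pipeline you would loop over the reads and feed each chunk into your EQ:

```swift
import Foundation
import AudioToolbox

// Sketch of the Extended Audio File Services route: open a compressed file
// (AAC/MP3), ask for an LPCM client format, and read decoded frames.
func readDecodedPCM(from url: URL) {
    var fileRef: ExtAudioFileRef?
    guard ExtAudioFileOpenURL(url as CFURL, &fileRef) == noErr, let file = fileRef else { return }
    defer { ExtAudioFileDispose(file) }

    // The "client" format is what ExtAudioFileRead hands back after decoding:
    // here, 16-bit interleaved stereo LPCM at 44.1 kHz.
    var clientFormat = AudioStreamBasicDescription(
        mSampleRate: 44_100,
        mFormatID: kAudioFormatLinearPCM,
        mFormatFlags: kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked,
        mBytesPerPacket: 4, mFramesPerPacket: 1, mBytesPerFrame: 4,
        mChannelsPerFrame: 2, mBitsPerChannel: 16, mReserved: 0)
    ExtAudioFileSetProperty(file, kExtAudioFileProperty_ClientDataFormat,
                            UInt32(MemoryLayout.size(ofValue: clientFormat)), &clientFormat)

    // Read one chunk of decoded PCM; a real pipeline loops until `frames`
    // comes back as 0 and feeds each chunk into the EQ callback.
    let frameCapacity: UInt32 = 4096
    var data = Data(count: Int(frameCapacity) * Int(clientFormat.mBytesPerFrame))
    data.withUnsafeMutableBytes { raw in
        var bufferList = AudioBufferList(
            mNumberBuffers: 1,
            mBuffers: AudioBuffer(mNumberChannels: 2,
                                  mDataByteSize: UInt32(raw.count),
                                  mData: raw.baseAddress))
        var frames = frameCapacity
        ExtAudioFileRead(file, &frames, &bufferList)
        // `frames` now holds how many LPCM frames were actually decoded.
    }
}
```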
