Most performant method of processing video and writing to file - iOS AVFoundation

I want to read in a video asset from disk and do a bunch of processing on it: things like using a CICropFilter on each individual frame to cut out a mask, splitting one video up into several smaller videos, and removing frames from the original track to "compress" it down and make it more gif-like.
I've come up with a few possible avenues:
AVAssetWriter and AVAssetReader
In this scenario, I would read in the CMSampleBuffers from file, perform my desired manipulations, then write back to a new file using AVAssetWriter.
AVMutableComposition
Here, given a list of CMTimes I can easily cut out frames and rewrite the video or even create multiple compositions for each new video I want to create, then export all of them using AVAssetExportSession.
The metrics I'm concerned about are performance and power. That is to say, I'm interested in the method that offers the greatest efficiency in performing my edits while also giving me the flexibility to do what I want. I'd imagine the kind of video editing I'm describing can be done with both approaches, but really I want whichever is most performant and has the best capabilities.
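For concreteness, here is the kind of frame-dropping bookkeeping I mean, as a plain-C sketch (the names are mine, and seconds as doubles stand in for CMTime to keep it self-contained). Each resulting (start, end) pair is what I'd turn into a CMTimeRange and hand to removeTimeRange: on an AVMutableComposition:

```c
#include <stddef.h>

typedef struct { double start; double end; } TimeRange;

/* Given the presentation times of a clip's frames (seconds) and a
 * keep-every-Nth stride, emit the time ranges to cut so the result
 * plays back "gif-like". Returns the number of ranges written into
 * `out` (capacity `cap`). */
size_t ranges_to_remove(const double *times, size_t count, size_t keep_every,
                        TimeRange *out, size_t cap) {
    size_t n = 0;
    if (keep_every == 0) return 0;
    for (size_t i = 0; i < count; i += keep_every) {
        size_t drop_start = i + 1;
        size_t drop_end = i + keep_every;
        /* A removal range runs from the first dropped frame's timestamp
         * to the next kept frame's timestamp. Trailing dropped frames
         * would need the clip's duration, omitted in this sketch. */
        if (drop_start < drop_end && drop_end < count && n < cap) {
            out[n].start = times[drop_start];
            out[n].end = times[drop_end];
            n++;
        }
    }
    return n;
}
```

So for five frames at 0.1-second spacing with keep_every = 2, this yields two removal ranges, (0.1, 0.2) and (0.3, 0.4), leaving frames 0, 2, and 4.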

In my experience, AVAssetExportSession is slightly more performant than using AVAssetReader and AVAssetWriter for a straightforward format A -> format B conversion; that said, the difference probably isn't large enough to be worth worrying about.
According to Apple's own documentation https://developer.apple.com/library/ios/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/00_Introduction.html#//apple_ref/doc/uid/TP40010188:
You use an export session to reencode an existing asset into a format defined by one of a small number of commonly-used presets. If you need more control over the transformation, in iOS 4.1 and later you can use an asset reader and asset writer object in tandem to convert an asset from one representation to another. Using these objects you can, for example, choose which of the tracks you want to be represented in the output file, specify your own output format, or modify the asset during the conversion process.
Given the nature of your question, it seems you don't have much experience with the AVFoundation framework yet. My advice is to start with AVAssetExportSession and then, when you hit a roadblock, move deeper down the stack to AVAssetReader and AVAssetWriter.
Eventually, depending on how far you take this, you may even want to write your own Custom Compositor.

Related

What kind of drum sampling options does AudioKit have?

Working in AudioKit, I am looking to understand how people have incorporated drums. Obviously the sampler is an option, but I am wondering if there is a built-in option similar to some of the basic synthesis options.
There are a few options. I personally like the AppleSampler/MidiSampler, as in the example, but instead of using audio files you can create an EXS sampler instrument in Logic, where you can assign notes for different velocities. AppleSampler can also load AUPresets made in GarageBand, as well as SoundFonts (SF2). The DunneAudioKit Sampler is an option if you are working with SFZ files, but I think that might be a work-in-progress in AudioKit 5. Loading WAV files directly into AppleSampler is also a good option if you just want one-shot sounds.
I'm assuming you're mostly talking about playback of samples, not recording.
The best built-in option I've seen (other than AppleSampler/MidiSampler) is AudioPlayer, which lets you load in a sample and play it back on demand (from an on-screen pad, etc). MIDIListener can then help you respond to external MIDI events, etc. It works (I have a pretty big branch in my app where I tried it), but I'm not sure it works well.
I wouldn't recommend DunneAudioKit Sampler for drums. There is no one-shot playback (so playing the same note in quick succession will cut off the previous note, even if you mess with the release). If you're trying to build a complex/realistic acoustic drum instrument, you'll also want round-robins so that variations of the same hit can be played, which Dunne also doesn't have. It can load SFZ files, but only a very limited subset of SFZ's opcodes (so again, it's missing things like round robins, mute groups, one-shot, etc).
Having gone down all those roads, I would suggest starting with AppleSampler, and I would build the EXS or aupreset file in Logic or Mainstage rather than trying to build something programmatically.
If your needs are really simple, the examples in AudioKit's recently released drum pad playground are a great place to start: they load single samples onto specific notes of an AppleSampler.

AVFoundation - Export and multi-pass encoding not very optimized?

I am using AVFoundation to re-export a video. I am reading it in using AVAssetReader, and then exporting again using AVAssetWriter. Works great for "normal" videos.
However, I've noticed that for videos where everything is constant except for a small region (e.g: https://www.youtube.com/watch?v=VyhG-AKLK-Y) it doesn't compress very well.
The reason is probably that the encoder has no way of knowing the contents of the video, so it falls back to a generic compression strategy.
I came across this WWDC session: https://developer.apple.com/videos/play/wwdc2014/513/ and implemented the steps needed to support multi-pass encoding (it uses an additional pass so that the encoder can analyze the contents of the video first).
The results, while better than the default, don't match up to ffmpeg or to videos exported from Premiere (much worse quality at a higher file size).
Does anyone have any insights on how to best optimize for videos like this?

How to decode multiple videos simultaneously using AVAssetReader?

I'm trying to decode frames from multiple video files and use them as OpenGL textures.
I know how to decode an H.264 file using an AVAssetReader object, but it seems you have to read the frames in a while loop after you call startReading, for as long as the status is AVAssetReaderStatusReading. What I want to do instead is call startReading and then call copyNextSampleBuffer anywhere, anytime I want. That way I could build a new video-reader class on top of AVAssetReader and load video frames from multiple video files whenever I want to use them as OpenGL textures.
Is this doable?
Short answer: yes, you can decode one frame at a time. You will need to manage the decode logic yourself, and the simplest approach is to allocate a buffer of BGRA pixels and then copy the framebuffer data into your temp buffer. Be warned that you will likely not find a little code snippet that does all of this; streaming all the data from movies into OpenGL is not easy to implement.
I would suggest that you avoid attempting to do this yourself and use a 3rd-party library that already implements the hard stuff. If you want to see a complete example of something like this already implemented, have a look at my blog post Load OpenGL textures with alpha channel on iOS. That post shows how to stream video into OpenGL, but with this approach you would need to decode from H.264 to disk first.
It should also be possible to use other libraries to do the same thing. Just keep in mind that playing multiple videos at the same time is resource intensive, so you may quickly run into the limits of what your hardware device can do. Also, if you do not actually need OpenGL textures, it is a lot easier to operate on CoreGraphics APIs directly under iOS.
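The pull-on-demand part of this can be sketched independently of AVFoundation. A minimal shape for it in plain C (my own names; next() stands in for copyNextSampleBuffer, returning NULL at the end the way the reader hits AVAssetReaderStatusCompleted):

```c
#include <stddef.h>
#include <stdbool.h>

/* Decoding is still strictly sequential under the hood; this wrapper
 * only decouples *when* you pull a frame (e.g. right before uploading
 * a GL texture) from the reader's own read loop. One instance per
 * video file gives you several independent sources. */
typedef struct {
    void *(*next)(void *ctx);  /* returns NULL when exhausted */
    void *ctx;                 /* provider state (e.g. the reader) */
    bool exhausted;
} FrameSource;

void *frame_source_pull(FrameSource *s) {
    if (s->exhausted) return NULL;
    void *frame = s->next(s->ctx);
    if (!frame) s->exhausted = true;
    return frame;
}
```

The point is just that nothing forces the tight while loop: as long as you track the exhausted state yourself, each call site can pull exactly one frame whenever it wants one.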

How can I use AVAudioPlayer to play audio faster *and* higher pitched?

Statement of Problem:
I have a collection of sound effects in my app stored as .m4a files (AAC format, 48 kHz, 16-bit) that I want to play at a variety of speeds and pitches, without having to pre-generate all the variants as separate files.
Although the .rate property of an AVAudioPlayer object can alter playback speed, it always maintains the original pitch, which is not what I want. Instead, I simply want to play the sound sample faster or slower and have the pitch go up or down to match — just like speeding up or slowing down an old-fashioned reel-to-reel tape recorder. In other words, I need some way to essentially alter the audio sample rate by amounts like +2 semitones (12% faster), –5 semitones (33% slower), +12 semitones (2x faster), etc.
Question:
Is there some way to fetch the Linear PCM audio data from an AVAudioPlayer object, apply sample-rate conversion using a different iOS framework, and stuff the resulting audio data into a new AVAudioPlayer object, which can then be played normally?
Possible avenues:
I was reading up on AudioConverterConvertComplexBuffer. In particular, kAudioConverterSampleRateConverterComplexity_Mastering, kAudioConverterQuality_Max, and AudioConverterFillComplexBuffer() caught my eye. So it looks possible with this audio-conversion framework. Is this an avenue I should explore further?
Requirements:
I actually don't need playback to begin instantly. If sample rate conversion incurs a slight delay, that's fine. All of my samples are 4 seconds or less, so I would imagine that any on-the-fly resampling would occur quickly, on the order of 1/10 second or less. (More than 1/2 second would be too much, though.)
I'd really rather not get into heavyweight stuff like OpenAL or Core Audio if there is a simpler way to do this using a conversion framework provided by iOS. However, if there is a simple solution to this problem using OpenAL or Core Audio, I'd be happy to consider it. By "simple" I mean something that can be implemented in 50–100 lines of code and doesn't require starting up additional threads to feed data to a sound device. I'd rather just have everything taken care of automatically — which is why I'm willing to convert the audio clip prior to playing.
I want to avoid any third-party libraries here, because this isn't rocket science and I know it must be possible with native iOS frameworks somehow.
Again, I need to adjust the pitch and playback rate together, not separately. So if playback is slowed down 2x, a human voice would become very deep and slow-spoken. And if playback is sped up 2–3x, a human voice would sound like a fast-talking chipmunk. In other words, I absolutely do not want to alter the pitch while keeping the audio duration the same, because that operation results in an undesirably "tinny" sound when bending the pitch upward more than a couple semitones. I just want to speed the whole thing up and have the pitch go up as a natural side-effect, just like old-fashioned tape recorders used to do.
Needs to work in iOS 6 and up, although iOS 5 support would be a nice bonus.
The forum link Jack Wu mentions has one suggestion, which involves overwriting the AIFF header data directly. This may work, but you will need AIFF files, since it relies on writing into a specific range of the AIFF header. It also has to be done before you create the AVAudioPlayer, which means you can't modify the pitch once playback is running.
If you are willing to go the AudioUnits route, a complete simple solution is probably ~200 lines (note this assumes a code style where one function call takes up to 7 lines, with one parameter per line). There is a Varispeed AudioUnit, which does exactly what you want by locking pitch to rate. You would basically need to look at the API, the docs, and some sample AudioUnit code to get familiar, and then:
create/init the audio graph and stream format (~100 lines)
create and add to the graph a RemoteIO AudioUnit (kAudioUnitSubType_RemoteIO) (this outputs to the speaker)
create and add a varispeed unit, and connect the output of the varispeed unit (kAudioUnitSubType_Varispeed) to the input of the RemoteIO Unit
create and add to the graph an AudioFilePlayer unit (kAudioUnitSubType_AudioFilePlayer) to read the file, and connect it to the varispeed unit
start the graph to begin playback
when you want to change the pitch, do it via AudioUnitSetParameter, and the pitch and playback rate change will take effect while playing
Note that there is also a TimePitch audio unit, which allows independent control of pitch and rate.
For iOS 7, you'd want to look at AVPlayerItem's time-pitch algorithm property (audioTimePitchAlgorithm) and its AVAudioTimePitchAlgorithmVarispeed setting. Unfortunately this feature is not available on earlier systems.

OpenCV video frame metadata write and read

I would like to encode a date/time stamp in each frame of a video in a way that can be easily read back by a computer. On my system the frame rate is variable, so counting frames does not seem like a good solution. I have it displaying the date and time in human-readable form (text) on the frame, but reading that back into the computer doesn't appear to be as trivial as I would like. The recorded videos are large (tens of GB) and long, so writing a text file alongside also seems troublesome, besides being one more file to keep track of. Is there a way to store frame-by-frame information in a video?
There are several ways you can do this.
If your compression is not very strong, you may be able to encode the time-stamp in the top or bottom row of your image. These may not contain too much valuable info. You can add some form of error correction (e.g. CRC) to correct any corruptions done by the compressor.
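A minimal sketch of that first idea in plain C (the names are mine, not an OpenCV API): spread each bit of a 64-bit millisecond timestamp over a run of pixels, and decode by averaging each run, so a mildly lossy encoder has to smear a whole run past mid-grey before a bit flips. A real version should also append the CRC mentioned above so corrupted stamps can be detected; that part is omitted here.

```c
#include <stdint.h>
#include <stddef.h>

/* Write `millis` into the first 64 * pixels_per_bit bytes of a
 * greyscale row as black/white runs, MSB first. */
void encode_timestamp(uint64_t millis, uint8_t *row, size_t pixels_per_bit) {
    for (int bit = 0; bit < 64; bit++) {
        uint8_t value = (millis >> (63 - bit)) & 1 ? 255 : 0;
        for (size_t p = 0; p < pixels_per_bit; p++)
            row[bit * pixels_per_bit + p] = value;
    }
}

/* Recover the timestamp: average each run and threshold at mid-grey,
 * which tolerates a few pixels corrupted by compression. */
uint64_t decode_timestamp(const uint8_t *row, size_t pixels_per_bit) {
    uint64_t millis = 0;
    for (int bit = 0; bit < 64; bit++) {
        unsigned sum = 0;
        for (size_t p = 0; p < pixels_per_bit; p++)
            sum += row[bit * pixels_per_bit + p];
        millis = (millis << 1) | (sum / pixels_per_bit > 127 ? 1 : 0);
    }
    return millis;
}
```

With pixels_per_bit = 8 this costs 512 pixels of one row per frame, and a handful of corrupted pixels inside a run still decodes correctly.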
A more general solution (which I have used in the past) is to have the video file, e.g. AVI, contain a separate text stream. Besides AVI, most formats support multiple streams, since these are used for stereo audio streams, subtitles, etc. The drawback is that there aren't many tools that allow you to write these streams, so you'll have to implement this yourself (using the relevant APIs) for each video format you want to support. In a way this is similar to keeping a text file next to your video, except the file's content is multiplexed inside the same video file as a separate stream.