I am using AVFoundation to re-export a video. I am reading it in using AVAssetReader, and then exporting again using AVAssetWriter. Works great for "normal" videos.
However, I've noticed that for videos where everything is constant except for a small region (e.g. https://www.youtube.com/watch?v=VyhG-AKLK-Y), it doesn't compress very well.
The reason is probably that the encoder has no way of knowing the contents of the video so it uses a generic compression method.
I came across this: https://developer.apple.com/videos/play/wwdc2014/513/ and implemented the steps needed to support multi-pass encoding (uses an additional pass so that it can analyze the contents of the video first).
The results, while better than the default, don't match up to ffmpeg or videos exported from Premiere (much worse quality at a higher file size).
Does anyone have any insights on how to best optimize for videos like this?
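For reference, this is roughly how my multi-pass re-export is set up (heavily simplified; the output settings, 1080p dimensions, file type, and queue are placeholders for my real code, and error handling is stripped):

```swift
import AVFoundation

func reexport(asset: AVAsset, to outputURL: URL) throws {
    let track = asset.tracks(withMediaType: .video)[0]

    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                         kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange])
    readerOutput.supportsRandomAccess = true   // needed so later passes can re-read ranges
    reader.add(readerOutput)

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 1920,
        AVVideoHeightKey: 1080,
        AVVideoCompressionPropertiesKey: [AVVideoAverageBitRateKey: 2_000_000]
    ])
    input.performsMultiPassEncodingIfSupported = true   // opt in; the encoder decides
    writer.add(input)

    reader.startReading()
    writer.startWriting()
    writer.startSession(atSourceTime: .zero)

    let queue = DispatchQueue(label: "multipass")
    input.respondToEachPassDescription(on: queue) {
        guard let pass = input.currentPassDescription else {
            // nil means the encoder doesn't want another pass: finish up.
            readerOutput.markConfigurationAsFinal()
            input.markAsFinished()
            writer.finishWriting { }
            return
        }
        // Re-read only the source time ranges the encoder asked for in this pass.
        readerOutput.reset(forReadingTimeRanges: pass.sourceTimeRanges)
        input.requestMediaDataWhenReady(on: queue) {
            while input.isReadyForMoreMediaData {
                if let sample = readerOutput.copyNextSampleBuffer() {
                    input.append(sample)
                } else {
                    // End of the ranges for this pass.
                    input.markCurrentPassAsFinished()
                    break
                }
            }
        }
    }
}
```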
I use AVPlayer to play audio (streaming or a local file). To this audio I want to apply some effects: boost volume, skip silence, reduce noise, and change speed (in 0.1 increments).
I did the same thing on Android by creating my own player, decoding the different audio formats into PCM data, and then using some C libraries to modify it. It was quite complicated.
Is this possible with AVPlayer, or how else can I do it? Something like modifying the audio AVPlayer has already decoded. Is there an iOS API (AVAudioEngine?) or a framework (AudioKit?) that can do this?
Thanks!
IMHO the best solution is to use https://github.com/audiokit/AudioKit as it is well maintained and supports most of the requirements you listed.
Another approach is to import the C library you used in the Android project and write a wrapper around it so it can easily be used from Objective-C/Swift. With this approach you will have less code to maintain, and you guarantee similar results across the two platforms. Would you care to share more about this code?
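If you would rather stay with Apple's own APIs, AVAudioEngine (which you mention) already covers part of your list: volume boost via an EQ's global gain, speed changes via AVAudioUnitTimePitch, and basic filtering via AVAudioUnitEQ. Silence skipping and proper noise reduction would still need custom processing (a tap on the engine, or AudioKit). A minimal sketch for a local file, with a placeholder path and error handling reduced to try?:

```swift
import AVFoundation

// Minimal AVAudioEngine graph: player -> time/pitch -> EQ -> output.
let engine = AVAudioEngine()
let player = AVAudioPlayerNode()
let timePitch = AVAudioUnitTimePitch()      // playback rate, adjustable in small steps
let eq = AVAudioUnitEQ(numberOfBands: 1)

timePitch.rate = 1.2                        // e.g. 1.1, 1.2, ... speed increments
eq.globalGain = 6                           // simple overall volume boost, in dB
eq.bands[0].filterType = .lowPass           // example band; configure as needed
eq.bands[0].frequency = 8000
eq.bands[0].bypass = false

engine.attach(player)
engine.attach(timePitch)
engine.attach(eq)

if let file = try? AVAudioFile(forReading: URL(fileURLWithPath: "/path/to/audio.m4a")) {
    engine.connect(player, to: timePitch, format: file.processingFormat)
    engine.connect(timePitch, to: eq, format: file.processingFormat)
    engine.connect(eq, to: engine.mainMixerNode, format: file.processingFormat)

    try? engine.start()
    player.scheduleFile(file, at: nil)
    player.play()
}
```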
I want to read in a video asset on disk and do a bunch of processing on it: things like applying a CICropFilter to each individual frame and cutting out a mask, splitting one video up into several smaller videos, and removing frames from the original track to "compress" it down and make it more GIF-like.
I've come up with a few possible avenues:
AVAssetWriter and AVAssetReader
In this scenario, I would read in the CMSampleBuffers from file, perform my desired manipulations, then write back to a new file using AVAssetWriter.
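Roughly the shape of the loop I have in mind for this route (simplified; the 640x360 size, crop rect, and output settings are placeholders, and backpressure/error handling is crude):

```swift
import AVFoundation
import CoreImage

func cropExport(asset: AVAsset, to outputURL: URL) throws {
    let track = asset.tracks(withMediaType: .video)[0]

    let reader = try AVAssetReader(asset: asset)
    let readerOutput = AVAssetReaderTrackOutput(
        track: track,
        outputSettings: [kCVPixelBufferPixelFormatTypeKey as String:
                         kCVPixelFormatType_32BGRA])
    reader.add(readerOutput)

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)
    let input = AVAssetWriterInput(mediaType: .video, outputSettings: [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 640,
        AVVideoHeightKey: 360
    ])
    let adaptor = AVAssetWriterInputPixelBufferAdaptor(
        assetWriterInput: input, sourcePixelBufferAttributes: nil)
    writer.add(input)

    reader.startReading()
    writer.startWriting()
    writer.startSession(atSourceTime: .zero)

    let context = CIContext()
    let cropRect = CGRect(x: 0, y: 0, width: 640, height: 360)

    while let sample = readerOutput.copyNextSampleBuffer() {
        guard let srcPixels = CMSampleBufferGetImageBuffer(sample) else { continue }
        let time = CMSampleBufferGetPresentationTimeStamp(sample)

        // Dropping frames, or switching writers to split the video, would happen here.
        let cropped = CIImage(cvPixelBuffer: srcPixels).cropped(to: cropRect)

        var dstPixels: CVPixelBuffer?
        CVPixelBufferCreate(nil, 640, 360, kCVPixelFormatType_32BGRA, nil, &dstPixels)
        if let dst = dstPixels {
            context.render(cropped, to: dst)
            while !input.isReadyForMoreMediaData { Thread.sleep(forTimeInterval: 0.005) }
            adaptor.append(dst, withPresentationTime: time)
        }
    }

    input.markAsFinished()
    writer.finishWriting { }
}
```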
AVMutableComposition
Here, given a list of CMTimes, I can easily cut out frames and rewrite the video, or even create a separate composition for each new video I want to produce and then export all of them using AVAssetExportSession.
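For example, something along these lines to punch out a set of time ranges and export the result (simplified; the preset and paths are placeholders):

```swift
import AVFoundation

func removeRanges(_ ranges: [CMTimeRange], from asset: AVAsset, to outputURL: URL) throws {
    let composition = AVMutableComposition()
    let srcTrack = asset.tracks(withMediaType: .video)[0]
    let dstTrack = composition.addMutableTrack(
        withMediaType: .video,
        preferredTrackID: kCMPersistentTrackID_Invalid)

    // Copy the whole source track, then cut the unwanted ranges out.
    try dstTrack?.insertTimeRange(
        CMTimeRange(start: .zero, duration: asset.duration),
        of: srcTrack, at: .zero)

    // Remove later ranges first so earlier ranges keep their original times.
    for range in ranges.sorted(by: { $0.start > $1.start }) {
        composition.removeTimeRange(range)
    }

    let export = AVAssetExportSession(asset: composition,
                                      presetName: AVAssetExportPresetHighestQuality)
    export?.outputURL = outputURL
    export?.outputFileType = .mp4
    export?.exportAsynchronously {
        // Inspect export?.status / export?.error here.
    }
}
```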
The metrics I'm concerned about are performance and power. That is to say, I'm interested in the method that performs my edits most efficiently while also giving me the flexibility to do what I want. I'd imagine the kind of editing I'm describing can be done with both approaches, but really I want whichever is most performant and has the best capabilities.
In my experience, AVAssetExportSession is slightly more performant than using AVAssetReader and AVAssetWriter for a straightforward format A -> format B conversion; that said, the difference is probably not big enough to be worth worrying about.
According to Apple's own documentation https://developer.apple.com/library/ios/documentation/AudioVideo/Conceptual/AVFoundationPG/Articles/00_Introduction.html#//apple_ref/doc/uid/TP40010188:
You use an export session to reencode an existing asset into a format defined by one of a small number of commonly-used presets. If you need more control over the transformation, in iOS 4.1 and later you can use an asset reader and asset writer object in tandem to convert an asset from one representation to another. Using these objects you can, for example, choose which of the tracks you want to be represented in the output file, specify your own output format, or modify the asset during the conversion process.
Given the nature of your question, it seems like you don't have much experience with the AVFoundation framework yet. My advice is to start with AVAssetExportSession and then, when you hit a roadblock, move deeper down the stack to AVAssetReader and AVAssetWriter.
Eventually, depending on how far you take this, you may even want to write your own Custom Compositor.
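To make the distinction concrete, the shallow end of the stack is just an asset, a preset, and an output URL; something like this (the preset and paths are placeholders):

```swift
import AVFoundation

// Pick a preset AVAssetExportSession supports for this asset and let it re-encode.
let asset = AVAsset(url: URL(fileURLWithPath: "/path/to/in.mov"))
print(AVAssetExportSession.exportPresets(compatibleWith: asset))  // see what's available

if let export = AVAssetExportSession(asset: asset,
                                     presetName: AVAssetExportPreset1280x720) {
    export.outputURL = URL(fileURLWithPath: "/path/to/out.mp4")
    export.outputFileType = .mp4
    export.exportAsynchronously {
        print(export.status.rawValue, export.error ?? "no error")
    }
}
```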
I've been exploring options on iOS for hardware-accelerated decoding of a raw H.264 stream, and so far the only option I've found is to write the H.264 stream into an MP4 file and then pass the file to an instance of AVAssetReader. Although this method works, it's not particularly suitable for real-time applications. The AVFoundation reference indicates the existence of a CALayer that can display compressed video frames (AVSampleBufferDisplayLayer), and I believe this would be a valid alternative to the method mentioned above. Unfortunately, this layer is only available on OS X. I would like to file an enhancement radar, but before I do so I would like to hear from someone with experience with this layer whether it could indeed be used to display raw H.264 data if it were available on iOS.

Currently, my app renders the decompressed YUV frames via OpenGL ES. Would using this layer mean I no longer need OpenGL ES?
As of iOS 8, the AVSampleBufferDisplayLayer class is available on iOS.
Take a look and have fun!
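A rough sketch of how it can be used, assuming you already package your NAL units into CMSampleBuffers (e.g. via CMVideoFormatDescriptionCreateFromH264ParameterSets and CMSampleBufferCreate); nextSampleBuffer() below is a placeholder for that step. The layer decodes and displays the buffers itself, so no OpenGL ES is needed for display:

```swift
import AVFoundation
import UIKit

final class H264View: UIView {
    let displayLayer = AVSampleBufferDisplayLayer()

    override func didMoveToWindow() {
        super.didMoveToWindow()
        displayLayer.frame = bounds
        displayLayer.videoGravity = .resizeAspect
        layer.addSublayer(displayLayer)

        // Pull compressed sample buffers whenever the layer can take more.
        displayLayer.requestMediaDataWhenReady(on: .main) { [weak self] in
            guard let self = self else { return }
            while self.displayLayer.isReadyForMoreMediaData,
                  let sample = self.nextSampleBuffer() {
                self.displayLayer.enqueue(sample)   // the layer handles decode + display
            }
        }
    }

    // Placeholder: wrap your raw H.264 stream into a CMSampleBuffer here.
    func nextSampleBuffer() -> CMSampleBuffer? { nil }
}
```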
What I'm trying to do is exactly what the title says: decode multiple compressed audio streams/files (they will be extracted from a modified MP4 file) and apply EQ to them simultaneously in real time.
I have read through most of Apple's docs.
I have tried Audio Queues, but I won't be able to do equalization with them: once the compressed audio goes in, it doesn't come back out, so I can't manipulate it.
Audio Units don't seem to have any components that handle decompression of AAC or MP3; if I'm right, the converter unit only handles converting from one LPCM format to another.
I have been trying to work out a solution on and off for about a month and a half now.
I'm now thinking of using a third-party decoder (god help me; I haven't a clue how to use those, and the source code is Greek to me; oh, and any recommendations? :x), then feeding the decoded LPCM into Audio Queues and doing EQ in the callback.
Maybe I'm missing something here. Suggestions? :(
I'm still trying to figure out Core Audio for my own needs, but from what I can understand, you want to use Extended Audio File Services, which handles reading and decompression for you, producing PCM data you can then hand off to a buffer. The MixerHost sample project provides an example of using ExtAudioFileOpenURL to do this.
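A minimal sketch of that route, with a placeholder path and format, and error codes ignored for brevity; the decoded floats can then be fed into a mixer/EQ unit as in MixerHost:

```swift
import AudioToolbox

// Open a compressed file (AAC/MP3), ask for an LPCM client format, and pull decoded frames.
var extFile: ExtAudioFileRef?
let url = URL(fileURLWithPath: "/path/to/track.m4a") as CFURL
ExtAudioFileOpenURL(url, &extFile)

// 32-bit float, stereo, interleaved PCM as the decode target.
var clientFormat = AudioStreamBasicDescription(
    mSampleRate: 44_100,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagIsPacked,
    mBytesPerPacket: 8, mFramesPerPacket: 1, mBytesPerFrame: 8,
    mChannelsPerFrame: 2, mBitsPerChannel: 32, mReserved: 0)
ExtAudioFileSetProperty(extFile!, kExtAudioFileProperty_ClientDataFormat,
                        UInt32(MemoryLayout<AudioStreamBasicDescription>.size),
                        &clientFormat)

// Read one block of decoded PCM; repeat until ioFrames comes back as 0.
let framesPerRead: UInt32 = 4096
var pcm = [Float](repeating: 0, count: Int(framesPerRead) * 2)
var ioFrames = framesPerRead
pcm.withUnsafeMutableBytes { raw in
    var bufferList = AudioBufferList(
        mNumberBuffers: 1,
        mBuffers: AudioBuffer(mNumberChannels: 2,
                              mDataByteSize: UInt32(raw.count),
                              mData: raw.baseAddress))
    _ = ExtAudioFileRead(extFile!, &ioFrames, &bufferList)
}
// pcm[0..<Int(ioFrames) * 2] now holds decoded samples ready for EQ/mixing.
```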
I was planning to use the VLC library to decode an H.264-based RTSP stream and extract each frame from it (converting the VLC picture to an IplImage). I have done a bit of exploration of the VLC code and concluded that there is a function called libvlc_video_take_snapshot which does a similar thing. However, in this case the captured frame is saved to the hard disk, which I wish to avoid given the real-time nature of my application. What would be the best way to do this? Would it be possible without modifying the VLC source (I want to avoid recompiling if possible)? I have heard of vmem and the like but could not really figure out what it does or how to use it.
The picture_t structure is internal to the library; how can I get access to it?
Awaiting your response.
P.S. Earlier I tried doing this using FFmpeg; however, the FFmpeg library has a lot of issues decoding an H.264-based RTSP stream on Windows, so I had to switch to VLC.
Regards,
Saurabh Gandhi