I'm trying to reverse audio in iOS with AVAsset and AVAssetWriter.
The following code works, but the output file is shorter than the input.
For example, the input file has a duration of 1:59, but the output is 1:50 with the same audio content.
- (void)reverse:(AVAsset *)asset
{
AVAssetReader* reader = [[AVAssetReader alloc] initWithAsset:asset error:nil];
AVAssetTrack* audioTrack = [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
NSMutableDictionary* audioReadSettings = [NSMutableDictionary dictionary];
[audioReadSettings setValue:[NSNumber numberWithInt:kAudioFormatLinearPCM]
forKey:AVFormatIDKey];
AVAssetReaderTrackOutput* readerOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:audioTrack outputSettings:audioReadSettings];
[reader addOutput:readerOutput];
[reader startReading];
NSDictionary *outputSettings = [NSDictionary dictionaryWithObjectsAndKeys:
[NSNumber numberWithInt: kAudioFormatMPEG4AAC], AVFormatIDKey,
[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
[NSNumber numberWithInt:128000], AVEncoderBitRateKey,
[NSData data], AVChannelLayoutKey,
nil];
AVAssetWriterInput *writerInput = [[AVAssetWriterInput alloc] initWithMediaType:AVMediaTypeAudio
outputSettings:outputSettings];
NSString *exportPath = [NSTemporaryDirectory() stringByAppendingPathComponent:@"out.m4a"];
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];
NSError *writerError = nil;
AVAssetWriter *writer = [[AVAssetWriter alloc] initWithURL:exportURL
fileType:AVFileTypeAppleM4A
error:&writerError];
[writerInput setExpectsMediaDataInRealTime:NO];
[writer addInput:writerInput];
[writer startWriting];
[writer startSessionAtSourceTime:kCMTimeZero];
CMSampleBufferRef sample = [readerOutput copyNextSampleBuffer];
NSMutableArray *samples = [[NSMutableArray alloc] init];
while (sample != NULL) {
sample = [readerOutput copyNextSampleBuffer];
if (sample == NULL)
continue;
[samples addObject:(__bridge id)(sample)];
CFRelease(sample);
}
NSArray* reversedSamples = [[samples reverseObjectEnumerator] allObjects];
for (id reversedSample in reversedSamples) {
if (writerInput.readyForMoreMediaData) {
[writerInput appendSampleBuffer:(__bridge CMSampleBufferRef)(reversedSample)];
}
else {
[NSThread sleepForTimeInterval:0.05];
}
}
[writerInput markAsFinished];
dispatch_queue_t queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_HIGH, 0);
dispatch_async(queue, ^{
[writer finishWriting];
});
}
UPDATE:
If I write the samples directly in the first while loop, everything is OK (even with the writerInput.readyForMoreMediaData check). In that case the result file has exactly the same duration as the original. But if I write the same samples from the reversed NSArray, the result is shorter.
The method described here is implemented in an Xcode project at this link (multi-platform SwiftUI app):
ReverseAudio Xcode Project
It is not sufficient to write the audio samples in the reverse order. The sample data needs to be reversed itself.
In Swift, we create an extension to AVAsset.
The samples must be processed as decompressed samples. To that end create audio reader settings with kAudioFormatLinearPCM:
let kAudioReaderSettings = [
AVFormatIDKey: Int(kAudioFormatLinearPCM) as AnyObject,
AVLinearPCMBitDepthKey: 16 as AnyObject,
AVLinearPCMIsBigEndianKey: false as AnyObject,
AVLinearPCMIsFloatKey: false as AnyObject,
AVLinearPCMIsNonInterleaved: false as AnyObject]
Use our AVAsset extension method audioReader:
func audioReader(outputSettings: [String : Any]?) -> (audioTrack:AVAssetTrack?, audioReader:AVAssetReader?, audioReaderOutput:AVAssetReaderTrackOutput?) {
if let audioTrack = self.tracks(withMediaType: .audio).first {
if let audioReader = try? AVAssetReader(asset: self) {
let audioReaderOutput = AVAssetReaderTrackOutput(track: audioTrack, outputSettings: outputSettings)
return (audioTrack, audioReader, audioReaderOutput)
}
}
return (nil, nil, nil)
}
let (_, audioReader, audioReaderOutput) = self.audioReader(outputSettings: kAudioReaderSettings)
to create an audioReader (AVAssetReader) and audioReaderOutput (AVAssetReaderTrackOutput) for reading the audio samples.
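Note that the tuple above hands back the reader and its output separately; the output still has to be attached to the reader before reading starts. A minimal sketch of that wiring, assuming the values returned by audioReader(outputSettings:):
if let audioReader = audioReader,
   let audioReaderOutput = audioReaderOutput,
   audioReader.canAdd(audioReaderOutput) {
    // startReading() will fail unless the track output is attached first.
    audioReader.add(audioReaderOutput)
}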
We need to keep track of the audio samples:
var audioSamples:[CMSampleBuffer] = []
Now start reading samples.
if audioReader.startReading() {
while audioReader.status == .reading {
if let sampleBuffer = audioReaderOutput.copyNextSampleBuffer(){
// process sample
}
}
}
Save each audio sample buffer; we need them later when we create the reversed samples:
audioSamples.append(sampleBuffer)
We need an AVAssetWriter:
guard let assetWriter = try? AVAssetWriter(outputURL: destinationURL, fileType: AVFileType.wav) else {
// error handling
return
}
The file type is 'wav' because the reversed samples will be written as uncompressed Linear PCM audio, as follows.
For the assetWriter we specify audio compression settings and a ‘source format hint’, which we can acquire from an uncompressed sample buffer:
let sampleBuffer = audioSamples[0]
let sourceFormat = CMSampleBufferGetFormatDescription(sampleBuffer)
let audioCompressionSettings = [AVFormatIDKey: kAudioFormatLinearPCM] as [String : Any]
Now we can create the AVAssetWriterInput, add it to the writer and start writing:
let assetWriterInput = AVAssetWriterInput(mediaType: AVMediaType.audio, outputSettings:audioCompressionSettings, sourceFormatHint: sourceFormat)
assetWriter.add(assetWriterInput)
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: CMTime.zero)
Now iterate through the samples in reverse order, and for each one reverse the sample data itself.
We have an extension for CMSampleBuffer that does just that, called ‘reverse’.
Using requestMediaDataWhenReady we do this as follows:
let nbrSamples = audioSamples.count
var index = 0
let serialQueue: DispatchQueue = DispatchQueue(label: "com.limit-point.reverse-audio-queue")
assetWriterInput.requestMediaDataWhenReady(on: serialQueue) {
while assetWriterInput.isReadyForMoreMediaData, index < nbrSamples {
let sampleBuffer = audioSamples[nbrSamples - 1 - index]
if let reversedBuffer = sampleBuffer.reverse(), assetWriterInput.append(reversedBuffer) == true {
index += 1
}
else {
index = nbrSamples
}
if index == nbrSamples {
assetWriterInput.markAsFinished()
finishWriting() // call assetWriter.finishWriting, check assetWriter status, etc.
}
}
}
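The finishWriting() call in the loop above refers to a helper; a minimal sketch of what it might contain, assuming the assetWriter created earlier:
func finishWriting() {
    assetWriter.finishWriting {
        if assetWriter.status == .completed {
            // The reversed audio file is now at destinationURL.
        }
        else {
            // Writing failed; inspect assetWriter.error.
        }
    }
}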
So the last thing to explain is: how do we reverse the audio samples in the ‘reverse’ method?
We create an extension on CMSampleBuffer that returns the reversed sample buffer:
func reverse() -> CMSampleBuffer?
The data that has to be reversed needs to be obtained using the method:
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer
The CMSampleBuffer header describes this method as follows:
“Creates an AudioBufferList containing the data from the CMSampleBuffer, and a CMBlockBuffer which references (and manages the lifetime of) the data in that AudioBufferList.”
Call it as follows, where ‘self’ refers to the CMSampleBuffer we are reversing since this is an extension:
var blockBuffer: CMBlockBuffer? = nil
let audioBufferList: UnsafeMutableAudioBufferListPointer = AudioBufferList.allocate(maximumBuffers: 1)
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
self,
bufferListSizeNeededOut: nil,
bufferListOut: audioBufferList.unsafeMutablePointer,
bufferListSize: AudioBufferList.sizeInBytes(maximumBuffers: 1),
blockBufferAllocator: nil,
blockBufferMemoryAllocator: nil,
flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
blockBufferOut: &blockBuffer
)
Now you can access the raw data; note that mBuffers.mData is optional, so unwrap it:
guard let data = audioBufferList.unsafePointer.pointee.mBuffers.mData else { return nil }
To reverse the data we access it as an array of ‘samples’, called sampleArray; in Swift that is done as follows:
let samples = data.assumingMemoryBound(to: Int16.self)
let sizeofInt16 = MemoryLayout<Int16>.size
let dataSize = audioBufferList.unsafePointer.pointee.mBuffers.mDataByteSize
let dataCount = Int(dataSize) / sizeofInt16
var sampleArray = Array(UnsafeBufferPointer(start: samples, count: dataCount)) as [Int16]
Now reverse the array sampleArray:
sampleArray.reverse()
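One caveat: reverse() on the Int16 array reverses individual values, so for interleaved stereo data (the reader settings above request interleaved output) it would also swap the left/right values within every frame. That is harmless for mono; for stereo, a hedged sketch that reverses whole frames instead:
let channelCount = 2 // assumption: interleaved stereo
let frameCount = sampleArray.count / channelCount
var reversedArray = [Int16](repeating: 0, count: sampleArray.count)
for frame in 0..<frameCount {
    let src = frame * channelCount
    let dst = (frameCount - 1 - frame) * channelCount
    // Move the whole frame, preserving channel order within it.
    for channel in 0..<channelCount {
        reversedArray[dst + channel] = sampleArray[src + channel]
    }
}
sampleArray = reversedArray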
Using the reversed samples we create a new CMSampleBuffer that contains the reversed samples.
Now we replace the data in the CMBlockBuffer we previously obtained from CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer, using the reversed array:
var status:OSStatus = noErr
sampleArray.withUnsafeBytes { sampleArrayPtr in
if let baseAddress = sampleArrayPtr.baseAddress {
let bufferPointer: UnsafePointer<Int16> = baseAddress.assumingMemoryBound(to: Int16.self)
let rawPtr = UnsafeRawPointer(bufferPointer)
status = CMBlockBufferReplaceDataBytes(with: rawPtr, blockBuffer: blockBuffer!, offsetIntoDestination: 0, dataLength: Int(dataSize))
}
}
if status != noErr {
return nil
}
Finally create the new sample buffer using CMSampleBufferCreate. This function needs two arguments we can get from the original sample buffer, namely the formatDescription and numberOfSamples:
let formatDescription = CMSampleBufferGetFormatDescription(self)
let numberOfSamples = CMSampleBufferGetNumSamples(self)
var newBuffer:CMSampleBuffer?
Now create the new sample buffer with the reversed blockBuffer:
guard CMSampleBufferCreate(allocator: kCFAllocatorDefault, dataBuffer: blockBuffer, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: formatDescription, sampleCount: numberOfSamples, sampleTimingEntryCount: 0, sampleTimingArray: nil, sampleSizeEntryCount: 0, sampleSizeArray: nil, sampleBufferOut: &newBuffer) == noErr else {
return self
}
return newBuffer
And that’s all there is to it!
As a final note, the Core Audio and AVFoundation headers provide a lot of useful information, such as CoreAudioTypes.h, CMSampleBuffer.h, and many more.
Complete example for reversing video and audio into the same output asset, using Swift 5, with the audio processed per the recommendations above:
private func reverseVideo(inURL: URL, outURL: URL, queue: DispatchQueue, _ completionBlock: ((Bool)->Void)?) {
Log.info("Start reverse video!")
let asset = AVAsset.init(url: inURL)
guard
let reader = try? AVAssetReader.init(asset: asset),
let videoTrack = asset.tracks(withMediaType: .video).first,
let audioTrack = asset.tracks(withMediaType: .audio).first
else {
assert(false)
completionBlock?(false)
return
}
let width = videoTrack.naturalSize.width
let height = videoTrack.naturalSize.height
// Video reader
let readerVideoSettings: [String : Any] = [ String(kCVPixelBufferPixelFormatTypeKey) : kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,]
let readerVideoOutput = AVAssetReaderTrackOutput.init(track: videoTrack, outputSettings: readerVideoSettings)
reader.add(readerVideoOutput)
// Audio reader
let readerAudioSettings: [String : Any] = [
AVFormatIDKey: kAudioFormatLinearPCM,
AVLinearPCMBitDepthKey: 16 ,
AVLinearPCMIsBigEndianKey: false ,
AVLinearPCMIsFloatKey: false,]
let readerAudioOutput = AVAssetReaderTrackOutput.init(track: audioTrack, outputSettings: readerAudioSettings)
reader.add(readerAudioOutput)
//Start reading content
reader.startReading()
//Reading video samples
var videoBuffers = [CMSampleBuffer]()
while let nextBuffer = readerVideoOutput.copyNextSampleBuffer() {
videoBuffers.append(nextBuffer)
}
//Reading audio samples
var audioBuffers = [CMSampleBuffer]()
var timingInfos = [CMSampleTimingInfo]()
while let nextBuffer = readerAudioOutput.copyNextSampleBuffer() {
var timingInfo = CMSampleTimingInfo()
var timingInfoCount = CMItemCount()
CMSampleBufferGetSampleTimingInfoArray(nextBuffer, entryCount: 1, arrayToFill: &timingInfo, entriesNeededOut: &timingInfoCount)
let duration = CMSampleBufferGetDuration(nextBuffer)
let endTime = CMTimeAdd(timingInfo.presentationTimeStamp, duration)
let newPresentationTime = CMTimeSubtract(asset.duration, endTime)
timingInfo.presentationTimeStamp = newPresentationTime
timingInfos.append(timingInfo)
audioBuffers.append(nextBuffer)
}
//Stop reading
let status = reader.status
reader.cancelReading()
guard status == .completed, let firstVideoBuffer = videoBuffers.first, let firstAudioBuffer = audioBuffers.first else {
assert(false)
completionBlock?(false)
return
}
//Start video time
let sessionStartTime = CMSampleBufferGetPresentationTimeStamp(firstVideoBuffer)
//Writer for video
let writerVideoSettings: [String:Any] = [
AVVideoCodecKey : AVVideoCodecType.h264,
AVVideoWidthKey : width,
AVVideoHeightKey: height,
]
let writerVideoInput: AVAssetWriterInput
if let formatDescription = videoTrack.formatDescriptions.last {
writerVideoInput = AVAssetWriterInput.init(mediaType: .video, outputSettings: writerVideoSettings, sourceFormatHint: (formatDescription as! CMFormatDescription))
} else {
writerVideoInput = AVAssetWriterInput.init(mediaType: .video, outputSettings: writerVideoSettings)
}
writerVideoInput.transform = videoTrack.preferredTransform
writerVideoInput.expectsMediaDataInRealTime = false
//Writer for audio
let writerAudioSettings: [String:Any] = [
AVFormatIDKey : kAudioFormatMPEG4AAC,
AVSampleRateKey : 44100,
AVNumberOfChannelsKey: 2,
AVEncoderBitRateKey:128000,
AVChannelLayoutKey: NSData(),
]
let sourceFormat = CMSampleBufferGetFormatDescription(firstAudioBuffer)
let writerAudioInput: AVAssetWriterInput = AVAssetWriterInput.init(mediaType: .audio, outputSettings: writerAudioSettings, sourceFormatHint: sourceFormat)
writerAudioInput.expectsMediaDataInRealTime = true
guard
let writer = try? AVAssetWriter.init(url: outURL, fileType: .mp4),
writer.canAdd(writerVideoInput),
writer.canAdd(writerAudioInput)
else {
assert(false)
completionBlock?(false)
return
}
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor.init(assetWriterInput: writerVideoInput, sourcePixelBufferAttributes: nil)
let group = DispatchGroup.init()
group.enter()
writer.add(writerVideoInput)
writer.add(writerAudioInput)
writer.startWriting()
writer.startSession(atSourceTime: sessionStartTime)
var videoFinished = false
var audioFinished = false
//Write video samples in reverse order
var currentSample = 0
writerVideoInput.requestMediaDataWhenReady(on: queue) {
for i in currentSample..<videoBuffers.count {
currentSample = i
if !writerVideoInput.isReadyForMoreMediaData {
return
}
let presentationTime = CMSampleBufferGetPresentationTimeStamp(videoBuffers[i])
guard let imageBuffer = CMSampleBufferGetImageBuffer(videoBuffers[videoBuffers.count - i - 1]) else {
Log.info("VideoWriter reverseVideo: warning, could not get imageBuffer from SampleBuffer...")
continue
}
if !pixelBufferAdaptor.append(imageBuffer, withPresentationTime: presentationTime) {
Log.info("VideoWriter reverseVideo: warning, could not append imageBuffer...")
}
}
// finish write video samples
writerVideoInput.markAsFinished()
Log.info("Video writing finished!")
videoFinished = true
if(audioFinished){
group.leave()
}
}
//Write audio samples in reverse order
let totalAudioSamples = audioBuffers.count
var currentAudioSample = 0
writerAudioInput.requestMediaDataWhenReady(on: queue) {
for i in currentAudioSample..<totalAudioSamples {
currentAudioSample = i
if !writerAudioInput.isReadyForMoreMediaData {
return
}
let audioSample = audioBuffers[totalAudioSamples-1-i]
let timingInfo = timingInfos[i]
// reverse samples data using timing info
if let reversedBuffer = audioSample.reverse(timingInfo: [timingInfo]) {
// append data
if writerAudioInput.append(reversedBuffer) == false {
break
}
}
}
// finish
writerAudioInput.markAsFinished()
Log.info("Audio writing finished!")
audioFinished = true
if(videoFinished){
group.leave()
}
}
group.notify(queue: queue) {
writer.finishWriting {
if writer.status != .completed {
Log.info("VideoWriter reverse video: error - \(String(describing: writer.error))")
completionBlock?(false)
} else {
Log.info("Ended reverse video!")
completionBlock?(true)
}
}
}
}
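A hypothetical call site (the URLs and queue label are placeholders):
let reverseQueue = DispatchQueue(label: "com.example.reverse-video-queue")
reverseVideo(inURL: inputVideoURL, outURL: reversedVideoURL, queue: reverseQueue) { success in
    print("Reverse video finished: \(success)")
}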
Happy coding!
Print out the size of each buffer in number of samples (in the "reading" readerOutput while loop), and repeat in the "writing" writerInput for-loop. This way you can see all the buffer sizes and check whether they add up.
For example: if writerInput.readyForMoreMediaData is false you "sleep", but then proceed to the next reversedSample in reversedSamples, so that buffer effectively gets dropped from the writerInput.
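In other words, the writing loop should wait for the input to become ready instead of moving on. A sketch of that pattern in Swift (writerInput and reversedSamples stand in for their Objective-C counterparts above):
for sampleBuffer in reversedSamples {
    // Block until the input can take more data; don't skip the buffer.
    while !writerInput.isReadyForMoreMediaData {
        Thread.sleep(forTimeInterval: 0.05)
    }
    writerInput.append(sampleBuffer)
}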
UPDATE (based on comments):
I found in the code, there are two problems:
The output settings are incorrect: the input file is mono (1 channel), but the output settings are configured for 2 channels. It should be: [NSNumber numberWithInt:1], AVNumberOfChannelsKey. Compare the info on the output and input files.
The second problem is that you are reversing 643 buffers of 8192 audio samples each, instead of reversing the index of each audio sample. To see each buffer, I changed your debugging from looking at the size of each sample to looking at the number of samples in each buffer, which is 8192. So line 76 is now: size_t sampleSize = CMSampleBufferGetNumSamples(sample);
The output looks like:
2015-03-19 22:26:28.171 audioReverse[25012:4901250] Reading [0]: 8192
2015-03-19 22:26:28.172 audioReverse[25012:4901250] Reading [1]: 8192
...
2015-03-19 22:26:28.651 audioReverse[25012:4901250] Reading [640]: 8192
2015-03-19 22:26:28.651 audioReverse[25012:4901250] Reading [641]: 8192
2015-03-19 22:26:28.651 audioReverse[25012:4901250] Reading [642]: 5056
2015-03-19 22:26:28.651 audioReverse[25012:4901250] Writing [0]: 5056
2015-03-19 22:26:28.652 audioReverse[25012:4901250] Writing [1]: 8192
...
2015-03-19 22:26:29.134 audioReverse[25012:4901250] Writing [640]: 8192
2015-03-19 22:26:29.135 audioReverse[25012:4901250] Writing [641]: 8192
2015-03-19 22:26:29.135 audioReverse[25012:4901250] Writing [642]: 8192
This shows that you're reversing the order of the 8192-sample buffers, but within each buffer the audio is still "facing forward". We can see this in a screen shot I took of a correctly reversed (sample-by-sample) file versus your buffer reversal.
I think your current scheme can work if you also reverse the samples within each 8192-sample buffer. I personally would not recommend using NSArray enumerators for signal processing, but it can work if you operate at the sample level.
extension CMSampleBuffer {
func reverse(timingInfo:[CMSampleTimingInfo]) -> CMSampleBuffer? {
var blockBuffer: CMBlockBuffer? = nil
let audioBufferList: UnsafeMutableAudioBufferListPointer = AudioBufferList.allocate(maximumBuffers: 1)
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
self,
bufferListSizeNeededOut: nil,
bufferListOut: audioBufferList.unsafeMutablePointer,
bufferListSize: AudioBufferList.sizeInBytes(maximumBuffers: 1),
blockBufferAllocator: nil,
blockBufferMemoryAllocator: nil,
flags: kCMSampleBufferFlag_AudioBufferList_Assure16ByteAlignment,
blockBufferOut: &blockBuffer
)
if let data = audioBufferList.unsafePointer.pointee.mBuffers.mData {
let samples = data.assumingMemoryBound(to: Int16.self)
let sizeofInt16 = MemoryLayout<Int16>.size
let dataSize = audioBufferList.unsafePointer.pointee.mBuffers.mDataByteSize
let dataCount = Int(dataSize) / sizeofInt16
var sampleArray = Array(UnsafeBufferPointer(start: samples, count: dataCount)) as [Int16]
sampleArray.reverse()
var status:OSStatus = noErr
sampleArray.withUnsafeBytes { sampleArrayPtr in
if let baseAddress = sampleArrayPtr.baseAddress {
let bufferPointer: UnsafePointer<Int16> = baseAddress.assumingMemoryBound(to: Int16.self)
let rawPtr = UnsafeRawPointer(bufferPointer)
status = CMBlockBufferReplaceDataBytes(with: rawPtr, blockBuffer: blockBuffer!, offsetIntoDestination: 0, dataLength: Int(dataSize))
}
}
if status != noErr {
return nil
}
let formatDescription = CMSampleBufferGetFormatDescription(self)
let numberOfSamples = CMSampleBufferGetNumSamples(self)
var newBuffer:CMSampleBuffer?
guard CMSampleBufferCreate(allocator: kCFAllocatorDefault, dataBuffer: blockBuffer, dataReady: true, makeDataReadyCallback: nil, refcon: nil, formatDescription: formatDescription, sampleCount: numberOfSamples, sampleTimingEntryCount: timingInfo.count, sampleTimingArray: timingInfo, sampleSizeEntryCount: 0, sampleSizeArray: nil, sampleBufferOut: &newBuffer) == noErr else {
return self
}
return newBuffer
}
return nil
}
}
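A hypothetical usage, building the timing entry the same way the audio reading loop in the full example above does:
let timingInfo = CMSampleTimingInfo(duration: CMSampleBufferGetDuration(sampleBuffer),
                                    presentationTimeStamp: newPresentationTime,
                                    decodeTimeStamp: CMTime.invalid)
if let reversedBuffer = sampleBuffer.reverse(timingInfo: [timingInfo]) {
    // Append to an AVAssetWriterInput as in the full example above.
    writerAudioInput.append(reversedBuffer)
}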
Currently I am trying to process the frames of an existing video with OpenCV. Are there any AV reader libraries that contain delegate methods that process frames while playing back videos? I know how to process frames during a live AVCaptureSession using AVCaptureVideoDataOutput and the captureOutput delegate method. Is there something similar for playing back videos?
Any help would be appreciated.
Here's the solution. Thanks to Tim Bull's answer, I accomplished this using AVAssetReader / AVAssetReaderOutput.
I called the function below from a button click to start the video and begin processing each frame with OpenCV:
func processVids() {
guard let pathOfOrigVid = Bundle.main.path(forResource: "output_10_34_34", ofType: "mp4") else{
print("video.m4v not found\n")
exit(0)
}
var path: URL? = nil
do{
path = try FileManager.default.url(for: .documentDirectory, in:.userDomainMask, appropriateFor: nil, create: false)
path = path?.appendingPathComponent("grayVideo.mp4")
}catch{
print("Unable to make URL to Movies path\n")
exit(0)
}
let movie: AVURLAsset = AVURLAsset(url: NSURL(fileURLWithPath: pathOfOrigVid) as URL, options: nil)
let tracks: [AVAssetTrack] = movie.tracks(withMediaType: AVMediaTypeVideo)
let track: AVAssetTrack = tracks[0]
var reader: AVAssetReader? = nil
do{
reader = try AVAssetReader(asset: movie)
}
catch{
print("Problem initializing AVReader\n")
}
let settings: [String: Any] = [
String(kCVPixelBufferPixelFormatTypeKey): NSNumber(value: kCVPixelFormatType_32ARGB),
String(kCVPixelBufferIOSurfacePropertiesKey): [:]
]
let rout: AVAssetReaderTrackOutput = AVAssetReaderTrackOutput(track: track, outputSettings: settings)
reader?.add(rout)
reader?.startReading()
DispatchQueue.global().async(execute: {
while reader?.status == AVAssetReaderStatus.reading {
if let sbuff: CMSampleBuffer = rout.copyNextSampleBuffer() {
// Buffer of the frame to perform OpenCV processing on
_ = sbuff
}
usleep(10000)
}
})
}
AVAssetReader / AVAssetReaderOutput are what you're looking for. Check out the CopyNextSampleBuffer method.
https://developer.apple.com/documentation/avfoundation/avassetreaderoutput
You can use AVVideoComposition
If you want to process frames with Core Image you can create an instance by calling the init(asset:applyingCIFiltersWithHandler:) method.
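For example, a minimal sketch that applies a Core Image filter to every frame during playback (videoURL and the filter name are placeholders):
import AVFoundation
import CoreImage

let asset = AVAsset(url: videoURL)
let composition = AVVideoComposition(asset: asset) { request in
    // request.sourceImage is the current frame as a CIImage.
    let filtered = request.sourceImage.applyingFilter("CIPhotoEffectNoir")
    request.finish(with: filtered, context: nil)
}
let playerItem = AVPlayerItem(asset: asset)
playerItem.videoComposition = composition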
Or you can create a custom compositor:
You can implement your own custom video compositor by implementing the
AVVideoCompositing protocol; a custom video compositor is provided
with pixel buffers for each of its video sources during playback and
other operations and can perform arbitrary graphical operations on
them in order to produce visual output.
See docs for more info.
Here you can find an example (but example is in Objective-C).
For anyone who needs to process video frames with OpenCV:
Decode video:
@objc public protocol ARVideoReaderDelegate : NSObjectProtocol {
func reader(_ reader:ARVideoReader!, newFrameReady sampleBuffer:CMSampleBuffer?, _ frameCount:Int)
func readerDidFinished(_ reader:ARVideoReader!, totalFrameCount:Int)
}
@objc open class ARVideoReader: NSObject {
var _asset: AVURLAsset!
@objc var _delegate: ARVideoReaderDelegate?
@objc public init!(urlAsset asset:AVURLAsset){
_asset = asset
super.init()
}
@objc open func startReading() -> Void {
if let reader = try? AVAssetReader.init(asset: _asset){
let videoTrack = _asset.tracks(withMediaType: .video).compactMap{ $0 }.first;
let options = [kCVPixelBufferPixelFormatTypeKey : Int(kCVPixelFormatType_32BGRA)]
let readerOutput = AVAssetReaderTrackOutput.init(track: videoTrack!, outputSettings: options as [String : Any])
reader.add(readerOutput)
reader.startReading()
var count = 0
//reading
while (reader.status == .reading && videoTrack?.nominalFrameRate != 0){
let sampleBuffer = readerOutput.copyNextSampleBuffer()
_delegate?.reader(self, newFrameReady: sampleBuffer, count)
count = count+1;
}
_delegate?.readerDidFinished(self,totalFrameCount: count)
}
}
}
In the callback of delegate:
//convert sampleBuffer to cv::Mat
CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
size_t width = CVPixelBufferGetWidth(imageBuffer);
size_t height = CVPixelBufferGetHeight(imageBuffer);
CVPixelBufferLockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
char *baseBuffer = (char*)CVPixelBufferGetBaseAddress(imageBuffer);
cv::Mat cvImage = cv::Mat((int)height,(int)width,CV_8UC3);
cv::MatIterator_<cv::Vec3b> it_start = cvImage.begin<cv::Vec3b>();
cv::MatIterator_<cv::Vec3b> it_end = cvImage.end<cv::Vec3b>();
long cur = 0;
size_t padding = CVPixelBufferGetBytesPerRow(imageBuffer) - width*4;
size_t offset = 0; // first row starts at the base address; padding is added after each completed row
while (it_start != it_end) {
//opt pixel
long p_idx = cur*4 + offset;
char b = baseBuffer[p_idx];
char g = baseBuffer[p_idx + 1];
char r = baseBuffer[p_idx + 2];
cv::Vec3b newpixel(b,g,r);
*it_start = newpixel;
cur++;
it_start++;
if (cur%width == 0) {
offset = offset + padding;
}
}
CVPixelBufferUnlockBaseAddress(imageBuffer, kCVPixelBufferLock_ReadOnly);
//process cvImage now
What is the most efficient way to capture frames from a MTKView? If possible, I would like to save a .mov file from the frames in realtime. Is it possible to render into an AVPlayer frame or something?
It is currently drawing with this code (based on @warrenm PerformanceShaders project):
func draw(in view: MTKView) {
_ = inflightSemaphore.wait(timeout: DispatchTime.distantFuture)
updateBuffers()
let commandBuffer = commandQueue.makeCommandBuffer()
commandBuffer.addCompletedHandler{ [weak self] commandBuffer in
if let strongSelf = self {
strongSelf.inflightSemaphore.signal()
}
}
// Dispatch the current kernel to perform the selected image filter
selectedKernel.encode(commandBuffer: commandBuffer,
sourceTexture: kernelSourceTexture!,
destinationTexture: kernelDestTexture!)
if let renderPassDescriptor = view.currentRenderPassDescriptor, let currentDrawable = view.currentDrawable
{
let clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
renderPassDescriptor.colorAttachments[0].clearColor = clearColor
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
renderEncoder.label = "Main pass"
renderEncoder.pushDebugGroup("Draw textured square")
renderEncoder.setFrontFacing(.counterClockwise)
renderEncoder.setCullMode(.back)
renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexBuffer(vertexBuffer, offset: MBEVertexDataSize * bufferIndex, at: 0)
renderEncoder.setVertexBuffer(uniformBuffer, offset: MBEUniformDataSize * bufferIndex , at: 1)
renderEncoder.setFragmentTexture(kernelDestTexture, at: 0)
renderEncoder.setFragmentSamplerState(sampler, at: 0)
renderEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
renderEncoder.popDebugGroup()
renderEncoder.endEncoding()
commandBuffer.present(currentDrawable)
}
bufferIndex = (bufferIndex + 1) % MBEMaxInflightBuffers
commandBuffer.commit()
}
Here's a small class that performs the essential functions of writing out a movie file that captures the contents of a Metal view:
class MetalVideoRecorder {
var isRecording = false
var recordingStartTime = TimeInterval(0)
private var assetWriter: AVAssetWriter
private var assetWriterVideoInput: AVAssetWriterInput
private var assetWriterPixelBufferInput: AVAssetWriterInputPixelBufferAdaptor
init?(outputURL url: URL, size: CGSize) {
do {
assetWriter = try AVAssetWriter(outputURL: url, fileType: .m4v)
} catch {
return nil
}
let outputSettings: [String: Any] = [ AVVideoCodecKey : AVVideoCodecType.h264,
AVVideoWidthKey : size.width,
AVVideoHeightKey : size.height ]
assetWriterVideoInput = AVAssetWriterInput(mediaType: .video, outputSettings: outputSettings)
assetWriterVideoInput.expectsMediaDataInRealTime = true
let sourcePixelBufferAttributes: [String: Any] = [
kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA,
kCVPixelBufferWidthKey as String : size.width,
kCVPixelBufferHeightKey as String : size.height ]
assetWriterPixelBufferInput = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: assetWriterVideoInput,
sourcePixelBufferAttributes: sourcePixelBufferAttributes)
assetWriter.add(assetWriterVideoInput)
}
func startRecording() {
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)
recordingStartTime = CACurrentMediaTime()
isRecording = true
}
func endRecording(_ completionHandler: @escaping () -> ()) {
isRecording = false
assetWriterVideoInput.markAsFinished()
assetWriter.finishWriting(completionHandler: completionHandler)
}
func writeFrame(forTexture texture: MTLTexture) {
if !isRecording {
return
}
while !assetWriterVideoInput.isReadyForMoreMediaData {}
guard let pixelBufferPool = assetWriterPixelBufferInput.pixelBufferPool else {
print("Pixel buffer asset writer input did not have a pixel buffer pool available; cannot retrieve frame")
return
}
var maybePixelBuffer: CVPixelBuffer? = nil
let status = CVPixelBufferPoolCreatePixelBuffer(nil, pixelBufferPool, &maybePixelBuffer)
if status != kCVReturnSuccess {
print("Could not get pixel buffer from asset writer input; dropping frame...")
return
}
guard let pixelBuffer = maybePixelBuffer else { return }
CVPixelBufferLockBaseAddress(pixelBuffer, [])
let pixelBufferBytes = CVPixelBufferGetBaseAddress(pixelBuffer)!
// Use the bytes per row value from the pixel buffer since its stride may be rounded up to be 16-byte aligned
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
texture.getBytes(pixelBufferBytes, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
let frameTime = CACurrentMediaTime() - recordingStartTime
let presentationTime = CMTimeMakeWithSeconds(frameTime, preferredTimescale: 240)
assetWriterPixelBufferInput.append(pixelBuffer, withPresentationTime: presentationTime)
CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
}
}
After initializing one of these and calling startRecording(), you can add a scheduled handler to the command buffer containing your rendering commands and call writeFrame (after you end encoding, but before presenting the drawable or committing the buffer):
let texture = currentDrawable.texture
commandBuffer.addCompletedHandler { commandBuffer in
self.recorder.writeFrame(forTexture: texture)
}
When you're done recording, just call endRecording, and the video file will be finalized and closed.
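For example:
self.recorder.endRecording {
    print("Movie file finalized and closed")
}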
Caveats:
This class assumes the source texture to be of the default format, .bgra8Unorm. If it isn't, you'll get crashes or corruption. If necessary, convert the texture with a compute or fragment shader, or use Accelerate.
This class also assumes that the texture is the same size as the video frame. If this isn't the case (if the drawable size changes, or your screen autorotates), the output will be corrupted and you may see crashes. Mitigate this by scaling or cropping the source texture as your application requires.
Upgraded to Swift 5
import AVFoundation
class MetalVideoRecorder {
var isRecording = false
var recordingStartTime = TimeInterval(0)
private var assetWriter: AVAssetWriter
private var assetWriterVideoInput: AVAssetWriterInput
private var assetWriterPixelBufferInput: AVAssetWriterInputPixelBufferAdaptor
init?(outputURL url: URL, size: CGSize) {
do {
assetWriter = try AVAssetWriter(outputURL: url, fileType: AVFileType.m4v)
} catch {
return nil
}
let outputSettings: [String: Any] = [ AVVideoCodecKey : AVVideoCodecType.h264,
AVVideoWidthKey : size.width,
AVVideoHeightKey : size.height ]
assetWriterVideoInput = AVAssetWriterInput(mediaType: AVMediaType.video, outputSettings: outputSettings)
assetWriterVideoInput.expectsMediaDataInRealTime = true
let sourcePixelBufferAttributes: [String: Any] = [
kCVPixelBufferPixelFormatTypeKey as String : kCVPixelFormatType_32BGRA,
kCVPixelBufferWidthKey as String : size.width,
kCVPixelBufferHeightKey as String : size.height ]
assetWriterPixelBufferInput = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: assetWriterVideoInput,
sourcePixelBufferAttributes: sourcePixelBufferAttributes)
assetWriter.add(assetWriterVideoInput)
}
func startRecording() {
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: CMTime.zero)
recordingStartTime = CACurrentMediaTime()
isRecording = true
}
func endRecording(_ completionHandler: @escaping () -> ()) {
isRecording = false
assetWriterVideoInput.markAsFinished()
assetWriter.finishWriting(completionHandler: completionHandler)
}
func writeFrame(forTexture texture: MTLTexture) {
if !isRecording {
return
}
while !assetWriterVideoInput.isReadyForMoreMediaData {}
guard let pixelBufferPool = assetWriterPixelBufferInput.pixelBufferPool else {
print("Pixel buffer asset writer input did not have a pixel buffer pool available; cannot retrieve frame")
return
}
var maybePixelBuffer: CVPixelBuffer? = nil
let status = CVPixelBufferPoolCreatePixelBuffer(nil, pixelBufferPool, &maybePixelBuffer)
if status != kCVReturnSuccess {
print("Could not get pixel buffer from asset writer input; dropping frame...")
return
}
guard let pixelBuffer = maybePixelBuffer else { return }
CVPixelBufferLockBaseAddress(pixelBuffer, [])
let pixelBufferBytes = CVPixelBufferGetBaseAddress(pixelBuffer)!
// Use the bytes per row value from the pixel buffer since its stride may be rounded up to be 16-byte aligned
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
let region = MTLRegionMake2D(0, 0, texture.width, texture.height)
texture.getBytes(pixelBufferBytes, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
let frameTime = CACurrentMediaTime() - recordingStartTime
let presentationTime = CMTimeMakeWithSeconds(frameTime, preferredTimescale: 240)
assetWriterPixelBufferInput.append(pixelBuffer, withPresentationTime: presentationTime)
CVPixelBufferUnlockBaseAddress(pixelBuffer, [])
// You need to release memory allocated to pixelBuffer
CVPixelBufferRelease(pixelBuffer)
}
}
EDIT: added CVPixelBufferRelease(pixelBuffer) to avoid memory leaks.
Without this, each newly created pixelBuffer stays in memory every frame and eventually the app will use all of the available system memory.
I'm trying to record segments of audio and recombine them without producing a gap in audio.
The eventual goal is to also have video, but I've found that audio itself creates gaps when combined with ffmpeg -f concat -i list.txt -c copy out.mp4
If I put the audio in an HLS playlist, there are also gaps, so I don't think this is unique to ffmpeg.
The idea is that samples come in continuously, and my controller routes samples to the proper AVAssetWriter. How do I eliminate gaps in audio?
import Foundation
import UIKit
import AVFoundation
class StreamController: UIViewController, AVCaptureAudioDataOutputSampleBufferDelegate, AVCaptureVideoDataOutputSampleBufferDelegate {
var closingAudioInput: AVAssetWriterInput?
var closingAssetWriter: AVAssetWriter?
var currentAudioInput: AVAssetWriterInput?
var currentAssetWriter: AVAssetWriter?
var nextAudioInput: AVAssetWriterInput?
var nextAssetWriter: AVAssetWriter?
var videoHelper: VideoHelper?
var startTime: NSTimeInterval = 0
let closeAssetQueue: dispatch_queue_t = dispatch_queue_create("closeAssetQueue", nil);
override func viewDidLoad() {
super.viewDidLoad()
startTime = NSDate().timeIntervalSince1970
createSegmentWriter()
videoHelper = VideoHelper()
videoHelper!.delegate = self
videoHelper!.startSession()
NSTimer.scheduledTimerWithTimeInterval(1, target: self, selector: "createSegmentWriter", userInfo: nil, repeats: true)
}
func createSegmentWriter() {
print("Creating segment writer at t=\(NSDate().timeIntervalSince1970 - self.startTime)")
let outputPath = OutputFileNameHelper.instance.pathForOutput()
OutputFileNameHelper.instance.incrementSegmentIndex()
try? NSFileManager.defaultManager().removeItemAtPath(outputPath)
nextAssetWriter = try! AVAssetWriter(URL: NSURL(fileURLWithPath: outputPath), fileType: AVFileTypeMPEG4)
nextAssetWriter!.shouldOptimizeForNetworkUse = true
let audioSettings: [String:AnyObject] = EncodingSettings.AUDIO
nextAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio, outputSettings: audioSettings)
nextAudioInput!.expectsMediaDataInRealTime = true
nextAssetWriter?.addInput(nextAudioInput!)
nextAssetWriter!.startWriting()
}
func closeWriterIfNecessary() {
if closing && audioFinished {
closing = false
audioFinished = false
let outputFile = closingAssetWriter?.outputURL.pathComponents?.last
closingAssetWriter?.finishWritingWithCompletionHandler() {
let delta = NSDate().timeIntervalSince1970 - self.startTime
print("segment \(outputFile!) finished at t=\(delta)")
}
self.closingAudioInput = nil
self.closingAssetWriter = nil
}
}
var audioFinished = false
var closing = false
func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBufferRef, fromConnection connection: AVCaptureConnection!) {
if let nextWriter = nextAssetWriter {
if nextWriter.status.rawValue != 0 {
if (currentAssetWriter != nil) {
closing = true
}
var sampleTiming: CMSampleTimingInfo = kCMTimingInfoInvalid
CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &sampleTiming)
print("Switching asset writers at t=\(NSDate().timeIntervalSince1970 - self.startTime)")
closingAssetWriter = currentAssetWriter
closingAudioInput = currentAudioInput
currentAssetWriter = nextAssetWriter
currentAudioInput = nextAudioInput
nextAssetWriter = nil
nextAudioInput = nil
currentAssetWriter?.startSessionAtSourceTime(sampleTiming.presentationTimeStamp)
}
}
if let _ = captureOutput as? AVCaptureVideoDataOutput {
} else if let _ = captureOutput as? AVCaptureAudioDataOutput {
captureAudioSample(sampleBuffer)
}
dispatch_async(closeAssetQueue) {
self.closeWriterIfNecessary()
}
}
func printTimingInfo(sampleBuffer: CMSampleBufferRef, prefix: String) {
var sampleTiming: CMSampleTimingInfo = kCMTimingInfoInvalid
CMSampleBufferGetSampleTimingInfo(sampleBuffer, 0, &sampleTiming)
let presentationTime = Double(sampleTiming.presentationTimeStamp.value) / Double(sampleTiming.presentationTimeStamp.timescale)
print("\(prefix):\(presentationTime)")
}
func captureAudioSample(sampleBuffer: CMSampleBufferRef) {
printTimingInfo(sampleBuffer, prefix: "A")
if (closing && !audioFinished) {
if closingAudioInput?.readyForMoreMediaData == true {
closingAudioInput?.appendSampleBuffer(sampleBuffer)
}
closingAudioInput?.markAsFinished()
audioFinished = true
} else {
if currentAudioInput?.readyForMoreMediaData == true {
currentAudioInput?.appendSampleBuffer(sampleBuffer)
}
}
}
}
With packet formats like AAC you have silent priming frames (a.k.a. encoder delay) at the beginning and remainder frames at the end (when your audio length is not a multiple of the packet size). In your case there are 2112 priming frames at the beginning of every file. Priming and remainder frames break the possibility of concatenating the files without transcoding them, so you can't really blame ffmpeg -c copy for not producing seamless output.
I'm not sure where this leaves you with video - obviously audio is synced to the video, even in the presence of priming frames.
It all depends on how you intend to concatenate the final audio (and eventually video). If you're doing it yourself using AVFoundation, then you can detect and account for priming/remainder frames using
CMGetAttachment(buffer, kCMSampleBufferAttachmentKey_TrimDurationAtStart, NULL)
CMGetAttachment(audioBuffer, kCMSampleBufferAttachmentKey_TrimDurationAtEnd, NULL)
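In Swift, reading those attachments back as CMTimes might look like this sketch (sampleBuffer is assumed to be the CMSampleBuffer in question):
if let trim = CMGetAttachment(sampleBuffer,
                              key: kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                              attachmentModeOut: nil) {
    // The attachment is a CFDictionary representation of a CMTime.
    let primingDuration = CMTimeMakeFromDictionary((trim as! CFDictionary))
    print("trim at start: \(CMTimeGetSeconds(primingDuration))s")
}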
As a short-term solution, you can switch to a non-"packetised" format to get gapless files that can be concatenated (with ffmpeg).
e.g.
AVFormatIDKey: kAudioFormatAppleIMA4, fileType: AVFileTypeAIFC, suffix ".aifc" or
AVFormatIDKey: kAudioFormatLinearPCM, fileType: AVFileTypeWAVE, suffix ".wav"
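A sketch of the second option's writer setup (the output URL is a placeholder, and the channel count is an assumption):
import AVFoundation

func makeWAVWriter(outputURL: URL) throws -> (AVAssetWriter, AVAssetWriterInput) {
    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .wav)
    let pcmSettings: [String: Any] = [
        AVFormatIDKey: kAudioFormatLinearPCM,
        AVSampleRateKey: 44100,
        AVNumberOfChannelsKey: 1, // assumption: mono capture
        AVLinearPCMBitDepthKey: 16,
        AVLinearPCMIsFloatKey: false,
        AVLinearPCMIsBigEndianKey: false,
        AVLinearPCMIsNonInterleaved: false
    ]
    let input = AVAssetWriterInput(mediaType: .audio, outputSettings: pcmSettings)
    input.expectsMediaDataInRealTime = true
    writer.add(input)
    return (writer, input)
}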
p.s. you can see priming & remainder frames and packet sizes using the ubiquitous afinfo tool.
afinfo chunk.mp4
Data format: 2 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
...
audio 39596 valid frames + 2112 priming + 276 remainder = 41984
...
Not sure if this helps you but if you have a bunch of MP4s you can use this code to combine them:
func mergeAudioFiles(audioFileUrls: NSArray, callback: (url: NSURL?, error: NSError?)->()) {
// Create the audio composition
let composition = AVMutableComposition()
// Merge
for (var i = 0; i < audioFileUrls.count; i++) {
let compositionAudioTrack :AVMutableCompositionTrack = composition.addMutableTrackWithMediaType(AVMediaTypeAudio, preferredTrackID: CMPersistentTrackID())
let asset = AVURLAsset(URL: audioFileUrls[i] as! NSURL)
let track = asset.tracksWithMediaType(AVMediaTypeAudio)[0]
let timeRange = CMTimeRange(start: CMTimeMake(0, 600), duration: track.timeRange.duration)
try! compositionAudioTrack.insertTimeRange(timeRange, ofTrack: track, atTime: composition.duration)
}
// Create output url
let format = NSDateFormatter()
format.dateFormat="yyyy-MM-dd-HH-mm-ss"
let currentFileName = "recording-\(format.stringFromDate(NSDate()))-merge.m4a"
print(currentFileName)
let documentsDirectory = NSFileManager.defaultManager().URLsForDirectory(.DocumentDirectory, inDomains: .UserDomainMask)[0]
let outputUrl = documentsDirectory.URLByAppendingPathComponent(currentFileName)
print(outputUrl.absoluteString)
// Export it
let assetExport = AVAssetExportSession(asset: composition, presetName: AVAssetExportPresetAppleM4A)
assetExport?.outputFileType = AVFileTypeAppleM4A
assetExport?.outputURL = outputUrl
assetExport?.exportAsynchronouslyWithCompletionHandler({ () -> Void in
switch assetExport!.status {
case AVAssetExportSessionStatus.Failed:
callback(url: nil, error: assetExport?.error)
default:
callback(url: assetExport?.outputURL, error: nil)
}
})
}
I'm converting an mp3 to m4a in Swift with code based on this.
It works when I generate a PCM file. When I change the export format to m4a it generates a file but it won't play. Why is it corrupt?
Here is the code so far:
import AVFoundation
import UIKit
class ViewController: UIViewController {
var rwAudioSerializationQueue:dispatch_queue_t!
var asset:AVAsset!
var assetReader:AVAssetReader!
var assetReaderAudioOutput:AVAssetReaderTrackOutput!
var assetWriter:AVAssetWriter!
var assetWriterAudioInput:AVAssetWriterInput!
var outputURL:NSURL!
override func viewDidLoad() {
super.viewDidLoad()
let rwAudioSerializationQueueDescription = String(self) + " rw audio serialization queue"
// Create the serialization queue to use for reading and writing the audio data.
self.rwAudioSerializationQueue = dispatch_queue_create(rwAudioSerializationQueueDescription, nil)
let paths = NSSearchPathForDirectoriesInDomains(.DocumentDirectory, .UserDomainMask, true)
let documentsPath = paths[0]
print(NSBundle.mainBundle().pathForResource("input", ofType: "mp3"))
self.asset = AVAsset(URL: NSURL(fileURLWithPath: NSBundle.mainBundle().pathForResource("input", ofType: "mp3")! ))
self.outputURL = NSURL(fileURLWithPath: documentsPath + "/output.m4a")
print(self.outputURL)
// [self.asset loadValuesAsynchronouslyForKeys:#[#"tracks"] completionHandler:^{
self.asset.loadValuesAsynchronouslyForKeys(["tracks"], completionHandler: {
print("loaded")
var success = true
var localError:NSError?
success = (self.asset.statusOfValueForKey("tracks", error: &localError) == AVKeyValueStatus.Loaded)
// Check for success of loading the assets tracks.
//success = ([self.asset statusOfValueForKey:#"tracks" error:&localError] == AVKeyValueStatusLoaded);
if (success)
{
// If the tracks loaded successfully, make sure that no file exists at the output path for the asset writer.
let fm = NSFileManager.defaultManager()
let localOutputPath = self.outputURL.path
if (fm.fileExistsAtPath(localOutputPath!)) {
do {
try fm.removeItemAtPath(localOutputPath!)
success = true
} catch {
}
}
}
if (success) {
success = self.setupAssetReaderAndAssetWriter()
}
if (success) {
success = self.startAssetReaderAndWriter()
}
})
}
func setupAssetReaderAndAssetWriter() -> Bool {
do {
try self.assetReader = AVAssetReader(asset: self.asset)
} catch {
}
do {
try self.assetWriter = AVAssetWriter(URL: self.outputURL, fileType: AVFileTypeCoreAudioFormat)
} catch {
}
var assetAudioTrack:AVAssetTrack? = nil
let audioTracks = self.asset.tracksWithMediaType(AVMediaTypeAudio)
if (audioTracks.count > 0) {
assetAudioTrack = audioTracks[0]
}
if (assetAudioTrack != nil)
{
let decompressionAudioSettings:[String : AnyObject] = [
AVFormatIDKey:Int(kAudioFormatLinearPCM)
]
self.assetReaderAudioOutput = AVAssetReaderTrackOutput(track: assetAudioTrack!, outputSettings: decompressionAudioSettings)
self.assetReader.addOutput(self.assetReaderAudioOutput)
var channelLayout = AudioChannelLayout()
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
/*let compressionAudioSettings:[String : AnyObject] = [
AVFormatIDKey:Int(kAudioFormatMPEG4AAC) ,
AVEncoderBitRateKey:128000,
AVSampleRateKey:44100 ,
// AVEncoderBitRatePerChannelKey:16,
// AVEncoderAudioQualityKey:AVAudioQuality.High.rawValue,
AVNumberOfChannelsKey:2,
AVChannelLayoutKey: NSData(bytes:&channelLayout, length:sizeof(AudioChannelLayout))
]
var outputSettings:[String : AnyObject] = [
AVFormatIDKey: Int(kAudioFormatLinearPCM),
AVSampleRateKey: 44100,
AVNumberOfChannelsKey: 2,
AVChannelLayoutKey: NSData(bytes:&channelLayout, length:sizeof(AudioChannelLayout)),
AVLinearPCMBitDepthKey: 16,
AVLinearPCMIsNonInterleaved: false,
AVLinearPCMIsFloatKey: false,
AVLinearPCMIsBigEndianKey: false
]*/
let outputSettings:[String : AnyObject] = [
AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
AVSampleRateKey: 44100,
AVNumberOfChannelsKey: 2,
AVChannelLayoutKey: NSData(bytes:&channelLayout, length:sizeof(AudioChannelLayout)) ]
self.assetWriterAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio, outputSettings: outputSettings)
self.assetWriter.addInput(self.assetWriterAudioInput)
}
return true
}
func startAssetReaderAndWriter() -> Bool {
self.assetWriter.startWriting()
self.assetReader.startReading()
self.assetWriter.startSessionAtSourceTime(kCMTimeZero)
self.assetWriterAudioInput.requestMediaDataWhenReadyOnQueue(self.rwAudioSerializationQueue, usingBlock: {
while (self.assetWriterAudioInput.readyForMoreMediaData ) {
var sampleBuffer = self.assetReaderAudioOutput.copyNextSampleBuffer()
if (sampleBuffer != nil) {
self.assetWriterAudioInput.appendSampleBuffer(sampleBuffer!)
sampleBuffer = nil
} else {
self.assetWriterAudioInput.markAsFinished()
self.assetReader.cancelReading()
print("done")
break
}
}
})
return true
}
}
Updated the source code in the question to Swift 4 and wrapped it in a class. Credit goes to Castles and Rythmic Fistman for the original source code and answer. I kept the author's comments and added a few assertions and print statements for debugging. Tested on iOS.
The bit rate for the output file is hardcoded at 96kb/s, you can easily override this value. Most of the audio files I'm converting are 320kb/s, so I'm using this class to compress the files for offline storage. Compression results at the bottom of this answer.
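For instance, the AAC settings dictionary in setupAssetReaderAndAssetWriter below could be raised to 192 kb/s (a sketch; the channel layout entry is omitted here for brevity):
import AVFoundation

let outputSettings: [String: Any] = [
    AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
    AVSampleRateKey: 44100,
    AVEncoderBitRateKey: 192000, // the class below hardcodes 96000; raise or lower to taste
    AVNumberOfChannelsKey: 2
]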
Usage:
let inputFilePath = URL(fileURLWithPath: "/path/to/file.mp3")
let outputFileURL = URL(fileURLWithPath: "/path/to/output/compressed.mp4")
if let audioConverter = AVAudioFileConverter(inputFileURL: inputFilePath, outputFileURL: outputFileURL) {
audioConverter.convert()
}
Class
import AVFoundation
final class AVAudioFileConverter {
var rwAudioSerializationQueue: DispatchQueue!
var asset:AVAsset!
var assetReader:AVAssetReader!
var assetReaderAudioOutput:AVAssetReaderTrackOutput!
var assetWriter:AVAssetWriter!
var assetWriterAudioInput:AVAssetWriterInput!
var outputURL:URL
var inputURL:URL
init?(inputFileURL: URL, outputFileURL: URL) {
inputURL = inputFileURL
outputURL = outputFileURL
if !FileManager.default.fileExists(atPath: inputURL.path) {
print("Input file does not exist at file path \(inputURL.path)")
return nil
}
}
func convert() {
let rwAudioSerializationQueueDescription = " rw audio serialization queue"
// Create the serialization queue to use for reading and writing the audio data.
rwAudioSerializationQueue = DispatchQueue(label: rwAudioSerializationQueueDescription)
assert(rwAudioSerializationQueue != nil, "Failed to initialize Dispatch Queue")
asset = AVAsset(url: inputURL)
assert(asset != nil, "Error creating AVAsset from input URL")
print("Output file path -> ", outputURL.absoluteString)
asset.loadValuesAsynchronously(forKeys: ["tracks"], completionHandler: {
var success = true
var localError:NSError?
success = (self.asset.statusOfValue(forKey: "tracks", error: &localError) == AVKeyValueStatus.loaded)
// Check for success of loading the assets tracks.
if (success) {
// If the tracks loaded successfully, make sure that no file exists at the output path for the asset writer.
let fm = FileManager.default
let localOutputPath = self.outputURL.path
if (fm.fileExists(atPath: localOutputPath)) {
do {
try fm.removeItem(atPath: localOutputPath)
success = true
} catch {
print("Error trying to remove output file at path -> \(localOutputPath)")
}
}
}
if (success) {
success = self.setupAssetReaderAndAssetWriter()
} else {
print("Failed setting up Asset Reader and Writer")
}
if (success) {
success = self.startAssetReaderAndWriter()
return
} else {
print("Failed to start Asset Reader and Writer")
}
})
}
func setupAssetReaderAndAssetWriter() -> Bool {
do {
assetReader = try AVAssetReader(asset: asset)
} catch {
print("Error Creating AVAssetReader")
}
do {
assetWriter = try AVAssetWriter(outputURL: outputURL, fileType: AVFileType.m4a)
} catch {
print("Error Creating AVAssetWriter")
}
var assetAudioTrack:AVAssetTrack? = nil
let audioTracks = asset.tracks(withMediaType: AVMediaType.audio)
if (audioTracks.count > 0) {
assetAudioTrack = audioTracks[0]
}
if (assetAudioTrack != nil) {
let decompressionAudioSettings:[String : Any] = [
AVFormatIDKey:Int(kAudioFormatLinearPCM)
]
assetReaderAudioOutput = AVAssetReaderTrackOutput(track: assetAudioTrack!, outputSettings: decompressionAudioSettings)
assert(assetReaderAudioOutput != nil, "Failed to initialize AVAssetReaderTrackOutout")
assetReader.add(assetReaderAudioOutput)
var channelLayout = AudioChannelLayout()
memset(&channelLayout, 0, MemoryLayout<AudioChannelLayout>.size);
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
let outputSettings:[String : Any] = [
AVFormatIDKey: Int(kAudioFormatMPEG4AAC),
AVSampleRateKey: 44100,
AVEncoderBitRateKey: 96000,
AVNumberOfChannelsKey: 2,
AVChannelLayoutKey: NSData(bytes:&channelLayout, length:MemoryLayout<AudioChannelLayout>.size)]
assetWriterAudioInput = AVAssetWriterInput(mediaType: AVMediaType.audio, outputSettings: outputSettings)
assert(assetWriterAudioInput != nil, "Failed to initialize AVAssetWriterInput")
assetWriter.add(assetWriterAudioInput)
}
print("Finsihed Setup of AVAssetReader and AVAssetWriter")
return true
}
func startAssetReaderAndWriter() -> Bool {
print("STARTING ASSET WRITER")
assetWriter.startWriting()
assetReader.startReading()
assetWriter.startSession(atSourceTime: kCMTimeZero)
assetWriterAudioInput.requestMediaDataWhenReady(on: rwAudioSerializationQueue, using: {
while(self.assetWriterAudioInput.isReadyForMoreMediaData ) {
var sampleBuffer = self.assetReaderAudioOutput.copyNextSampleBuffer()
if(sampleBuffer != nil) {
self.assetWriterAudioInput.append(sampleBuffer!)
sampleBuffer = nil
} else {
self.assetWriterAudioInput.markAsFinished()
self.assetReader.cancelReading()
self.assetWriter.finishWriting {
print("Asset Writer Finished Writing")
}
break
}
}
})
return true
}
}
Input File: 17.3 MB
// generated with afinfo on mac
File: D290A73C37B777F1.mp3
File type ID: MPG3
Num Tracks: 1
----
Data format: 2 ch, 44100 Hz, '.mp3' (0x00000000) 0 bits/channel, 0 bytes/packet, 1152 frames/packet, 0 bytes/frame
no channel layout.
estimated duration: 424.542025 sec
audio bytes: 16981681
audio packets: 16252
bit rate: 320000 bits per second
packet size upper bound: 1052
maximum packet size: 1045
audio data file offset: 322431
optimized
audio 18720450 valid frames + 576 priming + 1278 remainder = 18722304
----
Output File: 5.1 MB
// generated with afinfo on Mac
File: compressed.m4a
File type ID: m4af
Num Tracks: 1
----
Data format: 2 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Stereo (L R)
estimated duration: 424.542041 sec
audio bytes: 5019294
audio packets: 18286
bit rate: 94569 bits per second
packet size upper bound: 763
maximum packet size: 763
audio data file offset: 44
not optimized
audio 18722304 valid frames + 2112 priming + 448 remainder = 18724864
format list:
[ 0] format: 2 ch, 44100 Hz, 'aac ' (0x00000000) 0 bits/channel, 0 bytes/packet, 1024 frames/packet, 0 bytes/frame
Channel layout: Stereo (L R)
----
update
You're creating a caf file instead of an m4a.
Replace AVFileTypeCoreAudioFormat with AVFileTypeAppleM4A in
AVAssetWriter(URL: self.outputURL, fileType: AVFileTypeCoreAudioFormat)
Call self.assetWriter.finishWritingWithCompletionHandler() when you've finished.
I have a Swift class that takes an image and creates a 10 second video of that image.
I create the video using AVAssetWriter.
It outputs a ten seconds video. This all works as expected.
func setup(){
//Setup video size
self.videoSize = self.getVideoSizeForImage(self.videoImage!)
//Setup temp video path
self.tempVideoPath = self.getTempVideoPath()
//Setup video writer
var videoWriter = AVAssetWriter(
URL: NSURL(fileURLWithPath: self.tempVideoPath!),
fileType: AVFileTypeMPEG4,
error: nil)
//Setup video writer input
let videoSettings = [
AVVideoCodecKey: AVVideoCodecH264,
AVVideoWidthKey: self.videoSize!.width,
AVVideoHeightKey: self.videoSize!.height
]
let videoWriterInput = AVAssetWriterInput(
mediaType: AVMediaTypeVideo,
outputSettings: videoSettings as [NSObject : AnyObject]
)
videoWriterInput.expectsMediaDataInRealTime = true
//Setup video writer adaptor
let adaptor = AVAssetWriterInputPixelBufferAdaptor(
assetWriterInput: videoWriterInput,
sourcePixelBufferAttributes: [
kCVPixelBufferPixelFormatTypeKey : kCVPixelFormatType_32ARGB,
]
)
videoWriter.addInput(videoWriterInput)
//Setup frame time
self.currentFrameTime = CMTimeMake(0, 10000)
//Start video writing session
videoWriter.startWriting()
videoWriter.startSessionAtSourceTime(self.currentFrameTime!)
self.currentVideoWriter = videoWriter
self.currentVideoWriterAdaptor = adaptor
}
func addImageToVideoWriter(image:UIImage, duration:CGFloat){
//Get image pixel buffer
let buffer = self.pixelBufferFromImage(image)
var frameTime = self.currentFrameTime!
//Add pixel buffer to video
let adaptor = self.currentVideoWriterAdaptor!
while(!adaptor.assetWriterInput.readyForMoreMediaData){}
if adaptor.assetWriterInput.readyForMoreMediaData{
adaptor.appendPixelBuffer(
buffer,
withPresentationTime: frameTime
)
let seconds = CGFloat(CMTimeGetSeconds(frameTime))+duration
var timescale = frameTime.timescale
var value = CGFloat(timescale) * seconds
frameTime.value = Int64(value)
frameTime.timescale = Int32(timescale)
self.currentFrameTime = frameTime
self.lastImage = image
}
}
func pixelBufferFromImage(image:UIImage) -> CVPixelBufferRef{
let size = image.size
var pixelBuffer: Unmanaged<CVPixelBuffer>?
let bufferOptions:CFDictionary = [
kCVPixelBufferCGBitmapContextCompatibilityKey as String: NSNumber(bool: true),
kCVPixelBufferCGImageCompatibilityKey as String: NSNumber(bool: true)
]
let status:CVReturn = CVPixelBufferCreate(
nil,
Int(size.width),
Int(size.height),
OSType(kCVPixelFormatType_32ARGB),
bufferOptions,
&pixelBuffer
)
let managedPixelBuffer = pixelBuffer!.takeRetainedValue()
let lockStatus = CVPixelBufferLockBaseAddress(managedPixelBuffer, 0)
let pixelData = CVPixelBufferGetBaseAddress(managedPixelBuffer)
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.PremultipliedFirst.rawValue)
let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGBitmapContextCreate(
pixelData,
Int(size.width),
Int(size.height),
8,
Int(4 * size.width),
rgbColorSpace,
bitmapInfo
)
CGContextDrawImage(context, CGRectMake(0, 0, size.width, size.height), image.CGImage)
CVPixelBufferUnlockBaseAddress(managedPixelBuffer, 0)
return managedPixelBuffer
}
func saveVideoWriterToDisk(){
self.addImageToVideoWriter(self.lastImage!, duration: 0)
self.currentVideoWriter?.endSessionAtSourceTime(self.currentFrameTime!)
self.currentVideoWriterAdaptor?.assetWriterInput.markAsFinished()
//Write video to disk
let semaphore = dispatch_semaphore_create(0)
self.currentVideoWriter?.finishWritingWithCompletionHandler({
dispatch_semaphore_signal(semaphore)
})
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
}
I then use AVAssetExportSession and AVMutableComposition to add music to that video. This works most of the time.
func exportVideo(sourceFilePath:String, destinationFilePath:String) -> Bool{
let fileManager = NSFileManager()
var success = false
//Compile audio and video together
let composition = AVMutableComposition()
/*
//Setup audio track
let trackAudio:AVMutableCompositionTrack = composition.addMutableTrackWithMediaType(
AVMediaTypeAudio,
preferredTrackID: CMPersistentTrackID()
)
//Add audio file to audio track
let audioFileAsset = AVURLAsset(URL:self.audioFileURL, options:nil)
let audioFileAssetTrack = audioFileAsset.tracksWithMediaType(AVMediaTypeAudio).last as! AVAssetTrack
trackAudio.insertTimeRange(
audioFileAssetTrack.timeRange,
ofTrack: audioFileAssetTrack,
atTime: kCMTimeZero,
error: nil
)*/
//Setup video track
let trackVideo:AVMutableCompositionTrack = composition.addMutableTrackWithMediaType(
AVMediaTypeVideo,
preferredTrackID: CMPersistentTrackID()
)
//Add video file to video track
let videoFileAsset = AVURLAsset(URL: NSURL(fileURLWithPath: sourceFilePath), options: nil)
let videoFileAssetTracks = videoFileAsset.tracksWithMediaType(AVMediaTypeVideo)
if videoFileAssetTracks.count > 0{
let videoFileAssetTrack:AVAssetTrack = videoFileAssetTracks[0] as! AVAssetTrack
trackVideo.insertTimeRange(
videoFileAssetTrack.timeRange,
ofTrack: videoFileAssetTrack,
atTime: kCMTimeZero,
error: nil
)
}
//Export compiled video to disk
if fileManager.fileExistsAtPath(destinationFilePath){
fileManager.removeItemAtPath(destinationFilePath, error: nil)
}
let exporter = AVAssetExportSession(
asset: composition,
presetName: AVAssetExportPresetHighestQuality
)
exporter.outputFileType = AVFileTypeMPEG4
exporter.outputURL = NSURL(fileURLWithPath: destinationFilePath)
let semaphore = dispatch_semaphore_create(0)
exporter.exportAsynchronouslyWithCompletionHandler({
success = true
dispatch_semaphore_signal(semaphore)
})
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER)
//Delete file at source path
fileManager.removeItemAtPath(sourceFilePath, error: nil)
return success
}
However.
When the image is quite large (in terms of resolution, i.e. 1600*1600 and larger) the video created with AVAssetExportSession is 8 seconds instead of 10. The time difference only occurs when using large images.
It's not a huge problem, as it is highly unlikely people will be using this class with images that large.
However I would still like to know what's going on and how to fix it.