I have a Movie class that holds an array of UIImage instances that I write out to an H.264 file. It works, but occasionally it writes a video file that does not contain any content, even though the file size is still non-zero.
Here is the code that does the writing. I am pretty new to iOS development and copied this from the internet, so I may not fully understand what it is doing. Hopefully someone can suggest a better way of doing this.
frame is a UIImage instance in the loop.
func writeAnimationToMovie() {
    var error: NSError?
    writer.startWriting()
    writer.startSessionAtSourceTime(kCMTimeZero)

    var buffer: CVPixelBufferRef
    var frameCount = 0
    for frame in self.photos {
        buffer = createPixelBufferFromCGImage(frame.CGImage)
        var appendOk = false
        var j = 0
        while (!appendOk && j < 30) {
            if pixelBufferAdaptor.assetWriterInput.readyForMoreMediaData {
                let frameTime = CMTimeMake(Int64(frameCount), Int32(fps))
                appendOk = pixelBufferAdaptor.appendPixelBuffer(buffer, withPresentationTime: frameTime)
                // appendOk will always be false
                NSThread.sleepForTimeInterval(0.05)
            } else {
                NSThread.sleepForTimeInterval(0.1)
            }
            j++
        }
        if (!appendOk) {
            println("Doh, frame \(frame) at offset \(frameCount) failed to append")
        }
        frameCount++
    }

    input.markAsFinished()
    writer.finishWritingWithCompletionHandler({
        if self.writer.status == AVAssetWriterStatus.Failed {
            println("oh noes, an error: \(self.writer.error.description)")
        } else {
            let content = NSFileManager.defaultManager().contentsAtPath(self.fileURL.path!)
            println("wrote video: \(self.fileURL.path) at size: \(content?.length)")
        }
    })
}
func createPixelBufferFromCGImage(image: CGImageRef) -> CVPixelBufferRef {
    let options = [
        "kCVPixelBufferCGImageCompatibilityKey": true,
        "kCVPixelBufferCGBitmapContextCompatibilityKey": true
    ]
    let frameSize = CGSizeMake(CGFloat(CGImageGetWidth(image)), CGFloat(CGImageGetHeight(image)))

    var pixelBufferPointer = UnsafeMutablePointer<Unmanaged<CVPixelBuffer>?>.alloc(1)
    var status: CVReturn = CVPixelBufferCreate(
        kCFAllocatorDefault,
        UInt(frameSize.width),
        UInt(frameSize.height),
        OSType(kCVPixelFormatType_32ARGB),
        options,
        pixelBufferPointer
    )

    var lockStatus: CVReturn = CVPixelBufferLockBaseAddress(pixelBufferPointer.memory?.takeUnretainedValue(), 0)
    var pxData: UnsafeMutablePointer<(Void)> = CVPixelBufferGetBaseAddress(pixelBufferPointer.memory?.takeUnretainedValue())

    let bitmapinfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipFirst.rawValue)
    let rgbColorSpace: CGColorSpace = CGColorSpaceCreateDeviceRGB()
    var context: CGContextRef = CGBitmapContextCreate(
        pxData,
        UInt(frameSize.width),
        UInt(frameSize.height),
        8,
        4 * CGImageGetWidth(image),
        rgbColorSpace,
        bitmapinfo
    )

    CGContextDrawImage(context, CGRectMake(0, 0, frameSize.width, frameSize.height), image)
    CVPixelBufferUnlockBaseAddress(pixelBufferPointer.memory?.takeUnretainedValue(), 0)
    return pixelBufferPointer.memory!.takeUnretainedValue()
}
EDIT: The files that do not seem to have any content won't play, unlike the other movies that do work. If I view a working file with exiftool, this is what I get:
$ exiftool 20242697651186-o.mp4
ExifTool Version Number : 9.76
File Name : 20242697651186-o.mp4
Directory : .
File Size : 74 kB
File Modification Date/Time : 2014:12:05 11:07:29-05:00
File Access Date/Time : 2014:12:08 10:12:29-05:00
File Inode Change Date/Time : 2014:12:05 11:07:29-05:00
File Permissions : rw-r--r--
File Type : MP4
MIME Type : video/mp4
Major Brand : MP4 v2 [ISO 14496-14]
Minor Version : 0.0.1
Compatible Brands : mp41, mp42, isom
Movie Data Size : 74741
Movie Data Offset : 44
Movie Header Version : 0
Create Date : 2014:12:05 16:07:32
Modify Date : 2014:12:05 16:07:32
Time Scale : 600
Duration : 0.50 s
Preferred Rate : 1
Preferred Volume : 100.00%
Preview Time : 0 s
Preview Duration : 0 s
Poster Time : 0 s
Selection Time : 0 s
Selection Duration : 0 s
Current Time : 0 s
Next Track ID : 2
Track Header Version : 0
Track Create Date : 2014:12:05 16:07:32
Track Modify Date : 2014:12:05 16:07:32
Track ID : 1
Track Duration : 0.50 s
Track Layer : 0
Track Volume : 0.00%
Matrix Structure : 1 0 0 0 1 0 0 0 1
Image Width : 640
Image Height : 480
Media Header Version : 0
Media Create Date : 2014:12:05 16:07:32
Media Modify Date : 2014:12:05 16:07:32
Media Time Scale : 600
Media Duration : 0.50 s
Media Language Code : und
Handler Type : Video Track
Handler Description : Core Media Video
Graphics Mode : srcCopy
Op Color : 0 0 0
Compressor ID : avc1
Source Image Width : 640
Source Image Height : 480
X Resolution : 72
Y Resolution : 72
Bit Depth : 24
Video Frame Rate : 8
Avg Bitrate : 1.2 Mbps
Image Size : 640x480
Rotation : 0
And here is a file that doesn't work. It doesn't have all that metadata in it:
$ exiftool 20242891987099-o.mp4
ExifTool Version Number : 9.76
File Name : 20242891987099-o.mp4
Directory : .
File Size : 75 kB
File Modification Date/Time : 2014:12:05 11:07:37-05:00
File Access Date/Time : 2014:12:08 10:12:36-05:00
File Inode Change Date/Time : 2014:12:05 11:07:37-05:00
File Permissions : rw-r--r--
File Type : MP4
MIME Type : video/mp4
Major Brand : MP4 v2 [ISO 14496-14]
Minor Version : 0.0.1
Compatible Brands : mp41, mp42, isom
Movie Data Size : 76856
Movie Data Offset : 44
You have a few issues. I'm not sure exactly what you mean by "does not contain any content", but hopefully one of these will help (and if not, you should implement them anyway):
You're blocking the thread with your calls to sleepForTimeInterval(), which could cause the problem. This answer suggests moving the run loop along instead of sleeping, which is a slightly better solution, but the readyForMoreMediaData documentation has an even better suggestion:
This property is observable using key-value observing (see Key-Value Observing Programming Guide). Observers should not assume that they will be notified of changes on a specific thread.
Instead of running a loop and asking if it's available, just get the object to tell you when it's ready for more using KVO.
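For reference, this is what the push-style approach looks like in current Swift syntax (the question's code predates Swift 3). Rather than hand-rolled KVO, AVAssetWriterInput's requestMediaDataWhenReady(on:using:) serves the same purpose: the input calls you back whenever it can accept more data. This is only a sketch that reuses the question's writer, input, pixelBufferAdaptor, photos and fps names, assumes a makePixelBuffer helper (see the next sketch), and assumes startWriting() and startSession(atSourceTime:) have already been called:

// Sketch: let the writer input pull frames instead of polling and sleeping in a loop.
let queue = DispatchQueue(label: "mediaInputQueue")
var frameIndex = 0

input.requestMediaDataWhenReady(on: queue) {
    // This block is invoked repeatedly whenever the input can accept more data.
    while input.isReadyForMoreMediaData && frameIndex < photos.count {
        let time = CMTime(value: CMTimeValue(frameIndex), timescale: CMTimeScale(fps))
        if let buffer = makePixelBuffer(from: photos[frameIndex].cgImage!) {   // hypothetical helper, defined below
            if !pixelBufferAdaptor.append(buffer, withPresentationTime: time) {
                print("failed to append frame \(frameIndex): \(String(describing: writer.error))")
            }
        }
        frameIndex += 1
    }
    if frameIndex >= photos.count {
        input.markAsFinished()
        writer.finishWriting {
            print("finished with status \(writer.status.rawValue), error: \(String(describing: writer.error))")
        }
    }
}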
In your createPixelBufferFromCGImage method, you're not checking for failure at any point. For example, you should handle the possibility that pixelBufferPointer.memory? is nil.
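As an illustration, here is what a failure-checked version of that helper could look like in current Swift syntax (this is the makePixelBuffer assumed in the sketch above; it returns an optional so the caller can react to a failed creation instead of crashing):

import CoreGraphics
import CoreVideo

// Sketch: create a CVPixelBuffer from a CGImage, returning nil on any failure.
func makePixelBuffer(from image: CGImage) -> CVPixelBuffer? {
    let attrs: [String: Any] = [
        kCVPixelBufferCGImageCompatibilityKey as String: true,
        kCVPixelBufferCGBitmapContextCompatibilityKey as String: true
    ]
    var pixelBuffer: CVPixelBuffer?
    let status = CVPixelBufferCreate(kCFAllocatorDefault,
                                     image.width,
                                     image.height,
                                     kCVPixelFormatType_32ARGB,
                                     attrs as CFDictionary,
                                     &pixelBuffer)
    guard status == kCVReturnSuccess, let buffer = pixelBuffer else { return nil }

    CVPixelBufferLockBaseAddress(buffer, [])
    defer { CVPixelBufferUnlockBaseAddress(buffer, []) }

    guard let context = CGContext(data: CVPixelBufferGetBaseAddress(buffer),
                                  width: image.width,
                                  height: image.height,
                                  bitsPerComponent: 8,
                                  bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue) else { return nil }

    context.draw(image, in: CGRect(x: 0, y: 0, width: image.width, height: image.height))
    return buffer
}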
Basically, I'm hypothesizing that one of these things is happening:
j hits 30 and nothing's been written yet because a thread is blocked. In this case you write an empty file with the correct file size.
createPixelBufferFromCGImage is returning unexpected data, which you're then writing to disk
Thanks for AudioKit!
I'm a beginner in Swift and AudioKit, so this may be an easy question:
I want to build a metronome app. From the example in Audiokit's new CookBook and the old AKMetronome(), I can see how to build a simple metronome using AudioKit. But I don't know how to play beats with compound time signatures (3/8, 7/8, etc.). Both examples here use a time signature with 4 as a fixed bottom number and the top number can be changed (i.e. we can have 1/4, 3/4, 6/4 but not 3/8, 6/8).
Is there a way to change the bottom number?
Link for AKMetronome: https://audiokit.io/docs/Classes/AKMetronome.html#/s:8AudioKit11AKMetronomeC5resetyyF
AudioKit Cookbook's Shaker Metronome:
https://github.com/AudioKit/Cookbook/blob/main/Cookbook/Cookbook/Recipes/Shaker.swift
I made some changes to the Shaker Metronome's code to illustrate how you could create a metronome that plays different time signatures such as 6/8, 5/8, 7/8, and so on.
First I added some information to the ShakerMetronomeData structure:
enum Figure: Double {
    case quarter = 1.0
    case eighth = 0.5
}

struct ShakerMetronomeData {
    var isPlaying = false
    var tempo: BPM = 120
    var timeSignatureTop: Int = 4
    var downbeatNoteNumber = MIDINoteNumber(6)
    var beatNoteNumber = MIDINoteNumber(10)
    var beatNoteVelocity = 100.0
    var currentBeat = 0
    var figure: Figure = .quarter
    var pattern: [Int] = [4]
}
Then, the part of the updateSequences function that plays the metronome clicks would become:
func updateSequences() {
    var track = sequencer.tracks.first!

    // Length in sequencer beats: subdivisions per bar times the length of each subdivision
    track.length = Double(data.timeSignatureTop) * data.figure.rawValue
    track.clear()

    var startTime: Double = 0.0
    for numberOfBeatsInGroup in data.pattern {
        // Accented click at the start of each group
        track.sequence.add(noteNumber: data.downbeatNoteNumber, position: startTime, duration: 0.4)
        startTime += data.figure.rawValue

        let vel = MIDIVelocity(Int(data.beatNoteVelocity))
        // Ordinary clicks for the remaining beats of the group
        for _ in 1 ..< numberOfBeatsInGroup {
            track.sequence.add(noteNumber: data.beatNoteNumber, velocity: vel, position: startTime, duration: 0.1)
            startTime += data.figure.rawValue
        }
    }
}
These would be the values of the figure and pattern members of the structure for different time signatures:
/*
 Time signature   Figure     Pattern
 4/4              .quarter   [4]
 3/4              .quarter   [3]
 6/8              .eighth    [3,3]
 5/8              .eighth    [5]
 7/8              .eighth    [2,2,3]
 */
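For example, assuming the Cookbook recipe's conductor object and its data property (as in the Shaker recipe linked above), switching to 7/8 grouped as 2+2+3 would look something like this sketch:

// Sketch: configure the metronome for 7/8 (eighth-note pulse, grouped 2+2+3).
conductor.data.figure = .eighth
conductor.data.pattern = [2, 2, 3]
conductor.data.timeSignatureTop = 7   // 7 eighth-note subdivisions per bar
// The recipe rebuilds the click track (updateSequences) whenever data changes.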
Please note I haven't tested this code, but it illustrates how you could play beats with a compound time signature.
This could be improved by having three different sounds instead of two: one for the start of each bar, one for the beginning of each group, and one for the remaining beats.
Let me see if I understand this correctly.
On the most advanced current hardware, iOS allows me to record at the following frame rates: 30, 60, 120 and 240 fps.
But these frame rates behave differently. If I shoot at 30 or 60 fps, I expect the video files created from shooting at these frame rates to play at 30 and 60 fps respectively.
But if I shoot at 120 or 240 fps, I expect the video files created from shooting at these frame rates to play at 30 fps, or I will not see the slow motion.
A few questions:
am I right?
is there a way to shoot at 120 or 240 fps and play back at 120 and 240 fps respectively? I mean, play at the fps the videos were shot at, without slow motion?
How do I control that framerate when I write the file?
I am creating the AVAssetWriter input like this...
NSDictionary *videoCompressionSettings = @{AVVideoCodecKey : AVVideoCodecH264,
                                           AVVideoWidthKey : @(videoWidth),
                                           AVVideoHeightKey : @(videoHeight),
                                           AVVideoCompressionPropertiesKey : @{ AVVideoAverageBitRateKey : @(bitsPerSecond),
                                                                                AVVideoMaxKeyFrameIntervalKey : @(1)}
                                           };
_assetWriterVideoInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo outputSettings:videoCompressionSettings];
and there is no apparent way to control that.
NOTE: I have tried different numbers where that 1 is. I have tried 1.0/fps, I have tried fps and I have removed the key. No difference.
This is how I set up AVAssetWriter:
AVAssetWriter *newAssetWriter = [[AVAssetWriter alloc] initWithURL:_movieURL
                                                          fileType:AVFileTypeQuickTimeMovie
                                                             error:&error];
_assetWriter = newAssetWriter;
_assetWriter.shouldOptimizeForNetworkUse = NO;

CGFloat videoWidth = size.width;
CGFloat videoHeight = size.height;

NSUInteger numPixels = videoWidth * videoHeight;
NSUInteger bitsPerSecond;

// Assume that lower-than-SD resolutions are intended for streaming, and use a lower bitrate
// if ( numPixels < (640 * 480) )
//     bitsPerPixel = 4.05; // This bitrate matches the quality produced by AVCaptureSessionPresetMedium or Low.
// else
CGFloat bitsPerPixel = 11.4; // This bitrate matches the quality produced by AVCaptureSessionPresetHigh.

bitsPerSecond = numPixels * bitsPerPixel;

NSDictionary *videoCompressionSettings = @{AVVideoCodecKey : AVVideoCodecH264,
                                           AVVideoWidthKey : @(videoWidth),
                                           AVVideoHeightKey : @(videoHeight),
                                           AVVideoCompressionPropertiesKey : @{ AVVideoAverageBitRateKey : @(bitsPerSecond)}
                                           };

if (![_assetWriter canApplyOutputSettings:videoCompressionSettings forMediaType:AVMediaTypeVideo]) {
    NSLog(@"Couldn't apply asset writer video output settings.");
    return;
}

_assetWriterVideoInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                                            outputSettings:videoCompressionSettings
                                                          sourceFormatHint:formatDescription];
_assetWriterVideoInput.expectsMediaDataInRealTime = YES;

NSDictionary *adaptorDict = @{
    (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA),
    (id)kCVPixelBufferWidthKey : @(videoWidth),
    (id)kCVPixelBufferHeightKey : @(videoHeight)
};

_pixelBufferAdaptor = [[AVAssetWriterInputPixelBufferAdaptor alloc]
                       initWithAssetWriterInput:_assetWriterVideoInput
                       sourcePixelBufferAttributes:adaptorDict];

// Add asset writer input to asset writer
if (![_assetWriter canAddInput:_assetWriterVideoInput]) {
    return;
}

[_assetWriter addInput:_assetWriterVideoInput];
The captureOutput method is very simple. I get the image from the filter and write it to the file using:
if (videoJustStartWriting)
    [_assetWriter startSessionAtSourceTime:presentationTime];

CVPixelBufferRef renderedOutputPixelBuffer = NULL;
OSStatus err = CVPixelBufferPoolCreatePixelBuffer(nil,
                                                  _pixelBufferAdaptor.pixelBufferPool,
                                                  &renderedOutputPixelBuffer);
if (err) return; // NSLog(@"Cannot obtain a pixel buffer from the buffer pool");

// _ciContext is a Metal context
[_ciContext render:finalImage
   toCVPixelBuffer:renderedOutputPixelBuffer
            bounds:[finalImage extent]
        colorSpace:_sDeviceRgbColorSpace];

[self writeVideoPixelBuffer:renderedOutputPixelBuffer
            withInitialTime:presentationTime];
- (void)writeVideoPixelBuffer:(CVPixelBufferRef)pixelBuffer withInitialTime:(CMTime)presentationTime
{
    if ( _assetWriter.status == AVAssetWriterStatusUnknown ) {
        // If the asset writer status is unknown, writing hasn't started yet,
        // so start writing with the buffer's presentation timestamp as the start time.
        if ([_assetWriter startWriting]) {
            [_assetWriter startSessionAtSourceTime:presentationTime];
        }
    }

    if ( _assetWriter.status == AVAssetWriterStatusWriting ) {
        // If the asset writer status is writing, append the sample buffer to its corresponding asset writer input.
        if (_assetWriterVideoInput.readyForMoreMediaData) {
            if (![_pixelBufferAdaptor appendPixelBuffer:pixelBuffer withPresentationTime:presentationTime]) {
                NSLog(@"error: %@", [_assetWriter.error localizedFailureReason]);
            }
        }
    }

    if ( _assetWriter.status == AVAssetWriterStatusFailed ) {
        NSLog(@"failed");
    }
}
I set the whole thing up to shoot at 240 fps. These are the presentation times of the frames being appended:
time ======= 113594.311510508
time ======= 113594.324011508
time ======= 113594.328178716
time ======= 113594.340679424
time ======= 113594.344846383
If you do the math on the differences between them, you will see that the frame rate is about 240 fps. So the frames are being stored with the correct timestamps.
But when I watch the video, the movement is not in slow motion, and QuickTime says the video is 30 fps.
Note: this app grabs frames from the camera, the frames go into CIFilters, and the result of those filters is converted back to a sample buffer that is stored to a file and displayed on the screen.
I'm reaching here, but I think this is where you're going wrong. Think of your video capture as a pipeline.
(1) Capture buffer -> (2) Do Something With buffer -> (3) Write buffer as frames in video.
Sounds like you've successfully completed (1) and (2), you're getting the buffer fast enough and you're processing them so you can vend them as frames.
The problem is almost certainly in (3) writing the video frames.
https://developer.apple.com/reference/avfoundation/avmutablevideocomposition
Check out the frameDuration setting on your AVMutableVideoComposition; you'll need something like CMTime(1, 60) // 60 FPS or CMTime(1, 240) // 240 FPS to get what you're after (telling the video to WRITE this many frames and encode at this rate).
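A minimal sketch of that in Swift, assuming you already have an AVAsset named asset (the 240 value is illustrative):

import AVFoundation

// Sketch: declare a 240 fps frame duration on a video composition.
let composition = AVMutableVideoComposition(propertiesOf: asset)
composition.frameDuration = CMTime(value: 1, timescale: 240)   // one frame every 1/240 s

// Hand the composition to playback (or to an AVAssetExportSession) so it takes effect:
let item = AVPlayerItem(asset: asset)
item.videoComposition = composition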
Using AVAssetWriter, it's exactly the same principle, but you set the frame rate as a property in the AVAssetWriterInput outputSettings by adding AVVideoExpectedSourceFrameRateKey (inside the AVVideoCompressionPropertiesKey dictionary).
NSDictionary *videoCompressionSettings = @{AVVideoCodecKey : AVVideoCodecH264,
                                           AVVideoWidthKey : @(videoWidth),
                                           AVVideoHeightKey : @(videoHeight),
                                           AVVideoCompressionPropertiesKey : @{ AVVideoAverageBitRateKey : @(bitsPerSecond),
                                                                                AVVideoExpectedSourceFrameRateKey : @(60),
                                                                                AVVideoMaxKeyFrameIntervalKey : @(1)}
                                           };
To expand a little more: you can't strictly control or sync your camera capture exactly to the output / playback rate. The timing just doesn't work that way and isn't that exact, and of course the processing pipeline adds overhead. When you capture frames they are time stamped, which you've seen, but in the writing / compression phase, only the frames needed to produce the output specified for the composition are used.
It goes both ways: you could capture only 30 FPS and write out at 240 FPS, and the video would display fine; you'd just have a lot of frames "missing" and being filled in by the algorithm. You can even vend only 1 frame per second and play back at 30 FPS; the two are separate from each other (how fast I capture vs. how many frames I present per second).
As for how to play it back at a different speed, you just need to tweak the playback rate - slow it down as needed.
If you've correctly set the time base (frameDuration), it will always play back "normal" - you're telling it "playback is X frames per second". Of course, your eye may notice a difference (almost certainly between low FPS and high FPS), and the screen may not refresh that high (above 60 FPS), but regardless the video will be at a "normal" 1x speed for its timebase. By slowing the video down: if my timebase is 120 and I slow it to 0.5x, I now effectively see 60 FPS and one second of playback takes two seconds.
You control the playback speed by setting the rate property on AVPlayer https://developer.apple.com/reference/avfoundation/avplayer
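For example (a sketch; player is assumed to be an existing AVPlayer whose item is the high-frame-rate asset):

// Sketch: setting a non-zero rate begins playback at that speed.
player.rate = 0.5   // 1.0 is normal speed; values below 1.0 slow playback down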
The iOS screen refresh is locked at 60fps, so the only way to "see" the extra frames is, as you say, to slow down the playback rate, a.k.a slow motion.
So
yes, you are right
the screen refresh rate (and perhaps limitations of the human visual system, assuming you're human?) means that you cannot perceive 120 & 240fps frame rates. You can play them at normal speed by downsampling to the screen refresh rate. Surely this is what AVPlayer already does, although I'm not sure if that's the answer you're looking for.
you control the framerate of the file when you write it with the CMSampleBuffer presentation timestamps. If your frames are coming from the camera, you're probably passing the timestamps straight through, in which case check that you really are getting the framerate you asked for (a log statement in your capture callback should be enough to verify this). If you're procedurally creating frames, then you choose the presentation timestamps so that they're spaced 1.0/desiredFrameRate seconds apart (see the sketch below).
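A sketch of that spacing for procedurally generated frames; totalFrames, pixelBufferAdaptor and nextPixelBuffer() are placeholders for whatever your writer pipeline uses:

// Sketch: stamp each generated frame 1/240 s after the previous one.
let desiredFrameRate: Int32 = 240
for frameIndex in 0..<totalFrames {
    let pts = CMTime(value: CMTimeValue(frameIndex), timescale: desiredFrameRate)   // frameIndex / 240 seconds
    _ = pixelBufferAdaptor.append(nextPixelBuffer(), withPresentationTime: pts)     // placeholder frame source
}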
Is 3. not working for you?
p.s. you can discard & ignore AVVideoMaxKeyFrameIntervalKey - it's a quality setting and has nothing to do with playback framerate.
After heavy usage of my app, which runs an AVCaptureSession instance, it starts suffering from
DroppedFrameReason(P) = OutOfBuffers
These are the details from the CMSampleBuffer object in - (void)captureOutput:(AVCaptureOutput *)captureOutput didDropSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection:
CMSampleBuffer 0x10de70770 retainCount: 1 allocator: 0x1b45e2bb8
invalid = NO
dataReady = YES
makeDataReadyCallback = 0x0
makeDataReadyRefcon = 0x0
buffer-level attachments:
DroppedFrameReason(P) = OutOfBuffers
formatDescription = <CMVideoFormatDescription 0x174441e90 [0x1b45e2bb8]> {
mediaType:'vide'
mediaSubType:'BGRA'
mediaSpecific: {
codecType: 'BGRA' dimensions: 480 x 360
}
extensions: {<CFBasicHash 0x174a61100 [0x1b45e2bb8]>{type = immutable dict, count = 5,
entries =>
0 : <CFString 0x1ae9fa7c8 [0x1b45e2bb8]>{contents = "CVImageBufferYCbCrMatrix"} = <CFString 0x1ae9fa808 [0x1b45e2bb8]>{contents = "ITU_R_601_4"}
1 : <CFString 0x1ae9fa928 [0x1b45e2bb8]>{contents = "CVImageBufferTransferFunction"} = <CFString 0x1ae9fa7e8 [0x1b45e2bb8]>{contents = "ITU_R_709_2"}
2 : <CFString 0x1aea2c3e0 [0x1b45e2bb8]>{contents = "CVBytesPerRow"} = <CFNumber 0xb000000000007802 [0x1b45e2bb8]>{value = +1920, type = kCFNumberSInt32Type}
3 : <CFString 0x1aea2c460 [0x1b45e2bb8]>{contents = "Version"} = <CFNumber 0xb000000000000022 [0x1b45e2bb8]>{value = +2, type = kCFNumberSInt32Type}
5 : <CFString 0x1ae9fa8a8 [0x1b45e2bb8]>{contents = "CVImageBufferColorPrimaries"} = <CFString 0x1ae9fa7e8 [0x1b45e2bb8]>{contents = "ITU_R_709_2"}
}
}
}
sbufToTrackReadiness = 0x0
numSamples = 0
sampleTimingArray[1] = {
{PTS = {3825121221333/1000000000 = 3825.121}, DTS = {INVALID}, duration = {INVALID}},
}
dataBuffer = 0x0
I did some digging and found this:
The module providing sample buffers has run out of source buffers.
This condition is typically caused by the client holding onto buffers
for too long and can be alleviated by returning buffers to the
provider.
What do they mean by "returning buffers to the provider"?
Is there any fix I can do ?
Came across this recently and found a solution after reading this post. I figure it's worth sharing.
Apple's documentation at the link provided in the OP is pretty non-specific about what they mean by "holding on to buffers" and "provider", but here's what they mean.
The provider is the video output object that sends you CMSampleBuffers through its AVCaptureVideoDataOutputSampleBufferDelegate method:
func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection
) {
    // do stuff with frames here
}
However, Apple's documentation says that there is a finite number of sample buffers the output can hold in memory at once. That means that if you hold onto a sample buffer for longer than it takes the next frame to come in, you're going to hit a bottleneck once that sample buffer memory fills up: the output has to wait for one of the buffers it already handed out to be released by whatever process is using it.
So as an example, if you're pulling in frames at 60 fps and you hold onto frames in a process that takes longer than about 17 ms (1/60 s), you're going to start dropping frames.
You should either figure out a way to execute your tasks on the frames more efficiently, or, like us (when using CoreML), figure out a way to make your process work with fewer frames, meaning you only process frames at a fraction of the rate they come in. We were able to make our models work at roughly 10 fps, so we only processed one out of every 6 frames from the rear-facing camera.
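A sketch of that kind of frame skipping in the capture delegate; the stride of 6 and the processFrame call are assumptions, not part of the original code:

private var frameCounter = 0   // delegate property

func captureOutput(
    _ output: AVCaptureOutput,
    didOutput sampleBuffer: CMSampleBuffer,
    from connection: AVCaptureConnection
) {
    frameCounter += 1
    // Only hand every 6th frame to the expensive pipeline. The others are released
    // as soon as this method returns, so the capture output's buffer pool never starves.
    guard frameCounter % 6 == 0 else { return }
    processFrame(sampleBuffer)   // hypothetical heavy processing, e.g. a CoreML model
}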
In fairness, Apple does say this is usually the culprit for any frame drops in that documentation post, but they aren't good at communicating what that means exactly.
I have noticed that in Swift 5 & Xcode 12.4 & iOS 14.3:
I was using CVMetalTextureCacheCreateTextureFromImage to create textures from the capture session's CVPixelBuffer, and that would cause the OutOfBuffers error after 10-ish reads. It seems to be a question of the textures existing perpetually, or at least for too long, so the buffer pool overflows.
Curiously, if I set the newly created metalTexture to nil directly after reading it, the error goes away, presumably due to allowing the memory to deallocate sooner. So, it may be possible to copy the texture & then set the original one to nil to avoid this issue. Still looking into it...
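For context, a sketch of the pattern being described; the textureCache and the function name are assumptions. The point is letting the CVMetalTexture reference go as early as possible (mirroring the workaround above), although the usual guidance is to keep it alive until the GPU has finished reading from it:

import CoreVideo
import Metal

// Sketch: wrap a camera pixel buffer in a Metal texture, then drop the CV wrapper promptly.
func makeTexture(from pixelBuffer: CVPixelBuffer, cache textureCache: CVMetalTextureCache) -> MTLTexture? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)

    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, textureCache, pixelBuffer, nil,
        .bgra8Unorm, width, height, 0, &cvTexture)
    guard status == kCVReturnSuccess, let cvTexture = cvTexture else { return nil }

    // Returning only the MTLTexture lets the CVMetalTexture go out of scope here,
    // so the underlying pixel buffer can return to the capture pool sooner.
    return CVMetalTextureGetTexture(cvTexture)
}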
In here you actually mentioned the problem and the answer:
"This condition is typically caused by the client holding onto buffers for too long and can be alleviated by returning buffers to the provider."
I have almost no knowledge of signal processing, and I'm currently trying to implement a function in Swift that triggers an event when there is an increase in the sound pressure level (e.g. when a human screams).
I am tapping into an input node of an AVAudioEngine with a callback like this:
let recordingFormat = inputNode.outputFormat(forBus: 0)
inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) {
    (buffer: AVAudioPCMBuffer, when: AVAudioTime) in

    let arraySize = Int(buffer.frameLength)
    let samples = Array(UnsafeBufferPointer(start: buffer.floatChannelData![0], count: arraySize))

    // do something with samples
    let volume = 20 * log10(samples.reduce(0) { $0 + $1 } / Float(arraySize))
    if !volume.isNaN {
        print("this is the current volume: \(volume)")
    }
}
After turning it into a float array, I tried getting a rough estimate of the sound pressure level by computing the mean.
But this gives me values that fluctuate a lot, even when the iPad is just sitting in a quiet room:
this is the current volume: -123.971
this is the current volume: -119.698
this is the current volume: -147.053
this is the current volume: -119.749
this is the current volume: -118.815
this is the current volume: -123.26
this is the current volume: -118.953
this is the current volume: -117.273
this is the current volume: -116.869
this is the current volume: -110.633
this is the current volume: -130.988
this is the current volume: -119.475
this is the current volume: -116.422
this is the current volume: -158.268
this is the current volume: -118.933
There is indeed a significant increase in this value if I clap near the microphone.
So I can do something like first computing the mean of these volumes during the preparing phase, and then checking whether there is a significant increase during the event-triggering phase:
if !volume.isNaN {
    if isInThePreparingPhase {
        print("this is the current volume: \(volume)")
        volumeSum += volume
        volumeCount += 1
    } else if isInTheEventTriggeringPhase {
        if volume > meanVolume {
            // triggers an event
        }
    }
}
where meanVolume is computed during the transition from the preparing phase to the event-triggering phase: meanVolume = volumeSum / Float(volumeCount)
....
However, there appears to be no significant increase if I play loud music beside the microphone. And on rare occasions, volume is greater than meanVolume even when there is no increase in loudness in the environment that is audible to the human ear.
So what is the proper way of extracting the sound pressure level from AVAudioPCMBuffer?
Wikipedia gives a formula like this: Lp = 20 * log10(p / p0) dB,
with p being the root mean square sound pressure and p0 being the reference sound pressure.
But I have no idea what the float values in AVAudioPCMBuffer.floatChannelData represent. The Apple documentation only says:
The buffer's audio samples as floating point values.
How should I work with them?
Thanks to the response from @teadrinker I finally found a solution to this problem. Here is my Swift code that outputs the volume of the AVAudioPCMBuffer input:
private func getVolume(from buffer: AVAudioPCMBuffer, bufferSize: Int) -> Float {
    guard let channelData = buffer.floatChannelData?[0] else {
        return 0
    }

    let channelDataArray = Array(UnsafeBufferPointer(start: channelData, count: bufferSize))

    var outEnvelope = [Float]()
    var envelopeState: Float = 0
    let envConstantAtk: Float = 0.16
    let envConstantDec: Float = 0.003

    for sample in channelDataArray {
        let rectified = abs(sample)

        if envelopeState < rectified {
            envelopeState += envConstantAtk * (rectified - envelopeState)
        } else {
            envelopeState += envConstantDec * (rectified - envelopeState)
        }
        outEnvelope.append(envelopeState)
    }

    // 0.015 is the threshold used to reject the low-level noise
    // picked up by the microphone
    if let maxVolume = outEnvelope.max(),
       maxVolume > Float(0.015) {
        return maxVolume
    } else {
        return 0.0
    }
}
I think the first step is to get the envelope of the sound. You could use simple averaging to calculate an envelope, but you need to add a rectification step (which usually means using abs() or square() to make all samples positive).
More commonly, a simple IIR filter is used instead of averaging, with different constants for attack and decay; here is a lab. Note that these constants depend on the sampling frequency; you can use this formula to calculate the constants:
1 - exp(-timePerSample*2/smoothingTime)
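A sketch of turning that formula into code; the 0.01 s attack and 1.0 s decay smoothing times are placeholder values, not ones taken from the answer:

import Foundation

// Sketch: derive IIR envelope constants from a smoothing time and the sample rate.
func envelopeConstant(smoothingTime: Double, sampleRate: Double) -> Float {
    let timePerSample = 1.0 / sampleRate
    return Float(1.0 - exp(-timePerSample * 2.0 / smoothingTime))
}

let sampleRate = 44_100.0
let envConstantAtk = envelopeConstant(smoothingTime: 0.01, sampleRate: sampleRate)  // fast attack
let envConstantDec = envelopeConstant(smoothingTime: 1.0, sampleRate: sampleRate)   // slow decay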
Step 2
When you have the envelope, you can smooth it with an additional filter, and then compare the two envelopes to find a sound that is louder than the base level; here's a more complete lab.
Note that detecting audio "events" can be quite tricky and hard to predict, so make sure you have a lot of debugging aids!
I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process is functioning correctly, but it creates unnecessarily large files, given that any photograph included in the video is repeated for 100 to 150 frames unchanged. I've applied whatever compression I could find in AVFoundation, which mostly applies intra-frame techniques, and tried to find more information on inter-frame compression in AVFoundation. Unfortunately, there are only a few references that I've been able to find, and nothing that has let me get it to work.
I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I've not included the code for fetching and preparing the individual frames (called below as self.getFrame()) since that seems to be working fine, and it gets quite complex because it handles photos, videos, adding title frames, and doing fade transitions. For repeated frames, it returns a structure with the frame image and a counter for the number of output frames to include.
// Create a new AVAssetWriter Instance that will build the video
assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
guard assetWriter != nil else
{
    print("Error converting images to video: AVAssetWriter not created.")
    inProcess = false
    return
}

let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!
let sourceBufferAttributes : [String : AnyObject] = [
    kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
    kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
    kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
    AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
    AVVideoCompressionPropertiesKey as String : [
        AVVideoAverageBitRateKey: 725000,
        AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
    ] as AnyObject
]
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)

// Start the writing session
assetWriter!.startWriting()
assetWriter!.startSession(atSourceTime: kCMTimeZero)

if (pixelBufferAdaptor.pixelBufferPool == nil) {
    print("Error converting images to video: pixelBufferPool nil after starting session")
    inProcess = false
    return
}

// -- Create queue for <requestMediaDataWhenReadyOnQueue>
let mediaQueue = DispatchQueue(label: "mediaInputQueue")

// Initialize run time values
var presentationTime = kCMTimeZero
var done = false
var nextFrame: FramePack?   // The FramePack struct has the frame to output, noDisplays - the number of times that it will be output
                            // and an isLast flag that is true when it's the final frame

writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in  // Keeps invoking the block to get input until call markAsFinished

    nextFrame = self.getFrame()             // Get the next frame to be added to the output with its associated values
    let imageCGOut = nextFrame!.frame       // The frame to output
    if nextFrame!.isLast { done = true }    // Identifies the last frame so can drop through to markAsFinished() below

    var frames = 0                          // Counts how often we've output this frame
    var waitCount = 0                       // Used to avoid an infinite loop if there's trouble with writer.Input

    while (frames < nextFrame!.noDisplays) && (waitCount < 1000000)  // Need to wait for writerInput to be ready - count deals with potential hung writer
    {
        waitCount += 1
        if waitCount == 1000000             // Have seen it go into 100s of thousands and succeed
        {
            print("Exceeded waitCount limit while attempting to output slideshow frame.")
            self.inProcess = false
            return
        }

        if (writerInput.isReadyForMoreMediaData)
        {
            waitCount = 0
            frames += 1

            autoreleasepool
            {
                if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool
                {
                    let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
                    let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
                        kCFAllocatorDefault,
                        pixelBufferPool,
                        pixelBufferPointer
                    )

                    if let pixelBuffer = pixelBufferPointer.pointee, status == 0
                    {
                        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                        let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
                        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()

                        // Set up a context for rendering using the PixelBuffer allocated above as the target
                        let context = CGContext(
                            data: pixelData,
                            width: Int(self.videoWidth),
                            height: Int(self.videoHeight),
                            bitsPerComponent: 8,
                            bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                            space: rgbColorSpace,
                            bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                        )

                        // Draw the image into the PixelBuffer used for the context
                        context?.draw(imageCGOut, in: CGRect(x: 0.0, y: 0.0, width: 1280, height: 720))

                        // Append the image (frame) from the context pixelBuffer onto the video file
                        _ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
                        presentationTime = presentationTime + CMTimeMake(1, videoFPS)

                        // We're done with the PixelBuffer, so unlock it
                        CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                    }

                    pixelBufferPointer.deinitialize()
                    pixelBufferPointer.deallocate(capacity: 1)
                } else {
                    NSLog("Error: Failed to allocate pixel buffer from pool")
                }
            }
        }
    }
Thanks in advance for any suggestions.
It looks like you're
appending a bunch of redundant frames to your video,
labouring under a misapprehension: that video files must have a constant framerate that is high, e.g. 30fps.
If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 images, with presentation timestamps of 0s, 5s, 10s and an assetWriter.endSession(atSourceTime:) of 15s, not 15s * 30 FPS = 450 frames.
In other words, your frame rate is way too high - for the best interframe compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.
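A sketch of that approach, assuming an already configured assetWriter, writerInput and pixelBufferAdaptor, an images array, and a hypothetical makePixelBuffer(from:) helper that turns an image into a CVPixelBuffer; the 5-second spacing matches the example above:

// Sketch: three still frames shown for 5 seconds each, then the session is closed at 15 s.
for (index, image) in images.enumerated() {
    let pts = CMTime(seconds: Double(index * 5), preferredTimescale: 600)
    while !writerInput.isReadyForMoreMediaData { usleep(10_000) }   // or better, use requestMediaDataWhenReady
    if let buffer = makePixelBuffer(from: image) {                  // hypothetical helper
        _ = pixelBufferAdaptor.append(buffer, withPresentationTime: pts)
    }
}
writerInput.markAsFinished()
assetWriter.endSession(atSourceTime: CMTime(seconds: 15, preferredTimescale: 600))
assetWriter.finishWriting { print("finished with status \(assetWriter.status.rawValue)") }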
*I've seen some video services/players choke on unusually low framerates,
so you may need a minimum framerate and some redundant frames, e.g. 1 frame / 5 s, YMMV.