I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process works correctly, but it creates unnecessarily large files, given that any photograph included in the video repeats unchanged for 100 to 150 frames. I've applied whatever compression settings I could find in AVFoundation, which mostly use intra-frame techniques, and I've tried to find more information on inter-frame compression in AVFoundation. Unfortunately, I've found only a few references, and nothing that has let me get it to work.
I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I've not included the code for fetching and preparing the individual frames (called below as self.getFrame()), since that seems to be working fine and is quite complex: it handles photos, videos, adding title frames, and doing fade transitions. For repeated frames, it returns a structure with the frame image and a count of the number of output frames to include.
// Create a new AVAssetWriter Instance that will build the video
assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
guard assetWriter != nil else
{
print("Error converting images to video: AVAssetWriter not created.")
inProcess = false
return
}
let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!
let sourceBufferAttributes : [String : AnyObject] = [
kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
AVVideoCompressionPropertiesKey as String : [
AVVideoAverageBitRateKey: 725000,
AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
] as AnyObject
]
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)
// Start the writing session
assetWriter!.startWriting()
assetWriter!.startSession(atSourceTime: kCMTimeZero)
if (pixelBufferAdaptor.pixelBufferPool == nil) {
print("Error converting images to video: pixelBufferPool nil after starting session")
inProcess = false
return
}
// -- Create queue for <requestMediaDataWhenReadyOnQueue>
let mediaQueue = DispatchQueue(label: "mediaInputQueue")
// Initialize run time values
var presentationTime = kCMTimeZero
var done = false
var nextFrame: FramePack? // The FramePack struct has the frame to output, noDisplays - the number of times that it will be output
// and an isLast flag that is true when it's the final frame
writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in // Keeps invoking the block to get input until call markAsFinished
nextFrame = self.getFrame() // Get the next frame to be added to the output with its associated values
let imageCGOut = nextFrame!.frame // The frame to output
if nextFrame!.isLast { done = true } // Identifies the last frame so can drop through to markAsFinished() below
var frames = 0 // Counts how often we've output this frame
var waitCount = 0 // Used to avoid an infinite loop if there's trouble with writer.Input
while (frames < nextFrame!.noDisplays) && (waitCount < 1000000) // Need to wait for writerInput to be ready - count deals with potential hung writer
{
waitCount += 1
if waitCount == 1000000 // Have seen it go into 100s of thousands and succeed
{
print("Exceeded waitCount limit while attempting to output slideshow frame.")
self.inProcess = false
return
}
if (writerInput.isReadyForMoreMediaData)
{
waitCount = 0
frames += 1
autoreleasepool
{
if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool
{
let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
kCFAllocatorDefault,
pixelBufferPool,
pixelBufferPointer
)
if let pixelBuffer = pixelBufferPointer.pointee, status == 0
{
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
// Set up a context for rendering using the PixelBuffer allocated above as the target
let context = CGContext(
data: pixelData,
width: Int(self.videoWidth),
height: Int(self.videoHeight),
bitsPerComponent: 8,
bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
space: rgbColorSpace,
bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
)
// Draw the image into the PixelBuffer used for the context
context?.draw(imageCGOut, in: CGRect(x: 0.0,y: 0.0,width: 1280, height: 720))
// Append the image (frame) from the context pixelBuffer onto the video file
_ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
presentationTime = presentationTime + CMTimeMake(1, videoFPS)
// We're done with the PixelBuffer, so unlock it
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
}
pixelBufferPointer.deinitialize()
pixelBufferPointer.deallocate(capacity: 1)
} else {
NSLog("Error: Failed to allocate pixel buffer from pool")
}
}
}
}
Thanks in advance for any suggestions.
It looks like you're appending a bunch of redundant frames to your video because you're labouring under a misapprehension: that video files must have a constant, high frame rate, e.g. 30 fps.
If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 images, with presentation timestamps of 0s, 5s and 10s and an assetWriter.endSession(atSourceTime:) of 15s, not 15s * 30 fps = 450 frames.
In other words, your frame rate is way too high. For the best interframe compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.
*I've seen some video services/players choke on unusually low frame rates, so you may need a minimum frame rate and some redundant frames, e.g. 1 frame per 5s; YMMV.
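As a rough sketch of that idea, reusing names from the question's code (pixelBufferAdaptor, writerInput, assetWriter, kCMTimeZero) and assuming a hypothetical slides array that holds one prepared CVPixelBuffer per image, the append loop could look something like this (an illustration of the timing, not a drop-in replacement):
// Append each slide once and let the presentation timestamps carry the duration,
// instead of appending the same frame noDisplays times.
let slideDuration = CMTimeMake(5, 1)                 // e.g. each slide is shown for 5 seconds
var presentationTime = kCMTimeZero
for slidePixelBuffer in slides {                     // `slides`: one CVPixelBuffer per image (assumed)
    while !writerInput.isReadyForMoreMediaData { }   // in real code, drive this from requestMediaDataWhenReady
    pixelBufferAdaptor.append(slidePixelBuffer, withPresentationTime: presentationTime)
    presentationTime = presentationTime + slideDuration
}
writerInput.markAsFinished()
// endSession(atSourceTime:) sets the total duration, so the last slide is held on screen too.
assetWriter!.endSession(atSourceTime: presentationTime)
assetWriter!.finishWriting { /* clean up */ }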
Related
I'm trying to create a camera remote control app with an iPhone as the camera and an iPad as the remote control. What I'm trying to do is capture the iPhone's camera preview using AVCaptureVideoDataOutput and stream it with an OutputStream via the MultipeerConnectivity framework. The iPad then receives the data and displays it in a UIView by setting the layer's contents. So far what I've done is this:
(iPhone/Camera preview stream) didOutput function implementation from the AVCaptureVideoDataOutputSampleBufferDelegate:
func captureOutput(_ output: AVCaptureOutput,
didOutput sampleBuffer: CMSampleBuffer,
from connection: AVCaptureConnection) {
DispatchQueue.global(qos: .utility).async { [unowned self] in
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
if let imageBuffer {
CVPixelBufferLockBaseAddress(imageBuffer, [])
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)
let bytesPerRow: size_t? = CVPixelBufferGetBytesPerRow(imageBuffer)
let width: size_t? = CVPixelBufferGetWidth(imageBuffer)
let height: size_t? = CVPixelBufferGetHeight(imageBuffer)
let colorSpace = CGColorSpaceCreateDeviceRGB()
let newContext = CGContext(data: baseAddress,
width: width ?? 0,
height: height ?? 0,
bitsPerComponent: 8,
bytesPerRow: bytesPerRow ?? 0,
space: colorSpace,
bitmapInfo: CGBitmapInfo.byteOrder32Little.rawValue | CGImageAlphaInfo.premultipliedFirst.rawValue)
if let newImage = newContext?.makeImage() {
let image = UIImage(cgImage: newImage,
scale: 0.2,
orientation: .up)
CVPixelBufferUnlockBaseAddress(imageBuffer, [])
if let data = image.jpegData(compressionQuality: 0.2) {
let bytesWritten = data.withUnsafeBytes({
viewFinderStream?
.write($0.bindMemory(to: UInt8.self).baseAddress!, maxLength: data.count)
})
}
}
}
}
}
(iPad/Camera remote controller) Receiving the stream and showing it on the view. This is a function from the StreamDelegate protocol:
func stream(_ aStream: Stream, handle eventCode: Stream.Event) {
let inputStream = aStream as! InputStream
switch eventCode {
case .hasBytesAvailable:
DispatchQueue.global(qos: .userInteractive).async { [unowned self] in
var buffer = [UInt8](repeating: 0, count: 1024)
let numberBytes = inputStream.read(&buffer, maxLength: 1024)
let data = Data(referencing: NSData(bytes: &buffer, length: numberBytes))
if let imageData = UIImage(data: data) {
DispatchQueue.main.async {
previewCameraView.layer.contents = imageData.cgImage
}
}
}
case .hasSpaceAvailable:
break
default:
break
}
}
Unfortunately, while the iPad does receive the stream, it only shows a tiny bit of the video data, like this (notice the view on the right: there are a few pixels showing the camera preview data at the top left of the view, and the rest is just gray):
EDIT: I also get this warning in the console:
2023-02-02 20:24:44.487399+0700 MultipeerVideo-Assignment[31170:1065836] Warning! [0x15c023800] Decoding incomplete with error code -1. This is expected if the image has not been fully downloaded.
And I'm not sure if this is normal or not, but the iPhone uses almost 100% of its CPU.
My question is: what did I do wrong that keeps the video stream from showing completely on the iPad? And is there any way to make the stream more efficient so that the iPhone's CPU doesn't have to work so hard? I'm still new to iOS programming, so I'm not sure how to solve this. If you need more code for clarity, please let me know in the comments.
I think the root of the issue is that the iPad reads the data from the stream using a 1024-byte buffer, which is just 256 pixels' worth of data. That's likely what you see in the preview.
Instead, you need to somehow "know" the length of every frame so you can read it in full.
If you were sending uncompressed data, you could first send the iPad the expected dimensions, so the iPad could always read full frames. However, you are sending compressed images (JPEGs), so you need to somehow tell the iPad the binary size of every "image".
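To illustrate, here is a minimal sketch of one way to do that with a 4-byte length prefix so the receiver knows how many bytes belong to each JPEG; the helper functions and the receive buffer are additions for illustration, not code from the original post:
// Sender: prefix each JPEG with a 4-byte big-endian length before writing it.
func send(jpegData: Data, over stream: OutputStream) {
    var length = UInt32(jpegData.count).bigEndian
    var packet = Data(bytes: &length, count: MemoryLayout<UInt32>.size)
    packet.append(jpegData)
    packet.withUnsafeBytes { (raw: UnsafeRawBufferPointer) in
        var offset = 0
        while offset < packet.count {
            let written = stream.write(raw.bindMemory(to: UInt8.self).baseAddress! + offset,
                                       maxLength: packet.count - offset)
            if written <= 0 { break }        // handle stream errors properly in real code
            offset += written
        }
    }
}

// Receiver: accumulate incoming bytes, then peel off complete frames.
var receiveBuffer = Data()

func readAvailableBytes(from stream: InputStream) {
    var chunk = [UInt8](repeating: 0, count: 8192)
    while stream.hasBytesAvailable {
        let count = stream.read(&chunk, maxLength: chunk.count)
        if count <= 0 { break }
        receiveBuffer.append(chunk, count: count)
    }
    while receiveBuffer.count >= 4 {
        let header = [UInt8](receiveBuffer.prefix(4))
        let length = Int(UInt32(header[0]) << 24 | UInt32(header[1]) << 16 |
                         UInt32(header[2]) << 8  | UInt32(header[3]))
        guard receiveBuffer.count >= 4 + length else { break }   // frame not complete yet
        let frameData = receiveBuffer.subdata(in: 4..<(4 + length))
        receiveBuffer.removeSubrange(0..<(4 + length))
        if let image = UIImage(data: frameData) {
            DispatchQueue.main.async { /* previewCameraView.layer.contents = image.cgImage */ }
        }
    }
}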
Sending full frames is kinda inefficient. I am not an expert in this area, but I would consider encoding the camera input into a video and then streaming it to the iPad. I believe it should be possible to use hardware encoding, and the streaming nature of mp4 video should also help. But that might not be a good suggestion, since I have very little idea of what I'm talking about.
You might want to look into:
VideoToolbox Framework;
Explore low-latency video encoding with VideoToolbox WWDC Session;
How to use VideoToolbox to decompress H.264 video stream
In my application, I used VNImageRequestHandler with a custom MLModel for object detection.
The app works fine with iOS versions before 14.5.
When iOS 14.5 came out, it broke everything.
Whenever try handler.perform([visionRequest]) throws an error (Error Domain=com.apple.vis Code=11 "encountered unknown exception" UserInfo={NSLocalizedDescription=encountered unknown exception}), the pixelBuffer memory is held and never released. That fills up the AVCaptureOutput buffers, and no new frames come in.
I had to change the code as shown below: by copying the pixelBuffer to another variable, I solved the problem of new frames not coming in, but the memory leak still happens.
Because of the memory leak, the app crashes after some time.
Notice that before iOS 14.5, detection worked perfectly and try handler.perform([visionRequest]) never threw any error.
Here is my code:
private func predictWithPixelBuffer(sampleBuffer: CMSampleBuffer) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
return
}
// Get additional info from the camera.
var options: [VNImageOption : Any] = [:]
if let cameraIntrinsicMatrix = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
options[.cameraIntrinsics] = cameraIntrinsicMatrix
}
autoreleasepool {
// Because of a bug in iOS 14.5, when the Vision request fails the pixel buffer memory is leaked,
// so the AVCaptureOutput buffers fill up and no new frames are output.
// This is a temporary workaround: copy the pixel buffer to a new buffer. It currently also
// increases memory usage a lot. Need to find a better way.
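// Note: copy() below is a custom deep-copy extension on CVPixelBuffer (not a standard
// CoreVideo API); its implementation is not shown in this post.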
var clonePixelBuffer: CVPixelBuffer? = pixelBuffer.copy()
let handler = VNImageRequestHandler(cvPixelBuffer: clonePixelBuffer!, orientation: orientation, options: options)
print("[DEBUG] detecting...")
do {
try handler.perform([visionRequest])
} catch {
delegate?.detector(didOutputBoundingBox: [])
failedCount += 1
print("[DEBUG] detect failed \(failedCount)")
print("Failed to perform Vision request: \(error)")
}
clonePixelBuffer = nil
}
}
Has anyone experienced the same problem? If so, how did you fix it?
iOS 14.7 Beta available on the developer portal seems to have fixed this issue.
I have a partial fix for this using Matthijs Hollemans's CoreMLHelpers library.
The model I use has 300 classes and 2363 anchors. I used a lot of the code Matthijs provided here to convert the model to an MLModel.
In the last step a pipeline is built using the 3 sub-models: raw_ssd_output, decoder, and nms. For this workaround you need to remove the nms model from the pipeline, and output raw_confidence and raw_coordinates.
In your app you need to add the code from CoreMLHelpers.
Then add this function to decode the output from your MLModel:
func decodeResults(results:[VNCoreMLFeatureValueObservation]) -> [BoundingBox] {
let raw_confidence: MLMultiArray = results[0].featureValue.multiArrayValue!
let raw_coordinates: MLMultiArray = results[1].featureValue.multiArrayValue!
print(raw_confidence.shape, raw_coordinates.shape)
var boxes = [BoundingBox]()
let startDecoding = Date()
for anchor in 0..<raw_confidence.shape[0].int32Value {
var maxInd:Int = 0
var maxConf:Float = 0
for score in 0..<raw_confidence.shape[1].int32Value {
let key = [anchor, score] as [NSNumber]
let prob = raw_confidence[key].floatValue
if prob > maxConf {
maxInd = Int(score)
maxConf = prob
}
}
let y0 = raw_coordinates[[anchor, 0] as [NSNumber]].doubleValue
let x0 = raw_coordinates[[anchor, 1] as [NSNumber]].doubleValue
let y1 = raw_coordinates[[anchor, 2] as [NSNumber]].doubleValue
let x1 = raw_coordinates[[anchor, 3] as [NSNumber]].doubleValue
let width = x1-x0
let height = y1-y0
let x = x0 + width/2
let y = y0 + height/2
let rect = CGRect(x: x, y: y, width: width, height: height)
let box = BoundingBox(classIndex: maxInd, score: maxConf, rect: rect)
boxes.append(box)
}
let finishDecoding = Date()
let keepIndices = nonMaxSuppressionMultiClass(numClasses: raw_confidence.shape[1].intValue, boundingBoxes: boxes, scoreThreshold: 0.5, iouThreshold: 0.6, maxPerClass: 5, maxTotal: 10)
let finishNMS = Date()
var keepBoxes = [BoundingBox]()
for index in keepIndices {
keepBoxes.append(boxes[index])
}
print("Time Decoding", finishDecoding.timeIntervalSince(startDecoding))
print("Time Performing NMS", finishNMS.timeIntervalSince(finishDecoding))
return keepBoxes
}
Then when you receive the results from Vision, you call the function like this:
if let rawResults = vnRequest.results as? [VNCoreMLFeatureValueObservation] {
let boxes = self.decodeResults(results: rawResults)
print(boxes)
}
This solution is slow because of the way I move the data around and formulate my list of BoundingBox types. It would be much more efficient to process the MLMultiArray data using underlying pointers, and maybe use Accelerate to find the maximum score and best class for each anchor box.
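As a rough illustration of that pointer/Accelerate idea, here is a sketch (an addition, not part of the original answer) that assumes raw_confidence is a dense Float32 MLMultiArray laid out row-major as [anchors, classes]:
import Accelerate
import CoreML

// Find the best class and score for every anchor using vDSP instead of nested Swift loops.
func bestClassPerAnchor(_ rawConfidence: MLMultiArray) -> [(classIndex: Int, score: Float)] {
    let anchors = rawConfidence.shape[0].intValue
    let classes = rawConfidence.shape[1].intValue
    // Assumes a contiguous Float32 array; check dataType and strides before relying on this.
    let ptr = rawConfidence.dataPointer.bindMemory(to: Float.self, capacity: anchors * classes)
    var results: [(classIndex: Int, score: Float)] = []
    results.reserveCapacity(anchors)
    for anchor in 0..<anchors {
        var maxScore: Float = 0
        var maxIndex: vDSP_Length = 0
        // vDSP_maxvi finds the maximum value and its index in one vectorized pass over the row.
        vDSP_maxvi(ptr + anchor * classes, 1, &maxScore, &maxIndex, vDSP_Length(classes))
        results.append((classIndex: Int(maxIndex), score: maxScore))
    }
    return results
}
The BoundingBox construction and the non-max suppression from the function above would then operate on this much smaller per-anchor result.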
In my case it helped to disable the Neural Engine by forcing Core ML to run on the CPU and GPU only. This is often slower but doesn't throw the exception (at least in our case). In the end we implemented a policy that forces some of our models not to run on the Neural Engine on certain iOS devices.
See MLModelConfiguration.computeUnits to constrain the hardware a Core ML model can use.
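For example, something along these lines when loading the model (MyDetectionModel is a placeholder for your generated model class):
// In a throwing context:
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU                 // keep this model off the Neural Engine
let coreMLModel = try MyDetectionModel(configuration: config)
let visionModel = try VNCoreMLModel(for: coreMLModel.model)
let visionRequest = VNCoreMLRequest(model: visionModel)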
There could be several things wrong with my implementation, but I feel like it’s close.
I'm trying to record the camera feed using GPUImage, as well as set a dynamic overlay that updates 30 (or 60) times per second onto the video while it's recording. I don't want this to be done after the video has been recorded.
I have a pixel buffer that is updated 30 times a second in this case, and I'm creating a GPUImageRawDataInput object from its base address (UnsafeMutablePointer<GLubyte>). With the GPUImageRawDataInput object, I'm setting its target to the 'filter' variable, which is just a GPUImageFilter(). I'm not sure if this is the correct way to set it up.
Currently the video it's recording is just the camera feed; there's no overlay.
func setupRecording() {
movieWriter = GPUImageMovieWriter(movieURL: fileURL(), size: self.view.frame.size)
movieWriter?.encodingLiveVideo = true
videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1920x1080.rawValue, cameraPosition: .back)
videoCamera?.outputImageOrientation = .portrait
videoCamera?.horizontallyMirrorFrontFacingCamera = true
videoCamera?.horizontallyMirrorRearFacingCamera = false
let userCameraView = gpuImageView
userCameraView?.fillMode = kGPUImageFillModePreserveAspectRatioAndFill;
//filter's declaration up top - let filter = GPUImageFilter()
videoCamera?.addTarget(filter)
videoCamera?.audioEncodingTarget = movieWriter;
filter.addTarget(userCameraView)
filter.addTarget(movieWriter)
videoCamera?.startCapture()
}
func shouldUpdateRawInput(_ data: UnsafeMutablePointer<GLubyte>!) {//updated 30x per second
if let rawDataInput = rawDataInput {
rawDataInput.updateData(fromBytes: data, size: self.view.frame.size)
rawDataInput.processData()
} else {
//first time creating it
rawDataInput = GPUImageRawDataInput(bytes: data, size: self.view.frame.size, pixelFormat: GPUPixelFormatBGRA)
rawDataInput?.processData()
rawDataInput?.addTarget(filter)
}
}
//----------------------------------------
//this is my conversion of the pixel buffer to the GLubyte in another file
CVPixelBufferLockBaseAddress(pixelBuf, 0);
GLubyte* rawDataBytes=(GLubyte*)CVPixelBufferGetBaseAddress(pixelBuf);
[_delegate shouldUpdateRawInput:rawDataBytes];
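For what it's worth, a bare GPUImageFilter() is a single-input pass-through, so the raw-data overlay has nowhere to be composited in that graph. Overlays are normally mixed in with a two-input blend filter; here is a rough sketch of that wiring, assuming GPUImageAlphaBlendFilter and the variables from the code above (a guess at the intent, not code from the original post):
// Composite the camera feed (input 0) with the raw-data overlay (input 1),
// then send the blended result to both the preview view and the movie writer.
let blendFilter = GPUImageAlphaBlendFilter()
blendFilter.mix = 1.0                          // show the overlay at full opacity

videoCamera?.addTarget(blendFilter)            // first input: camera frames
rawDataInput?.addTarget(blendFilter)           // second input: the overlay pixels
blendFilter.addTarget(userCameraView)
blendFilter.addTarget(movieWriter)
videoCamera?.audioEncodingTarget = movieWriter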
I am currently working on image processing on iOS 10 with an iPad Pro.
I have written a small Swift 3 image processing app to test the speed of image processing.
My decoder delivers a new frame every ~33 ms (about 30 FPS), which I need to process with some of iOS's Core Image filters without additional buffering. Every ~33 ms the following function is called:
func newFrame(_ player: MediaPlayer!, buffer: UnsafeMutableRawPointer!,
size: Int32, format_fourcc: UnsafeMutablePointer<Int8>!,
width: Int32, height: Int32, bytes_per_row: Int32,
pts: Int, will_show: Int32) -> Int32 {
if String(cString: format_fourcc) == "BGRA" && will_show == 1 {
// START
var pixelBuffer: CVPixelBuffer? = nil
let ret = CVPixelBufferCreateWithBytes(kCFAllocatorSystemDefault,
Int(width),
Int(height),
kCVPixelFormatType_32BGRA,
buffer,
Int(bytes_per_row),
{ (releaseContext:
UnsafeMutableRawPointer?,
baseAddr:
UnsafeRawPointer?) -> () in
// Do not need to be used
// since created CVPixelBuffer
// will be destroyed
// in scope of this function
// automatically
},
buffer,
nil,
&pixelBuffer)
// END_1
if ret != kCVReturnSuccess {
NSLog("New Frame: Can't create the buffer")
return -1
}
if let pBuff = pixelBuffer {
let img = CIImage(cvPixelBuffer: pBuff)
.applyingFilter("CIColorInvert", withInputParameters: [:])
}
// END_2
}
return 0
}
I need to solve one of the following problems:
Copying the CIImage img's raw memory data back into the UnsafeMutableRawPointer buffer memory.
Somehow applying a GPU image filter to the CVPixelBuffer pixelBuffer or to the UnsafeMutableRawPointer buffer directly.
The code block between // START and // END_2 needs to run in less than 5 ms.
What I know:
The code between // START and // END_1 runs in less than 1.3ms.
Please help with your ideas.
Best regards,
Alex
I found a temporary solution:
1) Create a CIContext in your view:
imgContext = CIContext(eaglContext: eaglContext!)
2) Use the context to render the filtered CIImage into the pointer's memory:
imgContext.render(img,
toBitmap: buffer,
rowBytes: Int(bytes_per_row),
bounds: CGRect(x: 0,
y: 0,
width: Int(width),
height: Int(height)),
format: kCIFormatBGRA8,
colorSpace: CGColorSpaceCreateDeviceRGB())
This solution works well, as it uses the SIMD instructions of the iPad's CPU. But the CPU utilization for the copy operation alone is too high, ~30%, and that 30% is added to your program's overall CPU usage.
Perhaps somebody has a better idea of how to let the GPU write directly into the UnsafeMutableRawPointer after the CIFilter?
What is the best way to convert CGImage to OTVideoFrame?
I tried to get the underlying CGImage pixel buffer and feed it into an OTVideoBuffer, but got a distorted image.
Here is what I have done:
created a new OTVideoFormat object with ARGB pixel format
Set the bytesPerRow of the OTVideoFormat to width*4. Taking the value of CGImageGetBytesPerRow(...) did not work: I got no error messages, but also no frames on the other end of the line.
Copied the rows, truncating them to convert from CGImageGetBytesPerRow(...) bytes per row down to width*4 bytes per row.
Got a distorted image with rows slightly shifted
Here is the code:
func toOTVideoFrame() throws -> OTVideoFrame {
let width : UInt32 = UInt32(CGImageGetWidth(self)) // self is a CGImage
let height : UInt32 = UInt32(CGImageGetHeight(self))
assert(CGImageGetBitsPerPixel(self) == 32)
assert(CGImageGetBitsPerComponent(self) == 8)
let bitmapInfo = CGImageGetBitmapInfo(self)
assert(bitmapInfo.contains(CGBitmapInfo.FloatComponents) == false)
assert(bitmapInfo.contains(CGBitmapInfo.ByteOrderDefault))
assert(CGImageGetAlphaInfo(self) == .NoneSkipFirst)
let bytesPerPixel : UInt32 = 4
let cgImageBytesPerRow : UInt32 = UInt32(CGImageGetBytesPerRow(self))
let otFrameBytesPerRow : UInt32 = bytesPerPixel * width
let videoFormat = OTVideoFormat()
videoFormat.pixelFormat = .ARGB
videoFormat.bytesPerRow.addObject(NSNumber(unsignedInt: otFrameBytesPerRow))
videoFormat.imageWidth = width
videoFormat.imageHeight = height
videoFormat.estimatedFramesPerSecond = 15
videoFormat.estimatedCaptureDelay = 100
let videoFrame = OTVideoFrame(format: videoFormat)
videoFrame.timestamp = CMTimeMake(0, 1) // This is temporary
videoFrame.orientation = OTVideoOrientation.Up // This is temporary
let dataProvider = CGImageGetDataProvider(self)
let imageData : NSData = CGDataProviderCopyData(dataProvider)!
let buffer = UnsafeMutablePointer<UInt8>.alloc(Int(otFrameBytesPerRow * height))
for currentRow in 0..<height {
let currentRowStartOffsetCGImage = currentRow * cgImageBytesPerRow
let currentRowStartOffsetOTVideoFrame = currentRow * otFrameBytesPerRow
let cgImageRange = NSRange(location: Int(currentRowStartOffsetCGImage), length: Int(otFrameBytesPerRow))
imageData.getBytes(buffer.advancedBy(Int(currentRowStartOffsetOTVideoFrame)),
range: cgImageRange)
}
do {
let planes = UnsafeMutablePointer<UnsafeMutablePointer<UInt8>>.alloc(1)
planes.initialize(buffer)
videoFrame.setPlanesWithPointers(planes, numPlanes: 1)
planes.dealloc(1)
}
return videoFrame
}
The result image:
Solved this issue on my own.
It appears to be a bug in the OpenTok SDK. The SDK does not seem to be able to handle images whose dimensions are not a multiple of 16. When I changed all image dimensions to be multiples of 16, everything started to work fine.
TokBox did not bother to state this limitation in the API documentation, nor throw an exception when the input image size is not a multiple of 16.
This is a second critical bug I have found in OpenTok SDK. I strongly suggest you do not use this product. It is of very low quality.
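If you need to handle arbitrary source images, a simple workaround (an addition, not part of the original fix) is to round the frame dimensions up to the next multiple of 16 before rendering the CGImage and building the OTVideoFrame:
// Round a dimension up to the next multiple of 16, e.g. 100 -> 112, 128 -> 128.
func roundedUpToMultipleOf16(_ value: Int) -> Int {
    return (value + 15) / 16 * 16
}
// e.g. let paddedWidth = roundedUpToMultipleOf16(CGImageGetWidth(self))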