I am able to get the frames from a video using AVAssetImageGenerator, and I do so using generateCGImagesAsynchronously. Whenever the result succeeds, I print the requestedTime in seconds as well as the actualTime for that image. In the end, however, two of the generated images are identical and share the same actualTime, even though my step value and requested times are evenly spaced.
Here is a snippet of my printed requested and actual times in seconds for each image:
Requested: 0.9666666666666667
Actual: 0.9666666666666667
Requested: 1.0
Actual: 1.0
Requested: 1.0333333333333334
Actual: 1.0333333333333334
Requested: 1.0666666666666667
Actual: 1.0666666666666667
Requested: 1.1
Actual: 1.1
Requested: 1.1333333333333333
Actual: 1.1
Requested: 1.1666666666666667
Actual: 1.135
It seems to be going fine until the frame corresponding to 1.1 seconds in the video is generated, at which point two identical images are produced and the actualTime lags behind the requestedTime for the rest of the process.
I've already tried adjusting the way in which I compute the frames that should be generated, but it seems to be correct. I multiply the frames per second by the video duration to figure out how many frames I need in total, and I divide the total duration by that sample count so that the CGImages are generated at even intervals.
let videoDuration = asset.duration
print("video duration: \(videoDuration.seconds)")
let videoTrack = asset.tracks(withMediaType: AVMediaType.video)[0]
let fps = videoTrack.nominalFrameRate
var frameForTimes = [NSValue]()
let sampleCounts = Int(videoDuration.seconds * Double(fps))
let totalTimeLength = Int(videoDuration.seconds * Double(videoDuration.timescale))
let step = totalTimeLength / sampleCounts
for i in 0 ..< sampleCounts {
    let cmTime = CMTimeMake(value: Int64(i * step), timescale: Int32(videoDuration.timescale))
    frameForTimes.append(NSValue(time: cmTime))
}
and the way in which I create images (see this):
imageGenerator.generateCGImagesAsynchronously(forTimes: timeValues) { (requestedTime, cgImage, actualTime, result, error) in
    if let cgImage = cgImage {
        print("Requested: \(requestedTime.seconds), Actual: \(actualTime.seconds)")
        let image = UIImage(cgImage: cgImage)
        // scale image if you want
        frames.append(image)
    }
}
I also set the tolerances to zero before calling generateCGImagesAsynchronously:
imageGenerator.requestedTimeToleranceBefore = CMTime.zero
imageGenerator.requestedTimeToleranceAfter = CMTime.zero
I expected the actual times to match the requested times, and each produced image to be different. Looking through the images, there is always a duplicate regardless of the video being tested, and it normally occurs towards the middle or end.
Edit:
I found this, this, and this, which mention the same problem, but I've had no success with any of them.
Related
In my application, I used VNImageRequestHandler with a custom MLModel for object detection.
The app works fine with iOS versions before 14.5.
When iOS 14.5 came, it broke everything.
Whenever try handler.perform([visionRequest]) throws an error (Error Domain=com.apple.vis Code=11 "encountered unknown exception" UserInfo={NSLocalizedDescription=encountered unknown exception}), the pixelBuffer memory is held and never released, which fills up the AVCaptureOutput buffers so that no new frames arrive.
I had to change the code as shown below: by copying the pixelBuffer into another variable, I solved the problem of new frames not arriving, but the memory leak still happens.
Because of the memory leak, the app crashes after some time.
Note that before iOS 14.5, detection worked perfectly and try handler.perform([visionRequest]) never threw any error.
Here is my code:
private func predictWithPixelBuffer(sampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    // Get additional info from the camera.
    var options: [VNImageOption : Any] = [:]
    if let cameraIntrinsicMatrix = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        options[.cameraIntrinsics] = cameraIntrinsicMatrix
    }

    autoreleasepool {
        // On iOS 14.5 there is a bug: when the Vision request fails, the pixel buffer memory
        // is leaked, the AVCaptureOutput buffers fill up, and no new frames are output.
        // This is a temporary workaround that copies the pixel buffer into a new buffer;
        // it currently increases memory use a lot as well. Need to find a better way.
        var clonePixelBuffer: CVPixelBuffer? = pixelBuffer.copy()
        let handler = VNImageRequestHandler(cvPixelBuffer: clonePixelBuffer!, orientation: orientation, options: options)
        print("[DEBUG] detecting...")
        do {
            try handler.perform([visionRequest])
        } catch {
            delegate?.detector(didOutputBoundingBox: [])
            failedCount += 1
            print("[DEBUG] detect failed \(failedCount)")
            print("Failed to perform Vision request: \(error)")
        }
        clonePixelBuffer = nil
    }
}
Has anyone experienced the same problem? If so, how did you fix it?
The iOS 14.7 beta available on the developer portal seems to have fixed this issue.
I have a partial fix for this using Matthijs Hollemans' CoreMLHelpers library.
The model I use has 300 classes and 2363 anchors. I used a lot of the code Matthijs provided here to convert the model to MLModel.
In the last step a pipeline is built using the 3 sub models: raw_ssd_output, decoder, and nms. For this workaround you need to remove the nms model from the pipeline, and output raw_confidence and raw_coordinates.
In your app you need to add the code from CoreMLHelpers.
Then add this function to decode the output from your MLModel:
func decodeResults(results: [VNCoreMLFeatureValueObservation]) -> [BoundingBox] {
    let raw_confidence: MLMultiArray = results[0].featureValue.multiArrayValue!
    let raw_coordinates: MLMultiArray = results[1].featureValue.multiArrayValue!
    print(raw_confidence.shape, raw_coordinates.shape)

    var boxes = [BoundingBox]()
    let startDecoding = Date()
    for anchor in 0..<raw_confidence.shape[0].int32Value {
        var maxInd: Int = 0
        var maxConf: Float = 0
        for score in 0..<raw_confidence.shape[1].int32Value {
            let key = [anchor, score] as [NSNumber]
            let prob = raw_confidence[key].floatValue
            if prob > maxConf {
                maxInd = Int(score)
                maxConf = prob
            }
        }
        let y0 = raw_coordinates[[anchor, 0] as [NSNumber]].doubleValue
        let x0 = raw_coordinates[[anchor, 1] as [NSNumber]].doubleValue
        let y1 = raw_coordinates[[anchor, 2] as [NSNumber]].doubleValue
        let x1 = raw_coordinates[[anchor, 3] as [NSNumber]].doubleValue
        let width = x1 - x0
        let height = y1 - y0
        let x = x0 + width / 2
        let y = y0 + height / 2
        let rect = CGRect(x: x, y: y, width: width, height: height)
        let box = BoundingBox(classIndex: maxInd, score: maxConf, rect: rect)
        boxes.append(box)
    }
    let finishDecoding = Date()
    let keepIndices = nonMaxSuppressionMultiClass(numClasses: raw_confidence.shape[1].intValue, boundingBoxes: boxes, scoreThreshold: 0.5, iouThreshold: 0.6, maxPerClass: 5, maxTotal: 10)
    let finishNMS = Date()
    var keepBoxes = [BoundingBox]()
    for index in keepIndices {
        keepBoxes.append(boxes[index])
    }
    print("Time Decoding", finishDecoding.timeIntervalSince(startDecoding))
    print("Time Performing NMS", finishNMS.timeIntervalSince(finishDecoding))
    return keepBoxes
}
Then when you receive the results from Vision, you call the function like this:
if let rawResults = vnRequest.results as? [VNCoreMLFeatureValueObservation] {
    let boxes = self.decodeResults(results: rawResults)
    print(boxes)
}
This solution is slow because of the way I move the data around and formulate my list of BoundingBox types. It would be much more efficient to process the MLMultiArray data using underlying pointers, and maybe use Accelerate to find the maximum score and best class for each anchor box.
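For illustration, here is one possible sketch of that pointer-based approach, not tested against this particular model: it assumes raw_confidence is a contiguous Float32 MLMultiArray laid out as [numAnchors, numClasses], which is worth verifying via its dataType and strides before relying on it.
import Accelerate
import CoreML

// Sketch: find the best class per anchor by reading the MLMultiArray's underlying
// Float32 buffer directly instead of going through NSNumber subscripting.
func bestClassPerAnchor(_ rawConfidence: MLMultiArray) -> [(classIndex: Int, score: Float)] {
    let numAnchors = rawConfidence.shape[0].intValue
    let numClasses = rawConfidence.shape[1].intValue
    let ptr = rawConfidence.dataPointer.assumingMemoryBound(to: Float.self)

    var results = [(classIndex: Int, score: Float)]()
    results.reserveCapacity(numAnchors)

    for anchor in 0..<numAnchors {
        var maxValue: Float = 0
        var maxIndex: vDSP_Length = 0
        // vDSP_maxvi scans one row of class scores and returns the maximum and its index.
        vDSP_maxvi(ptr + anchor * numClasses, 1, &maxValue, &maxIndex, vDSP_Length(numClasses))
        results.append((classIndex: Int(maxIndex), score: maxValue))
    }
    return results
}
The coordinate decoding and NMS would stay the same; only the inner score loop is replaced.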
In my case it helped to disable the Neural Engine by forcing Core ML to run on CPU and GPU only. This is often slower but doesn't throw the exception (at least in our case). In the end we implemented a policy that forces some of our models not to run on the Neural Engine for certain iOS devices.
See MLModelConfiguration.computeUnits to constrain the hardware a Core ML model can use.
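A minimal sketch of that configuration, where YourModel is a placeholder for your generated Core ML model class:
import CoreML
import Vision

// Sketch only: YourModel stands in for the generated model class in your project.
func makeVisionRequest() throws -> VNCoreMLRequest {
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndGPU   // run on CPU and GPU only, skipping the Neural Engine

    let coreMLModel = try YourModel(configuration: config)
    let visionModel = try VNCoreMLModel(for: coreMLModel.model)
    return VNCoreMLRequest(model: visionModel) { request, error in
        // handle results as before
    }
}
The returned request can then be used in place of visionRequest above.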
I'm trying to append CVPixelBuffers to an AVAssetWriterInputPixelBufferAdaptor at the intended framerate, but it seems to be too fast, and my math is off. This isn't capturing from the camera, but capturing changing images. The resulting video plays back much faster than the elapsed time over which it was captured.
I have a function that appends the CVPixelBuffer every 1/24 of a second. So I'm trying to add an offset of 1/24 of a second to the last time.
I've tried:
let sampleTimeOffset = CMTimeMake(value: 100, timescale: 2400)
and:
let sampleTimeOffset = CMTimeMake(value: 24, timescale: 600)
and:
let sampleTimeOffset = CMTimeMakeWithSeconds(0.0416666666, preferredTimescale: 1000000000)
I'm adding onto the currentSampleTime and appending like so:
self.currentSampleTime = CMTimeAdd(currentSampleTime, sampleTimeOffset)
let success = self.assetWriterPixelBufferInput?.append(cv, withPresentationTime: currentSampleTime)
One other solution I thought of is to get the difference between the last time and the current time and add that onto currentSampleTime for accuracy, but I'm unsure how to do it.
I found a way to accurately capture the time delay by comparing the last time in milliseconds to the current time in milliseconds.
First, I have a general current milliseconds time function:
func currentTimeInMilliSeconds() -> Int {
    let currentDate = Date()
    let since1970 = currentDate.timeIntervalSince1970
    return Int(since1970 * 1000)
}
When I create a writer, (when I start recording video) I set a variable in my class to the current time in milliseconds:
currentCaptureMillisecondsTime = currentTimeInMilliSeconds()
Then, since the function that's supposed to be called every 1/24 of a second is not always accurate, I get the difference in milliseconds since I started writing (or since the last call), convert the milliseconds to seconds, and pass that to CMTimeMakeWithSeconds.
let lastTimeMilliseconds = self.currentCaptureMillisecondsTime
let nowTimeMilliseconds = currentTimeInMilliSeconds()
let millisecondsDifference = nowTimeMilliseconds - lastTimeMilliseconds
// set new current time
self.currentCaptureMillisecondsTime = nowTimeMilliseconds
let millisecondsToSeconds:Float64 = Double(millisecondsDifference) * 0.001
let sampleTimeOffset = CMTimeMakeWithSeconds(millisecondsToSeconds, preferredTimescale: 1000000000)
I can now append my frame with the accurate delay that actually occurred.
self.currentSampleTime = CMTimeAdd(currentSampleTime, sampleTimeOffset)
let success = self.assetWriterPixelBufferInput?.append(cv, withPresentationTime: currentSampleTime)
When I finish writing the video and I save it to my camera roll, it is the exact duration from when I was recording.
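Pulled together, a sketch of the whole append path might look like the following, assuming currentSampleTime, currentCaptureMillisecondsTime and assetWriterPixelBufferInput are properties of the recording class as in the snippets above:
import AVFoundation

// Consolidated sketch of the approach above, with a readiness check added.
func appendFrame(_ pixelBuffer: CVPixelBuffer) {
    // Measure how much wall-clock time actually elapsed since the last frame.
    let nowMilliseconds = currentTimeInMilliSeconds()
    let elapsedMilliseconds = nowMilliseconds - currentCaptureMillisecondsTime
    currentCaptureMillisecondsTime = nowMilliseconds

    // Advance the presentation time by the real elapsed time, not a fixed 1/24 s.
    let elapsedSeconds = Double(elapsedMilliseconds) * 0.001
    let sampleTimeOffset = CMTimeMakeWithSeconds(elapsedSeconds, preferredTimescale: 1_000_000_000)
    currentSampleTime = CMTimeAdd(currentSampleTime, sampleTimeOffset)

    if assetWriterPixelBufferInput?.assetWriterInput.isReadyForMoreMediaData == true {
        let success = assetWriterPixelBufferInput?.append(pixelBuffer, withPresentationTime: currentSampleTime)
        if success != true {
            print("Failed to append pixel buffer at \(currentSampleTime.seconds)")
        }
    }
}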
I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process is functioning correctly, but it creates unnecessarily large files, given that any photograph included in the video repeats unchanged for 100 to 150 frames. I've applied whatever compression I could find in AVFoundation, which mostly applies intra-frame techniques, and tried to find more information on inter-frame compression in AVFoundation. Unfortunately, there are only a few references that I've been able to find, and nothing that has let me get it to work.
I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I've not included the code for fetching and preparing the individual frames (called below as self.getFrame()) since that seems to be working fine and gets quite complex since it handles photos, videos, adding title frames, and doing fade transitions. For repeated frames, it returns a structure with the frame image and a counter for the number of output frames to include.
// Create a new AVAssetWriter instance that will build the video
assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
guard assetWriter != nil else {
    print("Error converting images to video: AVAssetWriter not created.")
    inProcess = false
    return
}

let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!
let sourceBufferAttributes: [String : AnyObject] = [
    kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
    kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
    kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
    AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
    AVVideoCompressionPropertiesKey as String : [
        AVVideoAverageBitRateKey: 725000,
        AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
    ] as AnyObject
]
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)

// Start the writing session
assetWriter!.startWriting()
assetWriter!.startSession(atSourceTime: kCMTimeZero)

if (pixelBufferAdaptor.pixelBufferPool == nil) {
    print("Error converting images to video: pixelBufferPool nil after starting session")
    inProcess = false
    return
}

// -- Create queue for <requestMediaDataWhenReadyOnQueue>
let mediaQueue = DispatchQueue(label: "mediaInputQueue")

// Initialize run time values
var presentationTime = kCMTimeZero
var done = false
var nextFrame: FramePack?   // The FramePack struct has the frame to output, noDisplays - the number of times
                            // that it will be output - and an isLast flag that is true when it's the final frame

writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in    // Keeps invoking the block to get input until call markAsFinished
    nextFrame = self.getFrame()             // Get the next frame to be added to the output with its associated values
    let imageCGOut = nextFrame!.frame       // The frame to output
    if nextFrame!.isLast { done = true }    // Identifies the last frame so can drop through to markAsFinished() below

    var frames = 0      // Counts how often we've output this frame
    var waitCount = 0   // Used to avoid an infinite loop if there's trouble with writer.Input

    while (frames < nextFrame!.noDisplays) && (waitCount < 1000000) {   // Need to wait for writerInput to be ready - count deals with potential hung writer
        waitCount += 1
        if waitCount == 1000000 {   // Have seen it go into 100s of thousands and succeed
            print("Exceeded waitCount limit while attempting to output slideshow frame.")
            self.inProcess = false
            return
        }
        if (writerInput.isReadyForMoreMediaData) {
            waitCount = 0
            frames += 1
            autoreleasepool {
                if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool {
                    let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
                    let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
                        kCFAllocatorDefault,
                        pixelBufferPool,
                        pixelBufferPointer
                    )
                    if let pixelBuffer = pixelBufferPointer.pointee, status == 0 {
                        CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                        let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
                        let rgbColorSpace = CGColorSpaceCreateDeviceRGB()

                        // Set up a context for rendering using the PixelBuffer allocated above as the target
                        let context = CGContext(
                            data: pixelData,
                            width: Int(self.videoWidth),
                            height: Int(self.videoHeight),
                            bitsPerComponent: 8,
                            bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                            space: rgbColorSpace,
                            bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
                        )

                        // Draw the image into the PixelBuffer used for the context
                        context?.draw(imageCGOut, in: CGRect(x: 0.0, y: 0.0, width: 1280, height: 720))

                        // Append the image (frame) from the context pixelBuffer onto the video file
                        _ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
                        presentationTime = presentationTime + CMTimeMake(1, videoFPS)

                        // We're done with the PixelBuffer, so unlock it
                        CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
                    }
                    pixelBufferPointer.deinitialize()
                    pixelBufferPointer.deallocate(capacity: 1)
                } else {
                    NSLog("Error: Failed to allocate pixel buffer from pool")
                }
            }
        }
    }
Thanks in advance for any suggestions.
It looks like you're appending a bunch of redundant frames to your video and labouring under a misapprehension: that video files must have a constant, high framerate, e.g. 30 fps.
If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 images, with presentation timestamps of 0 s, 5 s and 10 s and an assetWriter.endSession(atSourceTime:) of 15 s, not 15 s * 30 fps = 450 frames.
In other words, your frame rate is way too high. For the best inter-frame compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.
*I've seen some video services/players choke on unusually low framerates, so you may need a minimum framerate and some redundant frames, e.g. 1 frame per 5 s; YMMV.
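For what it's worth, a minimal sketch of that approach might look like the following, assuming each slide has already been rendered into a CVPixelBuffer (as the question's CGContext code does) and that the writer and adaptor are configured as in the question:
import AVFoundation

// Sketch: one pixel buffer per slide, each stamped secondsPerImage apart,
// with endSession(atSourceTime:) supplying the overall duration.
func writeSlideshow(slides: [CVPixelBuffer],
                    secondsPerImage: Double,
                    writer: AVAssetWriter,
                    adaptor: AVAssetWriterInputPixelBufferAdaptor) {
    guard writer.startWriting() else { return }
    writer.startSession(atSourceTime: .zero)

    for (index, buffer) in slides.enumerated() {
        // One frame per slide, at 0 s, 5 s, 10 s, ... instead of 30 identical frames per second.
        let time = CMTime(seconds: Double(index) * secondsPerImage, preferredTimescale: 600)
        while !adaptor.assetWriterInput.isReadyForMoreMediaData {
            usleep(10_000)   // crude wait; requestMediaDataWhenReady is the nicer pattern
        }
        if !adaptor.append(buffer, withPresentationTime: time) {
            print("Failed to append slide \(index)")
        }
    }

    // Ending the session at slides.count * secondsPerImage gives the final slide its full on-screen time.
    let end = CMTime(seconds: Double(slides.count) * secondsPerImage, preferredTimescale: 600)
    writer.endSession(atSourceTime: end)
    adaptor.assetWriterInput.markAsFinished()
    writer.finishWriting { /* handle completion */ }
}
A timescale of 600 is just the conventional choice because it divides evenly into common frame rates.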
The code below was inspired by other posts on SO and extracts an image from a video. Unfortunately, the image looks blurry even though the video looks sharp and fully in focus.
Is there something wrong with the code, or is this a natural difficulty of extracting images from videos?
func getImageFromVideo(videoURL: String) -> UIImage {
    do {
        let asset = AVURLAsset(URL: NSURL(fileURLWithPath: videoURL), options: nil)
        let imgGenerator = AVAssetImageGenerator(asset: asset)
        imgGenerator.appliesPreferredTrackTransform = true
        let cgImage = try imgGenerator.copyCGImageAtTime(CMTimeMake(0, 1), actualTime: nil)
        let image = UIImage(CGImage: cgImage)
        return image
    } catch {
        ...
    }
}
Your code works without errors or problems. I tried it with a video and the grabbed image was not blurry.
I would try to debug this by using a different timescale for CMTime.
With CMTimeMake, the first argument is the value and the second argument is the timescale.
Your timescale is 1, so the value is in seconds: a value of 0 means the 1st second, a value of 1 means the 2nd second, and so on. More precisely, it means the first frame after the designated location in the timeline.
With your current CMTime it grabs the first frame of the first second: that's the first frame of the video (even if the video is less than 1s).
With a timescale of 4, the value would be 1/4th of a second. Etc.
Try finding a CMTime that falls right on a steady frame (it depends on your video framerate, you'll have to make tests).
For example, if your video is at 24 fps, then to grab exactly one frame of video the timescale should be 24 (that way each value unit represents a whole frame):
let cgImage = try imgGenerator.copyCGImageAtTime(CMTimeMake(0, 24), actualTime: nil)
On the other hand, you mention that only the first and last frames of the video are blurry. As you rightly guessed, that is probably the actual cause of your issue, and it comes from a lack of device stabilization at those moments.
A note: the encoding of the video might also play a role. Some MPG encoders create incomplete and interpolated frames that are "recreated" when the video plays, but these frames can appear blurry when grabbed with copyCGImageAtTime. The only solution I've found for this rare problem is to grab another frame just before or just after the blurry one.
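For example (using the same pre-Swift 3 API as the snippet above, and assuming time holds the CMTime of the blurry frame in a 24 fps video), the neighbouring frame could be grabbed like this:
// Hypothetical fallback: step one frame (1/24 s) past the blurry frame and grab that instead.
let nextFrameTime = CMTimeAdd(time, CMTimeMake(1, 24))
let cgImage = try imgGenerator.copyCGImageAtTime(nextFrameTime, actualTime: nil)
let image = UIImage(CGImage: cgImage)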
I have an app that lets people combine up to 4 pictures. However, when I let them choose from their photos (up to 4), it can be very slow even when I set image quality to FastFormat. It takes 4 seconds (about 1 second per photo). On highest quality, 4 images take 6 seconds.
Can you suggest any way I can get the images out faster?
Here is the block where I process images.
func processImages()
{
    _selectediImages = Array()
    _cacheImageComplete = 0

    for asset in _selectedAssets
    {
        var options: PHImageRequestOptions = PHImageRequestOptions()
        options.synchronous = true
        options.deliveryMode = PHImageRequestOptionsDeliveryMode.FastFormat

        PHImageManager.defaultManager().requestImageForAsset(asset, targetSize: CGSizeMake(CGFloat(asset.pixelWidth), CGFloat(asset.pixelHeight)), contentMode: .AspectFit, options: options)
        {
            result, info in

            var minRatio: CGFloat = 1
            // Reduce file size so take 1/3 the screen w&h
            if(CGFloat(asset.pixelWidth) > UIScreen.mainScreen().bounds.width/2 || CGFloat(asset.pixelHeight) > UIScreen.mainScreen().bounds.height/2)
            {
                minRatio = min((UIScreen.mainScreen().bounds.width/2)/(CGFloat(asset.pixelWidth)), ((UIScreen.mainScreen().bounds.height/2)/CGFloat(asset.pixelHeight)))
            }
            var size: CGSize = CGSizeMake((CGFloat(asset.pixelWidth)*minRatio), (CGFloat(asset.pixelHeight)*minRatio))

            UIGraphicsBeginImageContextWithOptions(size, false, 0.0)
            result.drawInRect(CGRectMake(0, 0, size.width, size.height))
            var final = UIGraphicsGetImageFromCurrentImageContext()
            var image = iImage(uiimage: final)

            self._selectediImages.append(image)
            self._cacheImageComplete!++
            println(self._cacheImageComplete)

            if(self._cacheImageComplete == self._selectionCount)
            {
                self._processingImages = false
                self.selectionCallback(self._selectediImages)
            }
        }
    }
}
Don't resize the images yourself — part of what PHImageManager is for is to do that for you. (It also caches the thumbnail images so that you can get them more quickly next time, and shares that cache across apps so that you don't end up with half a dozen apps creating half a dozen separate 500MB thumbnail caches of your whole library.)
func processImages() {
    _selectediImages = Array()
    _cacheImageComplete = 0

    for asset in _selectedAssets {
        let options = PHImageRequestOptions()
        options.deliveryMode = .FastFormat

        // request images no bigger than 1/3 the screen width
        let maxDimension = UIScreen.mainScreen().bounds.width / 3 * UIScreen.mainScreen().scale
        let size = CGSize(width: maxDimension, height: maxDimension)

        PHImageManager.defaultManager().requestImageForAsset(asset, targetSize: size, contentMode: .AspectFill, options: options)
        { result, info in
            // probably some of this code is unnecessary, too,
            // but I'm not sure what you're doing here so leaving it alone
            self._selectediImages.append(result)
            self._cacheImageComplete!++
            println(self._cacheImageComplete)
            if self._cacheImageComplete == self._selectionCount {
                self._processingImages = false
                self.selectionCallback(self._selectediImages)
            }
        }
    }
}
Notable changes:
Don't ask for images synchronously on the main thread. Just don't.
Pass a square maximum size to requestImageForAsset and use the AspectFill mode. This will get you an image that crops to fill that square no matter what its aspect ratio is.
You're asking for images by their pixel size here, and the screen size is in points. Multiply by the screen scale or your images will be pixelated. (Then again, you're asking for FastFormat, so you might get blurry images anyway.)
Why did you say synchronous? Obviously that's going to slow things way down. Moreover, saying synchronous on the main thread is absolutely forbidden!!!! Read the docs and obey them. That is the primary issue here.
There are then many other considerations. Basically you're using this call all wrong. Once you've removed the synchronous, do not process the image like that! Remember, this callback is going to be called many times as the image is provided in better and better versions. You must not do anything time-consuming here.
(Also, why are you resizing the image? If you wanted the image at a certain size, you should have asked for that size when you requested it. Let the image-fetcher do the work for you.)
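For reference, a rough sketch of such a request once the synchronous flag and the manual resizing are gone, written against the current Photos API naming rather than the older Swift in the snippets above, and assuming asset and targetSize are already defined:
import Photos
import UIKit

// Sketch: asynchronous request that ignores the low-quality "degraded" passes
// and only keeps the final image, so the handler stays cheap even though it
// may be called several times.
let options = PHImageRequestOptions()
options.deliveryMode = .opportunistic
options.resizeMode = .fast

PHImageManager.default().requestImage(for: asset,
                                      targetSize: targetSize,
                                      contentMode: .aspectFill,
                                      options: options) { image, info in
    let isDegraded = (info?[PHImageResultIsDegradedKey] as? Bool) ?? false
    if isDegraded { return }   // a better version will be delivered later
    // Use `image` here; keep this block fast.
}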