iOS: How to change video resolution in WebRTC?

I am trying to change the local video resolution in WebRTC. I used the following method to create the local video track:
-(RTCVideoTrack *)createLocalVideoTrack {
RTCVideoTrack *localVideoTrack = nil;
RTCMediaConstraints *mediaConstraints = [[RTCMediaConstraints alloc] initWithMandatoryConstraints:nil optionalConstraints:nil];
RTCAVFoundationVideoSource *source =
[self.factory avFoundationVideoSourceWithConstraints:mediaConstraints];
localVideoTrack =
[self.factory videoTrackWithSource:source
trackId:@"ARDAMSv0"];
return localVideoTrack;
}
I set the mandatory constraints as follows, but it doesn't work:
@{@"minFrameRate":@"20",@"maxFrameRate":@"30",@"maxWidth":@"320",@"minWidth":@"240",@"maxHeight":@"320",@"minHeight":@"240"};
Could anyone help me?

Recent SDK builds no longer provide a factory method for building a capturer with constraints. The solution should be based on AVCaptureSession instead, and WebRTC will take care of CPU and bandwidth utilization.
For this you need to keep a reference to the RTCVideoSource that was passed to the capturer. It has this method:
- (void)adaptOutputFormatToWidth:(int)width height:(int)height fps:(int)fps;
Calling this function will cause frames to be scaled down to the requested resolution. Also, frames will be cropped to match the requested aspect ratio, and frames will be dropped to match the requested fps. The requested aspect ratio is orientation agnostic and will be adjusted to maintain the input orientation, so it doesn't matter if e.g. 1280x720 or 720x1280 is requested.
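To make the "orientation agnostic" behaviour concrete, here is a rough sketch of the crop-then-scale arithmetic described above. It is only an illustration: `Size` and `adaptedSize` are hypothetical names, and WebRTC's real adapter applies its own rules for choosing scale factors, so the exact output sizes may differ. It also assumes frames are never upscaled.

```swift
// Hypothetical helper types; not part of the WebRTC SDK.
struct Size: Equatable {
    var width: Int
    var height: Int
}

// Simplified model of "crop to the requested aspect ratio, then scale down".
func adaptedSize(input: Size, requested: Size) -> Size {
    // Match the request to the input orientation, so 1280x720 and
    // 720x1280 are treated as the same request.
    var target = requested
    let inputIsLandscape = input.width >= input.height
    let targetIsLandscape = target.width >= target.height
    if inputIsLandscape != targetIsLandscape {
        target = Size(width: requested.height, height: requested.width)
    }

    // Crop the input to the target aspect ratio (integer arithmetic).
    var cropped = input
    if input.width * target.height > input.height * target.width {
        // Input is wider than the target: trim the width.
        cropped.width = input.height * target.width / target.height
    } else {
        // Input is taller than (or equal to) the target: trim the height.
        cropped.height = input.width * target.height / target.width
    }

    // Assumption: scale down only, never up.
    return cropped.width <= target.width ? cropped : target
}

let landscapeCamera = Size(width: 1280, height: 720)
let portraitRequest = adaptedSize(input: landscapeCamera,
                                  requested: Size(width: 360, height: 640))
let squareRequest = adaptedSize(input: landscapeCamera,
                                requested: Size(width: 480, height: 480))
```

With a 1280x720 input, requesting 360x640 and requesting 640x360 give the same result in this model, and a square request crops the sides before scaling.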
var localVideoSource: RTCVideoSource?
You may create your video track this way:
func createVideoTrack() -> RTCVideoTrack? {
var source: RTCVideoSource
if let localSource = self.localVideoSource {
source = localSource
} else {
source = self.factory.videoSource()
self.localVideoSource = source
}
let devices = RTCCameraVideoCapturer.captureDevices()
if let camera = devices.first,
// here you can decide to use the front or back camera
let format = RTCCameraVideoCapturer.supportedFormats(for: camera).last,
// the formats range from tiny up to 4K; pick the first one that meets your needs, e.g. if you use at most 1280x720 there is no need to pick 4K
let fps = format.videoSupportedFrameRateRanges.first?.maxFrameRate {
// or take something between min and max, e.g. 24 fps rather than 30, to reduce GPU/CPU use
let intFps = Int(fps)
let capturer = RTCCameraVideoCapturer(delegate: source)
capturer.startCapture(with: camera, format: format, fps: intFps)
let videoTrack = self.factory.videoTrack(with: source, trackId: WebRTCClient.trackIdVideo)
return videoTrack
}
return nil
}
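The comment about picking a format that fits your needs could be implemented along these lines. This is a sketch on plain width/height values: `CaptureFormat` and `pickFormat` are hypothetical names, and in the real code the dimensions would come from each AVCaptureDevice.Format, e.g. via `CMVideoFormatDescriptionGetDimensions(format.formatDescription)`.

```swift
// Hypothetical stand-in for the dimensions of an AVCaptureDevice.Format.
struct CaptureFormat: Equatable {
    let width: Int
    let height: Int
    var pixelCount: Int { width * height }
}

// Returns the largest format that fits within the given limits.
// If nothing fits, falls back to the smallest format available.
func pickFormat(from formats: [CaptureFormat],
                maxWidth: Int,
                maxHeight: Int) -> CaptureFormat? {
    let fitting = formats.filter { $0.width <= maxWidth && $0.height <= maxHeight }
    if let best = fitting.max(by: { $0.pixelCount < $1.pixelCount }) {
        return best
    }
    return formats.min(by: { $0.pixelCount < $1.pixelCount })
}

let available = [
    CaptureFormat(width: 192, height: 144),
    CaptureFormat(width: 640, height: 480),
    CaptureFormat(width: 1280, height: 720),
    CaptureFormat(width: 3840, height: 2160),
]
let chosen = pickFormat(from: available, maxWidth: 1280, maxHeight: 720)
```

Capturing at 1280x720 instead of 4K means less work for the camera pipeline before any scaling is applied.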
And when you need to change resolution, you can tell this video source to do "scaling".
func changeResolution(w: Int32, h: Int32) -> Bool {
guard let videoSource = self.localVideoSource else {
return false
}
// TODO: decide fps
videoSource.adaptOutputFormat(toWidth: w, height: h, fps: 30)
return true
}
The camera will still capture frames at the resolution given in the format passed to startCapture. If you care about resource utilization, you can also use the following methods before calling adaptOutputFormat, to stop the capturer and restart it with a different format.
// Stops the capture session asynchronously and notifies callback on completion.
- (void)stopCaptureWithCompletionHandler:(nullable void (^)(void))completionHandler;
// Starts the capture session asynchronously.
- (void)startCaptureWithDevice:(AVCaptureDevice *)device format:(AVCaptureDeviceFormat *)format fps:(NSInteger)fps;

Related

AVCaptureSession image size compression like IOS default camera

I have implemented a simple app in Xamarin.iOS that takes pictures using AVCaptureSession and AVCapturePhotoCaptureDelegate.
Everything works fine, but my problem is the image size.
With AVCaptureSession.PresetPhoto as the session preset of my AVCaptureSession, I get very large photos.
After some research I wrote this method to get the correct scale:
public UIImage RescaleImage(UIImage img) {
var format = new UIGraphicsImageRendererFormat();
format.Scale = 1;
var renderer = new UIGraphicsImageRenderer(img.Size, format);
return renderer.CreateImage(c => img.Draw(CGPoint.Empty) );
}
After that, my photo has the same resolution as the iOS camera (3024 x 4032).
Now I need to compress the image as a JPEG.
//Image is UIImage
var jpgImag = Image.AsJPEG(0f);
// Converting UIImage to byte[] array
Byte[] myByteArray = new Byte[jpgImag.Length];
System.Runtime.InteropServices.Marshal.Copy(jpgImag.Bytes,
myByteArray, 0, Convert.ToInt32(jpgImag.Length));
Here my capture session code:
CaptureSession = new AVCaptureSession();
CaptureSession.SessionPreset = AVCaptureSession.PresetPhoto;
CaptureOutput = new AVCapturePhotoOutput();
CaptureSession.AddOutput(CaptureOutput);
CaptureDevice = AVCaptureDevice.GetDefaultDevice(AVMediaType.Video);
NSError Error;
// Prepare device for configuration
if (!CaptureDevice.LockForConfiguration(out Error)) {
// There has been an issue, abort
Console.WriteLine("Error: {0}", Error.LocalizedDescription);
CaptureDevice.UnlockForConfiguration();
return;
}
CaptureDevice.FocusMode = AVCaptureFocusMode.ContinuousAutoFocus;
// Unlock configuration
CaptureDevice.UnlockForConfiguration();
CaptureDeviceInput = AVCaptureDeviceInput.FromDevice(CaptureDevice);
CaptureSession.AddInput(CaptureDeviceInput);
My settings:
var settings = AVCapturePhotoSettings.Create();
var previewPixelType = settings.AvailablePreviewPhotoPixelFormatTypes.First();
settings.PreviewPhotoFormat = new NSDictionary<NSString, NSObject>(CVPixelBuffer.PixelFormatTypeKey, settings.AvailablePreviewPhotoPixelFormatTypes[0]);
settings.FlashMode = AVCaptureFlashMode.Auto;
return settings;
If I save myByteArray I get, for example, a 20 MB file. If I shoot the same photo with the iOS camera I get a 3 MB image.
The only differences between the two images are the bit depth:
My app is 32. iOS camera is 24.
And the DPI:
My app is 96. iOS camera is 72.
My question is:
How can I set up my AVCaptureSession to get a result closer to the iOS camera's?
(Swift code is fine too.)
Thanks

CoreML Memory Leak in iOS 14.5

In my application, I use VNImageRequestHandler with a custom MLModel for object detection.
The app works fine on iOS versions before 14.5.
When iOS 14.5 arrived, it broke everything.
Whenever try handler.perform([visionRequest]) throws an error (Error Domain=com.apple.vis Code=11 "encountered unknown exception" UserInfo={NSLocalizedDescription=encountered unknown exception}), the pixelBuffer memory is held and never released. This fills the AVCaptureOutput buffers, so no new frames arrive.
I had to change the code as shown below. By copying the pixelBuffer to another variable I solved the problem of new frames not arriving, but the memory leak still happens.
Because of the memory leak, the app crashes after some time.
Note that before iOS 14.5, detection worked perfectly and try handler.perform([visionRequest]) never threw an error.
Here is my code:
private func predictWithPixelBuffer(sampleBuffer: CMSampleBuffer) {
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
return
}
// Get additional info from the camera.
var options: [VNImageOption : Any] = [:]
if let cameraIntrinsicMatrix = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
options[.cameraIntrinsics] = cameraIntrinsicMatrix
}
autoreleasepool {
// Workaround for an iOS 14.5 bug: when a Vision request fails, the pixel buffer memory leaks, the AVCaptureOutput buffers fill up, and no new frames are delivered. Copying the pixel buffer to a new buffer avoids the stall, but it increases memory use a lot. Need to find a better way.
var clonePixelBuffer: CVPixelBuffer? = pixelBuffer.copy()
let handler = VNImageRequestHandler(cvPixelBuffer: clonePixelBuffer!, orientation: orientation, options: options)
print("[DEBUG] detecting...")
do {
try handler.perform([visionRequest])
} catch {
delegate?.detector(didOutputBoundingBox: [])
failedCount += 1
print("[DEBUG] detect failed \(failedCount)")
print("Failed to perform Vision request: \(error)")
}
clonePixelBuffer = nil
}
}
Has anyone experienced the same problem? If so, how did you fix it?
iOS 14.7 Beta available on the developer portal seems to have fixed this issue.
I have a partial fix for this using @Matthijs Hollemans's CoreMLHelpers library.
The model I use has 300 classes and 2363 anchors. I used a lot of the code Matthijs provided here to convert the model to MLModel.
In the last step a pipeline is built using the 3 sub models: raw_ssd_output, decoder, and nms. For this workaround you need to remove the nms model from the pipeline, and output raw_confidence and raw_coordinates.
In your app you need to add the code from CoreMLHelpers.
Then add this function to decode the output from your MLModel:
func decodeResults(results:[VNCoreMLFeatureValueObservation]) -> [BoundingBox] {
let raw_confidence: MLMultiArray = results[0].featureValue.multiArrayValue!
let raw_coordinates: MLMultiArray = results[1].featureValue.multiArrayValue!
print(raw_confidence.shape, raw_coordinates.shape)
var boxes = [BoundingBox]()
let startDecoding = Date()
for anchor in 0..<raw_confidence.shape[0].int32Value {
var maxInd:Int = 0
var maxConf:Float = 0
for score in 0..<raw_confidence.shape[1].int32Value {
let key = [anchor, score] as [NSNumber]
let prob = raw_confidence[key].floatValue
if prob > maxConf {
maxInd = Int(score)
maxConf = prob
}
}
let y0 = raw_coordinates[[anchor, 0] as [NSNumber]].doubleValue
let x0 = raw_coordinates[[anchor, 1] as [NSNumber]].doubleValue
let y1 = raw_coordinates[[anchor, 2] as [NSNumber]].doubleValue
let x1 = raw_coordinates[[anchor, 3] as [NSNumber]].doubleValue
let width = x1-x0
let height = y1-y0
let x = x0 + width/2
let y = y0 + height/2
let rect = CGRect(x: x, y: y, width: width, height: height)
let box = BoundingBox(classIndex: maxInd, score: maxConf, rect: rect)
boxes.append(box)
}
let finishDecoding = Date()
let keepIndices = nonMaxSuppressionMultiClass(numClasses: raw_confidence.shape[1].intValue, boundingBoxes: boxes, scoreThreshold: 0.5, iouThreshold: 0.6, maxPerClass: 5, maxTotal: 10)
let finishNMS = Date()
var keepBoxes = [BoundingBox]()
for index in keepIndices {
keepBoxes.append(boxes[index])
}
print("Time Decoding", finishDecoding.timeIntervalSince(startDecoding))
print("Time Performing NMS", finishNMS.timeIntervalSince(finishDecoding))
return keepBoxes
}
Then when you receive the results from Vision, you call the function like this:
if let rawResults = vnRequest.results as? [VNCoreMLFeatureValueObservation] {
let boxes = self.decodeResults(results: rawResults)
print(boxes)
}
This solution is slow because of the way I move the data around and formulate my list of BoundingBox types. It would be much more efficient to process the MLMultiArray data using underlying pointers, and maybe use Accelerate to find the maximum score and best class for each anchor box.
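As a rough sketch of the pointer-based idea, the per-anchor maximum can be computed over a flat, row-major buffer without creating an NSNumber key for every element. `bestClassPerAnchor` is a hypothetical helper, not part of CoreMLHelpers. With a real MLMultiArray you would obtain the buffer from its `dataPointer`, after checking that `dataType` is float32 and that `strides` describe a contiguous layout; both of those are assumptions here. Unlike the loop above, which starts from a confidence of 0, this version starts from the first score in each row.

```swift
// Finds the best-scoring class for each anchor in a flat, row-major
// [anchors x classes] buffer of Float scores.
func bestClassPerAnchor(scores: UnsafeBufferPointer<Float>,
                        anchors: Int,
                        classes: Int) -> [(classIndex: Int, score: Float)] {
    precondition(classes > 0, "need at least one class")
    precondition(scores.count == anchors * classes, "buffer size mismatch")

    var best: [(classIndex: Int, score: Float)] = []
    best.reserveCapacity(anchors)
    for anchor in 0..<anchors {
        let rowStart = anchor * classes
        var bestIndex = 0
        var bestScore = scores[rowStart]
        for c in 1..<classes {
            let s = scores[rowStart + c]
            if s > bestScore {
                bestScore = s
                bestIndex = c
            }
        }
        best.append((classIndex: bestIndex, score: bestScore))
    }
    return best
}

// Two anchors, three classes each.
let flat: [Float] = [0.1, 0.7, 0.2,
                     0.9, 0.05, 0.05]
let picks = flat.withUnsafeBufferPointer {
    bestClassPerAnchor(scores: $0, anchors: 2, classes: 3)
}
```

The inner loop is also a natural candidate for an Accelerate routine, as suggested above, once the data is available as a contiguous Float buffer.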
In my case it helped to disable the Neural Engine by forcing Core ML to run on CPU and GPU only. This is often slower but doesn't throw the exception (at least in our case). In the end we implemented a policy that forces some of our models not to run on the Neural Engine on certain iOS devices.
See MLModelConfiguration.computeUnits to constrain the hardware a Core ML model can use.
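A minimal sketch of that setting is shown below. `makeConfiguration` and its `avoidNeuralEngine` flag are hypothetical names; how you decide which devices or models should set the flag is up to your own policy.

```swift
import CoreML

// Builds a configuration that optionally keeps the model off the
// Neural Engine by restricting Core ML to CPU and GPU.
func makeConfiguration(avoidNeuralEngine: Bool) -> MLModelConfiguration {
    let config = MLModelConfiguration()
    config.computeUnits = avoidNeuralEngine ? .cpuAndGPU : .all
    return config
}

let restricted = makeConfiguration(avoidNeuralEngine: true)
```

The configuration is then passed when loading the model, for example to `MLModel(contentsOf:configuration:)` or to the `init(configuration:)` initializer of an Xcode-generated model class.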

AVPlayerItemVideoOutput copyPixelBuffer always returns 1280x720

I'm instantiating the AVPlayerItemVideoOutput like so:
let videoOutput = AVPlayerItemVideoOutput(pixelBufferAttributes: [String(kCVPixelBufferPixelFormatTypeKey): NSNumber(value: kCVPixelFormatType_32BGRA)])
And retrieving the pixelBuffers like this:
@objc func displayLinkDidRefresh(link: CADisplayLink) {
let itemTime = videoOutput.itemTime(forHostTime: CACurrentMediaTime())
if videoOutput.hasNewPixelBuffer(forItemTime: itemTime) {
if let pixelBuffer = videoOutput.copyPixelBuffer(forItemTime: itemTime, itemTimeForDisplay: nil) {
}
}
}
But for some reason CVPixelBufferGetHeight(pixelBuffer) and CVPixelBufferGetWidth(pixelBuffer) always return 1280x720 when the video was taken with the iPhone's camera: whether landscape or portrait, it is always height=1280, width=720, even if the video is 4K. If I load a square video from Instagram, or any other video downloaded from the internet (not created directly with the Camera app), the width and height are reported correctly when the resolution is less than 720p. But with a different resolution, e.g. 1008x1792, CVPixelBufferGetHeight(pixelBuffer) returns 1280.
For videos taken with the camera, I always get a lower resolution. I tried the 4K and 1080p settings (you can change that in iOS Settings > Camera), and even at 1080p I still get 1280x720 pixel buffers.
I figured out that the UIImagePickerController I was using transcodes the video selected from the library to a Medium setting by default; in this case that was 1280x720.
I ended up changing these properties of the picker:
picker.videoQuality = .typeHigh
picker.videoExportPreset = AVAssetExportPresetHighestQuality
The property that actually makes the difference is videoExportPreset. I don't know what the other one does, even though the documentation says it applies both when you record a video and when you pick one.

GPUImageRawDataInput with Camera Feed

There could be several things wrong with my implementation, but I feel like it’s close.
I'm trying to record the camera feed using GPUImage, as well as set a dynamic overlay that updates 30 (or 60) times per second onto the video while it's recording. I don't want this to be done after the video has been recorded.
I have a pixel buffer that is updated 30 times a second in this case, and I'm creating a GPUImageRawDataInput object from the base address (UnsafeMutablePointer<GLubyte>). With the GPUImageRawDataInput object, I'm setting its target to the 'filter' variable, which is just a GPUImageFilter(). I'm not sure if this is the correct way to set it up.
Currently the video it’s recording is just the camera feed, there’s no overlay.
func setupRecording() {
movieWriter = GPUImageMovieWriter(movieURL: fileURL(), size: self.view.frame.size)
movieWriter?.encodingLiveVideo = true
videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1920x1080.rawValue, cameraPosition: .back)
videoCamera?.outputImageOrientation = .portrait
videoCamera?.horizontallyMirrorFrontFacingCamera = true
videoCamera?.horizontallyMirrorRearFacingCamera = false
let userCameraView = gpuImageView
userCameraView?.fillMode = kGPUImageFillModePreserveAspectRatioAndFill;
//filter's declaration up top - let filter = GPUImageFilter()
videoCamera?.addTarget(filter)
videoCamera?.audioEncodingTarget = movieWriter;
filter.addTarget(userCameraView)
filter.addTarget(movieWriter)
videoCamera?.startCapture()
}
func shouldUpdateRawInput(_ data: UnsafeMutablePointer<GLubyte>!) {//updated 30x per second
if let rawDataInput = rawDataInput {
rawDataInput.updateData(fromBytes: data, size: self.view.frame.size)
rawDataInput.processData()
} else {
//first time creating it
rawDataInput = GPUImageRawDataInput(bytes: data, size: self.view.frame.size, pixelFormat: GPUPixelFormatBGRA)
rawDataInput?.processData()
rawDataInput?.addTarget(filter)
}
}
//----------------------------------------
//this is my conversion of the pixel buffer to the GLubyte in another file
CVPixelBufferLockBaseAddress(pixelBuf, 0);
GLubyte* rawDataBytes=(GLubyte*)CVPixelBufferGetBaseAddress(pixelBuf);
[_delegate shouldUpdateRawInput:rawDataBytes];

Ways to do inter-frame video compression in AVFoundation

I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process is functioning correctly, but creates unnecessarily large files given that any photographs included in the video repeat for 100 to 150 frames unchanged. I've included whatever compression I can find in AVFoundation, which mostly applies intra-frame techniques and tried to find more information on inter-frame compression in AVFoundation. Unfortunately, there are only a few references that I've been able to find and nothing that has let me get it to work.
I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I've not included the code for fetching and preparing the individual frames (called below as self.getFrame()) since that seems to be working fine and gets quite complex since it handles photos, videos, adding title frames, and doing fade transitions. For repeated frames, it returns a structure with the frame image and a counter for the number of output frames to include.
// Create a new AVAssetWriter Instance that will build the video
assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
guard assetWriter != nil else
{
print("Error converting images to video: AVAssetWriter not created.")
inProcess = false
return
}
let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!
let sourceBufferAttributes : [String : AnyObject] = [
kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
AVVideoCompressionPropertiesKey as String : [
AVVideoAverageBitRateKey: 725000,
AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
] as AnyObject
]
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)
// Start the writing session
assetWriter!.startWriting()
assetWriter!.startSession(atSourceTime: kCMTimeZero)
if (pixelBufferAdaptor.pixelBufferPool == nil) {
print("Error converting images to video: pixelBufferPool nil after starting session")
inProcess = false
return
}
// -- Create queue for <requestMediaDataWhenReadyOnQueue>
let mediaQueue = DispatchQueue(label: "mediaInputQueue")
// Initialize run time values
var presentationTime = kCMTimeZero
var done = false
var nextFrame: FramePack? // The FramePack struct has the frame to output, noDisplays - the number of times that it will be output
// and an isLast flag that is true when it's the final frame
writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in // Keeps invoking the block to get input until call markAsFinished
nextFrame = self.getFrame() // Get the next frame to be added to the output with its associated values
let imageCGOut = nextFrame!.frame // The frame to output
if nextFrame!.isLast { done = true } // Identifies the last frame so can drop through to markAsFinished() below
var frames = 0 // Counts how often we've output this frame
var waitCount = 0 // Used to avoid an infinite loop if there's trouble with writer.Input
while (frames < nextFrame!.noDisplays) && (waitCount < 1000000) // Need to wait for writerInput to be ready - count deals with potential hung writer
{
waitCount += 1
if waitCount == 1000000 // Have seen it go into 100s of thousands and succeed
{
print("Exceeded waitCount limit while attempting to output slideshow frame.")
self.inProcess = false
return
}
if (writerInput.isReadyForMoreMediaData)
{
waitCount = 0
frames += 1
autoreleasepool
{
if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool
{
let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
kCFAllocatorDefault,
pixelBufferPool,
pixelBufferPointer
)
if let pixelBuffer = pixelBufferPointer.pointee, status == 0
{
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
// Set up a context for rendering using the PixelBuffer allocated above as the target
let context = CGContext(
data: pixelData,
width: Int(self.videoWidth),
height: Int(self.videoHeight),
bitsPerComponent: 8,
bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
space: rgbColorSpace,
bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
)
// Draw the image into the PixelBuffer used for the context
context?.draw(imageCGOut, in: CGRect(x: 0.0,y: 0.0,width: 1280, height: 720))
// Append the image (frame) from the context pixelBuffer onto the video file
_ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
presentationTime = presentationTime + CMTimeMake(1, videoFPS)
// We're done with the PixelBuffer, so unlock it
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
}
pixelBufferPointer.deinitialize()
pixelBufferPointer.deallocate(capacity: 1)
} else {
NSLog("Error: Failed to allocate pixel buffer from pool")
}
}
}
}
Thanks in advance for any suggestions.
It looks like you're appending a bunch of redundant frames to your video, and labouring under a misapprehension: that video files must have a constant, high framerate, e.g. 30 fps.
If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 images, with presentation timestamps of 0s, 5s and 10s and an assetWriter.endSession(atSourceTime:) of 15s, not 15s * 30 FPS = 450 frames.
In other words, your frame rate is way too high - for the best interframe compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.
*I've seen some video services/players choke on unusually low framerates, so you may need a minimum framerate and some redundant frames, e.g. 1 frame/5s. YMMV.
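The timestamp arithmetic from the example above can be sketched as follows. `Slide` and `presentationTimes` are hypothetical names, and times are plain seconds here. In the real writer code each start time would be converted to a CMTime and passed to the pixel buffer adaptor's append(_:withPresentationTime:), with the end time going to endSession(atSourceTime:).

```swift
// Hypothetical model of a slideshow entry: one image shown for `duration` seconds.
struct Slide {
    let name: String
    let duration: Double
}

// Returns one presentation timestamp per slide plus the session end time,
// so each image is written exactly once instead of once per frame.
func presentationTimes(for slides: [Slide]) -> (starts: [Double], end: Double) {
    var starts: [Double] = []
    var clock = 0.0
    for slide in slides {
        starts.append(clock)
        clock += slide.duration
    }
    return (starts, clock)
}

let slides = [
    Slide(name: "a.jpg", duration: 5),
    Slide(name: "b.jpg", duration: 5),
    Slide(name: "c.jpg", duration: 5),
]
let schedule = presentationTimes(for: slides)
// schedule.starts == [0.0, 5.0, 10.0], schedule.end == 15.0

// Frame counts for comparison: one per slide versus a constant 30 fps.
let framesWritten = schedule.starts.count
let framesAtConstantRate = Int(schedule.end * 30)
```

If a player needs a minimum framerate, the same function can be fed shorter, repeated slides (for example the same image several times) to add the occasional redundant frame.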

Resources