There could be several things wrong with my implementation, but I feel like it’s close.
I'm trying to record the camera feed using GPUImage, as well as set a dynamic overlay that updates 30 (or 60) times per second onto the video while it's recording. I don't want this to be done after the video has been recorded.
I have a pixel buffer that is updated 30 times a second in this case, and I'm creating a GPUImageRawDataInput object from the base address (UnsafeMutablePointer<GLubyte>). With the GPUImageRawDataInput object, I'm setting its target to the 'filter' variable, which is just a GPUImageFilter(). I'm not sure if this is the correct way to set it up.
Currently the recorded video contains only the camera feed; there's no overlay.
func setupRecording() {
movieWriter = GPUImageMovieWriter(movieURL: fileURL(), size: self.view.frame.size)
movieWriter?.encodingLiveVideo = true
videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1920x1080.rawValue, cameraPosition: .back)
videoCamera?.outputImageOrientation = .portrait
videoCamera?.horizontallyMirrorFrontFacingCamera = true
videoCamera?.horizontallyMirrorRearFacingCamera = false
let userCameraView = gpuImageView
userCameraView?.fillMode = kGPUImageFillModePreserveAspectRatioAndFill;
//filter's declaration up top - let filter = GPUImageFilter()
videoCamera?.audioEncodingTarget = movieWriter;
func shouldUpdateRawInput(_ data: UnsafeMutablePointer<GLubyte>!) {//updated 30x per second
if let rawDataInput = rawDataInput {
rawDataInput.updateData(fromBytes: data, size: self.view.frame.size)
} else {
//first time creating it
rawDataInput = GPUImageRawDataInput(bytes: data, size: self.view.frame.size, pixelFormat: GPUPixelFormatBGRA)
//this is my conversion of the pixel buffer to the GLubyte in another file
CVPixelBufferLockBaseAddress(pixelBuf, 0);
GLubyte* rawDataBytes=(GLubyte*)CVPixelBufferGetBaseAddress(pixelBuf);
[_delegate shouldUpdateRawInput:rawDataBytes];


Export collage video using metalkit

How can I export a collage video built from videos of different resolutions? I'm trying to achieve something like the first image below. I started from the AVCustomEdit demo: I create an AVMutableVideoComposition, pass all the video track IDs to customVideoCompositorClass, get each video's CVPixelBuffer, convert it to an MTLTexture, and then render all the textures. The problem is that my video output (destinationTexture) is square while the videos are portrait or landscape, so every video is squeezed. How can I rotate, scale, position, and mask each video with a shape? And how can I apply CIFilters: should I convert every CVPixelBuffer to a CIImage and each CIImage back to a CVPixelBuffer?
override func renderPixelBuffer(backgroundTexture: MTLTexture,
firstPixelBuffer: CVPixelBuffer,
secondPixelBuffer: CVPixelBuffer,
thirdPixelBuffer: CVPixelBuffer,
fourthPixelBuffer: CVPixelBuffer,
destinationPixelBuffer: CVPixelBuffer) {
// Create a MTLTexture from the CVPixelBuffer.
guard let firstTexture = buildTextureForPixelBuffer(firstPixelBuffer) else { return }
guard let secondTexture = buildTextureForPixelBuffer(secondPixelBuffer) else { return }
guard let thirdTexture = buildTextureForPixelBuffer(thirdPixelBuffer) else { return }
guard let fourthTexture = buildTextureForPixelBuffer(fourthPixelBuffer) else { return }
guard let destinationTexture = buildTextureForPixelBuffer(destinationPixelBuffer) else { return }
// We must maintain a reference to the pixel buffer until the Metal rendering is complete. This is because the
// 'buildTextureForPixelBuffer' function above uses CVMetalTextureCacheCreateTextureFromImage to create a
// Metal texture (CVMetalTexture) from the IOSurface that backs the CVPixelBuffer, but
// CVMetalTextureCacheCreateTextureFromImage doesn't increment the use count of the IOSurface; only the
// CVPixelBuffer, and the CVMTLTexture own this IOSurface. Therefore we must maintain a reference to either
// the pixel buffer or Metal texture until the Metal rendering is done. The MTLCommandBuffer completion
// handler below is then used to release these references.
pixelBuffers = RenderPixelBuffers(firstBuffer: firstPixelBuffer,
secondBuffer: secondPixelBuffer,
thirdBuffer: thirdPixelBuffer,
fourthBuffer: fourthPixelBuffer,
destinationBuffer: destinationPixelBuffer)
// Create a new command buffer for each renderpass to the current drawable.
let commandBuffer = commandQueue.makeCommandBuffer()!
commandBuffer.label = "MyCommand"
// Obtain a drawable texture for this render pass and set up the renderpass
// descriptor for the command encoder to render into.
let renderPassDescriptor = setupRenderPassDescriptorForTexture(destinationTexture)
// Create a render command encoder so we can render into something.
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
renderEncoder.label = "MyRenderEncoder"
guard let renderPipelineState = renderPipelineState else { return }
modelConstants.modelViewMatrix = matrix_identity_float4x4
// Render background texture.
renderTexture(renderEncoder, texture: backgroundTexture, pipelineState: renderPipelineState)
var translationMatrix = matrix_float4x4(translation: simd_float3(-0.5, 0.5, 0))
// var rotationMatrix = matrix_float4x4(rotationZ: radians(fromDegrees: -90))
var scaleMatrix = matrix_float4x4(scaling: 0.25)
var modelMatrix = translationMatrix * scaleMatrix
modelConstants.modelViewMatrix = modelMatrix
// Render first texture.
renderTexture(renderEncoder, texture: firstTexture, pipelineState: renderPipelineState)
// translationMatrix = matrix_float4x4(translation: simd_float3(0.5, -0.5, 0))
// rotationMatrix = matrix_float4x4(rotationZ: radians(fromDegrees: -45))
// scaleMatrix = matrix_float4x4(scaling: 0.5)
// modelMatrix = translationMatrix * scaleMatrix * rotationMatrix
// modelConstants.modelViewMatrix = modelMatrix
// // Render second texture.
// renderTexture(renderEncoder, texture: secondTexture, pipelineState: renderPipelineState)
// // Render third texture.
// renderTexture(renderEncoder, texture: thirdTexture, pipelineState: renderPipelineState)
// // Render fourth texture.
// renderTexture(renderEncoder, texture: fourthTexture, pipelineState: renderPipelineState)
// We're done encoding commands.
// Use the command buffer completion block to release the reference to the pixel buffers.
commandBuffer.addCompletedHandler({ _ in
self.pixelBuffers = nil // Release the reference to the pixel buffers.
// Finalize rendering here & push the command buffer to the GPU.
I would recommend using a library called MetalPetal, an image processing framework based on Metal. You convert each CVPixelBuffer into an MTIImage (MetalPetal's image type). You can then do whatever you need with the image: apply the premade filters, a CIFilter, or your own custom filters, and transform, rotate, and crop every frame so that the collage frames line up accurately. Finally, you convert the MTIImage back to a CVPixelBuffer. You could also use CIImage here, but I'd guess it will be slower. As for the boxed (squeezed) images, the render size may be the cause, so please check the render size.
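On the squeezing specifically: whatever renders the frames, each video needs a scale that preserves its own aspect ratio inside the square output. Below is a minimal sketch of that arithmetic, assuming the quad would otherwise fill its slot. `QuadScale` and `aspectFitScale` are hypothetical names for illustration, not part of MetalPetal or of the question's code.

```swift
import Foundation

/// Per-axis scale factors, relative to a quad that would fill its whole slot.
/// (Hypothetical helper type for this sketch.)
struct QuadScale: Equatable {
    var x: Double
    var y: Double
}

/// Aspect-fit: shrink one axis so the source keeps its aspect ratio inside the slot.
func aspectFitScale(sourceWidth: Double, sourceHeight: Double,
                    slotWidth: Double, slotHeight: Double) -> QuadScale {
    let sourceAspect = sourceWidth / sourceHeight
    let slotAspect = slotWidth / slotHeight
    if sourceAspect > slotAspect {
        // Source is wider than the slot: keep full width, reduce height.
        return QuadScale(x: 1.0, y: slotAspect / sourceAspect)
    } else {
        // Source is taller than (or matches) the slot: keep full height, reduce width.
        return QuadScale(x: sourceAspect / slotAspect, y: 1.0)
    }
}

// A 1080x1920 portrait video in a square slot keeps its full height
// and uses 1080/1920 of the slot's width.
let portraitScale = aspectFitScale(sourceWidth: 1080, sourceHeight: 1920,
                                   slotWidth: 1, slotHeight: 1)
print(portraitScale.x, portraitScale.y)
```

The two factors would be applied as a non-uniform scale on top of each video's own scale and translation. The question's `matrix_float4x4(scaling:)` helper appears to be uniform, so a per-axis variant is assumed here.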

Ways to do inter-frame video compression in AVFoundation

I've created a process to generate video "slideshows" from collections of photographs and images in an application that I'm building. The process is functioning correctly, but creates unnecessarily large files given that any photographs included in the video repeat for 100 to 150 frames unchanged. I've applied whatever compression I could find in AVFoundation, which mostly uses intra-frame techniques, and I've tried to find more information on inter-frame compression in AVFoundation. Unfortunately, I've been able to find only a few references, and nothing that has let me get it to work.
I'm hoping that someone can steer me in the right direction. The code for the video generator is included below. I've not included the code for fetching and preparing the individual frames (called below as self.getFrame()) since that seems to be working fine and gets quite complex since it handles photos, videos, adding title frames, and doing fade transitions. For repeated frames, it returns a structure with the frame image and a counter for the number of output frames to include.
// Create a new AVAssetWriter Instance that will build the video
assetWriter = createAssetWriter(path: filePathNew, size: videoSize!)
guard assetWriter != nil else
print("Error converting images to video: AVAssetWriter not created.")
inProcess = false
let writerInput = assetWriter!.inputs.filter{ $0.mediaType == AVMediaTypeVideo }.first!
let sourceBufferAttributes : [String : AnyObject] = [
kCVPixelBufferPixelFormatTypeKey as String : Int(kCVPixelFormatType_32ARGB) as AnyObject,
kCVPixelBufferWidthKey as String : videoSize!.width as AnyObject,
kCVPixelBufferHeightKey as String : videoSize!.height as AnyObject,
AVVideoMaxKeyFrameIntervalKey as String : 50 as AnyObject,
AVVideoCompressionPropertiesKey as String : [
AVVideoAverageBitRateKey: 725000,
AVVideoProfileLevelKey: AVVideoProfileLevelH264Baseline30,
] as AnyObject
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(assetWriterInput: writerInput, sourcePixelBufferAttributes: sourceBufferAttributes)
// Start the writing session
assetWriter!.startSession(atSourceTime: kCMTimeZero)
if (pixelBufferAdaptor.pixelBufferPool == nil) {
print("Error converting images to video: pixelBufferPool nil after starting session")
inProcess = false
// -- Create queue for <requestMediaDataWhenReadyOnQueue>
let mediaQueue = DispatchQueue(label: "mediaInputQueue")
// Initialize run time values
var presentationTime = kCMTimeZero
var done = false
var nextFrame: FramePack? // The FramePack struct has the frame to output, noDisplays - the number of times that it will be output
// and an isLast flag that is true when it's the final frame
writerInput.requestMediaDataWhenReady(on: mediaQueue, using: { () -> Void in // Keeps invoking the block to get input until call markAsFinished
nextFrame = self.getFrame() // Get the next frame to be added to the output with its associated values
let imageCGOut = nextFrame!.frame // The frame to output
if nextFrame!.isLast { done = true } // Identifies the last frame so can drop through to markAsFinished() below
var frames = 0 // Counts how often we've output this frame
var waitCount = 0 // Used to avoid an infinite loop if there's trouble with writer.Input
while (frames < nextFrame!.noDisplays) && (waitCount < 1000000) // Need to wait for writerInput to be ready - count deals with potential hung writer
waitCount += 1
if waitCount == 1000000 // Have seen it go into 100s of thousands and succeed
print("Exceeded waitCount limit while attempting to output slideshow frame.")
self.inProcess = false
if (writerInput.isReadyForMoreMediaData)
waitCount = 0
frames += 1
if let pixelBufferPool = pixelBufferAdaptor.pixelBufferPool
let pixelBufferPointer = UnsafeMutablePointer<CVPixelBuffer?>.allocate(capacity: 1)
let status: CVReturn = CVPixelBufferPoolCreatePixelBuffer(
if let pixelBuffer = pixelBufferPointer.pointee, status == 0
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
let pixelData = CVPixelBufferGetBaseAddress(pixelBuffer)
let rgbColorSpace = CGColorSpaceCreateDeviceRGB()
// Set up a context for rendering using the PixelBuffer allocated above as the target
let context = CGContext(
data: pixelData,
width: Int(self.videoWidth),
height: Int(self.videoHeight),
bitsPerComponent: 8,
bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
space: rgbColorSpace,
bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
// Draw the image into the PixelBuffer used for the context
context?.draw(imageCGOut, in: CGRect(x: 0.0,y: 0.0,width: 1280, height: 720))
// Append the image (frame) from the context pixelBuffer onto the video file
_ = pixelBufferAdaptor.append(pixelBuffer, withPresentationTime: presentationTime)
presentationTime = presentationTime + CMTimeMake(1, videoFPS)
// We're done with the PixelBuffer, so unlock it
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: CVOptionFlags(0)))
pixelBufferPointer.deallocate(capacity: 1)
} else {
NSLog("Error: Failed to allocate pixel buffer from pool")
Thanks in advance for any suggestions.
It looks like you're appending a bunch of redundant frames to your video, labouring under a misapprehension: that video files must have a constant framerate that is high, e.g. 30fps.
If, for example, you're showing a slideshow of 3 images over a duration of 15 seconds, then you need only output 3 images, with presentation timestamps of 0s, 5s and 10s and an assetWriter.endSession(atSourceTime:) of 15s, not 15s * 30 FPS = 450 frames.
In other words, your frame rate is way too high - for the best interframe compression money can buy, lower your frame rate to the bare minimum number of frames you need and all will be well*.
*I've seen some video services/players choke on unusually low framerates, so you may need a minimum framerate and some redundant frames, e.g. 1 frame/5s. YMMV.
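The timestamp arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions: `Slide` and `presentationSchedule` are hypothetical names, and times are plain seconds rather than CMTime values.

```swift
import Foundation

/// One still image and how long it stays on screen, in seconds.
/// (Hypothetical type for this sketch.)
struct Slide {
    let name: String
    let duration: Double
}

/// Returns the start time of each slide plus the total duration of the show.
func presentationSchedule(for slides: [Slide]) -> (startTimes: [Double], endTime: Double) {
    var startTimes: [Double] = []
    var clock = 0.0
    for slide in slides {
        startTimes.append(clock)   // this frame is shown from `clock` onward
        clock += slide.duration
    }
    return (startTimes, clock)
}

let slides = [Slide(name: "a", duration: 5),
              Slide(name: "b", duration: 5),
              Slide(name: "c", duration: 5)]
let schedule = presentationSchedule(for: slides)
// Three frames starting at 0s, 5s and 10s, with the session ending at 15s.
print(schedule.startTimes, schedule.endTime)
```

In the writer loop, each start time would be converted to a CMTime and passed to `pixelBufferAdaptor.append(_:withPresentationTime:)` once per image, and the end time would go to `assetWriter.endSession(atSourceTime:)` before finishing the file.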

iOS -- How to change video resolution in webRTC?

I am trying to change the local video resolution in WebRTC. I used the following method to create the local video track:
-(RTCVideoTrack *)createLocalVideoTrack {
RTCVideoTrack *localVideoTrack = nil;
RTCMediaConstraints *mediaConstraints = [[RTCMediaConstraints alloc] initWithMandatoryConstraints:nil optionalConstraints:nil];
RTCAVFoundationVideoSource *source =
[self.factory avFoundationVideoSourceWithConstraints:mediaConstraints];
localVideoTrack =
[self.factory videoTrackWithSource:source
return localVideoTrack;
I set the mandatory constraint as follows, but it doesn't work:
Could anyone help me?
The latest SDK builds no longer provide a factory method to build a capturer with constraints. The solution should be based on AVCaptureSession instead, and WebRTC will take care of CPU and bandwidth utilization.
For this you need to keep a reference to the RTCVideoSource that was passed to the capturer. It has this method:
- (void)adaptOutputFormatToWidth:(int)width height:(int)height fps:(int)fps;
Calling this function will cause frames to be scaled down to the requested resolution. Also, frames will be cropped to match the requested aspect ratio, and frames will be dropped to match the requested fps. The requested aspect ratio is orientation agnostic and will be adjusted to maintain the input orientation, so it doesn't matter if e.g. 1280x720 or 720x1280 is requested.
var localVideoSource: RTCVideoSource?
You may create your video track this way:
func createVideoTrack() -> RTCVideoTrack? {
    var source: RTCVideoSource
    if let localSource = self.localVideoSource {
        source = localSource
    } else {
        source = self.factory.videoSource()
        self.localVideoSource = source
    }
    let devices = RTCCameraVideoCapturer.captureDevices()
    if let camera = devices.first,
        // here you can decide to use front or back camera
        let format = RTCCameraVideoCapturer.supportedFormats(for: camera).last,
        // here you have a bunch of formats from tiny up to 4K; find the first that fits your needs, e.g. if you use at most 1280x720, there is no need to pick 4K
        let fps = format.videoSupportedFrameRateRanges.first?.maxFrameRate {
        // or take something in between min...max, e.g. 24 fps and not 30, to reduce GPU/CPU use
        let intFps = Int(fps)
        let capturer = RTCCameraVideoCapturer(delegate: source)
        capturer.startCapture(with: camera, format: format, fps: intFps)
        let videoTrack = self.factory.videoTrack(with: source, trackId: WebRTCClient.trackIdVideo)
        return videoTrack
    }
    return nil
}
And when you need to change resolution, you can tell this video source to do "scaling".
func changeResolution(w: Int32, h: Int32) -> Bool {
    guard let videoSource = self.localVideoSource else {
        return false
    }
    // TODO: decide fps
    videoSource.adaptOutputFormat(toWidth: w, height: h, fps: 30)
    return true
}
The camera will still capture frames at the resolution provided in the format passed to startCapture. If you care about resource utilization, you can also use the following methods prior to adaptOutputFormat.
// Stops the capture session asynchronously and notifies callback on completion.
- (void)stopCaptureWithCompletionHandler:(nullable void (^)(void))completionHandler;
// Starts the capture session asynchronously.
- (void)startCaptureWithDevice:(AVCaptureDevice *)device format:(AVCaptureDeviceFormat *)format fps:(NSInteger)fps;
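The comment in createVideoTrack about finding the first format that fits your needs can be sketched as a small selection function. This is a minimal sketch, not WebRTC API: `FormatSize` and `pickFormat` are hypothetical names, and plain integer sizes stand in for capture formats.

```swift
import Foundation

/// Width and height of a capture format in pixels.
/// (Hypothetical stand-in for a real capture format.)
struct FormatSize: Equatable {
    let width: Int
    let height: Int
}

/// Picks the smallest format that still covers the target size,
/// falling back to the largest available one if none is big enough.
func pickFormat(from sizes: [FormatSize], targetWidth: Int, targetHeight: Int) -> FormatSize? {
    let byArea = sizes.sorted { $0.width * $0.height < $1.width * $1.height }
    let bigEnough = byArea.first { $0.width >= targetWidth && $0.height >= targetHeight }
    return bigEnough ?? byArea.last
}

let available = [FormatSize(width: 3840, height: 2160),
                 FormatSize(width: 640, height: 480),
                 FormatSize(width: 1920, height: 1080),
                 FormatSize(width: 1280, height: 720)]
if let chosen = pickFormat(from: available, targetWidth: 1280, targetHeight: 720) {
    print(chosen.width, chosen.height)
}
```

In a real app the sizes would presumably come from calling `CMVideoFormatDescriptionGetDimensions(format.formatDescription)` on each entry returned by `RTCCameraVideoCapturer.supportedFormats(for:)`, and the chosen format would be the one passed to startCapture.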

GPUImage crop to CGRect and rotate

Given a CGRect, I want to use GPUImage to crop a video. For example, if the rect is (0, 0, 50, 50), the video would be cropped at (0,0) with a length of 50 on each side.
What's throwing me is that GPUImageCropFilter doesn't take a rectangle, but rather a normalized crop region with values ranging from 0 to 1. My intuition was to do this:
let assetSize = CGSizeApplyAffineTransform(videoTrack.naturalSize, videoTrack.preferredTransform)
let cropRect = CGRect(x: frame.minX/assetSize.width,
y: frame.minY/assetSize.height,
width: frame.width/assetSize.width,
height: frame.height/assetSize.height)
to calculate the crop region based on the size of the incoming asset. Then:
// Filter
let cropFilter = GPUImageCropFilter(cropRegion: cropRect)
let url = NSURL(fileURLWithPath: "\(NSTemporaryDirectory())\(String.random()).mp4")
let movieWriter = GPUImageMovieWriter(movieURL: url, size: assetSize)
movieWriter.encodingLiveVideo = false
movieWriter.shouldPassthroughAudio = false
// add targets
cropFilter.setInputRotation(kGPUImageRotateRight, atIndex: 0)
What should the movie writer size be? Shouldn't it be the size of the frame I want to crop with? And should I be using forceProcessingAtSize with the size value of my crop frame?
A complete code example would be great; I've been trying for hours and I can't seem to get the section of the video that I want.
if let videoTrack = self.asset.tracks.first {
let movieFile = GPUImageMovie(asset: self.asset)
let transformedRegion = CGRectApplyAffineTransform(region, videoTrack.preferredTransform)
// Filters
let cropFilter = GPUImageCropFilter(cropRegion: transformedRegion)
let url = NSURL(fileURLWithPath: "\(NSTemporaryDirectory())\(String.random()).mp4")
let renderSize = CGSizeApplyAffineTransform(videoTrack.naturalSize, CGAffineTransformMakeScale(transformedRegion.width, transformedRegion.height))
let movieWriter = GPUImageMovieWriter(movieURL: url, size: renderSize)
movieWriter.transform = videoTrack.preferredTransform
movieWriter.encodingLiveVideo = false
movieWriter.shouldPassthroughAudio = false
// add targets
movieWriter.completionBlock = {
movieWriter.failureBlock = { _ in
disposable.addDisposable {
As you note, the GPUImageCropFilter takes in a rectangle in normalized coordinates. You're on the right track, in that you just need to convert your CGRect in pixels to normalized coordinates by dividing the X components (origin.x and size.width) by the width of the image and the Y components by the height.
You don't need to use forceProcessingAtSize(), because the crop will automatically output an image of the appropriate cropped size. The movie writer's size should be matched to this cropped size, which you should know from your original CGRect.
The one complication you introduce is the rotation. If you need to apply a rotation in addition to your crop, you might want to check and make sure that you don't need to swap your X and Y for your crop region. This should be apparent in the output if the two need to be swapped.
There were some bugs with applying rotation at the same time as a crop a while ago, and I can't remember if I fixed all those. If I didn't, you could insert a dummy filter (gamma or brightness set to default values) before or after the crop and apply the rotation at that stage.
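The pixel-to-normalized conversion described in the answer above amounts to four divisions. Here is a minimal sketch; `normalizedCropRegion` is a hypothetical helper name, and the 200x100 image size is made up for the example.

```swift
import Foundation

/// Converts a crop rectangle in pixels into the 0...1 normalized region
/// by dividing X components by the image width and Y components by the height.
func normalizedCropRegion(for pixelRect: CGRect, in imageSize: CGSize) -> CGRect {
    return CGRect(x: pixelRect.origin.x / imageSize.width,
                  y: pixelRect.origin.y / imageSize.height,
                  width: pixelRect.size.width / imageSize.width,
                  height: pixelRect.size.height / imageSize.height)
}

// A 50x50 pixel crop at the origin of a 200x100 image covers a quarter
// of the width and half of the height.
let pixelRect = CGRect(x: 0, y: 0, width: 50, height: 50)
let region = normalizedCropRegion(for: pixelRect, in: CGSize(width: 200, height: 100))
print(region)
```

The result would be passed to `GPUImageCropFilter(cropRegion:)`, and the movie writer's size would be `pixelRect.size`, the cropped size in pixels. If a rotation is applied as well, the image width and height used here may need to be swapped, as noted above.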

ios - imageFromCurrentFramebuffer from gpuimage lib saves black frame

I'm applying a list of filters to detect a shape during the camera capture process. Once the shape is detected, I want to save it to Photos for review. I googled and found imageFromCurrentFramebuffer, but it always saves a black picture.
// camera init
var videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSessionPreset1920x1080, cameraPosition: .Back)
videoCamera!.outputImageOrientation = .Portrait;
// filter init
var houghTransformFilter = GPUImageHoughTransformLineDetector()
houghTransformFilter!.lineDetectionThreshold = 0.3
houghTransformFilter!.useNextFrameForImageCapture() //without this crashes
houghTransformFilter!.linesDetectedBlock = {
// my custom shape detection logic
if (found) {
var capturedImage:UIImage? = self.houghTransformFilter!.imageFromCurrentFramebuffer()
UIImageWriteToSavedPhotosAlbum(capturedImage, nil, nil, nil);
I had a similar problem. You probably need to add this line before the call to grab the frame:
