Problems (wrong scale, aspect ratio) while rendering MTLTexture to MTKView after applying compute shaders - ios

I am trying to process video frames from the camera using Metal compute shaders and display them to the user. The problem is with displaying the modified frames: the output contains stacked copies of the processed frame, some of them clipped, and they don't fill the screen completely.
P.S. I am new to both iOS and Metal.
So far, I have identified the variables that control this:
1. Number of Thread groups launched
2. MTKView's drawable size
3. sampling id in the metal shader
I have played around with these with no good result.
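To make the first variable concrete, here is a minimal sketch of how the threadgroup count relates to the texture size. The `GridSize` type and the `threadgroupCount` helper are hypothetical names used only for illustration; the 1920x1080 frame and the 16x16 group size are the values from the code in this question. It only shows the arithmetic and is not a confirmed fix for the stacked output.

```swift
// Hypothetical helper types, not part of the original post.
struct GridSize: Equatable {
    var width: Int
    var height: Int
}

/// Number of threadgroups needed to cover a texture, rounding up so that
/// partial groups at the right and bottom edges are still launched.
func threadgroupCount(textureWidth: Int, textureHeight: Int,
                      groupWidth: Int, groupHeight: Int) -> GridSize {
    precondition(groupWidth > 0 && groupHeight > 0, "threadgroup dimensions must be positive")
    return GridSize(width: (textureWidth + groupWidth - 1) / groupWidth,
                    height: (textureHeight + groupHeight - 1) / groupHeight)
}

// 1920x1080 frame with 16x16 threads per group.
// Plain integer division drops the remainder: 67 groups cover only 1072 rows.
let truncated = GridSize(width: 1920 / 16, height: 1080 / 16)
// Rounding up launches 68 groups, which cover 1088 rows.
let rounded = threadgroupCount(textureWidth: 1920, textureHeight: 1080,
                               groupWidth: 16, groupHeight: 16)
print(truncated, rounded)
```

When the count is rounded up, some threads fall outside the texture, so the kernel would need to compare `id` against the texture width and height and return early for those threads.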
Below are the code and my output.
The function that sets up the MTKView:
func initMetalView() {
    metalView = MTKView(frame: view.frame, device: metalDevice)
    metalView.delegate = self
    metalView.framebufferOnly = false
    metalView.colorPixelFormat = .bgra8Unorm
    metalView.autoResizeDrawable = false
    metalView.drawableSize = CGSize(width: 1920, height: 1080)
    metalView.layer.transform = CATransform3DMakeRotation(CGFloat(Float.pi), 0.0, 1.0, 0.0)
    view.insertSubview(metalView, at: 0)
}
The AVCaptureVideoDataOutputSampleBufferDelegate used to convert CMSampleBuffer to MTLTexture
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
// sample buffer -> image buffer -> CoreVideo metal texture -> MTL texture
guard let cvImageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)
else { fatalError("can't get image buffer") }
var textureCache: CVMetalTextureCache?
guard CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, metalDevice, nil, &textureCache) == kCVReturnSuccess else { fatalError("cant create texture cache") }
let width = CVPixelBufferGetWidth(cvImageBuffer)
let height = CVPixelBufferGetHeight(cvImageBuffer)
var imageTexture: CVMetalTexture?
let result = CVMetalTextureCacheCreateTextureFromImage(kCFAllocatorDefault, textureCache!, cvImageBuffer, nil, MTLPixelFormat.bgra8Unorm, width, height, 0, &imageTexture)
guard let unwrappedImageTexture = imageTexture,
result == kCVReturnSuccess
else { fatalError("failed to create texture from image") }
inputTexture = CVMetalTextureGetTexture(unwrappedImageTexture)
The MTKViewDelegate used to apply the shader on inputTexture and display the outputTexture to metalView
extension ViewController: MTKViewDelegate {
    func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
    }

    func draw(in view: MTKView) {
        guard let inputTexture = inputTexture,
              let commandQueue = commandQueue,
              let commandBuffer = commandQueue.makeCommandBuffer(),
              let encoder = commandBuffer.makeComputeCommandEncoder(),
              let pipelineState = pipelineState
        else { return }
        encoder.setComputePipelineState(pipelineState)
        encoder.setTextures([metalView.currentDrawable!.texture, inputTexture], range: 0..<2)
        encoder.dispatchThreadgroups(MTLSizeMake(inputTexture.width/16, inputTexture.height/16, 1), threadsPerThreadgroup: threadsPerBlock)
        // inputTexture w:1920, h:1080
        encoder.endEncoding()
        commandBuffer.present(metalView.currentDrawable!)
        commandBuffer.commit()
    }
}
The metal compute shader
#include <metal_stdlib>
using namespace metal;
kernel void blacky (texture2d<float, access::write> outTexture [[texture(0)]],
                    texture2d<float, access::read> inTexture [[texture(1)]],
                    uint2 id [[thread_position_in_grid]]) {
    uint2 flipped_id = uint2(id.y, id.x);
    float3 val = inTexture.read(flipped_id).rgb;
    float g = (val.r + val.g + val.b)/3.0;
    float4 out = float4(g, g, g, 1);
    outTexture.write(out.rgba, id);
}
You can see the current output here:


My custom metal image filter is slow. How can I make it faster?

I've seen a lot of online tutorials that manage to filter an image in around the 0.0X-second mark. Meanwhile, my code here takes 1.09 seconds to filter an image (just to reduce brightness by half).
Edit after first comment: the time was measured with two methods:
1. Date() time interval, taken when the “apply filter” button is tapped and again after the apply filter function is done running
2. Building it on an iPhone and timing it manually with the timer on my watch
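For reference, a small timing helper along the lines of the first method could look like the sketch below. The name `measure` and the placeholder workload are assumptions made for illustration; in the code in this question, the timed span would wrap the whole applyFilter() call, which also includes the UIImage/CGImage/MTLTexture conversions and not only the kernel itself.

```swift
import Foundation
import Dispatch

/// Runs `work` and returns its result together with the elapsed
/// wall-clock time in seconds, based on a monotonic clock.
func measure<T>(_ work: () throws -> T) rethrows -> (result: T, seconds: Double) {
    let start = DispatchTime.now().uptimeNanoseconds
    let result = try work()
    let end = DispatchTime.now().uptimeNanoseconds
    return (result, Double(end - start) / 1_000_000_000)
}

// Placeholder workload standing in for applyFilter() (hypothetical).
let timed = measure { (0..<1_000).reduce(0, +) }
print("result: \(timed.result), elapsed: \(timed.seconds) s")
```

Timing each stage separately (conversion in, GPU work, conversion out) with the same helper would show which part accounts for most of the 1.09 seconds.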
Since I'm new to Metal and kernels, I don't really know the difference between my code and those tutorials that achieve faster results. Which part of my code can be improved, or should use a different approach, to make it a lot faster?
Here's my kernel code:
#include <metal_stdlib>
using namespace metal;
kernel void black(
    texture2d<float, access::write> outTexture [[texture(0)]],
    texture2d<float, access::read> inTexture [[texture(1)]],
    uint2 id [[thread_position_in_grid]]) {
    float3 val = inTexture.read(id).rgb;
    float r = val.r / 4;
    float g = val.g / 4;
    float b = val.b / 2;
    float4 out = float4(r, g, b, 1.0);
    outTexture.write(out.rgba, id);
}
This is my Swift code:
import Metal
import MetalKit
// UIImage -> CGImage -> MTLTexture -> COMPUTE HAPPENS |
// UIImage <- CGImage <- MTLTexture <--
class Filter {
var device: MTLDevice
var defaultLib: MTLLibrary?
var grayscaleShader: MTLFunction?
var commandQueue: MTLCommandQueue?
var commandBuffer: MTLCommandBuffer?
var commandEncoder: MTLComputeCommandEncoder?
var pipelineState: MTLComputePipelineState?
var inputImage: UIImage
var height, width: Int
// most devices have a limit of 512 threads per group
let threadsPerBlock = MTLSize(width: 32, height: 32, depth: 1)
self.device = MTLCreateSystemDefaultDevice()!
//changes: I did do catch try, and use bundle parameter when making make default library
let frameworkBundle = Bundle(for: type(of: self))
self.defaultLib = device.makeDefaultLibrary()
self.grayscaleShader = defaultLib?.makeFunction(name: "black")
self.commandQueue = self.device.makeCommandQueue()
self.commandBuffer = self.commandQueue?.makeCommandBuffer()
self.commandEncoder = self.commandBuffer?.makeComputeCommandEncoder()
if let shader = grayscaleShader {
self.pipelineState = try? self.device.makeComputePipelineState(function: shader)
} else { fatalError("unable to make compute pipeline") }
self.inputImage = UIImage(named: "stockImage")!
self.height = Int(self.inputImage.size.height)
self.width = Int(self.inputImage.size.width)
func getCGImage(from uiimg: UIImage) -> CGImage? {
uiimg.draw(in: CGRect(origin: .zero, size: uiimg.size))
let contextImage = UIGraphicsGetImageFromCurrentImageContext()
return contextImage?.cgImage
func getMTLTexture(from cgimg: CGImage) -> MTLTexture {
    let textureLoader = MTKTextureLoader(device: self.device)
    do {
        let texture = try textureLoader.newTexture(cgImage: cgimg, options: nil)
        let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: texture.pixelFormat, width: width, height: height, mipmapped: false)
        textureDescriptor.usage = [.shaderRead, .shaderWrite]
        return texture
    } catch {
        fatalError("Couldn't convert CGImage to MTLtexture")
    }
}
func getCGImage(from mtlTexture: MTLTexture) -> CGImage? {
var data = Array<UInt8>(repeatElement(0, count: 4*width*height))
mtlTexture.getBytes(&data,
bytesPerRow: 4*width,
from: MTLRegionMake2D(0, 0, width, height),
mipmapLevel: 0)
let bitmapInfo = CGBitmapInfo(rawValue: (CGBitmapInfo.byteOrder32Big.rawValue | CGImageAlphaInfo.premultipliedLast.rawValue))
let colorSpace = CGColorSpaceCreateDeviceRGB()
let context = CGContext(data: &data,
width: width,
height: height,
bitsPerComponent: 8,
bytesPerRow: 4*width,
space: colorSpace,
bitmapInfo: bitmapInfo.rawValue)
return context?.makeImage()
func getUIImage(from cgimg: CGImage) -> UIImage? {
return UIImage(cgImage: cgimg)
func getEmptyMTLTexture() -> MTLTexture? {
let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: MTLPixelFormat.rgba8Unorm,
width: width,
height: height,
mipmapped: false)
textureDescriptor.usage = [.shaderRead, .shaderWrite]
return self.device.makeTexture(descriptor: textureDescriptor)
func getInputMTLTexture() -> MTLTexture? {
if let inputImage = getCGImage(from: self.inputImage) {
return getMTLTexture(from: inputImage)
else { fatalError("Unable to convert Input image to MTLTexture") }
func getBlockDimensions() -> MTLSize {
let blockWidth = width / self.threadsPerBlock.width
let blockHeight = height / self.threadsPerBlock.height
return MTLSizeMake(blockWidth, blockHeight, 1)
func applyFilter() -> UIImage? {
let date = Date()
if let encoder = self.commandEncoder, let buffer = self.commandBuffer,
let outputTexture = getEmptyMTLTexture(), let inputTexture = getInputMTLTexture() {
encoder.setTextures([outputTexture, inputTexture], range: 0..<2)
encoder.dispatchThreadgroups(self.getBlockDimensions(), threadsPerThreadgroup: threadsPerBlock)
guard let outputImage = getCGImage(from: outputTexture) else { fatalError("Couldn't obtain CGImage from MTLTexture") }
let date2 = Date()
return getUIImage(from: outputImage)
} else { fatalError("optional unwrapping failed") }
In case someone still needs the answer, I found a different approach: implementing it as a custom CIFilter. It works pretty fast and is super easy to understand!
You are using UIImage and CGImage. These objects are stored in CPU memory.
You need to implement the code using just CIImage or MTLTexture.
These objects can be processed on the GPU without a round trip through CPU memory, which gives the best performance.

Rotating Metal texture 180 degrees

I added sample code in case someone wants to try fixing it:
I'm making an AR app using the Vuforia SDK. Their sample code contains video background rendering using Metal. It works fine for vertical orientation, but I need to change it to portrait. The problem is that after changing it, the video is rendered upside down. Detected targets are in the correct orientation, so I think this should be fixed in the Metal rendering class. Could someone help me do that? Below is the code I'm using to draw that background. How can I rotate it 180 degrees?
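One common way to rotate a textured quad by 180 degrees is to flip both texture coordinates before they are uploaded, mapping (u, v) to (1 - u, 1 - v). The sketch below assumes normalized coordinates in the 0...1 range stored as interleaved (u, v) pairs, which matches the two-floats-per-vertex layout of the texture coordinate buffer in the code below. The function name `rotated180` and the sample quad are hypothetical and not taken from the Vuforia SDK.

```swift
/// Rotates interleaved (u, v) texture coordinates by 180 degrees
/// around the center of the unit square: (u, v) -> (1 - u, 1 - v).
func rotated180(_ texCoords: [Float]) -> [Float] {
    precondition(texCoords.count % 2 == 0, "expected interleaved (u, v) pairs")
    return texCoords.map { 1 - $0 }
}

// Hypothetical coordinates for a full-screen quad (not taken from Vuforia).
let original: [Float] = [0, 1,  1, 1,  1, 0,  0, 0]
let flipped = rotated180(original)
print(flipped) // [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
```

In the renderer, the flipped values would be copied into the texture coordinate buffer instead of the raw mesh coordinates. An equivalent alternative is to do the same `1 - coordinate` flip in the vertex shader.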
private var mMetalDevice: MTLDevice
private var mVideoBackgroundPipelineState: MTLRenderPipelineState!
private var mUniformColorShaderPipelineState: MTLRenderPipelineState!
private var mTexturedVertexShaderPipelineState: MTLRenderPipelineState!
private var mDefaultSamplerState: MTLSamplerState?
private var mVideoBackgroundVertices: MTLBuffer!
private var mVideoBackgroundIndices: MTLBuffer!
private var mVideoBackgroundTextureCoordinates: MTLBuffer!
/// Initialize the renderer ready for use
init(metalDevice: MTLDevice, layer: CAMetalLayer, library: MTLLibrary?, textureDepth: MTLTexture) {
mMetalDevice = metalDevice
let stateDescriptor = MTLRenderPipelineDescriptor()
// Video background
stateDescriptor.vertexFunction = library?.makeFunction(name: "texturedVertex")
stateDescriptor.fragmentFunction = library?.makeFunction(name: "texturedFragment")
stateDescriptor.colorAttachments[0].pixelFormat = layer.pixelFormat
stateDescriptor.depthAttachmentPixelFormat = textureDepth.pixelFormat
// And create the pipeline state with the descriptor
do {
try mVideoBackgroundPipelineState = metalDevice.makeRenderPipelineState(descriptor: stateDescriptor)
} catch {
print("Failed to create video background render pipeline state:",error)
// Augmentations
// Create pipeline for transparent object overlays
stateDescriptor.vertexFunction = library?.makeFunction(name: "uniformColorVertex")
stateDescriptor.fragmentFunction = library?.makeFunction(name: "uniformColorFragment")
stateDescriptor.colorAttachments[0].pixelFormat = layer.pixelFormat
stateDescriptor.colorAttachments[0].isBlendingEnabled = true
stateDescriptor.colorAttachments[0].rgbBlendOperation = .add
stateDescriptor.colorAttachments[0].alphaBlendOperation = .add
stateDescriptor.colorAttachments[0].sourceRGBBlendFactor = .sourceAlpha
stateDescriptor.colorAttachments[0].sourceAlphaBlendFactor = .sourceAlpha
stateDescriptor.colorAttachments[0].destinationRGBBlendFactor = .oneMinusSourceAlpha
stateDescriptor.colorAttachments[0].destinationAlphaBlendFactor = .oneMinusSourceAlpha
stateDescriptor.depthAttachmentPixelFormat = textureDepth.pixelFormat
do {
try mUniformColorShaderPipelineState = metalDevice.makeRenderPipelineState(descriptor: stateDescriptor)
} catch {
print("Failed to create augmentation render pipeline state:",error)
stateDescriptor.vertexFunction = library?.makeFunction(name: "texturedVertex")
stateDescriptor.fragmentFunction = library?.makeFunction(name: "texturedFragment")
// Create pipeline for rendering textures
do {
try mTexturedVertexShaderPipelineState = metalDevice.makeRenderPipelineState(descriptor: stateDescriptor)
} catch {
print("Failed to create guide view render pipeline state:", error)
mDefaultSamplerState = MetalRenderer.defaultSampler(device: metalDevice)
// Allocate space for rendering data for Video background
mVideoBackgroundVertices = mMetalDevice.makeBuffer(length: MemoryLayout<Float>.size * 3 * 4, options: [.optionCPUCacheModeWriteCombined])
mVideoBackgroundTextureCoordinates = mMetalDevice.makeBuffer(length: MemoryLayout<Float>.size * 2 * 4, options: [.optionCPUCacheModeWriteCombined])
mVideoBackgroundIndices = mMetalDevice.makeBuffer(length: MemoryLayout<UInt16>.size * 6, options: [.optionCPUCacheModeWriteCombined])
/// Render the video background
func renderVideoBackground(encoder: MTLRenderCommandEncoder?, projectionMatrix: MTLBuffer, mesh: VuforiaMesh) {
// Copy mesh data into metal buffers
mVideoBackgroundVertices.contents().copyMemory(from: mesh.vertices, byteCount: MemoryLayout<Float>.size * Int(mesh.numVertices) * 3)
mVideoBackgroundTextureCoordinates.contents().copyMemory(from: mesh.textureCoordinates, byteCount: MemoryLayout<Float>.size * Int(mesh.numVertices) * 2)
mVideoBackgroundIndices.contents().copyMemory(from: mesh.indices, byteCount: MemoryLayout<CShort>.size * Int(mesh.numIndices))
// Set the render pipeline state
encoder?.setRenderPipelineState(mVideoBackgroundPipelineState)
// Set the vertex buffer
encoder?.setVertexBuffer(mVideoBackgroundVertices, offset: 0, index: 0)
// Set the projection matrix
encoder?.setVertexBuffer(projectionMatrix, offset: 0, index: 1)
// Set the texture coordinate buffer
encoder?.setVertexBuffer(mVideoBackgroundTextureCoordinates, offset: 0, index: 2)
encoder?.setFragmentSamplerState(mDefaultSamplerState, index: 0)
// Draw the geometry
encoder?.drawIndexedPrimitives(
type: .triangle,
indexCount: 6,
indexType: .uint16,
indexBuffer: mVideoBackgroundIndices,
indexBufferOffset: 0
)
extension MetalRenderer {
class func defaultSampler(device: MTLDevice) -> MTLSamplerState? {
let sampler = MTLSamplerDescriptor()
sampler.minFilter = .linear
sampler.magFilter = .linear
sampler.mipFilter = .linear
sampler.maxAnisotropy = 1
sampler.sAddressMode = .clampToEdge
sampler.tAddressMode = .clampToEdge
sampler.rAddressMode = .clampToEdge
sampler.normalizedCoordinates = true
sampler.lodMinClamp = 0
sampler.lodMaxClamp = .greatestFiniteMagnitude
return device.makeSamplerState(descriptor: sampler)
Adding code from the view that creates renderer:
import UIKit
import MetalKit
protocol VuforiaViewDelegate: AnyObject {
func renderFrame(vuforiaView: VuforiaView)
class VuforiaView: UIView {
weak var delegate: VuforiaViewDelegate?
var mVuforiaStarted = false
private var mConfigurationChanged = true
private var mRenderer: MetalRenderer!
private var mMetalDevice: MTLDevice!
private var mMetalCommandQueue: MTLCommandQueue!
private var mCommandExecutingSemaphore: DispatchSemaphore!
private var mDepthStencilState: MTLDepthStencilState!
private var mDepthTexture: MTLTexture!
private var mVideoBackgroundProjectionBuffer: MTLBuffer!
private lazy var metalLayer = layer as! CAMetalLayer
override class var layerClass: AnyClass { CAMetalLayer.self }
// Transformations and variables - constantly updated by vuforia frame updates
private var viewport = MTLViewport()
private var trackableProjection = matrix_float4x4()
private var trackableModelView = matrix_float4x4()
private var trackableScaledModelView = matrix_float4x4()
private(set) var worldOriginProjectionMatrix = matrix_float4x4()
private(set) var worldOriginModelViewMatrix = matrix_float4x4()
private(set) var targetPose = matrix_float4x4()
private(set) var targetSize = simd_float3()
override init(frame: CGRect) {
super.init(frame: frame)
required init?(coder: NSCoder) {
super.init(coder: coder)
private func setup() {
contentScaleFactor = UIScreen.main.nativeScale
// Get the system default metal device
mMetalDevice = MTLCreateSystemDefaultDevice()
// Metal command queue
mMetalCommandQueue = mMetalDevice.makeCommandQueue()
// Create a dispatch semaphore, used to synchronise command execution
mCommandExecutingSemaphore = DispatchSemaphore(value: 1)
// Create a CAMetalLayer and set its frame to match that of the view
let layer = self.layer as! CAMetalLayer
layer.device = mMetalDevice
layer.pixelFormat = .bgra8Unorm
layer.framebufferOnly = true
layer.contentsScale = contentScaleFactor
// Get the default library from the bundle (Metal shaders)
let library = mMetalDevice.makeDefaultLibrary()
// Create a depth texture that is needed when rendering the augmentation.
let screenSize = UIScreen.main.bounds.size
let depthTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
pixelFormat: .depth32Float,
width: Int(screenSize.width * contentScaleFactor),
height: Int(screenSize.height * contentScaleFactor),
mipmapped: false
)
depthTextureDescriptor.usage = .renderTarget
mDepthTexture = mMetalDevice.makeTexture(descriptor: depthTextureDescriptor)
// Video background projection matrix buffer
mVideoBackgroundProjectionBuffer = mMetalDevice.makeBuffer(length: MemoryLayout<Float>.size * 16, options: [])
// Fragment depth stencil
let depthStencilDescriptor = MTLDepthStencilDescriptor()
depthStencilDescriptor.depthCompareFunction = .less
depthStencilDescriptor.isDepthWriteEnabled = true
mDepthStencilState = mMetalDevice.makeDepthStencilState(descriptor: depthStencilDescriptor)
mRenderer = MetalRenderer(
metalDevice: mMetalDevice,
layer: layer,
library: library,
textureDepth: mDepthTexture
)
private func configureVuforia() {
let orientationValue: Int32 = {
let orientation = UIApplication.shared.windows.first(where: { $0.isKeyWindow })?.windowScene?.interfaceOrientation ?? .portrait
switch orientation {
case .portrait: return 0
case .portraitUpsideDown: return 1
case .landscapeLeft: return 2
case .landscapeRight: return 3
case .unknown: return 4
@unknown default: return 4
let screenSize = UIScreen.main.bounds.size
Int32(screenSize.width * contentScaleFactor),
Int32(screenSize.height * contentScaleFactor),
@objc private func renderFrameVuforia() {
if mVuforiaStarted {
if mConfigurationChanged {
mConfigurationChanged = false
delegate?.renderFrame(vuforiaView: self)
private func renderFrameVuforiaInternal() {
//Check if Camera is Started
guard isCameraStarted() else { return }
// ========== Set up ==========
var viewportsValue = Array(arrayLiteral: 0.0, 0.0, Double(metalLayer.drawableSize.width), Double(metalLayer.drawableSize.height), 0.0, 1.0)
// --- Command buffer ---
// Get the command buffer from the command queue
let commandBuffer = mMetalCommandQueue.makeCommandBuffer()
// Get the next drawable from the CAMetalLayer
let drawable = metalLayer.nextDrawable()
// It's possible for nextDrawable to return nil, which means a call to
// renderCommandEncoderWithDescriptor will fail
guard drawable != nil else { return }
// Wait for exclusive access to the GPU
let _ = mCommandExecutingSemaphore.wait(timeout: .distantFuture)
// -- Render pass descriptor ---
// Set up a render pass decriptor
let renderPassDescriptor = MTLRenderPassDescriptor()
// Draw to the drawable's texture
renderPassDescriptor.colorAttachments[0].texture = drawable?.texture
// Clear the colour attachment in case there is no video frame
renderPassDescriptor.colorAttachments[0].loadAction = .clear
// Store the data in the texture when rendering is complete
renderPassDescriptor.colorAttachments[0].storeAction = .store
// Use textureDepth for depth operations.
renderPassDescriptor.depthAttachment.texture = mDepthTexture
// Get a command encoder to encode into the command buffer
let encoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
if prepareToRender(
) {
viewport.originX = viewportsValue[0]
viewport.originY = viewportsValue[1]
viewport.width = viewportsValue[2]
viewport.height = viewportsValue[3]
viewport.znear = viewportsValue[4]
viewport.zfar = viewportsValue[5]
// Once the camera is initialized we can get the video background rendering values
// Call the renderer to draw the video background
mRenderer.renderVideoBackground(encoder: encoder, projectionMatrix: mVideoBackgroundProjectionBuffer, mesh: getVideoBackgroundMesh())
// Pass Metal context data to Vuforia Engine (we may have changed the encoder since
// calling Vuforia::Renderer::begin)
// ========== Finish Metal rendering ==========
// Commit the rendering commands
// Command completed handler
commandBuffer?.addCompletedHandler { _ in self.mCommandExecutingSemaphore.signal() }
// Present the drawable when the command buffer has been executed (Metal
// calls to CoreAnimation to tell it to put the texture on the display when
// the rendering is complete)
// Commit the command buffer for execution as soon as possible
Another problem is that in portrait mode something is wrong with the aspect ratio: the camera background is drawn distorted. But that is a separate topic.
Copyright (c) 2020, PTC Inc. All rights reserved.
Vuforia is a trademark of PTC Inc., registered in the United States and other
#include <metal_stdlib>
using namespace metal;
// === Texture sampling shader ===
struct VertexTextureOut
float4 m_Position [[ position ]];
float2 m_TexCoord;
vertex VertexTextureOut texturedVertex(constant packed_float3* pPosition [[ buffer(0) ]],
constant float4x4* pMVP [[ buffer(1) ]],
constant float2* pTexCoords [[ buffer(2) ]],
uint vid [[ vertex_id ]])
VertexTextureOut out;
float4 in(pPosition[vid], 1.0f);
out.m_Position = *pMVP * in;
out.m_TexCoord = pTexCoords[vid];
return out;
fragment half4 texturedFragment(VertexTextureOut inFrag [[ stage_in ]],
texture2d<half> tex2D [[ texture(0) ]],
sampler sampler2D [[ sampler(0) ]])
return tex2D.sample(sampler2D, inFrag.m_TexCoord);
// === Uniform color shader ===
struct VertexOut
float4 m_Position [[ position ]];
vertex VertexOut uniformColorVertex(constant packed_float3* pPosition [[ buffer(0) ]],
constant float4x4* pMVP [[ buffer(1) ]],
uint vid [[ vertex_id ]])
VertexOut out;
float4 in(pPosition[vid], 1.0f);
out.m_Position = *pMVP * in;
return out;
fragment float4 uniformColorFragment(constant float4 &color [[ buffer(0) ]])
return color;
// === Vertex color shader ===
struct VertexColorOut
float4 m_Position [[ position ]];
float4 m_Color;
vertex VertexColorOut vertexColorVertex(constant packed_float3* pPosition [[ buffer(0) ]],
constant float4* pColor [[ buffer(1) ]],
constant float4x4* pMVP [[ buffer(2) ]],
uint vid [[ vertex_id ]])
VertexColorOut out;
float4 in(pPosition[vid], 1.0f);
out.m_Position = *pMVP * in;
out.m_Color = pColor[vid];
return out;
fragment float4 vertexColorFragment(VertexColorOut inFrag [[ stage_in ]])
return inFrag.m_Color;

AVCaptureVideoDataOutputSampleBufferDelegate drop frames using CIFilters for video filtering

I have a very strange case where AVCaptureVideoDataOutputSampleBufferDelegate drops frames if I use 13 different filter chains. Let me explain:
I have a CameraController set up, nothing special. Here is my delegate method:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
if connection.output?.connection(with: .audio) == nil {
//capture video
// my try to avoid "Out of buffers error", no luck ;(
lastCapturedBuffer = nil
let err = CMSampleBufferCreateCopy(allocator: kCFAllocatorDefault, sampleBuffer: sampleBuffer, sampleBufferOut: &lastCapturedBuffer)
if err == noErr {
connection.videoOrientation = .portrait
// getting image
let pixelBuffer = CMSampleBufferGetImageBuffer(lastCapturedBuffer!)
// remove if any
CVPixelBufferLockBaseAddress(pixelBuffer!, CVPixelBufferLockFlags(rawValue: 0))
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: pixelBuffer!)
//remove if any
CVPixelBufferUnlockBaseAddress(pixelBuffer!,CVPixelBufferLockFlags(rawValue: 0))
//CVPixelBufferUnlockBaseAddress(pixelBuffer!, .readOnly)
// transform image to targer resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(lastCapturedBuffer!)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
if writable, (videoWriterInput.isReadyForMoreMediaData) {
// apply effect in realtime <- here is the problem. If I comment out the next line, it is fixed, but the effect won't be applied
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
} else {
// paused
I also implemented the didDrop delegate method. Here is how I figured out why it drops frames:
func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
print("did drop")
var mode: CMAttachmentMode = 0
let reason = CMGetAttachment(sampleBuffer, key: kCMSampleBufferAttachmentKey_DroppedFrameReason, attachmentModeOut: &mode)
print("reason \(String(describing: reason))") // Optional(OutOfBuffers)
So I did it like a pro and just commented out parts of the code to find where the problem is. It is here:
captured = FilterManager.shared.applyFilterForCamera(inputImage: captured)
FilterManager is a singleton; here is the func that is called:
func applyFilterForCamera(inputImage: CIImage) -> CIImage {
return currentVsFilter!.apply(sourceImage: inputImage)
currentVsFilter is an object of VSFilter type. Here is an example of one:
import Foundation
import AVKit
class TestFilter: CustomFilter {
let _name = "Тестовый Фильтр"
let _displayName = "Test Filter"
var tempImage: CIImage?
var final: CGImage?
override func name() -> String {
return _name
override func displayName() -> String {
return _displayName
override init() {
print("Test Filter init")
// setup my custom kernel filter
self.noise.type = GlitchFilter.GlitchType.allCases[2]
// this returns composition for playback using AVPlayer
override func composition(asset: AVAsset) -> AVMutableVideoComposition {
let composition = AVMutableVideoComposition(asset: asset, applyingCIFiltersWithHandler: { request in
let inputImage = request.sourceImage.cropped(to: request.sourceImage.extent)
DispatchQueue.global(qos: .userInitiated).async {
let output = self.apply(sourceImage: inputImage, forComposition: true)
request.finish(with: output, context: nil)
let size = FilterManager.shared.cropRectForOrientation().size
composition.renderSize = size
return composition
// this returns actual filtered CIImage, used for both AVPlayer composition and realtime camera
override func apply(sourceImage: CIImage, forComposition: Bool = false) -> CIImage {
// rendered text
tempImage = FilterManager.shared.textRenderedImage()
// some filters chained one by one
self.screenBlend?.setValue(tempImage, forKey: kCIInputImageKey)
self.screenBlend?.setValue(sourceImage, forKey: kCIInputBackgroundImageKey)
self.noise.inputImage = self.screenBlend?.outputImage
self.noise.inputAmount = CGFloat.random(in: 1.0...3.0)
// result
tempImage = self.noise.outputImage
// correct crop
let rect = forComposition ? FilterManager.shared.cropRectForOrientation() : FilterManager.shared.cropRect
final = self.context.createCGImage(tempImage!, from: rect!)
return CIImage(cgImage: final!)
And now the strangest thing: I have 30 VSFilters, and when I get to the 13th (switching one by one with a UIButton) I get the "Out of Buffer" error.
What I tested:
1. I changed the order of the VSFilters in the filters array inside the FilterManager singleton: same result
2. I tried switching from the first to the 12th one by one and then going back: it works, but after I switch to the 13th (of 30, counting from 0) the bug appears
It looks like it can handle only 12 VSFilter objects, as if it retains them somehow, or maybe it's related to threading, I don't know.
This app is made for iOS devices and was tested on an iPhone X running iOS 13.3.1.
It is a video editor app that applies different effects to both the live stream from the camera and video files from the camera roll.
Maybe someone has experience with this?
Have a great day
Best, Victor
Edit 1. If I reinitialize the cameraController (AVCaptureSession, input/output devices) it works, but this is an ugly option and it adds lag when switching filters.
OK, so I finally won this battle. In case someone else gets this "OutOfBuffer" problem, here is my solution.
As far as I could figure out, CIFilter grabs the CVPixelBuffer and doesn't release it while filtering images. It kind of creates one huge buffer, I guess. The strange thing is that it doesn't create a memory leak, so I guess it doesn't grab a particular buffer but creates a strong reference to it. As rumors (me) say, it can handle only 12 such references.
So my approach was to copy the CVPixelBuffer and then work with the copy instead of the buffer I got from the AVCaptureVideoDataOutputSampleBufferDelegate didOutput func.
Here is my new code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
if !paused {
//print("camera controller \(id) got frame")
if connection.output?.connection(with: .audio) == nil {
//capture video
connection.videoOrientation = .portrait
// getting image
guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
// this works!
let copyBuffer = pixelBuffer.copy()
// captured - is just ciimage property
captured = CIImage(cvPixelBuffer: copyBuffer)
//remove if any
// transform image to targer resolution
let srcWidth = CGFloat(captured.extent.width)
let srcHeight = CGFloat(captured.extent.height)
let dstWidth: CGFloat = ConstantsManager.shared.k_video_width
let dstHeight: CGFloat = ConstantsManager.shared.k_video_height
let scaleX = dstWidth / srcWidth
let scaleY = dstHeight / srcHeight
var transform = CGAffineTransform.init(scaleX: scaleX, y: scaleY)
captured = captured.transformed(by: transform).cropped(to: CGRect(x: 0, y: 0, width: dstWidth, height: dstHeight))
// mirror for front camera
if front {
var t = CGAffineTransform.init(scaleX: -1, y: 1)
t = t.translatedBy(x: -ConstantsManager.shared.k_video_width, y: 0)
captured = captured.transformed(by: t)
// video capture logic
let writable = canWrite()
if writable,
sessionAtSourceTime == nil {
sessionAtSourceTime = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
videoWriter.startSession(atSourceTime: sessionAtSourceTime!)
if writable, (videoWriterInput.isReadyForMoreMediaData) {
self.captured = FilterManager.shared.applyFilterForCamera(inputImage: self.captured)
// current frame in case user wants to save image as photo
self.capturedPhoto = captured
// sent frame to Camcoder view controller
self.delegate?.didCapturedFrame(frame: captured)
} else {
// capture sound
let writable = canWrite()
if writable, (audioWriterInput.isReadyForMoreMediaData) {
//print("write audio buffer")
} else {
// paused
//print("paused camera controller \(id)")
And here is the func to copy the buffer:
extension CVPixelBuffer {
    func copy() -> CVPixelBuffer {
        precondition(CFGetTypeID(self) == CVPixelBufferGetTypeID(), "copy() cannot be called on a non-CVPixelBuffer")
        var _copy : CVPixelBuffer?
        // Allocate a destination buffer with the same dimensions and pixel format
        CVPixelBufferCreate(kCFAllocatorDefault,
                            CVPixelBufferGetWidth(self),
                            CVPixelBufferGetHeight(self),
                            CVPixelBufferGetPixelFormatType(self),
                            nil,
                            &_copy)
        guard let copy = _copy else { fatalError() }
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
        CVPixelBufferLockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
        let copyBaseAddress = CVPixelBufferGetBaseAddress(copy)
        let currBaseAddress = CVPixelBufferGetBaseAddress(self)
        print("copy data size: \(CVPixelBufferGetDataSize(copy))")
        print("self data size: \(CVPixelBufferGetDataSize(self))")
        memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(copy))
        //memcpy(copyBaseAddress, currBaseAddress, CVPixelBufferGetDataSize(self) * 2)
        CVPixelBufferUnlockBaseAddress(copy, CVPixelBufferLockFlags(rawValue: 0))
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags.readOnly)
        return copy
    }
}
I use it as an extension.
I hope this helps anyone with a similar problem.
Best, Victor
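A note on the single memcpy in copy(): Core Video may pad each row, so the source and destination buffers can report different bytes-per-row values, and a single block copy can then shift rows against each other. The sketch below shows a row-by-row copy on plain byte arrays. `copyRows` and its parameter names are hypothetical, chosen for illustration only. With real pixel buffers the same loop would run over the base addresses, using CVPixelBufferGetBytesPerRow for each side.

```swift
/// Copies `height` rows from a source layout into a destination layout whose
/// row stride may differ. Only the bytes common to both strides are copied.
/// Hypothetical helper for illustration; it works on arrays, not CVPixelBuffers.
func copyRows(from source: [UInt8],
              sourceBytesPerRow: Int,
              destinationBytesPerRow: Int,
              height: Int) -> [UInt8] {
    precondition(source.count >= sourceBytesPerRow * height, "source too small")
    var destination = [UInt8](repeating: 0, count: destinationBytesPerRow * height)
    let rowLength = min(sourceBytesPerRow, destinationBytesPerRow)
    for row in 0..<height {
        let srcStart = row * sourceBytesPerRow
        let dstStart = row * destinationBytesPerRow
        // Replace one row's worth of bytes; the ranges have equal length,
        // so the destination size does not change.
        destination.replaceSubrange(dstStart..<dstStart + rowLength,
                                    with: source[srcStart..<srcStart + rowLength])
    }
    return destination
}

// Two rows of 3 meaningful bytes, padded to 4 bytes per row in the source.
let padded: [UInt8] = [1, 2, 3, 0,
                       4, 5, 6, 0]
let tight = copyRows(from: padded, sourceBytesPerRow: 4, destinationBytesPerRow: 3, height: 2)
print(tight)
```

If both buffers report the same bytes-per-row, the loop copies the same data as one memcpy, so the per-row version only matters when the strides differ.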

MTKView Drawing Performance

What I am Trying to Do
I am trying to show filters on a camera feed using a Metal view (MTKView). I am closely following the approach in Apple's sample code, Enhancing Live Video by Leveraging TrueDepth Camera Data.
What I Have So Far
The following code works well (mostly adapted from the sample code mentioned above):
class MetalObject: NSObject, MTKViewDelegate {
private var metalBufferView : MTKView?
private var metalDevice = MTLCreateSystemDefaultDevice()
private var metalCommandQueue : MTLCommandQueue!
private var ciContext : CIContext!
private let colorSpace = CGColorSpaceCreateDeviceRGB()
private var videoPixelBuffer : CVPixelBuffer?
private let syncQueue = DispatchQueue(label: "Preview View Sync Queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)
private var textureWidth : Int = 0
private var textureHeight : Int = 0
private var textureMirroring = false
private var sampler : MTLSamplerState!
private var renderPipelineState : MTLRenderPipelineState!
private var vertexCoordBuffer : MTLBuffer!
private var textCoordBuffer : MTLBuffer!
private var internalBounds : CGRect!
private var textureTranform : CGAffineTransform?
private var previewImage : CIImage?
init(with frame: CGRect) {
self.metalBufferView = MTKView(frame: frame, device: self.metalDevice)
self.metalBufferView!.contentScaleFactor = UIScreen.main.nativeScale
self.metalBufferView!.framebufferOnly = true
self.metalBufferView!.colorPixelFormat = .bgra8Unorm
self.metalBufferView!.isPaused = true
self.metalBufferView!.enableSetNeedsDisplay = false
self.metalBufferView!.delegate = self
self.metalCommandQueue = self.metalDevice!.makeCommandQueue()
self.ciContext = CIContext(mtlDevice: self.metalDevice!)
//Configure Metal
let defaultLibrary = self.metalDevice!.makeDefaultLibrary()!
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
pipelineDescriptor.vertexFunction = defaultLibrary.makeFunction(name: "vertexPassThrough")
pipelineDescriptor.fragmentFunction = defaultLibrary.makeFunction(name: "fragmentPassThrough")
// To determine how our textures are sampled, we create a sampler descriptor, which
// will be used to ask for a sampler state object from our device below.
let samplerDescriptor = MTLSamplerDescriptor()
samplerDescriptor.sAddressMode = .clampToEdge
samplerDescriptor.tAddressMode = .clampToEdge
samplerDescriptor.minFilter = .linear
samplerDescriptor.magFilter = .linear
sampler = self.metalDevice!.makeSamplerState(descriptor: samplerDescriptor)
do {
renderPipelineState = try self.metalDevice!.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch {
fatalError("Unable to create preview Metal view pipeline state. (\(error))")
final func update (newVideoPixelBuffer: CVPixelBuffer?) {
self.syncQueue.async {
var filteredImage : CIImage
self.videoPixelBuffer = newVideoPixelBuffer
//Core image filters
//Strictly CIFilters, chained together
self.previewImage = filteredImage
//Ask Metal View to draw
//MARK: - Metal View Delegate
final func draw(in view: MTKView) {
    print(Thread.current)
    guard let drawable = self.metalBufferView!.currentDrawable,
          let currentRenderPassDescriptor = self.metalBufferView!.currentRenderPassDescriptor,
          let previewImage = self.previewImage else {
        return
    }
    // create a texture for the CI image to render to
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm,
        width: Int(previewImage.extent.width),
        height: Int(previewImage.extent.height),
        mipmapped: false)
    textureDescriptor.usage = [.shaderWrite, .shaderRead]
    let texture = self.metalDevice!.makeTexture(descriptor: textureDescriptor)!
    if texture.width != textureWidth ||
        texture.height != textureHeight ||
        self.metalBufferView!.bounds != internalBounds {
        setupTransform(width: texture.width, height: texture.height, mirroring: mirroring, rotation: rotation)
    }
    // Set up command buffer and encoder
    guard let commandQueue = self.metalCommandQueue else {
        print("Failed to create Metal command queue")
        return
    }
    guard let commandBuffer = commandQueue.makeCommandBuffer() else {
        print("Failed to create Metal command buffer")
        return
    }
    // add rendering of the image to the command buffer
    self.ciContext.render(previewImage,
                          to: texture,
                          commandBuffer: commandBuffer,
                          bounds: previewImage.extent,
                          colorSpace: self.colorSpace)
    guard let commandEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: currentRenderPassDescriptor) else {
        print("Failed to create Metal command encoder")
        return
    }
    // add vertex and fragment shaders to the command buffer
    commandEncoder.label = "Preview display"
    // Assumed step (not shown in the post): bind the pipeline state before drawing
    commandEncoder.setRenderPipelineState(renderPipelineState)
    commandEncoder.setVertexBuffer(vertexCoordBuffer, offset: 0, index: 0)
    commandEncoder.setVertexBuffer(textCoordBuffer, offset: 0, index: 1)
    commandEncoder.setFragmentTexture(texture, index: 0)
    commandEncoder.setFragmentSamplerState(sampler, index: 0)
    commandEncoder.drawPrimitives(type: .triangleStrip, vertexStart: 0, vertexCount: 4)
    // Assumed step (not shown in the post): finish encoding before presenting
    commandEncoder.endEncoding()
    commandBuffer.present(drawable) // Draw to the screen
    // Assumed step (not shown in the post): submit the command buffer to the GPU
    commandBuffer.commit()
}
final func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
MTKViewDelegate is used instead of subclassing MTKView because, when the view was subclassed, draw was called on the main thread. With the delegate method shown above, the call seems to arrive on a different Metal-related thread each loop, and this approach seems to give much better performance.
Details of the CIFilter usage in the update method above had to be redacted. It is a heavy chain of stacked CIFilters, and unfortunately there is no room to tweak them.
The code above seems to slow down the main thread a lot, making the rest of the app UI choppy. For example, scrolling a UIScrollView becomes slow and choppy.
I want to tweak the Metal view so that it eases up on the CPU and goes easy on the main thread, leaving enough headroom for the rest of the UI.
According to the graphic above, preparation of the command buffer is done entirely on the CPU until it is presented and committed(?). Is there a way to offload that from the CPU?
Any hints, feedback, or tips to improve the drawing efficiency would be appreciated.
There are a few things you can do to improve the performance:
Render into the view’s drawable directly instead of rendering into a texture and then rendering again to render that texture into the view.
Use the newish CIRenderDestination API to defer the actual texture retrieval to the moment the view is actually rendered to (i.e. when Core Image is done).
Here’s the draw(in view: MTKView) I’m using in my Core Image project, modified for your case:
public func draw(in view: MTKView) {
    if let currentDrawable = view.currentDrawable,
       let previewImage = self.previewImage,
       let commandBuffer = self.commandQueue.makeCommandBuffer() {
        let drawableSize = view.drawableSize
        // optional: scale the image to fit the view
        let scaleX = drawableSize.width / previewImage.extent.width
        let scaleY = drawableSize.height / previewImage.extent.height
        let scale = min(scaleX, scaleY)
        let scaledImage = previewImage.transformed(by: CGAffineTransform(scaleX: scale, y: scale))
        // optional: center in the view
        let originX = max(drawableSize.width - scaledImage.extent.size.width, 0) / 2
        let originY = max(drawableSize.height - scaledImage.extent.size.height, 0) / 2
        let centeredImage = scaledImage.transformed(by: CGAffineTransform(translationX: originX, y: originY))
        // create a render destination that allows to lazily fetch the target texture
        // which allows the encoder to process all CI commands _before_ the texture is actually available;
        // this gives a nice speed boost because the CPU doesn’t need to wait for the GPU to finish
        // before starting to encode the next frame
        let destination = CIRenderDestination(width: Int(drawableSize.width),
                                              height: Int(drawableSize.height),
                                              pixelFormat: view.colorPixelFormat,
                                              commandBuffer: commandBuffer,
                                              mtlTextureProvider: { () -> MTLTexture in
                                                  return currentDrawable.texture
                                              })
        let task = try! self.context.startTask(toRender: centeredImage, to: destination)
        // bonus: you can Quick Look the task to see what’s actually scheduled for the GPU
        // Assumed step (not shown in the post): present the drawable and submit the work
        commandBuffer.present(currentDrawable)
        commandBuffer.commit()
        // optional: you can wait for the task execution and Quick Look the info object to get insights and metrics
        DispatchQueue.global(qos: .background).async {
            let info = try! task.waitUntilCompleted()
        }
    }
}
If this is still too slow, you can try setting the priorityRequestLow CIContextOption when creating your CIContext to tell Core Image to render in low priority.
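The optional scale-and-center step in the draw method above is plain arithmetic and can be checked outside of Core Image. Here is a minimal sketch of it, assuming plain Double values in place of CGSize and CGRect. `AspectFit` and `aspectFit` are hypothetical names, not part of any framework.

```swift
/// Result of fitting a source rectangle into a target while keeping the aspect ratio.
struct AspectFit {
    let scale: Double
    let originX: Double
    let originY: Double
}

/// Same arithmetic as the draw method: one uniform scale (the smaller of the
/// two axis ratios) and an offset that centers the scaled image in the target.
/// Hypothetical helper; plain Doubles stand in for CGSize/CGRect values.
func aspectFit(sourceWidth: Double, sourceHeight: Double,
               targetWidth: Double, targetHeight: Double) -> AspectFit {
    precondition(sourceWidth > 0 && sourceHeight > 0, "source must be non-empty")
    let scale = min(targetWidth / sourceWidth, targetHeight / sourceHeight)
    let originX = max(targetWidth - sourceWidth * scale, 0) / 2
    let originY = max(targetHeight - sourceHeight * scale, 0) / 2
    return AspectFit(scale: scale, originX: originX, originY: originY)
}

// A 1920x1080 frame shown in a 960x960 drawable.
let fit = aspectFit(sourceWidth: 1920, sourceHeight: 1080, targetWidth: 960, targetHeight: 960)
print(fit.scale, fit.originX, fit.originY)
```

Taking the minimum of the two ratios letterboxes the image so that all of it stays visible. Taking the maximum instead would fill the view and crop the overflow.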

Shadertoy shader port to Metal slow on iOS

I'm trying to learn Metal shaders so I ported this mountain generation shader I found on Shadertoy to Metal.
The port works, but is very slow on iOS. It is reasonably fast on OS X but gets slower when I increase the window size. It is also slow within an OS X playground.
I've worked through some tutorials and read the Apple docs on the Metal Shading Language, but I feel I'm missing a conceptual understanding of how everything works under the hood. If anything in this code jumps out as obviously slowing things down, I would be very grateful to learn about it. I am not sure whether the slowdown is due to the shader code itself or to the way it's all set up.
Here's the metal shader:
#include <metal_stdlib>
using namespace metal;
constexpr sampler textureSampler(coord::normalized,
mip_filter::linear );
kernel void compute(texture2d<float, access::write> output [[texture(0)]],
                    texture2d<float, access::sample> input [[texture(1)]],
                    constant float &timer [[buffer(0)]],
                    uint2 gid [[thread_position_in_grid]])
{
    int width = input.get_width();
    int height = input.get_height();
    float2 uv = float2(gid) / float2(width, height);
    float4 p = float4(uv,1,1)-0.5;
    p.y = -p.y;
    float4 d = p*0.5;
    float4 t;
    float4 c;
    p.z += timer*200;
    for(float i=1.7;i>0.0;i-=0.002) {
        float s=0.5;
        t = input.sample(textureSampler,0.3+p.xz*s/3e3) / (s+=s);
        // this makes it purple
        c = float4(1.0,-0.9,0.8,9.0)+d.x-t*i;
        // c = float4(1.0,0.9,0.8,9.0)+d.x-t*i;
        if (t.x > p.y*.01+1.3) {
            // assumed: the body of this branch is not shown in the post;
            // stop marching once the terrain is hit
            break;
        }
        p += d;
    }
    output.write(c, gid);
}
And here is the subclass of MTKView I'm using to render the shader:
import Cocoa
import MetalKit
class MetalView: MTKView {
var queue: MTLCommandQueue!
var cps: MTLComputePipelineState!
var timer: Float = 0
var timerBuffer: MTLBuffer!
var shaderName: String!
var texture: MTLTexture!
required public init(coder: NSCoder) {
super.init(coder: coder)
self.framebufferOnly = false
self.preferredFramesPerSecond = 60
func setupTexture() {
let path = Bundle.main.path(forResource: "texture", ofType: "jpg")
let textureLoader = MTKTextureLoader(device: device!)
texture = try! textureLoader.newTexture(withContentsOf: URL(fileURLWithPath: path!), options: nil)
func registerShaders() {
device = MTLCreateSystemDefaultDevice()!
queue = device!.makeCommandQueue()
do {
let library = device!.newDefaultLibrary()!
let kernel = library.makeFunction(name: "compute")!
cps = try device!.makeComputePipelineState(function: kernel)
} catch let e {
timerBuffer = device!.makeBuffer(length: MemoryLayout<Float>.size, options: [])
override public func draw(_ dirtyRect: CGRect) {
if let drawable = currentDrawable {
let commandBuffer = queue.makeCommandBuffer()
let commandEncoder = commandBuffer.makeComputeCommandEncoder()
commandEncoder.setTexture(drawable.texture, at: 0)
commandEncoder.setTexture(texture, at: 1)
commandEncoder.setBuffer(timerBuffer, offset: 0, at: 0)
let threadGroupCount = MTLSizeMake(8, 8, 1)
let threadGroups = MTLSizeMake(drawable.texture.width / threadGroupCount.width, drawable.texture.height / threadGroupCount.height, 1)
commandEncoder.dispatchThreadgroups(threadGroups, threadsPerThreadgroup: threadGroupCount)
func update() {
timer += Float(1.0 / TimeInterval(self.preferredFramesPerSecond))
let bufferPointer = timerBuffer.contents()
memcpy(bufferPointer, &timer, MemoryLayout<Float>.size)
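One detail in the dispatch above: the threadgroup count is computed with integer division, which rounds down. When the drawable size is not a multiple of the 8x8 threadgroup size, the rightmost and bottom pixels are never covered. A common pattern is to round up instead and have the kernel return early for positions outside the texture. The sketch below shows only the rounding arithmetic. `threadgroupCount` is a hypothetical helper, not a Metal API.

```swift
/// Number of threadgroups needed to cover `extent` items with groups of
/// `groupSize`, rounding up so a partial group at the edge is still dispatched.
func threadgroupCount(covering extent: Int, groupSize: Int) -> Int {
    precondition(extent >= 0 && groupSize > 0, "invalid dispatch dimensions")
    return (extent + groupSize - 1) / groupSize
}

// 1920x1080 divides evenly by 8, so rounding up changes nothing.
print(threadgroupCount(covering: 1920, groupSize: 8), threadgroupCount(covering: 1080, groupSize: 8))
// 1125 (an arbitrary example width) does not: plain division drops the last partial group.
print(threadgroupCount(covering: 1125, groupSize: 8), 1125 / 8)
```

With the rounded-up count, some threads fall outside the texture, so the kernel would need a guard that compares gid against output.get_width() and output.get_height() before writing.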
