I am working on an OpenCV project where I need to detect ArUco markers. I am not using the default OpenCV camera view; instead, I have created a native camera that delivers frames as CMSampleBuffer.
// converts CMSampleBuffer to Mat object
private func processBuffer(buffer: CMSampleBuffer) {
    guard let imgBuf = CMSampleBufferGetImageBuffer(buffer) else { return }
    // lock the buffer
    CVPixelBufferLockBaseAddress(imgBuf, [])
    // get image properties
    let width = CVPixelBufferGetWidth(imgBuf)
    let height = CVPixelBufferGetHeight(imgBuf)
    // create the Mat
    let imageMat: Mat = Mat(rows: Int32(height), cols: Int32(width), type: CvType.CV_8UC4)
    // unlock again
    CVPixelBufferUnlockBaseAddress(imgBuf, [])
}
private func processImage(_ image: Mat) {
    let img = Mat()
    let dst = Mat()
    image.convert(to: img, rtype: CvType.CV_8UC3)
    Imgproc.cvtColor(src: image, dst: dst, code: ColorConversionCodes.COLOR_RGB2GRAY)
    let parameters = DetectorParameters.create()
    // a method that sets basic detector parameters such as adaptiveThreshWinSizeMin, adaptiveThreshWinSizeMax, adaptiveThreshConstant, etc.
    var corners: [Mat] = []
    var ids = MatOfInt()
    let dictionary = Aruco.getPredefinedDictionary(dict: 8)
    Aruco.detectMarkers(
        image: dst,
        dictionary: dictionary,
        corners: &corners,
        ids: ids,
        parameters: parameters
    )
}
After the call to Aruco.detectMarkers, no ArUco marker is detected: ids and corners are both still empty.
I’m using Metal with Swift to build a 3D viewer for iOS and I have some issues getting depth to work. So far, I can draw and render a single shape correctly in 3D, such as a simple square plane (4 triangles, 2 for each face) or a tetrahedron (4 triangles).
However, when I try to draw 2 shapes together, the depth between these two shapes doesn’t work. For example, a plane is placed at Z = 0 behind a tetra which is placed at Z > 0. If I look at this scene from the back (camera placed somewhere at Z < 0), it’s OK. But when I look at this scene from the front (camera placed somewhere at Z > 0), it doesn’t work: the plane is drawn in front of the tetra even though it is placed behind it.
I think that the plane is always drawn on the screen before the tetra (no matter the position of the camera) because the drawPrimitives call for the plane is issued before the call for the tetra. However, I expected the depth and stencil settings to deal with that properly.
I don’t know whether depth isn’t working because the depth texture, stencil state and so on are not set correctly, or because each shape is drawn in a separate drawPrimitives call.
In other words, do I have to draw all shapes in the same drawPrimitives call to make depth work? The idea behind the multiple drawPrimitives calls is to support a different primitive type for each shape (triangle, line, …).
This is how I set up the depth stencil state, the depth texture and the render pipeline:
init() {
// some miscellaneous initialisation …
// …
// all MTL stuff :
commandQueue = device.makeCommandQueue()
// Stencil descriptor
let depthStencilDescriptor = MTLDepthStencilDescriptor()
depthStencilDescriptor.depthCompareFunction = .less
depthStencilDescriptor.isDepthWriteEnabled = true
depthStencilState = device.makeDepthStencilState(descriptor: depthStencilDescriptor)!
// Library and pipeline descriptor & state
let library = try! device.makeLibrary(source: shaders, options: nil)
// Our vertex function name
let vertexFunction = library.makeFunction(name: "basic_vertex_function")
// Our fragment function name
let fragmentFunction = library.makeFunction(name: "basic_fragment_function")
// Create basic descriptor
let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
// Attach the pixel format, which is the same as the MetalView's
renderPipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
renderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float_stencil8
renderPipelineDescriptor.stencilAttachmentPixelFormat = .depth32Float_stencil8
//renderPipelineDescriptor.stencilAttachmentPixelFormat = .stencil8
// Attach the shader functions
renderPipelineDescriptor.vertexFunction = vertexFunction
renderPipelineDescriptor.fragmentFunction = fragmentFunction
// Try to update the state of the renderPipeline
do {
renderPipelineState = try device.makeRenderPipelineState(descriptor: renderPipelineDescriptor)
} catch {
print(error.localizedDescription)
}
// Depth Texture
let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .stencil8, width: 576, height: 723, mipmapped: false)
desc.storageMode = .private
desc.usage = .pixelFormatView
depthTexture = device.makeTexture(descriptor: desc)!
// Uniforms buffer
modelMatrix = Matrix4()
uniformBuffer = device.makeBuffer( length: MemoryLayout<Float>.stride*16*2, options: [])
let bufferPointer = uniformBuffer.contents()
memcpy(bufferPointer, &modelMatrix.matrix.m, MemoryLayout<Float>.stride * 16)
memcpy(bufferPointer + MemoryLayout<Float>.stride * 16, &projectionMatrix.matrix.m, MemoryLayout<Float>.stride * 16)
}
And the draw function:
func draw(in view: MTKView) {
// create render pass descriptor
guard let drawable = view.currentDrawable,
let renderPassDescriptor = view.currentRenderPassDescriptor else {
return
}
renderPassDescriptor.depthAttachment.texture = depthTexture
renderPassDescriptor.depthAttachment.clearDepth = 1.0
//renderPassDescriptor.depthAttachment.loadAction = .load
renderPassDescriptor.depthAttachment.loadAction = .clear
renderPassDescriptor.depthAttachment.storeAction = .store
// Create a buffer from the commandQueue
let commandBuffer = commandQueue.makeCommandBuffer()
let commandEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
// Draw all obj in objects
// objects = array of Object; each object describing vertices and primitive type of a shape
// objects[0] = Plane, objects[1] = Tetra
for obj in objects {
createVertexBuffers(device: view.device!, vertices: obj.vertices)
commandEncoder?.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
commandEncoder?.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
commandEncoder?.drawPrimitives(type: obj.primitive, vertexStart: 0, vertexCount: obj.vertices.count)
}
commandEncoder?.endEncoding()
commandBuffer?.present(drawable)
commandBuffer?.commit()
}
Does anyone have an idea of what is wrong or missing?
Any advice is welcome!
Edited 09/23/2022: Code updated
A few things off the top of my head:
let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .depth32Float_stencil8, width: 576, height: 723, mipmapped: false)
renderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float_stencil8
Notice that the pixelFormat should be the same in both places. Since you seem to be using the stencil test as well, depth32Float_stencil8 is a good fit.
Another thing you seem to be missing is clearing the depth texture at the start of every render pass, am I right?
If so, you should set the load action of the depth attachment to .clear, like this:
renderPassDescriptor.depthAttachment.loadAction = .clear
Fourth (depending on your use case):
If none of the above works, you might need to discard fragments with alpha = 0 in your fragment function by calling discard_fragment() when the color you are returning has alpha 0.
Also note for future:
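Ideally the depth texture is fresh and empty when every new frame starts being rendered, and is then reused for the rest of that frame. Keep in mind that load and store actions are set per render pass (per render command encoder), not per draw call: draw calls encoded with the same encoder share one depth attachment, so a single .clear load action covers all of them. Only if you split a frame across several render passes do the later passes need load action .load, with store action .store on the passes before them.
Example: assume one frame draws three shapes (a triangle, a rectangle and a sphere), each in its own render pass. The depth attachment setup should then look like this: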
Frame 1 starts:
  First draw: triangle
    loadAction: clear
    storeAction: store
  Second draw: rectangle
    loadAction: load
    storeAction: store
  Third draw: sphere
    loadAction: load
    storeAction: store or dontCare
Frame 2 starts (notice the depth buffer is cleared again for the first draw of the new frame):
  First draw: triangle
    loadAction: clear
    storeAction: store
  Second draw: rectangle
    loadAction: load
    storeAction: store
  Third draw: sphere
    loadAction: load
    storeAction: store or dontCare
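To illustrate why the order of the draw calls should not matter once the depth test is really active, here is a minimal CPU-side sketch of a depth buffer that is cleared once and then shared by every draw of the frame. `DepthBuffer`, the string labels and the depth values are made up for the illustration; none of this is Metal API.

```swift
// A tiny software depth buffer: one depth value and one label per pixel.
struct DepthBuffer {
    var depth: [Float]
    var color: [String]

    // Starts out cleared, like loadAction = .clear with clearDepth = 1.0.
    init(pixelCount: Int) {
        depth = Array(repeating: 1.0, count: pixelCount)
        color = Array(repeating: "background", count: pixelCount)
    }

    // Mimics depthCompareFunction = .less with depth writes enabled:
    // a fragment only wins if it is closer than what is already stored.
    mutating func draw(_ name: String, depth z: Float, pixels: Range<Int>) {
        for i in pixels where z < depth[i] {
            depth[i] = z
            color[i] = name
        }
    }
}

// Plane first, tetra second.
var planeFirst = DepthBuffer(pixelCount: 4)
planeFirst.draw("plane", depth: 0.8, pixels: 0..<4)
planeFirst.draw("tetra", depth: 0.3, pixels: 1..<3)

// Tetra first, plane second.
var tetraFirst = DepthBuffer(pixelCount: 4)
tetraFirst.draw("tetra", depth: 0.3, pixels: 1..<3)
tetraFirst.draw("plane", depth: 0.8, pixels: 0..<4)

print(planeFirst.color)
print(tetraFirst.color)
```

Both orders end with the tetra covering the two middle pixels. If the real renderer gives different pictures depending on draw order, that suggests the depth test is not taking effect at all (wrong texture format, depth stencil state not set on the encoder, and so on) rather than a problem with using several draw calls.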
Your depth texture pixel format is not correct. Try changing it to MTLPixelFormatDepth32Float or MTLPixelFormatDepth32Float_Stencil8 (.depth32Float or .depth32Float_stencil8 in Swift).
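As a rough sketch of what a matching depth texture setup could look like: the helper name `makeDepthTextureDescriptor` is made up, and the combined depth/stencil format is assumed only because the pipeline descriptor in the question uses it for both attachments.

```swift
#if canImport(Metal)
import Metal

// Builds a descriptor for a depth/stencil render target whose pixel format
// matches the pipeline's depthAttachmentPixelFormat and
// stencilAttachmentPixelFormat (.depth32Float_stencil8 in the question).
func makeDepthTextureDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
    let descriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .depth32Float_stencil8,
        width: width,
        height: height,
        mipmapped: false)
    descriptor.storageMode = .private   // only the GPU reads and writes it
    descriptor.usage = .renderTarget    // it is attached to a render pass
    return descriptor
}
#endif
```

The texture itself would then come from device.makeTexture(descriptor:) and be assigned to both the depth and the stencil attachment of the render pass descriptor. With an MTKView, setting view.depthStencilPixelFormat to the same format is probably simpler, because the view then creates and resizes the depth texture for currentRenderPassDescriptor by itself.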
I'd like to use the o3d.PointCloud.create_from_depth_image function to convert a depth image into point cloud.
Open3D docs say the following: An Open3D Image can be directly converted to/from a numpy array.
I have a CVPixelBuffer coming from camera.
How can I create an o3d.geometry.Image from a pixel array without saving it to disk first?
Here's my code:
guard let cameraCalibrationData = frame.cameraCalibrationData else { return }
let frameIntrinsics = cameraCalibrationData.intrinsicMatrix
let referenceDimensions = cameraCalibrationData.intrinsicMatrixReferenceDimensions
let width = Float(referenceDimensions.width)
let height = Float(referenceDimensions.height)
let fx = frameIntrinsics.columns.0[0]
let fy = frameIntrinsics.columns.1[1]
let cx = frameIntrinsics.columns.2[0]
let cy = frameIntrinsics.columns.2[1]
let intrinsics = self.o3d.camera.PinholeCameraIntrinsic()
intrinsics.set_intrinsics(width, height, fx, fy, cx, cy)
// How can I convert the CVPixelBuffer depth map to an o3d.geometry.Image?
let depth : CVPixelBuffer = frame.depthDataMap
let depthImage = self.o3d.geometry.Image()
let cloud = self.o3d.geometry.PointCloud.create_from_depth_image(depthImage, intrinsics)
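One possible first step, sketched here under assumptions: copy the depth values out of the pixel buffer into a tightly packed [Float], which can then be turned into a numpy array on the Python side, given that an Open3D Image converts to and from numpy. The helper `packedDepthValues` is made up for this sketch. It assumes 32-bit float depth (kCVPixelFormatType_DepthFloat32) and 4-byte aligned rows. On device the base pointer and row stride would come from CVPixelBufferGetBaseAddress and CVPixelBufferGetBytesPerRow while the buffer is locked; here a plain array stands in for the buffer.

```swift
// Copies `height` rows of `width` Float32 values out of a buffer whose rows
// are `bytesPerRow` bytes apart, dropping any padding at the end of each row.
// Assumes the rows are 4-byte aligned.
func packedDepthValues(base: UnsafeRawPointer,
                       width: Int,
                       height: Int,
                       bytesPerRow: Int) -> [Float] {
    let bytesPerValue = MemoryLayout<Float>.stride
    precondition(bytesPerRow >= width * bytesPerValue, "row is too short")
    var values = [Float]()
    values.reserveCapacity(width * height)
    for row in 0..<height {
        let rowStart = base + row * bytesPerRow
        for col in 0..<width {
            values.append(rowStart.load(fromByteOffset: col * bytesPerValue, as: Float.self))
        }
    }
    return values
}

// Stand-in for a locked pixel buffer: 2 rows of 3 depth values,
// each row padded to 16 bytes (the -1 entries are padding).
let padded: [Float] = [1, 2, 3, -1,
                       4, 5, 6, -1]
let packed = padded.withUnsafeBytes { raw in
    packedDepthValues(base: raw.baseAddress!, width: 3, height: 2, bytesPerRow: 16)
}
print(packed) // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Dropping the row padding matters because a numpy array reshaped to (height, width) expects exactly width values per row, while a pixel buffer's bytesPerRow can be larger than width * 4.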
How can I export a collage video made from videos with different resolutions? I'm trying to achieve something like the first image below. I started from the AVCustomEdit demo. So far I create an AVMutableVideoComposition, pass all video track IDs to my custom video compositor class, get a CVPixelBuffer for every video, convert each one to an MTLTexture and render all the textures. The problem is that my output (destinationTexture) is square while the videos are portrait or landscape, so every video is squeezed. How can I rotate, scale, position and mask each video? And how can I apply CIFilters: should I convert every CVPixelBuffer to a CIImage and then convert the CIImage back to a CVPixelBuffer?
override func renderPixelBuffer(backgroundTexture: MTLTexture,
firstPixelBuffer: CVPixelBuffer,
secondPixelBuffer: CVPixelBuffer,
thirdPixelBuffer: CVPixelBuffer,
fourthPixelBuffer: CVPixelBuffer,
destinationPixelBuffer: CVPixelBuffer) {
// Create a MTLTexture from the CVPixelBuffer.
guard let firstTexture = buildTextureForPixelBuffer(firstPixelBuffer) else { return }
guard let secondTexture = buildTextureForPixelBuffer(secondPixelBuffer) else { return }
guard let thirdTexture = buildTextureForPixelBuffer(thirdPixelBuffer) else { return }
guard let fourthTexture = buildTextureForPixelBuffer(fourthPixelBuffer) else { return }
guard let destinationTexture = buildTextureForPixelBuffer(destinationPixelBuffer) else { return }
/*
We must maintain a reference to the pixel buffer until the Metal rendering is complete. This is because the
'buildTextureForPixelBuffer' function above uses CVMetalTextureCacheCreateTextureFromImage to create a
Metal texture (CVMetalTexture) from the IOSurface that backs the CVPixelBuffer, but
CVMetalTextureCacheCreateTextureFromImage doesn't increment the use count of the IOSurface; only the
CVPixelBuffer, and the CVMetalTexture own this IOSurface. Therefore we must maintain a reference to either
the pixel buffer or Metal texture until the Metal rendering is done. The MTLCommandBuffer completion
handler below is then used to release these references.
*/
pixelBuffers = RenderPixelBuffers(firstBuffer: firstPixelBuffer,
secondBuffer: secondPixelBuffer,
thirdBuffer: thirdPixelBuffer,
fourthBuffer: fourthPixelBuffer,
destinationBuffer: destinationPixelBuffer)
// Create a new command buffer for each renderpass to the current drawable.
let commandBuffer = commandQueue.makeCommandBuffer()!
commandBuffer.label = "MyCommand"
/*
Obtain a drawable texture for this render pass and set up the renderpass
descriptor for the command encoder to render into.
*/
let renderPassDescriptor = setupRenderPassDescriptorForTexture(destinationTexture)
// Create a render command encoder so we can render into something.
let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: renderPassDescriptor)!
renderEncoder.label = "MyRenderEncoder"
guard let renderPipelineState = renderPipelineState else { return }
modelConstants.modelViewMatrix = matrix_identity_float4x4
// Render background texture.
renderTexture(renderEncoder, texture: backgroundTexture, pipelineState: renderPipelineState)
var translationMatrix = matrix_float4x4(translation: simd_float3(-0.5, 0.5, 0))
// var rotationMatrix = matrix_float4x4(rotationZ: radians(fromDegrees: -90))
var scaleMatrix = matrix_float4x4(scaling: 0.25)
var modelMatrix = translationMatrix * scaleMatrix
modelConstants.modelViewMatrix = modelMatrix
// Render first texture.
renderTexture(renderEncoder, texture: firstTexture, pipelineState: renderPipelineState)
// translationMatrix = matrix_float4x4(translation: simd_float3(0.5, -0.5, 0))
// rotationMatrix = matrix_float4x4(rotationZ: radians(fromDegrees: -45))
// scaleMatrix = matrix_float4x4(scaling: 0.5)
// modelMatrix = translationMatrix * scaleMatrix * rotationMatrix
// modelConstants.modelViewMatrix = modelMatrix
// // Render second texture.
// renderTexture(renderEncoder, texture: secondTexture, pipelineState: renderPipelineState)
// // Render third texture.
// renderTexture(renderEncoder, texture: thirdTexture, pipelineState: renderPipelineState)
// // Render fourth texture.
// renderTexture(renderEncoder, texture: fourthTexture, pipelineState: renderPipelineState)
// We're done encoding commands.
renderEncoder.endEncoding()
// Use the command buffer completion block to release the reference to the pixel buffers.
commandBuffer.addCompletedHandler({ _ in
self.pixelBuffers = nil // Release the reference to the pixel buffers.
})
// Finalize rendering here & push the command buffer to the GPU.
commandBuffer.commit()
}
I would recommend using a library called MetalPetal, an image processing framework based on Metal. You convert each CVPixelBuffer into a MetalPetal image (MTIImage), and then you can do anything with it: apply the premade filters, use a CIFilter or your own custom filters, and transform, rotate and crop every frame so that the collage frames are accurate. Afterwards you convert the MTIImage back to a CVPixelBuffer. You could also go through CIImage here, but I guess it would be slower. As for the square, squeezed images, the render size is probably the cause, so please check the render size.
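Independently of the library used, the squeezing can be avoided by scaling each quad non-uniformly so that it keeps the aspect ratio of its source video. A minimal sketch, assuming each video is drawn on a quad that would otherwise fill its whole cell of the output; `aspectFitScale` is a made-up helper, not part of AVCustomEdit or MetalPetal:

```swift
// Per-axis scale factors that letterbox content with the source aspect ratio
// inside a cell of the output, instead of stretching it to fill the cell.
func aspectFitScale(sourceWidth: Float, sourceHeight: Float,
                    cellWidth: Float, cellHeight: Float) -> (x: Float, y: Float) {
    let sourceAspect = sourceWidth / sourceHeight
    let cellAspect = cellWidth / cellHeight
    if sourceAspect > cellAspect {
        // Source is wider than the cell: use the full width, shrink the height.
        return (1, cellAspect / sourceAspect)
    } else {
        // Source is taller than the cell: use the full height, shrink the width.
        return (sourceAspect / cellAspect, 1)
    }
}

// A landscape and a portrait video placed in square cells.
let landscape = aspectFitScale(sourceWidth: 1920, sourceHeight: 1080,
                               cellWidth: 1, cellHeight: 1)
let portrait = aspectFitScale(sourceWidth: 1080, sourceHeight: 1920,
                              cellWidth: 1, cellHeight: 1)
print(landscape, portrait)
```

These factors would be multiplied into the x and y scale of each video's model matrix, on top of the uniform cell scale such as the 0.25 used for the first texture. If the track has a preferred transform that rotates it by 90 degrees, width and height need to be swapped before calling the helper.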
I've made a custom CIFilter based on a custom kernel, but I can't make it work: the output image is filled with black and I can't understand why.
Here is the shader:
// MARK: Custom kernels
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    float dist = distance(color, palette_image.sample(float2(0,0)));
    float4 returnColor = palette_image.sample(float2(0,0));
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(color, palette_image.sample(float2(i,0)));
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette_image.sample(float2(i,0));
        }
    }
    return returnColor;
}
The first sampler is the image to be processed; the second is an image containing the colors of the specific palette that must be used for that image.
The palette image is created from an array of RGBA values, copied into a Data buffer and turned into an image with the CIImage initializer init(bitmapData data: Data, bytesPerRow: Int, size: CGSize, format: CIFormat, colorSpace: CGColorSpace?). The image is 1 px high and as wide as the number of colors. The image is created correctly and it looks like this:
Trying to inspect the shader, I've found:
If I return color I get the original image, which means that the sampler image is passed correctly.
If I try to return a color from any pixel in palette_image, the resulting image from the filter is black.
I'm starting to think that palette_image is somehow not passed correctly. Here is how the image is passed to the filter:
override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let palette = EightBitColorFilter.palettes[Int(0)]
    let paletteImage = EightBitColorFilter.image(from: palette)
    let extent = inputImage.extent
    let pixellateImage = inputImage.applyingFilter("CIPixellate", parameters: [kCIInputScaleKey: inputScale])
    // let sampler = CISampler(image: paletteImage)
    let arguments = [pixellateImage, paletteImage, Float(palette.count)] as [Any]
    let final = kernel.apply(extent: extent, roiCallback: {
        (index, rect) in
        return rect
    }, arguments: arguments)
    return final
}
Your sampling coordinates are off.
Samplers use relative coordinates in Core Image, i.e. (0,0) corresponds to the upper left corner, (1,1) the lower right corner of the whole input image.
So try something like this:
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    // initial offset to land in the middle of the first pixel
    float2 firstPaletteCoord = float2(1.0 / (2.0 * paletteSize), 0.5);
    float dist = distance(color, palette_image.sample(firstPaletteCoord));
    float4 returnColor = palette_image.sample(firstPaletteCoord);
    for (int i = 1; i < floor(paletteSize); ++i) {
        // step i pixels to the right of the first pixel
        float2 paletteCoord = firstPaletteCoord + float2(float(i) / paletteSize, 0.0);
        float4 paletteColor = palette_image.sample(paletteCoord);
        float tempDist = distance(color, paletteColor);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = paletteColor;
        }
    }
    return returnColor;
}
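To check the coordinate arithmetic and the expected result on the CPU, the same search can be sketched in plain Swift. The names `paletteCenterX` and `nearestPaletteIndex` and the four-color palette are made up for this illustration; the formula assumes a palette image that is 1 pixel high and is sampled with coordinates normalized to 0...1.

```swift
// Normalized x coordinate of the centre of pixel `index` in a palette image
// that is `paletteSize` pixels wide: (index + 0.5) / paletteSize.
func paletteCenterX(index: Int, paletteSize: Int) -> Float {
    return (Float(index) + 0.5) / Float(paletteSize)
}

// CPU reference for the kernel's loop: index of the palette entry whose
// RGBA value has the smallest Euclidean distance to `color`.
func nearestPaletteIndex(to color: SIMD4<Float>, in palette: [SIMD4<Float>]) -> Int {
    var bestIndex = 0
    var bestDistance = Float.greatestFiniteMagnitude
    for (index, entry) in palette.enumerated() {
        let delta = color - entry
        let distance = (delta * delta).sum().squareRoot()
        if distance < bestDistance {
            bestDistance = distance
            bestIndex = index
        }
    }
    return bestIndex
}

// Hypothetical 4-color palette: red, green, blue, black.
let palette: [SIMD4<Float>] = [
    SIMD4(1, 0, 0, 1),
    SIMD4(0, 1, 0, 1),
    SIMD4(0, 0, 1, 1),
    SIMD4(0, 0, 0, 1),
]
let centers = (0..<palette.count).map { paletteCenterX(index: $0, paletteSize: palette.count) }
let nearest = nearestPaletteIndex(to: SIMD4(0.9, 0.1, 0.1, 1), in: palette)
print(centers, nearest)
```

For four colors the centres come out as 0.125, 0.375, 0.625 and 0.875, which matches the half-pixel offset plus i / paletteSize used in the kernel. Comparing the kernel output for a few known input colors against `nearestPaletteIndex` is one way to confirm that the palette is sampled at the intended pixels.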
I'm applying a list of filters to detect a shape during camera capture. Once the shape is detected, I want to save the frame to Photos for review. I googled and found imageFromCurrentFramebuffer, but it always saves a black picture.
// camera init
var videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSessionPreset1920x1080, cameraPosition: .Back)
videoCamera!.outputImageOrientation = .Portrait;
// filter init
var houghTransformFilter = GPUImageHoughTransformLineDetector()
houghTransformFilter!.lineDetectionThreshold = 0.3
houghTransformFilter!.useNextFrameForImageCapture() //without this crashes
houghTransformFilter!.linesDetectedBlock = {
    // my custom shape detection logic
    if (found) {
        var capturedImage:UIImage? = self.houghTransformFilter!.imageFromCurrentFramebuffer()
        UIImageWriteToSavedPhotosAlbum(capturedImage, nil, nil, nil);
    }
}
I had a similar problem. You probably need to add this line before the call to grab the frame: