How to crop/resize texture array in Metal

Say I have a N-channel MPSImage or texture array that is based on MTLTexture.
How do I crop a region from it, copying all the N channels, but changing "pixel size"?

I'll just address the crop case in detail, since the resize case involves resampling and is a bit more complicated; there's a brief sketch of it right after the blit example below if you really need that.
Let's assume your source MPSImage is a 12 feature channel (3 slice) image that is 128x128 pixels, that your destination image is an 8 feature channel image (2 slices) that is 64x64 pixels, and that you want to copy the bottom-right 64x64 region of the last two slices of the source into the destination.
There is no API that I'm aware of that allows you to copy from/to multiple slices of an array texture at once, so you'll need to issue multiple blit commands to cover all the slices:
let sourceRegion = MTLRegionMake3D(64, 64, 0, 64, 64, 1)
let destOrigin = MTLOrigin(x: 0, y: 0, z: 0)
let firstSlice = 1
let lastSlice = 2 // inclusive

let commandBuffer = commandQueue.makeCommandBuffer()!
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!
for slice in firstSlice...lastSlice {
    blitEncoder.copy(from: sourceImage.texture,
                     sourceSlice: slice,
                     sourceLevel: 0,
                     sourceOrigin: sourceRegion.origin,
                     sourceSize: sourceRegion.size,
                     to: destImage.texture,
                     destinationSlice: slice - firstSlice,
                     destinationLevel: 0,
                     destinationOrigin: destOrigin)
}
blitEncoder.endEncoding()
commandBuffer.commit()
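If you do end up needing the resize case, here is a minimal sketch using MPSImageLanczosScale, assuming plain 2D textures; for a multi-slice MPSImage you would still have to deal with the slices yourself, so verify the behavior for your texture types:
import MetalPerformanceShaders

// Sketch: downscale a 128x128 source texture to 64x64 with Lanczos resampling.
// (`device`, `commandBuffer`, `sourceTexture`, and `destTexture` are assumed to exist.)
let scaler = MPSImageLanczosScale(device: device)
var transform = MPSScaleTransform(scaleX: 0.5, scaleY: 0.5,
                                  translateX: 0, translateY: 0)
withUnsafePointer(to: &transform) { ptr in
    scaler.scaleTransform = ptr
    scaler.encode(commandBuffer: commandBuffer,
                  sourceTexture: sourceTexture,
                  destinationTexture: destTexture)
}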

I'm not sure why you want to crop, but keep in mind that the MPSCNN layers can work on a smaller portion of your MPSImage: just set the offset and clipRect properties and the layer will only read from that region of the source image.
In fact, you could do your crops this way using an MPSCNNNeuronLinear. I'm not sure whether that is any faster or slower than using a blit encoder, but it's definitely simpler.
Edit: added a code example. This is typed from memory so it may have small errors, but this is the general idea:
// Declare this somewhere:
let linearNeuron = MPSCNNNeuronLinear(device: device, a: 1, b: 0)
Then when you run your neural network, add the following:
let yourImage: MPSImage = ...
let commandBuffer = ...
// This describes the size of the cropped image.
let imgDesc = MPSImageDescriptor(...)
// If you're going to use the cropped image in other layers
// then it's a good idea to make it a temporary image.
let tempImg = MPSTemporaryImage(commandBuffer: commandBuffer, imageDescriptor: imgDesc)
// Set the cropping offset:
linearNeuron.offset = MPSOffset(x: ..., y: ..., z: 0)
// The clip rect is the size of the output image.
linearNeuron.clipRect = MTLRegionMake2D(0, 0, imgDesc.width, imgDesc.height)
linearNeuron.encode(commandBuffer: commandBuffer, sourceImage: yourImage, destinationImage: tempImg)
// Here do your other layers, taking tempImg as input.
. . .
commandBuffer.commit()
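One caveat if you go the MPSTemporaryImage route: a temporary image's contents become undefined after it has been read readCount times, and readCount defaults to 1. If more than one later layer will consume the cropped image, say so up front (a sketch; the count of 2 is just an example):
// Hypothetical: two subsequent layers will read tempImg, so raise the
// read count; otherwise its contents become undefined after the first read.
tempImg.readCount = 2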

Related

3D viewer for iOS using MetalKit and Swift - Depth doesn’t work

I'm using Metal with Swift to build a 3D viewer for iOS, and I'm having trouble getting depth to work. So far, I can draw and render a single shape correctly in 3D (like a simple square plane (4 triangles, 2 for each face) or a tetrahedron (4 triangles)).
However, when I try to draw 2 shapes together, the depth between them doesn't work. For example, a plane is placed at Z = 0, behind a tetra placed at Z > 0. If I look at this scene from the back (camera placed somewhere at Z < 0), it's OK. But when I look at this scene from the front (camera placed somewhere at Z > 0), it doesn't work: the plane is drawn before the tetra even though it is placed behind the tetra.
I think the plane is always drawn on the screen before the tetra (no matter the position of the camera) because the call to drawPrimitives for the plane is done before the call for the tetra. However, I was expecting the depth and stencil settings to deal with that properly.
I don't know whether depth isn't working because the depth texture, stencil state, and so on are not correctly set, or because each shape is drawn in a different call to drawPrimitives.
In other words, do I have to draw all shapes in the same call to drawPrimitives to make depth work? The idea behind the multiple calls to drawPrimitives is to deal with a different primitive type for each shape (triangle or line or …).
This is how I set up the depth stencil state, the depth texture, and the render pipeline:
init() {
    // some miscellaneous initialisation …
    // …
    // all MTL stuff:
    commandQueue = device.makeCommandQueue()

    // Stencil descriptor
    let depthStencilDescriptor = MTLDepthStencilDescriptor()
    depthStencilDescriptor.depthCompareFunction = .less
    depthStencilDescriptor.isDepthWriteEnabled = true
    depthStencilState = device.makeDepthStencilState(descriptor: depthStencilDescriptor)!

    // Library and pipeline descriptor & state
    let library = try! device.makeLibrary(source: shaders, options: nil)
    // Our vertex function name
    let vertexFunction = library.makeFunction(name: "basic_vertex_function")
    // Our fragment function name
    let fragmentFunction = library.makeFunction(name: "basic_fragment_function")
    // Create basic descriptor
    let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
    // Attach the pixel format that is the same as the MetalView
    renderPipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    renderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float_stencil8
    renderPipelineDescriptor.stencilAttachmentPixelFormat = .depth32Float_stencil8
    //renderPipelineDescriptor.stencilAttachmentPixelFormat = .stencil8
    // Attach the shader functions
    renderPipelineDescriptor.vertexFunction = vertexFunction
    renderPipelineDescriptor.fragmentFunction = fragmentFunction
    // Try to update the state of the renderPipeline
    do {
        renderPipelineState = try device.makeRenderPipelineState(descriptor: renderPipelineDescriptor)
    } catch {
        print(error.localizedDescription)
    }

    // Depth Texture
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .stencil8, width: 576, height: 723, mipmapped: false)
    desc.storageMode = .private
    desc.usage = .pixelFormatView
    depthTexture = device.makeTexture(descriptor: desc)!

    // Uniforms buffer
    modelMatrix = Matrix4()
    modelMatrix.multiplyLeft(worldMatrix)
    uniformBuffer = device.makeBuffer(length: MemoryLayout<Float>.stride * 16 * 2, options: [])!
    let bufferPointer = uniformBuffer.contents()
    memcpy(bufferPointer, &modelMatrix.matrix.m, MemoryLayout<Float>.stride * 16)
    memcpy(bufferPointer + MemoryLayout<Float>.stride * 16, &projectionMatrix.matrix.m, MemoryLayout<Float>.stride * 16)
}
And the draw function:
func draw(in view: MTKView) {
    // create render pass descriptor
    guard let drawable = view.currentDrawable,
          let renderPassDescriptor = view.currentRenderPassDescriptor else {
        return
    }
    renderPassDescriptor.depthAttachment.texture = depthTexture
    renderPassDescriptor.depthAttachment.clearDepth = 1.0
    //renderPassDescriptor.depthAttachment.loadAction = .load
    renderPassDescriptor.depthAttachment.loadAction = .clear
    renderPassDescriptor.depthAttachment.storeAction = .store

    // Create a buffer from the commandQueue
    let commandBuffer = commandQueue.makeCommandBuffer()
    let commandEncoder = commandBuffer?.makeRenderCommandEncoder(descriptor: renderPassDescriptor)
    commandEncoder?.setRenderPipelineState(renderPipelineState)
    commandEncoder?.setFrontFacing(.counterClockwise)
    commandEncoder?.setCullMode(.back)
    commandEncoder?.setDepthStencilState(depthStencilState)

    // Draw all obj in objects
    // objects = array of Object; each object describing vertices and primitive type of a shape
    // objects[0] = Plane, objects[1] = Tetra
    for obj in objects {
        createVertexBuffers(device: view.device!, vertices: obj.vertices)
        commandEncoder?.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
        commandEncoder?.setVertexBuffer(uniformBuffer, offset: 0, index: 1)
        commandEncoder?.drawPrimitives(type: obj.primitive, vertexStart: 0, vertexCount: obj.vertices.count)
    }
    commandEncoder?.endEncoding()
    commandBuffer?.present(drawable)
    commandBuffer?.commit()
}
Does anyone have an idea of what is wrong or missing?
Any advice is welcome!
Edited 09/23/2022: Code updated
A few things off the top of my head:
First
let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .depth32Float_stencil8, width: 576, height: 723, mipmapped: false)
Second
renderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float_stencil8
Notice that the pixelFormat should be the same in both places, and since you seem to be using the stencil test as well, depth32Float_stencil8 will be perfect.
Third
Another thing you seem to be missing is clearing the depth texture before every render pass.
So, you should set the load action of the depth attachment to .clear, like this:
renderPassDescriptor.depthAttachment.loadAction = .clear
Fourth (depends on your use case)
If none of the above works, you might need to discard fragments with alpha = 0 in your fragment function by calling discard_fragment() when the color you are returning has an alpha of 0.
Also, a note for the future:
Ideally you want the depth texture to be fresh and empty when each new frame starts rendering (the first draw call of the frame), and then reused for subsequent draw calls by setting the load action to .load and the store action to .store. Note that load and store actions apply per render pass, so this pattern is for the case where each draw call is encoded in its own pass.
For example, assuming you have 3 draw calls, say drawing a triangle, a rectangle, and a sphere in one frame, your depth attachment setup should look like this (a sketch of the per-pass configuration follows the list):
Frame 1 starts:
First draw (triangle): loadAction: .clear, storeAction: .store
Second draw (rectangle): loadAction: .load, storeAction: .store
Third draw (sphere): loadAction: .load, storeAction: .store (or .dontCare)
Frame 2 starts (notice you clear the depth buffer for the first draw call of the new frame):
First draw (triangle): loadAction: .clear, storeAction: .store
Second draw (rectangle): loadAction: .load, storeAction: .store
Third draw (sphere): loadAction: .load, storeAction: .store (or .dontCare)
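A minimal sketch of that per-pass setup (renderPassDescriptor and depthTexture are the names from the question's code; isFirstPassOfFrame is a hypothetical flag you would track yourself):
// Hypothetical helper: configure the depth attachment for one render pass.
func configureDepthAttachment(_ renderPassDescriptor: MTLRenderPassDescriptor,
                              isFirstPassOfFrame: Bool) {
    renderPassDescriptor.depthAttachment.texture = depthTexture
    renderPassDescriptor.depthAttachment.clearDepth = 1.0
    // Clear at the start of the frame, then keep accumulating depth.
    renderPassDescriptor.depthAttachment.loadAction = isFirstPassOfFrame ? .clear : .load
    renderPassDescriptor.depthAttachment.storeAction = .store
}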
Your depth texture pixel format is not correct; try changing it to MTLPixelFormatDepth32Float or MTLPixelFormatDepth32Float_Stencil8.

Use output of a reduction CIFilter as color input for another filter

I'm new to CoreImage / Metal, so my apologies in advance if my question is naive. I spent a week going over the CoreImage documentation and examples and I couldn't figure this one out.
Suppose I have a reduction filter such as CIAreaAverage which outputs a 1x1 image. Is it possible to convert that image into a color that I can pass as an argument of another CIFilter? I'm aware that I can do this by rendering the CIAreaAverage output into a CVPixelBuffer, but I'm trying to do this in one render pass.
Edit #1 (Clarification):
Let's say I want to correct the white balance by letting the user sample a gray pixel from an image:
let pixelImage = inputImage.applyingFilter("CICrop", arguments: [
    "inputRectangle": CIVector(cgRect: CGRect(origin: pixelCoordinates, size: CGSize(width: 1, height: 1)))
])
// We know now that the extent of pixelImage is 1x1.
// Do something to convert the image into a pixel to be passed as an argument in the filter below.
let pixelColor = ???
let outputImage = inputImage.applyingFilter("CIWhitePointAdjust", arguments: [
    "inputColor": pixelColor
])
Is there a way to tell the CIContext to convert the 1x1 CIImage into a CIColor?
If you want to use the result of CIAreaAverage in a custom CIFilter (i.e., you don't need it for a CIColor parameter), you can pass it directly as a CIImage to that filter and read the value via a sampler in the kernel:
extern "C" float4 myKernel(sampler someOtherInput, sampler average) {
float4 avg = average.sample(float2(0.5, 0.5)); // average only contains one pixel, so always sample that
// ...
}
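On the Swift side, wiring that up could look something like this (a sketch; the metallib name, the variable names, and the forced error handling are assumptions):
// Load the compiled kernel and feed it both images. Input 1 is the 1x1
// average image; input 0 maps 1:1 onto the output.
let url = Bundle.main.url(forResource: "default", withExtension: "metallib")!
let data = try! Data(contentsOf: url)
let kernel = try! CIKernel(functionName: "myKernel", fromMetalLibraryData: data)
let output = kernel.apply(
    extent: someOtherInput.extent,
    roiCallback: { index, rect in
        index == 1 ? averageImage.extent : rect
    },
    arguments: [someOtherInput, averageImage])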
You can also call .clampedToExtent() on the average CIImage before you pass it to another filter/kernel. This will cause Core Image to treat the average image as if it were infinitely large, containing the same value everywhere. Then it doesn't matter at which coordinate you sample the value. This might be useful if you want to use the average value in a custom CIColorKernel.
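For example (a minimal sketch using the standard CIAreaAverage parameters):
// Average the whole image down to a single pixel, then clamp so the value
// can be sampled at any coordinate downstream.
let average = inputImage
    .applyingFilter("CIAreaAverage", parameters: [
        kCIInputExtentKey: CIVector(cgRect: inputImage.extent)
    ])
    .clampedToExtent()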
Something you can do that doesn't involve Metal is to use Core Image itself. Let's say you want a 640x640 image of the output from CIAreaAverage, called ciPixel (clamp it to its extent first, as described above, so the single pixel repeats across the crop):
let crop = CIFilter(name: "CICrop")
crop?.setValue(ciPixel, forKey: "inputImage")
crop?.setValue(CIVector(x: 0, y: 0, z: 640, w: 640), forKey: "inputRectangle")
ciOutput = crop?.outputImage

Unexpected behaviour with CIKernel

I made this example to show the problem. It takes one pixel from the texture at a hardcoded coordinate and uses it as the result for every pixel in the shader. I expect the whole image to come out in the same color. With small images it works perfectly, but with big images the result is strange. For example, here the image is 7680x8580 and you can see 4 squares:
Here is my code
kernel vec4 colorKernel(sampler source)
{
    vec4 key = sample(source, samplerTransform(source, vec2(100., 200.)));
    return key;
}
Here is how I invoke the kernel:
override var outputImage: CIImage? {
    return colorFillKernel.apply(
        extent: CGRect(origin: CGPoint.zero, size: inputImage!.extent.size),
        roiCallback: { (index, rect) in
            return rect
        },
        arguments: [inputImage])
}
Also, this code displays the image properly, with no changes or squares:
vec2 dc = destCoord();
return sample(source, samplerTransform(source, dc));
The public documentation says "Core Image automatically splits large images into smaller tiles for rendering, so your callback may be called multiple times.", but I can't find anything on how to handle these situations. I have kaleidoscopic effects, and from any given tile I need to be able to read pixels from other tiles as well...
I think the problem occurs due to a wrongly defined region of interest in combination with tiling.
In the roiCallback, Core Image is asking you which area of the input image (at index, in case you have multiple inputs) your kernel needs to look at in order to produce the given region (rect) of the output image. The reason this is a closure is tiling:
If the processed image is too large, Core Image breaks it down into multiple tiles, renders those tiles separately, and stitches them together again afterward. For each tile, Core Image asks you what part of the input image your kernel needs to read to produce it.
So for your input image, the roiCallback might be called something like four times (or even more) during rendering, for example with the following rectangles:
CGRect(x: 0, y: 0, width: 4096, height: 4096) // top left
CGRect(x: 4096, y: 0, width: 3584, height: 4096) // top right
CGRect(x: 0, y: 4096, width: 4096, height: 4484) // bottom left
CGRect(x: 4096, y: 4096, width: 3584, height: 4484) // bottom right
This is an optimization mechanism of Core Image: it only wants to read and process the pixels that are needed to produce a given region of the output. So it's best to adapt the ROI as closely as possible to your use case.
Now the ROI depends on the kernel. There are basically four scenarios:
1. Your kernel has a 1:1 mapping between input pixel and output pixel. So in order to produce an output color value, it needs to read the pixel at the same position from the input image. In this case, you just return the input rect in your roiCallback. (Or even better, you use a CIColorKernel, which is made for this use case.)
2. Your kernel performs some kind of convolution and requires not only the input pixel at the same coordinate as the output but also some region around it (for instance for a blur operation). Your roiCallback could look like this then:
let inset = self.radius // like the radius of CIGaussianBlur
let roiCallback: CIKernelROICallback = { _, rect in
    rect.insetBy(dx: -inset, dy: -inset)
}
3. Your kernel always needs to read a specific region of the input, regardless of which part of the output is rendered. Then you can just return that specific region in the callback:
let roiCallback: CIKernelROICallback = { _, _ in
    CGRect(x: 100, y: 200, width: 1, height: 1)
}
4. The kernel always needs access to the whole input image. This is for example the case when you use some kind of lookup table to derive colors. In this case, you can just return the extent of the input and ignore the parameters:
let roiCallback: CIKernelROICallback = { _, _ in inputImage.extent }
For your example, scenario 3 should be the right choice. For your kaleidoscopic effects, I assume you need a certain region of source pixels around the destination coordinate in order to produce an output pixel. In that case, it would be best to calculate the size of that region and use a roiCallback like in scenario 2.
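Applied to your example, the outputImage override would then look something like this (a sketch, assuming (100, 200) really is the only source pixel the kernel reads):
override var outputImage: CIImage? {
    guard let inputImage = inputImage else { return nil }
    return colorFillKernel.apply(
        extent: inputImage.extent,
        // Every output tile only needs the single hardcoded source pixel.
        roiCallback: { _, _ in CGRect(x: 100, y: 200, width: 1, height: 1) },
        arguments: [inputImage])
}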
P.S.: Using the Core Image Kernel Language (CIKernel(source: "<code>")) is super duper deprecated now. You should consider writing your kernels in the Metal Shading Language instead. Check out this year's WWDC talk to learn more. 🙂

MTKView frequently displaying scrambled MTLTextures

I am working on an MTKView-backed paint program which can replay painting history via an array of MTLTextures that store keyframes. I am having an issue in which sometimes the content of these MTLTextures is scrambled.
As an example, say I want to store a section of the drawing below as a keyframe:
During playback, sometimes the drawing will display exactly as intended, but sometimes, it will display like this:
Note the distorted portion of the picture. (The undistorted portion constitutes a static background image that's not part of the keyframe in question)
I describe the way I create individual MTLTextures from the MTKView's currentDrawable below. Because of color depth issues I won't go into, the process may seem a little roundabout.
I first get a CGImage of the subsection of the screen that constitutes a keyframe.
I use that CGImage to create an MTLTexture tied to the MTKView's device.
I store that MTLTexture in an mtlTextureStructure that stores the MTLTexture and the keyframe's bounding box (which I'll need later).
Lastly, I store that structure in an array of mtlTextureStructures (keyframeMetalArray). During playback, when I hit a keyframe, I get it from this keyframeMetalArray.
The associated code is outlined below.
let keyframeCGImage = weakSelf!.canvasMetalViewPainting.mtlTextureToCGImage(bbox: keyframeBbox, copyMode: copyTextureMode.textureKeyframe) // convert from MetalTexture to CGImage
let keyframeMTLTexture = weakSelf!.canvasMetalViewPainting.CGImageToMTLTexture(cgImage: keyframeCGImage)
let keyframeMTLTextureStruc = mtlTextureStructure(texture: keyframeMTLTexture, bbox: keyframeBbox, strokeType: brushTypeMode.brush)
weakSelf!.keyframeMetalArray.append(keyframeMTLTextureStruc)
Without providing specifics about how each conversion happens, I wonder if, from an architectural point of view, I'm overlooking something that is corrupting the data stored in keyframeMetalArray. It may be unwise to try to store these MTLTextures in volatile arrays, but I don't know that for a fact. I just figured using MTLTextures would be the quickest way to update content.
By the way, when I swap out the arrays of keyframes for arrays of UIImage.pngData, I have no display issues, but it's a lot slower. On the plus side, it tells me that the initial capture from currentDrawable to keyframeCGImage is working just fine.
Any thoughts would be appreciated.
p.s. adding a bit of detail based on the feedback:
mtlTextureToCGImage:
func mtlTextureToCGImage(bbox: CGRect, copyMode: copyTextureMode) -> CGImage {
    let kciOptions = [convertFromCIContextOption(CIContextOption.outputPremultiplied): true,
                      convertFromCIContextOption(CIContextOption.useSoftwareRenderer): false] as [String: Any]
    let bboxStrokeScaledFlippedY = CGRect(x: bbox.origin.x * self.viewContentScaleFactor,
                                          y: (self.viewBounds.height - bbox.origin.y - bbox.height) * self.viewContentScaleFactor,
                                          width: bbox.width * self.viewContentScaleFactor,
                                          height: bbox.height * self.viewContentScaleFactor)
    let strokeCIImage = CIImage(mtlTexture: metalDrawableTextureKeyframe,
                                options: convertToOptionalCIImageOptionDictionary(kciOptions))!
        .oriented(CGImagePropertyOrientation.downMirrored)
    let imageCropCG = cicontext.createCGImage(strokeCIImage,
                                              from: bboxStrokeScaledFlippedY,
                                              format: CIFormat.RGBA8,
                                              colorSpace: colorSpaceGenericRGBLinear)
    cicontext.clearCaches()
    return imageCropCG!
} // end of func mtlTextureToCGImage(bbox:copyMode:)
CGImageToMTLTexture:
func CGImageToMTLTexture(cgImage: CGImage) -> MTLTexture {
    // Note that we forego the more direct method of creating stampTexture:
    //   let stampTexture = try! MTKTextureLoader(device: self.device!).newTexture(cgImage: strokeUIImage.cgImage!, options: nil)
    // because MTKTextureLoader seems to be doing additional processing which messes
    // with the resulting texture/colorspace.
    let width = cgImage.width
    let height = cgImage.height
    let bytesPerPixel = 4
    let rowBytes = width * bytesPerPixel

    let texDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                                 width: width,
                                                                 height: height,
                                                                 mipmapped: false)
    texDescriptor.usage = .shaderRead
    texDescriptor.storageMode = .shared
    guard let stampTexture = device!.makeTexture(descriptor: texDescriptor) else {
        return brushTextureSquare // return SOMETHING
    }

    let dstData: CFData = (cgImage.dataProvider!.data)!
    let pixelData = CFDataGetBytePtr(dstData)
    let region = MTLRegionMake2D(0, 0, width, height)
    print("[MetalViewPainting]: w = \(width) | h = \(height) region = \(region.size)")
    stampTexture.replace(region: region, mipmapLevel: 0, withBytes: pixelData!, bytesPerRow: rowBytes)
    return stampTexture
} // end of func CGImageToMTLTexture(cgImage:)
The type of distortion looks like a bytes-per-row alignment issue between CGImage and MTLTexture. You're probably only seeing this issue when your image is a certain size that falls outside of the bytes-per-row alignment requirement of your MTLDevice. If you really need to store the texture as a CGImage, ensure that you are using the bytesPerRow value of the CGImage when copying back to the texture.
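If that is the cause here, a minimal fix in CGImageToMTLTexture would be to use the stride CoreGraphics actually allocated rather than computing it from the width (a sketch against the code above):
// Use the CGImage's actual row stride, which may include padding,
// instead of assuming width * bytesPerPixel.
let rowBytes = cgImage.bytesPerRow
stampTexture.replace(region: region, mipmapLevel: 0,
                     withBytes: pixelData!, bytesPerRow: rowBytes)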

Rotate my SceneKit material

I'm taking images with AVCapturePhotoOutput and then using their JPEG representation as the texture on a SceneKit SCNPlane that is the same aspect ratio as the image:
let image = UIImage(data: dataImage!)
let rectangle = SCNPlane(width:9, height:12)
let rectmaterial = SCNMaterial()
rectmaterial.diffuse.contents = image
rectmaterial.isDoubleSided = true
rectangle.materials = [rectmaterial]
let rectnode = SCNNode(geometry: rectangle)
let pos = sceneSpacePosition(inFrontOf: self.pictCamera, atDistance: 16.5) // 16.5 is arbitrary, but makes the rectangle the same size as the camera
rectnode.position = pos
rectnode.orientation = self.pictCamera.orientation
pictView.scene?.rootNode.addChildNode(rectnode)
sceneSpacePosition is a bit of code that can be found here on SO that maps CoreMotion into SceneKit orientation. It is used to place the rectangle, which does indeed appear at the right location with the right size. All very cool.
The problem is that the image is rotated 90 degrees to the rectangle. So I did the obvious:
rectmaterial.diffuse.contentsTransform = SCNMatrix4MakeRotation(Float.pi / 2, 0, 0, 1)
This does not work properly; the resulting image is unrecognizable. It appears that one small part of the image has been stretched to a huge size. I thought it might be the axis, but I tried all three with the same result.
Any ideas?
You are rotating around the upper-left corner, as suggested by Alain T. Since contentsTransform operates on unit texture coordinates, that rotation swings the image out of the visible square.
If you move your image down first, you may get the rotation you were expecting.
Try this:
let translation = SCNMatrix4MakeTranslation(0, -1, 0)
let rotation = SCNMatrix4MakeRotation(Float.pi / 2, 0, 0, 1)
let transform = SCNMatrix4Mult(translation, rotation)
rectmaterial.diffuse.contentsTransform = transform
