Metal Shading Language for Core Image color kernel, how to pass an array of float3 - iOS

I'm trying to port some CIFilters from this source using the Metal Shading Language for Core Image.
I have a color palette, composed of an array of RGB structs, that I want to pass as an argument to a custom CI color image kernel.
The RGB structs are converted into an array of SIMD3<Float>:
static func SIMD3Palette(_ palette: [RGB]) -> [SIMD3<Float>] {
    return palette.map { $0.toFloat3() }
}
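Here, toFloat3() is just the per-color conversion; a minimal sketch, assuming the RGB struct stores its channels as Floats:

extension RGB {
    // Hypothetical conversion used by SIMD3Palette above;
    // field names r, g, b are assumptions.
    func toFloat3() -> SIMD3<Float> {
        return SIMD3<Float>(r, g, b)
    }
}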
The kernel should take an array of simd_float3 values. The problem is that when I launch the filter, it tells me that the argument at index 1 is expecting an NSData.
override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let palette = EightBitColorFilter.palettes[Int(inputPaletteIndex)]
    let extent = inputImage.extent
    let arguments = [inputImage, palette, Float(palette.count)] as [Any]
    let final = colorKernel.apply(extent: extent, arguments: arguments)
    return final
}
This is the kernel:
float4 eight_bit(sample_t image, simd_float3 palette[], float paletteSize, destination dest) {
    float dist = distance(image.rgb, palette[0]);
    float3 returnColor = palette[0];
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(image.rgb, palette[i]);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette[i];
        }
    }
    return float4(returnColor, 1);
}
I'm wondering how I can pass a data buffer to the kernel, since converting it into an NSData doesn't seem to be enough. I saw some examples, but they use the "full" shading language, which is not available for Core Image; Core Image only offers a sort of subset that deals with fragments.

Update
We have now figured out how to pass data buffers directly into Core Image kernels. Using a CIImage as described below is not needed, but still possible.
Assuming that you have your raw data as an NSData, you can just pass it to the kernel on invocation:
kernel.apply(..., arguments: [data, ...])
Note: Data might also work, but I know that NSData is an argument type that allows Core Image to cache filter results based on input arguments. So when in doubt, better to cast to NSData.
Then in the kernel function, you only need to declare the parameter with an appropriate constant type:
extern "C" float4 myKernel(constant float3 data[], ...) {
float3 data0 = data[0];
// ...
}
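For example, the palette from the question could be packed and passed like this (a sketch; SIMD3Palette is the helper shown in the question, and palette, extent, inputImage, and colorKernel are assumed to exist as above):

let simd3Palette = EightBitColorFilter.SIMD3Palette(palette)
// Data(buffer:) copies count * stride bytes; SIMD3<Float> has a
// 16-byte stride, which matches float3 on the Metal side.
let paletteData = simd3Palette.withUnsafeBufferPointer { Data(buffer: $0) }
let final = colorKernel.apply(extent: extent,
                              arguments: [inputImage, paletteData as NSData, Float(palette.count)])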
Previous Answer
Core Image kernels don't seem to support pointer or array parameter types, though there seems to be something coming with iOS 13. From the release notes:
Metal CIKernel instances support arguments with arbitrarily structured data.
But, as so often with Core Image, there seems to be no further documentation for that…
However, you can still use the "old way" of passing buffer data by wrapping it in a CIImage and sampling it in the kernel. For example:
let array: [Float] = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
let data = array.withUnsafeBufferPointer { Data(buffer: $0) }
let dataImage = CIImage(bitmapData: data, bytesPerRow: data.count, size: CGSize(width: array.count/4, height: 1), format: .RGBAf, colorSpace: nil)
Note that there is no CIFormat for 3-channel images, since GPUs don't support those. So you either have to use the single-channel .Rf format and re-pack the values into a float3 inside your kernel, or add some padding to your data and use .RGBAf with float4 (which I'd recommend, since it reduces texture fetches).
When you pass that image into your kernel, you probably want to set the sampling mode to nearest, otherwise you might get interpolated values when sampling between two pixels:
kernel.apply(..., arguments: [dataImage.samplingNearest(), ...])
In your (Metal) kernel, you can access the data as you would a normal input image, via a sampler:
extern "C" float4 myKernel(coreimage::sampler data, ...) {
float4 data0 = data.sample(data.transform(float2(0.5, 0.5))); // data[0]
float4 data1 = data.sample(data.transform(float2(1.5, 0.5))); // data[1]
// ...
}
Note that I added 0.5 to the coordinates so that they point in the middle of a pixel in the data image to avoid ambiguity and interpolation.
Also note that pixel values you get from a sampler always have 4 channels. So even when you create your data image with format .Rf, you'll get a float4 when sampling it (the other values are filled with 0.0 for G and B and 1.0 for alpha). In this case, you can just do
float data0 = data.sample(data.transform(float2(0.5, 0.5))).x;
Edit
I previously forgot to transform the sample coordinate from absolute pixel space (where (0.5, 0.5) would be the middle of the first pixel) to relative sampler space (where (0.5, 0.5) would be the middle of the whole buffer). It's fixed now.

I made it. Even if the answer was good and also deploys to lower targets, the result wasn't exactly what I was expecting: the differences between the original kernel, written as a string, and the above method of creating an image to be used as a data source were kind of big.
I didn't figure out exactly why, but the image I was passing as the palette source differed from the created one in size and color (probably due to color spaces).
Since there was no documentation about this statement:
Metal CIKernel instances support arguments with arbitrarily structured data.
I tried a lot in my spare time and came up with this.
First, the shader:
float4 eight_bit_buffer(sampler image, constant simd_float3 palette[], float paletteSize, destination dest) {
    float4 color = image.sample(image.transform(dest.coord()));
    float dist = distance(color.rgb, palette[0]);
    float3 returnColor = palette[0];
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(color.rgb, palette[i]);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette[i];
        }
    }
    return float4(returnColor, 1);
}
Second, the palette transformation into SIMD3<Float>:
static func toSIMD3Buffer(from palette: [RGB]) -> Data {
    var simd3Palette = SIMD3Palette(palette)
    let palettePointer = UnsafeMutableRawPointer.allocate(
        byteCount: simd3Palette.count * MemoryLayout<SIMD3<Float>>.stride,
        alignment: MemoryLayout<SIMD3<Float>>.alignment)
    let simd3Pointer = simd3Palette.withUnsafeMutableBufferPointer { buffer -> UnsafeMutablePointer<SIMD3<Float>> in
        palettePointer.initializeMemory(as: SIMD3<Float>.self,
                                        from: buffer.baseAddress!,
                                        count: buffer.count)
    }
    // Total byte count is element count * stride (16 bytes per SIMD3<Float>).
    let data = Data(bytesNoCopy: simd3Pointer,
                    count: simd3Palette.count * MemoryLayout<SIMD3<Float>>.stride,
                    deallocator: .free)
    return data
}
The first time, I tried appending the SIMD3 values to the Data object directly, but that didn't work, probably due to memory alignment.
Remember to deallocate the memory you created once you're done with it.
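A usage sketch for completeness (extent, inputImage, palette, and the kernel instance are assumed to exist as in the question; the cast to NSData follows the advice from the answer above):

let paletteData = EightBitColorFilter.toSIMD3Buffer(from: palette) as NSData
let final = kernel.apply(extent: extent,
                         roiCallback: { _, rect in rect },
                         arguments: [inputImage, paletteData, Float(palette.count)])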
Hope this helps someone else.

Related

Metal Core Image Kernel use of DOD

I wrote the following Metal Core Image kernel to produce a constant red color.
extern "C" float4 redKernel(coreimage::sampler inputImage, coreimage::destination dest)
{
return float4(1.0, 0.0, 0.0, 1.0);
}
And then I have this in my Swift code:
class CIMetalRedColorKernel: CIFilter {
    var inputImage: CIImage?

    static var kernel: CIKernel = { () -> CIKernel in
        let bundle = Bundle.main
        let url = bundle.url(forResource: "Kernels", withExtension: "ci.metallib")!
        let data = try! Data(contentsOf: url)
        return try! CIKernel(functionName: "redKernel", fromMetalLibraryData: data)
    }()

    override var outputImage: CIImage? {
        guard let inputImage = inputImage else {
            return nil
        }
        let dod = inputImage.extent
        return CIMetalRedColorKernel.kernel.apply(extent: dod, roiCallback: { index, rect in
            return rect
        }, arguments: [inputImage])
    }
}
As you can see, the dod is given as the extent of the input image. But when I run the filter, I get a whole red image beyond the extent of the input image (the DOD). Why? I have multiple filters chained together, and the overall size is 1920x1080. Isn't the red filter supposed to run only for the DOD rectangle passed to it and produce clear pixels for anything outside the DOD?
With the extent parameter of the kernel call you signal the region for which the kernel produces meaningful results—or, as you correctly named it, the domain of definition.
However, this also means that whatever it produces outside this region is basically undefined and up to you as the kernel developer to decide.
A generator kernel like the one you wrote usually has an infinite domain of definition since it just produces a red color, regardless of the input. To restrict the output to a specific area, you can apply a crop to it:
let dod = inputImage.extent
let result = CIMetalRedColorKernel.kernel.apply(extent: .infinite, roiCallback: { index, rect in
    return rect
}, arguments: [inputImage])
return result?.cropped(to: dod)
After the cropping, everything outside of dod will be transparent.
Update:
It turns out you have to set the extent parameter of the kernel call to .infinite to make this work. I suspect that cropped(to:) checks if the image already has the given extent and will do nothing in this case. So to make CI really apply the cropping, you have to specify the domain of definition your kernel actually produces.
I think the counter-intuitive thing here is that CI does not apply your kernel to just the pixels of the extent you specify. It seems there is some automatic clamp-to-extent going on when the result is not cropped properly, but honestly, I'm also rather confused by this...
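Putting the pieces together, the filter's outputImage from the question would become something like this (a sketch combining the question's code with the crop):

override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let dod = inputImage.extent
    // Declare the generator's true (infinite) domain of definition,
    // then crop the result down to the input's extent.
    let result = CIMetalRedColorKernel.kernel.apply(extent: .infinite,
                                                    roiCallback: { _, rect in rect },
                                                    arguments: [inputImage])
    return result?.cropped(to: dod)
}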

Swift: EXC_BAD_ACCESS when accessing an external library

I'm using an external library written in C that applies filters to an image. It receives the original image pixels as an array of float values and writes the new image's float values to another array. One specific filter creates a mask to be used by a sharpen filter, and I don't know why, but it only works with smaller images; bigger images (a million pixels, more or less) cause the application to crash with an EXC_BAD_ACCESS error right after the wrapper that calls the external lib function executes. Is there anything wrong with my code, which creates the parameters that will be passed to the external lib, or is the problem more likely in the external library?
func allocateMaskArgs() { // allocates the mask parameters in memory, to be used by the sharpen filter
    let size = originalImageMatrix.params[0] * originalImageMatrix.params[1] // image height multiplied by width
    if maskBuffer != nil {
        self.maskBuffer.deallocate()
    }
    maskBuffer = UnsafeMutablePointer<UnsafeMutablePointer<Float>?>.allocate(capacity: 2)

    let constantPointer = UnsafeMutablePointer<Float>.allocate(capacity: 1)
    constantPointer.pointee = 4.0 // the intensity value of the mask; it should always be 4
    maskBuffer.advanced(by: 0).pointee = constantPointer

    // this is where the mask created by createMask() should be stored by the external lib function
    let maskArrayPointer = UnsafeMutablePointer<Float>.allocate(capacity: size)
    maskBuffer.advanced(by: 1).pointee = maskArrayPointer
}
func createMask() { // creates the sharpen mask and stores it in maskBuffer
    var input_params: [Int] = [self.originalImageMatrix.params[0], self.originalImageMatrix.params[1]]
    var output_params: [Int] = [self.newImageMatrix.params[0], self.newImageMatrix.params[1]]
    self.imagingAPI.applyFilters(self.originalImageMatrix.v!,
                                 input_params: &input_params,
                                 output_image: self.newImageMatrix.v!,
                                 output_params: &output_params,
                                 filter_id: 11,
                                 args: self.maskBuffer)
}
The external library function is accessed through this wrapper function:
- (void)applyFilters:(float *)input_image
        input_params:(long *)input_params
        output_image:(float *)output_image
       output_params:(long *)output_params
           filter_id:(int)filter_id
                args:(float **)args;

Core image filter with custom metal kernel doesn't work

I've made a custom CIFilter based on a custom kernel. I can't make it work: the output image is filled with black, and I can't understand why.
Here is the shader:
// MARK: Custom kernels
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    float dist = distance(color, palette_image.sample(float2(0, 0)));
    float4 returnColor = palette_image.sample(float2(0, 0));
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(color, palette_image.sample(float2(i, 0)));
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette_image.sample(float2(i, 0));
        }
    }
    return returnColor;
}
The first sampler is the image that needs to be processed; the second is an image that contains the colors of a specific palette that must be used in that image.
The palette image is created from an array of RGBA values, copied into a Data buffer, and built using the CIImage initializer init(bitmapData: Data, bytesPerRow: Int, size: CGSize, format: CIFormat, colorSpace: CGColorSpace?). The image is 1 px high and as many pixels wide as the number of colors. The image is obtained correctly.
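For reference, the image(from:) helper probably looks something like this (a sketch; the exact RGB field names are assumptions):

static func image(from palette: [RGB]) -> CIImage {
    // One RGBAf pixel per palette entry, in a bitmap 1 px high.
    let floats: [Float] = palette.flatMap { [$0.r, $0.g, $0.b, 1.0] }
    let data = floats.withUnsafeBufferPointer { Data(buffer: $0) }
    return CIImage(bitmapData: data,
                   bytesPerRow: data.count,
                   size: CGSize(width: palette.count, height: 1),
                   format: .RGBAf,
                   colorSpace: nil)
}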
Trying to inspect the shader, I've found:
- If I return color, I get the original image, which means the image sampler is passed correctly.
- If I try to return a color from any pixel in palette_image, the resulting image from the filter is black.
I'm starting to think that the palette_image is somehow not passed correctly. Here is how the image is passed through the filter:
override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let palette = EightBitColorFilter.palettes[Int(0)]
    let paletteImage = EightBitColorFilter.image(from: palette)
    let extent = inputImage.extent
    let pixellateImage = inputImage.applyingFilter("CIPixellate", parameters: [kCIInputScaleKey: inputScale])
    // let sampler = CISampler(image: paletteImage)
    let arguments = [pixellateImage, paletteImage, Float(palette.count)] as [Any]
    let final = kernel.apply(extent: extent, roiCallback: { (index, rect) in
        return rect
    }, arguments: arguments)
    return final
}
Your sampling coordinates are off.
Samplers use relative coordinates in Core Image, i.e. (0,0) corresponds to the upper left corner, (1,1) the lower right corner of the whole input image.
So try something like this:
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    // initial offset to land in the middle of the first pixel
    float2 firstPaletteCoord = float2(1.0 / (2.0 * paletteSize), 0.5);
    float dist = distance(color, palette_image.sample(firstPaletteCoord));
    float4 returnColor = palette_image.sample(firstPaletteCoord);
    for (int i = 1; i < floor(paletteSize); ++i) {
        // step i pixels further
        float2 paletteCoord = firstPaletteCoord + float2(float(i) / paletteSize, 0.0);
        float4 paletteColor = palette_image.sample(paletteCoord);
        float tempDist = distance(color, paletteColor);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = paletteColor;
        }
    }
    return returnColor;
}
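On the Swift side, it may also be worth disabling interpolation when passing the palette image, so that samples can't blend two neighboring palette entries (a sketch of the changed argument line from the question):

let arguments = [pixellateImage, paletteImage.samplingNearest(), Float(palette.count)] as [Any]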

Metal compute shader with 1D data buffer in and out?

I understand it is possible to pass a 1D array buffer to a Metal shader, but is it possible to have it output to a 1D array buffer? I don't want it to write to a texture; I just need an array of processed values.
I can get values out of the shader with the following code, but only one value at a time. Ideally I could get a whole array out (in the same order as the input 1D array buffer).
Any examples or pointers would be greatly appreciated!
var resultdata = [Float](repeating: 0, count: 3)
let outVectorBuffer = device.makeBuffer(bytes: &resultdata, length: MemoryLayout<float3>.size, options: [])
commandEncoder!.setBuffer(outVectorBuffer, offset: 0, index: 6)

commandBuffer!.addCompletedHandler { commandBuffer in
    let data = NSData(bytes: outVectorBuffer!.contents(), length: MemoryLayout<float3>.size)
    var out = float3(0, 0, 0)
    data.getBytes(&out, length: MemoryLayout<float3>.size)
    print("data: \(out)")
}
// In the shader:
kernel void compute1d(...
                      device float3 &outBuffer [[buffer(6)]])
{
    outBuffer = float3(1.0, 2.0, 3.0);
}
Two things:
You need to create the buffer large enough to hold however many float3 elements you want. You really need to use .stride and not .size when calculating the buffer size, though. In particular, float3 has 16-byte alignment, so there's padding between elements in an array. So, you would use something like MemoryLayout<float3>.stride * desiredNumberOfElements.
Then, in the shader, you need to change the declaration of outBuffer from a reference to a pointer. So, device float3 *outBuffer [[buffer(6)]]. Then you can index into it to access the elements (e.g. outBuffer[2] = ...;).
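A minimal sketch of the Swift side under those assumptions (device, commandEncoder, and commandBuffer as in the question; the element count is just an example):

let count = 1024 // example element count
// Use .stride (16 bytes for SIMD3<Float>/float3), not .size,
// so the per-element alignment padding is accounted for.
let length = MemoryLayout<SIMD3<Float>>.stride * count
let outVectorBuffer = device.makeBuffer(length: length, options: .storageModeShared)!
commandEncoder.setBuffer(outVectorBuffer, offset: 0, index: 6)
// ... dispatch the compute kernel ...
commandBuffer.addCompletedHandler { _ in
    // Read the whole array back in one go, in input order.
    let pointer = outVectorBuffer.contents().bindMemory(to: SIMD3<Float>.self, capacity: count)
    let results = Array(UnsafeBufferPointer(start: pointer, count: count))
    print("first element: \(results[0])")
}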

how to describe packed_float3 in Metal vertex shader MTLVertexAttributeDescriptor?

I am passing an array of structs to my Metal shader vertex function. The struct looks like this:
struct Vertex {
    var x, y, z: Float    // position data
    var r, g, b, a: Float // color data
    var s, t: Float       // texture coordinates
    var nX, nY, nZ: Float // normal

    func floatBuffer() -> [Float] {
        return [x, y, z, r, g, b, a, s, t, nX, nY, nZ]
    }
}
The floatBuffer function is used to assemble the vertices into one big array of Floats. I am able to pass this into my shader function by using a struct definition with "packed" data types, like this:
struct VertexIn {
    packed_float3 position;
    packed_float4 color;
    packed_float2 texCoord;
    packed_float3 normal;
};

vertex VertexOut basic_vertex(
    const device VertexIn* vertex_array [[ buffer(0) ]],
    ...
This is how I'm trying to define it now, and I'm getting the garbage polygons. I got an error that packed_float3 is not valid for attributes, so I was trying to figure out how to make regular float3, float4, etc. work.
struct VertexIn {
    float3 position [[attribute(RayVertexAttributePosition)]];
    float4 color    [[attribute(RayVertexAttributeColor)]];
    float2 texCoord [[attribute(RayVertexAttributeTexCoord)]];
    float3 normal   [[attribute(RayVertexAttributeNormal)]];
};
class func buildMetalVertexDescriptor() -> MTLVertexDescriptor {
    let mtlVertexDescriptor = MTLVertexDescriptor()
    var offset = 0

    mtlVertexDescriptor.attributes[RayVertexAttribute.position.rawValue].format = MTLVertexFormat.float3
    mtlVertexDescriptor.attributes[RayVertexAttribute.position.rawValue].offset = offset
    mtlVertexDescriptor.attributes[RayVertexAttribute.position.rawValue].bufferIndex = RayBufferIndex.positions.rawValue
    offset += 3 * MemoryLayout<Float>.stride

    mtlVertexDescriptor.attributes[RayVertexAttribute.color.rawValue].format = MTLVertexFormat.float4
    mtlVertexDescriptor.attributes[RayVertexAttribute.color.rawValue].offset = offset
    mtlVertexDescriptor.attributes[RayVertexAttribute.color.rawValue].bufferIndex = RayBufferIndex.positions.rawValue
    offset += MemoryLayout<float4>.stride

    mtlVertexDescriptor.attributes[RayVertexAttribute.texCoord.rawValue].format = MTLVertexFormat.float2
    mtlVertexDescriptor.attributes[RayVertexAttribute.texCoord.rawValue].offset = offset
    mtlVertexDescriptor.attributes[RayVertexAttribute.texCoord.rawValue].bufferIndex = RayBufferIndex.positions.rawValue
    offset += MemoryLayout<float2>.stride

    mtlVertexDescriptor.attributes[RayVertexAttribute.normal.rawValue].format = MTLVertexFormat.float3
    mtlVertexDescriptor.attributes[RayVertexAttribute.normal.rawValue].offset = offset
    mtlVertexDescriptor.attributes[RayVertexAttribute.normal.rawValue].bufferIndex = RayBufferIndex.positions.rawValue
    offset += 3 * MemoryLayout<Float>.stride

    print("stride \(offset)")

    mtlVertexDescriptor.layouts[RayBufferIndex.positions.rawValue].stride = offset
    mtlVertexDescriptor.layouts[RayBufferIndex.positions.rawValue].stepRate = 1
    mtlVertexDescriptor.layouts[RayBufferIndex.positions.rawValue].stepFunction = MTLVertexStepFunction.perVertex
    return mtlVertexDescriptor
}
Notice that I specify the first attribute as a float3, but I specify an offset of 3 floats instead of the 4 that a float3 would normally use. But apparently that isn't enough. I'm wondering how to set up an MTLVertexDescriptor and the shader struct with attributes so that it handles the "packed" data from my structs?
Thanks very much.
The key is in this part of your question: "Notice that I specify the first attribute as a float3, but I specify an offset of 3 floats instead of the 4 that a float3 would normally use".
The SIMD float3 type takes up 16 bytes; it has the same memory layout as the non-packed Metal float3 type. So when you set the offset to only 3 * MemoryLayout<Float>.stride, you are missing the last 4 bytes, which are still present, causing the next field to pull from those extra bytes and the rest of the data to be offset.
To really use packed types to transfer data to Metal (or any graphics API), you either have to stick with what you were doing before and specify x, y, z as three separate Floats in an array, or define your own struct like this:
struct Vector3 {
    var x: Float
    var y: Float
    var z: Float
}
Swift doesn't make any guarantees that this struct will be three Floats packed closely together, but for now and the foreseeable future it works, and it will be 12 bytes in size on most platforms.
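You can check the layouts directly, for example in a playground; a quick sketch:

struct Vector3 {
    var x: Float
    var y: Float
    var z: Float
}

MemoryLayout<Vector3>.stride      // 12: three tightly packed Floats
MemoryLayout<SIMD3<Float>>.stride // 16: padded like Metal's float3
MemoryLayout<SIMD3<Float>>.size   // 16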
If you want to be able to do vector operations on a struct like this then I would suggest looking for a library that defines types like these to save yourself some time as you will run into the same types of problems with 3x3 matrices also.
I ran into the same problems so I ended up rolling my own:
https://github.com/jkolb/Swiftish
