Metal kernel -- 24-bit chicanery

Below is my kernel. It works wonderfully if both the input and output buffers contain 32-bit RGBA pixel data. I've made this kernel slightly inefficient to show Metal's seeming ineptitude in dealing with 24-bit data.
(I previously had this working with the input and output buffers declared as containing uint32_t data.)
kernel void stripe_Kernel(device const uchar *inBuffer [[ buffer(0) ]],
                          device uchar4 *outBuffer [[ buffer(1) ]],
                          device const ushort *imgWidth [[ buffer(2) ]],
                          device const ushort *imgHeight [[ buffer(3) ]],
                          device const ushort *packWidth [[ buffer(4) ]],
                          uint2 gid [[ thread_position_in_grid ]])
{
    const ushort imgW = imgWidth[0];
    const ushort imgH = imgHeight[0];
    const ushort packW = packWidth[0]; // eg. 2048
    uint32_t posX = gid.x; // eg. 0...2047
    uint32_t posY = gid.y; // eg. 0...895
    uint32_t sourceX = ((int)(posY/imgH)*packW + posX) % imgW;
    uint32_t sourceY = (int)(posY%imgH);
    uint32_t ptr = (sourceY*imgW + sourceX)*4; // this is for 32-bit data
    uchar4 pixel = uchar4(inBuffer[ptr],inBuffer[ptr+1],inBuffer[ptr+2],255);
    outBuffer[posY*packW + posX] = pixel;
}
I should mention that the inBuffer has been allocated as follows:
    unsigned char *diskFrame;
    posix_memalign((void **)&diskFrame, 0x4000, imgHeight*imgWidth*4);
Now... if I actually have 24-bit data in there, and use multipliers of 3 (wherever I have 4), I get an entirely black image.
What's with that?
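One way to narrow this down is to check the byte addressing on the CPU before involving the GPU. The following is a minimal sketch in plain C++ (not Metal) that mirrors the kernel's index arithmetic with the source stride as a parameter. The names stripeCPU, packH and bytesPerPixel are introduced here for illustration only, and the source is assumed to be tightly packed with no row padding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference for the stripe kernel's addressing (hypothetical helper, not Metal).
// Source: tightly packed pixels, bytesPerPixel bytes each (3 for RGB, 4 for RGBA).
// Destination: packW x packH RGBA pixels, 4 bytes each, alpha forced to 255.
std::vector<uint8_t> stripeCPU(const std::vector<uint8_t>& src,
                               uint32_t imgW, uint32_t imgH,
                               uint32_t packW, uint32_t packH,
                               uint32_t bytesPerPixel)
{
    assert(bytesPerPixel == 3 || bytesPerPixel == 4);
    assert(src.size() >= static_cast<size_t>(imgW) * imgH * bytesPerPixel);

    std::vector<uint8_t> dst(static_cast<size_t>(packW) * packH * 4);
    for (uint32_t posY = 0; posY < packH; ++posY) {
        for (uint32_t posX = 0; posX < packW; ++posX) {
            // Same mapping as the kernel: each band of imgH rows continues
            // where the previous band stopped, wrapping at imgW.
            uint32_t sourceX = ((posY / imgH) * packW + posX) % imgW;
            uint32_t sourceY = posY % imgH;
            size_t in  = (static_cast<size_t>(sourceY) * imgW + sourceX) * bytesPerPixel;
            size_t out = (static_cast<size_t>(posY) * packW + posX) * 4;
            dst[out + 0] = src[in + 0];
            dst[out + 1] = src[in + 1];
            dst[out + 2] = src[in + 2];
            dst[out + 3] = 255;  // opaque alpha, as in the kernel
        }
    }
    return dst;
}
```

If this reference produces the expected image from the same 24-bit data, the index math is sound and the difference is more likely to lie in how the buffers are created or bound on the host side.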

Related

iOS MetalKit: Loop through array in MSL

This may be a silly question, but I can't find a good way to loop through an array; currently I have to pass a buffer containing the element count to my kernel function.
kernel void test_func(constant const int2* array [[ buffer(0) ]],
                      constant const int& arrayCount [[ buffer(1) ]],
                      device half4* result [[ buffer(2) ]],
                      uint2 pos [[thread_position_in_grid]]) {
    // some code to end early if pos is outside of my data
    for(ulong i = 0; i < sizeof(array) / sizeof(int2) /*(ulong) arrayCount*/; i += 1 ) {
        // do something
    }
}
Computing the count with sizeof always yields incorrect results, whereas using the count buffer returns correct results. It also seems that MSL doesn't support the C++11 range-based for loop.
There should be a better way to do this, right?
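The sizeof result is wrong because array is a pointer parameter: sizeof measures the pointer itself, not the buffer it points to, so the loop bound is a compile-time constant unrelated to the element count. A small sketch of the same effect in plain C++ (not Metal); int2 here is a stand-in struct, and both function names are made up for illustration:

```cpp
#include <cassert>
#include <cstddef>

struct int2 { int x, y; };  // stand-in for Metal's int2

// What the loop bound in the question computes: the size of a pointer divided
// by the size of one element. The buffer length never enters into it.
std::size_t countViaSizeof(const int2* array)
{
    std::size_t pointerBytes = sizeof(array);  // size of the pointer, not the data
    return pointerBytes / sizeof(int2);
}

// The working approach: the caller supplies the element count explicitly.
long sumX(const int2* array, std::size_t count)
{
    assert(array != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += array[i].x;
    }
    return total;
}
```

A buffer bound to a kernel carries no length information that the shader can query through the pointer, so passing the count separately, as the commented-out arrayCount does, is a reasonable approach rather than a workaround.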

Is there an iOS Metal value for bt601?

I have sample metal code that I'm trying to convert to iOS. Is there an iOS compatible value that I can use for bt601?
#include <metal_stdlib>
#include "utilities.h" // error not found
using namespace metal;
kernel void laplace(texture2d<half, access::read> inTexture [[ texture(0) ]],
                    texture2d<half, access::read_write> outTexture [[ texture(1) ]],
                    uint2 gid [[ thread_position_in_grid ]]) {
    constexpr int kernel_size = 3;
    constexpr int radius = kernel_size / 2;
    half3x3 laplace_kernel = half3x3(0, 1, 0,
                                     1, -4, 1,
                                     0, 1, 0);
    half4 acc_color(0, 0, 0, 0);
    for (int j = 0; j <= kernel_size - 1; j++) {
        for (int i = 0; i <= kernel_size - 1; i++) {
            uint2 textureIndex(gid.x + (i - radius), gid.y + (j - radius));
            acc_color += laplace_kernel[i][j] * inTexture.read(textureIndex).rgba;
        }
    }
    half value = dot(acc_color.rgb, bt601); //bt601 not defined
    half4 gray_color(value, value, value, 1.0);
    outTexture.write(gray_color, gid);
}
It seems that the intention here is simply to derive a single "luminance" value from the RGB output of the kernel. In that case, bt601 would be a three-element vector whose components are the desired weights of the respective channels, summing to 1.0.
Borrowing values from Rec. 601, we might define it like this:
float3 bt601(0.299f, 0.587f, 0.114f);
This is certainly a common choice. Another popular choice uses coefficients found in the Rec. 709 standard. That would look like this:
float3 bt709(0.212671f, 0.715160f, 0.072169f);
Both of these vectors will give you a single gray value that approximates the brightness of a linear sRGB color. Whether either of them is "correct" depends on the provenance of your data and how you process it further down the pipeline.
For whatever it's worth, the MetalPerformanceShaders MPSImageThresholdBinary kernel seems to favor the BT.601 values.
I'd recommend taking a look at this answer for more detail on the issues, and conditions under which the use of these values is appropriate.
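As a quick check of the weights quoted above, here is a minimal sketch in plain C++ (not Metal) of the weighted sum that dot(rgb, weights) performs. Float3 and luminance are illustrative names, not part of any Metal header.

```cpp
#include <array>
#include <cassert>

using Float3 = std::array<float, 3>;  // stand-in for Metal's float3

// Weights quoted in the answer above.
const Float3 kBT601 = {0.299f, 0.587f, 0.114f};
const Float3 kBT709 = {0.212671f, 0.715160f, 0.072169f};

// Weighted sum of the channels, i.e. dot(rgb, weights).
float luminance(const Float3& rgb, const Float3& weights)
{
    return rgb[0] * weights[0] + rgb[1] * weights[1] + rgb[2] * weights[2];
}
```

Because each set of weights sums to 1.0, pure white maps to a luminance of 1.0 under either standard, while the two differ in how much each individual channel contributes.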

I want to implement instance normalization

I am writing Metal CNN code.
Metal provides MPSCNNLocalContrastNormalization, but since the concept of instance normalization is slightly different, I intend to implement it as a kernel function.
The problem is that I need the mean and variance of each of the R, G and B channels, where the features are the R, G and B channels of the input texture passed to the kernel function.
I would appreciate some hints on how to implement this.
kernel void instance_normalization_2darray(texture2d_array<float, access::sample> src [[ texture(0) ]],
                                           texture2d_array<float, access::write> dst [[ texture(1) ]],
                                           uint3 tid [[thread_position_in_grid]]) {
}
kernel void calculate_avgA(texture2d_array<float, access::read> texture_in [[texture(0)]],
                           texture2d_array<float, access::write> texture_out [[texture(1)]],
                           uint3 tid [[thread_position_in_grid]])
{
    int width = texture_in.get_width();
    int height = texture_in.get_height();
    int depth = texture_in.get_array_size();
    float4 outColor;
    uint3 kernelIndex(0,0,0);
    uint3 textureIndex(0,0,0);
    for(int k = 0; k < depth; k++) {
        // float4(0.0) zeroes all four components; (0.0, 0.0, 0.0, 0.0) is a comma expression
        outColor = float4(0.0);
        for (int i=0; i < width; i++)
        {
            for (int j=0; j < height; j++)
            {
                kernelIndex = uint3(i, j, k);
                textureIndex = uint3(tid.x + i, tid.y + j, tid.z + k);
                float4 color = texture_in.read(textureIndex.xy, textureIndex.z).rgba;
                outColor += color;
            }
        }
        outColor = outColor / (width * height);
        texture_out.write(float4(outColor.rgba), tid.xy, textureIndex.z);
    }
}
Mr.Bista
I had the same problem. Apple doesn't provide a fast built-in function for this.
For now I use MPSCNNPoolingAverage to calculate the mean before my kernels run.
It may only be a temporary approach.
In my tests, other algorithms, such as a reduction sum, were no better than this.
I will keep looking for a better implementation.
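For checking whatever GPU approach is used, the per-channel statistics can be computed on the CPU first. The following is a minimal sketch in plain C++ (not Metal), assuming interleaved float pixels; the names ChannelStats, channelStats and normalize, and the default epsilon of 1e-5, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ChannelStats { float mean; float variance; };

// Mean and population variance of one channel of an interleaved image.
// pixels holds pixelCount * channels floats, e.g. RGBA -> channels = 4.
ChannelStats channelStats(const std::vector<float>& pixels,
                          std::size_t pixelCount, std::size_t channels,
                          std::size_t channel)
{
    assert(channel < channels && pixelCount > 0);
    assert(pixels.size() >= pixelCount * channels);

    double sum = 0.0, sumSq = 0.0;
    for (std::size_t p = 0; p < pixelCount; ++p) {
        double v = pixels[p * channels + channel];
        sum += v;
        sumSq += v * v;
    }
    double mean = sum / pixelCount;
    double variance = sumSq / pixelCount - mean * mean;  // E[x^2] - E[x]^2
    return { static_cast<float>(mean), static_cast<float>(variance) };
}

// Instance normalization of one value: (x - mean) / sqrt(variance + epsilon).
float normalize(float x, ChannelStats s, float epsilon = 1e-5f)
{
    return (x - s.mean) / std::sqrt(s.variance + epsilon);
}
```

Accumulating the sum and the sum of squares in a single pass is the same pair of reductions a GPU version would need, so an average-pooling pass over the input and over its square could in principle supply both values.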

write method of texture2d<int, access::write> does not work in Metal shader function

According to Apple's documentation, a texture2d in the shading language can have an int element type. I tried to use a texture2d of int as a shader function parameter, but the texture's write method did not work.
kernel void dummy(texture2d<int, access::write> outTexture [[ texture(0) ]],
                  uint2 gid [[ thread_position_in_grid ]])
{
    outTexture.write( int4( 2, 4, 6, 8 ), gid );
}
However, if I replace the int with float, it worked.
kernel void dummy(texture2d<float, access::write> outTexture [[ texture(0) ]],
                  uint2 gid [[ thread_position_in_grid ]])
{
    outTexture.write( float4( 1.0, 0, 0, 1.0 ), gid );
}
Can other types of texture2d, such as texture2d of int, texture2d of short and so on, be used as shader function parameters, and if so, how? Thanks for reviewing my question.
The related host code:
MTLTextureDescriptor *desc = [MTLTextureDescriptor texture2DDescriptorWithPixelFormat:MTLPixelFormatRGBA8Unorm
desc.usage = MTLTextureUsageShaderWrite;
id<MTLTexture> texture = [device newTextureWithDescriptor:desc];
[commandEncoder setTexture:texture atIndex:0];
The code that prints the output computed by the GPU; w and h are the width and height of the texture, respectively.
uint8_t* imageBytes = malloc(w*h*4);
memset( imageBytes, 0, w*h*4 );
MTLRegion region = MTLRegionMake2D(0, 0, [texture width], [texture height]);
[texture getBytes:imageBytes bytesPerRow:[texture width]*4 fromRegion:region mipmapLevel:0];
for( int j = 0; j < h; j++ )
{
    printf("%3d: ", j);
    for( int i = 0; i < w*pixel_size; i++ )
    {
        printf(" %3d",imageBytes[j*w*pixel_size+i] );
    }
    printf("\n");
}
The problem is that the pixel format you used to create this texture (MTLPixelFormatRGBA8Unorm) is normalized, meaning that the expected pixel value range is 0.0-1.0. For normalized pixel types, the required data type for reading or writing to this texture within a Metal kernel is float or half-float.
In order to write to a texture with integers, you must select an integer pixel format. Here are all of the available formats:
https://developer.apple.com/documentation/metal/mtlpixelformat
The Metal Shading Language Guide states that:
Note: If T is int or short, the data associated with the texture must use a signed integer format. If T is uint or ushort, the data associated with the texture must use an unsigned integer format.
All you have to do is make sure the texture you write to in the API (host code) matches what you have in the kernel function. Alternatively, you can also cast the int values into float before writing to the outTexture.
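To illustrate what "normalized" means for a format like RGBA8Unorm, here is a sketch in plain C++ (not Metal) of how a float in the range 0.0-1.0 relates to the stored 8-bit value. The helper names are invented, and the rounding shown (round to nearest) is an assumption for illustration; the exact conversion rules are given in the Metal Shading Language specification.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Float written by the shader -> byte stored in an 8-bit unorm channel.
// Values outside [0, 1] are clamped before scaling.
uint8_t floatToUnorm8(float value)
{
    assert(!std::isnan(value));
    float clamped = std::min(std::max(value, 0.0f), 1.0f);
    return static_cast<uint8_t>(std::lround(clamped * 255.0f));
}

// Byte stored in the texture -> float seen by the shader when reading.
float unorm8ToFloat(uint8_t stored)
{
    return static_cast<float>(stored) / 255.0f;
}
```

Under this model, the float4(1.0, 0, 0, 1.0) written by the working kernel would show up as the bytes 255, 0, 0, 255 in the printed dump.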

Compiler error when trying to add constant float3x3 to shader file

I am trying to add this code to my Metal language file:
constant float3x3 rgb2xyz(
    float3(0.412453f, 0.212671f, 0.019334f),
    float3(0.357580f, 0.715160f, 0.119193f),
    float3(0.180423f, 0.072169f, 0.950227f)
);
or this:
constant float3x3 rgb2xyz = float3x3(
    float3(0.412453f, 0.212671f, 0.019334f),
    float3(0.357580f, 0.715160f, 0.119193f),
    float3(0.180423f, 0.072169f, 0.950227f)
);
The metal compiler gives me the following error:
No matching constructor for initialization of 'const constant float3x3' (aka 'const constant matrix<float, 3, 3>')
However if I do
typedef struct {
    float3x3 matrix;
    float3 offset;
    float zoom;
} Conversion;
constant Conversion colorConversion = {
    .matrix = float3x3(
        float3 ( 1.164f, 1.164f, 1.164f ),
        float3 ( 0.000f, -0.392f, 2.017f ),
        float3 ( 1.596f, -0.813f, 0.000f )
    ),
    .offset = float3 ( -(16.0f/255.0f), -0.5f, -0.5f )
};
I don't get any compile error.
Any ideas what is going wrong? It also works without problems with vector types:
constant float3 bgr2xyzCol1(0.357580f, 0.715160f, 0.119193f);
What would be a good way to define a constant matrix directly in the code?
You should pass it in as a constant reference; see WWDC session 604.
For example, see matrices in the following code; TransformMatrices is a custom data structure in this case.
vertex VertexOutput my_vertex(const global float3* position_data [[ buffer(0) ]],
                              const global float3* normal_data [[ buffer(1) ]],
                              constant TransformMatrices& matrices [[ buffer(2) ]],
                              uint vid [[ vertex_id ]])
{
    VertexOutput out;
    float3 n_d = normal_data[vid];
    float3 transformed_normal = matrices.normal_matrix * n_d;
    float4 p_d = float4(position_data[vid], 1.0f);
    out.position = * p_d;
    float4 eye_vector = * p_d;
    ...
    return out;
}
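For reference, the float3x3 constructor used in the question takes its three float3 arguments as columns, so multiplying the matrix by a vector is a linear combination of those columns. A minimal sketch of that behavior in plain C++ (not Metal), using the rgb2xyz values from the question; Float3, Float3x3 and multiply are illustrative stand-ins rather than Metal types:

```cpp
#include <array>
#include <cassert>

using Float3 = std::array<float, 3>;  // stand-in for Metal's float3

// Column-major 3x3 matrix: columns[c][r], matching float3x3(col0, col1, col2).
struct Float3x3 { std::array<Float3, 3> columns; };

// The rgb2xyz matrix from the question, one column per input channel.
const Float3x3 kRgb2Xyz = {{{
    {0.412453f, 0.212671f, 0.019334f},
    {0.357580f, 0.715160f, 0.119193f},
    {0.180423f, 0.072169f, 0.950227f}
}}};

// matrix * vector: each input component scales the corresponding column.
Float3 multiply(const Float3x3& m, const Float3& v)
{
    Float3 result = {0.0f, 0.0f, 0.0f};
    for (int c = 0; c < 3; ++c) {
        for (int r = 0; r < 3; ++r) {
            result[r] += m.columns[c][r] * v[c];
        }
    }
    return result;
}
```

With these values, RGB white (1, 1, 1) maps to roughly (0.9505, 1.0, 1.0888), which gives a simple sanity check once the matrix reaches the shader by whichever route.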
