I need to overlay the edges detected in a live video preview with a color of my choice (as is done in the Lightroom CC app when you adjust focus). What's the easiest way to draw those lines in real time using Metal or Core Image? I can use Sobel edge detection to find the edges with a Metal Performance Shader, but I'm not sure how to overlay the edges with a color of my choice.
Here is an edge-detection shader for Metal:
kernel void edge_detect(texture2d<half, access::read> inTexture [[ texture(0) ]],
                        texture2d<half, access::write> outTexture [[ texture(1) ]],
                        uint2 gid [[ thread_position_in_grid ]]) {
    constexpr int kernel_size = 3;
    constexpr int radius = kernel_size / 2;

    half3x3 horizontal_kernel = half3x3(-1./8., -1./8., -1./8.,
                                        -1./8.,     1., -1./8.,
                                        -1./8., -1./8., -1./8.);
    half3x3 vertical_kernel = half3x3(-1./8., -1./8., -1./8.,
                                      -1./8.,     1., -1./8.,
                                      -1./8., -1./8., -1./8.);

    half3 result_horizontal(0, 0, 0);
    half3 result_vertical(0, 0, 0);
    for (int j = 0; j <= kernel_size - 1; j++) {
        for (int i = 0; i <= kernel_size - 1; i++) {
            uint2 texture_index(gid.x + (i - radius), gid.y + (j - radius));
            result_horizontal += horizontal_kernel[i][j] * inTexture.read(texture_index).rgb;
            result_vertical   += vertical_kernel[i][j] * inTexture.read(texture_index).rgb;
        }
    }

    // Convert the filtered RGB results to luma (Rec. 601 weights) and take the gradient magnitude.
    half3 bt601 = half3(0.299, 0.587, 0.114);
    half gray_horizontal = dot(result_horizontal.rgb, bt601);
    half gray_vertical = dot(result_vertical.rgb, bt601);
    half magnitude = length(half2(gray_horizontal, gray_vertical));

    outTexture.write(half4(half3(magnitude), 1), gid);
}
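If it helps, here is a minimal host-side dispatch sketch for a kernel like this, written in Swift. Names such as device, commandQueue, inTexture, and outTexture are placeholders for objects your capture or render pipeline already owns; this is only one way to encode the pass, not the only one.

import Metal

// Build the compute pipeline once, up front.
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "edge_detect")!)

// Per frame: encode one compute pass over the camera texture.
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setTexture(inTexture, index: 0)
encoder.setTexture(outTexture, index: 1)

// One thread per pixel; dispatchThreads requires non-uniform threadgroup support (A11 and later).
let w = pipeline.threadExecutionWidth
let h = pipeline.maxTotalThreadsPerThreadgroup / w
encoder.dispatchThreads(MTLSize(width: inTexture.width, height: inTexture.height, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: w, height: h, depth: 1))
encoder.endEncoding()
commandBuffer.commit()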
I know this is late, but if anyone still needs it, I figured it out just now. It's very easy to find an edge-detection shader, but not easy to figure out how to change the color of the detected edges, especially if you are new to this. Here is my kernel:
typedef struct {
    simd_float3 rgb;
} AppliedColor;

kernel void edgeEffect(texture2d<half, access::read> inputTexture [[ texture(0) ]],
                       texture2d<half, access::read_write> outputTexture [[ texture(1) ]],
                       constant float &edgeStrength [[ buffer(0) ]],
                       constant AppliedColor &newColor [[ buffer(1) ]],
                       uint2 gid [[ thread_position_in_grid ]]) {
    constexpr int kernelSize = 3;
    constexpr int radius = kernelSize / 2;

    // Sobel kernels for the two derivative directions.
    half3x3 horizontalKernel = half3x3(-1, -2, -1,
                                        0,  0,  0,
                                        1,  2,  1);
    half3x3 verticalKernel = half3x3(1, 0, -1,
                                     2, 0, -2,
                                     1, 0, -1);

    half3 horizontalResult(0, 0, 0);
    half3 verticalResult(0, 0, 0);
    for (int j = 0; j <= kernelSize - 1; j++) {
        for (int i = 0; i <= kernelSize - 1; i++) {
            uint2 textureIndex(gid.x + (i - radius), gid.y + (j - radius));
            horizontalResult += horizontalKernel[i][j] * inputTexture.read(textureIndex).rgb;
            verticalResult   += verticalKernel[i][j] * inputTexture.read(textureIndex).rgb;
        }
    }

    half horizontalWhite = dot(horizontalResult.rgb, half3(1.0));
    half verticalWhite = dot(verticalResult.rgb, half3(1.0));

    // Gradient magnitude scaled by the user-supplied strength, then tinted with the chosen color.
    half magnitude = length(half2(horizontalWhite, verticalWhite)) * edgeStrength;
    outputTexture.write(half4(half3(newColor.rgb * magnitude), 1), gid);
} // edgeEffect
This is using Sobel kernels to calculate the derivatives.
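On the CPU side, the only extra work compared to a plain edge-detect dispatch is binding edgeStrength and the color. A minimal Swift sketch, assuming a compute encoder set up the same way as in the dispatch sketch above; the struct simply mirrors the shader's AppliedColor, and simd_float3 keeps the layout consistent with the Metal side:

import simd

// Swift-side mirror of the shader's AppliedColor struct.
struct AppliedColor {
    var rgb: simd_float3
}

// Inside the compute pass, after setting the pipeline state and textures:
var edgeStrength: Float = 1.0
var edgeColor = AppliedColor(rgb: simd_float3(0, 1, 0)) // e.g. green focus-peaking lines
encoder.setBytes(&edgeStrength, length: MemoryLayout<Float>.stride, index: 0)
encoder.setBytes(&edgeColor, length: MemoryLayout<AppliedColor>.stride, index: 1)

Note that outputTexture has to be created with a usage that includes both .shaderRead and .shaderWrite, since the kernel declares it as access::read_write.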
I'm trying to create deferred screen-space decal rendering in Metal by following this article, but I can't seem to figure it out...
These are the bounds of the decal...
Actual result...
Potential issue
So apparently it doesn't think that the decal is intersecting the mesh. I'm sampling the depth value correctly, but when calculating the actual position of the pixel in 3D space, something doesn't add up.
Code
vertex VertexOut vertex_decal(const VertexIn in [[ stage_in ]],
                              constant DecalVertexUniforms &uniforms [[ buffer(2) ]]) {
    VertexOut out;
    out.position = uniforms.projectionMatrix * uniforms.viewMatrix * uniforms.modelMatrix * in.position;
    out.viewPosition = (uniforms.viewMatrix * uniforms.modelMatrix * in.position).xyz;
    out.normal = uniforms.normalMatrix * in.normal;
    out.uv = in.uv;
    return out;
}
fragment float4 fragment_decal(const VertexOut in [[ stage_in ]],
                               constant DecalFragmentUniforms &uniforms [[ buffer(3) ]],
                               depth2d<float, access::sample> depthTexture [[ texture(0) ]]) {
    constexpr sampler textureSampler (mag_filter::nearest, min_filter::nearest);

    float2 resolution = float2(depthTexture.get_width(),
                               depthTexture.get_height());
    float2 textureCoordinate = in.position.xy / resolution;
    float depth = depthTexture.sample(textureSampler, textureCoordinate);

    float3 viewRay = in.viewPosition * (uniforms.farClipPlane / in.viewPosition.z);
    float3 viewPosition = viewRay * depth;
    float3 worldPosition = (uniforms.inverseViewMatrix * float4(viewPosition, 1)).xyz;
    float3 objectPosition = (uniforms.inverseModelMatrix * float4(worldPosition, 1)).xyz;

    float distX = 0.5 - abs(objectPosition.x);
    float distY = 0.5 - abs(objectPosition.y);
    float distZ = 0.5 - abs(objectPosition.z);

    if (distX > 0 && distY > 0 && distZ > 0) {
        return float4(1, 0, 0, 0.5);
    } else {
        discard_fragment();
    }
}
EDIT:
Made a bit of progress, now it at least renders something. It clips the decal box correctly once it's outside of a mesh, but the parts on the mesh are still not completely correct: to be exact, it also renders the sides of the box that overlap the mesh under the decal (you can see it in the image below, where the red is a bit darker).
To add more detail, the depthTexture is passed in from the previous "pass", so it only contains the icosphere, and the decal cube shader doesn't write to the depthTexture, it only reads from it.
and the depth-stencil state is defined as...
let stencilDescriptor = MTLDepthStencilDescriptor()
stencilDescriptor.depthCompareFunction = .less
stencilDescriptor.isDepthWriteEnabled = false
and the render pipeline is defined as...
let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
renderPipelineDescriptor.vertexDescriptor = vertexDescriptor
renderPipelineDescriptor.vertexFunction = vertexLibrary.makeFunction(name: "vertex_decal")
renderPipelineDescriptor.fragmentFunction = fragmentLibrary.makeFunction(name: "fragment_decal")

if let colorAttachment = renderPipelineDescriptor.colorAttachments[0] {
    colorAttachment.pixelFormat = .bgra8Unorm
    colorAttachment.isBlendingEnabled = true
    colorAttachment.rgbBlendOperation = .add
    colorAttachment.sourceRGBBlendFactor = .sourceAlpha
    colorAttachment.destinationRGBBlendFactor = .oneMinusSourceAlpha
}
renderPipelineDescriptor.colorAttachments[1].pixelFormat = .bgra8Unorm
renderPipelineDescriptor.depthAttachmentPixelFormat = .depth32Float
So the current issue is that it discards only the pixels that are outside the mesh it's being projected onto, instead of all pixels that are "above" the surface of the icosphere.
New Shader Code
fragment float4 fragment_decal(const VertexOut in [[ stage_in ]],
                               constant DecalFragmentUniforms &uniforms [[ buffer(3) ]],
                               depth2d<float, access::sample> depthTexture [[ texture(0) ]]) {
    constexpr sampler textureSampler (mag_filter::nearest, min_filter::nearest);

    float2 resolution = float2(depthTexture.get_width(),
                               depthTexture.get_height());
    float2 textureCoordinate = in.position.xy / resolution;
    float depth = depthTexture.sample(textureSampler, textureCoordinate);

    float3 screenPosition = float3(textureCoordinate * 2 - 1, depth);
    float4 viewPosition = uniforms.inverseProjectionMatrix * float4(screenPosition, 1);
    float4 worldPosition = uniforms.inverseViewMatrix * viewPosition;
    float3 objectPosition = (uniforms.inverseModelMatrix * worldPosition).xyz;

    if (abs(worldPosition.x) > 0.5 || abs(worldPosition.y) > 0.5 || abs(worldPosition.z) > 0.5) {
        discard_fragment();
    } else {
        return float4(1, 0, 0, 0.5);
    }
}
Finally managed to get it to work properly, so the final shader code is...
The issues the previous shader had were:
Flipped Y axis on screenPosition
Missing perspective divide: objectPosition has to be divided by its w component to get localPosition
fragment float4 fragment_decal(const VertexOut in [[ stage_in ]],
                               constant DecalFragmentUniforms &uniforms [[ buffer(3) ]],
                               depth2d<float, access::sample> depthTexture [[ texture(0) ]],
                               texture2d<float, access::sample> colorTexture [[ texture(1) ]]) {
    constexpr sampler depthSampler (mag_filter::linear, min_filter::linear);

    float2 resolution = float2(depthTexture.get_width(),
                               depthTexture.get_height());
    float2 depthCoordinate = in.position.xy / resolution;
    float depth = depthTexture.sample(depthSampler, depthCoordinate);

    // Fix 1: flip the Y axis when reconstructing the clip-space position from the screen coordinate.
    float3 screenPosition = float3((depthCoordinate.x * 2 - 1), -(depthCoordinate.y * 2 - 1), depth);
    float4 viewPosition = uniforms.inverseProjectionMatrix * float4(screenPosition, 1);
    float4 worldPosition = uniforms.inverseViewMatrix * viewPosition;
    float4 objectPosition = uniforms.inverseModelMatrix * worldPosition;

    // Fix 2: perspective divide by w to get the position in the decal box's local space.
    float3 localPosition = objectPosition.xyz / objectPosition.w;

    if (abs(localPosition.x) > 0.5 || abs(localPosition.y) > 0.5 || abs(localPosition.z) > 0.5) {
        discard_fragment();
    } else {
        float2 textureCoordinate = localPosition.xy + 0.5;
        float4 color = colorTexture.sample(depthSampler, textureCoordinate);
        return float4(color.rgb, 1);
    }
}
The final result looks like this (red pixels are kept, blue pixels are discarded)...
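For completeness, the fragment uniforms this shader reads are just the inverses of the matrices used in vertex_decal. Below is a rough Swift-side sketch, assuming DecalFragmentUniforms only needs those three matrices and that renderEncoder, projectionMatrix, viewMatrix, and modelMatrix already exist; your actual uniforms struct may carry more fields.

import simd

// Hypothetical Swift-side mirror of DecalFragmentUniforms; field names follow the shader above.
struct DecalFragmentUniforms {
    var inverseProjectionMatrix: simd_float4x4
    var inverseViewMatrix: simd_float4x4
    var inverseModelMatrix: simd_float4x4
}

// Per frame, using the same matrices the vertex stage was given:
var decalUniforms = DecalFragmentUniforms(
    inverseProjectionMatrix: projectionMatrix.inverse,
    inverseViewMatrix: viewMatrix.inverse,
    inverseModelMatrix: modelMatrix.inverse
)
renderEncoder.setFragmentBytes(&decalUniforms,
                               length: MemoryLayout<DecalFragmentUniforms>.stride,
                               index: 3)

The index 3 matches the [[ buffer(3) ]] binding declared in the fragment shader.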
I have sample Metal code that I'm trying to convert to iOS. Is there an iOS-compatible value that I can use for bt601?
#include <metal_stdlib>
#include "utilities.h" // error: not found
using namespace metal;

kernel void laplace(texture2d<half, access::read> inTexture [[ texture(0) ]],
                    texture2d<half, access::read_write> outTexture [[ texture(1) ]],
                    uint2 gid [[ thread_position_in_grid ]]) {
    constexpr int kernel_size = 3;
    constexpr int radius = kernel_size / 2;

    half3x3 laplace_kernel = half3x3(0,  1, 0,
                                     1, -4, 1,
                                     0,  1, 0);

    half4 acc_color(0, 0, 0, 0);
    for (int j = 0; j <= kernel_size - 1; j++) {
        for (int i = 0; i <= kernel_size - 1; i++) {
            uint2 textureIndex(gid.x + (i - radius), gid.y + (j - radius));
            acc_color += laplace_kernel[i][j] * inTexture.read(textureIndex).rgba;
        }
    }

    half value = dot(acc_color.rgb, bt601); // error: bt601 not defined
    half4 gray_color(value, value, value, 1.0);
    outTexture.write(gray_color, gid);
}
It seems that the intention here is simply to derive a single "luminance" value from the RGB output of the kernel. In that case, bt601 would be a three-element vector whose components are the desired weights of the respective channels, summing to 1.0.
Borrowing values from Rec. 601, we might define it like this:
float3 bt601(0.299f, 0.587f, 0.114f);
This is certainly a common choice. Another popular choice uses coefficients found in the Rec. 709 standard. That would look like this:
float3 bt709(0.212671f, 0.715160f, 0.072169f);
Both of these vectors will give you a single gray value that approximates the brightness of a linear sRGB color. Whether either of them is "correct" depends on the provenance of your data and how you process it further down the pipeline.
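As a quick worked example, with the Rec. 601 weights a pure green pixel (0, 1, 0) maps to a gray value of 0.587, while a pure blue pixel (0, 0, 1) maps to only 0.114, reflecting how much brighter green appears to the eye than blue.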
For whatever it's worth, the MetalPerformanceShaders MPSImageThresholdBinary kernel seems to favor the BT.601 values.
I'd recommend taking a look at this answer for more detail on the issues, and conditions under which the use of these values is appropriate.
I created a 3D texture from a LUT file on iOS as follows:
let dim = 33
let textureDescriptor = MTLTextureDescriptor()
textureDescriptor.textureType = .type3D
textureDescriptor.pixelFormat = .rgba32Float
textureDescriptor.width = dim
textureDescriptor.height = dim
textureDescriptor.depth = dim
textureDescriptor.usage = .shaderRead
let texture = device.makeTexture(descriptor: textureDescriptor)
texture!.replace(region: MTLRegionMake3D(0, 0, 0, dim, dim, dim),
mipmapLevel:0,
slice:0,
withBytes:values!,
bytesPerRow:dim * MemoryLayout<Float>.size * 4,
bytesPerImage:dim * dim * MemoryLayout<Float>.size * 4)
and then I try to use this LUT in the fragment shader as follows:
fragment half4 fragmentShader(MappedVertex in [[ stage_in ]],
                              texture2d<float, access::sample> inputTexture [[ texture(0) ]],
                              texture3d<float, access::sample> lutTexture [[ texture(1) ]]) {
    constexpr sampler s(s_address::clamp_to_edge, t_address::clamp_to_edge, min_filter::linear, mag_filter::linear);

    float3 rgb = inputTexture.sample(s, in.textureCoordinate).rgb;
    float3 lookupColor = lutTexture.sample(s, rgb).rgb;

    return half4(half3(lookupColor), 1.h);
}
I am afraid I'm not getting the correct results. Is everything in the code correct? Am I sampling the 3D texture correctly?
I have the following function:
float4 blur(float rad, texture2d<float> tex2D, sampler sampler2D, float2 textureCoordinate) {
    float width = tex2D.get_width();
    float height = tex2D.get_height();
    float weight = 1 / ((2 * rad + 1) * (2 * rad + 1));

    float4 blured_color = float4(0, 0, 0, 0);
    for (int i = -1 * rad; i <= rad; i++) {
        for (int j = -1 * rad; j <= rad; j++) {
            blured_color += tex2D.sample(sampler2D, textureCoordinate + float2(i / width, j / height)) * weight;
        }
    }
    return blured_color;
}
It blurs the given fragment.
My problem is that when I call this function it doesn't work properly; it just makes the picture darker. But when I write the same code without wrapping it in a function, it works okay:
fragment float4 blured_background_fragment(VertexOut interpolated [[ stage_in ]],
                                           texture2d<float> tex2D [[ texture(0) ]],
                                           sampler sampler2D [[ sampler(0) ]]) {
    float4 color = tex2D.sample(sampler2D, interpolated.textureCoordinate);
    float3 color3 = float3(color[0], color[1], color[2]);

    if (is_skin(color3) && !(interpolated.color[0] == 1 && interpolated.color[1] == 1 && interpolated.color[2] == 1)) {
        float width = tex2D.get_width();
        float height = tex2D.get_height();
        float rad = 13;
        float weight = 1 / ((2 * rad + 1) * (2 * rad + 1));

        float4 blured_color = float4(0, 0, 0, 0);
        for (int i = -1 * rad; i <= rad; i++) {
            for (int j = -1 * rad; j <= rad; j++) {
                blured_color += tex2D.sample(sampler2D, interpolated.textureCoordinate + float2(i / width, j / height)) * weight;
            }
        }

        // Here I try to call this blur function
        // float4 blured_color = blur(13, tex2D, sampler2D, interpolated.textureCoordinate);

        return blured_color * 0.43 + color * 0.57;
    } else {
        return tex2D.sample(sampler2D, interpolated.textureCoordinate);
    }
}
I am writing Metal CNN code.
Metal provides MPSCNNLocalContrastNormalization, but since the concept of Instance Normalization is slightly different, I intend to implement it as a kernel function.
However, the problem is that I need to obtain the mean and variance of each channel (R, G, B) of the feature texture that the kernel function receives as input.
I would like some hints on how to implement this.
kernel void instance_normalization_2darray(texture2d_array<float, access::sample> src [[ texture(0) ]],
                                           texture2d_array<float, access::write> dst [[ texture(1) ]],
                                           uint3 tid [[ thread_position_in_grid ]]) {
}

kernel void calculate_avgA(texture2d_array<float, access::read> texture_in [[ texture(0) ]],
                           texture2d_array<float, access::write> texture_out [[ texture(1) ]],
                           uint3 tid [[ thread_position_in_grid ]]) {
    int width = texture_in.get_width();
    int height = texture_in.get_height();
    int depth = texture_in.get_array_size();

    float4 outColor;
    uint3 kernelIndex(0, 0, 0);
    uint3 textureIndex(0, 0, 0);

    for (int k = 0; k < depth; k++) {
        outColor = float4(0.0, 0.0, 0.0, 0.0);
        for (int i = 0; i < width; i++) {
            for (int j = 0; j < height; j++) {
                kernelIndex = uint3(i, j, k);
                textureIndex = uint3(tid.x + i, tid.y + j, tid.z + k);
                float4 color = texture_in.read(textureIndex.xy, textureIndex.z).rgba;
                outColor += color;
            }
        }
        outColor = outColor / (width * height);
        texture_out.write(float4(outColor.rgba), tid.xy, textureIndex.z);
    }
}
Mr. Bista,
I had the same problem; Apple doesn't provide a fast built-in function for this.
I just use MPSCNNPoolingAverage to calculate the mean before my kernels.
Maybe that is only a temporary workaround.
In my tests, other approaches, such as a reduction-sum algorithm, were no better than this.
So I will keep looking for a better implementation.
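To make the MPSCNNPoolingAverage idea concrete, here is a rough Swift sketch. The names featureImage, device, and commandBuffer are placeholders, and the exact offset/edge behavior is worth verifying against your data; the point is that a pooling window the size of the whole feature map, with a matching stride, reduces each channel to a single 1x1 mean.

import MetalPerformanceShaders

// Average-pool the whole feature map down to 1x1 per channel.
let width = featureImage.width
let height = featureImage.height
let mean = MPSCNNPoolingAverage(device: device,
                                kernelWidth: width,
                                kernelHeight: height,
                                strideInPixelsX: width,
                                strideInPixelsY: height)
// Center the pooling window over the image.
mean.offset = MPSOffset(x: width / 2, y: height / 2, z: 0)

let meanDescriptor = MPSImageDescriptor(channelFormat: .float16,
                                        width: 1,
                                        height: 1,
                                        featureChannels: featureImage.featureChannels)
let meanImage = MPSImage(device: device, imageDescriptor: meanDescriptor)
mean.encode(commandBuffer: commandBuffer, sourceImage: featureImage, destinationImage: meanImage)

// The variance can be obtained the same way from a squared copy of the input,
// using E[x^2] - E[x]^2, before normalizing in a custom kernel.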