I'm trying to convert a Tensorflow graph to CoreML and I'm following this tutorial. There's this bit of code that I don't quite understand:
#include <metal_stdlib>
using namespace metal;

kernel void swish(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    ushort3 gid [[thread_position_in_grid]])
{
    if (gid.x >= outTexture.get_width() ||
        gid.y >= outTexture.get_height()) {
        return;
    }
    const float4 x = float4(inTexture.read(gid.xy, gid.z));
    const float4 y = x / (1.0f + exp(-x));
    outTexture.write(half4(y), gid.xy, gid.z);
}
What I don't understand is the use of gid here. Isn't the grid 2-dimensional? What does gid.z signify? Isn't gid.x the x-coordinate of the current pixel?
gid.x and gid.y are the x/y coordinates of the current pixel. So when you do inTexture.read(gid.xy, gid.z), it gives you 4 channels' worth of pixel data.
But the "images" used in neural networks may have many more than 4 channels. That's why the data type for the textures is texture2d_array<> instead of just texture2d<>.
The gid.z value refers to the index of the texture "slice" in this array. If the image/tensor has 32 channels, then there are 8 texture slices (because each texture stores up to 4 channels of data).
So the grid really is three dimensional: (x, y, slice).
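As a concrete illustration, here is a minimal sketch (a hypothetical kernel, not from the tutorial) of pulling one specific channel out of such a texture array. Channel c lives in slice c / 4, component c % 4, so a 32-channel tensor spans slices 0 through 7:

#include <metal_stdlib>
using namespace metal;

// Hypothetical example: copy a single channel of the input tensor into
// the first component of slice 0 of the output. Channel c is stored in
// slice c / 4, component c % 4 (e.g. channel 10 is component 2 of slice 2).
kernel void copyChannel(
    texture2d_array<half, access::read> inTexture [[texture(0)]],
    texture2d_array<half, access::write> outTexture [[texture(1)]],
    ushort2 gid [[thread_position_in_grid]])
{
    if (gid.x >= outTexture.get_width() ||
        gid.y >= outTexture.get_height()) {
        return;
    }
    const ushort channel = 10;   // assumed channel of interest
    const half4 slice = inTexture.read(gid, channel / 4);
    const half value = slice[channel % 4];
    outTexture.write(half4(value, 0.0h, 0.0h, 0.0h), gid, 0);
}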
I'm trying to teach myself the basics of computer graphics on the iPhone and Apple's Metal API. I'm trying to do something pretty basic, but I'm getting a little stuck.
What I want to do is just "texture a quad". Basically, I make a rectangle and I have an image texture that covers the rectangle. I can make that work for the basic case where the image texture just comes from an image of a known format, but I'm having trouble figuring out how to make my code a little more generic and able to handle different formats.
For example, sometimes the image texture comes from an image file which, after decoding, gives pixel data in RGB format. Other times my image texture comes from a video frame, where the data is stored in YUV format.
Ideally, I'd want to create some sort of "sampler" object or function that can just hand me back an RGB color for a particular texture coordinate. In the code where I prepare for rendering, that's the part with context on which format is getting used, and so it would have enough information to figure out which type of sampler should get used. For example, in the video frame case, it knows that it's working with a video frame and so it creates a YUV sampler and passes it the relevant data. And then from my shader code that just wants to read colors, it can just ask for the color at some particular coordinates, and the YUV sampler would do the proper work to compute the right RGB color. If I passed in an RGB sampler instead, it would just read the RGB data without doing any sort of calculations.
I thought this would be really simple to do. I feel like this has to be a common problem for graphics code that deals with textures in different formats or colorspaces. Am I missing something obvious?
How do you do this without writing a bunch of versions of all of your shaders?
Here are functions for transforming RGBA to YUVA and vice versa on the fly.
float4 rgba2yuva(float4 rgba)
{
    float4 yuva = float4(0.0);
    // BT.601-style luma, with the chroma channels centred at 0.5
    yuva.x = rgba.r * 0.299 + rgba.g * 0.587 + rgba.b * 0.114;
    yuva.y = rgba.r * -0.169 + rgba.g * -0.331 + rgba.b * 0.5 + 0.5;
    yuva.z = rgba.r * 0.5 + rgba.g * -0.419 + rgba.b * -0.081 + 0.5;
    yuva.w = rgba.a;
    return yuva;
}

float4 yuva2rgba(float4 yuva)
{
    float4 rgba = float4(0.0);
    // remove the +0.5 chroma offset added in rgba2yuva before converting back
    float u = yuva.y - 0.5;
    float v = yuva.z - 0.5;
    rgba.r = yuva.x + v * 1.4;
    rgba.g = yuva.x + u * -0.343 + v * -0.711;
    rgba.b = yuva.x + u * 1.765;
    rgba.a = yuva.w;
    return rgba;
}
I adapted the code from here: https://github.com/libretro/glsl-shaders/blob/master/nnedi3/shaders/
Simple OpenGL shaders are quite straightforward to port to Metal; I pretty much just changed the datatype vec4 to float4. If you want a half-precision version, just replace float4 with half4.
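For instance, a half-precision variant of the first function would look like this (a minimal sketch following that substitution; the _h suffix is just my own naming):

// Hypothetical half4 port of rgba2yuva: same coefficients, half precision.
half4 rgba2yuva_h(half4 rgba)
{
    half4 yuva = half4(0.0h);
    yuva.x = rgba.r * 0.299h + rgba.g * 0.587h + rgba.b * 0.114h;
    yuva.y = rgba.r * -0.169h + rgba.g * -0.331h + rgba.b * 0.5h + 0.5h;
    yuva.z = rgba.r * 0.5h + rgba.g * -0.419h + rgba.b * -0.081h + 0.5h;
    yuva.w = rgba.a;
    return yuva;
}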
Here is a Metal kernel that converts YUV textures straight to RGB; you can combine it with @Jeshua Lacock's functions above to convert between the two.
// tweak your color offsets as desired
#include <metal_stdlib>
using namespace metal;

kernel void YUVColorConversion(
    texture2d<uint, access::read> yTexture [[texture(0)]],
    texture2d<uint, access::read> uTexture [[texture(1)]],
    texture2d<uint, access::read> vTexture [[texture(2)]],
    texture2d<float, access::write> outTexture [[texture(3)]],
    uint2 gid [[thread_position_in_grid]])
{
    float3 colorOffset = float3(0, -0.5, -0.5);
    // columns of the YUV -> RGB conversion matrix
    float3x3 colorMatrix = float3x3(
        float3(1, 1, 1),
        float3(0, -0.344, 1.770),
        float3(1.403, -0.714, 0)
    );

    // the chroma planes are subsampled by 2 in each dimension (4:2:0)
    uint2 uvCoords = uint2(gid.x / 2, gid.y / 2);

    float y = yTexture.read(gid).r / 255.0;
    float u = uTexture.read(uvCoords).r / 255.0;
    float v = vTexture.read(uvCoords).r / 255.0;

    float3 yuv = float3(y, u, v);
    float3 rgb = colorMatrix * (yuv + colorOffset);
    outTexture.write(float4(rgb, 1.0), gid);
}
There's a good reference here, and then you can build pipelines or variants for processing exactly what you need, like here:
#include <metal_stdlib>
#include <simd/simd.h>
#include <metal_texture>
#include <metal_matrix>
#include <metal_geometric>
#include <metal_math>
#include <metal_graphics>
#include "AAPLShaderTypes.h"

using namespace metal;

// Variables in constant address space.
constant float3 lightPosition = float3(0.0, 1.0, -1.0);

// Per-vertex input structure
struct VertexInput {
    float3 position [[attribute(AAPLVertexAttributePosition)]];
    float3 normal   [[attribute(AAPLVertexAttributeNormal)]];
    half2 texcoord  [[attribute(AAPLVertexAttributeTexcoord)]];
};

// Per-vertex output and per-fragment input
typedef struct {
    float4 position [[position]];
    half2 texcoord;
    half4 color;
} ShaderInOut;

// Vertex shader function
vertex ShaderInOut vertexLight(VertexInput in [[stage_in]],
                               constant AAPLFrameUniforms& frameUniforms [[buffer(AAPLFrameUniformBuffer)]],
                               constant AAPLMaterialUniforms& materialUniforms [[buffer(AAPLMaterialUniformBuffer)]]) {
    ShaderInOut out;

    // Vertex projection and translation
    float4 in_position = float4(in.position, 1.0);
    out.position = frameUniforms.projectionView * in_position;

    // Per-vertex lighting calculations
    float4 eye_normal = normalize(frameUniforms.normal * float4(in.normal, 0.0));
    float n_dot_l = dot(eye_normal.rgb, normalize(lightPosition));
    n_dot_l = fmax(0.0, n_dot_l);
    out.color = half4(materialUniforms.emissiveColor + n_dot_l);

    // Pass through texture coordinate
    out.texcoord = in.texcoord;
    return out;
}

// Fragment shader function
fragment half4 fragmentLight(ShaderInOut in [[stage_in]],
                             texture2d<half> diffuseTexture [[texture(AAPLDiffuseTextureIndex)]]) {
    constexpr sampler defaultSampler;
    // Blend texture color with input color and output to framebuffer
    half4 color = diffuseTexture.sample(defaultSampler, float2(in.texcoord)) * in.color;
    return color;
}
I have an MTLTexture in RGBA8Unorm format, and a screen texture (in MTKView) in BGRA8Unorm format (reversed). In the Metal shader, when I sample from that texture using sample(), I get a float4. When I write to a texture in the Metal shader, I also write a float4.

It seems that when I am inside the shader code, float4 always represents the same order of components, RGBA, regardless of the original format the texture is in ([0] for red, [1] for green, [2] for blue, and [3] for alpha). Is my conclusion correct that the meaning of the components of the sampled/written float4 is always the same inside the shader, regardless of the storage format of the texture?
UPDATE: I use the following code to write to a texture in RGBA8Unorm format:
kernel void
computeColourMap(constant Uniforms &uniforms [[buffer(0)]],
                 constant array<float, 120> &amps [[buffer(1)]],
                 constant array<float, 120> &red [[buffer(2)]],
                 constant array<float, 120> &green [[buffer(3)]],
                 constant array<float, 120> &blue [[buffer(4)]],
                 texture2d<float, access::write> output [[texture(0)]],
                 uint2 id [[thread_position_in_grid]])
{
    if (id.x >= output.get_width() || id.y >= output.get_height()) {
        return;
    }

    uint i = id.x % 120;
    float4 col(0, 0, 0, 1);
    col.x += amps[i] * red[i];
    col.y += amps[i] * green[i];
    col.z += amps[i] * blue[i];
    output.write(col, id);
}
I then use the following shaders for the rendering stage:
vertex VertexOut
vertexShader(const device VertexIn *vertexArray [[buffer(0)]],
             unsigned int vid [[vertex_id]])
{
    VertexIn vertex_in = vertexArray[vid];
    VertexOut vertex_out;
    vertex_out.position = vertex_in.position;
    vertex_out.textureCoord = vertex_in.textureCoord;
    return vertex_out;
}

fragment float4
fragmentShader(VertexOut interpolated [[stage_in]],
               texture2d<float> colorTexture [[texture(0)]])
{
    // nearest-neighbour sampler, declared here so the shader is self-contained
    constexpr sampler nearestSampler(filter::nearest);
    const float4 colorSample = colorTexture.sample(nearestSampler,
                                                   interpolated.textureCoord);
    return colorSample;
}
where the colorTexture passed into the fragment shader is the one I generated in RGBA8Unorm format, and in Swift I have:
let renderPipelineDescriptor = MTLRenderPipelineDescriptor()
renderPipelineDescriptor.vertexFunction = library.makeFunction(name: "vertexShader")!
renderPipelineDescriptor.fragmentFunction = library.makeFunction(name: "fragmentShader")!
renderPipelineDescriptor.colorAttachments[0].pixelFormat = colorPixelFormat
The colorPixelFormat of the MTKView is BGRA8Unorm (reversed relative to my texture), which is not the same format as my texture, but the colours on the screen come out correct.
UPDATE 2: a further pointer that, within a shader, the colour represented by a float4 is always in RGBA order: the float4 type has accessors called v.r, v.g, v.b, v.rgb, etc.
The vector always has 4 components, but the type of the components is not necessarily float. When you declare a texture, you specify the component type as a template argument (texture2d<float ...> in your code).
For example, from the Metal Shading Language Specification v2.1, section 5.10.1:

The following member functions can be used to sample from a 1D texture.

Tv sample(sampler s, float coord) const

Tv is a 4-component vector type based on the templated type used to declare the texture type. If T is float, Tv is float4. If T is half, Tv is half4. If T is int, Tv is int4. If T is uint, Tv is uint4. If T is short, Tv is short4; and if T is ushort, Tv is ushort4.
The same Tv type is used in the declaration of write(). The functions for other texture types are documented in a similar manner.
And, yes, component .r always contains the red component (if present), etc. And [0] always corresponds to .r (or .x).
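To make that concrete, here is a minimal sketch (a hypothetical kernel, not from your project) that behaves identically whether the underlying textures are RGBA8Unorm or BGRA8Unorm; Metal swizzles the components to logical RGBA order on load and back to storage order on store:

#include <metal_stdlib>
using namespace metal;

// Keeps only the red channel. c.r (== c[0] == c.x) is the red component
// for both RGBA8Unorm and BGRA8Unorm textures; the storage order is
// handled by the texture hardware, not by the shader.
kernel void keepRedOnly(texture2d<float, access::read> src [[texture(0)]],
                        texture2d<float, access::write> dst [[texture(1)]],
                        uint2 gid [[thread_position_in_grid]])
{
    if (gid.x >= dst.get_width() || gid.y >= dst.get_height()) {
        return;
    }
    float4 c = src.read(gid);
    dst.write(float4(c.r, 0.0, 0.0, 1.0), gid);
}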
I have a Metal kernel function. Usually you access pixels like this:
kernel void edgeDetect(texture2d<half, access::sample> inTexture [[texture(0)]],
                       texture2d<half, access::write> outTexture [[texture(1)]],
                       device const uint *roi [[buffer(0)]],
                       uint2 grid [[thread_position_in_grid]]) {
    if (grid.x >= outTexture.get_width() || grid.y >= outTexture.get_height()) {
        return;
    }

    half c[9];
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            c[3 * i + j] = inTexture.read(grid + uint2(i - 1, j - 1)).x;
        }
    }

    half3 Lx = 2.0 * (c[7] - c[1]) + c[6] + c[8] - c[2] - c[0];
    half3 Ly = 2.0 * (c[3] - c[5]) + c[6] + c[0] - c[2] - c[8];
    half3 G = sqrt(Lx * Lx + Ly * Ly);
    outTexture.write(half4(G, 0.0), grid);
}
Now I need to access pixels in the neighbourhood of the current grid position like this:
half4 inColor = inTexture.read(grid - uint2(-1,-1));
Basically this works, but at the thread boundaries I get "discontinuities", as shown in this image (the brick-wall pattern).

This seemed clear to me, since each thread is passed only its sub-texture to process. So beyond the thread boundaries I can't access pixels.

My question is: what is the correct approach when I need to address pixels beyond the current grid position in a compute kernel? Is this possible with compute kernels at all?
I have found the issue:
The line
c[3*i+j] = inTexture.read(grid + uint2(i-1,j-1)).x;
must be changed to:
c[3*i+j] = inTexture.read(grid + uint2(i,j)).x;
Obviously the negative position indices into the texture failed and produced the brick-wall-like artefacts shown in the image above.
To make sure this gets recorded as an answer: there is no restriction on which pixels you can access in a compute shader. Your grid size affects scheduling only.
Your error is instantiating unsigned uint2 with negative numbers. At the first iteration of your loop you will attempt to construct uint2(-1, -1), which is the same as uint2(4294967295, 4294967295) and therefore way out of bounds.
You can use int2, or as per your self-answer just avoid negative numbers.
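If you do want the centred 3x3 window from the original code, one option (a sketch, assuming you want edge pixels clamped rather than skipped) is to do the offset arithmetic in signed integers and clamp to the texture bounds:

// Signed offsets, clamped to the texture bounds, so the coordinate
// never wraps around at x == 0 or y == 0.
for (int i = 0; i < 3; ++i) {
    for (int j = 0; j < 3; ++j) {
        int2 coord = clamp(int2(grid) + int2(i - 1, j - 1),
                           int2(0),
                           int2(inTexture.get_width() - 1,
                                inTexture.get_height() - 1));
        c[3 * i + j] = inTexture.read(uint2(coord)).x;
    }
}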
I have two different textures. One is a colour image and the other is simply an alpha image, and I want to mask the two image textures together. How can I do that in the Metal Shading Language? One texture is 128x128 and the other is 256x256 in size; I want the masked texture at 128x128.
fragment float4 fragmentShaderone(VertexOut params [[stage_in]],
                                  texture2d<float, access::sample> srcTexture [[texture(0)]],
                                  texture2d<float, access::sample> maskTexture [[texture(1)]])
{
    constexpr sampler defaultSampler;
    float4 srcColor = srcTexture.sample(defaultSampler, float2(params.textureCoordinates)) * float4(1, 0, 0, 40.0 / 255.0);
    float4 maskColor = maskTexture.sample(defaultSampler, float2(params.textureCoordinates));
    return srcColor * maskColor;
}
Here, when sampling the textures, I am using the same coordinates for the mask and the source image.
I generate a simple 2D grid with a triangle strip representing the water surface. The first generated vertex has position [0,0] and the last one [1,1]. For my water simulation I need to store the current positions of the vertices in a texture and then sample these values from the texture in the next frame to get the previous state of the water surface.

So I created the texture at the size of the vertex grid. For example, for a 10x10 vertex grid I use a texture with 10x10 pixels (one pixel for one vertex's data), and I set this texture as a render target to render all the vertex data into it.
According to MSDN Coordinate Systems, if I use the current positions of the vertices in the grid (bottom-left at [0;0], top-right at [1;1]), the rendered texture looks like this:
So I need to do some conversion to NDC. I convert it in a vertex shader like this:
[vertex.x * 2 - 1; vertex.y * 2 - 1]
Consider this 3x3 grid:
Now the grid is stretched to the whole texture size. Texture coordinates are different from NDC, and apparently I can use the original coordinates of the grid (before conversion) to sample values from the texture and get the previous values (positions) of the vertices.
Here is a sample of my vertex/pixel shader code:
This vertex shader converts the coordinates and sends them to the pixel shader with the SV_POSITION semantic (which describes the pixel location).
struct VertexInput
{
    float4 pos : POSITION;
    float2 tex : TEXCOORD;
};

struct VertexOutput
{
    float4 pos : SV_POSITION;
    float2 tex : TEXCOORD;
};

// converts coordinates from a [0;1] range with origin at 0,0 to NDC (-1 to 1)
float2 toNDC(float2 px)
{
    return float2(px.x * 2 - 1, px.y * 2 - 1);
}

VertexOutput main(VertexInput input)
{
    VertexOutput output;
    float2 ndc = toNDC(float2(input.pos.x, input.pos.z));
    output.pos = float4(ndc, 1, 1);
    output.tex = float2(input.pos.x, input.pos.z);
    return output;
}
And here's the pixel shader, which saves the values coming from the vertex shader at the pixel location defined by SV_POSITION.
struct PixelInput
{
    float4 pos : SV_POSITION;
    float2 tex : TEXCOORD;
};

float4 main(PixelInput input) : SV_TARGET
{
    return float4(input.tex.x, input.tex.y, 0, 1);
}
And we're finally getting to my problem! I use the graphics debugger in Visual Studio 2012, which allows me to look at the rendered texture and its values. I would expect the value at pixel location [0,1] (in the texel coordinate system) to be [0,0] (or [0,0,0,1] to be precise, for the RGBA format), but it seems that the value of the final pixel is interpolated between 3 vertices, so I get a wrong value for the given vertex.
Screenshot from VS graphics debugger:
Rendered 3x3 texture ([0;1] location in texel coordinate system):
Values from vertex and pixel shader:
How do I render the exact value from the vertex shader into the texture for a given pixel?
I am pretty new to computer graphics and Direct3D 11, so please excuse my deficiencies.