In GLSL it's easy to write into a specific mipmap level, but that capability seems to be missing from the Metal shading language. I might be wrong, though; maybe there is some workaround.
You have two options here:
If you are using Metal 2.3 or later, you can use the void write(Tv color, uint2 coord, uint lod = 0) or void write(Tv color, ushort2 coord, ushort lod = 0) methods on metal::texture2d. The problem is that, even with Metal 2.3, lod must be 0 on Intel and AMD GPUs.
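A minimal sketch of that first option (the kernel name and binding index are placeholders; the texture must have been created with shader-write usage, and the dispatch grid should match the dimensions of the target mip):
#include <metal_stdlib>
using namespace metal;

kernel void writeToMip(texture2d<float, access::write> outTexture [[texture(0)]],
                       uint2 gid [[thread_position_in_grid]])
{
    // Write solid red into mip level 2 of the texture. Requires Metal 2.3;
    // on Intel and AMD GPUs the lod argument must still be 0, as noted above.
    outTexture.write(float4(1.0, 0.0, 0.0, 1.0), gid, 2);
}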
To work around that limitation, you can create an MTLTexture view of the level you want to write using newTextureViewWithPixelFormat:textureType:levels:slices: (https://developer.apple.com/documentation/metal/mtltexture/1515409-newtextureviewwithpixelformat?language=objc).
In the compute shader, I can see that bicubic is a filtering option, but only if __HAVE_BICUBIC_FILTERING__ is defined. When I set bicubic as the filtering option I get a syntax error; linear and nearest compile fine. I'm building for Metal 2 on macOS.
Another approach would be to define the sampler on the CPU with a sampler descriptor, but there is no bicubic option there either.
The comments in this thread discuss the same inability to select bicubic.
I've searched the Metal Shading Language Specification, and bicubic is not mentioned there at all. Any help would be appreciated.
Whether bicubic filtering is available depends on the type of device, the operating system (type and version), and the processor architecture. The code below compiles fine on, for example, iOS 15 with an iPhone 12 or 13, using Xcode 13.
#if defined(__HAVE_BICUBIC_FILTERING__)
constexpr sampler textureSampler (mag_filter::bicubic, min_filter::bicubic);
const half4 colorSample = colorTexture.sample (textureSampler, in.textureCoordinate);
return float4(colorSample);
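#else
// Assumed fallback (not part of the original answer): drop to bilinear
// filtering on configurations where __HAVE_BICUBIC_FILTERING__ is undefined.
constexpr sampler textureSampler (mag_filter::linear, min_filter::linear);
const half4 colorSample = colorTexture.sample (textureSampler, in.textureCoordinate);
return float4(colorSample);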
#endif
I'm trying to learn MSL through the Metal Shading Language Specification, and saw that you can set LOD options when sampling a texture by specifying the options in the sample function. This is one of the examples given in the spec:
Tv sample(sampler s, float2 coord, lod_options options, int2 offset = int2(0)) const
lod_options include bias, level, gradient2d, etc.
I've looked all over but cannot find the usage syntax for this. Are these named arguments? Is lod_options a struct? For example, if I want to specify the LOD level, what is the correct way to do it? I know these options can also be specified in the sampler object itself, but if I want to do it here, what would be the right syntax to do so?
There is no lod_options type as such; you can think of it as a placeholder for one of the bias, level, gradient2d, etc. types. Each of these types is a different struct, which allows the Metal standard library to have an overloaded variant of the sample function for each such option.
To specify, for example, the mipmap level to sample, you'd provide a parameter of level type:
float4 color = myTexture.sample(mySampler, coords, level(1));
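The other options follow the same pattern; for instance (texture and sampler names are placeholders, and the bias and gradient2d variants are only valid in fragment functions):
// Nudge the implicitly computed LOD by +0.5.
float4 biased = myTexture.sample(mySampler, coords, bias(0.5));
// Supply explicit derivatives instead of the implicit screen-space ones.
float4 graded = myTexture.sample(mySampler, coords,
                                 gradient2d(dfdx(coords), dfdy(coords)));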
I'm porting a DirectX HLSL shader to WebGL 2, but I cannot find the equivalent of a StructuredBuffer.
All I can see is constant buffers, which are limited to 64 KB in size and have alignment requirements. Should I split the StructuredBuffers into constant buffers?
The more-or-less equivalent in OpenGL land of D3D's StructuredBuffers is Shader Storage Buffer Objects (SSBOs). However, WebGL 2.0 is based on OpenGL ES 3.0, which does not include SSBOs, so you'll need a workaround such as splitting the data across uniform buffers or packing it into a texture and reading it with texelFetch.
Metal supports kernel functions in addition to the standard vertex and fragment functions. I found a Metal kernel example that converts an image to grayscale.
What exactly is the difference between doing this in a kernel vs fragment? What can a compute kernel do (better) that a fragment shader can't and vice versa?
Metal has four different types of command encoders:
MTLRenderCommandEncoder
MTLComputeCommandEncoder
MTLBlitCommandEncoder
MTLParallelRenderCommandEncoder
If you're just doing graphics programming, you're most familiar with the MTLRenderCommandEncoder. That is where you would set up your vertex and fragment shaders. This is optimized to deal with a lot of draw calls and object primitives.
The kernel shaders are primarily used with the MTLComputeCommandEncoder. I think the reason a kernel shader and a compute encoder were used for the image-processing example is that you're not drawing any primitives, as you would be with the render command encoder. Even though both methods deal with graphics, in this instance you're simply modifying color data on a texture rather than calculating the depth of multiple objects on screen.
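To make that concrete, here is a minimal sketch of such a grayscale kernel (placeholder names; not necessarily the exact example the question refers to):
#include <metal_stdlib>
using namespace metal;

kernel void grayscaleKernel(texture2d<half, access::read>  inTexture  [[texture(0)]],
                            texture2d<half, access::write> outTexture [[texture(1)]],
                            uint2 gid [[thread_position_in_grid]])
{
    // Guard against threads that fall outside the texture when the
    // dispatch grid overshoots its dimensions.
    if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height())
        return;
    half4 color = inTexture.read(gid);
    // Rec. 709 luma weights.
    half gray = dot(color.rgb, half3(0.2126h, 0.7152h, 0.0722h));
    outTexture.write(half4(gray, gray, gray, color.a), gid);
}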
The compute command encoder is also more easily set up to do parallel computing using threads:
https://developer.apple.com/reference/metal/mtlcomputecommandencoder
So if your application wants to take advantage of multithreading for data modification, it's easier to do that with this command encoder than with the render command encoder.
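For contrast, a rough sketch of the same grayscale pass as a fragment function (again with placeholder names): to run it you'd have to encode a render pass and draw a full-screen primitive just so rasterization touches every pixel, whereas the kernel above is dispatched directly over the texture.
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 texCoord;
};

fragment half4 grayscaleFragment(VertexOut in [[stage_in]],
                                 texture2d<half> inTexture [[texture(0)]])
{
    constexpr sampler s(filter::nearest);
    half4 color = inTexture.sample(s, in.texCoord);
    half gray = dot(color.rgb, half3(0.2126h, 0.7152h, 0.0722h));
    return half4(gray, gray, gray, color.a);
}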
Apple says in their Best Practices For Shaders to avoid branching if possible, and especially branching on values calculated within the shader. So I replaced some if statements with the built-in clamp() function. My question is, are clamp(), min(), and max() likely to be more efficient, or are they merely convenience (i.e. macro) functions that simply expand to if blocks?
I realize the answer may be implementation dependent. In any case, the functions are obviously cleaner and make plain the intent, which the compiler could do something with.
Historically speaking, GPUs have supported per-fragment instructions such as MIN and MAX for much longer than they have supported arbitrary conditional branching. One example of this in desktop OpenGL is the GL_ARB_fragment_program extension (now superseded by GLSL), which explicitly states that it doesn't support branching but does provide instructions for MIN and MAX as well as some other conditional instructions.
I'd be pretty confident that all GPUs will still have dedicated hardware for these operations given how common min(), max() and clamp() are in shaders. This isn't guaranteed by the specification because an implementation can optimize code however it sees fit, but in the real world you should use GLSL's built-in functions rather than rolling your own.
The only exception would be if your conditional was being used to avoid a large amount of additional fragment processing. At some point the cost of a branch will be less than the cost of running all the code in the branch, but the balance here will be very hardware dependent and you'd have to benchmark to see if it actually helps in your application on its target hardware. Here's the kind of thing I mean:
void main() {
    vec3 N = ...;
    vec3 L = ...;
    float NDotL = dot(N, L);
    if (NDotL > 0.0)
    {
        // Lots of very intensive code for an awesome shadowing algorithm that we
        // want to avoid wasting time on if the fragment is facing away from the light
    }
}
Just clamping NDotL to 0-1 and then always running the shadow code on every fragment, only to multiply the final shadow term by NDotL, is a lot of wasted effort if NDotL was originally <= 0; in theory, a branch avoids that overhead. The reason this kind of thing is not always a performance win is that it depends heavily on how the hardware implements shader branching.