Metal supports kernel functions in addition to the standard vertex and fragment functions. I found a Metal kernel example that converts an image to grayscale.
What exactly is the difference between doing this in a kernel versus a fragment function? What can a compute kernel do (better) that a fragment shader can't, and vice versa?
Metal has four different types of command encoders:
MTLRenderCommandEncoder
MTLComputeCommandEncoder
MTLBlitCommandEncoder
MTLParallelRenderCommandEncoder
If you're just doing graphics programming, you're probably most familiar with MTLRenderCommandEncoder. That is where you set up your vertex and fragment shaders. It is optimized to handle a large number of draw calls and object primitives.
Kernel shaders are primarily used with the MTLComputeCommandEncoder. I think the reason a kernel shader and a compute encoder were used for the image-processing example is that you're not drawing any primitives as you would be with the render command encoder. Even though both methods work with graphics data, in this instance the kernel is simply modifying color data on a texture rather than calculating the depth of multiple objects on screen.
The compute command encoder is also more easily set up to do parallel computing using threads:
https://developer.apple.com/reference/metal/mtlcomputecommandencoder
So if your application wants to utilize multithreading for data modification, it's easier to do that with this command encoder than with the render command encoder.
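For illustration, a minimal Swift sketch of dispatching a compute kernel over a texture could look like the following. The kernel name "grayscaleKernel", the textures inTexture and outTexture, and the existing device and commandQueue are assumptions made for the example, not something taken from your code:

import Metal

// Build a compute pipeline from a kernel function in the default library.
let library = device.makeDefaultLibrary()!
let kernelFunction = library.makeFunction(name: "grayscaleKernel")!
let pipelineState = try! device.makeComputePipelineState(function: kernelFunction)

// Encode the compute work.
let commandBuffer = commandQueue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipelineState)
encoder.setTexture(inTexture, index: 0)
encoder.setTexture(outTexture, index: 1)

// One thread per pixel, grouped into 16x16 threadgroups.
let threadsPerGroup = MTLSize(width: 16, height: 16, depth: 1)
let threadgroups = MTLSize(width: (inTexture.width + 15) / 16,
                           height: (inTexture.height + 15) / 16,
                           depth: 1)
encoder.dispatchThreadgroups(threadgroups, threadsPerThreadgroup: threadsPerGroup)
encoder.endEncoding()
commandBuffer.commit()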
I am writing a simple engine for a simple game. So far I enjoy my little hobby project, but the game has grown and it now has roughly 800 game objects at a time in a scene.
Every object, just like in Unity, has a transform component that calculates its transformation matrix when the component is initialized. I started to notice that with 800 objects it takes 5.4 milliseconds just to update every matrix (for example, if every object has moved), without any additional components or anything else.
I use the GLKit math library, which for some reason is faster than using the native simd types; using simd types triples the calculation time.
Here is the piece of code that runs it:
let Translation : GLKMatrix4 = GLKMatrix4MakeTranslation(position.x, position.y, position.z)
let Scale : GLKMatrix4 = GLKMatrix4MakeScale(scale.x, scale.y, scale.z)
let Rotation : GLKMatrix4 = GLKMatrix4MakeRotationFromEulerVector(rotation)
//Produce model matrix
let SRT = GLKMatrix4Multiply(Translation, GLKMatrix4Multiply(Rotation, Scale))
Question: I am looking for a way to optimize this so that I can use more game objects and utilize more components on my objects.
There could be multiple bottlenecks in your program.
Optimise your frame dependencies to avoid stalls as much as possible, e.g. by precomputing frame data on CPU. This is a good resource to learn about this technique.
Make sure that all matrices are stored in one MTLBuffer which is indexed from your vertex stage (a minimal sketch follows below)
On Apple silicon and iOS use MTLResourceStorageModeShared
If you really want to scale to tens of thousands of objects, then compute your matrices in a compute shader to store them in an MTLBuffer. Then, use indirect rendering to issue your draw calls.
In general, learn about AZDO (Approaching Zero Driver Overhead).
Learn about compute shaders: https://developer.apple.com/documentation/metal/basic_tasks_and_concepts/performing_calculations_on_a_gpu
Learn about indirect rendering: https://developer.apple.com/documentation/metal/indirect_command_buffers
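As a rough sketch of the shared-buffer idea above: objectCount, modelMatrix(for:), device, and renderEncoder are placeholders used for illustration, not names from the question.

import Metal
import simd

// One shared buffer holding a float4x4 per object; written by the CPU each frame
// and indexed from the vertex stage (e.g. by instance or object ID).
let objectCount = 800
let matrixBuffer = device.makeBuffer(length: MemoryLayout<float4x4>.stride * objectCount,
                                     options: .storageModeShared)!

// Per frame: write the updated model matrices straight into the buffer.
let matrices = matrixBuffer.contents().bindMemory(to: float4x4.self, capacity: objectCount)
for i in 0..<objectCount {
    matrices[i] = modelMatrix(for: i)   // whatever transform calculation you use
}

// Bind once; the vertex shader indexes the array instead of receiving one matrix per draw.
renderEncoder.setVertexBuffer(matrixBuffer, offset: 0, index: 1)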
I'm rewriting, in Metal, an OpenGL filter from the Android version of the app I'm currently working on. It uses the following texture lookup function:
vec4 texture2D(sampler2D sampler, vec2 coord, float bias)
Assuming my filter kernel function looks like this:
float4 fname(sampler src) {
...
}
The texture lookup call would be the following:
src.sample(coord)
But how can I pass the bias parameter? (the sample function takes only 1 argument)
I'm afraid Core Image only supports 2D textures – no mipmapping or LOD selection. Only bilinear sampling is available.
If you need different LODs, you need to pass different samplers to your kernel and do the interpolation yourself.
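A rough Swift-side sketch of that workaround might look like the following. The kernel name "lodBlend" and its argument list are assumptions made for illustration (the kernel itself would sample both inputs and mix between them), and the coarser level is emulated here with a simple pre-blur:

import CoreImage
import Foundation

// A minimal sketch, assuming a custom CIKernel named "lodBlend" that takes two
// samplers plus a blend factor and performs the LOD interpolation itself.
func applyLODBlend(to input: CIImage, bias: Double) -> CIImage? {
    guard let url = Bundle.main.url(forResource: "default", withExtension: "metallib"),
          let data = try? Data(contentsOf: url),
          let kernel = try? CIKernel(functionName: "lodBlend", fromMetalLibraryData: data) else {
        return nil
    }

    // Emulate a coarser "LOD" by pre-blurring a copy of the input.
    let coarse = input.applyingGaussianBlur(sigma: bias)

    return kernel.apply(extent: input.extent,
                        roiCallback: { _, rect in rect },
                        arguments: [input, coarse, bias])
}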
I'm porting a DirectX HLSL shader to WebGL 2, but I cannot find the equivalent of a StructuredBuffer.
I can only see constant buffers, which are limited to 64 KB in size and have alignment requirements. Should I split the StructuredBuffers into constant buffers?
The more-or-less equivalent in OpenGL land of D3D's StructuredBuffers are Shader Storage Buffer Objects. However, WebGL 2.0 is based on OpenGL ES 3.0, which does not include SSBOs.
I have recently been learning about shaders.
As far as I understand, the process is roughly this:
First, make a buffer that stores the vertex data.
Then make a shader file and compile it.
Finally, set the shader and draw.
But studying the code, it seems to me that there is no direct connection between
a shader and the buffer that holds the vertices. So I wonder: how can a shader read the vertex data? Does the shader just read from an existing buffer?
I am not sure my question will come across well,
because I can't speak English well. I hope you understand me.
You didn't mention the input layout. To render, it is necessary to define in the context:
Vertex buffer,
Index buffer (optional),
Input layout (how the data will be distributed to the vertex shader parameters: the sizes, types, and offset of each element within the stride),
VS and PS (the vertex and pixel shaders).
The input layout is what connects the data in the vertex buffer to the input parameters of the vertex shader; the shader doesn't read an arbitrary buffer directly, it receives the vertex attributes that the input layout describes.
When loading a texture within a kernel function in Metal, is it possible to find the default z-value (if it exists at all) of the texture being sampled, and the z-near and z-far values (likewise, if these values exist at all when the kernel is used instead of the normal pipeline with vertex and fragment shaders) of the space in which the texture resides?
What I am trying to understand is:
When sampling a texture within a kernel function, is it possible to change (or set) the z-value of the texture before writing it? I have not been able to find this information, along with the z-near and z-far values (is it even possible to define these values manually when using a kernel function?), in the documentation.
Thanks.