What are Apple's Metal (Metal Shading Language) texture coordinates?

In iOS or OS X, what texture coordinates are used in a Metal Shading Language kernel function? For example, given an MTLTexture and uint2 gid [[thread_position_in_grid]], are gid.x and gid.y between 0..1 (i.e. floats), or between 0 and inTexture.get_width() (i.e. integers)?
Thanks in advance.

thread_position_in_grid is an index (an integer) into the grid, taking values in the ranges you specify in dispatchThreadgroups:threadsPerThreadgroup:. It's up to you to decide how many threadgroups you want and how many threads per group.
In the sample code below, threadsPerGroup.width * numThreadgroups.width == inputImage.width and threadsPerGroup.height * numThreadgroups.height == inputImage.height. In this case, a position in the grid is thus a non-normalized (integer) pixel coordinate.
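A minimal Objective-C sketch of such a dispatch (inputImage here is an MTLTexture; the computeEncoder and pipelineState names are illustrative placeholders, not from the original answer):
// Pick a threadgroup size, then derive the threadgroup count so that
// threadsPerGroup * numThreadgroups exactly covers the image.
MTLSize threadsPerGroup = MTLSizeMake(16, 16, 1);
MTLSize numThreadgroups = MTLSizeMake(inputImage.width / threadsPerGroup.width,
                                      inputImage.height / threadsPerGroup.height,
                                      1);
[computeEncoder setComputePipelineState:pipelineState];
[computeEncoder setTexture:inputImage atIndex:0];
[computeEncoder dispatchThreadgroups:numThreadgroups
               threadsPerThreadgroup:threadsPerGroup];
(This sketch assumes the image dimensions are exact multiples of 16; otherwise round the threadgroup count up and bounds-check gid in the kernel, as discussed below.)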

Each launch of a compute shader in Metal is accompanied by a dense, rectangular 3D grid of thread IDs. The dimensions of the grid are set when you call -[MTLComputeCommandEncoder dispatchThreadgroups:threadsPerThreadgroup:]. You can, for example, have a threadgroup size of {16,16,1} (256 threads per threadgroup, laid out as a 16x16x1 block) and a threadgroup count of {1,2,1}, which launches two threadgroups covering a total of 512 threads in the shape {16,32,1}. These are the integers that appear at the top of your kernel as [[thread_position_in_grid]]. The thread position is how you tell which thread you are, just like the iteration index passed to a block by dispatch_apply().
Metal specifies no mapping from [[thread_position_in_grid]] to coordinates in a texture. That mapping is done by you, in software, in your compute shader. If you want to read every other pixel in a region of a texture at some offset in the image, then you need to multiply the thread position by two and add an offset in your kernel before passing the new coordinate to texture2d.sample. Since Metal cannot launch partial threadgroups, it is up to you to make sure that unneeded threads do no work. For example, when applied to a smaller texture, the full size of your 16x32 launch might cause you to write off the end of your texture. In that case you must check the thread position to see whether the thread would write off the end, and either return out of the shader or skip the texture write for that thread.
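An MSL sketch of that pattern (the kernel name, stride, and offset are illustrative):
#include <metal_stdlib>
using namespace metal;

kernel void strided_read(texture2d<float, access::read>  inTexture  [[texture(0)]],
                         texture2d<float, access::write> outTexture [[texture(1)]],
                         uint2 gid [[thread_position_in_grid]])
{
    // Guard against threads launched past the edge of the output texture.
    if (gid.x >= outTexture.get_width() || gid.y >= outTexture.get_height())
        return;

    // Map the thread position to a source coordinate: stride 2, offset (8, 8).
    uint2 srcCoord = gid * 2 + uint2(8, 8);
    outTexture.write(inTexture.read(srcCoord), gid);
}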

thread_position_in_grid is always made of unsigned integers, and provides these options (sketched below), but none of them are texture coordinates. It may be helpful to ask another, related question, because you seem to be conflating the ideas of textures and kernel functions.
16- or 32-bit
1D, 2D, or 3D
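Concretely, the attribute can be bound to scalar or vector unsigned integer types; a sketch of the variants:
kernel void k1(uint    gid [[thread_position_in_grid]]) {}  // 1D, 32-bit
kernel void k2(uint2   gid [[thread_position_in_grid]]) {}  // 2D, 32-bit
kernel void k3(uint3   gid [[thread_position_in_grid]]) {}  // 3D, 32-bit
kernel void k4(ushort2 gid [[thread_position_in_grid]]) {}  // 2D, 16-bit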

Related

How does Metal distribute image blocks to each threadgroup?

For example, if I want to do a grayscale transformation, I need to set up my threadgroup size and threadgroup count in the following way.
// Upper bound on the number of threads in one threadgroup for this pipeline.
NSUInteger maxTotalThreadsPerThreadgroup = [self.computePipelineState maxTotalThreadsPerThreadgroup];
NSUInteger threadExecutionWidth = [self.computePipelineState threadExecutionWidth];
// Threads in each threadgroup; the product must not exceed maxTotalThreadsPerThreadgroup.
MTLSize threadsPerThreadgroup = MTLSizeMake(threadExecutionWidth * 2, threadExecutionWidth * 2, 1);
// Number of threadgroups, rounded up so the grid covers the whole texture.
MTLSize threadgroupCount = MTLSizeMake(([self.texture width] + threadsPerThreadgroup.width - 1) / threadsPerThreadgroup.width,
                                       ([self.texture height] + threadsPerThreadgroup.height - 1) / threadsPerThreadgroup.height,
                                       1);
I know the image will be chopped into different blocks and each one will be processed by one threadgroup. But it seems that in the kernel we just read the 2D texture and then write out the processed result.
So the question is: how is the image chopped into different blocks? How do we know that each block of the image gets assigned to a threadgroup? Is this done by Metal itself, or do we need to manually assign each block to each threadgroup using the gid?
Metal doesn't know or care whether your shader is operating on an image. It doesn't "chop" the image or anything like that.
A compute shader is processed over a "grid". The grid is an abstraction. It's an arbitrary way for you to organize the work. Metal doesn't assign any significance to the grid, such as associating a position in the grid with a pixel in an image.
Such an association, if it exists, is implicit in how your shader code behaves. Yes, that is largely based on what the shader does with thread_position_in_grid, thread_position_in_threadgroup, thread_index_in_threadgroup, etc.
So, if you're using a gid variable with the thread_position_in_grid attribute, and you use its coordinates as image coordinates, then that usage is what dictates that each grid position corresponds to an image pixel. Once you do that, then it follows that each thread group corresponds to a block of the image, since a thread group is just a block of grid positions. Again, though, this is not something that Metal is doing, it's something that your shader is doing.
You could do something entirely different and Metal wouldn't care.
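For instance, a grayscale kernel along these lines (a sketch, not the asker's actual shader; the luma weights are the standard Rec. 709 coefficients) is what establishes the thread-to-pixel correspondence:
#include <metal_stdlib>
using namespace metal;

kernel void grayscale(texture2d<float, access::read>  inTexture  [[texture(0)]],
                      texture2d<float, access::write> outTexture [[texture(1)]],
                      uint2 gid [[thread_position_in_grid]])
{
    // Skip threads outside the texture (the grid may be rounded up).
    if (gid.x >= inTexture.get_width() || gid.y >= inTexture.get_height())
        return;

    float4 color = inTexture.read(gid);
    float  luma  = dot(color.rgb, float3(0.2126, 0.7152, 0.0722));
    outTexture.write(float4(luma, luma, luma, color.a), gid);
}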

Depth Buffer Clear Behavior between Draw Calls?

I have been testing WebGL to see whether I can batch-draw polygons in a particular way. I am going to simplify the use case, but it goes something along the lines of the following:
First, my vertices are simply:
vertices = [v0_xyz, v1_xyz, ..., vn_xyz]
In my case, each vertex must have a z value in the range (0 - 100) (I pick 100 arbitrarily) because I want all of those vertices to be depth tested against each other using those z values. On batch N + 1, I am limited to depth values (0 - 100) again, but I need the vertices in this batch to be guaranteed to be drawn atop all previous batches (layers of vertices). In other words, vertices within each batch are depth tested against each other, but each batch is simply drawn atop the previous one as if there were no depth testing.
At first I was going to draw to a texture via a framebuffer with a depth buffer attachment, draw that to the canvas, and repeat for the next group of vertices, but then I realized that I might be able to do just this:
// pseudocode
function drawBuffers()
    // clear both the color and the depth buffers
    gl.clearDepth(1.0);
    gl.clear(gl.COLOR_BUFFER_BIT | gl.DEPTH_BUFFER_BIT);

    // iterate over all vertex batches
    for each vertexBatch in vertexBatches do
        // draw the batch with depth testing
        gl.draw(vertexBatch); // stands in for gl.drawArrays / gl.drawElements

        /* QUESTION: does clearing the depth buffer here guarantee that
           subsequent batches will be drawn atop previous batches, or will
           the pixels be written at random (sometimes underneath,
           sometimes above)? */
        gl.clearDepth(1.0);
        gl.clear(gl.DEPTH_BUFFER_BIT);
    endfor
end drawBuffers
I tested the above by drawing two overlapping quads, clearing the depth buffer, translating left and in negative z (in an attempt to "go under" the previous batch), and drawing the two overlapping quads again. I think this works, because I see that the second pair of quads is drawn in front of the first pair even though its z values are behind the previous pair's z values.
I am not certain that my test is reliable though. Could there be some undefined behavior involved? Is it just a coincidence that my test works as a result of the clearDepth setting and shapes?
May I have clarification so I can confirm whether my method will work for sure?
Thank you.
Since WebGL is based on OpenGL ES, see the OpenGL ES 1.1 Full Specification, section 4.1.6 "Depth Buffer Test", page 104:
The depth buffer test discards the incoming fragment if a depth comparison fails.
[...]
The comparison is specified with
void DepthFunc( enum func );
This command takes a single symbolic constant: one of NEVER, ALWAYS, LESS, LEQUAL, EQUAL, GREATER, GEQUAL, NOTEQUAL. Accordingly, the depth buffer test passes never, always, if the incoming fragment’s zw value is less than, less than or equal to, equal to, greater than, greater than or equal to, or not equal to the depth value stored at the location given by the incoming fragment’s (xw, yw) coordinates.
This means, if the clear value for the depth buffer glClearDepth is 1.0 (1.0 is the initial value)
gl.clearDepth(1.0);
and the depth buffer is cleared
gl.clear(gl.DEPTH_BUFFER_BIT);
and the depth function glDepthFunc is LESS or LEQUAL (LESS is the initial value)
gl.enable(gl.DEPTH_TEST);
gl.depthFunc(gl.LEQUAL);
then the next fragment drawn to any (xw, yw) coordinate will pass the depth test and overwrite the fragment stored at (xw, yw).
(Of course, gl.BLEND has to be disabled and the fragment has to be inside the clip volume.)

iOS Metal Shader - Texture read and write access?

I'm using a Metal shader to draw many particles onto the screen. Each particle has its own position (which can change), and often two particles have the same position. How can I check whether the texture2d I write into already has a pixel at a certain position? (I want to draw a particle at a position only if no particle has been drawn there yet, because I get ugly flickering when many particles are drawn at the same position.)
I've tried outTexture.read(particlePosition), but this obviously doesn't work, because of the texture access qualifier, which is access::write.
Is there a way I can have read and write access to a texture2d at the same time? (If there isn't, how could I still solve my problem?)
There are several approaches that could work here. In concurrent systems programming, what you're talking about is termed first-write wins.
1) If the particles only need to preclude other particles from being drawn (and aren't potentially obscured by other elements in the scene in the same render pass), you can write a special value to the depth buffer to signify that a fragment has already been written to a particular coordinate (see the sketch after this list). For example, you'd turn on depth testing (using the depth compare function Less), clear the depth buffer to some distant value (like 1.0), and then output a depth value of 0.0 from the fragment function. The first fragment at a pixel passes the test (0.0 < 1.0) and writes 0.0; any subsequent fragment at that pixel fails the test (0.0 is not less than 0.0) and is not drawn.
2) Use framebuffer read-back (also sketched below). On iOS, Metal allows you to read from the currently-bound primary renderbuffer by attributing a parameter of your fragment function with [[color(0)]]. This parameter will contain the current color value in the renderbuffer, which you can test to determine whether it has been written to. This does require you to clear the texture to a predetermined color that will never otherwise be produced by your fragment function, so it is more limited than the approach above, and possibly less performant.
All of the above applies whether you're rendering to a drawable's texture for direct presentation to the screen, or to some offscreen texture.
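A hedged MSL sketch of approach 1 (names are illustrative; on the host side you would pair this with an MTLDepthStencilDescriptor whose depthCompareFunction is MTLCompareFunctionLess and depthWriteEnabled is YES):
#include <metal_stdlib>
using namespace metal;

struct ParticleVaryings {
    float4 position [[position]];
    float4 color;
};

struct ParticleFragmentOut {
    float4 color [[color(0)]];
    float  depth [[depth(any)]];   // fragment-sourced depth value
};

fragment ParticleFragmentOut particle_fragment(ParticleVaryings in [[stage_in]])
{
    ParticleFragmentOut out;
    out.color = in.color;
    // Every particle writes depth 0.0. With a Less compare function and a
    // clear value of 1.0, only the first fragment at each pixel passes.
    out.depth = 0.0;
    return out;
}
And a sketch of approach 2 (the sentinel clear color, transparent black here, is an assumption; pick one your fragment function never otherwise produces):
fragment float4 particle_fragment_fb(float4 currentColor [[color(0)]])
{
    // If the pixel still holds the sentinel clear color, nothing has been
    // drawn here yet, so emit the particle's color.
    if (all(currentColor == float4(0.0)))
        return float4(1.0, 1.0, 1.0, 1.0);   // illustrative particle color
    // Otherwise keep the first-written value.
    return currentColor;
}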
To answer the read-and-write part: you can specify read/write access for the output texture like so:
texture2d<float, access::read_write> outTexture [[texture(1)]],
Also, your texture descriptor must specify the usage:
textureDescriptor?.usage = [.shaderRead, .shaderWrite]
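A minimal kernel sketch using that qualifier (the positions buffer is a hypothetical per-particle input; also note that access::read_write requires a pixel format and GPU that support read-write textures):
#include <metal_stdlib>
using namespace metal;

kernel void draw_particles(texture2d<float, access::read_write> outTexture [[texture(1)]],
                           constant float2 *positions [[buffer(0)]],
                           uint id [[thread_position_in_grid]])
{
    // The particle's pixel position in the output texture.
    uint2 p = uint2(positions[id]);
    // Only draw if nothing has been written at this position yet.
    if (outTexture.read(p).a == 0.0)
        outTexture.write(float4(1.0), p);
}
Be aware that the read and the write are not atomic: two threads handling particles at the same position can still race between the read and the write, so this narrows the flicker window rather than eliminating it. The depth-buffer approach above is the robust first-write-wins solution.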

Is DirectX 11 compute capable of writing more than 10k vertices to a RWStructuredBuffer?

I have a vertex buffer with an unordered access view, which I'm using to fill the vertices from a compute shader that treats the UAV as a RWStructuredBuffer of a struct equivalent to the vertex definition. There are 216,000 vertices (i.e. 60 x 60 x 60), but my compute shader seems to fill only about 8,000 of them, leaving the rest with their initial values. Is there a limit on the number of elements in a structured buffer that can be written in this way?
As it turns out, if you turn on DirectX error checking, binding the UAV of a vertex buffer as a RWStructuredBuffer in the shader is reported as an error. So although this actually works for a limited number of vertices, it isn't supported.

Difference between Texture2D and Texture2DMS in DirectX11

I'm using SharpDX and I want to do antialiasing of the depth buffer. I need to store the depth buffer as a texture to use later. So is it a good idea for this texture to be a Texture2DMS, or should I take another approach?
What I really want to achieve is:
1) Depth buffer scaling
2) Depth test supersampling
(terms I found in section 3.2 of this paper: http://gfx.cs.princeton.edu/pubs/Cole_2010_TFM/cole_tfm_preprint.pdf)
The paper calls for a depth pre-pass. Since this pass requires no color, you should leave the render target unbound and use an "empty" pixel shader. For depth, you should create a Texture2D (not MS) at 2x or 4x (or some other 2Nx) the width and height of the final render target you're going to use. This isn't really "supersampling" (since the pre-pass is an independent phase with no actual pixel output), but it's similar.
For the second phase, the paper calls for doing multiple samples of the high-resolution depth buffer from the pre-pass. If you followed the sizing above, every pixel will correspond to some (2N)^2 depth values. You'll need to read these values and average them. Fortunately, there's a hardware-accelerated way to do this (called PCF) using SampleCmp with a COMPARISON sampler type. This samples a 2x2 stamp, compares each value to a specified value (pass in the second-phase calculated depth here, and don't forget to add some epsilon value (e.g. 1e-5)), and returns the averaged result. Do 2x2 stamps to cover the entire area of the first-phase depth buffer associated with this pixel, and average the results. The final result represents how much of the current line's spine corresponds to the foremost depth of the pre-pass. Because of the PCF's smooth filtering behavior, as lines become visible, they will slowly fade in, as opposed to the aliased "dotted" line effect described in the paper.
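An HLSL sketch of one such tap (resource names and the epsilon are illustrative; the comparison sampler would be created with a COMPARISON linear filter and a LESS_EQUAL comparison function):
Texture2D<float> gDepthPrePass : register(t0);
SamplerComparisonState gCmpSampler : register(s0);

// One hardware-PCF tap: samples a 2x2 stamp around uv, compares each stored
// depth against the reference value, and returns the averaged pass results.
float PcfTap(float2 uv, float referenceDepth)
{
    return gDepthPrePass.SampleCmpLevelZero(gCmpSampler, uv, referenceDepth + 1e-5);
}
Average several such taps to cover the whole (2N)^2 footprint of each final pixel, as described above.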
