I tried to create a kernel with the parameter
device int &errorC [[ buffer(2) ]],
and set the buffer using
[encoder setBytes:&count length:sizeof(int) atIndex: 2];
But I receive an error message saying: failed assertion 'Compute Function: Bytes are being bound at index 2 to a shader argument with write access enabled.'
Why? It seems that I should not use setBytes:, but then how can I set an integer that the kernel can write back to?
The -setBytes:... method can only provide constant data, not writable device data. This method of providing data can be more efficient than providing a buffer of your own precisely because it may (behind the scenes) not use a writable buffer to hold the data.
Among other things, note that if you want the CPU to be able to read the value written to errorC, there is no way to do that when you use -setBytes:...; there is no corresponding -getBytes... method.
If you want the data to be writable, you do need to provide a buffer using the -setBuffer:offset:atIndex: method.
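For example, a minimal sketch (assuming an existing MTLDevice device and compute command encoder encoder; the shared storage mode lets the CPU read the result directly):
id<MTLBuffer> errorBuffer =
    [device newBufferWithLength:sizeof(int)
                        options:MTLResourceStorageModeShared];
*(int *)errorBuffer.contents = count;        // initial value, if you need one
[encoder setBuffer:errorBuffer offset:0 atIndex:2];

// ... after committing the command buffer and waiting for completion:
int errorC = *(int *)errorBuffer.contents;   // read back what the kernel wrote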
I want to implement an A-Buffer algorithm for order-independent-transparency in my Metal application. The description of the technique mentions using an atomic counter. I've never used one of these or even heard of them. I just read about atomic variables in the Metal Shading Language Specification, but I can't figure out how to actually implement or use one.
Does anyone have experience with these in Metal? Can you point me to an example of how to set up and use a simple integer counter? Basically each render pass I need to be able to increment an integer from within the fragment shader, starting from zero. This is used to index into the A-Buffer.
Thanks!
Well, your question lacks sufficient detail for me to provide much more than a general overview. You might consider adding an incomplete shader function, with pseudo-code where you're not sure how to implement something.
Anyway, an atomic counter is a variable of type atomic_uint (or atomic_int if you need sign). To be useful, the variable needs to be shared across a particular address space. Your example sounds like it needs device address space. So, you would want a device variable backed by a buffer. You would declare it as:
fragment FragmentOut my_fragment_func(device atomic_uint &counter [[buffer(0)]], ...)
{
...
}
You could also use a struct type for the parameter and have a field of the struct be your atomic_uint variable.
To atomically increment the atomic variable by 1 and obtain the prior value, you could do this:
uint value = atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
The initial value of the atomic variable is taken from the contents of the buffer at a point before the draw or dispatch command is executed. It's not documented as such in the spec, but the size and bit-interpretation of an atomic type seem to match the corresponding non-atomic type. That is, you would write a uint (a.k.a. unsigned int or uint32_t) to the buffer to initialize an atomic_uint.
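On the CPU side, for example, you might back and reset the counter like this (a sketch assuming an existing MTLDevice device and render command encoder encoder; index 0 matches the declaration above):
id<MTLBuffer> counterBuffer =
    [device newBufferWithLength:sizeof(uint32_t)
                        options:MTLResourceStorageModeShared];
*(uint32_t *)counterBuffer.contents = 0;     // start each render pass from zero
[encoder setFragmentBuffer:counterBuffer offset:0 atIndex:0];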
I have a D3D11 buffer with a few million elements that is supposed to hold data in the R8G8B8A8_UNorm format.
The desired behavior is the following: One shader calculates a vec4 and writes it to the buffer in a random access pattern. In the next pass, another shader reads the data in a random access pattern and processes them further.
My best guess would be to create an UnorderedAccessView with the R8G8B8A8_UNorm format. But how do I declare the RWBuffer<?> in HLSL, and how do I write to and read from it? Is it necessary to declare it as RWBuffer<uint> and do the packing from vec4 to uint manually?
In OpenGL I would create a buffer and a buffer texture. Then I can declare an imageBuffer with the rgba8 format in the shader, access it with imageLoad and imageStore, and the hardware does all the conversions for me. Is this possible in D3D11?
This is a little tricky due to a lot of different gotchas, but you should be able to do something like this.
In your shader that writes to the buffer declare:
RWBuffer<float4> WriteBuf : register( u1 );
Note that it is bound to register u1 instead of u0. In a pixel shader, unordered access view (UAV) slots are shared with render target slots, so with one render target bound at slot 0, UAVs must start at slot 1.
To write to the buffer just do something like:
WriteBuf[0] = float4(0.5, 0.5, 0, 1);
Note that you must write all 4 values at once.
In your C++ code, you must create an unordered access buffer, and bind it to a UAV. You can use the DXGI_FORMAT_R8G8B8A8_UNORM format. When you write 4 floats to it, the values will automatically be converted and packed. The UAV can be bound to the pipeline using OMSetRenderTargetsAndUnorderedAccessViews.
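Putting that together, a sketch of the first-pass setup (device, context, render target view rtv, depth view dsv, and elementCount are assumptions; error handling omitted):
D3D11_BUFFER_DESC bd = {};
bd.ByteWidth = elementCount * 4;   // 4 bytes per R8G8B8A8 element
bd.Usage     = D3D11_USAGE_DEFAULT;
bd.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
ID3D11Buffer* buffer = nullptr;
device->CreateBuffer(&bd, nullptr, &buffer);

D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
uavDesc.Format              = DXGI_FORMAT_R8G8B8A8_UNORM;  // hardware packs the float4
uavDesc.ViewDimension       = D3D11_UAV_DIMENSION_BUFFER;
uavDesc.Buffer.FirstElement = 0;
uavDesc.Buffer.NumElements  = elementCount;
ID3D11UnorderedAccessView* uav = nullptr;
device->CreateUnorderedAccessView(buffer, &uavDesc, &uav);

// One render target at slot 0, the UAV starting at slot 1 (see above).
context->OMSetRenderTargetsAndUnorderedAccessViews(
    1, &rtv, dsv, 1, 1, &uav, nullptr);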
In your shader that reads from the buffer declare a read only buffer:
Buffer<float4> ReadBuf : register( t0 );
Note that this buffer uses t0 because it will be bound as a shader resource view (SRV) instead of UAV.
To access the buffer use something like:
float4 val = ReadBuf[0];
In your C++ code, you can bind the same buffer you created earlier to an SRV instead of a UAV. The SRV can be bound to the pipeline using PSSetShaderResources and can also be created with DXGI_FORMAT_R8G8B8A8_UNORM.
You can't bind an SRV and a UAV over the same buffer to the pipeline at the same time. So you must bind the UAV first and run your first shader pass; then unbind the UAV, bind the SRV, and run the second shader pass.
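The second pass might then look like this (same assumptions as the sketch above):
D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format              = DXGI_FORMAT_R8G8B8A8_UNORM;
srvDesc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements  = elementCount;
ID3D11ShaderResourceView* srv = nullptr;
device->CreateShaderResourceView(buffer, &srvDesc, &srv);

// OMSetRenderTargets without UAVs implicitly unbinds the UAV from pass one.
context->OMSetRenderTargets(1, &rtv, dsv);
context->PSSetShaderResources(0, 1, &srv);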
There are probably other ways to do this as well. Note that all of this requires shader model 5.
Hi,
I just want some clarification on why we cannot pass a pointer to a 2D array (a pointer to a pointer) as an argument to a kernel.
Why is it not allowed?
What would happen internally if I used one as an argument? (I know the code gives an error.)
Thanks.
Because in OpenCL 1.x the device has a separate address space. Kernels executing on the device wouldn't know what to do with a pointer that is only useful in host address space.
Note that in OpenCL 2.0 Shared Virtual Memory (SVM) removes this restriction and allows buffers containing pointers to be used on both host and device side.
To be a little more concrete, let's say I have a kernel with a pointer to a pointer as a parameter:
kernel void foo(global float * global *a){...}
The runtime in this case knows that it has a private pointer to an array of global pointers to global floats.
When I pass a buffer to the runtime:
clSetKernelArg(fooKernel, 0, sizeof(cl_mem), &aBuffer);
The runtime knows the kernel expects a pointer. As this is OpenCL 1.x, it knows the argument must have been set as a pointer, and that the pointer it received is a pointer to a cl_mem. It can look at that cl_mem, find the address at which it has been allocated in device memory, pass that address to the device, and launch the kernel. However, it does not know (in general) which buffer the pointers stored inside the buffer are meant to point to. Even if it did know that they pointed to the same buffer, it would have to walk the entire buffer updating every pointer field; if the pointers were nested, it would need far more information still. It just isn't feasible in the general case.
OpenCL 2.0 supports shared virtual memory, which means the addresses are fixed, and the same on the host and device (even if the memory isn't necessarily directly shared). There is no need to do the conversion and hence it becomes possible.
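For illustration, a minimal coarse-grained SVM sketch (assuming an OpenCL 2.0 context ctx, a queue queue, and the fooKernel above; error checking omitted):
float  *row  = (float  *)clSVMAlloc(ctx, CL_MEM_READ_WRITE, 100 * sizeof(float), 0);
float **rows = (float **)clSVMAlloc(ctx, CL_MEM_READ_WRITE, sizeof(float *), 0);

/* coarse-grained SVM must be mapped before the host touches it */
clEnqueueSVMMap(queue, CL_TRUE, CL_MAP_WRITE, rows, sizeof(float *), 0, NULL, NULL);
rows[0] = row;  /* this address is valid on the device as well */
clEnqueueSVMUnmap(queue, rows, 0, NULL, NULL);

/* SVM pointers are passed with their own API, not clSetKernelArg */
clSetKernelArgSVMPointer(fooKernel, 0, rows);

/* tell the runtime about SVM pointers reachable only through the buffer */
void *indirect[] = { row };
clSetKernelExecInfo(fooKernel, CL_KERNEL_EXEC_INFO_SVM_PTRS, sizeof(indirect), indirect);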
Suppose my class has a member Eigen::Matrix alpha;
How can I check from GDB which function writes into its memory?
From Eigen I can get its address using alpha.data, and the data should be stored in the 100 * sizeof(double) bytes starting at that address.
Watchpoints
If the array containing your data is static, GDB should be able to find out the size of the array, so watch alpha.data should tell you whenever any value in the array changes. However, if data is a pointer, watch alpha.data will tell you when the pointer changes, and not when the data in your array changes, which is not very useful. You will then need to manually tell GDB to watch the region occupied by your array: watch (double[100]) *alpha.data.
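For example, you can pin the address in a convenience variable first, so the watchpoint keeps working even after alpha goes out of scope (a sketch; alpha must be in scope when you set it):
(gdb) set $p = alpha.data
(gdb) watch (double[100]) *$p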
I am trying to implement hardware instancing with Direct3D 10+ using Structured Buffers for the per instance data but I've not used them before.
I understand how to implement instancing when combining the per vertex and per instance data into a single structure in the Vertex Shader - i.e. you bind two vertex buffers to the input assembler and call the DrawIndexedInstanced function.
Can anyone tell me the procedure for binding the input assembler and making the draw call etc. when using Structured Buffers with hardware instancing? I can't seem to find a good example of it anywhere.
It's my understanding that Structured Buffers are bound as ShaderResourceViews; is this correct?
Yup, that's exactly right. Just don't put any per-instance vertex attributes in your vertex buffer or input layout; instead, create a ShaderResourceView of the buffer and set it on the vertex shader. You can then use the SV_InstanceID semantic to query which instance you're on and fetch the relevant struct from your buffer.
StructuredBuffers are very similar to normal buffers. The only differences are that you specify the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED flag on creation, fill in StructureByteStride, and when you create a ShaderResourceView the Format is DXGI_FORMAT_UNKNOWN (the format is specified implicitly by the struct in your shader).
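A sketch of those creation and binding steps (the device, context, instance data array, and counts are assumptions; error handling omitted):
struct MyStruct { float worldMatrix[16]; };   // must match the HLSL struct

D3D11_BUFFER_DESC bd = {};
bd.ByteWidth           = instanceCount * sizeof(MyStruct);
bd.Usage               = D3D11_USAGE_DEFAULT;
bd.BindFlags           = D3D11_BIND_SHADER_RESOURCE;
bd.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
bd.StructureByteStride = sizeof(MyStruct);

D3D11_SUBRESOURCE_DATA init = {};
init.pSysMem = instanceData;                  // instanceCount MyStruct elements

ID3D11Buffer* instanceBuffer = nullptr;
device->CreateBuffer(&bd, &init, &instanceBuffer);

D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
srvDesc.Format              = DXGI_FORMAT_UNKNOWN;  // implied by the struct
srvDesc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
srvDesc.Buffer.FirstElement = 0;
srvDesc.Buffer.NumElements  = instanceCount;
ID3D11ShaderResourceView* instanceSRV = nullptr;
device->CreateShaderResourceView(instanceBuffer, &srvDesc, &instanceSRV);

// Bind to the vertex shader stage and draw; no per-instance vertex buffer needed.
context->VSSetShaderResources(0, 1, &instanceSRV);
context->DrawIndexedInstanced(indexCount, instanceCount, 0, 0, 0);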
StructuredBuffer<MyStruct> myInstanceData : register(t0);
is the syntax in HLSL for a StructuredBuffer and you just access it using the [] operator like you would an array.
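For instance, a minimal vertex shader sketch (the struct contents and the input/output layouts here are assumptions):
struct MyStruct { float4x4 world; };              // assumed per-instance data
StructuredBuffer<MyStruct> myInstanceData : register(t0);

struct VSIn  { float3 pos : POSITION; };
struct VSOut { float4 pos : SV_Position; };

VSOut main(VSIn vin, uint id : SV_InstanceID)
{
    VSOut vout;
    // fetch this instance's struct and apply its transform
    vout.pos = mul(float4(vin.pos, 1.0f), myInstanceData[id].world);
    return vout;
}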
Is there anything else that's unclear?