DirectX11: Pass data from ComputeShader to VertexShader? - directx

Is it possible to apply a filter to the geometry data that is to be rendered using Compute Shader and then use the result as an input buffer in the Vertex Shader? That would save me the trouble (&time) of reading back the data.
Any help is much appreciated.

Yes absolutely. First you create two identicals ID3D11Buffer of structures using BIND_VERTEX_BUFFER, BIND_SHADER_RESOURCE and BIND_UNORDERED_ACCESS usage flags, and the associated UAVs and SRVs.
First step is to apply your filter to input source buffer and write to the destination buffer during your compute pass.
Then during the draw pass, you just have to bind the destination buffer to the IA stage. You can do some ping-pong if you need to accumulate computations on the vertices (I assume that by filter you mean a functional map, for refering to the Functional Programming term).

Related

DirectCompute: How to read from a RWTexture2D<float4>?

I have the following buffer:
RWTexture2D<float4> Output : register(u0);
This buffer is used by a compute shader for rendering a computed image.
To write a pixel in that texture, I just use code similar to this:
Output[XY] = SomeFunctionReturningFloat4(SomeArgument);
This works very well and my computed image is correctly rendered on screen.
Now at some stage in the compute shader, I would like to read back an
already computed pixel and process it again.
Output[XY] = SomeOtherFunctionReturningFloat4(Output[XY]);
The compiler return an error:
error X3676: typed UAV loads are only allowed for single-component 32-bit element types
Any help appreciated.
In Compute Shaders, data access is limited on some data types, and not at all intuitive and straightforward. In your case, you use a
RWTexture2D<float4>
That is a UAV typed of DXGI_FORMAT_R32G32B32A32_FLOAT format.
This forma is only supported for UAV typed store, but it’s not supported by UAV typed load.
Basically, you can only write on it, but not read it. UAV typed load only supports 32 bit formats, in your case DXGI_FORMAT_R32_FLOAT, that can only contain a single component (32 bits and that’s all).
Your code should run if you use a RWTexture2D<float> but I suppose this is not enough for you.
Possible workarounds that spring to my minds are:
1. using 4 different RWTexture2D<float>, one for each component
2. using 2 different textures, RWTexture2D<float4> to write your values and Texture2D<float4> to read from
3. Use a RWStructuredBufferinstead of the texture.
I don’t know your code so I don’t know if solutions 1. and 2. could be viable. However, I strongly suggest going for 3. and using StructuredBuffer. A RWStructuredBuffer can hold any type of struct and can easily cover all your needs. To be honest, in compute shaders I almost only use them to pass data. If you need the final output to be a texture, you can do all your calculations on the buffer, then copy the results on the texture when you’re done. I would add that drivers often use CompletePath to access RWTexture2D data, and FastPath to access RWStructuredBuffer data, making the former awfully slower than the latter.
Reference for data type access is here. Scroll down to UAV typed load.

Performing a transpose operation on a DirectX Surface Buffer

I am using IMFSourceReader with hardware acceleration enabled to decode videos and read them into my application. After the ReadSample call, I get hold of the IDirect3DSurface9 from the IMFSample. At this point, I use the LockRect() call to access the raw-bytes and copy them into my applications buffer.
I would like to perform additional operations on the GPU such as transpose and a possible conversion of the image data from row-major order to column-major order.
Is there a Blt operation I can setup to this?
I came across the ID3DXBaseEffect interface but I am not sure that is applicable in my case.
Would appreciate any inputs.
Dinesh
With IDirect3DSurface9, you can use shader (ID3DXBaseEffect).
To do it on GPU directly, before copy the raw-bytes to your application, i will try this :
Call IMFSourceReader::GetServiceForStream to query for MR_VIDEO_ACCELERATION_SERVICE and IDirect3DDeviceManager9.
use IDirect3DDeviceManager9 to query the IDirect3DDevice9 (IDirect3DDeviceManager9::LockDevice).
Use IDirect3DDevice9, IDirect3DSurface9, a new RenderTarget, shader, as usual with Directx.
copy the raw-bytes from the final RenderTarget (after shader apply).
EDIT
See here : mofo7777 github
Under MediaFoundationTransform > MFTDirectxAware > MFTVideoShaderEffect, i'll show the concept.

Dynamic output from compute shader

If I am generating 0-12 triangles in a compute shader, is there a way I can stream them to a buffer that will then be used for rendering to screen?
My current strategy is:
create a buffer of float3 of size threads * 12, so can store the maximum possible number of triangles;
write to the buffer using an index that depends on the thread position in the grid, so there are no race conditions.
If I want to render from this though, I would need to skip the empty memory. It sounds ugly, but probably there is no other way currently. I know CUDA geometry shaders can have variable length output, but I wonder if/how games on iOS can generate variable-length data on GPU.
UPDATE 1:
As soon as I wrote the question, I thought about the possibility of using a second buffer that would point out how many triangles are available for each block. The vertex shader would then process all vertices of all triangles of that block.
This will not solve the problem of the unused memory though and as I have a big number of threads, the total memory wasted would be considerable.
What you're looking for is the Metal equivalent of D3D's "AppendStructuredBuffer". You want a type that can have structures added to it atomically.
I'm not familiar with Metal, but it does support Atomic operations such as 'Add' which is all you really need to roll your own Append Buffer. Initialise the counter to 0 and have each thread add '1' to the counter and use the original value as the index to write to in your buffer.

WebGL one-to-many data processing

Is there any scheme using WebGL which allows to process one data record to an previously unknown number of records?
Using OpenGL for example, a geometry program can be used to multiply vertices depending on their attributes, and thus output data of unknown length.
Is there any trick to use WebGL in a likewise fashion, or is this only possible on the JavaScript side?
Yup, there is no Geometry Shader in WebGL (just Vertex and Fragment).
So, yes, something multiplicative needs to be implemented on the JS side, by making more data or more calls to gl.drawTriangles/gl.drawElements.
One approach that might be applicable, is to have lots of data (triangles, say), and use the Fragment Shader to algorithmicly throw-away some or all of them. Kind of the opposite of multiplying. But if you keep the same triangles, and change their processing with uniforms, or perhaps smaller data in textures, you can at least save the hit of sending up lots of different data.
To "Throw away" a vertex, need to put it outside the NDC (the -1 to +1 unit cube), for all three vertices of a triangle.

Direct3D 10 Hardware Instancing using Structured Buffers

I am trying to implement hardware instancing with Direct3D 10+ using Structured Buffers for the per instance data but I've not used them before.
I understand how to implement instancing when combining the per vertex and per instance data into a single structure in the Vertex Shader - i.e. you bind two vertex buffers to the input assembler and call the DrawIndexedInstanced function.
Can anyone tell me the procedure for binding the input assembler and making the draw call etc. when using Structured Buffers with hardware instancing? I can't seem to find a good example of it anywhere.
It's my understanding that Structured Buffers are bound as ShaderResourceViews, is this correct?
Yup, that's exactly right. Just don't put any per-instance vertex attributes in your vertex buffer or your input layout and create a ShaderResourceView of the buffer and set it on the vertex shader. You can then use the SV_InstanceID semantic to query which instance you're on and just fetch the relevant struct from your buffer.
StructuredBuffers are very similar to normal buffers. The only differences are that you specify the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED flag on creation, fill in StructureByteStride and when you create a ShaderResourceView the Format is DXGI_UNKNOWN (the format is specified implicitly by the struct in your shader).
StructuredBuffer<MyStruct> myInstanceData : register(t0);
is the syntax in HLSL for a StructuredBuffer and you just access it using the [] operator like you would an array.
Is there anything else that's unclear?

Resources