Do I need to use the DirectXMath structures (XMFLOAT3, XMFLOAT4, XMMATRIX, XMFLOAT4X4, etc.) when I am setting the data for the vertex and constant buffers for a directx application. I have my own Vector3 and Matrix structures and would prefer to use those.
You do not need to use DirectXMath. All you need to do is ensure that the data you write matches the standard (IEEE-754) floating point layout (or whatever other format you have specified). For vector types, just ensure that the elements are in contiguous memory.
Related
I have some design qustion for a vulkan game engine:
In my game engine i bound all "static" textures resources on one huge descriptor-set(256k descriptors), and my shaders access those samplers by an dynamic indexing.
[For example when i want to sample a some normals-map that belong to a currtain gameobject i add an new uint into the material's ubo that represent the index of the object's normals-map descriptor inside the huge descriptor set, then i sample it and compute the final object color.]
I wondered whether this way to access objects textures is efficient compare to the idia to bind each object's texture on his per-object descriptor set(alongside the material ubo).
Does the size of an descriptor-set can drastically affect on the texel access speed?
or my idia is suck?
Again, sorry about my English.
There are no performance issues with indexing from an array of sampler descriptors. The only real reason not to do things this way is that implementations may not let you dynamically index such arrays. But if you're requiring that from the implementation (all desktop implementations allow it), then just keep doing it; it's a common technique for reducing the number of state changes you have to issue on the CPU.
I have the following buffer:
RWTexture2D<float4> Output : register(u0);
This buffer is used by a compute shader for rendering a computed image.
To write a pixel in that texture, I just use code similar to this:
Output[XY] = SomeFunctionReturningFloat4(SomeArgument);
This works very well and my computed image is correctly rendered on screen.
Now at some stage in the compute shader, I would like to read back an
already computed pixel and process it again.
Output[XY] = SomeOtherFunctionReturningFloat4(Output[XY]);
The compiler return an error:
error X3676: typed UAV loads are only allowed for single-component 32-bit element types
Any help appreciated.
In Compute Shaders, data access is limited on some data types, and not at all intuitive and straightforward. In your case, you use a
RWTexture2D<float4>
That is a UAV typed of DXGI_FORMAT_R32G32B32A32_FLOAT format.
This forma is only supported for UAV typed store, but it’s not supported by UAV typed load.
Basically, you can only write on it, but not read it. UAV typed load only supports 32 bit formats, in your case DXGI_FORMAT_R32_FLOAT, that can only contain a single component (32 bits and that’s all).
Your code should run if you use a RWTexture2D<float> but I suppose this is not enough for you.
Possible workarounds that spring to my minds are:
1. using 4 different RWTexture2D<float>, one for each component
2. using 2 different textures, RWTexture2D<float4> to write your values and Texture2D<float4> to read from
3. Use a RWStructuredBufferinstead of the texture.
I don’t know your code so I don’t know if solutions 1. and 2. could be viable. However, I strongly suggest going for 3. and using StructuredBuffer. A RWStructuredBuffer can hold any type of struct and can easily cover all your needs. To be honest, in compute shaders I almost only use them to pass data. If you need the final output to be a texture, you can do all your calculations on the buffer, then copy the results on the texture when you’re done. I would add that drivers often use CompletePath to access RWTexture2D data, and FastPath to access RWStructuredBuffer data, making the former awfully slower than the latter.
Reference for data type access is here. Scroll down to UAV typed load.
In WebGL calls to texSubImage2D and readPixels require a Format and Type parameters. In addition texSubImage2D requires an InternalFormat parameter. While it is easy to find documentation about which combinations of these parameters are valid, it is unclear exactly what these parameters mean and how to go about using them efficiently, particularly given that some internal formats can be paired with multiple types e.g.
R16F/HALF_FLOAT vs R16F/FLOAT or GL_R11F_G11F_B10F/FLOAT vs GL_R11F_G11F_B10F/GL_UNSIGNED_INT_10F_11F_11F_REV (where the notation I am using is InternalFormat/Type)
Also both of these API calls can be used in combination with a pixels parameter that can be a TypedArray -- it this case it is unclear which choices of TypedArray are valid for a given InternalFormat/Format/Type combo (and which choice is optimal in terms of avoiding casting)
For instance, is it true that the internal memory used by the GPU per texel is determined solely by the InternalFormat -- either in an implementation dependent way (e.g. WebGL1 unsized formats) or, for some newly added InternalFormats in WegGL2, a fully specified way.
Are the Format and Type parameters related primarily to how data is marshalled into and out of ArrayBuffers? For instance, if I use GL_R11F_G11F_B10F/GL_UNSIGNED_INT_10F_11F_11F_REV
does this mean I should be passing texSubImage2D an Uint32Array with each element of the array having its bits carefully twiddled in javascript whereas if I use GL_R11F_G11F_B10F/Float then I should use a Float32Array with three times number of elements as the prior case, and WebGL will handle the bit twiddling for me? Does WebGL try to check that the TypedArray I have passed is consistent with the Format/Type I have chosen or does it operate directly on the underlying ArrayBuffer? Could I have used a Float64Array in the last instance? And what to do about HALF_FLOAT?
It looks like bulk of the question can be answered by referring to section 3.7.6 Texture Objects of the WebGL2 Spec. In particular the info in the table found in the documentation for texImage2D which clarifies which TypedArray is required for each Type:
TypedArray WebGL Type
---------- ----------
Int8Array BYTE
Uint8Array UNSIGNED_BYTE
Int16Array SHORT
Uint16Array UNSIGNED_SHORT
Uint16Array UNSIGNED_SHORT_5_6_5
Uint16Array UNSIGNED_SHORT_5_5_5_1
Uint16Array UNSIGNED_SHORT_4_4_4_4
Int32Array INT
Uint32Array UNSIGNED_INT
Uint32Array UNSIGNED_INT_5_9_9_9_REV
Uint32Array UNSIGNED_INT_2_10_10_10_REV
Uint32Array UNSIGNED_INT_10F_11F_11F_REV
Uint32Array UNSIGNED_INT_24_8
Uint16Array HALF_FLOAT
Float32Array FLOAT
My guess is that
InternalFormat determines how much GPU memory is used to store the texture
Format and Type governs how data is marshalled into/ out of the texture and javascript.
Type determines what type of TypedArray must be used
Format plus the pixelStorei parameters (section 6.10) determine how many elements the TypedArray will need and which elements will actually by used (will things be tightly packed, will some rows be padded etc)
Todo:
Workout details for
encoding/decoding some of the more obscure Type values to and from javascript.
calculating typed array size requirement and stride info given Type, Format, and pixelStorei parameters
I am in the middle of rendering different textures on multiple meshes of a model, but I do not have much clues about the procedures. Someone suggested for each mesh, create its own descriptor sets and call vkCmdBindDescriptorSets() and vkCmdDrawIndexed() for rendering like this:
// Pipeline with descriptor set layout that matches the shared descriptor sets
vkCmdBindPipeline(...pipelines.mesh...);
...
// Mesh A
vkCmdBindDescriptorSets(...&meshA.descriptorSet... );
vkCmdDrawIndexed(...);
// Mesh B
vkCmdBindDescriptorSets(...&meshB.descriptorSet... );
vkCmdDrawIndexed(...);
However, the above approach is quite different from the chopper sample and vulkan's samples that makes me have no idea where to start the change. I really appreciate any help to guide me to a correct direction.
Cheers
You have a conceptual object which is made of multiple meshes which have different texturing needs. The general ways to deal with this are:
Change descriptor sets between parts of the object. Painful, but it works on all Vulkan-capable hardware.
Employ array textures. Each individual mesh fetches its data from a particular layer in the array texture. Of course, this restricts you to having each sub-mesh use textures of the same size. But it works on all Vulkan-capable hardware (up to 128 array elements, minimum). The array layer for a particular mesh can be provided as a push-constant, or a base instance if that's available.
Note that if you manage to be able to do it by base instance, then you can render the entire object with a multi-draw indirect command. Though it's not clear that a short multi-draw indirect would be faster than just baking a short sequence of drawing commands into a command buffer.
Employ sampler arrays, as Sascha Willems suggests. Presumably, the array index for the sub-mesh is provided as a push-constant or a multi-draw's draw index. The problem is that, regardless of how that array index is provided, it will have to be a dynamically uniform expression. And Vulkan implementations are not required to allow you to index a sampler array with a dynamically uniform expression. The base requirement is just a constant expression.
This limits you to hardware that supports the shaderSampledImageArrayDynamicIndexing feature. So you have to ask for that, and if it's not available, then you've got to work around that with #1 or #2. Or just don't run on that hardware. But the last one means that you can't run on any mobile hardware, since most of them don't support this feature as of yet.
Note that I am not saying you shouldn't use this method. I just want you to be aware that there are costs. There's a lot of hardware out there that can't do this. So you need to plan for that.
The person that suggested the above code fragment was me I guess ;)
This is only one way of doing it. You don't necessarily have to create one descriptor set per mesh or per texture. If your mesh e.g. uses 4 different textures, you could bind all of them at once to different binding points and select them in the shader.
And if you a take a look at NVIDIA's chopper sample, they do it pretty much the same way only with some more abstraction.
The example also sets up descriptor sets for the textures used :
VkDescriptorSet *textureDescriptors = m_renderer->getTextureDescriptorSets();
binds them a few lines later :
VkDescriptorSet sets[3] = { sceneDescriptor, textureDescriptors[0], m_transform_descriptor_set };
vkCmdBindDescriptorSets(m_draw_command[inCommandIndex], VK_PIPELINE_BIND_POINT_GRAPHICS, layout, 0, 3, sets, 0, NULL);
and then renders the mesh with the bound descriptor sets :
vkCmdDrawIndexedIndirect(m_draw_command[inCommandIndex], sceneIndirectBuffer, 0, inCount, sizeof(VkDrawIndexedIndirectCommand));
vkCmdDraw(m_draw_command[inCommandIndex], 1, 1, 0, 0);
If you take a look at initDescriptorSets you can see that they also create separate descriptor sets for the cubemap, the terrain, etc.
The LunarG examples should work similar, though if I'm not mistaken they never use more than one texture?
I am trying to implement hardware instancing with Direct3D 10+ using Structured Buffers for the per instance data but I've not used them before.
I understand how to implement instancing when combining the per vertex and per instance data into a single structure in the Vertex Shader - i.e. you bind two vertex buffers to the input assembler and call the DrawIndexedInstanced function.
Can anyone tell me the procedure for binding the input assembler and making the draw call etc. when using Structured Buffers with hardware instancing? I can't seem to find a good example of it anywhere.
It's my understanding that Structured Buffers are bound as ShaderResourceViews, is this correct?
Yup, that's exactly right. Just don't put any per-instance vertex attributes in your vertex buffer or your input layout and create a ShaderResourceView of the buffer and set it on the vertex shader. You can then use the SV_InstanceID semantic to query which instance you're on and just fetch the relevant struct from your buffer.
StructuredBuffers are very similar to normal buffers. The only differences are that you specify the D3D11_RESOURCE_MISC_BUFFER_STRUCTURED flag on creation, fill in StructureByteStride and when you create a ShaderResourceView the Format is DXGI_UNKNOWN (the format is specified implicitly by the struct in your shader).
StructuredBuffer<MyStruct> myInstanceData : register(t0);
is the syntax in HLSL for a StructuredBuffer and you just access it using the [] operator like you would an array.
Is there anything else that's unclear?