DirectX/HLSL error PS2.0 - directx

I am getting this error:
Shader uses texture addressing operations in a dependency chain that is too complex for the target shader model (ps_2_0) to handle.
It started when I added this line to my pixel shader:
float Gauss[NUMWT] = { 5.052271056506993e-15, 9.134720359492243e-12, 6.07588281731559e-9, 0.0000014867195067797903, 0.00013383022504883334, 0.004431848388225362, 0.053990966224306644, 0.2419707232244606, 0.39894227826685835, 0.2419707232244606, 0.053990966224306644, 0.004431848388225362, 0.00013383022504883334, 0.0000014867195067797903, 6.07588281731559e-9, 9.134720359492243e-12, 5.052271056506993e-15 };
Is this array too big?

This bit
Shader uses texture addressing operations
is probably the key to where your problem is. Did the error not come with a line and character number?
Look for spots where you're sampling from textures or calculating the position, within complex paths (possibly branching).

Did a quick test and the array compiles fine for me using ps_2_0 (defining NUMWT as 17) so I guess your error is somewhere else.
Also, why should it have any issues handling such "small" arrays? Textures are far bigger.
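As a side note, the 17 values in the question look like the standard normal density sampled at integer offsets from -8 to 8. If that reading is right, the table could be generated rather than typed in by hand. The sketch below is a CPU-side illustration only; the helper name gaussWeights is made up for this example, and its results agree with the question's numbers to roughly eight significant digits.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the original post): samples the standard
// normal density exp(-x^2 / 2) / sqrt(2*pi) at integer offsets centred on
// the middle tap. N must be odd so that a centre tap exists.
template <std::size_t N>
std::array<double, N> gaussWeights() {
    static_assert(N % 2 == 1, "expects an odd number of taps");
    const double invSqrt2Pi = 0.3989422804014327; // 1 / sqrt(2*pi)
    const int half = static_cast<int>(N / 2);
    std::array<double, N> w{};
    for (int i = 0; i < static_cast<int>(N); ++i) {
        const double x = static_cast<double>(i - half); // offset from centre
        w[static_cast<std::size_t>(i)] = invSqrt2Pi * std::exp(-0.5 * x * x);
    }
    return w;
}
```

Calling gaussWeights<17>() gives a symmetric table whose centre entry is the largest, matching the layout of the Gauss array above.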

DirectCompute: How to read from a RWTexture2D<float4>?

I have the following buffer:
RWTexture2D<float4> Output : register(u0);
This buffer is used by a compute shader for rendering a computed image.
To write a pixel in that texture, I just use code similar to this:
Output[XY] = SomeFunctionReturningFloat4(SomeArgument);
This works very well and my computed image is correctly rendered on screen.
Now at some stage in the compute shader, I would like to read back an
already computed pixel and process it again.
Output[XY] = SomeOtherFunctionReturningFloat4(Output[XY]);
The compiler returns an error:
error X3676: typed UAV loads are only allowed for single-component 32-bit element types
Any help appreciated.
In Compute Shaders, data access is limited on some data types, and not at all intuitive and straightforward. In your case, you use a
RWTexture2D<float4>
That is a typed UAV with the DXGI_FORMAT_R32G32B32A32_FLOAT format.
This format is supported for UAV typed store, but not for UAV typed load.
Basically, you can write to it, but you cannot read from it. UAV typed load only supports 32-bit formats, in your case DXGI_FORMAT_R32_FLOAT, which can only contain a single component (32 bits and that’s all).
Your code should run if you use a RWTexture2D<float> but I suppose this is not enough for you.
Possible workarounds that spring to mind are:
1. using 4 different RWTexture2D<float>, one for each component
2. using 2 different textures, RWTexture2D<float4> to write your values and Texture2D<float4> to read from
3. using a RWStructuredBuffer instead of the texture.
I don’t know your code so I don’t know if solutions 1. and 2. could be viable. However, I strongly suggest going for 3. and using a RWStructuredBuffer. A RWStructuredBuffer can hold any type of struct and can easily cover all your needs. To be honest, in compute shaders I almost only use them to pass data. If you need the final output to be a texture, you can do all your calculations on the buffer, then copy the results to the texture when you’re done. I would add that drivers often use CompletePath to access RWTexture2D data, and FastPath to access RWStructuredBuffer data, making the former much slower than the latter.
Reference for data type access is here. Scroll down to UAV typed load.
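To make workaround 3 a little more concrete, here is a minimal CPU-side sketch of the indexing involved, assuming the image is stored row by row in a flat buffer. It is C++ rather than HLSL, and Float4, PixelBuffer and reprocess are hypothetical names invented for this illustration. In a shader, the buffer would be declared as something like RWStructuredBuffer<float4> and indexed with the same y * width + x arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for HLSL's float4 (hypothetical, for illustration only).
struct Float4 { float x, y, z, w; };

// Hypothetical CPU model of a structured buffer that holds a
// width x height image in row-major order.
struct PixelBuffer {
    std::size_t width;
    std::size_t height;
    std::vector<Float4> data;

    PixelBuffer(std::size_t w, std::size_t h)
        : width(w), height(h), data(w * h, Float4{}) {}

    // Flat index of pixel (x, y): one row is `width` elements long.
    std::size_t index(std::size_t x, std::size_t y) const {
        assert(x < width && y < height);
        return y * width + x;
    }
};

// Read-modify-write of a single pixel, i.e. the operation that the
// typed float4 UAV rejected with error X3676.
template <typename F>
void reprocess(PixelBuffer& buf, std::size_t x, std::size_t y, F f) {
    Float4& p = buf.data[buf.index(x, y)];
    p = f(p);
}
```

The point of the sketch is only that a structured buffer can be both read and written through one flat index; how the result is copied back into a texture is left out.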

iOS Import .obj file to Model I/O without duplicating vertices

I'm trying to import a .obj file to use in Scene Kit using the Model I/O framework. I initially used the simple MDLAsset initWithURL: function, but after transferring the mesh to a SCNGeometry, I realized this function was triangulating the mesh, such that each face had 3 unique vertices, and there were separate vertices at the same location for border faces. This was causing some major problems with my other functions, so I tried to fix it by instead using the MDLAsset initWithURL:vertexDescriptor:bufferAllocator:preserveTopology function with preserveTopology set to YES and the descriptor and allocator left at their defaults by passing nil. Preserving topology fixed my problem of duplicated vertices, so the faces and edges were all good, but in the process I lost the normals data.
By lost the normals, I don't mean multiple indexing, I mean after setting preserveTopology to YES, the buffer did not contain any normals values at all. Whereas before it was v1/n1/v2/n2... and the stride was 24 bytes (3 dimensions *4 bytes/float * 2 attributes), now the first half of the buffer is v1/v2/... with a stride of 12 and the entire 2nd half of the buffer is just 0.0 floats.
There is also something weird here: when you look at the SCNGeometrySources of the geometry, there are 2 sources, 1 with semantic kGeometrySourceSemanticVertex, and 1 with semantic kGeometrySourceSemanticNormal. You would think that the semantic vertex source would contain the position data, and the semantic normal source would contain the normal data. However, that is not the case. No matter what you set preserveTopology to, both are buffers sized to contain both position and normal data, with identical values. So when I said before there was no normal data, I mean both of these buffers, semantic vertex AND semantic normal, went from being v1/n1/v2/n2... to v1/v2/.../(0.0, 0.0, 0.0)/(0.0, 0.0, 0.0)/... I looked into the MDLMesh's buffer (before the transfer to Scene Kit) and found the same problem, so the problem must be with initWithURL, not with the Model I/O to Scene Kit bridge.
So I figured there must be something wrong with the default vertex descriptor and buffer allocator (since I was using nil) and went about trying to create my own that matched these 2 possible data formats. Alas after much trying I was unable to get something that worked.
Any ideas on how I should do this? How to give MDLAsset the proper vertexDescriptor and bufferAllocator (I feel like nil should be ok here) for importing a .obj file? Thanks
An obj file with vertices and normals has vertices, indicated by v lines, normals, indicated by vn lines, and faces, indicated by f lines.
The v and vn lines will just be the floating point values you expect, and the f line will be of the form -
f v0//n0 v1//n1 etc
Since OpenGL and Metal don't allow multiple indexing, you'll see the first effect of vertices being duplicated. For example,
f 0//0 1//2 2//0
can't work as a vertex buffer because it would require different indices per vertex. So typical OBJ parsers have to create new vertices that allow the face to become
f 0//0 1//1 2//2
The preserve topology option doesn't help you. It preserves the connectivity and shape of the mesh (no triangulation occurs, shared edges remain shared) but it still enforces a single index per vertex component.
One solution would be to make sure that your tool that is outputting the OBJ files uses single indexing during export, if that is an option.
Another option, and this won't solve the problem immediately, would be to file a request that multiple indexing be supported at the Model I/O level. SceneKit would still have to uniquely index because it has to be able to render.
Another option would be to use a format like PLY that doesn't have multiple indexing.
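The vertex duplication described above can be sketched as follows. This is not Model I/O's actual implementation, just a minimal illustration of the usual approach: every distinct (position index, normal index) pair seen on a face corner becomes one vertex in the single-indexed output. The names Corner, SingleIndexed and unify are hypothetical.

```cpp
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One corner of an OBJ face written as "v//n":
// first = position index, second = normal index.
using Corner = std::pair<int, int>;

struct SingleIndexed {
    std::vector<Corner> vertices;      // unique (position, normal) pairs
    std::vector<std::size_t> indices;  // one entry per face corner
};

// Hypothetical helper: gives each distinct (v, n) pair exactly one index,
// so that a single index buffer can address both attributes.
SingleIndexed unify(const std::vector<Corner>& corners) {
    SingleIndexed out;
    std::map<Corner, std::size_t> seen;
    for (const Corner& c : corners) {
        auto it = seen.find(c);
        if (it == seen.end()) {
            // First time this pair appears: append a new vertex.
            it = seen.emplace(c, out.vertices.size()).first;
            out.vertices.push_back(c);
        }
        out.indices.push_back(it->second);
    }
    return out;
}
```

With this scheme a position is only duplicated when it is used with more than one normal, which is typically what happens along hard edges and borders.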

Instance vs Loops in HLSL Model 5 Geometry Shaders

I'm looking at getting a program written for DirectX11 to play nice on DirectX10. To do that, I need to compile the shaders for model 4, not 5. Right now the only problem with that is that the geometry shaders use instancing which is unsupported by 4. The general model is
[instance(NUM_INSTANCES)]
void Gs(..., in uint instanceId : SV_GSInstanceID) { }
I can't seem to find many documents on why this exists, because my thought is: can't I just replace this with a loop from instanceId=0 to instanceId=NUM_INSTANCES-1?
The answer seems to be no, as it doesn't seem to output correctly, but besides my exact problem, can you help me understand why the concept of instancing exists? Is there some implication on the entire pipeline that instancing has beyond simply calling the main function twice with a different index?
With regards to why my replacement did not work:
Geometry shaders are annotated with [maxvertexcount(N)]. I had incorrectly assumed this was the vertex input count, and ignored it. In fact, input is determined by the type of primitive coming in, and so this was about the output. Before, if N was my output over I instances, each instance output N vertices. But now that I want to use a loop, a single instance outputs N*I vertices. As such, the answer was to do as I suggested, and also use [maxvertexcount(N*NUM_INSTANCES)].
To more broadly answer my question on why instances may be useful in a world that already has loops, I can only guess:
Loops in shaders are frequently unrolled by the compiler (see [unroll]). Unrolling has limitations, makes compilation slower, and makes the shader blob bigger. (Shader model 4 and later hardware does support real dynamic loops, so this is a cost rather than a hard restriction.)
Instances can be parallelized: one GPU core can run one instance of a shader while another runs the next instance of the same shader with the same input.
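The maxvertexcount arithmetic from the first part of this answer can be checked with a small CPU-side sketch. It only counts emitted vertices and does not model a real geometry shader. The function names are hypothetical, and n and instances stand for N and NUM_INSTANCES.

```cpp
#include <cstddef>

// Instanced form: each of the `instances` invocations emits `n` vertices,
// so the per-invocation bound is just n, i.e. [maxvertexcount(N)].
std::size_t emittedPerInvocationInstanced(std::size_t n) {
    return n;
}

// Looped form: a single invocation runs the former per-instance body
// `instances` times, so it emits n vertices on every iteration.
std::size_t emittedPerInvocationLooped(std::size_t n, std::size_t instances) {
    std::size_t emitted = 0;
    for (std::size_t instanceId = 0; instanceId < instances; ++instanceId) {
        for (std::size_t v = 0; v < n; ++v) {
            ++emitted;  // stands in for one Append() on the output stream
        }
    }
    return emitted;
}
```

For example, with N = 3 and NUM_INSTANCES = 4 the looped version emits 12 vertices from one invocation, which is why the attribute has to grow to [maxvertexcount(N*NUM_INSTANCES)].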

Clarification on Texture Units Per Program in OpenGL ES 2.0 on iOS

According to Apple's documentation for OpenGL ES 2.0 limitations:
"You can use up to 8 textures in a fragment shader."
This doesn't seem remarkably specific, and so I had assumed that it meant, "you can reference a maximum of 8 texture units for each pass of your fragment shader."
So what I've been doing, is in a given pass of my fragment shader, I reference only one texture unit, starting at texture unit 0. The next pass, I reference unit 1. The next pass, unit 2, and so on. Based on my above assumption, this should work fine for up to GL_MAX_COMBINED_TEXTURE_IMAGE_UNITS.
However, it only seems to work up until unit 7, and break at anything >= unit 8.
Does it sound like this would cause me to hit the maximum texture units? Or am I missing something?
If I am indeed hitting the upper limit of allowable texture units in a fragment shader, I suppose what I'll do is hold a mutable array of programs, and at runtime build as many programs as I need for how many textures I've defined, and when I hit steps of the texture unit index that is a multiple of 8, switch to the appropriate program.
Does this sound reasonable, or am I missing something?
You can query the maximum number of texture units on a device via something like the following:
+ (GLint)maximumTextureUnitsForThisDevice
{
    GLint maxTextureUnits;
    glGetIntegerv(GL_MAX_TEXTURE_IMAGE_UNITS, &maxTextureUnits);
    return maxTextureUnits;
}
Running this on the most powerful device I have, the Retina iPad, I get a maximum of 8 texture units. This means that you won't be able to bind a texture to anything above GL_TEXTURE7, and limits the number of simultaneous textures you can feed into a shader to 8.
However, as Tim suggests in his comment, if you just need to access one texture at a time, there's nothing stopping you from just sequentially binding each texture to a single unit, processing it, and then binding the next texture to that same unit.

Sampling BC5_SNORM texture yields incorrect value range

I'm working with Direct3d 11, and I've come across something strange. I have taken a normal map and encoded it to a DDS file twice. Once with R8G8B8A8_SNORM encoding, and once with BC5_SNORM.
Next I load each texture using D3DX11CreateShaderResourceViewFromFile in conjunction with D3DX11GetImageInfoFromFile. When I sample these textures in my pixel shader I find that the R8G8B8A8_SNORM texture is returning values in the range [-1,1], which is what I would expect for a SNORM texture. However, the BC5_SNORM texture is returning values in the range [0,1], which doesn't make any sense to me.
I double- and triple-checked with my debugger and PIX. The format of the texture is correct (BC5_*S*NORM), so I am at a loss for why it's not returning signed values.
I managed to reproduce the same issue as you, and I also got the same behaviour when converting from a R8G8B8A8_SNORM texture (with -1 to +1 values) to BC5_SNORM (producing only 0 to 1 values) through D3DX11LoadTextureFromTexture. There does appear to be a fault in D3DX11, at least regarding BC5_SNORM, in that, regardless of the input format, the BC5_SNORM output is always in the 0 to 1 range.
As suggested by @chuckwalbourn, I can confirm that DirectXTex, which supersedes the now deprecated D3DX11, does respect and correctly handle signed values for BC5_SNORM outputs.
You can either have your program write out a temporary .dds (using D3DX11SaveTextureToFile with a R8G8B8A8_SNORM texture) and then invoke the standalone DirectXTex 'texconv.exe' utility to convert to BC5_SNORM, or wrangle the DirectXTex library into your program and use the 'Convert(...)' function appropriately.
