I'm fairly new to compute shaders, and I've just started implementing one for an N-body simulation, but I've come across a problem that I can't solve on my own.
Here's everything in the compute file; the entry point is ParticleComputeShader. I am only dispatching one thread group and creating 1024 threads in the shader. There are only 1024 particles while I debug and tweak it, so each thread has its own particle to work on.
The problem seems to be the distance != 0.0f check and the calculation that depends on the distance. Before I added the check, the position was coming back as 1.#QNAN, so something was dividing by zero. My suspicion is that I'm incorrectly indexing the StructuredBuffer with j and that's corrupting the next few calculations.
Another note: position.w is the mass of the particle.
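For reference, the sum the loop is building is the standard softened-gravity acceleration, with m_j taken from position.w and the softening term from Constants::SOFTEN; the distance != 0.0f check exists to skip the j = i self-interaction term:

\mathbf{a}_i = G \sum_{j \ne i} \frac{m_j \, \mathbf{r}_{ij}}{\left(\lVert \mathbf{r}_{ij} \rVert^2 + \varepsilon^2\right)^{3/2}}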
struct ConstantParticleData
{
    float4 position;
    float4 velocity;
};

struct ParticleData
{
    float4 position;
    float4 velocity;
};

namespace Constants
{
    float BIG_G = 6.674e-11f;
    float SOFTEN = 0.01f;
}

StructuredBuffer<ConstantParticleData> inputConstantParticleData : register( t0 );
RWStructuredBuffer<ParticleData> outputParticleData : register( u0 );

[numthreads(1024, 1, 1)]
void ParticleComputeShader( int3 dispatchThreadID : SV_DispatchThreadID )
{
    float3 acceleration = float3(0.0f, 0.0f, 0.0f);

    for (int j = 0; j < 1024; j++)
    {
        // Vector from this thread's particle to particle j.
        float3 r_ij;
        r_ij.x = inputConstantParticleData[j].position.x - inputConstantParticleData[dispatchThreadID.x].position.x;
        r_ij.y = inputConstantParticleData[j].position.y - inputConstantParticleData[dispatchThreadID.x].position.y;
        r_ij.z = inputConstantParticleData[j].position.z - inputConstantParticleData[dispatchThreadID.x].position.z;

        float distance = length(r_ij);

        // Skip the self-interaction term (r_ii == 0).
        if (distance != 0.0f)
        {
            float bottomLine = pow(distance, 2) + pow(Constants::SOFTEN, 2);
            acceleration += Constants::BIG_G * ((inputConstantParticleData[j].position.w * r_ij) / pow(bottomLine, 1.5));
        }
    }

    acceleration = acceleration / inputConstantParticleData[dispatchThreadID.x].position.w;

    outputParticleData[dispatchThreadID.x].velocity = inputConstantParticleData[dispatchThreadID.x].velocity +
        float4(acceleration.x, acceleration.y, acceleration.z, 0.0f);

    outputParticleData[dispatchThreadID.x].position = inputConstantParticleData[dispatchThreadID.x].position +
        float4(outputParticleData[dispatchThreadID.x].velocity.x,
               outputParticleData[dispatchThreadID.x].velocity.y,
               outputParticleData[dispatchThreadID.x].velocity.z,
               0.0f);
}
Any help will be appreciated. The shader works for simple input -> output, and only started giving trouble when I tried to use more of the input buffer than inputConstantParticleData[dispatchThreadID.x] at any one time.
You can't really set global variables in HLSL. The compiler allows them because, if you use the shader through the effects (FX) framework, it sets the globals up for you through constant buffers. Glad to see you solved it; I just wanted to post why defining the float as a local variable fixed the issue.
The problem with this code was that the namespace variable Constants::BIG_G was not being set or used correctly. Moving it inside the function and declaring it simply as float BIG_G fixed the problems I was having.
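For concreteness, a minimal sketch of the fix (assuming the rest of the shader stays exactly as posted): a global without static is a uniform constant-buffer variable in HLSL, so marking the constants static const makes the compiler fold them as compile-time literals instead.

// Globals without 'static' are uniform variables that something must bind;
// 'static const' makes them compile-time literals, so no binding is needed.
static const float BIG_G  = 6.674e-11f;
static const float SOFTEN = 0.01f;

Declaring them as locals inside ParticleComputeShader, as described above, has the same effect.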
So I've been working on a DirectX 11 / HLSL rendering engine with the goal of creating a realistic planet which you can view both from the surface and at a planetary level. The planet is a normalized cube, which is procedurally generated using noise; as you move closer to the surface, a binary-based triangle tree splits until the desired detail level is reached. I got vertex normal calculations working correctly, and I recently started implementing normal mapping for my terrain textures, and I have something that seems to work for the most part. However, when the sun is pointing almost perpendicular to the ground (90 degrees), the terrain is far more lit up than it should be. From the opposite angle (270 degrees), I get something that seems correct, but it may well be just as far off.
The debug lines being rendered are the normals, tangents, and bitangents (which all appear to be correct and to fit the topology of the terrain).
Here is my shader code:
Vertex shader:
PSIn mainvs(VSIn input)
{
    PSIn output;
    // Pass the pixel's world position, as opposed to its screen-space position, for lighting calculations.
    output.WorldPos = mul(float4(input.Position, 1.f), Instances[input.InstanceID].WorldMatrix);
    output.Position = mul(output.WorldPos, CameraViewProjectionMatrix);
    output.TexCoord = input.TexCoord;
    output.CameraPos = CameraPosition;
    output.Normal = normalize(mul(input.Normal, (float3x3)Instances[input.InstanceID].WorldMatrix));
    float3 Tangent = normalize(mul(input.Tangent, (float3x3)Instances[input.InstanceID].WorldMatrix));
    float3 Bitangent = normalize(cross(output.Normal, Tangent));
    output.TBN = transpose(float3x3(Tangent, Bitangent, output.Normal));
    return output;
}
Pixel shader (the TexcoordScalar is for smaller textures closer to the planet surface):
float3 FetchNormalVector(float2 TexCoord)
{
    // Unpack the tangent-space normal from [0, 1] to [-1, 1].
    float3 Color = NormalTex.Sample(Samp, TexCoord * TexcoordScalar).xyz;
    Color *= 2.f;
    return normalize(float3(Color.x - 1.f, Color.y - 1.f, Color.z - 1.f));
}
float3 LightVector = -SunDirection;
float3 TexNormal = FetchNormalVector(input.TexCoord);
float3 WorldNormal = normalize(mul(input.TBN, TexNormal));
float nDotL = max(0.0, dot(WorldNormal, LightVector));
float4 SampleColor = float4(1.f, 1.f, 1.f, 1.f);
SampleColor *= nDotL;
return float4(SampleColor.xyz, 1.f);
Thanks in advance, and let me know if you have any insight as to what could be the issue here.
Edit 1: I tried it with a fixed blue value instead of sampling from the normal texture, which gives correct results, the same as if I had not applied normal mapping at all (as expected). I still don't have a lead on what could be causing this issue.
Edit 2: I just noticed the strangest thing. At 0, 0, +Z, there are these hard seams that only appear with normal mapping enabled.
It's a little hard to see, but it looks almost as if there are multiple tangents associated with the same vertex (since I'm not using indexing yet), because the debug lines appear to split along the seams.
Here is the code I'm using to generate the tangents (bitangents are calculated in the vertex shader using cross(Normal, Tangent)):
v3& p0 = Chunk.Vertices[0].Position;
v3& p1 = Chunk.Vertices[1].Position;
v3& p2 = Chunk.Vertices[2].Position;
v2& uv0 = Chunk.Vertices[0].UV;
v2& uv1 = Chunk.Vertices[1].UV;
v2& uv2 = Chunk.Vertices[2].UV;
v3 deltaPos1 = p1 - p0;
v3 deltaPos2 = p2 - p0;
v2 deltaUV1 = uv1 - uv0;
v2 deltaUV2 = uv2 - uv0;
f32 r = 1.f / (deltaUV1.x * deltaUV2.y - deltaUV1.y * deltaUV2.x);
v3 Tangent = (deltaPos1 * deltaUV2.y - deltaPos2 * deltaUV1.y) * r;
Chunk.Vertices[0].Tangent = Normalize(Tangent - (Chunk.Vertices[0].Normal * DotProduct(Chunk.Vertices[0].Normal, Tangent)));
Chunk.Vertices[1].Tangent = Normalize(Tangent - (Chunk.Vertices[1].Normal * DotProduct(Chunk.Vertices[1].Normal, Tangent)));
Chunk.Vertices[2].Tangent = Normalize(Tangent - (Chunk.Vertices[2].Normal * DotProduct(Chunk.Vertices[2].Normal, Tangent)));
Also for reference, this is the main article I was looking at while implementing all of this: link
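If the seams in Edit 2 really are duplicated (unindexed) vertices each carrying a different per-face tangent, the usual fix once indexing is in place is to accumulate every face's tangent into its shared vertices and orthonormalize afterwards. A rough sketch under that assumption (u32, Indices, IndexCount, VertexCount, and ComputeFaceTangent are hypothetical names; v3, Normalize, and DotProduct as in the snippet above):

// Pass 1: accumulate each face's unnormalized tangent into its three vertices.
for (u32 i = 0; i < Chunk.IndexCount; i += 3)
{
    u32 i0 = Chunk.Indices[i + 0];
    u32 i1 = Chunk.Indices[i + 1];
    u32 i2 = Chunk.Indices[i + 2];
    v3 Tangent = ComputeFaceTangent(Chunk, i0, i1, i2); // same UV-delta math as above
    Chunk.Vertices[i0].Tangent += Tangent;
    Chunk.Vertices[i1].Tangent += Tangent;
    Chunk.Vertices[i2].Tangent += Tangent;
}

// Pass 2: Gram-Schmidt each accumulated tangent against the vertex normal.
for (u32 v = 0; v < Chunk.VertexCount; v++)
{
    v3& T = Chunk.Vertices[v].Tangent;
    const v3& N = Chunk.Vertices[v].Normal;
    T = Normalize(T - N * DotProduct(N, T));
}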
Edit 3:
Here is an image of the planet from a distance with normal mapping enabled:
And one from the same angle without:
There is a handy feature in the three.js 3D library where you can set a sampler to repeat mode and give the repeat attribute whatever values you like; for example, (3, 5) means the texture repeats 3 times horizontally and 5 times vertically. But now I'm using DirectX and I can't find a good solution to this problem. Note that the UV coordinates of the vertices still range from 0 to 1, and I don't want to change my HLSL code, because I want a programmable solution. Thanks very much!
Edit: assume I already have a cube model, and the texture coordinates of its vertices are between 0 and 1. Sampling with wrap or clamp mode is all fine now. But I want to repeat a texture on one of its faces, and for that I first need to switch to wrap mode; that much I already know. Then I would have to edit my model so that the texture coordinates range from 0 to 3. What if I don't want to change my model? So far I've come up with one approach: add a variable to the pixel shader representing how many times the map repeats, and multiply the coordinates by this factor when sampling. Not a graceful solution, I think…
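For what it's worth, that multiply-in-the-shader idea can be made fully programmable by putting the repeat counts in a constant buffer, so the CPU side can change them per draw without touching the model. A minimal sketch under that assumption (the names and the b1 slot are made up; the sampler must use wrap addressing):

Texture2D shaderTexture;
SamplerState SampleType; // created with D3D11_TEXTURE_ADDRESS_WRAP

cbuffer TextureRepeat : register(b1)
{
    float2 RepeatFactor; // e.g. (3, 5): repeat 3 times in U, 5 times in V
};

struct PixelInputType
{
    float4 position : SV_POSITION;
    float2 uv : TEXCOORD0;
};

float4 main(PixelInputType input) : SV_TARGET
{
    // UVs still arrive in 0..1; scaling past 1 makes wrap mode repeat the texture.
    return shaderTexture.Sample(SampleType, input.uv * RepeatFactor);
}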
Since you've edited your question, there is another answer to your problem:
From what I understand, you have a face with UVs like so:
0,1 1,1
-------------
| |
| |
| |
-------------
0,0 1,0
But you want the texture repeated 3 times (for example) instead of once
(without changing the original model).
Multiple solutions here:
You could do it when updating your buffers (if you update them at all):
D3D11_MAPPED_SUBRESOURCE resource;
HRESULT hResult = D3DDeviceContext->Map(vertexBuffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &resource);
if (hResult != S_OK) return false;

YourVertexFormat* ptr = (YourVertexFormat*)resource.pData;
for (int i = 0; i < vertexCount; i++)
{
    ptr[i] = vertices[i];
    ptr[i].uv.x *= multiplyX; // in your case 3
    ptr[i].uv.y *= multiplyY; // in your case 5
}
D3DDeviceContext->Unmap(vertexBuffer, 0);
But if you don't need to update the buffer anyway, I wouldn't recommend it, because it is terribly slow.
A faster way is to use the vertex shader:
cbuffer MatrixBuffer
{
    matrix worldMatrix;
    matrix viewMatrix;
    matrix projectionMatrix;
};

struct VertexInputType
{
    float4 position : POSITION0;
    float2 uv : TEXCOORD0;
    // ...
};

struct PixelInputType
{
    float4 position : SV_POSITION;
    float2 uv : TEXCOORD0;
    // ...
};

PixelInputType main(VertexInputType input)
{
    input.position.w = 1.0f;

    PixelInputType output;
    output.position = mul(input.position, worldMatrix);
    output.position = mul(output.position, viewMatrix);
    output.position = mul(output.position, projectionMatrix);

This is what you basically need:

    output.uv = input.uv * 3; // 3x3

Or more advanced:

    output.uv = float2(input.uv.x * 3, input.uv.y * 5);

    // ...
    return output;
}
I would recommend the vertex shader solution, because it's fast, and in DirectX you're using vertex shaders anyway, so it's not as expensive as the buffer-update solution...
Hope that helped solve your problems :)
You basically want to create a sampler state like so:
ID3D11SamplerState* m_sampleState;
D3D11_SAMPLER_DESC samplerDesc;
samplerDesc.Filter = D3D11_FILTER_MIN_MAG_MIP_LINEAR;
samplerDesc.AddressU = D3D11_TEXTURE_ADDRESS_WRAP;
samplerDesc.AddressV = D3D11_TEXTURE_ADDRESS_WRAP;
samplerDesc.AddressW = D3D11_TEXTURE_ADDRESS_WRAP;
samplerDesc.MipLODBias = 0.0f;
samplerDesc.MaxAnisotropy = 1;
samplerDesc.ComparisonFunc = D3D11_COMPARISON_ALWAYS;
samplerDesc.BorderColor[0] = 0;
samplerDesc.BorderColor[1] = 0;
samplerDesc.BorderColor[2] = 0;
samplerDesc.BorderColor[3] = 0;
samplerDesc.MinLOD = 0;
samplerDesc.MaxLOD = D3D11_FLOAT32_MAX;
// Create the texture sampler state.
result = ifDEVICE->ifDX11->getD3DDevice()->CreateSamplerState(&samplerDesc, &m_sampleState);
And when you are setting your shader constants, call this:
ifDEVICE->ifDX11->getD3DDeviceContext()->PSSetSamplers(0, 1, &m_sampleState);
Then you can write your pixel shaders like this:
Texture2D shaderTexture;
SamplerState SampleType;
...
float4 main(PixelInputType input) : SV_TARGET
{
    float4 textureColor = shaderTexture.Sample(SampleType, input.uv);
    ...
}
Hope that helps...
I had a dynamic light shader for which the shaded sprite was fine in my own test program, but it started resembling an eclipse once I imported it into my friend's physics-based game. I narrowed it down by simplifying the gradient to be based purely on the X value within the shape, and by making the outside of the circle in the sprite red, but as you can see, the rotation continues to cause problems (I can't post images, so here are links to the album).
Circle at different rotations (not in order, but labelled by radian values): http://imgur.com/a/Preth
Everything I've researched about matrix math says I am using the correct formula for rotation, but I figure maybe I'm doing something wrong. Here is my .fx shader code:
float rotationrads; /* assumed rotation is in radians */
sampler TextureSampler : register(s0);

float4 staticlight(float2 Tex : TEXCOORD0) : COLOR0
{
    float4 Color = tex2D(TextureSampler, Tex);
    float2 NewTex;

    /* Get the new X and Y values by applying the UV formula with the rotation */
    NewTex.x = (Tex.x * cos(rotationrads)) - (Tex.y * sin(rotationrads));
    NewTex.y = (Tex.y * sin(rotationrads)) + (Tex.y * cos(rotationrads));

    if (Color.a > 0.0)
    {
        Color.r = (Color.r * NewTex.x);
        Color.g = (Color.g * NewTex.x);
        Color.b = (Color.b * NewTex.x);
    }
    else
    {
        Color.r = 100;
        Color.g = 0;
        Color.b = 0;
        Color.a = 100;
    }
    return Color;
}

technique StaticLightOnly
{
    pass Pass1
    {
        PixelShader = compile ps_2_0 staticlight();
    }
}
If anyone has experience with sprite-based rotation in 2D shaders, I'd appreciate any help with this! Thanks in advance!
Because rotations are performed about the origin, you have to move the rotation center (0.5, 0.5) to the origin, apply the rotation, and then undo the translation.
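In shader terms, that translate-rotate-translate looks roughly like this (a sketch against the staticlight shader above; note that the posted NewTex.y line also multiplies Tex.y by both sin and cos, where the standard formula is Tex.x * sin + Tex.y * cos):

float2 RotateAboutCenter(float2 Tex, float rads)
{
    float2 p = Tex - 0.5f;   /* move the rotation center (0.5, 0.5) to the origin */
    float s = sin(rads);
    float c = cos(rads);
    float2 r;
    r.x = p.x * c - p.y * s; /* standard 2D rotation about the origin */
    r.y = p.x * s + p.y * c;
    return r + 0.5f;         /* undo the translation */
}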
I have a simple enough shader that supports multiple point lights.
Lights are stored as an array of Light structs (up to a max size) and I pass in the number of active lights when it changes.
The problem is in the PixelShader function:
It's basic stuff: get the base color from the texture, loop through the lights array from 0 to activeLights, and add each light's effect. It works fine, but performance is terrible!
BUT if I replace the reference to the global variable activeLights with a constant of the same value, performance is fine.
I just can't fathom why referencing the variable makes a 30+ FPS difference.
Can anyone please explain?
Full Shader code:
#define MAX_POINT_LIGHTS 16

struct PointLight
{
    float3 Position;
    float4 Color;
    float Radius;
};

float4x4 World;
float4x4 View;
float4x4 Projection;
float3 CameraPosition;
float4 SpecularColor;
float SpecularPower;
float SpecularIntensity;
float4 AmbientColor;
float AmbientIntensity;
float DiffuseIntensity;
int activeLights;
PointLight lights[MAX_POINT_LIGHTS];
bool IsLightingEnabled;
bool IsAmbientLightingEnabled;
bool IsDiffuseLightingEnabled;
bool IsSpecularLightingEnabled;

Texture Texture;
sampler TextureSampler = sampler_state
{
    Texture = <Texture>;
    Magfilter = POINT;
    Minfilter = POINT;
    Mipfilter = POINT;
    AddressU = WRAP;
    AddressV = WRAP;
};

struct VS_INPUT
{
    float4 Position : POSITION0;
    float2 TexCoord : TEXCOORD0;
    float3 Normal : NORMAL0;
};

struct VS_OUTPUT
{
    float3 WorldPosition : TEXCOORD0;
    float4 Position : POSITION0;
    float3 Normal : TEXCOORD1;
    float2 TexCoord : TEXCOORD2;
    float3 ViewDir : TEXCOORD3;
};

VS_OUTPUT VS_PointLighting(VS_INPUT input)
{
    VS_OUTPUT output;
    float4 worldPosition = mul(input.Position, World);
    output.WorldPosition = worldPosition;
    float4 viewPosition = mul(worldPosition, View);
    output.Position = mul(viewPosition, Projection);
    output.Normal = normalize(mul(input.Normal, World));
    output.TexCoord = input.TexCoord;
    output.ViewDir = normalize(CameraPosition - worldPosition);
    return output;
}

float4 PS_PointLighting(VS_OUTPUT IN) : COLOR
{
    if (!IsLightingEnabled) return tex2D(TextureSampler, IN.TexCoord);

    float4 color = float4(0.0f, 0.0f, 0.0f, 0.0f);
    float3 n = normalize(IN.Normal);
    float3 v = normalize(IN.ViewDir);
    float3 l = float3(0.0f, 0.0f, 0.0f);
    float3 h = float3(0.0f, 0.0f, 0.0f);
    float atten = 0.0f;
    float nDotL = 0.0f;
    float power = 0.0f;

    if (IsAmbientLightingEnabled) color += (AmbientColor * AmbientIntensity);

    if (IsDiffuseLightingEnabled || IsSpecularLightingEnabled)
    {
        //for (int i = 0; i < activeLights; ++i) // works, but performance is terrible
        for (int i = 0; i < 7; ++i) // performance is fine, but obviously isn't dynamic
        {
            l = (lights[i].Position - IN.WorldPosition) / lights[i].Radius;
            atten = saturate(1.0f - dot(l, l));
            l = normalize(l);
            nDotL = saturate(dot(n, l));
            if (IsDiffuseLightingEnabled) color += (lights[i].Color * nDotL * atten);
            if (IsSpecularLightingEnabled) color += (SpecularColor * SpecularPower * atten);
        }
    }

    return color * tex2D(TextureSampler, IN.TexCoord);
}

technique PerPixelPointLighting
{
    pass
    {
        VertexShader = compile vs_3_0 VS_PointLighting();
        PixelShader = compile ps_3_0 PS_PointLighting();
    }
}
My guess is that changing the loop bound to a compile-time constant allows the HLSL compiler to unroll the loop. That is, instead of this:
for (int i = 0; i < 7; i++)
    doLoopyStuff();
It's getting turned into this:
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
doLoopyStuff();
Loops and conditional branches can be a significant performance hit inside shader code and should be avoided wherever possible.
EDIT
This is just off the top of my head, but maybe you could try something like this?
for (int i = 0; i < MAX_LIGHTS; i++)
{
    color += step(i, activeLights) * lightingFunction();
}
This way you calculate all possible lights but always get a value of 0 for inactive ones. The benefit depends on the complexity of the lighting function, of course; you would need to profile to be sure.
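A related option, if your compiler supports the attribute, is to keep the compile-time bound and ask for the unroll explicitly, skipping inactive lights with a comparison (a sketch; lightingFunction is the same placeholder as above):

[unroll(MAX_LIGHTS)]
for (int i = 0; i < MAX_LIGHTS; i++)
{
    // Compile-time bound, so the loop can still unroll; the comparison
    // against activeLights just contributes nothing for unused lights.
    if (i < activeLights)
        color += lightingFunction();
}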
Try using PIX to profile it. http://wtomandev.blogspot.com/2010/05/debugging-hlsl-shaders.html
Alternatively, read this rambling speculation:
Maybe with a constant the compiler can unroll and collapse your loop's instructions, and when you replace it with a variable it can no longer make the same assumptions.
Though somewhat unrelated to your actual question, I would push a lot of those conditions/calculations up to the software level.
if(IsDiffuseLightingEnabled || IsSpecularLightingEnabled)
^- Like that.
Also, I think you could precompute a few things before you call the shader program, like l = (lights[i].Position - IN.WorldPosition) / lights[i].Radius; — pass in a precomputed array of those rather than recalculating every time for every pixel.
I might be misinformed about the optimizations the HLSL compiler does, but I think every calculation like that in the pixel shader gets executed w*h times per frame (though insanely in parallel), and I vaguely remember there being limits on the number of instructions a shader could have (like 72?), though that restriction was relaxed a lot in later shader models. Maybe your shader generates so many instructions that it gets broken up and compiled into a multi-pass pixel shader; if that's the case, that probably adds significant overhead.
Actually, here's another idea that might be stupid: passing a variable to a shader transmits data to the GPU, and that transmission has limited bandwidth. Perhaps the compiler is smart enough that when you only statically index the first 7 elements of the array, it only transfers 7 elements. When it can't make that optimization (because you aren't iterating with constants), it pushes the WHOLE array every frame, and you're flooding the bus. If that's the case, then my earlier suggestion of pushing calculations out and passing more results in would only make the problem worse, heh.
Details:
I'm in the process of procedural planet generation; so far I have the dynamic LOD working, but my current software algorithm is very, very slow. I decided to do it using DX11's new tessellation features instead.
Currently my sphere is a subdivided icosahedron (20 sides, all equilateral triangles).
Back when I was subdividing with my software algorithm, one triangle would be split into four children across the midpoints of the parent, forming the Hyrule symbol each time, like this: http://puu.sh/1xFIx
As you can see, each subdivided triangle created more and more equilateral triangles, i.e. each one was exactly the same shape.
But now that I am tessellating on the GPU in HLSL, the result is definitely not what I am looking for: http://puu.sh/1xFx7
Questions:
Is there anything I can do in the Hull and Domain shaders to change the tessellation so that it subdivides into sets of equilateral triangles like the first image?
Should I be using the geometry shader for something like this? If so, would it be slower than the tessellator?
I tried using the tessellation shader, but I encountered a problem: the domain shader only receives the UV coordinate (SV_DomainLocation) and the input patch for positioning the vertices. When the domain location for a vertex is (1/3, 1/3, 1/3) (the center vertex), it is impossible to know the correct position, because you would need information about the other vertices, or an iteration index (x, y), and the Domain Shader stage provides neither.
Because of this problem I wrote the code in the geometry shader instead. The geometry shader is very limited for tessellation, because the output stream cannot be larger than 1024 bytes (in shader model 5.0). I implemented the vertex position calculation using UVs (like SV_DomainLocation), but this only tessellates the triangles; part of the code must also calculate the added positions in the centers of the triangles to create the precise final result.
Here is the code for equilateral-triangle tessellation:
// required for array sizes below
#define MAX_ITERATIONS 5

void DrawTriangle(float4 p0, float4 p1, float4 p2, inout TriangleStream<VS_OUT> stream)
{
    VS_OUT v0;
    v0.pos = p0;
    stream.Append(v0);

    VS_OUT v1;
    v1.pos = p1;
    stream.Append(v1);

    VS_OUT v2;
    v2.pos = p2;
    stream.Append(v2);

    stream.RestartStrip();
}

[maxvertexcount(128)] // directx rule: maxvertexcount * sizeof(VS_OUT) <= 1024
void gs(triangle VS_OUT input[3], inout TriangleStream<VS_OUT> stream)
{
    int itc = min(tess, MAX_ITERATIONS);
    float fitc = itc;

    float4 past_pos[MAX_ITERATIONS];
    float4 array_pass[MAX_ITERATIONS];
    for (int pi = 0; pi < MAX_ITERATIONS; pi++)
    {
        past_pos[pi] = float4(0, 0, 0, 0);
        array_pass[pi] = float4(0, 0, 0, 0);
    }

    // -------------------------------------
    // Tessellation kernel for the control points
    for (int x = 0; x <= itc; x++)
    {
        float4 last;
        for (int y = 0; y <= x; y++)
        {
            float2 seg = float2(x / fitc, y / fitc);
            float3 uv;
            uv.x = 1 - seg.x;
            uv.z = seg.y;
            uv.y = 1 - (uv.x + uv.z);

            // ---------------------------------------
            // Domain stage
            // uv   = domain location
            // x, y = iteration index
            float4 fpos = input[0].pos * uv.x;
            fpos += input[1].pos * uv.y;
            fpos += input[2].pos * uv.z;

            if (x > 0 && y > 0)
            {
                DrawTriangle(past_pos[y - 1], last, fpos, stream);
                if (y < x)
                {
                    // add adjacent triangle
                    DrawTriangle(past_pos[y - 1], fpos, past_pos[y], stream);
                }
            }
            array_pass[y] = fpos;
            last = fpos;
        }
        for (int i = 0; i < MAX_ITERATIONS; i++)
        {
            past_pos[i] = array_pass[i];
        }
    }
}