I've been writing my own HLSL pixel shader for dynamic lighting using raycasting. Unfortunately, since I'm using it from XNA, I can only target up to ps_3_0. The differences between the limits of shader model 3 and shader model 4 are drastic, especially in instruction slots and temp registers.
Specifically, I'm running out of instruction slots. This limitation is preventing me from having accurate rays. (As in, the number of pixels the ray's position advances by from the light source has to be a lot less than I want.)
Here's what it looks like.
And here's the relevant part of the code:
//Cast rays (Point Light)
float4 CastRays(float2 texCoord: TEXCOORD0) : COLOR0
{
    float dir;
    float2 move;
    float2 rayPos;
    float2 pixelPos = float2(texCoord.x * width, texCoord.y * height);
    float dist = distance(pixelPos, lightPos);

    if (sqrt(dist) <= lightSize)
    {
        rayPos = lightPos;
        dir = atan2(lightPos.y - pixelPos.y, lightPos.x - pixelPos.x);
        move = float2(cos(dir), sin(dir)) * rayLength;
        for (int ii = 0; ii < clamp(dist / rayLength, 0, 7); ii++)
        {
            if (tex2D(s0, float2(rayPos.x / width, rayPos.y / height)).a > 0)
                return black;
            rayPos -= move;
        }
    }
    else
    {
        return black;
    }

    float light = 1 - clamp((float)abs(sqrt(dist)) / lightSize, 0.0, 1.0);
    return lightColor * light;
}
The limiting variable I've put in place is rayLength. The larger this number, the less accurate the rays are. I can give more specific examples of what this looks like if anybody wants.
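To make the trade-off concrete, here is a minimal CPU-side sketch in C++ of the same marching loop. It is not the shader itself: `Occluders`, `stepsTaken` and `rayBlocked` are hypothetical names, and the step cap is passed in as `maxSteps` (7 in the loop above). Under these assumptions, a ray covers at most `maxSteps * rayLength` pixels, so a small `rayLength` stops short of distant occluders and a large one can step over thin ones.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical occluder mask: alpha > 0 means the texel blocks light.
struct Occluders {
    int width;
    int height;
    std::vector<float> alpha; // row-major, width * height entries

    float at(float x, float y) const {
        int ix = std::min(std::max(static_cast<int>(x), 0), width - 1);
        int iy = std::min(std::max(static_cast<int>(y), 0), height - 1);
        return alpha[static_cast<std::size_t>(iy) * width + ix];
    }
};

// Iterations of `for (ii = 0; ii < clamp(dist / rayLength, 0, maxSteps); ii++)`.
int stepsTaken(float dist, float rayLength, int maxSteps) {
    assert(rayLength > 0.0f);
    float bound = std::min(std::max(dist / rayLength, 0.0f),
                           static_cast<float>(maxSteps));
    return static_cast<int>(std::ceil(bound));
}

// March from the light towards the pixel, sampling once per step.
bool rayBlocked(const Occluders& map, float lightX, float lightY,
                float pixelX, float pixelY, float rayLength, int maxSteps) {
    float dx = pixelX - lightX;
    float dy = pixelY - lightY;
    float dist = std::sqrt(dx * dx + dy * dy);
    if (dist == 0.0f) {
        return false; // the pixel sits on the light, nothing to march
    }
    float stepX = dx / dist * rayLength;
    float stepY = dy / dist * rayLength;
    float x = lightX;
    float y = lightY;
    int steps = stepsTaken(dist, rayLength, maxSteps);
    for (int i = 0; i < steps; ++i) {
        if (map.at(x, y) > 0.0f) {
            return true;
        }
        x += stepX;
        y += stepY;
    }
    return false;
}
```

With a single occluder 8 pixels from the light and a pixel 15 pixels away, a step of 1 runs out of its 7 iterations before reaching the occluder, a step of 3 jumps over it, and a step of 4 happens to land on it.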
I'm very new to the concept of raycasting, and fairly new to HLSL. Is there any way I can make this work under the limitations and/or increase the limits?
Thank you.


Shadow Mapping - Space Transformations are going bad

I am currently studying shadow mapping, and my biggest issue right now is the transformations between spaces. This is my current working theory/steps.
Pass 1:
Get depth of pixel from camera, store in depth buffer
Get depth of pixel from light, store in another buffer
Pass 2:
Use texture coordinate to sample camera's depth buffer at current pixel
Convert that depth to a view space position by multiplying the projection coordinate with invProj matrix. (also do a perspective divide).
Take that view position and multiply by invV (camera's inverse view) to get a world space position
Multiply world space position by light's viewProjection matrix.
Perspective divide that projection-space coordinate, and manipulate into [0..1] to sample from light depth buffer.
Get current depth from light and closest (sampled) depth, if current depth > closest depth, it's in shadow.
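The last two steps can be sketched on the CPU as follows. This is a C++ illustration, not the shader: `Float4`, `ShadowCoord`, `toShadowCoord` and `inShadow` are hypothetical names, and it assumes Direct3D conventions, where depth after the perspective divide is already in [0, 1] and the texture v axis points down. If the depth map was written with a different convention, the mapping would have to match it.

```cpp
#include <cassert>
#include <cmath>

struct Float4 { float x, y, z, w; };
struct ShadowCoord { float u, v, depth; };

// Perspective divide, then map x/y from [-1, 1] to texture space [0, 1].
// Assumption: Direct3D conventions (z already in [0, 1], v grows downwards).
ShadowCoord toShadowCoord(const Float4& lightClipPos) {
    assert(lightClipPos.w != 0.0f);
    float invW = 1.0f / lightClipPos.w;
    ShadowCoord c;
    c.u = lightClipPos.x * invW * 0.5f + 0.5f;
    c.v = 0.5f - lightClipPos.y * invW * 0.5f;
    c.depth = lightClipPos.z * invW;
    return c;
}

// The pixel is shadowed when it lies further from the light than the
// closest surface recorded in the light's depth map.
bool inShadow(float currentDepth, float closestDepth, float bias) {
    return currentDepth - bias > closestDepth;
}
```

A small `bias` is the usual way to keep a surface from shadowing itself when its own depth is sampled back with limited precision.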
Shader Code
PS_INPUT vs(VS_INPUT input) {
output.pos = mul(input.vPos, mvp);
output.cameraDepth =;
float4 vPosInLight = mul(input.vPos, m);
vPosInLight = mul(vPosInLight, light.viewProj);
output.lightDepth =;
float cameraDepth = input.cameraDepth.x / input.cameraDepth.y;
//Bundle cameraDepth in alpha channel of a normal map.
output.normal = float4(input.normal, cameraDepth);
//4 Lights in total -- although only 1 is active right now. Going to use r/g/b/a for each light depth.
output.lightDepths.r = input.lightDepth.x / input.lightDepth.y;
Pass 2 (Screen Quad):
float4 ps(PS_INPUT input) : SV_TARGET{
float4 pixelPosView = depthToViewSpace(input.texCoord);
float4 pixelPosWorld = mul(pixelPosView, invV);
float4 pixelPosLight = mul(pixelPosWorld, light.viewProj);
float shadow = shadowCalc(pixelPosLight);
//For testing / visualisation
return float4(shadow,shadow,shadow,1);
float4 depthToViewSpace(float2 xy) {
//Get pixel depth from camera by sampling current texcoord.
//Extract the alpha channel as this holds the depth value.
//Then, transform from [0..1] to [-1..1]
float z = (_normal.Sample(_sampler, xy).a) * 2 - 1;
float x = xy.x * 2 - 1;
float y = (1 - xy.y) * 2 - 1;
float4 vProjPos = float4(x, y, z, 1.0f);
float4 vPositionVS = mul(vProjPos, invP);
vPositionVS = float4( / vPositionVS.w,1);
return vPositionVS;
float shadowCalc(float4 pixelPosL) {
//Transform pixelPosLight from [-1..1] to [0..1]
float3 projCoords = ( / pixelPosL.w) * 0.5 + 0.5;
float closestDepth = _lightDepths.Sample(_sampler, projCoords.xy).r;
float currentDepth = projCoords.z;
return currentDepth > closestDepth; //Supposed to have bias, but for now I just want shadows working haha
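As a sanity check for the unprojection in depthToViewSpace, the view-space depth can also be recovered in closed form. The C++ sketch below assumes the stored value is clip z divided by clip w from a left-handed Direct3D perspective matrix such as the one XMMatrixPerspectiveFovLH builds, where depth runs from 0 at the near plane to 1 at the far plane. `depthToViewZ` and `viewZToDepth` are hypothetical helper names. Under that assumption the depth value itself would not need the `* 2 - 1` remap that x and y get.

```cpp
#include <cassert>
#include <cmath>

// Forward mapping of a left-handed D3D perspective projection:
// depth = f / (f - n) * (1 - n / z), for view-space z in [n, f].
float viewZToDepth(float viewZ, float nearZ, float farZ) {
    assert(farZ > nearZ && nearZ > 0.0f && viewZ > 0.0f);
    return farZ / (farZ - nearZ) * (1.0f - nearZ / viewZ);
}

// Inverse mapping: solve the expression above for z.
float depthToViewZ(float depth, float nearZ, float farZ) {
    assert(farZ > nearZ && nearZ > 0.0f);
    return nearZ * farZ / (farZ - depth * (farZ - nearZ));
}
```

Comparing this value against the z component returned by the matrix-based unprojection is one way to tell whether the inverse projection step or a later step is at fault.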
CPP Matrices
// (Position, LookAtPos, UpDir)
auto lightView = XMMatrixLookAtLH(XMLoadFloat4(&pos4), XMVectorSet(0,0,0,1), XMVectorSet(0,1,0,0));
// (FOV, AspectRatio (1000/680), NEAR, FAR)
auto lightProj = XMMatrixPerspectiveFovLH(1.57f , 1.47f, 0.01f, 10.0f);
XMStoreFloat4x4(&_cLightBuffer.light.viewProj, XMMatrixTranspose(XMMatrixMultiply(lightView, lightProj)));
Current Outputs
White signifies that a shadow should be projected there. Black indicates no shadow.
CameraPos (0, 2.5, -2)
CameraLookAt (0, 0, 0)
CameraFOV (1.57)
CameraNear (0.01)
CameraFar (10.0)
LightPos (0, 2.5, -2)
LightLookAt (0, 0, 0)
LightFOV (1.57)
LightNear (0.01)
LightFar (10.0)
If I change the CameraPosition to be (0, 2.5, 2), basically just flipped on the Z axis, this is the result.
Obviously a shadow shouldn't change its projection depending on where the observer is, so I think I'm making a mistake with the invV. But I really don't know for sure. I've debugged the light's projView matrix, and the values seem correct - going from CPU to GPU. It's also entirely possible I've misunderstood some theory along the way because this is quite a tricky technique for me.
Aha! Found my problem. It was a silly mistake: I was calculating the depth of pixels from each light, but storing them in a texture that was based on the camera's view. The following image should explain my mistake better than I can with words.
For future reference, the solution I settled on was to scrap my idea of storing light depths in texture channels. Instead, I make a new pass for each light and bind a unique depth-stencil texture to render the geometry to. When I want to do light calculations, I bind each of the depth textures to a shader resource slot and go from there. Obviously this doesn't scale well with many lights, but for my student project, where I'm only required to have 2 shadow casters, it suffices.
_context->DrawIndexed(indexCount, 0, 0); //Draw to regular render target
_sunlight->use(1, _context); //Use sunlight shader (basically just runs a Vertex Shader & Null Pixel shader so depth can be written to depth map)
_context->DrawIndexed(indexCount, 0, 0); //Draw to sunlight depth target
ID3D11RenderTargetView* nullrv = { nullptr };
ctx->OMSetRenderTargets(1, &nullrv, _sunlightDepthStencilView);
//The purpose of setting a null render target before doing the draw call is
//that a draw call with only a depth target bound is much faster.
//(At least I believe so, from my reading online)

Normal mapping on a large sphere is not entirely correct

So I've been working on a DirectX 11/HLSL rendering engine with the goal of creating a realistic planet that you can view both from the surface and at a planetary level. The planet is a normalized cube that is procedurally generated using noise; as you move closer to the surface, a binary triangle tree splits until the desired detail level is reached. I got vertex normal calculations to work correctly, and I recently started implementing normal mapping for my terrain textures. It seems to work for the most part. However, when the sun is pointing almost perpendicular to the ground (90 degrees), the terrain is far too brightly lit.
From the opposite angle (270 degrees), I am getting something that seems correct, but it may well be just as off.
The debug lines that are being rendered are the normal, tangent, and bitangents (which all appear to be correct and fit the topology of the terrain)
Here is my shader code:
Vertex shader:
PSIn mainvs(VSIn input)
{
    PSIn output;
    // Pass the pixel's world position (as opposed to its screen space position) for lighting calculations
    output.WorldPos = mul(float4(input.Position, 1.f), Instances[input.InstanceID].WorldMatrix);
    output.Position = mul(output.WorldPos, CameraViewProjectionMatrix);
    output.TexCoord = input.TexCoord;
    output.CameraPos = CameraPosition;
    output.Normal = normalize(mul(input.Normal, (float3x3)Instances[input.InstanceID].WorldMatrix));
    float3 Tangent = normalize(mul(input.Tangent, (float3x3)Instances[input.InstanceID].WorldMatrix));
    float3 Bitangent = normalize(cross(output.Normal, Tangent));
    output.TBN = transpose(float3x3(Tangent, Bitangent, output.Normal));
    return output;
}
Pixel shader (Texcoord scalar is for smaller textures closer to planet surface):
float3 FetchNormalVector(float2 TexCoord)
float3 Color = NormalTex.Sample(Samp, TexCoord * TexcoordScalar);
Color *= 2.f;
return normalize(float3(Color.x - 1.f, Color.y - 1.f, Color.z - 1.f));
float3 LightVector = -SunDirection;
float3 TexNormal = FetchNormalVector(input.TexCoord);
float3 WorldNormal = normalize(mul(input.TBN, TexNormal));
float nDotL = max(0.0, dot(WorldNormal, LightVector));
float4 SampleColor = float4(1.f, 1.f, 1.f, 1.f);
SampleColor *= nDotL;
return float4(, 1.f);
Thanks in advance, and let me know if you have any insight as to what could be the issue here.
Edit 1: I tried it with a fixed blue value instead of sampling from the normal texture, which gives me the correct and same results as if I had not applied mapping (as expected). Still don't have a lead on what would be causing this issue.
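The result in Edit 1 is what the decode and the TBN transform would predict. The C++ sketch below mirrors FetchNormalVector and the TBN multiplication on the CPU; `Vec3`, `decodeNormal` and `tangentToWorld` are hypothetical names, and the TBN basis is assumed to have the tangent, bitangent and normal as its columns, as in the vertex shader above. A flat texel (0.5, 0.5, 1.0) decodes to (0, 0, 1), which the basis maps straight onto the vertex normal, so this test only exercises the normal column and says nothing about the tangent and bitangent.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 normalize(Vec3 v) {
    float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    assert(len > 0.0f);
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// Map a normal-map colour from [0, 1] to a unit vector in [-1, 1].
Vec3 decodeNormal(Vec3 rgb) {
    return normalize(Vec3{rgb.x * 2.0f - 1.0f,
                          rgb.y * 2.0f - 1.0f,
                          rgb.z * 2.0f - 1.0f});
}

// Tangent space to world space: weight each basis vector by one component.
Vec3 tangentToWorld(Vec3 tangent, Vec3 bitangent, Vec3 normal, Vec3 n) {
    return normalize(Vec3{
        tangent.x * n.x + bitangent.x * n.y + normal.x * n.z,
        tangent.y * n.x + bitangent.y * n.y + normal.y * n.z,
        tangent.z * n.x + bitangent.z * n.y + normal.z * n.z});
}
```

Feeding in a texel that leans along x or y (for example (1.0, 0.5, 0.5)) and checking that the world-space result leans along the debug tangent or bitangent line would exercise the other two columns.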
Edit 2: I just noticed the strangest thing. At 0, 0, +Z, there are these hard seams that only appear with normal mapping enabled
It's a little hard to see, but it seems almost like there are multiple tangents associated to the same vertex (since I'm not using indexing yet) because the debug lines appear to split on the seams.
Here is my code that I'm using to generate the tangents (bitangents are calculated in the vertex shader using cross(Normal, Tangent))
v3& p0 = Chunk.Vertices[0].Position;
v3& p1 = Chunk.Vertices[1].Position;
v3& p2 = Chunk.Vertices[2].Position;
v2& uv0 = Chunk.Vertices[0].UV;
v2& uv1 = Chunk.Vertices[1].UV;
v2& uv2 = Chunk.Vertices[2].UV;
v3 deltaPos1 = p1 - p0;
v3 deltaPos2 = p2 - p0;
v2 deltaUV1 = uv1 - uv0;
v2 deltaUV2 = uv2 - uv0;
f32 r = 1.f / (deltaUV1.x * deltaUV2.y - deltaUV1.y * deltaUV2.x);
v3 Tangent = (deltaPos1 * deltaUV2.y - deltaPos2 * deltaUV1.y) * r;
Chunk.Vertices[0].Tangent = Normalize(Tangent - (Chunk.Vertices[0].Normal * DotProduct(Chunk.Vertices[0].Normal, Tangent)));
Chunk.Vertices[1].Tangent = Normalize(Tangent - (Chunk.Vertices[1].Normal * DotProduct(Chunk.Vertices[1].Normal, Tangent)));
Chunk.Vertices[2].Tangent = Normalize(Tangent - (Chunk.Vertices[2].Normal * DotProduct(Chunk.Vertices[2].Normal, Tangent)));
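For reference, the same calculation as a self-contained C++ sketch that can be tested in isolation. `V2` and `V3` are hypothetical stand-ins for the engine's `v2` and `v3` types, and `triangleTangent` and `orthonormalize` are illustrative names. Because one face tangent is written to all three vertices and vertices are not shared, two triangles that meet at the same position may end up with different tangents there, which would be consistent with the debug lines splitting at the seams.

```cpp
#include <cassert>
#include <cmath>

struct V2 { float x, y; };
struct V3 { float x, y, z; };

V3 operator-(V3 a, V3 b) { return V3{a.x - b.x, a.y - b.y, a.z - b.z}; }
V3 operator*(V3 a, float s) { return V3{a.x * s, a.y * s, a.z * s}; }
float dot(V3 a, V3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Face tangent: the direction in which the u texture coordinate increases.
V3 triangleTangent(V3 p0, V3 p1, V3 p2, V2 uv0, V2 uv1, V2 uv2) {
    V3 deltaPos1 = p1 - p0;
    V3 deltaPos2 = p2 - p0;
    V2 deltaUV1{uv1.x - uv0.x, uv1.y - uv0.y};
    V2 deltaUV2{uv2.x - uv0.x, uv2.y - uv0.y};
    float det = deltaUV1.x * deltaUV2.y - deltaUV1.y * deltaUV2.x;
    assert(det != 0.0f); // degenerate UVs have no well-defined tangent
    return (deltaPos1 * deltaUV2.y - deltaPos2 * deltaUV1.y) * (1.0f / det);
}

// Gram-Schmidt: remove the part of the tangent that lies along the normal.
V3 orthonormalize(V3 tangent, V3 normal) {
    V3 t = tangent - normal * dot(normal, tangent);
    float len = std::sqrt(dot(t, t));
    assert(len > 0.0f);
    return t * (1.0f / len);
}
```

For a right triangle in the xy plane whose UVs match its x and y coordinates, the tangent comes out along +x, which makes a convenient unit test.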
Also for reference, this is the main article I was looking at while implementing all of this: link
Edit 3:
Here is an image of the planet from a distance with normal mapping enabled:
And one from the same angle without:

Calculation failing in the compute shader. HLSL DX11

I'm fairly new to compute shaders. I've just started implementing one for an N-body simulation, and I've come across a problem that I can't solve on my own.
Here's everything that is contained in the compute file; the entry point is ParticleComputeShader. I am only dispatching 1 thread group and creating 1024 threads in the shader. There are only 1024 particles while I debug and tweak it, so each thread has its own particle to work on.
The problem seems to be the distance != 0.0f check and the calculation related to the distance. Before I added the check, the position came back as 1.QNaN, so something in the code was dividing by 0. My guess is that I'm incorrectly accessing the StructuredBuffer using j and that this is corrupting the next few calculations.
Another note: Position.w is the mass of the particle.
struct ConstantParticleData
{
    float4 position;
    float4 velocity;
};

struct ParticleData
{
    float4 position;
    float4 velocity;
};

namespace Constants
{
    float BIG_G = 6.674e-11f;
    float SOFTEN = 0.01f;
}
StructuredBuffer<ConstantParticleData> inputConstantParticleData : register( t0 );
RWStructuredBuffer<ParticleData> outputParticleData : register( u0 );
[numthreads(1024, 1, 1)]
void ParticleComputeShader( int3 dispatchThreadID : SV_DispatchThreadID )
float3 acceleration = float3(0.0f, 0.0f, 0.0f);
for(int j = 0; j < 1024; j++)
float3 r_ij;
r_ij.x = inputConstantParticleData[j].position.x - inputConstantParticleData[dispatchThreadID.x].position.x;
r_ij.y = inputConstantParticleData[j].position.y - inputConstantParticleData[dispatchThreadID.x].position.y;
r_ij.z = inputConstantParticleData[j].position.z - inputConstantParticleData[dispatchThreadID.x].position.z;
float distance = 0.0f;
distance = length(r_ij);
if(distance != 0.0f)
float bottomLine = pow(distance, 2) + pow(Constants::SOFTEN, 2);
acceleration += Constants::BIG_G * ((inputConstantParticleData[j].position.w * r_ij) /
pow(bottomLine, 1.5));
acceleration = acceleration / inputConstantParticleData[dispatchThreadID.x].position.w;
outputParticleData[dispatchThreadID.x].velocity = inputConstantParticleData[dispatchThreadID.x].velocity +
float4(acceleration.x, acceleration.y, acceleration.z, 0.0f);
outputParticleData[dispatchThreadID.x].position = inputConstantParticleData[dispatchThreadID.x].position +
Any help will be appreciated. The shader works for simple input -> output and only started to begin giving troubles when I tried to use more of the input buffer than inputConstantParticleData[dispatchThreadID.x] at any one time.
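You can't really initialize global variables in HLSL this way. The compiler accepts the initializers because, when the shader is used through the FX framework, FX sets the globals up for you through constant buffers. Without FX, a non-static global is a constant-buffer value that the application has to supply, so it most likely read as zero here. Declaring the constants as static const should work as well. Glad to see you solved it; I just wanted to post why defining the float as a local variable fixed the issue.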
The problem with this code was that the Namespace variable Constants::BIG_G was not working or being used correctly. Moving this to inside of the function and just declaring it simply as float BIG_G fixed the problems I was having.
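A CPU reference of the same sum makes the NaN easier to reason about. The C++ sketch below is an assumption-laden illustration, not the shader: `Body`, `Vec3` and `accelerationOn` are hypothetical names, and it deliberately includes the j == i term. With a non-zero softening length that self term is exactly zero, so no distance check is needed. With the softening at zero, as it would be if the constants were never set, the self term becomes a division by zero and the result is NaN.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, z, mass; };
struct Vec3 { float x, y, z; };

// Softened gravitational acceleration on body i:
// a_i = G * sum_j m_j * r_ij / (|r_ij|^2 + soften^2)^(3/2)
Vec3 accelerationOn(const std::vector<Body>& bodies, std::size_t i,
                    float bigG, float soften) {
    assert(i < bodies.size());
    Vec3 a{0.0f, 0.0f, 0.0f};
    for (std::size_t j = 0; j < bodies.size(); ++j) {
        float rx = bodies[j].x - bodies[i].x;
        float ry = bodies[j].y - bodies[i].y;
        float rz = bodies[j].z - bodies[i].z;
        float distSq = rx * rx + ry * ry + rz * rz;
        float denom = std::pow(distSq + soften * soften, 1.5f);
        float scale = bigG * bodies[j].mass / denom; // infinite or NaN if denom == 0
        a.x += scale * rx;
        a.y += scale * ry;
        a.z += scale * rz;
    }
    return a;
}
```

Running the same small input through this function and through the compute shader is a cheap way to check the GPU output while debugging.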

Applying a DirectX shader to a rotated texture in XNA

I had a dynamic light shader for which the shaded sprite was fine in my own test program, but it started resembling an eclipse once I imported it into my friend's physics-based game. I narrowed it down by simplifying the gradient to be purely based on the X value within the shape and making the outside of the circle in the sprite red, but as you can see, the rotation continues to cause problems (I can't post images, so here are links to the album).
Circle at different rotations(not in order, but labelled by radian values):
Everything I researched about matrix math says I am using the correct formula for rotation, but I figure maybe I'm doing something wrong. Here is my .fx shader code:
float rotationrads; /*assumed rotation is in radians*/
sampler TextureSampler: register(s0);

float4 staticlight(float2 Tex: TEXCOORD0) : COLOR0
{
    float4 Color = tex2D(TextureSampler, Tex);
    float2 NewTex;
    /*Get the new X and Y values by applying the UV formula with the rotation*/
    NewTex.x = (Tex.x * cos(rotationrads)) - (Tex.y * sin(rotationrads));
    NewTex.y = (Tex.y * sin(rotationrads)) + (Tex.y * cos(rotationrads));
    if(Color.a > 0.0)
    {
        Color.r = (Color.r * NewTex.x);
        Color.g = (Color.g * NewTex.x);
        Color.b = (Color.b * NewTex.x);
    }
    else
    {
        Color.r = 100;
        Color.g = 0;
        Color.b = 0;
        Color.a = 100;
    }
    return Color;
}

technique StaticLightOnly
{
    pass Pass1
    {
        PixelShader = compile ps_2_0 staticlight();
    }
}
If anyone has experience with sprite-based rotation in 2D shaders, I'd appreciate any help with this! Thanks in advance!
Because rotations are performed about the origin, you have to move the rotation center (0.5, 0.5) to the origin, execute the rotation and then undo the translation.
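The translate, rotate, translate-back sequence can be sketched in C++ like this. `UV` and `rotateAboutCenter` are hypothetical names, and the centre defaults to (0.5, 0.5) on the assumption that the sprite fills the whole texture. Note that the standard 2D rotation uses the x offset in the sine term of the second row, whereas the posted NewTex.y line uses Tex.y in both terms.

```cpp
#include <cassert>
#include <cmath>

struct UV { float u, v; };

// Rotate a texture coordinate about `center` instead of about the origin.
UV rotateAboutCenter(UV tex, float radians, UV center = {0.5f, 0.5f}) {
    float c = std::cos(radians);
    float s = std::sin(radians);
    float du = tex.u - center.u; // move the rotation centre to the origin
    float dv = tex.v - center.v;
    return UV{du * c - dv * s + center.u,  // rotate, then undo the translation
              du * s + dv * c + center.v};
}
```

The centre itself is a fixed point of this function, which is an easy property to check: whatever the angle, (0.5, 0.5) should map to (0.5, 0.5).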

Pix, A couple of issues I'm not understanding

I've been asked to split questions which I asked here:
HLSL and Pix number of questions
I thought two and three would both fit in the same question as a solution of one may help resolve the other. I'm trying to debug a shader and seem to be running into issues. Firstly Pix seems to be skipping a large amount of code when I'm running analyse mode. This is analysing an experiment with F12 captures and with D3DX analysis turned off. I have to turn it off as I'm using XNA. The shader code in question is below:
float4 PixelShaderFunction(float2 OriginalUV : TEXCOORD0) : COLOR0
// Get the depth buffer value at this pixel.
float4 color = float4 (0, 0,0,0);
float4 finalColor = float4(0,0,0,0);
float zOverW = tex2D(mySampler, OriginalUV);
// H is the viewport position at this pixel in the range -1 to 1.
float4 H = float4(OriginalUV.x * 2 - 1, (1 - OriginalUV.y) * 2 - 1,
zOverW, 1);
// Transform by the view-projection inverse.
float4 D = mul(H, xViewProjectionInverseMatrix);
// Divide by w to get the world position.
float4 worldPos = D / D.w;
// Current viewport position
float4 currentPos = H;
// Use the world position, and transform by the previous view-
// projection matrix.
float4 previousPos = mul(worldPos, xPreviousViewProjectionMatrix);
// Convert to nonhomogeneous points [-1,1] by dividing by w.
previousPos /= previousPos.w;
// Use this frame's position and last frame's to compute the pixel
// velocity.
float2 velocity = (currentPos - previousPos)/2.f;
// Get the initial color at this pixel.
color = tex2D(sceneSampler, OriginalUV);
OriginalUV += velocity;
for(int i = 1; i < 1; ++i, OriginalUV += velocity)
// Sample the color buffer along the velocity vector.
float4 currentColor = tex2D(sceneSampler, OriginalUV);
// Add the current color to our color sum.
color += currentColor;
// Average all of the samples to get the final blur color.
finalColor = color / xNumSamples;
return finalColor;
With a captured frame and when debugging a pixel I can only see two lines working. These are color = tex2D(sceneSampler, OriginalUV) and finalColor = color / xNumSamples. The rest of it Pix just skips or doesn't do.
Also can I debug in real time using Pix? I'm wondering if this method would reveal more information.
It would appear that most of that shader code is being optimized out (not compiled because it is irrelevant).
In the end, all that matters is the return value, finalColor, which is computed from color and xNumSamples.
// Average all of the samples to get the final blur color.
finalColor = color / xNumSamples;
I am not sure where xNumSamples gets set, but you can see that the only line that matters to color is color = tex2D(sceneSampler, OriginalUV); (hence it not being removed).
Every line before that is irrelevant because it will be overwritten by that line.
The only bit that follows is that for loop:
for(int i = 1; i < 1; ++i, OriginalUV += velocity)
But this would never execute because i < 1 is false from the get-go (i is assigned a starting value of 1).
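A plain C++ loop with the same header behaves the same way, which makes the point easy to check outside the shader. `loopIterations` is a hypothetical helper. Presumably the upper bound was meant to be something like xNumSamples, but the posted code does not confirm that.

```cpp
#include <cassert>

// Count how often the body of `for (int i = start; i < limit; ++i)` runs.
int loopIterations(int start, int limit) {
    int count = 0;
    for (int i = start; i < limit; ++i) {
        ++count;
    }
    return count;
}
```

With start and limit both equal to 1 the count is zero, so a compiler is free to drop the loop body and everything that only feeds into it.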
Hope that helps!
To answer your second question: I believe that to debug shaders in real time you need to use something like NVIDIA's FX Composer and Shader Debugger. However, those run outside of your game, so the results are not always useful.
