I've been asked to split questions which I asked here:
HLSL and Pix number of questions
I thought two and three would both fit in the same question, as a solution to one may help resolve the other. I'm trying to debug a shader and seem to be running into issues. Firstly, Pix seems to skip a large amount of code when I run it in analysis mode. This is when analysing an experiment with F12 captures and with D3DX analysis turned off; I have to turn it off because I'm using XNA. The shader code in question is below:
float4 PixelShaderFunction(float2 OriginalUV : TEXCOORD0) : COLOR0
{
// Get the depth buffer value at this pixel.
float4 color = float4 (0, 0,0,0);
float4 finalColor = float4(0,0,0,0);
float zOverW = tex2D(mySampler, OriginalUV);
// H is the viewport position at this pixel in the range -1 to 1.
float4 H = float4(OriginalUV.x * 2 - 1, (1 - OriginalUV.y) * 2 - 1,
zOverW, 1);
// Transform by the view-projection inverse.
float4 D = mul(H, xViewProjectionInverseMatrix);
// Divide by w to get the world position.
float4 worldPos = D / D.w;
// Current viewport position
float4 currentPos = H;
// Use the world position, and transform by the previous view-
// projection matrix.
float4 previousPos = mul(worldPos, xPreviousViewProjectionMatrix);
// Convert to nonhomogeneous points [-1,1] by dividing by w.
previousPos /= previousPos.w;
// Use this frame's position and last frame's to compute the pixel
// velocity.
float2 velocity = (currentPos - previousPos)/2.f;
// Get the initial color at this pixel.
color = tex2D(sceneSampler, OriginalUV);
OriginalUV += velocity;
for(int i = 1; i < 1; ++i, OriginalUV += velocity)
{
// Sample the color buffer along the velocity vector.
float4 currentColor = tex2D(sceneSampler, OriginalUV);
// Add the current color to our color sum.
color += currentColor;
}
// Average all of the samples to get the final blur color.
finalColor = color / xNumSamples;
return finalColor;
}
With a captured frame and when debugging a pixel I can only see two lines working. These are color = tex2D(sceneSampler, OriginalUV) and finalColor = color / xNumSamples. The rest of it Pix just skips or doesn't do.
Also can I debug in real time using Pix? I'm wondering if this method would reveal more information.
Cheers,
It would appear that most of that shader code is being optimized out (not compiled because it is irrelevant).
In the end, all that matters is the return value, finalColor, which is computed from color and xNumSamples.
// Average all of the samples to get the final blur color.
finalColor = color / xNumSamples;
I am not sure where xNumSamples gets set, but you can see that the only line that matters to color is color = tex2D(sceneSampler, OriginalUV); (hence it not being removed).
Every line before that is irrelevant because it will be overwritten by that line.
The only bit that follows is that for loop:
for(int i = 1; i < 1; ++i, OriginalUV += velocity)
But this would never execute because i < 1 is false from the get-go (i is assigned a starting value of 1).
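To make the loop argument concrete, here is a minimal C++ sketch (not HLSL, and `loopIterations` is a hypothetical helper name) that counts how often a loop of the same shape runs. With a start value of 1 and a limit of 1 the condition fails on the first check, so the body never executes and a compiler is free to drop it.

```cpp
// Counts how many times the body of `for (int i = start; i < limit; ++i)` runs.
// Mirrors the shape of the shader loop; the names are illustrative only.
int loopIterations(int start, int limit) {
    int count = 0;
    for (int i = start; i < limit; ++i) {
        ++count;  // stands in for the texture sample inside the shader loop
    }
    return count;
}
```

Calling `loopIterations(1, 1)` gives 0, whereas a limit such as 4 would give 3 iterations, which is presumably closer to what the blur intended.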
Hope that helps!
To answer your second question, I believe that to debug shaders in real time you need to use something like Nvidia's FX Composer and Shader Debugger. However, those run outside of your game, so the results are not always useful.
I am currently studying shadow mapping, and my biggest issue right now is the transformations between spaces. This is my current working theory/steps.
Pass 1:
Get depth of pixel from camera, store in depth buffer
Get depth of pixel from light, store in another buffer
Pass 2:
Use texture coordinate to sample camera's depth buffer at current pixel
Convert that depth to a view space position by multiplying the projection coordinate with invProj matrix. (also do a perspective divide).
Take that view position and multiply by invV (camera's inverse view) to get a world space position
Multiply world space position by light's viewProjection matrix.
Perspective divide that projection-space coordinate, and manipulate into [0..1] to sample from light depth buffer.
Get current depth from light and closest (sampled) depth, if current depth > closest depth, it's in shadow.
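The last three steps of Pass 2 can be sketched on the CPU as follows. This is only an illustration in C++ under stated assumptions: `ShadowLookup`, `toShadowLookup` and `inShadow` are hypothetical names, the input is assumed to be the world position already multiplied by the light's view-projection matrix, and depth after the divide is assumed to already lie in [0,1] (as it does with a D3D-style left-handed projection), so only x and y are remapped and y is flipped for texture space.

```cpp
#include <array>

struct ShadowLookup {
    float u, v;   // texture coordinates into the light's depth map, in [0,1]
    float depth;  // depth of the point as seen from the light
};

// clip: position in the light's clip space (x, y, z, w).
ShadowLookup toShadowLookup(const std::array<float, 4>& clip) {
    const float invW = 1.0f / clip[3];  // perspective divide
    const float ndcX = clip[0] * invW;  // [-1,1]
    const float ndcY = clip[1] * invW;  // [-1,1]
    const float ndcZ = clip[2] * invW;  // assumed to be in [0,1] already
    // Remap x/y to [0,1]; flip y because texture v grows downwards.
    return { ndcX * 0.5f + 0.5f, 1.0f - (ndcY * 0.5f + 0.5f), ndcZ };
}

// closestDepth: value sampled from the light's depth map at (u, v).
// bias: small offset against self-shadowing; 0 disables it.
bool inShadow(const ShadowLookup& p, float closestDepth, float bias) {
    return p.depth - bias > closestDepth;
}
```

If the depth map was written with a different convention (for example z remapped to [-1,1]), the depth line would need the matching remap; the point is that the stored and the compared depth must use the same range.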
Shader Code
Pass1:
PS_INPUT vs(VS_INPUT input) {
output.pos = mul(input.vPos, mvp);
output.cameraDepth = output.pos.zw;
..
float4 vPosInLight = mul(input.vPos, m);
vPosInLight = mul(vPosInLight, light.viewProj);
output.lightDepth = vPosInLight.zw;
}
PS_OUTPUT ps(PS_INPUT input){
float cameraDepth = input.cameraDepth.x / input.cameraDepth.y;
//Bundle cameraDepth in alpha channel of a normal map.
output.normal = float4(input.normal, cameraDepth);
//4 Lights in total -- although only 1 is active right now. Going to use r/g/b/a for each light depth.
output.lightDepths.r = input.lightDepth.x / input.lightDepth.y;
}
Pass 2 (Screen Quad):
float4 ps(PS_INPUT input) : SV_TARGET{
float4 pixelPosView = depthToViewSpace(input.texCoord);
..
float4 pixelPosWorld = mul(pixelPosView, invV);
float4 pixelPosLight = mul(pixelPosWorld, light.viewProj);
float shadow = shadowCalc(pixelPosLight);
//For testing / visualisation
return float4(shadow,shadow,shadow,1);
}
float4 depthToViewSpace(float2 xy) {
//Get pixel depth from camera by sampling current texcoord.
//Extract the alpha channel as this holds the depth value.
//Then, transform from [0..1] to [-1..1]
float z = (_normal.Sample(_sampler, xy).a) * 2 - 1;
float x = xy.x * 2 - 1;
float y = (1 - xy.y) * 2 - 1;
float4 vProjPos = float4(x, y, z, 1.0f);
float4 vPositionVS = mul(vProjPos, invP);
vPositionVS = float4(vPositionVS.xyz / vPositionVS.w,1);
return vPositionVS;
}
float shadowCalc(float4 pixelPosL) {
//Transform pixelPosLight from [-1..1] to [0..1]
float3 projCoords = (pixelPosL.xyz / pixelPosL.w) * 0.5 + 0.5;
float closestDepth = _lightDepths.Sample(_sampler, projCoords.xy).r;
float currentDepth = projCoords.z;
return currentDepth > closestDepth; //Supposed to have bias, but for now I just want shadows working haha
}
CPP Matrices
// (Position, LookAtPos, UpDir)
auto lightView = XMMatrixLookAtLH(XMLoadFloat4(&pos4), XMVectorSet(0,0,0,1), XMVectorSet(0,1,0,0));
// (FOV, AspectRatio (1000/680), NEAR, FAR)
auto lightProj = XMMatrixPerspectiveFovLH(1.57f , 1.47f, 0.01f, 10.0f);
XMStoreFloat4x4(&_cLightBuffer.light.viewProj, XMMatrixTranspose(XMMatrixMultiply(lightView, lightProj)));
Current Outputs
White signifies that a shadow should be projected there. Black indicates no shadow.
CameraPos (0, 2.5, -2)
CameraLookAt (0, 0, 0)
CameraFOV (1.57)
CameraNear (0.01)
CameraFar (10.0)
LightPos (0, 2.5, -2)
LightLookAt (0, 0, 0)
LightFOV (1.57)
LightNear (0.01)
LightFar (10.0)
If I change the CameraPosition to be (0, 2.5, 2), basically just flipped on the Z axis, this is the result.
Obviously a shadow shouldn't change its projection depending on where the observer is, so I think I'm making a mistake with the invV. But I really don't know for sure. I've debugged the light's projView matrix, and the values seem correct - going from CPU to GPU. It's also entirely possible I've misunderstood some theory along the way because this is quite a tricky technique for me.
Aha! Found my problem. It was a silly mistake, I was calculating the depth of pixels from each light, but storing them in a texture that was based on the view of the camera. The following image should explain my mistake better than I can with words.
For future reference, the solution I decided on was to scrap my idea of storing light depths in texture channels. Instead, I make a new pass for each light and bind a unique depth-stencil texture to render the geometry to. When I want to do light calculations, I bind each of the depth textures to a shader resource slot and go from there. Obviously this doesn't scale well with many lights, but for my student project, where I'm only required to have 2 shadow casters, it suffices.
_context->DrawIndexed(indexCount, 0, 0); //Draw to regular render target
_sunlight->use(1, _context); //Use sunlight shader (basically just runs a Vertex Shader & Null Pixel shader so depth can be written to depth map)
_sunlight->bindDSVSetNullRenderTarget(_context);
_context->DrawIndexed(indexCount, 0, 0); //Draw to sunlight depth target
bindDSVSetNullRenderTarget(ctx){
ID3D11RenderTargetView* nullrv = { nullptr };
ctx->OMSetRenderTargets(1, &nullrv, _sunlightDepthStencilView);
}
//The purpose of setting a null render target before doing the draw call is
//that a draw call with only a depth target bound is much faster.
//(At least I believe so, from my reading online)
So I've been working on a DirectX 11/HLSL rendering engine with the goal of creating a realistic planet which you can view both from the surface and at a planetary level. The planet is a normalized cube, which is procedurally generated using noise, and as you move closer to the surface of the planet, a binary-based triangle tree splits until the desired detail level is reached. I got vertex normal calculations to work correctly, and I recently started trying to implement normal mapping for my terrain textures. I have gotten something that seems to work for the most part. However, when the sun is pointing almost perpendicular to the ground (90 degrees), it is way more lit up
However, from the opposite angle (270 degrees), I am getting something that seems
, but may as well be just as off.
The debug lines that are being rendered are the normal, tangent, and bitangents (which all appear to be correct and fit the topology of the terrain)
Here is my shader code:
Vertex shader:
PSIn mainvs(VSIn input)
{
PSIn output;
output.WorldPos = mul(float4(input.Position, 1.f), Instances[input.InstanceID].WorldMatrix); // pass pixel world position as opposed to screen space position for lighting calculations
output.Position = mul(output.WorldPos, CameraViewProjectionMatrix);
output.TexCoord = input.TexCoord;
output.CameraPos = CameraPosition;
output.Normal = normalize(mul(input.Normal, (float3x3)Instances[input.InstanceID].WorldMatrix));
float3 Tangent = normalize(mul(input.Tangent, (float3x3)Instances[input.InstanceID].WorldMatrix));
float3 Bitangent = normalize(cross(output.Normal, Tangent));
output.TBN = transpose(float3x3(Tangent, Bitangent, output.Normal));
return output;
}
Pixel shader (Texcoord scalar is for smaller textures closer to planet surface):
float3 FetchNormalVector(float2 TexCoord)
{
float3 Color = NormalTex.Sample(Samp, TexCoord * TexcoordScalar).xyz; // Sample returns float4; keep only the RGB channels
Color *= 2.f;
return normalize(float3(Color.x - 1.f, Color.y - 1.f, Color.z - 1.f));
}
float3 LightVector = -SunDirection;
float3 TexNormal = FetchNormalVector(input.TexCoord);
float3 WorldNormal = normalize(mul(input.TBN, TexNormal));
float nDotL = max(0.0, dot(WorldNormal, LightVector));
float4 SampleColor = float4(1.f, 1.f, 1.f, 1.f);
SampleColor *= nDotL;
return float4(SampleColor.xyz, 1.f);
Thanks in advance, and let me know if you have any insight as to what could be the issue here.
Edit 1: I tried it with a fixed blue value instead of sampling from the normal texture, which gives me the correct and same results as if I had not applied mapping (as expected). Still don't have a lead on what would be causing this issue.
Edit 2: I just noticed the strangest thing. At 0, 0, +Z, there are these hard seams that only appear with normal mapping enabled
It's a little hard to see, but it seems almost like there are multiple tangents associated to the same vertex (since I'm not using indexing yet) because the debug lines appear to split on the seams.
Here is my code that I'm using to generate the tangents (bitangents are calculated in the vertex shader using cross(Normal, Tangent))
v3& p0 = Chunk.Vertices[0].Position;
v3& p1 = Chunk.Vertices[1].Position;
v3& p2 = Chunk.Vertices[2].Position;
v2& uv0 = Chunk.Vertices[0].UV;
v2& uv1 = Chunk.Vertices[1].UV;
v2& uv2 = Chunk.Vertices[2].UV;
v3 deltaPos1 = p1 - p0;
v3 deltaPos2 = p2 - p0;
v2 deltaUV1 = uv1 - uv0;
v2 deltaUV2 = uv2 - uv0;
f32 r = 1.f / (deltaUV1.x * deltaUV2.y - deltaUV1.y * deltaUV2.x);
v3 Tangent = (deltaPos1 * deltaUV2.y - deltaPos2 * deltaUV1.y) * r;
Chunk.Vertices[0].Tangent = Normalize(Tangent - (Chunk.Vertices[0].Normal * DotProduct(Chunk.Vertices[0].Normal, Tangent)));
Chunk.Vertices[1].Tangent = Normalize(Tangent - (Chunk.Vertices[1].Normal * DotProduct(Chunk.Vertices[1].Normal, Tangent)));
Chunk.Vertices[2].Tangent = Normalize(Tangent - (Chunk.Vertices[2].Normal * DotProduct(Chunk.Vertices[2].Normal, Tangent)));
Also for reference, this is the main article I was looking at while implementing all of this: link
Edit 3:
Here is an image of the planet from a distance with normal mapping enabled:
And one from the same angle without:
I created a custom CIKernel in Metal. This is useful because it is close to real-time. I am avoiding any CGContext or CIContext that might lag in real time. My kernel essentially does a Hough transform, but I can't seem to figure out how to read the white points from the image buffer.
Here is kernel.metal:
#include <CoreImage/CoreImage.h>
extern "C" {
namespace coreimage {
float4 hough(sampler src) {
// Math
// More Math
// eventually:
if (luminance > 0.8) {
float2 position = src.coord(); // coord() returns a float2
// Somehow add this to an array because I need to know the x,y pair
}
return float4(luminance, luminance, luminance, 1.0);
}
}
}
I am fine if this part can be extracted to a different kernel or function. The caveat to CIKernel, is its return type is a float4 representing the new color of a pixel. Ideally, instead of a image -> image filter, I would like an image -> array sort of deal. E.g. reduce instead of map. I have a bad hunch this will require me to render it and deal with it on the CPU.
Ultimately I want to retrieve the qualifying coordinates (which there can be multiple per image) back in my swift function.
FINAL SOLUTION EDIT:
As per suggestions of the answer, I am doing large per-pixel calculations on the GPU, and some math on the CPU. I designed 2 additional kernels that work like the builtin reduction kernels. One kernel returns a 1 pixel high image of the highest values in each column, and the other kernel returns a 1 pixel high image of the normalized y-coordinate of the highest value:
/// Returns the maximum value in each column.
///
/// - Parameter src: a sampler for the input texture
/// - Returns: maximum value for each column
float4 maxValueForColumn(sampler src) {
const float2 size = float2(src.extent().z, src.extent().w);
/// Destination pixel coordinate, normalized
const float2 pos = src.coord();
float maxV = 0;
for (float y = 0; y < size.y; y++) {
float v = src.sample(float2(pos.x, y / size.y)).x;
if (v > maxV) {
maxV = v;
}
}
return float4(maxV, maxV, maxV, 1.0);
}
/// Returns the normalized coordinate of the maximum value in each column.
///
/// - Parameter src: a sampler for the input texture
/// - Returns: normalized y-coordinate of the maximum value for each column
float4 maxCoordForColumn(sampler src) {
const float2 size = float2(src.extent().z, src.extent().w);
/// Destination pixel coordinate, normalized
const float2 pos = src.coord();
float maxV = 0;
float maxY = 0;
for (float y = 0; y < size.y; y++) {
float v = src.sample(float2(pos.x, y / size.y)).x;
if (v > maxV) {
maxY = y / size.y;
maxV = v;
}
}
return float4(maxY, maxY, maxY, 1.0);
}
This won't give every pixel where luminance is greater than 0.8, but for my purposes, it returns enough: the highest value in each column, and its location.
Pro: copying only (2 * image width) bytes over to the CPU instead of every pixel saves TONS of time (a few ms).
Con: If you have two major white points in the same column, you will never know. You might have to alter this and do calculations by row instead of column if that fits your use-case.
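As a way to check what the two kernels should return, here is a small CPU reference in C++. It is a sketch only: `ColumnMax` and `maxPerColumn` are hypothetical names, and the image is assumed to be a row-major array of single-channel floats rather than a Core Image sampler.

```cpp
#include <cstddef>
#include <vector>

struct ColumnMax {
    float value;        // largest value found in the column
    float normalizedY;  // row index of that value divided by the image height
};

// image: row-major, width * height floats (one channel).
// Combines what maxValueForColumn and maxCoordForColumn compute per column.
std::vector<ColumnMax> maxPerColumn(const std::vector<float>& image,
                                    std::size_t width, std::size_t height) {
    std::vector<ColumnMax> result(width, ColumnMax{0.0f, 0.0f});
    for (std::size_t x = 0; x < width; ++x) {
        for (std::size_t y = 0; y < height; ++y) {
            const float v = image[y * width + x];
            if (v > result[x].value) {  // strict '>' keeps the first maximum, like the kernels
                result[x].value = v;
                result[x].normalizedY =
                    static_cast<float>(y) / static_cast<float>(height);
            }
        }
    }
    return result;
}
```

Comparing this against the rendered 1-pixel-high outputs on a small test image is one way to spot conversion problems between the GPU values and what arrives on the CPU.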
FOLLOW UP:
There seems to be a problem in rendering the outputs. The Float values returned in Metal are not correlated to the UInt8 values I am getting in Swift.
This unanswered question describes the problem.
Edit: This answered question provides a very convenient metal function. When you call it on a metal value (e.g. 0.5) and return it, you will get the correct value (e.g. 128) on the CPU.
Check out the filters in the CICategoryReduction (like CIAreaAverage). They return images that are just a few pixels tall, containing the reduction result. But you still have to render them to be able to read the values in your Swift function.
The problem with using this approach in your case is that you don't know the number of coordinates you are returning beforehand. Core Image needs to know the extent of the output when it calls your kernel, though. You could just assume a static maximum number of coordinates, but that all sounds tedious.
I think you are better off using Accelerate APIs for iterating the pixels of your image (parallelized, super efficiently) on the CPU to find the corresponding coordinates.
You could do a hybrid approach where you do the per-pixel heavy math on the GPU with Core Image and then do the analysis on the CPU using Accelerate. You can even integrate the CPU part into your Core Image pipeline using a CIImageProcessorKernel.
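The CPU half of such a hybrid could look roughly like the following. This is a plain C++ loop, not the Accelerate API, and `findBrightPixels` is a hypothetical name; it assumes the GPU result has already been rendered into a row-major buffer of luminance floats.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the (x, y) coordinates of every pixel whose luminance exceeds the threshold.
// luminance: row-major, width * height floats produced by the GPU pass.
std::vector<std::pair<std::size_t, std::size_t>>
findBrightPixels(const std::vector<float>& luminance,
                 std::size_t width, std::size_t height, float threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> points;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            if (luminance[y * width + x] > threshold) {
                points.emplace_back(x, y);
            }
        }
    }
    return points;
}
```

Unlike a fixed-size reduction image, the result here can hold any number of coordinates, which is the part that is awkward to express as a Core Image kernel.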
I'm trying to write a pixel shader - I'd like to use Texture.SampleCmpLevelZero as this is usable in loop constructs where Texture.Sample is not.
I've constructed a texture and can sample it fine with Texture.Sample, but switching to SampleCmpLevelZero works for the first few frames, then goes blank, then rarely but intermittently renders correctly.
My scene is static (and the texture data too) - I'm rendering one quad and there is no camera movement of any kind - I can reproduce this reliably by just changing the single line in the PS shader function.
Has anyone seen this?
Thanks
SamplerState sampPointClamp
{
Filter = MIN_MAG_MIP_POINT;
AddressU = Clamp;
AddressV = Clamp;
};
SamplerComparisonState ShadowSampler
{
// sampler state
Filter = MIN_MAG_MIP_POINT;
AddressU = Clamp;
AddressV = Clamp;
// sampler comparison state
ComparisonFunc = LESS;
//ComparisonFilter = COMPARISON_MIN_MAG_MIP_POINT;
};
texture2D tex;
//on the fly full screen quad
PS_IN VS(uint id : SV_VertexID)
{
PS_IN ret;
ret.uv = float2( id & 1, (id & 2) >> 1 );
ret.pos = float4( ret.uv * float2( 2.0f, -2.0f ) + float2( -1.0f, 1.0f), 0.0f, 1.0f );
return ret;
}
float4 PS( PS_IN input ) : SV_Target
{
//return float4(tex.SampleCmpLevelZero(ShadowSampler, input.uv, 0), 0, 0, 1); // Does not work properly
return float4(tex.Sample(sampPointClamp, input.uv).x, 0, 0, 1); // Works fine
}
Sample should work in loops just fine:
float4 PSColUV(COLUV_PIXEL input) : SV_Target
{
float4 output = 0; // start the accumulator at zero
for (int i = 0; i < 4; i++)
{
float f = float(i) / 256.0;
float2 uv = input.UV + float2(i,i);
output += g_txDiffuse.Sample(g_samLinear, uv);
}
return input.Col * output/4.0;
}
produces:
ps_4_0
dcl_sampler s0, mode_default
dcl_resource_texture2d (float,float,float,float) t0
dcl_input_ps linear v1.xyzw
dcl_input_ps linear v2.xy
dcl_output o0.xyzw
dcl_temps 3
0: mov r0.xyzw, l(0,0,0,0)
1: mov r1.x, l(0)
2: loop
3: ige r1.y, r1.x, l(4)
4: breakc_nz r1.y
5: itof r1.y, r1.x
6: add r1.yz, r1.yyyy, v2.xxyx
7: sample r2.xyzw, r1.yzyy, t0.xyzw, s0
8: add r0.xyzw, r0.xyzw, r2.xyzw
9: iadd r1.x, r1.x, l(1)
10: endloop
11: mul r0.xyzw, r0.xyzw, v1.xyzw
12: mul o0.xyzw, r0.xyzw, l(0.250000, 0.250000, 0.250000, 0.250000)
13: ret
Also, you do realise that you're doing a PCF lookup rather than a normal texture sample? This won't give you the data in the texture. Instead, it compares every texel in the filter footprint (e.g. 4 for bilinear, 1 for point) with your reference value (0), produces 0 or 1 for each depending on whether the comparison (LESS here) passes or fails, and then filters those boolean values into a number between 0 and 1.
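A rough CPU emulation of that behaviour, written in C++ as an illustration only: `sampleCmpLess` is a hypothetical function, the footprint is fixed at 2x2, and the comparison order (reference on the left, texel on the right) is an assumption about how the hardware applies ComparisonFunc.

```cpp
#include <array>

// Emulates a comparison sample with ComparisonFunc = LESS over a 2x2 footprint.
// texels: top-left, top-right, bottom-left, bottom-right.
// fx, fy: fractional position inside the footprint, each in [0,1].
float sampleCmpLess(const std::array<float, 4>& texels, float reference,
                    float fx, float fy) {
    std::array<float, 4> passed{};
    for (int i = 0; i < 4; ++i) {
        // Each texel becomes 0 or 1; assumed order is "reference < texel".
        passed[i] = (reference < texels[i]) ? 1.0f : 0.0f;
    }
    // Bilinear blend of the pass/fail results, not of the texel values.
    const float top = passed[0] + (passed[1] - passed[0]) * fx;
    const float bottom = passed[2] + (passed[3] - passed[2]) * fx;
    return top + (bottom - top) * fy;
}
```

With a reference of 0, a texel storing 0.37 and a texel storing 0.92 both produce 1, so the stored values themselves are lost; that is why this path is unsuitable for reading a table of values.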
Reply to comment:
thanks - I think Sample can't be in a loop with a variable length or a
length not known at compile time (?). The error I got was " error
X4014: cannot have divergent gradient operations inside loops error:
There was an error compiling expression". On your other point - I do
want an exact sample - I thought that was what I'm getting - I'm just
trying to do some procedural texture generation using the texture
buffer as a table of values to let me compute the true texel value
based on (u,v) etc.. – AnonDev
http://msdn.microsoft.com/en-gb/library/windows/desktop/bb219848%28v=vs.85%29.aspx
"Interaction of Per-Pixel Flow Control With Screen Gradients"
Remember that pixels are executed in (at minimum) a 2x2 block. You can't have control flow that would cause some pixels to sample whilst others do not, nor can you have calculations inside control flow that would cause a sample operation to get different gradients.
(Well, you can, but you need to use SampleGrad for that. But! That's not what you want in this instance. )
You say "exact" sample. Do you mean that your resource only has a single mip map and you want to get each texel in the resource without filtering? (i.e. you were doing a point filter?). Given your explanation of the texture being a table of values, then I don't see why you would need the texture to be a mipchain, and only the top level contains useful info. In which case you can use SampleLevel() with a LOD of 0. This means there will be no divergence in the derivatives, as the sample op isn't using derivatives!
This is the same reason SampleCmpLevelZero works but SampleCmp will not :) If you are point sampling, then another good candidate would be Load(), as you give it exact texel positions, and you can even use it on buffers. So if your texture look-up positions are based on the pixel (X,Y), for instance, then you can pass these straight into Load (after accounting for the half texel offset..).
Anyway, you really don't want to be using SampleCmp/LevelZero. It doesn't do what you're after! It's used for shadow maps and so on. Use SampleLevel with a LOD of 0 instead.
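For the table-lookup use case, the mapping from a normalized (u,v) to the integer texel that a point sample at LOD 0 or a Load would read can be sketched like this. It is C++ rather than HLSL, `Texel` and `uvToTexel` are hypothetical names, and clamping at the edges is an assumption that mirrors the Clamp address mode in the question.

```cpp
#include <algorithm>
#include <cmath>

struct Texel { int x, y; };

// Maps normalized coordinates to an integer texel index.
// Texel (x, y) covers u in [x/width, (x+1)/width), with its centre at (x + 0.5)/width.
Texel uvToTexel(float u, float v, int width, int height) {
    const int x = static_cast<int>(std::floor(u * static_cast<float>(width)));
    const int y = static_cast<int>(std::floor(v * static_cast<float>(height)));
    // Clamp so that u == 1.0 or slightly out-of-range values stay inside the texture.
    return { std::min(std::max(x, 0), width - 1),
             std::min(std::max(y, 0), height - 1) };
}
```

For a 4x4 table, `uvToTexel(0.5f, 0.5f, 4, 4)` lands on texel (2, 2); no derivatives are involved, so the lookup is safe inside divergent loops.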
The problem was:
SamplerComparisonState ShadowSampler
{
// sampler state
Filter = MIN_MAG_MIP_POINT;
AddressU = Clamp;
AddressV = Clamp;
// sampler comparison state
ComparisonFunc = LESS;
//ComparisonFilter = COMPARISON_MIN_MAG_MIP_POINT;
};
It looks like there was a time when ComparisonFilter existed as an attribute (as it turns up in the docs) e.g. build 3/5/2013 of http://msdn.microsoft.com/en-gb/library/windows/desktop/bb509644(v=vs.85).aspx
but will not compile if present.
I fixed the above behaviour by changing the Filter attribute to the value COMPARISON_MIN_MAG_MIP_POINT. At that point it all worked.
I had a dynamic light shader for which the shaded sprite was fine in my own test program, but started resembling an eclipse once I imported it into my friend's physics based game. I narrowed it down by simplifying the gradient to be purely based on the X value within the shape, and making the outside of the circle in the sprite red, but as you can see, the rotation continues to cause problems (can't post images, so here's links to the album).
Circle at different rotations(not in order, but labelled by radian values): http://imgur.com/a/Preth
Everything I researched about matrix math says I am using the correct formula for rotation, but I figure maybe I'm doing something wrong. Here is my .fx shader code:
float rotationrads; /*assumed rotation is in radians*/
sampler TextureSampler: register(s0);
float4 staticlight(float2 Tex: TEXCOORD0) : COLOR0
{
float4 Color = tex2D(TextureSampler, Tex);
float2 NewTex;
/*Get the new X and Y values by applying the UV formula with the rotation*/
NewTex.x = (Tex.x * cos(rotationrads)) - (Tex.y * sin(rotationrads));
NewTex.y = (Tex.y * sin(rotationrads)) + (Tex.y * cos(rotationrads));
if(Color.a > 0.0)
{
Color.r = (Color.r * NewTex.x);
Color.g = (Color.g * NewTex.x);
Color.b = (Color.b * NewTex.x);
}
else
{
Color.r = 100;
Color.g = 0;
Color.b = 0;
Color.a = 100;
}
return Color;
}
technique StaticLightOnly
{
pass Pass1
{
PixelShader = compile ps_2_0 staticlight();
}
}
If anyone has experience with sprite-based rotation in 2D shaders, I'd appreciate any help with this! Thanks in advance!
Because rotations are performed about the origin, you have to move the rotation center (0.5, 0.5) to the origin, execute the rotation and then undo the translation.
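The translate, rotate, translate-back sequence can be sketched as below. This is C++ rather than shader code, and `Vec2` and `rotateAboutCenter` are hypothetical names; the same arithmetic should carry over to the .fx file. Note that the standard 2D rotation uses x in the sine term of the y component (x * sin + y * cos), whereas the shader in the question uses Tex.y in both terms.

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Rotates a texture coordinate around a centre point (default: middle of the sprite).
Vec2 rotateAboutCenter(Vec2 uv, float radians, Vec2 center = {0.5f, 0.5f}) {
    const float c = std::cos(radians);
    const float s = std::sin(radians);
    // 1. Move the rotation centre to the origin.
    const float dx = uv.x - center.x;
    const float dy = uv.y - center.y;
    // 2. Rotate about the origin, 3. undo the translation.
    return { dx * c - dy * s + center.x,
             dx * s + dy * c + center.y };
}
```

The centre of the sprite maps to itself for any angle, which is a quick sanity check that the rotation is no longer happening around the corner at (0, 0).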