Comparison operations in HLSL - directx

I am reading through an HLSL implementation of cascaded shadow mapping (CSM) and have come across a line of code I don't quite understand:
float3 pos;
float3 CascadeDistances;
...
float3 weights = ( pos.z < CascadeDistances );
Can someone please tell me what is happening here and what the result of this assignment is?
I think it may expand to something like:
float3 weights;
weights.x = ( pos.z < CascadeDistances.x ) ? 1 : 0;
weights.y = ( pos.z < CascadeDistances.y ) ? 1 : 0;
weights.z = ( pos.z < CascadeDistances.z ) ? 1 : 0;
Can someone please confirm whether this is correct, or whether I am way off?
Any help would be appreciated.

I contacted the author of this shader, and after some time he replied and confirmed that this is how this particular expression evaluates.
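For readers without a shader compiler at hand, the confirmed expansion can be mimicked on the CPU. This is only a minimal C sketch, not HLSL: the function name less_than_broadcast and the use of plain float[3] arrays in place of float3 are made up for illustration. It assumes exactly what the author confirmed, namely that the scalar pos.z is compared against each component and that each true/false result is stored as 1 or 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of any real API.
   Mimics `float3 w = ( s < v );`: the scalar s is compared against each
   component of v, and each comparison result is stored as 1.0f or 0.0f. */
static void less_than_broadcast(float s, const float v[3], float out[3])
{
    assert(v != NULL && out != NULL);
    for (size_t c = 0; c < 3; ++c) {
        out[c] = (s < v[c]) ? 1.0f : 0.0f;
    }
}
```

With cascade distances (10, 50, 200) and a depth of 30, the result is (0, 1, 1): the depth lies beyond the first split but inside the other two.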

Related

How to vectorize Mersenne Twister loops over arrays

Currently I'm working with a custom implementation of the Mersenne Twister, and I'd like to improve my understanding of vector operations.
I have the following code:
#define N 624
#define M 397
for( k = N - 1; k; k-- )
{
    array[i] = (array[i] ^ ((array[i-1] ^ (array[i-1] >> 30)) * 1566083941UL)) - i;
    array[i] &= 0xffffffffUL;
    ++i;
    if ( i >= N )
    {
        array[0] = array[N-1];
        i = 1;
    }
}
Here I'm working with 32-bit integers only, so as I understand it, I could perform eight times as many operations at once using AVX2 instructions. How can I do that in practice?
I know how to deal with the addition of two vectors, but this case seems more complicated, and I don't know where to begin.
For a simple element-wise product I would go from the scalar loop to the vectorized form as follows, but I'd like to be sure how to do the same in my case.
for (i = 0; i < 1024; i++)
{
C[i] = A[i]*B[i];
}
for (i = 0; i < 1024; i+=4)
{
C[i:i+3] = A[i:i+3]*B[i:i+3];
}
Unfortunately there are no lessons about intrinsics at my university, but I'm curious about how to get an improvement.
I'm also wondering how to create the array using vectors. Maybe as a matrix? (Maybe with _mm256_setr_epi32?)
I hope to get some advice on this topic!
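One thing worth noting before reaching for intrinsics: the seeding loop reads array[i-1] right after writing it, so each iteration depends on the previous one, whereas the element-wise product has no such dependency. The following is a hedged, portable C sketch of the blocked form of that product only. The name mul_u32_blocks and the LANES constant are illustrative assumptions, and no AVX2 intrinsics are used, so it compiles anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8  /* eight 32-bit integers fit in one 256-bit register */

/* Element-wise product c[i] = a[i] * b[i], processed LANES values per step.
   Unsigned arithmetic wraps modulo 2^32, like a 32-bit low multiply. */
static void mul_u32_blocks(const uint32_t *a, const uint32_t *b,
                           uint32_t *c, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    size_t i = 0;
    /* Main loop: full blocks; no iteration depends on another. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t lane = 0; lane < LANES; ++lane) {
            c[i + lane] = a[i + lane] * b[i + lane];
        }
    }
    /* Tail loop: leftover elements when n is not a multiple of LANES. */
    for (; i < n; ++i) {
        c[i] = a[i] * b[i];
    }
}
```

In an AVX2 build, the inner lane loop is the part that a single 256-bit multiply (for example _mm256_mullo_epi32 from immintrin.h) would replace, with unaligned loads and stores around it. The tail loop stays scalar either way.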

How to accumulate and retain records in Buffer[] - in a MQL4 Custom Indicator?

I am making a custom indicator, that displays the change in the closing price of a certain currency.
for ( i = limit; i >= 0; i-- ) {
totaleur = 0;
for ( x = i; x < i + 1; x++ ) {
totaleur = ( ( iClose( "EURUSD", 0, x )
- iClose( "EURUSD", 0, x - 1 )
)
/ iClose( "EURUSD", 0, x - 1 )
);
}
ExtMapBuffer1[i] = totaleur;
return(0);
}
In this case the indicator displays only the change in price of each observation.
Any ideas on how to make it display the change in an observation plus that of all previous observations?
There are several important points to realise in order to achieve the goal:
1: do not prematurely escape in the first round via return(0)
Moving the return(0); statement outside the { ... } block of the forward-stepping ( i decreases ) for ( i = limit; ...; i-- ) { ... } loop lets the index i step forward, because a Custom Indicator is evaluated progressively and incrementally in time, piece by piece ( ref. MQL4 documentation on the Custom Indicator iCustom(...) calling interface parameters ).
2: decide whether the inner loop ( a sum of fractions ) was correctly coded
The proposed expression provides a sum of N per-bar relative differences, not a sum of N absolute differences divided by a net price change over N bars.
While this might work, the question is whether the intended model is right to sum relative differences ( percent changes over different, variable individual bases ), or whether the sum of absolute differences ought to be divided only once, at the very end of the loop, by one common base -- the net price difference between the first and the last point ( over the N-bar base ) -- which is a common quantitative-modelling practice when a noisy signal is subjected to some cheap smoothing technique.
3: correct problems in accessing TimeSeries vectors ( negative index )
Given that the outer for ( i = limit; i >= 0; i-- ) loop permits i to become zero,
and given that the inner for ( x = i; ... ) thus permits x == 0,
x - 1 < 0 becomes a problem,
where the instruction iClose( _Symbol, PERIOD_CURRENT, x - 1 ) requests a value that does not yet exist ( it has a negative index into the TimeSeries vector ).
for ( i = limit; // SET:_______________________ START at BAR[i == limit]
i >= 0; // PRE: PRE-CONDITION i >= 0
i-- // UPD: POST-UPDATE i-- STEP FORWARD IN TIME
) { // ___________________________________________________________
totaleur = 0; // ZEROISED
for ( x = i; // SET:_________________ START at BAR[x = (i)]
x < i + 1; // PRE: PRE-CONDITION x < (i)+1
x++ // UPD: POST-UPDATE x++ +1 STEP ( ONCE )
) { // _____________________________________________________
totaleur = ( ( iClose( "EURUSD", 0, x )
- iClose( "EURUSD", 0, x - 1 )
)
/ iClose( "EURUSD", 0, x - 1 )
);
} // LOOP KEPT STORING ANY INTERIM VALUE FOR EACH x INTO THE SAME <var>
ExtMapBuffer1[i] = totaleur;
return(0); //___________________________________DO NOT PREMATURELY RET/EXIT RIGHT FROM THE 1st LOOP
}
As you might have already noticed, the code permits just one loop in the inner for(){...}
If you need a sum of N previous observations - you need something like this:
for(i=limit; i>=0; i--) {
   double totaleur = 0;
   for(x=i; x<i+N; x++) {
      // x+1 is the previous (older) bar; x-1 would be a negative index when x == 0
      totaleur += ((iClose("EURUSD", 0, x) - iClose("EURUSD", 0, x+1)) / iClose("EURUSD", 0, x+1));
   }
   ExtMapBuffer1[i]=totaleur;
}
When return(0); is inside the loop, the indicator stops there and does not run the loop body for the next index, so be careful with it.
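The accumulation itself can be checked outside MetaTrader. Below is a minimal C sketch under stated assumptions: closes[] is a plain array ordered like an MQL4 time series (index 0 is the newest bar, so bar k + 1 is the one just before bar k), and the name sum_relative_changes is hypothetical. It sums per-bar relative changes and stops before it would read past the oldest bar.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper. closes[] is ordered like an MQL4 time series:
   index 0 is the newest bar and index k + 1 is the bar just before bar k. */
static double sum_relative_changes(const double *closes, size_t count,
                                   size_t start, size_t n)
{
    assert(closes != NULL);

    double total = 0.0;
    /* Stop one short of the end: bar k needs bar k + 1 as its base. */
    for (size_t k = start; k < start + n && k + 1 < count; ++k) {
        double prev = closes[k + 1];
        if (prev != 0.0) {  /* skip a missing (zero) base price */
            total += (closes[k] - prev) / prev;
        }
    }
    return total;
}
```

For closes of 110, 100 and 80 (newest first), summing two bars from index 0 gives 0.10 + 0.25, and asking for more bars than exist simply stops at the oldest usable one.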

DX11 HLSL Secondary Texture Coordinates Lost

Been banging my head against the wall with this for a while. Despite the fact that I THINK I have a proper vertex format defined with D3D11_INPUT_ELEMENT_DESC, no matter what I do, I can't seem to read my TEXCOORD1 values from this shader. To test this shader, I put random values into my second set of UV coordinates just to see if they were reaching the shader, but to my dismay, I haven't been able to find these random values anywhere. I have also watched the data go into the mapped memory directly, and I am pretty sure the random values were there when they were mapped.
Here is the Shader code:
sampler ImageSampler: register(s0);
Texture2D <float4> ImageTexture: register(t0);
Texture2D <float4> ReflectionTexture: register(t1);
//Texture2D <float4> ReflectionMap: register(t0);
struct PS_IN
{
float4 InPos: SV_POSITION;
float2 InTex: TEXCOORD;
float2 InRef: TEXCOORD1;
float4 InCol: COLOR0;
};
float4 main(PS_IN input): SV_TARGET
{
float4 res;
float4 mul;
float2 tcRef;
float4 res1 = ImageTexture.Sample(ImageSampler, input.InTex) * input.InCol;
float4 res2 = ReflectionTexture.Sample(ImageSampler, input.InRef+input.InTex);
mul.r = 0.5;
mul.g = 0.5;
mul.b = 0.5;
mul.a = 0.5;
res = res1 + res2;
res = res * mul;
res.a = res1.a;
res.r = input.InRef.x;//<-----should be filled with random stuff... not working
res.b = input.InRef.y;//<-----should be filled with random stuff... not working
return res;
}
Here is my D3D11_INPUT_ELEMENT_DESC array (sorry, it is in Pascal, but I like Pascal):
const
CanvasVertexLayout: array[0..3] of D3D11_INPUT_ELEMENT_DESC =
((SemanticName: 'POSITION';
SemanticIndex: 0;
Format: DXGI_FORMAT_R32G32_FLOAT;
InputSlot: 0;
AlignedByteOffset: 0;
InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA;
InstanceDataStepRate: 0),
(SemanticName: 'TEXCOORD';
SemanticIndex: 0;
Format: DXGI_FORMAT_R32G32_FLOAT;
InputSlot: 0;
AlignedByteOffset: 8;
InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA;
InstanceDataStepRate: 0),
(SemanticName: 'TEXCOORD';
SemanticIndex: 1;
Format: DXGI_FORMAT_R32G32_FLOAT;
InputSlot: 0;
AlignedByteOffset: 16;
InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA;
InstanceDataStepRate: 0),
(SemanticName: 'COLOR';
SemanticIndex: 0;
Format: DXGI_FORMAT_R8G8B8A8_UNORM;
InputSlot: 0;
AlignedByteOffset: 24;
InputSlotClass: D3D11_INPUT_PER_VERTEX_DATA;
InstanceDataStepRate: 0)
);
And here's the vertex struct:
TVertexEntry = packed record
X, Y: Single;
U, V: Single;
u2,v2:single;
Color: LongWord;
end;
Since the COLOR semantic follows the TEXCOORD semantics, my best guess is that the problem is with the shader and not the Pascal code, but since I'm new to this kind of stuff, I'm obviously lost.
Any insight is appreciated.
Answering my own question. Since I'm new to Shaders in general, maybe this will help some other newbs.
I was assuming that all I needed to do was add a second set of UV coordinates to the vertex format and add a D3D11_INPUT_ELEMENT_DESC for it. However, there is also a vertex shader involved, more or less a passthrough, and that vertex shader needs to be aware of the new UV coordinates and let them pass through. I was just making a 2D engine, so I didn't think that I'd even have to mess with vertex shaders... go figure. So I modified the vertex shader, and this was the result:
void main(
float2 InPos: POSITION0,
float2 InTex: TEXCOORD0,
float2 InTex2: TEXCOORD1,//<--added
float4 InCol: COLOR0,
out float4 OutPos: SV_POSITION,
out float2 OutTex: TEXCOORD2,
out float2 OutTex2: TEXCOORD3,//<--added
out float4 OutCol: COLOR0)
{
OutPos = float4(InPos, 0.0, 1.0);
OutTex = InTex;
OutCol = InCol;
OutTex2 = InTex2;//<--added
}

How to use FFTW on deviceMotion.userAcceleration.x

I am working on an app that measures the motion of a device (in the x, y, and z directions),
and now I must use FFTW to filter the data.
I don't know how to pass the data to FFTW. Below is the part of my code that tries to process the X data (I am handling each direction separately: X, then Y, then Z).
// FFTW for X-data
int SIZE = 97;
fftw_complex *dataX, *fft_resultX;
fftw_plan plan_X;
int i ;
dataX = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * SIZE);
fft_resultX = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * SIZE);
plan_X = fftw_plan_dft_1d(SIZE, dataX, fft_resultX,
FFTW_FORWARD, FFTW_ESTIMATE); // FFTW_MEASURE
for( i = 0 ; i < SIZE ; i++ ) {
dataX[i][0] = 1.0; // real part
dataX[i][1] = 0.0; // imaginary part
}
for( i = 0 ; i < SIZE ; i++ ) {
fprintf( stdout, "dataX[%d] = { %2.2f, %2.2f }\n",
i, dataX[i][0], dataX[i][1] );
}
fftw_execute( plan_X);
for( i = 0 ; i < SIZE ; i++ ) {
fprintf( stdout, "fft_resultX[%d] = { %2.2f, %2.2f }\n",
i, fft_resultX[i][0], fft_resultX[i][1] );
}
and here is the userAcceleration:
[[weakSelf.graphViews objectAtIndex:kDeviceMotionGraphTypeUserAcceleration] addX:deviceMotion.userAcceleration.x y:deviceMotion.userAcceleration.y z:deviceMotion.userAcceleration.z];
For example, when I write:
dataX= deviceMotion.userAcceleration.x;
I get this error:
Assigning to 'fftw_complex *' (aka '_Complex double *') from incompatible type 'double'
Any idea how to make FFTW work on it?
Thanks for any help.
You can't assign a single real value to an array of complex numbers. Each complex number is made up of two doubles: a real part and an imaginary part.
You need to store all your acceleration data in a larger array (256 entries, for example), where each x value is assigned to the real part of a complex number and 0 is assigned to its imaginary part.
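The packing step described above can be sketched without linking FFTW. This is an illustrative C sketch, not FFTW code: complex_pair is a stand-in with the same double[2] layout that fftw_complex has by default (when complex.h is not included before fftw3.h), and pack_real_samples is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the same layout as FFTW's default fftw_complex:
   [0] is the real part, [1] is the imaginary part. */
typedef double complex_pair[2];

/* Hypothetical helper. Copies up to `capacity` real samples into the complex
   buffer, sets every imaginary part to 0, and zero-pads any unused entries.
   Returns the number of samples actually copied. */
static size_t pack_real_samples(const double *samples, size_t sample_count,
                                complex_pair *out, size_t capacity)
{
    assert(samples != NULL && out != NULL);

    size_t copied = (sample_count < capacity) ? sample_count : capacity;
    for (size_t i = 0; i < capacity; ++i) {
        out[i][0] = (i < copied) ? samples[i] : 0.0;
        out[i][1] = 0.0;
    }
    return copied;
}
```

In the app, each userAcceleration.x reading would first be appended to a plain array of doubles. Once enough readings have accumulated, they can be packed like this into the buffer the plan was created with, and only then is the plan executed.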

using HLSL to invisibly stress a graphics card - How to stress the memory?

For a while I've been developing an invisible (read: doesn't produce any visual output) stressor to test the capabilities of my graphics card (and as an exploration of DirectCompute in general, which I'm pretty new to). I've got the following code right now that I'm pretty proud of:
RWStructuredBuffer<uint> BufferOut : register(u0);
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
uint total = 0;
float p = 0;
while(p++ < 40.0){
float s= 4.0;
float M= pow(2.0,p) - 1.0;
for(uint i=0; i <= p - 2; i++)
{
s=((s*s) - 2) % M;
}
if(s < 1.0) total++;
}
BufferOut[DTid.x] = total;
}
This runs the Lucas-Lehmer test on 2^p - 1 for the exponents p = 1 through 40. When I dispatch this code in a timed loop and look at my graphics card's stats using GPU-Z, my GPU load shoots to 99% for the duration. I'm pretty happy with this, but I also notice that the heat generated by a fully loaded GPU is actually pretty minimal (I'm getting about a 5 to 10 degree Celsius jump, nowhere near the jump I get when running, say, Borderlands 2). My thought is that most of the heat comes from memory accesses, so I would need to include consistent memory accesses across the run. My initial code looked like this:
RWStructuredBuffer<uint> BufferOut : register(u0);
groupshared float4 memory_buffer[1024];
[numthreads(1, 1, 1)]
void CSMain( uint3 DTid : SV_DispatchThreadID )
{
uint total = 0;
float p = 0;
while(p++ < 40.0){
[fastopt] // to lower compile times - code efficiency is strangely not what I'm looking for right now.
for(uint i = 0; i < 1024; ++i)
float s= 4.0;
float M= pow(2.0,p) - 1.0;
for(uint i=0; i <= p - 2; i++)
{
s=((s*s) - 2) % M;
}
if(s < 1.0) total++;
}
BufferOut[DTid.x] = total;
}
Read a lot of non-coherent samples from large textures. Try both DXT1-compressed and uncompressed formats. Also use render-to-texture and multiple render targets (MRT). All of these will hammer the GPU memory system.
