DirectX compute shader: how to write a function with variable array size argument?

I'm trying to write a function within a compute shader (HLSL) that accepts an array argument whose size can vary. The compiler always rejects it.
Example (not working!):
void TestFunc(in uint SA[])
{
int K;
for (K = 0; SA[K] != 0; K++) {
// Some code using SA array
}
}
[numthreads(1, 1, 1)]
void CSMain()
{
uint S1[] = {1, 2, 3, 4 }; // Compiler happy and discover the array size
uint S2[] = {10, 20}; // Compiler happy and discover the array size
TestFunc(S1);
TestFunc(S2);
}
If I give an array size in TestFunc(), the compiler accepts a call that passes an array of exactly that size but rejects a call with any other size.

You cannot have function parameters of indeterminate size.
You need to declare the array parameter with a known length and pass a separate uint that holds the number of elements actually in use.
void TestFunc(in uint SA[4], in uint saCount)
{
uint K;
// Loop over the valid entries only; saCount is the number of elements in use
for (K = 0; K < saCount; K++)
{
// Some code using SA array
}
}
[numthreads(1, 1, 1)]
void CSMain()
{
uint S1count = 4;
uint S1[] = {1, 2, 3, 4 };
uint S2count = 2;
uint S2[] = {10, 20,0,0};
TestFunc(S1, S1count);
TestFunc(S2, S2count);
}
In my example I have set the maximum array size to 4, but you can make it bigger if needed. You can also write multiple functions for different array lengths, or set up multiple passes if your data overflows the maximum array size.
Edit to answer comment
The issue is that array dimensions of function parameters must be explicit, as the compiler error states. This cannot be avoided. What you can do, however, is avoid passing the array at all. If you inline your TestFunc in your CSMain, you avoid passing the array and your routine compiles and runs. I know it can make your code longer and harder to maintain, but it's the only way to do what you want with an array of unspecified length. The advantage is that this way you have access to array.Length, which might make your code simpler.

Related

How do I allocate an array at runtime in Rust?

Once I have allocated the array, how do I manually free it? Is pointer arithmetic possible in unsafe mode?
Like in C++:
double *A=new double[1000];
double *p=A;
int i;
for(i=0; i<1000; i++)
{
*p=(double)i;
p++;
}
delete[] A;
Is there any equivalent code in Rust?
Based on your question, I'd recommend reading the Rust Book if you haven't done so already. Idiomatic Rust will almost never involve manually freeing memory.
As for the equivalent to a dynamic array, you want a vector. Unless you're doing something unusual, you should avoid pointer arithmetic in Rust. You can write the above code variously as:
// Pre-allocate space, then fill it.
let mut a = Vec::with_capacity(1000);
for i in 0..1000 {
a.push(i as f64);
}
// Allocate and initialise, then overwrite
let mut a = vec![0.0f64; 1000];
for i in 0..1000 {
a[i] = i as f64;
}
// Construct directly from iterator.
let a: Vec<f64> = (0..1000).map(|n| n as f64).collect();
It is completely possible to allocate a fixed-sized array on the heap:
let mut a = Box::new([0.0f64; 1000]); // must be mut to assign elements below
Because of deref coercion, you can still use this as an array:
for i in 0..1000 {
a[i] = i as f64;
}
You can manually free it by doing:
std::mem::drop(a);
drop takes ownership of the array, so this is completely safe. As mentioned in the other answer, it is almost never necessary to do this: the box will be freed automatically when it goes out of scope.

Accelerate framework "sign" function

I'm trying to find a very fast way of getting the sign of each value in a vector. I was hoping to find a function in the Accelerate framework to do this, but couldn't find one. Here's what it would do:
float *inputVector = .... // some audio vector
int length = ...// length of input vector.
float *outputVector = ....// result
for( int i = 0; i<length; i++ )
{
if( inputVector[i] >= 0 ) outputVector[i] = 1;
else outputVector[i] = -1;
}
Ok, I think I've found a way...
vvcopysignf() "Copies an array, setting the sign of each value based on a second array."
So, one method would be to make an array of 1s, then use this function to change the sign of the 1s based on an input array.
float *ones = ... // a vector filled with 1's
float *input = .... // an input vector
float *output = ... // an output vector
int bufferSize = ... // size of the vectors;
vvcopysignf(output, ones, input, &bufferSize);
// output is now an array of -1s and 1s based on the sign of the input.
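For comparison, here is a minimal portable sketch of what the copy-sign approach computes per element. It is plain C using the standard `signbit` macro, not Accelerate, and the function name `sign_vector` is made up for illustration. One difference from the original loop is worth noting: a copy-sign approach follows the sign bit, so -0.0 maps to -1, whereas the `>= 0` test maps it to 1.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: writes +1.0f or -1.0f into out[i] according to the
   sign bit of in[i]. Because it tests the sign bit, -0.0f yields -1.0f. */
void sign_vector(const float *in, float *out, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        out[i] = signbit(in[i]) ? -1.0f : 1.0f;
    }
}
```

If negative zero matters for your audio data, check which behaviour you want before replacing the comparison loop.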

Any good idea for OpenCL atom_inc separation?

I want to count the total number of non-zero points in an image using OpenCL.
Since this is an accumulation, I used atom_inc.
The kernel code is shown here.
__kernel void points_count(__global unsigned char* image_data, __global int* total_number, int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
atom_inc(total_number);
}
}
My question is: won't using atom_inc this way waste a lot of time?
Whenever a work-item meets a non-zero point, it has to wait for the atom_inc.
My idea is to split the whole image into hundreds of groups, count the points in each group separately, and add the counts together at the end.
Suppose we do something like this:
__kernel void points_count(__global unsigned char* image_data, __global int* total_number_array, int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
int stepy=gidy%10;
atom_inc(total_number_array+stepy);
}
}
This separates the whole problem into more groups.
Afterwards, we can add up the numbers in total_number_array one by one.
Theoretically speaking, this should give a large performance improvement, right?
So, does anyone have advice about the summing issue here?
Thanks!
As mentioned in the comments, this is a reduction problem.
The idea is to keep separate counts and then put them back together at the end.
Consider using local memory to store the values.
Declare a local buffer to be used by each work group.
Keep track of the number of occurrences in this buffer by using the local_id as the index.
Sum these values at the end of execution.
A very good introduction to the reduction problem using OpenCL is shown here:
http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
The reduction kernel could look like this (taken from the link above; it computes a minimum, so for counting you would replace the min with a sum and use 0 as the identity element):
__kernel
void reduce(
__global float* buffer,
__local float* scratch,
__const int length,
__global float* result) {
int global_index = get_global_id(0);
int local_index = get_local_id(0);
// Load data into local memory
if (global_index < length) {
scratch[local_index] = buffer[global_index];
} else {
// Infinity is the identity element for the min operation
scratch[local_index] = INFINITY;
}
barrier(CLK_LOCAL_MEM_FENCE);
for(int offset = get_local_size(0) / 2;
offset > 0;
offset >>= 1) {
if (local_index < offset) {
float other = scratch[local_index + offset];
float mine = scratch[local_index];
scratch[local_index] = (mine < other) ? mine : other;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
if (local_index == 0) {
result[get_group_id(0)] = scratch[0];
}
}
For further explanation see the proposed link.
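To make the two-stage idea concrete for the counting case, here is a minimal sketch in plain C that simulates it on the host. It is not OpenCL code, and `GROUP_SIZE`, `count_partials` and `sum_partials` are hypothetical names chosen for illustration. Each group produces its own partial count without touching a shared counter, and the partial counts are summed afterwards.

```c
#include <stddef.h>

#define GROUP_SIZE 64  /* hypothetical work-group size */

/* Stage 1: each "work group" counts the non-zero pixels in its own slice
   and writes a single partial result, so no counter is shared. */
void count_partials(const unsigned char *image, size_t n_pixels,
                    int *partials, size_t n_groups)
{
    for (size_t g = 0; g < n_groups; g++) {
        size_t begin = g * GROUP_SIZE;
        size_t end = begin + GROUP_SIZE;
        if (end > n_pixels) end = n_pixels;   /* last group may be short */
        int count = 0;
        for (size_t i = begin; i < end; i++) {
            if (image[i] != 0) count++;
        }
        partials[g] = count;
    }
}

/* Stage 2: add the per-group partial counts into the final total. */
int sum_partials(const int *partials, size_t n_groups)
{
    int total = 0;
    for (size_t g = 0; g < n_groups; g++) total += partials[g];
    return total;
}
```

In a real kernel, stage 1 would run inside each work group using local memory, and stage 2 would be either a second small kernel or a loop on the host.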

Assign value is garbage or undefined

I have posted a screenshot of my error.
[screenshot: heights output]
Can anyone please help me?
I think the static analyzer is not seeing how _numberOfColumns can become non-zero, and hence its insistence that garbage is being assigned. You need to check that you are actually providing some means for _numberOfColumns to become non-zero.
Generally when I am writing loops that want to find the largest or the smallest value, I initialize the size variable to the largest (if I want the smallest) or smallest (if I want the largest) amount, and I think this will solve most of your issues:
float shortestHeight = FLT_MAX;
for (unsigned i = 0; i < _numberOfColumns; i++)
{
// etc.
}
The analyzer is correct. Your code will access garbage memory if _numberOfColumns is 0, thus allocating 0 bytes for heights, making heights[0] garbage. The analyzer doesn't know what values _numberOfColumns can have, but you can tell it by using assert(_numberOfColumns>0).
Take this C program for example:
#include <stdlib.h> // malloc, free
int main(int argc, const char * argv[])
{
int n = argc-1;
int *a = malloc(n*sizeof(int));
for (int i=0; i<n; i++) {
a[i] = i;
}
int foo = a[0];
free(a);
return foo;
}
The size of a is determined by the number of arguments. If you have no arguments, n == 0. If you are sure that your program (or just that part of your program) will always give n a value greater than 0, you can use an assertion. Adding assert(n>0) will tell the analyzer exactly that.
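Both suggestions can be combined in one small sketch. The function below is hypothetical (the original code is only available as a screenshot), so the names `shortest_height`, `heights` and `n_columns` are assumptions. It states the precondition with assert and starts the search from FLT_MAX.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: returns the smallest entry of heights[0..n_columns-1].
   The assert states the precondition that at least one column exists,
   which is the fact the analyzer cannot infer on its own. */
float shortest_height(const float *heights, unsigned n_columns)
{
    assert(n_columns > 0);
    float shortest = FLT_MAX;   /* start from the largest possible value */
    for (unsigned i = 0; i < n_columns; i++) {
        if (heights[i] < shortest) shortest = heights[i];
    }
    return shortest;
}
```

Note that assert is compiled out when NDEBUG is defined, so it documents the precondition rather than enforcing it in release builds.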

converting byte[] into long in blackberry

I am getting the following from a C# .NET web service:
byte[] data = new byte[] {-33, -96,0, 0, 0,0,0,0};
I want to convert this into a long value.
I tried this:
long result = (long)ByteBuffer.wrap(data).getInt();
I am getting the result -543162368, whereas the actual value is 41183.
First off you want to call getLong() instead of getInt() on the buffer.
However, the data you're receiving is little-endian, which means that it starts with the low order byte first. ByteBuffers are constructed as default with big endian order. You need to set the order to LITTLE_ENDIAN to get the correct value out.
ByteBuffer buffer = ByteBuffer.wrap(data);
buffer.order(ByteOrder.LITTLE_ENDIAN);
long result = buffer.getLong();
Since you apparently can't set the byte order or use getLong, you will need to do it like this:
// Reverse the 8-byte array in place (swap index i with index 7-i)
for (int i = 0; i < 4; ++i)
{
byte temp = data[i];
data[i] = data[7-i];
data[7-i] = temp;
}
// Get two ints and shift the first int into the high order bytes
// of the result.
ByteBuffer buffer = ByteBuffer.wrap(data);
long result = ((long)buffer.getInt()) << 32;
// Mask the low int so a negative value is not sign-extended into the high bytes
result |= ((long)buffer.getInt()) & 0xFFFFFFFFL;
result should now contain the value.
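The underlying byte arithmetic is language-neutral, so here is a minimal sketch of it in C rather than Java (the function name `le_bytes_to_int64` is made up for illustration). It assembles the value directly from the low-order-first bytes with shifts, without reversing the array. In C terms the bytes -33 and -96 are 0xDF and 0xA0.

```c
#include <stdint.h>

/* Hypothetical helper: assembles a 64-bit value from 8 bytes stored
   low-order byte first. Each byte is widened to uint64_t before it is
   combined, so no sign extension can leak into the higher bytes. */
int64_t le_bytes_to_int64(const uint8_t bytes[8])
{
    uint64_t value = 0;
    for (int i = 7; i >= 0; i--) {
        value = (value << 8) | (uint64_t)bytes[i];
    }
    return (int64_t)value;
}
```

The same loop works in Java if each byte is masked with `& 0xFF` before it is combined, because Java bytes are signed.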
