I am trying to perform 2D convolutions with OpenCV using the HAL functions.
I understand that I can perform this by instantiating a Filter2D object by means of the function
cv::hal::Filter2D::create(uchar *kernel_data, size_t kernel_step, int kernel_type, int kernel_width, int kernel_height, int max_width, int max_height, int stype, int dtype, int borderType, double delta, int anchor_x, int anchor_y, bool isSubmatrix, bool isInplace);
then use the function
cv::hal::Filter2D::apply(...);
The create function takes 15 arguments. So far, I haven't found any documentation about them, other than the argument names and types. This is far from being sufficient.
Where can I get better information?
The only documentation about the HAL filter2D that I was able to find is the following. It is not for Filter2D exactly, but I think the brief parameter explanations might help you a bit.
/**
@brief hal_filterInit
@param context double pointer to user-defined context
@param kernel_data pointer to kernel data
@param kernel_step kernel step
@param kernel_type kernel type (CV_8U, ...)
@param kernel_width kernel width
@param kernel_height kernel height
@param max_width max possible image width, can be used to allocate working buffers
@param max_height max possible image height
@param src_type source image type
@param dst_type destination image type
@param borderType border processing mode (CV_HAL_BORDER_REFLECT, ...)
@param delta added to pixel values
@param anchor_x relative X position of center point within the kernel
@param anchor_y relative Y position of center point within the kernel
@param allowSubmatrix indicates whether the submatrices will be allowed as source image
@param allowInplace indicates whether the inplace operation will be possible
@sa cv::filter2D, cv::hal::Filter2D
*/
inline int hal_ni_filterInit(cvhalFilter2D **context, uchar *kernel_data, size_t kernel_step, int kernel_type, int kernel_width, int kernel_height, int max_width, int max_height, int src_type, int dst_type, int borderType, double delta, int anchor_x, int anchor_y, bool allowSubmatrix, bool allowInplace) { return CV_HAL_ERROR_NOT_IMPLEMENTED; }
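To make the roles of kernel_step, anchor_x, anchor_y and delta more concrete, here is a minimal plain C++ sketch of a 2D filter that follows the formula documented for cv::filter2D. It is not OpenCV code: naiveFilter2D is a hypothetical name, and it assumes a single-channel float image, a float kernel and a replicate border, none of which the HAL requires.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical reference routine (not part of OpenCV). Assumptions: one float
// channel, float kernel, replicate border. Parameter names mirror hal_filterInit.
std::vector<float> naiveFilter2D(const std::vector<float>& src, int width, int height,
                                 const unsigned char* kernel_data, std::size_t kernel_step,
                                 int kernel_width, int kernel_height,
                                 int anchor_x, int anchor_y, double delta)
{
    std::vector<float> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            double acc = delta;  // delta is added once to every output pixel
            for (int ky = 0; ky < kernel_height; ++ky) {
                // kernel_step is the distance in bytes between two kernel rows
                const unsigned char* row = kernel_data + static_cast<std::size_t>(ky) * kernel_step;
                for (int kx = 0; kx < kernel_width; ++kx) {
                    float k;
                    std::memcpy(&k, row + kx * sizeof(float), sizeof(float));
                    // The anchor is the kernel cell that sits on top of pixel (x, y);
                    // coordinates outside the image are clamped (replicate border).
                    int sx = std::min(std::max(x + kx - anchor_x, 0), width - 1);
                    int sy = std::min(std::max(y + ky - anchor_y, 0), height - 1);
                    acc += k * src[static_cast<std::size_t>(sy) * width + sx];
                }
            }
            dst[static_cast<std::size_t>(y) * width + x] = static_cast<float>(acc);
        }
    }
    return dst;
}
```

For a tightly packed 3x3 float kernel, kernel_step would be 3 * sizeof(float) and the usual centered anchor is (1, 1).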
I am new to Metal. I want to use Metal compute to do some math, so I created a kernel function (shader?), let's say
kernel void foo(device float *data1,
device float *data2,
device float *result,
int flag,
uint index [[thread_position_in_grid]])
{
if(flag==SOMETHING)
{
}...
}
Any idea how to pass a scalar value to the flag parameter using MTLComputeCommandEncoder?
You are already doing it. There isn't much difference between a void* buffer with "arbitrary" data and an int.
Just make the binding a device or constant address space reference (since it's a flag, I would assume constant is more suitable) and decorate it with the [[buffer(n)]] attribute for better readability (do the same for the other buffer bindings), so your new function signature is going to look like
kernel void foo(device float *data1 [[buffer(0)]],
                device float *data2 [[buffer(1)]],
                device float *result [[buffer(2)]],
                constant int& flag [[buffer(3)]],
                uint index [[thread_position_in_grid]])
As for the encoder, you can use setBuffer or setBytes on your MTLComputeCommandEncoder but basically, the easiest way to do this would be
id<MTLComputeCommandEncoder> encoder = ...
// ...
int flag = SomeFlag | SomeOtherFlag;
[encoder setBytes:&flag length:sizeof(flag) atIndex:3];
I'm trying to write a function within a compute shader (HLSL) that accepts an array argument whose size can vary. The compiler always rejects it.
Example (not working!):
void TestFunc(in uint SA[])
{
    int K;
    for (K = 0; SA[K] != 0; K++) {
        // Some code using SA array
    }
}
[numthreads(1, 1, 1)]
void CSMain()
{
    uint S1[] = {1, 2, 3, 4}; // Compiler is happy and deduces the array size
    uint S2[] = {10, 20};     // Compiler is happy and deduces the array size
    TestFunc(S1);
    TestFunc(S2);
}
If I give an array size in TestFunc(), then the compiler is happy when I call TestFunc() with an array of that specific size, but it refuses the call for any other size.
You cannot have function parameters of indeterminate size.
You need to declare an array of known length, and pass an int variable that holds the number of elements actually in use.
void TestFunc(in uint SA[4], in uint saCount)
{
    // saCount is the number of valid elements; SA always has room for 4.
    for (uint K = 0; K < saCount; K++)
    {
        // Some code using SA[K]
    }
}
[numthreads(1, 1, 1)]
void CSMain()
{
    uint S1count = 4;
    uint S1[4] = {1, 2, 3, 4};
    uint S2count = 2;
    uint S2[4] = {10, 20, 0, 0}; // padded with zeros up to the fixed size
    TestFunc(S1, S1count);
    TestFunc(S2, S2count);
}
In my example I have set your array max size to 4, but you can make it bigger if needed. You can also write multiple functions for different array lengths, or set up multiple passes if your data overflows your array max size.
Edit to answer comment
The issue is that array dimensions of function parameters must be explicit as the compiler error states. This cannot be avoided. What you can do however, is avoid passing the array at all. If you in-line your TestFunc in your CSMain, you avoid passing the array and your routine compiles and runs. I know it can make your code longer and harder to maintain, but it's the only way to do what you want with an array of unspecified length. The advantage is that this way you have access to array.Length that might make your code simpler.
I need to get the memory offset of a struct member. The file is: https://github.com/BlastHackNet/mod_s0beit_sa/blob/master/src/samp.h The struct is:
struct stObject : public stSAMPEntity < object_info >
{
uint8_t byteUnk0[2];
uint32_t ulUnk1;
int iModel;
uint8_t byteUnk2;
float fDrawDistance;
float fUnk;
float fPos[3];
// ...
};
I need the memory offset of fPos (as a hex value such as 0x1111). I don't know how to do it. Please help me.
Take a look at the offsetof macro (declared in <cstddef>): http://www.cplusplus.com/reference/cstddef/offsetof/
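A minimal sketch of how offsetof could be used here. DemoObject is a hypothetical stand-in, not the real stObject: the real struct derives from stSAMPEntity<object_info>, whose members come first and shift every offset, and any packing pragma in the header changes the result too. For real values, include samp.h and apply offsetof to stObject itself (most compilers accept that, though the standard only guarantees it for standard-layout types).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in with the same listed members but NO base class,
// so the number it yields is not the real in-game offset.
struct DemoObject {
    uint8_t  byteUnk0[2];
    uint32_t ulUnk1;
    int      iModel;
    uint8_t  byteUnk2;
    float    fDrawDistance;
    float    fUnk;
    float    fPos[3];
};

// Byte offset of fPos from the start of the struct, computed at compile time.
constexpr std::size_t fPosOffset() { return offsetof(DemoObject, fPos); }

// Prints the offset in the hex form asked for in the question.
void printFPosOffset() { std::printf("fPos offset: 0x%zX\n", fPosOffset()); }
```

The address of the member in a live object is then the object's base address plus this offset.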
I want to count the total number of non-zero points in an image using OpenCL.
Since this is an accumulation, I used atom_inc.
The kernel code is shown here.
__kernel void points_count(__global unsigned char* image_data, __global int* total_number, int image_width)
{
    size_t gidx = get_global_id(0);
    size_t gidy = get_global_id(1);
    if(0!=*(image_data+gidy*image_width+gidx))
    {
        atom_inc(total_number);
    }
}
My question is: using atom_inc like this is very wasteful, right?
Every time we meet a non-zero point, we have to wait for the atom_inc.
I have an idea like this: we can split each row into hundreds of groups, count the points in each group, and add the counts at the end.
Suppose we do something like this:
__kernel void points_count(__global unsigned char* image_data, __global int* total_number_array, int image_width)
{
    size_t gidx = get_global_id(0);
    size_t gidy = get_global_id(1);
    if(0!=*(image_data+gidy*image_width+gidx))
    {
        int stepy=gidy%10;
        atom_inc(total_number_array+stepy);
    }
}
This spreads the increments over more counters.
We can then add up the numbers in total_number_array one by one.
Theoretically, this should give a large performance improvement, right?
So, does anyone have advice about the summing issue here?
Thanks!
As mentioned in the comments, this is a reduction problem.
The idea is to keep separate counts and then put them back together at the end.
Consider using local memory to store the values.
Declare a local buffer to be used by each work group.
Keep track of the number of occurrences in this buffer by using the local_id as the index.
Sum these values at the end of execution.
A very good introduction to the reduction problem using OpenCL is shown here:
http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
The reduction kernel could look like this (taken from the link above):
__kernel
void reduce(
        __global float* buffer,
        __local float* scratch,
        __const int length,
        __global float* result) {
    int global_index = get_global_id(0);
    int local_index = get_local_id(0);
    // Load data into local memory
    if (global_index < length) {
        scratch[local_index] = buffer[global_index];
    } else {
        // Infinity is the identity element for the min operation
        scratch[local_index] = INFINITY;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    for(int offset = get_local_size(0) / 2;
            offset > 0;
            offset >>= 1) {
        if (local_index < offset) {
            float other = scratch[local_index + offset];
            float mine = scratch[local_index];
            scratch[local_index] = (mine < other) ? mine : other;
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (local_index == 0) {
        result[get_group_id(0)] = scratch[0];
    }
}
For further explanation see the proposed link.
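The kernel above computes a minimum, while the question asks for a count. As a rough illustration of how the same tree-shaped loop could be adapted, here is a CPU-side emulation in plain C++ (not OpenCL). countNonZeroPerGroup and countNonZero are hypothetical names, groupSize plays the role of the work-group size and is assumed to be a power of two, and the loops over l stand in for work-items that would run in parallel on a device.

```cpp
#include <cstddef>
#include <vector>

// Emulates one partial count per "work-group". Assumption: groupSize is a
// power of two, as the halving loop requires.
std::vector<int> countNonZeroPerGroup(const std::vector<unsigned char>& image,
                                      std::size_t groupSize)
{
    std::size_t groups = (image.size() + groupSize - 1) / groupSize;
    std::vector<int> partial(groups);
    std::vector<int> scratch(groupSize);  // stands in for __local memory
    for (std::size_t g = 0; g < groups; ++g) {
        // Load step: 1 for a non-zero pixel, 0 otherwise.
        // 0 is the identity element for addition, so padding uses 0.
        for (std::size_t l = 0; l < groupSize; ++l) {
            std::size_t i = g * groupSize + l;
            scratch[l] = (i < image.size() && image[i] != 0) ? 1 : 0;
        }
        // Tree reduction: same loop shape as the kernel, with + instead of min.
        for (std::size_t offset = groupSize / 2; offset > 0; offset >>= 1) {
            for (std::size_t l = 0; l < offset; ++l) {
                scratch[l] += scratch[l + offset];
            }
        }
        partial[g] = scratch[0];
    }
    return partial;
}

// Final step: add the per-group results, as the host (or a second pass) would.
int countNonZero(const std::vector<unsigned char>& image, std::size_t groupSize)
{
    int total = 0;
    for (int c : countNonZeroPerGroup(image, groupSize)) total += c;
    return total;
}
```

With this layout, no two work-items ever update the same counter, so no atomics are needed inside a group; only the short list of per-group results is summed at the end.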
How do I convert an array<System::Byte>^ to a Mat in OpenCV? I am being passed an array<System::Byte>^ in C++/CLI, but I need to convert it to a Mat to be able to read it and display it.
You can use the constructor Mat::Mat(int rows, int cols, int type, void* data, size_t step=AUTO_STEP). The conversion may look like this.
void byteArray2Mat(array<System::Byte, 2>^ byteArray, cv::Mat &output)
{
    // Pin the managed array so the garbage collector cannot move it while we read it.
    pin_ptr<System::Byte> p = &byteArray[0, 0];
    unsigned char* pby = p;
    // assuming your input array has 2 dimensions.
    int rows = byteArray->GetLength(0);
    int cols = byteArray->GetLength(1);
    // clone() copies the pixels, so the Mat stays valid after the pin is released.
    output = cv::Mat(rows, cols, CV_8UC1, (void*)pby).clone();
}
I don't have C++/CLI available to test this, and it may not be the most efficient method, but it should at least give you an idea of how to get started.