Any good idea for OpenCL atom_inc separation? - opencv

I want to count the total non-zero points number in an image using OpenCL.
Since it is an adding work, I used the atom_inc.
And the kernel code is shown here.
__kernel void points_count(__global unsigned char* image_data, __global int* total_number, __global int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
atom_inc(total_number);
}
}
My question is, by using atom_inc it will be much redundant right?
Whenever we meet a non-zero point, we should wait for the atom_inc.
I have a idea like this, we can separate the whole row into hundreds groups, we find the number in different groups and add them at last.
If we can do something like this:
__kernel void points_count(__global unsigned char* image_data, __global int* total_number_array, __global int image_width)
{
size_t gidx = get_global_id(0);
size_t gidy = get_global_id(1);
if(0!=*(image_data+gidy*image_width+gidx))
{
int stepy=gidy%10;
atom_inc(total_number_array+stepy);
}
}
We will separate the whole problem into more groups.
In that case, we can add the numbers in the total_number_array one by one.
Theoretically speaking, it will have a great performance improvement right?
So, does anyone have some advice about the summing issue here?
Thanks!

Like mentioned in the comments this is a reduction problem.
The idea is to keep separate counts and then put them back together at the end.
Consider using local memory to store the values.
Declare a local buffer to be used by each work group.
Keep track of the number of occurrences in this buffer by using the local_id as the index.
Sum these values at the end of execution.

A very good introduction to the reduction problem using Opencl is shown here:
http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-simple-reductions/
The reduction kernel could look like this (taken from the link above):
__kernel
void reduce(
__global float* buffer,
__local float* scratch,
__const int length,
__global float* result) {
int global_index = get_global_id(0);
int local_index = get_local_id(0);
// Load data into local memory
if (global_index < length) {
scratch[local_index] = buffer[global_index];
} else {
// Infinity is the identity element for the min operation
scratch[local_index] = INFINITY;
}
barrier(CLK_LOCAL_MEM_FENCE);
for(int offset = get_local_size(0) / 2;
offset > 0;
offset >>= 1) {
if (local_index < offset) {
float other = scratch[local_index + offset];
float mine = scratch[local_index];
scratch[local_index] = (mine < other) ? mine : other;
}
barrier(CLK_LOCAL_MEM_FENCE);
}
if (local_index == 0) {
result[get_group_id(0)] = scratch[0];
}
}
For further explanation see the proposed link.

Related

Xcode Optimizer Bug or stupid mistake?

This code mimics some image processing with malloced memory, it's a distilled example of a problem. It runs fine if optimized at other levels including "Fastest Smallest", but fails on GCC_OPTIMIZATION_LEVEL = 3 AKA Fastest [-03] and Fastest Aggressive. It crashes only on the device, seen on a 6,5s,5 and various IOS 9.3, 8.4.
There's something about the allocation sizes that aggravates the issue. There are some notes in the code about what helps make it fail.
Reproduce by creating an single view app project, set the optimization level to "Fastest" and paste this code into main and call it from inside the autorelease pool, or paste it in the view controller and call it from viewDidLoad or anywhere you like.
The debugger isn't very useful with optimizations turned on, but the crash comes in the while loop at "*writeIter = readIter->d;" a EXC_BAD_ACCESS code=1
So that tells me it's reading and the address that triggers the EXC_BAD_ACCESS is the same as readEnd. That should never happen as that's the condition the while is supposed to prevent... optimizer bug or stupid mistake?
#import <stdlib.h>
#import <stdio.h>
/**
Requires this to fail -> GCC_OPTIMIZATION_LEVEL = 3
This won't do it -> GCC_OPTIMIZATION_LEVEL = s
*/
typedef struct {
unsigned char a, b, c, d;
} foo;
void boom()
{
char* memory[1000];
// these sizes are important to reproducing this issue, changing them by +-1 will make it go away
int height = 960; //480,960,1920
int width = 1280; //640,1280,2560
int depth = sizeof(foo);
printf("height = %d, width = %d, total = %d\n\n", height, width, height*width*depth);
for (int i = 0; i < 1000; ++i)
{
memory[i] = malloc(20000); // allocate memory to force the allocations of readBuf and writeBuf to move, numbers
// less than 15k don't effect the alloced addresses of the bufs, so we keep getting
// the same ones and no boom.
foo* readBuf = malloc(height*width*depth);
unsigned char* writeBuf = malloc(height*width); // smaller than read
foo *readIter = readBuf;
foo *readEnd = readBuf + height*width; // only read size of smaller
unsigned char* writeIter = writeBuf;
printf("test: i = %d, readIter = %p, readEnd = %p, writeIter = %p\n", i, readIter, readEnd, writeIter);
while (readIter < readEnd)
{
*writeIter = readIter->d; // you died here during a read, and readIter == readEnd, look at the EXC_BAD_ACCESS address
// (printfed) it's readEnd, and that isn't supposed to happen with the conditional.
++writeIter;
++readIter;
}
free(readBuf);
free(writeBuf);
}
for (int i = 0; i < 1000; ++i)
{
free(memory[i]);
}
}

How to get memory offset?

I need to get memory offset from struct, the file is: https://github.com/BlastHackNet/mod_s0beit_sa/blob/master/src/samp.h I need to get
struct stObject : public stSAMPEntity < object_info >
{
uint8_t byteUnk0[2];
uint32_t ulUnk1;
int iModel;
uint8_t byteUnk2;
float fDrawDistance;
float fUnk;
float fPos[3];
// ...
};
fPos memory offset( as 0x1111 ). I don't know how to do it. Please help me.
Take a look at the offsetof operator: http://www.cplusplus.com/reference/cstddef/offsetof/

Convert array<System:Byte>^ to Mat

How do I convert an array<System:Byte>^ to a Mat in openCV. I am being passed a array<System:Byte>^ in c++/cli, but I need to convert it to Mat to be able to read it and display it.
You can use constructor Mat::Mat(int rows, int cols, int type, void* data, size_t step=AUTO_STEP). The conversion may look like this.
void byteArray2Mat(array<System::Byte>^ byteArray, cv::Mat &output)
{
pin_ptr<System::Byte> p = &byteArray[0];
unsigned char* pby = p;
char* pch = reinterpret_cast<char*>(pby);
// assuming your input array has 2 dimensions.
int rows = byteArray->GetLength(0);
int cols = byteArray->GetLength(1);
output = cv::Mat(rows, cols, CV_8UC1, (void*)pch)
}
I don't have c++/CLI to test the program and this may not be most efficient method. At least it should give you an idea on how to get started.

How to convert long long to 8 byte array in objective C

In my application i have to convert long long number into 8 byte array. Then i have to convert 8 byte array into hexadecimel string. Can you please help me in this. i'm struck up.
One way to do integer/byte array conversion is to use a union:
union {
long long l;
uint8_t b[sizeof(long long)];
} u;
u.l = mylonglong;
Then u.b[] contains the bytes, which can be accessed individually.
EDIT: Please note as pointed out by #NikolaiRuhe this use of union can lead to undefined behaviour, so it might be best to use memcpy() instead:
uint8_t b[sizeof(long long)];
memcpy(b, &mylonglong, sizeof(b));
If you want the hex string of the long long in native-endian order, then:
void hexChar(uint8_t b, char *out)
{
static const char *chars = "0123456789abcdef";
out[0] = chars[(b >> 4) & 0xf];
out[1] = chars[b & 0xf];
}
// Make sure outbuf is big enough
void hexChars(const uint8_t *buffer, size_t len, char *outbuf)
{
for (size_t i = 0; i < len; i++)
{
hexChar(buffer[i], outbuf);
outbuf += 2;
}
*outbuf = '\0';
}
and call it with:
char hex[32];
hexChars(u.b, sizeof(u.b), hex);
However if instead you want the hex value of the long long:
char hex[32];
sprintf(hex, "%llx", mylonglong);
would that do the trick ?
#include <stdio.h>
int main() {
long long int val = 0x424242;
char str_val[32];
snprintf(str_val, sizeof(str_val), "%#llx", val);
printf("Value : %s\n", str_val);
}

Assign value is garbage or undefined

I have posted screenshot of my error code.
heights output
please any one can help me?
I think the static analyzer is not seeing how _numberOfColumns can become non-zero, and hence its insistence that garbage is being assigned. You need to check that you are actually providing some means for _numberOfColumns to become non-zero.
Generally when I am writing loops that want to find the largest or the smallest value, I initialize the size variable to the largest (if I want the smallest) or smallest (if I want the largest) amount, and I think this will solve most of your issues:
float shortestHeight = FLT_MAX;
for (unsigned i = 0; i < _numberOfColumns; i++)
{
// etc.
}
The analyzer is correct. Your code will access garbage memory if _numberOfColumns is 0, thus allocating 0 bytes for heights, making heights[0] garbage. The analyzer doesn't know what values _numberOfColumns can have, but you can tell it by using assert(_numberOfColumns>0).
Take this C program for example:
int main(int argc, const char * argv[])
{
int n = argc-1;
int *a = malloc(n*sizeof(int));
for (int i=0; i<n; i++) {
a[i] = i;
}
int foo = a[0];
free(a);
return foo;
}
the size of a is determined by the number of arguments. If you have no arguments n == 0. If you are sure that your program (or just that part of your program) will always assign something greater than 0 to a, you can use an assertion. Adding assert(n>0) will tell the analyzer exactly that.

Resources