How to convert UIImage to and from a bitmap int array (rgb565) - iOS

I need to pass my UIImage to an image-processing algorithm that takes an int array of the bitmap in rgb565 format.
Later, it returns the processed int array, which I need to convert back to a UIImage.
Its signature is:
int* ImageProcessingAlgorithm(int bitmapArray[], int width, int height);
I searched many places, but none seem to cover UIImage-to-int-array conversion and vice versa. The nearest I found was this, but it deals with a char array - I tried adapting it for my purpose, but I keep getting various access errors and leaks in UIKit library functions. Maybe I am not managing memory properly, or I am making some mistake in the int-to-unsigned-char-to-int conversion.
I can deal with that part, but I am still not sure it fits my image-processing algorithm's format (rgb565).
I am a newbie to image processing, and the algorithm is a black box to me, so I just need the array of ints that I can pass to and from it.
One thing I am sure of is that the algorithm returns the same number of array elements as it takes as input - i.e. the input and output arrays represent the same number of image pixels.
Thanks for your help in advance.

As I figured, the CGBitmapContextGetData function returns a void pointer to the bitmap data, and that pointer can be cast to any sort of array pointer. What matters is how it is processed afterwards.
Here is the documentation.
Conversion to rgb565 can be done using this technique, taken from here:
R5 = ( R8 * 249 + 1014 ) >> 11;
G6 = ( G8 * 253 + 505 ) >> 10;
B5 = ( B8 * 249 + 1014 ) >> 11;
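Putting the pieces together, something like the following should work. This is only a rough sketch of the approach, not tested production code: the function names are made up, it assumes an RGBA8888 intermediate buffer, and it packs one rgb565 pixel into the low 16 bits of each int (alpha is necessarily lost):

#include <stdlib.h>
#include <CoreGraphics/CoreGraphics.h>

// Render the image into an RGBA8888 bitmap context, then pack each pixel
// into rgb565 using the rounding formulas above.
int *rgb565FromImage(CGImageRef image, size_t width, size_t height)
{
    CGColorSpaceRef space = CGColorSpaceCreateDeviceRGB();
    CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, width * 4,
                                             space, kCGImageAlphaPremultipliedLast);
    CGContextDrawImage(ctx, CGRectMake(0, 0, width, height), image);
    // the void pointer mentioned above, cast to an unsigned char array
    unsigned char *rgba = (unsigned char *)CGBitmapContextGetData(ctx);

    int *out = (int *)malloc(width * height * sizeof(int));
    for (size_t i = 0; i < width * height; i++) {
        unsigned r5 = (rgba[4 * i]     * 249 + 1014) >> 11;
        unsigned g6 = (rgba[4 * i + 1] * 253 +  505) >> 10;
        unsigned b5 = (rgba[4 * i + 2] * 249 + 1014) >> 11;
        out[i] = (int)((r5 << 11) | (g6 << 5) | b5);
    }
    CGContextRelease(ctx);
    CGColorSpaceRelease(space);
    return out; // caller frees
}

// Reverse direction: expand the 5/6-bit fields back to 8 bits by bit
// replication, producing an RGBA8888 buffer.
void rgba8888FromRgb565(const int *in, unsigned char *rgba, size_t pixels)
{
    for (size_t i = 0; i < pixels; i++) {
        unsigned v  = (unsigned)in[i];
        unsigned r5 = (v >> 11) & 0x1F;
        unsigned g6 = (v >> 5)  & 0x3F;
        unsigned b5 =  v        & 0x1F;
        rgba[4 * i]     = (unsigned char)((r5 << 3) | (r5 >> 2));
        rgba[4 * i + 1] = (unsigned char)((g6 << 2) | (g6 >> 4));
        rgba[4 * i + 2] = (unsigned char)((b5 << 3) | (b5 >> 2));
        rgba[4 * i + 3] = 0xFF; // alpha was lost in rgb565; assume opaque
    }
}

Obtain the CGImageRef with image.CGImage; to go back to a UIImage, wrap the RGBA buffer with CGDataProviderCreateWithData and CGImageCreate, then use imageWithCGImage:.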

Related

Writing UInt16List via IOSink.add, what's the result?

Trying to write audio samples to a file.
I have a list of 16-bit ints:
Uint16List _samples = new Uint16List(0);
I add elements to this list as samples come in.
Then I can write to an IOSink like so:
IOSink _ios = ...
List<int> _toWrite = [];
_toWrite.addAll(_samples);
_ios.add(_toWrite);
or
_ios.add(_samples);
just works, with no issues with types, despite the signature of add taking List<int> and not Uint16List.
As I read, in Dart the int type is 64-bit.
Are both writes above identical? Do they produce packed 16-bit ints in this file?
A Uint16List is-a List<int>. It's a list of integers which truncates writes to 16 bits, and always reads out 16-bit integers, but it is a list of integers.
If you copy those integers to a plain growable List<int>, it will contain the same integer values.
So, doing ios.add(_samples) will do the same as ios.add(_toWrite), and most likely neither does what you want.
The IOSink's add method expects a list of bytes. So, it will take a list of integers and assume that they are bytes. That means that it will only use the low 8 bits of each integer, which will likely sound awful if you try to play that back as a 16-bit audio sample.
If you want to store all 16 bits, you need to figure out how to store each 16-bit value in two bytes. The easy choice is to just assume that the platform byte order is fine, and do ios.add(_samples.buffer.asUint8List(_samples.offsetInBytes, _samples.lengthInBytes)). This will make a view of the 16-bit data as twice as many bytes, then write those bytes.
The endianness of those bytes (is the high byte first or last) depends on the platform, so if you want to be safe, you can convert the bytes to a fixed byte order first:
// (Endian and ByteData come from dart:typed_data)
if (Endian.host == Endian.little) {
  ios.add(_samples.buffer
      .asUint8List(_samples.offsetInBytes, _samples.lengthInBytes));
} else {
  var byteData = ByteData(_samples.length * 2);
  for (int i = 0; i < _samples.length; i++) {
    byteData.setUint16(i * 2, _samples[i], Endian.little);
  }
  var littleEndianData = byteData.buffer.asUint8List(0, _samples.length * 2);
  ios.add(littleEndianData);
}

no operator [] matches these operands

I am adapting old code which uses CvMat. I use the Mat constructor from a CvMat:
Mat A(B); // B is a CvMat
When I write A[i][j], I get the error no operator [] matches these operands.
Why? For information, B is a single-channel float matrix (from an MLData object read from a CSV file).
The documentation lists the at operator as the way to access an element:
A.at<int>(i,j); //Or whatever type you are storing.
first, you should have a look at the most basic opencv tutorials
so, if you have a 3-channel, bgr image (the most common case), you will have to access it like:
Vec3b & pixel = A.at<Vec3b>(y,x); // we're in row,col world, here !
pixel = Vec3b(17,18,19); // at() returns a reference, so you can *set* that, too.
the 1-channel (grayscale) version would look like this:
uchar & pixel = A.at<uchar>(y,x);
since you mention float images:
float & pixel = A.at<float>(y,x);
you can't choose the type at will, you have to use what's inside the Mat, so query A.type() first.
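for instance, a small sketch of that check (my own illustration, assuming the single-channel float case from the question):

#include <opencv2/core/core.hpp>
using namespace cv;

// query the element type first, then use the matching at<> type
void readOne(const Mat& A, int i, int j)
{
    if (A.type() == CV_32FC1) {      // single-channel float, e.g. from MLData
        float v = A.at<float>(i, j); // row i, column j
        (void)v;                     // use the value here
    }
}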

What does the function 'vDSP_vfltu16' (in vDSP) actually do?

It is a function in vDSP (part of the Accelerate framework) on iOS. The reference says this function:
Converts an array of unsigned 16-bit integers to single-precision floating-point values.
But what actually is created? For example, I have a series of 16-bit integers storing phonetic samples. What do I actually get when I call this function?
Nothing is created. You pass in an array of N unsigned 16-bit short ints in the A parameter and an array of N floats in the __vDSP_C parameter, and the routine converts the unsigned short int values to floats. E.g. if A[0] = 42 then __vDSP_C[0] will be set to 42.0f.
void vDSP_vfltu16(
    unsigned short *A,
    vDSP_Stride    __vDSP_I,
    float          *__vDSP_C,
    vDSP_Stride    __vDSP_K,
    vDSP_Length    __vDSP_N
);
There is reasonable documentation on developer.apple.com: https://developer.apple.com/library/mac/#documentation/Accelerate/Reference/vDSPRef/Reference/reference.html
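For illustration, a minimal sketch of a call with unit strides (buffer names are made up):

#include <Accelerate/Accelerate.h>
#include <stdio.h>

int main(void)
{
    unsigned short samples[4] = {0, 42, 1000, 65535};
    float converted[4];
    // stride 1 through both arrays, converting all 4 elements
    vDSP_vfltu16(samples, 1, converted, 1, 4);
    // prints: 0.000000 42.000000 1000.000000 65535.000000
    printf("%f %f %f %f\n", converted[0], converted[1], converted[2], converted[3]);
    return 0;
}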

How to declare local memory in OpenCL?

I'm running the OpenCL kernel below with a two-dimensional global work size of 1000000 x 100 and a local work size of 1 x 100.
__kernel void myKernel(
    const int length,
    const int height
    /* ...and a bunch of other parameters */ )
{
    // declare some local arrays to be shared by all 100 work items in this group
    __local float LP[length];
    __local float LT[height];
    __local int bitErrors = 0;
    __local bool failed = false;
    // here come my actual computations which utilize the space in LP and LT
}
This, however, refuses to compile, since the parameters length and height are not known at compile time. But it is not clear to me at all how to do this correctly. Should I use pointers with malloc? How do I handle this in a way that the memory is only allocated once for the entire work-group and not once per work item?
All I need is 2 arrays of floats, 1 int and 1 bool that are shared by the entire work-group (all 100 work items). But I have failed to find any method that does this correctly...
It's relatively simple: you can pass the local arrays as arguments to your kernel:
kernel void myKernel(const int length, const int height,
                     local float* LP, local float* LT
                     /* , ...a bunch of other parameters */ )
You then set the kernel argument with a value of NULL and a size equal to the size you want to allocate for the argument (in bytes). Therefore it should be:
clSetKernelArg(kernel, 2, length * sizeof(cl_float), NULL);
clSetKernelArg(kernel, 3, height * sizeof(cl_float), NULL);
Local memory is always shared by the work-group (as opposed to private memory), so I think the bool and int should be fine, but if not you can always pass those as arguments too.
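For completeness, here is a sketch of what the kernel side might then look like (my own illustration of the scheme above, with the scalar locals kept inside the kernel and initialized explicitly, since local variables cannot have initializers):

kernel void myKernel(const int length, const int height,
                     local float* LP, local float* LT
                     /* , ...a bunch of other parameters */ )
{
    local int bitErrors;
    local bool failed;
    // one work item initializes the shared scalars, then the whole
    // work-group synchronizes before anyone reads them
    if (get_local_id(0) == 0 && get_local_id(1) == 0) {
        bitErrors = 0;
        failed = false;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // ...actual computations using LP and LT...
}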
Not really related to your problem (and not necessarily relevant, since I do not know what hardware you plan to run this on), but at least GPUs don't particularly like work sizes which are not a multiple of a particular power of two (I think it was 32 for NVIDIA and 64 for AMD), meaning they will probably execute your 100-item work-groups as 128 items, of which the last 28 are basically wasted. So if you are running OpenCL on a GPU, it might help performance to directly use work-groups of size 128 (and change the global work size appropriately).
As a side note: I never understood why everyone uses the underscore variants of kernel, local and global; they seem much uglier to me.
You could also declare your arrays like this:
__local float LP[LENGTH];
And pass LENGTH as a define when you compile the kernel:
int lp_size = 128; // this is an example; could be dynamically calculated
char compileArgs[64];
sprintf(compileArgs, "-DLENGTH=%d", lp_size);
clBuildProgram(program, 0, NULL, compileArgs, NULL, NULL);
You do not have to allocate all your local memory outside the kernel, especially when it is a simple variable rather than an array.
The reason your code does not compile is that OpenCL does not support initialization of local memory. This is specified in the documentation (https://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/local.html). It is also not feasible in CUDA (see: Is there a way of setting default value for shared memory array?).
P.S. The answer from Grizzly is good enough; it would have been better to post this as a comment, but I am restricted by the reputation policy. Sorry.

ARM asm/NEON optimisation for image processing

I'm currently working on a painting app on iOS.
I draw directly into an NSMutableData buffer and apply blending with my brush like this:
- (void)combineColorDestination:(unsigned char *)dest source:(unsigned char *)src
{
    const unsigned char sra = src[3];
    const float oneminusalpha = 1.0f - (sra / 255.f);
    int d[4];
    for (int i = 0; i < 4; i++)
    {
        d[i] = oneminusalpha * dest[i] + src[i];
        if (d[i] > 255)
            d[i] = 255;
        dest[i] = (unsigned char)d[i];
    }
}
Any suggestions for optimisations ?
I previously tried to use NEON, but I had a bug I wasn't able to fix (the bordering pixels were buggy).
I was iterating over the pixels two at a time, like this:
uint8x8_t va = vld1_u8(dest);     // load 8 bytes = 2 RGBA pixels from dest
uint8x8_t vb = vld1_u8(src);      // and 2 pixels from src
uint8x8_t res = vqadd_u8(va, vb); // saturating add only; no (1 - alpha) scaling
vst1_u8(dest, res);
Suggestions? Alright. Note that these are valid for whatever multimedia manipulation you are doing, and they are hardly restricted to your case.
First, before you even do NEON, you should change your code to have one function that changes a bunch of pixels (at least a row, a rectangle if you can) at once, instead of a function (or method - even worse) that changes one pixel and is called a bunch of times: somehow I doubt the brush is only 1x1 pixel.
Second, except for the column loop (and eventual row loop), there should be no branch (that is, flow control structures). No for (i=0;i<4;i++); just write the code for the four channels in sequence (use a macro if necessary). No if (d[i]>255); express that as an alternative: dest[i] = (temp>255?255:temp); at the very least, if not replacing it by a more efficient way to do saturation (tricks using subtractions, shifts, and masks exist).
Third, avoid any conversion between floating-point and integer; this is always valid advice, but float->int conversions are particularly devastating on ARM. Since you're manipulating integers, this means foregoing floating-point here.
And once you've done that, surprise, besides making your code faster you have in fact done the preparation work for NEON: NEON is only remotely useful if you process a bunch of pixels at once, if there is no branch, and if you don't convert between floating-point and integer all over the place. So only then will we talk about NEON, if it is even necessary at this point.
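To make that concrete, here is a sketch of what the scalar code might look like after those three steps (a rough illustration, not tested: the function name is made up, and the +128 / >>8 fixed-point step approximates the division by 255):

#include <stddef.h>

// branch-free saturation; could also be done with subtraction/shift/mask tricks
static inline unsigned char sat_u8(unsigned t)
{
    return (unsigned char)(t > 255 ? 255 : t);
}

// blend a whole row of RGBA pixels at once, integer-only, channels unrolled
void combineColorRow(unsigned char *dest, const unsigned char *src, size_t pixels)
{
    for (size_t p = 0; p < pixels; p++, dest += 4, src += 4) {
        const unsigned inv = 255 - src[3]; // (1 - alpha) in 8-bit fixed point
        dest[0] = sat_u8(((dest[0] * inv + 128) >> 8) + src[0]);
        dest[1] = sat_u8(((dest[1] * inv + 128) >> 8) + src[1]);
        dest[2] = sat_u8(((dest[2] * inv + 128) >> 8) + src[2]);
        dest[3] = sat_u8(((dest[3] * inv + 128) >> 8) + src[3]);
    }
}

Once the code looks like this, a NEON version maps naturally onto, for example, vmull_u8 for the widening multiply, vrshrn_n_u16 for the rounding shift back down to 8 bits, and vqadd_u8 for the saturating add, processing 8 or 16 pixels per iteration.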
