I'm writing a library on top of OpenCV and I have a question about cross-platform portability.
My question is: does OpenCV run if int is not 32 bits wide, but 16, 64 or 128? If yes, I'd like to support those platforms; otherwise it would simplify my high-level interfaces. I didn't find any information, and I'm not familiar enough with C++ to solve this conundrum myself by looking at the sources.
The int type is assumed to be 4 bytes wide: https://github.com/opencv/opencv/blob/master/modules/core/include/opencv2/core/hal/interface.h
bit shifts and masks are used throughout; these would definitely break on a 16-bit int
generic 128-bit SIMD C++ implementation assumes 128-bit register = 4 x int values
some algorithms use the SoftFloat library with the assumption int = int32_t (example)
the type identifier for int matrices is named CV_32S
...?
So: the answer is "No".
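If you build on top of OpenCV, you can at least fail loudly instead of misbehaving on such a platform. A minimal compile-time guard (my own sketch, not part of OpenCV):
#include <climits>

// Fail the build instead of miscompiling on platforms that violate
// the assumption OpenCV (and a library built on it) relies on.
static_assert(CHAR_BIT == 8 && sizeof(int) == 4,
              "this library, like OpenCV, requires a 32-bit int");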
Thanks to @mshabunin
Newbie in iOS programming here.
I was looking at the Foundation Data Types Reference and have started to use the NSInteger typedef on the assumption that it will make my app more portable. However, I often have a use for 16-bit and 8-bit integers, and I don't see an NSShort or NSByte.
It seems wasteful to allocate a 32/64 bit variable for something that has a small range, say 0 to 12.
Are there any symbols that are defined for that?
Use uint8_t and uint16_t if you want types that are a specific size. There are also similar types for 32- and 64-bit values.
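For example (plain C, so it works in any Objective-C file; the variable names are just for illustration):
#include <stdint.h>

uint8_t month = 12;   /* exactly 8 bits on every platform that provides <stdint.h> */
uint16_t year = 2013; /* exactly 16 bits */
int32_t delta = -1;   /* exactly 32 bits, signed */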
I have a constant at the top of my code...
__constant uint uintmaxx = (uint)( (((ulong)1)<<32) - 1 );
It compiles fine on AMD and NVIDIA OpenCL compilers... then executes.
(correct) on ATI cards it returns... 4294967295 (all 32 bits = 1)
(wrong) on NVIDIA cards it returns... 2147483648 (only the 32nd bit = 1)
I also tried -1 + 1<<32 and it worked on ATI but not NVIDIA.
What gives? Am I just missing something?
While I'm on the topic of OpenCL compiler differences, does anyone know a good resource that lists the compiler differences between AMD and NVIDIA?
OpenCL conveniently provides that for you already. You can use the predefined UINT_MAX in your kernel code and the implementation will guarantee that it holds the correct value.
However, there is also nothing wrong with the method you use. The spec guarantees that uint is 32 bits and ulong 64 bits, ints are two's complement, and everything that is not explicitly mentioned works exactly as written in the C99 spec.
Even just this should work and give you the correct result:
uint uintmaxx = -1;
It seems that NVIDIA just has a broken compiler; if not, I really hope I'll be corrected on the issue. The really odd part is: how on earth is the 32nd bit 1? A shift left by 32 moves the original bit to the 33rd place. So what on earth places a bit in the 32nd spot? The only explanation I can think of is that they don't respect operator ordering at all and transform the formula into (ulong)1 << (32-1), or something like that.
You probably should file a bug report. But to be frank, considering that they hate OpenCL as much as Microsoft hates OpenGL, if not even more, I wouldn't really anticipate fast response times.
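For illustration, a minimal kernel (the kernel name is mine) that relies on the predefined macro instead of computing the value by hand:
__kernel void fill_with_max(__global uint *dst)
{
    size_t i = get_global_id(0);
    dst[i] = UINT_MAX; // guaranteed to be 0xFFFFFFFF by the OpenCL C spec
}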
I fully agree with @sharpneli's answer. But just try this:
__constant uint uintmaxx = -1;
And like sharpneli said, use the UINT_MAX macro; it is the safer way.
I'm reading and writing lots of FITS and DNG images which may contain data of an endianness different from my platform and/or OpenCL device.
Currently I swap the byte order in the host's memory if necessary which is very slow and requires an extra step.
Is there a fast way to pass a buffer of int/float/short having the wrong endianness to an OpenCL kernel?
Using an extra kernel run just for fixing the endianness would be OK; some zero-overhead auto-fixing read/write operation would be perfect.
I know about the variable attribute ((endian(host/device))), but this doesn't help with a big-endian FITS file on a little-endian platform using a little-endian device.
I thought about a solution like this one (neither implemented nor tested, yet):
uchar4 mask = (uchar4) (3, 2, 1, 0); // shuffle's mask elements must match the element size of the input, hence uchar4 rather than uint4
uchar4 swappedEndianness = shuffle(originalEndianness, mask);
// to be applied on a float/int-buffer somehow
Hoping there's a better solution out there.
Thanks in advance,
runtimeterror
Sure. Since you have a uchar4, you can simply swizzle the components and write them back.
output[tid] = input[tid].wzyx;
Swizzling is also very cheap on SIMD architectures, so you should be able to combine it with other operations in your kernel.
Hope this helps!
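As a complete (hypothetical) kernel, assuming the buffer is bound as uchar4 elements:
__kernel void swap_endianness(__global const uchar4 *input,
                              __global uchar4 *output)
{
    size_t tid = get_global_id(0);
    // .wzyx reverses the four bytes of each 32-bit element
    output[tid] = input[tid].wzyx;
}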
Most processor architectures perform best when using instructions that operate at their full register width, for example 32/64 bits. When a CPU/GPU performs byte-wise operations such as the .wzyx subscripts on a uchar4, it needs to use a mask to retrieve each byte from the integer, shift the byte, and then combine it into the result with an integer add or or. For the endianness swap, the processor has to perform this and/shift/or sequence four times, because there are four bytes.
The most efficient way is as follows:
#define EndianSwap(n) (rotate(n & 0x00FF00FF, 24U) | (rotate(n, 8U) & 0x00FF00FF))
n can be of any unsigned integer gentype, for example a uint4 variable. Because OpenCL C does not allow C++-style function overloading, a macro is the best choice.
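To apply it to a float buffer, as the question asks, one option is to reinterpret the bits with as_uint/as_float. A sketch under that assumption (the kernel name is mine):
__kernel void swap_float_endianness(__global float *data)
{
    size_t i = get_global_id(0);
    // Reinterpret the float's bits as a uint, swap the bytes, reinterpret back.
    uint bits = as_uint(data[i]);
    data[i] = as_float(EndianSwap(bits));
}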
Apple CoreGraphics.framework, CGGeometry.h:
CG_INLINE bool __CGSizeEqualToSize(CGSize size1, CGSize size2)
{
return size1.width == size2.width && size1.height == size2.height;
}
#define CGSizeEqualToSize __CGSizeEqualToSize
Why do they (Apple) compare floats with ==? I can't believe this is a mistake. So can you explain it to me?
(I've expected something like fabs(size1.width - size2.width) < 0.001).
Floating point comparisons are native width on all OSX and iOS architectures.
For float, that comes to:
i386, x86_64:
32 bit XMM register (or memory for second operand)
using instructions in the ucomiss family
ARM:
32 bit register
using instructions in the same family as vcmp
Some of the floating point comparison issues have been removed by restricting storage to 32/64 bits for these types. Other platforms may often use the native 80-bit FPU (example). On OS X, SSE instructions are favored, and they use natural widths. So, that reduces many of the floating point comparison issues.
But there is still room for error, or times when you will favor approximation. One hidden detail about CGGeometry types' values is that they may be rounded to a nearby integer (you may want to do this yourself in some cases).
Given the range of CGFloat (float, or double on x86_64) and typical values, it's reasonable to assume the rounded values will generally be represented accurately enough that the results will be suitably comparable in the majority of cases.
There are still times when you may prefer approximated comparisons in geometry calculations, but Apple's implementation is what I'd consider the closest thing to a reference implementation for a general-purpose solution in this context.
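If you do want the approximate comparison suggested in the question, a minimal sketch (the helper name and the caller-chosen tolerance are mine, not Apple API):
#include <math.h>
#include <stdbool.h>
#include <CoreGraphics/CGGeometry.h>

// Hypothetical helper: true when both dimensions match within an
// absolute tolerance chosen by the caller.
static bool MyCGSizeNearlyEqual(CGSize a, CGSize b, CGFloat tol)
{
    return fabs(a.width - b.width) <= tol
        && fabs(a.height - b.height) <= tol;
}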
I've read that the signed char and unsigned char types are not guaranteed to be 8 bits on every platform, but may sometimes have more than 8 bits.
If so, using OpenCV, how can we be sure that CV_8U is always 8-bit?
I've written a short function which takes an 8-bit Mat and, if needed, converts CV_8SC1 Mat elements into uchars and CV_8UC1 into schars.
Now I'm afraid it is not platform-independent and I should fix the code in some way (but I don't know how).
P.S.: Similarly, how can CV_32S always be int, even on machines with no 32-bit ints?
Can you give a reference for this (I've never heard of it)? You probably mean the padding that may be added at the end of a row in a cv::Mat. That is no problem, since the padding is usually not used, and it is especially no problem if you use the interface functions, e.g. the iterators (cf. below). If you posted some code, we could see whether your implementation actually has such problems.
// template methods for iteration over matrix elements.
// the iterators take care of skipping gaps in the end of rows (if any)
template<typename _Tp> MatIterator_<_Tp> begin();
template<typename _Tp> MatIterator_<_Tp> end();
CV_32S will always be a 32-bit integer because OpenCV uses fixed-width types like those defined in inttypes.h (e.g. int32_t, uint32_t) and not the platform-specific int, long, or whatever.
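For example, element access that stays exact on any platform where OpenCV builds (a sketch; the matrix contents and the check are mine):
#include <opencv2/core.hpp>
#include <cstdint>

int main()
{
    // CV_32SC1 elements are exactly 32 bits wide wherever OpenCV compiles,
    // so they can be read into a fixed-width int32_t without surprises.
    cv::Mat m(4, 4, CV_32SC1, cv::Scalar(7));
    int32_t v = m.at<int>(1, 2);
    return v == 7 ? 0 : 1;
}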