Mat_<uchar> for Image. Why? - opencv

I'm reading some code, and I cannot understand why it uses Mat_<uchar> for the image (in OpenCV) when calling:
threshold
What is the advantage of using this matrix type?

The OpenCV threshold function accepts as its source image a single-channel (i.e. grayscale) matrix, either 8-bit or 32-bit floating point.
So, in your case, you're passing a single-channel 8-bit matrix. Its OpenCV type is CV_8UC1.
A Mat_<uchar> is also typedef-ed as Mat1b, and its pixel values are in the range [0, 255], since the underlying type (uchar, i.e. unsigned char) is 8 bits wide, with possible values from 0 to 2^8 - 1.
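For reference, a minimal sketch of the typical usage (the file name and the threshold value of 128 here are just illustrative, not from your code):

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main()
{
    cv::Mat1b gray = cv::imread("input.png", cv::IMREAD_GRAYSCALE); // CV_8UC1, i.e. Mat_<uchar>
    cv::Mat1b binary;
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY); // pixels above 128 become 255, the rest 0
    cv::imwrite("binary.png", binary);
    return 0;
}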

Related

How does the "compressed form" of `cv::convertMaps` work?

The documentation for convertMaps says that it supports the following transformation:
(CV_32FC1, CV_32FC1)→(CV_16SC2, CV_16UC1) This is the most frequently used conversion operation, in which the original floating-point maps (see remap) are converted to a more compact and much faster fixed-point representation. The first output array contains the rounded coordinates and the second array (created only when nninterpolation=false) contains indices in the interpolation tables.
I understand that (CV_32FC1, CV_32FC1) is encoding (x, y) coordinates as floats. How does the fixed point format work? What is encoded in each 2-channel entry of the CV_16SC2 matrix? What interpolation tables does the CV_16UC1 matrix index into?
I'm going by what I remember from the last time I investigated this. Grain of salt and all that.
the fixed point format splits the integer and fractional parts of your (x,y)-coordinates into different maps.
it's "compact" in that CV_32FC2 or 2x CV_32FC1 uses 8 bytes per pixel, while CV_16SC2 + CV_16UC1 uses 6 bytes per pixel. also it's integer-only, so using it can free up floating point compute resources for other work.
the integer parts go into the first map, which is 2-channel. no surprises there.
the fractional parts are converted to 5-bit integers, i.e. they're multiplied by 32. then they're packed together: the lower 5 bits come from one coordinate and the next higher 5 bits from the other.
the resulting packed number has a range of 0 .. 1023, or 0b00000_00000 .. 0b11111_11111, which encodes fractional parts of (0.0, 0.0) and (0.96875, 0.96875) respectively (that's 31/32).
during remap...
the integer map is used to look up, for every resulting pixel, several pixels in the source image required for interpolation.
the fractional map is taken as an index into an "interpolation table", which is internal to OpenCV. it contains whatever factors and shifts are required to correctly blend the sampled pixels into one resulting pixel, all using integer math. I guess there are multiple tables, one for each interpolation method (linear, cubic, ...).
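If you want to poke at the packed maps yourself, here is a small sketch. The exact bit layout (x fraction in the low 5 bits, y fraction in the high 5 bits, with 5 fractional bits) is my reading of the OpenCV source, so treat it as an assumption:

#include <opencv2/imgproc.hpp>
#include <cstdio>

int main()
{
    cv::Mat mapX(1, 1, CV_32FC1, cv::Scalar(10.5f));   // x = 10 + 16/32
    cv::Mat mapY(1, 1, CV_32FC1, cv::Scalar(20.25f));  // y = 20 + 8/32

    cv::Mat fixedXY, fracIdx;
    cv::convertMaps(mapX, mapY, fixedXY, fracIdx, CV_16SC2, false);

    cv::Vec2s xy = fixedXY.at<cv::Vec2s>(0, 0); // integer parts of (x, y)
    ushort idx = fracIdx.at<ushort>(0, 0);      // packed fractional parts, 0 .. 1023
    int fx = idx & 31;                          // low 5 bits: x fraction * 32
    int fy = (idx >> 5) & 31;                   // high 5 bits: y fraction * 32
    std::printf("int = (%d, %d), frac = (%d/32, %d/32)\n", xy[0], xy[1], fx, fy);
    return 0;
}

If the layout above is right, this prints int = (10, 20) and frac = (16/32, 8/32).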

How are the bits allocated?

In OpenCV, the type of the elements in a cv::Mat object could be for instance CV_32FC1, CV_32FC3 which represent 32-bit floating point with one channel and 32-bit floating point with three channels, respectively.
The CV_32FC3 type can be used to represent color images which have blue, green and red channels plus an alpha channel used to represent transparency, with each channel getting 8 bits.
I'm wondering how the bits are allocated in the CV_32FC1 type, when there's only one channel?
32F means float. The number after the C is the number of channels. So CV_32FC3 means 3 floats per pixel, while CV_32FC1 is one float only. What these floats mean is up to you and not explicitly stored in the Mat.
The float is stored in memory as it would be in your regular C program (typically in little endian).
A classical BGR image (default channel ordering in OpenCV) would be a CV_8UC3: 8 bit unsigned integer per channel in three channels.
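A small sketch to make this concrete; the channel count and the element size follow directly from the type constant:

#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    cv::Mat gray32(4, 4, CV_32FC1); // one 32-bit float per pixel
    cv::Mat bgr8(4, 4, CV_8UC3);    // three 8-bit unsigned values per pixel

    std::printf("CV_32FC1: channels = %d, bytes per pixel = %zu\n",
                gray32.channels(), gray32.elemSize()); // 1 channel, 4 bytes
    std::printf("CV_8UC3:  channels = %d, bytes per pixel = %zu\n",
                bgr8.channels(), bgr8.elemSize());     // 3 channels, 3 bytes
    return 0;
}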

Converting matches from 8-bit 4 channels to 64-bit 1 channel in OpenCV

I have a vector of Point2f which has color space CV_8UC4, and I need to convert it to CV_64F. Is the following code correct?
points1.convertTo(points1, CV_64F);
More details:
I am trying to use this function to calculate the essential matrix (rotation/translation) via the 5-point algorithm, instead of using findFundamentalMat, included in OpenCV, which is based on the 8-point algorithm:
https://github.com/prclibo/relative-pose-estimation/blob/master/five-point-nister/five-point.cpp#L69
As you can see, it first converts the image to CV_64F. My input image is a CV_8UC4, BGRA image. When I tested the function, both BGRA and greyscale images produced matrices that are valid from a mathematical point of view, but if I pass a greyscale image instead of a color one, it takes much longer to compute. This makes me think I'm not doing something correctly in one of the two cases.
I have read that when the change of color space is not linear (which I suppose is the case when you go from 4 channels to 1, as in this case), you should normalize the intensity values. Is that correct? Which input should I give to this function?
Another note, the function is called like this in my code:
vector<Point2f> imgpts1, imgpts2;
for (vector<DMatch>::const_iterator it = matches.begin(); it != matches.end(); ++it)
{
    imgpts1.push_back(firstViewFeatures.second[it->queryIdx].pt);
    imgpts2.push_back(secondViewFeatures.second[it->trainIdx].pt);
}
Mat mask;
Mat E = findEssentialMat(imgpts1, imgpts2, [camera focal], [camera principal_point], CV_RANSAC, 0.999, 1, mask);
The fact I'm not passing a Mat, but a vector of Point2f instead, seems to create no problems, as it compiles and executes properly.
Should I be storing the matches in a Mat instead?
I am not sure what you mean by a vector of Point2f in some color space, but if you want to convert a vector of points into a vector of points of another type you can use any standard C++/STL algorithm like copy(), assign() or insert(). For example:
copy(floatPoints.begin(), floatPoints.end(), back_inserter(doublePoints));
or
doublePoints.insert(doublePoints.end(), floatPoints.begin(), floatPoints.end());
No, it is not. A std::vector<cv::Point2f> cannot use OpenCV's convertTo function.
I think you really mean that you have a cv::Mat points1 of type CV_8UC4. Note that those are R x C x 4 values (with R and C being the number of rows and columns), whereas a CV_64F matrix holds only R x C values. So you need to be clearer about how you want to transform those values.
You can do points1.convertTo(points1, CV_64FC4) to get a RxCx4 matrix.
Update:
Some remarks after you updated the question:
Note that a vector<cv::Point2f> is a vector of 2D points that is not associated with any particular color space; they are just coordinates in the image axes. So they represent the same 2D points whether the image is greyscale, RGB or HSV. The execution time of findEssentialMat therefore doesn't depend on the image color space. Getting the points may, though.
That said, I think your input for findEssentialMat is fine (the function takes care of the vectors and converts them into its internal representation). In cases like this, it is very useful to draw the points on your image to debug the code.
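If you ever really do need the coordinates as a 64-bit Mat (findEssentialMat is happy with the vectors as they are), a hypothetical sketch would be to wrap the vector in a Mat and convert that:

#include <opencv2/core.hpp>
#include <vector>

int main()
{
    std::vector<cv::Point2f> imgpts1 = { {10.f, 20.f}, {30.f, 40.f} }; // illustrative values
    cv::Mat pts32(imgpts1);           // N x 1 matrix of type CV_32FC2; no color space involved
    cv::Mat pts64;
    pts32.convertTo(pts64, CV_64FC2); // same coordinates, stored as doubles
    return 0;
}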

Kinect Depth Image

In my application I am getting a depth frame similar to the one retrieved from the Depth Basics Sample. What I don't understand is why there are discrete levels in the image. I don't know what to call these sudden changes in depth values. Clearly, half of my right hand is all black and my left hand seems divided into 3 such levels. What is this and how do I remove it?
When I run the KinectExplorer Sample app I get the depth as follows. This is the depth image I want to generate from the raw depth data.
I am using Microsoft Kinect SDK's (v1.6) NuiApi along with OpenCV. I have the following code:
BYTE *pBuffer = (BYTE*)depthLockedRect.pBits;  // pointer to data having 8-bit jump
USHORT *depthBuffer = (USHORT*)pBuffer;        // pointer to data having 16-bit jump
int cn = 4;
this->depthFinal = cv::Mat::zeros(depthHeight, depthWidth, CV_8UC4); // 8-bit, 4 channels
for (int i = 0; i < this->depthFinal.rows; i++) {
    for (int j = 0; j < this->depthFinal.cols; j++) {
        USHORT realdepth = ((*depthBuffer) & 0x0fff);        // taking 12 LSBs for depth
        BYTE intensity = (BYTE)((255 * realdepth) / 0x0fff); // scaling to the 255-value grayscale range
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 0] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 1] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 2] = intensity;
        depthBuffer++;
    }
}
The stripes that you see are due to the wrapping of depth values caused by the %256 operation. Instead of applying the modulo operation (%256), which is what makes the bands show up, remap the depth values over the entire range, e.g.:
BYTE intensity = depth == 0 || depth > 4095 ? 0 : 255 - (BYTE)(((float)depth / 4095.0f) * 255.0f);
in case your max depth is 2048, replace the 4095 with 2047.
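As a minimal sketch, the same remapping applied to a whole frame might look like this, assuming the raw depth is available as a CV_16UC1 Mat (here called depth16) with a 12-bit range:

#include <opencv2/core.hpp>

// Near pixels come out bright, far pixels dark, invalid pixels (0 or out of range) black.
cv::Mat depthToIntensity(const cv::Mat& depth16)
{
    cv::Mat vis(depth16.size(), CV_8UC1);
    for (int i = 0; i < depth16.rows; i++) {
        for (int j = 0; j < depth16.cols; j++) {
            ushort depth = depth16.at<ushort>(i, j);
            vis.at<uchar>(i, j) = (depth == 0 || depth > 4095)
                ? 0
                : 255 - (uchar)(((float)depth / 4095.0f) * 255.0f);
        }
    }
    return vis;
}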
More pointers:
the Kinect presumably returns an 11-bit value (0-2047), but you only use 8 bits (0-255).
newer Kinect versions seem to return a 12-bit value (0-4095).
in the Kinect Explorer source code, there's a file called DepthColorizer.cs where most of the magic seems to happen. I believe this code is what makes the depth values look so smooth in Kinect Explorer - but I might be wrong.
I faced the same problem while working on a project that involved visualization of a depth map. However, I used the OpenNI SDK with OpenCV instead of the Kinect SDK libraries. The problem was the same, so the solution should work for you as it did for me.
As mentioned in previous answers to your question, the Kinect depth map is 11-bit (0-2047), while the examples use 8-bit data types.
What I did in my code to get around this was to acquire the depth map into a 16-bit Mat, and then convert it to an 8-bit uchar Mat using the scaling option of Mat's convertTo function.
First, I initialize a Mat for acquiring the depth data:
Mat depthMat16UC1(XN_VGA_Y_RES, XN_VGA_X_RES, CV_16UC1);
Here XN_VGA_Y_RES and XN_VGA_X_RES define the resolution of the acquired depth map.
The code where I do this is as follows:
depthMat16UC1.data = ((uchar*)depthMD.Data());
depthMat16UC1.convertTo(depthMat8UC1, CV_8U, 0.05f);
imshow("Depth Image", depthMat8UC1);
depthMD is the metadata containing the data retrieved from the Kinect sensor.
I hope this helps you in some way.
The visualization of the depth image data has discrete levels that are coarse (0 to 255 in your code example), but the actual depth data are numbers between 0 and 2047. They are still discrete, of course, but not in such coarse units as the colors chosen to depict them.
The Kinect v2 can see up to 8 meters of depth (though accuracy decreases beyond 4.5 m).
It starts at around 0.4 meters.
So one needs to express a number up to about 8000 as a color.
One way to do this is to treat the RGB channels as a single number; then you could potentially store a number up to 255x255x255 in a pixel.
(With a different color format it would be different.)
Storing 8000 in that 255x255x255 range results in a certain combination of R, G and B, and that is what gives this banding effect.
But you could of course divide the 8000 down, subtract an offset, or discard values beyond a certain distance.

OpenCV data types

depth Pixel depth in bits. The supported depths are:
IPL_DEPTH_8U Unsigned 8-bit integer
IPL_DEPTH_8S Signed 8-bit integer
IPL_DEPTH_16U Unsigned 16-bit integer
IPL_DEPTH_16S Signed 16-bit integer
IPL_DEPTH_32S Signed 32-bit integer
IPL_DEPTH_32F Single-precision floating point
IPL_DEPTH_64F Double-precision floating point
What do these values actually stand for?
How many bits does each one use?
What is the difference between:
an unsigned 8-bit integer and a signed 8-bit integer?
an unsigned 16-bit integer and a signed 16-bit integer?
given that they both take 8 and 16 bits respectively?
What is the point of using data types with floating point?
An unsigned 8-bit integer has values from 0 to 255, while a signed 8-bit integer has values from -128 to 127. Most digital cameras produce unsigned data. Signed data is mainly the result of an operation on an image, such as Canny edge detection.
The reason for higher bit-depth images, such as 16-bit, is more detail in the image. This allows more operations, such as white balancing or brightening, without creating artifacts in the image. For example, a dark image that has been brightened too much shows distinct banding. A 16-bit image can be brightened further than an 8-bit image, because there is more information to start with.
Some operations work better with floating-point data, for example an FFT (Fast Fourier Transform). If too many operations are done on an image, the error from rounding the pixel values to an integer at every step starts to accumulate. Using floating-point numbers mitigates this, but doesn't eliminate it.
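To see the effect, here is a toy example (the values are arbitrary): a pixel is brightened by 10% ten times in a row, once on 8-bit data and once on 32-bit float data:

#include <opencv2/core.hpp>
#include <cstdio>

int main()
{
    cv::Mat img8(1, 1, CV_8UC1, cv::Scalar(10));
    cv::Mat img32;
    img8.convertTo(img32, CV_32FC1);

    for (int i = 0; i < 10; i++) {
        cv::Mat next8, next32;
        img8.convertTo(next8, CV_8UC1, 1.1);    // rounded to an integer after every step
        img32.convertTo(next32, CV_32FC1, 1.1); // keeps the fractional part
        img8 = next8;
        img32 = next32;
    }
    // 10 * 1.1^10 is about 25.94; the float result stays close to that,
    // while the repeatedly rounded 8-bit result ends up a little off.
    std::printf("8-bit: %d, float: %.3f\n", img8.at<uchar>(0, 0), img32.at<float>(0, 0));
    return 0;
}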
