In my application I am getting a depth frame similar to the one retrieved from the Depth Basics Sample. What I don't understand is why there are discrete levels in the image; I don't know what to call these sudden jumps in depth values. Clearly, half of my right hand is all black and my left hand seems divided into 3 such levels. What is this and how do I remove it?
When I run the KinectExplorer Sample app I get the depth as follows. This is the depth image I want to generate from the raw depth data.
I am using Microsoft Kinect SDK's (v1.6) NuiApi along with OpenCV. I have the following code:
BYTE *pBuffer = (BYTE*)depthLockedRect.pBits;   // pointer to data with 8-bit stride
USHORT *depthBuffer = (USHORT*)pBuffer;         // pointer to data with 16-bit stride
int cn = 4;
this->depthFinal = cv::Mat::zeros(depthHeight, depthWidth, CV_8UC4); // 8-bit, 4-channel
for (int i = 0; i < this->depthFinal.rows; i++) {
    for (int j = 0; j < this->depthFinal.cols; j++) {
        USHORT realdepth = ((*depthBuffer) & 0x0fff);          // take the 12 LSBs as depth
        BYTE intensity = (BYTE)((255 * realdepth) / 0x0fff);   // scale to 0-255 grayscale
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 0] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 1] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 2] = intensity;
        depthBuffer++;
    }
}
The stripes that you see are due to the wrapping of depth values caused by a modulo-256 operation. Instead of applying the modulo (%256), which is what makes the bands show up, remap the depth values across the entire range, e.g.:
BYTE intensity = depth == 0 || depth > 4095 ? 0 : 255 - (BYTE)(((float)depth / 4095.0f) * 255.0f);
In case your max depth is 2048, replace the 4095 with 2047.
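For illustration, here is a minimal, self-contained sketch (not the sample's exact code) that applies this remapping over a whole raw depth buffer; the buffer pointer and size parameters are assumptions:
#include <opencv2/opencv.hpp>
// Sketch: remap a raw 16-bit depth buffer (assumed to hold plain depth values) to 8-bit
// grayscale without modulo wrapping; zero/out-of-range values become black, near values bright.
cv::Mat remapDepthTo8Bit(const unsigned short* rawDepth, int width, int height)
{
    cv::Mat gray(height, width, CV_8UC1);
    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            unsigned short depth = rawDepth[i * width + j];
            gray.at<uchar>(i, j) = (depth == 0 || depth > 4095)
                ? 0
                : 255 - (uchar)(((float)depth / 4095.0f) * 255.0f);
        }
    }
    return gray;
}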
More pointers:
the Kinect presumably returns an 11-bit value (0-2047), but you only use 8 bits (0-255).
newer Kinect versions seem to return a 12-bit value (0-4095)
in the Kinect Explorer source code, there's a file called DepthColorizer.cs where most of the magic seems to happen. I believe this code is what makes the depth values look so smooth in Kinect Explorer, but I might be wrong.
I faced the same problem while working on a project that involved visualization of a depth map. However, I used the OpenNI SDK with OpenCV instead of the Kinect SDK libraries. The problem was the same, so the solution should work for you as it did for me.
As mentioned in previous answers to your question, the Kinect depth map is 11-bit (0-2047), while the examples use 8-bit data types.
What I did in my code to get around this was to acquire the depth map into a 16-bit Mat and then convert it to an 8-bit uchar Mat using the scaling option of Mat's convertTo function.
First, I initialize a Mat for acquiring the depth data:
Mat depthMat16UC1(XN_VGA_Y_RES, XN_VGA_X_RES, CV_16UC1);
Here XN_VGA_Y_RES and XN_VGA_X_RES define the resolution of the acquired depth map.
The code where I do this is as follows:
depthMat16UC1.data = ((uchar*)depthMD.Data());
depthMat16UC1.convertTo(depthMat8UC1, CV_8U, 0.05f);
imshow("Depth Image", depthMat8UC1);
depthMD is the metadata object containing the data retrieved from the Kinect sensor.
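As a side note, instead of assigning to depthMat16UC1.data directly, the Mat header can be constructed around the OpenNI buffer; a minimal sketch under the same assumptions (the Mat does not own the buffer, so it must stay valid while the Mat is in use):
// Wrap the OpenNI depth buffer in a Mat header without copying (sketch).
cv::Mat depthWrapped(XN_VGA_Y_RES, XN_VGA_X_RES, CV_16UC1,
                     (void*)depthMD.Data());           // no copy; buffer owned by OpenNI
cv::Mat depthWrapped8;
depthWrapped.convertTo(depthWrapped8, CV_8U, 0.05);    // 0.05 maps roughly 0-5100 mm into 0-255
cv::imshow("Depth Image", depthWrapped8);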
I hope this helps you in some way.
The visualization of the depth image data has discrete levels that are coarse (0 to 255 in your code example), but the actual depth data are numbers between 0 and 2047. Still discrete, of course, but not in such coarse units as the colors chosen to depict them.
The Kinect v2 can see up to 8 meters of depth (but accuracy decreases beyond 4.5 meters), and the range starts at around 0.4 meters. So one needs to express a number up to 8000 (millimeters) as a color. One way to do this is to treat the RGB channels as digits of a number; then you could potentially store a value up to 255x255x255 in a pixel (with a different color format it would be different). Storing 8000 in that 255x255x255 range results in a certain combination of R+G+B, and that is what gives this banding effect. But you could of course divide 8000 by some factor, subtract a number, or clip values beyond a certain threshold.
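As an illustration of the simpler alternative (divide or clip rather than packing into RGB), here is a minimal hedged sketch that scales a 16-bit depth Mat in millimeters down to an 8-bit grayscale image; the variable name depthMm is an assumption:
// Sketch: map a CV_16UC1 depth Mat in millimeters (assumed name: depthMm) to 8-bit grayscale.
// convertTo saturate-casts, so anything beyond 8000 mm simply clamps to 255 (white).
cv::Mat depthGray;
depthMm.convertTo(depthGray, CV_8UC1, 255.0 / 8000.0);  // 0-8000 mm -> 0-255
cv::imshow("Depth (8-bit)", depthGray);
cv::waitKey(0);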
Related
Does the second channel of a C2 image represent the alpha channel, or does it just fill the gap between C1 and C3/C4?
You are confusing color spaces with channels. For example, the grayscale color space is represented with 1 channel. Then you have BGR with 3 channels, and BGRA with 4, where the 4th channel is the alpha value. OpenCV supports several types of color spaces.
OpenCV is open to your needs: in some cases you have a Mat with 2 values per pixel, for example dense optical flow results, which store a movement vector (x,y) for each pixel. You may even create a grayscale image with an alpha value for whatever reason or algorithm you have; in that case it will be a CV_8UC2. However, this is not a standard color space in OpenCV, and many algorithms have hard constraints on the color space, so they may not work with this Mat type.
A cv::Mat can even have more than 4 channels (up to 512 the last time I checked; for more info check the constant CV_CN_MAX), but beware that this may not work with all OpenCV functions, and it will act more like a container for your custom algorithms.
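For illustration, a minimal sketch of creating and filling such a non-standard 2-channel Mat (the names and sizes are made up for the example):
// Sketch: a CV_32FC2 Mat used as a per-pixel (x, y) flow container, as described above.
cv::Mat flow(480, 640, CV_32FC2, cv::Scalar(0, 0));     // two float values per pixel
flow.at<cv::Vec2f>(100, 200) = cv::Vec2f(1.5f, -0.5f);  // write the (x, y) vector at row 100, col 200
cv::Vec2f v = flow.at<cv::Vec2f>(100, 200);             // read it back
std::cout << "flow: " << v[0] << ", " << v[1] << std::endl;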
Recently I read a paper in which they extract depth intensity and the distance of each pixel from the camera using a depth image. But as far as I know, each pixel value in a depth image represents a distance in mm [range: 0-65535], so how can they extract a depth intensity in the range [0, 255] from the depth image? I don't understand it. The Kinect sensor returns a uint16 depth frame that contains each pixel's distance from the sensor. It does not return any intensity value, so how can the paper claim to extract depth intensity? I am really confused.
Here is the paper link
This is the graph I want to extract (taken from the paper):
Since there is no answer to this question yet, I will suggest an approach for getting your own depth image data.
One simple way is to scale the image based on the following formula:
Pixel_value = Pixel_value / 4500 * 65535
If you want to see the exact image that you get from uint8, I guess the following steps will work for you.
Probably, while casting the image to uint8, MATLAB first clips the values above some threshold, let's say 4095 = 2^12 - 1 (I'm not sure about the value), and then right-shifts them (4 shifts in our case) to bring them into the 0-255 range.
So I guess multiplying the uint8 value by 256 and casting it to uint16 will help you get back the same image:
Pixel_uint16_value = Pixel_uint8_value * 256   // or: Pixel_uint16_value = Pixel_uint8_value << 8
// don't forget to cast the result to uint16
The other way is converting the raw data to a depth image in millimeters. The depth image should be stored in millimeters as 16-bit unsigned integers. The following two formulas can be used for converting raw data to millimeters:
distance = 1000 / (-0.00307 * rawDisparity + 3.33)
distance = 123.6 * tan(rawDisparity / 2842.5 + 1.1863)
Save each distance value to the corresponding raw-disparity pixel, and save them as 16-bit unsigned grayscale PNG images. Check this link for details.
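For illustration, here is a minimal sketch (the Mat name rawDisparity and the valid-range handling are assumptions) that applies the first formula and can be written out as a 16-bit PNG with cv::imwrite:
// Sketch: convert a CV_16UC1 raw-disparity Mat to a 16-bit depth image in millimeters,
// using distance = 1000 / (-0.00307 * rawDisparity + 3.33).
cv::Mat rawDisparityToMillimeters(const cv::Mat& rawDisparity)
{
    cv::Mat depthMm(rawDisparity.size(), CV_16UC1);
    for (int i = 0; i < rawDisparity.rows; i++) {
        for (int j = 0; j < rawDisparity.cols; j++) {
            double d = 1000.0 / (-0.00307 * rawDisparity.at<ushort>(i, j) + 3.33);
            // negative or out-of-range results (outside the formula's valid range) become 0
            depthMm.at<ushort>(i, j) = (d > 0.0 && d < 65535.0) ? (ushort)d : 0;
        }
    }
    return depthMm;
}
// Usage: cv::imwrite("depth_mm.png", rawDisparityToMillimeters(rawDisparity));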
Quick answer:
You can get the intensity from the intensity of the corresponding IR pixel. Say you have an IR pixel array irdata;
then you can get the intensity of the i-th pixel by
byte intensity = (byte)(irdata[i] >> 8);
The Kinect v2 has only two cameras: one is an RGB camera and the other is an IR camera. It uses the IR camera to calculate the depth of the image using time-of-flight (TOF). If you need more information, please comment here or find my Kinect project on GitHub: https://github.com/shanilfernando/VRInteraction. I'm more than happy to help you.
Edit
As you know, depth is the distance from the Kinect sensor to the object in a given space. The Kinect IR emitter emits a bunch of IR rays and starts counting time. Once the IR rays reflect back to the depth sensor (IR sensor) of the Kinect, it stops the time counter. The time (t) between emitting and receiving that specific ray is called the time-of-flight of that ray. Then the distance (d) between the Kinect and the object can be calculated by
d = (t * speed-of-light)/2
This is done for all the rays it emits in order to build the IR image and the depth image. Each ray represents a pixel in the IR and depth images.
I read your reference paper. First of all, they are NOT using a depth image captured with the Kinect V2. It clearly says the resolution is 640 × 480 with an effective distance range from 0.8 meters to 3.5 meters.
I want you to clearly understand that the depth frame and the depth image are two different things. In the depth frame, each pixel is a distance, while in the depth image each pixel is an intensity (how bright it is).
In this plot they are plotting the intensity of a star point against the actual distance of that star point. They are starting with a depth (intensity) image, NOT a depth frame. A depth frame can be scaled into a depth image with values from 0 to 255, where near points have higher values and farther points have lower values.
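As an illustration of that last scaling step (not the paper's method), a minimal sketch that turns a 16-bit depth frame in millimeters into such a 0-255 intensity image, with nearer points brighter; the variable names and the 4500 mm cut-off are assumptions:
// Sketch: scale a CV_16UC1 depth frame (mm, assumed name: depthFrame) into an 8-bit
// intensity image where near = bright and far = dark; maxDepthMm is an assumed cut-off.
const double maxDepthMm = 4500.0;
cv::Mat depthImage(depthFrame.size(), CV_8UC1);
for (int i = 0; i < depthFrame.rows; i++) {
    for (int j = 0; j < depthFrame.cols; j++) {
        ushort d = depthFrame.at<ushort>(i, j);
        depthImage.at<uchar>(i, j) = (d == 0 || d > maxDepthMm)
            ? 0                                            // invalid / too far -> black
            : (uchar)(255.0 * (1.0 - d / maxDepthMm));     // nearer -> brighter
    }
}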
I guess you were trying to read the depth from a .png image file, because of which the data gets converted to binary form.
I would suggest saving the depth image in .tiff format rather than .png format.
Being a MATLAB/Python guy and a novice in C++, I'm having major frustration moving to OpenCV in C++ for image processing purposes. I'm working with the Kinect v2, so there is only one Windows example I found online, which I'm modifying.
This example gives the depthMap as a cv::Mat, and I've calculated surface normals on this depth image taken from a Kinect v2. This surface normal image contains the i,j,k vector components (3 channels) per row,col element, and I'm trying to visualize it (a 3-D float matrix) as an RGB image. This is really easy to do in MATLAB, since you just do imshow(normMap) and it shows an RGB (ijk) image with the color specifying the orientation of the normal.
I'm trying to do a similar thing in C++. Since I'm using OpenCV, I decided to use cv::Mat to store the ijk channels per pixel, and I initialized the normal matrix (let's call it normMat) as a CV_32F matrix as follows:
int sizes[3] = { height, width, 3 };
cv::Mat normMat(3, sizes, CV_32F, cv::Scalar::all(0));
However, if I debug, the dims (normMat.rows and normMat.cols) are showing -1, so I don't know whether my initialization is bad or if I missed something, or whether it's normal.
I'm storing the surface normal components as:
normMat.at<float>(i, j, 0) = fVec3[0];
normMat.at<float>(i, j, 1) = fVec3[1];
normMat.at<float>(i, j, 2) = fVec3[2];
And they seem to be getting stored correctly as I've verified that in debug (using normMat.at<float>(i,j,k)).
Then I'm trying to display the image as:
normMat.convertTo(normColorMap, CV_8UC3, 255.0 / (4000), 255.0 / 500);
cv::imshow("Normals", normColorMap);
But the second line throws an exception:
OpenCV Error: Assertion failed (p[-1] <= 2) in cv::MatSize::operator.....
I also tried displaying normMat directly, which throws the same exception. So there's obviously something wrong with displaying a 3-channel image stored as a 3D matrix, or with converting it to a 2D, 3-channel Mat. I also tried initializing normMat as
cv::Mat normMat(height, width, CV_32FC3, cv::Scalar::all(0));
But I'm having issues writing data into the 3 channels of a "2D 3-channel matrix" (CV_32FC3), if I may put it this way.
All I want to do is display this normMat as an RGB image. Any inputs or suggestions are highly appreciated. Thanks!
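For reference, per-pixel access on a 2D, 3-channel CV_32FC3 Mat goes through cv::Vec3f; a minimal sketch (the 255.0 scaling assumes components roughly in [0, 1], which is my assumption, not the example's):
// Sketch: store the i,j,k components per pixel in a CV_32FC3 Mat via cv::Vec3f.
cv::Mat normMat3C(height, width, CV_32FC3, cv::Scalar::all(0));
normMat3C.at<cv::Vec3f>(i, j) = cv::Vec3f(fVec3[0], fVec3[1], fVec3[2]);
// A CV_32FC3 Mat can then be converted and shown as an ordinary 3-channel image:
cv::Mat normColorMap3C;
normMat3C.convertTo(normColorMap3C, CV_8UC3, 255.0);   // scaling assumption, see above
cv::imshow("Normals", normColorMap3C);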
I have a vector of Point2f which has color space CV_8UC4, and I need to convert it to CV_64F. Is the following code correct?
points1.convertTo(points1, CV_64F);
More details:
I am trying to use this function to calculate the essential matrix (rotation/translation) through the 5-point algorithm, instead of using the findFundamentalMat included in OpenCV, which is based on the 8-point algorithm:
https://github.com/prclibo/relative-pose-estimation/blob/master/five-point-nister/five-point.cpp#L69
As you can see, it first converts the image to CV_64F. My input image is a CV_8UC4, BGRA image. When I tested the function, both BGRA and grayscale images produce valid matrices from the mathematical point of view, but if I pass a grayscale image instead of a color one, it takes much longer to calculate. That makes me think I'm not doing something correctly in one of the two cases.
I read around that when the change in color space is not linear (which I suppose is the case when you go from 4 channels to 1 like in this case), you should normalize the intensity value. Is that correct? Which input should I give to this function?
Another note, the function is called like this in my code:
vector<Point2f> imgpts1, imgpts2;
for (vector<DMatch>::const_iterator it = matches.begin(); it != matches.end(); ++it)
{
    imgpts1.push_back(firstViewFeatures.second[it->queryIdx].pt);
    imgpts2.push_back(secondViewFeatures.second[it->trainIdx].pt);
}
Mat mask;
Mat E = findEssentialMat(imgpts1, imgpts2, [camera focal], [camera principal_point], CV_RANSAC, 0.999, 1, mask);
The fact that I'm not passing a Mat but a vector of Point2f instead seems to create no problems, as it compiles and executes properly.
Is it the case I should store the matches in a Mat?
I am not sure what you mean by a vector of Point2f in some color space, but if you want to convert a vector of points into a vector of points of another type, you can use any standard C++/STL function like copy(), assign() or insert(). For example:
copy(floatPoints.begin(), floatPoints.end(), back_inserter(doublePoints)); // back_inserter so doublePoints need not be pre-sized
or
doublePoints.insert(doublePoints.end(), floatPoints.begin(), floatPoints.end());
No, it is not. A std::vector<cv::Point2f> cannot make use of the OpenCV convertTo function.
I think you really mean that you have a cv::Mat points1 of type CV_8UC4. Note that those are RxCx4 values (where R and C are the number of rows and columns), and that in a CV_64F matrix you will have only RxC values. So you need to be clearer about how you want to transform those values.
You can do points1.convertTo(points1, CV_64FC4) to get an RxCx4 matrix.
Update:
Some remarks after you updated the question:
Note that a vector<cv::Point2f> is a vector of 2D points that is not associated with any particular color space; they are just coordinates in the image axes. So they represent the same 2D points in a grayscale, RGB or HSV image, and the execution time of findEssentialMat doesn't depend on the image color space. Getting the points may, though.
That said, I think your input to findEssentialMat is OK (the function takes care of the vectors and converts them into its internal representation). In these cases, it is very useful to draw the points on your image to debug the code.
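For example, a minimal sketch of drawing the matched points on the image for debugging (img1 and imgpts1 are assumed to exist as in the question's code):
// Sketch: draw the matched points on a copy of the first image to check they make sense.
cv::Mat debugImg = img1.clone();
for (size_t k = 0; k < imgpts1.size(); ++k)
    cv::circle(debugImg, imgpts1[k], 3, cv::Scalar(0, 255, 0), -1);  // small filled green dots
cv::imshow("matched points", debugImg);
cv::waitKey(0);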
I'm using OpenNI and OpenCV (but without the latest code with OpenNI support). If I just send the depth channel to the screen, it looks dark and it's difficult to make anything out. So I want to show the depth channel to the user in color, but I cannot find how to do that without losing accuracy. Now I do it like this:
xn::DepthMetaData xDepthMap;
depthGen.GetMetaData(xDepthMap);
XnDepthPixel* depthData = const_cast<XnDepthPixel*>(xDepthMap.Data());
cv::Mat depth(frame_height, frame_width, CV_16U, reinterpret_cast<void*>(depthData));
cv::Mat depthMat8UC1;
depth.convertTo(depthMat8UC1, CV_8UC1);
cv::Mat falseColorsMap;
cv::applyColorMap(depthMat8UC1, falseColorsMap, cv::COLORMAP_AUTUMN);
depthWriter << falseColorsMap;
But in this case I get worse output (losing details) than what, for instance, the Kinect software for Windows shows me. So I'm looking for a function in OpenNI or OpenCV with a better transformation.
https://github.com/OpenNI/OpenNI2/blob/master/Samples/Common/OniSampleUtilities.h
The link is to code for histogram equalization. In short, it makes the probability of each level equal and optimizes the mapping between 10,000 levels and 255 levels. That is why the Kinect's yellowish map looks better than the naive I = 255*z/z_range.
NOTE: don't use color for visualization, since the human eye is more sensitive to luminance changes than to color variation. So with 255 levels of luminance you will get better contrast than with 255*255*255 levels of color. If you still decide to go down the color-mapping avenue, use the HSV color space, where you can manipulate hue (0..360 deg) and value (1..0), and preferably set saturation to max. Map depth to hue and value, convert to RGB and display. Then go back to histogram equalization ;)
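Going back to histogram equalization, here is a minimal sketch in that spirit (not the OniSampleUtilities code itself; the function and Mat names are assumptions):
#include <opencv2/opencv.hpp>
#include <vector>
// Sketch: equalize a CV_16UC1 depth Mat to an 8-bit image; zero pixels are treated
// as "no data" and left black, similar in spirit to the OpenNI sample.
cv::Mat equalizeDepth(const cv::Mat& depth16)
{
    std::vector<double> hist(65536, 0.0);                // cumulative histogram of non-zero depths
    int valid = 0;
    for (int i = 0; i < depth16.rows; i++)
        for (int j = 0; j < depth16.cols; j++) {
            ushort d = depth16.at<ushort>(i, j);
            if (d != 0) { hist[d] += 1.0; valid++; }
        }
    for (size_t k = 1; k < hist.size(); k++)
        hist[k] += hist[k - 1];
    cv::Mat out(depth16.size(), CV_8UC1, cv::Scalar(0));
    if (valid == 0) return out;
    for (int i = 0; i < depth16.rows; i++)
        for (int j = 0; j < depth16.cols; j++) {
            ushort d = depth16.at<ushort>(i, j);
            if (d != 0)                                   // nearer values get brighter output
                out.at<uchar>(i, j) = (uchar)(255.0 * (1.0 - hist[d] / valid));
        }
    return out;
}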
Try this:
const float scaleFactor = 0.05f;
depth.convertTo(depthMat8UC1, CV_8UC1, scaleFactor);
imshow("depth gray",depthMat8UC1);
Play with the scale factor to get a result you're happy with (0.05 maps roughly 0-5100 mm into 0-255).