Multidimensional cv::Mat initialization and display - opencv

Being a Matlab/Python guy and a novice in C++, I'm having major frustration moving to OpenCV in C++ for image processing purposes. I'm working with the Kinect v2, so there is only one Windows example I found online, which I'm modifying.
This example gives the depthMap as a cv::Mat, and I've calculated surface normals on this depth image taken from the Kinect v2. This surface normal image contains the i,j,k vector components (3 channels) per row,col element, and I'm trying to visualize it (a 3-D float matrix) as an RGB image. This is really easy to do in Matlab since you just do imshow(normMap) and it shows an RGB (ijk) image with the color specifying the orientation of the normal.
I'm trying to do a similar thing in C++. Since I'm using OpenCV, I decided to use cv::Mat to store the ijk channels per pixel, and I initialized the normal matrix (let's call it normMat) as a CV_32F matrix as follows:
int sizes[3] = { height, width, 3 };
cv::Mat normMat(3, sizes, CV_32F, cv::Scalar::all(0));
However, when I debug, the dims (normMat.rows and normMat.cols) show -1, so I don't know whether my initialization is bad, whether I missed something, or whether that's normal.
I'm storing the surface normal components as:
normMat.at<float>(i, j, 0) = fVec3[0];
normMat.at<float>(i, j, 1) = fVec3[1];
normMat.at<float>(i, j, 2) = fVec3[2];
And they seem to be getting stored correctly as I've verified that in debug (using normMat.at<float>(i,j,k)).
Then I'm trying to display the image as:
normMat.convertTo(normColorMap, CV_8UC3, 255.0 / (4000), 255.0 / 500);
cv::imshow("Normals", normColorMap);
But the second line throws an exception:
OpenCV Error: Assertion failed (p[-1] <= 2) in cv::MatSize::operator.....
I also tried displaying normMat directly, which throws the same exception. This means there's obviously something wrong with displaying a 3-channel image as a 3-D matrix, or with converting it to a 2-D, 3-channel Mat. I also tried initializing normMat as
cv::Mat normMat(height, width, CV_32FC3, cv::Scalar::all(0));
But I'm having issues adding data to the 3 channels of a "2D 3 channel matrix" (CV_32FC3), if I may put it that way.
All I want to do is display this normMat as an RGB image. Any inputs or suggestions are highly appreciated. Thanks!
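In case it's useful, here is a minimal sketch of the 2-D, 3-channel route (a CV_32FC3 Mat indexed with cv::Vec3f), which imshow can display directly. The 512x424 size and the assumption that the normals are unit length are illustrative, not taken from the question:

#include <opencv2/opencv.hpp>

int main()
{
    const int height = 424, width = 512;   // Kinect v2 depth resolution (assumed)
    cv::Mat normMat(height, width, CV_32FC3, cv::Scalar::all(0));

    // A 2-D, 3-channel Mat is indexed with cv::Vec3f, not with a third subscript.
    normMat.at<cv::Vec3f>(100, 200) = cv::Vec3f(0.0f, 0.0f, 1.0f);

    // Normal components lie in [-1, 1]; map them to [0, 255] for display.
    cv::Mat normColorMap;
    normMat.convertTo(normColorMap, CV_8UC3, 127.5, 127.5);
    cv::imshow("Normals", normColorMap);
    cv::waitKey(0);
    return 0;
}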

Related

How to use OpenCV stereoCalibrate output to map pixels from one camera to another

Context: I have two cameras of very different focus and size that I want to align for image processing. One is RGB, one is near-infrared. The cameras are in a static rig, so they are fixed relative to each other. Because the image focus/width are so different, it's hard to even get both images to recognize the chessboard at the same time; it pretty much only works when the chessboard is centered in both images with very little skew/tilt.
I need to perform computations on the aligned images, so I need as good of a mapping between the optical frames as I can get. Right now the results I'm getting are pretty far off. I'm not sure if I'm using the method itself wrong, or if I am misusing the output. Details and image below.
Computation: I am using OpenCV stereoCalibrate to estimate the rotation and translation matrices with the following code, and throwing out bad results based on final error.
int flag = cv::CALIB_FIX_INTRINSIC;
double err = cv::stereoCalibrate(
    temp_points_object_vec, temp_points_alignvec, temp_points_basevec,
    camera_mat_align, camera_distort_align, camera_mat_base, camera_distort_base,
    mat_align.size(), rotate_mat, translate_mat, essential_mat, F, flag,
    cv::TermCriteria(cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 30, 1e-6));
if (last_error_ == -1.0 || (err < last_error_ + improve_threshold_)) {
    // -1.0 indicates the first calibration, so accept the points; otherwise the error is acceptable.
    points_alignvec_.push_back(addalign);
    points_basevec_.push_back(addbase);
    points_object_vec_.push_back(object_points);
}
The result doesn't produce an OpenCV error as is, and due to the large difference between images, more than half of the matched points are rejected. Results are much better since I added the conditional on the error, but still pretty poor. Error as computed above starts around 30, but doesn't get lower than 15-17. For comparison, I believe a "good" error would be <1. So for starters, I know the output isn't great, but on top of that, I'm not sure I'm using the output right for validating visually. I've attached images showing some of the best and worst results I see. The middle image on the right of each shows the "cross-validated" chessboard keypoints. These are computed like this (note addalign is the temporary vector containing only the chessboard keypoints from the current image in the frame to be aligned):
for (int i = 0; i < addalign.size(); i++) {
    cv::Point2f validate_pt; // = rotate_mat * addalign.at(i) + translate_mat;
    // Project pixel from image aligned to 3D
    cv::Point3f ray3d = align_camera_model_.projectPixelTo3dRay(addalign.at(i));
    // Rotate and translate
    rotate_mat.convertTo(rotate_mat, CV_32F);
    cv::Mat temp_result = rotate_mat * cv::Mat(ray3d, false);
    cv::Point3f ray_transformed;
    temp_result.copyTo(cv::Mat(ray_transformed, false));
    cv::Mat tmat = cv::Mat(translate_mat, false);
    ray_transformed.x += tmat.at<float>(0);
    ray_transformed.y += tmat.at<float>(1);
    ray_transformed.z += tmat.at<float>(2);
    // Reproject to base image pixel
    cv::Point2f pixel = base_camera_model_.project3dToPixel(ray_transformed);
    corners_validated.push_back(pixel);
}
Here are two images showing sample outputs, including both raw images, both images with "drawChessboard," and a cross-validated image showing the base image with above-computed keypoints translated from the alignment image.
Better result
Worse result
In the computation of corners_validated, I'm not sure I'm using rotate_mat and translate_mat correctly. I'm sure there is probably an OpenCV method that does this more efficiently, but I just did it the way that made sense to me at the time.
Also relevant: this is all inside a ROS package, using ROS Noetic on Ubuntu 20.04, which only permits the use of OpenCV 4.2, so I don't have access to some of the newer OpenCV methods.
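For reference, a hypothetical sketch of the same rotate/translate/reproject step expressed with cv::projectPoints (available in OpenCV 4.2), which additionally applies the base camera's distortion model. The function and parameter names below are placeholders for the rays built from addalign and the stereoCalibrate outputs, not code from the actual rig:

#include <opencv2/calib3d.hpp>
#include <opencv2/core.hpp>
#include <vector>

std::vector<cv::Point2f> validateCorners(const std::vector<cv::Point3f>& rays_align,
                                         const cv::Mat& rotate_mat,
                                         const cv::Mat& translate_mat,
                                         const cv::Mat& camera_mat_base,
                                         const cv::Mat& camera_distort_base)
{
    cv::Mat rvec;
    cv::Rodrigues(rotate_mat, rvec);  // projectPoints expects a rotation vector
    std::vector<cv::Point2f> corners_validated;
    cv::projectPoints(rays_align, rvec, translate_mat,
                      camera_mat_base, camera_distort_base, corners_validated);
    return corners_validated;
}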

OpenCV inRange function behaves differently when called from Objective-C++

I have used the excellent answer to the question here:
How to detect bullet holes on the target using python
I have verified that it works in both Python 2 and 3.6, but I would like to use the concept in an iOS application written in Objective C(++). This is my attempt at translating it. Ultimately, I need it to work with an image taken by the camera, so I don't want to use imread, but I've checked that this makes no difference.
UIImage *nsi = [UIImage imageNamed:@"CANDX.jpg"];
cv::Mat original;
UIImageToMat(nsi, original);
cv::Mat thresholded;
cv::inRange(original, cv::Scalar(40,40,40), cv::Scalar(160,160,160), thresholded);
cv::Mat kernel = cv::Mat::ones(10, 10, CV_64FC1);
cv::Mat opening;
cv::morphologyEx(thresholded, opening, cv::MORPH_OPEN, kernel);
vector<vector<cv::Point>> contours;
cv::findContours(opening, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);
The call to inRange, with the same values as the Python version, gives a completely black image. Indeed, it is impossible to pick values for the lower and upper bounds that do not result in this outcome. I've tried converting the image to HSV and using HSV values for the lower and upper bounds. This makes a slight difference in that I can get some vaguely recognisable outcomes, but nothing like the useful result I should be getting.
If I substitute the 'thresholded' image from the answer and comment out the inRange call, the morphology and findContours calls work okay.
Am I doing something wrong in setting up the inRange call?
As you mention in the comments, the data type of original is CV_8UC4 -- i.e. it's a 4 channel image. However, in your call to cv::inRange, you provide ranges for only 3 channels.
cv::Scalar represents a 4-element vector. When you call the constructor with only 3 values, a default value of 0 is used for the 4-th element.
Hence, your call to inRange is actually equivalent to this:
cv::inRange(original, cv::Scalar(40,40,40,0), cv::Scalar(160,160,160,0), thresholded);
You're looking only for pixels that have the alpha channel set to 0 (fully transparent). Since the image came from a camera, it's highly unlikely there will be any transparent pixels -- the alpha channel is probably just all 255s.
There are 2 options to solve this:
1. Drop the unneeded alpha channel. One way to do this is to use cv::cvtColor, e.g.
cv::cvtColor(original, original, cv::COLOR_BGRA2BGR);
2. Specify desired range for all the channels, e.g.
cv::inRange(original, cv::Scalar(40,40,40,0), cv::Scalar(160,160,160,255), thresholded);
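If it helps, the effect is easy to reproduce with a small self-contained check; the synthetic opaque BGRA image below stands in for the camera frame from the question:

#include <iostream>
#include <opencv2/opencv.hpp>

int main()
{
    // Opaque mid-gray BGRA image: every pixel is (100, 100, 100, 255).
    cv::Mat original(100, 100, CV_8UC4, cv::Scalar(100, 100, 100, 255));

    cv::Mat maskWrong, maskFixed;
    // 3-value Scalars: the fourth element defaults to 0, so alpha must be 0 to match.
    cv::inRange(original, cv::Scalar(40, 40, 40), cv::Scalar(160, 160, 160), maskWrong);

    // Option 1: drop the alpha channel first.
    cv::Mat bgr;
    cv::cvtColor(original, bgr, cv::COLOR_BGRA2BGR);
    cv::inRange(bgr, cv::Scalar(40, 40, 40), cv::Scalar(160, 160, 160), maskFixed);

    std::cout << "matched with 3-value Scalar on BGRA: " << cv::countNonZero(maskWrong) << "\n"
              << "matched after BGRA2BGR:              " << cv::countNonZero(maskFixed) << std::endl;
    return 0;
}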

Converting matches from 8-bit 4 channels to 64-bit 1 channel in OpenCV

I have a vector of Point2f which has color space CV_8UC4, and I need to convert it to CV_64F. Is the following code correct?
points1.convertTo(points1, CV_64F);
More details:
I am trying to use this function to calculate the essential matrix (rotation/translation) through the 5-point algorithm, instead of using the findFundamentalMat included in OpenCV, which is based on the 8-point algorithm:
https://github.com/prclibo/relative-pose-estimation/blob/master/five-point-nister/five-point.cpp#L69
As you can see, it first converts the image to CV_64F. My input image is a CV_8UC4, BGRA image. When I tested the function, both BGRA and greyscale images produce valid matrices from the mathematical point of view, but if I pass a greyscale image instead of color, it takes much longer to calculate. This makes me think I'm not doing something correctly in one of the two cases.
I read around that when the change in color space is not linear (which I suppose is the case when you go from 4 channels to 1 like in this case), you should normalize the intensity value. Is that correct? Which input should I give to this function?
Another note, the function is called like this in my code:
vector<Point2f> imgpts1, imgpts2;
for (vector<DMatch>::const_iterator it = matches.begin(); it != matches.end(); ++it)
{
    imgpts1.push_back(firstViewFeatures.second[it->queryIdx].pt);
    imgpts2.push_back(secondViewFeatures.second[it->trainIdx].pt);
}
Mat mask;
Mat E = findEssentialMat(imgpts1, imgpts2, [camera focal], [camera principal_point], CV_RANSAC, 0.999, 1, mask);
The fact that I'm not passing a Mat but a vector of Point2f seems to create no problems, as it compiles and executes properly.
Should I instead store the matches in a Mat?
I am not sure what you mean by a vector of Point2f in some color space, but if you want to convert a vector of points into a vector of points of another type, you can use any standard C++/STL function like copy(), assign() or insert(). For example:
copy(floatPoints.begin(), floatPoints.end(), back_inserter(doublePoints)); // back_inserter grows doublePoints as needed
or
doublePoints.insert(doublePoints.end(), floatPoints.begin(), floatPoints.end());
No, it is not. A std::vector<cv::Point2f> cannot make use of the OpenCV convertTo function.
I think you really mean that you have a cv::Mat points1 of type CV_8UC4. Note that those are RxCx4 values (where R and C are the number of rows and columns), while a CV_64F matrix will have only RxC values. So, you need to be more clear on how you want to transform those values.
You can do points1.convertTo(points1, CV_64FC4) to get an RxCx4 matrix.
Update:
Some remarks after you updated the question:
Note that a vector<cv::Point2f> is a vector of 2D points that is not associated with any particular color space; they are just coordinates in the image axes. So, they represent the same 2D points in a grey, RGB or HSV image, and the execution time of findEssentialMat doesn't depend on the image color space. Getting the points may, though.
That said, I think your input for findEssentialMat is OK (the function takes care of the vectors and converts them into its internal representation). In these cases, it is very useful to draw the points on your image to debug the code.
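For example, a small helper along those lines (the names are illustrative, not from the question) could be:

#include <opencv2/opencv.hpp>
#include <vector>

// Draw a point vector onto a copy of the image so the matches can be eyeballed.
void drawPointsForDebug(const cv::Mat& img, const std::vector<cv::Point2f>& pts)
{
    cv::Mat canvas = img.clone();
    for (const cv::Point2f& p : pts)
        cv::circle(canvas, cv::Point(cvRound(p.x), cvRound(p.y)), 3,
                   cv::Scalar(0, 0, 255), -1);   // filled red dot (BGR)
    cv::imshow("matched points", canvas);
    cv::waitKey(0);
}

Calling it with the image and imgpts1 (and again with imgpts2 on the second view) makes it obvious whether the matches line up before they are passed to findEssentialMat.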

OpenCV MedianBlur function crashing when Mat is of type CV_32S

I am working on demosaicing a Bayer pattern to an RGB image without using OpenCV's direct conversion function. I have used bilinear interpolation and got it to work, but I want to improve the quality by using the Freeman method. This method requires a median filter, and OpenCV's medianBlur function does that. But I am having trouble using this function: when the cv::Mat to which I apply medianBlur is of type CV_8UC1 it works, but if it is of type CV_32S, it does not.
This does NOT work:
redGreenMedian.create(input.size(), CV_32S);
blueGreenMedian.create(input.size(), CV_32S);
blueMinusGreen.create(input.size(), CV_32S);
redMinusGreen.create(input.size(), CV_32S);
for (int i = 1; i <= 3; i += 2)
{
    cv::medianBlur(redMinusGreen, redGreenMedian, i);
    cv::medianBlur(blueMinusGreen, blueGreenMedian, i);
}
If I change all CV_32S to CV_8UC1, then it works. On debugging I found that it crashes in the second iteration only, not in the first one. However, I need it to run for both iterations. It does NOT work when written separately either:
cv::medianBlur(redMinusGreen, redGreenMedian, 3);
As an aside, I do not have to use CV_32S, but I need the ability to store negative numbers in the matrices.
NOTE: I have tried making all the numbers in the matrices positive and then using medianBlur, but it still did not work.
All help is appreciated. Thanks in advance.
The OpenCV documentation seems to be very clear:
src – input 1-, 3-, or 4-channel image; when ksize is 3 or 5, the image depth should be CV_8U, CV_16U, or CV_32F, for larger aperture sizes, it can only be CV_8U.
As you need to cater for signed values, I think your best option is to use CV_32F.
Also, the documentation says
ksize – aperture linear size; it must be odd and greater than 1, for example: 3, 5, 7
Your loop applies kernel sizes of 1 and 3 (if I read your code correctly). The first of these is invalid (ksize must be greater than 1), which possibly explains why your first iteration doesn't crash: it likely bails out (or just copies the input) before ever reaching the depth check, while ksize 3 on a CV_32S image fails the assertion.
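A minimal sketch of that suggestion, assuming redMinusGreen holds the signed red-minus-green plane from the question:

#include <opencv2/opencv.hpp>

cv::Mat medianOfSignedPlane(const cv::Mat& redMinusGreen)
{
    cv::Mat planeF, medianF;
    redMinusGreen.convertTo(planeF, CV_32F);   // CV_32F keeps the negative values
    cv::medianBlur(planeF, medianF, 3);        // ksize 3 is valid for CV_8U, CV_16U and CV_32F
    return medianF;
}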

Kinect Depth Image

In my application I am getting a depth frame similar to the one retrieved in the Depth Basics Sample. What I don't understand is why there are discrete levels in the image; I don't know what to call these sudden changes in depth values. Clearly, half of my right hand is all black and my left hand seems divided into 3 such levels. What is this and how do I remove it?
When I run the KinectExplorer Sample app I get the depth as follows. This is the depth image I want to generate from the raw depth data.
I am using Microsoft Kinect SDK's (v1.6) NuiApi along with OpenCV. I have the following code:
BYTE *pBuffer = (BYTE*)depthLockedRect.pBits; //pointer to data having 8-bit jump
USHORT *depthBuffer = (USHORT*) pBuffer; //pointer to data having 16-bit jump
int cn = 4;
this->depthFinal = cv::Mat::zeros(depthHeight,depthWidth,CV_8UC4); //8bit 4 channel
for (int i = 0; i < this->depthFinal.rows; i++) {
    for (int j = 0; j < this->depthFinal.cols; j++) {
        USHORT realdepth = ((*depthBuffer) & 0x0fff);        //Taking 12LSBs for depth
        BYTE intensity = (BYTE)((255 * realdepth) / 0x0fff); //Scaling to 255 scale grayscale
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 0] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 1] = intensity;
        this->depthFinal.data[i*this->depthFinal.cols*cn + j*cn + 2] = intensity;
        depthBuffer++;
    }
}
The stripes that you see are due to the wrapping of depth values caused by the %256 operation. Instead of applying the modulo operation (%256), which is what makes the bands show up, remap the depth values along the entire range, e.g.:
BYTE intensity = depth == 0 || depth > 4095 ? 0 : 255 - (BYTE)(((float)depth / 4095.0f) * 255.0f);
In case your max depth is 2048, replace the 4095 with 2047.
More pointers:
The Kinect presumably returns an 11-bit value (0-2047), but you only use 8 bits (0-255).
Newer Kinect versions seem to return a 12-bit value (0-4095).
In the Kinect Explorer source code there's a file called DepthColorizer.cs where most of the magic seems to happen. I believe that this code is what makes the depth values so smooth in Kinect Explorer, but I might be wrong.
I faced the same problem while working on a project that involved visualization of a depth map. However, I used the OpenNI SDK with OpenCV instead of the Kinect SDK libraries. The problem was the same, so the solution should work for you as it did for me.
As mentioned in previous answers to your question, the Kinect depth map is 11-bit (0-2047), while the examples use 8-bit data types.
What I did in my code to get around this was to acquire the depth map into a 16-bit Mat and then convert it to an 8-bit uchar Mat using the scaling options of Mat's convertTo function.
First I initialize a Mat for acquiring depth data
Mat depthMat16UC1(XN_VGA_Y_RES, XN_VGA_X_RES, CV_16UC1);
Here XN_VGA_Y_RES and XN_VGA_X_RES define the resolution of the acquired depth map.
The code where I do this is as follows:
depthMat16UC1.data = ((uchar*)depthMD.Data());
depthMat16UC1.convertTo(depthMat8UC1, CV_8U, 0.05f);
imshow("Depth Image", depthMat8UC1);
depthMD is metadata containing the data retrieved from Kinect sensor.
I hope this helps you in some way.
The visualization of the depth image data has discrete levels that are coarse (0 to 255 in your code example), but the actual depth image data are numbers between 0 and 2047. Still discrete, of course, but not in such coarse units as the colors chosen to depict them.
The Kinect v2 can see up to 8 meters of depth (though accuracy decreases beyond 4.5 m), and it starts at around 0.4 meters. So one needs to express a number of up to about 8000 as a color.
One way to do this is to use the RGB channels simply as the digits of a number, so you could potentially store a number up to 255x255x255 in a pixel (with a different color format it would be different). Storing 8000 within that 255x255x255 maximum results in a particular combination of R+G+B values, and that is what gives this banding effect. But you could of course divide the 8000, subtract a number, or discard values beyond a certain threshold.
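As a concrete illustration of scaling the full range down rather than letting it wrap, assuming depth16 is a CV_16UC1 frame in millimetres (roughly 0-8000 for the Kinect v2):

#include <opencv2/opencv.hpp>

cv::Mat depthToDisplay(const cv::Mat& depth16)
{
    cv::Mat depth8;
    depth16.convertTo(depth8, CV_8U, 255.0 / 8000.0);  // whole range maps to 0-255, no wrapping
    return depth8;
}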
