I am working on a 3D model reconstruction application with a Kinect sensor. I use the Microsoft SDK to get depth data, and I want to calculate the real-world location of each point. I have read several articles about this and implemented several depth-calibration methods, but none of them works in my application. The closest calibration was http://openkinect.org/wiki/Imaging_Information
but my result in MeshLab was still not acceptable.
I calculate the depth value with this method:
private double GetDistance(byte firstByte, byte secondByte)
{
    double distance = (double)(firstByte >> 3 | secondByte << 5);
    return distance;
}
Then I used the methods below to calculate the real-world coordinates:
public static float RawDepthToMeters(int depthValue)
{
    if (depthValue < 2047)
    {
        return (float)(0.1 / ((double)depthValue * -0.0030711016 + 3.3309495161));
    }
    return 0.0f;
}
public static Point3D DepthToWorld(int x, int y, int depthValue)
{
    const double fx_d = 5.9421434211923247e+02;
    const double fy_d = 5.9104053696870778e+02;
    const double cx_d = 3.3930780975300314e+02;
    const double cy_d = 2.4273913761751615e+02;

    double depth = RawDepthToMeters(depthValue);
    Point3D result = new Point3D((float)((x - cx_d) * depth / fx_d),
                                 (float)((y - cy_d) * depth / fy_d),
                                 (float)(depth));
    return result;
}
These methods did not work well, and the generated scene was not correct. I then used the method below; the result is better than with the previous method, but it is still not acceptable:
public static Point3D DepthToWorld(int x, int y, int depthValue)
{
    const int w = 640;
    const int h = 480;
    int minDistance = -10;
    double scaleFactor = 0.0021;
    Point3D result = new Point3D((x - w / 2) * (depthValue + minDistance) * scaleFactor * (w / h),
                                 (y - h / 2) * (depthValue + minDistance) * scaleFactor,
                                 depthValue);
    return result;
}
I would appreciate it if you could tell me how to calculate the real-world position from the depth pixel values computed by my method.
The GetDistance() function you're using to compute the real depth corresponds to the Kinect depth format that includes player detection. So check that you are opening your Kinect stream accordingly, or get only the raw depth data:
Runtime nui = Runtime.Kinects[0];
nui.Initialize(RuntimeOptions.UseDepth);
nui.DepthStream.Open(
    ImageStreamType.Depth,
    2,
    ImageResolution.Resolution320x240,
    ImageType.Depth);
and then compute the depth by simply shifting the second byte left by 8 bits:
Distance (0,0) = (int)(Bits[0] | Bits[1] << 8);
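If instead you open the stream with the player index, the lower 3 bits of each 16-bit value carry the player index and the upper 13 bits the depth, which is what your GetDistance() is effectively unpacking. A sketch of both cases (plain C++ bit math here, just for illustration):

#include <cstdint>

// Raw depth stream (no player index): the 16-bit value is the depth itself.
int depthFromRaw(uint8_t lo, uint8_t hi)
{
    return lo | (hi << 8);
}

// Depth + player index stream: bits 0-2 hold the player index, bits 3-15 the depth.
int depthFromPlayerIndexed(uint8_t lo, uint8_t hi, int& playerIndex)
{
    int raw = lo | (hi << 8);
    playerIndex = raw & 0x07;
    return raw >> 3;   // equivalent to (lo >> 3) | (hi << 5) in your GetDistance()
}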
The first calibration method should work reasonably well, although you could improve it a little by using the better approximation given by Stéphane Magnenat:
distance = 0.1236 * tan(rawDisparity / 2842.5 + 1.1863) (in meters)
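For example, a sketch of that formula (written in C++ here for illustration; the same expression carries straight over to the RawDepthToMeters() C# method above):

#include <cmath>

// Stéphane Magnenat's approximation: raw 11-bit disparity value -> depth in meters.
// 2047 marks "no data", so it is excluded, as in RawDepthToMeters() above.
float rawDisparityToMeters(int rawDisparity)
{
    if (rawDisparity < 2047)
    {
        return 0.1236f * std::tan(rawDisparity / 2842.5f + 1.1863f);
    }
    return 0.0f;
}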
If you really need more accurate calibration values, you should calibrate your Kinect yourself, for example with a tool such as the MATLAB Kinect calibration toolbox:
http://sourceforge.net/projects/kinectcalib/
and double-check the obtained values against the ones you are currently using, which were provided by Nicolas Burrus.
EDIT
Reading your question again, I noticed that you are using the Microsoft SDK, so the values returned by the Kinect sensor are already real distances in mm. You do not need the RawDepthToMeters() function; it should be used only with the unofficial SDKs.
The hardware creates a depth map that is a non-linear function of the disparity values, with 11 bits of precision. The Kinect SDK driver converts these disparity values to mm out of the box and rounds them to an integer. The MS Kinect SDK reports a depth range of 800 mm to 4000 mm.
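In other words, the only change needed in your DepthToWorld() is to treat the depth value as millimeters. A sketch (C++ for illustration, reusing the Burrus intrinsics from your question, which correspond to a 640x480 depth image and which you should still verify against your own calibration):

struct Point3D { float x, y, z; };   // stand-in for your Point3D type

// Project a depth pixel (x, y) with depth in millimeters (as returned by the MS SDK)
// into camera-space meters using the depth-camera intrinsics.
Point3D depthMmToWorld(int x, int y, int depthMm)
{
    const double fx_d = 5.9421434211923247e+02;
    const double fy_d = 5.9104053696870778e+02;
    const double cx_d = 3.3930780975300314e+02;
    const double cy_d = 2.4273913761751615e+02;

    double depth = depthMm / 1000.0;   // millimeters -> meters
    Point3D result;
    result.x = (float)((x - cx_d) * depth / fx_d);
    result.y = (float)((y - cy_d) * depth / fy_d);
    result.z = (float)depth;
    return result;
}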
Related
Given a set of 3D points in the camera's perspective corresponding to a planar surface (the ground), is there any fast, efficient method to find the orientation of the plane with respect to the camera's plane? Or is it only possible by running heavier "surface matching" algorithms on the point cloud?
I've tried to use estimateAffine3D and findHomography, but my main limitation is that I don't have the point coordinates on the surface plane itself: I can only select a set of points from the depth images, and thus must work from a set of 3D points in the camera frame.
I've written a simple geometric approach that takes a couple of points and computes vertical and horizontal angles based on the depth measurement, but I fear this is neither very robust nor very precise.
EDIT: Following the suggestion by @Micka, I've attempted to fit the points to a 2D plane in the camera's frame, with the following function:
#include <iostream>
#include <vector>
#include <opencv2/opencv.hpp>

//------------------------------------------------------------------------------
/// \brief Fits a set of 3D points to a plane by solving a system of linear equations of type aX + bY + cZ + d = 0
///
/// \param[in] points The points
///
/// \return 3x1 Mat with the plane coefficients [a, b, d] (assuming c = -1)
///
cv::Mat fitPlane(const std::vector< cv::Point3d >& points) {
    // plane equation: aX + bY + cZ + d = 0
    // assuming c = -1 -> aX + bY + d = Z
    cv::Mat xys = cv::Mat::ones(points.size(), 3, CV_64FC1);
    cv::Mat zs  = cv::Mat::ones(points.size(), 1, CV_64FC1);
    // populate left- and right-hand matrices
    for (int idx = 0; idx < points.size(); idx++) {
        xys.at< double >(idx, 0) = points[idx].x;
        xys.at< double >(idx, 1) = points[idx].y;
        zs.at< double >(idx, 0)  = points[idx].z;
    }
    // coefficient mat
    cv::Mat coeff(3, 1, CV_64FC1);
    // the problem is now xys * coeff = zs
    // solving with SVD back substitution yields coeff
    cv::SVD svd(xys);
    svd.backSubst(zs, coeff);
    // alternative approach -> requires a mat with the 3D coordinates & an additional column of ones
    // solves xyzs * coeff = 0
    // cv::SVD::solveZ(xyzs, coeff); // \note: data type must be double (CV_64FC1)
    // check result with the input coordinates (the plane equation should output zero or very small values)
    double a = coeff.at< double >(0);
    double b = coeff.at< double >(1);
    double d = coeff.at< double >(2);
    for (auto& point : points) {
        std::cout << a * point.x + b * point.y + d - point.z << std::endl;
    }
    return coeff;
}
For simplicity, it is assumed that the camera is properly calibrated and that the 3D reconstruction is correct; I have already validated this previously, so it is out of the scope of this issue. I use the mouse to select points on a depth/color frame pair, reconstruct the 3D coordinates, and pass them into the function above.
I've also tried approaches other than cv::SVD::solveZ(), such as inverting the coordinate matrix with cv::invert() and using cv::solve(), but they always ended in either ridiculously small values or runtime errors regarding matrix size and/or type.
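For completeness, the fitted coefficients can be turned into a plane normal and compared against the camera's optical axis; a minimal sketch (assuming the [a, b, d] output of fitPlane() above):

#include <algorithm>
#include <cmath>
#include <opencv2/core.hpp>

// fitPlane() solves aX + bY + d = Z, i.e. aX + bY - Z + d = 0,
// so the (unnormalized) plane normal in the camera frame is n = (a, b, -1).
cv::Vec3d planeNormal(const cv::Mat& coeff) {
    cv::Vec3d n(coeff.at<double>(0), coeff.at<double>(1), -1.0);
    return cv::normalize(n);
}

// Angle between the fitted plane and the camera's image plane, which equals the
// angle between the plane normal and the optical axis (0, 0, 1). Returns radians.
double angleToCameraPlane(const cv::Mat& coeff) {
    cv::Vec3d n = planeNormal(coeff);
    double c = std::min(1.0, std::abs(n[2]));   // |n . (0, 0, 1)|, clamped for safety
    return std::acos(c);                        // 0 means the plane is parallel to the image plane
}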
I created a custom CIKernel in Metal. This is useful because it is close to real time. I am avoiding any CGContext or CIContext that might introduce lag. My kernel essentially does a Hough transform, but I can't seem to figure out how to read the white points out of the image buffer.
Here is kernel.metal:
#include <CoreImage/CoreImage.h>

extern "C" {
    namespace coreimage {

        float4 hough(sampler src) {
            // Math
            // More Math
            // eventually:
            if (luminance > 0.8) {
                float2 position = src.coord();
                // Somehow add this to an array because I need to know the x,y pair
            }
            return float4(luminance, luminance, luminance, 1.0);
        }
    }
}
I am fine if this part can be extracted into a different kernel or function. The caveat with CIKernel is that its return type is a float4 representing the new color of a pixel. Ideally, instead of an image -> image filter, I would like an image -> array sort of deal, e.g. reduce instead of map. I have a bad hunch this will require me to render it and deal with it on the CPU.
Ultimately I want to retrieve the qualifying coordinates (there can be multiple per image) back in my Swift function.
FINAL SOLUTION EDIT:
As per the suggestions in the answer, I am doing the large per-pixel calculations on the GPU and some math on the CPU. I designed 2 additional kernels that work like the built-in reduction kernels. One kernel returns a 1-pixel-high image of the highest value in each column, and the other kernel returns a 1-pixel-high image of the normalized y-coordinate of that highest value:
/// Returns the maximum value in each column.
///
/// - Parameter src: a sampler for the input texture
/// - Returns: maximum value in the column
float4 maxValueForColumn(sampler src) {
    const float2 size = float2(src.extent().z, src.extent().w);
    /// Destination pixel coordinate, normalized
    const float2 pos = src.coord();
    float maxV = 0;
    for (float y = 0; y < size.y; y++) {
        float v = src.sample(float2(pos.x, y / size.y)).x;
        if (v > maxV) {
            maxV = v;
        }
    }
    return float4(maxV, maxV, maxV, 1.0);
}

/// Returns the normalized coordinate of the maximum value in each column.
///
/// - Parameter src: a sampler for the input texture
/// - Returns: normalized y-coordinate of the maximum value in the column
float4 maxCoordForColumn(sampler src) {
    const float2 size = float2(src.extent().z, src.extent().w);
    /// Destination pixel coordinate, normalized
    const float2 pos = src.coord();
    float maxV = 0;
    float maxY = 0;
    for (float y = 0; y < size.y; y++) {
        float v = src.sample(float2(pos.x, y / size.y)).x;
        if (v > maxV) {
            maxY = y / size.y;
            maxV = v;
        }
    }
    return float4(maxY, maxY, maxY, 1.0);
}
This won't give every pixel where luminance is greater than 0.8, but for my purposes, it returns enough: the highest value in each column, and its location.
Pro: copying only (2 * image width) bytes over to the CPU instead of every pixel saves TONS of time (a few ms).
Con: If you have two major white points in the same column, you will never know. You might have to alter this and do calculations by row instead of column if that fits your use-case.
FOLLOW UP:
There seems to be a problem in rendering the outputs. The float values returned in Metal are not correlated with the UInt8 values I am getting in Swift.
This unanswered question describes the problem.
Edit: This answered question provides a very convenient Metal function. When you call it on a Metal value (e.g. 0.5) and return it, you will get the correct value (e.g. 128) on the CPU.
Check out the filters in the CICategoryReduction category (like CIAreaAverage). They return images that are just a few pixels tall, containing the reduction result. But you still have to render them to be able to read the values in your Swift function.
The problem with using this approach for your problem is that you don't know the number of coordinates you are returning beforehand. Core Image needs to know the extent of the output when it calls your kernel, though. You could just assume a static maximum number of coordinates, but that all sounds tedious.
I think you are better off using Accelerate APIs for iterating the pixels of your image (parallelized, super efficiently) on the CPU to find the corresponding coordinates.
You could do a hybrid approach where you do the per-pixel heavy math on the GPU with Core Image and then do the analysis on the CPU using Accelerate. You can even integrate the CPU part into your Core Image pipeline using a CIImageProcessorKernel.
I am working on a project which involves ArUco markers and OpenCV.
I am quite far along in the project. I can read the rotation vectors and convert them to a rotation matrix using Rodrigues() from OpenCV.
This is an example of a rotation matrix I get:
[0,1,0;
1,0,0;
0,0,-1]
I use the following code.
Mat m33(3, 3, CV_64F);
Mat measured_eulers(3, 1, CV_64F);
Rodrigues(rotationVectors, m33);
measured_eulers = rot2euler(m33);
Degree_euler = measured_eulers * 180 / CV_PI;
I use the predefined rot2euler to convert from the rotation matrix to Euler angles, and I convert the resulting radians to degrees.
rot2euler looks like the following.
Mat rot2euler(const Mat & rotationMatrix)
{
    Mat euler(3, 1, CV_64F);
    double m00 = rotationMatrix.at<double>(0, 0);
    double m02 = rotationMatrix.at<double>(0, 2);
    double m10 = rotationMatrix.at<double>(1, 0);
    double m11 = rotationMatrix.at<double>(1, 1);
    double m12 = rotationMatrix.at<double>(1, 2);
    double m20 = rotationMatrix.at<double>(2, 0);
    double m22 = rotationMatrix.at<double>(2, 2);
    double x, y, z;
    // Assuming the angles are in radians.
    if (m10 > 0.998) { // singularity at north pole
        x = 0;
        y = CV_PI / 2;
        z = atan2(m02, m22);
    }
    else if (m10 < -0.998) { // singularity at south pole
        x = 0;
        y = -CV_PI / 2;
        z = atan2(m02, m22);
    }
    else
    {
        x = atan2(-m12, m11);
        y = asin(m10);
        z = atan2(-m20, m00);
    }
    euler.at<double>(0) = x;
    euler.at<double>(1) = y;
    euler.at<double>(2) = z;
    return euler;
}
If I use the rotation matrix I gave as an example, I get the following Euler angles:
[0; 90; -180]
But I am supposed to get the following:
[-180; 0; 90]
When I use this tool http://danceswithcode.net/engineeringnotes/rotations_in_3d/demo3D/rotations_in_3d_tool.html
you can see that [0; 90; -180] doesn't match the rotation matrix, but [-180; 0; 90] does. (I am aware of the fact that the tool works with ZYX coordinates.)
So the problem is that I get the correct values, but in the wrong order.
Another problem is that this isn't always the case.
For example, the rotation matrix:
[1,0,0;
0,-1,0;
0,0,-1]
gives me the correct Euler angles.
If someone knows a solution to the problem, or can explain how the rot2euler function works exactly, it would be highly appreciated.
Kind Regards
Brent Convens
I guess I am quite late, but I'll answer it nonetheless.
Don't quote me on this (i.e. I'm not 100% certain), but {OPENCV_INSTALLATION_DIR}/apps/interactive-calibration/rotationConverters.cpp is one of the files from the OpenCV 3.3 source code, and it seems to me that OpenCV is giving you Y-Z-X (similar to what is being done in the code above).
The reason I said I wasn't sure is that I just looked at the source code of cv::Rodrigues and it doesn't seem to call that piece of code; the Rodrigues function has the math hardcoded into it. You can check the convention by composing the elementary rotation matrices as R = Ry * Rz * Rx and then looking at the place in the code where there is an acos(R(2,0)) or asin(R(0,2)) or something similar, since one of the elements of R will usually reduce to a single cosine or sine, which tells you which angle is being found.
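A quick numerical check (a sketch of my own, not taken from the OpenCV sources) is to rebuild the matrix from the angles rot2euler returns under an assumed order and compare it with the input. Here I assume the euclideanspace.com convention R = Ry * Rz * Rx, with euler(0) applied about X, euler(1) about Z, and euler(2) about Y:

#include <cmath>
#include <opencv2/core.hpp>

// Elementary rotations (angles in radians).
static cv::Mat Rx(double a) { return (cv::Mat_<double>(3, 3) << 1, 0, 0,  0, std::cos(a), -std::sin(a),  0, std::sin(a), std::cos(a)); }
static cv::Mat Ry(double a) { return (cv::Mat_<double>(3, 3) << std::cos(a), 0, std::sin(a),  0, 1, 0,  -std::sin(a), 0, std::cos(a)); }
static cv::Mat Rz(double a) { return (cv::Mat_<double>(3, 3) << std::cos(a), -std::sin(a), 0,  std::sin(a), std::cos(a), 0,  0, 0, 1); }

// Rebuild R from the rot2euler output under the assumed composition and return the residual.
// A near-zero result confirms the convention for the matrix at hand.
double conventionError(const cv::Mat& R, const cv::Mat& euler) {
    cv::Mat rebuilt = Ry(euler.at<double>(2)) * Rz(euler.at<double>(1)) * Rx(euler.at<double>(0));
    return cv::norm(R - rebuilt);
}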
Not specific to OpenCV, but you could write something like this:
cosine_for_pitch = math.sqrt(pose_mat[0][0] ** 2 + pose_mat[1][0] ** 2)
is_singular = cosine_for_pitch < 10**-6
if not is_singular:
    yaw = math.atan2(pose_mat[1][0], pose_mat[0][0])
    pitch = math.atan2(-pose_mat[2][0], cosine_for_pitch)
    roll = math.atan2(pose_mat[2][1], pose_mat[2][2])
else:
    yaw = math.atan2(-pose_mat[1][2], pose_mat[1][1])
    pitch = math.atan2(-pose_mat[2][0], cosine_for_pitch)
    roll = 0
Here, you could explore more:
https://www.learnopencv.com/rotation-matrix-to-euler-angles/
http://www.staff.city.ac.uk/~sbbh653/publications/euler.pdf
I propose to use the PCL library to do that, with this formulation:
pcl::getEulerAngles(transformation, roll, pitch, yaw);
You just need to declare the roll, pitch, and yaw variables and provide a pre-calculated transformation matrix.
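A minimal usage sketch (assuming the rotation matrix has already been converted to an Eigen type, e.g. with cv::cv2eigen() from opencv2/core/eigen.hpp):

#include <iostream>
#include <Eigen/Geometry>
#include <pcl/common/eigen.h>

// Extract roll/pitch/yaw (in radians) from a 3x3 rotation matrix via PCL.
void printEulerAngles(const Eigen::Matrix3f& rotation) {
    Eigen::Affine3f transform = Eigen::Affine3f::Identity();
    transform.linear() = rotation;   // rotation part of the transform; translation stays zero
    float roll, pitch, yaw;
    pcl::getEulerAngles(transform, roll, pitch, yaw);
    std::cout << roll << " " << pitch << " " << yaw << std::endl;
}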
I've been trying to generate a point cloud from a pair of rectified stereo images. I first obtained the disparity map using OpenCV's SGBM implementation. I then converted it to a point cloud using the following code:
for (int u = 0; u < left.rows; ++u)
{
    for (int v = 0; v < left.cols; ++v)
    {
        if (disp.at<int>(u, v) == 0) continue;
        pcl::PointXYZRGB p;
        p.x = v;
        p.y = u;
        p.z = (left_focalLength * baseLine * 0.01 / disp.at<int>(u, v));
        std::cout << p.z << std::endl;
        cv::Vec3b bgr(left.at<cv::Vec3b>(u, v));
        p.b = bgr[0];
        p.g = bgr[1];
        p.r = bgr[2];
        pc.push_back(p);
    }
}
left is the left image, disp is the output disparity image in CV_16S.
Is my disparity-to-PCL conversion correct, or is it a problem with the disparity values?
I've included a screenshot of the disparity map, point cloud and original left image.
Thank you!
I'm not confident with this language, but I noticed one thing.
Assuming that this line converts disparity to depth (Z):
p.z = (left_focalLength * baseLine * 0.01 / disp.at<int>(u,v));
What is 0.01? If this calculation gives you a range of depths (Z) from 1 to 10, this factor reduces your range to 0.01 to 0.1. The depth is always close to zero and you get a flat image (flat image = constant depth).
P.S. I do not see in your code the X, Y conversion from the u, v pixel values using the Z value; something like
X = u*Z/f
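Spelled out, a minimal back-projection sketch (assuming fx, fy, cx, cy are the intrinsics of the rectified left camera, which are not shown in the question):

#include <pcl/point_types.h>

// Back-project an image pixel (col, row) with depth Z into the camera frame.
// Z must already be metric (focal length in pixels, baseline in the desired unit).
pcl::PointXYZ backProject(float col, float row, float Z,
                          float fx, float fy, float cx, float cy)
{
    pcl::PointXYZ p;
    p.x = (col - cx) * Z / fx;   // X = (u - cx) * Z / fx
    p.y = (row - cy) * Z / fy;   // Y = (v - cy) * Z / fy
    p.z = Z;
    return p;
}

Alternatively, cv::reprojectImageTo3D() with the Q matrix from cv::stereoRectify() performs the whole disparity-to-3D conversion in one call.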
I am trying to port an existing FFT based low-pass filter to iOS using the Accelerate vDSP framework.
It seems like the FFT works as expected for about the first 1/4 of the sample, but after that the results seem wrong and, even more oddly, are mirrored (the last half of the signal mirrors most of the first half).
You can see the results from a test application below. First the original sampled data is plotted, then an example of the expected filtered result (filtering out signal higher than 15 Hz), and finally the results of my current FFT code (note that the desired result and the example FFT result are at a different scale than the original data):
The actual code for my low-pass filter is as follows:
double *lowpassFilterVector(double *accell, uint32_t sampleCount, double lowPassFreq, double sampleRate )
{
    double stride = 1;
    int ln = log2f(sampleCount);
    int n = 1 << ln;

    // So that we get an FFT of the whole data set, we pad out the array to the next highest power of 2.
    int fullPadN = n * 2;
    double *padAccell = malloc(sizeof(double) * fullPadN);
    memset(padAccell, 0, sizeof(double) * fullPadN);
    memcpy(padAccell, accell, sizeof(double) * sampleCount);

    ln = log2f(fullPadN);
    n = 1 << ln;
    int nOver2 = n/2;

    DSPDoubleSplitComplex A;
    A.realp = (double *)malloc(sizeof(double) * nOver2);
    A.imagp = (double *)malloc(sizeof(double) * nOver2);

    // This can be reused, just including it here for simplicity.
    FFTSetupD setupReal = vDSP_create_fftsetupD(ln, FFT_RADIX2);

    vDSP_ctozD((DSPDoubleComplex*)padAccell, 2, &A, 1, nOver2);

    // Use the FFT to get frequency counts
    vDSP_fft_zripD(setupReal, &A, stride, ln, FFT_FORWARD);

    const double factor = 0.5f;
    vDSP_vsmulD(A.realp, 1, &factor, A.realp, 1, nOver2);
    vDSP_vsmulD(A.imagp, 1, &factor, A.imagp, 1, nOver2);

    A.realp[nOver2] = A.imagp[0];
    A.imagp[0] = 0.0f;
    A.imagp[nOver2] = 0.0f;

    // Set frequencies above target to 0.
    // This tells us which bin the frequencies over the minimum desired correspond to
    NSInteger binLocation = (lowPassFreq * n) / sampleRate;

    // We add 2 because bin 0 holds special FFT meta data, so bins really start at "1" - and we want to filter out anything OVER the target frequency
    for ( NSInteger i = binLocation+2; i < nOver2; i++ )
    {
        A.realp[i] = 0;
    }

    // Clear out all imaginary parts
    bzero(A.imagp, (nOver2) * sizeof(double));
    //A.imagp[0] = A.realp[nOver2];

    // Now shift back all of the values
    vDSP_fft_zripD(setupReal, &A, stride, ln, FFT_INVERSE);

    double *filteredAccell = (double *)malloc(sizeof(double) * fullPadN);

    // Converts complex vector back into 2D array
    vDSP_ztocD(&A, stride, (DSPDoubleComplex*)filteredAccell, 2, nOver2);

    // Have to scale results to account for Apple's FFT library algorithm, see:
    // http://developer.apple.com/library/ios/#documentation/Performance/Conceptual/vDSP_Programming_Guide/UsingFourierTransforms/UsingFourierTransforms.html#//apple_ref/doc/uid/TP40005147-CH202-15952
    double scale = (float)1.0f / fullPadN;//(2.0f * (float)n);
    vDSP_vsmulD(filteredAccell, 1, &scale, filteredAccell, 1, fullPadN);

    // Tracks results of conversion
    printf("\nInput & output:\n");
    for (int k = 0; k < sampleCount; k++)
    {
        printf("%3d\t%6.2f\t%6.2f\t%6.2f\n", k, accell[k], padAccell[k], filteredAccell[k]);
    }

    // Acceleration data will be replaced in-place.
    return filteredAccell;
}
In the original code the library handled input data whose size was not a power of two; in my Accelerate code I pad the input out to the next power of two. In the test case below, the original sample data is 1000 samples, so it is padded to 1024. I don't think that affects the results, but I mention it for the sake of possible differences.
If you want to experiment with a solution, you can download the sample project that generates the graphs here (in the FFTTest folder):
FFT Example Project code
Thanks for any insight; I've not worked with FFTs before, so I feel like I am missing something critical.
If you want a strictly real (not complex) result, then the data before the IFFT must be conjugate symmetric. If you don't want the result to be mirror symmetric, then don't zero the imaginary component before the IFFT. Merely zeroing bins before the IFFT creates a filter with a huge amount of ripple in the passband.
The Accelerate framework also supports more FFT lengths than just powers of 2.
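Concretely, in the routine above the mirror symmetry comes from the bzero() that clears all of A.imagp. A sketch of one way to apply this advice to the posted code (not a tested drop-in; it also drops the A.realp[nOver2]/A.imagp[nOver2] writes, which index one past the nOver2-sized buffers) is to zero the real and imaginary parts together, and only for the bins above the cutoff, so that the packed half-spectrum still describes a real signal:

// Zero out everything above the cutoff bin, real AND imaginary together,
// instead of the realp-only loop followed by bzero(A.imagp, ...).
for (NSInteger i = binLocation + 1; i < nOver2; i++)
{
    A.realp[i] = 0.0;
    A.imagp[i] = 0.0;
}
// In vDSP's packed format A.imagp[0] holds the Nyquist bin; it lies above the
// cutoff here, so clear it as well.
A.imagp[0] = 0.0;

Even then, a hard cutoff leaves the passband ripple mentioned above; tapering the bins around binLocation instead of zeroing them outright reduces it.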