Estimate camera orientation from ground 3D points? - opencv

Given a set of 3D points in the camera's frame corresponding to a planar surface (the ground), is there any fast, efficient method to find the orientation of that plane relative to the camera's image plane? Or is it only possible by running heavier "surface matching" algorithms on the point cloud?
I've tried to use estimateAffine3D and findHomography, but my main limitation is that I don't have the point coordinates on the surface plane - I can only select a set of points from the depth images and thus must work from a set of 3D points in the camera frame.
I've written a simple geometric approach that takes a couple of points and computes vertical and horizontal angles based on the depth measurements, but I fear this is neither very robust nor very precise.
EDIT: Following the suggestion by @Micka, I've attempted to fit the points to a plane in the camera's frame, with the following function:
#include <iostream>
#include <opencv2/opencv.hpp>

//------------------------------------------------------------------------------
/// @brief Fits a set of 3D points to a plane, by solving a system of linear equations of type aX + bY + cZ + d = 0
///
/// @param[in] points The points
///
/// @return 3x1 Mat with the plane equation coefficients [a, b, d], with c fixed to -1 (i.e. z = aX + bY + d)
///
cv::Mat fitPlane(const std::vector< cv::Point3d >& points) {
    // plane equation: aX + bY + cZ + d = 0
    // assuming c = -1 -> aX + bY + d = Z
    cv::Mat xys = cv::Mat::ones(static_cast< int >(points.size()), 3, CV_64FC1);
    cv::Mat zs  = cv::Mat::ones(static_cast< int >(points.size()), 1, CV_64FC1);
    // populate left- and right-hand matrices
    for (int idx = 0; idx < static_cast< int >(points.size()); idx++) {
        xys.at< double >(idx, 0) = points[idx].x;
        xys.at< double >(idx, 1) = points[idx].y;
        zs.at< double >(idx, 0)  = points[idx].z;
    }
    // coefficient mat
    cv::Mat coeff(3, 1, CV_64FC1);
    // the problem is now xys * coeff = zs
    // solving via SVD back-substitution yields the least-squares coeff
    cv::SVD svd(xys);
    svd.backSubst(zs, coeff);
    // alternative approach -> requires a mat with the 3D coordinates & an additional column of ones
    // solves xyzs * coeff = 0
    // cv::SVD::solveZ(xyzs, coeff); // @note: data type must be double (CV_64FC1)
    // check the result with the input coordinates (the plane equation should output null or very small values)
    double a = coeff.at< double >(0);
    double b = coeff.at< double >(1);
    double d = coeff.at< double >(2);
    for (auto& point : points) {
        std::cout << a * point.x + b * point.y + d - point.z << std::endl;
    }
    return coeff;
}
For simplicity, it is assumed that the camera is properly calibrated and that the 3D reconstruction is correct - something I have already validated, and which is therefore out of the scope of this issue. I use the mouse to select points on a depth/color frame pair, reconstruct the 3D coordinates and pass them into the function above.
I've also tried other approaches beyond cv::SVD::solveZ(), such as inverting the system matrix with cv::invert() and using cv::solve(), but it always ended in either ridiculously small values or runtime errors regarding matrix size and/or type.
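As an aside, cv::solve(xys, zs, coeff, cv::DECOMP_SVD) should give the same least-squares fit in a single call. Once the coefficients are available, the plane's orientation follows directly from its normal; a minimal sketch building on fitPlane() above (this only gives the tilt of the ground plane with respect to the optical axis, not a full rotation):

#include <cmath>
#include <opencv2/opencv.hpp>

/// @brief Tilt of the fitted plane with respect to the camera's image plane,
///        using the [a, b, d] coefficients returned by fitPlane() (c fixed at -1).
double planeTilt(const cv::Mat& coeff) {
    // plane normal in the camera frame: n = (a, b, -1)
    cv::Vec3d n(coeff.at< double >(0), coeff.at< double >(1), -1.0);
    // angle between n and the optical axis (0, 0, 1): cos(theta) = |n.z| / ||n||
    double cosTheta = 1.0 / cv::norm(n);
    return std::acos(cosTheta); // radians
}

A full camera-to-ground rotation could be assembled from the unit normal plus any two orthogonal in-plane directions, but for a ground plane the tilt angle is often all that is needed.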

Related

Pose Estimation - cv::SolvePnP with Scenekit - Coordinate System Question

I have been working on pose estimation (aligning key points on a 3D model with 2D points on an image to match the pose) via OpenCV's cv::solvePnP, using features / key points from Apple's Vision framework.
TL;DR:
My SceneKit model is being translated, and the units look correct when introspecting the translation and rotation vectors from solvePnP (i.e., they are the right order of magnitude), but the coordinate system of the translation appears off.
I am trying to understand the coordinate system requirements of solvePnP with respect to the Metal / OpenGL coordinate system and my camera projection matrix.
What 'projectionMatrix' does my SCNCamera require to match the image-based coordinate system passed into solvePnP?
Some things I've read / believe I am taking into account:
OpenCV vs. OpenGL (and thus Metal) have row-major vs. column-major differences.
OpenCV's 3D coordinate system is different from OpenGL's (and thus Metal's).
Longer with code:
My workflow is as such:
Step 1 - Use a 3D model tool to introspect points on my 3D model and get the object's vertex positions for the major key points among the 2D detected features. I am using the left pupil, right pupil, tip of the nose, tip of the chin, left outer lip corner and right outer lip corner.
Step 2 - Run a Vision request, extract a list of points in image space (converting to OpenCV's top-left coordinate system) and extract the same ordered list of 2D points.
Step 3 - Construct a camera matrix using the size of the input image.
Step 4 - Run cv::solvePnP, and then use cv::Rodrigues to convert the rotation vector to a matrix.
Step 5 - Convert the coordinate system of the resulting transforms into something appropriate for the GPU - invert the y and z axes, combine the translation and rotation into a single 4x4 matrix, and then transpose it for the appropriate majorness of OpenGL / Metal.
Step 6 - Apply the resulting transform to SceneKit via:
let faceNodeTransform = openCVWrapper.transform(for: landmarks, imageSize: size)
self.destinationView.pointOfView?.transform = SCNMatrix4Invert(faceNodeTransform)
Below is my Obj-C++ OpenCV Wrapper which takes in a subset of Vision Landmarks and the true pixel size of the image being looked at:
// https://answers.opencv.org/question/23089/opencv-opengl-proper-camera-pose-using-solvepnp/
- (SCNMatrix4) transformFor:(VNFaceLandmarks2D*)landmarks imageSize:(CGSize)imageSize
{
    // 1. Convert landmarks to image points in image space (pixels), as a vector of cv::Point2f.
    // Note that this translates the point coordinate system to be top-left oriented for OpenCV's image coordinates:
    std::vector<cv::Point2f> imagePoints = [self imagePointsForLandmarks:landmarks imageSize:imageSize];
    // 2. Load the model points
    std::vector<cv::Point3f> modelPoints = [self modelPoints];
    // 3. Create our camera intrinsic matrix
    // TODO - see if this is sane?
    double max_d = fmax(imageSize.width, imageSize.height);
    cv::Mat cameraMatrix = (cv::Mat_<double>(3,3) << max_d, 0, imageSize.width/2.0,
                                                     0, max_d, imageSize.height/2.0,
                                                     0, 0, 1.0);
    // 4. Run solvePnP
    double distanceCoef[] = {0, 0, 0, 0};
    cv::Mat distanceCoefMat = cv::Mat(1, 4, CV_64FC1, distanceCoef);
    // Output matrices
    std::vector<double> rv(3);
    cv::Mat rotationOut = cv::Mat(rv);
    std::vector<double> tv(3);
    cv::Mat translationOut = cv::Mat(tv);
    cv::solvePnP(modelPoints, imagePoints, cameraMatrix, distanceCoefMat, rotationOut, translationOut, false, cv::SOLVEPNP_EPNP);
    // 5. Convert the rotation vector to a real 3x3 rotation matrix
    // and embed it, with the translation, into a 4x4 view matrix:
    cv::Mat viewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
    cv::Mat rotation;
    cv::Rodrigues(rotationOut, rotation);
    // Append our transforms to our matrix and set the final element to identity:
    for(unsigned int row = 0; row < 3; ++row)
    {
        for(unsigned int col = 0; col < 3; ++col)
        {
            viewMatrix.at<double>(row, col) = rotation.at<double>(row, col);
        }
        viewMatrix.at<double>(row, 3) = translationOut.at<double>(row, 0);
    }
    viewMatrix.at<double>(3, 3) = 1.0;
    // Convert OpenCV to OpenGL coordinates
    cv::Mat cvToGl = cv::Mat::zeros(4, 4, CV_64FC1);
    cvToGl.at<double>(0, 0) = 1.0;
    cvToGl.at<double>(1, 1) = -1.0; // invert the y axis
    cvToGl.at<double>(2, 2) = -1.0; // invert the z axis
    cvToGl.at<double>(3, 3) = 1.0;
    viewMatrix = cvToGl * viewMatrix;
    // Finally, transpose to get the correct SCN / OpenGL (column-major) matrix:
    cv::Mat glViewMatrix = cv::Mat::zeros(4, 4, CV_64FC1);
    cv::transpose(viewMatrix, glViewMatrix);
    return [self convertCVMatToMatrix4:glViewMatrix];
}
- (SCNMatrix4) convertCVMatToMatrix4:(cv::Mat)matrix
{
    SCNMatrix4 scnMatrix = SCNMatrix4Identity;
    scnMatrix.m11 = matrix.at<double>(0, 0);
    scnMatrix.m12 = matrix.at<double>(0, 1);
    scnMatrix.m13 = matrix.at<double>(0, 2);
    scnMatrix.m14 = matrix.at<double>(0, 3);
    scnMatrix.m21 = matrix.at<double>(1, 0);
    scnMatrix.m22 = matrix.at<double>(1, 1);
    scnMatrix.m23 = matrix.at<double>(1, 2);
    scnMatrix.m24 = matrix.at<double>(1, 3);
    scnMatrix.m31 = matrix.at<double>(2, 0);
    scnMatrix.m32 = matrix.at<double>(2, 1);
    scnMatrix.m33 = matrix.at<double>(2, 2);
    scnMatrix.m34 = matrix.at<double>(2, 3);
    scnMatrix.m41 = matrix.at<double>(3, 0);
    scnMatrix.m42 = matrix.at<double>(3, 1);
    scnMatrix.m43 = matrix.at<double>(3, 2);
    scnMatrix.m44 = matrix.at<double>(3, 3);
    return scnMatrix;
}
Some questions:
An SCNNode has no modelViewMatrix (just, as I understand it, a transform, which is the modelMatrix) to just throw a matrix at - so I've read that the inverse of the transform from the solvePnP process can be used to pose the camera instead, which appears to get me the closest result. I want to ensure this approach is correct.
If I have the modelViewMatrix, and the projectionMatrix, I should be able to calculate the appropriate modelMatrix? Is this the approach I should be taking?
It's unclear to me what projectionMatrix I should be using for my SceneKit scene and if that has any bearing on my results. Do I need a pixel-for-pixel exact match of my viewport to the image size, and how do I properly configure my SCNCamera to ensure the coordinate systems agree with solvePnP?
Thank you very much!
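For reference, the inversion mentioned in the first question above (using the inverse of the solvePnP transform to pose the camera) boils down to transposing R and negating the rotated translation; a minimal, SceneKit-agnostic sketch with illustrative names:

#include <opencv2/opencv.hpp>

// solvePnP returns the object-to-camera transform [R|t]; the camera pose in
// model coordinates is its inverse: R_cam = R^T, t_cam = -R^T * t.
void cameraPoseFromPnP(const cv::Mat& rvec, const cv::Mat& tvec,
                       cv::Mat& R_cam, cv::Mat& t_cam)
{
    cv::Mat R;
    cv::Rodrigues(rvec, R); // 3x1 rotation vector -> 3x3 rotation matrix
    R_cam = R.t();
    t_cam = -R_cam * tvec;  // tvec must be a 3x1 CV_64F column vector
}

Whether the forward or the inverted transform belongs on the camera node versus the face node depends on which one is being driven, which is exactly the ambiguity described above.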

opencv Vec3d to Eigen::Quaternion, euler flipping on results

I am using cv::solvePnP to return a camera pose. I run PnP, and it returns the rvec and tvec values (rotation vector and position).
I then run this function to convert the values to the camera pose:
void GetCameraPoseEigen(cv::Vec3d tvecV, cv::Vec3d rvecV, Eigen::Vector3d &Translate, Eigen::Quaterniond &quats)
{
    Mat R;
    Mat tvec, rvec;
    tvec = DoubleMatFromVec3b(tvecV);
    rvec = DoubleMatFromVec3b(rvecV);
    cv::Rodrigues(rvec, R); // R is 3x3
    R = R.t();              // rotation of inverse
    tvec = -R * tvec;       // translation of inverse
    Eigen::Matrix3d mat;
    cv2eigen(R, mat);
    Eigen::Quaterniond EigenQuat(mat);
    quats = EigenQuat;
    double x_t = tvec.at<double>(0, 0);
    double y_t = tvec.at<double>(1, 0);
    double z_t = tvec.at<double>(2, 0);
    Translate.x() = x_t * 10;
    Translate.y() = y_t * 10;
    Translate.z() = z_t * 10;
}
This works, yet at some rotation angles the converted rotation values flip randomly between positive and negative values, while the source rvecV value does not. I assume this means I am going wrong with my conversion. How can I get a stable quaternion from the cv::Vec3d returned by PnP?
EDIT: This seems to be Quaternion flipping, as mentioned here:
Quaternion is flipping sign for very similar rotations?
Based on that, I have tried adding:
if (quat.w() < 0)
{
    quat = quat.Inverse();
}
But I see the same flipping.
Both quat and -quat represent the same rotation. You can check that by taking a unit quaternion, converting it to a rotation matrix, then doing
quat.coeffs() = -quat.coeffs();
and converting that to a rotation matrix as well.
If for some reason you always want a positive w value, negate all coefficients if w is negative.
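A minimal Eigen sketch of that sign normalisation (negating all four coefficients rather than calling Inverse()):

#include <Eigen/Geometry>

// Both q and -q encode the same rotation; keeping w non-negative removes the
// apparent sign flipping between nearly identical poses.
Eigen::Quaterniond canonicalized(Eigen::Quaterniond q)
{
    if (q.w() < 0.0)
        q.coeffs() = -q.coeffs(); // negate x, y, z and w together
    return q;
}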
The sign should not matter...
... rotation-wise, as long as all four fields of the 4D quaternion are getting flipped. There's more to it explained here:
Quaternion to EulerXYZ, how to differentiate the negative and positive quaternion
Think of it this way:
Angle/axis both flipped mean the same thing
and mind the clockwise to counterclockwise transition much like in a mirror image.
There may be a convention to keep the quat.w() (or quat[0]) component positive and flip the signs of the other components accordingly. Assuming w = cos(angle/2), setting w > 0 just means: I want the angle to be within the (-pi, pi) range, so that a -270 degree rotation becomes a +90 degree rotation.
Doing the quat.Inverse() is probably not what you want, because this creates a rotation in the opposite direction. That is -quat != quat.Inverse().
Also: check that both systems have the same handedness (chirality)! Test if your rotation matrix determinant is +1 or -1.

How to use OpenCV's Homography for a target and position camera in Unity?

I was able to get a Homography for a WebcamTexture on a texture and a target object.
Now, I want to change the transform of Unity's camera based on the Homography. I've found a way to get the Camera Position from Homography like this:
void cameraPoseFromHomography(const Mat& H, Mat& pose)
{
    pose = Mat::eye(3, 4, CV_32FC1);      // 3x4 matrix, the camera pose
    float norm1 = (float)norm(H.col(0));
    float norm2 = (float)norm(H.col(1));
    float tnorm = (norm1 + norm2) / 2.0f; // Normalization value
    Mat p1 = H.col(0);                    // Pointer to first column of H
    Mat p2 = pose.col(0);                 // Pointer to first column of pose (empty)
    cv::normalize(p1, p2);                // Normalize the rotation, and copies the column to pose
    p1 = H.col(1);                        // Pointer to second column of H
    p2 = pose.col(1);                     // Pointer to second column of pose (empty)
    cv::normalize(p1, p2);                // Normalize the rotation and copies the column to pose
    p1 = pose.col(0);
    p2 = pose.col(1);
    Mat p3 = p1.cross(p2);                // Computes the cross-product of p1 and p2
    Mat c2 = pose.col(2);                 // Pointer to third column of pose
    p3.copyTo(c2);                        // Third column is the crossproduct of columns one and two
    pose.col(3) = H.col(2) / tnorm;       // vector t [R|t] is the last column of pose
}
Source: https://stackoverflow.com/a/10781165/4382683
Now, this is relative to the OpenCV image, which has dimensions in the order of hundreds of pixels. But in Unity, the Quad has a LocalScale of (1, 1, 1) - I don't know how to translate the pose to this scale.
Also, how can a 3x4 Mat be used as a position in Unity, where the position is just a Vector3 like (1, 5, 4) and the rotation is also a Vector3 like (90, 180, 0)?
Any hints or a direction to follow are good enough. Thanks.

Getting flat point cloud from disparity map

I've been trying to generate a point cloud from a pair of rectified stereo images. I first obtained the disparity map using OpenCV's SGBM implementation. I then converted it to a point cloud using the following code:
for (int u = 0; u < left.rows; ++u)
{
    for (int v = 0; v < left.cols; ++v)
    {
        if (disp.at<int>(u, v) == 0) continue;
        pcl::PointXYZRGB p;
        p.x = v;
        p.y = u;
        p.z = (left_focalLength * baseLine * 0.01 / disp.at<int>(u, v));
        std::cout << p.z << std::endl;
        cv::Vec3b bgr(left.at<cv::Vec3b>(u, v));
        p.b = bgr[0];
        p.g = bgr[1];
        p.r = bgr[2];
        pc.push_back(p);
    }
}
left is the left image; disp is the output disparity image of type CV_16S.
Is my disparity to pcl conversion correct or is it a problem with the disparity values?
I've included a screenshot of the disparity map, point cloud and original left image.
Thank you!
I'm not confident with this language, but I noticed one thing:
Assuming that this line converts disparity to depth (Z):
p.z = (left_focalLength * baseLine * 0.01 / disp.at<int>(u,v));
What is 0.01? If this calculation gives you a range of depths (Z) from 1 to 10, this factor reduces your range to 0.01 to 0.1. The depth is then always close to zero and you get a flat image (flat image = constant depth).
P.S. I do not see in your code the X,Y conversion from the u,v pixel values using the Z value. Something like:
X = u*Z/f
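For reference, a minimal sketch of the full reprojection under the usual pinhole stereo model (fx, fy, cx, cy are the left camera's intrinsics in pixels and baseline is the stereo baseline; note also that an SGBM disparity map of type CV_16S holds fixed-point values that should be read as short and divided by 16):

#include <pcl/point_types.h>

// Reproject one pixel (col, row) with a raw SGBM disparity value into 3D.
pcl::PointXYZ reprojectPixel(int col, int row, short rawDisparity,
                             double fx, double fy, double cx, double cy,
                             double baseline)
{
    double d = rawDisparity / 16.0;                  // SGBM stores disparity * 16
    pcl::PointXYZ p;
    p.z = static_cast< float >(fx * baseline / d);   // depth from disparity
    p.x = static_cast< float >((col - cx) * p.z / fx);
    p.y = static_cast< float >((row - cy) * p.z / fy);
    return p;
}

cv::reprojectImageTo3D with the Q matrix from cv::stereoRectify does the same for the whole image in one call.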

Multiple-View Geometry

I have two images captured from two cameras of the same make, placed some distance apart and capturing the same scene. I want to calculate the real-world rotation and translation between the two cameras. To achieve this, I first extracted the SIFT features of both images and matched them.
I now have the fundamental matrix as well as the homography matrix. However, I am unable to proceed further; there is a lot of confusion. Can anybody help me estimate the rotation and translation between the two cameras?
I'm using OpenCV for feature extraction, matching and homography calculation.
If you have the homography then you also have the rotation. Once you have the homography, it is easy to get the rotation and translation matrices.
For example, if you are using OpenCV in C++:
/// @param[in]  H
/// @param[out] pose
void cameraPoseFromHomography(const Mat& H, Mat& pose)
{
    pose = Mat::eye(3, 4, CV_32FC1);      // 3x4 matrix, the camera pose
    float norm1 = (float)norm(H.col(0));
    float norm2 = (float)norm(H.col(1));
    float tnorm = (norm1 + norm2) / 2.0f; // Normalization value
    Mat p1 = H.col(0);                    // Pointer to first column of H
    Mat p2 = pose.col(0);                 // Pointer to first column of pose (empty)
    cv::normalize(p1, p2);                // Normalize the rotation, and copies the column to pose
    p1 = H.col(1);                        // Pointer to second column of H
    p2 = pose.col(1);                     // Pointer to second column of pose (empty)
    cv::normalize(p1, p2);                // Normalize the rotation and copies the column to pose
    p1 = pose.col(0);
    p2 = pose.col(1);
    Mat p3 = p1.cross(p2);                // Computes the cross-product of p1 and p2
    Mat c2 = pose.col(2);                 // Pointer to third column of pose
    p3.copyTo(c2);                        // Third column is the crossproduct of columns one and two
    pose.col(3) = H.col(2) / tnorm;       // vector t [R|t] is the last column of pose
}
This function calculates the camera pose from the homography, in which the rotation is contained. For further theoretical info, follow this thread.
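If the camera intrinsic matrix K is known, OpenCV 3.0+ also ships ready-made decompositions; a minimal sketch, assuming matched point vectors points1/points2 and a 3x3 CV_64F K:

#include <vector>
#include <opencv2/opencv.hpp>

// Recover the relative rotation/translation between two views from matched points.
void relativePoseFromMatches(const std::vector< cv::Point2f >& points1,
                             const std::vector< cv::Point2f >& points2,
                             const cv::Mat& K, const cv::Mat& H)
{
    // Plane-induced homography: up to four candidate (R, t, n) decompositions.
    std::vector< cv::Mat > Rs, ts, normals;
    cv::decomposeHomographyMat(H, K, Rs, ts, normals);

    // General (non-planar) scene: go through the essential matrix instead.
    cv::Mat E = cv::findEssentialMat(points1, points2, K, cv::RANSAC, 0.999, 1.0);
    cv::Mat R, t;
    cv::recoverPose(E, points1, points2, K, R, t); // t is recovered only up to scale
}

The homography decomposition returns several physically possible solutions, so a cheirality check (points in front of both cameras) is still needed to pick the right one.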
