I'm working on a 3D pose estimation system. I used OpenCV's function cvPOSIT to calculate the rotation matrix and the translation vector.
I also need the rotation angles (yaw, pitch, roll) of the rotation matrix, but none of the algorithms I found seem to work.
The function cv::RQDecomp3x3(), which was given as the answer to the question "in opencv : how to get yaw, roll, pitch from POSIT rotation matrix", cannot work, because the function needs the 3x3 matrix of the projection matrix.
I also tried the algorithms from the links below, but none of them worked.
visionopen.com/cv/vosm/doc/html/recognitionalgs_8cpp_source.html
stackoverflow.com/questions/16266740/in-opencv-how-to-get-yaw-roll-pitch-from-posit-rotation-matrix
quad08pyro.groups.et.byu.net/vision.htm
stackoverflow.com/questions/13565625/opencv-c-posit-why-are-my-values-always-nan-with-small-focal-lenght
www.c-plusplus.de/forum/308773-full
I used the most common POSIT tutorial and my own example made with Blender, so I could render an image to retrieve the image points and know the exact angles. The object was rotated by 10 degrees about its Z axis in Blender, and I checked the angles about all 3 axes, because the axis conventions differ between Blender and OpenCV.
double focalLength = 700.0;
CvPOSITObject* positObject;
std::vector<CvPoint3D32f> modelPoints;
modelPoints.push_back(cvPoint3D32f(0.0f, 0.0f, 0.0f));
modelPoints.push_back(cvPoint3D32f(CUBE_SIZE, 0.0f, 0.0f));
modelPoints.push_back(cvPoint3D32f(0.0f, CUBE_SIZE, 0.0f));
modelPoints.push_back(cvPoint3D32f(0.0f, 0.0f, CUBE_SIZE));
std::vector<CvPoint2D32f> imagePoints;
imagePoints.push_back( cvPoint2D32f( 157,372) );
imagePoints.push_back( cvPoint2D32f(423,386 ));
imagePoints.push_back( cvPoint2D32f( 157,108 ));
imagePoints.push_back( cvPoint2D32f(250,337));
// Moving the points to the image center as described in the tutorial
for (size_t i = 0; i < imagePoints.size(); i++) {
imagePoints[i] = cvPoint2D32f(imagePoints[i].x - 320, 240 - imagePoints[i].y);
}
CvVect32f translation_vector = new float[3];
CvTermCriteria criteria = cvTermCriteria(CV_TERMCRIT_EPS | CV_TERMCRIT_ITER,iterations, 0.1f);
positObject = cvCreatePOSITObject( &modelPoints[0], static_cast<int>(modelPoints.size()));
CvMatr32f rotation_matrix = new float[9];
cvPOSIT( positObject, &imagePoints[0], focalLength, criteria, rotation_matrix, translation_vector );
// ... algorithms to get the angles ...
I have already tried converting the results from radians to degrees and from clockwise to counterclockwise, but the rotation matrix returned by cvPOSIT already gives bad results. I also changed the matrix layout to rule out a formatting mistake.
I tested simple rotation matrices (a rotation about only the x, y or z axis), and some of the algorithms worked with them. The rotation matrix from cvPOSIT did not work with those same algorithms.
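For reference, a common way to read Euler angles out of a rotation matrix is sketched below. It is a minimal sketch, assuming the ZYX (yaw-pitch-roll) convention R = Rz(yaw) * Ry(pitch) * Rx(roll) and a row-major 3x3 matrix (the 9 floats written by cvPOSIT would have to be copied into `R[row][col]` in that order, which is an assumption worth verifying). Other conventions, or the different axis conventions of Blender and OpenCV, give different numbers for the same matrix. The names `EulerZYX` and `eulerZYXFromMatrix` are illustrative only.

```cpp
#include <cmath>

// Euler angles in degrees for the ZYX convention R = Rz(yaw) * Ry(pitch) * Rx(roll).
struct EulerZYX { double roll, pitch, yaw; };

EulerZYX eulerZYXFromMatrix(const double R[3][3]) {
    const double rad2deg = 180.0 / std::acos(-1.0);
    // |cos(pitch)|; a value near zero means gimbal lock (pitch = +/-90 degrees).
    const double cp = std::sqrt(R[0][0] * R[0][0] + R[1][0] * R[1][0]);
    EulerZYX e{};
    if (cp > 1e-6) {
        e.roll  = std::atan2(R[2][1], R[2][2]);
        e.pitch = std::atan2(-R[2][0], cp);
        e.yaw   = std::atan2(R[1][0], R[0][0]);
    } else {
        // Yaw and roll are coupled at gimbal lock; put the whole rotation into roll.
        e.roll  = std::atan2(-R[1][2], R[1][1]);
        e.pitch = std::atan2(-R[2][0], cp);
        e.yaw   = 0.0;
    }
    e.roll *= rad2deg;
    e.pitch *= rad2deg;
    e.yaw *= rad2deg;
    return e;
}
```

For a pure 10 degree rotation about z this returns yaw of about 10 and roll and pitch of about 0. If a matrix from cvPOSIT does not behave like a hand-built matrix of this kind, comparing the two entry by entry can show which axes are swapped or negated.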
I appreciate any support.
I am using cv::solvePnP to get a camera pose. I run PnP, and it returns the rvec and tvec values (the rotation vector and the translation vector).
I then run this function to convert those values to the camera pose:
void GetCameraPoseEigen(cv::Vec3d tvecV, cv::Vec3d rvecV, Eigen::Vector3d &Translate, Eigen::Quaterniond &quats)
{
Mat R;
Mat tvec, rvec;
tvec = DoubleMatFromVec3b(tvecV);
rvec = DoubleMatFromVec3b(rvecV);
cv::Rodrigues(rvec, R); // R is 3x3
R = R.t(); // rotation of inverse
tvec = -R*tvec; // translation of inverse
Eigen::Matrix3d mat;
cv2eigen(R, mat);
Eigen::Quaterniond EigenQuat(mat);
quats = EigenQuat;
double x_t = tvec.at<double>(0, 0);
double y_t = tvec.at<double>(1, 0);
double z_t = tvec.at<double>(2, 0);
Translate.x() = x_t * 10;
Translate.y() = y_t * 10;
Translate.z() = z_t * 10;
}
This works, but at some rotation angles the converted quaternion values flip randomly between positive and negative, even though the source rvecV values do not. I assume this means my conversion is wrong. How can I get a stable quaternion from the cv::Vec3d returned by solvePnP?
EDIT: This seems to be quaternion sign flipping, as described here:
Quaternion is flipping sign for very similar rotations?
Based on that, I have tried adding:
if(quat.w() < 0)
{
quat = quat.inverse();
}
But I see the same flipping.
Both quat and -quat represent the same rotation. You can check that by taking a unit quaternion, converting it to a rotation matrix, then doing
quat.coeffs() = -quat.coeffs();
and converting that to a rotation matrix as well.
If for some reason you always want a positive w value, negate all coefficients if w is negative.
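The check described above can also be done without Eigen. The sketch below uses a plain struct (hypothetical names `Quat`, `toRotationMatrix`, `negated`) and the standard Hamilton-convention quaternion-to-matrix formula. Every matrix entry is a product of two quaternion components, so negating all four leaves the matrix unchanged.

```cpp
#include <array>

struct Quat { double w, x, y, z; };
using Mat3 = std::array<std::array<double, 3>, 3>;

// Rotation matrix of a unit quaternion (Hamilton convention, w = cos(angle / 2)).
Mat3 toRotationMatrix(const Quat& q) {
    const double w = q.w, x = q.x, y = q.y, z = q.z;
    return {{{{1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)}},
             {{2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)}},
             {{2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)}}}};
}

// Same rotation with all four components negated.
Quat negated(const Quat& q) { return {-q.w, -q.x, -q.y, -q.z}; }
```

In Eigen, the equivalent comparison is `quat.toRotationMatrix()` before and after `quat.coeffs() = -quat.coeffs();`.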
The sign should not matter, rotation-wise, as long as all four components of the 4D quaternion are flipped together. There's more to it explained here:
Quaternion to EulerXYZ, how to differentiate the negative and positive quaternion
Think of it this way: flipping both the angle and the axis describes the same rotation, because a clockwise rotation seen from one side of the axis is counterclockwise when seen from the other side, much like in a mirror image.
A common convention is to keep the quat.w() (or quat[0]) component positive and negate the other components accordingly. Assuming w = cos(angle/2), requiring w > 0 just means: I want the angle to be within the (-pi, pi) range. So a -270 degree rotation becomes a +90 degree rotation.
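A small sketch of that convention, using a hypothetical `Quat` struct rather than Eigen: it builds the quaternion for a -270 degree rotation about z, which has w < 0, then negates all four components and reads the angle back as +90 degrees. `angleAboutAxisDeg` is a hypothetical helper that assumes the quaternion's vector part is parallel to the given unit axis.

```cpp
#include <cmath>

struct Quat { double w, x, y, z; };

// Unit quaternion for a rotation of angleRad about the unit axis (ax, ay, az).
Quat fromAxisAngle(double ax, double ay, double az, double angleRad) {
    const double s = std::sin(0.5 * angleRad);
    return {std::cos(0.5 * angleRad), ax * s, ay * s, az * s};
}

// Convention: negate all four components when w < 0 (the rotation is unchanged).
Quat withNonNegativeW(Quat q) {
    if (q.w < 0) { q.w = -q.w; q.x = -q.x; q.y = -q.y; q.z = -q.z; }
    return q;
}

// Signed rotation angle in degrees about the given unit axis.
double angleAboutAxisDeg(const Quat& q, double ax, double ay, double az) {
    const double s = q.x * ax + q.y * ay + q.z * az;  // sin(angle / 2)
    return 2.0 * std::atan2(s, q.w) * 180.0 / std::acos(-1.0);
}
```

Because atan2 is taken over the half angle, w >= 0 confines the half angle to [-90, 90] degrees and therefore the full angle to [-180, 180] degrees.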
Calling quat.inverse() is probably not what you want, because it creates a rotation in the opposite direction. That is, -quat != quat.inverse().
Also, check that both systems have the same handedness (chirality)! Test whether the determinant of your rotation matrix is +1 or -1.
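A minimal sketch of the determinant test in plain C++ (hypothetical helper names): a proper rotation has determinant +1, while -1 indicates a reflection, i.e. the two coordinate systems differ in handedness.

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Determinant by cofactor expansion along the first row.
double determinant(const Mat3& m) {
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// True for a proper rotation (det = +1); false for a reflection (det = -1).
bool isProperRotation(const Mat3& R, double tol = 1e-6) {
    return std::fabs(determinant(R) - 1.0) < tol;
}
```

With OpenCV matrices, `cv::determinant(R)` gives the same number.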
I have a relative camera pose estimation problem where I am looking at a scene with differently oriented cameras spaced a certain distance apart. Initially, I compute the essential matrix using the 5-point algorithm and decompose it to get the R and t of camera 2 with respect to camera 1.
I thought it would be a good idea to check this by triangulating the two sets of image points into 3D and then running solvePnP on the 3D-2D correspondences, but the result I get from solvePnP is way off. I am trying to do this to "refine" my pose, as the scale can change from one frame to another. In one case, I had a 45 degree rotation between camera 1 and camera 2 about the Z axis, and the epipolar geometry part gave me this answer:
Relative camera rotation is [1.46774, 4.28483, 40.4676]
Translation vector is [-0.778165583410928; -0.6242059242696293; -0.06946429947410336]
solvePnP, on the other hand, gives:
Camera1: rvecs [0.3830144497209735; -0.5153903947692436; -0.001401186630803216]
tvecs [-1777.451836911453; -1097.111339375749; 3807.545406775675]
Euler1 [24.0615, -28.7139, -6.32776]
Camera2: rvecs [1407374883553280; 1337006420426752; 774194163884064.1] (!!)
tvecs[1.249151852575814; -4.060149502748567; -0.06899980661249146]
Euler2 [-122.805, -69.3934, 45.7056]
Something is troublingly off with the rvec of camera 2 and the tvec of camera 1. My code for the point triangulation and solvePnP looks like this:
points1.convertTo(points1, CV_32F);
points2.convertTo(points2, CV_32F);
// Normalize the image points (pixel -> normalized camera coordinates)
points1.col(0) = (points1.col(0) - pp.x) / focal;
points2.col(0) = (points2.col(0) - pp.x) / focal;
points1.col(1) = (points1.col(1) - pp.y) / focal;
points2.col(1) = (points2.col(1) - pp.y) / focal;
points1 = points1.t(); points2 = points2.t();
cv::triangulatePoints(P1, P2, points1, points2, points3DH);
cv::Mat points3D;
convertPointsFromHomogeneous(Mat(points3DH.t()).reshape(4, 1), points3D);
cv::solvePnP(points3D, points1.t(), K, noArray(), rvec1, tvec1, 1, CV_ITERATIVE );
cv::solvePnP(points3D, points2.t(), K, noArray(), rvec2, tvec2, 1, CV_ITERATIVE );
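The four normalization lines above map pixel coordinates to normalized camera coordinates. The sketch below (hypothetical names; square pixels and zero skew assumed, matching the single `focal` used above) shows that operation and its inverse: projecting a normalized point back with K = [f 0 ppx; 0 f ppy; 0 0 1] recovers the pixel. In other words, normalized points behave like image points from a camera whose intrinsic matrix is the identity, while pixel coordinates go together with K.

```cpp
struct Pixel { double u, v; };
struct NormalizedPoint { double x, y; };

// Same operation as the normalization lines above: subtract the principal
// point and divide by the focal length.
NormalizedPoint normalizePixel(const Pixel& p, double focal, double ppx, double ppy) {
    return {(p.u - ppx) / focal, (p.v - ppy) / focal};
}

// Inverse: multiply (x, y, 1) by K = [focal 0 ppx; 0 focal ppy; 0 0 1].
Pixel projectWithK(const NormalizedPoint& n, double focal, double ppx, double ppy) {
    return {focal * n.x + ppx, focal * n.y + ppy};
}
```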
Then I convert the rvecs with Rodrigues to get the Euler angles, but since the rvecs and tvecs themselves seem to be wrong, I feel something is wrong with my process. Any pointers would be helpful. Thanks!
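For reference, the conversion that `cv::Rodrigues` performs on a 3x1 rotation vector is Rodrigues' formula: the vector's direction is the rotation axis and its length is the rotation angle in radians. A plain C++ sketch with a hypothetical function name:

```cpp
#include <array>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Rodrigues' formula: rotation by |r| radians about the unit axis r / |r|.
Mat3 rotationFromRvec(double rx, double ry, double rz) {
    const double theta = std::sqrt(rx * rx + ry * ry + rz * rz);
    if (theta < 1e-12) {  // (near) zero rotation
        return {{{{1.0, 0.0, 0.0}}, {{0.0, 1.0, 0.0}}, {{0.0, 0.0, 1.0}}}};
    }
    const double kx = rx / theta, ky = ry / theta, kz = rz / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    return {{{{c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s}},
             {{ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s}},
             {{kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v}}}};
}
```

A rotation vector of (0, 0, pi/2) gives a 90 degree rotation about z; the resulting matrix is what an Euler angle extraction then works on.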
I have a PointGrey Ladybug3 camera. It's a panoramic multi-camera system (5 cameras covering 360° and 1 camera looking up).
I've done all the calibration and rectification, so in the end I know the 3D position, with respect to a global frame, of every pixel of the 6 images.
What I would like to do now is convert these 3D points to a panoramic image. The most common choice is a radial (equirectangular) projection.
For every 3D point (X,Y,Z) it's possible to find the corresponding spherical coordinates theta and phi.
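Assuming the same spherical convention as the answer's code further down (X = R sin(phi) cos(theta), Y = R sin(phi) sin(theta), Z = R cos(phi), theta in [-pi, pi], phi in [0, pi]), the conversion from a 3D point to theta and phi, and from there to a panorama pixel, can be sketched as follows. The pixel mapping inverts the answer's theta = -(2 pi x / (W - 1) - pi) and phi = pi y / (H - 1). The function names are hypothetical, and the point is assumed not to be at the origin.

```cpp
#include <cmath>

struct PanoPixel { double x, y; };

// theta = atan2(Y, X) in [-pi, pi], phi = acos(Z / r) in [0, pi].
void cartesianToSpherical(double X, double Y, double Z, double& theta, double& phi) {
    const double r = std::sqrt(X * X + Y * Y + Z * Z);
    theta = std::atan2(Y, X);
    // Clamp to guard against rounding slightly outside [-1, 1].
    phi = std::acos(std::fmax(-1.0, std::fmin(1.0, Z / r)));
}

// Column x and row y of an equirectangular panorama of size width x height.
PanoPixel sphericalToPano(double theta, double phi, int width, int height) {
    const double pi = std::acos(-1.0);
    return {(pi - theta) * (width - 1) / (2.0 * pi), phi * (height - 1) / pi};
}
```

The resulting coordinates are generally fractional, so they still need rounding or interpolation before writing into an image.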
My question is: is it possible to do this automatically with OpenCV? Or, if I do it manually, what is the best way to convert that bunch of pixels in theta, phi coordinates to an image?
The official Ladybug SDK uses OpenGL for all these operations, but I was wondering if it's possible to do this in OpenCV.
Thanks,
Josep
The approach I used to solve this problem was the following:
1. Create an empty image with the desired output size.
2. For every pixel in the output image, find the theta and phi coordinates. Theta is mapped linearly from -pi to pi across the columns and phi from 0 to pi down the rows.
3. Set a projection radius R and find the 3D coordinates from theta, phi and R.
4. Find in which cameras the 3D point is visible, and its corresponding pixel position in each.
5. Copy the pixel from the image in which it lies closest to the principal point (or use any other valid criterion).
My code looks like:
cv::Mat panoramic;
panoramic=cv::Mat::zeros(PANO_HEIGHT,PANO_WIDTH,CV_8UC3);
double theta, phi;
double R=calibration.getSphereRadius();
int result;
double dRow=0;
double dCol=0;
for(int y = 0; y!= PANO_HEIGHT; y++){
for(int x = 0; x !=PANO_WIDTH ; x++) {
//Rescale to [-pi, pi]
theta=-(2*PI*x/(PANO_WIDTH-1)-PI); //Sign change needed.
phi=PI*y/(PANO_HEIGHT-1);
//From theta and phi find the 3D coordinates.
double globalZ=R*cos(phi);
double globalX=R*sin(phi)*cos(theta);
double globalY=R*sin(phi)*sin(theta);
float minDistanceCenter=5000; // Doesn't depend on the image.
float distanceCenter;
//From the 3D coordinates, find in how many camera falls the point!
for(int cam = 0; cam!= 6; cam++){
result=calibration.ladybugXYZtoRC(globalX, globalY, globalZ, cam, dRow, dCol);
if (result==0){ //The 3d point is visible from this camera
cv::Vec3b intensity = image[cam].at<cv::Vec3b>(dRow,dCol);
distanceCenter=sqrt(pow(dRow-imageHeight/2,2)+pow(dCol-imageWidth/2,2));
if (distanceCenter<minDistanceCenter) {
panoramic.at<cv::Vec3b>(y, x) = intensity;
minDistanceCenter=distanceCenter;
}
}
}
}
}
I'm trying to use OpenCV to do some basic augmented reality. The way I'm going about it is using findChessboardCorners to get a set of points from a camera image. Then, I create a 3D quad on the z = 0 plane and use solvePnP to estimate the pose relating the planar points to the imaged points. From that, I figure I should be able to set up a modelview matrix that allows me to render a cube with the right pose on top of the image.
The documentation for solvePnP says that it outputs a rotation vector "that (together with [the translation vector] ) brings points from the model coordinate system to the camera coordinate system." I think that's the opposite of what I want; since my quad is on the plane z = 0, I want a modelview matrix that transforms that quad to the appropriate 3D plane.
I thought that by performing the opposite rotations and translations in the opposite order I could calculate the correct modelview matrix, but that seems not to work. While the rendered object (a cube) does move with the camera image and seems to be roughly correct translationally, the rotation just doesn't work at all; it rotates about multiple axes when it should only rotate about one, and sometimes in the wrong direction. Here's what I'm doing so far:
std::vector<Point2f> corners;
bool found = findChessboardCorners(*_imageBuffer, cv::Size(5,4), corners,
CV_CALIB_CB_FILTER_QUADS |
CV_CALIB_CB_FAST_CHECK);
if(found)
{
drawChessboardCorners(*_imageBuffer, cv::Size(5, 4), corners, found); // same pattern size as findChessboardCorners
std::vector<double> distortionCoefficients(5); // camera distortion
distortionCoefficients[0] = 0.070969;
distortionCoefficients[1] = 0.777647;
distortionCoefficients[2] = -0.009131;
distortionCoefficients[3] = -0.013867;
distortionCoefficients[4] = -5.141519;
// Since the image was resized, we need to scale the found corner points
float sw = static_cast<float>(_width) / SMALL_WIDTH; // float division even if both are integers
float sh = static_cast<float>(_height) / SMALL_HEIGHT;
std::vector<Point2f> board_verts;
board_verts.push_back(Point2f(corners[0].x * sw, corners[0].y * sh));
board_verts.push_back(Point2f(corners[15].x * sw, corners[15].y * sh));
board_verts.push_back(Point2f(corners[19].x * sw, corners[19].y * sh));
board_verts.push_back(Point2f(corners[4].x * sw, corners[4].y * sh));
Mat boardMat(board_verts);
std::vector<Point3f> square_verts;
square_verts.push_back(Point3f(-1, 1, 0));
square_verts.push_back(Point3f(-1, -1, 0));
square_verts.push_back(Point3f(1, -1, 0));
square_verts.push_back(Point3f(1, 1, 0));
Mat squareMat(square_verts);
// Transform the camera's intrinsic parameters into an OpenGL camera matrix
glMatrixMode(GL_PROJECTION);
glLoadIdentity();
// Camera parameters
double f_x = 786.42938232; // Focal length in x axis
double f_y = 786.42938232; // Focal length in y axis (usually the same?)
double c_x = 217.01358032; // Camera principal point x
double c_y = 311.25384521; // Camera principal point y
cv::Mat cameraMatrix(3,3,CV_32FC1);
cameraMatrix.at<float>(0,0) = f_x;
cameraMatrix.at<float>(0,1) = 0.0;
cameraMatrix.at<float>(0,2) = c_x;
cameraMatrix.at<float>(1,0) = 0.0;
cameraMatrix.at<float>(1,1) = f_y;
cameraMatrix.at<float>(1,2) = c_y;
cameraMatrix.at<float>(2,0) = 0.0;
cameraMatrix.at<float>(2,1) = 0.0;
cameraMatrix.at<float>(2,2) = 1.0;
Mat rvec, tvec; // solvePnP allocates both as 3x1 CV_64F, matching the at<double> reads below
solvePnP(squareMat, boardMat, cameraMatrix, distortionCoefficients,
rvec, tvec);
_rv[0] = rvec.at<double>(0, 0);
_rv[1] = rvec.at<double>(1, 0);
_rv[2] = rvec.at<double>(2, 0);
_tv[0] = tvec.at<double>(0, 0);
_tv[1] = tvec.at<double>(1, 0);
_tv[2] = tvec.at<double>(2, 0);
}
Then in the drawing code...
GLKMatrix4 modelViewMatrix = GLKMatrix4MakeTranslation(0.0f, 0.0f, 0.0f);
modelViewMatrix = GLKMatrix4Translate(modelViewMatrix, -tv[1], -tv[0], -tv[2]);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[0], 1.0f, 0.0f, 0.0f);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[1], 0.0f, 1.0f, 0.0f);
modelViewMatrix = GLKMatrix4Rotate(modelViewMatrix, -rv[2], 0.0f, 0.0f, 1.0f);
The vertices I'm rendering create a cube of unit length around the origin (i.e. from -0.5 to 0.5 along each edge). I know that with OpenGL transformation functions the transformations are applied in "reverse order," so the above should rotate the cube about the z, y, and then x axes, and then translate it. However, it seems like it's being translated first and then rotated, so perhaps Apple's GLKMatrix4 works differently?
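To check the order question in isolation, the sketch below multiplies column-major 4x4 matrices the way OpenGL does (plain C++, hypothetical helper names). As far as I know, GLKMatrix4Translate and GLKMatrix4Rotate both post-multiply, i.e. they return the input matrix times the new transform, so a chain like the one above builds M = T * Rx * Ry * Rz. Applied to a vertex, the rightmost factor acts first: rotation happens before translation.

```cpp
#include <array>
#include <cmath>

// Column-major 4x4 matrix, the layout used by OpenGL and GLKMatrix4.
using Mat4 = std::array<double, 16>;

// Returns a * b.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

Mat4 translation(double tx, double ty, double tz) {
    return {1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, tx, ty, tz, 1};
}

Mat4 rotationZ(double radians) {
    const double c = std::cos(radians), s = std::sin(radians);
    return {c, s, 0, 0, -s, c, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1};
}

// Applies m to the point (x, y, z, 1) and returns the first three components.
std::array<double, 3> transformPoint(const Mat4& m, double x, double y, double z) {
    return {m[0] * x + m[4] * y + m[8] * z + m[12],
            m[1] * x + m[5] * y + m[9] * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14]};
}
```

With T a translation by (1, 0, 0) and R a 90 degree rotation about z, T * R sends (1, 0, 0) to (1, 1, 0), whereas R * T sends it to (0, 2, 0).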
This question seems very similar to mine, and in particular coder9's answer seems like it might be more or less what I'm looking for. However, I tried it and compared the results to my method, and the matrices I arrived at in both cases were the same. I feel like that answer is right, but that I'm missing some crucial detail.
You have to make sure the axes point in the correct directions. In particular, the y and z axes point in opposite directions in OpenGL and OpenCV (an OpenCV camera looks along +z with y pointing down, while an OpenGL camera looks along -z with y pointing up), so that both keep a direct (right-handed) x-y-z basis. You can find some information and code (with an iPad camera) in this blog post.
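A minimal sketch of that axis flip, assuming `R` is the 3x3 matrix from `cv::Rodrigues(rvec, R)` (the rvec from solvePnP is an axis-angle vector, not a set of Euler angles) and `t` is the tvec. Negating the y and z rows of [R | t] expresses the pose in OpenGL camera axes; the result is stored column-major, the layout expected by glLoadMatrixf and GLKMatrix4MakeWithArray. The function name is hypothetical, and the projection matrix is a separate step.

```cpp
#include <array>

// Builds a column-major OpenGL modelview matrix from an OpenCV pose
// (model -> camera), by left-multiplying [R | t] with diag(1, -1, -1).
std::array<float, 16> cvPoseToGlModelView(const double R[3][3], const double t[3]) {
    const double flip[3] = {1.0, -1.0, -1.0};  // keep x, negate y and z
    std::array<float, 16> m{};                  // zero-initialized
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col)
            m[col * 4 + row] = static_cast<float>(flip[row] * R[row][col]);
        m[12 + row] = static_cast<float>(flip[row] * t[row]);
    }
    m[15] = 1.0f;
    return m;
}
```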
-- Edit --
Ah, OK. Unfortunately, I used these resources to do it the other way round (from OpenGL to OpenCV) to test some algorithms. My main issue was that the row order of the images was inverted between OpenGL and OpenCV (maybe this helps).
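If the row order is the issue, the flip itself is simple. A sketch, assuming a tightly packed 8-bit buffer (for example from glReadPixels with GL_PACK_ALIGNMENT set to 1) and a hypothetical function name; with OpenCV, `cv::flip(image, image, 0)` performs the same vertical flip.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reverses the row order of a tightly packed 8-bit image in place.
// glReadPixels returns the bottom row first; OpenCV expects the top row first.
void flipRows(std::vector<std::uint8_t>& pixels, std::size_t width,
              std::size_t height, std::size_t channels) {
    if (height < 2) return;
    const std::size_t stride = width * channels;  // assumes no row padding
    std::uint8_t* data = pixels.data();
    for (std::size_t top = 0, bottom = height - 1; top < bottom; ++top, --bottom) {
        std::swap_ranges(data + top * stride, data + (top + 1) * stride,
                         data + bottom * stride);
    }
}
```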
When simulating cameras, I came across the same projection matrices that can be found here and in the generalized projection matrix paper. This paper quoted in the comments of the blog post also shows some link between computer vision and OpenGL projections.
I'm not an iOS programmer, so this answer might be misleading!
If the problem is not in the order of applying the rotations and the translation, then I suggest using a simpler and more commonly used coordinate system.
The points in the corners vector have the origin (0,0) at the top left corner of the image and the y axis is towards the bottom of the image. Often from math we are used to think of the coordinate system with the origin at the center and y axis towards the top of the image. From the coordinates you're pushing into board_verts I'm guessing you're making the same mistake. If that's the case, it's easy to transform the positions of the corners by something like this:
for (size_t i = 0; i < corners.size(); i++) {
    corners[i].x -= width / 2.0f;                  // origin at the image center
    corners[i].y = -corners[i].y + height / 2.0f;  // y axis pointing up
}
Then call solvePnP(). Debugging this is not that difficult: just print the positions of the four corners and the estimated R and T, and check whether they make sense. Then you can proceed to the OpenGL step. Please let me know how it goes.
The iOS 5 documentation reveals that GLKMatrix4MakeLookAt operates the same as gluLookAt.
The definition is provided here:
static __inline__ GLKMatrix4 GLKMatrix4MakeLookAt(float eyeX, float eyeY, float eyeZ,
float centerX, float centerY, float centerZ,
float upX, float upY, float upZ)
{
GLKVector3 ev = { eyeX, eyeY, eyeZ };
GLKVector3 cv = { centerX, centerY, centerZ };
GLKVector3 uv = { upX, upY, upZ };
GLKVector3 n = GLKVector3Normalize(GLKVector3Add(ev, GLKVector3Negate(cv)));
GLKVector3 u = GLKVector3Normalize(GLKVector3CrossProduct(uv, n));
GLKVector3 v = GLKVector3CrossProduct(n, u);
GLKMatrix4 m = { u.v[0], v.v[0], n.v[0], 0.0f,
u.v[1], v.v[1], n.v[1], 0.0f,
u.v[2], v.v[2], n.v[2], 0.0f,
GLKVector3DotProduct(GLKVector3Negate(u), ev),
GLKVector3DotProduct(GLKVector3Negate(v), ev),
GLKVector3DotProduct(GLKVector3Negate(n), ev),
1.0f };
return m;
}
I'm trying to extract camera information from this:
1. Read the camera position
GLKVector3 cPos = GLKVector3Make(mx.m30, mx.m31, mx.m32);
2. Read the camera right vector as `u` in the above
GLKVector3 cRight = GLKVector3Make(mx.m00, mx.m10, mx.m20);
3. Read the camera up vector as `v` in the above
GLKVector3 cUp = GLKVector3Make(mx.m01, mx.m11, mx.m21);
4. Read the camera look-at vector as `n` in the above
GLKVector3 cLookAt = GLKVector3Make(mx.m02, mx.m12, mx.m22);
There are two questions:
The look-at vector seems negated as they defined it, since they compute (eye - center) rather than (center - eye). Indeed, when I call GLKMatrix4MakeLookAt with a camera position of (0,0,-10) and a center of (0,0,1), the look-at vector I extract is (0,0,-1), i.e. the negative of what I expect. So should I negate what I extract?
The camera position I extract is the result of the view transformation matrix premultiplying the view rotation matrix, hence the dot products in their definition. I believe this is incorrect - can anyone suggest how else I should calculate the position?
Many thanks for your time.
Per its documentation, gluLookAt calculates centre - eye, uses that for some intermediate steps, then negates it for placement into the resulting matrix. So if you want centre - eye back, taking the negative is explicitly correct.
You'll also notice that the result returned is equivalent to a glMultMatrix with the rotational part of the result followed by a glTranslate by -eye. Since the classic OpenGL matrix operations post-multiply, gluLookAt is defined to post-multiply the rotation by the translation. So Apple's implementation is correct, and equivalent to first moving the camera and then rotating it, which is correct.
So if you define R = (the matrix defining the rotational part) and T = (the translational analogue), you get R * T. If you want to extract T, you can premultiply by the inverse of R and then pull the results out of the final column, since matrix multiplication is associative.
As a bonus, because R is orthonormal, the inverse is just the transpose.
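A sketch of that extraction in plain C++, with hypothetical helper names: `makeLookAt` ports the GLKMatrix4MakeLookAt definition quoted in the question to doubles, `eyeFromView` undoes the translation column via the transpose (eye = -R^T * t), and `lookDirectionFromView` negates the third row of R to get centre - eye.

```cpp
#include <array>
#include <cmath>

using Mat4 = std::array<double, 16>;  // column-major, same layout as GLKMatrix4.m
using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}
static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
static Vec3 normalize(const Vec3& a) {
    const double l = std::sqrt(dot(a, a));
    return {a[0] / l, a[1] / l, a[2] / l};
}

// Port of the GLKMatrix4MakeLookAt definition quoted in the question.
Mat4 makeLookAt(const Vec3& eye, const Vec3& center, const Vec3& up) {
    const Vec3 n = normalize(sub(eye, center));
    const Vec3 u = normalize(cross(up, n));
    const Vec3 v = cross(n, u);
    return {u[0], v[0], n[0], 0.0,
            u[1], v[1], n[1], 0.0,
            u[2], v[2], n[2], 0.0,
            -dot(u, eye), -dot(v, eye), -dot(n, eye), 1.0};
}

// The translation column holds t = -R * eye, so eye = -R^T * t
// (R is orthonormal, so its inverse is its transpose).
Vec3 eyeFromView(const Mat4& m) {
    Vec3 eye{};
    for (int j = 0; j < 3; ++j)
        eye[j] = -(m[j * 4 + 0] * m[12] + m[j * 4 + 1] * m[13] + m[j * 4 + 2] * m[14]);
    return eye;
}

// centre - eye direction: the negated third row of R, i.e. -n.
Vec3 lookDirectionFromView(const Mat4& m) { return {-m[2], -m[6], -m[10]}; }
```

With the question's example (eye (0, 0, -10), centre (0, 0, 1), up (0, 1, 0)), this recovers the eye position and the expected look direction (0, 0, 1).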