OpenCV Camera calibration use of rotation matrix - opencv

http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#calibratecamera
I used the cv::calibrateCamera method with a 9x6 chessboard pattern.
Now I am getting rvecs and tvecs corresponding to each view of the pattern.
Can somebody explain the format of rvecs and tvecs?
As far as I have figured out, each one is a 3x1 matrix,
and the OpenCV documentation suggests looking at the Rodrigues function.
http://en.wikipedia.org/wiki/Rodrigues'_rotation_formula
As far as Rodrigues is concerned, it is a way to rotate a vector
around a given axis by an angle theta.
But for this we need four values: a unit vector (ux, uy, uz) and the angle. OpenCV seems to use only 3 values.
The OpenCV Rodrigues documentation at the link below http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#void Rodrigues(InputArray src, OutputArray dst, OutputArray jacobian)
says that it will convert a 3x1 matrix to a 3x3 rotation matrix.
Is this matrix the same as the one we use in 3D graphics?
Can I convert it to a 4x4 matrix and use it for transformations like below?
M4X4 [
x x x 0
x x x 0
x x x 0
0 0 0 1
]
x: the values from the 3x3 output matrix of the Rodrigues function.
Is the following relationship valid,
Vout = M4X4 * Vin;
using the matrix above?

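The 3x1 rotation vector is an axis-angle encoding: the direction of the vector is the axis of rotation and its magnitude is the angle (in radians). A unit axis has only two free parameters, so scaling it by the angle packs everything into three values. Using the OpenCV function Rodrigues(InputArray src, OutputArray dst) you can obtain the 3x3 rotation matrix, which you can embed in a 4x4 matrix exactly as you describe. Note that your 4x4 matrix carries the rotation only; to map pattern points into the camera frame, put the matching tvec in the first three rows of the last column.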
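To make the axis-angle idea concrete, here is a minimal sketch that applies Rodrigues' formula by hand and embeds the result in a 4x4 matrix. It uses only the C++ standard library; the names rotationVectorToMatrix, toHomogeneous and apply are hypothetical helpers for illustration, not OpenCV functions. In real code you would most likely call cv::Rodrigues instead.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Vec4 = std::array<double, 4>;
using Mat3 = std::array<std::array<double, 3>, 3>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Hypothetical helper: rotation vector (axis * angle) -> 3x3 rotation matrix,
// using R = cos(theta) I + sin(theta) [u]x + (1 - cos(theta)) u u^T.
Mat3 rotationVectorToMatrix(const Vec3& r) {
    const double theta = std::sqrt(r[0] * r[0] + r[1] * r[1] + r[2] * r[2]);
    assert(std::isfinite(theta));
    if (theta < 1e-12) {  // zero vector: no rotation
        return Mat3{{{1.0, 0.0, 0.0}, {0.0, 1.0, 0.0}, {0.0, 0.0, 1.0}}};
    }
    const double ux = r[0] / theta, uy = r[1] / theta, uz = r[2] / theta;
    const double c = std::cos(theta), s = std::sin(theta), v = 1.0 - c;
    return Mat3{{{c + ux * ux * v, ux * uy * v - uz * s, ux * uz * v + uy * s},
                 {uy * ux * v + uz * s, c + uy * uy * v, uy * uz * v - ux * s},
                 {uz * ux * v - uy * s, uz * uy * v + ux * s, c + uz * uz * v}}};
}

// Hypothetical helper: build the 4x4 matrix [R t; 0 0 0 1].
Mat4 toHomogeneous(const Mat3& R, const Vec3& t) {
    Mat4 M{};  // zero-initialized
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) M[i][j] = R[i][j];
        M[i][3] = t[i];
    }
    M[3][3] = 1.0;
    return M;
}

// Vout = M4X4 * Vin for a homogeneous 4-vector.
Vec4 apply(const Mat4& M, const Vec4& v) {
    Vec4 out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) out[i] += M[i][j] * v[j];
    return out;
}
```

For example, the rotation vector (0, 0, pi/2) is a quarter turn about the z axis, so applying the resulting matrix with zero translation to (1, 0, 0, 1) should give approximately (0, 1, 0, 1).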

Related

pytorch affine_grid: what is the theta input?

When trying to use torch.nn.functional.affine_grid, it requires a theta affine matrix of size (N x 3 x 4) according to the documentation. I thought a general affine matrix is (N x 4 x 4). What is the expected affine matrix format in PyTorch?
An example of a 3D rotation affine input would be ideal. I appreciate your help.
The dimensions you mention apply to the case of 3D inputs, that is, when you wish to apply 3D geometric transforms to an input tensor x of shape B x C x D x H x W.
A transformation of points in 3D (represented as 4-vectors in homogeneous coordinates, (x, y, z, 1)) should be, in the general case, a 4x4 matrix, as you noted.
However, an affine transformation must keep the fourth homogeneous coordinate equal to 1, so the 4th row of the matrix must be (0, 0, 0, 1).
Therefore, there's no need to explicitly code this last row.
To conclude, a 3D transformation composed of a 3x3 rotation R and a 3D translation t is simply the 3x4 matrix:
theta = [R t]
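As a short worked equation (a sketch of the reasoning above, with p standing for the 3D point (x, y, z)), multiplying the full 4x4 matrix by a homogeneous point shows that the last row only reproduces the constant 1, so the top three rows carry all the information:

```latex
\begin{pmatrix} R & t \\ 0^{\top} & 1 \end{pmatrix}
\begin{pmatrix} p \\ 1 \end{pmatrix}
=
\begin{pmatrix} R\,p + t \\ 1 \end{pmatrix},
\qquad
\theta = \begin{pmatrix} R & t \end{pmatrix} \in \mathbb{R}^{3 \times 4}.
```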

Recompose Results of OpenCV RQDecomp3x3

After running RQDecomp3x3 in OpenCV, you get:
mtxR – Output 3x3 upper-triangular matrix.
mtxQ – Output 3x3 orthogonal matrix.
Qx – Optional output 3x3 rotation matrix around x-axis.
Qy – Optional output 3x3 rotation matrix around y-axis.
Qz – Optional output 3x3 rotation matrix around z-axis.
How do you get back from the three rotation matrices (Qx, Qy, Qz) to the original input matrix?
Or, in the case where the input matrix was a rotation matrix, mtxR will be the identity matrix, so how can you go from the three rotation matrices to mtxQ?
UPDATED
With the answer, though, I don't get why the transpose is needed.
It looks like (at least for a rotation matrix input):
input = (Qx # Qy # Qz)'.

Why OpenCV has only 6 extrinsic parameters in calibration function?

I use the OpenCV sample code to do the camera calibration. As far as I know, the extrinsic parameters have 12 elements, but in OpenCV the rotation vector and the translation vector together have only 6.
Why does OpenCV have only 6 parameters?
http://docs.opencv.org/2.4/_downloads/camera_calibration.cpp
calibratecamera method
The calibrateCamera method outputs rvecs and tvecs: for each view, one 3D vector for the rotation (since any rotation matrix has just 3 degrees of freedom) and one for the translation. The Rodrigues method converts the 3x3 rotation matrix R to the 3D vector r. Thus there are only 6 extrinsic parameters; the 12 entries of the 3x4 matrix [R|t] are not independent.
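The following is a minimal sketch of that conversion in the matrix-to-vector direction, written with the C++ standard library only. The function name rotationMatrixToVector is hypothetical, and the sketch assumes the input is a proper rotation matrix whose angle is not close to pi (where this simple formula breaks down); cv::Rodrigues handles the general case.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical helper: 3x3 rotation matrix -> 3-element rotation vector.
// The 9 matrix entries collapse to 3 numbers: axis direction times angle.
Vec3 rotationMatrixToVector(const Mat3& R) {
    const double kPi = std::acos(-1.0);
    double cosTheta = (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0;
    // Clamp against rounding so acos stays defined.
    if (cosTheta > 1.0) cosTheta = 1.0;
    if (cosTheta < -1.0) cosTheta = -1.0;
    const double theta = std::acos(cosTheta);
    if (theta < 1e-8) return Vec3{0.0, 0.0, 0.0};  // (almost) no rotation
    // Assumption: angle not near pi, otherwise sin(theta) is too small.
    assert(theta < kPi - 1e-6);
    const double k = theta / (2.0 * std::sin(theta));
    return Vec3{k * (R[2][1] - R[1][2]),
                k * (R[0][2] - R[2][0]),
                k * (R[1][0] - R[0][1])};
}
```

Together with the 3-element translation this gives the 6 extrinsic parameters per view.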

Exact definition of the matrices in OpenCv StereoRectify

Normally a projection matrix P is defined as the 3x4 matrix which projects points from world coordinates to image/pixel coordinates. The projection matrix can be split up into:
K: a 3x4 camera matrix K with the intrinsic parameters
T: a 4x4 transformation matrix with the extrinsic parameters
The projection matrix is then P = K * T.
What are the clear definitions of the following input to OpenCV's stereoRectify:
cameraMatrix1 – First camera matrix (I assume it is the intrinsic K part of the projection matrix, correct?).
R – Rotation matrix between the coordinate systems of the first and the second cameras. (What does 'between' mean? Is it the rotation from cam1 to cam2 or from cam2 to cam1?)
T – Translation vector between coordinate systems of the cameras. (Same as above: is the translation from cam1 -> cam2 or cam2 -> cam1?)
R1 – Output 3x3 rectification transform (rotation matrix) for the first camera. (Is this the rotation after rectification so the new extrinsic part of the projection matrix becomes T1new = R1*T1old?)
P1 – Output 3x4 projection matrix in the new (rectified) coordinate systems for the first camera. (What is meant by 'projection matrix in the new coordinate system'? It seems that this projection matrix depends on the rotation matrix R1 to project points from world coordinates to image/pixel coordinates, so from the above definition it is neither the 'projection matrix' nor the 'camera matrix' but some kind of mixture of the two.)
cameraMatrix1 is the intrinsic K matrix as computed by the stereoCalibrate() function in OpenCV. You got it right.
R and T are the ones returned by stereoCalibrate(). Per the OpenCV documentation they take point coordinates from the first camera's coordinate system to the second camera's, i.e. X2 = R * X1 + T.
If you look in the O'Reilly book "Learning OpenCV", p. 434, you'll see what R1 (Rl) and R2 (Rr) are.
Rl = [Rrect][rl]; Rr = [Rrect][rr];
Let the cameras' image planes be plane1 and plane2. Before stereo rectification, plane1 and plane2 are in general not parallel, and the epipolar lines are not parallel to the stereo baseline. Rl rotates the left image plane so that it is parallel to the right image plane (which is rotated by Rr), after which the epipolar lines in both images are parallel.
P1 and P2 are the new projection matrices after stereo rectification. Remember, the camera matrix (K) projects a point in 3D space onto the 2D image plane, whereas P1 and P2 project a point in 3D space (given in the rectified camera coordinate system) onto the rectified 2D image planes.
If you have calibrated a stereo rig before and compared the P1 and K1 values, you'll find that they are very similar when the rig is already almost in a rectified configuration (as closely as one can align it by hand).

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
The images conform to the pinhole camera model.
So far I don't get useful results, so I will describe my procedure and hope that somebody can help me.
I detect features in the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation. (This part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think it should work correctly.)
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos and old_pos are vectors (3x1), R is the rotation matrix (3x3) and t is the translation vector (3x1).
Unfortunately I get no useful results, so maybe someone has an idea what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all you should check if
x' * F * x = 0
for your point correspondences x' and x. Of course, this should hold (approximately) only for the inliers of the fundamental matrix estimation with RANSAC.
Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
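As a small sketch of this normalization step: for a camera matrix with zero skew, multiplying by inv(K) reduces to subtracting the principal point and dividing by the focal length. The struct Intrinsics and the function pixelToNormalized are hypothetical names for illustration, and zero skew is an assumption (it matches the K posted in the question).

```cpp
#include <array>
#include <cassert>

// Hypothetical container for K = [fx 0 cx; 0 fy cy; 0 0 1] (zero skew assumed).
struct Intrinsics {
    double fx, fy, cx, cy;
};

// xn = inv(K) * x for a pixel x = (u, v, 1); returns the first two components.
std::array<double, 2> pixelToNormalized(const Intrinsics& K, double u, double v) {
    assert(K.fx != 0.0 && K.fy != 0.0);
    return {(u - K.cx) / K.fx, (v - K.cy) / K.fy};
}
```

With the K from the question (fx = fy = 150, cx = 300, cy = 400), the pixel (450, 550) maps to the normalized point (1, 1).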
With these NCCs you can decompose the essential matrix as you described. Triangulate the normalized camera coordinates and check the depth of the triangulated points: the correct rotation and translation are the ones that put the points in front of both cameras. Be careful, though: the literature says that one point is sufficient to select the correct rotation and translation, but in my experience you should check a few points, since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity with a zero fourth column, P = [I, 0]. P' = [R, t] has the rotation matrix in the first 3 columns and the translation in the fourth column, so that P' is also a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then xp and xp' are also homogeneous coordinates, representing the (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since, firstly, you only translate old_pos without rotating it and, secondly, you translate it by the wrong vector. The correct way is given above.
So, after you have computed the reprojected points, you can calculate the reprojection error. Since you are working with homogeneous coordinates, you have to normalize them first (xp = xp / xp(2), i.e. divide by the last coordinate). The error is then given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large, such as 10^2, your intrinsic camera calibration or your rotation/translation are incorrect (perhaps both). Depending on your coordinate system, you can try to invert your projection matrices. To do that, you need to extend them to 4x4 homogeneous matrices first, since you cannot invert a 3x4 matrix (without the pseudo-inverse). Thus, add the fourth row [0 0 0 1], compute the inverse and remove the fourth row.
There is one more thing about the reprojection error. In general, the reprojection error is the squared distance between your original point (in each image) and its reprojected position. You can take the square root to get the Euclidean distance between the two points.
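The reprojection steps above can be sketched as follows, using only the C++ standard library. The names Intrinsics and reprojectionErrorSquared are hypothetical; the sketch assumes a zero-skew K and takes X as a non-homogeneous 3D point, so that R * X + t plays the role of [R, t] times the homogeneous point.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical container for K = [fx 0 cx; 0 fy cy; 0 0 1] (zero skew assumed).
struct Intrinsics {
    double fx, fy, cx, cy;
};

// Squared reprojection error of the 3D point X against the observed pixel (u, v).
double reprojectionErrorSquared(const Intrinsics& K, const Mat3& R, const Vec3& t,
                                const Vec3& X, double u, double v) {
    // Point in the camera frame: Xc = R * X + t.
    Vec3 Xc{};
    for (int i = 0; i < 3; ++i)
        Xc[i] = R[i][0] * X[0] + R[i][1] * X[1] + R[i][2] * X[2] + t[i];
    assert(Xc[2] != 0.0);  // cannot divide by a zero last coordinate
    // Apply K, then divide by the last homogeneous coordinate.
    const double up = K.fx * Xc[0] / Xc[2] + K.cx;
    const double vp = K.fy * Xc[1] / Xc[2] + K.cy;
    return (u - up) * (u - up) + (v - vp) * (v - vp);
}
```

Taking std::sqrt of the returned value gives the Euclidean pixel distance. For the first camera, pass the identity rotation and a zero translation.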
To update your camera position, you have to update the translation first, then the rotation matrix, because the translation update uses the previous R_ref.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are the newly computed camera rotation and translation, and lambda is the scale factor.
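A minimal sketch of that update, written with the C++ standard library instead of cv::Mat, could look like this. The names Pose and accumulate are hypothetical, and the scale factor is assumed to come from elsewhere (it cannot be recovered from the essential matrix alone).

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Hypothetical camera state: accumulated rotation and translation.
struct Pose {
    Mat3 R;
    Vec3 t;
};

// Applies t_ref += scale * (R_ref * t); R_ref = R * R_ref; in that order.
void accumulate(Pose& ref, const Mat3& R, const Vec3& t, double scale) {
    assert(scale >= 0.0);  // assumption: the scale factor is non-negative
    // The translation uses R_ref from before this step.
    for (int i = 0; i < 3; ++i)
        ref.t[i] += scale * (ref.R[i][0] * t[0] + ref.R[i][1] * t[1] + ref.R[i][2] * t[2]);
    // Then the rotation is composed: R_ref = R * R_ref.
    Mat3 updated{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k) updated[i][j] += R[i][k] * ref.R[k][j];
    ref.R = updated;
}
```

Starting from the identity rotation and a zero translation, calling accumulate once per frame pair builds up the camera trajectory.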

Resources