I'm wondering if I can determine in OpenCV the transformation kinds (rotation, translation, shear, ...) just from a given transformation matrix?
Following up on this article: http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/warp_affine/warp_affine.html
I will get an 2x3 Transformation matrix. E.g.:
[ 0.85, 0.20, 0;
-0.06, 0.37, 253.44]
I know that the third "column" stands for the translation. So in this case x=0, y=253.44
But is there a way of determine if in the first two columns only a rotation or a rotation and a scaling is applied? So what I mean is get the matrix multiplication somehow reversed?
This isn't possible in general. For example, consider the affine transformation:
cos(theta) sin(theta) 0
-sin(theta) cos(theta) 0
If theta = pi, then this evaluates to
-1 0 0
0 -1 0
Is this affine transformation scaling by -1, or rotating by 180 degrees?
Affine transformations can indeed be written as a composition of multiple transformations, but as matrix multiplication is not commutative, the order of these compositions matter. And that even goes for translation. The resulting transformation matrix is different if you rotate and then translate than if you translate and rotate, because rotation happens around the (0, 0) location in your image.
In short, there is no unique decomposition. If you know certain constraints, like scaling is uniform, or shearing is uniform, or the scaling is only positive, etc, then you might be able to get a unique decomposition. See here for the mathematics.
Related
According to OpenCV's documentation, solvePnp will return the rotation vector of the object pose from 3D-2D point correspondes. To obtain the rotation matrix, we can use Rodrigues method to convert the rotation vector to rotation matrix. According to OpenCV documentation, we can find theta using the following:
theta = norm(r)
But I thought norm(r) will find the magnitude of the vector r? If that's the case how can we find the angle from the magnitude of the vector r? Correct me if I am wrong. Thank you.
Given a rotation vector r, its length (in Python, numpy.linalg.norm(r)) is the angle of rotation around the axis whose direction is the vector's one. The sense of the rotation obeys the "right hand rule": if your right hand makes a thumbs-up sign, with the thumb pointing as the vector, the other fingers wrap the vectors as the rotation (equivalently, it's the sense of rotation that makes an ordinary screw advance when its tip points as the vector).
The same rotation can be expressed as a 3x3 matrix, or as a triple of (Euler) angles or rotation about up to 3 orthogonal axes. There are ordinarily many different triples of Euler angles that represent the same rotation. Consult a textbook, or Wikipedia, for details.
I am trying to rectify two sequences of images for stereo matching. The usual approach of using stereoCalibrate() with a checkerboard pattern is not of use to me, since I am only working with the footage.
What I have is the correct calibration data of the individual cameras (camera matrix and distortion parameters) as well as measurements of their distance and angle between each other.
How can I construct the rotation matrix and translation vector needed for stereoRectify()?
The naive approach of using
Mat T = (Mat_<double>(3,1) << distance, 0, 0);
Mat R = (Mat_<double>(3,3) << cos(angle), 0, sin(angle), 0, 1, 0, -sin(angle), 0, cos(angle));
resulted in a heavily warped image. Do these matrices need to relate to a different origin point I am not aware of? Or do I need to convert the distance/angle value somehow to be dependent of pixelsize?
Any help would be appreciated.
It's not clear whether you have enough information about the camera poses to perform an accurate rectification.
Both T and R are measured in 3D, but in your case:
T is one-dimensional (along x-axis only), which means that you are confident that the two cameras are perfectly aligned along the other axes (in particular, you have less-than-1 pixel error on the y axis, ie a few microns by today's standards);
R leaves the Y-coordinates untouched. Thus, all you have is rotation around this axis, does it match your experimental setup ?
Finally, you need to check the consistency of the units that you are using for the translation and rotation to match with the units from the intrinsic data.
If it is feasible, you can check your results by finding some matching points between the two cameras and proceeding to a projective calibration: the accurate knowledge of the 3D position of the calibration points is required for metric reconstruction only. Other tasks rely on the essential or fundamental matrices, that can be computed from image-to-image point correspondences.
If intrinsics and extrinsics known, I recommend this method: http://link.springer.com/article/10.1007/s001380050120#page-1
It is easy to implement. Basically you rotate the right camera till both cameras have the same orientation, means both share a common R. Epipols are then transformed to the infinity and you have epipolar lines parallel to the image x-axis.
First row of the new R (x) is simply the baseline, e.g. the subtraction of both camera centers. Second row (y) the cross product of the baseline with the old left z-axis. Third row (z) equals cross product of the first two rows.
At last you need to calculate a 3x3 homography described in the above link and use warpPerspective() to get a rectified version.
I need to find the pose (rotation matrix + translation vector) for a camera, and for that I am using cv2.solvePnP(), but the results I get from photos don't match.
In order to debug, I created (with numpy) a "debugging 3d scene" composed of some object points (four corners of a square), some camera points (focal point, principal point and four corners of the virtual projection plane) and parameters (focal distance, initial orientation).
Then, I construct a general rotation matrix by multiplying three axis rotation matrices, apply this general rotation to the camera (numpy.dot()), project the object points in the virtual projection plane (line-plane intersection algorithm), and calculate the in-plane 2D coordinates (point-line distance) to the projection plane axes.
After doing this (objectpoints to imagepoints via rotationmatrix), I feed the imagepoints and the objectpoints to cv2.Rodrigues(cv2.solvePnP(...)) and get a matrix "not quite identical" to the one I used, only because of transposition and some elements with opposite signal (negative vs. positive), respecting this relation:
solvepnp_rotmatrix = my_original_matrix.transpose * [ 1 1 1]
[ 1 1 -1]
[-1 -1 1]
Although the rotation matrix mismatch is "solveable" with this hack, the TRANSLATION VECTOR gives coordinates that don't make sense to me.
I suspect there are mismatches between my 3D model (handedness, axes orientation, order of rotations) and the model used by opencv:
I use OpenGL-like coordinate system (X increases to the right, Y increases upwards, and Z increases toward the observer;
I applied the rotations in the order that made more sense to me (all right-handed, first around global Z, then around global X, then around global Y);
The image plane is between object and camera focal point (virtual projection plane, instead of real/CCD);
The origin of my image plane (virtual CCD) is lower-left corner (Xpix increases to the right, Ypx increases upwards.
My questions are:
Given that the terms of the rotation matrix are identical, only transposed and with different signal in some terms, is it possible that I am confusing some of openCV conventions (handedness, order of rotations, axis direction)? And how can I discover which one(s)?
Also, is there a way to relate my handmade translation vector to the tvec returned by solvePnP? (of course, ideally, the best would be to make the coordinate systems to match, in the first place).
Any help will be most welcome!
I am playing with the affine transform in OpenCV and I am having trouble getting an intuitive understanding of it workings, and more specifically, just how do I specify the parameters of the map matrix so I can get a specific desired result.
To setup the question, the procedure I am using is 1st to define a warp matrix, then do the transform.
In OpenCV the 2 routines are (I am using an example in the excellent book OpenCV by Bradski & Kaehler):
cvGetAffineTransorm(srcTri, dstTri, warp_matrix);
cvWarpAffine(src, dst, warp_mat);
To define the warp matrix, srcTri and dstTri are defined as:
CvPoint2D32f srcTri[3], dstTri[3];
srcTri[3] is populated as follows:
srcTri[0].x = 0;
srcTri[0].y = 0;
srcTri[1].x = src->width - 1;
srcTri[1].y = 0;
srcTri[2].x = 0;
srcTri[2].y = src->height -1;
This is essentially the top left point, top right point, and bottom left point of the image for starting point of the matrix. This part makes sense to me.
But the values for dstTri[3] just are confusing, at least, when I vary a single point, I do not get the result I expect.
For example, if I then use the following for the dstTri[3]:
dstTri[0].x = 0;
dstTri[0].y = 0;
dstTri[1].x = src->width - 1;
dstTri[1].y = 0;
dstTri[2].x = 0;
dstTri[2].y = 100;
It seems that the only difference between the src and the dst point is that the bottom left point is moved to the right by 100 pixels. Intuitively, I feel that the bottom part of the image should be shifted to the right by 100 pixels, but this is not so.
Also, if I use the exact same values for dstTri[3] that I use for srcTri[3], I would think that the transform would produce the exact same image--but it does not.
Clearly, I do not understand what is going on here. So, what does the mapping from the srcTri[] to the dstTri[] represent?
Here is a mathematical explanation of an affine transform:
this is a matrix of size 3x3 that applies the following transformations on a 2D vector: Scale in X axis, scale Y, rotation, skew, and translation on the X and Y axes.
These are 6 transformations and thus you have six elements in your 3x3 matrix. The bottom row is always [0 0 1].
Why? because the bottom row represents the perspective transformation in axis x and y, and affine transformation does not include perspective transform.
(If you want to apply perspective warping use homography: also 3x3 matrix )
What is the relation between 6 values you insert into affine matrix and the 6 transformations it does? Let us look at this 3x3 matrix like
e*Zx*cos(a), -q1*sin(a) , dx,
e*q2*sin(a), Z y*cos(a), dy,
0 , 0 , 1
The dx and
dy elements are translation in x and y axis (just move the picture left-right, up down).
Zx is the relative scale(zoom) you apply to the image in X axis.
Zy is the same as above for y axis
a is the angle of rotation of the image. This is tricky since when you want to rotate by 'a' you have to insert sin(), cos() in 4 different places in the matrix.
'q' is the skew parameter. It is rarely used. It will cause your image to skew on the side (q1 causes y axis affects x axis and q2 causes x axis affect y axis)
Bonus: 'e' parameter is actually not a transformation. It can have values 1,-1. If it is 1 then nothing happens, but if it is -1 than the image is flipped horizontally. You can use it also to flip the image vertically but, this type of transformation is rarely used.
Very important Note!!!!!
The above explanation is mathematical. It assumes you multiply the matrix by the column vector from the right. As far as I remember, Matlab uses reverse multiplication (row vector from the left) so you will need to transpose this matrix. I am pretty sure that OpenCV uses regular multiplication but you need to check it.
Just enter only translation matrix (x shifted by 10 pixels, y by 1).
1,0,10
0,1,1
0,0,1
If you see a normal shift than everything is OK, but If shit appears than transpose the matrix to:
1,0,0
0,1,0
10,1,1
With FeatureDetector I get features on two images with the same element and match this features with BruteForceMatcher.
Then I'm using OpenCv function findHomography to get homography matrix
H = findHomography( src2Dfeatures, dst2Dfeatures, outlierMask, RANSAC, 3);
and getting H matrix, then align image with
warpPerspective(img1,alignedSrcImage,H,img2.size(),INTER_LINEAR,BORDER_CONSTANT);
I need to know rotation angle, scale, displacement of detected element. Is there any simple way to get this than some big equations? Some evaluated formulas just to put data in?
Homography would match projections of your elements lying on a plane or lying arbitrary in 3D if the camera goes through a pure rotation or zoom and no translation. So here are the cases we are talking about with indication of what is the input to our calculations:
- planar target, pure rotation, intra-frame homography
- planar target, rotation and translation, target to frame homography
- 3D target, pure rotation, frame to frame mapping (constrained by a fundamental matrix)
In case of the planar target, a pure rotation is easy to calculate through your frame-to-frame Homography (H12):
given intrinsic camera matrix A, plane to image homographies for frame H1, and H2 that can be expressed as H1=A, H2=A*R, H12 = H2*H1-1=ARA-1 and thus R=A-1H12*A
In case of elements lying on a plane, rotation with translation of the camera (up to unknown scale) can be calculated through decomposition of target-to-frame homography. Note that the target can be just one of the views. Assuming you have your original planar target as an image (taken at some reference orientation) your task is to decompose the homography between images H12 which can be done through SVD. The first two columns of H represent the first two columns of the rotation martrix and be be recovered through H=ULVT, [r1 r2] = UDVT where D is 3x2 Identity matrix with the last row being all 0. The third column of a rotation matrix is just a vector product of the first two columns. The last column of the Homography is a translation vector times some constant.
Finally for arbitrary configuration of points in 3D and pure camera rotation, the rotation is calculated using the essential matrix decomposition rather than homography, see this
cv::decomposeProjectionMatrix();
and
cv::RQDecomp3x3();
are both similar to what you want to achive.
None of them is perfect. The theory behind them and why you cannot extract all params from a 3x3 matrix is a bit cumbersome. But the short answer is that a 3x3 proj matrix is a simplification from the complete 4x4 one, based on the fact that all points stay in the same plane.
You can try to use levenberg marquardt optimalization, where parameters will be translation and rotation, equations will represent by computed distances between features from two images(use only inliers from ransac homography).
Here is C++ implementation of LM http://www.ics.forth.gr/~lourakis/levmar/