Let x be a vector, namely a flattened image with N pixels. The rotated image (by some angle θ) is represented by its flattened vector y. What is the form of the N-by-N matrix A, such that Ax=y?
Due to cropping and interpolation effects, there might be a slight discrepancy between the result of the matrix multiplcation and y. Therefore we wish to minimize ||Ax-y||^2.
Related
I want to estimate an affine transform given matches between a transformed img and
a reference img (which came from a reference stack).
Can I estimate a 3D affine transform given only the x,y coordinates of the transformed
image and the x, y, z coordinates of the reference
image (z being the slice from the reference stack that the reference img came from)?
The general formula for illustrating a transform is:
x' = M * x, where x' is the transformed point. M is the transformation matrix, and x is the original point. The transform matrix, M, is estimated by multiplying x' by inv(x).
The standard setup for estimating the 3D transformation matrix is this:
How can I estimate the transformation matrix if I don't have the z' of the transformed image? Is there some other setup I can use?
I have two images with known corresponding 2D points, the intrinsic parameters of the cameras and the 3D transformation between the cameras. I want to calculate the 2D reprojection error from one image to the other.
To do so, I thought about getting a fundamental matrix from the transformation, so I can compute the point-to-line distance between the points and the corresponding epipolar lines. How can I get the fundamental matrix?
I know that E = R * [t] and F = K^(-t) * E * K^(-1), where E is the essential matrix and [t] is the skew-symmetric matrix of the translation vector. However, this returns a null matrix if the motion is pure rotation (t = [0 0 0]). I know that in this case a homography explains the motion better than the fundamental matrix, so that I can compare the norm of the translation vector with a small threshold to choose a fundamental matrix or a homogaphy. Is there a better way of doing this?
"I want to calculate the 2D reprojection error from one image to the other."
Then go and calculate it. Your setup is calibrated, so you don't need anything other than a known piece of 3D geometry. Forget about the epipolar error, which may as well be undefined if your camera motion is (close to) a pure rotation.
Take an object of known size and shape (for example, a checkerboard), work out its location in 3D space from one camera view (for a checkerboard you can fit a homography between its physical model and its projection, then decompose it into [R|t]). Then project the now-located 3D shape into the other camera given that camera's calibrated parameters, and compare the projection with the object's actual image.
I am searching lots of resources on internet for many days but i couldnt solve the problem.
I have a project in which i am supposed to detect the position of a circular object on a plane. Since on a plane, all i need is x and y position (not z) For this purpose i have chosen to go with image processing. The camera(single view, not stereo) position and orientation is fixed with respect to a reference coordinate system on the plane and are known
I have detected the image pixel coordinates of the centers of circles by using opencv. All i need is now to convert the coord. to real world.
http://www.packtpub.com/article/opencv-estimating-projective-relations-images
in this site and other sites as well, an homographic transformation is named as:
p = C[R|T]P; where P is real world coordinates and p is the pixel coord(in homographic coord). C is the camera matrix representing the intrinsic parameters, R is rotation matrix and T is the translational matrix. I have followed a tutorial on calibrating the camera on opencv(applied the cameraCalibration source file), i have 9 fine chessbordimages, and as an output i have the intrinsic camera matrix, and translational and rotational params of each of the image.
I have the 3x3 intrinsic camera matrix(focal lengths , and center pixels), and an 3x4 extrinsic matrix [R|T], in which R is the left 3x3 and T is the rigth 3x1. According to p = C[R|T]P formula, i assume that by multiplying these parameter matrices to the P(world) we get p(pixel). But what i need is to project the p(pixel) coord to P(world coordinates) on the ground plane.
I am studying electrical and electronics engineering. I did not take image processing or advanced linear algebra classes. As I remember from linear algebra course we can manipulate a transformation as P=[R|T]-1*C-1*p. However this is in euclidian coord system. I dont know such a thing is possible in hompographic. moreover 3x4 [R|T] Vector is not invertible. Moreover i dont know it is the correct way to go.
Intrinsic and extrinsic parameters are know, All i need is the real world project coordinate on the ground plane. Since point is on a plane, coordinates will be 2 dimensions(depth is not important, as an argument opposed single view geometry).Camera is fixed(position,orientation).How should i find real world coordinate of the point on an image captured by a camera(single view)?
EDIT
I have been reading "learning opencv" from Gary Bradski & Adrian Kaehler. On page 386 under Calibration->Homography section it is written: q = sMWQ where M is camera intrinsic matrix, W is 3x4 [R|T], S is an "up to" scale factor i assume related with homography concept, i dont know clearly.q is pixel cooord and Q is real coord. It is said in order to get real world coordinate(on the chessboard plane) of the coord of an object detected on image plane; Z=0 then also third column in W=0(axis rotation i assume), trimming these unnecessary parts; W is an 3x3 matrix. H=MW is an 3x3 homography matrix.Now we can invert homography matrix and left multiply with q to get Q=[X Y 1], where Z coord was trimmed.
I applied the mentioned algorithm. and I got some results that can not be in between the image corners(the image plane was parallel to the camera plane just in front of ~30 cm the camera, and i got results like 3000)(chessboard square sizes were entered in milimeters, so i assume outputted real world coordinates are again in milimeters). Anyway i am still trying stuff. By the way the results are previosuly very very large, but i divide all values in Q by third component of the Q to get (X,Y,1)
FINAL EDIT
I could not accomplish camera calibration methods. Anyway, I should have started with perspective projection and transform. This way i made very well estimations with a perspective transform between image plane and physical plane(having generated the transform by 4 pairs of corresponding coplanar points on the both planes). Then simply applied the transform on the image pixel points.
You said "i have the intrinsic camera matrix, and translational and rotational params of each of the image.” but these are translation and rotation from your camera to your chessboard. These have nothing to do with your circle. However if you really have translation and rotation matrices then getting 3D point is really easy.
Apply the inverse intrinsic matrix to your screen points in homogeneous notation: C-1*[u, v, 1], where u=col-w/2 and v=h/2-row, where col, row are image column and row and w, h are image width and height. As a result you will obtain 3d point with so-called camera normalized coordinates p = [x, y, z]T. All you need to do now is to subtract the translation and apply a transposed rotation: P=RT(p-T). The order of operations is inverse to the original that was rotate and then translate; note that transposed rotation does the inverse operation to original rotation but is much faster to calculate than R-1.
I have a rotation-translation matrix [R T] (3x4).
Is there a function in opencv that performs the rotation-translation described by [R T]?
A lot of solutions to this question I think make hidden assumptions. I will try to give you a quick summary of how I think about this problem (I have had to think about it a lot in the past). Warping between two images is a 2 dimensional process accomplished by a 3x3 matrix called a homography. What you have is a 3x4 matrix which defines a transform in 3 dimensions. You can convert between the two by treating your image as a flat plane in 3 dimensional space. The trick then is to decide on the initial position in world space of your image plane. You can then transform its position and project it onto a new image plane with your camera intrinsics matrix.
The first step is to decide where your initial image lies in world space, note that this does not have to be the same as your initial R and T matrices specify. Those are in world coordinates, we are talking about the image created by that world, all the objects in the image have been flattened into a plane. The simplest decision here is to set the image at a fixed displacement on the z axis and no rotation. From this point on I will assume no rotation. If you would like to see the general case I can provide it but it is slightly more complicated.
Next you define the transform between your two images in 3d space. Since you have both transforms with respect to the same origin, the transform from [A] to [B] is the same as the transform from [A] to your origin, followed by the transform from the origin to [B]. You can get that by
transform = [B]*inverse([A])
Now conceptually what you need to do is to take your first image, project its pixels onto the geometric interpretation of your image in 3d space, then transform those pixels in 3d space by the transform above, then project them back onto a new 2d image with your camera matrix. Those steps need to be combined into a single 3x3 matrix.
cv::Matx33f convert_3x4_to_3x3(cv::Matx34f pose, cv::Matx33f camera_mat, float zpos)
{
//converted condenses the 3x4 matrix which transforms a point in world space
//to a 3x3 matrix which transforms a point in world space. Instead of
//multiplying pose by a 4x1 3d homogeneous vector, by specifying that the
//incoming 3d vectors will ALWAYS have a z coordinate of zpos, one can instead
//multiply converted by a homogeneous 2d vector and get the same output for x and y.
cv::Matx33f converted(pose(0,0),pose(0,1),pose(0,2)*zpos+pose(0,3),
pose(1,0),pose(1,1),pose(1,2)*zpos+pose(1,3),
pose(2,0),pose(2,1),pose(2,2)*zpos+pose(2,3));
//This matrix will take a homogeneous 2d coordinate and "projects" it onto a
//flat plane at zpos. The x and y components of the incoming homogeneous 2d
//coordinate will be correct, the z component is dropped.
cv::Matx33f projected(1,0,0,
0,1,0,
0,0,zpos);
projected = projected*camera_mat.inv();
//now we have the pieces. A matrix which can take an incoming 2d point, and
//convert it into a pseudo 3d point (x and y correspond to 3d, z is unused)
//and a matrix which can take our pseudo 3d point and transform it correctly.
//Now we just need to turn our transformed pseudo 3d point back into a 2d point
//in our new image, to do that simply multiply by the camera matrix.
return camera_mat*converted*projected;
}
This is probably a more complicated answer than you were looking for but I hope it gives you an idea of what you are asking. This can be very confusing and I glazed over some parts of it quickly, feel free to ask for clarification. If you need the solution to work without the assumption that the initial image appears without rotation let me know, I just didn't want to make it more complicated than it needed to be.
With FeatureDetector I get features on two images with the same element and match this features with BruteForceMatcher.
Then I'm using OpenCv function findHomography to get homography matrix
H = findHomography( src2Dfeatures, dst2Dfeatures, outlierMask, RANSAC, 3);
and getting H matrix, then align image with
warpPerspective(img1,alignedSrcImage,H,img2.size(),INTER_LINEAR,BORDER_CONSTANT);
I need to know rotation angle, scale, displacement of detected element. Is there any simple way to get this than some big equations? Some evaluated formulas just to put data in?
Homography would match projections of your elements lying on a plane or lying arbitrary in 3D if the camera goes through a pure rotation or zoom and no translation. So here are the cases we are talking about with indication of what is the input to our calculations:
- planar target, pure rotation, intra-frame homography
- planar target, rotation and translation, target to frame homography
- 3D target, pure rotation, frame to frame mapping (constrained by a fundamental matrix)
In case of the planar target, a pure rotation is easy to calculate through your frame-to-frame Homography (H12):
given intrinsic camera matrix A, plane to image homographies for frame H1, and H2 that can be expressed as H1=A, H2=A*R, H12 = H2*H1-1=ARA-1 and thus R=A-1H12*A
In case of elements lying on a plane, rotation with translation of the camera (up to unknown scale) can be calculated through decomposition of target-to-frame homography. Note that the target can be just one of the views. Assuming you have your original planar target as an image (taken at some reference orientation) your task is to decompose the homography between images H12 which can be done through SVD. The first two columns of H represent the first two columns of the rotation martrix and be be recovered through H=ULVT, [r1 r2] = UDVT where D is 3x2 Identity matrix with the last row being all 0. The third column of a rotation matrix is just a vector product of the first two columns. The last column of the Homography is a translation vector times some constant.
Finally for arbitrary configuration of points in 3D and pure camera rotation, the rotation is calculated using the essential matrix decomposition rather than homography, see this
cv::decomposeProjectionMatrix();
and
cv::RQDecomp3x3();
are both similar to what you want to achive.
None of them is perfect. The theory behind them and why you cannot extract all params from a 3x3 matrix is a bit cumbersome. But the short answer is that a 3x3 proj matrix is a simplification from the complete 4x4 one, based on the fact that all points stay in the same plane.
You can try to use levenberg marquardt optimalization, where parameters will be translation and rotation, equations will represent by computed distances between features from two images(use only inliers from ransac homography).
Here is C++ implementation of LM http://www.ics.forth.gr/~lourakis/levmar/