estimating 3D Affine transformation, what matrix to use - image-processing

I want to estimate an affine transform given matches between a transformed img and
a reference img (which came from a reference stack).
Can I estimate a 3D affine transform given only the x,y coordinates of the transformed
image and the x, y, z coordinates of the reference
image (z being the slice from the reference stack that the reference img came from)?
The general formula for illustrating a transform is:
x' = M * x, where x' is the transformed point. M is the transformation matrix, and x is the original point. The transform matrix, M, is estimated by multiplying x' by inv(x).
The standard setup for estimating the 3D transformation matrix is this:
How can I estimate the transformation matrix if I don't have the z' of the transformed image? Is there some other setup I can use?

Related

Apply relative radial distortion function to image w/o knowing anything about the camera

I've a radial distortion function which gives me relative distortion from 0 (image center) to the relative full image field (field height 1) in percent. For example this function would give me a distortion of up to 5% at the full relative field height of 1.
I tried to use this together with opencv undistort function to apply distortion but don't know how to fill the matrices.
As said, I've a source image only and don't know anything about the camera parameters like focal length, except that I know the distortion function.
How should I set the matrix in cv2.undistort(src_image, matrix, ...) ?
The OpenCv routine that's easier to use in your case is cv::remap, not undistort.
In the following I assume your distortion purely radial. Similar considerations apply if you have it already decomposed in (x, y).
So you have a distortion function d(r) of the distance r = sqrt((x - x_c)^2 + (y - y_c)^2) of a pixel (x, y) from the image center (x_c, y_c). The function expresses the relative change of the radius r_d of a pixel in the distorted image from the undistorted one r: (r_d - r) / r = d(r), or, equivalently, r_d = r * (1 - d(r)).
If you are given a distorted image, and want to remove the distortion, you need to invert the above equation (i.e. solve it analytically or numerically), finding the value of r for every r_d in the range of interest. Then you can trivially create two arrays, map_x and map_y, that represent the mapping from distorted to undistorted coordinates: for a given pair (x_d, y_d) of integer pixel coordinates in the distorted image, you compute the associated r_d = sqrt(((x_d - x_c)^2 + (y_d - y_c)^2), then the corresponding r as function of r_d from solving the equation, go back to (x, y), and assign map_x[y_d, x_d] = x; map_y[y_d, x_d] = y. Finally, you pass those to cv::remap.

pytorch affine_grid: what is the theta input?

When trying to use torch.nn.functional.affine_grid, it requires a theta affine matrix of size (N x 3 x 4) according to the documentation. I thought a general affine matrix is (N x 4 x 4). What is the supposed affine matrix format in pytorch?
An example of 3D rotation affine input would be ideal. Appreciate your help.
The dimensions you mention are applicable for the case of 3D inputs, that is you wish to apply 3D geometric transforms on the input tensor x of shape bxcxdxhxw.
A transformation to points in 3D (represented as 4-vector in homogeneous coordinates as (x, y, z, 1)) should be, in the general case, a 4x4 matrix as you noted.
However, since we restrict ourselves to homogeneous coordinates, i.e., the fourth coordinate must be 1, the 4th row of the matrix must be (0, 0, 0, 1) (see this).
Therefore, there's no need to explicitly code this last row.
To conclude, a 3D transformation composed of a 3x3 rotation R and 3d translation t is simply the 3x4 matrix:
theta = [R t]

Exact definition of the matrices in OpenCv StereoRectify

Normally the definition of a projection matrix P is the 3x4 matrix which projects point from world coordinates to image/pixel coordinates. The projection matrix can be split up into:
K: a 3x4 camera matrix K with the intrinsic parameters
T: a 4x4 transformation matrix with the extrinsic parameters
The projection matrix is then P = K * T.
What are the clear definitions of the following input to OpenCV's stereoRectify:
cameraMatrix1 – First camera matrix (I assume it is the instrinsic K part of the projection matrix, correct?).
R – Rotation matrix between the coordinate systems of the first and the second cameras. (what does 'between' means? Is it the rotation from cam1 to cam2 or from cam2 to cam1?)
T – Translation vector between coordinate systems of the cameras. (Same is above. Is the translation from cam1 -> cam2 or cam2->cam1)
R1 – Output 3x3 rectification transform (rotation matrix) for the first camera. (Is this the rotation after rectification so the new extrinsic part of the projection matrix becomes T1new = R1*T1old?)
P1 – Output 3x4 projection matrix in the new (rectified) coordinate systems for the first camera. (What is meant by 'projection matrix in the new coordinate system'? It seems that this projection matrix is dependent on the rotation matrix R1 to project point from world coordinates to image/pixel coordinates, so from the above definition it is neither the 'projection matrix' or the 'camera matrix' but some kind of mixture of the two)
CAMERAMATRIX1 - is the intrinsic K matrix as computed by stereocalibrate() function in opencv. you got it right!!!
R is the rotation matrix of cam2 frame w.r.t cam1 frame. Similarily , T is the translation vector of cam2 origin w.r.t
cam1 origin.
If you'll look in O'Riley book "LEARNING OPENCV" pg.-434, you'll understand what R1(/Rl) and R2(/Rr) are.
Rl=[Rrect][rl]; Rr=[Rect][rr];
let camera's picture planes be plane1 and plane2. When stereo rectification hasn't been done , then plane1 and plane2 will not be parallel at all. Also, the epilines willn't be parallel to the stereo camera baseline. So, what Rl does is that it transforms the left image plane to be parallel to right image plane(which is transformed by Rr) and also , epilines on both images are now parallel .
P1 and P2 are the new projection matrices after stereo rectification. Remember, camera matrix(K) transforms a point in 3d space onto 2d image plane. But P1 and P2 transforms a point in 3d space on rectified 2d image planes.
if you have calibrated a stereo camera rig before and observed the P1 and K1 values, you'll find that they are pretty much similiar if your stereo rig is almost in rectified configuration (obviously within human range)

Coordinate transformation in OpenCV

I have a polyline figure, given as an array of relative x and y point coordinates (0.0 to 1.0).
I have to draw the figure with random position, scale and rotation angle.
How can I do it in the best way?
You could use a simple transformation with RT matrix.
Let X = (x y 1)^t be coordinates of one point of your figure. Let R be a 2x2 rotation matrix, and T be 2x1 translation vector of the transformation You plan to make. RT matrix A will have the form of A = [R T;0 0 1]. To get transformed coordinates of point X, You need to do this simple calculation AX = X', where X' are the new coordinates. Now, to get the whole figure transformed, instead of using a single column, You use a matrix where each column has x coordinate in first row, y in the second and 1 in the third row.
Of course You can try to use functions provided by OpenCV, shown in this tutorial, or ones intended for vectors of points instead of whole images, but the way above makes You actually understand what are You doing ;)

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion of corresponding images.
the images conform to the pinhole camera model.
As a matter of fact, I don't get useful results, so I try to describe my procedure and hope that somebody can help me.
I match the features of the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0,-1,0,
1,0,0,
0,0,1);
Matx33d Wt(0,1,0,
-1,0,0,
0,0,1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2); //u3
t2 = -decomp.u.col(2); //u3
Then I try to find the correct solution by triangulation. (this part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/ so I think that should work correct).
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately I got no useful results, so maybe anyone has an idea what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all you should check if
x' * F * x = 0
for your point correspondences x' and x. This should be of course only the case for the inliers of the fundamental matrix estimation with RANSAC.
Thereafter, you have to transform your point correspondences to normalized image coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix like you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful, in literature they say that one point is sufficient to get the correct rotation and translation. From my experience you should check a few points since one point can be an outlier even after RANSAC.
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity. P' = [R, t] is given by the rotation matrix in the first 3 columns and rows and the translation in the fourth column, so that P is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then, xp and xp' are also homogeneous coordinates representing your (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since firstly, you only translate the old_pos and you do not rotate it and secondly, you translate it with a wrong vector. The correct way is given above.
So, after you computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates you have to normalize them (xp = xp / xp(2), divide by last coordinate). This is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large such as 10^2 your intrinsic camera calibration or your rotation/translation are incorrect (perhaps both). Depending on your coordinate system you can try to inverse your projection matrices. On that account you need to transform them to homogeneous coordinates before since you cannot invert a 3x4 matrix (without the pseudo inverse). Thus, add the fourth row [0 0 0 1], compute the inverse and remove the fourth row.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.

Resources