pytorch affine_grid: what is the theta input? - image-processing

When trying to use torch.nn.functional.affine_grid, it requires a theta affine matrix of size (N x 3 x 4) according to the documentation. I thought a general affine matrix is (N x 4 x 4). What is the supposed affine matrix format in pytorch?
An example of 3D rotation affine input would be ideal. Appreciate your help.

The dimensions you mention apply to the case of 3D inputs, that is, when you want to apply a 3D geometric transform to an input tensor x of shape b x c x d x h x w.
A transformation of points in 3D (represented in homogeneous coordinates as 4-vectors (x, y, z, 1)) is, in the general case, a 4x4 matrix, as you noted.
However, since we restrict ourselves to affine transformations, the fourth homogeneous coordinate always remains 1, so the 4th row of the matrix is always (0, 0, 0, 1).
Therefore, there's no need to explicitly code this last row.
To conclude, a 3D affine transformation composed of a 3x3 rotation R and a 3D translation t is simply the 3x4 matrix:
theta = [R t]
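For instance, here is a minimal sketch (my own example, values chosen arbitrarily) of a theta describing a rotation about the z-axis, applied to a 5D input with affine_grid and grid_sample; the rotation acts on the normalized [-1, 1] grid coordinates:

import math
import torch
import torch.nn.functional as F

angle = math.pi / 4                         # 45-degree rotation about the z-axis
R = torch.tensor([[math.cos(angle), -math.sin(angle), 0.0],
                  [math.sin(angle),  math.cos(angle), 0.0],
                  [0.0,              0.0,             1.0]])
t = torch.zeros(3, 1)                       # no translation
theta = torch.cat([R, t], dim=1).unsqueeze(0)   # N x 3 x 4, here N = 1

x = torch.randn(1, 1, 8, 16, 16)            # b x c x d x h x w input volume
grid = F.affine_grid(theta, size=list(x.shape), align_corners=False)
out = F.grid_sample(x, grid, align_corners=False)
print(out.shape)                            # torch.Size([1, 1, 8, 16, 16])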

Related

How to generate a probability distribution on an image

I have a question as follows:
Suppose I have an image (size = 360x640, rows by columns), and a center coordinate, say (20, 100). What I want is to generate a probability distribution that has the highest value at that center (20, 100), lower values in its neighborhood, and much lower values farther from the center.
All I could figure out is to use a multivariate Gaussian (since the data is 2D) with the mean set to the center (20, 100). But is that correct, and how do I design the covariance matrix?
Thanks!!
You could do it in 2D by sampling polar coordinates (a radius and an angle), along these lines:
import math
import random

cx, cy = 20, 100                                  # center of the distribution

def sample_point(scale):
    # Box-Muller: two uniform samples -> one 2D Gaussian sample around (cx, cy)
    r = math.sqrt(-2.0 * math.log(1.0 - random.random()))
    a = 2.0 * math.pi * random.random()
    x = scale * r * math.cos(a)
    y = scale * r * math.sin(a)
    return (x + cx, y + cy)
where scale is a parameter that converts the unitless Gaussian to units appropriate for your problem, and random.random() draws a uniform value in [0, 1).
Reference: Box-Muller sampling.
If you want a generic 2D Gaussian, meaning an ellipse in 2D, then you'll have to use different scales for X and Y and rotate the (x, y) vector by a predefined angle using the well-known rotation matrix.
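If instead you want the density itself over the image (which is what the covariance-matrix part of the question asks), a possible NumPy sketch is below; the scales sx, sy and the angle phi are made-up parameters you would tune for your problem:

import numpy as np

def gaussian_image(h=360, w=640, cx=20, cy=100, sx=15.0, sy=40.0, phi=0.3):
    # Rotated covariance: Sigma = R * diag(sx^2, sy^2) * R^T
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s], [s, c]])
    Sigma = R @ np.diag([sx**2, sy**2]) @ R.T
    Sigma_inv = np.linalg.inv(Sigma)

    rows, cols = np.mgrid[0:h, 0:w]                       # (row, col) pixel grid
    d = np.stack([rows - cx, cols - cy], axis=-1).astype(float)
    q = np.einsum('...i,ij,...j->...', d, Sigma_inv, d)   # d^T Sigma^-1 d per pixel
    p = np.exp(-0.5 * q)
    return p / p.sum()                                    # normalize so it sums to 1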

estimating 3D Affine transformation, what matrix to use

I want to estimate an affine transform given matches between a transformed image and a reference image (which came from a reference stack).
Can I estimate a 3D affine transform given only the x, y coordinates of the transformed image and the x, y, z coordinates of the reference image (z being the slice of the reference stack that the reference image came from)?
The general formula for a transform is:
x' = M * x, where x' is the transformed point, M is the transformation matrix, and x is the original point. M is estimated by stacking many corresponding points into matrices X' and X and solving, e.g. M = X' * pinv(X).
The standard setup for estimating the 3D transformation matrix is this:
[x']   [m11 m12 m13 m14]   [x]
[y'] = [m21 m22 m23 m24] * [y]
[z']   [m31 m32 m33 m34]   [z]
                           [1]
How can I estimate the transformation matrix if I don't have the z' of the transformed image? Is there some other setup I can use?
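One way to set this up without z' is to fit only the rows of M that produce x' and y'. A hedged NumPy sketch (the helper name is mine, not from the thread):

import numpy as np

def estimate_affine_rows(src_xyz, dst_xy):
    # src_xyz: (N, 3) reference points (x, y, z); dst_xy: (N, 2) transformed points (x', y').
    # Returns a 2x4 matrix A such that [x', y']^T ~ A * [x, y, z, 1]^T (least-squares fit).
    N = src_xyz.shape[0]
    X = np.hstack([src_xyz, np.ones((N, 1))])        # N x 4 homogeneous source points
    A, *_ = np.linalg.lstsq(X, dst_xy, rcond=None)   # solves X @ A = dst_xy
    return A.T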

Coordinate transformation in OpenCV

I have a polyline figure, given as an array of relative x and y point coordinates (0.0 to 1.0).
I have to draw the figure with random position, scale and rotation angle.
How can I do it in the best way?
You could use a simple transformation with an RT (rotation-translation) matrix.
Let X = (x y 1)^t be the homogeneous coordinates of one point of your figure, let R be a 2x2 rotation matrix, and let T be the 2x1 translation vector of the transformation you plan to make. The RT matrix A then has the form A = [R T; 0 0 1]. To get the transformed coordinates of point X, you simply compute A*X = X', where X' are the new coordinates. To transform the whole figure at once, instead of using a single column, use a matrix where each column holds one point's x coordinate in the first row, y in the second, and 1 in the third row.
Of course you can try to use the functions provided by OpenCV, shown in this tutorial, or the ones intended for vectors of points instead of whole images, but the way above makes you actually understand what you are doing ;)
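As an illustration, a small NumPy sketch along those lines (the parameter ranges are arbitrary); the resulting pixel coordinates can then be drawn with cv2.polylines after rounding to integers:

import numpy as np

def random_transform(points, img_w, img_h, rng=None):
    # points: (N, 2) relative coordinates in [0, 1]; returns (N, 2) pixel coordinates.
    rng = rng or np.random.default_rng()
    angle = rng.uniform(0.0, 2.0 * np.pi)
    scale = rng.uniform(0.2, 1.0) * min(img_w, img_h)
    tx, ty = rng.uniform(0, img_w), rng.uniform(0, img_h)

    c, s = np.cos(angle), np.sin(angle)
    A = np.array([[scale * c, -scale * s, tx],        # A = [scale*R T; 0 0 1]
                  [scale * s,  scale * c, ty],
                  [0.0,        0.0,       1.0]])
    X = np.vstack([points.T, np.ones(len(points))])   # each column is (x, y, 1)^T
    return (A @ X)[:2].T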

Camera motion from corresponding images

I'm trying to calculate a new camera position based on the motion between corresponding images.
The images conform to the pinhole camera model.
Unfortunately, I don't get useful results, so I'll describe my procedure and hope that somebody can help me.
I detect features in the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher, and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix by the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
// W is the 90-degree rotation about z used in the standard factorization of E
Matx33d W( 0, -1, 0,
           1,  0, 0,
           0,  0, 1);
Matx33d Wt(0,  1, 0,
          -1,  0, 0,
           0,  0, 1);
R1 = decomp.u * Mat(W)  * decomp.vt;  // first rotation hypothesis
R2 = decomp.u * Mat(Wt) * decomp.vt;  // second rotation hypothesis
t1 =  decomp.u.col(2);                // u3
t2 = -decomp.u.col(2);                // -u3
Then I try to find the correct solution by triangulation (this part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/, so I think it should work correctly).
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately, I get no useful results, so maybe someone has an idea of what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]
First of all, you should check whether
x'^T * F * x = 0
holds (approximately) for your point correspondences x' and x. Of course, this should only be the case for the inliers of the fundamental matrix estimation with RANSAC.
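For example, a quick NumPy check of the epipolar constraint for all matches (my own sketch, not from the original answer):

import numpy as np

def epipolar_residuals(F, pts1, pts2):
    # pts1, pts2: (N, 2) pixel coordinates of x and x'. Returns |x'^T F x| per match.
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])
    return np.abs(np.einsum('ni,ij,nj->n', x2, F, x1))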
Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
With these NCCs you can decompose your essential matrix as you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful: the literature says that one point is sufficient to get the correct rotation and translation. From my experience you should check a few points, since a single point can be an outlier even after RANSAC.
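A possible way to run this depth check, sketched in Python with OpenCV's triangulatePoints (assuming P = [I|0] and P' = [R|t] in normalized coordinates):

import numpy as np
import cv2

def points_in_front(R, t, pts1_n, pts2_n):
    # pts1_n, pts2_n: (N, 2) normalized camera coordinates; counts points with positive depth in both views.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = np.hstack([R, t.reshape(3, 1)])
    X = cv2.triangulatePoints(P1, P2, pts1_n.T.astype(float), pts2_n.T.astype(float))
    X = X / X[3]                        # dehomogenize (4 x N)
    z1 = X[2]                           # depth in the first camera
    z2 = (P2 @ X)[2]                    # depth in the second camera
    return int(np.sum((z1 > 0) & (z2 > 0)))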
Before you decompose the essential matrix make sure that E=U*diag(1,1,0)*Vt. This condition is required to get correct results for the four possible choices of the projection matrix.
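If this does not hold exactly (due to noise it usually will not), you can enforce it by resetting the singular values before the decomposition; the overall scale of E is arbitrary anyway. A NumPy sketch:

import numpy as np

def project_to_essential(E):
    # Replace the singular values of E by (1, 1, 0).
    U, S, Vt = np.linalg.svd(E)
    return U @ np.diag([1.0, 1.0, 0.0]) @ Vt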
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity, i.e. P = [I, 0]. P' = [R, t] is given by the rotation matrix in the first 3 rows and columns and the translation in the fourth column, so that P' is also a 3x4 matrix. This only works if you represent your 3D positions in homogeneous coordinates, i.e. as 4x1 vectors instead of 3x1. Then, xp and xp' are also homogeneous coordinates representing the (reprojected) 2D positions of your corresponding points.
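In Python/NumPy terms, the reprojection step could look like this (a sketch only, assuming K = K' and that X holds homogeneous 4-vectors row-wise):

import numpy as np

def reproject(K, R, t, X):
    # X: (N, 4) homogeneous 3D points. Returns (N, 2) reprojected pixels in both views.
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])   # P  = [I | 0]
    P2 = np.hstack([R, t.reshape(3, 1)])            # P' = [R | t]
    xp1 = (K @ P1 @ X.T).T
    xp2 = (K @ P2 @ X.T).T
    return xp1[:, :2] / xp1[:, 2:], xp2[:, :2] / xp2[:, 2:]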
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since, firstly, you only translate old_pos without rotating it and, secondly, you translate it with the wrong vector. The correct way is given below.
So, after you computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates you have to normalize them (xp = xp / xp(2), divide by last coordinate). This is given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
If the error is large, such as 10^2, your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system, you can also try to invert your projection matrices. For that you first need to extend them to homogeneous (4x4) form, since you cannot invert a 3x4 matrix (without the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row again.
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.
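Continuing the sketch above, the per-point squared error and an overall root-mean-square value would be:

# pts: (N, 2) measured points, xp: (N, 2) reprojected points from reproject()
err = np.sum((pts - xp) ** 2, axis=1)   # squared reprojection error per correspondence
rms = np.sqrt(err.mean())               # root-mean-square reprojection error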
To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your camera state, R and t are new calculated camera rotation and translation, and lambda is the scale factor.
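In Python/NumPy terms, the same update might look like this (lam stands for the unknown scale factor lambda):

def update_pose(R_ref, t_ref, R, t, lam=1.0):
    # Accumulate the new relative motion (R, t) into the reference pose.
    t_ref = t_ref + lam * (R_ref @ t)
    R_ref = R @ R_ref
    return R_ref, t_ref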

OpenCV Camera calibration use of rotation matrix

http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#calibratecamera
I used the cv::calibrateCamera method with a 9x6 chessboard pattern.
Now I am getting rvecs and tvecs corresponding to each pattern view.
Can somebody explain the format of rvecs and tvecs?
As far as I have figured out, each one is a 3x1 matrix,
and the OpenCV documentation suggests looking at the Rodrigues function.
http://en.wikipedia.org/wiki/Rodrigues'_rotation_formula
As far as Rodrigues is concerned, it is a way to rotate a vector
around a given axis by an angle theta,
but for this we need four values: a unit vector (ux, uy, uz) and the angle. OpenCV, however, seems to use only 3 values.
The OpenCV Rodrigues documentation at the link below
http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#void Rodrigues(InputArray src, OutputArray dst, OutputArray jacobian)
says that it will convert a 3x1 matrix to a 3x3 rotation matrix.
Is this matrix the same as the one we use in 3D graphics?
Can I convert it to a 4x4 matrix and use it for transformations like the one below?
M4X4 [
x x x 0
x x x 0
x x x 0
0 0 0 1
]
where x are the values from the 3x3 output matrix of the Rodrigues function.
Is the relationship
Vout = M4X4 * Vin
valid, using the matrix above?
The 3x1 rotation vector can express a rotation matrix by defining an axis of rotation via the direction the vector points in and an angle via the magnitude of the vector. Using the OpenCV function Rodrigues(InputArray src, OutputArray dst) you can obtain a rotation matrix that fits the use you describe.
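For example, a short Python sketch (my own, with an arbitrary rotation vector) that converts an rvec to a rotation matrix and embeds it in a 4x4 transform like the one in the question:

import numpy as np
import cv2

rvec = np.array([[0.1], [-0.2], [0.3]])   # a 3x1 rotation vector, e.g. one entry of rvecs
R, _ = cv2.Rodrigues(rvec)                # 3x3 rotation matrix

M = np.eye(4)                             # 4x4 transform with last row (0, 0, 0, 1)
M[:3, :3] = R
# M[:3, 3] = tvec.ravel()                 # the corresponding tvec would go here

v_in = np.array([1.0, 0.0, 0.0, 1.0])     # homogeneous input vector
v_out = M @ v_in                          # Vout = M4X4 * Vin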
