Camera motion from corresponding images - opencv

I'm trying to calculate a new camera position based on the motion of corresponding images.
The images conform to the pinhole camera model.
Unfortunately, I don't get useful results, so I'll describe my procedure and hope that somebody can help me.
I detect features in the corresponding images with SIFT, match them with OpenCV's FlannBasedMatcher and calculate the fundamental matrix with OpenCV's findFundamentalMat (method RANSAC).
Then I calculate the essential matrix using the camera intrinsic matrix (K):
Mat E = K.t() * F * K;
I decompose the essential matrix to rotation and translation with singular value decomposition:
SVD decomp = SVD(E);
Matx33d W(0, -1, 0,
          1,  0, 0,
          0,  0, 1);
Matx33d Wt(0,  1, 0,
          -1,  0, 0,
           0,  0, 1);
R1 = decomp.u * Mat(W) * decomp.vt;
R2 = decomp.u * Mat(Wt) * decomp.vt;
t1 = decomp.u.col(2);  // u3
t2 = -decomp.u.col(2); // u3
Then I try to find the correct solution by triangulation. (This part is from http://www.morethantechnical.com/2012/01/04/simple-triangulation-with-opencv-from-harley-zisserman-w-code/, so I think it should work correctly.)
The new position is then calculated with:
new_pos = old_pos + -R.t()*t;
where new_pos & old_pos are vectors (3x1), R the rotation matrix (3x3) and t the translation vector (3x1).
Unfortunately, I get no useful results, so maybe someone has an idea of what could be wrong.
Here are some results (just in case someone can confirm that any of them is definitely wrong):
F = [8.093827077399547e-07, 1.102681999632987e-06, -0.0007939604310854831;
1.29246107737264e-06, 1.492629957878578e-06, -0.001211264339006535;
-0.001052930954975217, -0.001278667878010564, 1]
K = [150, 0, 300;
0, 150, 400;
0, 0, 1]
E = [0.01821111092414898, 0.02481034499174221, -0.01651092283654529;
0.02908037424088439, 0.03358417405226801, -0.03397110489649674;
-0.04396975675562629, -0.05262169424538553, 0.04904210357279387]
t = [0.2970648246214448; 0.7352053067682792; 0.6092828956013705]
R = [0.2048034356172475, 0.4709818957303019, -0.858039396912323;
-0.8690270040802598, -0.3158728880490416, -0.3808101689488421;
-0.4503860776474556, 0.8236506374002566, 0.3446041331317597]

First of all, you should check whether
x'^T * F * x = 0
for your point correspondences x' and x. Of course, this should only be the case for the inliers of the fundamental matrix estimation with RANSAC.
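For example, a small NumPy sketch of that check (assuming pts1/pts2 are the matched inlier points as Nx2 arrays; the function name is mine):
import numpy as np

def epipolar_residuals(F, pts1, pts2):
    """Return x'^T * F * x for each correspondence; the values should be close to 0."""
    x1 = np.hstack([pts1, np.ones((len(pts1), 1))])  # homogeneous points of image 1
    x2 = np.hstack([pts2, np.ones((len(pts2), 1))])  # homogeneous points of image 2
    return np.einsum('ij,jk,ik->i', x2, F, x1)       # x2_i^T * F * x1_i for every i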
Thereafter, you have to transform your point correspondences to normalized camera coordinates (NCC) like this:
xn = inv(K) * x
xn' = inv(K') * x'
where K' is the intrinsic camera matrix of the second image and x' are the points of the second image. I think in your case it is K = K'.
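In NumPy this is a one-liner per image (pts being an Nx2 array of pixel coordinates; the helper name is mine):
import numpy as np

def normalize_points(K, pts):
    """Map pixel coordinates to normalized camera coordinates xn = inv(K) * x."""
    x = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous pixel coordinates
    xn = (np.linalg.inv(K) @ x.T).T               # apply inv(K) to every point
    return xn[:, :2] / xn[:, 2:3]                 # back to inhomogeneous coordinates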
With these NCCs you can decompose your essential matrix as you described. You triangulate the normalized camera coordinates and check the depth of your triangulated points. But be careful: the literature says that one point is sufficient to pick the correct rotation and translation. From my experience you should check a few points, since a single point can be an outlier even after RANSAC.
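A possible way to carry out that check, sketched with NumPy and cv2.triangulatePoints (the helper name, and the assumption that xn1/xn2 are Nx2 arrays of normalized coordinates and the t candidate a length-3 array, are mine):
import cv2
import numpy as np

def pick_pose(R1, R2, t, xn1, xn2):
    """Pick, among the four (R, t) candidates, the one with the most points in front of both cameras."""
    P0 = np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at the origin
    best = None
    for R, tt in [(R1, t), (R1, -t), (R2, t), (R2, -t)]:
        tt = np.asarray(tt, dtype=np.float64).reshape(3, 1)
        P1 = np.hstack([R, tt])
        X = cv2.triangulatePoints(P0, P1,
                                  np.ascontiguousarray(xn1.T, dtype=np.float64),
                                  np.ascontiguousarray(xn2.T, dtype=np.float64))
        X = X[:3] / X[3]                           # dehomogenize to 3xN
        z1 = X[2]                                  # depth in the first camera
        z2 = (R @ X + tt)[2]                       # depth in the second camera
        n_front = int(np.sum((z1 > 0) & (z2 > 0)))
        if best is None or n_front > best[0]:
            best = (n_front, R, tt)
    return best[1], best[2]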
Before you decompose the essential matrix, make sure that E = U * diag(1, 1, 0) * V^T. This condition is required to get correct results for the four possible choices of the projection matrix.
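In practice you can enforce this by recomputing E from its SVD with the singular values replaced by (1, 1, 0), for example:
import numpy as np

U, S, Vt = np.linalg.svd(E)
E = U @ np.diag([1.0, 1.0, 0.0]) @ Vt  # project E onto the set of valid essential matrices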
When you've got the correct rotation and translation you can triangulate all your point correspondences (the inliers of the fundamental matrix estimation with RANSAC). Then, you should compute the reprojection error. Firstly, you compute the reprojected position like this
xp = K * P * X
xp' = K' * P' * X
where X is the computed (homogeneous) 3D position. P and P' are the 3x4 projection matrices. The projection matrix P is normally given by the identity. P' = [R, t] is given by the rotation matrix in the first three columns and rows and the translation in the fourth column, so that P' is a 3x4 matrix. This only works if you transform your 3D position to homogeneous coordinates, i.e. 4x1 vectors instead of 3x1. Then xp and xp' are also homogeneous coordinates representing your (reprojected) 2D positions of your corresponding points.
I think the
new_pos = old_pos + -R.t()*t;
is incorrect since, firstly, you only translate old_pos without rotating it and, secondly, you translate it with the wrong vector. The correct way is given above.
So, after you have computed the reprojected points you can calculate the reprojection error. Since you are working with homogeneous coordinates, you first have to normalize them (xp = xp / xp(2), i.e. divide by the last coordinate). The error is then given by
error = (x(0)-xp(0))^2 + (x(1)-xp(1))^2
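A small sketch of that reprojection and error computation for a single correspondence (assuming X is a length-4 homogeneous 3D point, x the measured pixel, and P the corresponding 3x4 projection matrix; the function name is mine):
import numpy as np

def reprojection_error(K, P, X, x):
    """Squared reprojection error of one homogeneous 3D point X against the measured pixel x = (u, v)."""
    xp = K @ P @ X        # reprojected homogeneous 2D point
    xp = xp / xp[2]       # divide by the last coordinate
    return (x[0] - xp[0]) ** 2 + (x[1] - xp[1]) ** 2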
If the error is large, e.g. on the order of 10^2, your intrinsic camera calibration or your rotation/translation is incorrect (perhaps both). Depending on your coordinate system you can also try to invert your projection matrices. For that you first need to make them square, since you cannot invert a 3x4 matrix (without the pseudo-inverse): add the fourth row [0 0 0 1], compute the inverse, and remove the fourth row again.
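That inversion trick could look like this (P standing for either 3x4 projection matrix):
import numpy as np

P_h = np.vstack([P, [0.0, 0.0, 0.0, 1.0]])  # append the row [0 0 0 1] to make P square
P_inv = np.linalg.inv(P_h)[:3, :]           # invert, then drop the fourth row again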
There is one more thing with reprojection error. In general, the reprojection error is the squared distance between your original point correspondence (in each image) and the reprojected position. You can take the square root to get the Euclidean distance between both points.

To update your camera position, you have to update the translation first, then update the rotation matrix.
t_ref += lambda * (R_ref * t);
R_ref = R * R_ref;
where t_ref and R_ref are your accumulated camera state, R and t are the newly calculated camera rotation and translation, and lambda is the scale factor.

Related

Apply relative radial distortion function to image w/o knowing anything about the camera

I have a radial distortion function which gives me the relative distortion, in percent, from 0 (at the image center) up to the full relative image field (field height 1). For example, this function would give me a distortion of up to 5% at the full relative field height of 1.
I tried to use this together with OpenCV's undistort function to apply the distortion, but I don't know how to fill the matrices.
As said, I only have a source image and don't know anything about the camera parameters such as the focal length; all I know is the distortion function.
How should I set the matrix in cv2.undistort(src_image, matrix, ...) ?
The OpenCV routine that's easier to use in your case is cv::remap, not undistort.
In the following I assume your distortion is purely radial. Similar considerations apply if you already have it decomposed in (x, y).
So you have a distortion function d(r) of the distance r = sqrt((x - x_c)^2 + (y - y_c)^2) of a pixel (x, y) from the image center (x_c, y_c). The function expresses the relative change of the radius r_d of a pixel in the distorted image with respect to the undistorted radius r: (r_d - r) / r = d(r), or, equivalently, r_d = r * (1 + d(r)).
If you are given a distorted image and want to remove the distortion, you need to invert the above equation (i.e. solve it analytically or numerically), finding the value of r for every r_d in the range of interest. Then you can trivially create two arrays, map_x and map_y, that represent the mapping from distorted to undistorted coordinates: for a given pair (x_d, y_d) of integer pixel coordinates in the distorted image, you compute the associated r_d = sqrt((x_d - x_c)^2 + (y_d - y_c)^2), then the corresponding r as a function of r_d from solving the equation, go back to (x, y), and assign map_x[y_d, x_d] = x; map_y[y_d, x_d] = y. Finally, you pass those to cv::remap.
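A rough NumPy/OpenCV sketch of that recipe. Everything here is illustrative: I assume d is vectorized, takes the radius in pixels and returns a fractional (not percent) change, and that r * (1 + d(r)) is monotonic so it can be inverted by table lookup; adapting the normalization to your relative-field-height definition is up to you.
import cv2
import numpy as np

def build_maps(width, height, d):
    """Build remap tables: destination pixel (x_d, y_d) is sampled from source (x, y)."""
    x_c, y_c = (width - 1) / 2.0, (height - 1) / 2.0
    # Tabulate the forward model r -> r_d = r * (1 + d(r)) and invert it numerically.
    r_table = np.linspace(0.0, 1.05 * np.hypot(x_c, y_c), 2048)
    r_d_table = r_table * (1.0 + d(r_table))
    y_d, x_d = np.mgrid[0:height, 0:width].astype(np.float64)
    r_d = np.hypot(x_d - x_c, y_d - y_c)
    r = np.interp(r_d.ravel(), r_d_table, r_table).reshape(r_d.shape)
    scale = np.where(r_d > 0, r / np.maximum(r_d, 1e-9), 1.0)
    map_x = (x_c + (x_d - x_c) * scale).astype(np.float32)
    map_y = (y_c + (y_d - y_c) * scale).astype(np.float32)
    return map_x, map_y

# usage:
# map_x, map_y = build_maps(w, h, d)
# out = cv2.remap(src_image, map_x, map_y, interpolation=cv2.INTER_LINEAR)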

How to generate a probability distribution on an image

I have a question as follows:
Suppose I have an image (size = 360x640, rows by columns) and a center coordinate, say (20, 100). What I want is to generate a probability distribution that has its highest value at that center (20, 100), lower values in the neighborhood, and much lower values farther from the center.
All I can figure out is to place a multivariate Gaussian (since the data is 2D) and set its mean to the center (20, 100). But is that correct, and how do I design the covariance matrix?
Thanks!!
You could do it in 2D by generating polar coordinates (radius and angle), along the lines of:
Pi = 3.1415926
cx = 20
cy = 100
r = sqrt( -2*log(1-U(0,1)) )
a = 2*Pi*U(0,1)
x = scale*r*cos(a)
y = scale*r*sin(a)
return (x + cx, y + cy)
where scale is a parameter that converts the unitless Gaussian to units applicable to your problem, and U(0,1) is a uniform random value in [0, 1).
Reference: Box-Muller sampling.
If you want a generic 2D Gaussian, i.e. an elliptical one, then you'll have to use different scales for X and Y and rotate the (x, y) vector by a predefined angle using the well-known rotation matrix, as sketched below.
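For illustration, a small Python sketch of that anisotropic variant, built on the same Box-Muller idea (scale_x, scale_y and the angle theta are parameters you choose; the function name is mine):
import math
import random

def sample_anisotropic(cx, cy, scale_x, scale_y, theta):
    # Isotropic unit Gaussian sample via Box-Muller, as above
    r = math.sqrt(-2.0 * math.log(1.0 - random.random()))
    a = 2.0 * math.pi * random.random()
    x = scale_x * r * math.cos(a)
    y = scale_y * r * math.sin(a)
    # Rotate by theta with the standard 2D rotation matrix, then shift to the center
    xr = x * math.cos(theta) - y * math.sin(theta)
    yr = x * math.sin(theta) + y * math.cos(theta)
    return (xr + cx, yr + cy)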

Point Cloud from KITTI stereo images

I'm trying to create a point cloud based on images from the KITTI stereo dataset so that I can later estimate the 3D position of some objects.
The original images look like this.
What I have so far:
generated disparity with cv2.StereoSGBM_create
window_size = 9
minDisparity = 1
stereo = cv2.StereoSGBM_create(
    blockSize=10,
    numDisparities=64,
    preFilterCap=10,
    minDisparity=minDisparity,
    P1=4 * 3 * window_size ** 2,
    P2=32 * 3 * window_size ** 2
)
calculated Q matrix with cv2.stereoRectify using data from KITTI calibration files.
# K_xx: 3x3 calibration matrix of camera xx before rectification
K_L = np.matrix(
    [[9.597910e+02, 0.000000e+00, 6.960217e+02],
     [0.000000e+00, 9.569251e+02, 2.241806e+02],
     [0.000000e+00, 0.000000e+00, 1.000000e+00]])
K_R = np.matrix(
    [[9.037596e+02, 0.000000e+00, 6.957519e+02],
     [0.000000e+00, 9.019653e+02, 2.242509e+02],
     [0.000000e+00, 0.000000e+00, 1.000000e+00]])
# D_xx: 1x5 distortion vector of camera xx before rectification
D_L = np.matrix([-3.691481e-01, 1.968681e-01, 1.353473e-03, 5.677587e-04, -6.770705e-02])
D_R = np.matrix([-3.639558e-01, 1.788651e-01, 6.029694e-04, -3.922424e-04, -5.382460e-02])
# R_xx: 3x3 rotation matrix of camera xx (extrinsic)
R_L = np.transpose(np.matrix([[9.999758e-01, -5.267463e-03, -4.552439e-03],
                              [5.251945e-03, 9.999804e-01, -3.413835e-03],
                              [4.570332e-03, 3.389843e-03, 9.999838e-01]]))
R_R = np.matrix([[9.995599e-01, 1.699522e-02, -2.431313e-02],
                 [-1.704422e-02, 9.998531e-01, -1.809756e-03],
                 [2.427880e-02, 2.223358e-03, 9.997028e-01]])
# T_xx: 3x1 translation vector of camera xx (extrinsic)
T_L = np.transpose(np.matrix([5.956621e-02, 2.900141e-04, 2.577209e-03]))
T_R = np.transpose(np.matrix([-4.731050e-01, 5.551470e-03, -5.250882e-03]))
IMG_SIZE = (1392, 512)
rotation = R_L * R_R
translation = T_L - T_R
# output matrices from stereoRectify init
R1 = np.zeros(shape=(3, 3))
R2 = np.zeros(shape=(3, 3))
P1 = np.zeros(shape=(3, 4))
P2 = np.zeros(shape=(3, 4))
Q = np.zeros(shape=(4, 4))
R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(
    K_L, D_L, K_R, D_R, IMG_SIZE, rotation, translation,
    R1, R2, P1, P2, Q,
    newImageSize=(1242, 375))
The resulting matrix looks like this (at this point I doubt that it is correct):
[[ 1. 0. 0. -614.37893072]
[ 0. 1. 0. -162.12583194]
[ 0. 0. 0. 680.05186262]
[ 0. 0. -1.87703644 0. ]]
Generated Point Cloud with reprojectImageTo3D which looks like this: point cloud
And now the questions part begins :)
Is it OK that all values returned by reprojectImageTo3D are negative?
What are the units of those values, taking into account that it is the KITTI dataset and their camera calibration data is available?
And finally, is it possible to convert those values to something like longitude/latitude if I have the GPS coordinates of the camera that took those photos?
I would appreciate any help!
Is it OK for all values returned by reprojectImageTo3D to be negative?
Generally speaking, no, at least for Z values. The values returned by reprojectImageTo3D are real-world coordinates relative to the camera origin, so for a Z value to be negative it means the point is behind the camera (which is geometrically incorrect). The X and Y values can be negative, since the camera origin is at the center of the FOV, so a negative X value means the point is "to the left" and a negative Y value means the point is "below". But for Z values, no, they should not be negative.
Your Q matrix is turning out almost the identity because I think you are incorrectly setting up the rotation in your call to stereoRectify. The rotation you pass should be the single rotation from the first camera to the second, not a combination of each camera's individual rotation. What you are doing is multiplying the two rotations together after transposing one of them; instead you should pass only R_L (since from your description I assume this is the rotation from the left to the right camera).
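Under that assumption, the call would look roughly like this (reusing the variables from the question; whether the sign convention of `translation` matches is something to double-check):
import cv2

# Pass the single relative rotation between the cameras, not the product of both rotations.
R1, R2, P1, P2, Q, validPixROI1, validPixROI2 = cv2.stereoRectify(
    K_L, D_L, K_R, D_R, IMG_SIZE,
    R_L, translation,
    newImageSize=(1242, 375))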
What are the units of those values, taking into account that it is the KITTI dataset and their camera calibration data is available?
I am not familiar with the KITTI dataset, but the values returned by reprojectImageTo3D are in real-world units, i.e. the same units as the translation used to build Q, typically meters.
And finally, is it possible to convert those values to something like longitude\latitude if I have GPS coordinate of the camera that took those photos?
The coordinates returned by reprojectImageTo3D are real-world coordinates relative to the camera origin. If you have the GPS coordinates of the camera that took the photos, you can offset those latitude/longitude values by the (X, Y, Z) coordinates returned from the reprojection.

Coordinate transformation in OpenCV

I have a polyline figure, given as an array of relative x and y point coordinates (0.0 to 1.0).
I have to draw the figure with random position, scale and rotation angle.
How can I do it in the best way?
You could use a simple transformation with an RT matrix.
Let X = (x, y, 1)^T be the homogeneous coordinates of one point of your figure. Let R be a 2x2 rotation matrix and T a 2x1 translation vector of the transformation you plan to make. The RT matrix A has the form A = [R T; 0 0 1]. To get the transformed coordinates of point X, you just compute A*X = X', where X' are the new coordinates. Now, to transform the whole figure at once, instead of using a single column you use a matrix where each column has the x coordinate in the first row, y in the second, and 1 in the third row.
Of course you can try to use the functions provided by OpenCV, shown in this tutorial, or ones intended for vectors of points instead of whole images, but the way above makes you actually understand what you are doing ;)
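For illustration, a small NumPy sketch of that column-stacking approach (the helper name is mine, and I fold the requested random scale into the rotation part; pts is an Nx2 array of the relative coordinates):
import numpy as np

def transform_polyline(pts, scale, angle, tx, ty):
    """Apply A = [R T; 0 0 1] to every point, stacked as homogeneous columns."""
    c, s = np.cos(angle), np.sin(angle)
    A = np.array([[scale * c, -scale * s, tx],
                  [scale * s,  scale * c, ty],
                  [0.0,        0.0,       1.0]])
    X = np.vstack([np.asarray(pts).T, np.ones(len(pts))])  # one column (x, y, 1)^T per point
    Xp = A @ X                                             # transformed homogeneous columns
    return Xp[:2].T                                        # back to an Nx2 array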

T and R estimation from essential matrix

I created a simple test application to perform translation (T) and rotation (R) estimation from the essential matrix.
Generate 50 random points.
Calculate the projection pointSet1.
Transform the points via the matrix (R|T).
Calculate the new projection pointSet2.
Then calculate the fundamental matrix F.
Extract the essential matrix as E = K2^T F K1 (K1, K2 are the internal camera matrices).
Use SVD to get U D V^T.
Calculate restoredR1 = U W V^T and restoredR2 = U W^T V^T, and see that one of them is equal to the initial R.
But when I calculate the translation vector, restoredT = U Z U^T, I get a normalized T:
restoredT * max(T.x, T.y, T.z) = T
How can I restore the correct translation vector?
I understand! I don't need a real length estimate at this step.
When I get the first image, I must set a metric transformation (scale factor) or estimate it from calibration against a known object. Afterwards, when I receive the second frame, I calculate the normalized T and use the known 3D coordinates from the first frame to solve the equation s * (x2, y2, 1)^T = K (R | lambda*T) (X, Y, Z, 1)^T for lambda; then lambda*T is the correct metric translation.
I checked it, and this is true. So... maybe someone knows a simpler solution?
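Maybe not simpler, but for reference, a small NumPy sketch of that scale recovery for a single point (the function name and the least-squares formulation are mine; X is a known 3D point in the first camera's frame, x2 its pixel in the second image, and t_unit the normalized translation):
import numpy as np

def recover_scale(K, R, t_unit, X, x2):
    xn = np.linalg.inv(K) @ np.array([x2[0], x2[1], 1.0])  # normalized ray in the second camera
    a = R @ X                                              # rotated point, still missing the translation
    # xn ~ a + lambda * t_unit  =>  cross(xn, a + lambda * t_unit) = 0
    A = np.cross(xn, t_unit)                               # coefficient of lambda
    b = -np.cross(xn, a)                                   # constant term
    return float(A @ b) / float(A @ A)                     # least squares; average over several points in practice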
