Open CV - Several Methods for SfM - opencv

I got a task:
We have a system working where a camera does a halfcircle around a human head. We know the camera matrix and the rotation/translation of every frame. (Distortion and more... but I want first to work without these parameters)
My task is that I have only the Camera Matrix, which is constant over this move, and the images (more than 100). Now I have to get the translation and rotation from frame by frame and compare it with the rotation and translation in real world (from the system which I have but only for compare, I have too prove it!)
First steps I did so far:
use the robustMatcher from the OpenCV Cookbook - works finde - 40-70 Matches each frame - visible looks it very good!
I get the fundamentalMatrix with getFundamental(). I use the robust Points from robustMatcher and RANSAC.
When I got the F i can get the Essentialmatrix E with my CameraMatrix K like this:
cv::Mat E = K.t() * F * K; //Found at the Bible HZ Chapter 9.12
Now we need to extract R and t out of E with SVD. By the way camera1 position is just zero because we have to start somewhere.
cv::SVD svd(E);
cv::SVD svd(E);
cv::Matx33d W(0,-1,0, //HZ 9.13
1,0,0,
0,0,1);
cv::Matx33d Wt(0,1,0,//W^
-1,0,0,
0,0,1);
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt; //HZ 9.19
cv::Mat R2 = svd.u * cv::Mat(Wt) * svd.vt; //HZ 9.19
//R1 or R2???
R = R1; //R2
//t=+u3 or t=-u3?
t = svd.u.col(2); //=u3
This is my actual status!
My plans are:
triangulate all points to get 3D points
Join frame i with frame i++
Visualize my 3D points them somehow!
Now my Questions are:
is this robust matcher dated? is there a other method?
Is it wrong to use this points as descriped at my second step? Must they be converted with distortion or something?
What R and t is this i extract here? Is it the rotation and translation between camera1 and camera2 with point of view from camera1?
When I read at the bible or papers or elsewhere i find that there are 4 possibilities how R and t can be!
´P′ = [UWV^T |+u3] oder [UWV^T |−u3] oder [UW^TV^T |+u3] oder [UW^TV^T |−u3]´
P´ is the projectionmatrix of the second image.
That means t could be - or + and R could be total different?!
I found out that I should calculate one point into 3D and find out if this point is infront of both cameras, then I have found the correct matrix!
I found some of this code at the internet and he just said this no further calculating:
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt
and
t = svd.u.col(2); //=u3
Why is this correct? If it isn't - how would I do this triangulation in OpenCV?
I compared this translation to the translation which is given to me. (First i had to transfer the translation and rotation in relationship to camera1 but I got this now!) But its not the same. The values of my program are just lets call it jumping from plus too minus. But it should be more constant because the camera is moving in a constant circle.
I am sure that some axes may be switched. I know that the translation is only from -1 till 1 but I thought I could extract a factor from my results to my comparevalues and then it should be similiar.
Does somebody have done something like this before?
Many people doing a camera calibration by using a chessboard, but I can't use this method to get the extrinsic parameters.
I know that visual sfm can do this somehow. (At youtube is a video where someone walks around a tree and get from these pictures a reconstruction of this tree using visual sfm)
This is pretty the same what I have to do.
Last question:
Does somebody know an easy way to visualize my 3D Points? I prefere MeshLab. Some experience with that?

Many people doing a camera calibration by using a chessboard, but I can't use this method to get the extrinsic parameters.
A chess board or checker board is used to find the internal/intrinsic matrix/parameters, not the extrinsic parameters. You're saying you have got the internal matrix already, I suppose that's what you meant by
We know the camera matrix and ...
Those videos you have seen on youtube have done the same, the camera is already calibrated, that is the internal matrix is known.
is this robust matcher dated? is there a other method?
I don't have that book so cant see the code and answer this.
Is it wrong to use this points as descriped at my second step? Must they be converted with distortion or something?
You need to cancel the radial distortion first, see undistortPoints.
What R and t is this i extract here? Is it the rotation and translation between camera1 and camera2 with point of view from camera1?
R is the orientation of the second camera in the first camera's coordinate system. And T is position of the second camera in that coordinate system. These have several usages.
When I read at the bible or papers or elsewhere i find that there are 4 possibilities how ....
Read the relevant section of the bible, this is very well explained there, triangulation is naive method, a better approach is explained there.
Does somebody know an easy way to visualize my 3D Points?
To see them in Meshlab a very easy way is to save the coordinate of the 3D points in a PLY file, this is an extremely simple format and supported by Meshlab and almost all other 3D model viewers.

In this article "An Efficient Solution to the Five-Point Relative Pose Problem", Nistér explain a very good method to determine which of the four configurations it the correct one (talking about R and T).
I've tried the robust matcher and I think is quiet good. The problems that has this matcher is that is really slow because it uses SURF, maybe you should try with others detectors and extractors to improve the speed.I also believe that the function in OpenCV that calculates the fundamental matrix does not need the Ransac parameter because the methods rate and symmetry do a great job removing the outliers, you should try the 8-point parameter.
OpenCV has the function triangulate, this only needs two projection Matrices, points that are in the first and the second image. Check the calib3d module.

Related

What could be the reason for triangulation 3D points to result in a warped (paraboloid) plot? Trying to perform 3D reconstruction using SFM

I am trying to do 3D reconstruction using SFM (Structure From Motion). I am pretty new to computer vision and doing this as a hobby, so if you use acronyms please also let me know what it stands for so I can look it up.
Learning wise, I have been following this information :
https://www.youtube.com/watch?v=SyB7Wg1e62A&list=PLgnQpQtFTOGRYjqjdZxTEQPZuFHQa7O7Y&ab_channel=CyrillStachniss
https://imkaywu.github.io/tutorials/sfm/#triangulation
Plus links below from quick question.
My end goal is to use this on persons face, to create a 3D face reconstruction. If people have advice on this topic specifically please let me know as well.
I do the following steps :
IO using OpenCV. A video taken using a single camera.
Find intrinsic parameters and distortion coefficients of the camera using Zhangs method.
Use SIFT to find features from frame 1 and frame 2.
Feature matching is done using cv2.FlannBasedMatcher().
Compute essential matrix using cv2.findEssentialMat().
Projection matrix of frame 1 is set to numpy.hstack((numpy.eye(3), numpy.zeros((3, 1))))
Rotation and Translation are obtained using cv2.recoverPose().
Using Rotation and Translation we get the Projection Matrix of frame 2
curr_proj_matrix = cv2.hconcat([curr_rotation_matrix, curr_translation_matrix]).
I use cv2.undistortPoints() on feature pts for frame 1 and 2, using information from step 2.
Lastly, I do triangulation points_4d = triangulation.triangulate(prev_projection_matrix, curr_proj_matrix, prev_pts_u, curr_pts_u)
Then I reassign prev values to be equal curr values and continue through the video.
I use matplotlib to display the scatter plot.
Quick Question :
Why do some articles do E = (K^-1)T * F * K and some E = (K)T * F * K.
First way : What do I do with the fundamental matrix?
Second way : https://harish-vnkt.github.io/blog/sfm/
Issue :
As you can see the scatter plot looks a bit warped, I am unsure why, or if I am missing a step, or doing something wrong. Hence looking for advise.
Also the Z axis, is all negative.
One of the guesses I had, was that the video is in 60 FPS and even though I am moving the camera relatively quickly, it might not be enough of the rotation + translation to determine the triangulation. However, removing frames in between, did not make much difference.
Please let me know if you would like me to provide some of the code.
I believe I have an answer but I am not sure why it works. Hence if someone could expand, plus mention what the 3rd column of the 4D points is, then I will approve that answer and delete this.
Doing this on 4D points after triangulation : points_4d /= points_4d[3] (1)
The documentation does not mention it : https://docs.opencv.org/4.5.3/d9/d0c/group__calib3d.html#gad3fc9a0c82b08df034234979960b778c
My best guess, is that doing (1) is similar to doing this : cv2.convertPointsFromHomogeneous(). Converting from homogeneous space to euclidean space.
Edit 20211003 : Please see a comment for further explanation.

Finding the intrinsic parameters of a camera without a chessboard

I need to find the intrinsic parameters of a CCTV camera using a set of historic footage images (That is all I got, no control on the environment, thus no chessboard calibration).
The good news is that I have the access to some ground-truth real-world coordinates, visible in most of the images.
Just wondering if there is any solid approach to come up with the camera intrinsic parameters.
P.S. I already found the homography matrix using cv2.findHomography in Python.
P.S. I have already tested QTcalib on two machines, but it is unable to visualize the images in the first place. Not sure what is wrong with it.
Thanks in advance.
intrinsic parameters contain both fx fy cx cy and skew with additional distortion parameters k1-k5 r1-r2.
Assuming you have no distortion and cx and cy are perfectly in the center. Image origin at top left as a normal understanding of the image. As you say you know some ground truth level 3D points.3D measurements are with respect to camera optical axis. Then this 3D point P can be projected into camera image plane called p. The P p O(the camera optical center) with center lines forms isosceles triangle.
fx / (p_x-cx) = P_z / P_x
fx = (p_x-cx) * P_z / P_x
The same goes for the fy. and usually fx and fy are the same.
This is under the perfect assumption that you don't have distortion on camera. If you start to have distortion, then you need to find enough sample points all over the image to form distortion understanding as shown below. One or 2 points won't give you the whole picture understanding.
There are some cheats in some papers that using sea vanishing lines(see ref, it is a series of works) or perfect 3D building vanishing points to detect the distortion. We start from extrinsic to intrinsic and it can get some good guess after some trial eventually. But it is very much in research and can not apply to general cases.
Ref: Han Wang, Wei Mou, Xiaozheng Mou, Shenghai Yuan, Soner Ulun, Shuai Yang and Bok-Suk Shin, An Automatic Self-Calibration Approach for Wide Baseline Stereo Cameras Using Sea Surface Images, unmanned system
If all you have is a video and a few 3d points, your best bet is probably to matchmove it, that is, do a manually assisted bundle adjustment using a 3D computer graphics environment, e.g. Blender. There are a lot of tutorials online on how to do it (example). To add the 3d points as constraints, you build some shapes representing them in the virtual world (e.g. some small spheres) and place them so that their relative positions match the ground truth you have, then add them to the tracker solution.

Strange issue with stereo triangulation: TWO valid solutions

I am currently using OpenCV for a pose estimation related work, in which I am triangulating points between pairs for reconstruction and scale factor estimation. I have encountered a strange issue while working on this, especially in the opencv functions recoverPose() and triangulatePoints().
Say I have camera 1 and camera 2, spaced apart in X, with cam1 at (0,0,0) and cam2 to the right of it (positive X). I have two arrays points1 and points2 that are the matched features between the two images. According to the OpenCV documentation and code, I have noted two points:
recoverPose() assumes that points1 belong to the camera at (0,0,0).
triangulatePoints() is called twice: one from recoverPose() to tell us which of the four R/t combinations is valid, and then again from my code, and the documentation says:
cv::triangulatePoints(P1, P2, points1, points2, points3D) : points1 -> P1 and points2 -> P2.
Hence, as in the case of recoverPose(), it is safe to assume that P1 is [I|0] and P2 is [R|t].
What I actually found: It doesn't work that way. Although my camera1 is at 0,0,0 and camera2 is at 1,0,0 (1 being up to scale), the only correct configuration is obtained if I run
recoverPose(E, points2, points1...)
triangulatePoints([I|0], [R|t], points2, points1, pts3D)
which should be incorrect, because points2 is the set from R|t, not points1. I tested an image pair of a scene in my room where there are three noticeable objects after triangulation: a monitor and two posters on the wall behind it. Here are the point clouds resulting from the triangulation (excuse the MS Paint)
If I do it the OpenCV's prescribed way: (poster points dispersed in space, weird looking result)
If I do it my (wrong?) way:
Can anybody share their views about what's going on here? Technically, both solutions are valid because all points fall in front of both cameras: and I didn't know what to pick until I rendered it as a pointcloud. Am I doing something wrong, or is it an error in the documentation? I am not that knowledgeable about the theory of computer vision so it might be possible I am missing something fundamental here. Thanks for your time!
I ran into a similar issue. I believe OpenCV defines the translation vector in the opposite way that one would expect. With your camera configuration the translation vector would be [-1, 0, 0]. It's counter intuitive, but both RecoverPose and stereoCalibrate give this translation vector.
I found that when I used the incorrect but intuitive translation vector (eg. [1, 0, 0]) I couldn't get correct results unless I swapped points1 and points2, just as you did.
I suspect the translation vector actually translate the points into the other camera coordinate system, not the vector to translate the camera poses. The OpenCV Documentation, seems to imply that this is the case:
The joint rotation-translation matrix [R|t] is called a matrix of extrinsic parameters. It is used to describe the camera motion around a static scene, or vice versa, rigid motion of an object in front of a still camera. That is, [R|t] translates coordinates of a point (X,Y,Z) to a coordinate system, fixed with respect to the camera.
Wikipedia has a nice description on Translation of Axis:
In the new coordinate system, the point P will appear to have been translated in the opposite direction. For example, if the xy-system is translated a distance h to the right and a distance k upward, then P will appear to have been translated a distance h to the left and a distance k downward in the x'y'-system
Reason for this observation is discussed in this thread
Is the recoverPose() function in OpenCV is left-handed?
Translation t is the vector from cam2 to cam1 in cam2 frame.

3D Reconstruction upto real scale

I am working on a project to detect the 3D location of the object. I have two cameras set up at two corners of the room and I have obtained the Fundamental matrix between them. These cameras are internally calibrated. My images are 2592 X 1944
K = [1228 0 3267
0 1221 538
0 0 1 ]
F = [-1.098e-7 3.50715e-7 -0.000313
2.312e-7 2.72256e-7 4.629e-5
0.000234 -0.00129250 1 ]
Now, How do I proceed so that given a 3D point in space, I should be able to get points on the image which correspond to the same object in the room. If I can obtain the right projection matrices (with correct scale) I can use them later as inputs to OpenCV's traingulatePoints function to obtain the location of the object.
I have been stuck at this since a long time. So, please help me.
Thanks.
From what I gather, you have obtained the Fundamental matrix through some means of calibration? Either way, with the fundamental matrix (or the calibration rig itself) you can obtain the pose difference via decomposition of the Essential matrix. Once you have that, you can use matched feature points (using a feature extractor and descriptor like SURF, BRISK, ...) to identify which points in one image belong to the same object point as another feature point in the other image.
With that information, you should be able to triangulate away.
Sorry its not coming in size of comment..
so #user2167617 reply to your comment.
Pretty much. A few pointers, though: the singular values should be (s,s,0), so (1.3, 1.05, 0) is a pretty good guess. About the R: Technically, this is right, however, ignoring signs. It might very well be that you get a rotation matrix which does not satisfy the constraint deteminant(R) = 1 but is instead -1. You might want to multiply it with -1 in that case. Generally, if you run into problems with this approach, try to determine the Essential Matrix using the 5 point algorithm (implemented into the very newest version of OpenCV, you will have to build it yourself). The scale is indeed impossible to obtain with these informations. However, it's all to scale. If you define for example the distance between the cameras being 1 unit, then everything will be measured in that unit.
May be it will be simplier use cv::reprojectImageTo3D function? It will give you 3D coordinates.

Rectification of uncalibrated cameras, via fundamental matrix

I'm trying to do calibration of Kinect camera and external camera, with Emgu/OpenCV.
I'm stuck and I would really appreciate any help.
I've choose do this via fundamental matrix, i.e. epipolar geometry.
But the result is not as I've expected. Result images are black, or have no sense at all.
Mapx and mapy points are usually all equal to infinite or - infinite, or all equals to 0.00, and rarely have regular values.
This is how I tried to do rectification:
1.) Find image points get two arrays of image points (one for every camera) from set of images. I've done this with chessboard and FindChessboardCorners function.
2.) Find fundamental matrix
CvInvoke.cvFindFundamentalMat(points1Matrix, points2Matrix,
_fundamentalMatrix.Ptr, CV_FM.CV_FM_RANSAC,1.0, 0.99, IntPtr.Zero);
Do I pass all collected points from whole set of images, or just from two images trying to rectify?
3.) Find homography matrices
CvInvoke.cvStereoRectifyUncalibrated(points11Matrix, points21Matrix,
_fundamentalMatrix.Ptr, Size, h1.Ptr, h2.Ptr, threshold);
4.) Get mapx and mapy
double scale = 0.02;
CvInvoke.cvInvert(_M1.Ptr, _iM.Ptr, SOLVE_METHOD.CV_LU);
CvInvoke.cvMul(_H1.Ptr, _M1.Ptr, _R1.Ptr,scale);
CvInvoke.cvMul(_iM.Ptr, _R1.Ptr, _R1.Ptr, scale);
CvInvoke.cvInvert(_M2.Ptr, _iM.Ptr, SOLVE_METHOD.CV_LU);
CvInvoke.cvMul(_H2.Ptr, _M2.Ptr, _R2.Ptr, scale);
CvInvoke.cvMul(_iM.Ptr, _R2.Ptr, _R2.Ptr, scale);
CvInvoke.cvInitUndistortRectifyMap(_M1.Ptr,_D1.Ptr, _R1.Ptr, _M1.Ptr,
mapxLeft.Ptr, mapyLeft.Ptr) ;
I have a problem here...since I'm not using calibrated images, what is my camera matrix and distortion coefficients ? How can I get it from fundamental matrix or homography matrices?
5.) Remap
CvInvoke.cvRemap(src.Ptr, destRight.Ptr, mapxRight, mapyRight,
(int)INTER.CV_INTER_LINEAR, new MCvScalar(255));
And this doesn't returning good result. I would appreciate if someone would tell me what am I doing wrong.
I have set of 25 pairs of images, and chessboard pattern size 9x6.
The book "Learning OpenCV," from O'Reilly publishing, has two full chapters devoted to this specific topic. Both make heavy use of OpenCV's included routines cvCalibrateCamera2() and cvStereoCalibrate(); These routines are wrappers to code that is very similar to what you have written here, with the added benefit of having been more thoroughly debugged by the folks who maintain the OpenCV libraries. while they are convenient, both require quite a bit of preprocessing to achieve the necessary inputs to the routines. There may in fact be a sample program, somewhere deep in the samples directory of the OpenCV distribution, that uses these routines, with examples on how to go from chessboard image to calibration/intrinsics matrix. If you take an in depth look at any of these places, I am sure you will see how you can achieve your goal with advice from the experts.
cv::findFundamentalMat cannot work if the intrinsic parameter of your image points is an identity matrix. In other words, it cannot work with unprojected image points.

Resources