Model-based pose estimation - opencv

I would like to estimate the pose of a known 3D object using OpenCV. I can use solvePnP if the points of the 3D model and their corresponding points in the image are given. My question is: how can I find the correspondence between the known 3D model and its projection in the image?
Thank you a lot

Once you have some matches between points of the 3D model and points in the scene, you can apply cv::findHomography(). If the model points are coplanar, this function calculates a 3x3 matrix that maps any point of that plane into the image. Actually only 4 matches are needed for the homography calculation.

poseMatrix = solvePnP(objectPoints, imagePoints);
imagePoint_computed = cameraMatrix * poseMatrix * objectPoints[i]
find the j at which
imagePoints[j] ~= imagePoint_computed.
objectPoints[i] and imagePoints[j] are then the corresponding points.
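For illustration, a rough C++ sketch of that projection-and-matching step, assuming you already have an initial pose guess (rvec, tvec), e.g. from a previous frame; all names here are illustrative and not from the original answer:

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Project the known model points with the current pose guess and pair each
// projected model point with the nearest detected image point.
std::vector<int> matchByProjection(const std::vector<cv::Point3f>& objectPoints,
                                   const std::vector<cv::Point2f>& imagePoints,
                                   const cv::Mat& K, const cv::Mat& distCoeffs,
                                   const cv::Mat& rvec, const cv::Mat& tvec)
{
    std::vector<cv::Point2f> projected;
    cv::projectPoints(objectPoints, rvec, tvec, K, distCoeffs, projected);

    std::vector<int> match(objectPoints.size(), -1);
    for (size_t i = 0; i < projected.size(); ++i) {
        double best = 1e12;
        for (size_t j = 0; j < imagePoints.size(); ++j) {
            cv::Point2f d = projected[i] - imagePoints[j];
            double dist = std::hypot(d.x, d.y);
            if (dist < best) { best = dist; match[i] = (int)j; }
        }
    }
    return match;   // objectPoints[i] <-> imagePoints[match[i]]
}

The matches found this way can be fed back into cv::solvePnP to refine the pose, and the loop repeated until it settles.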

This kept bugging me, so I kept looking.
The SoftPOSIT algorithm is what you want.
http://www.cfar.umd.edu/~daniel/Site_2/Code.html
has a MATLAB implementation; some folks have translated it to C/C++.

Related

Find projection matrix

First of all, I want to apologize for my bad English.
I am really new to OpenCV and to virtual reality. I tried to work through the theory of image processing, but some points are still missing for me. I learned that the projection matrix is a matrix that transforms a 3D point to 2D. Am I right? The essential matrix gives me information about the rotation between two cameras, and the fundamental matrix gives information about the relationship between a pixel in one image and a pixel in the other image. The homography matrix relates the pixel coordinates in two images (is that correct?).
What is the difference between the fundamental and the homography matrix?
Do I need all these matrices to get the projection matrix?
I am new to this, so please, if you can, try to explain it to me simply.
Thanks for your help.
I learned that the projection matrix is a matrix that transforms a 3D point to 2D. Am I right?
Yes. But usually these transformations are expressed in homogeneous coordinates. This means that 3D points are represented by 4-vectors (ie vectors of length 4), and 2D points are represented by 3-vectors.
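As a small concrete illustration in OpenCV (made-up numbers, standard pinhole model; not part of the original answer):

#include <opencv2/opencv.hpp>

// P = K [R|t] is 3x4; a 3D point is a 4-vector, its image a 3-vector.
cv::Mat K  = (cv::Mat_<double>(3,3) << 800, 0, 320,  0, 800, 240,  0, 0, 1);
cv::Mat Rt = cv::Mat::eye(3, 4, CV_64F);                     // identity pose [I|0]
cv::Mat P  = K * Rt;                                          // 3x4 projection matrix
cv::Mat X  = (cv::Mat_<double>(4,1) << 0.1, 0.2, 2.0, 1.0);   // homogeneous 3D point
cv::Mat x  = P * X;                                           // homogeneous 2D point
double u = x.at<double>(0) / x.at<double>(2);                 // divide by the last
double v = x.at<double>(1) / x.at<double>(2);                 // coordinate to get pixels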
The homography matrix relates the pixel coordinates in two images (is that correct?)
No. This is true in two special cases only: when the scene lies on a plane, or when the two views have been generated by two cameras sharing the same center location.
In all other cases, ie when the scene is not planar and the two cameras have different centers, there is no homography transforming one image into the other.
What is the difference between the fundamental and the homography matrix?
There are many differences. From an algebraic point of view, the most obvious difference is that a homography matrix is non-singular (its rank is 3), while a fundamental matrix is singular (its rank is 2).
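A quick way to see the rank difference numerically, as a hedged sketch (pts1/pts2 are matched points obtained elsewhere; OpenCV 3+ constant names):

#include <opencv2/opencv.hpp>

std::vector<cv::Point2f> pts1, pts2;   // matched points in the two images (illustrative)
cv::Mat H = cv::findHomography(pts1, pts2, cv::RANSAC, 3.0);
cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99);

cv::Mat sH, sF;
cv::SVD::compute(H, sH);   // three clearly non-zero singular values: rank 3
cv::SVD::compute(F, sF);   // the smallest singular value is ~0: rank 2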

What is the difference between the fundamental, essential and homography matrices?

I have two images that are taken from different positions. The 2nd camera is located to the right, up and backward with respect to the 1st camera.
So I think there is a perspective transformation between the two views, not just an affine transform, since the cameras are at relatively different depths. Am I right?
I have a few corresponding points between the two images. I am thinking of using these corresponding points to determine the transformation of each pixel from the 1st image to the 2nd.
I am confused by the functions findFundamentalMat and findHomography. Both return a 3x3 matrix. What is the difference between the two?
Is there any condition required/prerequisite to use them (when to use them)?
Which one should I use to transform points from the 1st image to the 2nd image? Do the 3x3 matrices that the functions return include the rotation and translation between the two image frames?
From Wikipedia, I read that the fundamental matrix is a relation between corresponding image points. In an SO answer here, it is said that the essential matrix E is required to get corresponding points. But I do not have the internal camera matrix to calculate E. I just have the two images.
How should I proceed to determine the corresponding points?
Without any extra assumption on the world scene geometry, you cannot affirm that there is a projective transformation between the two views. This is only true if the scene is planar. A good reference on that topic is the book Multiple View Geometry in Computer Vision by Hartley and Zisserman.
If the world scene is not planar, you should definitely not use the findHomography function. You can use the findFundamentalMat function, which will provide you with an estimate of the fundamental matrix F. This matrix describes the epipolar geometry between the two views. You may use F to rectify your images in order to apply stereo algorithms to determine a dense correspondence map.
I assume you are using the expression "perspective transformation" to mean "projective transformation". To the best of my knowledge, a perspective transformation is a world to image mapping, not an image to image mapping.
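For the rectification route mentioned above, a hedged sketch of the uncalibrated case (illustrative names; the matched points pts1/pts2 and the images img1/img2 come from elsewhere):

#include <opencv2/opencv.hpp>

cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99);
cv::Mat H1, H2, rect1, rect2;
cv::stereoRectifyUncalibrated(pts1, pts2, F, img1.size(), H1, H2);
cv::warpPerspective(img1, rect1, H1, img1.size());
cv::warpPerspective(img2, rect2, H2, img2.size());
// The rectified pair has (approximately) horizontal epipolar lines, so a
// stereo block-matching algorithm can search along rows for dense matches.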
The fundamental matrix gives the relation
x^T F u = 0
with x in one image and u in the other, iff x and u are projections of the same 3D point.
Also,
l = F u
defines a line (x^T l = 0) on which the corresponding point of u must lie, so it can be used to confine the search space for the correspondences.
A Homography maps a point on one projection of a plane to another projection of the plane.
x = Hu
There are only two cases where the transformation between two views is a projective transformation (ie a homography): either the scene is planar or the two views were generated by a camera rotating around its center.
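A small illustration of how the epipolar constraint is used to confine the search for correspondences (hedged sketch, illustrative names):

#include <opencv2/opencv.hpp>

std::vector<cv::Point2f> pts1, pts2;   // matched points used to estimate F
cv::Mat F = cv::findFundamentalMat(pts1, pts2, cv::FM_RANSAC, 3.0, 0.99);

// For each point u in image 1, l = F*u is a line in image 2 on which the
// corresponding point must lie, so the search can be restricted to that line.
std::vector<cv::Vec3f> lines2;
cv::computeCorrespondEpilines(pts1, 1, F, lines2);   // lines2[i] = (a, b, c) with ax + by + c = 0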

How to build the correct projection matrices?

Can someone please tell me what the projection matrices for triangulatePoints should look like? It's not that easy: I found several possible projection matrices but couldn't figure out which one is the right one.
If I have keypoints from the robust matcher (or other matching methods), then I get rotation and translation from E. This would give matrices like P0=[I|T] and P1=[R|T], where R and T are the extracted, unchanged values. Am I right? When I have a stereo rig and use calibrateCamera() (explicitly not stereoCalibrate!) or SolvePnP, I get R and T for camera1 and camera2. Do those matrices look like this: P0=[R0|t0] and P1=[R1|t1], or P0=[R0|R0*t0] and P1=[R1|R1*t1], or is it something else?
I also found something about "rectify". Do you know if I have to rectify before getting the keypoints, or is undistortPoints the only function needed for triangulatePoints()?
Thanks for the help
In 3D reconstruction we have these coordinate systems:
1 - image coordinates
2 - camera coordinates
3 - world coordinates
We choose one of them, for example world coordinates, and place everything else, like the camera, in that coordinate system.
We have two important matrices:
1 - the view matrix or projection matrix, which describes the projection
2 - the model matrix, which describes the position and rotation of the object
By multiplying the two matrices we get the modelview matrix, which describes the scene.
OpenGL is a good library with open examples.
You can get help from this link: http://www.songho.ca/opengl/gl_transform.html
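Back to the original question about triangulatePoints: a hedged sketch of the usual construction when R and t come from the essential matrix, taking the first camera as the origin, so P0 = K[I|0] and P1 = K[R|t] (names are illustrative; K, R, t assumed CV_64F):

#include <opencv2/opencv.hpp>

cv::Mat P0 = K * cv::Mat::eye(3, 4, CV_64F);             // P0 = K [I | 0]
cv::Mat Rt = cv::Mat::zeros(3, 4, CV_64F);
R.copyTo(Rt(cv::Rect(0, 0, 3, 3)));                       // left 3x3 block = R
t.copyTo(Rt(cv::Rect(3, 0, 1, 3)));                       // last column = t
cv::Mat P1 = K * Rt;                                       // P1 = K [R | t]

cv::Mat points4D;                                          // 4xN homogeneous result
cv::triangulatePoints(P0, P1, pts1, pts2, points4D);
// Divide each column by its 4th coordinate to get Euclidean 3D points.

If instead you have a world-to-camera pose (Ri, ti) for each camera from calibrateCamera or SolvePnP, the same pattern gives Pi = Ki [Ri | ti] in OpenCV's convention.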

Open CV - Several Methods for SfM

I got a task:
We have a working system where a camera does a half-circle around a human head. We know the camera matrix and the rotation/translation of every frame. (Distortion and more... but first I want to work without these parameters.)
My task is that I have only the camera matrix, which is constant over this move, and the images (more than 100). Now I have to get the translation and rotation from frame to frame and compare it with the rotation and translation in the real world (from the system that I have, but only for comparison; I have to prove it!).
First steps I did so far:
use the robustMatcher from the OpenCV Cookbook - works fine - 40-70 matches each frame - visually it looks very good!
I get the fundamental matrix with getFundamental(). I use the robust points from the robustMatcher and RANSAC.
Once I have F, I can get the essential matrix E with my camera matrix K like this:
cv::Mat E = K.t() * F * K; //Found at the Bible HZ Chapter 9.12
Now we need to extract R and t out of E with SVD. By the way, camera1's position is just zero because we have to start somewhere.
cv::SVD svd(E);
cv::Matx33d W(0,-1,0, //HZ 9.13
1,0,0,
0,0,1);
cv::Matx33d Wt(0,1,0, //W^T
-1,0,0,
0,0,1);
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt; //HZ 9.19
cv::Mat R2 = svd.u * cv::Mat(Wt) * svd.vt; //HZ 9.19
//R1 or R2???
R = R1; //R2
//t=+u3 or t=-u3?
t = svd.u.col(2); //=u3
This is my current status!
My plans are:
triangulate all points to get 3D points
Join frame i with frame i+1
Visualize my 3D points somehow!
Now my Questions are:
Is this robust matcher dated? Is there another method?
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
What R and t is it that I extract here? Is it the rotation and translation between camera1 and camera2, seen from camera1?
When I read the bible or papers or elsewhere, I find that there are 4 possibilities for what R and t can be!
P′ = [UWV^T | +u3] or [UWV^T | −u3] or [UW^TV^T | +u3] or [UW^TV^T | −u3]
P′ is the projection matrix of the second image.
That means t could be − or + and R could be totally different?!
I found out that I should triangulate one point into 3D and check whether this point is in front of both cameras; then I have found the correct matrix!
I found some of this code on the internet, and the author just used this, with no further computation:
cv::Mat R1 = svd.u * cv::Mat(W) * svd.vt;
and
t = svd.u.col(2); //=u3
Why is this correct? If it isn't, how would I do this triangulation in OpenCV?
I compared this translation to the translation that was given to me. (First I had to transform the given translation and rotation into camera1's frame of reference, but I have that now!) But it's not the same. The values from my program are, let's say, jumping from plus to minus. But they should be more constant, because the camera is moving in a constant circle.
I am sure that some axes may be switched. I know that the translation is only from -1 to 1, but I thought I could extract a scale factor between my results and the comparison values, and then they should be similar.
Has somebody done something like this before?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
I know that VisualSFM can do this somehow. (On YouTube there is a video where someone walks around a tree and gets a reconstruction of that tree from the pictures using VisualSFM.)
This is pretty much the same as what I have to do.
Last question:
Does somebody know an easy way to visualize my 3D points? I prefer MeshLab. Any experience with it?
Many people do camera calibration using a chessboard, but I can't use this method to get the extrinsic parameters.
A chessboard or checkerboard is used to find the internal/intrinsic matrix/parameters, not the extrinsic parameters. You're saying you have the internal matrix already; I suppose that's what you meant by
We know the camera matrix and ...
The videos you have seen on YouTube do the same: the camera is already calibrated, that is, the internal matrix is known.
Is this robust matcher dated? Is there another method?
I don't have that book, so I can't see the code and answer this.
Is it wrong to use these points as described in my second step? Must they be corrected for distortion or something?
You need to cancel the radial distortion first, see undistortPoints.
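A hedged sketch of that step (passing K again as the last argument keeps the undistorted points in pixel coordinates; names are illustrative):

#include <opencv2/opencv.hpp>

cv::Mat K, distCoeffs;                      // intrinsics and distortion from calibration
std::vector<cv::Point2f> pts1, pts1u;       // matched points in image 1
cv::undistortPoints(pts1, pts1u, K, distCoeffs, cv::noArray(), K);
// Without the last argument the output is in normalized camera coordinates,
// in which case F/E estimation should use an identity camera matrix.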
What R and t is it that I extract here? Is it the rotation and translation between camera1 and camera2, seen from camera1?
R is the orientation of the second camera in the first camera's coordinate system, and T is the position of the second camera in that coordinate system. These have several uses.
When I read the bible or papers or elsewhere, I find that there are 4 possibilities for what ....
Read the relevant section of the bible; this is explained there very well. Triangulation is the naive method; a better approach is explained there.
Does somebody know an easy way to visualize my 3D Points?
To see them in MeshLab, a very easy way is to save the coordinates of the 3D points in a PLY file; this is an extremely simple format and is supported by MeshLab and almost all other 3D model viewers.
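A minimal ASCII PLY writer, as an illustrative sketch (assuming the triangulated points are kept in a std::vector<cv::Point3f>):

#include <opencv2/opencv.hpp>
#include <fstream>
#include <vector>

void writePly(const std::string& path, const std::vector<cv::Point3f>& pts)
{
    std::ofstream out(path);
    out << "ply\nformat ascii 1.0\n"
        << "element vertex " << pts.size() << "\n"
        << "property float x\nproperty float y\nproperty float z\n"
        << "end_header\n";
    for (const auto& p : pts)
        out << p.x << " " << p.y << " " << p.z << "\n";
}

The resulting file opens directly in MeshLab as a point cloud.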
In this article, "An Efficient Solution to the Five-Point Relative Pose Problem", Nistér explains a very good method to determine which of the four configurations is the correct one (talking about R and T).
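For comparison, a hedged sketch of the simple depth test the question itself describes (not Nistér's method): triangulate the matches with each of the four (R, t) candidates and keep the candidate that puts the most points in front of both cameras. Names are illustrative; K, R, t assumed CV_64F:

#include <opencv2/opencv.hpp>

int countInFront(const cv::Mat& K, const cv::Mat& R, const cv::Mat& t,
                 const std::vector<cv::Point2f>& pts1,
                 const std::vector<cv::Point2f>& pts2)
{
    cv::Mat P0 = K * cv::Mat::eye(3, 4, CV_64F);           // camera 1 at the origin
    cv::Mat Rt = cv::Mat::zeros(3, 4, CV_64F);
    R.copyTo(Rt(cv::Rect(0, 0, 3, 3)));
    t.copyTo(Rt(cv::Rect(3, 0, 1, 3)));
    cv::Mat P1 = K * Rt;                                     // camera 2 = K [R | t]

    cv::Mat X4;                                              // 4xN homogeneous points
    cv::triangulatePoints(P0, P1, pts1, pts2, X4);
    X4.convertTo(X4, CV_64F);

    int count = 0;
    for (int i = 0; i < X4.cols; ++i) {
        cv::Mat X = X4.col(i) / X4.at<double>(3, i);         // Euclidean (X,Y,Z,1)
        double z1 = X.at<double>(2);                          // depth in camera 1
        double z2 = cv::Mat(Rt * X).at<double>(2);            // depth in camera 2
        if (z1 > 0 && z2 > 0) ++count;
    }
    return count;
}

Call it for the four combinations of R1/R2 with +u3/−u3 and keep the one with the highest count.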
I've tried the robust matcher and I think it is quite good. The problem with this matcher is that it is really slow because it uses SURF; maybe you should try other detectors and extractors to improve the speed. I also believe that the OpenCV function that calculates the fundamental matrix does not need the RANSAC parameter, because the ratio and symmetry tests already do a great job of removing outliers; you could try the 8-point parameter.
OpenCV has the function triangulatePoints; it only needs the two projection matrices and the points in the first and the second image. Check the calib3d module.

Compute transformation matrix from a set of coordinates (with OpenCV)

I have a small cube with n (you can assume that n = 4) distinguished points on its surface. These points are numbered (1-n) and form a coordinate space, where point #1 is the origin.
Now I'm using a tracking camera to get the coordinates of those points, relative to the camera's coordinate space. That means that I now have n vectors p_i pointing from the origin of the camera to the cube's surface.
With that information, I'm trying to compute the affine transformation matrix (rotation + translation) that represents the transformation between those two coordinate spaces. The translation part is fairly trivial, but I'm struggling with the computation of the rotation matrix.
Is there any built-in functionality in OpenCV that might help me solve this problem?
Sounds like cvGetPerspectiveTransform is what you're looking for; cvFindHomography might also be helpful.
solvePnP should give you the rotation matrix and the translation vector. Try it with CV_EPNP or CV_ITERATIVE.
Edit: Or perhaps you're looking for RQ decomposition.
Look at the Stereo Camera tutorial for OpenCV. OpenCV uses a planar chessboard for all the computation and sets its Z-dimension to 0 to build its list of 3D points. You already have 3D points so change the code in the tutorial to reflect your list of 3D points. Then you can compute the transformation.
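One thing the answers above don't show: with 3D-3D correspondences like these, the rotation can also be computed directly with an SVD-based (Kabsch) alignment. A hedged sketch, assuming both point sets are stored as Nx3 CV_64F matrices with matching rows (illustrative names, not taken from the answers):

#include <opencv2/opencv.hpp>

// cubePts: points in the cube's coordinate space; camPts: the same points as
// measured in the camera's coordinate space (row i corresponds to row i).
void rigidTransform(const cv::Mat& cubePts, const cv::Mat& camPts,
                    cv::Mat& R, cv::Mat& t)
{
    cv::Mat ca, cb;
    cv::reduce(cubePts, ca, 0, cv::REDUCE_AVG);             // 1x3 centroids
    cv::reduce(camPts,  cb, 0, cv::REDUCE_AVG);

    cv::Mat A = cubePts - cv::repeat(ca, cubePts.rows, 1);  // centered point sets
    cv::Mat B = camPts  - cv::repeat(cb, camPts.rows, 1);

    cv::Mat H = A.t() * B;                                   // 3x3 cross-covariance
    cv::SVD svd(H);
    cv::Mat D = cv::Mat::eye(3, 3, CV_64F);
    D.at<double>(2, 2) = cv::determinant(svd.vt.t() * svd.u.t()); // guard against reflection
    R = svd.vt.t() * D * svd.u.t();                           // camPt ~ R * cubePt + t
    t = cb.t() - R * ca.t();
}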

Resources