First, I want to apologize for my bad English.
I am really new to OpenCV and to virtual reality. I tried to learn the theory of image processing, but some points are still unclear to me. I learned that the projection matrix is the matrix that transforms a 3D point to 2D. Am I right? The essential matrix gives me information about the rotation between two cameras, and the fundamental matrix gives information about the relationship between a pixel in one image and a pixel in the other image. The homography matrix relates the coordinates of a pixel in two images (is that correct?).
What is the difference between fundamental and homography matrix?
Do I need all these matrices to get projection matrix?
I am new to this, so if you can, please try to explain it simply.
Thanks for your help.
I learned that the projection matrix is the matrix that transforms a 3D point to 2D. Am I right?
Yes. But usually these transformations are expressed in homogeneous coordinates. This means that 3D points are represented by 4-vectors (i.e. vectors of length 4), and 2D points are represented by 3-vectors.
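To make the homogeneous-coordinate point concrete, here is a minimal NumPy sketch. The intrinsic matrix K, the camera pose, and the 3D point are made-up values for illustration, not something from the question.

```python
import numpy as np

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                 # camera axes aligned with the world axes
t = np.zeros((3, 1))          # camera placed at the world origin
P = K @ np.hstack([R, t])     # 3x4 projection matrix

X = np.array([0.5, -0.25, 2.0, 1.0])   # 3D point as a homogeneous 4-vector
x = P @ X                               # image point as a homogeneous 3-vector
pixel = x[:2] / x[2]                    # divide by the last coordinate
print(pixel)
```

The division by the last coordinate is what turns the 3-vector back into ordinary pixel coordinates.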
The homography matrix relates the coordinates of a pixel in two images (is that correct?)
No. This is true in two special cases only: when the scene lies on a plane, or when the two views have been generated by two cameras sharing the same center location.
In all other cases, i.e. when the scene is not planar and the two cameras have different centers, there is no homography transforming one image into the other.
What is the difference between fundamental and homography matrix?
There are many differences. From an algebraic point of view, the most obvious difference is that a homography matrix is non-singular (its rank is 3), while a fundamental matrix is singular (its rank is 2).
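The rank difference can be checked numerically. The sketch below builds a fundamental matrix from a made-up camera pair (F = K^-T [t]x R K^-1) and a homography for a purely rotating camera (H = K R K^-1). The values of K, R and t and the helper names skew and numerical_rank are my own assumptions for illustration.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x, so that skew(v) @ w equals np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def numerical_rank(M, rel_tol=1e-10):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(M, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical camera setup (not from the original post).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(10.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])   # rotation about the y axis
t = np.array([1.0, 0.2, 0.1])                  # translation between the cameras

K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv   # fundamental matrix of the two views
H = K @ R @ K_inv                   # homography of a camera that only rotates

print(numerical_rank(F), numerical_rank(H))
```

With this setup the two ranks come out as 2 and 3. A relative tolerance is used because the pixel-scaled entries of F span several orders of magnitude.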
I do have two sets of points and I want to find the best transformation between them.
In OpenCV, you have the following function:
Mat H = Calib3d.findHomography(src_points, dest_points);
which returns a 3x3 homography matrix, using RANSAC. My problem is that I only need translation and rotation (and maybe scale); I don't need the affine or perspective parts.
The thing is, my points are only in 2D.
(1) Is there a function to compute something like a homography but with less degrees of freedom?
(2) If there is none, is it possible to extract a 3x3 matrix that does only translation and rotation from the 3x3 homography matrix?
Thanks in advance for any help!
Isa
OpenCV's estimateRigidTransform function is exactly what you need: it returns translation, rotation and scale (pass false for the fullAffine flag). And it DOES use RANSAC (see the source code to be sure of it).
A homography acts on 2D points; the third dimension is only there to express the points in 3-dimensional homogeneous coordinates and to allow perspective effects. You can always convert the points back:
homogeneous [x, y, w]
cartesian [x/w, y/w]
However, since you estimate 8 DOF (a full homography) instead of 4 DOF (a similarity), your result will be quite different from what you would expect with 4 DOF. A more flexible transformation will fit more points in RANSAC at the expense of distortions in the transformations you care about. Bottom line: don't try to decompose H; instead, fit a similarity or an isometry (also called a rigid or Euclidean transform). The reason they are absent from the library is that they can be expressed in closed form, even with the correct least-squares metric in point coordinates, and thus don't require non-linear optimization. In other words, they are very simple.
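As a rough illustration of the closed-form idea, here is a sketch of a least-squares 2D similarity fit (scale, rotation, translation) in NumPy. It treats each 2D point as a complex number, so the fit becomes dst = a * src + b with a = scale * exp(i * theta). The function name fit_similarity_2d and the test points are my own; there is no RANSAC or outlier handling here.

```python
import numpy as np

def fit_similarity_2d(src, dst):
    """Least-squares scale, rotation and translation mapping src to dst.

    src, dst: (N, 2) arrays of corresponding points.
    Returns (scale, theta_in_radians, 2x3 matrix M) with dst ~ M @ [x, y, 1].
    """
    zs = src[:, 0] + 1j * src[:, 1]
    zd = dst[:, 0] + 1j * dst[:, 1]
    zs_c = zs - zs.mean()                 # center both point sets
    zd_c = zd - zd.mean()
    # np.vdot conjugates its first argument: sum(conj(zs_c) * zd_c).
    a = np.vdot(zs_c, zd_c) / np.vdot(zs_c, zs_c)
    b = zd.mean() - a * zs.mean()
    M = np.array([[a.real, -a.imag, b.real],
                  [a.imag, a.real, b.imag]])
    return np.abs(a), np.angle(a), M

# Synthetic check: scale 1.5, rotation 30 degrees, translation (2, -1).
src = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 2.0], [-1.0, 3.0]])
th = np.deg2rad(30.0)
Rot = np.array([[np.cos(th), -np.sin(th)],
                [np.sin(th), np.cos(th)]])
dst = 1.5 * src @ Rot.T + np.array([2.0, -1.0])

scale, theta, M = fit_similarity_2d(src, dst)
print(scale, np.degrees(theta))
```

To get robustness against outliers, a fit like this could be run inside a RANSAC loop on minimal samples of two point pairs.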
If you only have rotation and translation, I wrote a quick function to find them (no RANSAC, though). It is probably similar to a rigidTransform but, hopefully, more understandable:
https://stackoverflow.com/a/18091472/457687
With scale there is still a closed form solution, but slightly different formulas for translation and scaling. See Learning similarity parameters, p. 25
I have two images that are taken from different positions. The 2nd camera is located to the right, up and backward with respect to 1st camera.
So I think there is a perspective transformation between the two views and not just an affine transform since cameras are at relatively different depths. Am I right?
I have a few corresponding points between the two images. I think of using these corresponding points to determine the transformation of each pixel from the 1st to the 2nd image.
I am confused by the functions findFundamentalMat and findHomography. Both return a 3x3 matrix. What is the difference between the two?
Is there any condition required/prerequisite to use them (when to use them)?
Which one to use to transform points from 1st image to 2nd image? In the 3x3 matrices, which the functions return, do they include the rotation and translation between the two image frames?
From Wikipedia, I read that the fundamental matrix is a relation between corresponding image points. In an SO answer here, it is said the essential matrix E is required to get corresponding points. But I do not have the internal camera matrix to calculate E. I just have the two images.
How should I proceed to determine the corresponding point?
Without any extra assumption on the world scene geometry, you cannot affirm that there is a projective transformation between the two views. This is only true if the scene is planar. A good reference on that topic is the book Multiple View Geometry in Computer Vision by Hartley and Zisserman.
If the world scene is not planar, you should definitely not use the findHomography function. You can use the findFundamentalMat function, which will provide you an estimation of the fundamental matrix F. This matrix describes the epipolar geometry between the two views. You may use F to rectify your images in order to apply stereo algorithms to determine a dense correspondence map.
I assume you are using the expression "perspective transformation" to mean "projective transformation". To the best of my knowledge, a perspective transformation is a world to image mapping, not an image to image mapping.
The fundamental matrix satisfies the relation
x^T F u = 0
with x in one image and u in the other (both in homogeneous coordinates) whenever x and u are projections of the same 3D point. The converse does not hold: every point on the epipolar line of u satisfies the equation.
Also,
l = F u
defines a line (l^T x = 0) on which the corresponding point of u must lie, so it can be used to restrict the search space for correspondences.
A Homography maps a point on one projection of a plane to another projection of the plane.
x = Hu
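Here is a small synthetic check of the fundamental-matrix relations, using the same letters (u in the first image, x in the second). The camera parameters and the 3D point are invented for the example, and F is computed from the known geometry rather than estimated from matches.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical setup: camera 1 is K [I | 0], camera 2 is K [R | t].
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(5.0)
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
t = np.array([-0.5, 0.0, 0.05])

X = np.array([0.3, -0.2, 4.0])      # 3D point in camera-1 coordinates
u = K @ X                           # projection into image 1
x = K @ (R @ X + t)                 # projection into image 2
u, x = u / u[2], x / x[2]           # scale so the last coordinate is 1

K_inv = np.linalg.inv(K)
F = K_inv.T @ skew(t) @ R @ K_inv   # fundamental matrix from known geometry

residual = x @ F @ u                # x^T F u, close to zero for a true match
l = F @ u                           # epipolar line of u in image 2
dist = abs(l @ x) / np.hypot(l[0], l[1])   # distance from x to the line, in px

x_wrong = x + np.array([0.0, 25.0, 0.0])   # a point 25 px off vertically
dist_wrong = abs(l @ x_wrong) / np.hypot(l[0], l[1])
print(residual, dist, dist_wrong)
```

The true match sits on the line up to rounding error, while the shifted point is clearly off it, which is how the line narrows the search for correspondences.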
There are only two cases where the transformation between two views is a projective transformation (i.e. a homography): either the scene is planar or the two views were generated by a camera rotating around its center.
Let's say that all we have are corresponding image points in two views. From these points, I can compute a homography or essential matrix, but I do not understand how to extract the camera's angle of rotation.
I'm not sure whether I have misunderstood your question, but if you are looking for the rotation matrix between two camera coordinate systems and have computed the essential matrix, then the rotation matrix can be calculated by taking an SVD of the essential matrix and doing some simple matrix multiplication.
The algorithm is listed here.
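For reference, a sketch of the SVD step in NumPy, assuming the standard textbook decomposition (the two rotation candidates are U W V^T and U W^T V^T). The essential matrix is built from a made-up rotation and translation so the result can be checked. In this demo the right candidate is picked by comparing against the known rotation, which is not possible with real data; there you would normally keep the candidate that puts the triangulated points in front of both cameras.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rotations_from_essential(E):
    """Return the two rotation candidates encoded in an essential matrix."""
    U, _, Vt = np.linalg.svd(E)
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    candidates = []
    for Wk in (W, W.T):
        Rk = U @ Wk @ Vt
        if np.linalg.det(Rk) < 0:   # E is only defined up to sign
            Rk = -Rk
        candidates.append(Rk)
    return candidates

def rotation_angle_deg(R):
    """Rotation angle of a 3x3 rotation matrix, from its trace."""
    c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(c))

# Synthetic ground truth (assumed values): 20 degrees about the y axis.
a = np.deg2rad(20.0)
R_true = np.array([[np.cos(a), 0.0, np.sin(a)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(a), 0.0, np.cos(a)]])
t_true = np.array([1.0, 0.0, 0.2])
E = skew(t_true) @ R_true

candidates = rotations_from_essential(E)
errors = [np.linalg.norm(Rc - R_true) for Rc in candidates]
best = candidates[int(np.argmin(errors))]
print(rotation_angle_deg(best))
```

The translation direction is the last column of U, up to sign, which together with the two rotations gives the usual four pose candidates.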
I have a small cube with n (you can assume that n = 4) distinguished points on its surface. These points are numbered (1-n) and form a coordinate space, where point #1 is the origin.
Now I'm using a tracking camera to get the coordinates of those points, relative to the camera's coordinate space. That means that I now have n vectors p_i pointing from the origin of the camera to the cube's surface.
With that information, I'm trying to compute the affine transformation matrix (rotation + translation) that represents the transformation between those two coordinate spaces. The translation part is fairly trivial, but I'm struggling with the computation of the rotation matrix.
Is there any built-in functionality in OpenCV that might help me solve this problem?
Sounds like cvGetPerspectiveTransform is what you're looking for; cvFindHomography might also be helpful.
solvePnP should give you the rotation matrix and the translation vector. Try it with CV_EPNP or CV_ITERATIVE.
Edit: Or perhaps you're looking for RQ decomposition.
Look at the Stereo Camera tutorial for OpenCV. OpenCV uses a planar chessboard for all the computation and sets its Z-dimension to 0 to build its list of 3D points. You already have 3D points so change the code in the tutorial to reflect your list of 3D points. Then you can compute the transformation.
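If the same points are known as 3D coordinates in both frames (cube frame and camera frame), one option not mentioned in the answers is a direct least-squares rigid fit via SVD, often called the Kabsch algorithm. This is a NumPy sketch with made-up points; rigid_transform_3d is a hypothetical helper name, not an OpenCV function.

```python
import numpy as np

def rigid_transform_3d(A, B):
    """Least-squares R, t such that B[i] ~ R @ A[i] + t.

    A, B: (N, 3) arrays of corresponding 3D points, N >= 3, not all collinear.
    """
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)                  # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = cb - R @ ca
    return R, t

# Four points in the cube's own frame (point 1 is the origin).
cube_pts = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed ground-truth pose, used only to generate camera-frame points.
a = np.deg2rad(40.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a), np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 3.0])
cam_pts = cube_pts @ R_true.T + t_true

R_est, t_est = rigid_transform_3d(cube_pts, cam_pts)
print(R_est, t_est)
```

With noisy measurements the result is the best fit in the least-squares sense rather than an exact match.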
So I have a depth map and the extrinsics and intrinsics of the camera. I want to get back the 3D points and the surface normals. I am using the function ReprojectImageTo3D. In the stereo rectify function used to find Q, how do I get the rotation matrix between the 1st and the 2nd cameras' coordinate systems? I have the individual rotation matrices and translation vectors, but how do I get them "between the cameras"? Also, this would give me the 3D points. Is there a method to generate the surface normals?
Given that you have the extrinsic matrix of both cameras, can't you simply take the inverse extrinsic matrix of camera 1, multiplied by the extrinsic matrix of camera 2?
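A sketch of that computation with 4x4 matrices, assuming each extrinsic matrix maps world coordinates to camera coordinates (X_cam = R X_world + t). Under that convention the transform from camera 1 to camera 2 is T2 @ inv(T1); if your extrinsics are camera-to-world poses instead, the order of the product flips. All numeric values and helper names are made up.

```python
import numpy as np

def make_extrinsic(R, t):
    """Build a 4x4 world-to-camera matrix from R (3x3) and t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def rot_y(deg):
    """Rotation about the y axis by the given angle in degrees."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), 0.0, np.sin(a)],
                     [0.0, 1.0, 0.0],
                     [-np.sin(a), 0.0, np.cos(a)]])

# Hypothetical extrinsics of the two cameras.
R1, t1 = rot_y(5.0), np.array([0.0, 0.0, 1.0])
R2, t2 = rot_y(-10.0), np.array([-0.3, 0.1, 1.2])
T1, T2 = make_extrinsic(R1, t1), make_extrinsic(R2, t2)

T_12 = T2 @ np.linalg.inv(T1)          # camera-1 coords -> camera-2 coords
R_12, t_12 = T_12[:3, :3], T_12[:3, 3]

X_world = np.array([0.4, -0.1, 3.0, 1.0])
X_cam1 = T1 @ X_world
X_cam2 = T2 @ X_world
print(np.allclose(T_12 @ X_cam1, X_cam2))
```

Written out, this gives R_12 = R2 R1^T and t_12 = t2 - R_12 t1.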
Also, for a direct relation between the two cameras, take a look at the fundamental matrix (or, more specifically, the essential matrix). See if you can find a copy of the book Multiple View Geometry by Hartley and Zisserman.
As for the surface normals, you can compute those yourself by taking cross products at the corners of triangles. However, you first need the reconstructed 3D point cloud.
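A minimal sketch of the cross-product idea, assuming the reconstructed points are still organized as an image-shaped grid (H x W x 3), as a reprojected depth map would be. The function name normals_from_points and the synthetic plane are my own; real depth maps need handling for invalid depths and noise, which is left out here.

```python
import numpy as np

def normals_from_points(points):
    """Per-pixel unit normals from an organized (H, W, 3) point grid.

    Uses the cross product of the differences to the right-hand and lower
    neighbors, so the output has shape (H-1, W-1, 3).
    Degenerate cells (zero-length cross product) are not handled.
    """
    dx = points[:, 1:, :] - points[:, :-1, :]    # step to the right neighbor
    dy = points[1:, :, :] - points[:-1, :, :]    # step to the lower neighbor
    n = np.cross(dx[:-1], dy[:, :-1])            # both are (H-1, W-1, 3)
    return n / np.linalg.norm(n, axis=2, keepdims=True)

# Synthetic test surface: the plane z = 2 + 0.5 * x sampled on a 4x5 grid.
xs, ys = np.meshgrid(np.linspace(0.0, 1.0, 5), np.linspace(0.0, 1.0, 4))
pts = np.dstack([xs, ys, 2.0 + 0.5 * xs])

normals = normals_from_points(pts)
print(normals[0, 0])
```

The sign of the normals depends on the order of the cross product, so you may want to flip them to face the camera.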